docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1: high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2: historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3: v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
128
docs/v1/AlarmTracking.md
Normal file
@@ -0,0 +1,128 @@
# Alarm Tracking

Alarm surfacing is an optional driver capability exposed via `IAlarmSource` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`). Drivers whose backends have an alarm concept implement it; today those are Galaxy (MXAccess alarms), FOCAS (CNC alarms), and OPC UA Client (A&C events from the upstream server). Modbus / S7 / AB CIP / AB Legacy / TwinCAT do not implement the interface and the feature is simply absent from their subtrees.

## IAlarmSource surface

```csharp
Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
    IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken);
Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken);
Task AcknowledgeAsync(IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements,
    CancellationToken cancellationToken);
event EventHandler<AlarmEventArgs>? OnAlarmEvent;
```

The driver fires `OnAlarmEvent` for every transition (`Active`, `Acknowledged`, `Inactive`) with an `AlarmEventArgs` carrying the source node id, condition id, alarm type, message, severity (`AlarmSeverity` enum), and source timestamp.

## AlarmSurfaceInvoker

`AlarmSurfaceInvoker` (`src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs`) wraps the three mutating surfaces through `CapabilityInvoker`:

- `SubscribeAlarmsAsync` / `UnsubscribeAlarmsAsync` run through the `DriverCapability.AlarmSubscribe` pipeline; retries apply under the tier configuration.
- `AcknowledgeAsync` runs through `DriverCapability.AlarmAcknowledge`, which does NOT retry per decision #143. A timed-out ack may have already registered at the plant floor; replay would silently double-acknowledge.

Multi-host fan-out: when the driver implements `IPerCallHostResolver`, each source node id is resolved individually and batches are grouped by host so a dead PLC inside a multi-device driver doesn't poison sibling breakers. Single-host drivers fall back to `IDriver.DriverInstanceId` as the pipeline-key host.
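
The grouping step described above can be sketched roughly as follows. This is an illustration only, not the invoker's code: `HostBatcher` and its `resolveHost` delegate are hypothetical names standing in for whatever `IPerCallHostResolver` actually exposes, and the node-id format in the usage lines is invented.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: two hosts behind one driver instance; the resolver is a plain delegate here.
var batches = HostBatcher.GroupByHost(
    new[] { "plc-a/Tank1.HiHi", "plc-b/Pump3.Fault", "plc-a/Tank2.LoLo" },
    resolveHost: id => id.Split('/')[0],
    fallbackHost: "driver-instance-1");

foreach (var (host, ids) in batches)
    Console.WriteLine($"{host}: {string.Join(", ", ids)}");

// Hypothetical helper, named for this sketch only.
public static class HostBatcher
{
    // Groups source node ids by owning host so each batch can run through its
    // own resilience pipeline. With no resolver (single-host driver), every id
    // lands under fallbackHost, mirroring the DriverInstanceId fallback.
    public static IReadOnlyDictionary<string, IReadOnlyList<string>> GroupByHost(
        IEnumerable<string> sourceNodeIds,
        Func<string, string>? resolveHost,
        string fallbackHost)
    {
        return sourceNodeIds
            .GroupBy(id => resolveHost?.Invoke(id) ?? fallbackHost)
            .ToDictionary(g => g.Key, g => (IReadOnlyList<string>)g.ToList());
    }
}
```

Keying each batch by host is what lets a circuit breaker open for one device while sibling devices in the same driver instance keep flowing.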

## Condition-node creation via CapturingBuilder

Alarm-condition nodes are materialized at address-space build time. During `GenericDriverNodeManager.BuildAddressSpaceAsync` the builder is wrapped in a `CapturingBuilder` that observes every `Variable()` call. When a driver calls `IVariableHandle.MarkAsAlarmCondition(AlarmConditionInfo)` on a returned handle, the server-side `DriverNodeManager.VariableHandle` creates a sibling `AlarmConditionState` node and returns an `IAlarmConditionSink`. The wrapper stores the sink in `_alarmSinks` keyed by the variable's full reference, then `GenericDriverNodeManager` registers a forwarder on `IAlarmSource.OnAlarmEvent` that routes each push to the matching sink by `SourceNodeId`. Unknown source ids are dropped silently because they may belong to another driver.

The `AlarmConditionState` layout matches OPC UA Part 9:

- `SourceNode` → the originating variable
- `SourceName` / `ConditionName` → from `AlarmConditionInfo.SourceName`
- Initial state: enabled, inactive, acknowledged, severity per `InitialSeverity`, retain false
- `HasCondition` references wire the source variable ↔ the condition node bidirectionally

Drivers flag alarm-bearing variables at discovery time via `DriverAttributeInfo.IsAlarm = true`. The Galaxy driver, for example, sets this on attributes that have an `AlarmExtension` primitive in the Galaxy repository DB; FOCAS sets it on the CNC alarm register.

## State transitions

`ConditionSink.OnTransition` runs under the node manager's `Lock` and maps the `AlarmEventArgs.AlarmType` string to Part 9 state:

| AlarmType | Action |
|---|---|
| `Active` | `SetActiveState(true)`, `SetAcknowledgedState(false)`, `Retain = true` |
| `Acknowledged` | `SetAcknowledgedState(true)` |
| `Inactive` | `SetActiveState(false)`; `Retain = false` once both inactive and acknowledged |

Severity is remapped: `AlarmSeverity.Low/Medium/High/Critical` → OPC UA numeric 250 / 500 / 700 / 900. `Message.Value` is set from `AlarmEventArgs.Message` on every transition. `ClearChangeMasks(true)` and `ReportEvent(condition)` fire the OPC UA event notification for clients subscribed to any ancestor notifier.
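
The transition table and severity remap can be modelled without the OPC UA SDK. The sketch below is an assumption-laden stand-in: `ConditionSnapshot` and `AlarmTransitions` are invented for illustration, and `Retain` is re-evaluated only on `Inactive` because that is what the table states literally; the real `ConditionSink` may behave differently on other transitions.

```csharp
#nullable enable
using System;

// Usage: walk one alarm through Active -> Acknowledged -> Inactive.
var state = ConditionSnapshot.Initial;
foreach (var alarmType in new[] { "Active", "Acknowledged", "Inactive" })
{
    state = AlarmTransitions.Apply(state, alarmType);
    Console.WriteLine($"{alarmType,-12} -> {state}");
}
Console.WriteLine(AlarmTransitions.ToOpcUaSeverity(AlarmSeverity.High));

public enum AlarmSeverity { Low, Medium, High, Critical }

// Hypothetical stand-in for the three Part 9 fields the table touches.
public sealed record ConditionSnapshot(bool Active, bool Acknowledged, bool Retain)
{
    // Initial state per the layout list: inactive, acknowledged, retain false.
    public static ConditionSnapshot Initial { get; } = new(false, true, false);
}

public static class AlarmTransitions
{
    // Low/Medium/High/Critical map to 250/500/700/900.
    public static ushort ToOpcUaSeverity(AlarmSeverity s) => (ushort)(s switch
    {
        AlarmSeverity.Low => 250,
        AlarmSeverity.Medium => 500,
        AlarmSeverity.High => 700,
        AlarmSeverity.Critical => 900,
        _ => throw new ArgumentOutOfRangeException(nameof(s)),
    });

    public static ConditionSnapshot Apply(ConditionSnapshot c, string alarmType) => alarmType switch
    {
        "Active" => c with { Active = true, Acknowledged = false, Retain = true },
        "Acknowledged" => c with { Acknowledged = true },
        // Retain drops only when the condition is now inactive AND already acknowledged.
        "Inactive" => c with { Active = false, Retain = c.Retain && !c.Acknowledged },
        _ => c, // unknown transition strings are ignored in this sketch
    };
}
```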

## Acknowledge dispatch

Alarm acknowledgement initiated by an OPC UA client flows:

1. The SDK invokes the `AlarmConditionState.OnAcknowledge` method delegate.
2. The handler checks the session's roles for `AlarmAck`, so drivers never see a request the session wasn't entitled to make.
3. `AlarmSurfaceInvoker.AcknowledgeAsync` is called with the source / condition / comment tuple. The invoker groups by host and runs each batch through the no-retry `AlarmAcknowledge` pipeline.

Drivers return normally for success or throw to signal the ack failed at the backend.

## EventNotifier propagation

Drivers that want hierarchical alarm subscriptions propagate `EventNotifier.SubscribeToEvents` up the containment chain during discovery; the Galaxy driver flips the flag on every ancestor of an alarm-bearing object up to the driver root, mirroring v1 behavior. Clients subscribed at the driver root, a mid-level folder, or the `Objects/` root see alarm events from every descendant with an `AlarmConditionState` sibling. The driver-root `FolderState` is created in `DriverNodeManager.CreateAddressSpace` with `EventNotifier = SubscribeToEvents | HistoryRead` so alarm event subscriptions and alarm history both have a single natural target.

## ConditionRefresh

The OPC UA `ConditionRefresh` service queues the current state of every retained condition back to the requesting monitored items. `DriverNodeManager` iterates the node manager's `AlarmConditionState` collection and queues each condition whose `Retain.Value == true`, matching the Part 9 requirement.

## Alarm historian sink

Distinct from the live `IAlarmSource` stream and the Part 9 `AlarmConditionState` materialization above, qualifying alarm transitions are **also** persisted to a durable event log for downstream AVEVA Historian ingestion. This is a separate subsystem from the `IHistoryProvider` capability used by `HistoryReadEvents` (see [HistoricalDataAccess.md](HistoricalDataAccess.md#alarm-event-history-vs-ihistoryprovider)): the sink is a *producer* path (server → Historian) that runs independently of any client HistoryRead call.

### `IAlarmHistorianSink`

`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs` defines the intake contract:

```csharp
Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken);
HistorianSinkStatus GetStatus();
```

`EnqueueAsync` is fire-and-forget from the producer's perspective; it must never block the emitting thread. The event payload (`AlarmHistorianEvent`, same file) is source-agnostic: `AlarmId`, `EquipmentPath`, `AlarmName`, `AlarmTypeName` (Part 9 subtype name), `Severity`, `EventKind` (free-form transition string: `Activated` / `Cleared` / `Acknowledged` / `Confirmed` / `Shelved` / …), `Message`, `User`, `Comment`, `TimestampUtc`.

The sink scope is defined to span every alarm source (plan decision #15: scripted, Galaxy-native, AB CIP ALMD, any future `IAlarmSource`), gated per-alarm by a `HistorizeToAveva` toggle on the producer. Today only `Phase7EngineComposer.RouteToHistorianAsync` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs`) is wired; it subscribes to `ScriptedAlarmEngine.OnEvent` and marshals each emission into `AlarmHistorianEvent`. Galaxy-native alarms continue to reach AVEVA Historian via the driver's direct `aahClientManaged` path and do not flow through the sink; the AB CIP ALMD path remains unwired pending a producer-side integration.

### `SqliteStoreAndForwardSink`

Default production implementation (`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs`). A local SQLite queue absorbs every `EnqueueAsync` synchronously; a background `Timer` drains batches asynchronously to an `IAlarmHistorianWriter` so operator actions are never blocked on historian reachability.

Queue schema (single table `Queue`): `RowId PK autoincrement`, `AlarmId`, `EnqueuedUtc`, `PayloadJson` (serialized `AlarmHistorianEvent`), `AttemptCount`, `LastAttemptUtc`, `LastError`, `DeadLettered` (bool), plus `IX_Queue_Drain (DeadLettered, RowId)`. Default capacity `1_000_000` non-dead-lettered rows; oldest rows evict with a WARN log past the cap.

Drain cadence: `StartDrainLoop(tickInterval)` arms a periodic timer. `DrainOnceAsync` reads up to `batchSize` rows (default 100) in `RowId` order and forwards them through `IAlarmHistorianWriter.WriteBatchAsync`, which returns one `HistorianWriteOutcome` per row:

| Outcome | Action |
|---|---|
| `Ack` | Row deleted. |
| `PermanentFail` | Row flipped to `DeadLettered = 1` with reason. Peers in the batch retry independently. |
| `RetryPlease` | `AttemptCount` bumped; row stays queued. Drain worker enters `BackingOff`. |

Writer-side exceptions treat the whole batch as `RetryPlease`.
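
A rough in-memory model of the per-row outcome handling follows. It is a sketch under stated assumptions: `QueueRow` and `DrainOutcomes` are hypothetical stand-ins for the SQLite-backed rows, the `LastError` text is invented, and only the three actions listed in the outcome table are modelled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: one batch of three rows, one outcome of each kind.
var queue = new List<QueueRow> { new() { RowId = 1 }, new() { RowId = 2 }, new() { RowId = 3 } };
var batch = queue.ToArray();
var backOff = DrainOutcomes.Apply(queue, batch, new[]
{
    HistorianWriteOutcome.Ack,
    HistorianWriteOutcome.PermanentFail,
    HistorianWriteOutcome.RetryPlease,
});
Console.WriteLine($"remaining={queue.Count}, backOff={backOff}");

public enum HistorianWriteOutcome { Ack, PermanentFail, RetryPlease }

// Hypothetical in-memory stand-in for one row of the Queue table.
public sealed class QueueRow
{
    public long RowId { get; init; }
    public int AttemptCount { get; set; }
    public bool DeadLettered { get; set; }
    public string? LastError { get; set; }
}

public static class DrainOutcomes
{
    // Applies one outcome per row, in order. Returns true when at least one
    // row asked for a retry, which is the cue to enter BackingOff.
    public static bool Apply(
        List<QueueRow> queue,
        IReadOnlyList<QueueRow> batch,
        IReadOnlyList<HistorianWriteOutcome> outcomes)
    {
        if (batch.Count != outcomes.Count)
            throw new ArgumentException("Expected exactly one outcome per row.");

        var anyRetry = false;
        for (var i = 0; i < batch.Count; i++)
        {
            var row = batch[i];
            switch (outcomes[i])
            {
                case HistorianWriteOutcome.Ack:
                    queue.Remove(row);            // delivered: row deleted
                    break;
                case HistorianWriteOutcome.PermanentFail:
                    row.DeadLettered = true;      // parked; peers are unaffected
                    row.LastError = "permanent failure reported by writer";
                    break;
                case HistorianWriteOutcome.RetryPlease:
                    row.AttemptCount++;           // stays queued for the next pass
                    anyRetry = true;
                    break;
            }
        }
        return anyRetry;
    }
}
```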

Backoff ladder on `RetryPlease` (hard-coded): 1s → 2s → 5s → 15s → 60s cap. Reset to 0 on any batch with no retries. `CurrentBackoff` exposes the current step for instrumentation; the drain timer itself fires on `tickInterval`, so the ladder governs write cadence rather than timer period.
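
One way to picture the ladder is as a small state holder that the timer callback consults. The sketch below is an assumption about the mechanics, not the sink's code: `BackoffLadder` and `ShouldAttempt` are hypothetical names, and gating on the time since the last attempt is only one plausible reading of "governs write cadence rather than timer period".

```csharp
using System;

// Usage: three failing batches, then a clean one.
var ladder = new BackoffLadder();
foreach (var hadRetry in new[] { true, true, true, false })
{
    ladder.OnBatchCompleted(hadRetry);
    Console.WriteLine($"hadRetry={hadRetry} -> backoff {ladder.Current}");
}

// Hypothetical helper, named for this sketch only.
public sealed class BackoffLadder
{
    private static readonly TimeSpan[] Steps =
    {
        TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(5),
        TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
    };

    private int _step = -1; // -1 means "not backing off"

    public TimeSpan Current => _step < 0 ? TimeSpan.Zero : Steps[_step];

    // Climb one rung per batch that contained a retry (capped at 60 s);
    // any batch without retries resets the ladder to zero.
    public void OnBatchCompleted(bool hadRetry) =>
        _step = hadRetry ? Math.Min(_step + 1, Steps.Length - 1) : -1;

    // The timer keeps firing every tickInterval; a tick simply does nothing
    // while the backoff window since the last attempt is still open.
    public bool ShouldAttempt(DateTime nowUtc, DateTime lastAttemptUtc) =>
        nowUtc - lastAttemptUtc >= Current;
}
```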

Dead-letter retention defaults to 30 days (plan decision #21). `PurgeAgedDeadLetters` runs each drain pass and deletes rows whose `LastAttemptUtc` is past the cutoff. `RetryDeadLettered()` is an operator action that clears `DeadLettered` + resets `AttemptCount` on every dead-lettered row so they rejoin the main queue.

### Composition and writer resolution

`Phase7Composer.ResolveHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs`) scans the registered drivers for one that implements `IAlarmHistorianWriter`. Today that is `GalaxyProxyDriver` via `GalaxyHistorianWriter` (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyHistorianWriter.cs`), which forwards batches over the Galaxy.Host pipe to the `aahClientManaged` alarm schema. When a writer is found, a `SqliteStoreAndForwardSink` is instantiated against `%ProgramData%/OtOpcUa/alarm-historian-queue.db` with a 2 s drain tick and the writer attached. When no driver provides a writer the fallback is the DI-registered `NullAlarmHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs`), which silently discards and reports `HistorianDrainState.Disabled`.

### Status and observability

`GetStatus()` returns `HistorianSinkStatus(QueueDepth, DeadLetterDepth, LastDrainUtc, LastSuccessUtc, LastError, DrainState)`: two `COUNT(*)` scalars plus last-drain telemetry. `DrainState` is one of `Disabled` / `Idle` / `Draining` / `BackingOff`.

The Admin UI `/alarms/historian` page surfaces this through `HistorianDiagnosticsService` (`src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs`), which also exposes `TryRetryDeadLettered`; it calls through to `SqliteStoreAndForwardSink.RetryDeadLettered` when the live sink is the SQLite implementation and returns 0 otherwise.

## Key source files

- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`: capability contract + `AlarmEventArgs`
- `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs`: per-host fan-out + no-retry ack
- `src/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs`: `CapturingBuilder` + alarm forwarder
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`: `VariableHandle.MarkAsAlarmCondition` + `ConditionSink`
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs`: Galaxy-specific alarm-event production
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs`: historian sink intake contract + `AlarmHistorianEvent` + `HistorianSinkStatus` + `IAlarmHistorianWriter`
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs`: durable queue + drain worker + backoff ladder + dead-letter retention
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs`: `RouteToHistorianAsync` wires scripted-alarm emissions into the sink
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs`: `ResolveHistorianSink` selects `SqliteStoreAndForwardSink` vs `NullAlarmHistorianSink`
- `src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs`: Admin UI `/alarms/historian` status + retry-dead-lettered operator action
141
docs/v1/Configuration.md
Normal file
@@ -0,0 +1,141 @@
# Configuration

## Two-layer model

OtOpcUa configuration is split into two layers:

| Layer | Where | Scope | Edited by |
|---|---|---|---|
| **Bootstrap** | `appsettings.json` per process | Enough to start the process and reach the Config DB | Local file edit + process restart |
| **Authoritative config** | Config DB (SQL Server) via `OtOpcUaConfigDbContext` | Clusters, namespaces, UNS hierarchy, equipment, tags, driver instances, ACLs, role grants, poll groups | Admin UI draft/publish workflow |

The rule: if the setting describes *how the process connects to the rest of the world* (Config DB connection string, LDAP bind, transport security profile, node identity, logging), it lives in `appsettings.json`. If it describes *what the fleet does* (clusters, drivers, tags, UNS, ACLs), it lives in the Config DB and is edited through the Admin UI.

---

## Bootstrap configuration (`appsettings.json`)

Each of the three processes (Server, Admin, Galaxy.Host) reads its own `appsettings.json` plus environment overrides.

### OtOpcUa Server (`src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json`)

Bootstrap-only. `Program.cs` reads the following sections:

| Section | Keys | Purpose |
|---|---|---|
| `Node` | `NodeId`, `ClusterId`, `ConfigDbConnectionString`, `LocalCachePath` | Identity + path to the Config DB + LiteDB offline cache path. |
| `OpcUaServer` | `EndpointUrl`, `ApplicationName`, `ApplicationUri`, `PkiStoreRoot`, `AutoAcceptUntrustedClientCertificates`, `SecurityProfile` | OPC UA endpoint + transport security. See [`security.md`](security.md). |
| `OpcUaServer:Ldap` | `Enabled`, `Server`, `Port`, `UseTls`, `AllowInsecureLdap`, `SearchBase`, `ServiceAccountDn`, `ServiceAccountPassword`, `GroupToRole`, `UserNameAttribute`, `GroupAttribute` | LDAP auth for OPC UA UserName tokens. See [`security.md`](security.md). |
| `Serilog` | Standard Serilog keys + `WriteJson` bool | Logging verbosity + optional JSON file sink for SIEM ingest. |
| `Authorization` | `StrictMode` (bool) | Flip `true` to fail-closed on sessions lacking LDAP group metadata. Default false during ACL rollouts. |
| `Metrics:Prometheus:Enabled` | bool | Toggles the `/metrics` endpoint. |

Minimal example:

```json
{
  "Serilog": { "MinimumLevel": "Information" },
  "Node": {
    "NodeId": "node-dev-a",
    "ClusterId": "cluster-dev",
    "ConfigDbConnectionString": "Server=localhost,14330;Database=OtOpcUaConfig;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
    "LocalCachePath": "config_cache.db"
  },
  "OpcUaServer": {
    "EndpointUrl": "opc.tcp://0.0.0.0:4840/OtOpcUa",
    "ApplicationUri": "urn:node-dev-a:OtOpcUa",
    "SecurityProfile": "None",
    "AutoAcceptUntrustedClientCertificates": true,
    "Ldap": { "Enabled": false }
  }
}
```

### OtOpcUa Admin (`src/ZB.MOM.WW.OtOpcUa.Admin/appsettings.json`)

| Section | Purpose |
|---|---|
| `ConnectionStrings:ConfigDb` | SQL connection string; must point at the same Config DB every Server reaches. |
| `Authentication:Ldap` | LDAP bind for the Admin login form (same options shape as the Server's `OpcUaServer:Ldap`). |
| `CertTrust` | `CertTrustOptions`: file-system path under the Server's `PkiStoreRoot` so the Admin Certificates page can promote rejected client certs. |
| `Metrics:Prometheus:Enabled` | Toggles the `/metrics` scrape endpoint (default true). |
| `Serilog` | Logging. |

### Galaxy.Host

Environment-variable driven (`OTOPCUA_GALAXY_PIPE`, `OTOPCUA_ALLOWED_SID`, `OTOPCUA_GALAXY_SECRET`, `OTOPCUA_GALAXY_BACKEND`, `OTOPCUA_GALAXY_ZB_CONN`, `OTOPCUA_HISTORIAN_*`). No `appsettings.json`; the supervisor owns the launch environment. See [`ServiceHosting.md`](ServiceHosting.md#galaxyhost-process).

### Environment overrides

Standard .NET config layering applies: `appsettings.{Environment}.json`, then environment variables with `Section__Property` naming. `DOTNET_ENVIRONMENT` (or `ASPNETCORE_ENVIRONMENT` for Admin) selects the overlay.

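
The `Section__Property` convention is the standard .NET environment-variable mapping, in which a double underscore stands for the `:` that separates configuration sections. The tiny helper below only illustrates that name translation; `EnvKeys` is a hypothetical name, and the real lookup is done by the .NET configuration providers, not by code like this.

```csharp
using System;

// Usage: translate between the two spellings of the same setting.
Console.WriteLine(EnvKeys.ToConfigKey("OpcUaServer__Ldap__Enabled")); // OpcUaServer:Ldap:Enabled
Console.WriteLine(EnvKeys.ToEnvName("Node:ClusterId"));               // Node__ClusterId

// Hypothetical helper, named for this sketch only.
public static class EnvKeys
{
    // "__" in an environment variable name maps to ":" in the configuration key.
    public static string ToConfigKey(string envName) => envName.Replace("__", ":");

    // Inverse direction, handy when writing a service unit or compose file.
    public static string ToEnvName(string configKey) => configKey.Replace(":", "__");
}
```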

---

## Authoritative configuration (Config DB)

The Config DB is the single source of truth for every setting that a v1 deployment used to carry in `appsettings.json` as driver-specific state. `OtOpcUaConfigDbContext` (`src/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs`) is the EF Core context used by both the Admin writer and every Server reader.

### Top-level sections operators touch

| Concept | Entity | Admin UI surface | Purpose |
|---|---|---|---|
| Cluster | `ServerCluster` | Clusters pages | Fleet unit; owns nodes, generations, UNS, ACLs. |
| Cluster node | `ClusterNode` + `ClusterNodeCredential` | RedundancyTab, Hosts page | Per-node identity, `RedundancyRole`, `ServiceLevelBase`, ApplicationUri, service-account credentials. |
| Generation | `ConfigGeneration` + `ClusterNodeGenerationState` | Generations / DiffViewer | Append-only; draft → publish workflow (`sp_PublishGeneration`). |
| Namespace | `Namespace` | Namespaces tab | Per-cluster OPC UA namespace; `Kind` = Equipment / SystemPlatform / Simulated. |
| Driver instance | `DriverInstance` | Drivers tab | Configured driver (Modbus, S7, OpcUaClient, Galaxy, …) + `DriverConfig` JSON + resilience profile. |
| Device | `Device` | Under each driver instance | Per-host settings inside a driver instance (IP, port, unit-id…). |
| UNS hierarchy | `UnsArea` + `UnsLine` | UnsTab (drag/drop) | L3 / L4 of the unified namespace. |
| Equipment | `Equipment` | Equipment pages, CSV import | L5; carries `MachineCode`, `ZTag`, `SAPID`, `EquipmentUuid`, reservation-backed external ids. |
| Tag | `Tag` | Under each equipment | Driver-specific tag address + `SecurityClassification` + poll-group assignment. |
| Poll group | `PollGroup` | Driver-scoped | Poll cadence buckets; `PollGroupEngine` in Core.Abstractions uses this at runtime. |
| ACL | `NodeAcl` | AclsTab + Probe dialog | Per-level permission grants, additive only. See [`security.md`](security.md#data-plane-authorization). |
| Role grant | `LdapGroupRoleMapping` | RoleGrants page | Maps LDAP groups → Admin roles (`ConfigViewer` / `ConfigEditor` / `FleetAdmin`). |
| External id reservation | `ExternalIdReservation` | Reservations page | Reservation-backed `ZTag` and `SAPID` uniqueness. |
| Equipment import batch | `EquipmentImportBatch` | CSV import flow | Staged bulk-add with validation preview. |
| Audit log | `ConfigAuditLog` | Audit page | Append-only record of every publish, rollback, credential rotation, role-grant change. |

### Draft → publish generation model

All edits go into a **draft** generation scoped to one cluster. `DraftValidationService` checks invariants (same-cluster FKs, reservation collisions, UNS path consistency, ACL scope validity). When the operator clicks Publish, `sp_PublishGeneration` atomically promotes the draft, records the audit event, and causes every `RedundancyCoordinator.RefreshAsync` in the affected cluster to pick up the new topology + ACL set. The Admin UI `DiffViewer` shows exactly what's changing before publish.

Old generations are retained; rollback is "publish older generation as new". `ConfigAuditLog` makes every change auditable by principal + timestamp.

### Offline cache

Each Server process caches the last-seen published generation in `Node:LocalCachePath` via LiteDB (`LiteDbConfigCache` in `src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/`). The cache lets a node start without the central DB reachable; once the DB comes back, `NodeBootstrap` syncs to the current generation.

### Full schema reference

For table columns, indexes, stored procedures, the publish-transaction semantics, and the SQL authorization model (per-node SQL principals + `SESSION_CONTEXT` cluster binding), see [`docs/v2/config-db-schema.md`](v2/config-db-schema.md).

### Admin UI flow

For the draft editor, DiffViewer, CSV import, IdentificationFields, RedundancyTab, AclsTab + Probe-this-permission, RoleGrants, and the SignalR real-time surface, see [`docs/v2/admin-ui.md`](v2/admin-ui.md).

---

## Where did v1 appsettings sections go?

Quick index for operators coming from v1 LmxOpcUa:

| v1 appsettings section | v2 home |
|---|---|
| `OpcUa.Port` / `BindAddress` / `EndpointPath` / `ServerName` | Bootstrap `OpcUaServer:EndpointUrl` + `ApplicationName`. |
| `OpcUa.ApplicationUri` | Config DB `ClusterNode.ApplicationUri`. |
| `OpcUa.MaxSessions` / `SessionTimeoutMinutes` | Bootstrap `OpcUaServer:*` (if exposed) or stack defaults. |
| `OpcUa.AlarmTrackingEnabled` / `AlarmFilter` | Per driver instance in Config DB (alarm surface is capability-driven per `IAlarmSource`). |
| `MxAccess.*` | Galaxy driver instance `DriverConfig` JSON + Galaxy.Host env vars (see [`ServiceHosting.md`](ServiceHosting.md#galaxyhost-process)). |
| `GalaxyRepository.*` | Galaxy driver instance `DriverConfig` JSON + `OTOPCUA_GALAXY_ZB_CONN` env var. |
| `Dashboard.*` | Retired; the Admin UI replaces the dashboard. See [`StatusDashboard.md`](StatusDashboard.md). |
| `Historian.*` | Galaxy driver instance `DriverConfig` JSON + `OTOPCUA_HISTORIAN_*` env vars. |
| `Authentication.Ldap.*` | Bootstrap `OpcUaServer:Ldap` (same shape) + Admin `Authentication:Ldap` for the UI login. |
| `Security.*` | Bootstrap `OpcUaServer:SecurityProfile` + `PkiStoreRoot` + `AutoAcceptUntrustedClientCertificates`. |
| `Redundancy.*` | Config DB `ClusterNode.RedundancyRole` + `ServiceLevelBase`. |

---

## Validation

- **Bootstrap**: the process fails fast on missing required keys in `Program.cs` (e.g. `Node:NodeId`, `Node:ClusterId`, `Node:ConfigDbConnectionString` all throw `InvalidOperationException` if unset).
- **Authoritative**: `DraftValidationService` runs on every save; `sp_ValidateDraft` runs as part of `sp_PublishGeneration` so an invalid draft cannot reach any node.
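
The bootstrap fail-fast rule can be illustrated with a small guard. This is a hedged sketch rather than the actual `Program.cs`: `BootstrapGuard` is a hypothetical helper, the error text is invented, and a plain dictionary stands in for the real configuration object.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: the connection string is deliberately missing, so startup aborts.
var config = new Dictionary<string, string?>
{
    ["Node:NodeId"] = "node-dev-a",
    ["Node:ClusterId"] = "cluster-dev",
};

try
{
    foreach (var key in new[] { "Node:NodeId", "Node:ClusterId", "Node:ConfigDbConnectionString" })
        BootstrapGuard.Require(config, key);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Hypothetical helper, named for this sketch only.
public static class BootstrapGuard
{
    // Returns the value for a required key, or throws before the process
    // gets far enough to start serving with a half-configured identity.
    public static string Require(IReadOnlyDictionary<string, string?> config, string key)
    {
        if (!config.TryGetValue(key, out var value) || string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Required bootstrap setting '{key}' is not set.");
        return value;
    }
}
```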
65
docs/v1/DataTypeMapping.md
Normal file
@@ -0,0 +1,65 @@
# Data Type Mapping

Data-type mapping is driver-defined. Each driver translates its native attribute metadata into two driver-agnostic enums from `Core.Abstractions`, `DriverDataType` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverDataType.cs`) and `SecurityClassification` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/SecurityClassification.cs`), and populates the `DriverAttributeInfo` record it hands to `IAddressSpaceBuilder.Variable(...)`. Core doesn't interpret the native types; it trusts the driver's translation.

## DriverDataType → OPC UA built-in type

`DriverNodeManager.MapDataType` (`src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`) is the single translation table for every driver:

| DriverDataType | OPC UA NodeId |
|---|---|
| `Boolean` | `DataTypeIds.Boolean` (i=1) |
| `Int32` | `DataTypeIds.Int32` (i=6) |
| `Float32` | `DataTypeIds.Float` (i=10) |
| `Float64` | `DataTypeIds.Double` (i=11) |
| `String` | `DataTypeIds.String` (i=12) |
| `DateTime` | `DataTypeIds.DateTime` (i=13) |
| anything else | `DataTypeIds.BaseDataType` |

The enum also carries `Int16 / Int64 / UInt16 / UInt32 / UInt64 / Reference` members for drivers that need them; the mapping table is extended as those types surface in actual drivers. `Reference` is the Galaxy-style attribute reference; it's encoded as an OPC UA `String` on the wire.
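
As a rough picture of the translation table, the sketch below maps the enum to the numeric namespace-0 identifiers listed above instead of SDK `NodeId` objects. `DataTypeMap` is a hypothetical name and the enum member order is an assumption; `i=24` is the standard identifier for `BaseDataType`.

```csharp
using System;

// Usage: two mapped types and one that falls through to BaseDataType.
foreach (var t in new[] { DriverDataType.Boolean, DriverDataType.Float64, DriverDataType.UInt16 })
    Console.WriteLine($"{t} -> i={DataTypeMap.ToBuiltInTypeId(t)}");

// Member list taken from the text above; the ordering is assumed.
public enum DriverDataType
{
    Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64,
    Float32, Float64, String, DateTime, Reference,
}

// Hypothetical helper, named for this sketch only.
public static class DataTypeMap
{
    // Numeric identifiers in namespace 0 for the OPC UA built-in data types.
    public static uint ToBuiltInTypeId(DriverDataType type) => type switch
    {
        DriverDataType.Boolean => 1u,
        DriverDataType.Int32 => 6u,
        DriverDataType.Float32 => 10u,
        DriverDataType.Float64 => 11u,
        DriverDataType.String => 12u,
        DriverDataType.DateTime => 13u,
        _ => 24u, // BaseDataType: everything the table has not been extended for yet
    };
}
```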

## Per-driver mappers

Each driver owns its native → `DriverDataType` translation:

- **Galaxy Proxy**: `GalaxyProxyDriver.MapDataType(int mxDataType)` and `MapSecurity(int mxSec)` (inline in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/GalaxyProxyDriver.cs`). The Galaxy `mx_data_type` integer is sent across the Host↔Proxy pipe and mapped on the Proxy side. Galaxy's full classic 16-entry table (Boolean / Integer / Float / Double / String / Time / ElapsedTime / Reference / Enumeration / Custom / InternationalizedString) is preserved but compressed into the smaller `DriverDataType` enum: `ElapsedTime` → `Float64`, `InternationalizedString` → `String`, `Reference` → `Reference`, enumerations → `Int32`.
- **AB CIP**: `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDataType.cs` maps CIP tag type codes.
- **Modbus**: `src/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ModbusDriver.cs` maps register shapes (16-bit signed, 16-bit unsigned, 32-bit float, etc.) including the DirectLogic quirk table in `DirectLogicAddress.cs`.
- **S7 / AB Legacy / TwinCAT / FOCAS / OPC UA Client**: each has its own inline mapper or `*DataType.cs` file per the same pattern.

The driver's mapping is authoritative: when a field type is ambiguous (a `LREAL` that could be bit-reinterpreted, a BCD counter, a string of a particular encoding), the driver decides the exposed OPC UA shape.

## Array handling

`DriverAttributeInfo.IsArray = true` flips `ValueRank = OneDimension` on the generated `BaseDataVariableState`; scalars stay at `ValueRank.Scalar`. `DriverAttributeInfo.ArrayDim` carries the declared length. Writing element-by-element (OPC UA `IndexRange`) is a driver-level decision; see `docs/ReadWriteOperations.md`.

## SecurityClassification: metadata, not ACL

`SecurityClassification` is driver-reported metadata only. Drivers never enforce write permissions themselves. The classification flows into the Server project, where `WriteAuthzPolicy.IsAllowed(classification, userRoles)` (`src/ZB.MOM.WW.OtOpcUa.Server/Security/WriteAuthzPolicy.cs`) gates the write against the session's LDAP-derived roles, and (Phase 6.2) the `AuthorizationGate` + permission trie apply on top. This is the "ACL at server layer" invariant documented in `docs/security.md`.

The classification values mirror the v1 Galaxy model so existing galaxies keep their published semantics:

| SecurityClassification | Required role | Write-from-OPC-UA |
|---|---|---|
| `FreeAccess` | none | yes (even anonymous) |
| `Operate` | `WriteOperate` | yes |
| `Tune` | `WriteTune` | yes |
| `Configure` | `WriteConfigure` | yes |
| `SecuredWrite` | `WriteOperate` | yes |
| `VerifiedWrite` | `WriteConfigure` | yes |
| `ViewOnly` | none | no |

Drivers whose backend has no notion of classification (Modbus, most PLCs) default every tag to `FreeAccess` or `Operate`; drivers whose backend does carry the notion (Galaxy, OPC UA Client relaying `UserAccessLevel`) translate it directly.
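
The classification-to-role table reads naturally as a lookup plus a membership test. The following is a minimal sketch of that idea, not the real `WriteAuthzPolicy`: `WriteAuthzSketch` is a hypothetical name, and the case-sensitive role comparison is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: an operator session trying three differently classified writes.
var roles = new[] { "WriteOperate" };
foreach (var c in new[] { SecurityClassification.Operate, SecurityClassification.Configure, SecurityClassification.ViewOnly })
    Console.WriteLine($"{c}: {WriteAuthzSketch.IsAllowed(c, roles)}");

public enum SecurityClassification
{
    FreeAccess, Operate, Tune, Configure, SecuredWrite, VerifiedWrite, ViewOnly,
}

// Hypothetical helper, named for this sketch only.
public static class WriteAuthzSketch
{
    // Role required to write a tag of the given classification; null means
    // no role is needed (FreeAccess) or no role can help (ViewOnly).
    public static string? RequiredRole(SecurityClassification c) => c switch
    {
        SecurityClassification.Operate or SecurityClassification.SecuredWrite => "WriteOperate",
        SecurityClassification.Tune => "WriteTune",
        SecurityClassification.Configure or SecurityClassification.VerifiedWrite => "WriteConfigure",
        _ => null,
    };

    public static bool IsAllowed(SecurityClassification c, IReadOnlyCollection<string> userRoles)
    {
        if (c == SecurityClassification.ViewOnly) return false; // never writable
        var required = RequiredRole(c);
        return required is null || userRoles.Contains(required);
    }
}
```

Keeping the check in one server-side function is what allows drivers to stay free of any permission logic.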

## Historization

`DriverAttributeInfo.IsHistorized = true` flips `AccessLevel.HistoryRead` and `Historizing = true` on the variable. The driver must then implement `IHistoryProvider` for HistoryRead service calls to succeed; otherwise the node manager surfaces `BadHistoryOperationUnsupported` per request.

## Key source files

- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverDataType.cs`: driver-agnostic type enum
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/SecurityClassification.cs`: write-authz tier metadata
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverAttributeInfo.cs`: per-attribute descriptor
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`: `MapDataType` translation
- `src/ZB.MOM.WW.OtOpcUa.Server/Security/WriteAuthzPolicy.cs`: classification-to-role policy
- Per-driver mappers in each `Driver.*` project
129
docs/v1/HistoricalDataAccess.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# Historical Data Access
|
||||
|
||||
OPC UA HistoryRead is a **per-driver optional capability** in OtOpcUa. The Core dispatches HistoryRead service calls to the owning driver through the `IHistoryProvider` capability interface (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs`). Drivers that don't implement the interface return `BadHistoryOperationUnsupported` for every history call on their nodes; that is the expected behavior for protocol drivers (Modbus, S7, AB CIP, AB Legacy, TwinCAT, FOCAS) whose wire protocols carry no time-series data.
|
||||
|
||||
Historian integration is no longer a separate bolt-on assembly, as it was in v1 (`ZB.MOM.WW.LmxOpcUa.Historian.Aveva` plugin). It is now one optional capability any driver can implement. The first implementation is the Galaxy driver's Wonderware Historian integration; OPC UA Client forwards HistoryRead to the upstream server. Every other driver leaves the capability unimplemented and the Core short-circuits history calls on nodes that belong to those drivers.
|
||||
|
||||
## `IHistoryProvider`
|
||||
|
||||
Four methods, mapping onto the four OPC UA HistoryRead service variants:
|
||||
|
||||
| Method | OPC UA service | Notes |
|--------|----------------|-------|
| `ReadRawAsync` | HistoryReadRawModified (raw subset) | Returns `HistoryReadResult { Samples, ContinuationPoint? }`. The Core handles `ContinuationPoint` pagination. |
| `ReadProcessedAsync` | HistoryReadProcessed | Takes a `HistoryAggregateType` (Average / Minimum / Maximum / Total / Count) and a bucket `interval`. Drivers that can't express an aggregate throw `NotSupportedException`; the Core translates that into `BadAggregateNotSupported`. |
| `ReadAtTimeAsync` | HistoryReadAtTime | Default implementation throws `NotSupportedException`; drivers without interpolation / prior-boundary support leave the default. |
| `ReadEventsAsync` | HistoryReadEvents | Historical alarm/event rows, distinct from the live `IAlarmSource` stream. Default throws; only drivers with an event historian (Galaxy's A&E log) override. |
|
||||
|
||||
Supporting DTOs live alongside the interface in `Core.Abstractions`:
|
||||
|
||||
- `HistoryReadResult(IReadOnlyList<DataValueSnapshot> Samples, byte[]? ContinuationPoint)`
- `HistoryAggregateType`: enum `{ Average, Minimum, Maximum, Total, Count }`
- `HistoricalEvent(EventId, SourceName?, EventTimeUtc, ReceivedTimeUtc, Message?, Severity)`
- `HistoricalEventsResult(IReadOnlyList<HistoricalEvent> Events, byte[]? ContinuationPoint)`
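
To make the "default throws" behaviour concrete, here is a hedged sketch of how such a capability interface could be shaped with C# default interface methods. The method parameters, the fields of `DataValueSnapshot`, and the names `IHistoryProviderSketch` and `RawOnlyProviderSketch` are assumptions; the real signatures in `IHistoryProvider.cs` are not reproduced in this document. The events method is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count }

// Placeholder for the real DataValueSnapshot; these fields are assumed.
public sealed record DataValueSnapshot(object? Value, uint StatusCode, DateTime SourceTimestampUtc);

public sealed record HistoryReadResult(IReadOnlyList<DataValueSnapshot> Samples, byte[]? ContinuationPoint);

// Hypothetical shape; parameter lists are illustrative only.
public interface IHistoryProviderSketch
{
    Task<HistoryReadResult> ReadRawAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, int maxValues, CancellationToken ct);

    Task<HistoryReadResult> ReadProcessedAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval,
        HistoryAggregateType aggregate, CancellationToken ct);

    // Default implementation: drivers without at-time support inherit this.
    Task<HistoryReadResult> ReadAtTimeAsync(
        string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken ct)
        => throw new NotSupportedException("At-time reads are not supported by this driver.");
}

// A driver that only supports raw reads overrides nothing else.
public sealed class RawOnlyProviderSketch : IHistoryProviderSketch
{
    public Task<HistoryReadResult> ReadRawAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, int maxValues, CancellationToken ct)
        => Task.FromResult(new HistoryReadResult(Array.Empty<DataValueSnapshot>(), null));

    public Task<HistoryReadResult> ReadProcessedAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval,
        HistoryAggregateType aggregate, CancellationToken ct)
        => throw new NotSupportedException($"Aggregate {aggregate} is not supported.");
}
```

Note that a default interface method is only reachable through a reference typed as the interface, which matches how a dispatcher would hold the driver.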
|
||||
|
||||
## Alarm event history vs. `IHistoryProvider`
|
||||
|
||||
`IHistoryProvider.ReadEventsAsync` is the **pull** path: an OPC UA client calls `HistoryReadEvents` against a notifier node and the driver walks its own backend event store to satisfy the request. The Galaxy driver's implementation reads from AVEVA Historian's event schema via `aahClientManaged`; every other driver leaves the default `NotSupportedException` in place.
|
||||
|
||||
There is also a separate **push** path for persisting alarm transitions from any `IAlarmSource` (and the Phase 7 scripted-alarm engine) into a durable event log, independent of any client HistoryRead call. That path is covered by `IAlarmHistorianSink` + `SqliteStoreAndForwardSink` in `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/` and is documented in [AlarmTracking.md#alarm-historian-sink](AlarmTracking.md#alarm-historian-sink). The two paths are complementary — the sink populates an external historian's alarm schema; `ReadEventsAsync` reads from whatever event store the driver owns — and share neither interface nor dispatch.
|
||||
|
||||
## Dispatch through `CapabilityInvoker`
|
||||
|
||||
All four HistoryRead surfaces are wrapped by `CapabilityInvoker` (`Core/Resilience/CapabilityInvoker.cs`) with `DriverCapability.HistoryRead`. The Polly pipeline keyed on `(DriverInstanceId, HostName, DriverCapability.HistoryRead)` provides timeout, circuit-breaker, and bulkhead defaults per the driver's stability tier (see [docs/v2/driver-stability.md](v2/driver-stability.md)).
|
||||
|
||||
The dispatch point is `DriverNodeManager` in `ZB.MOM.WW.OtOpcUa.Server`. When the OPC UA stack calls `HistoryRead`, the node manager:
|
||||
|
||||
1. Resolves the target `NodeHandle` to a `(DriverInstanceId, fullReference)` pair.
2. Checks the owning driver's `DriverTypeMetadata` to see if the type may advertise history at all (fast reject for types that never implement `IHistoryProvider`).
3. If the driver instance implements `IHistoryProvider`, wraps the `ReadRawAsync` / `ReadProcessedAsync` / `ReadAtTimeAsync` / `ReadEventsAsync` call in `CapabilityInvoker.InvokeAsync(... DriverCapability.HistoryRead ...)`.
4. Translates the `HistoryReadResult` into an OPC UA `HistoryData` + `ExtensionObject`.
5. Manages the continuation point via `HistoryContinuationPointManager` so clients can page through large result sets.
|
||||
|
||||
Driver-level history code never sees the continuation-point protocol or the OPC UA stack types — those stay in the Core.
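
The capability check and the exception translation described in this section can be sketched as follows. This is an illustration under stated assumptions, not the `DriverNodeManager` code: `HistoryStatus`, `IDriverSketch`, `IProcessedHistorySketch`, and `HistoryDispatchSketch` are hypothetical names, the sample payload is reduced to `double[]`, and the `CapabilityInvoker` wrapping is only marked by a comment.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for the OPC UA status codes named in this document.
public enum HistoryStatus { Good, BadHistoryOperationUnsupported, BadAggregateNotSupported }

public interface IDriverSketch { }

// Reduced capability interface; the real IHistoryProvider has four methods.
public interface IProcessedHistorySketch
{
    Task<double[]> ReadProcessedAsync(string fullReference);
}

public static class HistoryDispatchSketch
{
    public static async Task<(HistoryStatus Status, double[] Samples)> ReadProcessedAsync(
        IDriverSketch driver, string fullReference)
    {
        // Drivers without the capability are rejected before any backend call.
        if (driver is not IProcessedHistorySketch history)
            return (HistoryStatus.BadHistoryOperationUnsupported, Array.Empty<double>());

        try
        {
            // The real dispatcher would route this call through CapabilityInvoker.
            var samples = await history.ReadProcessedAsync(fullReference).ConfigureAwait(false);
            return (HistoryStatus.Good, samples);
        }
        catch (NotSupportedException)
        {
            // The driver cannot express the requested aggregate.
            return (HistoryStatus.BadAggregateNotSupported, Array.Empty<double>());
        }
    }
}

// Two tiny drivers used to exercise both branches.
public sealed class NoHistoryDriverSketch : IDriverSketch { }

public sealed class UnsupportedAggregateDriverSketch : IDriverSketch, IProcessedHistorySketch
{
    public Task<double[]> ReadProcessedAsync(string fullReference)
        => throw new NotSupportedException("Total is not supported.");
}
```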
|
||||
|
||||
## Driver coverage
|
||||
|
||||
| Driver | Implements `IHistoryProvider`? | Source |
|--------|:------------------------------:|--------|
| Galaxy | Yes (raw, processed, at-time, events) | `aahClientManaged` SDK (Wonderware Historian) on the Host side, forwarded through the Proxy's IPC |
| OPC UA Client | Yes (raw, processed, at-time, events; forwarded to upstream) | `Opc.Ua.Client.Session.HistoryRead` against the remote server |
| Modbus | No | Wire protocol has no time-series concept |
| Siemens S7 | No | S7comm has no time-series concept |
| AB CIP | No | CIP has no time-series concept |
| AB Legacy | No | PCCC has no time-series concept |
| TwinCAT | No | ADS symbol reads are point-in-time; archiving is an external concern |
| FOCAS | No | Default; FOCAS has no general-purpose historian API |
|
||||
|
||||
## Galaxy — Wonderware Historian (`aahClientManaged`)
|
||||
|
||||
The Galaxy driver's `IHistoryProvider` implementation lives on the Host side (`.NET 4.8 x86`) in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Historian/`. The Proxy's `GalaxyProxyDriver.ReadRawAsync` / `ReadProcessedAsync` / `ReadAtTimeAsync` / `ReadEventsAsync` each serializes a `HistoryRead*Request` and awaits the matching `HistoryRead*Response` over the named pipe (see [drivers/Galaxy.md](drivers/Galaxy.md#ipc-transport)).
|
||||
|
||||
Host-side, `HistorianDataSource` uses the AVEVA Historian managed SDK (`aahClientManaged.dll`) to query historical data via a cursor-based API through `ArchestrA.HistorianAccess`:
|
||||
|
||||
- **`HistoryQuery`**: raw historical samples (timestamp, value, OPC quality)
- **`AnalogSummaryQuery`**: pre-computed aggregates (Average, Minimum, Maximum, ValueCount, First, Last, StdDev)
|
||||
|
||||
The SDK DLLs are pulled into the Galaxy.Host project at build time; the Server and every other driver project remain SDK-free.
|
||||
|
||||
> **Gap / status note.** The raw SDK wrapper (`HistorianDataSource`, `HistorianClusterEndpointPicker`, `HistorianHealthSnapshot`, etc.) has been ported from the v1 `ZB.MOM.WW.LmxOpcUa.Historian.Aveva` plugin into `Driver.Galaxy.Host/Backend/Historian/`. The **IPC wire-up** (`HistoryReadRequest` / `HistoryReadResponse` message kinds and Proxy-side `ReadRawAsync` / `ReadProcessedAsync` / `ReadAtTimeAsync` / `ReadEventsAsync` forwarding) is in place on `GalaxyProxyDriver`. Two items may still be open on a given driver branch: the Host-side **mapping of `HistoryAggregateType` onto the `AnalogSummaryQuery` column names** (the Proxy side is done in `GalaxyProxyDriver.MapAggregateToColumn`; the Host side must mirror it), and the **end-to-end integration test** that the v1 plugin suite used to hold. Until both land on that branch, history calls against Galaxy may surface `GalaxyIpcException { Code = "not-implemented" }` or backend-specific errors rather than populated `HistoryReadResult`s. Track the remaining work against the Phase 2 Galaxy out-of-process gate in `docs/v2/plan.md`.
|
||||
|
||||
### Aggregate function mapping
|
||||
|
||||
`GalaxyProxyDriver.MapAggregateToColumn` (Proxy-side) translates the OPC UA Part 13 standard aggregate enum onto `AnalogSummaryQuery` column names consumed by `HistorianDataSource.ReadAggregateAsync`:
|
||||
|
||||
| `HistoryAggregateType` | Result Property |
|------------------------|-----------------|
| `Average` | `Average` |
| `Minimum` | `Minimum` |
| `Maximum` | `Maximum` |
| `Count` | `ValueCount` |
|
||||
|
||||
`HistoryAggregateType.Total` is **not supported** by Wonderware `AnalogSummary` and raises `NotSupportedException`, which the Core translates to `BadAggregateNotSupported`. Additional OPC UA aggregates (`Start`, `End`, `StandardDeviationPopulation`) sit on the Historian columns `First`, `Last`, `StdDev` and can be exposed by extending the enum + mapping together.
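
A minimal sketch of this mapping, assuming a plain `switch` expression: the class name `AggregateMappingSketch` is hypothetical and the body is not copied from `GalaxyProxyDriver.MapAggregateToColumn`. The enum members and column names are the ones listed in the table.

```csharp
using System;

public enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count }

public static class AggregateMappingSketch
{
    // Maps the driver-agnostic aggregate onto an AnalogSummaryQuery column name.
    public static string MapAggregateToColumn(HistoryAggregateType aggregate) => aggregate switch
    {
        HistoryAggregateType.Average => "Average",
        HistoryAggregateType.Minimum => "Minimum",
        HistoryAggregateType.Maximum => "Maximum",
        HistoryAggregateType.Count => "ValueCount",
        // No AnalogSummary column exists for Total; the Core turns this into BadAggregateNotSupported.
        HistoryAggregateType.Total => throw new NotSupportedException("Total is not available from AnalogSummary."),
        _ => throw new ArgumentOutOfRangeException(nameof(aggregate)),
    };
}
```

Exposing `Start`, `End`, or `StandardDeviationPopulation` would mean adding both an enum member and a matching arm here, so the two stay in step.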
|
||||
|
||||
### Read-only cluster failover
|
||||
|
||||
`HistorianConfiguration.ServerNames` accepts an ordered list of cluster nodes. `HistorianClusterEndpointPicker` iterates the list in configuration order, marks failed nodes with a `FailureCooldownSeconds` window, and re-admits them when the cooldown elapses. One picker instance is shared by the process-values connection and the event-history connection (two SDK silos), so a node failure on one silo immediately benches it for the other. `FailureCooldownSeconds = 0` disables the cooldown — the SDK's own retry semantics are the sole gate.
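
The cooldown behaviour can be illustrated with a small picker. This is a sketch under assumptions, not the ported `HistorianClusterEndpointPicker`: the class name, the method names `GetHealthyNodes` / `MarkFailed`, and the injectable clock are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public sealed class ClusterEndpointPickerSketch
{
    private readonly IReadOnlyList<string> _nodes;
    private readonly TimeSpan _cooldown;
    private readonly Func<DateTime> _utcNow;
    private readonly Dictionary<string, DateTime> _benchedUntil = new(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new();

    // utcNow is injectable so the cooldown can be exercised without waiting.
    public ClusterEndpointPickerSketch(IReadOnlyList<string> nodes, int failureCooldownSeconds, Func<DateTime>? utcNow = null)
    {
        _nodes = nodes;
        _cooldown = TimeSpan.FromSeconds(failureCooldownSeconds);
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    // Returns nodes in configuration order, skipping any still inside their cooldown window.
    public IReadOnlyList<string> GetHealthyNodes()
    {
        lock (_gate)
        {
            var now = _utcNow();
            var healthy = new List<string>();
            foreach (var node in _nodes)
            {
                if (!_benchedUntil.TryGetValue(node, out var until) || until <= now)
                    healthy.Add(node);
            }
            return healthy;
        }
    }

    // Benches a node for the cooldown window; a cooldown of 0 disables benching.
    public void MarkFailed(string node)
    {
        if (_cooldown <= TimeSpan.Zero) return;
        lock (_gate) { _benchedUntil[node] = _utcNow() + _cooldown; }
    }
}
```

Sharing one instance between the process-values and event-history connections is what makes a failure seen by one silo bench the node for the other.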
|
||||
|
||||
Host-side cluster health is surfaced via `HistorianHealthSnapshot { NodeCount, HealthyNodeCount, ActiveProcessNode, ActiveEventNode, Nodes }` and forwarded to the Proxy so the Admin UI Historian panel can render a per-node table. `HealthCheckService` flips overall service health to `Degraded` when `HealthyNodeCount < NodeCount`.
|
||||
|
||||
### Runtime health counters
|
||||
|
||||
`HistorianDataSource` maintains per-read counters — `TotalQueries`, `TotalSuccesses`, `TotalFailures`, `ConsecutiveFailures`, `LastSuccessTime`, `LastFailureTime`, `LastError`, `ProcessConnectionOpen`, `EventConnectionOpen` — so the dashboard can distinguish "backend loaded but never queried" from "backend loaded and queries are failing". `LastError` is prefixed with the read path (`raw:`, `aggregate:`, `at-time:`, `events:`) so operators can tell which silo is broken. `HealthCheckService` degrades at `ConsecutiveFailures >= 3`.
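
A reduced sketch of these counters follows. The class name `HistorianHealthCountersSketch` and the `RecordSuccess` / `RecordFailure` / `IsDegraded` members are hypothetical; the counter names, the read-path prefix on `LastError`, and the threshold of three consecutive failures come from the paragraph above. Whether `LastError` is cleared on success is not stated here, so the sketch leaves it in place.

```csharp
using System;

public sealed class HistorianHealthCountersSketch
{
    private readonly object _gate = new();

    public long TotalQueries { get; private set; }
    public long TotalSuccesses { get; private set; }
    public long TotalFailures { get; private set; }
    public int ConsecutiveFailures { get; private set; }
    public DateTime? LastSuccessTime { get; private set; }
    public DateTime? LastFailureTime { get; private set; }
    public string? LastError { get; private set; }

    // Mirrors the documented threshold: degrade at three consecutive failures.
    public bool IsDegraded
    {
        get { lock (_gate) { return ConsecutiveFailures >= 3; } }
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            TotalQueries++;
            TotalSuccesses++;
            ConsecutiveFailures = 0;
            LastSuccessTime = DateTime.UtcNow;
        }
    }

    // readPath is one of "raw", "aggregate", "at-time", "events".
    public void RecordFailure(string readPath, string message)
    {
        lock (_gate)
        {
            TotalQueries++;
            TotalFailures++;
            ConsecutiveFailures++;
            LastFailureTime = DateTime.UtcNow;
            LastError = $"{readPath}: {message}";
        }
    }
}
```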
|
||||
|
||||
### Quality mapping
|
||||
|
||||
The Historian SDK returns standard OPC DA quality values in `QueryResult.OpcQuality` (UInt16). The low byte flows through the shared `QualityMapper` pipeline (`MapFromMxAccessQuality` → `MapToOpcUaStatusCode`):
|
||||
|
||||
| OPC Quality Byte | OPC DA Family | OPC UA StatusCode |
|------------------|---------------|-------------------|
| 0-63 | Bad | `Bad` (with sub-code when an exact enum match exists) |
| 64-191 | Uncertain | `Uncertain` (with sub-code when an exact enum match exists) |
| 192+ | Good | `Good` (with sub-code when an exact enum match exists) |
|
||||
|
||||
See `Domain/QualityMapper.cs` and `Domain/Quality.cs` in `Driver.Galaxy.Host` for the full table.
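
The family split in the table reduces to a range check on the low byte. The sketch below shows only that step; `QualityFamily` and `QualityFamilySketch` are hypothetical names, and the sub-code lookup performed by the real `QualityMapper` is not modelled.

```csharp
public enum QualityFamily { Bad, Uncertain, Good }

public static class QualityFamilySketch
{
    // Classifies an OPC DA quality word by its low byte, using the ranges from the table.
    public static QualityFamily FromOpcQuality(ushort opcQuality)
    {
        int lowByte = opcQuality & 0xFF;
        if (lowByte >= 192) return QualityFamily.Good;
        if (lowByte >= 64) return QualityFamily.Uncertain;
        return QualityFamily.Bad;
    }
}
```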
|
||||
|
||||
## OPC UA Client — upstream forwarding
|
||||
|
||||
The OPC UA Client driver (`Driver.OpcUaClient`) implements `IHistoryProvider` by forwarding each call to the upstream server via `Session.HistoryRead`. Raw / processed / at-time / events map onto the stack's native HistoryRead details types. Continuation points are passed through — the Core's `HistoryContinuationPointManager` treats the driver as an opaque pager.
|
||||
|
||||
## Historizing flag and AccessLevel
|
||||
|
||||
During variable node creation, drivers that advertise history set:
|
||||
|
||||
```csharp
if (attr.IsHistorized)
    accessLevel |= AccessLevels.HistoryRead;
variable.Historizing = attr.IsHistorized;
```
|
||||
|
||||
- **`Historizing = true`**: tells OPC UA clients that the node has historical data available.
- **`AccessLevels.HistoryRead`**: enables the `HistoryRead` access bit. The OPC UA stack checks this bit before routing history requests to the Core dispatcher; nodes without it are rejected before reaching `IHistoryProvider`.
|
||||
|
||||
The `IsHistorized` flag originates in the driver's discovery output. For Galaxy it comes from the repository query detecting a `HistoryExtension` primitive (see [drivers/Galaxy-Repository.md](drivers/Galaxy-Repository.md)). For OPC UA Client it is copied from the upstream server's `Historizing` property.
|
||||
|
||||
## Configuration
|
||||
|
||||
Driver-specific historian config lives in each driver's `DriverConfig` JSON blob, validated against the driver type's `DriverConfigJsonSchema` in `DriverTypeRegistry`. The Galaxy driver's historian section carries the fields exercised by `HistorianConfiguration` — `ServerName` / `ServerNames`, `FailureCooldownSeconds`, `IntegratedSecurity` / `UserName` / `Password`, `Port` (default `32568`), `CommandTimeoutSeconds`, `RequestTimeoutSeconds`, `MaxValuesPerRead`. The OPC UA Client driver inherits its timeouts from the upstream session.
|
||||
|
||||
See [Configuration.md](Configuration.md) for the schema shape and validation path.
|
||||
30
docs/v1/README.md
Normal file
@@ -0,0 +1,30 @@
|
||||
# v1 documentation archive
|
||||
|
||||
This folder contains documentation that described the original v1
in-process MXAccess architecture (`Galaxy.Host` + `Galaxy.Proxy` +
`Galaxy.Shared` three-project split, .NET 4.8 x86 + COM apartment, the
`OtOpcUaGalaxyHost` Windows service). That architecture was retired in
PR 7.2 (merged 2026-04-30 at commit `ae7106d`). These docs are kept as
the historical record of how the system worked before the v2-mxgw
migration; treat their content as accurate at the time of writing, NOT
as current state.
|
||||
|
||||
For current architecture see:
|
||||
|
||||
- `CLAUDE.md`: agent-facing v2 overview
- `docs/drivers/Galaxy.md`: current Galaxy driver doc
- `docs/v2/Galaxy.ParityRig.md`: current testing setup
- `docs/v2/Galaxy.Performance.md`: observability + perf
- `lmx_mxgw.md` (in repo root): design rationale for the migration
|
||||
|
||||
| File | What it covered |
|---|---|
| `AlarmTracking.md` | v1 alarm-tracking flow through the in-process MXAccess client |
| `Configuration.md` | v1 server configuration (`OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
| `DataTypeMapping.md` | Galaxy `mx_data_type` → OPC UA type mapping (still accurate as a reference; the live mapping logic is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
| `HistoricalDataAccess.md` | v1 IHistoryProvider on the Host side; current path is the server-level HistoryRouter + Wonderware sidecar |
| `Subscriptions.md` | v1 MXAccess subscription mechanics; current path uses gateway StreamEvents |
| `drivers/Galaxy-Repository.md` | v1 Host-side ZB SQL repository client; the gateway owns this path now |
| `drivers/Galaxy-Test-Fixture.md` | v1 test-fixture setup (parity tests + Galaxy.Host EXE spawn) |
| `reqs/GalaxyRepositoryReqs.md`, `reqs/MxAccessClientReqs.md` | Original Phase 0 requirements; satisfied in mxaccessgw repo today |
| `reqs/ServiceHostReqs.md` | Service-hosting requirements including `OtOpcUaGalaxyHost` (GHX-* section); only `OtOpcUa` server hosting remains in scope post-7.2 |
|
||||
69
docs/v1/Subscriptions.md
Normal file
@@ -0,0 +1,69 @@
|
||||
# Subscriptions
|
||||
|
||||
Driver-side data-change subscriptions live behind `ISubscribable` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ISubscribable.cs`). The interface is deliberately mechanism-agnostic: it covers native subscriptions (Galaxy MXAccess advisory, OPC UA monitored items on an upstream server, TwinCAT ADS notifications) and driver-internal polled subscriptions (Modbus, AB CIP, S7, FOCAS). Core sees the same event shape regardless — drivers fire `OnDataChange` and Core dispatches to the matching OPC UA monitored items.
|
||||
|
||||
## Driver vs virtual dispatch
|
||||
|
||||
Per [ADR-002](v2/implementation/adr-002-driver-vs-virtual-dispatch.md), `DriverNodeManager` routes subscriptions across both driver tags and virtual (scripted) tags through the same `ISubscribable` contract. The per-variable `NodeSourceKind` (registered from `DriverAttributeInfo` at discovery) selects the backend:
|
||||
|
||||
- `NodeSourceKind.Driver`: subscribes via the driver's `ISubscribable`, wrapped by `CapabilityInvoker` (the rest of this doc).
- `NodeSourceKind.Virtual`: subscribes via `VirtualTagSource` (`src/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs`), which forwards change events emitted by `VirtualTagEngine` as `OnDataChange`. The ref-counting, initial-value, and transfer-restoration behaviour below applies identically.
|
||||
|
||||
Because both kinds expose `ISubscribable`, Core's dispatch, ref-count map, and monitored-item fan-out are unchanged across the source branch.
|
||||
|
||||
## ISubscribable surface
|
||||
|
||||
```csharp
Task<ISubscriptionHandle> SubscribeAsync(
    IReadOnlyList<string> fullReferences,
    TimeSpan publishingInterval,
    CancellationToken cancellationToken);

Task UnsubscribeAsync(ISubscriptionHandle handle, CancellationToken cancellationToken);

event EventHandler<DataChangeEventArgs>? OnDataChange;
```
|
||||
|
||||
A single `SubscribeAsync` call may batch many attributes and returns an opaque handle the caller passes back to `UnsubscribeAsync`. The driver may emit an immediate `OnDataChange` for each subscribed reference (the OPC UA initial-data convention) and then a push per change.
|
||||
|
||||
Every subscribe / unsubscribe call goes through `CapabilityInvoker.ExecuteAsync(DriverCapability.Subscribe, host, …)` so the per-host pipeline applies.
|
||||
|
||||
## Reference counting at Core
|
||||
|
||||
Multiple OPC UA clients can monitor the same variable simultaneously. Rather than open duplicate driver subscriptions, Core maintains a ref-count per `(driver, fullReference)` pair: the first OPC UA monitored-item for a reference triggers `ISubscribable.SubscribeAsync` with that single reference; each additional monitored-item just increments the count; decrement-to-zero triggers `UnsubscribeAsync`. Transferred subscriptions (client reconnect → resume session) replay against the same ref-count map so active driver subscriptions are preserved across session migration.
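
The ref-count rule can be sketched as a small map keyed on `(driver, fullReference)`. The class `SubscriptionRefCounterSketch` and its `Acquire` / `Release` methods are hypothetical names chosen for this example; the real Core bookkeeping is not shown in this document.

```csharp
using System;
using System.Collections.Generic;

public sealed class SubscriptionRefCounterSketch
{
    private readonly Dictionary<(string Driver, string FullReference), int> _counts = new();
    private readonly object _gate = new();

    // Returns true when the count went 0 -> 1, i.e. the caller should call SubscribeAsync.
    public bool Acquire(string driver, string fullReference)
    {
        lock (_gate)
        {
            var key = (driver, fullReference);
            _counts.TryGetValue(key, out var count);
            _counts[key] = count + 1;
            return count == 0;
        }
    }

    // Returns true when the count went 1 -> 0, i.e. the caller should call UnsubscribeAsync.
    public bool Release(string driver, string fullReference)
    {
        lock (_gate)
        {
            var key = (driver, fullReference);
            if (!_counts.TryGetValue(key, out var count) || count == 0)
                throw new InvalidOperationException("Release without a matching Acquire.");

            if (count == 1)
            {
                _counts.Remove(key);
                return true;
            }
            _counts[key] = count - 1;
            return false;
        }
    }
}
```

A transferred session would replay its monitored items through the same `Acquire` path, so references that are still counted never trigger a second driver subscription.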
|
||||
|
||||
## Threading
|
||||
|
||||
The STA thread story is now driver-specific, not a server-wide concern:
|
||||
|
||||
- **Galaxy** runs its MXAccess COM objects on a dedicated STA thread with a Win32 message pump (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Sta/StaPump.cs`) inside the standalone `Driver.Galaxy.Host` Windows service. The Proxy driver (`Driver.Galaxy.Proxy`) connects to the Host via named pipe and re-exposes the data on a free-threaded surface to Core. Core never touches COM.
- **Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS** are free-threaded: they run their polling loops on ordinary `Task`s. Their `OnDataChange` fires on thread-pool threads.
- **OPC UA Client** delegates to the OPC Foundation stack's subscription loop.
|
||||
|
||||
The common contract: drivers are responsible for marshalling from whatever native thread the backend uses onto thread-pool threads before raising `OnDataChange`. Core's dispatch path acquires the OPC UA framework `Lock` and calls `ClearChangeMasks` on the corresponding `BaseDataVariableState` to notify subscribed clients.
|
||||
|
||||
## Dispatch
|
||||
|
||||
Core's subscription dispatch path:
|
||||
|
||||
1. `ISubscribable.OnDataChange` fires on a thread-pool thread with a `DataChangeEventArgs(subscriptionHandle, fullReference, DataValueSnapshot)`.
2. Core looks up the variable by `fullReference` in the driver's `DriverNodeManager` variable map.
3. Under the OPC UA framework `Lock`, the variable's `Value` / `StatusCode` / `Timestamp` are updated and `ClearChangeMasks(SystemContext, false)` is called.
4. The OPC Foundation stack then enqueues data-change notifications for every monitored-item attached to that variable, honoring each subscription's sampling + filter configuration.
|
||||
|
||||
Coalescing (merging multiple pushes for the same reference between publish cycles) is done driver-side when the backend natively supports it (Galaxy keeps the v1 coalescing dictionary); otherwise the SDK's own data-change filter suppresses no-change notifications.
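
The v1 coalescing dictionary itself is not reproduced in this document. As an illustration of the general technique only, a last-value-wins buffer could look like the sketch below; `CoalescingBufferSketch`, `Push`, and `Drain` are hypothetical names, and the real Galaxy implementation may differ.

```csharp
using System;
using System.Collections.Generic;

// Keeps only the most recent value per reference until the next drain.
public sealed class CoalescingBufferSketch<T>
{
    private readonly Dictionary<string, T> _pending = new(StringComparer.Ordinal);
    private readonly object _gate = new();

    // Called for every backend push; later pushes overwrite earlier ones for the same reference.
    public void Push(string fullReference, T value)
    {
        lock (_gate) { _pending[fullReference] = value; }
    }

    // Called once per publish cycle; returns the coalesced values and resets the buffer.
    public IReadOnlyList<KeyValuePair<string, T>> Drain()
    {
        lock (_gate)
        {
            var snapshot = new List<KeyValuePair<string, T>>(_pending);
            _pending.Clear();
            return snapshot;
        }
    }
}
```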
|
||||
|
||||
## Initial values
|
||||
|
||||
A freshly-built variable carries `StatusCode = BadWaitingForInitialData` until the driver delivers the first value. Drivers whose backends supply an initial read (Galaxy `AdviseSupervisory`, TwinCAT `AddDeviceNotification`) fire `OnDataChange` immediately after `SubscribeAsync` returns. Polled drivers fire the first push when their first poll cycle completes.
|
||||
|
||||
## Transferred subscription restoration
|
||||
|
||||
When an OPC UA session is resumed (client reconnect with `TransferSubscriptions`), Core walks the transferred monitored-items and ensures every referenced `(driver, fullReference)` has a live driver subscription. References already active (in-process migration) skip re-subscribing; references that lost their driver-side handle during the session gap are re-subscribed via `SubscribeAsync`.
|
||||
|
||||
## Key source files
|
||||
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ISubscribable.cs`: capability contract
- `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs`: pipeline wrapping
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Sta/StaPump.cs`: Galaxy STA thread + message pump
- Per-driver subscribe implementations in each `Driver.*` project
|
||||
152
docs/v1/drivers/Galaxy-Repository.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# Galaxy Repository — Tag Discovery for the Galaxy Driver
|
||||
|
||||
`GalaxyRepositoryService` reads the Galaxy object hierarchy and attribute metadata from the System Platform Galaxy Repository SQL Server database. It is the Galaxy driver's implementation of **`ITagDiscovery.DiscoverAsync`** — every driver has its own discovery source, and the Galaxy driver's is a direct SQL query against the Galaxy Repository (the `ZB` database). Other drivers use completely different mechanisms:
|
||||
|
||||
| Driver | `ITagDiscovery` source |
|--------|------------------------|
| Galaxy | ZB SQL hierarchy + attribute queries (this doc) |
| AB CIP | `@tags` walker against the PLC controller |
| AB Legacy | Data-table scan via PCCC `LogicalRead` on the PLC |
| TwinCAT | Beckhoff `SymbolLoaderFactory`, which uploads the full symbol tree from the ADS runtime |
| S7 | Config-DB enumeration (no native symbol upload for S7comm) |
| Modbus | Config-DB enumeration (flat register map, user-authored) |
| FOCAS | CNC queries (`cnc_rdaxisname`, `cnc_rdmacroinfo`, …) + optional Config-DB overlays |
| OPC UA Client | `Session.Browse` against the remote server |
|
||||
|
||||
`GalaxyRepositoryService` lives in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/` — Host-side, .NET Framework 4.8 x86, same process that owns the MXAccess COM objects. The Proxy forwards discovery over IPC the same way it forwards reads and writes.
|
||||
|
||||
## Connection Configuration
|
||||
|
||||
`GalaxyRepositoryConfiguration` controls database access:
|
||||
|
||||
| Property | Default | Description |
|----------|---------|-------------|
| `ConnectionString` | `Server=localhost;Database=ZB;Integrated Security=true;` | SQL Server connection using Windows Authentication |
| `ChangeDetectionIntervalSeconds` | `30` | Polling frequency for deploy change detection |
| `CommandTimeoutSeconds` | `30` | SQL command timeout for all queries |
| `ExtendedAttributes` | `false` | When true, loads primitive-level attributes in addition to dynamic attributes |
| `Scope` | `Galaxy` | `Galaxy` loads all deployed objects. `LocalPlatform` filters to the local platform's subtree only |
| `PlatformName` | `null` | Explicit platform hostname for `LocalPlatform` filtering. When null, uses `Environment.MachineName` |
|
||||
|
||||
The connection uses Windows Authentication because the Galaxy Repository database is local to the System Platform node and secured through domain credentials.
|
||||
|
||||
## SQL Queries
|
||||
|
||||
All queries are embedded as `const string` fields in `GalaxyRepositoryService`. No dynamic SQL is used. Project convention `GR-006` requires `const string` SQL queries; any new query must be added as a named constant rather than built at runtime.
|
||||
|
||||
### Hierarchy query
|
||||
|
||||
Returns deployed Galaxy objects with their parent relationships, browse names, and template derivation chains:
|
||||
|
||||
- Joins `gobject` to `template_definition` to filter by relevant `category_id` values (1, 3, 4, 10, 11, 13, 17, 24, 26)
- Uses `contained_name` as the browse name, falling back to `tag_name` when `contained_name` is null or empty
- Resolves the parent using `contained_by_gobject_id` when non-zero, otherwise falls back to `area_gobject_id`
- Marks objects with `category_id = 13` as areas
- Filters to `is_template = 0` (instances only, not templates)
- Filters to `deployed_package_id <> 0` (deployed objects only)
- Returns a `template_chain` column built by a recursive CTE that walks `gobject.derived_from_gobject_id` from each instance through its immediate template and ancestor templates (depth guard `< 10`). Template names are ordered by depth and joined with `|` via `STUFF(... FOR XML PATH(''))`. Example: `TestMachine_001` returns `$TestMachine|$gMachine|$gUserDefined|$UserDefined`. The C# repository reader splits the column on `|`, trims, and populates `GalaxyObjectInfo.TemplateChain`, which is consumed by `AlarmObjectFilter` for template-based alarm filtering. See [Alarm Tracking](../AlarmTracking.md#template-based-alarm-object-filter).
- Returns `template_definition.category_id` as a `category_id` column, populated into `GalaxyObjectInfo.CategoryId`. The runtime status probe manager filters this down to `CategoryId == 1` (`$WinPlatform`) and `CategoryId == 3` (`$AppEngine`) to decide which objects get a `<Host>.ScanState` probe advised. Also used during the hosted-variables walk to identify Platform/Engine ancestors.
- Returns `gobject.hosted_by_gobject_id` as a `hosted_by_gobject_id` column, populated into `GalaxyObjectInfo.HostedByGobjectId`. This is the **runtime host** of the object (e.g., which `$AppEngine` actually runs it), **not** the browse-containment parent (`contained_by_gobject_id`). The two are often different: an object can live in one Area in the browse tree but be hosted by an Engine on a different Platform for runtime execution. The driver walks this chain during `BuildHostedVariablesMap` to find the nearest `$WinPlatform` or `$AppEngine` ancestor so subtree quality invalidation on a Stopped host reaches exactly the variables that were actually executing there. Note: the Galaxy schema column is named `hosted_by_gobject_id` (not `host_gobject_id` as some documentation sources guess). See [Galaxy driver — Per-Host Runtime Status Probes](Galaxy.md#per-host-runtime-status-probes-hostscanstate).
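
The `template_chain` handling on the C# side (split on `|`, trim) can be sketched as below. `TemplateChainSketch.Parse` is a hypothetical helper, not the repository reader's actual code; dropping empty segments is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class TemplateChainSketch
{
    // Splits a pipe-joined template chain into trimmed template names, most-derived first.
    public static IReadOnlyList<string> Parse(string? templateChain)
    {
        if (string.IsNullOrWhiteSpace(templateChain))
            return Array.Empty<string>();

        // Assumption: empty segments carry no meaning and are dropped.
        return templateChain.Split(
            '|', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    }
}
```

For the example row above, `Parse` would yield four names starting with `$TestMachine`, which is the list a template-based filter would match against.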
|
||||
|
||||
### Attributes query (standard)
|
||||
|
||||
Returns user-defined dynamic attributes for deployed objects:
|
||||
|
||||
- Uses a recursive CTE (`deployed_package_chain`) to walk the package inheritance chain from `deployed_package_id` through `derived_from_package_id`, limited to 10 levels
- Joins `dynamic_attribute` on each package in the chain to collect inherited attributes
- Uses `ROW_NUMBER() OVER (PARTITION BY gobject_id, attribute_name ORDER BY depth)` to pick the most-derived definition when an attribute is overridden at multiple levels
- Builds `full_tag_reference` as `tag_name.attribute_name` with `[]` appended for arrays
- Extracts `array_dimension` from the binary `mx_value` column (bytes 13-16, little-endian int32)
- Detects historized attributes by checking for a `HistoryExtension` primitive instance
- Detects alarm attributes by checking for an `AlarmExtension` primitive instance
- Excludes internal attributes (names starting with `_`) and `.Description` suffixes
- Filters by `mx_attribute_category` to include only user-relevant categories
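
The `array_dimension` extraction happens in SQL, but the byte arithmetic is easier to see in C#. The sketch below is an illustration only: `MxValueSketch` is a hypothetical name, and the default offset of 12 assumes that "bytes 13-16" counts from 1 (as SQL `SUBSTRING` does). If the source counts from 0, pass `offset: 13` instead.

```csharp
using System;
using System.Buffers.Binary;

public static class MxValueSketch
{
    // Reads a little-endian int32 from the mx_value blob.
    // Assumption: "bytes 13-16" is 1-based, so the zero-based offset is 12.
    public static int? ReadArrayDimension(byte[]? mxValue, int offset = 12)
    {
        if (mxValue is null || offset < 0 || mxValue.Length < offset + 4)
            return null; // blob missing or too short to carry a dimension

        return BinaryPrimitives.ReadInt32LittleEndian(mxValue.AsSpan(offset, 4));
    }
}
```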
|
||||
|
||||
### Attributes query (extended)
|
||||
|
||||
When `ExtendedAttributes = true`, a more comprehensive query runs that unions two sources:
|
||||
|
||||
1. **Primitive attributes**: Joins through `primitive_instance` and `attribute_definition` to include system-level attributes from primitive components. Each attribute carries its `primitive_name` so the address space can group them under their parent variable.
2. **Dynamic attributes**: The same CTE-based query as the standard path, with an empty `primitive_name`.
|
||||
|
||||
The `full_tag_reference` for primitive attributes follows the pattern `tag_name.primitive_name.attribute_name` (e.g., `TestMachine_001.AlarmAttr.InAlarm`).
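
Both reference patterns can be expressed with one helper. In the actual queries the string is built in SQL; `FullTagReferenceSketch.Build` is a hypothetical C# rendering of the same rule, and applying the `[]` suffix to primitive attributes as well as dynamic ones is an assumption.

```csharp
public static class FullTagReferenceSketch
{
    // Builds tag_name[.primitive_name].attribute_name, with "[]" appended for arrays.
    public static string Build(string tagName, string? primitiveName, string attributeName, bool isArray)
    {
        var reference = string.IsNullOrEmpty(primitiveName)
            ? $"{tagName}.{attributeName}"                    // dynamic attribute
            : $"{tagName}.{primitiveName}.{attributeName}";   // primitive attribute

        return isArray ? reference + "[]" : reference;
    }
}
```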
|
||||
|
||||
### Change detection query
|
||||
|
||||
A single-column query: `SELECT time_of_last_deploy FROM galaxy`. The `galaxy` table contains one row with the timestamp of the most recent deployment.
|
||||
|
||||
## Why deployed_package_id Instead of checked_in_package_id
|
||||
|
||||
The Galaxy maintains two package references for each object:
|
||||
|
||||
- `checked_in_package_id`: the latest saved version, which may include undeployed configuration changes
- `deployed_package_id`: the version currently running on the target platform
|
||||
|
||||
The queries filter on `deployed_package_id <> 0` because the OPC UA address space must mirror what is actually running in the Galaxy runtime. Using `checked_in_package_id` would expose attributes and objects that exist in the IDE but have not been deployed, causing mismatches between the OPC UA address space and the MXAccess runtime.
|
||||
|
||||
## Platform Scope Filter
|
||||
|
||||
When `Scope` is set to `LocalPlatform`, the repository applies a post-query C# filter to restrict the address space to objects hosted by the local platform. This reduces memory footprint, MXAccess subscription count, and address space size on multi-node Galaxy deployments where each OPC UA server instance only needs to serve its own platform's objects.
|
||||
|
||||
### How it works
|
||||
|
||||
1. **Platform lookup** — A separate `const string` SQL query (`PlatformLookupSql`) reads `platform_gobject_id` and `node_name` from the `platform` table for all deployed platforms. This runs once per hierarchy load.
|
||||
2. **Platform matching** — The configured `PlatformName` (or `Environment.MachineName` when null) is matched case-insensitively against the `node_name` column. If no match is found, a warning is logged listing the available platforms and the address space is empty.
|
||||
3. **Host chain collection** — The filter collects the matching platform's `gobject_id`, then iterates the hierarchy to find all `$AppEngine` (category 3) objects whose `HostedByGobjectId` equals the platform. This produces the full set of host gobject_ids under the local platform.
|
||||
4. **Object inclusion** — All non-area objects whose `HostedByGobjectId` is in the host set are included, along with the hosts themselves.
|
||||
5. **Area retention** — `ParentGobjectId` chains are walked upward from included objects to pull in ancestor areas, keeping the browse tree connected. Areas that contain no local descendants are excluded.
|
||||
6. **Attribute filtering** — The set of included `gobject_id` values is cached after `GetHierarchyAsync` and reused by `GetAttributesAsync` to filter attributes to the same scope.
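
Steps 3 to 5 above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the C# in `PlatformScopeFilter.cs`: the `GObject` record, its field names, and `filter_to_platform` are hypothetical, and only the `$AppEngine` category id (3) is taken from the text.

```python
from dataclasses import dataclass

APP_ENGINE_CATEGORY = 3  # `$AppEngine`, per the description above


@dataclass(frozen=True)
class GObject:  # hypothetical row shape, for illustration only
    gobject_id: int
    parent_gobject_id: int     # containment parent (0 = root)
    hosted_by_gobject_id: int  # hosting platform or engine
    category_id: int
    is_area: bool


def filter_to_platform(objects, platform_gobject_id):
    """Return the set of gobject_ids retained for one platform."""
    by_id = {o.gobject_id: o for o in objects}

    # Step 3: the platform plus every AppEngine it hosts.
    hosts = {platform_gobject_id}
    hosts |= {
        o.gobject_id
        for o in objects
        if o.category_id == APP_ENGINE_CATEGORY
        and o.hosted_by_gobject_id == platform_gobject_id
    }

    # Step 4: the hosts themselves plus non-area objects they host.
    included = {h for h in hosts if h in by_id}
    included |= {
        o.gobject_id
        for o in objects
        if not o.is_area and o.hosted_by_gobject_id in hosts
    }

    # Step 5: walk parent chains upward so ancestors (areas, in
    # practice) stay in the tree; areas with no local descendants
    # are never reached and so are excluded.
    for gid in list(included):
        parent = by_id[gid].parent_gobject_id
        while parent in by_id and parent not in included:
            included.add(parent)
            parent = by_id[parent].parent_gobject_id
    return included
```

The returned set plays the role of the cached `gobject_id` set from step 6: attribute rows whose `gobject_id` is not in it would be dropped.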
|
||||
|
||||
### Design rationale
|
||||
|
||||
The filter is applied in C# rather than SQL because project convention `GR-006` requires `const string` SQL queries with no dynamic SQL. The hierarchy query already returns `HostedByGobjectId` and `CategoryId` on every row, so all information needed for filtering is already in memory after the query runs. The only new SQL is the lightweight platform lookup query.
|
||||
|
||||
### Configuration
|
||||
|
||||
```json
"GalaxyRepository": {
  "Scope": "LocalPlatform",
  "PlatformName": null
}
```
|
||||
|
||||
- Set `Scope` to `"LocalPlatform"` to enable filtering. Default is `"Galaxy"` (load everything).
|
||||
- Set `PlatformName` to an explicit hostname to target a specific platform, or leave null to use the local machine name.
|
||||
|
||||
### Startup log
|
||||
|
||||
When `LocalPlatform` is active, the startup log shows the filtering result:
|
||||
|
||||
```
GalaxyRepository.Scope="LocalPlatform", PlatformName=MYNODE
GetHierarchyAsync returned 49 objects
GetPlatformsAsync returned 2 platform(s)
Scope filter targeting platform 'MYNODE' (gobject_id=1042)
Scope filter retained 25 of 49 objects for platform 'MYNODE'
GetAttributesAsync returned 4206 attributes (extended=true)
Scope filter retained 2100 of 4206 attributes
```
|
||||
|
||||
## Change Detection Polling and IRediscoverable
|
||||
|
||||
`ChangeDetectionService` runs a background polling loop in the Host process that calls `GetLastDeployTimeAsync` at the configured interval. It compares the returned timestamp against the last known value:
|
||||
|
||||
- On the first poll (no previous state), the timestamp is recorded and `OnGalaxyChanged` fires unconditionally
|
||||
- On subsequent polls, `OnGalaxyChanged` fires only when `time_of_last_deploy` differs from the cached value
|
||||
|
||||
When the event fires, the Host re-runs the hierarchy and attribute queries and pushes the result back to the Server via an IPC `RediscoveryNeeded` message. That surfaces on `GalaxyProxyDriver` as the **`IRediscoverable.OnRediscoveryNeeded`** event; the Server's `DriverNodeManager` consumes it and calls `SyncAddressSpace` to compute the diff against the live address space.
|
||||
|
||||
The polling approach is used because the Galaxy Repository database does not provide change notifications. The `galaxy.time_of_last_deploy` column updates only on completed deployments, so the polling interval controls how quickly the OPC UA address space reflects Galaxy changes.
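
The comparison rule described in this section (fire on the first poll, then only when the timestamp differs) can be sketched as below. This is an illustrative Python sketch, not `ChangeDetectionService` itself; the class name, `poll_once`, and the injected callables are hypothetical stand-ins for `GetLastDeployTimeAsync` and `OnGalaxyChanged`.

```python
from datetime import datetime
from typing import Callable, Optional


class ChangeDetector:
    """Fires `on_changed` on the first poll and whenever the timestamp differs."""

    def __init__(
        self,
        get_last_deploy_time: Callable[[], Optional[datetime]],
        on_changed: Callable[[], None],
    ) -> None:
        self._get = get_last_deploy_time  # stand-in for the SQL query
        self._on_changed = on_changed     # stand-in for OnGalaxyChanged
        self._last: Optional[datetime] = None
        self._has_polled = False

    def poll_once(self) -> bool:
        """Run one poll; return True when the change event fired."""
        current = self._get()
        # Exact inequality, not a range check: any difference counts.
        changed = (not self._has_polled) or current != self._last
        self._has_polled = True
        self._last = current
        if changed:
            self._on_changed()
        return changed
```

A background loop would call `poll_once` once per configured interval; the interval is therefore the upper bound on how long a completed deployment goes unnoticed.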
|
||||
|
||||
## TestConnection
|
||||
|
||||
`TestConnectionAsync` runs `SELECT 1` against the configured database. This is used at Host startup to verify connectivity before attempting the full hierarchy query.
|
||||
|
||||
## Key source files
|
||||
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/GalaxyRepositoryService.cs` — SQL queries and data access
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/PlatformScopeFilter.cs` — Platform-based hierarchy and attribute filtering
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/ChangeDetectionService.cs` — Deploy timestamp polling loop
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Configuration/GalaxyRepositoryConfiguration.cs` — Connection, polling, and scope settings
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Domain/PlatformInfo.cs` — Platform-to-hostname DTO
|
||||
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/Contracts/DiscoveryResponse.cs` — IPC DTO the Host uses to return hierarchy + attribute results across the pipe
|
||||
165
docs/v1/drivers/Galaxy-Test-Fixture.md
Normal file
@@ -0,0 +1,165 @@
|
||||
# Galaxy test fixture
|
||||
|
||||
Coverage map + gap inventory for the Galaxy driver — out-of-process Host
|
||||
(net48 x86 MXAccess COM) + Proxy (net10) + Shared protocol.
|
||||
|
||||
**TL;DR: Galaxy has the richest test harness in the fleet** — real Host
|
||||
subprocess spawn, real ZB SQL queries, IPC parity checks against the v1
|
||||
LmxProxy reference, + live-smoke tests when MXAccess runtime is actually
|
||||
installed. Gaps are live-plant + failover-shaped: the E2E suite covers the
|
||||
representative ~50-tag deployment but not large-site discovery stress, real
|
||||
Rockwell/Siemens PLC enumeration through MXAccess, or ZB SQL Always-On
|
||||
replica failover.
|
||||
|
||||
## What the fixture is
|
||||
|
||||
Multi-project test topology:
|
||||
|
||||
- **E2E parity** —
|
||||
`tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs` spawns the
|
||||
production `OtOpcUa.Driver.Galaxy.Host.exe` as a subprocess, opens the
|
||||
named-pipe IPC, connects `GalaxyProxyDriver` + runs hierarchy / stability
|
||||
parity tests against both.
|
||||
- **Host.Tests** —
|
||||
`tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/` — direct Host process
|
||||
testing (18+ test classes covering alarm discovery, AVEVA prerequisite
|
||||
checks, IPC dispatcher, alarm tracker, probe manager, historian
|
||||
cluster/quality/wiring, history read, OPC UA attribute mapping,
|
||||
subscription lifecycle, reconnect, multi-host proxy, ADS address routing,
|
||||
expression evaluation) + `GalaxyRepositoryLiveSmokeTests` that hit real
|
||||
ZB SQL.
|
||||
- **Proxy.Tests** — `GalaxyProxyDriver` client contract tests.
|
||||
- **Shared.Tests** — shared protocol + address model.
|
||||
- **TestSupport** — test helpers reused across the above.
|
||||
|
||||
## How tests skip
|
||||
|
||||
- **E2E parity**: `ParityFixture.SkipIfUnavailable()` runs at class init and
|
||||
checks Windows-only, ZB SQL reachable on `localhost:1433`, Host EXE built
|
||||
in the expected `bin/` folder. Any miss → tests skip.
|
||||
- **Live-smoke** (`GalaxyRepositoryLiveSmokeTests`): `Assert.Skip` when ZB
|
||||
unreachable. Per the `project_galaxy_host_installed` memory, this repo's
dev box has the MXAccess runtime installed. The pipe ACL allows the
|
||||
configured SID outright; elevation of the caller doesn't matter because
|
||||
the per-connection SID check in `PipeServer.VerifyCaller` only compares
|
||||
user SIDs (not group membership or integrity level).
|
||||
- **Unit** tests (Shared, Proxy contract, most Host.Tests) have no skip —
|
||||
they run anywhere.
|
||||
|
||||
## What it actually covers
|
||||
|
||||
### E2E parity suite
|
||||
|
||||
- `HierarchyParityTests` — Host address-space hierarchy vs v1 LmxProxy
|
||||
reference (same ZB, same Galaxy, same shape)
|
||||
- `StabilityFindingsRegressionTests` — probe subscription failure
|
||||
handling + host-status mutation guard from the v1 stability findings
|
||||
backlog
|
||||
|
||||
### Host.Tests (representative)
|
||||
|
||||
- Alarm discovery → subsystem setup
|
||||
- AVEVA prerequisite checks (runtime installed, platform deployed, etc.)
|
||||
- IPC dispatcher — request/response routing over the named pipe
|
||||
- Alarm tracker state machine
|
||||
- Probe manager — per-runtime probe subscription + reconnect
|
||||
- Historian cluster / quality / wiring — Aveva Historian integration
|
||||
- OPC UA attribute mapping
|
||||
- Subscription lifecycle + reconnect
|
||||
- Multi-host proxy routing
|
||||
- ADS address routing + expression evaluation (Galaxy's legacy expression
|
||||
language)
|
||||
|
||||
### Live-smoke
|
||||
|
||||
- `GalaxyRepositoryLiveSmokeTests` — real SQL against ZB database, verifies
|
||||
the ZB schema + `LocalPlatform` scope filter + change-detection query
|
||||
shape match production.
|
||||
|
||||
### Capability surfaces hit
|
||||
|
||||
All of them: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`,
|
||||
`ISubscribable`, `IHostConnectivityProbe`, `IPerCallHostResolver`,
|
||||
`IAlarmSource`, `IHistoryProvider`. Galaxy is the only driver where every
|
||||
interface sees both contract + real-integration coverage.
|
||||
|
||||
## What it does NOT cover
|
||||
|
||||
### 1. MXAccess COM by default
|
||||
|
||||
The E2E parity suite backs subscriptions via the DB-only path; MXAccess COM
|
||||
integration opts in via a separate live-smoke. So "does the MXAccess STA
|
||||
pump correctly handle real Wonderware runtime events" is exercised only
|
||||
when the operator runs live smoke on a machine with MXAccess installed.
|
||||
|
||||
### 2. Real Rockwell / Siemens PLC enumeration
|
||||
|
||||
Galaxy runtime talks to PLCs through MXAccess (Device Integration Objects).
|
||||
The CI parity suite uses a representative ~50-tag deployment; large sites
|
||||
(1000+ tag hierarchies, multi-Galaxy replication, deeply-nested templates)
|
||||
are not stressed.
|
||||
|
||||
### 3. ZB SQL Always-On failover
|
||||
|
||||
Live-smoke hits a single SQL instance. Real production ZB often runs on
|
||||
Always-On availability groups; replica failover behavior is not tested.
|
||||
|
||||
### 4. Galaxy replication / backup-restore
|
||||
|
||||
Galaxy supports backup + partial replication across platforms — these
|
||||
rewrite the ZB schema in ways that change the contained_name vs tag_name
|
||||
mapping. Not exercised.
|
||||
|
||||
### 5. Historian failover
|
||||
|
||||
Aveva Historian can be clustered. `historian cluster / quality` tests
|
||||
verify the cluster-config query; they don't exercise actual failover
|
||||
(primary dies → secondary takes over mid-HistoryRead).
|
||||
|
||||
### 6. AVEVA runtime version matrix
|
||||
|
||||
MXAccess COM contract varies subtly across System Platform 2017 / 2020 /
|
||||
2023. The live-smoke runs against whatever version is installed on the dev
|
||||
box; CI has no AVEVA installed at all (licensing + footprint).
|
||||
|
||||
## When to trust the Galaxy suite, when to reach for a live plant
|
||||
|
||||
| Question | E2E parity | Live-smoke | Real plant |
| --- | --- | --- | --- |
| "Does Host spawn + IPC round-trip work?" | yes | yes | yes |
| "Does the ZB schema query match production shape?" | partial | yes | yes |
| "Does MXAccess COM handle runtime reconnect correctly?" | no | yes | yes |
| "Does the driver scale to 1000+ tags on one Galaxy?" | no | partial | yes (required) |
| "Does historian failover mid-read return a clean error?" | no | no | yes (required) |
| "Does System Platform 2023's MXAccess differ from 2020?" | no | partial | yes (required) |
| "Does ZB Always-On replica failover preserve generation?" | no | no | yes (required) |
|
||||
|
||||
## Follow-up candidates
|
||||
|
||||
1. **System Platform 2023 live-smoke matrix** — set up a second dev box
|
||||
running SP2023; run the same live-smoke against both to catch COM-contract
|
||||
drift early.
|
||||
2. **Synthetic large-site fixture** — script a ZB populator that creates a
|
||||
1000-Equipment / 20000-tag hierarchy, run the parity suite against it.
|
||||
Catches O(N) → O(N²) discovery regressions.
|
||||
3. **Historian failover scripted test** — with a two-node AVEVA Historian
|
||||
cluster, tear down primary mid-HistoryRead + verify the driver's failover
|
||||
behavior + error surface.
|
||||
4. **ZB Always-On CI** — SQL Server 2022 on Linux supports Always-On;
|
||||
could stand up a two-replica group for replica-failover coverage.
|
||||
|
||||
This is already the best-tested driver; the remaining work is site-scale
and production-topology coverage, not capability coverage.
|
||||
|
||||
## Key fixture / config files
|
||||
|
||||
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs` — E2E fixture
|
||||
that spawns Host + connects Proxy
|
||||
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/GalaxyRepositoryLiveSmokeTests.cs`
|
||||
— live ZB smoke with `Assert.Skip` gate
|
||||
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport/` — shared helpers
|
||||
- `docs/drivers/Galaxy.md` — COM bridge + STA pump + IPC architecture
|
||||
- `docs/v1/drivers/Galaxy-Repository.md` — ZB SQL reader + `LocalPlatform`
|
||||
scope filter + change detection
|
||||
- `docs/v2/aveva-system-platform-io-research.md` — MXAccess + Wonderware
|
||||
background
|
||||
141
docs/v1/reqs/GalaxyRepositoryReqs.md
Normal file
@@ -0,0 +1,141 @@
|
||||
# Galaxy Driver — Galaxy Repository Requirements
|
||||
|
||||
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope clarified: this document is **Galaxy-driver-specific**. Galaxy is one of seven drivers in the OtOpcUa platform; the requirements below describe the SQL-side of the Galaxy driver (hierarchy/attribute/change-detection queries against the ZB database) that backs the Galaxy driver's `ITagDiscovery.DiscoverAsync` and `IRediscoverable` implementations. All Galaxy-specific SQL runs inside `OtOpcUa.Galaxy.Host` (.NET 4.8 x86 Windows service); the in-server `Driver.Galaxy.Proxy` calls it over a named pipe. For platform-wide tag discovery requirements see `OpcUaServerReqs.md` OPC-002. For deeper spec see `docs/GalaxyRepository.md` and `docs/v2/driver-specs.md`.
|
||||
|
||||
Parent: [HLR-002](HighLevelReqs.md#hlr-002-multi-driver-plug-in-model), [HLR-003](HighLevelReqs.md#hlr-003-address-space-composition-per-namespace), [HLR-006](HighLevelReqs.md#hlr-006-change-detection-and-rediscovery)
|
||||
|
||||
Driver scope: Galaxy only. Namespace kind: `SystemPlatform`.
|
||||
|
||||
## GR-001: Hierarchy Extraction
|
||||
|
||||
The Galaxy driver's `ITagDiscovery.DiscoverAsync` implementation shall query the ZB Galaxy Repository database to extract all deployed objects with their parent-child containment relationships, contained names, and tag names.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Executes `queries/hierarchy.sql` against the ZB database from within `OtOpcUa.Galaxy.Host`.
|
||||
- Returns a list of objects with: `gobject_id`, `tag_name`, `contained_name`, `browse_name`, `parent_gobject_id`, `is_area`.
|
||||
- Objects with `parent_gobject_id = 0` become children of the root ZB node inside the `SystemPlatform` namespace.
|
||||
- Only deployed, non-template objects matching the category filter (areas, engines, user-defined objects, etc.) are returned.
|
||||
- Query completes within 10 seconds on a typical Galaxy (hundreds of objects). Log Warning if it takes longer.
|
||||
|
||||
### Details
|
||||
|
||||
- Results are ordered by `parent_gobject_id, tag_name` for deterministic tree building.
|
||||
- Empty result → Warning logged (Galaxy may have no deployed objects, or the DB connection may be misconfigured).
|
||||
- Orphan detection: a row referencing a non-existent `parent_gobject_id` (and not 0) is skipped with a Warning.
|
||||
- Streamed to the core via `IAddressSpaceBuilder.AddFolder` / `AddObject` calls over the Galaxy named pipe; no in-memory full-tree buffering on the Host side.
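
The ordering and orphan rules above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the Host's code: rows are modeled as plain dicts, `build_children_index` is a hypothetical name, and the sketch buffers all rows for clarity even though the requirement calls for streaming without a full in-memory tree.

```python
import logging
from collections import defaultdict

log = logging.getLogger("galaxy.hierarchy")  # hypothetical logger name


def build_children_index(rows):
    """Map parent_gobject_id -> ordered child gobject_ids.

    `rows` is an iterable of dicts carrying `gobject_id`,
    `parent_gobject_id`, and `tag_name`.
    """
    # Deterministic order: parent first, then tag name.
    rows = sorted(rows, key=lambda r: (r["parent_gobject_id"], r["tag_name"]))
    if not rows:
        log.warning("Hierarchy query returned no rows")

    known_ids = {r["gobject_id"] for r in rows}
    children = defaultdict(list)
    for row in rows:
        parent = row["parent_gobject_id"]
        # Orphan: parent is neither the root (0) nor a returned object.
        if parent != 0 and parent not in known_ids:
            log.warning(
                "Skipping orphan %s (missing parent %s)", row["tag_name"], parent
            )
            continue
        children[parent].append(row["gobject_id"])
    return dict(children)
```

Entries under key `0` are the objects that become children of the root ZB node.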
|
||||
|
||||
---
|
||||
|
||||
## GR-002: Attribute Extraction
|
||||
|
||||
The Galaxy driver shall query user-defined (dynamic) attributes for deployed objects, including data type, array flag, and array dimensions.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Executes `queries/attributes.sql` using the template chain CTE to resolve inherited attributes.
|
||||
- Returns: `gobject_id`, `tag_name`, `attribute_name`, `full_tag_reference`, `mx_data_type`, `is_array`, `array_dimension`, `security_classification`.
|
||||
- Attributes starting with `_` are filtered out by the query.
|
||||
- `array_dimension` is extracted from the `mx_value` hex bytes (positions 13-16, little-endian uint16).
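
As a worked example of the last criterion, the sketch below decodes a little-endian uint16 from a hex string. It assumes that "positions 13-16" means 1-based character positions in the hex text of `mx_value` (four hex characters, two bytes); that reading, the function name, and the sample value are illustrative assumptions, and the real extraction happens inside `queries/attributes.sql`.

```python
def array_dimension_from_hex(mx_value_hex: str) -> int:
    """Decode the array dimension from an `mx_value` hex string.

    Assumes 1-based hex characters 13..16 hold a little-endian uint16.
    """
    field = mx_value_hex[12:16]  # 0-based slice for characters 13..16
    if len(field) != 4:
        raise ValueError("mx_value too short to contain an array dimension")
    return int.from_bytes(bytes.fromhex(field), byteorder="little")
```

For instance, under this assumption a value whose characters 13 to 16 are `0A00` decodes to 10, because the low byte `0A` comes first.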
|
||||
|
||||
### Details
|
||||
|
||||
- CTE recursion depth is limited to 10 levels.
|
||||
- `mx_data_type` not in the known set (1-8, 13-16) defaults to String.
|
||||
- `gobject_id` that doesn't match a hierarchy object is skipped (object may not be deployed).
|
||||
- Each emitted attribute is reported via `DriverAttributeInfo` to the core through `IAddressSpaceBuilder.AddVariable`.
|
||||
|
||||
---
|
||||
|
||||
## GR-003: Change Detection and IRediscoverable
|
||||
|
||||
The Galaxy driver shall implement `IRediscoverable` by polling `galaxy.time_of_last_deploy` on a configurable interval to detect when a new deployment has occurred.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Polls `SELECT time_of_last_deploy FROM galaxy` at a configurable interval (`Galaxy:ChangeDetectionIntervalSeconds`, default 30 seconds).
|
||||
- Compares the returned timestamp to the last known value stored in memory.
|
||||
- If different, raises the `IRediscoverable.RediscoveryNeeded` signal so the core re-runs `ITagDiscovery.DiscoverAsync` and surgically rebuilds the Galaxy namespace subtree (per OPC-017).
|
||||
- First poll after startup always triggers an initial discovery.
|
||||
- Query failure → Warning logged; no rediscovery triggered; retry at next interval.
|
||||
|
||||
### Details
|
||||
|
||||
- Polling runs on a background `Task` inside `OtOpcUa.Galaxy.Host`, not on the STA message-pump thread.
|
||||
- `time_of_last_deploy` is a `datetime` column; compared using exact equality (not a range).
|
||||
- Signal delivery to the Proxy happens via a server-push message on the Galaxy named pipe.
|
||||
|
||||
---
|
||||
|
||||
## GR-004: Rediscovery Data Flow
|
||||
|
||||
On a deployment change, the Galaxy driver shall re-query hierarchy + attributes and stream the updated structure to the core for surgical namespace rebuild.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- On change signal, re-run `GR-001` (hierarchy) and `GR-002` (attributes) queries.
|
||||
- Stream the new tree to the core via `IAddressSpaceBuilder` over the named pipe.
|
||||
- Log at Information level: `"Galaxy deployment change detected. Rebuilding. ({ObjectCount} objects, {AttributeCount} attributes)"`.
|
||||
- Log total rediscovery duration at Information level.
|
||||
- On re-query failure: Error logged; existing Galaxy subtree is retained.
|
||||
|
||||
### Details
|
||||
|
||||
- Rediscovery is not atomic from the DB perspective — hierarchy and attributes are two separate queries. Acceptable; Galaxy deployment is an infrequent operation.
|
||||
- The core owns the diff/surgical apply per OPC-017; the Galaxy driver only streams the new authoritative tree.
|
||||
|
||||
---
|
||||
|
||||
## GR-005: Connection Configuration
|
||||
|
||||
Galaxy DB connection parameters shall be configurable via environment variables passed from the `OtOpcUa.Galaxy.Host` supervisor at spawn time.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Connection string via `OTOPCUA_GALAXY_ZB_CONN` environment variable.
|
||||
- Default: `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` (Windows Auth).
|
||||
- ADO.NET `SqlConnection` used for queries (.NET Framework 4.8).
|
||||
- Connection is opened per-query (not kept open). Connection pooling handles efficiency.
|
||||
- If the initial connection test at startup fails, log Error with the connection string sanitized and continue attempting (change-detection polls keep retrying).
|
||||
|
||||
### Details
|
||||
|
||||
- Command timeout: `Galaxy:CommandTimeoutSeconds` in Config DB driver JSON (default 30 seconds).
|
||||
- No ORM. Raw ADO.NET with `SqlCommand` and `SqlDataReader`. SQL text embedded as constants.
|
||||
|
||||
---
|
||||
|
||||
## GR-006: Query Safety
|
||||
|
||||
All Galaxy SQL queries shall be static read-only SELECT statements. No writes to the Galaxy Repository database.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- All queries are hardcoded SQL strings with no string concatenation or user-supplied parameters.
|
||||
- No INSERT, UPDATE, DELETE, or DDL statements are ever executed against the Galaxy database.
|
||||
- Queries use only SELECT with read-only intent.
|
||||
|
||||
---
|
||||
|
||||
## GR-007: Startup Validation
|
||||
|
||||
On startup, the Galaxy driver's DB component inside `OtOpcUa.Galaxy.Host` shall validate database connectivity.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Execute a simple test query (`SELECT 1`) against the configured Galaxy DB.
|
||||
- If the database is unreachable, log Error but do not prevent Host startup.
|
||||
- The Galaxy driver runs in degraded mode (empty SystemPlatform namespace) until the database becomes available and the next change-detection poll succeeds.
|
||||
- In degraded mode the Galaxy driver instance reports `DriverHealth.Unavailable`, causing its Polly circuit state to be open until the first successful discovery.
|
||||
|
||||
---
|
||||
|
||||
## GR-008: Capability Wrapping
|
||||
|
||||
All calls into the Galaxy DB component from the Proxy side shall route through `CapabilityInvoker.InvokeAsync(DriverCapability.Discover, …)`.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- `Driver.Galaxy.Proxy.DiscoverAsync` is a thin capability-invoker call that sends a MessagePack request over the named pipe to the Host's DB component.
|
||||
- Roslyn analyzer **OTOPCUA0001** validates there are no direct discovery calls bypassing the invoker.
|
||||
- Polly pipeline for `DriverCapability.Discover` on the Galaxy driver instance carries Timeout + Retry + CircuitBreaker.
|
||||
205
docs/v1/reqs/MxAccessClientReqs.md
Normal file
@@ -0,0 +1,205 @@
|
||||
# Galaxy Driver — MXAccess Client Requirements
|
||||
|
||||
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope narrowed: this document covers the MXAccess surface **inside `OtOpcUa.Galaxy.Host`** (.NET Framework 4.8 x86 Windows service). The in-server `Driver.Galaxy.Proxy` implements the `IReadable` / `IWritable` / `ISubscribable` / `IAlarmSource` / `IHistoryProvider` capability interfaces and routes every wire call through the named pipe to this Host process. The STA thread + reconnect playback + subscription refcount requirements from v1 are preserved; what changed is where they live (Host service, not the Server process). MXA-010 (proxy-side wrapping) and MXA-011 (pipe ACL / shared secret) are new.
|
||||
|
||||
Parent: [HLR-002](HighLevelReqs.md#hlr-002-multi-driver-plug-in-model), [HLR-005](HighLevelReqs.md#hlr-005-live-data-access), [HLR-007](HighLevelReqs.md#hlr-007-service-hosting)
|
||||
|
||||
Driver scope: Galaxy only. Process scope: `OtOpcUa.Galaxy.Host` (Host side) and `Driver.Galaxy.Proxy` (server-side forwarder).
|
||||
|
||||
## MXA-001: STA Thread with Message Pump
|
||||
|
||||
All MXAccess COM objects shall be created and called on a dedicated STA thread running a Win32 message pump to ensure COM callbacks are delivered.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- A dedicated thread is created with `ApartmentState.STA` before any MXAccess COM object is instantiated; implementation lives in `StaPump` inside `OtOpcUa.Galaxy.Host`.
|
||||
- The thread runs a Win32 message pump using `GetMessage` / `TranslateMessage` / `DispatchMessage`.
|
||||
- Work items are marshalled to the STA thread via `PostThreadMessage(WM_APP)` and a concurrent queue.
|
||||
- All COM object creation (`LMXProxyServer`), method calls, and event callbacks happen on this thread.
|
||||
- Thread name `Galaxy.Sta` (for diagnostics).
|
||||
|
||||
### Details
|
||||
|
||||
- If the STA thread dies unexpectedly, log Fatal and trigger Host service shutdown. The supervisor restarts the Host under its driver-stability policy (`docs/v2/driver-stability.md`). COM objects on the dead thread are unrecoverable; no in-process recovery is attempted.
|
||||
- `RunAsync(Action)` returns a `Task` that completes when the action executes on the STA thread. Callers can `await` it.
|
||||
|
||||
---
|
||||
|
||||
## MXA-002: Connection Lifecycle
|
||||
|
||||
The Host shall support Register/Unregister lifecycle with the `LMXProxyServer` COM object, tracking the connection handle.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- `Register(clientName)` is called on the STA thread and returns a positive connection handle on success.
|
||||
- Handle ≤ 0 → descriptive error thrown; Host reports `DriverHealth.Unavailable` via the pipe so the Proxy reports Bad quality to the core.
|
||||
- `Unregister(handle)` is called during disconnect after all subscriptions are removed.
|
||||
- Client name comes from `OTOPCUA_GALAXY_CLIENT_NAME` environment variable; default `OtOpcUa-Galaxy.Host`. Must be unique per MXAccess registration (a cluster's Primary and Secondary each get their own client-name suffix via node override).
|
||||
- Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).
|
||||
|
||||
### Details
|
||||
|
||||
- `ConnectedSince` (UTC) recorded after successful Register.
|
||||
- `ReconnectCount` tracked for diagnostics and `/metrics`.
|
||||
- State changes are emitted over the pipe as `DriverHealth` updates.
|
||||
|
||||
---
|
||||
|
||||
## MXA-003: Tag Subscription
|
||||
|
||||
The Host shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Subscribe sequence: `AddItem(handle, address)` returns item handle, then `AdviseSupervisory(handle, itemHandle)` starts the subscription.
|
||||
- `OnDataChange` callback delivers value, quality, timestamp, and MXSTATUS_PROXY array.
|
||||
- Item address format: `tag_name.AttributeName` for scalars, `tag_name.AttributeName[]` for whole arrays.
|
||||
- AddItem failure → Warning logged, failure propagated over the pipe to the Proxy.
|
||||
- Bidirectional maps of `address ↔ itemHandle` maintained for callback resolution.
|
||||
- Multi-client refcounting: two Proxy-side subscribe calls for the same address produce one MXAccess subscription; refcount decrement on the last unsubscribe triggers `UnAdvise` / `RemoveItem`.
|
||||
|
||||
### Details
|
||||
|
||||
- `AdviseSupervisory` (not `Advise`) is used because this is a background service without an interactive user session.
|
||||
- Stored subscriptions dictionary maps address → callback for reconnect replay.
|
||||
- On reconnect, every entry in stored subscriptions is re-subscribed (AddItem + AdviseSupervisory with new handles).
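
The refcounting and reconnect-replay behavior in MXA-003 can be sketched as follows. This Python sketch is illustrative only: `SubscriptionRefCounter` and its `advise` / `unadvise` callables are hypothetical stand-ins for the `AddItem` + `AdviseSupervisory` and `UnAdvise` + `RemoveItem` sequences, which in the Host run on the STA thread.

```python
class SubscriptionRefCounter:
    """One underlying subscription per address, shared by many callers."""

    def __init__(self, advise, unadvise):
        self._advise = advise      # invoked when a count goes 0 -> 1
        self._unadvise = unadvise  # invoked when a count goes 1 -> 0
        self._counts = {}          # address -> number of subscribers

    def subscribe(self, address):
        count = self._counts.get(address, 0)
        if count == 0:
            self._advise(address)
        self._counts[address] = count + 1

    def unsubscribe(self, address):
        count = self._counts.get(address, 0)
        if count == 0:
            return  # unknown address: nothing to release
        if count == 1:
            del self._counts[address]
            self._unadvise(address)
        else:
            self._counts[address] = count - 1

    def replay(self):
        """After a reconnect, re-advise every stored address once."""
        for address in list(self._counts):
            self._advise(address)
```

Keeping the counts across a disconnect is what makes replay possible: the map is the record of what to re-subscribe with fresh handles.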
|
||||
|
||||
---
|
||||
|
||||
## MXA-004: Tag Read/Write
|
||||
|
||||
The Host shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Read pattern: prefer cached subscription value; fall back to subscribe-get-first-value-unsubscribe (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
|
||||
- Write: AddItem → AdviseSupervisory → `Write()` → await `OnWriteComplete` callback → cleanup.
|
||||
- Read timeout: `Galaxy:ReadTimeoutSeconds` in driver config (default 5 seconds) — enforced on the Host side in addition to the Proxy-side Polly `Timeout` leg.
|
||||
- Write timeout: `Galaxy:WriteTimeoutSeconds` (default 5 seconds) — enforced similarly.
|
||||
- Concurrent operation limit: configurable semaphore (`Galaxy:MaxConcurrentOperations`, default 10).
|
||||
- All operations marshalled to the STA thread.
|
||||
|
||||
### Details
|
||||
|
||||
- Write uses security classification `-1` (no security). Galaxy runtime enforces security; OtOpcUa authorization is enforced server-side before the call ever reaches the pipe (per OPC-014 `AuthorizationGate`).
|
||||
- `OnWriteComplete`: check `MXSTATUS_PROXY.success`. If 0, extract detail code and propagate as an error over the pipe.
|
||||
- COM exceptions translated to meaningful error messages.
|
||||
|
||||
---
|
||||
|
||||
## MXA-005: Auto-Reconnect
|
||||
|
||||
The Host shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Monitor loop runs on a background thread at `Galaxy:MonitorIntervalSeconds` (default 5 seconds).
|
||||
- On disconnect, attempt reconnect. On success, replay all stored subscriptions.
|
||||
- On reconnect failure, log Warning and retry at next interval (no exponential backoff inside the Host; the Proxy-side Polly pipeline handles cross-process backoff against pipe failures).
|
||||
- Reconnect count is incremented on each successful reconnect.
|
||||
- Monitor loop is cancellable for clean Host shutdown.
|
||||
|
||||
### Details
|
||||
|
||||
- Reconnect cleans up old COM objects before creating new ones.
|
||||
- After reconnect, probe subscription (MXA-006) is re-established first, then stored subscriptions.
|
||||
- No max retry limit — keep trying indefinitely until the Host service is stopped.
|
||||
|
||||
---
|
||||
|
||||
## MXA-006: Probe-Based Health Monitoring
|
||||
|
||||
The Host shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Probe tag address configured via `Galaxy:ProbeTag`. If unset, probe monitoring is disabled.
|
||||
- Track `_lastProbeValueTime` (UTC) updated on each OnDataChange for the probe tag.
|
||||
- If `DateTime.UtcNow - _lastProbeValueTime > staleThreshold`, force disconnect and reconnect.
|
||||
- Stale threshold: `Galaxy:ProbeStaleThresholdSeconds` (default 60 seconds).
|
||||
- Implements `IHostConnectivityProbe` on the Proxy side so the core's `CapabilityInvoker` records probe outcomes with `DriverCapability.Probe` telemetry.
|
||||
|
||||
### Details
|
||||
|
||||
- The probe tag should be an attribute the Galaxy runtime updates regularly (platform heartbeat, area timestamp). Specific tag is site-dependent.
|
||||
- After forced reconnect, reset `_lastProbeValueTime` to `DateTime.UtcNow`.
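
The staleness rule above reduces to a single timestamp comparison. The sketch below is a hedged Python illustration: `ProbeMonitor`, its method names, and the injectable clock are hypothetical, while the 60-second default and the reset-on-reconnect behavior come from the requirement text.

```python
from datetime import datetime, timedelta, timezone


class ProbeMonitor:
    """Tracks the last probe update and reports staleness."""

    def __init__(self, stale_threshold_seconds=60, now=None):
        # `now` is injectable so the logic can be tested without sleeping.
        self._now = now or (lambda: datetime.now(timezone.utc))
        self._threshold = timedelta(seconds=stale_threshold_seconds)
        self.last_probe_value_time = self._now()

    def on_probe_data_change(self):
        """Called for each OnDataChange on the probe tag."""
        self.last_probe_value_time = self._now()

    def is_stale(self):
        """True when the probe has been silent longer than the threshold."""
        return self._now() - self.last_probe_value_time > self._threshold

    def on_forced_reconnect(self):
        """Reset the clock so the fresh connection is not flagged at once."""
        self.last_probe_value_time = self._now()
```

Without the reset in `on_forced_reconnect`, the first monitor tick after a reconnect would still see the old timestamp and force another reconnect.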
|
||||
|
||||
---
|
||||
|
||||
## MXA-007: COM Cleanup
|
||||
|
||||
On disconnect or disposal, the Host shall unwire event handlers, unadvise/remove all items, unregister, and release COM objects via `Marshal.ReleaseComObject`.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete handlers → Unregister → `Marshal.ReleaseComObject`.
|
||||
- On dispose: run disconnect if still connected, then dispose STA thread.
|
||||
- Each cleanup step wrapped in try/catch (cleanup must not throw).
|
||||
- After cleanup: handle maps cleared, pending write TCS entries abandoned, COM reference set to null.
|
||||
|
||||
### Details
|
||||
|
||||
- Stored subscriptions are NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
|
||||
- Event handlers unwired BEFORE Unregister (else callbacks may fire on a dead object).
|
||||
- `Marshal.ReleaseComObject` in a `finally` block, always.
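
The cleanup contract (fixed order, no step allowed to abort the rest, release always runs) can be sketched generically. This Python sketch is an illustration only; `run_cleanup` and its parameters are hypothetical, and the real steps are COM calls made on the STA thread.

```python
def run_cleanup(steps, release, log_error):
    """Run ordered cleanup steps, then always release.

    `steps` is an ordered list of (name, callable) pairs, for example
    UnAdvise, RemoveItem, unwire handlers, Unregister. A failure in one
    step is logged and does not stop the steps after it.
    """
    try:
        for name, step in steps:
            try:
                step()
            except Exception as exc:  # cleanup must not throw
                log_error(name, exc)
    finally:
        # Counterpart of releasing the COM object in a `finally` block.
        release()
```

Passing the steps as data keeps the required order visible in one place, which matters here because unwiring handlers must precede Unregister.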
|
||||
|
||||
---
|
||||
|
||||
## MXA-008: Operation Metrics
|
||||
|
||||
The MXAccess Host shall record timing and success/failure for Read, Write, and Subscribe operations.
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- Each operation records duration (ms) + success/failure.
|
||||
- Metrics exposed over the pipe to the Proxy, which re-publishes them via OpenTelemetry → Prometheus under `DriverInstanceId = "galaxy-*"`, `HostName = "galaxy.host"`.
|
||||
- Rolling 1000-entry buffer for percentile calculation.
|
||||
- Uses an `ITimingScope` pattern: `using (var scope = metrics.BeginOperation("read")) { ... }`.
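
A rolling buffer with a scoped timer, in the spirit of the `ITimingScope` pattern, might look like the sketch below. It is a Python illustration under assumptions: the class and method names are hypothetical, a single shared 1000-entry buffer is assumed (the requirement does not say whether the buffer is per operation), and the nearest-rank percentile is one reasonable choice rather than the documented one.

```python
import time
from collections import deque
from contextlib import contextmanager


class OperationMetrics:
    """Rolling window of (operation, duration_ms, success) samples."""

    def __init__(self, capacity=1000):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._samples = deque(maxlen=capacity)

    def record(self, name, duration_ms, success):
        self._samples.append((name, duration_ms, success))

    @contextmanager
    def begin_operation(self, name):
        """Time the enclosed block; an exception marks it as a failure."""
        start = time.perf_counter()
        success = False
        try:
            yield
            success = True
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.record(name, elapsed_ms, success)

    def percentile(self, name, pct):
        """Nearest-rank percentile (integer pct, 1..100) for one operation."""
        durations = sorted(d for n, d, _ in self._samples if n == name)
        if not durations:
            return None
        rank = max(1, -(-pct * len(durations) // 100))  # integer ceiling
        return durations[rank - 1]
```

Usage mirrors the C# form quoted above: `with metrics.begin_operation("read"): ...` records one sample when the block exits, whether or not it raised.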

---

## MXA-009: Error Code Translation

The Host shall translate known MXAccess error codes from `MXSTATUS_PROXY.detail` into human-readable messages for logging and OPC UA status propagation.

### Acceptance Criteria

- Error 1008 → "User lacks security permission"
- Error 1012 → "Secured write required (one signature)"
- Error 1013 → "Verified write required (two signatures)"
- Unknown error codes logged with their numeric value.
- Translated messages flow back through the pipe and surface in OPC UA `StatusCode` descriptions and Server logs.
- Errors 1008 / 1012 / 1013 on write operations map to `Bad_UserAccessDenied` at the OPC UA surface.
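
The translation table above is small enough to sketch as a dictionary lookup. `MxErrorTranslator` and its method names are hypothetical, and the status is returned as a plain string rather than an OPC UA SDK type so the sketch stays free of external packages. What an unknown code maps to on writes is not stated in the requirement, so the sketch returns `null` and leaves that decision to the caller.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: known codes get their message, unknown codes keep their numeric value.
Console.WriteLine(MxErrorTranslator.Describe(1012));
Console.WriteLine(MxErrorTranslator.Describe(4242));
Console.WriteLine(MxErrorTranslator.WriteStatusFor(1008) ?? "(no mapping)");

// Hypothetical helper illustrating MXA-009; not the Host's actual type.
public static class MxErrorTranslator
{
    private static readonly Dictionary<int, string> Known = new()
    {
        [1008] = "User lacks security permission",
        [1012] = "Secured write required (one signature)",
        [1013] = "Verified write required (two signatures)",
    };

    // Human-readable message for logs and StatusCode descriptions.
    public static string Describe(int detail) =>
        Known.TryGetValue(detail, out var message)
            ? message
            : $"Unknown MXAccess error {detail}";

    // Status name for a failed write, or null when the code has no
    // mapping listed in the requirement.
    public static string? WriteStatusFor(int detail) =>
        Known.ContainsKey(detail) ? "Bad_UserAccessDenied" : null;
}
```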

---

## MXA-010: Proxy-Side Capability Wrapping

`Driver.Galaxy.Proxy` shall implement the capability interfaces as thin forwarders that serialize every call through the named pipe and route every call through `CapabilityInvoker`.

### Acceptance Criteria

- `Driver.Galaxy.Proxy` implements `IDriver` + `IReadable` + `IWritable` + `ISubscribable` + `ITagDiscovery` + `IRediscoverable` + `IAlarmSource` + `IHistoryProvider` + `IHostConnectivityProbe`.
- Each implementation uses `CapabilityInvoker.InvokeAsync(DriverCapability.<...>, …)`; direct pipe calls that bypass the invoker are caught by Roslyn **OTOPCUA0001**.
- Each method serializes a MessagePack request frame, sends it over the pipe, awaits the response frame, deserializes it, and returns.
- Pipe disconnect mid-call → `CapabilityInvoker`'s circuit breaker counts the failure; sustained disconnect opens the circuit and Galaxy nodes surface Bad quality until the pipe reconnects.
- Proxy tolerates Host service restarts: it automatically reconnects and replays subscription setup (parallel to MXA-005 but across the IPC boundary).

---

## MXA-011: Pipe Security

The named pipe between Proxy and Host shall be restricted to the Server's runtime principal via SID-based ACL and authenticated with a per-process shared secret.

### Acceptance Criteria

- Pipe name from `OTOPCUA_GALAXY_PIPE` environment variable; default `OtOpcUaGalaxy`.
- Allowed SID passed as `OTOPCUA_ALLOWED_SID`; only the declared principal (typically the Server service account) can open the pipe, and `Administrators` is explicitly NOT granted (per the `project_galaxy_host_installed` memory note).
- Shared secret passed via `OTOPCUA_GALAXY_SECRET` at spawn time; the Proxy must present the matching secret on the opening handshake.
- Secret is process-scoped (regenerated per Host restart) and never persisted to disk or Config DB.
- Pipe ACL denials are logged as Warning with the rejected principal SID.

### Details

- Environment variables are passed by the supervisor launching the Host (`docs/v2/driver-stability.md`).
- Dev-box secret is stored at `.local/galaxy-host-secret.txt` for NSSM-wrapped development runs (memory note: `project_galaxy_host_installed`).
265
docs/v1/reqs/ServiceHostReqs.md
Normal file
@@ -0,0 +1,265 @@
# Service Host — Component Requirements

> **Revision**: Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). v1 was a single Windows service; v2 ships **three cooperating Windows services** and the service-host requirements are rewritten per-process. SVC-001…SVC-006 from v1 are preserved in spirit (TopShelf, Serilog, config loading, graceful shutdown, startup sequence, unhandled-exception handling) but are now scoped to the process they apply to. SRV-* prefixes the Server process, ADM-* the Admin process, GHX-* the Galaxy Host process. A shared-requirements section at the top covers cross-process concerns (Serilog, log rotation, bootstrap config scope).

Parent: [HLR-007](HighLevelReqs.md#hlr-007-service-hosting), [HLR-008](HighLevelReqs.md#hlr-008-logging), [HLR-011](HighLevelReqs.md#hlr-011-config-db-and-draft-publish)

## Shared Requirements (all three processes)

### SVC-SHARED-001: Serilog Logging

Every process shall use Serilog with a rolling daily file sink at Information level minimum, plus a console sink, plus an opt-in `CompactJsonFormatter` file sink.

#### Acceptance Criteria

- Console sink active on every process (for interactive / debug mode).
- Rolling daily file sink:
  - Server: `logs/otopcua-YYYYMMDD.log`
  - Admin: `logs/otopcua-admin-YYYYMMDD.log`
  - Galaxy Host: `%ProgramData%\OtOpcUa\galaxy-host-YYYYMMDD.log`
- Retention count and min level configurable via `Serilog:*` in each process's `appsettings.json`.
- JSON sink opt-in via `Serilog:WriteJson = true` (emits `*.json.log` alongside the plain-text file) for SIEM ingestion.
- `Log.CloseAndFlush()` invoked in a `finally` block on shutdown.
- Structured logging (Serilog message templates); no `string.Format`.
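
A minimal sketch of a logger that meets these criteria, wired in code rather than from `Serilog:*` configuration to keep it short. It assumes the `Serilog`, `Serilog.Sinks.Console`, `Serilog.Sinks.File`, and `Serilog.Formatting.Compact` packages. The `BuildLogger` helper, the retention count of 31, and the JSON file name are illustrative assumptions; in particular, the file sink inserts the date just before the final extension, so the exact `*.json.log` naming depends on how the real code builds the path.

```csharp
using Serilog;
using Serilog.Events;
using Serilog.Formatting.Compact;

// Hypothetical values; the real ones would come from Serilog:* in appsettings.json.
Log.Logger = BuildLogger("logs/otopcua", LogEventLevel.Information,
                         retainedFiles: 31, writeJson: false);
try
{
    // Message template with named properties, not string.Format.
    Log.Information("Starting {Process} on node {NodeId}", "OtOpcUa.Server", "node-a");
}
finally
{
    // Flushes buffered events before the process exits.
    Log.CloseAndFlush();
}

static Serilog.ILogger BuildLogger(
    string basePath, LogEventLevel minLevel, int retainedFiles, bool writeJson)
{
    var config = new LoggerConfiguration()
        .MinimumLevel.Is(minLevel)
        .WriteTo.Console()
        // Daily rolling: "logs/otopcua-.log" becomes e.g. logs/otopcua-20260419.log.
        .WriteTo.File($"{basePath}-.log",
            rollingInterval: RollingInterval.Day,
            retainedFileCountLimit: retainedFiles);

    if (writeJson)
    {
        // Opt-in compact JSON file for SIEM ingestion (file name is an assumption).
        config = config.WriteTo.File(new CompactJsonFormatter(),
            $"{basePath}-json-.log",
            rollingInterval: RollingInterval.Day,
            retainedFileCountLimit: retainedFiles);
    }

    return config.CreateLogger();
}
```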

---

### SVC-SHARED-002: Bootstrap Configuration Scope

`appsettings.json` is bootstrap-only per HLR-011. Operational configuration (clusters, drivers, namespaces, tags, ACLs, poll groups) lives in the Config DB.

#### Acceptance Criteria

- `appsettings.json` may contain only: Config DB connection string, `Node:NodeId`, `Node:ClusterId`, `Node:LocalCachePath`, `OpcUa:*` security bootstrap fields, `Ldap:*` bootstrap fields, `Serilog:*`, `Redundancy:*` role id.
- Any attempt to configure driver instances, tags, or equipment through `appsettings.json` shall be rejected at startup with a descriptive error.
- Invalid or missing required bootstrap fields are detected at startup with a clear error (`"Node:NodeId not configured"` style).
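
The allow-list check described above could look roughly like this. It is a sketch under assumptions: `BootstrapConfigValidator` is a hypothetical name, the Config DB connection string is assumed to live under a top-level `ConnectionStrings` section, and only the two `Node` fields named in the error example are checked as required.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Usage: a "Drivers" section is rejected and the missing ClusterId is reported.
var errors = BootstrapConfigValidator.Validate(
    "{ \"Node\": { \"NodeId\": \"node-a\" }, \"Drivers\": {} }");
foreach (var error in errors) Console.WriteLine(error);

// Hypothetical validator illustrating SVC-SHARED-002.
public static class BootstrapConfigValidator
{
    // Top-level sections permitted in appsettings.json.
    // "ConnectionStrings" is an assumed section name for the Config DB connection.
    private static readonly HashSet<string> AllowedSections =
        new(StringComparer.OrdinalIgnoreCase)
        { "ConnectionStrings", "Node", "OpcUa", "Ldap", "Serilog", "Redundancy" };

    public static IReadOnlyList<string> Validate(string json)
    {
        var errors = new List<string>();
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement; // assumed to be a JSON object

        // Anything outside the allow-list belongs in the Config DB.
        foreach (var section in root.EnumerateObject())
        {
            if (!AllowedSections.Contains(section.Name))
                errors.Add($"Section '{section.Name}' is not allowed in appsettings.json; " +
                           "configure it in the Config DB");
        }

        // Required bootstrap fields must be present and non-empty.
        foreach (var key in new[] { "NodeId", "ClusterId" })
        {
            bool present = root.TryGetProperty("Node", out var node)
                && node.ValueKind == JsonValueKind.Object
                && node.TryGetProperty(key, out var value)
                && value.ValueKind == JsonValueKind.String
                && !string.IsNullOrWhiteSpace(value.GetString());
            if (!present) errors.Add($"Node:{key} not configured");
        }

        return errors;
    }
}
```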

---

## OtOpcUa.Server — Service Host Requirements (SRV-*)

### SRV-001: Microsoft.Extensions.Hosting + AddWindowsService

The Server shall use `Host.CreateApplicationBuilder(args)` with `AddWindowsService(o => o.ServiceName = "OtOpcUa")` to run as a Windows service.

#### Acceptance Criteria

- Service name `OtOpcUa`.
- Installs via standard `sc.exe` tooling or the build-provided installer.
- Runs as a configured service account (typically a domain service account with Config DB read access; Windows Auth to SQL Server).
- Console mode (running `ZB.MOM.WW.OtOpcUa.Server.exe` with no Windows service context) works for development and debugging.
- Platform target: .NET 10 x64 (default per decision in `plan.md` §3).
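
A minimal `Program.cs` along these lines, assuming the `Microsoft.Extensions.Hosting.WindowsServices` package. It is a sketch of the hosting shape only; the Server's real registrations are not shown. `AddWindowsService` only switches to the service lifetime when the process is actually started by the Service Control Manager, which is what lets the same executable run in console mode.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// No effect when launched from a console; under the SCM it enables the
// Windows service lifetime with the given service name.
builder.Services.AddWindowsService(options => options.ServiceName = "OtOpcUa");

// Hosted services (OPC UA endpoint, status publisher, ...) would be registered here.

using IHost host = builder.Build();
await host.RunAsync();
```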

---

### SRV-002: Startup Sequence

The Server shall start components in a defined order, with failure handling at each step.

#### Acceptance Criteria

- Startup sequence:
  1. Load `appsettings.json` bootstrap configuration + initialize Serilog.
  2. Validate bootstrap fields (NodeId, ClusterId, Config DB connection).
  3. Initialize `OpcUaApplicationHost` (server-certificate resolution via `SecurityProfileResolver`).
  4. Connect to Config DB; request current published generation for `ClusterId`.
  5. If unreachable, fall back to `LiteDbConfigCache` (latest applied generation).
  6. Apply generation: register driver instances, build namespaces, wire capability pipelines.
  7. Start `OpcUaServerService` hosted service (opens endpoint listener).
  8. Start `HostStatusPublisher` (pushes `ClusterNodeGenerationState` to Config DB for Admin UI SignalR consumers).
  9. Start `RedundancyCoordinator` + `ServiceLevelCalculator`.
- Failure in steps 1-3 prevents startup.
- Failure in steps 4-6 logs Error and enters degraded mode (empty namespaces, `DriverHealth.Unavailable` on every driver, `ServiceLevel = 0`).
- Failure in steps 7-9 logs Error and shuts down (endpoint is non-optional).
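
The three failure bands above reduce to a small decision table. The sketch below only classifies a failed step number; `StartupOutcome` and `StartupPolicy` are hypothetical names, and how the real Server sequences its steps or continues in degraded mode is not modelled here.

```csharp
using System;

// Usage: classify a failure by the step in which it occurred.
Console.WriteLine(StartupPolicy.OnFailure(2)); // certificate or bootstrap problem
Console.WriteLine(StartupPolicy.OnFailure(5)); // Config DB and local cache unavailable
Console.WriteLine(StartupPolicy.OnFailure(7)); // endpoint listener failed to open

public enum StartupOutcome
{
    Abort,     // steps 1-3: startup is prevented
    Degraded,  // steps 4-6: log Error, empty namespaces, ServiceLevel = 0
    Shutdown,  // steps 7-9: log Error and shut down
}

// Hypothetical helper illustrating the SRV-002 failure bands.
public static class StartupPolicy
{
    public static StartupOutcome OnFailure(int step) => step switch
    {
        >= 1 and <= 3 => StartupOutcome.Abort,
        >= 4 and <= 6 => StartupOutcome.Degraded,
        >= 7 and <= 9 => StartupOutcome.Shutdown,
        _ => throw new ArgumentOutOfRangeException(nameof(step), step,
                 "The startup sequence has steps 1 through 9."),
    };
}
```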

---

### SRV-003: Graceful Shutdown

On service stop, the Server shall gracefully shut down all driver instances, the OPC UA listener, and flush logs before exiting.

#### Acceptance Criteria

- `IHostApplicationLifetime.ApplicationStopping` triggers orderly shutdown.
- Shutdown sequence: stop `HostStatusPublisher` → stop driver instances (disconnect each via `IDriver.DisposeAsync`, which for Galaxy tears down the named pipe) → stop OPC UA server (stop accepting new sessions, complete pending reads/writes) → flush Serilog.
- Shutdown completes within 30 seconds (Windows SCM timeout).
- All `IDisposable` / `IAsyncDisposable` components disposed in reverse-creation order.
- Final log entry: `"OtOpcUa.Server shutdown complete"` at Information level.

---

### SRV-004: Unhandled Exception Handling

The Server shall handle unexpected crashes gracefully.

#### Acceptance Criteria

- Registers `AppDomain.CurrentDomain.UnhandledException` handler that logs Fatal before the process terminates.
- Windows service recovery configured: restart on failure with 60-second delay.
- Fatal log entry includes full exception details.

---

### SRV-005: Drivers Hosted In-Process

All drivers except Galaxy run in-process within the Server.

#### Acceptance Criteria

- Modbus TCP, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, OPC UA Client drivers are resolved from the DI container and managed by `DriverHost`.
- Galaxy driver in-process component is `Driver.Galaxy.Proxy`, which forwards to `OtOpcUa.Galaxy.Host` over the named pipe (see GHX-*).
- Each driver instance's lifecycle (connect, discover, subscribe, dispose) is orchestrated by `DriverHost`.

---

### SRV-006: Redundancy-Node Bootstrap

The Server shall bootstrap its redundancy identity from `appsettings.json` and the Config DB.

#### Acceptance Criteria

- `Node:NodeId` + `Node:ClusterId` identify this node uniquely; the `Redundancy` coordinator looks up `ClusterNode.RedundancyRole` (Primary / Secondary) from the Config DB.
- Two nodes of the same cluster connect to the same Config DB and the same ClusterId but have different NodeIds and different `ApplicationUri` values.
- Missing or ambiguous `(ClusterId, NodeId)` causes startup failure.

---

## OtOpcUa.Admin — Service Host Requirements (ADM-*)

### ADM-001: ASP.NET Core Blazor Server

The Admin app shall use `WebApplication.CreateBuilder` with Razor Components (`AddRazorComponents().AddInteractiveServerComponents()`), SignalR, and cookie authentication.

#### Acceptance Criteria

- Blazor Server (not WebAssembly) per `plan.md` §Tech Stack.
- Hosts SignalR hubs for live cluster state (used by `ClusterNodeGenerationState` views, crash-loop alerts, etc.).
- Runs as a Windows service via `AddWindowsService` OR as a standard ASP.NET Core process behind IIS / reverse proxy (site decides).
- Platform target: .NET 10 x64.

---

### ADM-002: Authentication and Authorization

Admin users authenticate via LDAP bind with cookie auth; three admin roles gate operations.

#### Acceptance Criteria

- Cookie auth scheme: `OtOpcUa.Admin`, 8-hour expiry, path `/login` for challenge.
- LDAP bind via `LdapAuthService`; user group memberships map to admin roles (`ConfigViewer`, `ConfigEditor`, `FleetAdmin`).
- Authorization policies:
  - `CanEdit` requires `ConfigEditor` or `FleetAdmin`.
  - `CanPublish` requires `FleetAdmin`.
  - View-only access requires `ConfigViewer` (or higher).
- Unauthenticated requests to any Admin page redirect to `/login`.
- Per-cluster role grants layer on top: a `ConfigEditor` with no grant for cluster X can view it but not edit.
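
The cookie scheme and the role-based policies above map onto standard ASP.NET Core registration calls. The sketch below assumes an ASP.NET Core web project; the `CanView` policy name is an illustrative assumption (the requirement names only `CanEdit` and `CanPublish`), and the LDAP bind and per-cluster grants are left out.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

const string Scheme = "OtOpcUa.Admin";

// Cookie authentication: unauthenticated requests are challenged at /login.
builder.Services
    .AddAuthentication(Scheme)
    .AddCookie(Scheme, options =>
    {
        options.LoginPath = "/login";
        options.ExpireTimeSpan = TimeSpan.FromHours(8);
    });

// Role-based policies; roles are expected as claims issued at sign-in.
builder.Services.AddAuthorization(options =>
{
    options.AddPolicy("CanEdit", p => p.RequireRole("ConfigEditor", "FleetAdmin"));
    options.AddPolicy("CanPublish", p => p.RequireRole("FleetAdmin"));
    // Hypothetical policy name for view-only access.
    options.AddPolicy("CanView", p => p.RequireRole("ConfigViewer", "ConfigEditor", "FleetAdmin"));
});

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();
app.Run();
```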

---

### ADM-003: Config DB as Sole Write Path

The Admin service shall be the only process with write access to the Config DB.

#### Acceptance Criteria

- EF Core `OtOpcUaConfigDbContext` configured with the SQL login / connection string that has read+write permission on config tables.
- Server nodes connect with a read-only principal (`grant SELECT` only).
- Admin writes produce draft-generation rows; publish writes are atomic and transactional.
- Every write is audited via `AuditLogService` per ADM-006.

---

### ADM-004: Prometheus /metrics Endpoint

The Admin service shall expose an OpenTelemetry → Prometheus metrics endpoint at `/metrics`.

#### Acceptance Criteria

- `OpenTelemetry.Metrics` registered with Prometheus exporter.
- `/metrics` scrapeable without authentication (standard Prometheus pattern) OR gated behind an infrastructure allow-list (site-configurable).
- Exports metrics from Server nodes of managed clusters (aggregated via Config DB heartbeat telemetry) plus Admin-local metrics (login attempts, publish duration, active sessions).

---

### ADM-005: Graceful Shutdown

On shutdown, the Admin service shall disconnect SignalR clients cleanly, finish in-flight DB writes, and flush Serilog.

#### Acceptance Criteria

- `IHostApplicationLifetime.ApplicationStopping` closes SignalR hub connections gracefully.
- In-flight publish transactions are allowed to complete up to 30 seconds.
- Final log entry: `"OtOpcUa.Admin shutdown complete"`.

---

### ADM-006: Audit Logging

Every publish and every ACL / role-grant change shall produce an immutable audit row via `AuditLogService`.

#### Acceptance Criteria

- Audit rows include: timestamp (UTC), acting principal (LDAP DN + display name), action, entity kind + id, before/after generation number where applicable, session id, source IP.
- Audit rows are never mutated or deleted by application code.
- Audit table schema enforces immutability via DB permissions (no UPDATE / DELETE granted to the Admin app's principal).

---

## OtOpcUa.Galaxy.Host — Service Host Requirements (GHX-*)

### GHX-001: TopShelf Windows Service Hosting

The Galaxy Host shall use TopShelf for Windows service lifecycle (install, uninstall, start, stop) and interactive console mode.

#### Acceptance Criteria

- Service name `OtOpcUaGalaxyHost`, display name `OtOpcUa Galaxy Host`.
- Installs via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe install`.
- Uninstalls via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe uninstall`.
- Runs as a configured user account (typically the same account as the Server, or a dedicated Galaxy service account with ArchestrA platform access).
- Interactive console mode (no args) for development / debugging.
- Platform target: **.NET Framework 4.8 x86**, required for MXAccess COM 32-bit interop.
- Development deployments may use NSSM in place of TopShelf (memory: `project_galaxy_host_installed`).

#### Details

- Service description: "OtOpcUa Galaxy Host — MXAccess + Galaxy Repository backend for the Galaxy driver, named-pipe IPC to OtOpcUa.Server."

---

### GHX-002: Named-Pipe IPC Bootstrap

The Host shall open a named pipe on startup whose name, ACL, and shared secret come from environment variables supplied by the supervisor at spawn time.

#### Acceptance Criteria

- `OTOPCUA_GALAXY_PIPE` → pipe name (default `OtOpcUaGalaxy`).
- `OTOPCUA_ALLOWED_SID` → SID of the principal allowed to connect; any other principal is denied at the ACL layer.
- `OTOPCUA_GALAXY_SECRET` → per-process shared secret; `Driver.Galaxy.Proxy` must present it on handshake.
- `OTOPCUA_GALAXY_BACKEND` → `stub` / `db` / `mxaccess` (default `mxaccess`); selects which backend implementation is loaded.
- Missing `OTOPCUA_ALLOWED_SID` or `OTOPCUA_GALAXY_SECRET` at startup throws with a descriptive error.
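
The environment-variable contract above can be sketched as a small settings reader. `HostPipeSettings` and `GalaxyBackend` are hypothetical names, and the sketch uses current C# syntax for brevity even though the Host itself targets .NET Framework 4.8. The lookup is injected as a delegate so the defaults and the fail-fast behaviour can be exercised without touching the real process environment; production code would pass `Environment.GetEnvironmentVariable`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage with an in-memory lookup; the SID and secret are placeholders.
var env = new Dictionary<string, string>
{
    ["OTOPCUA_ALLOWED_SID"] = "S-1-5-21-0-0-0-1001",
    ["OTOPCUA_GALAXY_SECRET"] = "example-secret",
};
var settings = HostPipeSettings.FromEnvironment(
    name => env.TryGetValue(name, out var value) ? value : null);
Console.WriteLine($"{settings.PipeName} / {settings.Backend}");

public enum GalaxyBackend { Stub, Db, MxAccess }

// Hypothetical settings type illustrating GHX-002.
public sealed class HostPipeSettings
{
    public string PipeName { get; }
    public string AllowedSid { get; }
    public string Secret { get; }
    public GalaxyBackend Backend { get; }

    private HostPipeSettings(string pipe, string sid, string secret, GalaxyBackend backend)
    {
        PipeName = pipe; AllowedSid = sid; Secret = secret; Backend = backend;
    }

    public static HostPipeSettings FromEnvironment(Func<string, string?> getEnv)
    {
        // Required variables fail fast with a descriptive error.
        string Require(string name) =>
            getEnv(name) is { Length: > 0 } value
                ? value
                : throw new InvalidOperationException(
                      $"{name} is not set; the supervisor must supply it at spawn time.");

        // Optional variables fall back to the documented defaults.
        string pipe = getEnv("OTOPCUA_GALAXY_PIPE") is { Length: > 0 } p ? p : "OtOpcUaGalaxy";
        string backendName = getEnv("OTOPCUA_GALAXY_BACKEND") is { Length: > 0 } b ? b : "mxaccess";

        GalaxyBackend backend = backendName.ToLowerInvariant() switch
        {
            "stub" => GalaxyBackend.Stub,
            "db" => GalaxyBackend.Db,
            "mxaccess" => GalaxyBackend.MxAccess,
            _ => throw new InvalidOperationException(
                     $"OTOPCUA_GALAXY_BACKEND has unsupported value '{backendName}'."),
        };

        return new HostPipeSettings(
            pipe, Require("OTOPCUA_ALLOWED_SID"), Require("OTOPCUA_GALAXY_SECRET"), backend);
    }
}
```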

---

### GHX-003: Backend Lifecycle

The Host shall instantiate the STA pump + MXAccess backend + Galaxy Repository + optional Historian plugin in a defined order and tear them down cleanly on shutdown.

#### Acceptance Criteria

- Startup (mxaccess backend): initialize Serilog → resolve env vars → create `PipeServer` → start `StaPump` → create `MxAccessClient` on STA thread → initialize `GalaxyRepository` → optionally initialize Historian plugin → begin pipe request handling.
- Shutdown: stop pipe → dispose MxAccessClient (MXA-007 COM cleanup) → dispose STA pump → flush Serilog.
- Shutdown must complete within 30 seconds (Windows SCM timeout).
- `Console.CancelKeyPress` triggers the same sequence in console mode.

---

### GHX-004: Unhandled Exception Handling

The Host shall log Fatal on crash and let the supervisor restart it.

#### Acceptance Criteria

- `AppDomain.CurrentDomain.UnhandledException` handler logs Fatal with full exception details before termination.
- The supervisor's driver-stability policy (`docs/v2/driver-stability.md`) governs restart behavior: backoff, crash-loop detection, and alerting live there, not in the Host.
- Server-side: `Driver.Galaxy.Proxy` detects pipe disconnect, opens its capability circuit, reports Bad quality on Galaxy nodes; reconnects automatically when the Host is back.