Implement LmxOpcUa server — all 6 phases complete

Full OPC UA server on .NET Framework 4.8 (x86) exposing AVEVA System
Platform Galaxy tags via MXAccess. Mirrors Galaxy object hierarchy as
OPC UA address space, translating contained-name browse paths to
tag-name runtime references.

Components implemented:
- Configuration: AppConfiguration with 4 sections, validator
- Domain: ConnectionState, Quality, Vtq, MxDataTypeMapper, error codes
- MxAccess: StaComThread, MxAccessClient (partial classes), MxProxyAdapter
  using strongly-typed ArchestrA.MxAccess COM interop
- Galaxy Repository: SQL queries (hierarchy, attributes, change detection),
  ChangeDetectionService with auto-rebuild on deploy
- OPC UA Server: LmxNodeManager (CustomNodeManager2), LmxOpcUaServer,
  OpcUaServerHost with programmatic config, SecurityPolicy None
- Status Dashboard: HTTP server with HTML/JSON/health endpoints
- Integration: Full 14-step startup, graceful shutdown, component wiring

175 tests (174 unit + 1 integration), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-03-25 05:55:27 -04:00
commit a7576ffb38
283 changed files with 16493 additions and 0 deletions

docs/implementation-plan.md

@@ -0,0 +1,384 @@
# Implementation Plan: LmxOpcUa Server — All 44 Requirements
## Context
The LmxOpcUa project is scaffolded (solution, projects, configs, requirements docs) but has no implementation beyond Program.cs and a stub OpcUaService.cs. This plan implements all 44 requirements across 6 phases, each with verification gates and wiring checks to ensure nothing is left unconnected.
## Architecture
Five major components wired together in OpcUaService.cs:
```
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│ Galaxy Repository │────>│   OPC UA Server   │<───>│  OPC UA Clients   │
│   (SQL queries)   │     │  (address space)  │     │                   │
└───────────────────┘     └─────────┬─────────┘     └───────────────────┘
                          ┌─────────┴─────────┐
                          │  MxAccessClient   │
                          │    (STA + COM)    │
                          └───────────────────┘
                          ┌─────────┴─────────┐
                          │ Status Dashboard  │
                          │ (HTTP + metrics)  │
                          └───────────────────┘
```
Reference implementation: `C:\Users\dohertj2\Desktop\scadalink-design\lmxproxy\src\ZB.MOM.WW.LmxProxy.Host\`
---
## PHASE 1: Foundation — Domain Models, Configuration, Interfaces
**Reqs:** SVC-003, SVC-006 (partial), MXA-008 (interfaces), MXA-009, OPC-005, OPC-012 (partial), GR-005 (config)
### Files to Create
**Configuration/**
- `AppConfiguration.cs` — top-level holder for all config sections
- `OpcUaConfiguration.cs` — Port, EndpointPath, ServerName, GalaxyName, MaxSessions, SessionTimeoutMinutes
- `MxAccessConfiguration.cs` — ClientName, timeouts, concurrency, probe settings
- `GalaxyRepositoryConfiguration.cs` — ConnectionString, intervals, command timeout
- `DashboardConfiguration.cs` — Enabled, Port, RefreshIntervalSeconds
- `ConfigurationValidator.cs` — validate and log effective config at startup
**Domain/**
- `ConnectionState.cs` — enum: Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting
- `ConnectionStateChangedEventArgs.cs` — PreviousState, CurrentState, Message
- `Vtq.cs` — Value/Timestamp/Quality struct with factory methods
- `Quality.cs` — enum with Bad/Uncertain/Good families matching OPC DA codes
- `QualityMapper.cs` — MapFromMxAccessQuality(int) and MapToOpcUaStatusCode(Quality)
- `MxDataTypeMapper.cs` — MapToOpcUaDataType(int mxDataType), MapToClrType(int). Unknown defaults to String
- `MxErrorCodes.cs` — translate 1008/1012/1013 to human messages
- `GalaxyObjectInfo.cs` — DTO matching hierarchy.sql columns
- `GalaxyAttributeInfo.cs` — DTO matching attributes.sql columns
- `IMxAccessClient.cs` — interface: Connect, Disconnect, Subscribe, Read, Write, OnTagValueChanged delegate
- `IGalaxyRepository.cs` — interface: GetHierarchy, GetAttributes, GetLastDeployTime, TestConnection, OnGalaxyChanged event
- `IMxProxy.cs` — abstraction over LMXProxyServer COM object (enables testing without DLL)
**Metrics/**
- `PerformanceMetrics.cs` — ITimingScope, OperationMetrics (1000-entry rolling buffer), BeginOperation/GetStatistics. Adapt from reference.
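The rolling buffer and P95 statistic described for `OperationMetrics` could look roughly like the sketch below. It is written in Python for brevity (the project itself is C#). The class shape, the `record` method, and the nearest-rank percentile rule are assumptions for illustration and are not taken from the reference implementation.

```python
import math
from collections import deque


class OperationMetrics:
    """Hypothetical rolling-window metrics for one operation type."""

    def __init__(self, capacity=1000):
        # A deque with maxlen evicts the oldest entry once capacity is reached
        self._durations_ms = deque(maxlen=capacity)
        self.total_count = 0
        self.success_count = 0

    def record(self, duration_ms, success=True):
        self._durations_ms.append(duration_ms)
        self.total_count += 1
        if success:
            self.success_count += 1

    def get_statistics(self):
        if not self._durations_ms:
            # Empty state: report zeros rather than raising
            return {"count": 0, "success_rate": 0.0, "avg": 0.0,
                    "min": 0.0, "max": 0.0, "p95": 0.0}
        ordered = sorted(self._durations_ms)
        # Nearest-rank percentile (assumed rule): the smallest sample with
        # at least 95% of the buffered samples at or below it
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return {
            "count": self.total_count,
            "success_rate": self.success_count / self.total_count,
            "avg": sum(ordered) / len(ordered),
            "min": ordered[0],
            "max": ordered[-1],
            "p95": ordered[rank - 1],
        }
```

Note that `count` and `success_rate` cover every recorded operation, while the timing figures only reflect the samples still in the buffer.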
### Tests
- `ConfigurationLoadingTests.cs` — bind appsettings.json, verify defaults
- `MxDataTypeMapperTests.cs` — all 12 type mappings + unknown default
- `QualityMapperTests.cs` — boundary values (0, 63, 64, 191, 192)
- `MxErrorCodesTests.cs` — known codes + unknown
- `PerformanceMetricsTests.cs` — recording, P95, buffer eviction, empty state
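The boundary values listed for `QualityMapperTests` (0, 63, 64, 191, 192) suggest a range-based split into Bad, Uncertain, and Good families. The Python sketch below illustrates that split; the thresholds are inferred from those boundary values, and the function and enum names are hypothetical rather than the project's actual C# API.

```python
from enum import Enum


class QualityFamily(Enum):
    BAD = "Bad"
    UNCERTAIN = "Uncertain"
    GOOD = "Good"


def map_quality_family(mx_quality: int) -> QualityFamily:
    """Classify an OPC DA style quality value by range (assumed thresholds)."""
    value = mx_quality & 0xFF  # only the low byte carries the quality bits
    if value < 64:
        return QualityFamily.BAD
    if value < 192:
        return QualityFamily.UNCERTAIN
    return QualityFamily.GOOD
```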
### Verification Gate 1
- [ ] `dotnet build` — zero errors
- [ ] All Phase 1 tests pass
- [ ] Config binding loads all 4 sections from appsettings.json
- [ ] MxDataTypeMapper covers every row in `gr/data_type_mapping.md`
- [ ] Quality enum covers all reference impl values
- [ ] Builds WITHOUT ArchestrA.MxAccess.dll (interface-based, no COM refs in Phase 1)
- [ ] Every new file has doc-comment referencing requirement ID(s)
- [ ] IMxAccessClient has every method needed by OPC-007, OPC-008, OPC-009
- [ ] IGalaxyRepository has every method needed by GR-001 through GR-004
---
## PHASE 2: MxAccessClient — STA Thread and COM Interop
**Reqs:** MXA-001, MXA-002, MXA-003, MXA-004, MXA-005, MXA-006, MXA-007, MXA-008 (wiring)
### Files to Create
**MxAccess/**
- `StaComThread.cs` — adapt from reference. STA thread, Win32 message pump, RunAsync(Action)/RunAsync<T>(Func<T>), WM_APP dispatch
- `MxAccessClient.cs` — core partial class implementing IMxAccessClient. Fields: StaComThread, IMxProxy, handle, state, semaphores, maps
- `MxAccessClient.Connection.cs` — ConnectAsync (Register on STA), DisconnectAsync (cleanup per MXA-007), COM cleanup
- `MxAccessClient.Subscription.cs` — SubscribeAsync (AddItem+AdviseSupervisory), UnsubscribeAsync, ReplayStoredSubscriptions
- `MxAccessClient.ReadWrite.cs` — ReadAsync (subscribe-get-first-unsubscribe), WriteAsync (Write+OnWriteComplete), semaphore-limited, timeout, ITimingScope metrics
- `MxAccessClient.EventHandlers.cs` — OnDataChange (resolve handle→address, create Vtq, invoke callback, update probe), OnWriteComplete (complete TCS, translate errors)
- `MxAccessClient.Monitor.cs` — monitor loop (reconnect on disconnect, probe staleness→force reconnect), cancellable
- `MxProxyAdapter.cs` — wraps real LMXProxyServer COM object, forwards calls to IMxProxy interface
**Test Helpers (in Tests project):**
- `FakeMxProxy.cs` — implements IMxProxy, simulates connections/data changes for testing
### Design Decision: IMxProxy Abstraction
Code against `IMxProxy` interface (not `LMXProxyServer` directly). This allows testing without ArchestrA.MxAccess.dll. `MxProxyAdapter` wraps the real COM object at runtime.
### Tests
- `StaComThreadTests.cs` — STA apartment verified, work item execution, dispose
- `MxAccessClientConnectionTests.cs` — state transitions, cleanup order
- `MxAccessClientSubscriptionTests.cs` — subscribe/unsubscribe, stored subscriptions, reconnect replay, OnDataChange→callback
- `MxAccessClientReadWriteTests.cs` — read returns value, read timeout, write completes on callback, write timeout, semaphore limiting
- `MxAccessClientMonitorTests.cs` — reconnect on disconnect, probe staleness
### Verification Gate 2
- [ ] Solution builds without ArchestrA.MxAccess.dll
- [ ] STA thread test proves work items execute on STA apartment
- [ ] Connection lifecycle: Disconnected→Connecting→Connected→Disconnecting→Disconnected
- [ ] Subscription replay: stored subscriptions replayed after simulated reconnect
- [ ] Read/Write: timeout behavior returns error within expected window
- [ ] Metrics: Read/Write record timing in PerformanceMetrics
- [ ] **WIRING CHECK:** OnDataChange callback reaches OnTagValueChanged delegate
- [ ] COM cleanup order: UnAdvise→RemoveItem→unwire events→Unregister→ReleaseComObject
- [ ] Error codes 1008/1012/1013 translate correctly in OnWriteComplete path
---
## PHASE 3: Galaxy Repository — SQL Queries and Change Detection
**Reqs:** GR-001, GR-002, GR-003, GR-004, GR-006, GR-007
### Files to Create
**GalaxyRepository/**
- `GalaxyRepositoryService.cs` — implements IGalaxyRepository. SQL embedded as `const string` (from gr/queries/). ADO.NET SqlConnection per-query. GetHierarchyAsync, GetAttributesAsync, GetLastDeployTimeAsync, TestConnectionAsync
- `ChangeDetectionService.cs` — background Timer at configured interval. Polls GetLastDeployTimeAsync, compares to last known, fires OnGalaxyChanged on change. First poll always triggers. Failed poll logs Warning, retries next interval
- `GalaxyRepositoryStats.cs` — POCO for dashboard: GalaxyName, DbConnected, LastDeployTime, ObjectCount, AttributeCount, LastRebuildTime
### Tests
- `ChangeDetectionServiceTests.cs` — first poll triggers, same timestamp skips, changed triggers, failed poll retries
- `GalaxyRepositoryServiceTests.cs` (integration, in IntegrationTests) — TestConnection, GetHierarchy returns rows, GetAttributes returns rows
### Verification Gate 3
- [ ] All SQL is `const string` — no concatenation, no parameters, no INSERT/UPDATE/DELETE (GR-006 code review)
- [ ] GetHierarchyAsync maps all columns: gobject_id, tag_name, contained_name, browse_name, parent_gobject_id, is_area
- [ ] GetAttributesAsync maps all columns including array_dimension
- [ ] Change detection: first poll fires, same timestamp skips, changed fires
- [ ] Failed query does NOT crash or trigger false rebuild
- [ ] GalaxyRepositoryStats populated for dashboard
- [ ] Zero rows from hierarchy logs Warning
---
## PHASE 4: OPC UA Server — Address Space and Node Manager
**Reqs:** OPC-001, OPC-002, OPC-003, OPC-004, OPC-005, OPC-006, OPC-007, OPC-008, OPC-009, OPC-010, OPC-011, OPC-012, OPC-013
### Files to Create
**OpcUa/**
- `LmxOpcUaServer.cs` — inherits StandardServer. Creates custom node manager. SecurityPolicy None. Registers namespace `urn:{GalaxyName}:LmxOpcUa`
- `LmxNodeManager.cs` — inherits CustomNodeManager2. Core class:
- `BuildAddressSpace(hierarchy, attributes)` — creates folder/object/variable nodes from Galaxy data. NodeId: `ns=1;s={tag_name}` / `ns=1;s={tag_name}.{attr}`. Stores full_tag_reference lookup
- `RebuildAddressSpace(hierarchy, attributes)` — removes old nodes, rebuilds. Preserves sessions
- Read/Write overrides delegate to IMxAccessClient via stored full_tag_reference
- Subscription management: ref-counted shared MXAccess subscriptions
- `OpcUaServerHost.cs` — manages ApplicationInstance lifecycle. Programmatic config (no XML). Start/Stop. Exposes ActiveSessionCount
- `OpcUaQualityMapper.cs` — domain Quality → OPC UA StatusCodes
- `DataValueConverter.cs` — COM variant ↔ OPC UA DataValue. Handles all types from data_type_mapping.md. DateTime UTC. Arrays
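The "ref-counted shared MXAccess subscriptions" idea can be sketched as follows. This is a Python illustration under assumptions: the class name, the `acquire`/`release` methods, and the injected callbacks are hypothetical and stand in for whatever `LmxNodeManager` does in C#.

```python
class SharedSubscriptions:
    """Hypothetical ref-counting layer between OPC UA monitored items and MXAccess."""

    def __init__(self, subscribe, unsubscribe):
        self._subscribe = subscribe      # called once when the first client subscribes
        self._unsubscribe = unsubscribe  # called once when the last client leaves
        self._ref_counts = {}

    def acquire(self, full_tag_reference):
        count = self._ref_counts.get(full_tag_reference, 0)
        if count == 0:
            self._subscribe(full_tag_reference)
        self._ref_counts[full_tag_reference] = count + 1

    def release(self, full_tag_reference):
        count = self._ref_counts.get(full_tag_reference, 0)
        if count == 0:
            return  # nothing to release; ignore unbalanced calls
        if count == 1:
            del self._ref_counts[full_tag_reference]
            self._unsubscribe(full_tag_reference)
        else:
            self._ref_counts[full_tag_reference] = count - 1
```

With this shape, several OPC UA clients monitoring the same variable share one underlying MXAccess subscription, which is only torn down when the last of them goes away.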
### Tests
- `DataValueConverterTests.cs` — all type conversions, arrays, DateTime UTC
- `LmxNodeManagerBuildTests.cs` — synthetic hierarchy matching gr/layout.md, verify node types, NodeIds, data types, ValueRank, ArrayDimensions
- `LmxNodeManagerRebuildTests.cs` — rebuild replaces nodes, old nodes gone, new nodes present
- `OpcUaQualityMapperTests.cs` — all quality families
### Verification Gate 4
- [ ] Endpoint URL: `opc.tcp://{hostname}:{port}/LmxOpcUa`
- [ ] Namespace: `urn:{GalaxyName}:LmxOpcUa` at index 1
- [ ] Root ZB folder under Objects
- [ ] Areas → FolderType + Organizes reference
- [ ] Non-areas → BaseObjectType + HasComponent reference
- [ ] Variable nodes: correct DataType, ValueRank, ArrayDimensions per data_type_mapping.md
- [ ] **WIRING CHECK:** Read handler resolves NodeId → full_tag_reference → calls IMxAccessClient.ReadAsync
- [ ] **WIRING CHECK:** Write handler resolves NodeId → full_tag_reference → calls IMxAccessClient.WriteAsync
- [ ] Rebuild removes old nodes, creates new ones without crash
- [ ] SecurityPolicy is None
- [ ] MaxSessions/SessionTimeout configured from appsettings
---
## PHASE 5: Status Dashboard — HTTP, HTML, JSON, Health
**Reqs:** DASH-001 through DASH-009
### Files to Create
**Status/**
- `StatusData.cs` — DTO: ConnectionInfo, HealthInfo, SubscriptionInfo, GalaxyInfo, OperationMetrics, Footer
- `HealthCheckService.cs` — rules: not connected → Unhealthy; success rate < 50% with more than 100 operations → Degraded; otherwise Healthy
- `StatusReportService.cs` — aggregates from all components. GenerateHtml (self-contained, inline CSS, color-coded panels, meta-refresh). GenerateJson. IsHealthy
- `StatusWebServer.cs` — HttpListener. Routes: / → HTML, /api/status → JSON, /api/health → 200/503. GET only. no-cache headers. Disableable
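The three health rules listed for `HealthCheckService` reduce to a small decision function. The Python sketch below is illustrative only; the function name and parameters are hypothetical, and it assumes the success rate is computed over all recorded operations.

```python
def evaluate_health(connected: bool, total_ops: int, successful_ops: int) -> str:
    """Apply the rules in priority order: connection first, then success rate."""
    if not connected:
        return "Unhealthy"
    # The rate rule only applies once more than 100 operations have been seen,
    # so a few early failures do not flip the service to Degraded
    if total_ops > 100 and successful_ops / total_ops < 0.5:
        return "Degraded"
    return "Healthy"
```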
### Tests
- `HealthCheckServiceTests.cs` — three health rules, messages
- `StatusReportServiceTests.cs` — HTML contains all panels, JSON deserializes, meta-refresh tag
- `StatusWebServerTests.cs` — routing (200/405/404), cache headers, start/stop
### Verification Gate 5
- [ ] HTML contains all panels: Connection, Health, Subscriptions, Galaxy Info, Operations table, Footer
- [ ] Connection panel: green/red/yellow border per state
- [ ] Health panel: three states with correct colors
- [ ] Operations table: Read/Write/Subscribe/Browse with Count/SuccessRate/Avg/Min/Max/P95
- [ ] Galaxy Info panel: galaxy name, DB status, last deploy, object/attribute counts, last rebuild
- [ ] Footer: timestamp + assembly version
- [ ] JSON API: all same data as HTML
- [ ] /api/health: 200 when healthy, 503 when unhealthy
- [ ] Meta-refresh tag with configured interval
- [ ] Port conflict does not prevent service startup
- [ ] Dashboard disabled via config skips HttpListener
---
## PHASE 6: Integration Wiring and End-to-End Verification
**Reqs:** SVC-004, SVC-005, SVC-006, ALL wiring verification
### OpcUaService.cs — Full Implementation
**Start() sequence (SVC-005):**
1. Load AppConfiguration via IConfiguration
2. ConfigurationValidator.ValidateAndLog()
3. Register AppDomain.UnhandledException handler (SVC-006)
4. Create PerformanceMetrics
5. Create MxAccessClient → ConnectAsync (failure = fatal, don't start)
6. Start MxAccessClient monitor loop
7. Create GalaxyRepositoryService → TestConnectionAsync (failure = warning, continue)
8. Create OpcUaServerHost + LmxNodeManager, inject IMxAccessClient
9. Query initial hierarchy + attributes → BuildAddressSpace
10. Start OPC UA server listener (failure = fatal)
11. Create ChangeDetectionService → **wire OnGalaxyChanged → nodeManager.RebuildAddressSpace**
12. Start change detection polling
13. Create HealthCheckService, StatusReportService, StatusWebServer → Start (failure = warning)
14. Log "LmxOpcUa service started successfully"
**Critical wiring (GUARDRAILS):**
- `_mxAccessClient.OnTagValueChanged` → node manager subscription delivery
- `_changeDetectionService.OnGalaxyChanged` → `_nodeManager.RebuildAddressSpace`
- `_mxAccessClient.ConnectionStateChanged` → health check updates
- Node manager Read/Write → `_mxAccessClient.ReadAsync/WriteAsync`
- StatusReportService reads from: MxAccessClient, PerformanceMetrics, GalaxyRepositoryStats, OpcUaServerHost
**Stop() sequence (SVC-004, reverse order, 30s max):**
1. Cancel CancellationTokenSource (stops all background loops)
2. Stop change detection
3. Stop OPC UA server
4. Disconnect MXAccess (full COM cleanup)
5. Stop StatusWebServer
6. Dispose PerformanceMetrics
7. Log "Service shutdown complete"
### Wiring Verification Tests (GUARDRAILS)
These tests prove components are connected end-to-end, not just implemented in isolation:
- `Wiring/MxAccessToNodeManagerWiringTest.cs` — simulate OnDataChange on FakeMxProxy → verify data reaches node manager subscription delivery
- `Wiring/ChangeDetectionToRebuildWiringTest.cs` — mock GalaxyRepository returns changed timestamp → verify RebuildAddressSpace called
- `Wiring/OpcUaReadToMxAccessWiringTest.cs` — issue Read via NodeManager → verify FakeMxProxy receives correct full_tag_reference
- `Wiring/OpcUaWriteToMxAccessWiringTest.cs` — issue Write via NodeManager → verify FakeMxProxy receives correct tag + value
- `Wiring/ServiceStartupSequenceTest.cs` — create OpcUaService with fakes, call Start(), verify all components created and wired
- `Wiring/ShutdownCompletesTest.cs` — Start then Stop, verify completes within 30s
- `EndToEnd/FullDataFlowTest.cs` — **THE ULTIMATE SMOKE TEST**: full service with fakes, verify: (1) address space built, (2) MXAccess data change → OPC UA variable, (3) read → correct tag ref, (4) write → correct tag+value, (5) dashboard HTML has real data
### Verification Gate 6 (FINAL)
- [ ] Startup: all 14 steps execute in order
- [ ] Shutdown: completes within 30s, all components disposed in reverse order
- [ ] **WIRING:** MXAccess OnDataChange → node manager subscription delivery
- [ ] **WIRING:** Galaxy change → address space rebuild
- [ ] **WIRING:** OPC UA Read → MXAccess ReadAsync with correct tag reference
- [ ] **WIRING:** OPC UA Write → MXAccess WriteAsync with correct tag+value
- [ ] **WIRING:** Dashboard aggregates data from all components
- [ ] **WIRING:** Health endpoint reflects actual connection state
- [ ] AppDomain.UnhandledException registered
- [ ] TopShelf recovery configured (restart, 60s delay)
- [ ] FullDataFlowTest passes end-to-end
---
## Master Requirement Traceability (all 44)
| Req | Phase | Verified By |
|-----|-------|-------------|
| SVC-001 | Done | Program.cs already configured |
| SVC-002 | Done | Program.cs already configured |
| SVC-003 | 1 | ConfigurationLoadingTests |
| SVC-004 | 6 | ShutdownCompletesTest |
| SVC-005 | 6 | ServiceStartupSequenceTest |
| SVC-006 | 6 | AppDomain handler registration test |
| MXA-001 | 2 | StaComThreadTests |
| MXA-002 | 2 | MxAccessClientConnectionTests |
| MXA-003 | 2 | MxAccessClientSubscriptionTests |
| MXA-004 | 2 | MxAccessClientReadWriteTests |
| MXA-005 | 2 | MxAccessClientMonitorTests |
| MXA-006 | 2 | MxAccessClientMonitorTests (probe) |
| MXA-007 | 2 | Cleanup order test |
| MXA-008 | 2 | Metrics integration in ReadWrite |
| MXA-009 | 1+2 | MxErrorCodesTests + write error path |
| GR-001 | 3 | GetHierarchyAsync maps all columns |
| GR-002 | 3 | GetAttributesAsync maps all columns |
| GR-003 | 3 | ChangeDetectionServiceTests |
| GR-004 | 3+6 | ChangeDetectionToRebuildWiringTest |
| GR-005 | 1+3 | Config tests + ADO.NET usage |
| GR-006 | 3 | Code review: const string SQL only |
| GR-007 | 3 | TestConnectionAsync test |
| OPC-001 | 4 | Endpoint URL test |
| OPC-002 | 4 | BuildTests: node types + references |
| OPC-003 | 4 | BuildTests: variable nodes |
| OPC-004 | 4+6 | ReadWiringTest: browse→tag_name |
| OPC-005 | 1+4 | MxDataTypeMapperTests + variable node DataType |
| OPC-006 | 4 | BuildTests: ValueRank + ArrayDimensions |
| OPC-007 | 4+6 | OpcUaReadToMxAccessWiringTest |
| OPC-008 | 4+6 | OpcUaWriteToMxAccessWiringTest |
| OPC-009 | 4+6 | MxAccessToNodeManagerWiringTest |
| OPC-010 | 4+6 | RebuildTests + ChangeDetectionToRebuildWiringTest |
| OPC-011 | 4 | ServerStatus node test |
| OPC-012 | 4 | Namespace URI test |
| OPC-013 | 4 | Session config test |
| DASH-001 | 5 | StatusWebServerTests routing |
| DASH-002 | 5 | HTML contains Connection panel |
| DASH-003 | 5 | HealthCheckServiceTests |
| DASH-004 | 5 | HTML contains Subscriptions panel |
| DASH-005 | 5 | HTML contains Operations table |
| DASH-006 | 5 | HTML contains Footer |
| DASH-007 | 5 | Meta-refresh tag test |
| DASH-008 | 5 | JSON API deserialization test |
| DASH-009 | 5 | HTML contains Galaxy Info panel |
---
## Final Folder Structure
```
src/ZB.MOM.WW.LmxOpcUa.Host/
Configuration/ (Phase 1)
Domain/ (Phase 1)
Metrics/ (Phase 1)
MxAccess/ (Phase 2)
GalaxyRepository/ (Phase 3)
OpcUa/ (Phase 4)
Status/ (Phase 5)
OpcUaService.cs (Phase 6 — full wiring)
Program.cs (existing)
appsettings.json (existing)
tests/ZB.MOM.WW.LmxOpcUa.Tests/
Configuration/ (Phase 1)
Domain/ (Phase 1)
Metrics/ (Phase 1)
MxAccess/ (Phase 2)
GalaxyRepository/ (Phase 3)
OpcUa/ (Phase 4)
Status/ (Phase 5)
Wiring/ (Phase 6 — GUARDRAILS)
EndToEnd/ (Phase 6 — GUARDRAILS)
Helpers/FakeMxProxy.cs (Phase 2)
```
## Verification: How to Run
```bash
# Build
dotnet build ZB.MOM.WW.LmxOpcUa.slnx
# All tests
dotnet test ZB.MOM.WW.LmxOpcUa.slnx
# Phase-specific (by namespace convention)
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Configuration"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~MxAccess"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~GalaxyRepository"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~OpcUa"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Status"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Wiring"
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~EndToEnd"
# Integration tests (requires ZB database)
dotnet test tests/ZB.MOM.WW.LmxOpcUa.IntegrationTests
```


@@ -0,0 +1,121 @@
# Galaxy Repository — Component Requirements
Parent: [HLR-002](HighLevelReqs.md#hlr-002-galaxy-hierarchy-as-opc-ua-address-space), [HLR-005](HighLevelReqs.md#hlr-005-dynamic-address-space-rebuild)
## GR-001: Hierarchy Extraction
The system shall query the Galaxy Repository database to extract all deployed objects with their parent-child containment relationships, contained names, and tag names.
### Acceptance Criteria
- Executes `queries/hierarchy.sql` against the ZB database.
- Returns a list of objects with: `gobject_id`, `tag_name`, `contained_name`, `browse_name`, `parent_gobject_id`, `is_area`.
- Objects with `parent_gobject_id = 0` are children of the root ZB node.
- Only deployed, non-template objects matching the category filter (areas, engines, user-defined objects, etc.) are returned.
- Query completes within 10 seconds on a typical Galaxy (hundreds of objects). Log a Warning if it takes longer.
### Details
- Results are ordered by `parent_gobject_id, tag_name` for deterministic tree building.
- If the query returns zero rows, log a Warning (Galaxy may have no deployed objects, or the DB connection may be misconfigured).
- Orphan detection: if a row references a `parent_gobject_id` that does not exist in the result set and is not 0, log a Warning and skip that node.
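The ordering and orphan-detection rules above can be sketched as a grouping step that runs before tree construction. This Python example is an illustration, not the project's C# code; the function name and the dictionary row shape are assumptions, with keys mirroring the column names in the acceptance criteria.

```python
def build_children_index(rows):
    """Group hierarchy rows by parent id, skipping orphans.

    rows: list of dicts with gobject_id, parent_gobject_id, tag_name.
    Returns (children_by_parent, orphans).
    """
    known_ids = {row["gobject_id"] for row in rows}
    children = {}
    orphans = []
    # Same ordering as the query, so the resulting tree is deterministic
    for row in sorted(rows, key=lambda r: (r["parent_gobject_id"], r["tag_name"])):
        parent = row["parent_gobject_id"]
        if parent != 0 and parent not in known_ids:
            orphans.append(row)  # the caller would log a Warning for each
            continue
        children.setdefault(parent, []).append(row)
    return children, orphans
```

A tree walk would then start from `children.get(0, [])`, the children of the root ZB node. Descendants of a skipped orphan stay in the index but are never reached from the root.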
---
## GR-002: Attribute Extraction
The system shall query user-defined (dynamic) attributes for deployed objects, including data type, array flag, and array dimensions.
### Acceptance Criteria
- Executes `queries/attributes.sql` using the template chain CTE to resolve inherited attributes.
- Returns: `gobject_id`, `tag_name`, `attribute_name`, `full_tag_reference`, `mx_data_type`, `is_array`, `array_dimension`, `security_classification`.
- Attributes starting with `_` are filtered out by the query.
- `array_dimension` is correctly extracted from the `mx_value` hex bytes (positions 13-16, little-endian uint16).
### Details
- CTE recursion depth is limited to 10 levels (per the query). This is sufficient for Galaxy template hierarchies.
- If `mx_data_type` is null or not in the known set (1-8, 13-16), default to String.
- If `gobject_id` from an attribute row does not match any hierarchy object, skip that attribute (object may not be deployed).
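To make the `array_dimension` rule concrete, the Python sketch below decodes a little-endian uint16 from a hex string. It assumes that "positions 13-16" means 1-based character positions in a hex string with no `0x` prefix, so the field is four hex digits (two bytes). That reading is an assumption; in the project the extraction is done by `queries/attributes.sql`.

```python
def extract_array_dimension(mx_value_hex: str) -> int:
    """Decode the array dimension from an mx_value hex string (assumed layout)."""
    # Characters 13-16 (1-based) are four hex digits, i.e. two bytes
    field = mx_value_hex[12:16]
    if len(field) != 4:
        raise ValueError("mx_value too short to contain an array dimension")
    # Little-endian: the first byte is the low-order byte
    return int.from_bytes(bytes.fromhex(field), byteorder="little")
```

For example, a field of `0A00` decodes to 10, while `0001` decodes to 256.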
---
## GR-003: Change Detection
The system shall poll `galaxy.time_of_last_deploy` at a configurable interval to detect when a new deployment has occurred.
### Acceptance Criteria
- Polls `SELECT time_of_last_deploy FROM galaxy` at a configurable interval (`GalaxyRepository:ChangeDetectionIntervalSeconds`, default 30 seconds).
- Compares the returned timestamp to the last known value stored in memory.
- If different, triggers a rebuild (re-run hierarchy + attributes queries, notify OPC UA server).
- First poll after startup always triggers an initial build.
- If the query fails (SQL timeout, connection error), log Warning and retry at next interval. Do not trigger a rebuild on failure.
### Details
- Polling runs on a background timer thread, not blocking the STA thread.
- `time_of_last_deploy` is a datetime column. Compare using exact equality (not range).
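The per-poll decision described in GR-003 can be sketched as below. This is a Python illustration with hypothetical names (`ChangeDetector`, `poll`, and the injected callables); the timer that calls `poll` at the configured interval is left out.

```python
class ChangeDetector:
    """Hypothetical per-tick logic for deploy-time change detection."""

    def __init__(self, get_last_deploy_time, on_changed, log_warning=print):
        self._get_last_deploy_time = get_last_deploy_time
        self._on_changed = on_changed
        self._log_warning = log_warning
        self._last_known = None
        self._has_baseline = False  # becomes True after the first successful poll

    def poll(self) -> bool:
        """Run one poll; return True if a rebuild was triggered."""
        try:
            current = self._get_last_deploy_time()
        except Exception as exc:
            # A failed query never triggers a rebuild; the next tick retries
            self._log_warning(f"Change detection poll failed: {exc}")
            return False
        # Exact equality comparison, as the requirement specifies
        if self._has_baseline and current == self._last_known:
            return False
        self._has_baseline = True
        self._last_known = current
        self._on_changed()
        return True
```

In this sketch the first successful poll always fires, even if earlier polls failed, which matches the degraded-mode startup described in GR-007.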
---
## GR-004: Rebuild on Change
When a deployment change is detected, the system shall re-query hierarchy and attributes and provide the updated structure to the OPC UA server for address space rebuild.
### Acceptance Criteria
- On change detection, re-query both hierarchy and attributes.
- Provide the new data set to the OPC UA server component for address space replacement.
- Log at Information level: "Galaxy deployment change detected. Rebuilding address space. ({ObjectCount} objects, {AttributeCount} attributes)".
- Log total rebuild time at Information level.
- If the re-query fails, log Error and keep the existing address space (do not clear it).
### Details
- Rebuild is not atomic from the DB perspective — hierarchy and attributes are two separate queries. This is acceptable; deployment is an infrequent operation.
- Raise an event/callback that the OPC UA server subscribes to: `OnGalaxyChanged(hierarchyData, attributeData)`.
---
## GR-005: Connection Configuration
Database connection parameters shall be configurable via appsettings.json (connection string using Windows Authentication by default).
### Acceptance Criteria
- Connection string in `appsettings.json` under `GalaxyRepository:ConnectionString`.
- Default: `Server=localhost;Database=ZB;Integrated Security=true` (Windows Auth).
- ADO.NET `SqlConnection` used for queries (.NET Framework 4.8 built-in).
- Connection is opened per-query (not kept open). Connection pooling handles efficiency.
- If the initial connection test at startup fails, log Error with the connection string and continue attempting (change detection polls will keep retrying).
### Details
- Command timeout: configurable via `GalaxyRepository:CommandTimeoutSeconds`, default 30 seconds.
- No ORM. Raw ADO.NET with `SqlCommand` and `SqlDataReader`. SQL text is embedded as constants (not dynamically constructed).
---
## GR-006: Query Safety
All SQL queries shall be static read-only SELECT statements. No writes to the Galaxy Repository database.
### Acceptance Criteria
- All queries are hardcoded SQL strings with no string concatenation or user-supplied parameters.
- No INSERT, UPDATE, DELETE, or DDL statements are ever executed against the Galaxy database.
- Queries use only SELECT with read-only intent.
---
## GR-007: Startup Validation
On startup, the Galaxy Repository component shall validate database connectivity.
### Acceptance Criteria
- Execute a simple test query (`SELECT 1`) against the configured database.
- If the database is unreachable, log an Error but do not prevent service startup.
- The service runs in degraded mode (empty address space) until the database becomes available and the next change detection poll succeeds.


@@ -0,0 +1,47 @@
# High-Level Requirements
## HLR-001: OPC UA Server
The system shall expose an OPC UA server endpoint that OPC UA clients can connect to for browsing, reading, and writing Galaxy tag data.
## HLR-002: Galaxy Hierarchy as OPC UA Address Space
The system shall build an OPC UA address space that mirrors the System Platform Galaxy object hierarchy, using contained names for browse structure and tag names for runtime data access.
## HLR-003: MXAccess Runtime Data Access
The system shall use the MXAccess toolkit to subscribe to, read, and write Galaxy tag attribute values at runtime on behalf of connected OPC UA clients.
## HLR-004: Data Type Mapping
The system shall map Galaxy attribute data types (mx_data_type) to appropriate OPC UA built-in types, including support for array attributes.
## HLR-005: Dynamic Address Space Rebuild
The system shall detect Galaxy deployment changes (via `galaxy.time_of_last_deploy`) and rebuild the OPC UA address space to reflect the current deployed state.
## HLR-006: Windows Service Hosting
The system shall run as a Windows service (via TopShelf) with support for install, uninstall, and interactive console modes.
## HLR-007: Logging
The system shall log operational events to rolling daily log files using Serilog.
## HLR-008: Connection Resilience
The system shall automatically reconnect to MXAccess after connection loss, replaying active subscriptions upon reconnect.
## HLR-009: Status Dashboard
The system shall host an embedded HTTP status dashboard (similar to the LmxProxy dashboard) providing at-a-glance operational visibility including connection state, health, subscription statistics, and operation metrics.
## Component-Level Requirements
Detailed requirements are broken out into the following documents:
- [OPC UA Server Requirements](OpcUaServerReqs.md)
- [MXAccess Client Requirements](MxAccessClientReqs.md)
- [Galaxy Repository Requirements](GalaxyRepositoryReqs.md)
- [Service Host Requirements](ServiceHostReqs.md)
- [Status Dashboard Requirements](StatusDashboardReqs.md)


@@ -0,0 +1,172 @@
# MXAccess Client — Component Requirements
Parent: [HLR-003](HighLevelReqs.md#hlr-003-mxaccess-runtime-data-access), [HLR-008](HighLevelReqs.md#hlr-008-connection-resilience)
## MXA-001: STA Thread with Message Pump
All MXAccess COM objects shall be created and called on a dedicated STA thread running a Win32 message pump to ensure COM callbacks are delivered.
### Acceptance Criteria
- A dedicated thread is created with `ApartmentState.STA` before any MXAccess COM objects are instantiated.
- The thread runs a Win32 message pump using `GetMessage`/`TranslateMessage`/`DispatchMessage` loop.
- Work items are marshalled to the STA thread via `PostThreadMessage(WM_APP)` and a concurrent queue.
- The STA thread processes work items between message pump iterations.
- All COM object creation (`LMXProxyServer` constructor), method calls, and event callbacks happen on this thread.
### Details
- Thread name: `MxAccess-STA` (for diagnostics).
- If the STA thread dies unexpectedly, log Fatal and trigger service shutdown. Do not attempt to create a replacement thread (COM objects on the dead thread are unrecoverable).
- `RunAsync(Action)` method returns a `Task` that completes when the action executes on the STA thread. Callers can `await` it.
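The marshalling idea behind `RunAsync` can be illustrated without COM. The Python sketch below uses a dedicated thread, a work queue, and futures. It is an analogy only: Python has no STA apartments or Win32 message pump, the class and method names are hypothetical, and the real `StaComThread` interleaves work items with `GetMessage`/`DispatchMessage`.

```python
import queue
import threading
from concurrent.futures import Future


class DedicatedWorkerThread:
    """Runs every submitted callable on one named thread."""

    def __init__(self, name="MxAccess-STA"):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:  # sentinel posted by dispose()
                break
            func, future = item
            try:
                future.set_result(func())
            except Exception as exc:
                # Surface the failure to the caller instead of killing the thread
                future.set_exception(exc)

    def run_async(self, func) -> Future:
        """Queue func for the worker thread; the Future completes when it has run."""
        future = Future()
        self._work.put((func, future))
        return future

    def dispose(self):
        self._work.put(None)
        self._thread.join(timeout=5)
```

A caller would write `worker.run_async(lambda: proxy.register("LmxOpcUa")).result(timeout=5)`, where `proxy.register` is a placeholder for any call that must stay on the dedicated thread.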
---
## MXA-002: Connection Lifecycle
The client shall support Register/Unregister lifecycle with the LMXProxyServer COM object, tracking the connection handle.
### Acceptance Criteria
- `Register(clientName)` is called on the STA thread and returns a positive connection handle on success.
- If `Register` returns a handle <= 0, throw an exception with a descriptive error message.
- `Unregister(handle)` is called during disconnect after all subscriptions are removed.
- Client name: configurable via `MxAccess:ClientName`, default `LmxOpcUa`. Must be unique per MXAccess registration.
- Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).
### Details
- `ConnectedSince` timestamp (UTC) is recorded after successful Register.
- `ReconnectCount` is tracked for diagnostics and dashboard display.
- State change events are raised for dashboard and health check consumption.
---
## MXA-003: Tag Subscription
The client shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.
### Acceptance Criteria
- Subscribe sequence: `AddItem(handle, address)` returns item handle, then `AdviseSupervisory(handle, itemHandle)` starts the subscription.
- `OnDataChange` callback delivers value, quality (integer), timestamp, and MXSTATUS_PROXY array.
- Item address format: `tag_name.AttributeName` for scalars, `tag_name.AttributeName[]` for whole arrays.
- If AddItem fails (e.g., tag does not exist), log Warning and return failure to caller.
- Bidirectional maps of `address ↔ itemHandle` are maintained for callback resolution.
### Details
- Use `AdviseSupervisory` (not `Advise`) because this is a background service with no interactive user session. AdviseSupervisory allows secured/verified writes without user authentication.
- Stored subscriptions dictionary maps address to callback for reconnect replay.
- On reconnect, all entries in stored subscriptions are re-subscribed (AddItem + AdviseSupervisory with new handles).
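One way to picture the bidirectional maps and the stored-subscription dictionary is the Python sketch below. The class and method names are hypothetical. The point is that item handles are discarded on disconnect while the address-to-callback entries survive for replay.

```python
class SubscriptionRegistry:
    """Hypothetical bookkeeping for MXAccess subscriptions."""

    def __init__(self):
        self.handle_by_address = {}
        self.address_by_handle = {}
        self.stored = {}  # address -> callback, kept across reconnects

    def add(self, address, item_handle, callback):
        self.handle_by_address[address] = item_handle
        self.address_by_handle[item_handle] = address
        self.stored[address] = callback

    def remove(self, address):
        handle = self.handle_by_address.pop(address, None)
        if handle is not None:
            self.address_by_handle.pop(handle, None)
        self.stored.pop(address, None)

    def dispatch(self, item_handle, value):
        """Resolve handle -> address -> callback, as an OnDataChange handler would."""
        address = self.address_by_handle.get(item_handle)
        if address is not None:
            self.stored[address](address, value)

    def clear_handles(self):
        """On disconnect the old handles become invalid; stored entries remain."""
        self.handle_by_address.clear()
        self.address_by_handle.clear()

    def replay(self, add_and_advise):
        """Re-subscribe every stored address; add_and_advise returns the new handle."""
        for address, callback in list(self.stored.items()):
            self.add(address, add_and_advise(address), callback)
```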
---
## MXA-004: Tag Read/Write
The client shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.
### Acceptance Criteria
- Read: implemented as subscribe-get-first-value-unsubscribe pattern (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
- Write: AddItem → AdviseSupervisory → `Write()` → await `OnWriteComplete` callback → cleanup.
- Read timeout: configurable via `MxAccess:ReadTimeoutSeconds`, default 5 seconds.
- Write timeout: configurable via `MxAccess:WriteTimeoutSeconds`, default 5 seconds. On timeout, log Warning and return timeout error.
- Concurrent operation limit: configurable semaphore via `MxAccess:MaxConcurrentOperations`, default 10.
- All operations marshalled to the STA thread.
### Details
- Write uses security classification -1 (no security). Galaxy runtime handles security enforcement.
- `OnWriteComplete` callback: check MXSTATUS_PROXY `success` field. If 0, extract detail code and propagate error.
- COM exceptions (`COMException` with HRESULT) are caught and translated to meaningful error messages.
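
The timeout and concurrency limits above can be sketched with standard .NET primitives. This is one possible shape under stated assumptions: `TimeoutHelper` and `OperationGate` are hypothetical names, and the pending task is assumed to be completed by the `OnDataChange` or `OnWriteComplete` callback through a `TaskCompletionSource`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutHelper
{
    // Waits for a callback-completed task; throws TimeoutException if the
    // callback does not arrive within the configured window.
    public static async Task<T> WithTimeout<T>(Task<T> pending, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task first = await Task.WhenAny(pending, delay).ConfigureAwait(false);
            if (first != pending)
                throw new TimeoutException("No callback within " + timeout.TotalSeconds + " s");
            cts.Cancel(); // stop the timer once the callback has won
            return await pending.ConfigureAwait(false);
        }
    }
}

// Hypothetical gate limiting concurrent MXAccess operations
// (MxAccess:MaxConcurrentOperations, default 10).
public sealed class OperationGate
{
    private readonly SemaphoreSlim _slots;

    public OperationGate(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try { return await operation().ConfigureAwait(false); }
        finally { _slots.Release(); } // always free the slot, even on failure
    }
}
```

A caller would translate the `TimeoutException` into the timeout error described above and log it at Warning level.
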
---
## MXA-005: Auto-Reconnect
The client shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.
### Acceptance Criteria
- Monitor loop runs on a background thread, checking connection health at configurable interval (`MxAccess:MonitorIntervalSeconds`, default 5 seconds).
- If disconnected, attempt reconnect. On success, replay all stored subscriptions.
- On reconnect failure, log Warning and retry at next interval (no exponential backoff — reconnect as quickly as possible on a plant-floor service).
- Reconnect count is incremented on each successful reconnect.
- Monitor loop is cancellable (for clean shutdown).
### Details
- Reconnect cleans up old COM objects before creating new ones.
- After reconnect, probe subscription is re-established first, then stored subscriptions.
- No max retry limit — keep trying indefinitely until service is stopped.
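
A minimal sketch of the monitor loop follows. `ConnectionMonitor` is a hypothetical name, and the two delegates stand in for the real client's health check and reconnect (including subscription replay). It is written as an async loop for brevity, although the requirement describes a dedicated background thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConnectionMonitor
{
    private readonly Func<bool> _isConnected;
    private readonly Func<bool> _tryReconnect; // true on success (subscriptions replayed)
    private readonly TimeSpan _interval;
    private int _reconnectCount;

    public ConnectionMonitor(Func<bool> isConnected, Func<bool> tryReconnect, TimeSpan interval)
    {
        _isConnected = isConnected;
        _tryReconnect = tryReconnect;
        _interval = interval;
    }

    public int ReconnectCount { get { return Volatile.Read(ref _reconnectCount); } }

    // Fixed-interval retry with no backoff and no retry limit; exits only on cancellation.
    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (!_isConnected())
            {
                bool ok;
                try { ok = _tryReconnect(); }
                catch (Exception) { ok = false; } // real code would log a Warning here
                if (ok) Interlocked.Increment(ref _reconnectCount);
            }

            try { await Task.Delay(_interval, token).ConfigureAwait(false); }
            catch (OperationCanceledException) { break; } // clean shutdown
        }
    }
}
```
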
---
## MXA-006: Probe-Based Health Monitoring
The client shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.
### Acceptance Criteria
- Subscribe to a configurable probe tag (a known-good Galaxy attribute that changes periodically).
- Track `_lastProbeValueTime` (UTC) updated on each OnDataChange for the probe tag.
- If `DateTime.UtcNow - _lastProbeValueTime > staleThreshold`, force disconnect and reconnect.
- Probe tag address: configurable via `MxAccess:ProbeTag`. If not configured, probe monitoring is disabled.
- Stale threshold: configurable via `MxAccess:ProbeStaleThresholdSeconds`, default 60 seconds.
### Details
- The probe tag should be an attribute that the Galaxy runtime updates regularly (e.g., a platform heartbeat or area-level timestamp). The specific tag is site-dependent.
- After forced reconnect, reset `_lastProbeValueTime` to `DateTime.UtcNow` to give the new connection a full threshold window.
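
The staleness check can be sketched as below. `ProbeWatchdog` is a hypothetical name, and the clock is passed in as a parameter so the logic can be tested. Because the service runs as a 32-bit process, the timestamp is stored as ticks and accessed through `Interlocked` to avoid torn 64-bit reads between the callback thread and the monitor loop.

```csharp
using System;
using System.Threading;

public sealed class ProbeWatchdog
{
    private readonly TimeSpan _staleThreshold;
    private long _lastProbeTicks; // UTC ticks of the last probe OnDataChange

    public ProbeWatchdog(TimeSpan staleThreshold, DateTime nowUtc)
    {
        _staleThreshold = staleThreshold;
        _lastProbeTicks = nowUtc.Ticks;
    }

    // Called from the OnDataChange handler for the probe tag.
    public void OnProbeDataChange(DateTime nowUtc)
    {
        Interlocked.Exchange(ref _lastProbeTicks, nowUtc.Ticks);
    }

    // True when no probe update has arrived within the threshold.
    public bool IsStale(DateTime nowUtc)
    {
        long last = Interlocked.Read(ref _lastProbeTicks);
        return nowUtc.Ticks - last > _staleThreshold.Ticks;
    }

    // Gives a fresh connection a full threshold window.
    public void ResetAfterReconnect(DateTime nowUtc)
    {
        Interlocked.Exchange(ref _lastProbeTicks, nowUtc.Ticks);
    }
}
```
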
---
## MXA-007: COM Cleanup
On disconnect or disposal, the client shall unwire event handlers, unadvise/remove all items, unregister, and release COM objects via Marshal.ReleaseComObject.
### Acceptance Criteria
- Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete event handlers → Unregister → `Marshal.ReleaseComObject`.
- On dispose: run disconnect if still connected, then dispose STA thread.
- Each cleanup step is wrapped in try/catch (cleanup must not throw).
- After cleanup: handle maps are cleared, pending write `TaskCompletionSource` (TCS) entries are abandoned, and the COM reference is set to null.
### Details
- `_storedSubscriptions` is NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
- Event handlers must be unwired BEFORE Unregister, or callbacks may fire on a dead object.
- `Marshal.ReleaseComObject` is called in a `finally` block so that it always runs, even if earlier steps fail.
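
The "every step guarded, release always runs" rule can be sketched without any COM types. `SafeCleanup` is a hypothetical helper; in the real client the steps would be UnAdvise, RemoveItem, event unwiring, and Unregister, and `finalRelease` would call `Marshal.ReleaseComObject`.

```csharp
using System;
using System.Collections.Generic;

public static class SafeCleanup
{
    // Runs each named step in order. A failing step is recorded and the
    // remaining steps still run. finalRelease runs in a finally block.
    // Returns the failures so the caller can log them; never throws.
    public static List<string> RunAll(
        IEnumerable<KeyValuePair<string, Action>> steps,
        Action finalRelease)
    {
        var failures = new List<string>();
        try
        {
            foreach (var step in steps)
            {
                try { step.Value(); }
                catch (Exception ex) { failures.Add(step.Key + ": " + ex.Message); }
            }
        }
        finally
        {
            try { finalRelease(); }
            catch (Exception ex) { failures.Add("release: " + ex.Message); }
        }
        return failures;
    }
}
```
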
---
## MXA-008: Operation Metrics
The MXAccess client shall record timing and success/failure for Read, Write, and Subscribe operations.
### Acceptance Criteria
- Each operation records: duration (ms), success/failure.
- Metrics are available for the status dashboard: count, success rate, avg/min/max/P95 latency.
- Uses a rolling 1000-entry buffer for percentile calculation.
- Metrics are exposed via a queryable interface consumed by the status report service.
### Details
- Uses an `ITimingScope` pattern: `using (var scope = metrics.BeginOperation("read")) { ... }` for automatic timing and success tracking.
- Metrics are periodically logged at Debug level for diagnostics.
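
A rolling buffer with a percentile query might look like the sketch below. `RollingLatencyBuffer` is a hypothetical name, and the nearest-rank percentile definition is an assumption: the requirement does not say which definition is used.

```csharp
using System;
using System.Linq;

public sealed class RollingLatencyBuffer
{
    private readonly double[] _samples;
    private readonly object _lock = new object();
    private int _next;  // index of the slot to overwrite next
    private int _count; // number of valid samples, capped at capacity

    public RollingLatencyBuffer(int capacity = 1000)
    {
        _samples = new double[capacity];
    }

    public int Count { get { lock (_lock) { return _count; } } }

    // Adds a duration in milliseconds, overwriting the oldest entry when full.
    public void Record(double ms)
    {
        lock (_lock)
        {
            _samples[_next] = ms;
            _next = (_next + 1) % _samples.Length;
            if (_count < _samples.Length) _count++;
        }
    }

    // Nearest-rank percentile for p in (0, 100]; null when no samples exist.
    public double? Percentile(double p)
    {
        double[] sorted;
        lock (_lock)
        {
            if (_count == 0) return null;
            sorted = _samples.Take(_count).OrderBy(x => x).ToArray();
        }
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }
}
```

Sorting a 1000-entry copy on each query is cheap at dashboard refresh rates, so no incremental percentile structure is needed in this sketch.
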
---
## MXA-009: Error Code Translation
The client shall translate known MXAccess error codes from MXSTATUS_PROXY.detail into human-readable messages for logging and OPC UA status propagation.
### Acceptance Criteria
- Error 1008 → "User lacks security permission"
- Error 1012 → "Secured write required (one signature)"
- Error 1013 → "Verified write required (two signatures)"
- Unknown error codes are logged with their numeric value.
- Translated messages are included in OPC UA StatusCode descriptions and log entries.
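
The translation table above maps directly onto a dictionary lookup. `MxErrorCodes` is a hypothetical class name, and the wording of the fallback message for unknown codes is an assumption.

```csharp
using System.Collections.Generic;

public static class MxErrorCodes
{
    // Known MXSTATUS_PROXY.detail codes and their messages, as listed above.
    private static readonly Dictionary<int, string> Known = new Dictionary<int, string>
    {
        { 1008, "User lacks security permission" },
        { 1012, "Secured write required (one signature)" },
        { 1013, "Verified write required (two signatures)" },
    };

    // Returns the known message, or a fallback that keeps the numeric value.
    public static string Describe(int detail)
    {
        string message;
        return Known.TryGetValue(detail, out message)
            ? message
            : "Unknown MXAccess error " + detail;
    }
}
```
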

@@ -0,0 +1,229 @@
# OPC UA Server — Component Requirements
Parent: [HLR-001](HighLevelReqs.md#hlr-001-opc-ua-server), [HLR-002](HighLevelReqs.md#hlr-002-galaxy-hierarchy-as-opc-ua-address-space), [HLR-004](HighLevelReqs.md#hlr-004-data-type-mapping)
## OPC-001: Server Endpoint
The OPC UA server shall listen on a configurable TCP port (default 4840) using the OPC Foundation .NET Standard stack.
### Acceptance Criteria
- Server starts and accepts TCP connections on the configured port.
- Port is read from `appsettings.json` under `OpcUa:Port`; defaults to 4840 if absent.
- Endpoint URL format: `opc.tcp://<hostname>:<port>/LmxOpcUa`.
- If the port is in use at startup, log an Error and fail to start (do not silently pick another port).
- Security policy: None (no message signing or encryption, and no certificate validation). This is an internal plant-floor service.
### Details
- Configurable items: port (default 4840), endpoint path (default `/LmxOpcUa`), server application name (default `LmxOpcUa`).
- Server shall use the `OPCFoundation.NetStandard.Opc.Ua.Server` NuGet package.
- On startup, log the endpoint URL at Information level.
---
## OPC-002: Address Space Structure
The server shall create folder nodes for areas and object nodes for automation objects, organized in the same parent-child hierarchy as the Galaxy.
### Acceptance Criteria
- The root folder node has BrowseName `ZB` (hardcoded Galaxy name).
- Objects where `is_area = 1` are created as FolderType nodes (organizational).
- Objects where `is_area = 0` are created as BaseObjectType nodes.
- Parent-child relationships use Organizes references (for areas) and HasComponent references (for contained objects).
- A client browsing Root → Objects → ZB → DEV → TestArea → TestMachine_001 → DelmiaReceiver sees the same structure as `gr/layout.md`.
### Details
- NodeIds use a string-based identifier scheme: `ns=1;s=<tag_name>` for object nodes, `ns=1;s=<tag_name>.<attribute_name>` for variable nodes.
- Infrastructure objects (AppEngines, Platforms) are included in the tree but may have no variable children.
- When `contained_name` is null or empty, fall back to `tag_name` as the BrowseName.
---
## OPC-003: Variable Nodes for Attributes
Each user-defined attribute on a deployed object shall be represented as an OPC UA variable node under its parent object node.
### Acceptance Criteria
- Each row from `attributes.sql` creates one variable node under the matching object node (matched by `gobject_id`).
- Variable node BrowseName and DisplayName are set to `attribute_name`.
- Variable node stores `full_tag_reference` as its runtime MXAccess address.
- Variable nodes have AccessLevel = CurrentRead | CurrentWrite (3) by default.
- Objects with no user-defined attributes still appear as object nodes with zero children.
### Details
- Security classification from the attributes query is noted but not enforced at the OPC UA level (Galaxy runtime handles security).
- Attributes whose names start with `_` are already filtered by the SQL query.
---
## OPC-004: Browse Name Translation
Browse names shall use contained names (human-readable, scoped to parent). The server shall internally translate browse paths to tag_name references for MXAccess operations.
### Acceptance Criteria
- A variable node browsed as `ZB/DEV/TestArea/TestMachine_001/DelmiaReceiver/DownloadPath` correctly translates to MXAccess reference `DelmiaReceiver_001.DownloadPath`.
- Translation uses the `tag_name` stored on the parent object node, not the browse path.
- No runtime path parsing — the mapping is baked into each node at build time.
### Details
- Each variable node stores its `full_tag_reference` (e.g., `DelmiaReceiver_001.DownloadPath`) at address-space build time. Read/write operations use this stored reference directly.
---
## OPC-005: Data Type Mapping
Variable nodes shall use OPC UA data types mapped from Galaxy mx_data_type values per the mapping in `gr/data_type_mapping.md`.
### Acceptance Criteria
- Every `mx_data_type` value in the mapping table produces the correct OPC UA DataType NodeId on the variable node.
- Unknown/unmapped `mx_data_type` values default to String (i=12).
- ElapsedTime (type 7) maps to Double representing seconds.
### Details
- Full mapping table in `gr/data_type_mapping.md`.
- DateTime conversion: Galaxy may store local time; convert to UTC for OPC UA.
- LocalizedText (type 15): use empty locale string with the text value.
---
## OPC-006: Array Support
Attributes marked as arrays shall have ValueRank=1 and ArrayDimensions set to the attribute's array_dimension value.
### Acceptance Criteria
- `is_array = 1` produces ValueRank = 1 (OneDimension) and ArrayDimensions = `[array_dimension]`.
- `is_array = 0` produces ValueRank = -1 (Scalar) and no ArrayDimensions.
- MXAccess reference for array attributes uses `tag_name.attribute[]` (whole array) format.
### Details
- Individual array element access (`tag_name.attribute[n]`) is not required for initial implementation. Whole-array read/write only.
- If `array_dimension` is null or 0 when `is_array = 1`, log a Warning and default to ArrayDimensions = [0] (variable-length).
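
The array rules can be expressed as a pure function, independent of the OPC UA stack. `ArrayShape` and `ArrayMapping` are hypothetical names. The `warn` callback stands in for the logger, and treating a negative `array_dimension` the same as null or 0 is an assumption.

```csharp
using System;

public sealed class ArrayShape
{
    public int ValueRank;          // 1 = OneDimension, -1 = Scalar
    public uint[] ArrayDimensions; // null for scalars
}

public static class ArrayMapping
{
    public static ArrayShape ShapeFor(bool isArray, int? arrayDimension, Action<string> warn)
    {
        if (!isArray)
            return new ArrayShape { ValueRank = -1, ArrayDimensions = null };

        uint length = 0; // 0 means variable-length
        if (arrayDimension.HasValue && arrayDimension.Value > 0)
            length = (uint)arrayDimension.Value;
        else
            warn("is_array = 1 but array_dimension is missing or not positive; using [0]");

        return new ArrayShape { ValueRank = 1, ArrayDimensions = new[] { length } };
    }

    // Whole-array MXAccess reference uses the trailing [] form.
    public static string MxReference(string tagName, string attributeName, bool isArray)
    {
        return tagName + "." + attributeName + (isArray ? "[]" : "");
    }
}
```
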
---
## OPC-007: Read Operations
The server shall fulfill OPC UA Read requests by reading the corresponding tag value from MXAccess using the tag_name.AttributeName reference.
### Acceptance Criteria
- OPC UA Read request for a variable node results in a read via MXAccess using the node's stored `full_tag_reference`.
- Returned value is converted from the COM variant to the OPC UA data type specified on the node.
- OPC UA StatusCode reflects MXAccess quality using the same ranges as OPC-009: 192 and above maps to Good, 64-191 to Uncertain, and 0-63 to Bad.
- If MXAccess is not connected, return StatusCode = Bad_NotConnected.
- Read timeout: configurable, default 5 seconds. On timeout, return Bad_Timeout.
### Details
- Prefer cached subscription-delivered values over on-demand reads to reduce COM round-trips.
- If no subscription is active for the tag, perform an on-demand read (AddItem, AdviseSupervisory, wait for first OnDataChange, then UnAdvise/RemoveItem).
- Concurrency: semaphore-limited to configurable max (default 10) concurrent MXAccess operations.
---
## OPC-008: Write Operations
The server shall fulfill OPC UA Write requests by writing to the corresponding tag via MXAccess.
### Acceptance Criteria
- OPC UA Write request results in an MXAccess `Write()` call with completion confirmed via `OnWriteComplete()` callback.
- Write timeout: configurable, default 5 seconds. On timeout, log Warning and return Bad_Timeout.
- MXSTATUS_PROXY with `success = 0` causes the OPC UA write to return Bad_InternalError with the detail message.
- MXAccess errors 1008 (no permission), 1012 (secured write), 1013 (verified write) return Bad_UserAccessDenied.
- Write to a non-existent tag returns Bad_NodeIdUnknown.
- The server shall attempt to convert the written value to the expected Galaxy data type before passing to Write().
### Details
- Write uses security classification -1 (no security). Galaxy runtime handles security enforcement.
- Write sequence: uses existing subscription handle if available, otherwise AddItem + AdviseSupervisory + Write + await OnWriteComplete + cleanup.
- Concurrent write limit: same semaphore as reads (configurable, default 10).
---
## OPC-009: Subscriptions
The server shall support OPC UA subscriptions by mapping them to MXAccess advisory subscriptions and forwarding data change notifications.
### Acceptance Criteria
- OPC UA CreateMonitoredItems results in MXAccess `AdviseSupervisory()` subscriptions for the requested tags.
- Data changes from `OnDataChange` callback are forwarded as OPC UA notifications to all subscribed clients.
- Shared subscriptions: if two OPC UA clients subscribe to the same tag, only one MXAccess subscription exists (ref-counted).
- Last subscriber unsubscribing triggers UnAdvise/RemoveItem on the MXAccess side.
- After MXAccess reconnect, all active MXAccess subscriptions are re-established automatically.
### Details
- Publishing interval from the OPC UA subscription request is honored on the OPC UA side; MXAccess delivers changes as fast as it receives them.
- OPC UA quality mapping from MXAccess quality integers: 192+ = Good, 64-191 = Uncertain, 0-63 = Bad.
- OnDataChange with MXSTATUS_PROXY failure: deliver notification with Bad quality to subscribed clients.
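
The quality ranges and the ref-counted sharing rule can be sketched as follows. `QualityMapper` and `SharedSubscriptions` are hypothetical names, and the two delegates stand in for the MXAccess subscribe (AddItem + AdviseSupervisory) and unsubscribe (UnAdvise + RemoveItem) sequences.

```csharp
using System;
using System.Collections.Generic;

public enum QualityCategory { Bad, Uncertain, Good }

public static class QualityMapper
{
    // 192+ = Good, 64-191 = Uncertain, everything below 64 = Bad.
    public static QualityCategory FromMxQuality(int quality)
    {
        if (quality >= 192) return QualityCategory.Good;
        if (quality >= 64) return QualityCategory.Uncertain;
        return QualityCategory.Bad;
    }
}

public sealed class SharedSubscriptions
{
    private readonly Dictionary<string, int> _refCounts = new Dictionary<string, int>();
    private readonly object _lock = new object();
    private readonly Action<string> _advise;
    private readonly Action<string> _unadvise;

    public SharedSubscriptions(Action<string> advise, Action<string> unadvise)
    {
        _advise = advise;
        _unadvise = unadvise;
    }

    // First subscriber for an address triggers the MXAccess subscription.
    public void Subscribe(string address)
    {
        lock (_lock)
        {
            int count;
            _refCounts.TryGetValue(address, out count); // count stays 0 if absent
            if (count == 0) _advise(address);
            _refCounts[address] = count + 1;
        }
    }

    // Last subscriber leaving triggers the MXAccess teardown.
    public void Unsubscribe(string address)
    {
        lock (_lock)
        {
            int count;
            if (!_refCounts.TryGetValue(address, out count)) return;
            if (count <= 1)
            {
                _refCounts.Remove(address);
                _unadvise(address);
            }
            else
            {
                _refCounts[address] = count - 1;
            }
        }
    }
}
```
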
---
## OPC-010: Address Space Rebuild
When a Galaxy deployment change is detected, the server shall rebuild the address space without dropping existing OPC UA client connections where possible.
### Acceptance Criteria
- When Galaxy Repository detects a deployment change, the OPC UA address space is rebuilt.
- Existing OPC UA client sessions are preserved — clients stay connected.
- Subscriptions for tags that still exist after rebuild continue to work.
- Subscriptions for tags that no longer exist receive a Bad_NodeIdUnknown status notification.
- Rebuild is logged at Information level with timing (duration).
### Details
- Rebuild is a full replace, not an incremental diff. Re-query hierarchy and attributes, build new tree, swap atomically.
- During rebuild, reads/writes against the old address space may fail briefly. This is acceptable.
- New MXAccess subscriptions for new tags are established; removed tags are unsubscribed.
---
## OPC-011: Server Diagnostics Node
The server shall expose a ServerStatus node under the standard OPC UA Server object with ServerState, CurrentTime, and StartTime. This is required by the OPC UA specification for compliant servers.
### Acceptance Criteria
- ServerState reports Running during normal operation.
- CurrentTime returns the server's current UTC time.
- StartTime returns the UTC time when the service started.
---
## OPC-012: Namespace Configuration
The server shall register a namespace URI at namespace index 1. All application-specific NodeIds shall use this namespace.
### Acceptance Criteria
- Namespace URI: `urn:ZB:LmxOpcUa` (Galaxy name is configurable).
- All object and variable NodeIds created from Galaxy data use namespace index 1.
- Standard OPC UA nodes remain in namespace 0.
---
## OPC-013: Session Management
The server shall support multiple concurrent OPC UA client sessions.
### Acceptance Criteria
- Maximum concurrent sessions: configurable, default 100.
- Session timeout: configurable, default 30 minutes of inactivity.
- Expired sessions are cleaned up and their subscriptions removed.
- Session count is reported to the status dashboard.

@@ -0,0 +1,117 @@
# Service Host — Component Requirements
Parent: [HLR-006](HighLevelReqs.md#hlr-006-windows-service-hosting), [HLR-007](HighLevelReqs.md#hlr-007-logging)
## SVC-001: TopShelf Hosting
The application shall use TopShelf for Windows service lifecycle (install, uninstall, start, stop) and interactive console mode for development.
### Acceptance Criteria
- TopShelf HostFactory configures the service with name `LmxOpcUa`, display name `LMX OPC UA Server`.
- Service installs via command line: `ZB.MOM.WW.LmxOpcUa.Host.exe install`.
- Service uninstalls via: `ZB.MOM.WW.LmxOpcUa.Host.exe uninstall`.
- Service runs as LocalSystem account (needed for MXAccess COM access and Windows Auth to SQL Server).
- Interactive console mode (exe with no args) works for development/debugging.
- `StartAutomatically` is set for Windows service registration.
### Details
- Platform target: x86 (32-bit) — required for MXAccess COM interop.
- Service description: "OPC UA server exposing System Platform Galaxy tags via MXAccess."
---
## SVC-002: Serilog Logging
The application shall configure Serilog with a rolling daily file sink and console sink, with log files retained for a configurable number of days (default 31).
### Acceptance Criteria
- Console sink active (for interactive/debug mode).
- Rolling daily file sink writing to `logs/lmxopcua-YYYYMMDD.log`.
- Retained file count: configurable, default 31 (one file per day, so 31 days of logs).
- Minimum log level: configurable, default Information.
- Log file path: configurable, default `logs/lmxopcua-.log` (Serilog inserts the date before the extension when rolling daily).
- Serilog is initialized before any other component (first thing in Main).
- `Log.CloseAndFlush()` called in finally block on exit.
### Details
- Structured logging with Serilog message templates (not string.Format).
- Log output includes timestamp, level, source context, message, and exception.
- Fatal exceptions are caught at the top level and logged before exit.
---
## SVC-003: Configuration
The application shall load configuration from appsettings.json with support for environment-specific overrides (appsettings.*.json) and environment variables.
### Acceptance Criteria
- `appsettings.json` is the primary configuration file.
- Environment-specific overrides via `appsettings.{environment}.json`.
- Configuration sections: `OpcUa`, `MxAccess`, `GalaxyRepository`, `Dashboard`.
- Missing optional configuration keys use documented defaults (service does not crash).
- Invalid configuration (e.g., port = -1) is detected at startup with a clear error message.
### Details
- Config is loaded once at startup. No hot-reload (service restart required for config changes). This is appropriate for an industrial service.
- All configurable values and their defaults are documented in `appsettings.json`.
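
Startup validation can be sketched as a function that collects every problem instead of stopping at the first. `ConfigValidator` is a hypothetical name. The keys are those named in this document, and the specific range checks are assumptions about what counts as invalid.

```csharp
using System.Collections.Generic;

public static class ConfigValidator
{
    // Returns one message per invalid setting; an empty list means valid.
    public static List<string> Validate(
        int opcUaPort, int dashboardPort,
        int readTimeoutSeconds, int writeTimeoutSeconds)
    {
        var errors = new List<string>();
        CheckPort(errors, "OpcUa:Port", opcUaPort);
        CheckPort(errors, "Dashboard:Port", dashboardPort);
        CheckPositive(errors, "MxAccess:ReadTimeoutSeconds", readTimeoutSeconds);
        CheckPositive(errors, "MxAccess:WriteTimeoutSeconds", writeTimeoutSeconds);
        return errors;
    }

    private static void CheckPort(List<string> errors, string key, int value)
    {
        if (value < 1 || value > 65535)
            errors.Add(key + " must be between 1 and 65535 (was " + value + ")");
    }

    private static void CheckPositive(List<string> errors, string key, int value)
    {
        if (value <= 0)
            errors.Add(key + " must be greater than 0 (was " + value + ")");
    }
}
```
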
---
## SVC-004: Graceful Shutdown
On service stop, the application shall gracefully shut down all components and flush logs before exiting.
### Acceptance Criteria
- TopShelf WhenStopped triggers orderly shutdown.
- Shutdown sequence: (1) stop change detection polling, (2) stop OPC UA server (stop accepting new sessions, complete pending operations), (3) disconnect MXAccess (cleanup all COM objects), (4) stop status dashboard HTTP listener, (5) flush Serilog.
- Shutdown completes within 30 seconds (Windows SCM timeout).
- All IDisposable components are disposed in reverse-creation order.
### Details
- `CancellationTokenSource` signals all background loops (monitor, change detection, HTTP listener) to stop.
- Log "Service shutdown complete" at Information level as the final log entry before flush.
---
## SVC-005: Startup Sequence
The service shall start components in a defined order, with failure handling at each step.
### Acceptance Criteria
- Startup sequence:
  1. Load configuration
  2. Initialize Serilog
  3. Start STA thread
  4. Connect to MXAccess
  5. Query Galaxy Repository for initial build
  6. Build OPC UA address space
  7. Start OPC UA server listener
  8. Start change detection polling
  9. Start status dashboard HTTP listener
- Failure in steps 1-4 prevents startup (service fails to start).
- Failure in steps 5-9 logs Error but allows the service to run in degraded mode.
### Details
- Degraded mode means the service is running but may have an empty address space (waiting for Galaxy DB) or no dashboard (port conflict). MXAccess connection is the minimum required for the service to be useful.
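
The split between fatal and degraded-mode failures can be sketched as an ordered list of steps, each flagged critical or not. `StartupStep` and `StartupRunner` are hypothetical names, and the `logError` callback stands in for the Serilog call.

```csharp
using System;
using System.Collections.Generic;

public sealed class StartupStep
{
    public string Name;
    public bool Critical; // true for steps 1-4 of the startup sequence
    public Action Run;
}

public static class StartupRunner
{
    // Runs steps in order. A critical failure is logged and rethrown, which
    // aborts startup. A non-critical failure is logged and recorded, and the
    // remaining steps still run. Returns the names of the failed
    // non-critical steps (empty when fully healthy).
    public static List<string> Run(IEnumerable<StartupStep> steps, Action<string, Exception> logError)
    {
        var degraded = new List<string>();
        foreach (var step in steps)
        {
            try
            {
                step.Run();
            }
            catch (Exception ex)
            {
                logError(step.Name, ex);
                if (step.Critical) throw; // service fails to start
                degraded.Add(step.Name);
            }
        }
        return degraded;
    }
}
```
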
---
## SVC-006: Unhandled Exception Handling
The service shall handle unexpected crashes gracefully.
### Acceptance Criteria
- Register `AppDomain.CurrentDomain.UnhandledException` handler that logs Fatal before the process terminates.
- TopShelf service recovery is configured: restart on failure with 60-second delay.
- Fatal-level log entry includes the full exception details.

@@ -0,0 +1,157 @@
# Status Dashboard — Component Requirements
Parent: [HLR-009](HighLevelReqs.md#hlr-009-status-dashboard)
Reference: LmxProxy Status Dashboard (see `dashboard.JPG` in project root).
## DASH-001: Embedded HTTP Endpoint
The service shall host a lightweight HTTP listener on a configurable port serving a self-contained HTML status dashboard page (no external dependencies).
### Acceptance Criteria
- Uses `System.Net.HttpListener` on a configurable port (`Dashboard:Port`, default 8081).
- Routes:
  - `GET /` → HTML dashboard
  - `GET /api/status` → JSON status report
  - `GET /api/health` → 200 OK if healthy, 503 if unhealthy
- Only GET requests accepted; other methods return 405.
- Unknown paths return 404.
- All responses include `Cache-Control: no-cache, no-store, must-revalidate` headers.
- Dashboard can be disabled via config (`Dashboard:Enabled`, default true).
### Details
- HTTP prefix: `http://+:{port}/` to bind to all interfaces.
- If HttpListener fails to start (port conflict, missing URL reservation), log Error and continue service startup without the dashboard.
- HTML page is self-contained: inline CSS, no external resources (no CDN, no JavaScript frameworks).
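
The routing rules can be sketched as a pure function that picks the HTTP status code, kept separate from `HttpListener` so it can be unit-tested. `DashboardRoutes` is a hypothetical name. Checking the method before the path (so a POST to an unknown path yields 405 rather than 404) is an assumption, as is representing health as a single boolean.

```csharp
using System;

public static class DashboardRoutes
{
    // Returns the HTTP status code for a request to the dashboard listener.
    public static int StatusFor(string httpMethod, string path, bool healthy)
    {
        if (!string.Equals(httpMethod, "GET", StringComparison.Ordinal))
            return 405; // only GET is accepted

        switch (path)
        {
            case "/":
            case "/api/status":
                return 200;
            case "/api/health":
                return healthy ? 200 : 503;
            default:
                return 404; // unknown path
        }
    }
}
```
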
---
## DASH-002: Connection Panel
The dashboard shall display a Connection panel showing MXAccess connection state.
### Acceptance Criteria
- Shows: **Connected** (True/False), **State** (Connected/Disconnected/Reconnecting/Error), **Connected Since** (UTC timestamp).
- Green left border when Connected, red when Disconnected/Error, yellow when Reconnecting.
- "Connected Since" shows "N/A" when not connected.
- Data sourced from MXAccess client's connection state properties.
### Details
- Timestamp format: `yyyy-MM-dd HH:mm:ss UTC`.
- Panel title: "Connection".
---
## DASH-003: Health Panel
The dashboard shall display a Health panel showing overall service health.
### Acceptance Criteria
- Three states: **Healthy** (green text), **Degraded** (yellow text), **Unhealthy** (red text).
- Includes a health message string explaining the status.
- Health rules:
  - Not connected to MXAccess → Unhealthy
  - Success rate < 50% with > 100 total operations → Degraded
  - Connected with acceptable success rate → Healthy
### Details
- Health message examples: "LmxOpcUa is healthy", "MXAccess client is not connected", "Average success rate is below 50%".
- Green left border for Healthy, yellow for Degraded, red for Unhealthy.
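
The three health rules reduce to a short function. `HealthEvaluator` is a hypothetical name, and the operation counts are assumed to be aggregated across all operation types.

```csharp
public enum HealthStatus { Healthy, Degraded, Unhealthy }

public static class HealthEvaluator
{
    public static HealthStatus Evaluate(bool connected, long totalOperations, long successfulOperations)
    {
        if (!connected)
            return HealthStatus.Unhealthy;

        // Success rate below 50%, counted only once more than 100 operations
        // exist. Integer arithmetic avoids floating-point edge cases.
        bool belowHalf = successfulOperations * 2 < totalOperations;
        if (totalOperations > 100 && belowHalf)
            return HealthStatus.Degraded;

        return HealthStatus.Healthy;
    }
}
```

The operation-count guard keeps a handful of early failures right after startup from flipping the service to Degraded.
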
---
## DASH-004: Subscriptions Panel
The dashboard shall display a Subscriptions panel showing subscription statistics.
### Acceptance Criteria
- Shows: **Clients** (connected OPC UA client count), **Tags** (total variable nodes in address space), **Active** (active MXAccess subscriptions), **Delivered** (cumulative data change notifications delivered).
- Values update on each dashboard refresh.
- Zero values shown as "0", not blank.
### Details
- "Tags" is the count of variable nodes, not object/folder nodes.
- "Active" is the count of distinct MXAccess item subscriptions (after ref-counting — the number of actual AdviseSupervisory calls, not the number of OPC UA monitored items).
- "Delivered" is a running counter since service start (not reset on reconnect).
---
## DASH-005: Operations Table
The dashboard shall display an operations metrics table showing performance statistics.
### Acceptance Criteria
- Table with columns: **Operation**, **Count**, **Success Rate**, **Avg (ms)**, **Min (ms)**, **Max (ms)**, **P95 (ms)**.
- Rows: Read, Write, Subscribe, Browse.
- Empty cells show em-dash ("—") when no data available (count = 0).
- Success rate displayed as percentage (e.g., "99.8%").
- Latency values rounded to 1 decimal place.
### Details
- Metrics sourced from the PerformanceMetrics component (1000-entry rolling buffer for percentile calculation).
- "Browse" row tracks OPC UA browse operations.
- "Subscribe" row tracks OPC UA CreateMonitoredItems operations.
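
Cell formatting for the operations table can be sketched as below. `MetricCells` is a hypothetical name, and the invariant culture is an assumption made so the decimal separator does not depend on the host's locale.

```csharp
using System.Globalization;

public static class MetricCells
{
    private const string NoData = "\u2014"; // em-dash shown when count = 0

    // Success rate as a percentage with one decimal place, e.g. "99.8%".
    public static string SuccessRate(long count, long successes)
    {
        if (count == 0) return NoData;
        double percent = 100.0 * successes / count;
        return percent.ToString("0.0", CultureInfo.InvariantCulture) + "%";
    }

    // Latency in milliseconds rounded to one decimal place.
    public static string Latency(long count, double milliseconds)
    {
        if (count == 0) return NoData;
        return milliseconds.ToString("0.0", CultureInfo.InvariantCulture);
    }
}
```
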
---
## DASH-006: Footer
The dashboard shall display a footer with last-updated time and service identification.
### Acceptance Criteria
- Format: "Last updated: {timestamp} UTC | Service: ZB.MOM.WW.LmxOpcUa.Host v{version}".
- Timestamp is the server-side UTC time when the HTML was generated.
- Version is read from the assembly version (`Assembly.GetExecutingAssembly().GetName().Version`).
---
## DASH-007: Auto-Refresh
The dashboard page shall auto-refresh to show current status without manual reload.
### Acceptance Criteria
- HTML page includes `<meta http-equiv="refresh" content="10">`, where the `content` value is the configured refresh interval (10 seconds by default).
- No JavaScript required for refresh (pure HTML meta-refresh).
- Refresh interval: configurable via `Dashboard:RefreshIntervalSeconds`, default 10 seconds.
---
## DASH-008: JSON Status API
The `/api/status` endpoint shall return a JSON object with all dashboard data for programmatic consumption.
### Acceptance Criteria
- Response Content-Type: `application/json`.
- JSON structure includes: connection state, health status, subscription statistics, and operation metrics.
- Same data as the HTML dashboard, structured for machine consumption.
- Suitable for integration with external monitoring tools.
---
## DASH-009: Galaxy Info Panel
The dashboard shall display a Galaxy Info panel showing Galaxy Repository state.
### Acceptance Criteria
- Shows: **Galaxy Name** (e.g., ZB), **DB Status** (Connected/Disconnected), **Last Deploy** (timestamp from `galaxy.time_of_last_deploy`), **Objects** (count), **Attributes** (count), **Last Rebuild** (timestamp of last address space rebuild).
- Provides visibility into the Galaxy Repository component's state independently of MXAccess connection status.
### Details
- "DB Status" reflects whether the most recent change detection poll succeeded.
- "Last Deploy" shows the raw `time_of_last_deploy` value from the Galaxy database.
- "Objects" and "Attributes" show counts from the most recent successful hierarchy/attribute query.