Commit Graph

17 Commits

Author SHA1 Message Date
Joseph Doherty
5506b43ddc Doc refresh (task #204) — operational docs for multi-process multi-driver OtOpcUa
Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
  appsettings.json is bootstrap only (Node identity, Config DB connection string,
  transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
  UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
  the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
  draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
  each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.

- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
  ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
  ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
  full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
  from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
  ServiceLevelBase, ApplicationUri). Covered metrics
  (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
  on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
  FleetStatusPoller to RedundancyTab.razor.

- docs/security.md — preserved the transport-security section (still accurate) and
  added Phase 6.2 authorization. Four concerns now documented in one place:
  (1) transport security profiles, (2) OPC UA auth via LdapUserAuthenticator
  (note: task spec called this LdapAuthenticationProvider — actual class name is
  LdapUserAuthenticator in Server/Security/), (3) data-plane authorization via
  NodeAcl + PermissionTrie + AuthorizationGate — additive-only model per decision
  #129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy,
  NodePermissions bundle, PermissionProbeService in Admin for "probe this permission",
  (4) control-plane authorization via LdapGroupRoleMapping + AdminRole
  (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
  deliberately independent of data-plane ACLs per decision #150. Documented the
  OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time
  guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.

- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
  BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
  drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
  OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
  NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Pipe ACL
  denies-Admins detail + non-elevated shell requirement captured from feedback memory.
  Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
  wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
  #30 replaced it with the generic host's AddWindowsService wrapper (per the doc
  comment on OpcUaServerService). Reflected the actual state + flagged this divergence
  here so someone can update CLAUDE.md separately.

- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
  health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
  pointer that preserves git-blame continuity + avoids broken links from other docs
  that reference it.

Class references verified by reading:
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
      ApplyLeaseRegistry, RedundancyStatePublisher}.cs
  src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
      PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
  src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
  src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
      Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
  src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
  src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
  src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
      LdapGroupRoleMapping}.cs
  src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:34:25 -04:00
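The additive-only data-plane model named in the security bullet of the commit above (grants accumulate down the ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy and nothing is ever subtracted) can be illustrated with a minimal sketch. This is not the project's PermissionTrie: the `Perm` bits, the `TrieNode` shape, and the method names are all hypothetical, chosen only to show the evaluation rule.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Perm(Flag):
    """Hypothetical permission bits; the real NodePermissions bundle may differ."""
    NONE = 0
    BROWSE = auto()
    READ = auto()
    WRITE = auto()


@dataclass
class TrieNode:
    # LDAP group name -> permissions granted at this level (assumed shape)
    grants: dict = field(default_factory=dict)
    # path segment -> child node
    children: dict = field(default_factory=dict)


class PermissionTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def grant(self, path: tuple, group: str, perms: Perm) -> None:
        """Attach a grant at one level of the hierarchy, creating nodes as needed."""
        node = self.root
        for segment in path:
            node = node.children.setdefault(segment, TrieNode())
        node.grants[group] = node.grants.get(group, Perm.NONE) | perms

    @staticmethod
    def _collect(node: TrieNode, groups: set) -> Perm:
        result = Perm.NONE
        for group in groups:
            result |= node.grants.get(group, Perm.NONE)
        return result

    def effective(self, path: tuple, groups: set) -> Perm:
        """Union every grant met on the way down; there is no deny entry."""
        node = self.root
        result = self._collect(node, groups)
        for segment in path:
            node = node.children.get(segment)
            if node is None:
                break  # nothing more specific is configured below this level
            result |= self._collect(node, groups)
        return result
```

In this sketch a cluster-level grant of `BROWSE | READ` plus a tag-level grant of `WRITE` yields all three permissions on that tag, while a sibling tag without its own grant keeps only the inherited pair.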
Joseph Doherty
3b2defd94f Phase 0 — mechanical rename ZB.MOM.WW.LmxOpcUa.* → ZB.MOM.WW.OtOpcUa.*
Renames all 11 projects (5 src + 6 tests), the .slnx solution file, all source-file namespaces, all axaml namespace references, and all v1 documentation references in CLAUDE.md and docs/*.md (excluding docs/v2/, which is already in OtOpcUa form). Also updates the TopShelf service registration name from "LmxOpcUa" to "OtOpcUa" per Phase 0 Task 0.6.

Preserves runtime identifiers per Phase 0 Out-of-Scope rules to avoid breaking v1/v2 client trust during coexistence:

- OPC UA `ApplicationUri` defaults (`urn:{GalaxyName}:LmxOpcUa`)
- server `EndpointPath` (`/LmxOpcUa`)
- `ServerName` default (feeds cert subject CN)
- `MxAccessConfiguration.ClientName` default (defensive: stays "LmxOpcUa" for MxAccess audit-trail consistency)
- client OPC UA identifiers (`ApplicationName = "LmxOpcUaClient"`, `ApplicationUri = "urn:localhost:LmxOpcUaClient"`, cert directory `%LocalAppData%\LmxOpcUaClient\pki\`)
- the `LmxOpcUaServer` class name (class rename is out of Phase 0 scope per the Task 0.5 sed pattern; it happens in Phase 1 alongside the `LmxNodeManager → GenericDriverNodeManager` Core extraction)

23 LmxOpcUa references are retained, all enumerated and justified in `docs/v2/implementation/exit-gate-phase-0.md`.

Build clean: 0 errors, 30 warnings (down from the baseline of 167). Tests strictly improve on baseline: 821 passing / 1 failing vs. baseline 820 / 2. Both baseline failures are pre-existing and unrelated to the rename; one is flaky and passed this run, the other still fails. `Client.UI.Tests`, `Historian.Aveva.Tests`, `Client.Shared.Tests`, and `IntegrationTests` all match baseline exactly. Exit gate compliance results are recorded in `docs/v2/implementation/exit-gate-phase-0.md`, with all 7 checks PASS or DEFERRED-to-PR-review (#7, service install verification, needs Windows service permissions on the reviewer's box).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:57:47 -04:00
Joseph Doherty
bc282b6788 Add Galaxy platform scope filter
Multi-node deployments can now restrict the OPC UA address space to only the objects hosted by the local platform. This reduces the memory footprint and the MXAccess subscription count from the full Galaxy (49 objects / 4206 attributes) down to the local subtree (3 objects / 386 attributes on the dev Galaxy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:39:11 -04:00
Joseph Doherty
c76ab8fdee Close all four stability-review 2026-04-13 findings
- A failed runtime probe subscription can no longer leave a phantom entry that Tick() flips to Stopped and fans out false BadOutOfService quality across a host's subtree.
- A silently failed dashboard bind no longer lets the service advertise a successful start while an operator-visible endpoint is dead.
- The seven sync-over-async sites in LmxNodeManager (rebuild probe sync, Read, Write, four HistoryRead overrides) can no longer park the OPC UA stack thread indefinitely on a hung backend.
- Alarm auto-subscribe + transferred-subscription restore no longer race shutdown as untracked fire-and-forget tasks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:48:07 -04:00
Joseph Doherty
0003984c1a Document the Galaxy runtime status feature across the architecture guides so operators and future maintainers can find probe machinery, config fields, dashboard panel, and HealthCheck Rule 2e without having to dig through runtimestatus.md or service_info.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:36:35 -04:00
Joseph Doherty
8f340553d9 Instrument the historian plugin with runtime query health counters and read-only cluster failover so operators can detect silent query degradation and keep serving history when a single cluster node goes down
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:08:32 -04:00
Joseph Doherty
517d92c76f Scope alarm tracking to selected templates and surface endpoint/security state on the dashboard so operators can deploy in large galaxies without drowning clients in irrelevant alarms or guessing what the server is advertising
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:48:57 -04:00
Joseph Doherty
9b42b61eb6 Extract historian into a runtime-loaded plugin so hosts without the Wonderware SDK can run with Historian.Enabled=false
The aahClientManaged SDK is now isolated in ZB.MOM.WW.LmxOpcUa.Historian.Aveva and loaded via HistorianPluginLoader from a Historian/ subfolder only when enabled, removing the SDK from Host's compile-time and deploy-time surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:16:07 -04:00
Joseph Doherty
6d47687573 Resolve DA, A&C, and security spec gaps with ServerCapabilities, alarm methods, and modern profiles
Add ServerCapabilities/OperationLimits node, enable diagnostics, add OnModifyMonitoredItemsComplete
override for DA compliance. Wire shelving, enable/disable, confirm, and addcomment handlers on
alarm conditions with LocalTime/Quality event fields for Part 9 compliance. Add Aes128/Aes256
security profiles, X.509 certificate authentication, and AUDIT-prefixed auth logging. Fix flaky
probe monitor test. Update docs for all changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:02:05 -04:00
Joseph Doherty
41f0e9ec4c Migrate historian from SQL to aahClientManaged SDK and resolve all OPC UA Part 11 gaps
Replace direct SQL queries against Historian Runtime database with the Wonderware
Historian managed SDK (ArchestrA.HistorianAccess). Add HistoryServerCapabilities node,
AggregateFunctions folder, continuation points, ReadAtTime interpolation, ReturnBounds,
ReadModified rejection, HistoricalDataConfiguration per node, historical event access,
and client-side StandardDeviation aggregate support. Remove screenshot tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:38:00 -04:00
Joseph Doherty
50b85d41bd Consolidate LDAP roles into OPC UA session roles with granular write permissions
Map LDAP groups to custom OPC UA role NodeIds on RoleBasedIdentity.GrantedRoleIds
during authentication, replacing the username-to-role side cache. Split ReadWrite
into WriteOperate/WriteTune/WriteConfigure so write access is gated per Galaxy
security classification. AnonymousCanWrite now behaves consistently regardless
of LDAP state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 01:50:16 -04:00
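The role consolidation in the commit above (LDAP groups become session roles, and write access is checked per security classification) might look roughly like the sketch below. It is illustrative only: the role names WriteOperate, WriteTune, WriteConfigure, ReadOnly, and AlarmAck come from the commit messages in this log, but the classification names, the group-to-role matching by identical name, and the rule that anonymous writes depend solely on `anonymous_can_write` are assumptions, and plain strings stand in for OPC UA role NodeIds.

```python
from typing import Iterable, Optional

# Roles named in the commit messages; treating LDAP group names as identical
# to role names is an assumption made for this sketch.
KNOWN_ROLES = frozenset(
    {"ReadOnly", "WriteOperate", "WriteTune", "WriteConfigure", "AlarmAck"}
)

# Hypothetical mapping from a Galaxy security classification to the role
# required to write an attribute with that classification.
ROLE_FOR_CLASSIFICATION = {
    "Operate": "WriteOperate",
    "Tune": "WriteTune",
    "Configure": "WriteConfigure",
}


def granted_roles(ldap_groups: Iterable[str]) -> frozenset:
    """Resolve roles once, at authentication time, and keep them on the session."""
    return frozenset(group for group in ldap_groups if group in KNOWN_ROLES)


def can_write(
    roles: Optional[frozenset],
    classification: str,
    anonymous_can_write: bool,
) -> bool:
    """Decide a write from session roles; `roles is None` marks an anonymous session."""
    if roles is None:
        # Assumption: anonymous writes depend only on the configured flag.
        return anonymous_can_write
    required = ROLE_FOR_CLASSIFICATION.get(classification)
    # Unknown classifications are refused rather than silently allowed.
    return required is not None and required in roles
```

Because the roles travel with the session, the write check needs no per-username lookup, which is the effect the commit attributes to dropping the username-to-role side cache.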
Joseph Doherty
d9463d6998 Remove static Users auth, use shared QualityMapper for historian, simplify LDAP permission checks
- Remove ConfigUserAuthenticationProvider and Users property — LDAP is the only auth mechanism
- Fix historian quality mapping to use existing QualityMapper (OPC DA quality bytes, not custom mapping)
- Add AppRoles constants, unify HasWritePermission/HasAlarmAckPermission into shared HasRole helper
- Hoist write permission check out of per-item loop, eliminate redundant _ldapRolesEnabled field
- Update docs (Configuration.md, Security.md, OpcUaServer.md, HistoricalDataAccess.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:23:20 -04:00
Joseph Doherty
74107ea95e Add LDAP authentication with role-based OPC UA permissions
Replace static user list with GLAuth LDAP authentication. Group
membership (ReadOnly, ReadWrite, AlarmAck) maps to granular OPC UA
permissions for write and alarm-ack operations. Anonymous can still
browse and read but not write.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:57:30 -04:00
Joseph Doherty
a55153d7d5 Add configurable non-transparent OPC UA server redundancy
Separates ApplicationUri from namespace identity so each instance in a
redundant pair has a unique server URI while sharing the same Galaxy
namespace. Exposes RedundancySupport, ServerUriArray, and dynamic
ServiceLevel through the standard OPC UA server object. ServiceLevel
is computed from role (Primary/Secondary) and runtime health (MXAccess
and DB connectivity). Adds CLI redundancy command, second deployed
service instance, and 31 new tests including paired-server integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:32:17 -04:00
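As a rough illustration of the ServiceLevel rule in the commit above (a value derived from the configured role and from MXAccess and DB health), consider the sketch below. The base values and penalties are invented for the example and are not the project's numbers; the only fixed constraint is that OPC UA ServiceLevel is a Byte, so the result is clamped to 0..255.

```python
# Hypothetical base levels per role; the real defaults are not stated in the log.
BASE_LEVEL = {"Primary": 200, "Secondary": 150}

# Hypothetical penalties applied when a dependency is unhealthy.
MXACCESS_DOWN_PENALTY = 100
DB_DOWN_PENALTY = 50


def service_level(role: str, mxaccess_ok: bool, db_ok: bool) -> int:
    """Combine role and runtime health into a ServiceLevel byte (0..255)."""
    level = BASE_LEVEL[role]
    if not mxaccess_ok:
        level -= MXACCESS_DOWN_PENALTY
    if not db_ok:
        level -= DB_DOWN_PENALTY
    # ServiceLevel is an OPC UA Byte, so keep the result in range.
    return max(0, min(255, level))
```

With these assumed numbers a healthy Secondary (150) outranks a Primary that has lost MXAccess (100), so a client comparing ServiceLevel across the pair would prefer the healthy server.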
Joseph Doherty
55173665b1 Add configurable transport security profiles and bind address
Adds Security section to appsettings.json with configurable OPC UA
transport profiles (None, Basic256Sha256-Sign, Basic256Sha256-SignAndEncrypt),
certificate policy settings, and a configurable BindAddress for the
OPC UA endpoint. Defaults preserve backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:59:43 -04:00
Joseph Doherty
bbd043e97b Add authentication and role-based write access control
Implements configurable user authentication (anonymous + username/password)
with pluggable credential provider (IUserAuthenticationProvider). Anonymous
writes can be disabled via AnonymousCanWrite setting while reads remain
open. Adds -U/-P flags to all CLI commands for authenticated sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:14:37 -04:00
Joseph Doherty
965e430f48 Add component-level documentation for all 14 server subsystems
Provides technical documentation covering OPC UA server, address space,
Galaxy repository, MXAccess bridge, data types, read/write, subscriptions,
alarms, historian, incremental sync, configuration, dashboard, service
hosting, and CLI tool. Updates README with component documentation table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:47:59 -04:00