lmxopcua/docs/security.md
Joseph Doherty 5506b43ddc Doc refresh (task #204) — operational docs for multi-process multi-driver OtOpcUa
Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
  appsettings.json is bootstrap only (Node identity, Config DB connection string,
  transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
  UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
  the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
  draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
  each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.

- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
  ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
  ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
  full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
  from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
  ServiceLevelBase, ApplicationUri). Covered metrics
  (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
  on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
  FleetStatusPoller to RedundancyTab.razor.

- docs/security.md — preserved the transport-security section (still accurate) and
  added Phase 6.2 authorization. Four concerns now documented in one place:
  (1) transport security profiles, (2) OPC UA auth via LdapUserAuthenticator
  (note: task spec called this LdapAuthenticationProvider — actual class name is
  LdapUserAuthenticator in Server/Security/), (3) data-plane authorization via
  NodeAcl + PermissionTrie + AuthorizationGate — additive-only model per decision
  #129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy,
  NodePermissions bundle, PermissionProbeService in Admin for "probe this permission",
  (4) control-plane authorization via LdapGroupRoleMapping + AdminRole
  (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
  deliberately independent of data-plane ACLs per decision #150. Documented the
  OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time
  guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.

- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
  BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
  drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
  OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
  NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Pipe ACL
  denies-Admins detail + non-elevated shell requirement captured from feedback memory.
  Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
  wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
  #30 replaced it with the generic host's AddWindowsService wrapper (per the doc
  comment on OpcUaServerService). Reflected the actual state + flagged this divergence
  here so someone can update CLAUDE.md separately.

- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
  health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
  pointer that preserves git-blame continuity + avoids broken links from other docs
  that reference it.

Class references verified by reading:
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
      ApplyLeaseRegistry, RedundancyStatePublisher}.cs
  src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
      PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
  src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
  src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
      Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
  src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
  src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
  src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
      LdapGroupRoleMapping}.cs
  src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:34:25 -04:00


Security

OtOpcUa has four independent security concerns. This document covers all four:

  1. Transport security: OPC UA secure channel (signing, encryption, X.509 trust).
  2. OPC UA authentication: Anonymous / UserName / X.509 session identities; UserName tokens are authenticated by LDAP bind.
  3. Data-plane authorization: who can browse, read, subscribe to, write, or acknowledge alarms on which nodes. Evaluated by PermissionTrie against the Config DB NodeAcl tree.
  4. Control-plane authorization: who can view or edit fleet configuration in the Admin UI. Gated by the AdminRole (ConfigViewer / ConfigEditor / FleetAdmin) claim from LdapGroupRoleMapping.

Transport security and OPC UA authentication are per-node concerns configured in the Server's bootstrap appsettings.json. Data-plane ACLs and Admin role grants live in the Config DB.


Transport Security

Overview

The OtOpcUa Server supports configurable OPC UA transport security profiles that control how data is protected on the wire between OPC UA clients and the server.

There are two distinct layers of security in OPC UA:

  • Transport security -- secures the communication channel itself using TLS-style certificate exchange, message signing, and encryption. This is what the OpcUaServer:SecurityProfile setting controls.
  • UserName token encryption -- protects user credentials (username/password) sent during session activation. The OPC UA stack encrypts UserName tokens using the server's application certificate regardless of the transport security mode. UserName authentication therefore works on None endpoints too — the credentials themselves are always encrypted. A secure transport profile adds protection against message-level tampering and eavesdropping of data payloads.

Supported security profiles

The server supports seven transport security profiles:

| Profile name | Security policy | Message security mode | Description |
|---|---|---|---|
| None | None | None | No signing or encryption. Suitable for development and isolated networks only. |
| Basic256Sha256-Sign | Basic256Sha256 | Sign | Messages are signed but not encrypted. Protects against tampering, but data is visible on the wire. |
| Basic256Sha256-SignAndEncrypt | Basic256Sha256 | SignAndEncrypt | Messages are both signed and encrypted. Full protection against tampering and eavesdropping. |
| Aes128_Sha256_RsaOaep-Sign | Aes128_Sha256_RsaOaep | Sign | Modern policy with SHA-256 signatures. Messages are signed but not encrypted. |
| Aes128_Sha256_RsaOaep-SignAndEncrypt | Aes128_Sha256_RsaOaep | SignAndEncrypt | Modern policy with SHA-256 signatures and AES-128 encryption. Recommended for production. |
| Aes256_Sha256_RsaPss-Sign | Aes256_Sha256_RsaPss | Sign | Strongest policy, with RSA-PSS signatures. Messages are signed but not encrypted. |
| Aes256_Sha256_RsaPss-SignAndEncrypt | Aes256_Sha256_RsaPss | SignAndEncrypt | Strongest policy, with RSA-PSS signatures and AES-256 encryption. Recommended for high-security deployments. |

The server exposes a separate endpoint for each configured profile, and clients select the one they prefer during connection.
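Every profile name in the table except None has the shape Policy-Mode. The following sketch shows how such a string can be split and validated. SecurityProfileParser and MessageSecurityMode are hypothetical names used for illustration only. They are not types confirmed in the repo, and the real Server may resolve profiles differently.

```csharp
using System;
using System.Collections.Generic;

public enum MessageSecurityMode { None, Sign, SignAndEncrypt }

// Hypothetical helper: splits a profile name from the table, such as
// "Basic256Sha256-SignAndEncrypt", into its security policy and message mode.
public static class SecurityProfileParser
{
    // Policy names contain '_' but never '-', so the last '-' separates policy from mode.
    private static readonly HashSet<string> KnownPolicies = new()
    {
        "Basic256Sha256", "Aes128_Sha256_RsaOaep", "Aes256_Sha256_RsaPss",
    };

    public static (string Policy, MessageSecurityMode Mode) Parse(string profile)
    {
        if (profile == "None")
            return ("None", MessageSecurityMode.None);

        int dash = profile.LastIndexOf('-');
        string policy = dash > 0 ? profile[..dash] : profile;
        string modeText = dash > 0 ? profile[(dash + 1)..] : "";

        if (!KnownPolicies.Contains(policy))
            throw new ArgumentException($"Unknown security policy in profile '{profile}'.", nameof(profile));

        MessageSecurityMode mode = modeText switch
        {
            "Sign" => MessageSecurityMode.Sign,
            "SignAndEncrypt" => MessageSecurityMode.SignAndEncrypt,
            _ => throw new ArgumentException($"Unknown message security mode in profile '{profile}'.", nameof(profile)),
        };
        return (policy, mode);
    }
}
```

The parser rejects any policy outside the three in the table. A typo in SecurityProfile therefore fails loudly rather than silently producing an unexpected endpoint.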

Configuration

Transport security is configured in the OpcUaServer section of the Server process's bootstrap appsettings.json:

{
  "OpcUaServer": {
    "EndpointUrl": "opc.tcp://0.0.0.0:4840/OtOpcUa",
    "ApplicationName": "OtOpcUa Server",
    "ApplicationUri": "urn:node-a:OtOpcUa",
    "PkiStoreRoot": "C:/ProgramData/OtOpcUa/pki",
    "AutoAcceptUntrustedClientCertificates": false,
    "SecurityProfile": "Basic256Sha256-SignAndEncrypt"
  }
}

The server certificate is auto-generated on first start if none exists in PkiStoreRoot/own/. It is generated even for None-only deployments, because UserName token encryption depends on it.

PKI directory layout

{PkiStoreRoot}/
  own/        Server's own application certificate and private key
  issuer/     CA certificates that issued trusted client certificates
  trusted/    Explicitly trusted client (peer) certificates
  rejected/   Certificates that were presented but not trusted

Certificate trust flow

When a client connects using a secure profile (Sign or SignAndEncrypt), the following trust evaluation occurs:

  1. The client presents its application certificate during the secure channel handshake.
  2. The server checks whether the certificate exists in the trusted/ store.
  3. If found, the connection proceeds.
  4. If not found and AutoAcceptUntrustedClientCertificates is true, the certificate is automatically copied to trusted/ and the connection proceeds.
  5. If not found and AutoAcceptUntrustedClientCertificates is false, the certificate is copied to rejected/ and the connection is refused.
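The five steps above reduce to a small decision function. This sketch makes two assumptions. First, the stores are modelled as in-memory sets of certificate thumbprints. Second, ClientCertTrust and TrustDecision are hypothetical names. Real validation is done by the OPC UA stack and also involves chain building against issuer/ and further certificate checks, none of which is modelled here.

```csharp
using System.Collections.Generic;

public enum TrustDecision { Trusted, AutoAccepted, Rejected }

// Hypothetical model of the trust flow: trusted/ and rejected/ become thumbprint sets.
public sealed class ClientCertTrust
{
    public HashSet<string> Trusted { get; } = new();
    public HashSet<string> Rejected { get; } = new();
    public bool AutoAcceptUntrustedClientCertificates { get; init; }

    public TrustDecision Evaluate(string thumbprint)
    {
        // Steps 2-3: an explicitly trusted certificate is accepted as-is.
        if (Trusted.Contains(thumbprint))
            return TrustDecision.Trusted;

        // Step 4: auto-accept copies the certificate into trusted/ and lets the channel open.
        if (AutoAcceptUntrustedClientCertificates)
        {
            Trusted.Add(thumbprint);
            return TrustDecision.AutoAccepted;
        }

        // Step 5: otherwise the certificate is recorded in rejected/ and the channel is refused.
        Rejected.Add(thumbprint);
        return TrustDecision.Rejected;
    }
}
```

With auto-accept enabled, the first connection from an unknown client permanently adds it to trusted/. This is why the production hardening list below turns auto-accept off.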

The Admin UI's Certificates.razor page uses CertTrustService (a singleton that reads the Server's PkiStoreRoot from CertTrustOptions) to promote rejected client certificates to trusted, so operators do not have to copy files by hand.
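On disk, promoting a certificate means moving its file from rejected/ to trusted/. The sketch below shows only that file operation. CertPromotion.Promote is a hypothetical helper, not CertTrustService's actual API. The sketch also assumes that each certificate is stored as an individual file whose name the caller supplies.

```csharp
using System;
using System.IO;

// Hypothetical helper: move one certificate file from {PkiStoreRoot}/rejected/ to {PkiStoreRoot}/trusted/.
public static class CertPromotion
{
    public static string Promote(string pkiStoreRoot, string certFileName)
    {
        // Accept only a bare file name, so input such as "../own/server.der" cannot escape rejected/.
        if (certFileName != Path.GetFileName(certFileName))
            throw new ArgumentException("Expected a bare certificate file name.", nameof(certFileName));

        string source = Path.Combine(pkiStoreRoot, "rejected", certFileName);
        string trustedDir = Path.Combine(pkiStoreRoot, "trusted");
        string target = Path.Combine(trustedDir, certFileName);

        if (!File.Exists(source))
            throw new FileNotFoundException("Certificate not found in rejected/.", source);

        Directory.CreateDirectory(trustedDir);        // no-op if trusted/ already exists
        File.Move(source, target, overwrite: true);   // replace any older copy of the same file
        return target;
    }
}
```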

Production hardening

  • Set AutoAcceptUntrustedClientCertificates = false.
  • Drop None from the profile set.
  • Use the Admin UI to promote trusted client certs rather than the auto-accept fallback.
  • Periodically audit the rejected/ directory; an unexpected entry is often a misconfigured client or a probe attempt.

OPC UA Authentication

The Server accepts three OPC UA identity-token types:

| Token | Handler | Notes |
|---|---|---|
| Anonymous | IUserAuthenticator.AuthenticateAsync(username: "", password: "") | Refused in strict mode unless explicit anonymous grants exist; allowed in lax mode for backward compatibility. |
| UserName/Password | LdapUserAuthenticator (src/ZB.MOM.WW.OtOpcUa.Server/Security/LdapUserAuthenticator.cs) | LDAP bind + group lookup; the resolved LdapGroups flow into the session's identity bearer (ILdapGroupsBearer). |
| X.509 Certificate | Stack-level acceptance + role mapping via CN | X.509 identity carries AuthenticatedUser + read roles; finer-grained authorization happens through the data-plane ACLs. |

LDAP bind flow (LdapUserAuthenticator)

Program.cs in the Server registers the authenticator based on OpcUaServer:Ldap:

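// LDAP enabled: register the bind-based authenticator.
// LDAP disabled: register DenyAllUserAuthenticator, which refuses every credential check.
builder.Services.AddSingleton<IUserAuthenticator>(sp => ldapOptions.Enabled
    ? new LdapUserAuthenticator(ldapOptions, sp.GetRequiredService<ILogger<LdapUserAuthenticator>>())
    : new DenyAllUserAuthenticator());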

LdapUserAuthenticator:

  1. Refuses to bind over plain LDAP unless AllowInsecureLdap = true (dev/test only).
  2. Connects to Server:Port, using TLS when UseTls = true (port 636 for AD).
  3. Binds as the service account; searches SearchBase for UserNameAttribute = username.
  4. Rebinds as the resolved user DN with the supplied password (the actual credential check).
  5. Reads GroupAttribute (default memberOf) and strips the leading CN= so operators configure friendly group names in GroupToRole.
  6. Returns a UserAuthResult carrying the validated username + the set of LDAP groups. The set flows through to the session identity via ILdapGroupsBearer.LdapGroups.
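Step 5 and the GroupToRole table can be illustrated together. This sketch assumes that each memberOf value is a full DN whose first RDN is the group's CN. LdapGroupMapping, FriendlyGroupName, and RolesFor are hypothetical names. The case-insensitive match is also an assumption of the sketch rather than documented behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class LdapGroupMapping
{
    // "CN=OPCUA-Operators,OU=Groups,DC=corp,DC=example,DC=com" -> "OPCUA-Operators".
    // Naive split: an escaped comma inside the CN (RFC 4514 "\,") would cut the name short.
    public static string FriendlyGroupName(string dn)
    {
        const string prefix = "CN=";
        if (!dn.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return dn; // not CN-first; leave unchanged
        int comma = dn.IndexOf(',');
        return comma < 0 ? dn[prefix.Length..] : dn[prefix.Length..comma];
    }

    // Maps every memberOf value through GroupToRole; groups with no mapping are ignored.
    public static IReadOnlySet<string> RolesFor(
        IEnumerable<string> memberOf, IReadOnlyDictionary<string, string> groupToRole)
    {
        // Case-insensitive group lookup (assumption of this sketch).
        var lookup = new Dictionary<string, string>(groupToRole, StringComparer.OrdinalIgnoreCase);
        var roles = new HashSet<string>(StringComparer.Ordinal);
        foreach (string dn in memberOf)
        {
            if (lookup.TryGetValue(FriendlyGroupName(dn), out string? role))
                roles.Add(role);
        }
        return roles;
    }
}
```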

Configuration example (Active Directory production):

{
  "OpcUaServer": {
    "Ldap": {
      "Enabled": true,
      "Server": "dc01.corp.example.com",
      "Port": 636,
      "UseTls": true,
      "AllowInsecureLdap": false,
      "SearchBase": "DC=corp,DC=example,DC=com",
      "ServiceAccountDn": "CN=OtOpcUaSvc,OU=Service Accounts,DC=corp,DC=example,DC=com",
      "ServiceAccountPassword": "<from your secret store>",
      "GroupAttribute": "memberOf",
      "UserNameAttribute": "sAMAccountName",
      "GroupToRole": {
        "OPCUA-Operators": "WriteOperate",
        "OPCUA-Engineers": "WriteConfigure",
        "OPCUA-Tuners": "WriteTune",
        "OPCUA-AlarmAck": "AlarmAck"
      }
    }
  }
}

Setting UserNameAttribute to "sAMAccountName" is the critical AD override, because the default attribute, uid, is not populated on AD user entries. Use userPrincipalName instead if operators log in with the user@corp.example.com form. Nested group membership is not expanded, so assign users directly to the role-mapped groups or pre-flatten membership in AD.
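Whichever attribute is configured, the username the operator types ends up inside the step-3 search filter, so it has to be escaped. The following sketch shows RFC 4515 filter escaping. LdapFilter.ForUser is a hypothetical helper, and this document does not state how LdapUserAuthenticator actually builds its filter.

```csharp
using System.Text;

// Hypothetical helper: build "(attribute=value)" with the value escaped per RFC 4515,
// so input such as "*" or ")(uid=*" cannot widen or restructure the filter.
public static class LdapFilter
{
    public static string ForUser(string userNameAttribute, string username)
    {
        var escaped = new StringBuilder(username.Length);
        foreach (char c in username)
        {
            switch (c)
            {
                case '\\': escaped.Append(@"\5c"); break;
                case '*':  escaped.Append(@"\2a"); break;
                case '(':  escaped.Append(@"\28"); break;
                case ')':  escaped.Append(@"\29"); break;
                case '\0': escaped.Append(@"\00"); break;
                default:   escaped.Append(c); break;
            }
        }
        return $"({userNameAttribute}={escaped})";
    }
}
```

For example, LdapFilter.ForUser("sAMAccountName", "jdoe") yields (sAMAccountName=jdoe), while a wildcard in the username becomes the literal \2a instead of matching every account.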

The same options bind the Admin's LdapAuthService (cookie auth / login form) so operators authenticate with a single credential across both processes.


Data-Plane Authorization

Data-plane authorization is the check run on every OPC UA operation against an OtOpcUa endpoint: can this authenticated user Browse / Read / Subscribe / Write / HistoryRead / AckAlarm / Call on this specific node?

Per decision #129 the model is additive-only: there is no explicit Deny. Grants at every level of the hierarchy are unioned, and the absence of any matching grant means the operation is denied by default.

Hierarchy

ACLs are evaluated against the UNS path:

ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag

Each level can carry NodeAcl rows (src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/NodeAcl.cs) that grant a permission bundle to a set of LdapGroups.

Permission flags

using System; // FlagsAttribute

[Flags]
public enum NodePermissions : uint
{
    Browse            = 1 << 0,
    Read              = 1 << 1,
    Subscribe         = 1 << 2,
    HistoryRead       = 1 << 3,
    WriteOperate      = 1 << 4,
    WriteTune         = 1 << 5,
    WriteConfigure    = 1 << 6,
    AlarmRead         = 1 << 7,
    AlarmAcknowledge  = 1 << 8,
    AlarmConfirm      = 1 << 9,
    AlarmShelve       = 1 << 10,
    MethodCall        = 1 << 11,

    ReadOnly  = Browse | Read | Subscribe | HistoryRead | AlarmRead,
    Operator  = ReadOnly | WriteOperate | AlarmAcknowledge | AlarmConfirm,
    Engineer  = Operator | WriteTune | AlarmShelve,
    Admin     = Engineer | WriteConfigure | MethodCall,
}

The three Write tiers map to Galaxy's v1 SecurityClassification values: FreeAccess/Operate → WriteOperate, Tune → WriteTune, Configure → WriteConfigure. Tags classified SecuredWrite, VerifiedWrite, or ViewOnly remain read-only from OPC UA regardless of grant.
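The classification-to-tier mapping can be expressed as a lookup plus a flag test. In this sketch, the SecurityClassification enum and the WriteRules class are hypothetical, and NodePermissions is trimmed to the three write tiers (same bit values as above).

```csharp
using System;

// Trimmed copy of NodePermissions: write tiers only, same bit positions as the full enum.
[Flags]
public enum NodePermissions : uint
{
    WriteOperate   = 1 << 4,
    WriteTune      = 1 << 5,
    WriteConfigure = 1 << 6,
}

// Hypothetical enum carrying the Galaxy classification names.
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }

public static class WriteRules
{
    // The write tier a session must hold, or null when the tag is read-only over OPC UA.
    public static NodePermissions? RequiredForWrite(SecurityClassification c) => c switch
    {
        SecurityClassification.FreeAccess or SecurityClassification.Operate => NodePermissions.WriteOperate,
        SecurityClassification.Tune => NodePermissions.WriteTune,
        SecurityClassification.Configure => NodePermissions.WriteConfigure,
        _ => null, // SecuredWrite, VerifiedWrite, ViewOnly
    };

    // A write is allowed only if every bit of the required tier is present in the granted set.
    public static bool CanWrite(NodePermissions granted, SecurityClassification c) =>
        RequiredForWrite(c) is NodePermissions required && (granted & required) == required;
}
```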

Evaluator — PermissionTrie

src/ZB.MOM.WW.OtOpcUa.Core/Authorization/:

| Class | Role |
|---|---|
| PermissionTrie | Cluster-scoped trie; each node carries (GroupId → NodePermissions) grants. |
| PermissionTrieBuilder | Builds a trie from the current NodeAcl rows in one pass. |
| PermissionTrieCache | Per-cluster memoised trie; invalidated via AclChangeNotifier when the Admin publishes a draft that touches ACLs. |
| TriePermissionEvaluator | Implements IPermissionEvaluator.Authorize(session, operation, scope): walks from the root to the leaf for the supplied NodeScope, unions grants along the path, and compares the required permission to the union. |

NodeScope carries (ClusterId, NamespaceId, AreaId, LineId, EquipmentId, TagId); any suffix may be null — a tag-level ACL is more specific than an area-level ACL but both contribute via union.
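The walk-and-union evaluation described above can be sketched with a dictionary-based trie. This is an illustration only, not the PermissionTrie source, and it rests on four assumptions. PermissionTrieSketch is hypothetical. Scope IDs are strings. NodePermissions is trimmed to a few flags. Grants are keyed by LDAP group name.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NodePermissions : uint
{
    None = 0, Browse = 1 << 0, Read = 1 << 1, Subscribe = 1 << 2, WriteOperate = 1 << 4,
}

// Hypothetical scope: IDs from cluster down to tag; the first null ends the path.
public sealed record NodeScope(string ClusterId, string? NamespaceId = null, string? AreaId = null,
                               string? LineId = null, string? EquipmentId = null, string? TagId = null)
{
    public IEnumerable<string> Segments()
    {
        yield return ClusterId;
        foreach (string? id in new[] { NamespaceId, AreaId, LineId, EquipmentId, TagId })
        {
            if (id is null) yield break;
            yield return id;
        }
    }
}

public sealed class PermissionTrieSketch
{
    private sealed class Node
    {
        public Dictionary<string, Node> Children { get; } = new();
        public Dictionary<string, NodePermissions> Grants { get; } = new(StringComparer.OrdinalIgnoreCase);
    }

    private readonly Node _root = new();

    // One NodeAcl-like row: additive, so repeated grants for the same group OR together.
    public void Grant(NodeScope scope, string group, NodePermissions permissions)
    {
        Node node = _root;
        foreach (string segment in scope.Segments())
        {
            if (!node.Children.TryGetValue(segment, out Node? child))
            {
                child = new Node();
                node.Children.Add(segment, child);
            }
            node = child;
        }
        node.Grants[group] = node.Grants.GetValueOrDefault(group) | permissions;
    }

    // Union of every grant held by any of the session's groups along the root-to-leaf path.
    public NodePermissions Effective(NodeScope scope, IReadOnlyCollection<string> groups)
    {
        NodePermissions union = NodePermissions.None;
        Node node = _root;
        foreach (string segment in scope.Segments())
        {
            if (!node.Children.TryGetValue(segment, out Node? child))
                break; // no ACL rows deeper on this path
            node = child;
            foreach (string g in groups)
                if (node.Grants.TryGetValue(g, out NodePermissions p))
                    union |= p;
        }
        return union;
    }

    // Default deny: every required bit must appear somewhere in the union.
    public bool Authorize(NodeScope scope, IReadOnlyCollection<string> groups, NodePermissions required) =>
        (Effective(scope, groups) & required) == required;
}
```

An area-level Read grant and a tag-level WriteOperate grant therefore combine for a session in both groups, which is the "both contribute via union" behaviour described above.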

Dispatch gate — AuthorizationGate

src/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs bridges the OPC UA stack's ISystemContext.UserIdentity to the evaluator. DriverNodeManager holds exactly one reference to it and calls IsAllowed(identity, OpcUaOperation.*, NodeScope) on every Read, Write, HistoryRead, Browse, Subscribe, AckAlarm, Call path. A false return short-circuits the dispatch with BadUserAccessDenied.

Key properties:

  • Driver-agnostic. No driver-level code participates in authorization decisions. Drivers report SecurityClassification as metadata on tag discovery; everything else flows through AuthorizationGate.
  • Fail-open-during-transition. StrictMode = false (default during ACL rollouts) lets sessions without resolved LDAP groups proceed; flip Authorization:StrictMode = true in production once ACLs are populated.
  • Evaluator stays pure. TriePermissionEvaluator has no OPC UA stack dependency — it's tested directly from xUnit.
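The fail-open rule and the strict-mode switch fit in a few lines. This sketch assumes a hypothetical SessionIdentity record that carries the resolved LDAP groups, and an evaluator passed in as a delegate. The OpcUaOperation members listed here are also assumptions. In strict mode, sessions without groups still reach the evaluator, which denies them unless some grant applies (the anonymous-grant case from the authentication table).

```csharp
using System;
using System.Collections.Generic;

public enum OpcUaOperation { Browse, Read, Write, HistoryRead, Subscribe, AckAlarm, Call }

// Hypothetical identity: the LDAP groups resolved at session activation (empty if none).
public sealed record SessionIdentity(string UserName, IReadOnlyList<string> LdapGroups);

public sealed class AuthorizationGateSketch
{
    private readonly Func<SessionIdentity, OpcUaOperation, string, bool> _evaluate;

    public AuthorizationGateSketch(Func<SessionIdentity, OpcUaOperation, string, bool> evaluate)
        => _evaluate = evaluate;

    // Mirrors Authorization:StrictMode; false (lax) is the rollout default.
    public bool StrictMode { get; init; }

    public bool IsAllowed(SessionIdentity identity, OpcUaOperation operation, string nodePath)
    {
        // Lax mode: sessions with no resolved LDAP groups proceed without an ACL check.
        if (!StrictMode && identity.LdapGroups.Count == 0)
            return true;

        // Everything else goes through the pure evaluator.
        return _evaluate(identity, operation, nodePath);
    }
}
```

A false result from IsAllowed is the point at which the real DriverNodeManager returns BadUserAccessDenied.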

Probe-this-permission (Admin UI)

PermissionProbeService (src/ZB.MOM.WW.OtOpcUa.Admin/Services/PermissionProbeService.cs) lets an operator ask "if a user with groups X, Y, Z asked to do operation O on node N, would it succeed?" The answer is rendered in the AclsTab "Probe" dialog — same evaluator, same trie, so the Admin UI answer and the live Server answer cannot disagree.

Full model

See docs/v2/acl-design.md for the complete design: trie invalidation, flag semantics, per-path override rules, and the reasoning behind additive-only (no Deny).


Control-Plane Authorization

Control-plane authorization governs the Admin UI — who can view fleet config, edit drafts, publish generations, manage cluster nodes + credentials.

Per decision #150 control-plane roles are deliberately independent of data-plane ACLs. An operator who can read every OPC UA tag in production may not be allowed to edit cluster config; conversely a ConfigEditor may not have any data-plane grants at all.

Roles

src/ZB.MOM.WW.OtOpcUa.Admin/Services/AdminRoles.cs:

| Role | Capabilities |
|---|---|
| ConfigViewer | Read-only access to drafts, generations, audit log, fleet status. |
| ConfigEditor | ConfigViewer plus draft editing (UNS, equipment, tags, ACLs, driver instances, reservations, CSV imports). Cannot publish. |
| FleetAdmin | ConfigEditor plus publish, cluster/node CRUD, credential management, role-grant management. |

Policies registered in Admin Program.cs:

builder.Services.AddAuthorizationBuilder()
    .AddPolicy("CanEdit",    p => p.RequireRole(AdminRoles.ConfigEditor, AdminRoles.FleetAdmin))
    .AddPolicy("CanPublish", p => p.RequireRole(AdminRoles.FleetAdmin));

Razor pages and API endpoints gate with [Authorize(Policy = "CanEdit")] / "CanPublish"; nav-menu sections hide via <AuthorizeView>.

Role grant source

Admin reads LdapGroupRoleMapping rows from the Config DB (src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/LdapGroupRoleMapping.cs) — the same pattern as the data-plane NodeAcl but scoped to Admin roles + (optionally) cluster scope for multi-site fleets. The RoleGrants.razor page lets FleetAdmins edit these mappings without leaving the UI.
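Resolving a user's Admin roles from mapping rows, including the optional cluster scope, can be sketched as follows. The shape of the GroupRoleMapping record is an assumption, because this document does not list the real LdapGroupRoleMapping entity's columns. AdminRoles here is a local stand-in that reuses the role names above.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in reusing the role names from AdminRoles.cs.
public static class AdminRoles
{
    public const string ConfigViewer = "ConfigViewer";
    public const string ConfigEditor = "ConfigEditor";
    public const string FleetAdmin   = "FleetAdmin";
}

// Hypothetical shape of one mapping row; ClusterId == null means the grant is fleet-wide.
public sealed record GroupRoleMapping(string LdapGroup, string Role, string? ClusterId = null);

public static class RoleGrantResolver
{
    // Roles the user holds for one cluster: fleet-wide rows plus rows scoped to that cluster.
    public static HashSet<string> Resolve(
        IEnumerable<GroupRoleMapping> rows, IEnumerable<string> userGroups, string clusterId)
    {
        var groups = new HashSet<string>(userGroups, StringComparer.OrdinalIgnoreCase);
        var roles = new HashSet<string>(StringComparer.Ordinal);
        foreach (GroupRoleMapping row in rows)
        {
            bool inScope = row.ClusterId is null || row.ClusterId == clusterId;
            if (inScope && groups.Contains(row.LdapGroup))
                roles.Add(row.Role);
        }
        return roles;
    }

    // Same role sets as the CanEdit / CanPublish policies registered in Program.cs.
    public static bool CanEdit(IReadOnlySet<string> roles) =>
        roles.Contains(AdminRoles.ConfigEditor) || roles.Contains(AdminRoles.FleetAdmin);

    public static bool CanPublish(IReadOnlySet<string> roles) => roles.Contains(AdminRoles.FleetAdmin);
}
```

Note that nothing here consults NodeAcl. This reflects decision #150: Admin roles and data-plane grants are resolved from separate tables.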


OTOPCUA0001 Analyzer — Compile-Time Guard

Per-capability resilience (retry, timeout, circuit-breaker, bulkhead) is applied by CapabilityInvoker in src/ZB.MOM.WW.OtOpcUa.Core/Resilience/. A driver-capability call made outside the invoker bypasses resilience entirely, which in production shows up as inconsistent timeouts, missing retries, and unbounded blocking.

OTOPCUA0001 (Roslyn analyzer at src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs) fires as a compile-time warning when an async/Task-returning method on one of the seven guarded capability interfaces (IReadable, IWritable, ITagDiscovery, ISubscribable, IHostConnectivityProbe, IAlarmSource, IHistoryProvider) is invoked outside a lambda passed to CapabilityInvoker.ExecuteAsync / ExecuteWriteAsync / AlarmSurfaceInvoker.*. The analyzer walks up the syntax tree from the call site, finds any enclosing invoker invocation, and verifies the call lives transitively inside that invocation's anonymous-function argument — a sibling pattern (do the call, then invoke ExecuteAsync on something unrelated nearby) does not satisfy the rule.

Five xUnit-v3 + Shouldly tests at tests/ZB.MOM.WW.OtOpcUa.Analyzers.Tests cover the common fail/pass shapes + the sibling-pattern regression guard.

The rule is intentionally scoped to async surfaces — pure in-memory accessors like IHostConnectivityProbe.GetHostStatuses() return synchronously and do not require the invoker wrap.
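The three call shapes the analyzer distinguishes look like this. IReadable, CapabilityInvoker, and ConstantReadable below are hypothetical stand-ins so that the snippet compiles on its own; the real signatures are not shown in this document and may differ. Because these are stub types, OTOPCUA0001 would not actually fire on them. The comments describe how the analyzer treats the same shapes against the real interfaces.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real capability interface and invoker.
public interface IReadable
{
    Task<object?> ReadAsync(string tagRef, CancellationToken ct);
}

public sealed class CapabilityInvoker
{
    // A real invoker would run `call` through its retry/timeout/circuit-breaker/bulkhead pipeline.
    public Task<T> ExecuteAsync<T>(Func<CancellationToken, Task<T>> call, CancellationToken ct) => call(ct);
}

// Test double used in the usage below.
public sealed class ConstantReadable : IReadable
{
    private readonly object? _value;
    public ConstantReadable(object? value) => _value = value;
    public Task<object?> ReadAsync(string tagRef, CancellationToken ct) => Task.FromResult(_value);
}

public static class CallShapes
{
    // Reported: a direct capability call with no resilience wrap.
    public static Task<object?> Unwrapped(IReadable driver, CancellationToken ct) =>
        driver.ReadAsync("Line1.Speed", ct);

    // Accepted: the call lives inside the lambda passed to ExecuteAsync.
    public static Task<object?> Wrapped(CapabilityInvoker invoker, IReadable driver, CancellationToken ct) =>
        invoker.ExecuteAsync(token => driver.ReadAsync("Line1.Speed", token), ct);

    // Still reported (sibling pattern): ExecuteAsync runs nearby, but ReadAsync is outside its lambda.
    public static async Task<object?> Sibling(CapabilityInvoker invoker, IReadable driver, CancellationToken ct)
    {
        object? value = await driver.ReadAsync("Line1.Speed", ct);
        await invoker.ExecuteAsync(_ => Task.FromResult(0), ct);
        return value;
    }
}
```

All three methods return the same value at runtime. The difference is purely whether the read passes through the resilience pipeline, which is why a syntax-level check is needed to catch it.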


Audit Logging

  • Server: Serilog AUDIT: prefix on every authentication success/failure, certificate validation result, write access denial. Written alongside the regular rolling file sink.
  • Admin: AuditLogService writes ConfigAuditLog rows to the Config DB for every publish, rollback, cluster-node CRUD, credential rotation. Visible in the Audit page for operators with ConfigViewer or above.

Troubleshooting

Certificate trust failure

Check {PkiStoreRoot}/rejected/ for the client's certificate. Promote it from the Admin UI Certificates page, or copy the .der file to trusted/ manually.

LDAP users can connect but fail authorization

Verify (a) OpcUaServer:Ldap:GroupAttribute returns groups in the form CN=MyGroup,… (OtOpcUa strips the CN= for matching), (b) a NodeAcl grant exists at any level of the node's UNS path that unions to the required permission, (c) Authorization:StrictMode is correctly set for the deployment stage.

LDAP bind rejected as "insecure"

Set UseTls = true + Port = 636, or temporarily flip AllowInsecureLdap = true in dev. Production Active Directory increasingly refuses plain-LDAP bind under LDAP-signing enforcement.

AuthorizationGate denies every call after a publish

AclChangeNotifier invalidates the PermissionTrieCache on publish; a stuck cache is usually a missed notification. Restart the Server as a quick mitigation and file a bug — the design is to stay fresh without restarts.
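The intended refresh behaviour can be sketched as a per-cluster memo with explicit invalidation. TrieCacheSketch is hypothetical and only approximates what PermissionTrieCache and AclChangeNotifier are described as doing. It illustrates why a missed invalidation leaves the stale trie in place until the process restarts.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-cluster memo; TTrie stands in for the real PermissionTrie.
public sealed class TrieCacheSketch<TTrie> where TTrie : class
{
    private readonly ConcurrentDictionary<string, Lazy<TTrie>> _byCluster = new();
    private readonly Func<string, TTrie> _build;

    public TrieCacheSketch(Func<string, TTrie> build) => _build = build;

    // Builds the trie once per cluster; concurrent callers share the same Lazy instance.
    public TTrie Get(string clusterId) =>
        _byCluster.GetOrAdd(clusterId, id => new Lazy<TTrie>(() => _build(id))).Value;

    // Called on an ACL-touching publish: drop the memo so the next Get rebuilds from current rows.
    // If this call is missed, Get keeps returning the old trie, which is the failure mode above.
    public void Invalidate(string clusterId) => _byCluster.TryRemove(clusterId, out _);
}
```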