Doc refresh (task #204) — operational docs for multi-process multi-driver OtOpcUa

Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
  appsettings.json is bootstrap only (Node identity, Config DB connection string,
  transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
  UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
  the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
  draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
  each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.

- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
  ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
  ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
  full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
  from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
  ServiceLevelBase, ApplicationUri). Covered metrics
  (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
  on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
  FleetStatusPoller to RedundancyTab.razor.

- docs/security.md — preserved the transport-security section (still accurate) and
  added Phase 6.2 authorization. Four concerns now documented in one place:
  (1) transport security profiles, (2) OPC UA auth via LdapUserAuthenticator
  (note: task spec called this LdapAuthenticationProvider — actual class name is
  LdapUserAuthenticator in Server/Security/), (3) data-plane authorization via
  NodeAcl + PermissionTrie + AuthorizationGate — additive-only model per decision
  #129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy,
  NodePermissions bundle, PermissionProbeService in Admin for "probe this permission",
  (4) control-plane authorization via LdapGroupRoleMapping + AdminRole
  (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
  deliberately independent of data-plane ACLs per decision #150. Documented the
  OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time
  guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.

- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
  BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
  drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
  OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
  NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Pipe ACL
  denies-Admins detail + non-elevated shell requirement captured from feedback memory.
  Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
  wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
  #30 replaced it with the generic host's AddWindowsService wrapper (per the doc
  comment on OpcUaServerService). Reflected the actual state + flagged this divergence
  here so someone can update CLAUDE.md separately.

- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
  health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
  pointer that preserves git-blame continuity + avoids broken links from other docs
  that reference it.

Class references verified by reading:
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
      ApplyLeaseRegistry, RedundancyStatePublisher}.cs
  src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
      PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
  src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
  src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
      Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
  src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
  src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
  src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
      LdapGroupRoleMapping}.cs
  src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-04-20 01:30:56 -04:00
parent 71339307fa
commit 5506b43ddc
5 changed files with 509 additions and 1236 deletions

View File

@@ -1,15 +1,28 @@
# Transport Security
# Security
## Overview
OtOpcUa has four independent security concerns. This document covers all four:
The LmxOpcUa server supports configurable transport security profiles that control how data is protected on the wire between OPC UA clients and the server.
1. **Transport security** — OPC UA secure channel (signing, encryption, X.509 trust).
2. **OPC UA authentication** — Anonymous / UserName / X.509 session identities; UserName tokens authenticated by LDAP bind.
3. **Data-plane authorization** — who can browse, read, subscribe, write, acknowledge alarms on which nodes. Evaluated by `PermissionTrie` against the Config DB `NodeAcl` tree.
4. **Control-plane authorization** — who can view or edit fleet configuration in the Admin UI. Gated by the `AdminRole` (`ConfigViewer` / `ConfigEditor` / `FleetAdmin`) claim from `LdapGroupRoleMapping`.
Transport security and OPC UA authentication are per-node concerns configured in the Server's bootstrap `appsettings.json`. Data-plane ACLs and Admin role grants live in the Config DB.
---
## Transport Security
### Overview
The OtOpcUa Server supports configurable OPC UA transport security profiles that control how data is protected on the wire between OPC UA clients and the server.
There are two distinct layers of security in OPC UA:
- **Transport security** -- secures the communication channel itself using TLS-style certificate exchange, message signing, and encryption. This is what the `Security` configuration section controls.
- **UserName token encryption** -- protects user credentials (username/password) sent during session activation. The OPC UA stack encrypts UserName tokens using the server's application certificate regardless of the transport security mode. This means UserName authentication works on `None` endpoints too — the credentials themselves are always encrypted. However, a secure transport profile adds protection against message-level tampering and eavesdropping of data payloads.
- **Transport security** -- secures the communication channel itself using TLS-style certificate exchange, message signing, and encryption. This is what the `OpcUaServer:SecurityProfile` setting controls.
- **UserName token encryption** -- protects user credentials (username/password) sent during session activation. The OPC UA stack encrypts UserName tokens using the server's application certificate regardless of the transport security mode. UserName authentication therefore works on `None` endpoints too — the credentials themselves are always encrypted. A secure transport profile adds protection against message-level tampering and eavesdropping of data payloads.
## Supported Security Profiles
### Supported security profiles
The server supports seven transport security profiles:
@@ -23,334 +36,88 @@ The server supports seven transport security profiles:
| `Aes256_Sha256_RsaPss-Sign` | Aes256_Sha256_RsaPss | Sign | Strongest profile with AES-256 and RSA-PSS signatures. |
| `Aes256_Sha256_RsaPss-SignAndEncrypt` | Aes256_Sha256_RsaPss | SignAndEncrypt | Strongest profile. Recommended for high-security deployments. |
Multiple profiles can be enabled simultaneously. The server exposes a separate endpoint for each configured profile, and clients select the one they prefer during connection.
The server exposes a separate endpoint for each configured profile, and clients select the one they prefer during connection.
If no valid profiles are configured (or all names are unrecognized), the server falls back to `None` with a warning in the log.
### Configuration
## Configuration
Transport security is configured in the `Security` section of `appsettings.json`:
Transport security is configured in the `OpcUaServer` section of the Server process's bootstrap `appsettings.json`:
```json
{
"Security": {
"Profiles": ["None"],
"AutoAcceptClientCertificates": true,
"RejectSHA1Certificates": true,
"MinimumCertificateKeySize": 2048,
"PkiRootPath": null,
"CertificateSubject": null
"OpcUaServer": {
"EndpointUrl": "opc.tcp://0.0.0.0:4840/OtOpcUa",
"ApplicationName": "OtOpcUa Server",
"ApplicationUri": "urn:node-a:OtOpcUa",
"PkiStoreRoot": "C:/ProgramData/OtOpcUa/pki",
"AutoAcceptUntrustedClientCertificates": false,
"SecurityProfile": "Basic256Sha256-SignAndEncrypt"
}
}
```
### Properties
The server certificate is auto-generated on first start if none exists in `PkiStoreRoot/own/`. Always generated even for `None`-only deployments because UserName token encryption depends on it.
| Property | Type | Default | Description |
|--------------------------------|------------|--------------------------------------------------|-------------|
| `Profiles` | `string[]` | `["None"]` | List of security profile names to expose as server endpoints. Valid values: `None`, `Basic256Sha256-Sign`, `Basic256Sha256-SignAndEncrypt`, `Aes128_Sha256_RsaOaep-Sign`, `Aes128_Sha256_RsaOaep-SignAndEncrypt`, `Aes256_Sha256_RsaPss-Sign`, `Aes256_Sha256_RsaPss-SignAndEncrypt`. Profile names are case-insensitive. Duplicates are ignored. |
| `AutoAcceptClientCertificates` | `bool` | `true` | When `true`, the server automatically trusts client certificates that are not already in the trusted store. Set to `false` in production for explicit trust management. |
| `RejectSHA1Certificates` | `bool` | `true` | When `true`, client certificates signed with SHA-1 are rejected. SHA-1 is considered cryptographically weak. |
| `MinimumCertificateKeySize` | `int` | `2048` | Minimum RSA key size (in bits) required for client certificates. Certificates with shorter keys are rejected. |
| `PkiRootPath` | `string?` | `null` (defaults to `%LOCALAPPDATA%\OPC Foundation\pki`) | Override for the PKI root directory where certificates are stored. When `null`, uses the OPC Foundation default location. |
| `CertificateSubject` | `string?` | `null` (defaults to `CN={ServerName}, O=ZB MOM, DC=localhost`) | Override for the server certificate subject name. When `null`, the subject is derived from the configured `ServerName`. |
### Example: Development (no security)
```json
{
"Security": {
"Profiles": ["None"],
"AutoAcceptClientCertificates": true
}
}
```
### Example: Production (encrypted only)
```json
{
"Security": {
"Profiles": ["Basic256Sha256-SignAndEncrypt"],
"AutoAcceptClientCertificates": false,
"RejectSHA1Certificates": true,
"MinimumCertificateKeySize": 2048
}
}
```
### Example: Mixed (sign and encrypt endpoints, no plaintext)
```json
{
"Security": {
"Profiles": ["Basic256Sha256-Sign", "Basic256Sha256-SignAndEncrypt"],
"AutoAcceptClientCertificates": false
}
}
```
## PKI Directory Layout
The server stores certificates in a directory-based PKI store. The default root is:
### PKI directory layout
```
%LOCALAPPDATA%\OPC Foundation\pki\
```
This can be overridden with the `PkiRootPath` setting. The directory structure is:
```
pki/
{PkiStoreRoot}/
own/ Server's own application certificate and private key
issuer/ CA certificates that issued trusted client certificates
trusted/ Explicitly trusted client (peer) certificates
rejected/ Certificates that were presented but not trusted
```
### Certificate Trust Flow
### Certificate trust flow
When a client connects using a secure profile (`Sign` or `SignAndEncrypt`), the following trust evaluation occurs:
1. The client presents its application certificate during the secure channel handshake.
2. The server checks whether the certificate exists in the `trusted/` store.
3. If found, the connection proceeds (subject to key size and SHA-1 checks).
4. If not found and `AutoAcceptClientCertificates` is `true`, the certificate is automatically copied to `trusted/` and the connection proceeds.
5. If not found and `AutoAcceptClientCertificates` is `false`, the certificate is copied to `rejected/` and the connection is refused.
6. Regardless of trust status, the certificate must meet the `MinimumCertificateKeySize` requirement and pass the SHA-1 check (if `RejectSHA1Certificates` is `true`).
3. If found, the connection proceeds.
4. If not found and `AutoAcceptUntrustedClientCertificates` is `true`, the certificate is automatically copied to `trusted/` and the connection proceeds.
5. If not found and `AutoAcceptUntrustedClientCertificates` is `false`, the certificate is copied to `rejected/` and the connection is refused.
On first startup with a secure profile, the server automatically generates a self-signed application certificate in the `own/` directory if one does not already exist.
The Admin UI `Certificates.razor` page uses `CertTrustService` (singleton reading `CertTrustOptions` for the Server's `PkiStoreRoot`) to promote rejected client certs to trusted without operators having to file-copy manually.
## Production Hardening
### Production hardening
The default settings prioritize ease of development. Before deploying to production, apply the following changes:
### 1. Disable automatic certificate acceptance
Set `AutoAcceptClientCertificates` to `false` so that only explicitly trusted client certificates are accepted:
```json
{
"Security": {
"AutoAcceptClientCertificates": false
}
}
```
After changing this setting, you must manually copy each client's application certificate (the `.der` file) into the `trusted/` directory.
### 2. Remove the None profile
Remove `None` from the `Profiles` list to prevent unencrypted connections:
```json
{
"Security": {
"Profiles": ["Aes256_Sha256_RsaPss-SignAndEncrypt"]
}
}
```
### 3. Configure LDAP authentication
Enable LDAP authentication to validate credentials against the GLAuth server. LDAP group membership controls what each user can do (read, write, alarm acknowledgment). See [Configuration Guide](Configuration.md) for the full LDAP property reference.
```json
{
"Authentication": {
"AllowAnonymous": false,
"AnonymousCanWrite": false,
"Ldap": {
"Enabled": true,
"Host": "localhost",
"Port": 3893,
"BaseDN": "dc=lmxopcua,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local",
"ServiceAccountPassword": "serviceaccount123"
}
}
}
```
While UserName tokens are always encrypted by the OPC UA stack (using the server certificate), enabling a secure transport profile adds protection against message-level tampering and data eavesdropping.
### 4. Review the rejected certificate store
Periodically inspect the `rejected/` directory. Certificates that appear here were presented by clients but were not trusted. If you recognize a legitimate client certificate, move it to the `trusted/` directory to grant access.
## X.509 Certificate Authentication
The server supports X.509 certificate-based user authentication in addition to Anonymous and UserName tokens. When any non-None security profile is configured, the server advertises `UserTokenType.Certificate` in its endpoint descriptions.
Clients can authenticate by presenting an X.509 certificate. The server extracts the Common Name (CN) from the certificate subject and assigns the `AuthenticatedUser` and `ReadOnly` roles. The authentication is logged with the certificate's CN, subject, and thumbprint.
X.509 authentication is available automatically when transport security is enabled -- no additional configuration is required.
## Audit Logging
The server generates audit log entries for security-relevant operations. All audit entries use the `AUDIT:` prefix and are written to the Serilog rolling file sink for compliance review.
Audited events:
- **Authentication success**: Logs username, assigned roles, and session ID
- **Authentication failure**: Logs username and session ID
- **X.509 authentication**: Logs certificate CN, subject, and thumbprint
- **Certificate validation**: Logs certificate subject, thumbprint, and expiry for all validation events (accepted or rejected)
- **Write access denial**: Logged by the role-based access control system when a user lacks the required role
Example audit log entries:
```
AUDIT: Authentication SUCCESS for user admin with roles [ReadOnly, WriteOperate, AlarmAck] session abc123
AUDIT: Authentication FAILED for user baduser from session def456
X509 certificate authenticated: CN=ClientApp, Subject=CN=ClientApp,O=Acme, Thumbprint=AB12CD34
```
## CLI Examples
The Client CLI supports the `-S` (or `--security`) flag to select the transport security mode when connecting. Valid values are `none`, `sign`, `encrypt`, and `signandencrypt`.
### Connect with no security
```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa -S none
```
### Connect with signing
```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa -S sign
```
### Connect with signing and encryption
```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa -S encrypt
```
### Browse with encryption and authentication
```bash
dotnet run -- browse -u opc.tcp://localhost:4840/LmxOpcUa -S encrypt -U operator -P secure-password -r -d 3
```
### Read a node with signing
```bash
dotnet run -- read -u opc.tcp://localhost:4840/LmxOpcUa -S sign -n "ns=2;s=TestMachine_001/Speed"
```
The CLI tool auto-generates its own client certificate on first use (stored under `%LOCALAPPDATA%\OpcUaCli\pki\own\`). When connecting to a server with `AutoAcceptClientCertificates` set to `false`, you must copy the CLI tool's certificate into the server's `trusted/` directory before the connection will succeed.
## Troubleshooting
### Certificate trust failure
**Symptom:** The client receives a `BadSecurityChecksFailed` or `BadCertificateUntrusted` error when connecting.
**Cause:** The server does not trust the client's certificate (or vice versa), and `AutoAcceptClientCertificates` is `false`.
**Resolution:**
1. Check the server's `rejected/` directory for the client's certificate file.
2. Copy the `.der` file from `rejected/` to `trusted/`.
3. Retry the connection.
4. If the server's own certificate is not trusted by the client, copy the server's certificate from `pki/own/certs/` to the client's trusted store.
### Endpoint mismatch
**Symptom:** The client receives a `BadSecurityModeRejected` or `BadSecurityPolicyRejected` error, or reports "No endpoint found with security mode...".
**Cause:** The client is requesting a security mode that the server does not expose. For example, the client requests `SignAndEncrypt` but the server only has `None` configured.
**Resolution:**
1. Verify the server's configured `Profiles` in `appsettings.json`.
2. Ensure the profile matching the client's requested mode is listed (e.g., add `Basic256Sha256-SignAndEncrypt` for encrypted connections).
3. Restart the server after changing the configuration.
4. Use the CLI tool to verify available endpoints:
```bash
dotnet run -- connect -u opc.tcp://localhost:4840/LmxOpcUa -S none
```
The output displays the security mode and policy of the connected endpoint.
### Server certificate not generated
**Symptom:** The server logs a warning about application certificate check failure on startup.
**Cause:** The `pki/own/` directory may not be writable, or the certificate generation failed.
**Resolution:**
1. Ensure the service account has write access to the PKI root directory.
2. Check that the `PkiRootPath` (if overridden) points to a valid, writable location.
3. Delete any corrupt certificate files in `pki/own/` and restart the server to trigger regeneration.
### SHA-1 certificate rejection
**Symptom:** A client with a valid certificate is rejected, and the server logs mention SHA-1.
**Cause:** The client's certificate was signed with SHA-1, and `RejectSHA1Certificates` is `true` (the default).
**Resolution:**
- Regenerate the client certificate using SHA-256 or stronger (recommended).
- Alternatively, set `RejectSHA1Certificates` to `false` in the server configuration (not recommended for production).
- Set `AutoAcceptUntrustedClientCertificates = false`.
- Drop `None` from the profile set.
- Use the Admin UI to promote trusted client certs rather than the auto-accept fallback.
- Periodically audit the `rejected/` directory; an unexpected entry is often a misconfigured client or a probe attempt.
---
## LDAP Authentication
## OPC UA Authentication
The server supports LDAP-based user authentication via GLAuth (or any standard LDAP server). When enabled, OPC UA `UserName` token credentials are validated by LDAP bind. LDAP group membership is resolved once during authentication and mapped to custom OPC UA role `NodeId`s in the `urn:zbmom:lmxopcua:roles` namespace. These role NodeIds are stored on the session's `RoleBasedIdentity.GrantedRoleIds` and checked directly during write and alarm-ack operations.
The Server accepts three OPC UA identity-token types:
### Architecture
| Token | Handler | Notes |
|---|---|---|
| Anonymous | `IUserAuthenticator.AuthenticateAsync(username: "", password: "")` | Refused in strict mode unless explicit anonymous grants exist; allowed in lax mode for backward compatibility. |
| UserName/Password | `LdapUserAuthenticator` (`src/ZB.MOM.WW.OtOpcUa.Server/Security/LdapUserAuthenticator.cs`) | LDAP bind + group lookup; resolved `LdapGroups` flow into the session's identity bearer (`ILdapGroupsBearer`). |
| X.509 Certificate | Stack-level acceptance + role mapping via CN | X.509 identity carries `AuthenticatedUser` + read roles; finer-grain authorization happens through the data-plane ACLs. |
```
OPC UA Client → UserName Token → LmxOpcUa Server → LDAP Bind (validate credentials)
→ LDAP Search (resolve group membership)
→ Map groups to OPC UA role NodeIds
→ Store on RoleBasedIdentity.GrantedRoleIds
→ Permission checks via GrantedRoleIds.Contains()
### LDAP bind flow (`LdapUserAuthenticator`)
`Program.cs` in the Server registers the authenticator based on `OpcUaServer:Ldap`:
```csharp
builder.Services.AddSingleton<IUserAuthenticator>(sp => ldapOptions.Enabled
? new LdapUserAuthenticator(ldapOptions, sp.GetRequiredService<ILogger<LdapUserAuthenticator>>())
: new DenyAllUserAuthenticator());
```
### LDAP Groups and OPC UA Permissions
`LdapUserAuthenticator`:
All authenticated LDAP users can browse and read nodes regardless of group membership. Groups grant additional permissions:
1. Refuses to bind over plain-LDAP unless `AllowInsecureLdap = true` (dev/test only).
2. Connects to `Server:Port`, optionally upgrades to TLS (`UseTls = true`, port 636 for AD).
3. Binds as the service account; searches `SearchBase` for `UserNameAttribute = username`.
4. Rebinds as the resolved user DN with the supplied password (the actual credential check).
5. Reads `GroupAttribute` (default `memberOf`) and strips the leading `CN=` so operators configure friendly group names in `GroupToRole`.
6. Returns a `UserAuthResult` carrying the validated username + the set of LDAP groups. The set flows through to the session identity via `ILdapGroupsBearer.LdapGroups`.
| LDAP Group | Permission |
|---|---|
| ReadOnly | No additional permissions (read-only access) |
| WriteOperate | Write FreeAccess and Operate attributes |
| WriteTune | Write Tune attributes |
| WriteConfigure | Write Configure attributes |
| AlarmAck | Acknowledge alarms |
Users can belong to multiple groups. The `admin` user in the default GLAuth configuration belongs to all groups.
### Effective Permission Matrix
The effective permission for a write operation depends on two factors: the user's session role (from LDAP group membership or anonymous access) and the Galaxy attribute's security classification. The security classification controls the node's `AccessLevel` — attributes classified as `SecuredWrite`, `VerifiedWrite`, or `ViewOnly` are exposed as read-only nodes regardless of the user's role. For writable classifications, the required write role depends on the classification.
| | FreeAccess | Operate | SecuredWrite | VerifiedWrite | Tune | Configure | ViewOnly |
|---|---|---|---|---|---|---|---|
| **Anonymous (`AnonymousCanWrite=true`)** | Write | Write | Read | Read | Write | Write | Read |
| **Anonymous (`AnonymousCanWrite=false`)** | Read | Read | Read | Read | Read | Read | Read |
| **ReadOnly** | Read | Read | Read | Read | Read | Read | Read |
| **WriteOperate** | Write | Write | Read | Read | Read | Read | Read |
| **WriteTune** | Read | Read | Read | Read | Write | Read | Read |
| **WriteConfigure** | Read | Read | Read | Read | Read | Write | Read |
| **AlarmAck** (only) | Read | Read | Read | Read | Read | Read | Read |
| **Admin** (all groups) | Write | Write | Read | Read | Write | Write | Read |
All roles can browse and read all nodes. The "Read" entries above mean the node is either read-only by classification or the user lacks the required write role. "Write" means the write is permitted by both the node's classification and the user's role.
Alarm acknowledgment is an independent permission controlled by the `AlarmAck` role and is not affected by security classification.
### GLAuth Setup
The project uses [GLAuth](https://github.com/glauth/glauth) v2.4.0 as the LDAP server, installed at `C:\publish\glauth\`. See `C:\publish\glauth\auth.md` for the complete user/group reference and service management commands.
### Configuration
Enable LDAP in `appsettings.json` under `Authentication.Ldap`. See [Configuration Guide](Configuration.md) for the full property reference.
### Active Directory configuration
Production deployments typically point at Active Directory instead of GLAuth. Only four properties differ from the dev defaults: `Server`, `Port`, `UserNameAttribute`, and `ServiceAccountDn`. The same `GroupToRole` mechanism works — map your AD security groups to OPC UA roles.
Configuration example (Active Directory production):
```json
{
@@ -362,32 +129,169 @@ Production deployments typically point at Active Directory instead of GLAuth. On
"UseTls": true,
"AllowInsecureLdap": false,
"SearchBase": "DC=corp,DC=example,DC=com",
"ServiceAccountDn": "CN=OpcUaSvc,OU=Service Accounts,DC=corp,DC=example,DC=com",
"ServiceAccountDn": "CN=OtOpcUaSvc,OU=Service Accounts,DC=corp,DC=example,DC=com",
"ServiceAccountPassword": "<from your secret store>",
"DisplayNameAttribute": "displayName",
"GroupAttribute": "memberOf",
"UserNameAttribute": "sAMAccountName",
"GroupToRole": {
"OPCUA-Operators": "WriteOperate",
"OPCUA-Engineers": "WriteConfigure",
"OPCUA-AlarmAck": "AlarmAck",
"OPCUA-Tuners": "WriteTune"
"OPCUA-Tuners": "WriteTune",
"OPCUA-AlarmAck": "AlarmAck"
}
}
}
}
```
Notes:
`UserNameAttribute: "sAMAccountName"` is the critical AD override — the default `uid` is not populated on AD user entries. Use `userPrincipalName` instead if operators log in with `user@corp.example.com` form. Nested group membership is not expanded — assign users directly to the role-mapped groups, or pre-flatten in AD.
- `UserNameAttribute: "sAMAccountName"` is the critical AD override — the default `uid` is not populated on AD user entries, so the user-DN lookup returns no results without it. Use `userPrincipalName` instead if operators log in with `user@corp.example.com` form.
- `Port: 636` + `UseTls: true` is required under AD's LDAP-signing enforcement. AD increasingly rejects plain-LDAP bind; set `AllowInsecureLdap: false` to refuse fallback.
- `ServiceAccountDn` should name a dedicated read-only service principal — not a privileged admin. The account needs read access to user and group entries in the search base.
- `memberOf` values come back as full DNs like `CN=OPCUA-Operators,OU=OPC UA Security Groups,OU=Groups,DC=corp,DC=example,DC=com`. The authenticator strips the leading `CN=` RDN value so operators configure `GroupToRole` with readable group common-names.
- Nested group membership is **not** expanded — assign users directly to the role-mapped groups, or pre-flatten membership in AD. `LDAP_MATCHING_RULE_IN_CHAIN` / `tokenGroups` expansion is an authenticator enhancement, not a config change.
The same options bind the Admin's `LdapAuthService` (cookie auth / login form) so operators authenticate with a single credential across both processes.
### Security Considerations
---
- LDAP credentials are transmitted in plaintext over the OPC UA channel unless transport security is enabled. Use `Basic256Sha256-SignAndEncrypt` for production deployments.
- The GLAuth LDAP server itself listens on plain LDAP (port 3893). Enable LDAPS in `glauth.cfg` for environments where LDAP traffic crosses network boundaries.
- The service account password is stored in `appsettings.json`. Protect this file with appropriate filesystem permissions.
## Data-Plane Authorization
Data-plane authorization is the check run on every OPC UA operation against an OtOpcUa endpoint: *can this authenticated user Browse / Read / Subscribe / Write / HistoryRead / AckAlarm / Call on this specific node?*
Per decision #129 the model is **additive-only — no explicit Deny**. Grants at each hierarchy level union; absence of a grant is the default-deny.
### Hierarchy
ACLs are evaluated against the UNS path:
```
ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag
```
Each level can carry `NodeAcl` rows (`src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/NodeAcl.cs`) that grant a permission bundle to a set of `LdapGroups`.
### Permission flags
```csharp
[Flags]
public enum NodePermissions : uint
{
Browse = 1 << 0,
Read = 1 << 1,
Subscribe = 1 << 2,
HistoryRead = 1 << 3,
WriteOperate = 1 << 4,
WriteTune = 1 << 5,
WriteConfigure = 1 << 6,
AlarmRead = 1 << 7,
AlarmAcknowledge = 1 << 8,
AlarmConfirm = 1 << 9,
AlarmShelve = 1 << 10,
MethodCall = 1 << 11,
ReadOnly = Browse | Read | Subscribe | HistoryRead | AlarmRead,
Operator = ReadOnly | WriteOperate | AlarmAcknowledge | AlarmConfirm,
Engineer = Operator | WriteTune | AlarmShelve,
Admin = Engineer | WriteConfigure | MethodCall,
}
```
The three Write tiers map to Galaxy's v1 `SecurityClassification``FreeAccess`/`Operate``WriteOperate`, `Tune``WriteTune`, `Configure``WriteConfigure`. `SecuredWrite` / `VerifiedWrite` / `ViewOnly` classifications remain read-only from OPC UA regardless of grant.
### Evaluator — `PermissionTrie`
`src/ZB.MOM.WW.OtOpcUa.Core/Authorization/`:
| Class | Role |
|---|---|
| `PermissionTrie` | Cluster-scoped trie; each node carries `(GroupId → NodePermissions)` grants. |
| `PermissionTrieBuilder` | Builds a trie from the current `NodeAcl` rows in one pass. |
| `PermissionTrieCache` | Per-cluster memoised trie; invalidated via `AclChangeNotifier` when the Admin publishes a draft that touches ACLs. |
| `TriePermissionEvaluator` | Implements `IPermissionEvaluator.Authorize(session, operation, scope)` — walks from the root to the leaf for the supplied `NodeScope`, unions grants along the path, compares required permission to the union. |
`NodeScope` carries `(ClusterId, NamespaceId, AreaId, LineId, EquipmentId, TagId)`; any suffix may be null — a tag-level ACL is more specific than an area-level ACL but both contribute via union.
### Dispatch gate — `AuthorizationGate`
`src/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs` bridges the OPC UA stack's `ISystemContext.UserIdentity` to the evaluator. `DriverNodeManager` holds exactly one reference to it and calls `IsAllowed(identity, OpcUaOperation.*, NodeScope)` on every Read, Write, HistoryRead, Browse, Subscribe, AckAlarm, Call path. A false return short-circuits the dispatch with `BadUserAccessDenied`.
Key properties:
- **Driver-agnostic.** No driver-level code participates in authorization decisions. Drivers report `SecurityClassification` as metadata on tag discovery; everything else flows through `AuthorizationGate`.
- **Fail-open-during-transition.** `StrictMode = false` (default during ACL rollouts) lets sessions without resolved LDAP groups proceed; flip `Authorization:StrictMode = true` in production once ACLs are populated.
- **Evaluator stays pure.** `TriePermissionEvaluator` has no OPC UA stack dependency — it's tested directly from xUnit.
### Probe-this-permission (Admin UI)
`PermissionProbeService` (`src/ZB.MOM.WW.OtOpcUa.Admin/Services/PermissionProbeService.cs`) lets an operator ask "if a user with groups X, Y, Z asked to do operation O on node N, would it succeed?" The answer is rendered in the AclsTab "Probe" dialog — same evaluator, same trie, so the Admin UI answer and the live Server answer cannot disagree.
### Full model
See [`docs/v2/acl-design.md`](v2/acl-design.md) for the complete design: trie invalidation, flag semantics, per-path override rules, and the reasoning behind additive-only (no Deny).
---
## Control-Plane Authorization
Control-plane authorization governs **the Admin UI** — who can view fleet config, edit drafts, publish generations, manage cluster nodes + credentials.
Per decision #150 control-plane roles are **deliberately independent of data-plane ACLs**. An operator who can read every OPC UA tag in production may not be allowed to edit cluster config; conversely a ConfigEditor may not have any data-plane grants at all.
### Roles
`src/ZB.MOM.WW.OtOpcUa.Admin/Services/AdminRoles.cs`:
| Role | Capabilities |
|---|---|
| `ConfigViewer` | Read-only access to drafts, generations, audit log, fleet status. |
| `ConfigEditor` | ConfigViewer plus draft editing (UNS, equipment, tags, ACLs, driver instances, reservations, CSV imports). Cannot publish. |
| `FleetAdmin` | ConfigEditor plus publish, cluster/node CRUD, credential management, role-grant management. |
Policies registered in Admin `Program.cs`:
```csharp
builder.Services.AddAuthorizationBuilder()
.AddPolicy("CanEdit", p => p.RequireRole(AdminRoles.ConfigEditor, AdminRoles.FleetAdmin))
.AddPolicy("CanPublish", p => p.RequireRole(AdminRoles.FleetAdmin));
```
Razor pages and API endpoints gate with `[Authorize(Policy = "CanEdit")]` / `"CanPublish"`; nav-menu sections hide via `<AuthorizeView>`.
### Role grant source
Admin reads `LdapGroupRoleMapping` rows from the Config DB (`src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/LdapGroupRoleMapping.cs`) — the same pattern as the data-plane `NodeAcl` but scoped to Admin roles + (optionally) cluster scope for multi-site fleets. The `RoleGrants.razor` page lets FleetAdmins edit these mappings without leaving the UI.
---
## OTOPCUA0001 Analyzer — Compile-Time Guard
Per-capability resilience (retry, timeout, circuit-breaker, bulkhead) is applied by `CapabilityInvoker` in `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/`. A driver-capability call made **outside** the invoker bypasses resilience entirely — which in production looks like inconsistent timeouts, un-wrapped retries, and unbounded blocking.
`OTOPCUA0001` (Roslyn analyzer at `src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs`) fires as a compile-time **warning** when an `async`/`Task`-returning method on one of the seven guarded capability interfaces (`IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`) is invoked **outside** a lambda passed to `CapabilityInvoker.ExecuteAsync` / `ExecuteWriteAsync` / `AlarmSurfaceInvoker.*`. The analyzer walks up the syntax tree from the call site, finds any enclosing invoker invocation, and verifies the call lives transitively inside that invocation's anonymous-function argument — a sibling pattern (do the call, then invoke `ExecuteAsync` on something unrelated nearby) does not satisfy the rule.
Five xUnit-v3 + Shouldly tests at `tests/ZB.MOM.WW.OtOpcUa.Analyzers.Tests` cover the common fail/pass shapes + the sibling-pattern regression guard.
The rule is intentionally scoped to async surfaces — pure in-memory accessors like `IHostConnectivityProbe.GetHostStatuses()` return synchronously and do not require the invoker wrap.
---
## Audit Logging
- **Server**: Serilog `AUDIT:` prefix on every authentication success/failure, certificate validation result, write access denial. Written alongside the regular rolling file sink.
- **Admin**: `AuditLogService` writes `ConfigAuditLog` rows to the Config DB for every publish, rollback, cluster-node CRUD, credential rotation. Visible in the Audit page for operators with `ConfigViewer` or above.
---
## Troubleshooting
### Certificate trust failure
Check `{PkiStoreRoot}/rejected/` for the client's cert. Promote via Admin UI Certificates page, or copy the `.der` file manually to `trusted/`.
### LDAP users can connect but fail authorization
Verify (a) `OpcUaServer:Ldap:GroupAttribute` returns groups in the form `CN=MyGroup,…` (OtOpcUa strips the `CN=` for matching), (b) a `NodeAcl` grant exists at any level of the node's UNS path that unions to the required permission, (c) `Authorization:StrictMode` is correctly set for the deployment stage.
### LDAP bind rejected as "insecure"
Set `UseTls = true` + `Port = 636`, or temporarily flip `AllowInsecureLdap = true` in dev. Production Active Directory increasingly refuses plain-LDAP bind under LDAP-signing enforcement.
### `AuthorizationGate` denies every call after a publish
`AclChangeNotifier` invalidates the `PermissionTrieCache` on publish; a stuck cache is usually a missed notification. Restart the Server as a quick mitigation and file a bug — the design is to stay fresh without restarts.