Commit Graph

3 Commits

Author SHA1 Message Date
Joseph Doherty
9dd5e4e745 Phase 6.1 Stream C — health endpoints on :4841 + LogContextEnricher + Serilog JSON sink + CapabilityInvoker enrichment
Closes Stream C per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Core.Observability (new namespace):
- DriverHealthReport — pure-function aggregation over DriverHealthSnapshot list.
  Empty fleet = Healthy. Any Faulted = Faulted. Any Unknown/Initializing (no
  Faulted) = NotReady. Any Degraded or Reconnecting (no Faulted, no NotReady)
  = Degraded. Else Healthy. HttpStatus(verdict) maps to the Stream C.1 state
  matrix: Healthy/Degraded → 200, NotReady/Faulted → 503.
- LogContextEnricher — Serilog LogContext wrapper. Push(id, type, capability,
  correlationId) returns an IDisposable scope; inner log calls carry
  DriverInstanceId / DriverType / CapabilityName / CorrelationId structured
  properties automatically. NewCorrelationId = 12-hex-char GUID slice for
  cases where no OPC UA RequestHeader.RequestHandle is in flight.

CapabilityInvoker — now threads LogContextEnricher around every ExecuteAsync /
ExecuteWriteAsync call site. OtOpcUaServer passes driver.DriverType through
so logs correlate to the driver type too. Every capability call emits
structured fields per the Stream C.4 compliance check.

Server.Observability:
- HealthEndpointsHost — standalone HttpListener on http://localhost:4841/
  (loopback avoids Windows URL-ACL elevation; remote probing via reverse
  proxy or explicit netsh urlacl grant). Routes:
    /healthz → 200 when (configDbReachable OR usingStaleConfig); 503 otherwise.
      Body: status, uptimeSeconds, configDbReachable, usingStaleConfig.
    /readyz  → DriverHealthReport.Aggregate + HttpStatus mapping.
      Body: verdict, drivers[], degradedDrivers[], uptimeSeconds.
    anything else → 404.
  Disposal cooperative with the HttpListener shutdown.
- OpcUaApplicationHost starts the health host after the OPC UA server comes up
  and disposes it on shutdown. New OpcUaServerOptions knobs:
  HealthEndpointsEnabled (default true), HealthEndpointsPrefix (default
  http://localhost:4841/).

Program.cs:
- Serilog pipeline adds Enrich.FromLogContext + opt-in JSON file sink via
  `Serilog:WriteJson = true` appsetting. Uses Serilog.Formatting.Compact's
  CompactJsonFormatter (one JSON object per line — SIEMs like Splunk,
  Datadog, Graylog ingest without a regex parser).

Server.Tests:
- Existing 3 OpcUaApplicationHost integration tests now set
  HealthEndpointsEnabled=false to avoid port :4841 collisions under parallel
  execution.
- New HealthEndpointsHostTests (9): /healthz healthy empty fleet; stale-config
  returns 200 with flag; unreachable+no-cache returns 503; /readyz empty/
  Healthy/Faulted/Degraded/Initializing drivers return correct status and
  bodies; unknown path → 404. Uses ephemeral ports via Interlocked counter.

Core.Tests:
- DriverHealthReportTests (8): empty fleet, all-healthy, any-Faulted trumps,
  any-NotReady without Faulted, Degraded without Faulted/NotReady, HttpStatus
  per-verdict theory.
- LogContextEnricherTests (8): all 4 properties attach; scope disposes cleanly;
  NewCorrelationId shape; null/whitespace driverInstanceId throws.
- CapabilityInvokerEnrichmentTests (2): inner logs carry structured
  properties; no context leak outside the call site.

Full solution dotnet test: 1016 passing (baseline 906, +110 for Phase 6.1 so
far across Streams A+B+C). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:15:44 -04:00
Joseph Doherty
22d3b0d23c Phase 3 PR 19 — LDAP user identity + Basic256Sha256 security profile. Replaces the anonymous-only endpoint with a configurable security profile and an LDAP-backed UserName token validator. New IUserAuthenticator abstraction in Backend/Security/: LdapUserAuthenticator binds to the configured directory (reuses the pattern from Admin.Security.LdapAuthService without the cross-app dependency — Novell.Directory.Ldap.NETStandard 3.6.0 package ref added to Server alongside the existing OPCFoundation packages) and maps group membership to OPC UA roles via LdapOptions.GroupToRole (case-insensitive). DenyAllUserAuthenticator is the default when Ldap.Enabled=false so UserName token attempts return a clean BadUserAccessDenied rather than hanging on a localhost:3893 bind attempt. OpcUaSecurityProfile enum + LdapOptions nested record on OpcUaServerOptions. Profile=None keeps the PR 17 shape (SecurityPolicies.None + Anonymous token only) so existing integration tests stay green; Profile=Basic256Sha256SignAndEncrypt adds a second ServerSecurityPolicy (Basic256Sha256 + SignAndEncrypt) to the collection and, when Ldap.Enabled=true, adds a UserName token policy scoped to SecurityPolicies.Basic256Sha256 only — passwords must ride an encrypted channel, the stack rejects UserName over None. OtOpcUaServer.OnServerStarted hooks SessionManager.ImpersonateUser: AnonymousIdentityToken passes through; UserNameIdentityToken delegates to IUserAuthenticator.AuthenticateAsync — rejected identities throw ServiceResultException(BadUserAccessDenied); accepted identities get a RoleBasedIdentity that carries the resolved roles through session.Identity so future PRs can gate writes by role. OpcUaApplicationHost + OtOpcUaServer constructors take IUserAuthenticator as a dependency. Program.cs binds the new OpcUaServer:Ldap section from appsettings (Enabled defaults false, GroupToRole parsed as Dictionary<string,string>), registers IUserAuthenticator as LdapUserAuthenticator when enabled or DenyAllUserAuthenticator otherwise. PR 17 integration test updated to pass DenyAllUserAuthenticator so it keeps exercising the anonymous-only path unchanged. Tests — SecurityConfigurationTests (new, 13 cases): DenyAllAuthenticator rejects every credential; LdapAuthenticator rejects blank creds without hitting the server; rejects when Enabled=false; rejects plaintext when both UseTls=false AND AllowInsecureLdap=false (safety guard matching the Admin service); EscapeLdapFilter theory (4 rows: plain passthrough, parens/asterisk/backslash → hex escape) — regression guard against LDAP injection; ExtractOuSegment theory (3 rows: finds ou=, returns null when absent, handles multiple ou segments by returning first); ExtractFirstRdnValue theory (3 rows: strips cn= prefix, handles single-segment DN, returns plain string unchanged when no =). OpcUaServerOptions_default_is_anonymous_only asserts the default posture preserves PR 17 behavior. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Server.Tests') added to Server csproj so ExtractOuSegment and siblings are reachable from the tests. Full solution: 0 errors, 180 tests pass (8 Core + 14 Proxy + 24 Configuration + 6 Shared + 91 Galaxy.Host + 19 Server (17 unit + 2 integration) + 18 Admin). Live-LDAP integration test (connect via Basic256Sha256 endpoint with a real user from GLAuth, assert the session.Identity carries the mapped role) is deferred to a follow-up — it requires the GLAuth dev instance to be running at localhost:3893 which is dev-machine-specific, and the test harness for that also needs a fresh client-side certificate provisioned by the live server's trusted store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 08:49:46 -04:00
Joseph Doherty
46834a43bd Phase 3 PR 17 — complete OPC UA server startup end-to-end + integration test. PR 16 shipped the materialization shape (DriverNodeManager / OtOpcUaServer) without the activation glue; this PR finishes the scope so an external OPC UA client can actually connect, browse, and read. New OpcUaServerOptions DTO bound from the OpcUaServer section of appsettings.json (EndpointUrl default opc.tcp://0.0.0.0:4840/OtOpcUa, ApplicationName, ApplicationUri, PkiStoreRoot default %ProgramData%\OtOpcUa\pki, AutoAcceptUntrustedClientCertificates default true for dev — production flips via config). OpcUaApplicationHost wraps Opc.Ua.Configuration.ApplicationInstance: BuildConfiguration constructs the ApplicationConfiguration programmatically (no external XML) with SecurityConfiguration pointing at <PkiStoreRoot>/own, /issuers, /trusted, /rejected directories — stack auto-creates the cert folders on first run and generates a self-signed application certificate via CheckApplicationInstanceCertificate, ServerConfiguration.BaseAddresses set to the endpoint URL + SecurityPolicies just None + UserTokenPolicies just Anonymous with PolicyId='Anonymous' + SecurityPolicyUri=None so the client's UserTokenPolicy lookup succeeds at OpenSession, TransportQuotas.OperationTimeout=15s + MinRequestThreadCount=5 / MaxRequestThreadCount=100 / MaxQueuedRequestCount=200, CertificateValidator auto-accepts untrusted when configured. StartAsync creates the OtOpcUaServer (passes DriverHost + ILoggerFactory so one DriverNodeManager is created per registered driver in CreateMasterNodeManager from PR 16), calls ApplicationInstance.Start(server) to bind the endpoint, then walks each DriverNodeManager and drives a fresh GenericDriverNodeManager.BuildAddressSpaceAsync against it so the driver's discovery streams into the address space that's already serving clients. Per-driver discovery is isolated per decision #12: a discovery exception marks the driver's subtree faulted but the server stays up serving the other drivers' subtrees. DriverHost.GetDriver(instanceId) public accessor added alongside the existing GetHealth so OtOpcUaServer can enumerate drivers during CreateMasterNodeManager. DriverNodeManager.Driver property made public so OpcUaApplicationHost can identify which driver each node manager wraps during the discovery loop. OpcUaServerService constructor takes OpcUaApplicationHost — ExecuteAsync sequence now: bootstrap.LoadCurrentGenerationAsync → applicationHost.StartAsync → infinite Task.Delay until stop. StopAsync disposes the application host (which stops the server via OtOpcUaServer.Stop) before disposing DriverHost. Program.cs binds OpcUaServerOptions from appsettings + registers OpcUaApplicationHost + OpcUaServerOptions as singletons. Integration test (OpcUaServerIntegrationTests, Category=Integration): IAsyncLifetime spins up the server on a random non-default port (48400+random for test isolation) with a per-test-run PKI store root (%temp%/otopcua-test-<guid>) + a FakeDriver registered in DriverHost that has ITagDiscovery + IReadable implementations — DiscoverAsync registers TestFolder>Var1, ReadAsync returns 42. Client_can_connect_and_browse_driver_subtree creates an in-process OPC UA client session via CoreClientUtils.SelectEndpoint (which talks to the running server's GetEndpoints and fetches the live EndpointDescription with the actual PolicyId), browses the fake driver's root, asserts TestFolder appears in the returned references. Client_can_read_a_driver_variable_through_the_node_manager constructs the variable NodeId using the namespace index the server registered (urn:OtOpcUa:fake), calls Session.ReadValue, asserts the DataValue.Value is 42 — the whole pipeline (client → server endpoint → DriverNodeManager.OnReadValue → FakeDriver.ReadAsync → back through the node manager → response to client) round-trips correctly. Dispose tears down the session, server, driver host, and PKI store directory. Full solution: 0 errors, 165 tests pass (8 Core unit + 14 Proxy unit + 24 Configuration unit + 6 Shared unit + 91 Galaxy.Host unit + 4 Server (2 unit NodeBootstrap + 2 new integration) + 18 Admin). End-to-end outcome: PR 14's GalaxyAlarmTracker alarm events now flow through PR 15's GenericDriverNodeManager event forwarder → PR 16's ConditionSink → OPC UA AlarmConditionState.ReportEvent → out to every OPC UA client subscribed to the alarm condition. The full alarm subsystem (driver-side subscription of the Galaxy 4-attribute quartet, Core-side routing by source node id, Server-side AlarmConditionState materialization with ReportEvent dispatch) is now complete and observable through any compliant OPC UA client. LDAP / security-profile wire-up (replacing the anonymous-only endpoint with BasicSignAndEncrypt + user identity mapping to NodePermissions role) is the next layer — it reuses the same ApplicationConfiguration plumbing this PR introduces but needs a deployment-policy source (central config DB) for the cert trust decisions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 08:18:37 -04:00