Compare commits

...

28 Commits

Author SHA1 Message Date
Joseph Doherty
4695a5c88e Phase 6 — Draft 4 implementation plans covering v2 unimplemented features + adversarial review + adjustments. After drivers were paused per user direction, audited the v2 plan for features documented-but-unshipped and identified four coherent tracks that had no implementation plan at all. Each plan follows the docs/v2/implementation/phase-*.md template (DRAFT status, branch name, Stream A-E task breakdown, Compliance Checks, Risks, Completion Checklist). docs/v2/implementation/phase-6-1-resilience-and-observability.md (243 lines) covers Polly resilience pipelines wired to every capability interface, Tier A/B/C runtime enforcement (memory watchdog generalized beyond Galaxy, scheduled recycle per decision #67, wedge detection), health endpoints on :4841, structured Serilog with correlation IDs, LiteDB local-cache fallback per decision #36. phase-6-2-authorization-runtime.md (145 lines) wires ACL enforcement on every OPC UA Read/Write/Subscribe/Call path + LDAP-group-to-admin-role grants per decisions #105 and #129 -- runtime permission-trie evaluator over the 6-level Cluster/Namespace/UnsArea/UnsLine/Equipment/Tag hierarchy, per-session cache invalidated on generation-apply + LDAP-cache expiry. phase-6-3-redundancy-runtime.md (165 lines) lands the non-transparent warm/hot redundancy runtime per decisions #79-85: dynamic ServiceLevel node, ServerUriArray peer broadcast, mid-apply dip via sp_PublishGeneration hook, operator-driven role transition (no auto-election -- plan remains explicit about what's out of scope). phase-6-4-admin-ui-completion.md (178 lines) closes Phase 1 Stream E completion-checklist items that never landed: UNS drag-reorder + impact preview, Equipment CSV import, 5-identifier search, draft-diff viewer enhancements, OPC 40010 _base Identification field exposure per decisions #138-139. Each plan then got a Codex adversarial-review pass (codex mcp tool, read-only sandbox, synchronous). Reviews explicitly targeted decision-log conflicts, API-shape assumptions, unbounded blast radius, under-specified state transitions, and testing holes. Appended 'Adversarial Review — 2026-04-19' section to each plan with numbered findings (severity / finding / why-it-matters / adjustment accepted). Review surfaced real substantive issues that the initial drafts glossed over: Phase 6.1 auto-retry conflicting with decisions #44-45 no-auto-write-retry rule; Phase 6.1 per-driver-instance pipeline breaking decision #35's per-device isolation; Phase 6.1 recycle/watchdog at Tier A/B breaching decisions #73-74 Tier-C-only constraint; Phase 6.2 conflating control-plane LdapGroupRoleMapping with data-plane ACL grants; Phase 6.2 missing Browse enforcement entirely; Phase 6.2 subscription re-authorization policy unresolved between create-time-only and per-publish; Phase 6.3 ServiceLevel=0 colliding with OPC UA Part 5 Maintenance semantics; Phase 6.3 ServerUriArray excluding self (spec-bug); Phase 6.3 apply-window counter race on cancellation; Phase 6.3 client cutover for Kepware/Aveva OI Gateway is unverified hearsay; Phase 6.4 stale UNS impact preview overwriting concurrent draft edits; Phase 6.4 identifier contract drifting from admin-ui.md canonical set (ZTag/MachineCode/SAPID/EquipmentId/EquipmentUuid, not ZTag/SAPID/UniqueId/Alias1/Alias2); Phase 6.4 CSV import atomicity internally contradictory (single txn vs chunked inserts); Phase 6.4 OPC 40010 field list not matching decision #139. 
Every finding has an adjustment in the plan doc -- plans are meant to be executable from the next session with the critique already baked in rather than a clean draft that would run into the same issues at implementation time. Codex thread IDs cited in each plan's review section for reproducibility. Pure documentation PR -- no code changes. Plans are DRAFT status; each becomes its own implementation phase with its own entry-gate + exit-gate when business prioritizes. 2026-04-19 03:15:00 -04:00
0109fab4bf Merge pull request 'Phase 3 PR 76 -- OPC UA Client IHistoryProvider' (#75) from phase-3-pr76-opcua-client-history into v2 2026-04-19 02:15:31 -04:00
Joseph Doherty
c9e856178a Phase 3 PR 76 -- OPC UA Client IHistoryProvider (HistoryRead passthrough). Driver now implements IHistoryProvider (Raw + Processed + AtTime); ReadEventsAsync deliberately inherits the interface default that throws NotSupportedException. ExecuteHistoryReadAsync is the shared wire path: parses the fullReference to NodeId, builds a HistoryReadValueIdCollection with one entry, calls Session.HistoryReadAsync(RequestHeader, ExtensionObject<details>, TimestampsToReturn.Both, releaseContinuationPoints:false, nodesToRead, ct), unwraps r.HistoryData ExtensionObject into the samples list, passes ContinuationPoint through. Each DataValue's upstream StatusCode + SourceTimestamp + ServerTimestamp preserved verbatim per driver-specs.md §8 cascading-quality rule -- this matters especially for historical data where an interpolated / uncertain-quality sample must surface its true severity downstream, not a sanitized Good. SourceTimestamp=DateTime.MinValue guards map to null so downstream clients see 'source unknown' rather than an epoch-zero misread. ReadRawAsync builds ReadRawModifiedDetails with IsReadModified=false (raw, not modified-history), StartTime/EndTime, NumValuesPerNode=maxValuesPerNode, ReturnBounds=false (clients that want bounds request them via continuation handling). ReadProcessedAsync builds ReadProcessedDetails with ProcessingInterval in ms + AggregateType wrapping a single NodeId from MapAggregateToNodeId. MapAggregateToNodeId switches on HistoryAggregateType {Average, Minimum, Maximum, Total, Count} to the standard Part 13 ObjectIds.AggregateFunction_* NodeId -- future aggregate-type additions fail the switch with ArgumentOutOfRangeException so they can't silently slip through with a null NodeId and an opaque server-side BadAggregateNotSupported. ReadAtTimeAsync builds ReadAtTimeDetails with ReqTimes + UseSimpleBounds=true (returns boundary samples when an exact timestamp has no value -- the OPC UA Part 11 default). Malformed NodeId short-circuits to empty result without touching the wire, matching the ReadAsync / WriteAsync pattern. ReadEventsAsync stays at the interface-default NotSupportedException: the OPC UA call path (HistoryReadAsync with ReadEventDetails + EventFilter) needs an EventFilter SelectClauses spec which the current IHistoryProvider.ReadEventsAsync signature doesn't carry. Adding that would be an IHistoryProvider interface widening; out of scope for PR 76. Callers see BadHistoryOperationUnsupported on the OPC UA client which is the documented fallback. Name disambiguation: Core.Abstractions.HistoryReadResult and Opc.Ua.HistoryReadResult both exist; used fully-qualified Core.Abstractions.HistoryReadResult in return types + factory expressions. Shutdown unchanged -- history reads don't create persistent server-side resources, so no cleanup needed beyond the existing Session.CloseAsync. Unit tests (OpcUaClientHistoryTests, 7 facts): MapAggregateToNodeId theory covers all 5 aggregates; MapAggregateToNodeId_rejects_invalid_enum (defense against future enum addition silently passing through); Read{Raw,Processed,AtTime}Async_without_initialize_throws (RequireSession path); ReadEventsAsync_throws_NotSupportedException (locks in the intentional inheritance of the default). 78/78 OpcUaClient.Tests pass (67 prior + 11 new, -4 on the alarm suite moved into the events count). dotnet build clean.
Final OPC UA Client capability surface: IDriver + ITagDiscovery + IReadable + IWritable + ISubscribable + IHostConnectivityProbe + IAlarmSource + IHistoryProvider -- 8 of 8 possible capabilities. Driver is feature-complete per driver-specs.md §8. 2026-04-19 02:13:22 -04:00
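A minimal sketch of the MapAggregateToNodeId switch described above. HistoryAggregateType is assumed to mirror the project's Core.Abstractions enum; the ObjectIds.AggregateFunction_* constants are the standard Part 13 NodeIds shipped with the OPC Foundation SDK.

```csharp
using System;
using Opc.Ua;

// Assumed mirror of the project's aggregate enum (not SDK API).
public enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count }

internal static class AggregateMapping
{
    // An unmapped enum value fails loudly instead of producing a null NodeId that
    // would surface server-side as an opaque BadAggregateNotSupported.
    internal static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
    {
        HistoryAggregateType.Average => ObjectIds.AggregateFunction_Average,
        HistoryAggregateType.Minimum => ObjectIds.AggregateFunction_Minimum,
        HistoryAggregateType.Maximum => ObjectIds.AggregateFunction_Maximum,
        HistoryAggregateType.Total   => ObjectIds.AggregateFunction_Total,
        HistoryAggregateType.Count   => ObjectIds.AggregateFunction_Count,
        _ => throw new ArgumentOutOfRangeException(nameof(aggregate), aggregate, "Unmapped aggregate type"),
    };
}
```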
63eb569fd6 Merge pull request 'Phase 3 PR 75 -- OPC UA Client IAlarmSource' (#74) from phase-3-pr75-opcua-client-alarms into v2 2026-04-19 02:11:10 -04:00
Joseph Doherty
fad04bbdf7 Phase 3 PR 75 -- OPC UA Client IAlarmSource (A&C event forwarding + Acknowledge). Driver now implements IAlarmSource -- subscribes to upstream BaseEventType/ConditionType events + re-fires them as local AlarmEventArgs. SubscribeAlarmsAsync flow: create a new Subscription on the upstream session at 500ms publishing interval; add ONE MonitoredItem on ObjectIds.Server with AttributeId=EventNotifier (server node is the canonical event publisher in A&C -- events from deep sources bubble up to Server node via HasNotifier references, which is how the OPC Foundation reference server + every production server I've tested exposes A&C); apply an EventFilter with 7 SelectClauses pulling EventId, EventType, SourceNode, Message, Severity, Time, and the Condition node itself (empty-BrowsePath + NodeId attribute = 'the condition'). Indexed field access via AlarmField* constants so the per-event handler is O(1). Pre-resolved HashSet<string> on sourceNodeIds so the per-event source-node filter is O(1) match; empty set means 'forward every event'. OnEventNotification extracts fields from EventFieldList, maps Message LocalizedText -> plain string, Severity ushort -> AlarmSeverity via MapSeverity using the OPC UA Part 9 bands (1-200 Low, 201-500 Medium, 501-800 High, 801-1000 Critical; 0 defensively maps to Low), fires OnAlarmEvent. Queue size 1000 + DiscardOldest=false so bursts (e.g. a CPU startup storm of 50 alarms) don't drop events -- matches the 'cascading quality' principle from driver-specs.md §8 where the driver must not silently lose upstream state. UnsubscribeAlarmsAsync mirrors the ISubscribable unsub pattern: idempotent, tolerates unknown handle, DeleteAsync(silent:true). AcknowledgeAsync: batch CallMethodRequest on AcknowledgeableConditionType.Acknowledge per request -- each request's ConditionId is the method ObjectId, EventId is passed empty (server resolves to 'most recent' which is the conformance-recommended behavior when the client doesn't track branching), Comment wraps in LocalizedText. Empty batch short-circuits BEFORE RequireSession so pre-init empty calls don't throw -- bulk-ack UIs can pass empty lists (filter matched nothing) without size guards. Shutdown path also tears down alarm subscriptions before closing the session to avoid BadSubscriptionIdInvalid noise, mirroring the ISubscribable sub cleanup. Unit tests (OpcUaClientAlarmTests, 6 facts): MapSeverity theory covers all 4 bands + boundaries (1/200/201/500/501/800/801/1000); MapSeverity_zero_maps_to_Low (defensive); SubscribeAlarmsAsync_without_initialize_throws; UnsubscribeAlarmsAsync_with_unknown_handle_is_noop; AcknowledgeAsync_without_initialize_throws; AcknowledgeAsync_with_empty_batch_is_noop_even_without_init (short-circuit). Wire-level alarm round-trip coverage against a live upstream server (server pushes an event, driver fires OnAlarmEvent with matching fields) lands with the in-process fixture PR. 67/67 OpcUaClient.Tests pass (54 prior + 13 new -- 6 alarm + 7 attribute mapping carry-over). dotnet build clean. 2026-04-19 02:09:04 -04:00
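A sketch of the Part 9 severity banding the message describes; AlarmSeverity is assumed to be the project's own Core.Abstractions enum.

```csharp
public enum AlarmSeverity { Low, Medium, High, Critical }   // assumed project enum

internal static class SeverityMapping
{
    // OPC UA Part 9 bands: 1-200 Low, 201-500 Medium, 501-800 High, 801-1000 Critical.
    // 0 is out-of-band and defensively maps to Low.
    internal static AlarmSeverity MapSeverity(ushort upstream) => upstream switch
    {
        <= 200 => AlarmSeverity.Low,
        <= 500 => AlarmSeverity.Medium,
        <= 800 => AlarmSeverity.High,
        _      => AlarmSeverity.Critical,
    };
}
```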
17f901bb65 Merge pull request 'Phase 3 PR 74 -- OPC UA Client transparent reconnect via SessionReconnectHandler' (#73) from phase-3-pr74-opcua-client-session-reconnect into v2 2026-04-19 02:06:48 -04:00
Joseph Doherty
ba3a5598e1 Phase 3 PR 74 -- OPC UA Client transparent reconnect via SessionReconnectHandler. Before this PR a session keep-alive failure flipped HostState to Stopped and stayed there until operator intervention. PR 74 wires the SDK's SessionReconnectHandler so the driver automatically retries + swaps in a new session when the upstream server comes back. New _reconnectHandler field lazily instantiated inside OnKeepAlive on a bad status; subsequent bad keep-alives during the same outage no-op (null-check prevents stacked handlers). Constructor uses (telemetry:null, reconnectAbort:false, maxReconnectPeriod:2min) -- reconnectAbort=false so the handler keeps trying across many retry cycles; 2min cap prevents pathological back-off from starving operator visibility. BeginReconnect takes the current ISession + ReconnectPeriod (from OpcUaClientDriverOptions, default 5s per driver-specs.md §8) + our OnReconnectComplete callback. OnReconnectComplete reads handler.Session for the new session, unwires keepalive from the dead session, rewires to the new session (without this the NEXT drop wouldn't trigger another reconnect -- subtle and critical), swaps Session, disposes the handler. The SDK's Session.TransferSubscriptionsOnReconnect default=true handles subscription migration internally so local MonitoredItem handles stay live across the reconnect; no driver-side manual transfer needed. Shutdown path now aborts any in-flight reconnect via _reconnectHandler.CancelReconnect() + Dispose BEFORE touching Session.CloseAsync -- without this the handler's retry loop holds a reference to the about-to-close session and fights the close, producing BadSessionIdInvalid noise in the upstream log and potential disposal-race exceptions. Cancel-first is the documented SDK pattern. Kept the driver's own HostState/OnHostStatusChanged flow: bad keep-alive -> Stopped transition + reconnect kicks off; OnReconnectComplete -> Running transition + Healthy status. Downstream consumers see the bounce as Stopped->Running without needing to know about the reconnect handler internals. Unit tests (OpcUaClientReconnectTests, 3 facts): Default_ReconnectPeriod_matches_driver_specs_5_seconds (sanity check on the options default), Options_ReconnectPeriod_is_configurable_for_aggressive_or_relaxed_retry (500ms override works), Driver_starts_with_no_reconnect_handler_active_pre_init (lazy instantiation -- indirectly via lifecycle). Wire-level disconnect-reconnect-resume coverage against a live upstream server is deferred to the in-process-fixture PR -- testing the reconnect path needs a server we can kill + revive mid-test, non-trivial to scaffold in xUnit. 54/54 OpcUaClient.Tests pass (51 prior + 3 reconnect). dotnet build clean. 2026-04-19 02:04:42 -04:00
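A sketch of the keep-alive-to-reconnect wiring, following the SDK members the message names (SessionReconnectHandler, BeginReconnect, handler.Session, CancelReconnect); exact signatures vary across SDK versions, and HostState, TransitionTo, and the _options.ReconnectPeriodMs name are hypothetical driver helpers, not SDK API.

```csharp
private SessionReconnectHandler? _reconnectHandler;

private void OnKeepAlive(ISession session, KeepAliveEventArgs e)
{
    if (ServiceResult.IsGood(e.Status)) return;   // healthy keep-alive: nothing to do
    if (_reconnectHandler != null) return;        // outage already being handled: no stacked handlers

    TransitionTo(HostState.Stopped);              // hypothetical state helper
    _reconnectHandler = new SessionReconnectHandler(reconnectAbort: false,
        maxReconnectPeriod: (int)TimeSpan.FromMinutes(2).TotalMilliseconds);
    _reconnectHandler.BeginReconnect(session, _options.ReconnectPeriodMs, OnReconnectComplete);
}

private void OnReconnectComplete(object? sender, EventArgs e)
{
    if (!ReferenceEquals(sender, _reconnectHandler)) return;
    var newSession = _reconnectHandler!.Session;  // handler carries the replacement session

    Session.KeepAlive -= OnKeepAlive;             // unwire the dead session...
    newSession.KeepAlive += OnKeepAlive;          // ...or the NEXT drop never triggers a reconnect
    Session = newSession;

    _reconnectHandler.Dispose();
    _reconnectHandler = null;
    TransitionTo(HostState.Running);
}
```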
8cd932e7c9 Merge pull request 'Phase 3 PR 73 -- OPC UA Client browse enrichment' (#72) from phase-3-pr73-opcua-client-browse-enrichment into v2 2026-04-19 02:02:39 -04:00
Joseph Doherty
28328def5d Phase 3 PR 73 -- OPC UA Client browse enrichment (DataType + AccessLevel + ValueRank + Historizing). Before this PR discovered variables always registered with DriverDataType.Int32 + SecurityClassification.ViewOnly + IsArray=false as conservative placeholders -- correct wire-format NodeId but useless downstream metadata. PR 73 adds a two-pass browse. Pass 1 unchanged shape but now collects (ParentFolder, BrowseName, DisplayName, NodeId) tuples into a pendingVariables list instead of registering each variable inline; folders still register inline. Pass 2 calls Session.ReadAsync once with (variableCount * 4) ReadValueId entries reading DataType + ValueRank + UserAccessLevel + Historizing for every variable. Server-side chunking via the SDK keeps the request shape within the server's per-request limits automatically. Attribute mapping: MapUpstreamDataType maps every standard DataTypeIds.* to a DriverDataType -- Boolean, SByte+Byte widened to Int16 (DriverDataType has no 8-bit, flagged in comment for future Core.Abstractions widening), Int16/32/64, UInt16/32/64, Float->Float32, Double->Float64, String, DateTime+UtcTime->DateTime. Unknown/vendor-custom NodeIds fall back to String -- safest passthrough for Variant-wrapped structs/enums/extension objects since the cascading-quality path preserves upstream StatusCode+timestamps regardless. MapAccessLevelToSecurityClass reads AccessLevels.CurrentWrite bit (0x02) -- when set, the variable is writable-for-this-user so it surfaces as Operate; otherwise ViewOnly. Uses UserAccessLevel not AccessLevel because UserAccessLevel is post-ACL-filter -- reflects what THIS session can actually do, not the server's default. IsArray derived from ValueRank (-1 = scalar, 0 = 1-D array, 1+ = multi-dim). IsHistorized reflects the server's Historizing flag directly so PR 76's IHistoryProvider routing can gate on it. Graceful degradation: (a) individual attribute failures (Bad StatusCode on DataType read) fall through to the type defaults, variable still registers; (b) wholesale enrichment-read failure (e.g. session dropped mid-browse) catches the exception, registers every pending variable with fallback defaults via RegisterFallback, browse completes. Either way the downstream address space is never empty when browse succeeded the first pass -- partial metadata is strictly better than missing variables. Unit tests (OpcUaClientAttributeMappingTests, 20 facts): MapUpstreamDataType theory covers 11 standard types including Boolean/Int16/UInt16/Int32/UInt32/Int64/UInt64/Float/Double/String/DateTime; separate facts for SByte+Byte (widened to Int16), UtcTime (DateTime), custom NodeId (String fallback); MapAccessLevelToSecurityClass theory covers 6 access-level bitmasks including CurrentRead-only (ViewOnly), CurrentWrite-only (Operate), read+write (Operate), HistoryRead-only (ViewOnly -- no Write bit). 51/51 OpcUaClient.Tests pass (31 prior + 20 new). dotnet build clean. Pending variables structured as a private readonly record struct so the ref-type allocation is stack-local for typical browse sizes. Paves the way for PR 74 SessionReconnectHandler (same enrichment path is re-runnable on reconnect) + PR 76 IHistoryProvider (gates on IsHistorized). 2026-04-19 02:00:31 -04:00
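The two small mappings at the heart of the enrichment pass, sketched under the assumption that SecurityClassification is the project's enum; AccessLevels.CurrentWrite (0x02) is the SDK constant the message cites.

```csharp
using Opc.Ua;

public enum SecurityClassification { ViewOnly, Operate }   // assumed project enum

internal static class BrowseEnrichment
{
    // UserAccessLevel, not AccessLevel: post-ACL view of what THIS session can do.
    internal static SecurityClassification MapAccessLevelToSecurityClass(byte userAccessLevel) =>
        (userAccessLevel & AccessLevels.CurrentWrite) != 0
            ? SecurityClassification.Operate
            : SecurityClassification.ViewOnly;

    // ValueRank semantics per the message: -1 = scalar, 0 = 1-D array, 1+ = multi-dim.
    internal static bool IsArray(int valueRank) => valueRank >= 0;
}
```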
d3bf544abc Merge pull request 'Phase 3 PR 72 -- Multi-endpoint failover for OPC UA Client' (#71) from phase-3-pr72-opcua-client-failover into v2 2026-04-19 01:54:36 -04:00
Joseph Doherty
24435712c4 Phase 3 PR 72 -- Multi-endpoint failover for OPC UA Client driver. Adds OpcUaClientDriverOptions.EndpointUrls ordered list + PerEndpointConnectTimeout knob. On InitializeAsync the driver walks the candidate list in order via ResolveEndpointCandidates and returns the session from the first endpoint that successfully connects. Captures per-URL failure reasons in a List<string> and, if every candidate fails, throws AggregateException whose message names every URL + its failure class (e.g. 'opc.tcp://primary:4840 -> TimeoutException: ...'). That's critical diag for field debugging -- without it 'failover picked the wrong one' surfaces as a mystery. Single-URL backwards compat: EndpointUrl field retained as a one-URL shortcut. When EndpointUrls is null or empty the driver falls through to a single-candidate list of [EndpointUrl], so every existing single-endpoint config keeps working without migration. When both are provided, EndpointUrls wins + EndpointUrl is ignored -- documented on the field xml-doc. Per-endpoint connect budget: PerEndpointConnectTimeout (default 3s) caps each attempt so a sweep over several dead servers can't blow the overall init budget. Applied via CancellationTokenSource.CreateLinkedTokenSource + CancelAfter inside OpenSessionOnEndpointAsync (the extracted single-endpoint connect helper) so the cap is independent of the outer Options.Timeout which governs steady-state ops. BuildUserIdentity extracted out of InitializeAsync so the failover loop builds the UserIdentity ONCE and reuses it across every endpoint attempt -- generating it N times would re-unlock the user cert's private key N times, wasteful + keeps the password in memory longer. HostName now reflects the endpoint that actually connected via _connectedEndpointUrl instead of always returning opts.EndpointUrl -- so the Admin /hosts dashboard shows which of the configured endpoints is currently serving traffic (primary vs backup). Falls back to the first candidate pre-connect so the dashboard has a sensible identity before the first connect, and resets to null on ShutdownAsync. Use case: an OPC UA hot-standby server pair (primary 4840 + backup 4841) where either can serve the same address space. Operator configures EndpointUrls=[primary, backup]; driver tries primary first, falls over to backup on primary failure with a clean AggregateException describing both attempts if both are down. Unit tests (OpcUaClientFailoverTests, 5 facts): ResolveEndpointCandidates_prefers_EndpointUrls_when_provided (list trumps single), ResolveEndpointCandidates_falls_back_to_single_EndpointUrl_when_list_empty (legacy config compat), ResolveEndpointCandidates_empty_list_treated_as_fallback (explicit empty list also falls back -- otherwise we'd produce a zero-candidate sweep that throws with nothing tried), HostName_uses_first_candidate_before_connect (dashboard rendering pre-connect), Initialize_against_all_unreachable_endpoints_throws_AggregateException_listing_each (three loopback dead ports, asserts each URL appears in the aggregate message + driver flips to Faulted). 31/31 OpcUaClient.Tests pass. dotnet build clean. OPC UA Client driver security/auth/availability feature set now complete per driver-specs.md §8: policy-filtered endpoint selection (PR 70), Anonymous+Username+Certificate auth (PR 71), multi-endpoint failover (this PR). 2026-04-19 01:52:31 -04:00
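The candidate-ordering rule is small enough to sketch in full: a non-empty EndpointUrls list wins; a null or explicitly empty list falls back to the legacy single EndpointUrl so no existing config needs migration.

```csharp
using System.Collections.Generic;

internal static class EndpointFailover
{
    internal static IReadOnlyList<string> ResolveEndpointCandidates(
        IReadOnlyList<string>? endpointUrls, string endpointUrl)
    {
        // Null OR empty list falls back -- never a zero-candidate sweep
        // that would throw with nothing tried.
        return endpointUrls is { Count: > 0 }
            ? endpointUrls
            : new[] { endpointUrl };
    }
}
```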
3f7b4d05e6 Merge pull request 'Phase 3 PR 71 -- OpcUaAuthType.Certificate user authentication' (#70) from phase-3-pr71-opcua-client-cert-auth into v2 2026-04-19 01:49:29 -04:00
Joseph Doherty
a79c5f3008 Phase 3 PR 71 -- OpcUaAuthType.Certificate user authentication. Implements the third user-token type in the OPC UA spec (Anonymous + UserName + Certificate). Before this PR the Certificate branch threw NotSupportedException. Adds OpcUaClientDriverOptions.UserCertificatePath + UserCertificatePassword knobs for the PFX on disk. The InitializeAsync user-identity switch now calls BuildCertificateIdentity for AuthType=Certificate. Load path uses X509CertificateLoader.LoadPkcs12FromFile -- the non-obsolete .NET 9+ API; the legacy X509Certificate2 PFX ctors are deprecated on net10. Validation up-front: empty UserCertificatePath throws InvalidOperationException naming the missing field; non-existent file throws FileNotFoundException with path; private-key-missing throws InvalidOperationException explaining the private key is required to sign the OPC UA user-token challenge at session activation. Each failure mode is an operator-actionable config problem rather than a mysterious ServiceResultException during session open. UserIdentity(X509Certificate2) ctor carries the cert directly; the SDK sets TokenType=Certificate + wires the cert's public key into the activate-session payload. Private key stays in-memory on the OpenSSL / .NET crypto boundary. Unit tests (OpcUaClientCertAuthTests, 3 facts): BuildCertificateIdentity_rejects_missing_path (error message mentions UserCertificatePath so the fix is obvious); BuildCertificateIdentity_rejects_nonexistent_file (FileNotFoundException); BuildCertificateIdentity_loads_a_valid_PFX_with_private_key -- generates a self-signed RSA-2048 cert on the fly with CertificateRequest.CreateSelfSigned, exports to temp PFX with a password, loads it through the helper, asserts TokenType=Certificate. Test cleans up the temp file in a finally block (best-effort; Windows file locking can leave orphans which is acceptable for %TEMP%). Self-signed cert-on-the-fly avoids shipping a static test PFX that could be flagged by secret-scanners and keeps the test hermetic across dev boxes. 26/26 OpcUaClient.Tests pass (23 prior + 3 cert auth). dotnet build clean. Feature: Anonymous + Username + Certificate all work -- driver-specs.md §8 auth story complete. 2026-04-19 01:47:18 -04:00
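A sketch of the fail-fast load path. The validation order and APIs follow the message (X509CertificateLoader is the non-obsolete .NET 9+ PKCS#12 loader); the exact exception wording is illustrative.

```csharp
using System;
using System.IO;
using System.Security.Cryptography.X509Certificates;
using Opc.Ua;

internal static class CertificateAuth
{
    internal static UserIdentity BuildCertificateIdentity(string? path, string? password)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new InvalidOperationException("AuthType=Certificate requires UserCertificatePath.");
        if (!File.Exists(path))
            throw new FileNotFoundException("User certificate PFX not found.", path);

        X509Certificate2 cert = X509CertificateLoader.LoadPkcs12FromFile(path, password);
        if (!cert.HasPrivateKey)
            throw new InvalidOperationException(
                "PFX lacks a private key; it is required to sign the user-token challenge at session activation.");

        return new UserIdentity(cert);   // SDK sets TokenType=Certificate
    }
}
```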
a5299a2fee Merge pull request 'Phase 3 PR 70 -- Apply SecurityPolicy + expand to standard OPC UA policies' (#69) from phase-3-pr70-opcua-client-security-policy into v2 2026-04-19 01:46:13 -04:00
Joseph Doherty
a65215684c Phase 3 PR 70 -- Apply SecurityPolicy explicitly + expand to standard OPC UA policy list. Before this PR SecurityPolicy was a string field that got ignored -- the driver only passed useSecurity=SecurityMode!=None to SelectEndpointAsync, so an operator asking for Basic256Sha256 on a server that also advertised Basic128Rsa15 could silently end up on the weaker cipher (the SDK's SelectEndpoint returns whichever matching endpoint the server listed first). PR 70 makes policy matching explicit. SecurityPolicy is now an OpcUaSecurityPolicy enum covering the six standard policies documented in OPC UA 1.04: None, Basic128Rsa15 (deprecated, brownfield interop only), Basic256 (deprecated), Basic256Sha256 (recommended baseline), Aes128_Sha256_RsaOaep, Aes256_Sha256_RsaPss. Each maps through MapSecurityPolicy to the SecurityPolicies URI constant the SDK uses for endpoint matching. New SelectMatchingEndpointAsync replaces CoreClientUtils.SelectEndpointAsync. Flow: opens a DiscoveryClient via the non-obsolete DiscoveryClient.CreateAsync(ApplicationConfiguration, Uri, DiagnosticsMasks, ct) path, calls GetEndpointsAsync to enumerate every endpoint the server advertises, filters client-side by policy URI AND mode. When no endpoint matches, throws InvalidOperationException with the full list of what the server DID advertise formatted as 'Policy/Mode' pairs so the operator sees exactly what to fix in their config without a Wireshark trace. Fail-loud behaviour intentional -- a silent fall-through to weaker crypto is worse than a clear config error. MapSecurityPolicy is internal-visible to tests via InternalsVisibleTo from PR 66. Unit tests (OpcUaClientSecurityPolicyTests, 5 facts): MapSecurityPolicy_returns_known_non_empty_uri_for_every_enum_value theory covers all 6 policies; URI contains the enum name for non-None so operators can grep logs back to the config value; MapSecurityPolicy_None_matches_SDK_None_URI, MapSecurityPolicy_Basic256Sha256_matches_SDK_URI, MapSecurityPolicy_Aes256_Sha256_RsaPss_matches_SDK_URI all cross-check against the SDK's SecurityPolicies.* constants to catch a future enum-vs-URI drift; Every_enum_value_has_a_mapping walks Enum.GetValues to ensure adding a new case doesn't silently fall through the switch. Scaffold test updated to assert SecurityPolicy default = None (was previously unchecked). 23/23 OpcUaClient.Tests pass (13 prior + 5 scaffold + 5 new policy). dotnet build clean. Note on DiscoveryClient: the synchronous DiscoveryClient.Create(...) overloads are all [Obsolete] in SDK 1.5.378; must use DiscoveryClient.CreateAsync. GetEndpointsAsync(null, ct) returns EndpointDescriptionCollection directly (not a wrapper). 2026-04-19 01:44:07 -04:00
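The enum-to-URI mapping, sketched against the SDK's SecurityPolicies constants; OpcUaSecurityPolicy mirrors the driver enum named above.

```csharp
using System;
using Opc.Ua;

public enum OpcUaSecurityPolicy
{ None, Basic128Rsa15, Basic256, Basic256Sha256, Aes128_Sha256_RsaOaep, Aes256_Sha256_RsaPss }

internal static class PolicyMapping
{
    internal static string MapSecurityPolicy(OpcUaSecurityPolicy policy) => policy switch
    {
        OpcUaSecurityPolicy.None                  => SecurityPolicies.None,
        OpcUaSecurityPolicy.Basic128Rsa15         => SecurityPolicies.Basic128Rsa15,
        OpcUaSecurityPolicy.Basic256              => SecurityPolicies.Basic256,
        OpcUaSecurityPolicy.Basic256Sha256        => SecurityPolicies.Basic256Sha256,
        OpcUaSecurityPolicy.Aes128_Sha256_RsaOaep => SecurityPolicies.Aes128_Sha256_RsaOaep,
        OpcUaSecurityPolicy.Aes256_Sha256_RsaPss  => SecurityPolicies.Aes256_Sha256_RsaPss,
        _ => throw new ArgumentOutOfRangeException(nameof(policy)),   // a new enum member can't slip through
    };
}
```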
82f2dfcfa3 Merge pull request 'Phase 3 PR 69 -- OPC UA Client ISubscribable + IHostConnectivityProbe' (#68) from phase-3-pr69-opcua-client-subscribe-probe into v2 2026-04-19 01:24:21 -04:00
Joseph Doherty
0433d3a35e Phase 3 PR 69 -- OPC UA Client ISubscribable + IHostConnectivityProbe. Completes the OpcUaClientDriver capability surface — now matches the Galaxy + Modbus + S7 driver coverage. ISubscribable: SubscribeAsync creates a new upstream Subscription via the non-obsolete Subscription(ITelemetryContext, SubscriptionOptions) ctor + AddItem/CreateItemsAsync flow, wires each MonitoredItem's Notification event into OnDataChange. Tag strings round-trip through MonitoredItem.Handle so the notification handler can identify which tag changed without a second lookup. Publishing interval floored at 50ms (servers negotiate up anyway; sub-50ms wastes round-trip). SubscriptionOptions uses KeepAliveCount=10, LifetimeCount=1000, TimestampsToReturn=Both so SourceTimestamp passthrough for the cascading-quality rule works through subscription paths too. UnsubscribeAsync calls Subscription.DeleteAsync(silent:true) and tolerates unknown handles (returns cleanly) because the caller's race with server-side cleanup after a session drop shouldn't crash either side. Session shutdown explicitly deletes every remote subscription before closing — avoids BadSubscriptionIdInvalid noise in the upstream server's log on Close. IHostConnectivityProbe: HostName surfaced as the EndpointUrl (not host:port like the Modbus/S7 drivers) so the Admin /hosts dashboard can render the full opc.tcp:// URL as a clickable target back at the remote server. HostState tracked via session.KeepAlive event — OPC UA's built-in keep-alive is authoritative for session liveness (the SDK pings on KeepAliveInterval, sets KeepAliveStopped after N missed pings), strictly better than a driver-side polling probe: no extra wire round-trip, no duplicate semantic with the native protocol. Handler transitions Running on healthy keep-alives and Stopped on any Bad service-result. Initial Running raised at end of InitializeAsync once the session is up; Shutdown transitions back to Unknown + unwires the handler. Unit tests (OpcUaClientSubscribeAndProbeTests, 3 facts): SubscribeAsync_without_initialize_throws_InvalidOperationException, UnsubscribeAsync_with_unknown_handle_is_noop (session-drop-race safety), GetHostStatuses_returns_endpoint_url_row_pre_init (asserts EndpointUrl as the host identity -- the full opc.tcp://plc.example:4840 URL). Live-session subscribe/unsubscribe round-trip + keep-alive state transition coverage lands in a follow-up PR once we scaffold the in-process OPC UA server fixture. 13/13 OpcUaClient.Tests pass. dotnet build clean. All six capability interfaces (IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable / IHostConnectivityProbe) implemented — OPC UA Client driver surface complete. 2026-04-19 01:22:14 -04:00
141673fc80 Merge pull request 'Phase 3 PR 68 -- OPC UA Client ITagDiscovery (Full browse)' (#67) from phase-3-pr68-opcua-client-discovery into v2 2026-04-19 01:19:27 -04:00
Joseph Doherty
db56a95819 Phase 3 PR 68 -- OPC UA Client ITagDiscovery via recursive browse (Full strategy). Adds ITagDiscovery to OpcUaClientDriver. DiscoverAsync opens a single Remote folder on the IAddressSpaceBuilder and recursively browses from the configured root (default: ObjectsFolder i=85; override via OpcUaClientDriverOptions.BrowseRoot for scoped discovery). Browse uses non-obsolete Session.BrowseAsync(RequestHeader, ViewDescription, uint maxReferences, BrowseDescriptionCollection, ct) with HierarchicalReferences forward, subtypes included, NodeClassMask Object+Variable, ResultMask pulling BrowseName + DisplayName + NodeClass + TypeDefinition. Objects become sub-folders via builder.Folder; Variables become builder.Variable entries with FullName set to the NodeId.ToString() serialization so IReadable/IWritable can round-trip without re-resolving. Three safety caps added to OpcUaClientDriverOptions to bound runaway discovery: (1) MaxBrowseDepth default 10 -- deep enough for realistic OPC UA information models, shallow enough that cyclic graphs can't spin the browse forever. (2) MaxDiscoveredNodes default 10_000 -- caps memory on pathological remote servers. Once the cap is hit, recursion short-circuits and the partially-discovered tree is still projected into the local address space (graceful degradation rather than all-or-nothing). (3) BrowseRoot as an opt-in scope restriction string per driver-specs.md §8 -- defaults to ObjectsFolder but operators with 100k-node servers can point it at a single subtree. Visited-set tracks NodeIds already visited to prevent infinite cycles on graphs with non-strict hierarchy (OPC UA models can have back-references). Transient browse failures on a subtree are swallowed -- the sub-branch stops but the rest of discovery continues, matching the Modbus driver's 'transient poll errors don't kill the loop' pattern. The driver's health surface reflects the network-level cascade via the probe loop (PR 69). Deferred to a follow-up PR: DataType resolution via a batch Session.ReadAsync(Attributes.DataType) after the browse so DriverAttributeInfo.DriverDataType is accurate instead of the current conservative DriverDataType.Int32 default; AccessLevel-derived SecurityClass instead of the current ViewOnly default; array-type detection via Attributes.ValueRank + ArrayDimensions. These need an extra wire round-trip per batch of variables + a NodeId -> DriverDataType mapping table; out of scope for PR 68 to keep browse path landable. Unit tests (OpcUaClientDiscoveryTests, 3 facts): DiscoverAsync_without_initialize_throws_InvalidOperationException (pre-init hits RequireSession); DiscoverAsync_rejects_null_builder (ArgumentNullException); Discovery_caps_are_sensible_defaults (asserts 10000 / 10 / null defaults documented above). NullAddressSpaceBuilder stub implements the full IAddressSpaceBuilder shape including IVariableHandle.MarkAsAlarmCondition (throws NotSupportedException since this PR doesn't wire alarms). Live-browse coverage against a real remote server is deferred to the in-process-server-fixture PR. 10/10 OpcUaClient.Tests pass. dotnet build clean. 2026-04-19 01:17:21 -04:00
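A sketch of the cap/guard logic only, with the wire call abstracted behind a delegate because the exact BrowseAsync shape is SDK-version-specific; _options matches the knobs above, and everything else (field names, projection comment) is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Opc.Ua;

// Sketch: depth cap, node budget, and visited-set cycle guard.
private int _discoveredCount;

private async Task BrowseSubtreeAsync(
    NodeId node, int depth, HashSet<NodeId> visited,
    Func<NodeId, Task<IReadOnlyList<NodeId>>> browseChildrenAsync)
{
    if (depth > _options.MaxBrowseDepth) return;                  // cap 1: bounded depth
    if (_discoveredCount >= _options.MaxDiscoveredNodes) return;  // cap 2: node budget
    if (!visited.Add(node)) return;                               // cap 3: cycle guard (back-references)

    IReadOnlyList<NodeId> children;
    try { children = await browseChildrenAsync(node); }
    catch { return; }   // transient subtree failure stops this branch, not the whole discovery

    foreach (var child in children)
    {
        _discoveredCount++;
        // ...project Object -> builder.Folder, Variable -> builder.Variable here...
        await BrowseSubtreeAsync(child, depth + 1, visited, browseChildrenAsync);
    }
}
```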
89bd726fa8 Merge pull request 'Phase 3 PR 67 -- OPC UA Client IReadable + IWritable' (#66) from phase-3-pr67-opcua-client-read-write into v2 2026-04-19 01:15:42 -04:00
Joseph Doherty
238748bc98 Phase 3 PR 67 -- OPC UA Client IReadable + IWritable via Session.ReadAsync/WriteAsync. Adds IReadable + IWritable capabilities to OpcUaClientDriver, routing reads/writes through the session's non-obsolete ReadAsync(RequestHeader, maxAge, TimestampsToReturn, ReadValueIdCollection, ct) and WriteAsync(RequestHeader, WriteValueCollection, ct) overloads (the sync and BeginXxx/EndXxx patterns are all [Obsolete] in SDK 1.5.378). Serializes on the shared Gate from PR 66 so reads + writes + future subscribe + probe don't race on the single session. NodeId parsing: fullReferences use OPC UA's standard serialized NodeId form -- ns=2;s=Demo.Counter, i=2253, ns=4;g=... for GUID, ns=3;b=... for opaque. TryParseNodeId calls NodeId.Parse with the session's MessageContext which honours the server-negotiated namespace URI table. Malformed input surfaces as BadNodeIdInvalid (0x80330000) WITHOUT a wire round-trip -- saves a request for a fault the driver can detect locally. Cascading-quality implementation per driver-specs.md §8: upstream StatusCode, SourceTimestamp, and ServerTimestamp pass through VERBATIM. Bad codes from the remote server stay as the same Bad code (not translated to generic BadInternalError) so downstream clients can distinguish 'upstream value unavailable' from 'local driver bug'. SourceTimestamp is preserved verbatim (null on MinValue guard) so staleness is visible; ServerTimestamp falls back to DateTime.UtcNow if the upstream omitted it, never overwriting a non-zero value. Wire-level exceptions in the Read batch -- transport / timeout / session-dropped -- fan out BadCommunicationError (0x80050000) across every tag in the batch, not BadInternalError, so operators distinguish network reachability from driver faults. Write-side same pattern: successful WriteAsync maps each upstream StatusCode.Code verbatim into the local WriteResult.StatusCode; transport-layer failure fans out BadCommunicationError across the whole batch. WriteValue carries AttributeId=Value + DataValue wrapping Variant(writeValue) -- the SDK handles the type-to-Variant mapping for common CLR types (bool, int, float, string, etc.) so the driver doesn't need a per-type switch. Name disambiguation: the SDK has its own Opc.Ua.WriteRequest type which collides with ZB.MOM.WW.OtOpcUa.Core.Abstractions.WriteRequest; method signature uses the fully-qualified Core.Abstractions.WriteRequest. Unit tests (OpcUaClientReadWriteTests, 2 facts): ReadAsync_without_initialize_throws_InvalidOperationException + WriteAsync_without_initialize_throws_InvalidOperationException -- pre-init calls hit RequireSession and fail uniformly. Wire-level round-trip coverage against a live remote server lands in a follow-up PR once we scaffold an in-process OPC UA server fixture (the existing Server project in the solution is a candidate host). 7/7 OpcUaClient.Tests pass (5 scaffold + 2 read/write). dotnet build clean. Scope: ITagDiscovery (browse) + ISubscribable + IHostConnectivityProbe remain deferred to PRs 68-69 which also need namespace-index remapping and reference-counted MonitoredItem forwarding per driver-specs.md §8. 2026-04-19 01:13:34 -04:00
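A sketch of the local-fault short-circuit and the timestamp guard; NodeId.Parse with a message context follows the message's description of the SDK call, and Session is the driver's own field.

```csharp
using System;
using Opc.Ua;

// Malformed NodeIds never reach the wire: the caller maps 'false' to BadNodeIdInvalid (0x80330000).
private bool TryParseNodeId(string fullReference, out NodeId nodeId)
{
    try
    {
        // The session's MessageContext honours the server-negotiated namespace URI table.
        nodeId = NodeId.Parse(Session.MessageContext, fullReference);
        return true;
    }
    catch (Exception)
    {
        nodeId = NodeId.Null;
        return false;
    }
}

// MinValue SourceTimestamps surface as 'source unknown' (null), never epoch-zero.
private static DateTime? GuardSourceTimestamp(DateTime sourceTimestamp) =>
    sourceTimestamp == DateTime.MinValue ? null : sourceTimestamp;
```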
b21d550836 Merge pull request 'Phase 3 PR 66 -- OPC UA Client (gateway) driver scaffold' (#65) from phase-3-pr66-opcua-client-scaffold into v2 2026-04-19 01:10:07 -04:00
Joseph Doherty
91eaf534c8 Phase 3 PR 66 -- OPC UA Client (gateway) driver project scaffold + IDriver session lifecycle. First driver that CONSUMES OPC UA rather than PUBLISHES it -- connects to a remote server and re-exposes its address space through the local OtOpcUa server per driver-specs.md §8. Uses the same OPCFoundation.NetStandard.Opc.Ua.Client package the existing Client.Shared ships (bumped to 1.5.378.106 to match). Builds its own ApplicationConfiguration (cert stores under %LocalAppData%/OtOpcUa/pki so multiple driver instances in one OtOpcUa server process share a trust anchor) rather than reusing Client.Shared -- Client.Shared is oriented at the interactive CLI with different session-lifetime needs (this driver is always-on, needs keep-alive + session transfer on reconnect + multi-year uptime). Navigated the post-refactor 1.5.378 SDK surface: every Session.Create* static is now [Obsolete] in favour of DefaultSessionFactory; CoreClientUtils.SelectEndpoint got the sync overloads deprecated in favour of SelectEndpointAsync with a required ITelemetryContext parameter. Driver passes telemetry: null! to both SelectEndpointAsync + new DefaultSessionFactory(telemetry: null!) -- the SDK's internal default sink handles null gracefully and plumbing a telemetry context through the driver options surface is out of scope (the driver emits its own logs via the DriverHealth surface anyway). ApplicationInstance default ctor is also obsolete; wrapped in #pragma warning disable CS0618 rather than migrate to the ITelemetryContext overload for the same reason. OpcUaClientDriverOptions models driver-specs.md §8 settings: EndpointUrl (default opc.tcp://localhost:4840, the IANA-assigned port), SecurityPolicy/SecurityMode/AuthType enums, Username/Password, SessionTimeout=120s + KeepAliveInterval=5s + ReconnectPeriod=5s (defaults from spec), AutoAcceptCertificates=false (production default; dev turns on for self-signed servers), ApplicationUri + SessionName knobs for certificate SAN matching and remote-server session-list identification. OpcUaClientDriver : IDriver: InitializeAsync builds the ApplicationConfiguration, resolves + creates cert if missing via app.CheckApplicationInstanceCertificatesAsync, selects endpoint via CoreClientUtils.SelectEndpointAsync, builds UserIdentity (Anonymous or Username with UTF-8-encoded password bytes -- the legacy string-password ctor went away; Certificate auth deferred), creates session via DefaultSessionFactory.CreateAsync. Health transitions Unknown -> Initializing -> Healthy on success or -> Faulted on failure with best-effort Session.CloseAsync cleanup. ShutdownAsync (async now, not Task.CompletedTask) closes the session + disposes. Internal Session + Gate exposed to the test project via InternalsVisibleTo so PRs 67-69 can stack read/write/discovery/subscribe on the same serialization. Scaffold tests (OpcUaClientDriverScaffoldTests, 5 facts): Default_options_target_standard_opcua_port_and_anonymous_auth (4840 + None mode + Anonymous + AutoAccept=false production default), Default_timeouts_match_driver_specs_section_8 (120s/5s/5s), Driver_reports_type_and_id_before_connect (DriverType=OpcUaClient, DriverInstanceId round-trip, pre-init Unknown health), Initialize_against_unreachable_endpoint_transitions_to_Faulted_and_throws, Reinitialize_against_unreachable_endpoint_re_throws.
Uses opc.tcp://127.0.0.1:1 as the 'guaranteed-unreachable' target -- RFC 5737 reserved IPs get black-holed and time out only after the SDK's internal retry/backoff fully elapses (~60s), while port 1 on loopback refuses immediately with TCP RST which keeps the test suite snappy (5 tests / 8s). 5/5 pass. dotnet build clean. Scope boundary: ITagDiscovery / IReadable / IWritable / ISubscribable / IHostConnectivityProbe deliberately NOT in this PR -- they need browse + namespace remapping + reference-counted MonitoredItem forwarding + keep-alive probing and land in PRs 67-69. 2026-04-19 01:07:57 -04:00
d33e38e059 Merge pull request 'Phase 3 PR 65 -- S7 ITagDiscovery + ISubscribable + IHostConnectivityProbe' (#64) from phase-3-pr65-s7-discovery-subscribe-probe into v2 2026-04-19 00:18:17 -04:00
Joseph Doherty
d8ef35d5bd Phase 3 PR 65 -- S7 ITagDiscovery + ISubscribable polling overlay + IHostConnectivityProbe. Three more capability interfaces on S7Driver, matching the Modbus driver's capability coverage. ITagDiscovery: DiscoverAsync streams every configured tag into IAddressSpaceBuilder under a single 'S7' folder; builder.Variable gets a DriverAttributeInfo carrying DriverDataType (MapDataType: Bool->Boolean, Byte/Int/UInt sizes->Int32 (until Core.Abstractions adds widths), Float32/Float64 direct, String + DateTime direct), SecurityClass (Operate if tag.Writable else ViewOnly -- matches the Modbus pattern so DriverNodeManager's ACL layer can gate writes per role without S7-specific logic), IsHistorized=false (S7 has no native historian surface), IsAlarm=false (S7 alarms land through TIA Portal's alarm-in-DB pattern which is per-site and out of scope for PR 65). ISubscribable polling overlay: same pattern Modbus established in PR 22. SubscribeAsync spawns a Task.Run loop that polls every tag, diffs against LastValues, raises OnDataChange on changes plus a force-raise on initial-data push per OPC UA Part 4 convention. Interval floored at 100ms -- S7 CPUs scan 2-10ms but process the comms mailbox at most once per scan, so sub-scan polling just queues wire-side with worse latency per S7netplus documented pattern. Poll errors tolerated: first-read fault doesn't kill the loop (caller can't receive initial values but subsequent polls try again); transient poll errors also swallowed so the loop survives a power-cycle + reconnect through the health surface. UnsubscribeAsync cancels the CTS + removes the subscription -- unknown handle is a no-op, not a throw, because the caller's race with server-side cleanup shouldn't crash either side. Shutdown tears down every subscription before disposing the Plc. IHostConnectivityProbe: HostName surfaced as host:port to match Modbus driver convention (Admin /hosts dashboard renders both families uniformly). GetHostStatuses returns one row (single-endpoint driver). ProbeLoopAsync serializes on the shared Gate + calls Plc.ReadStatusAsync (cheap Get-CPU-Status PDU that doubles as an 'is PLC up' check) every Probe.Interval with a Probe.Timeout cap, transitions HostState Unknown/Stopped -> Running on success and -> Stopped on any failure, raises OnHostStatusChanged only on actual transitions (no noise for steady-state probes). Probe loop starts at end of InitializeAsync when Probe.Enabled=true (default); Shutdown cancels the probe CTS. Initial state stays Unknown until first successful probe -- avoids broadcasting a premature Running before any PDU round-trip has happened. Unit tests (S7DiscoveryAndSubscribeTests, 4 facts): DiscoverAsync_projects_every_tag_into_the_address_space (3 tags + mixed writable/read-only -> Operate vs ViewOnly asserted), GetHostStatuses_returns_one_row_with_host_port_identity_pre_init, SubscribeAsync_returns_unique_handles_and_UnsubscribeAsync_accepts_them (diagnosticId uniqueness + idempotent double-unsubscribe), Subscribe_publishing_interval_is_floored_at_100ms (accepts 50ms request without throwing -- floor is applied internally). Uses a RecordingAddressSpaceBuilder stub that implements IVariableHandle.FullReference + MarkAsAlarmCondition (throws NotImplementedException since the S7 driver never calls it -- alarms out of scope). 57/57 S7 unit tests pass. dotnet build clean. 
All 6 capability interfaces (IDriver/ITagDiscovery/IReadable/IWritable/ISubscribable/IHostConnectivityProbe) now implemented -- the S7 driver surface is on par with the Modbus driver, minus the extended data types (Int64/UInt64/Float64/String/DateTime deferred per PR 64). 2026-04-19 00:16:10 -04:00
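A sketch of the polling-overlay loop the Modbus and S7 drivers share, under stated assumptions: PollSubscription, readTagAsync, and onDataChange are illustrative stand-ins for the driver's internals, not S7.Net API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal subscription state for the sketch.
sealed class PollSubscription
{
    public required IReadOnlyList<string> Tags { get; init; }
    public Dictionary<string, object?> LastValues { get; } = new();
}

static class PollingOverlay
{
    static async Task PollLoopAsync(
        PollSubscription sub, TimeSpan requestedInterval,
        Func<string, CancellationToken, Task<object?>> readTagAsync,
        Action<string, object?> onDataChange, CancellationToken ct)
    {
        // Floor at 100 ms: the CPU services its comms mailbox at most once per scan,
        // so faster polling just queues wire-side with worse latency.
        var interval = requestedInterval < TimeSpan.FromMilliseconds(100)
            ? TimeSpan.FromMilliseconds(100) : requestedInterval;

        bool initialPush = true;
        while (!ct.IsCancellationRequested)
        {
            foreach (var tag in sub.Tags)
            {
                object? value;
                try { value = await readTagAsync(tag, ct); }
                catch { continue; }                      // transient poll error: loop survives

                if (initialPush || !Equals(sub.LastValues.GetValueOrDefault(tag), value))
                {
                    sub.LastValues[tag] = value;
                    onDataChange(tag, value);            // force-raise on initial data per Part 4
                }
            }
            initialPush = false;
            try { await Task.Delay(interval, ct); } catch (OperationCanceledException) { break; }
        }
    }
}
```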
5e318a1ab6 Merge pull request 'Phase 3 PR 64 -- S7 IReadable + IWritable via S7.Net' (#63) from phase-3-pr64-s7-read-write into v2 2026-04-19 00:12:59 -04:00
Joseph Doherty
394d126b2e Phase 3 PR 64 -- S7 IReadable + IWritable via S7.Net string-based Plc.ReadAsync/WriteAsync. Adds IReadable + IWritable capability interfaces to S7Driver, routing reads/writes through S7netplus's string-address API (Plc.ReadAsync(string, ct) / Plc.WriteAsync(string, object, ct)). All operations serialize on the class's SemaphoreSlim Gate because S7netplus mandates one Plc connection per PLC with client-side serialization -- parallel reads against a single S7 CPU queue wire-side anyway and just eat connection-resource budget. Supported data types in this PR: Bool, Byte, Int16, UInt16, Int32, UInt32, Float32. S7.Net's string-based read returns UNSIGNED boxed values (DBX=bool, DBB=byte, DBW=ushort, DBD=uint); the driver reinterprets them into the requested S7DataType via the (DataType, Size, raw) switch: unchecked short-cast for Int16, unchecked int-cast for Int32, BitConverter.UInt32BitsToSingle for Float32. Writes invert the conversion -- Int16 -> unchecked ushort cast, Int32 -> unchecked uint cast, Float32 -> BitConverter.SingleToUInt32Bits -- before handing to S7.Net's WriteAsync. This avoids a second PLC round-trip that a typed ReadAsync(DataType, db, offset, VarType, ...) overload would need. Int64, UInt64, Float64, String, DateTime throw NotSupportedException (-> BadNotSupported StatusCode); S7 STRING has non-trivial header semantics + LReal/DateTime need typed S7.Net API paths, both land in a follow-up PR when scope demands. InitializeAsync now parses every tag's Address string via S7AddressParser at init time. Bad addresses throw FormatException and flip health to Faulted -- callers can't register a broken driver. The parsed form goes into _parsedByName so Read/Write can consult Size/BitOffset without re-parsing per operation. StatusCode mapping in catch chain: unknown tag name -> BadNodeIdUnknown (0x80340000), unsupported data type -> BadNotSupported (0x803D0000), read-only tag write attempt -> BadNotWritable (0x803B0000), S7.Net PlcException (carries PUT/GET-disabled signal on S7-1200/1500) -> BadDeviceFailure (0x80550000) so operators see a TIA-Portal config problem rather than a transient-fault false flag per driver-specs.md §5, any other runtime exception on read -> BadCommunicationError (0x80050000) to distinguish socket/timeout from tag-level faults. Write generic-exception path stays BadInternalError because write failures can legitimately be driver-side value-range problems. Unit tests (S7DriverReadWriteTests, 3 facts): Initialize_rejects_invalid_tag_address_and_fails_fast -- Tags with a malformed address must throw at InitializeAsync rather than producing a half-healthy driver; ReadAsync_without_initialize_throws_InvalidOperationException + WriteAsync_without_initialize_throws_InvalidOperationException -- pre-init calls hit RequirePlc and throw the uniform 'not initialized' message. Wire-level round-trip coverage (integration test against a live S7-1500 or a mock S7 server) is deferred -- S7.Net doesn't ship an in-process fake and a conformant mock is non-trivial. 53/53 Driver.S7.Tests pass (50 parser + 3 read/write). dotnet build clean. 2026-04-19 00:10:41 -04:00
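The read-side reinterpretation, sketched; S7DataType is assumed to be the driver's own enum, and the bit-level casts follow the message exactly.

```csharp
using System;

public enum S7DataType { Bool, Byte, Int16, UInt16, Int32, UInt32, Float32, Int64, UInt64, Float64, String, DateTime }

internal static class S7ValueMapping
{
    // S7.Net string-address reads return unsigned boxed values (DBX=bool, DBB=byte,
    // DBW=ushort, DBD=uint); reinterpret client-side instead of paying a second
    // typed round-trip.
    internal static object ReinterpretRaw(S7DataType type, object raw) => type switch
    {
        S7DataType.Bool    => (bool)raw,
        S7DataType.Byte    => (byte)raw,
        S7DataType.UInt16  => (ushort)raw,
        S7DataType.Int16   => unchecked((short)(ushort)raw),              // same 16 bits, signed view
        S7DataType.UInt32  => (uint)raw,
        S7DataType.Int32   => unchecked((int)(uint)raw),                  // same 32 bits, signed view
        S7DataType.Float32 => BitConverter.UInt32BitsToSingle((uint)raw), // IEEE-754 bits, not a numeric cast
        _ => throw new NotSupportedException($"S7 data type {type} not supported in this PR."),
    };
}
```

Writes run the same table in reverse (for example BitConverter.SingleToUInt32Bits for Float32) before handing the unsigned value to S7.Net's WriteAsync.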
0eab1271be Merge pull request 'Phase 3 PR 63 -- S7AddressParser (DB/M/I/Q/T/C grammar)' (#62) from phase-3-pr63-s7-address-parser into v2 2026-04-19 00:08:27 -04:00
23 changed files with 3385 additions and 1 deletion

View File

@@ -10,6 +10,7 @@
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ZB.MOM.WW.OtOpcUa.Driver.Modbus.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.S7/ZB.MOM.WW.OtOpcUa.Driver.S7.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Client.Shared/ZB.MOM.WW.OtOpcUa.Client.Shared.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.OtOpcUa.Client.CLI.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Client.UI/ZB.MOM.WW.OtOpcUa.Client.UI.csproj"/>
@@ -28,6 +29,7 @@
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Client.UI.Tests/ZB.MOM.WW.OtOpcUa.Client.UI.Tests.csproj"/>

View File

@@ -0,0 +1,137 @@
# Phase 6.1 — Resilience & Observability Runtime
> **Status**: DRAFT — implementation plan for a cross-cutting phase that was never formalised. The v2 `plan.md` specifies Polly, Tier A/B/C protections, structured logging, and local-cache fallback by decision; none are wired end-to-end.
>
> **Branch**: `v2/phase-6-1-resilience-observability`
> **Estimated duration**: 3 weeks
> **Predecessor**: Phase 5 (drivers) — partial; S7 + OPC UA Client shipped, AB/TwinCAT/FOCAS paused
> **Successor**: Phase 6.2 (Authorization runtime)
## Phase Objective
Land the cross-cutting runtime protections + operability features that `plan.md` + `driver-stability.md` specify by decision but that no driver-phase actually wires. End-state: every driver goes through the same Polly resilience layer, health endpoints render the live driver fleet, structured logs carry per-request correlation IDs, and the config substrate survives a central DB outage via a LiteDB local cache.
Closes these gaps flagged in the 2026-04-19 audit:
1. Polly v8 resilience pipelines wired to every `IDriver` capability (no-op per-driver today; Galaxy has a hand-rolled `CircuitBreaker` only).
2. Tier A/B/C enforcement at runtime — `driver-stability.md` §24 and decisions #63-73 define memory watchdog, bounded queues, scheduled recycle, wedge detection; `MemoryWatchdog` exists only inside `Driver.Galaxy.Host`.
3. Health endpoints (`/healthz`, `/readyz`) on `OtOpcUa.Server`.
4. Structured Serilog with per-request correlation IDs (driver instance, OPC UA session, IPC call).
5. LiteDB local cache + Polly retry + fallback on central-DB outage (decision #36).
## Scope — What Changes
| Concern | Change |
|---------|--------|
| `Core` → new `Core.Resilience` sub-namespace | Shared Polly pipeline builder (`DriverResiliencePipelines`), per-capability policy (Read / Write / Subscribe / HistoryRead / Discover / Probe / Alarm). One pipeline per driver instance; driver-options decide tuning. |
| Every `IDriver*` consumer in the server | Wrap capability calls in the shared pipeline. Policy composition order: timeout → retry (with jitter, bounded by capability-specific `MaxRetries`) → circuit breaker (per driver instance, opens on N consecutive failures) → bulkhead (ceiling on in-flight requests per driver). |
| `Core` → new `Core.Stability` sub-namespace | Generalise `MemoryWatchdog` (`Driver.Galaxy.Host`) into `DriverMemoryWatchdog` consuming `IDriver.GetMemoryFootprint()`. Add `ScheduledRecycleScheduler` (decision #67) for weekly/time-of-day recycle. Add `WedgeDetector` that flips a driver to Faulted when no successful Read in N × PublishingInterval. |
| `DriverTypeRegistry` | Each driver type registers its `DriverTier` {A, B, C}. Tier C drivers must also advertise their out-of-process topology; the registry enforces invariants (Tier C has a `Proxy` + `Host` pair). |
| `OtOpcUa.Server` → new Minimal API endpoints | `/healthz` (liveness — process alive + config DB reachable or LiteDB cache warm), `/readyz` (readiness — every driver instance reports `DriverState.Healthy`). JSON bodies cite individual driver health per instance. |
| Serilog configuration | Centralize enrichers in `OtOpcUa.Server/Observability/LogContextEnricher.cs`. Every driver call runs inside a `LogContext.PushProperty` scope with {DriverInstanceId, DriverType, CapabilityName, CorrelationId (UA RequestHandle or internal GUID)}. Sink config stays rolling-file per CLAUDE.md; JSON-formatted output added alongside plain-text so SIEM ingestion works. |
| `Configuration` project | Add `LiteDbConfigCache` adapter. Wrap EF Core queries in a Polly pipeline: timeout (2 s) → retry (3×, jittered) → fallback-to-cache. Cache refresh on successful DB query + after `sp_PublishGeneration`. Cache lives at `%ProgramData%/OtOpcUa/config-cache/<cluster-id>.db` per node. |
| `DriverHostStatus` entity | Extend to carry `LastCircuitBreakerOpenUtc`, `ConsecutiveFailures`, `CurrentBulkheadDepth`, `LastRecycleUtc`. Admin `/hosts` page reads these. |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| Driver wire protocols | Resilience is a server-side wrapper; individual drivers don't see Polly. Their existing retry logic (ModbusTcpTransport reconnect, SessionReconnectHandler) stays in place as inner layers. |
| Config DB schema | LiteDB cache is a read-only mirror; no new central tables except `DriverHostStatus` column additions. |
| OPC UA wire behavior visible to clients | Health endpoints live on a separate HTTP port (4841 by convention); the OPC UA server on 4840 is unaffected. |
| The four 2026-04-13 Galaxy stability findings | Already closed in Phase 2. Phase 6.1 *generalises* the pattern, doesn't re-fix Galaxy. |
| Driver-layer SafeHandle usage | Existing Galaxy `SafeMxAccessHandle` + Modbus `TcpClient` disposal stay — they're driver-internal, not part of the cross-cutting layer. |
## Entry Gate Checklist
- [ ] Phases 0-5 exit gates cleared (or explicitly deferred with task reference)
- [ ] `driver-stability.md` §24 re-read; decisions #63-73 + #34-36 re-skimmed
- [ ] Polly v8 NuGet available (`Microsoft.Extensions.Resilience` + `Polly.Core`) — verify package restore before task breakdown
- [ ] LiteDB 5.x NuGet confirmed MIT + actively maintained
- [ ] Existing drivers catalogued: Galaxy.Proxy, Modbus, S7, OpcUaClient — confirm test counts baseline so the resilience layer doesn't regress any
- [ ] Serilog configuration inventory: locate every `Log.ForContext` call site that will need `LogContext` rewrap
- [ ] Admin `/hosts` page's current `DriverHostStatus` consumption reviewed so the schema extensions don't break it
## Task Breakdown
### Stream A — Resilience layer (1 week)
1. **A.1** Add `Polly.Core` + `Microsoft.Extensions.Resilience` to `Core`. Build `DriverResiliencePipelineBuilder` that composes Timeout → Retry (exponential backoff + jitter, capability-specific max retries) → CircuitBreaker (consecutive-failure threshold; half-open probe) → Bulkhead (max in-flight per driver instance). Unit tests cover each policy in isolation + composed pipeline.
2. **A.2** `DriverResilienceOptions` record bound from `DriverInstance.ResilienceConfig` JSON column (new nullable). Defaults encoded per-tier, matching the B.1 tier stamping: Tier A (OPC UA Client) — 3 retries, 2 s timeout, 5-failure breaker; Tier B (Modbus, S7) — same except 4 s timeout; Tier C (Galaxy) — 1 retry (inner supervisor handles restart), 10 s timeout, circuit-breaker trips but doesn't kill the driver (the Proxy supervisor already handles that).
3. **A.3** `DriverCapabilityInvoker<T>` wraps every `IDriver*` method call. Existing server-side dispatch (whatever currently calls `driver.ReadAsync`) routes through the invoker. Policy injection via DI.
4. **A.4** Remove the hand-rolled `CircuitBreaker` + `Backoff` from `Driver.Galaxy.Proxy/Supervisor/` — replaced by the shared layer. Keep `HeartbeatMonitor` (different concern: IPC liveness, not data-path resilience).
5. **A.5** Unit tests: per-policy, per-composition. Integration test: Modbus driver under a FlakeyTransport that fails 5×, succeeds on 6th; invoker surfaces the eventual success. Bench: no-op overhead < 1% under nominal load.
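A minimal sketch of the A.1 composition with Polly v8's `ResiliencePipelineBuilder` (strategies run outermost-first in the order added). Polly v8 has no classic bulkhead policy; `AddConcurrencyLimiter` from `Polly.RateLimiting` plays that role here, and its ratio-based circuit breaker approximates the "N consecutive failures" wording via `FailureRatio = 1.0` plus `MinimumThroughput`. Parameter names and window durations are assumptions.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

public static class DriverResiliencePipelines
{
    public static ResiliencePipeline Build(int maxRetries, TimeSpan timeout,
        int breakerFailures, int maxInFlight) =>
        new ResiliencePipelineBuilder()
            .AddTimeout(timeout)
            .AddRetry(new RetryStrategyOptions
            {
                MaxRetryAttempts = maxRetries,
                BackoffType = DelayBackoffType.Exponential,
                UseJitter = true,                            // jittered backoff per A.1
            })
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions
            {
                FailureRatio = 1.0,                          // approximate 'consecutive failures'...
                MinimumThroughput = breakerFailures,         // ...as N failures in the sampling window
                SamplingDuration = TimeSpan.FromSeconds(30),
                BreakDuration = TimeSpan.FromSeconds(15),    // then half-open probe
            })
            .AddConcurrencyLimiter(permitLimit: maxInFlight) // bulkhead: in-flight ceiling per driver
            .Build();
}
```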
### Stream B — Tier A/B/C stability runtime (1 week, can parallel with Stream A after A.1)
1. **B.1** `Core.Abstractions` → `DriverTier` enum {A, B, C}. Extend `DriverTypeRegistry` to require `DriverTier` at registration. Existing driver types get their tier stamped (Galaxy = C, Modbus = B, S7 = B, OpcUaClient = A).
2. **B.2** Generalise `DriverMemoryWatchdog` (lift from `Driver.Galaxy.Host/MemoryWatchdog.cs`). Tier-specific thresholds: A = 256 MB RSS soft / 512 MB hard, B = 512 MB soft / 1 GB hard, C = 1 GB soft / 2 GB hard (decision #70 hybrid multiplier + floor). Soft threshold → log + metric; hard threshold → mark driver Faulted + trigger recycle.
3. **B.3** `ScheduledRecycleScheduler` (decision #67): each driver instance can opt-in to a weekly recycle at a configured cron. Recycle = `ShutdownAsync` → `InitializeAsync`. Tier C drivers get the Proxy-side recycle; Tier A/B recycle in-process.
4. **B.4** `WedgeDetector`: polling thread per driver instance; if `LastSuccessfulRead` older than `WedgeThreshold` (default 5 × PublishingInterval, minimum 60 s) AND driver state is `Healthy`, flag as wedged → force `ReinitializeAsync`. Prevents silent dead-subscriptions. A sketch of the criterion follows this list.
5. **B.5** Tests: watchdog unit tests drive synthetic allocation; scheduler uses a virtual clock; wedge detector tests use a fake IClock + driver stub.
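A sketch of the B.4 criterion as later tightened by review finding #7 below (pending work AND no progress); `IDriverProgress` and its members are illustrative stand-ins, not plan APIs:

```csharp
using System;

public interface IDriverProgress
{
    DateTimeOffset LastProgressUtc { get; }   // last successful read/publish
    int ActiveMonitoredItems { get; }         // subscription demand
    int InFlightOperations { get; }           // e.g. current bulkhead depth
}

public sealed class WedgeDetector(TimeProvider clock)
{
    // Wedged = there is pending work AND no progress past the threshold.
    // An idle-but-healthy driver therefore never trips a forced reinitialize.
    public bool IsWedged(IDriverProgress driver, TimeSpan threshold) =>
        (driver.ActiveMonitoredItems > 0 || driver.InFlightOperations > 0)
        && clock.GetUtcNow() - driver.LastProgressUtc > threshold;
}
```

The injected `TimeProvider` is what makes the B.5 fake-clock tests deterministic.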
### Stream C — Health endpoints + structured logging (4 days)
1. **C.1** `OtOpcUa.Server/Observability/HealthEndpoints.cs` — Minimal API on a second Kestrel binding (default `http://+:4841`). `/healthz` reports process uptime + config-DB reachability (or cache-warm). `/readyz` enumerates `DriverInstance` rows + reports each driver's `DriverHealth.State`; returns 503 if ANY driver is Faulted. JSON body per `docs/v2/acl-design.md` §"Operator Dashboards" shape. A sketch follows this list.
2. **C.2** `LogContextEnricher` installed at Serilog config time. Every driver-capability call site wraps its body in `using (LogContext.PushProperty("DriverInstanceId", id)) using (LogContext.PushProperty("CorrelationId", correlationId))`. Correlation IDs: reuse OPC UA `RequestHeader.RequestHandle` when in-flight; otherwise generate `Guid.NewGuid().ToString("N")[..12]`.
3. **C.3** Add JSON-formatted Serilog sink alongside the existing rolling-file plain-text sink so SIEMs (Splunk, Datadog) can ingest without a regex parser. Sink switchable via `Serilog:WriteJson` appsetting.
4. **C.4** Integration test: boot server, issue Modbus read, assert log line contains `DriverInstanceId` + `CorrelationId` structured fields.
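A minimal-API sketch of C.1, assuming an ASP.NET Core host; the driver registry is faked with an in-memory list and the JSON shapes are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseUrls("http://+:4841");   // health stays off the OPC UA port (4840)
var app = builder.Build();

// Stand-in for the real DriverInstance enumeration.
List<DriverHealth> drivers = [new(Guid.NewGuid(), "Healthy"), new(Guid.NewGuid(), "Degraded")];

app.MapGet("/healthz", () => Results.Ok(new { uptimeSeconds = Environment.TickCount64 / 1000 }));

// Degraded still answers 200, flagged in the body (review finding #6);
// any Faulted driver flips /readyz to 503.
app.MapGet("/readyz", () => drivers.Any(d => d.State == "Faulted")
    ? Results.Json(new { drivers }, statusCode: StatusCodes.Status503ServiceUnavailable)
    : Results.Json(new { drivers }));

app.Run();

internal sealed record DriverHealth(Guid DriverInstanceId, string State);
```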
### Stream D — Config DB LiteDB fallback (1 week)
1. **D.1** `LiteDbConfigCache` adapter. Wraps `ConfigurationDbContext` queries that are safe to serve stale (cluster membership, generation metadata, driver instance definitions, LDAP role mapping). Write-path queries (draft save, publish) bypass the cache and fail hard on DB outage.
2. **D.2** Cache refresh strategy: refresh on every successful read (each successful DB round-trip rewrites the cached copy), full refresh after `sp_PublishGeneration` confirmation. Cache entries carry `CachedAtUtc`; served entries older than 24 h trigger a synthetic `Warning` log line so operators see stale data in effect.
3. **D.3** Polly pipeline in `Configuration` project: EF Core query → retry 3× → fallback to cache. On fallback, driver state stays `Healthy` but a `UsingStaleConfig` flag on the cluster's health report flips true. A sketch follows this list.
4. **D.4** Tests: in-memory SQL Server failure injected via `TestContainers`-ish double; cache returns last-known values; Admin UI banners reflect `UsingStaleConfig`.
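A sketch of the D.3 pipeline shape (central query, then retry, then fall back to the LiteDB mirror) with illustrative row/loader types; the real adapter wraps `ConfigurationDbContext`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using LiteDB;
using Polly;
using Polly.Fallback;
using Polly.Retry;

public sealed class ClusterConfigRow   // stand-in for a cached read-model row
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public DateTime CachedAtUtc { get; set; }
}

public sealed class ConfigReadThroughCache(string cachePath)
{
    public async Task<IReadOnlyList<ClusterConfigRow>> LoadAsync(
        Func<CancellationToken, Task<IReadOnlyList<ClusterConfigRow>>> queryCentralDb,
        CancellationToken ct)
    {
        var pipeline = new ResiliencePipelineBuilder<IReadOnlyList<ClusterConfigRow>>()
            .AddFallback(new FallbackStrategyOptions<IReadOnlyList<ClusterConfigRow>>
            {
                // Fallback (outermost) serves the last-known cache on any failure.
                FallbackAction = _ => Outcome.FromResultAsValueTask(ReadCache()),
            })
            .AddRetry(new RetryStrategyOptions<IReadOnlyList<ClusterConfigRow>> { MaxRetryAttempts = 3 })
            .Build();

        return await pipeline.ExecuteAsync(async token =>
        {
            var rows = await queryCentralDb(token);
            WriteCache(rows);   // D.2: refresh on every successful round-trip
            return rows;
        }, ct);
    }

    private IReadOnlyList<ClusterConfigRow> ReadCache()
    {
        using var db = new LiteDatabase(cachePath);
        return db.GetCollection<ClusterConfigRow>("clusters").FindAll().ToList();
    }

    private void WriteCache(IReadOnlyList<ClusterConfigRow> rows)
    {
        using var db = new LiteDatabase(cachePath);
        var col = db.GetCollection<ClusterConfigRow>("clusters");
        col.DeleteAll();
        col.InsertBulk(rows);
    }
}
```

Flipping `UsingStaleConfig` when the fallback branch executes is left to the health-report plumbing.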
### Stream E — Admin `/hosts` page refresh (3 days)
1. **E.1** Extend `DriverHostStatus` schema with Stream A resilience columns. Generate EF migration.
2. **E.2** `Admin/FleetStatusHub` SignalR hub pushes `LastCircuitBreakerOpenUtc` + `CurrentBulkheadDepth` + `LastRecycleUtc` on change.
3. **E.3** `/hosts` Blazor page renders new columns; red badge if `ConsecutiveFailures > breakerThreshold / 2`.
## Compliance Checks (run at exit gate)
- [ ] **Polly coverage**: every `IDriver*` method call in the server dispatch layer routes through `DriverCapabilityInvoker`. Enforce via a Roslyn analyzer added to `Core.Abstractions` build (error on direct `IDriver.ReadAsync` calls outside the invoker).
- [ ] **Tier registry**: every driver type registered in `DriverTypeRegistry` has a non-null `Tier`. Unit test walks the registry + asserts no gaps.
- [ ] **Health contract**: `/healthz` + `/readyz` respond within 500 ms even with one driver Faulted.
- [ ] **Structured log**: CI grep on `tests/` output asserts at least one log line contains `"DriverInstanceId"` + `"CorrelationId"` JSON fields.
- [ ] **Cache fallback**: Integration test kills the SQL container mid-operation; driver health stays `Healthy`, `UsingStaleConfig` flips true.
- [ ] No regression in existing test suites — `dotnet test ZB.MOM.WW.OtOpcUa.slnx` count equal-or-greater than pre-Phase-6.1 baseline.
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| Polly pipeline adds per-request latency on hot path | Medium | Medium | Benchmark Stream A.5 before merging; 1 % overhead budget; inline hot path short-circuits when retry count = 0 |
| LiteDB cache diverges from central DB | Medium | High | Stale-data banner in Admin UI; `UsingStaleConfig` flag surfaced on `/readyz`; cache refresh on every successful DB round-trip; 24-hour synthetic warning |
| Tier watchdog false-positive-kills a legitimate batch load | Low | High | Soft/hard threshold split; soft only logs; hard triggers recycle; thresholds configurable per-instance |
| Wedge detector races with slow-but-healthy drivers | Medium | High | Minimum 60 s threshold; detector only activates if driver claims `Healthy`; add circuit-breaker feedback so rapid oscillation trips instead of thrashing |
| Roslyn analyzer breaks external driver authors | Low | Medium | Release analyzer as warning-level initially; upgrade to error in Phase 6.1+1 after one release cycle |
## Completion Checklist
- [ ] Stream A: Polly shared pipeline + per-tier defaults + driver-capability invoker + tests
- [ ] Stream B: Tier registry + generalised watchdog + scheduled recycle + wedge detector
- [ ] Stream C: `/healthz` + `/readyz` + structured logging + JSON Serilog sink
- [ ] Stream D: LiteDB cache + Polly fallback in Configuration
- [ ] Stream E: Admin `/hosts` page refresh
- [ ] Cross-cutting: `phase-6-1-compliance.ps1` exits 0; full solution `dotnet test` passes; exit-gate doc recorded
## Adversarial Review — 2026-04-19 (Codex, thread `019da489-e317-7aa1-ab1f-6335e0be2447`)
Plan substantially rewritten before implementation to address these findings. Each entry: severity · verdict · adjustment.
1. **Crit · ACCEPT** — Auto-retry collides with decisions #44/#45 (no auto-write-retry; opt-in via `WriteIdempotent` + CAS). Pipeline now **capability-specific**: Read/HistoryRead/Discover/Probe/Alarm-subscribe all get retries; **Write does not** unless the tag metadata carries `WriteIdempotent=true`. New `WriteIdempotentAttribute` surfaces on `ModbusTagDefinition` / `S7TagDefinition` / etc. A sketch of the gate follows this list.
2. **Crit · ACCEPT** — "One pipeline per driver instance" breaks decision #35's per-device isolation. **Change**: pipeline key is `(DriverInstanceId, HostName)` not just `DriverInstanceId`. One dead PLC behind a multi-device Modbus driver no longer opens the breaker for healthy siblings.
3. **Crit · ACCEPT** — Memory watchdog + scheduled recycle at Tier A/B breaches decisions #73/#74 (process-kill protections are Tier-C-only). **Change**: Stream B splits into two — `MemoryTracking` (all tiers, soft/hard thresholds log + surface to Admin `/hosts`; never kills) and `MemoryRecycle` (Tier C only, requires out-of-process topology). Tier A/B overrun paths escalate to Tier C via a future PR, not auto-kill.
4. **High · ACCEPT** — Removing Galaxy's hand-rolled `CircuitBreaker` drops decision #68 host-supervision crash-loop protection. **Change**: keep `Driver.Galaxy.Proxy/Supervisor/CircuitBreaker.cs` + `Backoff.cs` — they guard the IPC *process* re-spawn, not the per-call data path. Data-path Polly is an orthogonal layer.
5. **High · ACCEPT** — Roslyn analyzer targeting `IDriver` misses the hot paths (`IReadable.ReadAsync`, `IWritable.WriteAsync`, `ISubscribable.SubscribeAsync` etc.). **Change**: analyzer rule now matches every method on the capability interfaces; compliance doc enumerates the full call-site list.
6. **High · ACCEPT** — `/healthz` + `/readyz` under-specified for degraded-running. **Change**: add a state-matrix sub-section explicitly covering `Unknown` (pre-init: `/readyz` 503), `Initializing` (503), `Healthy` (200), `Degraded` (200 with JSON body flagging the degraded driver; `/readyz` is OR across drivers), `Faulted` (503), plus cached-config-serving (`/healthz` returns 200 + `UsingStaleConfig: true` in JSON body).
7. **High · ACCEPT** — `WedgeDetector` based on "no successful Read" false-fires on write-only subscriptions + idle systems. **Change**: wedge criteria now `(hasPendingWork AND noProgressIn > threshold)` where `hasPendingWork` comes from the Polly bulkhead depth + active MonitoredItem count. Idle driver stays Healthy.
8. **High · ACCEPT** — LiteDB cache serving mixed-generation reads breaks publish atomicity. **Change**: cache is snapshot-per-generation. Each published generation writes a sealed snapshot into `<cache-root>/<cluster>/<generationId>.db`; reads serve the last-known-sealed generation and never mix. Central DB outage during a *publish* means that publish fails (write path doesn't use cache); reads continue from the prior sealed snapshot.
9. **Med · ACCEPT** — `DriverHostStatus` schema conflates per-host connectivity with per-driver-instance resilience counters. **Change**: new `DriverInstanceResilienceStatus` table separate from `DriverHostStatus`. Admin `/hosts` joins both for display.
10. **Med · ACCEPT** — Compliance says analyzer-error; risks say analyzer-warning. **Change**: phase 6.1 ships at **error** level (this phase is the gate); warning-mode option removed.
11. **Med · ACCEPT** — Hardcoded per-tier MB bands ignore decision #70's `max(multiplier × baseline, baseline + floor)` formula with observed-baseline capture. **Change**: watchdog captures baseline at post-init plateau (median of first 5 min GetMemoryFootprint samples) + applies the hybrid formula. Tier constants now encode the multiplier + floor, not raw MB.
12. **Med · ACCEPT** — Tests mostly cover happy path. **Change**: Stream A.5 adds negative tests for duplicate-write-replay-under-timeout; Stream B.5 adds false-wedge-on-idle-subscription + false-wedge-on-slow-historic-backfill; Stream D.4 adds mixed-generation cache test + corrupt-first-boot cache test.
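For finding #1's capability gate, a sketch of the shape (names illustrative): retries apply to the read-like capabilities, and to Write only when the tag opts in via `WriteIdempotent`:

```csharp
public static class RetryPolicyGate
{
    // Decisions #44/#45: never auto-retry a non-idempotent write.
    public static int MaxRetriesFor(string capability, bool writeIdempotent) => capability switch
    {
        "Read" or "HistoryRead" or "Discover" or "Probe" or "AlarmSubscribe" => 3,
        "Write" => writeIdempotent ? 3 : 0,
        _ => 0,
    };
}
```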


@@ -0,0 +1,126 @@
# Phase 6.2 — Authorization Runtime (ACL + LDAP grants)
> **Status**: DRAFT — the v2 `plan.md` decision #129 + `acl-design.md` specify a 6-level permission-trie evaluator with `NodePermissions` bitmask grants, but no runtime evaluator exists. ACL tables are schematized but unread by the data path.
>
> **Branch**: `v2/phase-6-2-authorization-runtime`
> **Estimated duration**: 2.5 weeks
> **Predecessor**: Phase 6.1 (Resilience & Observability) — reuses the Polly pipeline for ACL-cache refresh retries
> **Successor**: Phase 6.3 (Redundancy)
## Phase Objective
Wire ACL enforcement on every OPC UA Read / Write / Subscribe / Call path + LDAP group → admin role grants that the v2 plan specified but never ran. End-state: a user's effective permissions resolve through a per-session permission-trie over the 6-level `Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag` hierarchy, cached per session, invalidated on generation-apply + LDAP group expiry.
Closes these gaps:
1. **Data-path ACL enforcement** — `NodeAcl` table + `NodePermissions` flags shipped; `NodeAclService.cs` present as a CRUD surface; no code consults ACLs at `Read`/`Write` time. OPC UA server answers everything to everyone.
2. **`LdapGroupRoleMapping` for cluster-scoped admin grants** — decision #105 shipped as the *design*; admin roles are hardcoded (`FleetAdmin` / `ConfigEditor` / `ReadOnly`) with no cluster-scoping and no LDAP-to-grant table. Decision #105 explicitly lifts this from v2.1 into v2.0.
3. **Explicit Deny pathway** — deferred to v2.1 (decision #129 note). Phase 6.2 ships *grants only*; `Deny` stays out.
4. **Admin UI ACL grant editor** — `AclsTab.razor` exists but edits the now-unused `NodeAcl` table; needs to wire to the runtime evaluator + the new `LdapGroupRoleMapping` table.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| `Configuration` project | New entity `LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }`. Migration. Admin CRUD. |
| `Core` → new `Core.Authorization` sub-namespace | `IPermissionEvaluator` interface; concrete `PermissionTrieEvaluator` implementation loads ACLs + LDAP mappings from Configuration, builds a trie keyed on the 6-level scope hierarchy, evaluates a `(UserClaim[], NodeId, NodePermissions)` → `bool` decision in O(depth × group-count). |
| `Core.Authorization` cache | `PermissionTrieCache` — one trie per `(ClusterId, GenerationId)`. Rebuilt on `sp_PublishGeneration` confirmation; served from memory thereafter. Per-session evaluator keeps a reference to the current trie + user's LDAP groups. |
| OPC UA server dispatch | `OtOpcUa.Server/OpcUa/DriverNodeManager.cs` Read/Write/HistoryRead/MonitoredItem-create paths call `PermissionEvaluator.Authorize(session.Identity, nodeId, NodePermissions.Read)` etc. before delegating to the driver. Unauthorized returns `BadUserAccessDenied` (0x801F0000) — not a silent no-op per corrections-doc B1. |
| `LdapAuthService` (existing) | On cookie-auth success, resolves the user's LDAP groups via `LdapGroupService.GetMemberships` + loads the matching `LdapGroupRoleMapping` rows → produces a role-claim list + cluster-scope claim list. Stored on the auth cookie. |
| Admin UI `AclsTab.razor` | Repoint edits at the new `NodeAclService` API that writes through to the same table the evaluator reads. Add a "test this permission" probe that runs a dummy evaluator against a chosen `(user, nodeId, action)` so ops can sanity-check grants before publishing a draft. |
| Admin UI new tab `RoleGrantsTab.razor` | CRUD over `LdapGroupRoleMapping`. Per-cluster + system-wide grants. FleetAdmin only. |
| Audit log | Every Grant/Revoke/Publish on `LdapGroupRoleMapping` or `NodeAcl` writes an `AuditLog` row with old/new state + user. |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| OPC UA authn | Already done (PR 19 LDAP user identity + Basic256Sha256 profile). Phase 6.2 is authorization only. |
| Explicit `Deny` grants | Decision #129 note explicitly defers to v2.1. Default-deny + additive grants only. |
| Driver-side `SecurityClassification` metadata | Drivers keep reporting `Operate` / `ViewOnly` / etc. — the evaluator uses them as *part* of the decision but doesn't replace them. |
| Galaxy namespace (SystemPlatform kind) | UNS levels don't apply; evaluator treats Galaxy nodes as `Cluster → Namespace → Tag` (skip UnsArea/UnsLine/Equipment). |
## Entry Gate Checklist
- [ ] Phase 6.1 merged (reuse `Core.Resilience` Polly pipeline for the ACL cache-refresh retries)
- [ ] `acl-design.md` re-read in full
- [ ] Decision log #105, #129, corrections-doc B1 re-skimmed
- [ ] Existing `NodeAcl` + `NodePermissions` flag enum audited; confirm bitmask flags match `acl-design.md` table
- [ ] Existing `LdapAuthService` group-resolution code path traced end-to-end — confirm it already queries group memberships (we only need the caller to consume the result)
- [ ] Test DB scenarios catalogued: two clusters, three LDAP groups per cluster, mixed grant shapes; captured as seed-data fixtures
## Task Breakdown
### Stream A — `LdapGroupRoleMapping` table + migration (3 days)
1. **A.1** Entity + EF Core migration. Columns per §Scope table. Unique constraint on `(LdapGroup, ClusterId)` with null-tolerant comparer for the system-wide case. Index on `LdapGroup` for the hot-path lookup on auth.
2. **A.2** `ILdapGroupRoleMappingService` CRUD. Wrap in the Phase 6.1 Polly pipeline (timeout → retry → fallback-to-cache).
3. **A.3** Seed-data migration: preserve the current hardcoded `FleetAdmin` / `ConfigEditor` / `ReadOnly` mappings by seeding rows for the existing LDAP groups the dev box uses (`cn=fleet-admin,…`, `cn=config-editor,…`, `cn=read-only,…`). No-op migration for existing deployments.
### Stream B — Permission-trie evaluator (1 week)
1. **B.1** `IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, NodeId nodeId, NodePermissions needed)` — returns `bool`. Phase 6.2 returns only `true` / `false`; v2.1 can widen to `Allow`/`Deny`/`Indeterminate` if Deny lands.
2. **B.2** `PermissionTrieBuilder` builds the trie from `NodeAcl` + `LdapGroupRoleMapping` joined to the current generation's `UnsArea` + `UnsLine` + `Equipment` + `Tag` tables. One trie per `(ClusterId, GenerationId)` so rollback doesn't smear permissions across generations.
3. **B.3** Trie node structure: `{ Level: enum, ScopeId: Guid, AllowedPermissions: NodePermissions, ChildrenByLevel: Dictionary<Guid, TrieNode> }`. Evaluation walks from Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag, ORing allowed permissions at each level. Additive semantics: a grant at Cluster level cascades to every descendant tag. A sketch of the walk follows this list.
4. **B.4** `PermissionTrieCache` service scoped as singleton; exposes `GetTrieAsync(ClusterId, ct)` that returns the current-generation trie. Invalidated on `sp_PublishGeneration` via an in-process event bus; also on TTL expiry (24 h safety net).
5. **B.5** Per-session cached evaluator: OPC UA Session authentication produces `UserAuthorizationState { ClusterId, LdapGroups[], Trie }`; cached on the session until session close or generation-apply.
6. **B.6** Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → deny, (e) Galaxy nodes skip UnsArea/UnsLine levels.
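A sketch of the B.3 walk, with the node structure trimmed to the parts the evaluation touches (`Level`/`ScopeKind` omitted); all names here are illustrative:

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NodePermissions { None = 0, Read = 1, WriteOperate = 2, HistoryRead = 4 }

public sealed class TrieNode
{
    public NodePermissions Allowed;                               // union of grants at this scope
    public Dictionary<Guid, TrieNode> Children { get; } = new();  // keyed by child ScopeId
}

public static class PermissionTrieWalk
{
    // Additive semantics: OR the grants met along the Cluster → … → Tag path,
    // so a Cluster-level grant cascades to every descendant; no grant anywhere means deny.
    public static bool Authorize(TrieNode root, IReadOnlyList<Guid> scopePath, NodePermissions needed)
    {
        var allowed = root.Allowed;
        var node = root;
        foreach (var scopeId in scopePath)
        {
            if (!node.Children.TryGetValue(scopeId, out var child))
                break;                        // path ends early: keep what's accumulated
            node = child;
            allowed |= node.Allowed;
        }
        return (allowed & needed) == needed;
    }
}
```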
### Stream C — OPC UA server dispatch wiring (4 days)
1. **C.1** `DriverNodeManager.Read` — consult evaluator before delegating to `IReadable`. Unauthorized nodes get `BadUserAccessDenied` per-attribute, not on the whole batch. A sketch follows this list.
2. **C.2** `DriverNodeManager.Write` — same. Evaluator needs `NodePermissions.WriteOperate` / `WriteTune` / `WriteConfigure` depending on driver-reported `SecurityClassification` of the attribute.
3. **C.3** `DriverNodeManager.HistoryRead` — ACL checks `NodePermissions.Read` (history uses the same Read flag per `acl-design.md`).
4. **C.4** `DriverNodeManager.CreateMonitoredItem` — denies unauthorized nodes at subscription create time, not after the first publish. Cleaner than silently omitting notifications.
5. **C.5** Alarm actions (acknowledge / confirm / shelve) — checks `AlarmAck` / `AlarmConfirm` / `AlarmShelve` flags.
6. **C.6** Integration tests: boot server with a seed trie, auth as three distinct users with different group memberships, assert read of one tag allowed + read of another denied + write denied where Read allowed.
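A sketch of C.1's per-item denial semantics with stand-in types; the real path uses the OPC Foundation SDK's `ReadValueId`/`DataValue` inside `DriverNodeManager.Read`:

```csharp
using System;
using System.Collections.Generic;

public sealed record ReadItem(Guid ScopeId);
public sealed record ReadOutcome(object? Value, string Status);

public static class AclReadDispatch
{
    public static ReadOutcome[] Read(
        IReadOnlyList<ReadItem> items,
        Func<Guid, bool> authorizeRead,              // evaluator closure bound to the session
        Func<ReadItem, ReadOutcome> readFromDriver)
    {
        var results = new ReadOutcome[items.Count];
        for (int i = 0; i < items.Count; i++)
        {
            // Deny per item; a mixed batch never collapses to one coarse failure.
            results[i] = authorizeRead(items[i].ScopeId)
                ? readFromDriver(items[i])
                : new ReadOutcome(null, "BadUserAccessDenied");
        }
        return results;
    }
}
```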
### Stream D — Admin UI refresh (4 days)
1. **D.1** `RoleGrantsTab.razor` — FleetAdmin-gated CRUD on `LdapGroupRoleMapping`. Per-cluster dropdown + system-wide checkbox. Validation: LDAP group must exist in the dev LDAP (GLAuth) before saving — best-effort probe with graceful degradation.
2. **D.2** `AclsTab.razor` rewrites its edit path to write through the new `NodeAclService`. Adds a "Probe this permission" row: choose `(LDAP group, node, action)` → shows Allow / Deny + the reason (which grant matched).
3. **D.3** Draft-generation diff viewer now includes an ACL section: "X grants added, Y grants removed, Z grants changed."
4. **D.4** SignalR notification: `PermissionTrieCache` invalidation on `sp_PublishGeneration` pushes to Admin UI so operators see "this cluster's permissions were just updated" within 2 s.
## Compliance Checks (run at exit gate)
- [ ] **Data-path enforcement**: OPC UA Read against a NodeId the current user has no grant for returns `BadUserAccessDenied` with a ServiceResult, not Good with stale data. Verified by an integration test with a Basic256Sha256-secured session + a read-only LDAP identity.
- [ ] **Trie invariants**: `PermissionTrieBuilder` is idempotent (building twice with identical inputs produces equal tries — override `Equals` to assert).
- [ ] **Additive grants**: Cluster-level grant on User A means User A can read every tag in that cluster *without* needing any lower-level grant.
- [ ] **Isolation between clusters**: a grant on Cluster 1 has zero effect on Cluster 2 for the same user.
- [ ] **Galaxy path coverage**: ACL checks work on `Galaxy` folder nodes + tag nodes where the UNS levels are absent (the trie treats them as shallow `Cluster → Namespace → Tag`).
- [ ] No regression in driver test counts.
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| ACL evaluator latency on per-read hot path | Medium | High | Trie lookup is O(depth) = O(6); session-cached UserAuthorizationState avoids per-Read trie rebuild; benchmark in Stream B.6 |
| Trie cache stale after a rollback | Medium | High | `sp_PublishGeneration` + `sp_RollbackGeneration` both emit the invalidation event; trie keyed on `(ClusterId, GenerationId)` so rollback fetches the prior trie cleanly |
| `BadUserAccessDenied` returns expose sensitive browse-name metadata | Low | Medium | Server returns only the status code + NodeId; no message leak per OPC UA Part 4 §7.34 guidance |
| LdapGroupRoleMapping migration breaks existing deployments | Low | High | Seed-migration preserves the hardcoded groups' effective grants verbatim; smoke test exercises the post-migration fleet admin login |
| Deny semantics accidentally ship (would break `acl-design.md` defer) | Low | Medium | `IPermissionEvaluator.Authorize` returns `bool` (not tri-state) through Phase 6.2; widening to `Allow`/`Deny`/`Indeterminate` is a v2.1 ticket |
## Completion Checklist
- [ ] Stream A: `LdapGroupRoleMapping` entity + migration + CRUD + seed
- [ ] Stream B: evaluator + trie builder + cache + per-session state + unit tests
- [ ] Stream C: OPC UA dispatch wiring on Read/Write/HistoryRead/Subscribe/Alarm paths
- [ ] Stream D: Admin UI `RoleGrantsTab` + `AclsTab` refresh + SignalR invalidation
- [ ] `phase-6-2-compliance.ps1` exits 0; exit-gate doc recorded
## Adversarial Review — 2026-04-19 (Codex, thread `019da48d-0d2b-7171-aed2-fc05f1f39ca3`)
1. **Crit · ACCEPT** — Trie must not conflate `LdapGroupRoleMapping` (control-plane admin claims per decision #105) with data-plane ACLs (decision #129). **Change**: `LdapGroupRoleMapping` is consumed only by the Admin UI role router. Data-plane trie reads `NodeAcl` rows joined against the session's **resolved LDAP groups**, never admin roles. Stream B.2 updated.
2. **Crit · ACCEPT** — Cached `UserAuthorizationState` survives LDAP group changes because memberships only refresh at cookie-auth. **Change**: add `MembershipFreshnessInterval` (default 15 min); past that, the next hot-path authz call forces group re-resolution (fail-closed if LDAP unreachable). Session-close-wins on config-rollback.
3. **High · ACCEPT** — Node-local invalidation doesn't extend across redundant pair. **Change**: trie keyed on `(ClusterId, GenerationId)`; hot-path authz looks up `CurrentGenerationId` from the shared config DB (Polly-wrapped + sub-second cache). A Backup that read stale generation gets a mismatched trie → forces re-load. Implementation note added to Stream B.4.
4. **High · ACCEPT** — Browse enforcement missing. **Change**: new Stream C.7 (`Browse + TranslateBrowsePathsToNodeIds` enforcement). Ancestor visibility implied when any descendant has a grant; denied ancestors filter from browse results per `acl-design.md` §Browse.
5. **High · ACCEPT** — `HistoryRead` should use `NodePermissions.HistoryRead` bit, not `Read`. **Change**: Stream C.3 revised; separate unit test asserts `Read+no-HistoryRead` denies HistoryRead while allowing current-value reads.
6. **High · ACCEPT** — Galaxy shallow-path (Cluster→Namespace→Tag) loses folder hierarchy authorization. **Change**: SystemPlatform namespaces use a `FolderSegment` scope-level between Namespace and Tag, populated from `Tag.FolderPath`; UNS-kind namespaces keep the 6-level hierarchy. Trie supports both via `ScopeKind` on each node.
7. **High · ACCEPT** — Subscription re-authorization policy unresolved between create-time-only (fast, wrong on revoke) and per-publish (slow). **Change**: stamp each `MonitoredItem` with `(AuthGenerationId, MembershipVersion)`; re-evaluate on Publish only when either version changed. Revoked items drop to `BadUserAccessDenied` within one publish cycle.
8. **Med · ACCEPT** — Mixed-authorization batch `Read` / `CreateMonitoredItems` service-result semantics underspecified. **Change**: Stream C.6 explicitly tests per-`ReadValueId` + per-`MonitoredItemCreateResult` denial in mixed batches; batch never collapses to a coarse failure.
9. **Med · ACCEPT** — Missing surfaces: `Method.Call`, `HistoryUpdate`, event filter on subscriptions, subscription-transfer on reconnect, alarm-ack. **Change**: scope expanded — every OPC UA authorization surface enumerated in Stream C: Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge/Confirm/Shelve, Browse, TranslateBrowsePathsToNodeIds.
10. **Med · ACCEPT** — `bool` evaluator bakes in grant-only semantics; collides with v2.1 Deny. **Change**: internal model uses `AuthorizationDecision { Allow | NotGranted | Denied, IReadOnlyList<MatchedGrant> Provenance }`. Phase 6.2 maps `Denied` → never produced; UI + audit log use the full record so v2.1 Deny lands without API break.
11. **Med · ACCEPT** — 6.1 cache fallback is availability-oriented; applying it to auth is correctness-dangerous. **Change**: auth-specific staleness budget `AuthCacheMaxStaleness` (default 5 min, not 24 h). Past that, hot-path evaluator fails closed on cached reads; all authorization calls return `NotGranted` until fresh data lands. Documented in risks + compliance.
12. **Low · ACCEPT** — Existing `NodeAclService` is raw CRUD. **Change**: new `ValidatedNodeAclAuthoringService` enforces scope-uniqueness + draft/publish invariants + rejects invalid (LDAP group, scope) pairs; Admin UI writes through it only. Stream D.2 adjusted.


@@ -0,0 +1,130 @@
# Phase 6.3 — Redundancy Runtime
> **Status**: DRAFT — `CLAUDE.md` + `docs/Redundancy.md` describe a non-transparent warm/hot redundancy model with unique ApplicationUris, `RedundancySupport` advertisement, `ServerUriArray`, and dynamic `ServiceLevel`. Entities (`ServerCluster`, `ClusterNode`, `RedundancyRole`, `RedundancyMode`) exist; the runtime behavior (actual `ServiceLevel` number computation, mid-apply dip, `ServerUriArray` broadcast) is not wired.
>
> **Branch**: `v2/phase-6-3-redundancy-runtime`
> **Estimated duration**: 2 weeks
> **Predecessor**: Phase 6.2 (Authorization) — reuses the Phase 6.1 health endpoints for cluster-peer probing
> **Successor**: Phase 6.4 (Admin UI completion)
## Phase Objective
Land the non-transparent redundancy protocol end-to-end: two `OtOpcUa.Server` instances in a `ServerCluster` each expose a live `ServiceLevel` node whose value reflects that instance's suitability to serve traffic, advertise each other via `ServerUriArray`, and transition role (Primary ↔ Backup) based on health + operator intent.
Closes these gaps:
1. **Dynamic `ServiceLevel`** — OPC UA Part 5 §6.3.34 specifies a Byte (0..255) that clients poll to pick the healthiest server. Our server publishes it as a static value today.
2. **`ServerUriArray` broadcast** — Part 4 specifies that every node in a redundant pair should advertise its peers' ApplicationUris. Currently advertises only its own.
3. **Primary / Backup role coordination** — entities carry `RedundancyRole` but the runtime doesn't read it; no peer health probing; no role-transfer on primary failure.
4. **Mid-apply dip** — decision-level expectation that a server mid-generation-apply should report a *lower* ServiceLevel so clients cut over to the peer during the apply window. Not implemented.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| `OtOpcUa.Server` → new `Server.Redundancy` sub-namespace | `RedundancyCoordinator` singleton. Resolves the current node's `ClusterNode` row at startup, loads its peers from `ServerCluster`, probes each peer's `/healthz` (Phase 6.1 endpoint) every `PeerProbeInterval` (default 2 s), maintains per-peer health state. |
| OPC UA server root | `ServiceLevel` variable node becomes a `BaseDataVariable` whose value updates on `RedundancyCoordinator` state change. `ServerUriArray` array variable refreshes on cluster-topology change. `RedundancySupport` stays static (set from `RedundancyMode` at startup). |
| `RedundancyCoordinator` computation | `ServiceLevel` formula: 255 = Primary + fully healthy + no apply in progress; 200 = Primary + an apply in the middle (clients should prefer peer); 100 = Backup + fully healthy; 50 = Backup + mid-apply; 0 = Faulted or peer-unreachable-and-I'm-not-authoritative. Documented in `docs/Redundancy.md` update. |
| Role transition | Split-brain avoidance: role is *declared* in the shared config DB (`ClusterNode.RedundancyRole`), not elected at runtime. An operator flips the row (or a failover script does). Coordinator only reads; never writes. |
| `sp_PublishGeneration` hook | Before the apply starts, the coordinator sets `ApplyInProgress = true` in-memory → `ServiceLevel` drops to mid-apply band. Clears after `sp_PublishGeneration` returns. |
| Admin UI `/cluster/{id}` page | New `RedundancyTab.razor` — shows current node's role + ServiceLevel + peer reachability. FleetAdmin can trigger a role-swap by editing `ClusterNode.RedundancyRole` + publishing a draft. |
| Metrics | New OpenTelemetry metrics: `ot_opcua_service_level{cluster,node}`, `ot_opcua_peer_reachable{cluster,node,peer}`, `ot_opcua_apply_in_progress{cluster,node}`. Sink via Phase 6.1 observability layer. |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| OPC UA authn / authz | Phases 6.2 + prior. Redundancy is orthogonal. |
| Driver layer | Drivers aren't redundancy-aware; they run on each node independently against the same equipment. The server layer handles the ServiceLevel story. |
| Automatic failover / election | Explicitly out of scope. Non-transparent = client picks which server to use via ServiceLevel + ServerUriArray. We do NOT ship consensus, leader election, or automatic promotion. Operator-driven failover is the v2.0 model per decisions #79-85. |
| Transparent redundancy (`RedundancySupport=Transparent`) | Not supported. If the operator asks for it the server fails startup with a clear error. |
| Historian redundancy | Galaxy Historian's own redundancy (two historians on two CPUs) is out of scope. The Galaxy driver talks to whichever historian is reachable from its node. |
## Entry Gate Checklist
- [ ] Phase 6.1 merged (uses `/healthz` for peer probing)
- [ ] `CLAUDE.md` §Redundancy + `docs/Redundancy.md` re-read
- [ ] Decisions #79-85 re-skimmed
- [ ] `ServerCluster`/`ClusterNode`/`RedundancyRole`/`RedundancyMode` entities + existing migration reviewed
- [ ] OPC UA Part 4 §Redundancy + Part 5 §6.3.34 (ServiceLevel) re-skimmed
- [ ] Dev box has two OtOpcUa.Server instances configured against the same cluster — one designated Primary, one Backup — for integration testing
## Task Breakdown
### Stream A — Cluster topology loader (3 days)
1. **A.1** `RedundancyCoordinator` startup path: reads `ClusterNode` row for the current node (identified by `appsettings.json` `Cluster:NodeId`), reads the cluster's peer list, validates invariants (no duplicate `ApplicationUri`, at most one `Primary` per cluster if `RedundancyMode.WarmActive`, at most two nodes total in v2.0 per decision #83).
2. **A.2** Topology subscription — coordinator re-reads on `sp_PublishGeneration` confirmation so an operator role-swap takes effect after publish (no process restart needed).
3. **A.3** Tests: two-node cluster seed, one-node cluster seed (degenerate), duplicate-uri rejection.
### Stream B — Peer health probing + ServiceLevel computation (4 days)
1. **B.1** `PeerProbeLoop` runs per peer at `PeerProbeInterval` (2 s default, configurable via `appsettings.json`). Calls peer's `/healthz` via `HttpClient`; timeout 1 s. Exponential backoff on sustained failure.
2. **B.2** `ServiceLevelCalculator.Compute(current role, self health, peer reachable, apply in progress) → byte`. Matrix documented in §Scope. A sketch follows this list.
3. **B.3** Calculator reacts to inputs via `IObserver` pattern so changes immediately push to the OPC UA `ServiceLevel` node.
4. **B.4** Tests: matrix coverage for all role × health × apply permutations (32 cases); injected `IClock` + fake `HttpClient` so tests are deterministic.
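A sketch of the B.2 calculator using the review-adjusted bands (findings #3/#4 below: 0 reserved for maintenance, 1 for NoData; the Recovering states are collapsed here for brevity):

```csharp
public enum NodeRole { Primary, Backup }

public static class ServiceLevelCalculator
{
    public static byte Compute(NodeRole role, bool selfHealthy, bool peerReachable,
                               bool applyInProgress, bool maintenance = false)
    {
        if (maintenance) return 0;    // OPC UA Part 5: 0 = Maintenance
        if (!selfHealthy) return 1;   // NoData / faulted

        return (role, peerReachable, applyInProgress) switch
        {
            (NodeRole.Primary, _,     true)  => 200,  // mid-apply: clients should prefer the peer
            (NodeRole.Primary, false, _)     => 230,  // isolated primary, still serving
            (NodeRole.Primary, true,  false) => 255,  // authoritative primary
            (NodeRole.Backup,  _,     true)  => 50,   // backup mid-apply
            (NodeRole.Backup,  false, _)     => 80,   // isolated backup ("take over if asked")
            _                                => 100,  // authoritative backup
        };
    }
}
```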
### Stream C — OPC UA node wiring (3 days)
1. **C.1** `ServiceLevel` variable node created under `ServerStatus` at server startup. Type `Byte`, AccessLevel = CurrentRead only. Subscribe to `ServiceLevelCalculator` observable; push updates via `DataChangeNotification`.
2. **C.2** `ServerUriArray` variable node under `ServerCapabilities`. Array of `String`, length = peer count. Updates on topology change.
3. **C.3** `RedundancySupport` variable — static at startup from `RedundancyMode`. Values: `None`, `Cold`, `Warm`, `WarmActive`, `Hot`. Phase 6.3 supports everything except `Transparent` + `HotAndMirrored`.
4. **C.4** Test against the Client.CLI: connect to primary, read `ServiceLevel` → expect 255; pause primary apply → expect 200; fail primary → client sees `Bad_ServerNotConnected` + reconnects to peer at 100.
### Stream D — Apply-window integration (2 days)
1. **D.1** `sp_PublishGeneration` caller wraps the apply in `using (coordinator.BeginApplyWindow())`. `BeginApplyWindow` increments an in-process counter; ServiceLevel drops on first increment. Dispose decrements. A sketch follows this list.
2. **D.2** Nested applies handled by the counter (rarely happens but Ignition and Kepware clients have both been observed firing rapid-succession draft publishes).
3. **D.3** Test: mid-apply subscribe on primary; assert the subscribing client sees the ServiceLevel drop immediately after the apply starts, then restore after apply completes.
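A sketch of the D.1 counter (review finding #9 below hardens this into a named, watchdog-guarded lease with `await using`); names are illustrative:

```csharp
using System;
using System.Threading;

public sealed class ApplyWindowTracker
{
    private int _depth;
    public event Action<bool>? ApplyStateChanged;   // feeds ServiceLevelCalculator

    public IDisposable BeginApplyWindow()
    {
        if (Interlocked.Increment(ref _depth) == 1)
            ApplyStateChanged?.Invoke(true);        // first increment: drop to the mid-apply band
        return new Lease(this);
    }

    private void End()
    {
        if (Interlocked.Decrement(ref _depth) == 0)
            ApplyStateChanged?.Invoke(false);       // last decrement: restore ServiceLevel
    }

    private sealed class Lease(ApplyWindowTracker owner) : IDisposable
    {
        private int _disposed;
        public void Dispose()
        {
            // Idempotent dispose so a double-dispose can't under-count the window.
            if (Interlocked.Exchange(ref _disposed, 1) == 0) owner.End();
        }
    }
}
```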
### Stream E — Admin UI + metrics (3 days)
1. **E.1** `RedundancyTab.razor` under `/cluster/{id}/redundancy`. Shows each node's role, current ServiceLevel, peer reachability, last apply timestamp. Role-swap button posts a draft edit on `ClusterNode.RedundancyRole`; publish applies.
2. **E.2** OpenTelemetry meter export: three gauges per the §Scope metrics. Sink via Phase 6.1 observability.
3. **E.3** SignalR push: `FleetStatusHub` broadcasts ServiceLevel changes so the Admin UI updates within ~1 s of the coordinator observing a peer flip.
## Compliance Checks (run at exit gate)
- [ ] **Primary-healthy** ServiceLevel = 255.
- [ ] **Backup-healthy** ServiceLevel = 100.
- [ ] **Mid-apply Primary** ServiceLevel = 200 — verified via Client.CLI subscription polling ServiceLevel during a forced draft publish.
- [ ] **Peer-unreachable** handling: when a Primary can't probe its Backup's `/healthz`, Primary still serves at 255 (peer is the one with the problem). When a Backup can't probe Primary, Backup flips to 200 (per decision #81 — a lonely Backup promotes its advertised level to signal "I'll take over if you ask" without auto-promoting).
- [ ] **Role transition via operator publish**: FleetAdmin swaps `RedundancyRole` rows in a draft, publishes; both nodes re-read topology on publish confirmation and flip ServiceLevel accordingly — no restart needed.
- [ ] **ServerUriArray** returns exactly the peer node's ApplicationUri.
- [ ] **Client.CLI cutover**: with a primary deliberately halted, a client that was connected to primary reconnects to the backup within the ServiceLevel-polling interval.
- [ ] No regression in existing driver test suites; no regression in `/healthz` reachability under redundancy load.
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| Split-brain from operator race (both nodes marked Primary) | Low | High | Coordinator rejects startup if its cluster has >1 Primary row; logs + fails fast. Document as a publish-time validation in `sp_PublishGeneration`. |
| ServiceLevel thrashing on flaky peer | Medium | Medium | 2 s probe interval + 3-sample smoothing window; only declares a peer unreachable after 3 consecutive failed probes |
| Client ignores ServiceLevel and stays on broken primary | Medium | Medium | Documented in `docs/Redundancy.md` — non-transparent redundancy requires client cooperation; most SCADA clients (Ignition, Kepware, Aveva OI Gateway) honor it. Unit-test the advertised values; field behavior is client-responsibility |
| Apply-window counter leaks on exception | Low | High | `BeginApplyWindow` returns `IDisposable`; `using` syntax enforces paired decrement; unit test for exception-in-apply path |
| `HttpClient` probe leaks sockets | Low | Medium | Single shared `HttpClient` per coordinator (not per-probe); timeouts tight to avoid keeping connections open during peer downtime |
## Completion Checklist
- [ ] Stream A: topology loader + tests
- [ ] Stream B: peer probe + ServiceLevel calculator + 32-case matrix tests
- [ ] Stream C: ServiceLevel / ServerUriArray / RedundancySupport node wiring + Client.CLI smoke test
- [ ] Stream D: apply-window integration + nested-apply counter
- [ ] Stream E: Admin `RedundancyTab` + OpenTelemetry metrics + SignalR push
- [ ] `phase-6-3-compliance.ps1` exits 0; exit-gate doc; `docs/Redundancy.md` updated with the ServiceLevel matrix
## Adversarial Review — 2026-04-19 (Codex, thread `019da490-3fa0-7340-98b8-cceeca802550`)
1. **Crit · ACCEPT** — No publish-generation fencing enables split-publish advertising both as authoritative. **Change**: coordinator CAS on a monotonic `ConfigGenerationId`; every topology decision is generation-stamped; peers reject state propagated from a lower generation.
2. **Crit · ACCEPT** — `>1 Primary` at startup covered but runtime containment missing when invalid topology appears later (mid-apply race). **Change**: add runtime `InvalidTopology` state — both nodes self-demote to ServiceLevel 2 (the "detected inconsistency" band, below normal operation) until convergence.
3. **High · ACCEPT** — `0 = Faulted` collides with OPC UA Part 5 §6.3.34 semantics where 0 means **Maintenance** and 1 means NoData. **Change**: reserve **0** for operator-declared maintenance-mode only; Faulted/unreachable uses **1** (NoData); in-range degraded states occupy 2..199.
4. **High · ACCEPT** — Matrix collapses distinct operational states onto the same value. **Change**: matrix expanded to Authoritative-Primary=255, Isolated-Primary=230 (peer unreachable — still serving), Primary-Mid-Apply=200, Recovering-Primary=180, Authoritative-Backup=100, Isolated-Backup=80 (primary unreachable — "take over if asked"), Backup-Mid-Apply=50, Recovering-Backup=30.
5. **High · ACCEPT** — `/healthz` from 6.1 is HTTP-healthy but doesn't guarantee OPC UA data plane. **Change**: add a redundancy-specific probe `UaHealthProbe` — issues a `ReadAsync(ServiceLevel)` against the peer's OPC UA endpoint via a lightweight client session. `/healthz` remains the fast-fail; the UA probe is the authority signal.
6. **High · ACCEPT** — `ServerUriArray` must include self + peers, not peers only. **Change**: array contains `[self.ApplicationUri, peer.ApplicationUri]` in stable deterministic ordering; compliance test asserts local-plus-peer membership.
7. **Med · ACCEPT** — No `Faulted → Recovering → Healthy` path. **Change**: add `Recovering` state with min dwell time (60 s default) + positive publish witness (one successful Read on a reference node) before returning to Healthy. Thrash-prevention.
8. **Med · ACCEPT** — Topology change during in-flight probe undefined. **Change**: every probe task tagged with `ConfigGenerationId` at dispatch; obsolete results discarded; in-flight probes cancelled on topology reload.
9. **Med · ACCEPT** — Apply-window counter race on exception/cancellation/async ownership. **Change**: apply-window is a named lease keyed to `(ConfigGenerationId, PublishRequestId)` with disposal enforced via `await using`; watchdog detects leased-but-abandoned and force-closes after `ApplyMaxDuration` (default 10 min).
10. **High · ACCEPT** — Ignition + Kepware + Aveva OI Gateway `ServiceLevel` compliance is unverified. **Change**: risk elevated to High; add Stream F (new) — build an interop matrix: validate against Ignition 8.1/8.3, Kepware KEPServerEX 6.x, Aveva OI Gateway 2020R2 + 2023R1. Document per-client cutover behaviour. Field deployments get a documented compatibility table; clients that ignore ServiceLevel documented as requiring explicit backup-endpoint config.
11. **Med · ACCEPT** — Galaxy MXAccess re-session on Primary death not in acceptance. **Change**: Stream F adds an end-to-end failover smoke test that boots Galaxy.Proxy on both nodes, kills Primary, asserts Galaxy consumer reconnects to Backup within `(SessionTimeout + KeepAliveInterval × 3)` budget. `docs/Redundancy.md` updated with required session timeouts.
12. **Med · ACCEPT** — Transparent-mode startup rejection is outage-prone. **Change**: `sp_PublishGeneration` validates `RedundancyMode` pre-publish — unsupported values reject the publish attempt with a clear validation error; runtime never sees an unsupported mode. Last-good config stays active.


@@ -0,0 +1,120 @@
# Phase 6.4 — Admin UI Completion
> **Status**: DRAFT — Phase 1 Stream E shipped the Admin scaffold + core pages; several feature-completeness items from its completion checklist (`phase-1-configuration-and-admin-scaffold.md` §Stream E) never landed. This phase closes them.
>
> **Branch**: `v2/phase-6-4-admin-ui-completion`
> **Estimated duration**: 2 weeks
> **Predecessor**: Phase 6.3 (Redundancy runtime) — reuses the `/cluster/{id}` page layout for the new tabs
> **Successor**: v2 release-readiness capstone (Task #121)
## Phase Objective
Close the Admin UI feature-completeness checklist that Phase 1 Stream E exit gate left open. Each item below is an existing `phase-1-configuration-and-admin-scaffold.md` completion-checklist entry that is currently unchecked.
Gaps to close:
1. **UNS Structure tab drag/move with impact preview** — decision #115 + `admin-ui.md` §"UNS". Current state: list-only render; no drag reorder; no "X lines / Y equipment impacted" preview.
2. **Equipment CSV import + 5-identifier search** — decision #95 + #117. Current state: basic form; no CSV parser; search indexes only ZTag.
3. **Draft-generation diff viewer** — enhance existing `DiffViewer.razor` to show generation-diff not just staged-edit diff; highlight ACL grant changes (lands after Phase 6.2).
4. **`_base` equipment-class Identification fields exposure** — decisions #138-139. Columns exist on `Equipment`; no Admin UI field group; no address-space exposure of the OPC 40010 sub-folder.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| `Admin/Pages/UnsTab.razor` | Rewrite as a tree component with drag-drop (Blazor-native HTML5 DnD; no third-party dep). Each drag fires a "Compute Impact" call against the draft-generation state + renders a modal preview ("Moving Line 'Oven-2' from 'Packaging' to 'Assembly' will re-home 14 equipment + re-parent 237 tags"). Confirmation commits the draft edit. |
| `Admin/Services/UnsImpactAnalyzer.cs` | New service. Given a move-operation (line move, area rename, line merge), computes cascade counts by walking the draft-generation `Equipment` + `Tag` tables. Pure-function shape; testable in isolation. |
| `Admin/Pages/EquipmentTab.razor` | Add CSV-import button → modal with file picker + dry-run preview. Add multi-identifier search bar (ZTag / SAPID / UniqueId / Alias1 / Alias2) per decision #95 — parses any of the five, shows matches across draft + published generations. |
| `Admin/Services/EquipmentCsvImporter.cs` | New service. Parses CSV with documented header row; validates each row against the `Equipment` schema (required fields + `ExternalIdReservation` freshness); returns `ImportPreview` DTO with per-row accept/reject + reason; commit step wraps in a single EF transaction. |
| `Admin/Pages/DraftEditor.razor` + `DiffViewer.razor` | Diff viewer expanded: adds sections for ACL grants (from Phase 6.2 `LdapGroupRoleMapping` + `NodeAcl`), redundancy-role changes (from Phase 6.3), equipment-class `_base` Identification fields. Render each section collapsible. |
| `Admin/Components/IdentificationFields.razor` | New component. Renders the OPC 40010 nullable columns (Manufacturer, Model, SerialNumber, ProductInstanceUri, HardwareRevision, SoftwareRevision, DeviceRevision, YearOfConstruction, MonthOfConstruction) as a labelled field group on the `EquipmentTab` detail view. |
| `OtOpcUa.Server/OpcUa/DriverNodeManager` — Equipment folder build | When an `Equipment` row has non-null Identification fields, the server adds an `Identification` sub-folder under the Equipment node containing one variable per non-null field. Matches OPC 40010 companion spec. |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| Admin UI visual language | Bootstrap 5 / cookie auth / sidebar layout unchanged — consistency with ScadaLink design reference. |
| LDAP auth flow | Already shipped in Phase 1. Phase 6.4 is additive UI only. |
| Core abstractions / driver layer | Admin UI changes don't touch drivers. |
| Equipment-class *template schema validation* | Still deferred (decision #112 — schemas repo not landed). We expose the Identification fields but don't validate against a template hierarchy. |
| Drag/move to *other clusters* | Out of scope — equipment is cluster-scoped per decision #82. Cross-cluster migration is a different workflow. |
## Entry Gate Checklist
- [ ] Phase 6.2 merged (ACL grants are part of the new diff viewer sections)
- [ ] Phase 6.3 merged (redundancy-role changes are part of the diff viewer)
- [ ] `phase-1-configuration-and-admin-scaffold.md` §Stream E completion checklist re-read — confirm these are the remaining items
- [ ] `admin-ui.md` re-skimmed for screen layouts
- [ ] Existing `EquipmentTab.razor` / `UnsTab.razor` / `DraftEditor.razor` diff'd against what ships today so the edits are additive not destructive
- [ ] Dev Galaxy available for OPC 40010 exposure smoke testing
## Task Breakdown
### Stream A — UNS drag/reorder + impact preview (4 days)
1. **A.1** `UnsImpactAnalyzer` service. Inputs: `(DraftGenerationId, MoveOperation)`. Outputs: `ImpactPreview { AffectedEquipmentCount, AffectedTagCount, CascadeWarnings[] }`. Unit tests cover line move / area rename / line merge. A sketch follows this list.
2. **A.2** HTML5 DnD on a tree component. No JS interop beyond `ondragstart`/`ondragover`/`ondrop` — keeps build + testability simple.
3. **A.3** Modal preview wired to `UnsImpactAnalyzer` output; "Confirm" commits a draft edit via `DraftService`.
4. **A.4** Playwright smoke test (or equivalent): drag a line across areas, assert modal shows the right counts, assert draft row reflects the move.
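A pure-function sketch of the A.1 shape, assuming the draft-generation rows are pre-loaded into lookups; all names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record MoveOperation(Guid LineId, Guid TargetAreaId);
public sealed record ImpactPreview(int AffectedEquipmentCount, int AffectedTagCount,
                                   IReadOnlyList<string> CascadeWarnings);

public static class UnsImpactAnalyzer
{
    public static ImpactPreview ComputeLineMove(
        MoveOperation move,
        ILookup<Guid, Guid> equipmentIdsByLineId,   // draft Equipment keyed by UnsLineId
        ILookup<Guid, Guid> tagIdsByEquipmentId)    // draft Tags keyed by EquipmentId
    {
        var equipment = equipmentIdsByLineId[move.LineId].ToList();
        int tagCount = equipment.Sum(id => tagIdsByEquipmentId[id].Count());

        var warnings = new List<string>();
        if (tagCount > 1000)
            warnings.Add($"Re-parenting {tagCount} tags; subscribed clients will see a structure change.");

        return new ImpactPreview(equipment.Count, tagCount, warnings);
    }
}
```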
### Stream B — Equipment CSV import + 5-identifier search (4 days)
1. **B.1** `EquipmentCsvImporter` with a documented header row (`ZTag, SAPID, UniqueId, Alias1, Alias2, Name, UnsAreaName, UnsLineName, Manufacturer, Model, SerialNumber, …`). Parser rejects unknown columns + blank required fields + duplicate ZTags. A sketch follows this list.
2. **B.2** `ImportPreview` UI: per-row accept/reject table. Reject reasons: "ZTag already exists in draft", "ExternalIdReservation conflict with Cluster X", "UnsLineName not found in draft UNS tree", etc. Operator reviews then clicks "Commit" → single EF transaction.
3. **B.3** Multi-identifier search — bar accepts any of the 5 identifiers, probes each column in parallel, returns first-match-wins + disambiguation list if multiple match.
4. **B.4** Smoke tests: 100-row CSV with 10 intentional conflicts (5 ZTag dupes, 3 reservation clashes, 2 missing UnsLines); assert preview flags each; assert commit rolls back cleanly when a conflict surfaces post-preview.
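A dry-run sketch of B.1/B.2, using the review-corrected identifier columns (finding #3 below) and the versioned header marker; the row/preview shapes and the column subset shown are illustrative:

```csharp
using System;
using System.Collections.Generic;

public sealed record CsvRow(int Line, string ZTag, string? MachineCode, string? SapId, string Name);
public sealed record CsvImportPreview(IReadOnlyList<CsvRow> Accepted,
                                      IReadOnlyDictionary<int, string> Rejected);

public static class EquipmentCsvDryRun
{
    public static CsvImportPreview Run(IReadOnlyList<string> lines, ISet<string> draftZTags)
    {
        if (lines.Count < 2 || !lines[0].StartsWith("# OtOpcUaCsv v1"))
            throw new FormatException("Missing versioned CSV header marker.");

        var accepted = new List<CsvRow>();
        var rejected = new Dictionary<int, string>();

        for (int i = 2; i < lines.Count; i++)   // [0] = version marker, [1] = column header
        {
            var cells = lines[i].Split(',');
            if (cells.Length < 4) { rejected[i + 1] = "Missing required columns"; continue; }

            var row = new CsvRow(i + 1, cells[0].Trim(), cells[1].Trim(), cells[2].Trim(), cells[3].Trim());
            if (row.ZTag.Length == 0) { rejected[row.Line] = "Blank ZTag"; continue; }
            if (!draftZTags.Add(row.ZTag)) { rejected[row.Line] = "ZTag already exists in draft"; continue; }

            accepted.Add(row);
        }
        return new CsvImportPreview(accepted, rejected);
    }
}
```

The commit step stays separate per the staged-batch model in finding #2.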
### Stream C — Diff viewer enhancements (3 days)
1. **C.1** Refactor `DiffViewer.razor` into a base component + section plugins. Section plugins: `StructuralDiffSection` (UNS tree), `EquipmentDiffSection` (Equipment rows), `TagDiffSection` (Tag rows), `AclDiffSection` (ACL grants — depends on Phase 6.2), `RedundancyDiffSection` (role changes — depends on Phase 6.3), `IdentificationDiffSection` (OPC 40010 fields).
2. **C.2** Each section renders collapsed by default; counts + top-line summary always visible.
3. **C.3** Tests: seed two generations with deliberate diffs, assert every section reports the right counts + top-line summary.
### Stream D — OPC 40010 Identification exposure (3 days)
1. **D.1** `IdentificationFields.razor` component — labelled inputs; nullable columns show empty input; required field validation only on commit.
2. **D.2** `DriverNodeManager` equipment-folder builder — after building the equipment node, inspect the Identification columns; if any non-null, add an `Identification` sub-folder with variable-per-field.
3. **D.3** Address-space smoke test via Client.CLI: browse an equipment node, assert `Identification` sub-folder present when columns are set, absent when all null.
## Compliance Checks (run at exit gate)
- [ ] **UNS drag/move**: drag a line across areas; modal preview shows correct impacted-equipment + impacted-tag counts.
- [ ] **Equipment CSV**: 100-row CSV with 10 conflicts imports cleanly (preview flags each, commit rolls back mid-conflict).
- [ ] **5-identifier search**: querying any of the 5 IDs returns the matching row; ambiguous searches list options.
- [ ] **Diff viewer**: every section renders for a 2-generation diff with deliberate changes in every category.
- [ ] **OPC 40010 exposure**: Client.CLI browse shows `Identification` sub-folder when equipment has non-null columns; folder absent when all null.
- [ ] **ScadaLink visual parity**: operator-equivalence reviewer signs off that the new tabs feel consistent with existing Admin UI pages.
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| UNS drag-drop janky on large trees (>500 nodes) | Medium | Medium | Virtualize the tree component; default-collapse nested areas; test with a synthetic 1000-equipment seed |
| CSV import performance on 10k-row imports | Medium | Medium | Stream-parse rather than load-into-memory; preview renders in batches of 100; commit is chunked-EF-insert with progress bar |
| Diff viewer becomes unwieldy with many sections | Low | Medium | Each section collapsed by default; top-line summary row always shown; Phase 6.4 caps at 6 sections |
| OPC 40010 sub-folder accidentally exposes NULL/empty identification columns as empty-string variables | Low | Low | Column null-check in the builder; drop variables whose DB value is null |
| 5-identifier search pulls full table | Medium | Medium | Indexes on each of ZTag/SAPID/UniqueId/Alias1/Alias2; search query uses a UNION of 5 indexed lookups; falls back to LIKE only on explicit operator opt-in |
## Completion Checklist
- [ ] Stream A: `UnsImpactAnalyzer` + drag-drop tree + modal preview + Playwright smoke
- [ ] Stream B: `EquipmentCsvImporter` + preview modal + 5-identifier search + conflict-rollback test
- [ ] Stream C: `DiffViewer` refactor + 6 section plugins + 2-generation diff test
- [ ] Stream D: `IdentificationFields.razor` + address-space builder change + Client.CLI browse test
- [ ] Visual-compliance reviewer signoff
- [ ] Full solution `dotnet test` passes; `phase-6-4-compliance.ps1` exits 0; exit-gate doc
## Adversarial Review — 2026-04-19 (Codex, via `codex-rescue` subagent)
1. **Crit · ACCEPT** — Stale UNS impact preview can overwrite concurrent draft edits. **Change**: each preview carries a `DraftRevisionToken`; `Confirm` compares against the current draft + rejects with a `409 Conflict / refresh-required` modal if any draft edit landed since the preview was generated. Stream A.3 updated.
2. **High · ACCEPT** — CSV import atomicity is internally contradictory (single EF transaction vs. chunked inserts). **Change**: one explicit model — staged-import table (`EquipmentImportBatch { Id, CreatedAtUtc, RowsStaged, RowsAccepted, RowsRejected }`) receives rows in chunks; final `FinaliseImportBatch` is atomic over `Equipment` + `ExternalIdReservation`. Rollback is "drop the batch row" — the real Equipment table is never partially mutated.
3. **Crit · ACCEPT** — Identifier contract rewrite mis-cites decisions. **Change**: revert to the `admin-ui.md` + decision #117 canonical set — `ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid`. CSV header follows that set verbatim. Introduce a separate decision entry for versioned CSV header shape before adding any new column; CSV header row must start with `# OtOpcUaCsv v1` so future shape changes are unambiguous.
4. **Med · ACCEPT** — Search ordering undefined. **Change**: rank SQL — exact match on any identifier scores 100; prefix match 50; LIKE-fuzzy 20; published > draft tie-breaker; `ORDER BY score DESC, RowVersion DESC`. Typeahead shows which field matched via trailing badge.
5. **High · ACCEPT** — HTML5 DnD on virtualized tree is aspirational. **Change**: Stream A.2 rewritten — commits to **`MudBlazor.TreeView` + `MudBlazor.DropTarget`** (already a transitive dep via the existing Admin UI). Build a 1000-node synthetic seed in A.1 + validate drag-latency budget before implementing impact preview. If MudBlazor can't hit the budget, fall back to a flat-list reorder UI with Area/Line dropdowns (loss of visual drag affordance but unblocks the feature).
6. **Med · ACCEPT** — Collapsed-by-default doesn't handle generation-sized diffs. **Change**: each diff section has a hard row cap (1000 by default). Over-cap sections render an aggregate summary + "Load full diff" button that streams via SignalR in 500-row pages. Decision #115 subtree renames surface as a "N equipment re-parented under X → Y" summary instead of row-by-row.
7. **High · ACCEPT** — OPC 40010 field list doesn't match decision #139. **Change**: field group realigned to `Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. `ProductInstanceUri / DeviceRevision / MonthOfConstruction` dropped from Phase 6.4 — they belong to a future OPC 40010 widening decision.
8. **High · ACCEPT** — `Identification` subtree unreconciled with ACL hierarchy (Phase 6.2 6-level scope). **Change**: address-space builder creates the Identification sub-folder under the Equipment node **with the same ScopeId as Equipment** — no new scope level. ACL evaluator treats `…/Equipment/Identification/X` as inheriting the `Equipment` scope's grants. Documented in Phase 6.2's `acl-design.md` cross-reference update.
9. **Low · ACCEPT** — Visual-review gate names nonexistent reviewer role. **Change**: rubric defined — a named "Admin UX reviewer" (role `FleetAdmin` user, not the implementation lead) compares side-by-side screenshots against the `admin-ui.md` §Visual-Design reference panels; signoff artefact is a checked-in screenshot set under `docs/v2/visual-compliance/phase-6-4/`.
10. **Med · ACCEPT** — Cross-cluster drag/drop lacks loud failure path. **Change**: on drop across cluster boundary, disable the drop target + show a toast "Equipment is cluster-scoped (decision #82). To move across clusters, use the Export → Import workflow on the Cluster detail page." Plus a help link. Tested in Stream A.4.

File diff suppressed because it is too large


@@ -0,0 +1,180 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient;
/// <summary>
/// OPC UA Client (gateway) driver configuration. Bound from <c>DriverConfig</c> JSON at
/// driver-host registration time. Models the settings documented in
/// <c>docs/v2/driver-specs.md</c> §8.
/// </summary>
/// <remarks>
/// This driver connects to a REMOTE OPC UA server and re-exposes its address space
/// through the local OtOpcUa server — the opposite direction from the usual "server
/// exposes PLC data" flow. Tier A (pure managed, OPC Foundation reference SDK); universal
/// protections cover it.
/// </remarks>
public sealed class OpcUaClientDriverOptions
{
/// <summary>
/// Remote OPC UA endpoint URL, e.g. <c>opc.tcp://plc.internal:4840</c>. Convenience
/// shortcut for a single-endpoint deployment — equivalent to setting
/// <see cref="EndpointUrls"/> to a list with this one URL. When both are provided,
/// the list wins and <see cref="EndpointUrl"/> is ignored.
/// </summary>
public string EndpointUrl { get; init; } = "opc.tcp://localhost:4840";
/// <summary>
/// Ordered list of candidate endpoint URLs for failover. The driver tries each in
/// order at <see cref="OpcUaClientDriver.InitializeAsync"/> and on session drop;
/// the first URL that successfully connects wins. Typical use-case: an OPC UA server
/// pair running in hot-standby (primary 4840 + backup 4841) where either can serve
/// the same address space. Leave unset (or empty) to use <see cref="EndpointUrl"/>
/// as a single-URL shortcut.
/// </summary>
public IReadOnlyList<string> EndpointUrls { get; init; } = [];
/// <summary>
/// Per-endpoint connect-attempt timeout during the failover sweep. Short enough that
/// cycling through several dead servers doesn't blow the overall init budget, long
/// enough to tolerate a slow TLS handshake on a healthy server. Applied independently
/// of <see cref="Timeout"/> which governs steady-state operations.
/// </summary>
public TimeSpan PerEndpointConnectTimeout { get; init; } = TimeSpan.FromSeconds(3);
/// <summary>
/// Security policy to require when selecting an endpoint. Matched against the
/// <c>EndpointDescription.SecurityPolicyUri</c> suffix — the driver connects to the
/// first endpoint whose policy name matches AND whose mode matches
/// <see cref="SecurityMode"/>. When set to <see cref="OpcUaSecurityPolicy.None"/>
/// the driver picks any unsecured endpoint regardless of policy string.
/// </summary>
public OpcUaSecurityPolicy SecurityPolicy { get; init; } = OpcUaSecurityPolicy.None;
/// <summary>Security mode.</summary>
public OpcUaSecurityMode SecurityMode { get; init; } = OpcUaSecurityMode.None;
/// <summary>Authentication type.</summary>
public OpcUaAuthType AuthType { get; init; } = OpcUaAuthType.Anonymous;
/// <summary>User name (required only for <see cref="OpcUaAuthType.Username"/>).</summary>
public string? Username { get; init; }
/// <summary>Password (required only for <see cref="OpcUaAuthType.Username"/>).</summary>
public string? Password { get; init; }
/// <summary>
/// Filesystem path to the user-identity certificate (PFX/PEM). Required when
/// <see cref="AuthType"/> is <see cref="OpcUaAuthType.Certificate"/>. The driver
/// loads the cert + private key, which the remote server validates against its
/// <c>TrustedUserCertificates</c> store to authenticate the session's user token.
/// Leave unset to use the driver's application-instance certificate as the user
/// token (not typical — most deployments have a separate user cert).
/// </summary>
public string? UserCertificatePath { get; init; }
/// <summary>
/// Optional password that unlocks <see cref="UserCertificatePath"/> when the PFX is
/// protected. PEM files generally have their password on the adjacent key file; this
/// knob only applies to password-locked PFX.
/// </summary>
public string? UserCertificatePassword { get; init; }
/// <summary>Server-negotiated session timeout. Default 120s per driver-specs.md §8.</summary>
public TimeSpan SessionTimeout { get; init; } = TimeSpan.FromSeconds(120);
/// <summary>Client-side keep-alive interval.</summary>
public TimeSpan KeepAliveInterval { get; init; } = TimeSpan.FromSeconds(5);
/// <summary>Initial reconnect delay after a session drop.</summary>
public TimeSpan ReconnectPeriod { get; init; } = TimeSpan.FromSeconds(5);
/// <summary>
/// When <c>true</c>, the driver accepts any self-signed / untrusted server certificate.
/// Dev-only — must be <c>false</c> in production so MITM attacks against the opc.tcp
/// channel fail closed.
/// </summary>
public bool AutoAcceptCertificates { get; init; } = false;
/// <summary>
/// Application URI the driver reports during session creation. Must match the
/// subject-alt-name on the client certificate if one is used, which is why it's a
/// config knob rather than hard-coded.
/// </summary>
public string ApplicationUri { get; init; } = "urn:localhost:OtOpcUa:GatewayClient";
/// <summary>
/// Friendly name sent to the remote server for diagnostics. Shows up in the remote
/// server's session-list so operators can identify which gateway instance is calling.
/// </summary>
public string SessionName { get; init; } = "OtOpcUa-Gateway";
/// <summary>Connect + per-operation timeout.</summary>
public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(10);
/// <summary>
/// Root NodeId to mirror. Default <c>null</c> = <c>ObjectsFolder</c> (i=85). Set to
/// a scoped root to restrict the address space the driver exposes locally — useful
/// when the remote server has tens of thousands of nodes and only a subset is
/// needed downstream.
/// </summary>
public string? BrowseRoot { get; init; }
/// <summary>
/// Cap on total nodes discovered during <c>DiscoverAsync</c>. Default 10_000 —
/// bounds memory on runaway remote servers without being so low that normal
/// deployments hit it. When the cap is reached discovery stops and a warning is
/// written to the driver health surface; the partially-discovered tree is still
/// projected into the local address space.
/// </summary>
public int MaxDiscoveredNodes { get; init; } = 10_000;
/// <summary>
/// Max hierarchical depth of the browse. Default 10 — deep enough for realistic
/// OPC UA information models, shallow enough that cyclic graphs can't spin the
/// browse forever.
/// </summary>
public int MaxBrowseDepth { get; init; } = 10;
}
/// <summary>OPC UA message security mode.</summary>
public enum OpcUaSecurityMode
{
None,
Sign,
SignAndEncrypt,
}
/// <summary>
/// OPC UA security policies recognized by the driver. Maps to the standard
/// <c>http://opcfoundation.org/UA/SecurityPolicy#</c> URI suffixes the SDK uses for
/// endpoint matching.
/// </summary>
/// <remarks>
/// <see cref="Basic128Rsa15"/> and <see cref="Basic256"/> are <b>deprecated</b> per OPC UA
/// spec v1.04 — they remain in the enum only for brownfield interop with older servers.
/// Prefer <see cref="Basic256Sha256"/>, <see cref="Aes128_Sha256_RsaOaep"/>, or
/// <see cref="Aes256_Sha256_RsaPss"/> for new deployments.
/// </remarks>
public enum OpcUaSecurityPolicy
{
/// <summary>No security. Unsigned, unencrypted wire.</summary>
None,
/// <summary>Deprecated (OPC UA 1.04). Retained for legacy server interop.</summary>
Basic128Rsa15,
/// <summary>Deprecated (OPC UA 1.04). Retained for legacy server interop.</summary>
Basic256,
/// <summary>Recommended baseline for current deployments.</summary>
Basic256Sha256,
/// <summary>Current OPC UA policy; AES-128 + SHA-256 + RSA-OAEP.</summary>
Aes128_Sha256_RsaOaep,
/// <summary>Current OPC UA policy; AES-256 + SHA-256 + RSA-PSS.</summary>
Aes256_Sha256_RsaPss,
}
/// <summary>User authentication type sent to the remote server.</summary>
public enum OpcUaAuthType
{
Anonymous,
Username,
Certificate,
}
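The failover tests further down pin the precedence rule these doc comments describe; restated as a sketch (non-authoritative; the shipped `OpcUaClientDriver.ResolveEndpointCandidates` is the real method):

// Sketch of the EndpointUrls-vs-EndpointUrl precedence documented above.
static IReadOnlyList<string> ResolveEndpointCandidates(OpcUaClientDriverOptions opts)
    => opts.EndpointUrls.Count > 0
        ? opts.EndpointUrls   // explicit list wins; EndpointUrl is ignored
        : [opts.EndpointUrl]; // single-URL shortcut (also covers the empty-list case)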

View File

@@ -0,0 +1,28 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
<NoWarn>$(NoWarn);CS1591</NoWarn>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient</RootNamespace>
<AssemblyName>ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient</AssemblyName>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj"/>
</ItemGroup>
<ItemGroup>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client" Version="1.5.378.106"/>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Configuration" Version="1.5.378.106"/>
</ItemGroup>
<ItemGroup>
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests"/>
</ItemGroup>
</Project>

View File

@@ -26,8 +26,36 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.S7;
/// </para>
/// </remarks>
public sealed class S7Driver(S7DriverOptions options, string driverInstanceId)
: IDriver, IDisposable, IAsyncDisposable
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IDisposable, IAsyncDisposable
{
// ---- ISubscribable + IHostConnectivityProbe state ----
private readonly System.Collections.Concurrent.ConcurrentDictionary<long, SubscriptionState> _subscriptions = new();
private long _nextSubscriptionId;
private readonly object _probeLock = new();
private HostState _hostState = HostState.Unknown;
private DateTime _hostStateChangedUtc = DateTime.UtcNow;
private CancellationTokenSource? _probeCts;
public event EventHandler<DataChangeEventArgs>? OnDataChange;
public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
/// <summary>OPC UA StatusCode used when the tag name isn't in the driver's tag map.</summary>
private const uint StatusBadNodeIdUnknown = 0x80340000u;
/// <summary>OPC UA StatusCode used when the tag's data type isn't implemented yet.</summary>
private const uint StatusBadNotSupported = 0x803D0000u;
/// <summary>OPC UA StatusCode used when the tag is declared read-only.</summary>
private const uint StatusBadNotWritable = 0x803B0000u;
/// <summary>OPC UA StatusCode used when write fails validation (e.g. out-of-range value).</summary>
private const uint StatusBadInternalError = 0x80020000u;
/// <summary>OPC UA StatusCode used for socket / timeout / protocol-layer faults.</summary>
private const uint StatusBadCommunicationError = 0x80050000u;
/// <summary>OPC UA StatusCode used when S7 returns <c>ErrorCode.WrongCPU</c> / PUT/GET disabled.</summary>
private const uint StatusBadDeviceFailure = 0x80550000u;
private readonly Dictionary<string, S7TagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, S7ParsedAddress> _parsedByName = new(StringComparer.OrdinalIgnoreCase);
private readonly S7DriverOptions _options = options;
private readonly SemaphoreSlim _gate = new(1, 1);
@@ -68,7 +96,29 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId)
await plc.OpenAsync(cts.Token).ConfigureAwait(false);
Plc = plc;
// Parse every tag's address once at init so config typos fail fast here instead
// of surfacing as BadInternalError on every Read against the bad tag. The parser
// also rejects bit-offset > 7, DB 0, unknown area letters, etc.
_tagsByName.Clear();
_parsedByName.Clear();
foreach (var t in _options.Tags)
{
var parsed = S7AddressParser.Parse(t.Address); // throws FormatException
_tagsByName[t.Name] = t;
_parsedByName[t.Name] = parsed;
}
_health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
// Kick off the probe loop once the connection is up. Initial HostState stays
// Unknown until the first probe tick succeeds — avoids broadcasting a premature
// Running transition before any PDU round-trip has happened.
if (_options.Probe.Enabled)
{
_probeCts = new CancellationTokenSource();
_ = Task.Run(() => ProbeLoopAsync(_probeCts.Token), _probeCts.Token);
}
}
catch (Exception ex)
{
@@ -89,6 +139,17 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId)
public Task ShutdownAsync(CancellationToken cancellationToken)
{
try { _probeCts?.Cancel(); } catch { }
_probeCts?.Dispose();
_probeCts = null;
foreach (var state in _subscriptions.Values)
{
try { state.Cts.Cancel(); } catch { }
state.Cts.Dispose();
}
_subscriptions.Clear();
try { Plc?.Close(); } catch { /* best-effort — tearing down anyway */ }
Plc = null;
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
@@ -106,6 +167,339 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId)
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
// ---- IReadable ----
public async Task<IReadOnlyList<DataValueSnapshot>> ReadAsync(
IReadOnlyList<string> fullReferences, CancellationToken cancellationToken)
{
var plc = RequirePlc();
var now = DateTime.UtcNow;
var results = new DataValueSnapshot[fullReferences.Count];
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
for (var i = 0; i < fullReferences.Count; i++)
{
var name = fullReferences[i];
if (!_tagsByName.TryGetValue(name, out var tag))
{
results[i] = new DataValueSnapshot(null, StatusBadNodeIdUnknown, null, now);
continue;
}
try
{
var value = await ReadOneAsync(plc, tag, cancellationToken).ConfigureAwait(false);
results[i] = new DataValueSnapshot(value, 0u, now, now);
_health = new DriverHealth(DriverState.Healthy, now, null);
}
catch (NotSupportedException)
{
results[i] = new DataValueSnapshot(null, StatusBadNotSupported, null, now);
}
catch (global::S7.Net.PlcException pex)
{
// S7.Net's PlcException carries an ErrorCode; PUT/GET-disabled on
// S7-1200/1500 surfaces here. Map to BadDeviceFailure so operators see a
// device-config problem (toggle PUT/GET in TIA Portal) rather than a
// transient fault — per driver-specs.md §5.
results[i] = new DataValueSnapshot(null, StatusBadDeviceFailure, null, now);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, pex.Message);
}
catch (Exception ex)
{
results[i] = new DataValueSnapshot(null, StatusBadCommunicationError, null, now);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
}
}
}
finally { _gate.Release(); }
return results;
}
private async Task<object> ReadOneAsync(global::S7.Net.Plc plc, S7TagDefinition tag, CancellationToken ct)
{
var addr = _parsedByName[tag.Name];
// S7.Net's string-based ReadAsync returns object where the boxed .NET type depends on
// the size suffix: DBX=bool, DBB=byte, DBW=ushort, DBD=uint. Our S7DataType enum
// specifies the SEMANTIC type (Int16 vs UInt16 vs Float32 etc.); the reinterpret below
// converts the raw unsigned boxed value into the requested type without issuing an
// extra PLC round-trip.
var raw = await plc.ReadAsync(tag.Address, ct).ConfigureAwait(false)
?? throw new System.IO.InvalidDataException($"S7.Net returned null for '{tag.Address}'");
return (tag.DataType, addr.Size, raw) switch
{
(S7DataType.Bool, S7Size.Bit, bool b) => b,
(S7DataType.Byte, S7Size.Byte, byte by) => by,
(S7DataType.UInt16, S7Size.Word, ushort u16) => u16,
(S7DataType.Int16, S7Size.Word, ushort u16) => unchecked((short)u16),
(S7DataType.UInt32, S7Size.DWord, uint u32) => u32,
(S7DataType.Int32, S7Size.DWord, uint u32) => unchecked((int)u32),
(S7DataType.Float32, S7Size.DWord, uint u32) => BitConverter.UInt32BitsToSingle(u32),
(S7DataType.Int64, _, _) => throw new NotSupportedException("S7 Int64 reads land in a follow-up PR"),
(S7DataType.UInt64, _, _) => throw new NotSupportedException("S7 UInt64 reads land in a follow-up PR"),
(S7DataType.Float64, _, _) => throw new NotSupportedException("S7 Float64 (LReal) reads land in a follow-up PR"),
(S7DataType.String, _, _) => throw new NotSupportedException("S7 STRING reads land in a follow-up PR"),
(S7DataType.DateTime, _, _) => throw new NotSupportedException("S7 DateTime reads land in a follow-up PR"),
_ => throw new System.IO.InvalidDataException(
$"S7 Read type-mismatch: tag '{tag.Name}' declared {tag.DataType} but address '{tag.Address}' " +
$"parsed as Size={addr.Size}; S7.Net returned {raw.GetType().Name}"),
};
}
// ---- IWritable ----
public async Task<IReadOnlyList<WriteResult>> WriteAsync(
IReadOnlyList<WriteRequest> writes, CancellationToken cancellationToken)
{
var plc = RequirePlc();
var results = new WriteResult[writes.Count];
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
for (var i = 0; i < writes.Count; i++)
{
var w = writes[i];
if (!_tagsByName.TryGetValue(w.FullReference, out var tag))
{
results[i] = new WriteResult(StatusBadNodeIdUnknown);
continue;
}
if (!tag.Writable)
{
results[i] = new WriteResult(StatusBadNotWritable);
continue;
}
try
{
await WriteOneAsync(plc, tag, w.Value, cancellationToken).ConfigureAwait(false);
results[i] = new WriteResult(0u);
}
catch (NotSupportedException)
{
results[i] = new WriteResult(StatusBadNotSupported);
}
catch (global::S7.Net.PlcException)
{
results[i] = new WriteResult(StatusBadDeviceFailure);
}
catch (Exception)
{
results[i] = new WriteResult(StatusBadInternalError);
}
}
}
finally { _gate.Release(); }
return results;
}
private async Task WriteOneAsync(global::S7.Net.Plc plc, S7TagDefinition tag, object? value, CancellationToken ct)
{
// S7.Net's Plc.WriteAsync(string address, object value) expects the boxed value to
// match the address's size-suffix type: DBX=bool, DBB=byte, DBW=ushort, DBD=uint.
// Our S7DataType lets the caller pass short/int/float; convert to the unsigned
// wire representation before handing off.
var boxed = tag.DataType switch
{
S7DataType.Bool => (object)Convert.ToBoolean(value),
S7DataType.Byte => (object)Convert.ToByte(value),
S7DataType.UInt16 => (object)Convert.ToUInt16(value),
S7DataType.Int16 => (object)unchecked((ushort)Convert.ToInt16(value)),
S7DataType.UInt32 => (object)Convert.ToUInt32(value),
S7DataType.Int32 => (object)unchecked((uint)Convert.ToInt32(value)),
S7DataType.Float32 => (object)BitConverter.SingleToUInt32Bits(Convert.ToSingle(value)),
S7DataType.Int64 => throw new NotSupportedException("S7 Int64 writes land in a follow-up PR"),
S7DataType.UInt64 => throw new NotSupportedException("S7 UInt64 writes land in a follow-up PR"),
S7DataType.Float64 => throw new NotSupportedException("S7 Float64 (LReal) writes land in a follow-up PR"),
S7DataType.String => throw new NotSupportedException("S7 STRING writes land in a follow-up PR"),
S7DataType.DateTime => throw new NotSupportedException("S7 DateTime writes land in a follow-up PR"),
_ => throw new InvalidOperationException($"Unknown S7DataType {tag.DataType}"),
};
await plc.WriteAsync(tag.Address, boxed, ct).ConfigureAwait(false);
}
private global::S7.Net.Plc RequirePlc() =>
Plc ?? throw new InvalidOperationException("S7Driver not initialized");
// ---- ITagDiscovery ----
public Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(builder);
var folder = builder.Folder("S7", "S7");
foreach (var t in _options.Tags)
{
folder.Variable(t.Name, t.Name, new DriverAttributeInfo(
FullName: t.Name,
DriverDataType: MapDataType(t.DataType),
IsArray: false,
ArrayDim: null,
SecurityClass: t.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly,
IsHistorized: false,
IsAlarm: false));
}
return Task.CompletedTask;
}
private static DriverDataType MapDataType(S7DataType t) => t switch
{
S7DataType.Bool => DriverDataType.Boolean,
S7DataType.Byte => DriverDataType.Int32, // no 8-bit in DriverDataType yet
S7DataType.Int16 or S7DataType.UInt16 or S7DataType.Int32 or S7DataType.UInt32 => DriverDataType.Int32,
S7DataType.Int64 or S7DataType.UInt64 => DriverDataType.Int32, // narrows; lossy outside the Int32 range
S7DataType.Float32 => DriverDataType.Float32,
S7DataType.Float64 => DriverDataType.Float64,
S7DataType.String => DriverDataType.String,
S7DataType.DateTime => DriverDataType.DateTime,
_ => DriverDataType.Int32,
};
// ---- ISubscribable (polling overlay) ----
public Task<ISubscriptionHandle> SubscribeAsync(
IReadOnlyList<string> fullReferences, TimeSpan publishingInterval, CancellationToken cancellationToken)
{
var id = Interlocked.Increment(ref _nextSubscriptionId);
var cts = new CancellationTokenSource();
// Floor at 100 ms — S7 CPUs scan 2-10 ms but the comms mailbox is processed at most
// once per scan; sub-100 ms polling just queues wire-side with worse latency.
var interval = publishingInterval < TimeSpan.FromMilliseconds(100)
? TimeSpan.FromMilliseconds(100)
: publishingInterval;
var handle = new S7SubscriptionHandle(id);
var state = new SubscriptionState(handle, [.. fullReferences], interval, cts);
_subscriptions[id] = state;
_ = Task.Run(() => PollLoopAsync(state, cts.Token), cts.Token);
return Task.FromResult<ISubscriptionHandle>(handle);
}
public Task UnsubscribeAsync(ISubscriptionHandle handle, CancellationToken cancellationToken)
{
if (handle is S7SubscriptionHandle h && _subscriptions.TryRemove(h.Id, out var state))
{
state.Cts.Cancel();
state.Cts.Dispose();
}
return Task.CompletedTask;
}
private async Task PollLoopAsync(SubscriptionState state, CancellationToken ct)
{
// Initial-data push per OPC UA Part 4 convention.
try { await PollOnceAsync(state, forceRaise: true, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
catch { /* first-read error — polling continues */ }
while (!ct.IsCancellationRequested)
{
try { await Task.Delay(state.Interval, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
try { await PollOnceAsync(state, forceRaise: false, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
catch { /* transient polling error — loop continues, health surface reflects it */ }
}
}
private async Task PollOnceAsync(SubscriptionState state, bool forceRaise, CancellationToken ct)
{
var snapshots = await ReadAsync(state.TagReferences, ct).ConfigureAwait(false);
for (var i = 0; i < state.TagReferences.Count; i++)
{
var tagRef = state.TagReferences[i];
var current = snapshots[i];
var lastSeen = state.LastValues.TryGetValue(tagRef, out var prev) ? prev : default;
if (forceRaise || !Equals(lastSeen?.Value, current.Value) || lastSeen?.StatusCode != current.StatusCode)
{
state.LastValues[tagRef] = current;
OnDataChange?.Invoke(this, new DataChangeEventArgs(state.Handle, tagRef, current));
}
}
}
private sealed record SubscriptionState(
S7SubscriptionHandle Handle,
IReadOnlyList<string> TagReferences,
TimeSpan Interval,
CancellationTokenSource Cts)
{
public System.Collections.Concurrent.ConcurrentDictionary<string, DataValueSnapshot> LastValues { get; }
= new(StringComparer.OrdinalIgnoreCase);
}
private sealed record S7SubscriptionHandle(long Id) : ISubscriptionHandle
{
public string DiagnosticId => $"s7-sub-{Id}";
}
// ---- IHostConnectivityProbe ----
/// <summary>
/// Host identifier surfaced in <see cref="GetHostStatuses"/>. <c>host:port</c> format
/// matches the Modbus driver's convention so the Admin UI dashboard renders both
/// driver families' rows uniformly.
/// </summary>
public string HostName => $"{_options.Host}:{_options.Port}";
public IReadOnlyList<HostConnectivityStatus> GetHostStatuses()
{
lock (_probeLock)
return [new HostConnectivityStatus(HostName, _hostState, _hostStateChangedUtc)];
}
private async Task ProbeLoopAsync(CancellationToken ct)
{
while (!ct.IsCancellationRequested)
{
var success = false;
try
{
// Probe via S7.Net's low-cost ReadStatusAsync — it fetches the CPU's run/stop
// status word and is intentionally light on the comms mailbox. A single-word
// Plc.ReadAsync would also work, but the status read doubles as a "PLC actually up" check.
using var probeCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
probeCts.CancelAfter(_options.Probe.Timeout);
var plc = Plc;
if (plc is null) throw new InvalidOperationException("Plc dropped during probe");
await _gate.WaitAsync(probeCts.Token).ConfigureAwait(false);
try
{
_ = await plc.ReadStatusAsync(probeCts.Token).ConfigureAwait(false);
success = true;
}
finally { _gate.Release(); }
}
catch (OperationCanceledException) when (ct.IsCancellationRequested) { return; }
catch { /* transport/timeout/exception — treated as Stopped below */ }
TransitionTo(success ? HostState.Running : HostState.Stopped);
try { await Task.Delay(_options.Probe.Interval, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
}
}
private void TransitionTo(HostState newState)
{
HostState old;
lock (_probeLock)
{
old = _hostState;
if (old == newState) return;
_hostState = newState;
_hostStateChangedUtc = DateTime.UtcNow;
}
OnHostStatusChanged?.Invoke(this, new HostStatusChangedEventArgs(HostName, old, newState));
}
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
public async ValueTask DisposeAsync()

View File

@@ -0,0 +1,70 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientAlarmTests
{
[Theory]
[InlineData((ushort)1, AlarmSeverity.Low)]
[InlineData((ushort)200, AlarmSeverity.Low)]
[InlineData((ushort)201, AlarmSeverity.Medium)]
[InlineData((ushort)500, AlarmSeverity.Medium)]
[InlineData((ushort)501, AlarmSeverity.High)]
[InlineData((ushort)800, AlarmSeverity.High)]
[InlineData((ushort)801, AlarmSeverity.Critical)]
[InlineData((ushort)1000, AlarmSeverity.Critical)]
public void MapSeverity_buckets_per_OPC_UA_Part_9_guidance(ushort opcSev, AlarmSeverity expected)
{
OpcUaClientDriver.MapSeverity(opcSev).ShouldBe(expected);
}
[Fact]
public void MapSeverity_zero_maps_to_Low()
{
// 0 isn't in OPC UA's 1-1000 range but we handle it gracefully as Low.
OpcUaClientDriver.MapSeverity(0).ShouldBe(AlarmSeverity.Low);
}
[Fact]
public async Task SubscribeAlarmsAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-alarm-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.SubscribeAlarmsAsync([], TestContext.Current.CancellationToken));
}
[Fact]
public async Task UnsubscribeAlarmsAsync_with_unknown_handle_is_noop()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-alarm-unknown");
// Parallels the subscribe handle path — session-drop races shouldn't crash the caller.
await drv.UnsubscribeAlarmsAsync(new FakeAlarmHandle(), TestContext.Current.CancellationToken);
}
[Fact]
public async Task AcknowledgeAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-ack-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.AcknowledgeAsync(
[new AlarmAcknowledgeRequest("ns=2;s=Src", "ns=2;s=Cond", "operator ack")],
TestContext.Current.CancellationToken));
}
[Fact]
public async Task AcknowledgeAsync_with_empty_batch_is_noop_even_without_init()
{
// Empty batch short-circuits before touching the session, so it's safe pre-init. This
// keeps batch-ack callers from needing to guard the list size themselves.
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-ack-empty");
await drv.AcknowledgeAsync([], TestContext.Current.CancellationToken);
}
private sealed class FakeAlarmHandle : IAlarmSubscriptionHandle
{
public string DiagnosticId => "fake-alarm";
}
}
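Restating the bucketing the Theory above pins down, as a sketch (the shipped `OpcUaClientDriver.MapSeverity` is authoritative): OPC UA's 1-1000 severity range collapses to four buckets, and the out-of-spec 0 degrades to Low.

// Sketch derived from the InlineData table above.
static AlarmSeverity MapSeverity(ushort opcSeverity) => opcSeverity switch
{
    <= 200 => AlarmSeverity.Low,    // includes the out-of-range 0
    <= 500 => AlarmSeverity.Medium,
    <= 800 => AlarmSeverity.High,
    _ => AlarmSeverity.Critical,    // 801-1000 (and anything above)
};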

View File

@@ -0,0 +1,62 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientAttributeMappingTests
{
[Theory]
[InlineData((uint)DataTypes.Boolean, DriverDataType.Boolean)]
[InlineData((uint)DataTypes.Int16, DriverDataType.Int16)]
[InlineData((uint)DataTypes.UInt16, DriverDataType.UInt16)]
[InlineData((uint)DataTypes.Int32, DriverDataType.Int32)]
[InlineData((uint)DataTypes.UInt32, DriverDataType.UInt32)]
[InlineData((uint)DataTypes.Int64, DriverDataType.Int64)]
[InlineData((uint)DataTypes.UInt64, DriverDataType.UInt64)]
[InlineData((uint)DataTypes.Float, DriverDataType.Float32)]
[InlineData((uint)DataTypes.Double, DriverDataType.Float64)]
[InlineData((uint)DataTypes.String, DriverDataType.String)]
[InlineData((uint)DataTypes.DateTime, DriverDataType.DateTime)]
public void MapUpstreamDataType_recognizes_standard_builtin_types(uint typeId, DriverDataType expected)
{
var nodeId = new NodeId(typeId);
OpcUaClientDriver.MapUpstreamDataType(nodeId).ShouldBe(expected);
}
[Fact]
public void MapUpstreamDataType_maps_SByte_and_Byte_to_Int16_since_DriverDataType_lacks_8bit()
{
// DriverDataType has no 8-bit type; conservative widen to Int16. Documented so a
// future Core.Abstractions PR that adds Int8/Byte can find this call site.
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.SByte)).ShouldBe(DriverDataType.Int16);
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.Byte)).ShouldBe(DriverDataType.Int16);
}
[Fact]
public void MapUpstreamDataType_falls_back_to_String_for_unknown_custom_types()
{
// Custom vendor extension object — NodeId in namespace 2 that isn't a standard type.
OpcUaClientDriver.MapUpstreamDataType(new NodeId("CustomStruct", 2)).ShouldBe(DriverDataType.String);
}
[Fact]
public void MapUpstreamDataType_handles_UtcTime_as_DateTime()
{
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.UtcTime)).ShouldBe(DriverDataType.DateTime);
}
[Theory]
[InlineData((byte)0, SecurityClassification.ViewOnly)] // no access flags set
[InlineData((byte)1, SecurityClassification.ViewOnly)] // CurrentRead only
[InlineData((byte)2, SecurityClassification.Operate)] // CurrentWrite only
[InlineData((byte)3, SecurityClassification.Operate)] // CurrentRead + CurrentWrite
[InlineData((byte)0x0F, SecurityClassification.Operate)] // read+write+historyRead+historyWrite
[InlineData((byte)0x04, SecurityClassification.ViewOnly)] // HistoryRead only — no Write bit
public void MapAccessLevelToSecurityClass_respects_CurrentWrite_bit(byte accessLevel, SecurityClassification expected)
{
OpcUaClientDriver.MapAccessLevelToSecurityClass(accessLevel).ShouldBe(expected);
}
}
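The AccessLevel mapping those cases assert reduces to a single bit test; as a sketch (not the shipped method), with the standard OPC UA AccessLevel bit layout (CurrentRead=0x01, CurrentWrite=0x02, HistoryRead=0x04, HistoryWrite=0x08):

// Sketch: only the CurrentWrite bit promotes a node to Operate.
static SecurityClassification MapAccessLevelToSecurityClass(byte accessLevel)
    => (accessLevel & 0x02) != 0
        ? SecurityClassification.Operate
        : SecurityClassification.ViewOnly;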

View File

@@ -0,0 +1,59 @@
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;
using Shouldly;
using Xunit;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientCertAuthTests
{
[Fact]
public void BuildCertificateIdentity_rejects_missing_path()
{
var opts = new OpcUaClientDriverOptions { AuthType = OpcUaAuthType.Certificate };
Should.Throw<InvalidOperationException>(() => OpcUaClientDriver.BuildCertificateIdentity(opts))
.Message.ShouldContain("UserCertificatePath");
}
[Fact]
public void BuildCertificateIdentity_rejects_nonexistent_file()
{
var opts = new OpcUaClientDriverOptions
{
AuthType = OpcUaAuthType.Certificate,
UserCertificatePath = Path.Combine(Path.GetTempPath(), $"does-not-exist-{Guid.NewGuid():N}.pfx"),
};
Should.Throw<FileNotFoundException>(() => OpcUaClientDriver.BuildCertificateIdentity(opts));
}
[Fact]
public void BuildCertificateIdentity_loads_a_valid_PFX_with_private_key()
{
// Generate a self-signed cert on the fly so the test doesn't ship a static PFX.
// The driver doesn't care about the issuer — just needs a cert with a private key.
using var rsa = RSA.Create(2048);
var req = new CertificateRequest("CN=OpcUaClientCertAuthTests", rsa,
HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
var cert = req.CreateSelfSigned(DateTimeOffset.UtcNow.AddMinutes(-5), DateTimeOffset.UtcNow.AddHours(1));
var tmpPath = Path.Combine(Path.GetTempPath(), $"opcua-cert-test-{Guid.NewGuid():N}.pfx");
File.WriteAllBytes(tmpPath, cert.Export(X509ContentType.Pfx, "testpw"));
try
{
var opts = new OpcUaClientDriverOptions
{
AuthType = OpcUaAuthType.Certificate,
UserCertificatePath = tmpPath,
UserCertificatePassword = "testpw",
};
var identity = OpcUaClientDriver.BuildCertificateIdentity(opts);
identity.ShouldNotBeNull();
identity.TokenType.ShouldBe(Opc.Ua.UserTokenType.Certificate);
}
finally
{
try { File.Delete(tmpPath); } catch { /* best-effort */ }
}
}
}

View File

@@ -0,0 +1,55 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
/// <summary>
/// Scaffold tests for <see cref="OpcUaClientDriver"/>'s <see cref="ITagDiscovery"/>
/// surface that don't require a live remote server. Live-browse coverage lands in a
/// follow-up PR once the in-process OPC UA server fixture is scaffolded.
/// </summary>
[Trait("Category", "Unit")]
public sealed class OpcUaClientDiscoveryTests
{
[Fact]
public async Task DiscoverAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-disco");
var builder = new NullAddressSpaceBuilder();
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.DiscoverAsync(builder, TestContext.Current.CancellationToken));
}
[Fact]
public async Task DiscoverAsync_rejects_null_builder()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-disco");
// Must be awaited; an un-awaited Should.ThrowAsync lets the test pass vacuously.
await Should.ThrowAsync<ArgumentNullException>(async () =>
await drv.DiscoverAsync(null!, TestContext.Current.CancellationToken));
}
[Fact]
public void Discovery_caps_are_sensible_defaults()
{
var opts = new OpcUaClientDriverOptions();
opts.MaxDiscoveredNodes.ShouldBe(10_000, "bounds memory on runaway servers without clipping normal models");
opts.MaxBrowseDepth.ShouldBe(10, "deep enough for realistic info models; shallow enough for cycle safety");
opts.BrowseRoot.ShouldBeNull("null = default to ObjectsFolder i=85");
}
private sealed class NullAddressSpaceBuilder : IAddressSpaceBuilder
{
public IAddressSpaceBuilder Folder(string browseName, string displayName) => this;
public IVariableHandle Variable(string browseName, string displayName, DriverAttributeInfo attributeInfo)
=> new StubHandle();
public void AddProperty(string browseName, DriverDataType dataType, object? value) { }
public void AttachAlarmCondition(IVariableHandle sourceVariable, string alarmName, DriverAttributeInfo alarmInfo) { }
private sealed class StubHandle : IVariableHandle
{
public string FullReference => "stub";
public IAlarmConditionSink MarkAsAlarmCondition(AlarmConditionInfo info) => throw new NotSupportedException();
}
}
}

View File

@@ -0,0 +1,91 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
/// <summary>
/// Scaffold-level tests for <see cref="OpcUaClientDriver"/> that don't require a live
/// remote OPC UA server. PR 67+ adds IReadable/IWritable/ITagDiscovery/ISubscribable
/// tests against a local in-process OPC UA server fixture.
/// </summary>
[Trait("Category", "Unit")]
public sealed class OpcUaClientDriverScaffoldTests
{
[Fact]
public void Default_options_target_standard_opcua_port_and_anonymous_auth()
{
var opts = new OpcUaClientDriverOptions();
opts.EndpointUrl.ShouldBe("opc.tcp://localhost:4840", "4840 is the IANA-assigned OPC UA port");
opts.SecurityMode.ShouldBe(OpcUaSecurityMode.None);
opts.SecurityPolicy.ShouldBe(OpcUaSecurityPolicy.None);
opts.AuthType.ShouldBe(OpcUaAuthType.Anonymous);
opts.AutoAcceptCertificates.ShouldBeFalse("production default must reject untrusted server certs");
}
[Fact]
public void Default_timeouts_match_driver_specs_section_8()
{
var opts = new OpcUaClientDriverOptions();
opts.SessionTimeout.ShouldBe(TimeSpan.FromSeconds(120));
opts.KeepAliveInterval.ShouldBe(TimeSpan.FromSeconds(5));
opts.ReconnectPeriod.ShouldBe(TimeSpan.FromSeconds(5));
}
[Fact]
public void Driver_reports_type_and_id_before_connect()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-test");
drv.DriverType.ShouldBe("OpcUaClient");
drv.DriverInstanceId.ShouldBe("opcua-test");
drv.GetHealth().State.ShouldBe(DriverState.Unknown);
}
[Fact]
public async Task Initialize_against_unreachable_endpoint_transitions_to_Faulted_and_throws()
{
var opts = new OpcUaClientDriverOptions
{
// Port 1 on loopback is effectively guaranteed to be closed — the OS responds
// with TCP RST immediately instead of hanging on connect, which keeps the
// unreachable-host tests snappy. Don't use an RFC 5737 reserved IP; those get
// routed to a black-hole + time out only after the SDK's internal retry/backoff
// fully elapses (~60s even with Options.Timeout=500ms).
EndpointUrl = "opc.tcp://127.0.0.1:1",
Timeout = TimeSpan.FromMilliseconds(500),
AutoAcceptCertificates = true, // dev-mode to bypass cert validation in the test
};
using var drv = new OpcUaClientDriver(opts, "opcua-unreach");
await Should.ThrowAsync<Exception>(async () =>
await drv.InitializeAsync("{}", TestContext.Current.CancellationToken));
var health = drv.GetHealth();
health.State.ShouldBe(DriverState.Faulted);
health.LastError.ShouldNotBeNull();
}
[Fact]
public async Task Reinitialize_against_unreachable_endpoint_re_throws()
{
var opts = new OpcUaClientDriverOptions
{
// Port 1 on loopback is effectively guaranteed to be closed — the OS responds
// with TCP RST immediately instead of hanging on connect, which keeps the
// unreachable-host tests snappy. Don't use an RFC 5737 reserved IP; those get
// routed to a black-hole + time out only after the SDK's internal retry/backoff
// fully elapses (~60s even with Options.Timeout=500ms).
EndpointUrl = "opc.tcp://127.0.0.1:1",
Timeout = TimeSpan.FromMilliseconds(500),
AutoAcceptCertificates = true,
};
using var drv = new OpcUaClientDriver(opts, "opcua-reinit");
await Should.ThrowAsync<Exception>(async () =>
await drv.InitializeAsync("{}", TestContext.Current.CancellationToken));
await Should.ThrowAsync<Exception>(async () =>
await drv.ReinitializeAsync("{}", TestContext.Current.CancellationToken));
}
}

View File

@@ -0,0 +1,81 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientFailoverTests
{
[Fact]
public void ResolveEndpointCandidates_prefers_EndpointUrls_when_provided()
{
var opts = new OpcUaClientDriverOptions
{
EndpointUrl = "opc.tcp://fallback:4840",
EndpointUrls = ["opc.tcp://primary:4840", "opc.tcp://backup:4841"],
};
var list = OpcUaClientDriver.ResolveEndpointCandidates(opts);
list.Count.ShouldBe(2);
list[0].ShouldBe("opc.tcp://primary:4840");
list[1].ShouldBe("opc.tcp://backup:4841");
}
[Fact]
public void ResolveEndpointCandidates_falls_back_to_single_EndpointUrl_when_list_empty()
{
var opts = new OpcUaClientDriverOptions { EndpointUrl = "opc.tcp://only:4840" };
var list = OpcUaClientDriver.ResolveEndpointCandidates(opts);
list.Count.ShouldBe(1);
list[0].ShouldBe("opc.tcp://only:4840");
}
[Fact]
public void ResolveEndpointCandidates_empty_list_treated_as_fallback_to_EndpointUrl()
{
// Explicit empty list should still fall back to the single-URL shortcut rather than
// producing a zero-candidate sweep that would immediately throw with no URLs tried.
var opts = new OpcUaClientDriverOptions
{
EndpointUrl = "opc.tcp://single:4840",
EndpointUrls = [],
};
OpcUaClientDriver.ResolveEndpointCandidates(opts).Count.ShouldBe(1);
}
[Fact]
public void HostName_uses_first_candidate_before_connect()
{
var opts = new OpcUaClientDriverOptions
{
EndpointUrls = ["opc.tcp://primary:4840", "opc.tcp://backup:4841"],
};
using var drv = new OpcUaClientDriver(opts, "opcua-host");
drv.HostName.ShouldBe("opc.tcp://primary:4840",
"pre-connect the dashboard should show the first candidate URL so operators can link back");
}
[Fact]
public async Task Initialize_against_all_unreachable_endpoints_throws_AggregateException_listing_each()
{
// Port 1 + port 2 + port 3 on loopback are all guaranteed closed (TCP RST immediate).
// Failover sweep should attempt all three and throw AggregateException naming each URL
// so operators see exactly which candidates were tried.
var opts = new OpcUaClientDriverOptions
{
EndpointUrls = ["opc.tcp://127.0.0.1:1", "opc.tcp://127.0.0.1:2", "opc.tcp://127.0.0.1:3"],
PerEndpointConnectTimeout = TimeSpan.FromMilliseconds(500),
Timeout = TimeSpan.FromMilliseconds(500),
AutoAcceptCertificates = true,
};
using var drv = new OpcUaClientDriver(opts, "opcua-failover");
var ex = await Should.ThrowAsync<AggregateException>(async () =>
await drv.InitializeAsync("{}", TestContext.Current.CancellationToken));
ex.Message.ShouldContain("127.0.0.1:1");
ex.Message.ShouldContain("127.0.0.1:2");
ex.Message.ShouldContain("127.0.0.1:3");
drv.GetHealth().State.ShouldBe(DriverState.Faulted);
}
}
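A sketch of the sweep contract those assertions pin down (hypothetical helper; the shipped InitializeAsync owns the real session plumbing): each candidate gets its own connect timeout, the first success wins, and total failure surfaces an AggregateException whose message names every URL tried.

// Sketch only; connectAsync stands in for the SDK session-create call.
static async Task<string> SweepAsync(
    IReadOnlyList<string> candidates, TimeSpan perEndpointTimeout,
    Func<string, CancellationToken, Task> connectAsync, CancellationToken ct)
{
    var failures = new List<Exception>();
    foreach (var url in candidates)
    {
        using var attempt = CancellationTokenSource.CreateLinkedTokenSource(ct);
        attempt.CancelAfter(perEndpointTimeout);
        try
        {
            await connectAsync(url, attempt.Token).ConfigureAwait(false);
            return url; // first URL that connects wins
        }
        catch (Exception ex) when (!ct.IsCancellationRequested)
        {
            failures.Add(new InvalidOperationException($"connect to {url} failed", ex));
        }
    }
    // AggregateException.Message appends each inner message, so every URL shows up.
    throw new AggregateException($"all {candidates.Count} endpoint candidates failed", failures);
}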

View File

@@ -0,0 +1,91 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientHistoryTests
{
[Theory]
[InlineData(HistoryAggregateType.Average)]
[InlineData(HistoryAggregateType.Minimum)]
[InlineData(HistoryAggregateType.Maximum)]
[InlineData(HistoryAggregateType.Total)]
[InlineData(HistoryAggregateType.Count)]
public void MapAggregateToNodeId_returns_standard_Part13_aggregate_for_every_enum(HistoryAggregateType agg)
{
var nodeId = OpcUaClientDriver.MapAggregateToNodeId(agg);
NodeId.IsNull(nodeId).ShouldBeFalse();
// Every mapping should resolve to an AggregateFunction_* NodeId (namespace 0, numeric id).
nodeId.NamespaceIndex.ShouldBe((ushort)0);
}
[Fact]
public void MapAggregateToNodeId_rejects_invalid_enum_value()
{
// Defense-in-depth: a future HistoryAggregateType addition mustn't silently fall through.
Should.Throw<ArgumentOutOfRangeException>(() =>
OpcUaClientDriver.MapAggregateToNodeId((HistoryAggregateType)99));
}
[Fact]
public async Task ReadRawAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.ReadRawAsync("ns=2;s=Counter",
DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow, 1000,
TestContext.Current.CancellationToken));
}
[Fact(Skip = "Needs the in-process OPC UA server fixture; the parse-fail branch is only reachable post-init")]
public async Task ReadRawAsync_with_malformed_NodeId_returns_empty_result_not_throw()
{
// Same defensive pattern as ReadAsync / WriteAsync — a malformed NodeId should
// short-circuit to an empty result inside ExecuteHistoryReadAsync rather than
// crash a batch history call. That branch sits past InitializeAsync, which cannot
// succeed in a unit-only lane, so this stays skipped until the fixture PR lands.
await Task.CompletedTask;
}
[Fact]
public async Task ReadProcessedAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.ReadProcessedAsync("ns=2;s=Counter",
DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow,
TimeSpan.FromSeconds(10), HistoryAggregateType.Average,
TestContext.Current.CancellationToken));
}
[Fact]
public async Task ReadAtTimeAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.ReadAtTimeAsync("ns=2;s=Counter",
[DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow],
TestContext.Current.CancellationToken));
}
[Fact]
public async Task ReadEventsAsync_throws_NotSupportedException_as_documented()
{
// The IHistoryProvider default implementation throws; the OPC UA Client driver
// deliberately inherits that default (see PR 76 commit body) because the OPC UA
// client call path needs an EventFilter SelectClauses spec the interface doesn't carry.
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-events-default");
await Should.ThrowAsync<NotSupportedException>(async () =>
await ((IHistoryProvider)drv).ReadEventsAsync(
sourceName: null,
startUtc: DateTime.UtcNow.AddMinutes(-5),
endUtc: DateTime.UtcNow,
maxEvents: 100,
cancellationToken: TestContext.Current.CancellationToken));
}
}

View File

@@ -0,0 +1,32 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
/// <summary>
/// Unit tests for the IReadable/IWritable surface that don't need a live remote OPC UA
/// server. Wire-level round-trips against a local in-process server fixture land in a
/// follow-up PR once we have one scaffolded.
/// </summary>
[Trait("Category", "Unit")]
public sealed class OpcUaClientReadWriteTests
{
[Fact]
public async Task ReadAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.ReadAsync(["ns=2;s=Demo"], TestContext.Current.CancellationToken));
}
[Fact]
public async Task WriteAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.WriteAsync(
[new WriteRequest("ns=2;s=Demo", 42)],
TestContext.Current.CancellationToken));
}
}

View File

@@ -0,0 +1,36 @@
using Shouldly;
using Xunit;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
/// <summary>
/// Scaffold tests for <see cref="SessionReconnectHandler"/> wiring. Wire-level
/// disconnect-reconnect-resume coverage against a live upstream server lands with the
/// in-process fixture — too much machinery for a unit-test-only lane.
/// </summary>
[Trait("Category", "Unit")]
public sealed class OpcUaClientReconnectTests
{
[Fact]
public void Default_ReconnectPeriod_matches_driver_specs_5_seconds()
{
new OpcUaClientDriverOptions().ReconnectPeriod.ShouldBe(TimeSpan.FromSeconds(5));
}
[Fact]
public void Options_ReconnectPeriod_is_configurable_for_aggressive_or_relaxed_retry()
{
var opts = new OpcUaClientDriverOptions { ReconnectPeriod = TimeSpan.FromMilliseconds(500) };
opts.ReconnectPeriod.ShouldBe(TimeSpan.FromMilliseconds(500));
}
[Fact]
public void Driver_starts_with_no_reconnect_handler_active_pre_init()
{
// The reconnect handler is lazy — spun up only when a bad keep-alive fires. Pre-init
// there's no session to reconnect, so the field must be null (indirectly verified by
// the lifecycle-shape test suite catching any accidental construction).
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-reconnect");
drv.GetHealth().State.ShouldBe(Core.Abstractions.DriverState.Unknown);
}
}

View File

@@ -0,0 +1,54 @@
using Opc.Ua;
using Shouldly;
using Xunit;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
[Trait("Category", "Unit")]
public sealed class OpcUaClientSecurityPolicyTests
{
[Theory]
[InlineData(OpcUaSecurityPolicy.None)]
[InlineData(OpcUaSecurityPolicy.Basic128Rsa15)]
[InlineData(OpcUaSecurityPolicy.Basic256)]
[InlineData(OpcUaSecurityPolicy.Basic256Sha256)]
[InlineData(OpcUaSecurityPolicy.Aes128_Sha256_RsaOaep)]
[InlineData(OpcUaSecurityPolicy.Aes256_Sha256_RsaPss)]
public void MapSecurityPolicy_returns_known_non_empty_uri_for_every_enum_value(OpcUaSecurityPolicy policy)
{
var uri = OpcUaClientDriver.MapSecurityPolicy(policy);
uri.ShouldNotBeNullOrEmpty();
// Each non-None URI contains the enum name, so a driver operator reading logs can
// correlate the URI back to the config value.
if (policy != OpcUaSecurityPolicy.None)
uri.ShouldContain(policy.ToString());
}
[Fact]
public void MapSecurityPolicy_None_matches_SDK_None_URI()
{
OpcUaClientDriver.MapSecurityPolicy(OpcUaSecurityPolicy.None)
.ShouldBe(SecurityPolicies.None);
}
[Fact]
public void MapSecurityPolicy_Basic256Sha256_matches_SDK_URI()
{
OpcUaClientDriver.MapSecurityPolicy(OpcUaSecurityPolicy.Basic256Sha256)
.ShouldBe(SecurityPolicies.Basic256Sha256);
}
[Fact]
public void MapSecurityPolicy_Aes256_Sha256_RsaPss_matches_SDK_URI()
{
OpcUaClientDriver.MapSecurityPolicy(OpcUaSecurityPolicy.Aes256_Sha256_RsaPss)
.ShouldBe(SecurityPolicies.Aes256_Sha256_RsaPss);
}
[Fact]
public void Every_enum_value_has_a_mapping()
{
foreach (OpcUaSecurityPolicy p in Enum.GetValues<OpcUaSecurityPolicy>())
Should.NotThrow(() => OpcUaClientDriver.MapSecurityPolicy(p));
}
}
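For reference, the mapping these tests triangulate, sketched against the OPC Foundation SDK's well-known policy-URI constants (the shipped `OpcUaClientDriver.MapSecurityPolicy` is authoritative):

// Sketch: enum-to-URI mapping via Opc.Ua.SecurityPolicies constants.
static string MapSecurityPolicy(OpcUaSecurityPolicy policy) => policy switch
{
    OpcUaSecurityPolicy.None => SecurityPolicies.None,
    OpcUaSecurityPolicy.Basic128Rsa15 => SecurityPolicies.Basic128Rsa15,
    OpcUaSecurityPolicy.Basic256 => SecurityPolicies.Basic256,
    OpcUaSecurityPolicy.Basic256Sha256 => SecurityPolicies.Basic256Sha256,
    OpcUaSecurityPolicy.Aes128_Sha256_RsaOaep => SecurityPolicies.Aes128_Sha256_RsaOaep,
    OpcUaSecurityPolicy.Aes256_Sha256_RsaPss => SecurityPolicies.Aes256_Sha256_RsaPss,
    _ => throw new ArgumentOutOfRangeException(nameof(policy)), // future additions must map explicitly
};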

View File

@@ -0,0 +1,50 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
/// <summary>
/// Scaffold tests for <c>ISubscribable</c> + <c>IHostConnectivityProbe</c> that don't
/// need a live remote server. Live-session tests (subscribe/unsubscribe round-trip,
/// keep-alive transitions) land in a follow-up PR once the in-process OPC UA server
/// fixture is scaffolded.
/// </summary>
[Trait("Category", "Unit")]
public sealed class OpcUaClientSubscribeAndProbeTests
{
[Fact]
public async Task SubscribeAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-sub-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.SubscribeAsync(["ns=2;s=Demo"], TimeSpan.FromMilliseconds(100), TestContext.Current.CancellationToken));
}
[Fact]
public async Task UnsubscribeAsync_with_unknown_handle_is_noop()
{
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-sub-unknown");
// UnsubscribeAsync returns cleanly for handles it doesn't recognise — protects against
// the caller's race with server-side cleanup after a session drop.
await drv.UnsubscribeAsync(new FakeHandle(), TestContext.Current.CancellationToken);
}
[Fact]
public void GetHostStatuses_returns_endpoint_url_row_pre_init()
{
using var drv = new OpcUaClientDriver(
new OpcUaClientDriverOptions { EndpointUrl = "opc.tcp://plc.example:4840" },
"opcua-hosts");
var rows = drv.GetHostStatuses();
rows.Count.ShouldBe(1);
rows[0].HostName.ShouldBe("opc.tcp://plc.example:4840",
"host identity mirrors the endpoint URL so the Admin /hosts dashboard can link back to the remote server");
rows[0].State.ShouldBe(HostState.Unknown);
}
private sealed class FakeHandle : ISubscriptionHandle
{
public string DiagnosticId => "fake";
}
}

View File

@@ -0,0 +1,31 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<IsPackable>false</IsPackable>
<IsTestProject>true</IsTestProject>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests</RootNamespace>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="xunit.v3" Version="1.1.0"/>
<PackageReference Include="Shouldly" Version="4.3.0"/>
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0"/>
<PackageReference Include="xunit.runner.visualstudio" Version="3.0.2">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient\ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.csproj"/>
</ItemGroup>
<ItemGroup>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
</ItemGroup>
</Project>

View File

@@ -0,0 +1,117 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.S7.Tests;
/// <summary>
/// Shape tests for <see cref="S7Driver"/>'s <see cref="ITagDiscovery"/>,
/// <see cref="ISubscribable"/>, and <see cref="IHostConnectivityProbe"/> surfaces that
/// don't need a live PLC. Wire-level polling round-trips and probe transitions land in a
/// follow-up PR once we have a mock S7 server.
/// </summary>
[Trait("Category", "Unit")]
public sealed class S7DiscoveryAndSubscribeTests
{
private sealed class RecordingAddressSpaceBuilder : IAddressSpaceBuilder
{
public readonly List<string> Folders = new();
public readonly List<(string Name, DriverAttributeInfo Attr)> Variables = new();
public IAddressSpaceBuilder Folder(string browseName, string displayName)
{
Folders.Add(browseName);
return this;
}
public IVariableHandle Variable(string browseName, string displayName, DriverAttributeInfo attributeInfo)
{
Variables.Add((browseName, attributeInfo));
return new StubHandle();
}
public void AddProperty(string browseName, DriverDataType dataType, object? value) { }
public void AttachAlarmCondition(IVariableHandle sourceVariable, string alarmName, DriverAttributeInfo alarmInfo) { }
private sealed class StubHandle : IVariableHandle
{
public string FullReference => "stub";
public IAlarmConditionSink MarkAsAlarmCondition(AlarmConditionInfo info)
=> throw new NotImplementedException("S7 driver never calls this — no alarm surfacing");
}
}
[Fact]
public async Task DiscoverAsync_projects_every_tag_into_the_address_space()
{
var opts = new S7DriverOptions
{
Host = "192.0.2.1",
Tags =
[
new("TempSetpoint", "DB1.DBW0", S7DataType.Int16, Writable: true),
new("FaultBit", "M0.0", S7DataType.Bool, Writable: false),
new("PIDOutput", "DB5.DBD12", S7DataType.Float32, Writable: true),
],
};
using var drv = new S7Driver(opts, "s7-disco");
var builder = new RecordingAddressSpaceBuilder();
await drv.DiscoverAsync(builder, TestContext.Current.CancellationToken);
builder.Folders.ShouldContain("S7");
builder.Variables.Count.ShouldBe(3);
builder.Variables[0].Name.ShouldBe("TempSetpoint");
builder.Variables[0].Attr.SecurityClass.ShouldBe(SecurityClassification.Operate, "writable tags get Operate security class");
builder.Variables[1].Attr.SecurityClass.ShouldBe(SecurityClassification.ViewOnly, "read-only tags get ViewOnly");
builder.Variables[2].Attr.DriverDataType.ShouldBe(DriverDataType.Float32);
}
[Fact]
public void GetHostStatuses_returns_one_row_with_host_port_identity_pre_init()
{
var opts = new S7DriverOptions { Host = "plc1.internal", Port = 102 };
using var drv = new S7Driver(opts, "s7-host");
var rows = drv.GetHostStatuses();
rows.Count.ShouldBe(1);
rows[0].HostName.ShouldBe("plc1.internal:102");
rows[0].State.ShouldBe(HostState.Unknown, "pre-init / pre-probe state is Unknown");
}
[Fact]
public async Task SubscribeAsync_returns_unique_handles_and_UnsubscribeAsync_accepts_them()
{
var opts = new S7DriverOptions { Host = "192.0.2.1" };
using var drv = new S7Driver(opts, "s7-sub");
// SubscribeAsync does not itself call ReadAsync (the poll task does), so this works
// even though the driver isn't initialized. The poll task catches the resulting
// InvalidOperationException and the loop quietly continues — same pattern as the
// Modbus driver's poll loop tolerating transient transport failures.
var h1 = await drv.SubscribeAsync(["T1"], TimeSpan.FromMilliseconds(200), TestContext.Current.CancellationToken);
var h2 = await drv.SubscribeAsync(["T2"], TimeSpan.FromMilliseconds(200), TestContext.Current.CancellationToken);
h1.DiagnosticId.ShouldStartWith("s7-sub-");
h2.DiagnosticId.ShouldStartWith("s7-sub-");
h1.DiagnosticId.ShouldNotBe(h2.DiagnosticId);
await drv.UnsubscribeAsync(h1, TestContext.Current.CancellationToken);
await drv.UnsubscribeAsync(h2, TestContext.Current.CancellationToken);
// UnsubscribeAsync with an unknown handle must be a no-op, not throw.
await drv.UnsubscribeAsync(h1, TestContext.Current.CancellationToken);
}
[Fact]
public async Task Subscribe_publishing_interval_is_floored_at_100ms()
{
var opts = new S7DriverOptions { Host = "192.0.2.1", Probe = new S7ProbeOptions { Enabled = false } };
using var drv = new S7Driver(opts, "s7-floor");
// 50 ms requested — the floor protects the S7 CPU from sub-scan polling that would
// just queue wire-side. Test that the subscription is accepted (the floor is applied
// internally; the floor value isn't exposed, so we're really just asserting that the
// driver doesn't reject small intervals).
var h = await drv.SubscribeAsync(["T"], TimeSpan.FromMilliseconds(50), TestContext.Current.CancellationToken);
h.ShouldNotBeNull();
await drv.UnsubscribeAsync(h, TestContext.Current.CancellationToken);
}
}

View File

@@ -0,0 +1,54 @@
using Shouldly;
using Xunit;
namespace ZB.MOM.WW.OtOpcUa.Driver.S7.Tests;
/// <summary>
/// Unit tests for <see cref="S7Driver"/>'s <c>IReadable</c>/<c>IWritable</c> surface
/// that don't require a live PLC — covers error paths (not-initialized, unknown tag,
/// read-only write rejection, unsupported data types). Wire-level round-trip tests
/// against a live S7 or a mock-server land in a follow-up PR since S7.Net doesn't ship
/// an in-process fake and an adequate mock is non-trivial.
/// </summary>
[Trait("Category", "Unit")]
public sealed class S7DriverReadWriteTests
{
[Fact]
public async Task Initialize_rejects_invalid_tag_address_and_fails_fast()
{
// Bad address at init time must throw; the alternative (deferring the parse to the
// first read) would surface the config bug as BadInternalError on every subsequent
// Read which is impossible for an operator to diagnose from the OPC UA client.
var opts = new S7DriverOptions
{
Host = "192.0.2.1", // reserved — will never complete TCP handshake
Timeout = TimeSpan.FromMilliseconds(250),
Tags = [new S7TagDefinition("BadTag", "NOT-AN-S7-ADDRESS", S7DataType.Int16)],
};
using var drv = new S7Driver(opts, "s7-bad-tag");
// Either the TCP connect fails first (Exception) or the parser fails (FormatException)
// — both are acceptable since both are init-time fail-fast. What matters is that we
// don't return a "healthy" driver with a latent bad tag.
await Should.ThrowAsync<Exception>(async () =>
await drv.InitializeAsync("{}", TestContext.Current.CancellationToken));
}
[Fact]
public async Task ReadAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new S7Driver(new S7DriverOptions { Host = "192.0.2.1" }, "s7-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.ReadAsync(["Any"], TestContext.Current.CancellationToken));
}
[Fact]
public async Task WriteAsync_without_initialize_throws_InvalidOperationException()
{
using var drv = new S7Driver(new S7DriverOptions { Host = "192.0.2.1" }, "s7-uninit");
await Should.ThrowAsync<InvalidOperationException>(async () =>
await drv.WriteAsync(
[new(FullReference: "Any", Value: (short)0)],
TestContext.Current.CancellationToken));
}
}