Joseph Doherty
fad04bbdf7
Phase 3 PR 75 -- OPC UA Client IAlarmSource (A&C event forwarding + Acknowledge). Driver now implements IAlarmSource -- subscribes to upstream BaseEventType/ConditionType events + re-fires them as local AlarmEventArgs. SubscribeAlarmsAsync flow: create a new Subscription on the upstream session at 500ms publishing interval; add ONE MonitoredItem on ObjectIds.Server with AttributeId=EventNotifier (server node is the canonical event publisher in A&C -- events from deep sources bubble up to Server node via HasNotifier references, which is how the OPC Foundation reference server + every production server I've tested exposes A&C); apply an EventFilter with 7 SelectClauses pulling EventId, EventType, SourceNode, Message, Severity, Time, and the Condition node itself (empty-BrowsePath + NodeId attribute = 'the condition'). Indexed field access via AlarmField* constants so the per-event handler is O(1). Pre-resolved HashSet<string> on sourceNodeIds so the per-event source-node filter is O(1) match; empty set means 'forward every event'. OnEventNotification extracts fields from EventFieldList, maps Message LocalizedText -> plain string, Severity ushort -> AlarmSeverity via MapSeverity using the OPC UA Part 9 bands (1-200 Low, 201-500 Medium, 501-800 High, 801-1000 Critical; 0 defensively maps to Low), fires OnAlarmEvent. Queue size 1000 + DiscardOldest=false so bursts (e.g. a CPU startup storm of 50 alarms) don't drop events -- matches the 'cascading quality' principle from driver-specs.md \u00A78 where the driver must not silently lose upstream state. UnsubscribeAlarmsAsync mirrors the ISubscribable unsub pattern: idempotent, tolerates unknown handle, DeleteAsync(silent:true). AcknowledgeAsync: batch CallMethodRequest on AcknowledgeableConditionType.Acknowledge per request -- each request's ConditionId is the method ObjectId, EventId is passed empty (server resolves to 'most recent' which is the conformance-recommended behavior when the client doesn't track branching), Comment wraps in LocalizedText. Empty batch short-circuits BEFORE RequireSession so pre-init empty calls don't throw -- bulk-ack UIs can pass empty lists (filter matched nothing) without size guards. Shutdown path also tears down alarm subscriptions before closing the session to avoid BadSubscriptionIdInvalid noise, mirroring the ISubscribable sub cleanup. Unit tests (OpcUaClientAlarmTests, 6 facts): MapSeverity theory covers all 4 bands + boundaries (1/200/201/500/501/800/801/1000); MapSeverity_zero_maps_to_Low (defensive); SubscribeAlarmsAsync_without_initialize_throws; UnsubscribeAlarmsAsync_with_unknown_handle_is_noop; AcknowledgeAsync_without_initialize_throws; AcknowledgeAsync_with_empty_batch_is_noop_even_without_init (short-circuit). Wire-level alarm round-trip coverage against a live upstream server (server pushes an event, driver fires OnAlarmEvent with matching fields) lands with the in-process fixture PR. 67/67 OpcUaClient.Tests pass (54 prior + 13 new -- 6 alarm + 7 attribute mapping carry-over). dotnet build clean.
2026-04-19 02:09:04 -04:00
Joseph Doherty
ba3a5598e1
Phase 3 PR 74 -- OPC UA Client transparent reconnect via SessionReconnectHandler. Before this PR a session keep-alive failure flipped HostState to Stopped and stayed there until operator intervention. PR 74 wires the SDK's SessionReconnectHandler so the driver automatically retries + swaps in a new session when the upstream server comes back. New _reconnectHandler field lazily instantiated inside OnKeepAlive on a bad status; subsequent bad keep-alives during the same outage no-op (null-check prevents stacked handlers). Constructor uses (telemetry:null, reconnectAbort:false, maxReconnectPeriod:2min) -- reconnectAbort=false so the handler keeps trying across many retry cycles; 2min cap prevents pathological back-off from starving operator visibility. BeginReconnect takes the current ISession + ReconnectPeriod (from OpcUaClientDriverOptions, default 5s per driver-specs.md \u00A78) + our OnReconnectComplete callback. OnReconnectComplete reads handler.Session for the new session, unwires keepalive from the dead session, rewires to the new session (without this the NEXT drop wouldn't trigger another reconnect -- subtle and critical), swaps Session, disposes the handler. The SDK's Session.TransferSubscriptionsOnReconnect default=true handles subscription migration internally so local MonitoredItem handles stay live across the reconnect; no driver-side manual transfer needed. Shutdown path now aborts any in-flight reconnect via _reconnectHandler.CancelReconnect() + Dispose BEFORE touching Session.CloseAsync -- without this the handler's retry loop holds a reference to the about-to-close session and fights the close, producing BadSessionIdInvalid noise in the upstream log and potential disposal-race exceptions. Cancel-first is the documented SDK pattern. Kept the driver's own HostState/OnHostStatusChanged flow: bad keep-alive -> Stopped transition + reconnect kicks off; OnReconnectComplete -> Running transition + Healthy status. Downstream consumers see the bounce as Stopped->Running without needing to know about the reconnect handler internals. Unit tests (OpcUaClientReconnectTests, 3 facts): Default_ReconnectPeriod_matches_driver_specs_5_seconds (sanity check on the options default), Options_ReconnectPeriod_is_configurable_for_aggressive_or_relaxed_retry (500ms override works), Driver_starts_with_no_reconnect_handler_active_pre_init (lazy instantiation -- indirectly via lifecycle). Wire-level disconnect-reconnect-resume coverage against a live upstream server is deferred to the in-process-fixture PR -- testing the reconnect path needs a server we can kill + revive mid-test, non-trivial to scaffold in xUnit. 54/54 OpcUaClient.Tests pass (51 prior + 3 reconnect). dotnet build clean.
2026-04-19 02:04:42 -04:00
Joseph Doherty
28328def5d
Phase 3 PR 73 -- OPC UA Client browse enrichment (DataType + AccessLevel + ValueRank + Historizing). Before this PR discovered variables always registered with DriverDataType.Int32 + SecurityClassification.ViewOnly + IsArray=false as conservative placeholders -- correct wire-format NodeId but useless downstream metadata. PR 73 adds a two-pass browse. Pass 1 unchanged shape but now collects (ParentFolder, BrowseName, DisplayName, NodeId) tuples into a pendingVariables list instead of registering each variable inline; folders still register inline. Pass 2 calls Session.ReadAsync once with (variableCount * 4) ReadValueId entries reading DataType + ValueRank + UserAccessLevel + Historizing for every variable. Server-side chunking via the SDK keeps the request shape within the server's per-request limits automatically. Attribute mapping: MapUpstreamDataType maps every standard DataTypeIds.* to a DriverDataType -- Boolean, SByte+Byte widened to Int16 (DriverDataType has no 8-bit, flagged in comment for future Core.Abstractions widening), Int16/32/64, UInt16/32/64, Float->Float32, Double->Float64, String, DateTime+UtcTime->DateTime. Unknown/vendor-custom NodeIds fall back to String -- safest passthrough for Variant-wrapped structs/enums/extension objects since the cascading-quality path preserves upstream StatusCode+timestamps regardless. MapAccessLevelToSecurityClass reads AccessLevels.CurrentWrite bit (0x02) -- when set, the variable is writable-for-this-user so it surfaces as Operate; otherwise ViewOnly. Uses UserAccessLevel not AccessLevel because UserAccessLevel is post-ACL-filter -- reflects what THIS session can actually do, not the server's default. IsArray derived from ValueRank (-1 = scalar, 0 = 1-D array, 1+ = multi-dim). IsHistorized reflects the server's Historizing flag directly so PR 76's IHistoryProvider routing can gate on it. Graceful degradation: (a) individual attribute failures (Bad StatusCode on DataType read) fall through to the type defaults, variable still registers; (b) wholesale enrichment-read failure (e.g. session dropped mid-browse) catches the exception, registers every pending variable with fallback defaults via RegisterFallback, browse completes. Either way the downstream address space is never empty when browse succeeded the first pass -- partial metadata is strictly better than missing variables. Unit tests (OpcUaClientAttributeMappingTests, 20 facts): MapUpstreamDataType theory covers 11 standard types including Boolean/Int16/UInt16/Int32/UInt32/Int64/UInt64/Float/Double/String/DateTime; separate facts for SByte+Byte (widened to Int16), UtcTime (DateTime), custom NodeId (String fallback); MapAccessLevelToSecurityClass theory covers 6 access-level bitmasks including CurrentRead-only (ViewOnly), CurrentWrite-only (Operate), read+write (Operate), HistoryRead-only (ViewOnly -- no Write bit). 51/51 OpcUaClient.Tests pass (31 prior + 20 new). dotnet build clean. Pending variables structured as a private readonly record struct so the ref-type allocation is stack-local for typical browse sizes. Paves the way for PR 74 SessionReconnectHandler (same enrichment path is re-runnable on reconnect) + PR 76 IHistoryProvider (gates on IsHistorized).
2026-04-19 02:00:31 -04:00
Joseph Doherty
24435712c4
Phase 3 PR 72 -- Multi-endpoint failover for OPC UA Client driver. Adds OpcUaClientDriverOptions.EndpointUrls ordered list + PerEndpointConnectTimeout knob. On InitializeAsync the driver walks the candidate list in order via ResolveEndpointCandidates and returns the session from the first endpoint that successfully connects. Captures per-URL failure reasons in a List<string> and, if every candidate fails, throws AggregateException whose message names every URL + its failure class (e.g. 'opc.tcp://primary:4840 -> TimeoutException: ...'). That's critical diag for field debugging -- without it 'failover picked the wrong one' surfaces as a mystery. Single-URL backwards compat: EndpointUrl field retained as a one-URL shortcut. When EndpointUrls is null or empty the driver falls through to a single-candidate list of [EndpointUrl], so every existing single-endpoint config keeps working without migration. When both are provided, EndpointUrls wins + EndpointUrl is ignored -- documented on the field xml-doc. Per-endpoint connect budget: PerEndpointConnectTimeout (default 3s) caps each attempt so a sweep over several dead servers can't blow the overall init budget. Applied via CancellationTokenSource.CreateLinkedTokenSource + CancelAfter inside OpenSessionOnEndpointAsync (the extracted single-endpoint connect helper) so the cap is independent of the outer Options.Timeout which governs steady-state ops. BuildUserIdentity extracted out of InitializeAsync so the failover loop builds the UserIdentity ONCE and reuses it across every endpoint attempt -- generating it N times would re-unlock the user cert's private key N times, wasteful + keeps the password in memory longer. HostName now reflects the endpoint that actually connected via _connectedEndpointUrl instead of always returning opts.EndpointUrl -- so the Admin /hosts dashboard shows which of the configured endpoints is currently serving traffic (primary vs backup). Falls back to the first candidate pre-connect so the dashboard has a sensible identity before the first connect, and resets to null on ShutdownAsync. Use case: an OPC UA hot-standby server pair (primary 4840 + backup 4841) where either can serve the same address space. Operator configures EndpointUrls=[primary, backup]; driver tries primary first, falls over to backup on primary failure with a clean AggregateException describing both attempts if both are down. Unit tests (OpcUaClientFailoverTests, 5 facts): ResolveEndpointCandidates_prefers_EndpointUrls_when_provided (list trumps single), ResolveEndpointCandidates_falls_back_to_single_EndpointUrl_when_list_empty (legacy config compat), ResolveEndpointCandidates_empty_list_treated_as_fallback (explicit empty list also falls back -- otherwise we'd produce a zero-candidate sweep that throws with nothing tried), HostName_uses_first_candidate_before_connect (dashboard rendering pre-connect), Initialize_against_all_unreachable_endpoints_throws_AggregateException_listing_each (three loopback dead ports, asserts each URL appears in the aggregate message + driver flips to Faulted). 31/31 OpcUaClient.Tests pass. dotnet build clean. OPC UA Client driver security/auth/availability feature set now complete per driver-specs.md \u00A78: policy-filtered endpoint selection (PR 70), Anonymous+Username+Certificate auth (PR 71), multi-endpoint failover (this PR).
2026-04-19 01:52:31 -04:00
Joseph Doherty
a79c5f3008
Phase 3 PR 71 -- OpcUaAuthType.Certificate user authentication. Implements the third user-token type in the OPC UA spec (Anonymous + UserName + Certificate). Before this PR the Certificate branch threw NotSupportedException. Adds OpcUaClientDriverOptions.UserCertificatePath + UserCertificatePassword knobs for the PFX on disk. The InitializeAsync user-identity switch now calls BuildCertificateIdentity for AuthType=Certificate. Load path uses X509CertificateLoader.LoadPkcs12FromFile -- the non-obsolete .NET 9+ API; the legacy X509Certificate2 PFX ctors are deprecated on net10. Validation up-front: empty UserCertificatePath throws InvalidOperationException naming the missing field; non-existent file throws FileNotFoundException with path; private-key-missing throws InvalidOperationException explaining the private key is required to sign the OPC UA user-token challenge at session activation. Each failure mode is an operator-actionable config problem rather than a mysterious ServiceResultException during session open. UserIdentity(X509Certificate2) ctor carries the cert directly; the SDK sets TokenType=Certificate + wires the cert's public key into the activate-session payload. Private key stays in-memory on the OpenSSL / .NET crypto boundary. Unit tests (OpcUaClientCertAuthTests, 3 facts): BuildCertificateIdentity_rejects_missing_path (error message mentions UserCertificatePath so the fix is obvious); BuildCertificateIdentity_rejects_nonexistent_file (FileNotFoundException); BuildCertificateIdentity_loads_a_valid_PFX_with_private_key -- generates a self-signed RSA-2048 cert on the fly with CertificateRequest.CreateSelfSigned, exports to temp PFX with a password, loads it through the helper, asserts TokenType=Certificate. Test cleans up the temp file in a finally block (best-effort; Windows file locking can leave orphans which is acceptable for %TEMP%). Self-signed cert-on-the-fly avoids shipping a static test PFX that could be flagged by secret-scanners and keeps the test hermetic across dev boxes. 26/26 OpcUaClient.Tests pass (23 prior + 3 cert auth). dotnet build clean. Feature: Anonymous + Username + Certificate all work -- driver-specs.md \u00A78 auth story complete.
2026-04-19 01:47:18 -04:00
Joseph Doherty
a65215684c
Phase 3 PR 70 -- Apply SecurityPolicy explicitly + expand to standard OPC UA policy list. Before this PR SecurityPolicy was a string field that got ignored -- the driver only passed useSecurity=SecurityMode!=None to SelectEndpointAsync, so an operator asking for Basic256Sha256 on a server that also advertised Basic128Rsa15 could silently end up on the weaker cipher (the SDK's SelectEndpoint returns whichever matching endpoint the server listed first). PR 70 makes policy matching explicit. SecurityPolicy is now an OpcUaSecurityPolicy enum covering the six standard policies documented in OPC UA 1.04: None, Basic128Rsa15 (deprecated, brownfield interop only), Basic256 (deprecated), Basic256Sha256 (recommended baseline), Aes128_Sha256_RsaOaep, Aes256_Sha256_RsaPss. Each maps through MapSecurityPolicy to the SecurityPolicies URI constant the SDK uses for endpoint matching. New SelectMatchingEndpointAsync replaces CoreClientUtils.SelectEndpointAsync. Flow: opens a DiscoveryClient via the non-obsolete DiscoveryClient.CreateAsync(ApplicationConfiguration, Uri, DiagnosticsMasks, ct) path, calls GetEndpointsAsync to enumerate every endpoint the server advertises, filters client-side by policy URI AND mode. When no endpoint matches, throws InvalidOperationException with the full list of what the server DID advertise formatted as 'Policy/Mode' pairs so the operator sees exactly what to fix in their config without a Wireshark trace. Fail-loud behaviour intentional -- a silent fall-through to weaker crypto is worse than a clear config error. MapSecurityPolicy is internal-visible to tests via InternalsVisibleTo from PR 66. Unit tests (OpcUaClientSecurityPolicyTests, 5 facts): MapSecurityPolicy_returns_known_non_empty_uri_for_every_enum_value theory covers all 6 policies; URI contains the enum name for non-None so operators can grep logs back to the config value; MapSecurityPolicy_None_matches_SDK_None_URI, MapSecurityPolicy_Basic256Sha256_matches_SDK_URI, MapSecurityPolicy_Aes256_Sha256_RsaPss_matches_SDK_URI all cross-check against the SDK's SecurityPolicies.* constants to catch a future enum-vs-URI drift; Every_enum_value_has_a_mapping walks Enum.GetValues to ensure adding a new case doesn't silently fall through the switch. Scaffold test updated to assert SecurityPolicy default = None (was previously unchecked). 23/23 OpcUaClient.Tests pass (13 prior + 5 scaffold + 5 new policy). dotnet build clean. Note on DiscoveryClient: the synchronous DiscoveryClient.Create(...) overloads are all [Obsolete] in SDK 1.5.378; must use DiscoveryClient.CreateAsync. GetEndpointsAsync(null, ct) returns EndpointDescriptionCollection directly (not a wrapper).
2026-04-19 01:44:07 -04:00
Joseph Doherty
0433d3a35e
Phase 3 PR 69 -- OPC UA Client ISubscribable + IHostConnectivityProbe. Completes the OpcUaClientDriver capability surface — now matches the Galaxy + Modbus + S7 driver coverage. ISubscribable: SubscribeAsync creates a new upstream Subscription via the non-obsolete Subscription(ITelemetryContext, SubscriptionOptions) ctor + AddItem/CreateItemsAsync flow, wires each MonitoredItem's Notification event into OnDataChange. Tag strings round-trip through MonitoredItem.Handle so the notification handler can identify which tag changed without a second lookup. Publishing interval floored at 50ms (servers negotiate up anyway; sub-50ms wastes round-trip). SubscriptionOptions uses KeepAliveCount=10, LifetimeCount=1000, TimestampsToReturn=Both so SourceTimestamp passthrough for the cascading-quality rule works through subscription paths too. UnsubscribeAsync calls Subscription.DeleteAsync(silent:true) and tolerates unknown handles (returns cleanly) because the caller's race with server-side cleanup after a session drop shouldn't crash either side. Session shutdown explicitly deletes every remote subscription before closing — avoids BadSubscriptionIdInvalid noise in the upstream server's log on Close. IHostConnectivityProbe: HostName surfaced as the EndpointUrl (not host:port like the Modbus/S7 drivers) so the Admin /hosts dashboard can render the full opc.tcp:// URL as a clickable target back at the remote server. HostState tracked via session.KeepAlive event — OPC UA's built-in keep-alive is authoritative for session liveness (the SDK pings on KeepAliveInterval, sets KeepAliveStopped after N missed pings), strictly better than a driver-side polling probe: no extra wire round-trip, no duplicate semantic with the native protocol. Handler transitions Running on healthy keep-alives and Stopped on any Bad service-result. Initial Running raised at end of InitializeAsync once the session is up; Shutdown transitions back to Unknown + unwires the handler. Unit tests (OpcUaClientSubscribeAndProbeTests, 3 facts): SubscribeAsync_without_initialize_throws_InvalidOperationException, UnsubscribeAsync_with_unknown_handle_is_noop (session-drop-race safety), GetHostStatuses_returns_endpoint_url_row_pre_init (asserts EndpointUrl as the host identity -- the full opc.tcp://plc.example:4840 URL). Live-session subscribe/unsubscribe round-trip + keep-alive state transition coverage lands in a follow-up PR once we scaffold the in-process OPC UA server fixture. 13/13 OpcUaClient.Tests pass. dotnet build clean. All six capability interfaces (IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable / IHostConnectivityProbe) implemented — OPC UA Client driver surface complete.
2026-04-19 01:22:14 -04:00
Joseph Doherty
db56a95819
Phase 3 PR 68 -- OPC UA Client ITagDiscovery via recursive browse (Full strategy). Adds ITagDiscovery to OpcUaClientDriver. DiscoverAsync opens a single Remote folder on the IAddressSpaceBuilder and recursively browses from the configured root (default: ObjectsFolder i=85; override via OpcUaClientDriverOptions.BrowseRoot for scoped discovery). Browse uses non-obsolete Session.BrowseAsync(RequestHeader, ViewDescription, uint maxReferences, BrowseDescriptionCollection, ct) with HierarchicalReferences forward, subtypes included, NodeClassMask Object+Variable, ResultMask pulling BrowseName + DisplayName + NodeClass + TypeDefinition. Objects become sub-folders via builder.Folder; Variables become builder.Variable entries with FullName set to the NodeId.ToString() serialization so IReadable/IWritable can round-trip without re-resolving. Three safety caps added to OpcUaClientDriverOptions to bound runaway discovery: (1) MaxBrowseDepth default 10 -- deep enough for realistic OPC UA information models, shallow enough that cyclic graphs can't spin the browse forever. (2) MaxDiscoveredNodes default 10_000 -- caps memory on pathological remote servers. Once the cap is hit, recursion short-circuits and the partially-discovered tree is still projected into the local address space (graceful degradation rather than all-or-nothing). (3) BrowseRoot as an opt-in scope restriction string per driver-specs.md \u00A78 -- defaults to ObjectsFolder but operators with 100k-node servers can point it at a single subtree. Visited-set tracks NodeIds already visited to prevent infinite cycles on graphs with non-strict hierarchy (OPC UA models can have back-references). Transient browse failures on a subtree are swallowed -- the sub-branch stops but the rest of discovery continues, matching the Modbus driver's 'transient poll errors don't kill the loop' pattern. The driver's health surface reflects the network-level cascade via the probe loop (PR 69). Deferred to a follow-up PR: DataType resolution via a batch Session.ReadAsync(Attributes.DataType) after the browse so DriverAttributeInfo.DriverDataType is accurate instead of the current conservative DriverDataType.Int32 default; AccessLevel-derived SecurityClass instead of the current ViewOnly default; array-type detection via Attributes.ValueRank + ArrayDimensions. These need an extra wire round-trip per batch of variables + a NodeId -> DriverDataType mapping table; out of scope for PR 68 to keep browse path landable. Unit tests (OpcUaClientDiscoveryTests, 3 facts): DiscoverAsync_without_initialize_throws_InvalidOperationException (pre-init hits RequireSession); DiscoverAsync_rejects_null_builder (ArgumentNullException); Discovery_caps_are_sensible_defaults (asserts 10000 / 10 / null defaults documented above). NullAddressSpaceBuilder stub implements the full IAddressSpaceBuilder shape including IVariableHandle.MarkAsAlarmCondition (throws NotSupportedException since this PR doesn't wire alarms). Live-browse coverage against a real remote server is deferred to the in-process-server-fixture PR. 10/10 OpcUaClient.Tests pass. dotnet build clean.
2026-04-19 01:17:21 -04:00
Joseph Doherty
238748bc98
Phase 3 PR 67 -- OPC UA Client IReadable + IWritable via Session.ReadAsync/WriteAsync. Adds IReadable + IWritable capabilities to OpcUaClientDriver, routing reads/writes through the session's non-obsolete ReadAsync(RequestHeader, maxAge, TimestampsToReturn, ReadValueIdCollection, ct) and WriteAsync(RequestHeader, WriteValueCollection, ct) overloads (the sync and BeginXxx/EndXxx patterns are all [Obsolete] in SDK 1.5.378). Serializes on the shared Gate from PR 66 so reads + writes + future subscribe + probe don't race on the single session. NodeId parsing: fullReferences use OPC UA's standard serialized NodeId form -- ns=2;s=Demo.Counter, i=2253, ns=4;g=... for GUID, ns=3;b=... for opaque. TryParseNodeId calls NodeId.Parse with the session's MessageContext which honours the server-negotiated namespace URI table. Malformed input surfaces as BadNodeIdInvalid (0x80330000) WITHOUT a wire round-trip -- saves a request for a fault the driver can detect locally. Cascading-quality implementation per driver-specs.md \u00A78: upstream StatusCode, SourceTimestamp, and ServerTimestamp pass through VERBATIM. Bad codes from the remote server stay as the same Bad code (not translated to generic BadInternalError) so downstream clients can distinguish 'upstream value unavailable' from 'local driver bug'. SourceTimestamp is preserved verbatim (null on MinValue guard) so staleness is visible; ServerTimestamp falls back to DateTime.UtcNow if the upstream omitted it, never overwriting a non-zero value. Wire-level exceptions in the Read batch -- transport / timeout / session-dropped -- fan out BadCommunicationError (0x80050000) across every tag in the batch, not BadInternalError, so operators distinguish network reachability from driver faults. Write-side same pattern: successful WriteAsync maps each upstream StatusCode.Code verbatim into the local WriteResult.StatusCode; transport-layer failure fans out BadCommunicationError across the whole batch. WriteValue carries AttributeId=Value + DataValue wrapping Variant(writeValue) -- the SDK handles the type-to-Variant mapping for common CLR types (bool, int, float, string, etc.) so the driver doesn't need a per-type switch. Name disambiguation: the SDK has its own Opc.Ua.WriteRequest type which collides with ZB.MOM.WW.OtOpcUa.Core.Abstractions.WriteRequest; method signature uses the fully-qualified Core.Abstractions.WriteRequest. Unit tests (OpcUaClientReadWriteTests, 2 facts): ReadAsync_without_initialize_throws_InvalidOperationException + WriteAsync_without_initialize_throws_InvalidOperationException -- pre-init calls hit RequireSession and fail uniformly. Wire-level round-trip coverage against a live remote server lands in a follow-up PR once we scaffold an in-process OPC UA server fixture (the existing Server project in the solution is a candidate host). 7/7 OpcUaClient.Tests pass (5 scaffold + 2 read/write). dotnet build clean. Scope: ITagDiscovery (browse) + ISubscribable + IHostConnectivityProbe remain deferred to PRs 68-69 which also need namespace-index remapping and reference-counted MonitoredItem forwarding per driver-specs.md \u00A78.
2026-04-19 01:13:34 -04:00
Joseph Doherty
91eaf534c8
Phase 3 PR 66 -- OPC UA Client (gateway) driver project scaffold + IDriver session lifecycle. First driver that CONSUMES OPC UA rather than PUBLISHES it -- connects to a remote server and re-exposes its address space through the local OtOpcUa server per driver-specs.md \u00A78. Uses the same OPCFoundation.NetStandard.Opc.Ua.Client package the existing Client.Shared ships (bumped to 1.5.378.106 to match). Builds its own ApplicationConfiguration (cert stores under %LocalAppData%/OtOpcUa/pki so multiple driver instances in one OtOpcUa server process share a trust anchor) rather than reusing Client.Shared -- Client.Shared is oriented at the interactive CLI with different session-lifetime needs (this driver is always-on, needs keep-alive + session transfer on reconnect + multi-year uptime). Navigated the post-refactor 1.5.378 SDK surface: every Session.Create* static is now [Obsolete] in favour of DefaultSessionFactory; CoreClientUtils.SelectEndpoint got the sync overloads deprecated in favour of SelectEndpointAsync with a required ITelemetryContext parameter. Driver passes telemetry: null! to both SelectEndpointAsync + new DefaultSessionFactory(telemetry: null!) -- the SDK's internal default sink handles null gracefully and plumbing a telemetry context through the driver options surface is out of scope (the driver emits its own logs via the DriverHealth surface anyway). ApplicationInstance default ctor is also obsolete; wrapped in #pragma warning disable CS0618 rather than migrate to the ITelemetryContext overload for the same reason. OpcUaClientDriverOptions models driver-specs.md \u00A78 settings: EndpointUrl (default opc.tcp://localhost:4840 IANA-assigned port), SecurityPolicy/SecurityMode/AuthType enums, Username/Password, SessionTimeout=120s + KeepAliveInterval=5s + ReconnectPeriod=5s (defaults from spec), AutoAcceptCertificates=false (production default; dev turns on for self-signed servers), ApplicationUri + SessionName knobs for certificate SAN matching and remote-server session-list identification. OpcUaClientDriver : IDriver: InitializeAsync builds the ApplicationConfiguration, resolves + creates cert if missing via app.CheckApplicationInstanceCertificatesAsync, selects endpoint via CoreClientUtils.SelectEndpointAsync, builds UserIdentity (Anonymous or Username with UTF-8-encoded password bytes -- the legacy string-password ctor went away; Certificate auth deferred), creates session via DefaultSessionFactory.CreateAsync. Health transitions Unknown -> Initializing -> Healthy on success or -> Faulted on failure with best-effort Session.CloseAsync cleanup. ShutdownAsync (async now, not Task.CompletedTask) closes the session + disposes. Internal Session + Gate expose to the test project via InternalsVisibleTo so PRs 67-69 can stack read/write/discovery/subscribe on the same serialization. Scaffold tests (OpcUaClientDriverScaffoldTests, 5 facts): Default_options_target_standard_opcua_port_and_anonymous_auth (4840 + None mode + Anonymous + AutoAccept=false production default), Default_timeouts_match_driver_specs_section_8 (120s/5s/5s), Driver_reports_type_and_id_before_connect (DriverType=OpcUaClient, DriverInstanceId round-trip, pre-init Unknown health), Initialize_against_unreachable_endpoint_transitions_to_Faulted_and_throws, Reinitialize_against_unreachable_endpoint_re_throws. Uses opc.tcp://127.0.0.1:1 as the 'guaranteed-unreachable' target -- RFC 5737 reserved IPs get black-holed and time out only after the SDK's internal retry/backoff fully elapses (~60s), while port 1 on loopback refuses immediately with TCP RST which keeps the test suite snappy (5 tests / 8s). 5/5 pass. dotnet build clean. Scope boundary: ITagDiscovery / IReadable / IWritable / ISubscribable / IHostConnectivityProbe deliberately NOT in this PR -- they need browse + namespace remapping + reference-counted MonitoredItem forwarding + keep-alive probing and land in PRs 67-69.
2026-04-19 01:07:57 -04:00
Joseph Doherty
d8ef35d5bd
Phase 3 PR 65 -- S7 ITagDiscovery + ISubscribable polling overlay + IHostConnectivityProbe. Three more capability interfaces on S7Driver, matching the Modbus driver's capability coverage. ITagDiscovery: DiscoverAsync streams every configured tag into IAddressSpaceBuilder under a single 'S7' folder; builder.Variable gets a DriverAttributeInfo carrying DriverDataType (MapDataType: Bool->Boolean, Byte/Int/UInt sizes->Int32 (until Core.Abstractions adds widths), Float32/Float64 direct, String + DateTime direct), SecurityClass (Operate if tag.Writable else ViewOnly -- matches the Modbus pattern so DriverNodeManager's ACL layer can gate writes per role without S7-specific logic), IsHistorized=false (S7 has no native historian surface), IsAlarm=false (S7 alarms land through TIA Portal's alarm-in-DB pattern which is per-site and out of scope for PR 65). ISubscribable polling overlay: same pattern Modbus established in PR 22. SubscribeAsync spawns a Task.Run loop that polls every tag, diffs against LastValues, raises OnDataChange on changes plus a force-raise on initial-data push per OPC UA Part 4 convention. Interval floored at 100ms -- S7 CPUs scan 2-10ms but process the comms mailbox at most once per scan, so sub-scan polling just queues wire-side with worse latency per S7netplus documented pattern. Poll errors tolerated: first-read fault doesn't kill the loop (caller can't receive initial values but subsequent polls try again); transient poll errors also swallowed so the loop survives a power-cycle + reconnect through the health surface. UnsubscribeAsync cancels the CTS + removes the subscription -- unknown handle is a no-op, not a throw, because the caller's race with server-side cleanup shouldn't crash either side. Shutdown tears down every subscription before disposing the Plc. IHostConnectivityProbe: HostName surfaced as host:port to match Modbus driver convention (Admin /hosts dashboard renders both families uniformly). GetHostStatuses returns one row (single-endpoint driver). ProbeLoopAsync serializes on the shared Gate + calls Plc.ReadStatusAsync (cheap Get-CPU-Status PDU that doubles as an 'is PLC up' check) every Probe.Interval with a Probe.Timeout cap, transitions HostState Unknown/Stopped -> Running on success and -> Stopped on any failure, raises OnHostStatusChanged only on actual transitions (no noise for steady-state probes). Probe loop starts at end of InitializeAsync when Probe.Enabled=true (default); Shutdown cancels the probe CTS. Initial state stays Unknown until first successful probe -- avoids broadcasting a premature Running before any PDU round-trip has happened. Unit tests (S7DiscoveryAndSubscribeTests, 4 facts): DiscoverAsync_projects_every_tag_into_the_address_space (3 tags + mixed writable/read-only -> Operate vs ViewOnly asserted), GetHostStatuses_returns_one_row_with_host_port_identity_pre_init, SubscribeAsync_returns_unique_handles_and_UnsubscribeAsync_accepts_them (diagnosticId uniqueness + idempotent double-unsubscribe), Subscribe_publishing_interval_is_floored_at_100ms (accepts 50ms request without throwing -- floor is applied internally). Uses a RecordingAddressSpaceBuilder stub that implements IVariableHandle.FullReference + MarkAsAlarmCondition (throws NotImplementedException since the S7 driver never calls it -- alarms out of scope). 57/57 S7 unit tests pass. dotnet build clean. All 5 capability interfaces (IDriver/ITagDiscovery/IReadable/IWritable/ISubscribable/IHostConnectivityProbe) now implemented -- the S7 driver surface is on par with the Modbus driver, minus the extended data types (Int64/UInt64/Float64/String/DateTime deferred per PR 64).
2026-04-19 00:16:10 -04:00
Joseph Doherty
394d126b2e
Phase 3 PR 64 -- S7 IReadable + IWritable via S7.Net string-based Plc.ReadAsync/WriteAsync. Adds IReadable + IWritable capability interfaces to S7Driver, routing reads/writes through S7netplus's string-address API (Plc.ReadAsync(string, ct) / Plc.WriteAsync(string, object, ct)). All operations serialize on the class's SemaphoreSlim Gate because S7netplus mandates one Plc connection per PLC with client-side serialization -- parallel reads against a single S7 CPU queue wire-side anyway and just eat connection-resource budget. Supported data types in this PR: Bool, Byte, Int16, UInt16, Int32, UInt32, Float32. S7.Net's string-based read returns UNSIGNED boxed values (DBX=bool, DBB=byte, DBW=ushort, DBD=uint); the driver reinterprets them into the requested S7DataType via the (DataType, Size, raw) switch: unchecked short-cast for Int16, unchecked int-cast for Int32, BitConverter.UInt32BitsToSingle for Float32. Writes inverse the conversion -- Int16 -> unchecked ushort cast, Int32 -> unchecked uint cast, Float32 -> BitConverter.SingleToUInt32Bits -- before handing to S7.Net's WriteAsync. This avoids a second PLC round-trip that a typed ReadAsync(DataType, db, offset, VarType, ...) overload would need. Int64, UInt64, Float64, String, DateTime throw NotSupportedException (-> BadNotSupported StatusCode); S7 STRING has non-trivial header semantics + LReal/DateTime need typed S7.Net API paths, both land in a follow-up PR when scope demands. InitializeAsync now parses every tag's Address string via S7AddressParser at init time. Bad addresses throw FormatException and flip health to Faulted -- callers can't register a broken driver. The parsed form goes into _parsedByName so Read/Write can consult Size/BitOffset without re-parsing per operation. StatusCode mapping in catch chain: unknown tag name -> BadNodeIdUnknown (0x80340000), unsupported data type -> BadNotSupported (0x803D0000), read-only tag write attempt -> BadNotWritable (0x803B0000), S7.Net PlcException (carries PUT/GET-disabled signal on S7-1200/1500) -> BadDeviceFailure (0x80550000) so operators see a TIA-Portal config problem rather than a transient-fault false flag per driver-specs.md \u00A75, any other runtime exception on read -> BadCommunicationError (0x80050000) to distinguish socket/timeout from tag-level faults. Write generic-exception path stays BadInternalError because write failures can legitimately be driver-side value-range problems. Unit tests (S7DriverReadWriteTests, 3 facts): Initialize_rejects_invalid_tag_address_and_fails_fast -- Tags with a malformed address must throw at InitializeAsync rather than producing a half-healthy driver; ReadAsync_without_initialize_throws_InvalidOperationException + WriteAsync_without_initialize_throws_InvalidOperationException -- pre-init calls hit RequirePlc and throw the uniform 'not initialized' message. Wire-level round-trip coverage (integration test against a live S7-1500 or a mock S7 server) is deferred -- S7.Net doesn't ship an in-process fake and a conformant mock is non-trivial. 53/53 Modbus.Driver.S7.Tests pass (50 parser + 3 read/write). dotnet build clean.
2026-04-19 00:10:41 -04:00
Joseph Doherty
d5034c40f7
Phase 3 PR 63 -- S7AddressParser for DB/M/I/Q/T/C address strings. Adds S7AddressParser + S7ParsedAddress + S7Area + S7Size to the Driver.S7 project. Grammar follows driver-specs.md \u00A75 + Siemens TIA Portal / STEP 7 Classic convention: (1) Data blocks: DB{n}.DB{X|B|W|D}{offset}[.bit] where X=bit (requires .bit suffix 0-7), B=byte, W=word (16-bit), D=dword (32-bit). (2) Merkers: MB{n}, MW{n}, MD{n}, or M{n}.{bit} for bit access. (3) Inputs + Outputs: same {B|W|D} prefix or {n}.{bit} pattern as M. (4) Timers: T{n}. (5) Counters: C{n}. Output is an immutable S7ParsedAddress record struct with Area (DataBlock / Memory / Input / Output / Timer / Counter), DbNumber (only meaningful for DataBlock), Size (Bit / Byte / Word / DWord), ByteOffset (also timer/counter number when Area is Timer/Counter), BitOffset (0-7 for Size=Bit; 0 otherwise). Case-insensitive via ToUpperInvariant, whitespace trimmed on entry. Parse throws FormatException with the offending input echoed in the message; TryParse returns bool for config-validation callers that can't afford exceptions (e.g. Admin UI tag-editor live validation). Strict rejection policy -- 16 garbage cases covered in the theory test: empty/whitespace input, unknown area letter (Z0), DB without number/tail, DB bit size without .bit suffix, bit offset 8+, word/dword with .bit suffix, DB number 0 (must be >=1), non-numeric DB number, unknown size letter (Q), M without offset, M bit access without .bit, bit 8, negative offset, non-digit offset, non-numeric timer. Strict rejection surfaces config errors at driver-init time rather than as BadInternalError on every Read against the bad tag. No driver code wires through yet -- PR 64 is where IReadable/IWritable consume S7ParsedAddress and translate into S7netplus Plc.ReadAsync calls (the S7.Net address grammar is a strict subset of what we accept, and the parser's S7ParsedAddress is the bridge). Unit tests (S7AddressParserTests, 50 facts): parse-valid theories for DB/M/I/Q/T/C covering all size variants + edge bit offsets 0 and 7; case-insensitive + whitespace-trim theory; reject-invalid theory with 16 garbage cases; TryParse round-trip for valid and invalid inputs. 50/50 pass, dotnet build clean.
2026-04-19 00:06:24 -04:00
Joseph Doherty
0575280a3b
Phase 3 PR 62 -- Siemens S7 native driver project scaffold (S7comm via S7netplus). First non-Modbus in-process driver. Creates src/ZB.MOM.WW.OtOpcUa.Driver.S7 (.NET 10, x64 -- S7netplus is managed, no bitness constraint like MXAccess) + tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests + slnx entries. Depends on S7netplus 0.20.0 which is the latest version on NuGet resolvable in this cache (0.21.0 per driver-specs.md is not yet published; 0.20.0 covers the same Plc+CpuType+ReadAsync surface). S7DriverOptions captures the connection settings documented in driver-specs.md \u00A75: Host, Port (default 102 ISO-on-TCP), CpuType (default S71500 per most-common deployment), Rack=0, Slot=0 (S7-1200/1500 onboard PN convention; S7-300/400 operators must override to slot 2 or 3), Timeout=5s, Tags list + Probe settings with default MW0 probe address. S7TagDefinition uses S7.Net-style address strings (DB1.DBW0, M0.0, I0.0, QD4) with an S7DataType enum (Bool, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Float32, Float64, String, DateTime -- the full type matrix from the spec); StringLength defaults to 254 (S7 STRING max). S7Driver implements the IDriver-only subset per the PR plan: InitializeAsync opens a managed Plc with the configured CpuType + Host + Rack + Slot, pins WriteTimeout / ReadTimeout on the underlying TcpClient, awaits Plc.OpenAsync with a linked CTS bounded by Options.Timeout so the ISO handshake itself respects the configured bound; health transitions Unknown -> Initializing -> Healthy on success or Unknown -> Initializing -> Faulted on handshake failure, with a best-effort Plc.Close() on the faulted path so retries don't leak the TcpClient. ShutdownAsync closes the Plc and flips health back to Unknown. DisposeAsync routes through ShutdownAsync + disposes the SemaphoreSlim. Internal Gate + Plc accessors are exposed to the test project (InternalsVisibleTo) so PRs 63-65 can stack read/write/subscribe on the same serialization semaphore per the S7netplus documented 'one Plc per PLC, SemaphoreSlim-serialized' pattern. ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe are all deliberately omitted from this PR -- they depend on the S7AddressParser (PR 63) and land sequenced in PRs 64-65. Unit tests (S7DriverScaffoldTests, 5 facts): default options target S7-1500 / port 102 / slot 0, default probe interval 5s, tag defaults to writable with StringLength 254, driver reports DriverType=S7 + Unknown health pre-init, Initialize against RFC-5737 reserved IP 192.0.2.1 with 250ms timeout transitions to Faulted and throws (tests the connect-failure path doesn't leave the driver in an ambiguous state). 5/5 pass. dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors. No regression in Modbus / Galaxy suites. PR 63 ships S7AddressParser next, PR 64 wires IReadable/IWritable over S7netplus, PR 65 adds discovery + polling-overlay subscribe + probe.
2026-04-19 00:03:09 -04:00
Joseph Doherty
d4c1873998
Phase 3 PR 59 -- MelsecAddress helper for MELSEC X/Y hex-vs-octal family trap + D/M bank bases. Adds MelsecAddress static class with XInputToDiscrete, YOutputToCoil, MRelayToCoil, DRegisterToHolding helpers and a MelsecFamily enum {Q_L_iQR, F_iQF} that drives whether X/Y addresses are parsed as hex (Q-series convention) or octal (FX-series convention). This is the #1 MELSEC driver bug source per docs/v2/mitsubishi.md: the string 'X20' on a MELSEC-Q means DI 32 (hex 0x20) while the same string on an FX3U means DI 16 (octal 0o20). The helper forces the caller to name the family explicitly; no 'sensible default' because wrong defaults just move the bug. Key design decisions: (1) Family is an enum argument, not a helper-level static-selector, because real deployments have BOTH Q-series and FX-series PLCs on the same gateway -- one driver instance per device means family must be per-tag, not per-driver. (2) Bank base is a ushort argument defaulting to 0. Real QJ71MT91/LJ71MT91 assignment blocks commonly place X at DI 8192+, Y at coil 8192+, etc. to leave the low-address range for D-registers; the helper takes the site's configured base as runtime config rather than a compile-time constant. Matches the 'driver opt-in per tag' pattern DirectLogicAddress established for DL260. (3) M-relay and D-register are DECIMAL on every MELSEC family -- docs explicitly; the MELSEC confusion is only about X/Y, not about data registers or internal relays. Helpers reject non-numeric M/D addresses and honor bank bases the same way. (4) Parser walks digits manually for both hex and octal (instead of int.Parse with NumberStyles) so non-hex / non-octal characters give a clear ArgumentException with the offending char + family name. Prevents a subtle class of bugs where int.Parse('X20', Hex) silently returns 32 even for F_iQF callers. Unit tests (MelsecAddressTests, 34 facts): XInputToDiscrete_QLiQR_parses_hex theory (X0, X9, XA, XF, X10, X20, X1FF + lowercase); XInputToDiscrete_FiQF_parses_octal theory (X0, X7, X10, X20, X777); YOutputToCoil equivalents; Same_address_string_decodes_differently_between_families (the headline trap, X20 => 32 on Q vs 16 on FX); reject-non-octal / reject-non-hex / reject-empty / overflow facts; honors-bank-base for X and M and D. 176/176 Modbus.Tests pass (143 prior + 34 new Melsec). No driver core changes -- this is purely a new helper class in the Driver.Modbus project. PR 60 wires it into integration tests against the mitsubishi pymodbus profile.
2026-04-18 23:04:52 -04:00
Joseph Doherty
793c787315
Phase 3 PR 53 -- Transport reconnect-on-drop + SO_KEEPALIVE for DL205 no-keepalive quirk. AutomationDirect H2-ECOM100 does NOT send TCP keepalives per docs/v2/dl205.md behavioral-oddities section -- any NAT/firewall device between the gateway and the PLC can silently close an idle socket after 2-5 minutes of inactivity. The PLC itself never notices and the first SendAsync after the drop would previously surface as IOException / EndOfStreamException / SocketException to the caller even though the PLC is perfectly healthy. PR 53 makes ModbusTcpTransport survive mid-session socket drops: SendAsync wraps the previous body as SendOnceAsync; on the first attempt, if the failure is a socket-layer error (IOException, SocketException, EndOfStreamException, ObjectDisposedException) AND autoReconnect is enabled (default true), the transport tears down the dead socket, calls ConnectAsync to re-establish, and resends the PDU exactly once. Deliberately single-retry -- further failures propagate so the driver health surface reflects the real state, no masking a dead PLC. Protocol-layer failures (e.g. ModbusException with exception code 02) are specifically NOT caught by the reconnect path -- they would just come back with the same exception code after the reconnect, so retrying is wasted wire time. Socket-level vs protocol-level is a discriminator inside IsSocketLevelFailure. Also enables SO_KEEPALIVE on the TcpClient with aggressive timing: TcpKeepAliveTime=30s, TcpKeepAliveInterval=10s, TcpKeepAliveRetryCount=3. Total time-to-detect-dead-socket = 30 + 10*3 = 60s, vs the Windows default 2-hour idle + 9 retries = 2h40min. Best-effort: older OSes that don't expose the fine-grained keepalive knobs silently skip them (catch {}). New ModbusDriverOptions.AutoReconnect bool (default true) threads through to the default transport factory in ModbusDriver -- callers wanting the old 'fail loud on drop' behavior can set AutoReconnect=false, or use a custom transportFactory that ignores the option. Unit tests: ModbusTcpReconnectTests boots a FlakeyModbusServer in-process (real TcpListener on loopback) that serves one valid FC03 response then forcibly shuts down the socket. Transport_recovers_from_mid_session_drop_and_retries_successfully issues two consecutive SendAsync calls and asserts both return valid PDUs -- the second must trigger the reconnect path transparently. Transport_without_AutoReconnect_propagates_drop_to_caller asserts the legacy behavior when the opt-out is taken. Validates real socket semantics rather than mocked exceptions. 142/142 Modbus.Tests pass (113 prior + 2 mapper + 2 reconnect + 25 accumulated across PRs 45-52); 11/11 DL205 integration tests still pass with MODBUS_SIM_PROFILE=dl205 -- no regression from the transport change.
2026-04-18 22:32:13 -04:00
Joseph Doherty
cde018aec1
Phase 3 PR 52 -- Modbus exception-code -> OPC UA StatusCode translation. Before this PR every server-side Modbus exception AND every transport-layer failure collapsed to BadInternalError (0x80020000) in the driver's Read/Write results, making field diagnosis 'is this a tag misconfig or a driver bug?' impossible from the OPC UA client side. PR 52 adds a MapModbusExceptionToStatus helper that translates per spec: 01 Illegal Function -> BadNotSupported (0x803D0000); 02 Illegal Data Address -> BadOutOfRange (0x803C0000); 03 Illegal Data Value -> BadOutOfRange; 04 Server Failure -> BadDeviceFailure (0x80550000); 05/06 Acknowledge/Busy -> BadDeviceFailure; 0A/0B Gateway -> BadCommunicationError (0x80050000); unknown -> BadInternalError fallback. Non-Modbus failures (socket drop, timeout, malformed frame) in ReadAsync are now distinguished from tag-level faults: they map to BadCommunicationError so operators check network/PLC reachability rather than tag definitions. Why per-DL205: docs/v2/dl205.md documents DL205/DL260 returning only codes 01-04 with specific triggers -- exception 04 specifically means 'CPU in PROGRAM mode during a protected write', which is operator-recoverable by switching the CPU to RUN; surfacing it as BadDeviceFailure (not BadInternalError) makes the fix obvious. Changes in ModbusDriver: Read catch-chain now ModbusException first (-> mapper), generic Exception second (-> BadCommunicationError); Write catch-chain same pattern but generic Exception stays BadInternalError because write failures can legitimately come from EncodeRegister (out-of-range value) which is a driver-layer fault. Unit tests: MapModbusExceptionToStatus theory exercising every code in the table including the 0xFF fallback; Read_surface_exception_02_as_BadOutOfRange with an ExceptionRaisingTransport that forces code 02; Write_surface_exception_04_as_BadDeviceFailure for CPU-mode faults; Read_non_modbus_failure_maps_to_BadCommunicationError with a NonModbusFailureTransport that raises EndOfStreamException. 115/115 Modbus.Tests pass. Integration test: DL205ExceptionCodeTests.DL205_FC03_at_unmapped_register_returns_BadOutOfRange reads HR[16383] which is beyond the seeded uint16 cells on the dl205.json profile; pymodbus returns exception 02 and the driver surfaces BadOutOfRange. 11/11 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205.
2026-04-18 22:28:37 -04:00
Joseph Doherty
b5464f11ee
Phase 3 PR 50 -- DL260 bit-memory address helpers (Y/C/X/SP) + live coil integration tests. Adds four new static helpers to DirectLogicAddress covering every discrete-memory bank on the DL260: YOutputToCoil (Y0=coil 2048), CRelayToCoil (C0=coil 3072), XInputToDiscrete (X0=DI 0), SpecialToDiscrete (SP0=DI 1024). Each helper takes the DirectLOGIC ladder-logic address (e.g. 'Y0', 'Y17', 'C1777') and adds the octal-decoded offset to the bank's Modbus base per the DL260 user manual's I/O-configuration chapter table. Uses the same 'octal-walk + reject 8/9' pattern as UserVMemoryToPdu so misaligned addresses fail loudly with a clear ArgumentException rather than silently hitting the wrong coil. Fixes a pymodbus-config bug surfaced during integration-test validation: dl205.json had bits entries at cell indices 2048 / 3072 / 4000, but pymodbus's ModbusSimulatorContext.validate divides bit addresses by 16 before indexing into the shared cell array -- so Modbus coil 2048 reads cell 128, not cell 2048. The sim was returning Illegal Data Address (exception 02) for every bit read in the Y/C/scratch range. Moved bits entries to cells 128 (Y bank marker = 0b101 for Y0=ON, Y1=OFF, Y2=ON), 192 (C bank marker = 0b101 for C0/C1/C2), 250 (scratch cell covering coils 4000..4015). write list updated to the correct cell addresses. Unit tests: YOutputToCoil theory sweep (Y0->2048, Y1->2049, Y7->2055, Y10->2056 octal-to-decimal, Y17->2063, Y777->2559 top of DL260 Y range), CRelayToCoil theory (C0->3072 through C1777->4095), XInputToDiscrete theory, SpecialToDiscrete theory (with case-insensitive 'SP' prefix). Bit_address_rejects_non_octal_digits (Y8/C9/X18), Bit_address_rejects_empty, accepts_lowercase_prefix, accepts_bare_octal_without_prefix. 48/48 Modbus.Tests pass. Integration tests: DL205CoilMappingTests with three facts -- DL260_Y0_maps_to_coil_2048 (FC01 at Y0 returns ON), DL260_C0_maps_to_coil_3072 (FC01 at C0 returns ON), DL260_scratch_Crelay_supports_write_then_read (FC05 write + FC01 read round-trip at coil 4000 proves the DL-mapped coil bank is fully read/write capable end-to-end). 9/9 DL205 integration tests pass against the pymodbus dl205 profile with MODBUS_SIM_PROFILE=dl205. Caller opts into the helpers per tag the same way as PR 47's V-memory helper -- pass DirectLogicAddress.YOutputToCoil("Y0") as the ModbusTagDefinition Address; no driver-wide DL-family flag. PR 51 adds the X-input read-side integration test (there's nothing to write since X-inputs are FC02 discrete inputs, read-only); PR 52 exception-code translation; PR 53 transport reconnect-on-drop since DL260 doesn't send TCP keepalives.
2026-04-18 22:22:42 -04:00
Joseph Doherty
a3f2f95344
Phase 3 PR 49 -- Per-device FC03/FC16 register caps with auto-chunking. Adds MaxRegistersPerRead (default 125, spec max) + MaxRegistersPerWrite (default 123, spec max) to ModbusDriverOptions. Reads that exceed the cap automatically split into consecutive FC03 requests: the driver dispatches chunks of [cap] regs at incrementing addresses, copies each response into an assembled byte[] buffer, and hands the full payload to DecodeRegister. From the caller's view a 240-char string read against a cap-100 device is still one Read() call returning one string -- the chunking is invisible, the wire shows N requests of cap-sized quantity plus one tail chunk. Writes are NOT auto-chunked. Splitting an FC16 across two transactions would lose atomicity -- mid-split crash leaves half the value written, which is strictly worse than rejecting upfront. Instead, writes exceeding MaxRegistersPerWrite throw InvalidOperationException with a message naming the tag + cap + the caller's escape hatch (shorten StringLength or split into multiple tags). The driver catches the exception internally and surfaces it to IWritable as BadInternalError so the caller pattern stays symmetric with other failure modes. Per-family cap cheat-sheet (documented in xml-doc on the option): Modbus-TCP spec = 125 read / 123 write, AutomationDirect DL205/DL260 = 128 read / 100 write (128 exceeds spec byte-count capacity so in practice 125 is the working ceiling), Mitsubishi Q/FX3U = 64 / 64, Omron CJ/CS = 125 / 123. Not all PLCs reject over-cap requests cleanly -- some drop the connection silently -- so having the cap enforced client-side prevents the hard-to-diagnose 'driver just stopped' failure mode. Unit tests: Read_within_cap_issues_single_FC03_request (control: no unnecessary chunking), Read_above_cap_splits_into_two_FC03_requests (120 regs / cap 100 -> 100+20, asserts exact per-chunk (Address,Quantity) and end-to-end payload continuity starting with register[100] high byte = 'A'), Read_cap_honors_Mitsubishi_lower_cap_of_64 (100 regs / cap 64 -> 64+36), Write_exceeding_cap_throws_instead_of_splitting (110 regs / cap 100 -> status != 0 AND Fc16Requests.Count == 0 to prove nothing was sent), Write_within_cap_proceeds_normally (control: cap honored on short writes too). Tests use a new RecordingTransport that captures the (Address, Quantity) tuple of every FC03/FC16 request so the chunk layout is directly assertable -- the existing FakeTransport does not expose request history. 103/103 Modbus.Tests pass; 6/6 DL205 integration tests still pass against the live pymodbus dl205 profile with MODBUS_SIM_PROFILE=dl205.
2026-04-18 21:58:49 -04:00
Joseph Doherty
2b5222f5db
Phase 3 PR 47 -- DL205 V-memory octal-address helper. Adds DirectLogicAddress static class with two entry points: UserVMemoryToPdu(string) parses a DirectLOGIC V-address (V-prefixed or bare, whitespace tolerated) as OCTAL and returns the 0-based Modbus PDU address. V2000 octal = decimal 1024 = PDU 0x0400, which is the canonical start of the user V-memory bank on DL205/DL260. SystemVMemoryBasePdu + SystemVMemoryToPdu(ushort offset) handle the system bank (V40400 and up) which does NOT follow the simple octal-to-decimal formula -- the CPU relocates the system bank to PDU 0x2100 in H2-ECOM100 absolute mode. A naive caller converting 40400 octal would land at PDU 0x4100 (decimal 16640) and miss the system registers entirely; the helper routes the correct 0x2100 base. Why this matters: DirectLOGIC operators think in OCTAL (the ladder-logic editor, the Productivity/Do-more UI, every AutomationDirect manual addresses V-memory octally) while the Modbus wire is DECIMAL. Integrators routinely copy V-addresses from the PLC documentation into client configs and read garbage because they treated V2000 as decimal 2000 (HR[2000] = 0 in the dl205 sim, zero in most PLCs). The helper makes the translation explicit per the D2-USER-M appendix + H2-ECOM-M \u00A76.5 references cited in docs/v2/dl205.md. Unit tests: UserVMemoryToPdu_converts_octal_V_prefix (V0, V1, V7, V10, V2000, V7777, V10000, V17777 -- the exact sweep documented in dl205.md), UserVMemoryToPdu_accepts_bare_or_prefixed_or_padded (case + whitespace tolerance), UserVMemoryToPdu_rejects_non_octal_digits (V8/V19/V2009 must throw ArgumentException with 'octal' in the message -- .NET has no base-8 int.Parse so we hand-walk digits to catch 8/9 instead of silently accepting them), UserVMemoryToPdu_rejects_empty_input, UserVMemoryToPdu_overflow_rejected (200000 octal = 0x10000 overflows ushort), SystemVMemoryBasePdu_is_0x2100_for_V40400, SystemVMemoryToPdu_offsets_within_bank, SystemVMemoryToPdu_rejects_overflow. 23/23 Modbus.Tests pass. Integration tests against dl205.json pymodbus profile: DL205_V2000_user_memory_resolves_to_PDU_0x0400_marker (reads HR[0x0400]=0x2000), DL205_V40400_system_memory_resolves_to_PDU_0x2100_marker (reads HR[0x2100]=0x4040). 5/5 DL205 integration tests pass. Caller opts into the helper per tag by calling DirectLogicAddress.UserVMemoryToPdu("V2000") as the ModbusTagDefinition Address -- no driver-wide "DL205 mode" flag needed, because users mix DL and non-DL tags in a single driver instance all the time.
2026-04-18 21:49:58 -04:00
Joseph Doherty
8248b126ce
Phase 3 PR 46 -- DL205 BCD decoder (binary-coded-decimal numeric encoding). Adds ModbusDataType.Bcd16 and Bcd32 to the driver. Bcd16 is 1 register wide, Bcd32 is 2 registers wide; Bcd32 respects ModbusByteOrder (BigEndian/WordSwap) the same way Int32 does so the CDAB-style families (including DL205/DL260 themselves) can be configured. DecodeRegister uses the new internal DecodeBcd helper: walks each nibble from MSB to LSB, multiplies the running result by 10, adds the nibble as a decimal digit. Explicitly rejects nibbles > 9 with InvalidDataException -- hardware sometimes produces garbage during write-in-progress transitions and silently returning wrong numeric values would quietly corrupt the caller's data. EncodeRegister's new EncodeBcd inverts the operation (mod/div by 10 nibble-by-nibble) with an up-front overflow check against 10^nibbles-1. Why this matters for DL205/DL260: AutomationDirect DirectLOGIC uses BCD as the default numeric encoding for timers, counters, and operator-display numerics (not binary). A plain Int16 read of register 0x1234 returns 4660; the BCD path returns 1234. The two differ enough that silently defaulting to Int16 would give wildly wrong HMI values -- the caller must opt in to Bcd16/Bcd32 per tag. Unit tests: DecodeBcd (theory: 0,1,9,10,1234,9999), DecodeBcd_rejects_nibbles_above_nine, EncodeBcd (theory), Bcd16_decodes_DL205_register_1234_as_decimal_1234 (control: same bytes as Int16 decode to 4660), Bcd16_encode_round_trips_with_decode, Bcd16_encode_rejects_out_of_range_values, Bcd32_decodes_8_digits_big_endian, Bcd32_word_swap_handles_CDAB_layout, Bcd32_encode_round_trips_with_decode, Bcd_RegisterCount_matches_underlying_width. 66/66 Modbus.Tests pass. Integration test: DL205BcdQuirkTests.DL205_BCD16_decodes_HR1072_as_decimal_1234 against dl205.json pymodbus profile (HR[1072]=0x1234). Asserts Bcd16 decode=1234 AND Int16 decode=0x1234 on the same wire bytes to prove the paths are distinct. 3/3 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205.
2026-04-18 21:46:25 -04:00
Joseph Doherty
cd19022d19
Phase 3 PR 45 -- DL205 string byte-order quirk (low-byte-first ASCII packing). Adds ModbusStringByteOrder enum {HighByteFirst, LowByteFirst} + StringByteOrder field on ModbusTagDefinition (default HighByteFirst, the standard Modbus convention). DecodeRegister + EncodeRegister String branches now respect per-tag byte order. Under LowByteFirst each register packs the first char in the low byte instead of the high byte -- the AutomationDirect DirectLOGIC DL205/DL260/DL350 family's headline string quirk. Without the flag the driver decodes 'eHllo' garbage from HR[1040..1042] even though wire bytes are identical. Unit tests: String_LowByteFirst_decodes_DL205_packed_Hello (5 chars across 3 regs with nul pad), String_LowByteFirst_decode_truncates_at_first_nul, String_LowByteFirst_encode_round_trips_with_decode (asserts exact DL205-documented byte sequence {0x65,0x48,0x6C,0x6C,0x00,0x6F} + symmetric encode->decode), String_HighByteFirst_and_LowByteFirst_differ_on_same_wire (control: same wire, different flag => different decode). 56/56 Modbus.Tests pass. Integration test: DL205StringQuirkTests.DL205_string_low_byte_first_decodes_Hello_from_HR1040 against the dl205.json pymodbus profile; reads HR[1040..1042] with both flags on the same tag map and asserts LowByteFirst='Hello' + HighByteFirst!='Hello'. Gated on MODBUS_SIM_PROFILE=dl205 since the standard profile doesn't seed HR[1040..1042]. Verified 2/2 integration tests pass against running pymodbus dl205 simulator. Baseline for PR 46 (BCD decoder), PR 47 (V-memory octal helper), PR 48 (CDAB float order), PR 49 (FC03/FC16 per-device caps) -- each lands its own DL205_<behavior> test class in tests/.../DL205/.
2026-04-18 21:43:32 -04:00
Joseph Doherty
02fccbc762
Phase 3 PR 43 — followup commit: validate pymodbus simulator end-to-end + fix three real bugs surfaced by running it. winget-installed Python 3.12.10 + pip-installed pymodbus[simulator]==3.13.0 on the dev box; both profiles boot cleanly, the integration-suite smoke test passes against either profile.
...
Three substantive issues caught + fixed during the validation pass:
1. pymodbus rejects unknown keys at device-list / setup level. My PR 43 commit had `_layout_note`, `_uint16_layout`, `_bits_layout`, `_write_note` device-level JSON-comment fields that crashed pymodbus startup with `INVALID key in setup`. Removed all device-level _* fields. Inline `_quirk` keys WITHIN individual register entries are tolerated by pymodbus 3.13.0 — kept those in dl205.json since they document the byte math per quirk and the README + git history aren't enough context for a hand-author reading raw integer values. Documented the constraint in the top-level _comment of each profile.
2. pymodbus rejects sweeping `write` ranges that include any cell not assigned a type. My initial standard.json had `write: [[0, 2047]]` but only seeded HR[0..31] + HR[100] + HR[200..209] + bits[1024..1109] — pymodbus blew up on cell 32 (gap between HR[31] and HR[100]). Fixed by listing per-block write ranges that exactly mirror the seeded ranges. Same fix in dl205.json (was `[[0, 16383]]`).
3. pymodbus simulator stores all 4 standard Modbus tables in ONE underlying cell array — each cell can only be typed once (BITS or UINT16, not both). My initial standard.json had `bits[0..31]` AND `uint16[0..31]` overlapping at the same addresses; pymodbus crashed with `ERROR "uint16" <Cell> used`. Fixed by relocating coils to address 1024+, well clear of the uint16 entries at 0..209. Documented the layout constraint in the standard.json top-level _comment.
Substantive driver bug fixed: ModbusTcpTransport.ConnectAsync was using `new TcpClient()` (default constructor — dual-stack, IPv6 first) then `ConnectAsync(host, port)` with the user's hostname. .NET's TcpClient default-resolves "localhost" to ::1 first, fails to connect to pymodbus (which binds 0.0.0.0 IPv4-only), and only then retries IPv4 — the failure surfaces as the entire ConnectAsync timeout (2s by default) before the IPv4 attempt even starts. PR 30's smoke test silently SKIPPED because the fixture's TCP probe hit the same dual-stack ordering and timed out. Both fixed: ModbusSimulatorFixture probe now resolves Dns.GetHostAddresses, prefers AddressFamily.InterNetwork, dials IPv4 explicitly. ModbusTcpTransport does the same — resolves first, prefers IPv4, falls back to whatever Dns returns (handles IPv6-only hosts in the future). This is a real production-readiness fix because most Modbus PLCs are IPv4-only — a generic dual-stack TcpClient would burn the entire connect timeout against any IPv4-only PLC, masquerading as a connection failure when the PLC is actually fine.
Smoke-test address shifted HR[100] -> HR[200]. Standard.json's HR[100] is the auto-incrementing register that drives subscribe-and-receive tests, so write-then-read against it would race the increment. HR[200] is the first cell of a writable scratch range present in BOTH simulator profiles. DL205Profile.cs xml-doc updated to explain the shift; tag name "DL205_Smoke_HReg100" -> "Smoke_HReg200" + smoke test references updated. dl205.json gains a matching scratch HR[200..209] range so the smoke test runs identically against either profile.
Validation matrix:
- standard.json boot: clean (TCP 5020 listening within ~3s of pymodbus.simulator launch).
- dl205.json boot: clean.
- pymodbus client direct FC06 to HR[200]=1234 + FC03 read: round-trip OK.
- raw-bytes PowerShell TcpClient FC06 + 12-byte response: matches FC06 spec (echo of address + value).
- DL205SmokeTest against standard.json: 1/1 pass (was failing as 'BadInternalError' due to the dual-stack timeout + tag-name typo — both fixed).
- DL205SmokeTest against dl205.json: 1/1 pass.
- Modbus.Tests Unit suite: 52/52 pass — dual-stack transport fix is non-breaking.
- Solution build clean.
Memory + future-PR setup: pymodbus install + activation pattern is now bullet-pointed at the top of Pymodbus/README.md so future PRs (the per-quirk DL205_<behavior> tests in PR 44+) don't have to repeat the trial-and-error of getting the simulator + integration tests cooperating. The three bugs above are documented inline in the JSON profiles + ModbusTcpTransport so they don't bite again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 21:14:02 -04:00
Joseph Doherty
52a29100b1
Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish). Wires the OPC UA HistoryRead service through CustomNodeManager2's four protected per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents — each dispatching to the driver's IHistoryProvider capability (PR 35 for ReadAtTime + ReadEvents on top of PR 19-era ReadRaw + ReadProcessed). Was the last missing piece of the end-to-end HistoryRead path: PR 10 + PR 11 shipped the Galaxy.Host IPC contracts, PR 35 surfaced them on IHistoryProvider + GalaxyProxyDriver, but no server-side handler bridged OPC UA HistoryRead service requests onto the capability interface. Now it does.
...
Per-kind override shape: each hook receives the pre-filtered nodesToProcess list (NodeHandles for nodes this manager claimed), iterates them, resolves handle.NodeId.Identifier to the driver-side full reference string, and dispatches to the right IHistoryProvider method. Write back into the outer results + errors slots at handle.Index (not the local loop counter — nodesToProcess is a filtered subset of nodesToRead, so indexing by the loop counter lands in the wrong slot for mixed-manager batches). WriteResult helper sets both results[i] AND errors[i]; this matters because MasterNodeManager merges them and leaving errors[i] at its default (BadHistoryOperationUnsupported) overrides a Good result with Unsupported on the wire — this was the subtle failure mode that masked a correctly-constructed HistoryData response during debugging. Failure-isolation per node: NotSupportedException from a driver that doesn't implement a particular HistoryProvider method translates to BadHistoryOperationUnsupported in that slot; generic exceptions log and surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch continues unconditionally.
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average / Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum. Null for anything else (e.g. TimeAverage, Interpolative) so the handler surfaces BadAggregateNotSupported at the batch level — per Part 13, one unsupported aggregate means the whole request fails since ReadProcessedDetails carries one aggregate list for all nodes. BuildHistoryData wraps driver DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject; BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the canonical BaseEventType field list (EventId, SourceName, Message, Severity, Time, ReceiveTime — the order OPC UA clients that didn't customize the SelectClause expect). ToDataValue preserves null SourceTimestamp (Galaxy historian rows often carry only ServerTimestamp) — synthesizing a SourceTimestamp would lie about actual sample time.
Two address-space changes were required to make the stack dispatch reach the per-kind hooks at all: (1) historized variables get AccessLevels.HistoryRead added to their AccessLevel byte — the base's early-gate check on (variable.AccessLevel & HistoryRead != 0) was rejecting requests before our override ever ran; (2) the driver-root folder gets EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it (the conventional pattern for alarm-history browse against a driver-owned object). Document the 'set both bits' requirement inline since it's not obvious from the surface API.
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (service-layer per-node result) collides with Core.Abstractions.HistoryReadResult (driver-side samples + continuation point) by type name; the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult' keeps the override signatures unambiguous and the test project applies the mirror pattern for its stub driver impl.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases): MapAggregate translates each supported aggregate NodeId via reflection-backed theory (guards against the stack renaming AggregateFunction_* constants); returns null for unsupported NodeIds (TimeAverage) and null input; BuildHistoryData wraps samples with correct DataValues + SourceTimestamp preservation; BuildHistoryEvent emits the 6-element BaseEventType field list in canonical order (regression guard for a future 'respect the client's SelectClauses' change); null SourceName / Message translate to empty-string Variants (nullable-Variant refactor trap); ToDataValue preserves StatusCode + both timestamps; ToDataValue leaves SourceTimestamp at default when the snapshot omits it. HistoryReadIntegrationTests (5 new Category=Integration): drives a real OPC UA client Session.HistoryRead against a fake HistoryDriver through the running server. Covers raw round-trip (verifies per-node DataValue ordering + values); processed with Average aggregate (captures the driver's received aggregate + interval, asserting MapAggregate routed correctly); unsupported aggregate (TimeAverage → BadAggregateNotSupported); at-time (forwards the per-timestamp list); events (BaseEventType field list shape, SelectClauses populated to satisfy the stack's filter validator). Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit deferred items: continuation-point plumbing (driver returns null today so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents (clients with custom field selections get the canonical BaseEventType layout today).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 17:50:23 -04:00
Joseph Doherty
bf329b05d8
Phase 3 PR 35 — IHistoryProvider gains ReadAtTimeAsync + ReadEventsAsync; GalaxyProxyDriver implements both. Extends Core.Abstractions.IHistoryProvider with two new methods that round out the OPC UA Part 11 HistoryRead surface (HistoryReadAtTime + HistoryReadEvents are the last two modes not covered by the PR 19-era ReadRawAsync + ReadProcessedAsync) and wires GalaxyProxyDriver to call the existing PR-10/PR-11 IPC contracts the Host already implements.
...
Interface additions use C# default interface implementations that throw NotSupportedException — existing IHistoryProvider implementations keep compiling, only drivers whose backend carries the relevant capability override. This matches the 'capabilities are optional per driver' design already used by IHistoryProvider.ReadProcessedAsync's docs (Modbus / OPC UA Client drivers never had an event historian and the default-throw path lets callers see BadHistoryOperationUnsupported naturally). New HistoricalEvent record models one historian row (EventId, SourceName, EventTimeUtc + ReceivedTimeUtc — process vs historian-persist timestamps, Message, Severity mapped to OPC UA's 1-1000 range); HistoricalEventsResult pairs the event list with a continuation-point token for future batching. Both live in Core.Abstractions so downstream (Proxy, Host, Server) reference a single domain shape — no Shared-contract leak into the driver-facing interface.
GalaxyProxyDriver.ReadAtTimeAsync maps the domain DateTime[] to Unix-ms longs, calls CallAsync on the existing MessageKind.HistoryReadAtTimeRequest, and trusts the Host's one-sample-per-requested-timestamp contract (the Host pads with bad-quality snapshots for timestamps it can't interpolate; re-aligning on the Proxy side would duplicate the Host's interpolation policy logic). ReadEventsAsync does the same for HistoryReadEventsRequest; ToHistoricalEvent translates GalaxyHistoricalEvent (MessagePack-annotated, Unix-ms) to the domain record, explicitly tagging DateTimeKind.Utc on both timestamp fields so downstream serializers (JSON, OPC UA types) don't apply an unexpected local-time offset.
Tests — HistoricalEventMappingTests (3 new Proxy.Tests unit cases): every field maps correctly from wire to domain; null SourceName and null DisplayText preserve through the mapping (system events without a source come out with null so callers can distinguish them from alarm events); both timestamps come out as DateTimeKind.Utc (regression guard against a future refactor using DateTime.FromFileTimeUtc or similar that defaults to Unspecified). Driver.Galaxy.Proxy.Tests Unit suite: 17 pass / 0 fail (14 prior + 3 new). Full solution build clean, 0 errors.
Scope exclusions — DriverNodeManager HistoryRead service-handler wiring (on the OPC UA Server side, where HistoryReadAtTime and HistoryReadEvents service requests land) and the full-loop integration test (OPC UA client → server → IPC → Host → HistorianDataSource → back) are deferred to a focused follow-up PR. The capability surface is the load-bearing change; wiring the service handlers is mechanical in comparison and worth its own PR for reviewability. docs/v2/lmx-followups.md #1 updated with the split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 16:08:27 -04:00
Joseph Doherty
ef2a810b2d
Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
...
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record).
Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 15:51:55 -04:00
Joseph Doherty
8464e3f376
Phase 3 PR 33 — DriverHostStatus entity + EF migration (data-layer for LMX #7 ). New DriverHostStatus entity with composite key (NodeId, DriverInstanceId, HostName) persists each server node's per-host connectivity view — one row per (server node, driver instance, probe-reported host), which means a redundant 2-node cluster with one Galaxy driver reporting 3 platforms produces 6 rows because each server node owns its own runtime view of the shared host topology, not 3. Fields: NodeId (64), DriverInstanceId (64), HostName (256 — fits Galaxy FQDNs and Modbus host:port strings), State (DriverHostState enum — Unknown/Running/Stopped/Faulted, persisted as nvarchar(16) via HasConversion<string> so DBAs inspecting the table see readable state names not ordinals), StateChangedUtc + LastSeenUtc (datetime2(3) — StateChangedUtc tracks actual transitions while LastSeenUtc advances on every publisher heartbeat so the Admin UI can flag stale rows from a crashed Server independent of State), Detail (nullable 1024 — exception message from the driver's probe when Faulted, null otherwise).
...
DriverHostState enum lives in Configuration.Enums/ rather than reusing Core.Abstractions.HostState so the Configuration project stays free of driver-runtime dependencies (it's referenced by both the Admin process and the Server process, so pulling in the driver-abstractions assembly to every Admin build would be unnecessary weight). The server-side publisher hosted service (follow-up PR 34) will translate HostStatusChangedEventArgs.NewState to this enum on every transition.
No foreign key to ClusterNode — a Server may start reporting host status before its ClusterNode row exists (first-boot bootstrap), and we'd rather keep the status row than drop it. The Admin-side service that renders the dashboard will left-join on NodeId when presenting. Two indexes declared: IX_DriverHostStatus_Node drives the per-cluster drill-down (Admin UI joins ClusterNode on ClusterId to pick which NodeIds to fetch), IX_DriverHostStatus_LastSeen drives the stale-row query (now - LastSeen > threshold).
EF migration AddDriverHostStatus creates the table + PK + both indexes. Model snapshot updated. SchemaComplianceTests expected-tables list extended. DriverHostStatusTests (3 new cases, category SchemaCompliance, uses the shared fixture DB): composite key allows same (host, driver) across different nodes AND same (node, host) across different drivers — both real-world cases the publisher needs to support; upsert-in-place pattern (fetch-by-composite-PK, mutate, save) produces one row not two — the pattern the publisher will use; State enum persists as string not int — reading the DB via ADO.NET returns 'Faulted' not '3'.
Configuration.Tests SchemaCompliance suite: 10 pass / 0 fail (7 prior + 3 new). Configuration build clean. No Server or Admin code changes yet — publisher + /hosts page are PR 34.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 15:38:41 -04:00
Joseph Doherty
4886a5783f
Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
...
Active Directory compatibility. LdapOptions xml-doc expanded with a cheat-sheet covering Server (DC FQDN), Port 389 vs 636, UseTls=true under AD LDAP-signing enforcement, dedicated read-only service account DN, sAMAccountName vs userPrincipalName vs cn trade-offs, memberOf DN shape (CN=Group,OU=...,DC=... with the CN= RDN stripped to become the GroupToRole key), and the explicit 'nested groups NOT expanded' call-out (LDAP_MATCHING_RULE_IN_CHAIN / tokenGroups is a future authenticator enhancement, not a config change). docs/security.md §'Active Directory configuration' adds a complete appsettings.json snippet with realistic AD group names (OPCUA-Operators → WriteOperate, OPCUA-Engineers → WriteConfigure, OPCUA-AlarmAck → AlarmAck, OPCUA-Tuners → WriteTune), LDAPS port 636, TLS on, insecure-LDAP off, and operator-facing notes on each field. LdapUserAuthenticatorAdCompatTests (5 unit guards): ExtractFirstRdnValue parses AD-style 'CN=OPCUA-Operators,OU=...,DC=...' DNs correctly (case-preserving — operators' GroupToRole keys stay readable); also handles mixed case and spaces in group names ('Domain Users'); also works against the OpenLDAP ou=<group>,ou=groups shape (GLAuth) so one extractor tolerates both memberOf formats common in the field; EscapeLdapFilter escapes the RFC 4515 injection set (\, *, (, ), \0) so a malicious login like 'admin)(cn=*' can't break out of the filter; default UserNameAttribute regression guard.
Test posture — Server.Tests Unit: 43 pass / 0 fail (38 prior + 5 new AD-compat guards). Server.Tests LiveLdap category: 6 pass / 0 fail against running GLAuth (would skip cleanly without). Server build clean, 0 errors, 0 warnings.
Deferred: the session-identity end-to-end check (drive a full OPC UA UserName session, then read a 'whoami' node to verify the role landed on RoleBasedIdentity). That needs a test-only address-space node and is scoped for a separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 15:23:22 -04:00
Joseph Doherty
d5fa1f450e
Phase 3 PR 29 — Account/session page expanding the minimal sidebar role display into a dedicated /account route. Shows the authenticated operator's identity (username from ClaimTypes.NameIdentifier, display name from ClaimTypes.Name), their Admin roles as badges (from ClaimTypes.Role), the raw LDAP groups that mapped to those roles (from the 'ldap_group' claim added by Login.razor at sign-in), and a capability table listing each Admin capability with its required role and a Yes/No badge showing whether this session has it. Capability list mirrors the Program.cs authorization policies + each page's [Authorize] attribute so operators can self-service check whether their session has access without trial-and-error navigation — capabilities covered: view clusters + fleet status (all roles), edit configuration drafts (ConfigEditor or FleetAdmin per CanEdit policy), publish generations (FleetAdmin per CanPublish policy), manage certificate trust (FleetAdmin per PR 28 Certificates page attribute), manage external-ID reservations (ConfigEditor or FleetAdmin per Reservations page attribute).
...
Sidebar's 'Signed in as' line now wraps the display name in a link to /account so the existing sidebar-compact view becomes the entry point for the fuller page — keeps the sign-out button where it was for muscle memory, just adds the detail page one click away. Page is gated with [Authorize] (any authenticated admin) rather than a specific role — the capability table deliberately works for every signed-in user so they can see what they DON'T have access to, which helps them file the right ticket with their LDAP admin instead of getting a plain Access Denied when navigating blindly.
Capability → required-role table is defined as a private readonly record list in the page rather than pulled from a service because it's a UI-presentation concern, not runtime policy state — the runtime policy IS Program.cs's AddAuthorizationBuilder + each page's [Authorize] attribute, and this table just mirrors it for operator readability. Comment on the list reminds future-me to extend it when a new policy or [Authorize] page lands. No behavior change if roles are empty, but the page surfaces a hint ('Sign-in would have been blocked, so if you're seeing this, the session claim is likely stale') that nudges the operator toward signing out + back in.
No new tests added — the page is pure display over claims; its only logic is the 'has-capability' Any-overlap check which is exactly what ASP.NET's [Authorize(Roles=...)] does in-framework, and duplicating that in a unit test would test the framework rather than our code. Admin.Tests Unit stays 23 pass / 0 fail. Admin build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 14:43:35 -04:00
Joseph Doherty
ed88835d34
Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
...
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 14:37:55 -04:00
Joseph Doherty
b5f8661e98
Phase 3 PR 27 — Fleet status dashboard page. New /fleet route shows per-node apply state (ClusterNodeGenerationState joined with ClusterNode for the ClusterId) in a sortable table with summary cards for Total / Applied / Stale / Failed node counts. Stale detection: LastSeenAt older than 30s triggers a table-warning row class + yellow count card. Failed rows get table-danger + red card. Badge classes per LastAppliedStatus: Applied=bg-success, Failed=bg-danger, Applying=bg-info, unknown=bg-secondary. Timestamps rendered as relative-age strings ('42s ago', '15m ago', '3h ago', then absolute date for >24h). Error column is truncated to 320px with the full message in a tooltip so the table stays readable on wide fleets. Initial data load on OnInitializedAsync; auto-refresh every 5s via a Timer that calls InvokeAsync(RefreshAsync) — matches the FleetStatusPoller's 5s cadence so the dashboard sees the most recent state without polling ahead of the broadcaster. A Refresh button also kicks a manual reload; _refreshing gate prevents double-runs when the timer fires during an in-flight query. IServiceScopeFactory (matches FleetStatusPoller's pattern) creates a fresh DI scope per refresh so the per-page DbContext can't race the timer with the render thread; no new DI registrations needed. Live SignalR hub push is deliberately deferred to a follow-up PR — the existing FleetStatusHub + NodeStateChangedMessage already works for external JavaScript clients; wiring an in-process Blazor Server consumer adds HubConnectionBuilder plumbing that's worth its own focused change. Sidebar link added to MainLayout between Overview and Clusters. Full Admin.Tests Unit suite 14 pass / 0 fail — unchanged, no tests regressed. Full Admin build clean (0 errors, 0 warnings). Closes the 'no per-driver dashboard' gap from lmx-followups item #7 at the fleet level; per-host (platform/engine/Modbus PLC) granularity still needs a dedicated page that consumes IHostConnectivityProbe.GetHostStatuses from the Server process — that's the live-SignalR follow-up.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 13:12:31 -04:00
Joseph Doherty
6b04a85f86
Phase 3 PR 26 — server-layer write authorization gating by role. Per the user's ACL-at-server-layer directive (saved as feedback_acl_at_server_layer.md in memory), write authorization is enforced in DriverNodeManager.OnWriteValue and never delegated to the driver or to driver-specific auth (the v1 Galaxy-provided security path is explicitly not part of v2 — drivers report SecurityClassification as discovery metadata only). New WriteAuthzPolicy static class in Server/Security/ maps SecurityClassification → required role per the table documented in docs/Configuration.md: FreeAccess = no role required (anonymous sessions can write), Operate + SecuredWrite = WriteOperate, Tune = WriteTune, VerifiedWrite + Configure = WriteConfigure, ViewOnly = deny regardless of roles. Role matching is case-insensitive and role requirements do NOT cascade — a session with WriteConfigure can write Configure attributes but needs WriteOperate separately to write Operate attributes; this is deliberate so escalation is an explicit LDAP group assignment, not a hierarchy the policy silently grants. DriverNodeManager gains a _securityByFullRef Dictionary populated during Variable() registration (parallel to the existing _variablesByFullRef) so OnWriteValue can look up the classification in O(1) on the hot path. OnWriteValue casts the session's context.UserIdentity to the new IRoleBearer interface (implemented by OtOpcUaServer.RoleBasedIdentity from PR 19) — empty Roles collection when the session is anonymous; the same WriteAuthzPolicy.IsAllowed check then either short-circuits true (FreeAccess), false (ViewOnly), or walks the roles list looking for the required one. On deny, OnWriteValue logs 'Write denied for {FullRef}: classification=X userRoles=[...]' at Information level (readable trail for operator complaints) and returns BadUserAccessDenied without touching IWritable.WriteAsync — drivers never see a request we'd have refused. IRoleBearer kept as a minimal server-side interface rather than reusing some abstraction from Core.Abstractions because the concept is OPC-UA-session-scoped and doesn't generalize (the driver side has no notion of a user session). Tests — WriteAuthzPolicyTests (17 new cases): FreeAccess allows write with empty role set + arbitrary roles; ViewOnly denies write even with every role; Operate requires WriteOperate; role match is case-insensitive; Operate denies empty role set + wrong role; SecuredWrite shares Operate's requirement; Tune requires WriteTune; Tune denies WriteOperate-only (asserts roles don't cascade — this is the test that catches a future regression where someone 'helpfully' adds a role-escalation table); Configure requires WriteConfigure; VerifiedWrite shares Configure's requirement; multi-role session allowed when any role matches; unrelated roles denied; RequiredRole theory covering all 5 classified-and-mapped rows + null for FreeAccess/ViewOnly special cases. lmx-followups.md follow-up #2 marked DONE with a back-reference to this PR and the memory note. Full Server.Tests Unit suite: 38 pass / 0 fail (17 new WriteAuthz + 14 SecurityConfiguration from PR 19 + 2 NodeBootstrap + 5 others). Server.Tests Integration (Category=Integration) 2 pass — existing PR 17 anonymous-endpoint smoke tests stay green since the read path doesn't hit OnWriteValue.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 13:01:01 -04:00
Joseph Doherty
eea31dcc4e
Phase 3 PR 24 — Modbus PLC data type extensions. Extends ModbusDataType beyond the textbook Int16/UInt16/Int32/UInt32/Float32 set with Int64/UInt64/Float64 (4-register types), BitInRegister (single bit within a holding register, BitIndex 0-15 LSB-first), and String (ASCII packed 2 chars per register with StringLength-driven sizing). Adds ModbusByteOrder enum on ModbusTagDefinition covering the two word-orderings that matter in the real PLC population: BigEndian (ABCD — Modbus TCP standard, Schneider PLCs that follow it strictly) and WordSwap (CDAB — Siemens S7 family, several Allen-Bradley series, some Modicon families). NormalizeWordOrder helper reverses word pairs in-place for 32-bit values and reverses all four words for 64-bit values (keeps bytes big-endian within each register, which is universal; swaps only the word positions). Internal codec surface switched from (bytes, ModbusDataType) pairs to (bytes, ModbusTagDefinition) because the tag carries the ByteOrder + BitIndex + StringLength context the codec needs; RegisterCount similarly takes the tag so strings can compute ceil(StringLength/2). DriverDataType mapping in MapDataType extended to cover the new logical types — Int64/UInt64 widen to Int32 (PR 25 follow-up: extend DriverDataType enum with Int64 to avoid precision loss), Float64 maps to DriverDataType.Float64, String maps to DriverDataType.String, BitInRegister surfaces as Boolean, all other mappings preserved. BitInRegister writes throw a deliberate InvalidOperationException with a 'read-modify-write' hint — to atomically flip a single bit the driver needs to FC03 the register, OR/AND in the bit, then FC06 it back; that's a separate PR because the bit-modify atomicity story needs a per-register mutex and optional compare-and-write semantics. Everything else (decoder paths for both byte orders, Int64/UInt64/Float64 encode + decode, bit-index extraction across both register halves, String nul-truncation on decode, String nul-padding on encode) ships here. Tests (21 new ModbusDataTypeTests): RegisterCount_returns_correct_register_count_per_type theory (10 rows covering every numeric type); RegisterCount_for_String_rounds_up_to_register_pair theory (5 rows including the 0-char edge case that returns 0 registers); Int32_BigEndian_decodes_ABCD_layout + Int32_WordSwap_decodes_CDAB_layout + Float32_WordSwap_encode_decode_roundtrips (covers the two most-common 32-bit orderings); Int64_BigEndian_roundtrips + UInt64_WordSwap_reverses_four_words (word-swap on 64-bit reverses the four-word layout explicitly, with the test computing the expected wire shape by hand rather than trusting the implementation) + Float64_roundtrips_under_word_swap (3.14159265358979 survives the round-trip with 1e-12 tolerance); BitInRegister_extracts_bit_at_index theory (6 rows including LSB, MSB, and arbitrary bits in a multi-bit mask); BitInRegister_write_is_not_supported_in_PR24 (asserts the exception message steers the reader to the 'read-modify-write' follow-up); String_decodes_ASCII_packed_two_chars_per_register (decodes 'HELLO!' from 3 packed registers with the 'HELLO!'u8 test-only UTF-8 literal which happens to equal the ASCII bytes for this ASCII input); String_decode_truncates_at_first_nul ('Hi' padded with nuls reads back as 'Hi'); String_encode_nul_pads_remaining_bytes (short input writes remaining bytes as 0). Full solution: 0 errors, 217 unit + integration tests pass (22 + 30 new Modbus = 52 Modbus total, 165 pre-existing). ModbusDriver capability footprint now matches the most common industrial PLC workloads — Siemens S7 + Allen-Bradley + Modicon all supported via ByteOrder config without driver forks.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 12:27:12 -04:00
Joseph Doherty
268b12edec
Phase 3 PR 23 — Modbus IHostConnectivityProbe. ModbusDriver now implements 6 of 8 capability interfaces (adds IHostConnectivityProbe alongside IDriver + ITagDiscovery + IReadable + IWritable + ISubscribable from the earlier PRs). Background probe loop kicks off in InitializeAsync when ModbusProbeOptions.Enabled is true, sends a cheap FC03 read-1-register at ProbeAddress (default 0) every Interval (default 5s) with a per-tick Timeout (default 2s), and tracks the single Modbus endpoint's state in the HostState machine. Initial state = Unknown; first successful probe transitions to Running; any transport/timeout failure transitions to Stopped; recovery transitions back to Running. OnHostStatusChanged fires exactly on transitions (not on repeat successes — prevents event-spam on a healthy connection). HostName format is 'host:port' so the Admin UI can display the endpoint uniformly with Galaxy platforms/engines in the fleet status dashboard. GetHostStatuses returns a single-item list with the current state + last-change timestamp (Modbus driver talks to exactly one endpoint per instance — operators spin up multiple driver instances for multi-PLC deployments). ShutdownAsync cancels the probe CTS before tearing down the transport so the loop can't log a spurious Stopped after intentional shutdown (OperationCanceledException caught separately from the 'real' transport errors). ModbusDriverOptions extended with ModbusProbeOptions sub-record (Enabled default true, Interval 5s, Timeout 2s, ProbeAddress ushort for PLCs that have register-0 policies; most PLCs tolerate an FC03 at 0 but some industrial gateways lock it). Tests (7 new ModbusProbeTests): Initial_state_is_Unknown_before_first_probe_tick (probe disabled, state stays Unknown, HostName formatted); First_successful_probe_transitions_to_Running (enabled, waits for probe count + event queue, asserts Unknown → Running with correct OldState/NewState); Transport_failure_transitions_to_Stopped (flip fake.Reachable = false mid-run, wait for state diff); Recovery_transitions_Stopped_back_to_Running (up → down → up, asserts ≥ 3 transitions); Repeated_successful_probes_do_not_generate_duplicate_Running_events (several hundred ms of stable probes, count stays at 1); Disabled_probe_stays_Unknown_and_fires_no_events (safety guard when operator wants to disable probing); Shutdown_stops_the_probe_loop (probe count captured at shutdown, delay 400ms, assert ≤ 1 extra to tolerate the narrow race where an in-flight tick completes after shutdown — the contract is 'no new ticks scheduled' not 'instantaneous freeze'). FlappyTransport fake exposes a volatile Reachable flag so tests can toggle the PLC availability mid-run, + ProbeCount counter so tests can assert the loop actually issued requests. WaitForStateAsync helper polls GetHostStatuses up to a deadline; tolerates scheduler jitter on slow CI runners. Full solution: 0 errors, 202 unit + integration tests pass (22 Modbus + 180 pre-existing).
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 12:12:00 -04:00
Joseph Doherty
18b3e24710
Phase 3 PR 22 — Modbus ISubscribable via polling overlay. Modbus has no push model at the wire protocol (unlike MXAccess's OnDataChange callback or OPC UA's own Subscription service), so the driver layers a per-subscription polling loop on top of the existing IReadable path: SubscribeAsync returns an opaque ModbusSubscriptionHandle, starts a background Task.Run that sleeps for the requested publishing interval (floored to 100ms so a misconfigured sub-10ms request doesn't hammer the PLC), reads every subscribed tag through the same FC01/03/04 path the one-shot ReadAsync uses, diffs the returned DataValueSnapshot against the last known value per tag, and raises OnDataChange exactly when (a) it's the first poll (initial-data push per OPC UA Part 4 convention) or (b) boxed value changed or (c) StatusCode changed — stable values don't generate event traffic after the initial push, matching the v1 MXAccess OnDataChange shape. SubscriptionState record holds the handle + tag list + interval + per-subscription CancellationTokenSource + ConcurrentDictionary<string,DataValueSnapshot> LastValues; UnsubscribeAsync removes the state from _subscriptions and cancels the CTS, stopping the polling loop cleanly. Multiple overlapping subscriptions each get their own polling Task so a slow PLC on one subscription can't stall the others. ShutdownAsync cancels every active subscription CTS before tearing down the transport so the driver doesn't leave orphaned polling tasks pumping requests against a disposed socket. Transient poll errors are swallowed inside the loop (the loop continues to the next tick) — the driver's health surface reflects the last-known Degraded state from the underlying ReadAsync path. OperationCanceledException is caught separately to exit the loop silently on unsubscribe/shutdown. Tests (6 new ModbusSubscriptionTests): Initial_poll_raises_OnDataChange_for_every_subscribed_tag asserts the initial-data push fires once per tag in the subscribe call (2 tags → 2 events with FullReference='Level' and FullReference='Temp'); Unchanged_values_do_not_raise_after_initial_poll lets the loop run ~5 cycles at 100ms with a stable register value, asserts only the initial push fires (no event spam on stable tags); Value_change_between_polls_raises_OnDataChange mutates the fake register bank between poll ticks and asserts a second event fires with the new value (verified via e.Snapshot.Value.ShouldBe((short)200)); Unsubscribe_stops_the_polling_loop captures the event count right after UnsubscribeAsync, mutates a register that would have triggered a change if polling continued, asserts the count stays the same after 400ms; SubscribeAsync_floors_intervals_below_100ms passes a 10ms interval + asserts only 1 event fires across 300ms (if the floor weren't enforced we'd see 30 — the test asserts the floor semantically by counting events on stable data); Multiple_subscriptions_fire_independently creates two subs on different tags, unsubscribes only one, mutates the other's tag, asserts only the surviving sub emits while the unsubscribed one stays at its pre-unsubscribe count. FakeTransport in this test file is scoped to FC03 only since that's all the subscription path exercises — keeps the test doubles minimal and the failure modes obvious. WaitForCountAsync helper polls a ConcurrentQueue up to a deadline, makes the tests tolerant of scheduler jitter on slow CI runners. Full solution: 0 errors, 195 tests pass (6 new subscription + 9 existing Modbus + 180 pre-existing). ModbusDriver now implements IDriver + ITagDiscovery + IReadable + IWritable + ISubscribable — five of the eight capability interfaces. IAlarmSource + IHistoryProvider remain unimplemented because Modbus has no wire-level alarm or history semantics; IHostConnectivityProbe is a plausible future addition (treat transport disconnect as a Stopped signal) but needs the socket-level connection-state tracking plumbed through IModbusTransport which is its own PR.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 12:03:39 -04:00
Joseph Doherty
058c3dddd3
Phase 3 PR 21 — Modbus TCP driver: first native-protocol greenfield for v2. New src/Driver.Modbus project with ModbusDriver implementing IDriver + ITagDiscovery + IReadable + IWritable. Validates the driver-agnostic abstractions (IAddressSpaceBuilder, DriverAttributeInfo, DataValueSnapshot, WriteRequest/WriteResult) generalize beyond Galaxy — nothing Galaxy-specific is used here. ModbusDriverOptions carries Host/Port/UnitId/Timeout + a pre-declared tag list (Modbus has no discovery protocol — tags are configuration). IModbusTransport abstracts the socket layer so tests swap in-memory fakes; concrete ModbusTcpTransport speaks the MBAP ADU (TxId + Protocol=0 + Length + UnitId + PDU) over TcpClient, serializes requests through a semaphore for single-flight in-order responses, validates the response TxId matches, surfaces server exception PDUs as ModbusException with function code + exception code. DiscoverAsync streams one folder per driver with a BaseDataVariable per tag + DriverAttributeInfo that flags writable tags as SecurityClassification.Operate vs ViewOnly for read-only regions. ReadAsync routes per-tag by ModbusRegion: FC01 for Coils, FC02 for DiscreteInputs, FC03 for HoldingRegisters, FC04 for InputRegisters; register values decoded through System.Buffers.Binary.BinaryPrimitives (BigEndian for single-register Int16/UInt16 + two-register Int32/UInt32/Float32 per standard modbus word-swap conventions). WriteAsync uses FC05 (Write Single Coil with 0xFF00/0x0000 encoding) for booleans, FC06 (Write Single Register) for 16-bit types, FC16 (Write Multiple Registers) for 32-bit types. Unknown tag → BadNodeIdUnknown; write to InputRegister or DiscreteInput or Writable=false tag → BadNotWritable; exception during transport → BadInternalError + driver health Degraded. Subscriptions + Historian + Alarms deliberately out of scope — Modbus has no push model (subscribe would be a polling overlay, additive PR) and no history/alarm semantics at the protocol level. Tests (9 new ModbusDriverTests): InitializeAsync connects + populates the tag map + sets health=Healthy; Read Int16 from HoldingRegister returns BigEndian value; Read Float32 spans two registers BigEndian (IEEE 754 single for 25.5f round-trips exactly); Read Coil returns boolean from the bit-packed response; unknown tag name returns BadNodeIdUnknown without an exception; Write UInt16 round-trips via FC06; Write Float32 uses FC16 (two-register write verified by decoding back through the fake register bank); Write to InputRegister returns BadNotWritable; Discover streams one folder + one variable per tag with correct DriverDataType mapping (Int16/Int32→Int32, UInt16/UInt32→Int32, Float32→Float32, Bool→Boolean). FakeTransport simulates a 256-register/256-coil bank + implements the 7 function codes the driver uses. slnx updated with the new src + tests entries. Full solution post-add: 0 errors, 189 tests pass (9 Modbus + 180 pre-existing). IDriver abstraction validated against a fundamentally different protocol — Modbus TCP has no AlarmExtension, no ScanState, no IPC boundary, no historian, no LDAP — and the same builder/reader/writer contract plugged straight in. Future PRs on this driver: ISubscribable via a polling loop, IHostConnectivityProbe for dead-device detection, PLC-specific data-type extensions (Int64/BCD/string-in-registers).
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 11:55:21 -04:00
Joseph Doherty
22d3b0d23c
Phase 3 PR 19 — LDAP user identity + Basic256Sha256 security profile. Replaces the anonymous-only endpoint with a configurable security profile and an LDAP-backed UserName token validator. New IUserAuthenticator abstraction in Backend/Security/: LdapUserAuthenticator binds to the configured directory (reuses the pattern from Admin.Security.LdapAuthService without the cross-app dependency — Novell.Directory.Ldap.NETStandard 3.6.0 package ref added to Server alongside the existing OPCFoundation packages) and maps group membership to OPC UA roles via LdapOptions.GroupToRole (case-insensitive). DenyAllUserAuthenticator is the default when Ldap.Enabled=false so UserName token attempts return a clean BadUserAccessDenied rather than hanging on a localhost:3893 bind attempt. OpcUaSecurityProfile enum + LdapOptions nested record on OpcUaServerOptions. Profile=None keeps the PR 17 shape (SecurityPolicies.None + Anonymous token only) so existing integration tests stay green; Profile=Basic256Sha256SignAndEncrypt adds a second ServerSecurityPolicy (Basic256Sha256 + SignAndEncrypt) to the collection and, when Ldap.Enabled=true, adds a UserName token policy scoped to SecurityPolicies.Basic256Sha256 only — passwords must ride an encrypted channel, the stack rejects UserName over None. OtOpcUaServer.OnServerStarted hooks SessionManager.ImpersonateUser: AnonymousIdentityToken passes through; UserNameIdentityToken delegates to IUserAuthenticator.AuthenticateAsync — rejected identities throw ServiceResultException(BadUserAccessDenied); accepted identities get a RoleBasedIdentity that carries the resolved roles through session.Identity so future PRs can gate writes by role. OpcUaApplicationHost + OtOpcUaServer constructors take IUserAuthenticator as a dependency. Program.cs binds the new OpcUaServer:Ldap section from appsettings (Enabled defaults false, GroupToRole parsed as Dictionary<string,string>), registers IUserAuthenticator as LdapUserAuthenticator when enabled or DenyAllUserAuthenticator otherwise. PR 17 integration test updated to pass DenyAllUserAuthenticator so it keeps exercising the anonymous-only path unchanged. Tests — SecurityConfigurationTests (new, 13 cases): DenyAllAuthenticator rejects every credential; LdapAuthenticator rejects blank creds without hitting the server; rejects when Enabled=false; rejects plaintext when both UseTls=false AND AllowInsecureLdap=false (safety guard matching the Admin service); EscapeLdapFilter theory (4 rows: plain passthrough, parens/asterisk/backslash → hex escape) — regression guard against LDAP injection; ExtractOuSegment theory (3 rows: finds ou=, returns null when absent, handles multiple ou segments by returning first); ExtractFirstRdnValue theory (3 rows: strips cn= prefix, handles single-segment DN, returns plain string unchanged when no =). OpcUaServerOptions_default_is_anonymous_only asserts the default posture preserves PR 17 behavior. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Server.Tests') added to Server csproj so ExtractOuSegment and siblings are reachable from the tests. Full solution: 0 errors, 180 tests pass (8 Core + 14 Proxy + 24 Configuration + 6 Shared + 91 Galaxy.Host + 19 Server (17 unit + 2 integration) + 18 Admin). Live-LDAP integration test (connect via Basic256Sha256 endpoint with a real user from GLAuth, assert the session.Identity carries the mapped role) is deferred to a follow-up — it requires the GLAuth dev instance to be running at localhost:3893 which is dev-machine-specific, and the test harness for that also needs a fresh client-side certificate provisioned by the live server's trusted store.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 08:49:46 -04:00
Joseph Doherty
dd3a449308
Phase 3 PR 18 — delete v1 archived projects. PR 2 archived via IsTestProject=false + PropertyGroup comment; PR 17 landed the full v2 OPC UA server runtime (ApplicationConfiguration + endpoint + client integration test); every v1 surface is now functionally superseded. This PR removes the archive: 154 files across 5 projects — src/OtOpcUa.Host (v1 server, 158 files), src/Historian.Aveva (v1 historian plugin, 4 files), tests/OtOpcUa.Tests.v1Archive (494 unit tests that were archived in PR 2 with IsTestProject=false), tests/Historian.Aveva.Tests (18 tests against the v1 plugin), tests/OtOpcUa.IntegrationTests (6 tests against the v1 Host). slnx trimmed to reflect the current set (12 src + 12 tests). Verified zero incoming references from live projects before deleting — no live csproj references .Host or .Historian.Aveva since PR 5 ported Historian into Driver.Galaxy.Host/Backend/Historian/ and PR 17 stood up the new OtOpcUa.Server. Full solution post-delete: 0 errors, 165 unit + integration tests pass (8 Core + 14 Proxy + 24 Configuration + 91 Galaxy.Host + 6 Shared + 4 Server + 18 Admin) — no regressions. Recovery path if a future PR needs to resurrect a specific v1 routine: git revert this commit or cherry-pick the specific file from pre-delete history; v1 is preserved in the full branch history, not lost.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 08:35:22 -04:00
Joseph Doherty
46834a43bd
Phase 3 PR 17 — complete OPC UA server startup end-to-end + integration test. PR 16 shipped the materialization shape (DriverNodeManager / OtOpcUaServer) without the activation glue; this PR finishes the scope so an external OPC UA client can actually connect, browse, and read. New OpcUaServerOptions DTO bound from the OpcUaServer section of appsettings.json (EndpointUrl default opc.tcp://0.0.0.0:4840/OtOpcUa, ApplicationName, ApplicationUri, PkiStoreRoot default %ProgramData%\OtOpcUa\pki, AutoAcceptUntrustedClientCertificates default true for dev — production flips via config). OpcUaApplicationHost wraps Opc.Ua.Configuration.ApplicationInstance: BuildConfiguration constructs the ApplicationConfiguration programmatically (no external XML) with SecurityConfiguration pointing at <PkiStoreRoot>/own, /issuers, /trusted, /rejected directories — stack auto-creates the cert folders on first run and generates a self-signed application certificate via CheckApplicationInstanceCertificate, ServerConfiguration.BaseAddresses set to the endpoint URL + SecurityPolicies just None + UserTokenPolicies just Anonymous with PolicyId='Anonymous' + SecurityPolicyUri=None so the client's UserTokenPolicy lookup succeeds at OpenSession, TransportQuotas.OperationTimeout=15s + MinRequestThreadCount=5 / MaxRequestThreadCount=100 / MaxQueuedRequestCount=200, CertificateValidator auto-accepts untrusted when configured. StartAsync creates the OtOpcUaServer (passes DriverHost + ILoggerFactory so one DriverNodeManager is created per registered driver in CreateMasterNodeManager from PR 16), calls ApplicationInstance.Start(server) to bind the endpoint, then walks each DriverNodeManager and drives a fresh GenericDriverNodeManager.BuildAddressSpaceAsync against it so the driver's discovery streams into the address space that's already serving clients. Per-driver discovery is isolated per decision #12 : a discovery exception marks the driver's subtree faulted but the server stays up serving the other drivers' subtrees. DriverHost.GetDriver(instanceId) public accessor added alongside the existing GetHealth so OtOpcUaServer can enumerate drivers during CreateMasterNodeManager. DriverNodeManager.Driver property made public so OpcUaApplicationHost can identify which driver each node manager wraps during the discovery loop. OpcUaServerService constructor takes OpcUaApplicationHost — ExecuteAsync sequence now: bootstrap.LoadCurrentGenerationAsync → applicationHost.StartAsync → infinite Task.Delay until stop. StopAsync disposes the application host (which stops the server via OtOpcUaServer.Stop) before disposing DriverHost. Program.cs binds OpcUaServerOptions from appsettings + registers OpcUaApplicationHost + OpcUaServerOptions as singletons. Integration test (OpcUaServerIntegrationTests, Category=Integration): IAsyncLifetime spins up the server on a random non-default port (48400+random for test isolation) with a per-test-run PKI store root (%temp%/otopcua-test-<guid>) + a FakeDriver registered in DriverHost that has ITagDiscovery + IReadable implementations — DiscoverAsync registers TestFolder>Var1, ReadAsync returns 42. Client_can_connect_and_browse_driver_subtree creates an in-process OPC UA client session via CoreClientUtils.SelectEndpoint (which talks to the running server's GetEndpoints and fetches the live EndpointDescription with the actual PolicyId), browses the fake driver's root, asserts TestFolder appears in the returned references. Client_can_read_a_driver_variable_through_the_node_manager constructs the variable NodeId using the namespace index the server registered (urn:OtOpcUa:fake), calls Session.ReadValue, asserts the DataValue.Value is 42 — the whole pipeline (client → server endpoint → DriverNodeManager.OnReadValue → FakeDriver.ReadAsync → back through the node manager → response to client) round-trips correctly. Dispose tears down the session, server, driver host, and PKI store directory. Full solution: 0 errors, 165 tests pass (8 Core unit + 14 Proxy unit + 24 Configuration unit + 6 Shared unit + 91 Galaxy.Host unit + 4 Server (2 unit NodeBootstrap + 2 new integration) + 18 Admin). End-to-end outcome: PR 14's GalaxyAlarmTracker alarm events now flow through PR 15's GenericDriverNodeManager event forwarder → PR 16's ConditionSink → OPC UA AlarmConditionState.ReportEvent → out to every OPC UA client subscribed to the alarm condition. The full alarm subsystem (driver-side subscription of the Galaxy 4-attribute quartet, Core-side routing by source node id, Server-side AlarmConditionState materialization with ReportEvent dispatch) is now complete and observable through any compliant OPC UA client. LDAP / security-profile wire-up (replacing the anonymous-only endpoint with BasicSignAndEncrypt + user identity mapping to NodePermissions role) is the next layer — it reuses the same ApplicationConfiguration plumbing this PR introduces but needs a deployment-policy source (central config DB) for the cert trust decisions.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 08:18:37 -04:00
Joseph Doherty
f53c39a598
Phase 3 PR 16 — concrete OPC UA server scaffolding with AlarmConditionState materialization. Introduces the OPCFoundation.NetStandard.Opc.Ua.Server package (v1.5.374.126, same version the v1 stack already uses) and two new server-side classes: DriverNodeManager : CustomNodeManager2 is the concrete realization of PR 15's IAddressSpaceBuilder contract — Folder() creates FolderState nodes under an Organizes hierarchy rooted at ObjectsFolder > DriverInstanceId; Variable() creates BaseDataVariableState with DataType mapped from DriverDataType (Boolean/Int32/Float/Double/String/DateTime) + ValueRank (Scalar or OneDimension) + AccessLevel CurrentReadOrWrite; AddProperty() creates PropertyState with HasProperty reference. Read hook wires OnReadValue per variable to route to IReadable.ReadAsync; Write hook wires OnWriteValue to route to IWritable.WriteAsync and surface per-tag StatusCode. MarkAsAlarmCondition() materializes an OPC UA AlarmConditionState child of the variable, seeded from AlarmConditionInfo (SourceName, InitialSeverity → UA severity via Low=250/Medium=500/High=700/Critical=900, InitialDescription), initial state Enabled + Acknowledged + Inactive + Retain=false. Returns an IAlarmConditionSink whose OnTransition updates alarm.Severity/Time/Message and switches state per AlarmType string ('Active' → SetActiveState(true) + SetAcknowledgedState(false) + Retain=true; 'Acknowledged' → SetAcknowledgedState(true); 'Inactive' → SetActiveState(false) + Retain=false if already Acked) then calls alarm.ReportEvent to emit the OPC UA event to subscribed clients. Galaxy's GalaxyAlarmTracker (PR 14) now lands at a concrete AlarmConditionState node instead of just raising an unobserved C# event. OtOpcUaServer : StandardServer wires one DriverNodeManager per DriverHost.GetDriver during CreateMasterNodeManager — anonymous endpoint, no security profile (minimum-viable; LDAP + security-profile wire-up is the next PR). DriverHost gains public GetDriver(instanceId) so the server can enumerate drivers at startup. NestedBuilder inner class in DriverNodeManager implements IAddressSpaceBuilder by temporarily retargeting the parent's _currentFolder during each call so Folder→Variable→AddProperty land under the correct subtree — not thread-safe if discovery ran concurrently, but GenericDriverNodeManager.BuildAddressSpaceAsync is sequential per driver so this is safe by construction. NuGet audit suppress for GHSA-h958-fxgg-g7w3 (moderate-severity in OPCFoundation.NetStandard.Opc.Ua.Core 1.5.374.126; v1 stack already accepts this risk on the same package version). PR 16 is scoped as scaffolding — the actual server startup (ApplicationInstance, certificate config, endpoint binding, session management wiring into OpcUaServerService.ExecuteAsync) is deferred to a follow-up PR because it needs ApplicationConfiguration XML + optional-cert-store logic that depends on per-deployment policy decisions. The materialization shape is complete: a subsequent PR adds 100 LOC to start the server and all the already-written IAddressSpaceBuilder + alarm-condition + read/write wire-up activates end-to-end. Full solution: 0 errors, 152 unit tests pass (no new tests this PR — DriverNodeManager unit testing needs an IServerInternal mock which is heavyweight; live-endpoint integration tests land alongside the server-startup PR).
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 08:00:36 -04:00
Joseph Doherty
190d09cdeb
Phase 3 PR 15 — alarm-condition contract in IAddressSpaceBuilder + wire OnAlarmEvent through GenericDriverNodeManager. IAddressSpaceBuilder.IVariableHandle gains MarkAsAlarmCondition(AlarmConditionInfo) which returns an IAlarmConditionSink. AlarmConditionInfo carries SourceName/InitialSeverity/InitialDescription. Concrete address-space builders (the upcoming PR 16 OPC UA server backend) materialize a sibling AlarmConditionState node on the first call; the sink receives every lifecycle transition the generic node manager forwards. GenericDriverNodeManager gains a CapturingBuilder wrapper that transparently wraps every Folder/Variable call — the wrapper observes MarkAsAlarmCondition calls without participating in materialization, captures the resulting IAlarmConditionSink into an internal source-node-id → sink ConcurrentDictionary keyed by IVariableHandle.FullReference. After DiscoverAsync completes, if the driver implements IAlarmSource the node manager subscribes to OnAlarmEvent and routes every AlarmEventArgs to the sink registered for args.SourceNodeId — unknown source ids are dropped silently (may belong to another driver or to a variable the builder chose not to flag). Dispose unsubscribes the forwarder to prevent dangling invocation-list references across node-manager rebuilds. GalaxyProxyDriver.DiscoverAsync now calls handle.MarkAsAlarmCondition(new AlarmConditionInfo(fullName, AlarmSeverity.Medium, null)) on every attr.IsAlarm=true variable — severity seed is Medium because the live Priority byte arrives through the subsequent GalaxyAlarmEvent stream (which PR 14's GalaxyAlarmTracker now emits); the Admin UI sees the severity update on the first transition. RecordingAddressSpaceBuilder in Driver.Galaxy.E2E gains a RecordedAlarmCondition list + a RecordingSink implementation that captures AlarmEventArgs for test assertion — the E2E parity suite can now verify alarm-condition registration shape in addition to folder/variable shape. Tests (4 new GenericDriverNodeManagerTests): Alarm_events_are_routed_to_the_sink_registered_for_the_matching_source_node_id — 2 alarms registered (Tank.HiHi + Heater.OverTemp), driver raises an event for Tank.HiHi, the Tank.HiHi sink captures the payload, the Heater.OverTemp sink does not (tag-scoped fan-out, not broadcast); Non_alarm_variables_do_not_register_sinks — plain Tank.Level in the same discover is not in TrackedAlarmSources; Unknown_source_node_id_is_dropped_silently — a transition for Unknown.Source doesn't reach any sink + no exception; Dispose_unsubscribes_from_OnAlarmEvent — post-dispose, a transition for a previously-registered tag is no-op because the forwarder detached. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Core.Tests') added to Core csproj so TrackedAlarmSources internal property is visible to the test. Full solution: 0 errors, 152 unit tests pass (8 Core + 14 Proxy + 14 Admin + 24 Configuration + 6 Shared + 84 Galaxy.Host + 2 Server). PR 16 will implement the concrete OPC UA address-space builder that materializes AlarmConditionState from this contract.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:51:35 -04:00
Joseph Doherty
c14624f012
Phase 2 PR 14 — alarm subsystem wire-up. Per IsAlarm=true attribute (PR 9 added the discovery flag), GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes: .InAlarm (boolean alarm active), .Priority (int severity), .DescAttrName (human-readable description), .Acked (boolean acknowledged). Runs the OPC UA Part 9 alarm lifecycle state machine simplified for the Galaxy AlarmExtension model and raises AlarmTransition events on transitions operators must react to — Active (InAlarm false→true, default Unacknowledged), Acknowledged (Acked false→true while InAlarm still true), Inactive (InAlarm true→false). MxAccessGalaxyBackend instantiates the tracker in its constructor with delegate-based subscribe/unsubscribe/write pointers to MxAccessClient, hooks TransitionRaised to forward each transition through the existing OnAlarmEvent IPC event that PR 4 ConnectionSink wires into MessageKind.AlarmEvent frames — no new contract messages required since GalaxyAlarmEvent already exists in Shared.Contracts. Field mapping: EventId = fresh Guid.ToString('N') per transition, ObjectTagName = alarm attribute full reference, AlarmName = alarm attribute full reference, Severity = tracked Priority, StateTransition = 'Active'|'Acknowledged'|'Inactive', Message = DescAttrName or tag fallback, UtcUnixMs = transition time. DiscoverAsync caches every IsAlarm=true attribute's full reference (tag.attribute) into _discoveredAlarmTags (ConcurrentBag cleared-then-filled on every re-Discover to track Galaxy redeploys). SubscribeAlarmsAsync iterates the cache and advises each via GalaxyAlarmTracker.TrackAsync; best-effort per-alarm — a subscribe failure on one alarm doesn't abort the whole call since operators prefer partial alarm coverage to none. Tracker is internally idempotent on repeat Track calls (second invocation for same alarm tag is a no-op; already-subscribed check short-circuits before the 4 MXAccess sub calls). Subscribe-failure rollback inside TrackAsync removes the alarm state + unadvises any of the 4 that did succeed so a partial advise can't leak a phantom tracking entry. AcknowledgeAlarmAsync routes to tracker.AcknowledgeAsync which writes the operator comment to <tag>.AckMsg via MxAccessClient.WriteAsync — writes use the existing MXAccess OnWriteComplete TCS-by-handle path (PR 4 Medium 4) so a runtime-refused ack bubbles up as Success=false rather than false-positive. State-machine quirks preserved from v1: (1) initial Acked=true on subscribe does NOT fire Acknowledged (alarm at rest, pre-acknowledged — default state is Acked=true so the first subscribe callback is a no-op transition), (2) Acked false→true only fires Acknowledged when InAlarm is currently true (acking a latched-inactive alarm is not a user-visible transition), (3) Active transition clears the Acked flag in-state so the next Acked callback correctly fires Acknowledged (v1 had this buried in the ConditionState logic; we track it on the AlarmState struct directly). Priority value handled as int/short/long via type pattern match with int.MaxValue guard — Galaxy attribute category returns varying CLR types (Int32 is canonical but some older templates use Int16), and a long overflow cast to int would silently corrupt the severity. Dispose cascade in MxAccessGalaxyBackend.Dispose: alarm-tracker unsubscribe→dispose, probe-manager unsubscribe→dispose, mx.ConnectionStateChanged detach, historian dispose — same discipline PR 6 / PR 8 / PR 13 established so dangling invocation-list refs don't survive a backend recycle. #pragma warning disable CS0067 around OnAlarmEvent removed since the event is now raised. Tests (9 new, GalaxyAlarmTrackerTests): four-attribute subscribe per alarm, idempotent repeat-track, InAlarm false→true fires Active with Priority + Desc, InAlarm true→false fires Inactive, Acked false→true while InAlarm fires Acknowledged, Acked transition while InAlarm=false does not fire, AckMsg write path carries the comment, snapshot reports latest four fields, foreign probe callback for a non-tracked tag is silently dropped. Full Galaxy.Host.Tests Unit suite 84 pass / 0 fail (9 new alarm + 12 PR 13 probe + 21 PR 12 quality + 42 pre-existing). Galaxy.Host builds clean (0/0). Branches off phase-2-pr13-runtime-probe so the MxAccessGalaxyBackend constructor/Dispose chain gets the probe-manager + alarm-tracker wire-up in a coherent order; fast-forwards if PR 13 merges first.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:34:13 -04:00
Joseph Doherty
04d267d1ea
Phase 2 PR 13 — port GalaxyRuntimeProbeManager state machine + wire per-platform ScanState probing. PR 8 gave operators the gateway-level transport signal (MxAccessClient.ConnectionStateChanged → OnHostStatusChanged tagged with the Wonderware client identity) — enough to detect when the entire MXAccess COM proxy dies, but silent when a specific or goes down while the gateway stays alive. This PR closes that gap: a pure-logic GalaxyRuntimeProbeManager (ported from v1's 472-LOC GalaxyRuntimeProbeManager.cs, distilled to ~240 LOC of state machine without the OPC UA node-manager entanglement) lives in Backend/Stability/, advises <TagName>.ScanState per WinPlatform + AppEngine gobject discovered during DiscoverAsync, runs the Unknown → Running → Stopped state machine with v1's documented semantics preserved verbatim, and raises a StateChanged event on transitions operators should react to. MxAccessGalaxyBackend instantiates the probe manager in the constructor with SubscribeAsync/UnsubscribeAsync delegate pointers into MxAccessClient, hooks StateChanged to forward each transition through the same OnHostStatusChanged IPC event the gateway signal uses (HostName = platform/engine TagName, RuntimeStatus = 'Running'|'Stopped'|'Unknown', LastObservedUtcUnixMs from the state-change timestamp), so Admin UI gets per-host signals flowing through the existing PR 8 wire with no additional IPC plumbing. State machine rules ported from v1 runtimestatus.md: (a) ScanState is on-change-only — a stably-Running host may go hours without a callback, so Running → Stopped is driven only by explicit ScanState=false, never by starvation; (b) Unknown → Running is a startup transition and does NOT fire StateChanged (would paint every host as 'just recovered' at startup, which is noise and can clear Bad quality set by a concurrently-stopping sibling); (c) Stopped → Running fires StateChanged for the real recovery case; (d) Running → Stopped fires StateChanged; (e) Unknown → Stopped fires StateChanged because that's the first-known-bad signal operators need when a host is down at our startup time. MxAccessGalaxyBackend.DiscoverAsync calls _probeManager.SyncAsync with the runtime-host subset of the hierarchy (CategoryId == 1 WinPlatform or 3 AppEngine) as a best-effort step after building the Discover response — probe failures are swallowed so Discover still returns the hierarchy even if a per-host advise fails; the gateway signal covers the critical rung. SyncAsync is idempotent (second call with the same set is a no-op) and handles the diff on re-Discover for tag rename / host add / host remove. Subscribe failure rolls back the host's state entry under the lock so a later probe callback for a never-advised tag can't transition a phantom state from Unknown to Stopped and fan out a false host-down signal (the same protection v1's GalaxyRuntimeProbeManager had at line 237-243 of v1 with a captured-probe-string comparison under the lock). MxAccessGalaxyBackend.Dispose unsubscribes the StateChanged handler before disposing the probe manager to prevent dangling invocation-list references across reconnects, same discipline as PR 8's ConnectionStateChanged and PR 6's SubscriptionReplayFailed. Tests (12 new GalaxyRuntimeProbeManagerTests): Sync_subscribes_to_ScanState_per_host verifies tag.ScanState subscriptions are advised per Platform/Engine; Sync_is_idempotent_on_repeat_call_with_same_set verifies no duplicate subscribes; Sync_unadvises_removed_hosts verifies the diff unadvises gone hosts; Subscribe_failure_rolls_back_host_entry_so_later_transitions_do_not_fire_stale_events covers the rollback-on-subscribe-fail guard; Unknown_to_Running_does_not_fire_StateChanged preserves the startup-noise rule; Running_to_Stopped_fires_StateChanged_with_both_states asserts OldState and NewState are both captured in the transition record; Stopped_to_Running_fires_StateChanged_for_recovery verifies the recovery case; Unknown_to_Stopped_fires_StateChanged_for_first_known_bad_signal preserves the first-bad rule; Repeated_Good_Running_callbacks_do_not_fire_duplicate_events verifies the state-tracking de-dup; Unknown_callback_for_non_tracked_probe_is_dropped asserts a foreign callback is silently ignored; Snapshot_reports_current_state_for_every_tracked_host covers the dashboard query hook; IsRuntimeHost_recognizes_WinPlatform_and_AppEngine_category_ids asserts the CategoryId filter. Galaxy.Host.Tests Unit suite 75 pass / 0 fail (12 new probe + 63 pre-existing). Galaxy.Host builds clean (0 errors / 0 warnings). Branches off v2.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:27:56 -04:00
4448db8207
Merge pull request 'Phase 2 PR 12 � richer historian quality mapping' ( #11 ) from phase-2-pr12-quality-mapper into v2
2026-04-18 07:22:44 -04:00
Joseph Doherty
f24f969a85
Phase 2 PR 12 — richer historian quality mapping. Replace MxAccessGalaxyBackend's inline MapHistorianQualityToOpcUa category-only helper (192+→Good, 64-191→Uncertain, 0-63→Bad) with a new public HistorianQualityMapper.Map utility that preserves specific OPC DA subcodes — BadNotConnected(8)→0x808A0000u instead of generic Bad(0x80000000u), UncertainSubNormal(88)→0x40950000u instead of generic Uncertain, Good_LocalOverride(216)→0x00D80000u instead of generic Good, etc. Mirrors v1 QualityMapper.MapToOpcUaStatusCode byte-for-byte without pulling in OPC UA types — the function returns uint32 literals that are the canonical OPC UA StatusCode wire encoding, surfaced directly as DataValueSnapshot.StatusCode on the Proxy side with no additional translation. Unknown subcodes fall back to the family category (255→Good, 150→Uncertain, 50→Bad) so a future SDK change that adds a quality code we don't map yet still gets a sensible bucket. GalaxyDataValue wire shape unchanged (StatusCode stays uint) — this is a pure fidelity upgrade on the Host side. Downstream callers (Admin UI status dashboard, OPC UA clients receiving historian samples) can now distinguish e.g. a transport outage (BadNotConnected) from a sensor fault (BadSensorFailure) from a warm-up delay (BadWaitingForInitialData) without a second round-trip or dashboard heuristic. 21 new tests (HistorianQualityMapperTests): theory with 15 rows covering every specific mapping from the v1 QualityMapper table, plus 6 fallback tests verifying unknown-subcode codes in each family (Good/Uncertain/Bad) collapse to the family default. Galaxy.Host.Tests Unit suite 56/0 (21 new + 35 existing). Galaxy.Host builds clean (0/0). Branches off v2.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:11:02 -04:00
Joseph Doherty
ca025ebe0c
Phase 2 PR 11 — HistoryReadEvents IPC (alarm history). New Shared.Contracts messages HistoryReadEventsRequest/Response + GalaxyHistoricalEvent DTO (MessageKind 0x66/0x67). IGalaxyBackend gains HistoryReadEventsAsync, Stub/DbBacked return canonical pending error, MxAccessGalaxyBackend delegates to _historian.ReadEventsAsync (ported in PR 5) and maps HistorianEventDto → GalaxyHistoricalEvent — Guid.ToString() for EventId wire shape, DateTime → Unix ms for both EventTime (when the event fired in the process) and ReceivedTime (when the Historian persisted it), DisplayText + Severity pass through. SourceName is string? — null means 'all sources' (passed straight through to HistorianDataSource.ReadEventsAsync which adds the AddEventFilter('Source', Equal, ...) only when non-null). Distinct from the live GalaxyAlarmEvent type because historical rows carry both timestamps and lack StateTransition (Historian logs instantaneous events, not the OPC UA Part 9 alarm lifecycle; translating to OPC UA event lifecycle is the alarm-subsystem's job). Guards: null historian → Historian-disabled error; SDK exception → Success=false with message chained. Tests (3 new): disabled-error when historian null, maps HistorianEventDto with full field set (Id/Source/EventTime/ReceivedTime/DisplayText/Severity=900) to GalaxyHistoricalEvent, null SourceName passes through unchanged (verifies the 'all sources' contract). Galaxy.Host.Tests Unit suite 34 pass / 0 fail. Galaxy.Host builds clean. Branches off phase-2-pr10-history-attime since both extend the MessageKind enum; fast-forwards if PR 10 merges first.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:08:16 -04:00
Joseph Doherty
d13f919112
Phase 2 PR 10 — HistoryReadAtTime IPC surface. New Shared.Contracts messages HistoryReadAtTimeRequest/Response (MessageKind 0x64/0x65), IGalaxyBackend gains HistoryReadAtTimeAsync, Stub/DbBacked return canonical pending error, MxAccessGalaxyBackend delegates to _historian.ReadAtTimeAsync (ported in PR 5, exposed now) — request timestamp array is flow-encoded as Unix ms to avoid MessagePack DateTime quirks then re-hydrated to DateTime on the Host side. Per-sample mapping uses the same ToWire(HistorianSample) helper as ReadRawAsync so the category→StatusCode mapping stays consistent (Quality byte 192+ → Good 0u, 64-191 → Uncertain, 0-63 → Bad 0x80000000u). Guards: null historian → "Historian disabled" (symmetric with other history paths); empty timestamp array short-circuits to Success=true, Values=[] without an SDK round-trip; SDK exception → Success=false with the message chained. Proxy-side IHistoryProvider.ReadAtTimeAsync capability doesn't exist in Core.Abstractions yet (OPC UA HistoryReadAtTime service is supported but the current IHistoryProvider only has ReadRawAsync + ReadProcessedAsync) — this PR adds the Host-side surface so a future Core.Abstractions extension can wire it through without needing another IPC change. Tests (4 new): disabled-error when historian null, empty-timestamp short-circuit without SDK call, Unix-ms↔DateTime round-trip with Good samples at two distinct timestamps, missing sample (Quality=0) maps to 0x80000000u Bad category. Galaxy.Host.Tests Unit suite: 31 pass / 0 fail (4 new at-time + 27 pre-existing). Galaxy.Host builds clean. Branches off v2.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 07:03:25 -04:00
d2ebb91cb1
Merge pull request 'Phase 2 PR 9 — thread IsAlarm discovery flag end-to-end' ( #8 ) from phase-2-pr9-alarms into v2
2026-04-18 06:59:25 -04:00
90ce0af375
Merge pull request 'Phase 2 PR 8 — gateway-level host-status push from MxAccessGalaxyBackend' ( #7 ) from phase-2-pr8-alarms-hoststatus into v2
2026-04-18 06:59:04 -04:00
e250356e2a
Merge pull request 'Phase 2 PR 7 — wire IHistoryProvider.ReadProcessedAsync end-to-end' ( #6 ) from phase-2-pr7-history-processed into v2
2026-04-18 06:59:02 -04:00