Guard the two nullable child VM dereferences (BrowseTree at ConnectAsync
and History at ViewHistoryForSelectedNode) with != null checks, matching
the guarding style already used for Subscriptions and Alarms nearby.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Route the synchronous IsLoading = true write through _dispatcher.Post so
both IsLoading assignments use the same dispatch path as Results.Clear()
and the final IsLoading = false, eliminating the ordering hazard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Timeout_maps_to_BadInternalError_without_killing_the_engine test's
"Hang" script busy-looped on Environment.TickCount64. Commit cfb9ff1
(Core.Scripting-001) added System.Environment to the script-sandbox
deny-list, so the script now fails sandbox validation instead of
reaching the timeout path. Switch the busy-loop to DateTime.UtcNow
(an allowed type) to preserve the test's intent — a self-terminating
~5s hang that overruns the 30ms script timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WonderwareHistorianClient.ReadAtTimeAsync passed the sidecar's reply.Samples
straight through ToSnapshots, which violated the IHistorianDataSource
contract: the result MUST be the same length and order as the requested
timestampsUtc, with gaps returned as Bad-quality snapshots. If the sidecar
dropped or reordered samples, OPC UA HistoryReadAtTime would silently
misalign values with timestamps.
Add an AlignAtTimeSnapshots helper that indexes the returned samples by
timestamp ticks, builds the result array at timestampsUtc.Count in request
order, and emits a Bad-quality (0x80000000) snapshot for any requested
timestamp the sidecar did not return.
Add the ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad
regression test where the fake returns a partial, reordered sample set.
Update code-reviews/Driver.Historian.Wonderware.Client/findings.md: -001
Resolved, open-finding count 10 -> 9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WriteToReadOnlyFile was listed in MalformedErrors, so ClassifyOutcome/
MapOutcome routed it to PermanentFail and the store-and-forward sink
dead-lettered every alarm event in the batch. But WriteToReadOnlyFile is
a connection-configuration fault (the write session was opened without
ReadOnly = false), not an event-payload fault — treating it as permanent
silently and permanently discards alarm events on a misconfigured or
regressed connection, which is data loss.
Move WriteToReadOnlyFile from MalformedErrors into ConnectionErrors. The
batch loop now aborts the batch, resets the connection (so the reconnect
path re-opens a writable ReadOnly = false session), and defers the
events as RetryPlease for the next drain tick.
Updated the ClassifyOutcome theory data and added a dedicated regression
test pinning WriteToReadOnlyFile -> RetryPlease.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DL205 family-native branch routed every V-prefixed address through
DirectLogicAddress.UserVMemoryToPdu, a plain octal-to-decimal decode.
DL205/DL260 system V-memory (V40400 and up) is not a simple octal decode:
the CPU relocates the system bank to Modbus PDU 0x2100. Octal-decoding
V40400 produced 16640 (0x4100), the wrong register, so any tag addressing
a system register through the grammar string silently read/wrote the
wrong PLC memory.
- Add DirectLogicAddress.VMemoryToPdu, which decodes the octal V-address,
detects the system bank (octal >= V40400 == SystemVMemoryOctalBase) and
relocates it through SystemVMemoryToPdu to PDU 0x2100; user-bank
addresses keep the plain octal decode.
- ModbusAddressParser's DL205 V branch now calls VMemoryToPdu instead of
UserVMemoryToPdu. UserVMemoryToPdu is retained for user-bank-only callers.
- Correct the ModbusFamilyParserTests V40400 assertion (16640 -> 0x2100)
and add system-bank regression cases plus direct helper coverage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_lastPublishedByRef was a plain Dictionary<string, object> mutated inside
ShouldPublish, which runs on the PollGroupEngine onChange callback. The engine
runs one background Task per subscription, so a driver with two or more
subscriptions invokes ShouldPublish concurrently on separate threads. Concurrent
TryGetValue/indexer writes on a non-thread-safe Dictionary can corrupt internal
state, drop entries, or throw, crashing the poll loop.
Switch _lastPublishedByRef to ConcurrentDictionary<string, object>; its
TryGetValue and indexer-set operations are individually thread-safe, so the
deadband cache is now correct under concurrent multi-subscription publishing,
consistent with the lock-guarded sibling cache _lastWrittenByRef.
Add an xUnit + Shouldly regression test that runs 24 deadband-configured
single-tag subscriptions concurrently and asserts the poll loop survives without
faulting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.Galaxy-002 — DataTypeMap.Map had no Int64 arm though MxValueDecoder/
MxValueEncoder both fully support Int64. Galaxy attributes with the Int64
mx_data_type code fell through to the String default, creating a String
address-space node while runtime reads decoded a boxed long. Added
`6 => DriverDataType.Int64`, extending the contiguous 0..5 scheme so the type
map agrees with the decoder/encoder on all seven Galaxy data types.
Driver.Galaxy-008 — after a stream fault the EventPump's StreamEvents consumer
loop exited and its channel completed; EventPump.Start() is a no-op on a
completed-but-non-null loop, so a replayed subscription had no consumer and
ReplayAsync never re-registered the post-reconnect item handles. ReplayAsync
now recreates the EventPump (RestartEventPumpForReplay) and rebinds the
SubscriptionRegistry per subscription with the fresh item handles returned by
the post-reconnect SubscribeBulkAsync, via new SubscriptionRegistry.SnapshotEntries
and Rebind APIs.
Regression tests: DataTypeMapTests (every code incl. Int64), SubscriptionRegistry
Tests (Rebind/SnapshotEntries), EventPumpStreamFaultTests (faulted pump dead,
fresh pump resumes dispatch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.FOCAS-001: FocasDriverConfigDto exposed no FixedTree / AlarmProjection /
HandleRecycle sections, so a deployment that opted into those features per
docs/drivers/FOCAS.md had the sections silently dropped by case-insensitive
JSON parsing and the features stayed at their disabled defaults. Added
FocasFixedTreeDto / FocasAlarmProjectionDto / FocasHandleRecycleDto and Build*
mappers in CreateInstance that populate the matching FocasDriverOptions
properties; a missing section or field keeps its existing default.
Driver.FOCAS-002: the fixed-tree bootstrap probe classified ProgramInfo as
"supported" whenever GetProgramInfoAsync returned non-null, but WireFocasClient
.GetProgramInfoAsync substituted defaults instead of throwing on a FOCAS error
return, so a CNC series answering EW_FUNC/EW_NOOPT for cnc_exeprgname2 /
cnc_rdopmode still got the Program/ and OperationMode/ subtrees. The method now
throws InvalidOperationException when neither the program-name nor the op-mode
read is IsOk, so SafeTryProbe correctly suppresses the capability.
Added FocasFactoryConfigTests covering the three opt-in config sections
round-tripping through CreateInstance and the fixed-tree bootstrap classifying
ProgramInfo as unsupported when the probe throws. Added an internal
FocasDriver.Options test seam.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.OpcUaClient-001 — ReadAsync/WriteAsync/DiscoverAsync captured the
session before acquiring _gate, so a reconnect that completed while the
operation was blocked on the gate left the wire call bound to a stale,
closed session. All three now re-read Session (and parse NodeIds) inside
the _gate critical section after WaitAsync returns.
Driver.OpcUaClient-002 — OnReconnectComplete ignored the give-up (null
session) case, permanently wedging the driver with no Faulted signal and
no reconnect loop. The give-up branch now transitions HostState to
Faulted, sets a Faulted DriverHealth with an explanatory message, and
re-arms a fresh SessionReconnectHandler (TryRearmReconnect) against the
last-known session so an always-on gateway self-heals.
Driver.OpcUaClient-003 — BrowseRecursiveAsync discarded browse
continuation points, silently truncating large remote folders.
It now loops on BrowseResult.ContinuationPoint calling BrowseNextAsync
and appending each page until the continuation point is empty.
Driver.OpcUaClient-004 — driver-specs.md §8 namespace handling was
absent. Added NamespaceMap (built from session.NamespaceUris at connect,
rebuilt on reconnect) which persists discovered NodeIds in the
server-stable nsu=<uri>;... form; reads/writes re-resolve that form
against the current session so a remote namespace-table reorder no
longer misaddresses nodes. Added the TargetNamespaceKind option +
UnsMappingTable and ValidateNamespaceKind startup enforcement.
Driver.OpcUaClient-005 — OnKeepAlive read/wrote _reconnectHandler
without a lock, racing the SDK keep-alive timer thread and leaking
handlers. The check-and-set in OnKeepAlive, the take-and-clear in
ShutdownAsync, and the dispose/re-arm in OnReconnectComplete now all
run inside the _probeLock critical section.
Adds OpcUaClientNamespaceTests (11 xUnit + Shouldly regression tests)
covering ValidateNamespaceKind and the NamespaceMap stable encoding.
Reconnect/browse wire paths remain fixture-gated per finding -015.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.S7-001: Timer (T{n}) / Counter (C{n}) addresses parsed cleanly but
the read path had no S7DataType or decode case for them, so a Timer/Counter
tag passed fail-fast init and then threw a misleading type-mismatch on every
read. InitializeAsync now runs RejectUnsupportedTagAddresses, throwing a clear
NotSupportedException ("not yet supported", echoing tag name + address) so the
config error fails fast at init.
Driver.S7-006: ShutdownAsync cancelled the probe/poll CTSs but did not await
the fire-and-forget loop tasks before DisposeAsync disposed _gate, letting a
loop iteration mid-semaphore race a disposed object. The probe task is now
tracked in _probeTask and each poll task in SubscriptionState.PollTask;
ShutdownAsync cancels every CTS, awaits Task.WhenAll of those handles with a
bounded 5 s DrainTimeout, then disposes the CTSs and gate. Task.Run is passed
CancellationToken.None so the handle is always awaitable.
Driver.S7-007: a PUT/GET-disabled fault (permanent misconfiguration) was
mapped identically to a transient PlcException — both BadDeviceFailure +
Degraded. ReadAsync/WriteAsync now split the catch via an IsAccessDenied
filter (S7.Net exposes no typed code for AccessingObjectNotAllowed, so the
inner-exception chain is inspected for the "not allowed" marker). Access-denied
now maps to BadNotSupported and Faulted with a config-alert message pointing
at the TIA Portal PUT/GET toggle; genuine device faults stay BadDeviceFailure.
Driver.S7-011: S7Driver ignored driverConfigJson on Initialize/Reinitialize,
so a config change delivered through ReinitializeAsync (the only Core-initiated
in-process recovery path) was silently discarded. Config parsing was factored
into S7DriverFactoryExtensions.ParseOptions; InitializeAsync now re-parses
driverConfigJson and rebuilds _options whenever the document has a real body.
An empty / placeholder document keeps the constructor options.
Adds S7DriverCodeReviewFixTests covering Timer/Counter rejection, config-json
application on Initialize/Reinitialize, and shutdown-drain with active
subscriptions. All 68 S7 driver tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.AbLegacy-001 — PCCC bit-index range. AbLegacyAddress.TryParse
accepted a bit index of 0..31 for every file type, but a 16-bit
N/B/I/O/S/A word only has bits 0..15. TryParse now range-checks the
bit index against the file's word width (0..15 for 16-bit element
files, 0..31 for the 32-bit L file, no bits on float files), so
addresses like N7:0/20 are rejected at parse time instead of silently
truncating in the (short) cast. WriteBitInWordAsync reads and writes
an L-file parent word as 32-bit Long and masks the RMW arithmetic to
the native width, so a sign-extended 16-bit decode can no longer
corrupt the high bits.
Driver.AbLegacy-006 — shared-runtime concurrency. A per-tag libplctag
Tag handle is cached and reused by both the server read path and the
poll loop, with no synchronisation around Read/GetStatus/DecodeValue.
Added a per-runtime SemaphoreSlim (DeviceState.GetRuntimeLock, keyed
by tag name); ReadAsync and WriteAsync now hold it across the whole
Read -> GetStatus -> Decode / Encode -> Write -> GetStatus sequence so
no two threads touch the same Tag handle concurrently.
Added xUnit + Shouldly regression coverage: AbLegacyBitIndexRangeTests
(per-file bit-range validation + L-file 32-bit RMW + sign-extension
safety) and AbLegacyRuntimeConcurrencyTests (overlap-detecting fake
proving concurrent read/read and read/write are serialised).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.AbCip-001 — ReinitializeAsync silently discarded its config JSON.
Extracted AbCipDriverFactoryExtensions.ParseOptions; InitializeAsync now
re-parses a content-bearing driverConfigJson and replaces _options (and
recreates the alarm projection), so a reinitialize with a changed config
(new device/tag, changed timeout) actually takes effect. A blank or
empty-object JSON keeps construction-time options for the unit-test seam.
Driver.AbCip-002 — libplctag status mapping used wrong integer constants.
MapLibplctagStatus now switches on the libplctag.NET Status enum members
(Ok/Pending/ErrorTimeout/ErrorNotFound/ErrorNotAllowed/ErrorOutOfBounds/…)
instead of hand-typed natives, so timeout/not-found/not-allowed/out-of-bounds
get their specific OPC UA codes instead of all collapsing to
BadCommunicationError. The int overload casts to Status to stay correct
against the wrapper's contiguous renumbering.
Driver.AbCip-003 — whole-UDT reads decoded members at declaration-order
offsets, which Studio 5000 does not guarantee. Added the opt-in
AbCipDriverOptions.EnableDeclarationOnlyUdtGrouping flag (default false);
AbCipUdtReadPlanner.Build forms no groups when it is off, so by default
every UDT member reads per-tag rather than at possibly-wrong offsets.
Driver.AbCip-008 — probe loops were fire-and-forget and ShutdownAsync raced
them. Each probe Task is stored on DeviceState.ProbeTask; ShutdownAsync now
cancels every CTS, awaits each probe Task (10s timeout), then disposes the
CTS and handles. DeviceState.Runtimes/ParentRuntimes are ConcurrentDictionary
and the Ensure*RuntimeAsync paths use TryAdd, disposing the losing concurrent
creator instead of leaking a native tag handle.
Adds AbCipDriverCodeReviewRegressionTests and updates existing AbCip tests
to the corrected status constants + opt-in grouping flag. AbCip driver +
test project build clean; all 244 AbCip tests pass. (The full-solution
build has pre-existing, unrelated Driver.Galaxy protobuf-generation errors
in this worktree.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.TwinCAT-001 — InitializeAsync/ReinitializeAsync ignored driverConfigJson.
Extracted the DTO-to-options parse into a shared TwinCATDriverFactoryExtensions.ParseOptions;
InitializeAsync now re-parses driverConfigJson into a mutable _options field, so a config
generation pushed via ReinitializeAsync (added/removed devices, tags, probe settings) is
actually applied at runtime.
Driver.TwinCAT-002 — LInt/ULInt narrowed to Int32. ToDriverDataType now maps LInt to Int64,
ULInt to UInt64, UDInt to UInt32, UInt/USInt to UInt16, Int/SInt to Int16, and the IEC
TIME/DATE/DT/TOD types to UInt32 (their raw UDINT counter). Removed the stale "Int64 gap"
comment — no truncation or sign flips at the OPC UA encode layer.
Driver.TwinCAT-007 — EnsureConnectedAsync was not thread-safe. Connect/reconnect is now
serialized per device by a SemaphoreSlim (DeviceState.ConnectGate) with a double-checked
connect, mirroring the S7 driver. Concurrent read/write/probe callers can no longer leak a
client or race a create-vs-dispose.
Driver.TwinCAT-008 — native ADS notification callbacks ran driver logic on the AMS router
thread. AdsTwinCATClient now enqueues AdsNotificationEx callbacks onto a bounded Channel
drained by a dedicated managed task; the router-thread callback only does a non-blocking
TryWrite, so a slow consumer cannot stall ADS notification delivery process-wide.
Driver.TwinCAT-013 — TwinCATDriver did not implement IRediscoverable. The driver now
implements IRediscoverable; AdsTwinCATClient detects ADS 0x0702 (symbol-version-changed) on
read/write paths and raises OnSymbolVersionChanged, which the driver forwards as
OnRediscoveryNeeded so Core rebuilds the address space after a PLC program re-download.
Adds TwinCATHighFindingsRegressionTests covering all five fixes; updates the data-type
mapping assertion in TwinCATDriverTests. TwinCAT driver builds clean; 119 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OnScriptSetVirtualTag updated the value cache, notified observers, and
recorded history for the written path but never scheduled a cascade for
tags depending on that path. This contradicts docs/VirtualTags.md, which
states ctx.SetVirtualTag writes "still participate in change-trigger
cascades": a change-triggered virtual tag reading a script-written tag
went stale until an unrelated trigger fired.
OnScriptSetVirtualTag now launches a fire-and-forget CascadeAsync for the
written path, mirroring OnUpstreamChange. The cascade is scheduled rather
than invoked inline because the callback runs inside EvaluateInternalAsync
while the non-reentrant _evalGate semaphore is held.
Added regression test
SetVirtualTag_within_script_cascades_to_dependents_of_the_written_tag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_alarms was a plain Dictionary<string, AlarmState> mutated under the
_evalGate semaphore, but four read paths (GetState, GetAllStates, the
LoadedAlarmIds property, and RunShelvingCheck) touched it from arbitrary
threads with no synchronisation. A Dictionary read concurrent with a
writer's entry reassignment can throw InvalidOperationException or return
torn state.
Switched _alarms to ConcurrentDictionary<string, AlarmState>. The only
write shapes are indexer-set and Clear, both atomic on ConcurrentDictionary,
so all mutations stay correct without further change; reads now get safe
snapshot semantics. LoadedAlarmIds materialises the key snapshot to keep
its IReadOnlyCollection<string> return type. This matches _valueCache,
which is already a ConcurrentDictionary.
Added a regression test (Concurrent_reads_during_mutation_do_not_throw)
that hammers the engine with state mutations while four reader threads
continuously call the three unguarded read paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.AlarmHistorian-002 — drain loop now honors exponential backoff:
StartDrainLoop arms a self-rescheduling one-shot Timer. RescheduleDrain
sets the next due-time to max(tickInterval, CurrentBackoff) while the
sink is BackingOff, so a historian outage genuinely slows the cadence
down the 1s->2s->5s->15s->60s ladder instead of hammering at the fixed
tick. Class doc-comment updated.
Core.AlarmHistorian-004 — SQLite busy handling: the connection string
is built via SqliteConnectionStringBuilder with DefaultTimeout=5, and a
new OpenConnection helper applies PRAGMA busy_timeout=5000 and
PRAGMA journal_mode=WAL on every open. A concurrent enqueue-vs-drain
file-lock collision now waits the lock out instead of failing fast with
SQLITE_BUSY. All connection open sites switched to the helper.
Core.AlarmHistorian-006 — drain-loop faults are no longer unobserved:
the timer callback (DrainTimerCallback) awaits DrainOnceAsync inside a
try/catch that logs via _logger.Error, records the message into
_lastError, and sets _drainState=BackingOff so a stalled drain is
visible on GetStatus; a finally always re-arms the timer.
Regression tests added to SqliteStoreAndForwardSinkTests:
StartDrainLoop_honors_backoff_and_slows_cadence_under_retry,
StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy,
StartDrainLoop_records_drain_fault_and_keeps_running,
Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy.
findings.md: 002/004/006 marked Resolved; open count 10 -> 7.
Build: clean (0 warnings). Tests: 20/20 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SnapshotFormatter.FormatStatus mapped four OPC UA status names to
incorrect numeric codes, mislabelling operator-facing CLI output. The
codes were corrected to their canonical OPC Foundation
Opc.Ua.StatusCodes values:
BadTimeout 0x80060000 -> 0x800A0000
BadNoCommunication 0x80070000 -> 0x80310000
BadWaitingForInitialData 0x80080000 -> 0x80320000
BadNodeIdInvalid 0x80350000 -> 0x80330000
The Cli.Common project does not reference the Opc.Ua package (only
Core.Abstractions / CliFx / Serilog), so the hex literals were
corrected in place with a sync note rather than adding a heavy new
dependency.
SnapshotFormatterTests was updated: the [Theory] expectations now use
the correct spec codes and assert the full rendered form, plus a new
regression [Theory] confirms the pre-fix wrong names no longer apply.
All 24 tests pass.
findings.md: Driver.Cli.Common-001 set to Resolved; open count 6 -> 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client.Shared-005: _activeDataSubscriptions (a plain Dictionary) and the
_activeAlarmSubscription tuple were mutated from the caller thread, the
keep-alive failover path, and DisconnectAsync with no synchronization,
risking bucket corrosion / InvalidOperationException / lost entries.
Added a dedicated _subscriptionLock and wrapped every read/write of that
bookkeeping state inside it (Subscribe/Unsubscribe[Alarms]Async,
Disconnect, Dispose, and the snapshot/clear/re-record steps of
ReplaySubscriptionsAsync). Awaited adapter calls stay outside the lock so
it is never held across I/O.
Client.Shared-006: HandleKeepAliveFailureAsync had only a non-atomic
state check guarding re-entry, so two bad keep-alives could each start a
failover loop, racing to dispose/replace _session and double-replaying
subscriptions. It now claims an atomic _failoverInProgress slot via
Interlocked.CompareExchange; a re-entrant call returns immediately. The
loop body moved to RunFailoverAsync, wrapped in try/finally that resets
the flag.
Tests: added KeepAliveFailure_ReentrantWhileFailoverInFlight_RunsFailoverOnce
and SubscribeAndUnsubscribe_ConcurrentCalls_DoNotCorruptState regression
tests; made the FakeSubscriptionAdapter / FakeSessionAdapter /
FakeSessionFactory test doubles thread-safe (and added a CreateGate hook)
so the concurrency tests exercise production locking rather than fake
state. All 138 Client.Shared tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Admin-003 — SignalR hubs were anonymously reachable: an unauthenticated
client could open /hubs/fleet, /hubs/alerts and /hubs/script-log and
stream fleet state, alert detail text and server script-log contents.
Added [Authorize] to FleetStatusHub, AlertHub and ScriptLogHub, and
chained .RequireAuthorization() onto all three MapHub() calls as a
belt-and-braces backstop.
Admin-004 — appsettings.json committed live-looking secrets (the `sa`
ConfigDb password and the LDAP ServiceAccountPassword) in plaintext.
Replaced both with empty placeholders sourced from user-secrets (dev) or
the ConnectionStrings__ConfigDb / Authentication__Ldap__ServiceAccountPassword
environment variables (prod); added a UserSecretsId to the Admin csproj
and a fail-fast guard in Program.cs when ConfigDb is empty/missing.
Admin-005 — Login.razor performed SignInAsync from an interactive Blazor
circuit, where the original HTTP response has long completed so the auth
cookie was not emitted. Rewrote it as a static-rendered plain HTML form
(data-enhance="false") posting to a new AuthEndpoints.MapAuthEndpoints()
minimal-API handler (/auth/login, /auth/logout) that does the LDAP bind,
grant resolution, cookie SignInAsync and redirect while the endpoint
still owns the response. Includes an open-redirect guard on returnUrl.
Added xUnit + Shouldly regression tests: AuthEndpointsTests (login cookie
issuance, failed-bind redirect, open-redirect rejection, logout, anonymous
hub negotiate rejection) and AppSettingsSecretHygieneTests (no committed
secrets). All 26 auth-related tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core-001: swap the authorization-cache defaults so
MembershipFreshnessInterval (5 min, inner re-resolve trigger) is
strictly less than AuthCacheMaxStaleness (15 min, fail-closed
ceiling), so NeedsRefresh's warm-refresh path is reachable.
Core-002: TriePermissionEvaluator.Authorize now compares the trie's
GenerationId against the session's AuthGenerationId and re-fetches the
session's bound generation on mismatch, failing closed when that
generation has been pruned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Configuration-001: wrap the EXEC dbo.sp_ValidateDraft call in
sp_PublishGeneration in a BEGIN TRY/CATCH ROLLBACK; THROW block so a
validation RAISERROR aborts the publish instead of being ignored.
Configuration-008: route caller-supplied strings interpolated into
ConfigAuditLog.DetailsJson through STRING_ESCAPE(@x, 'json') and emit
sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number,
closing the JSON-injection / denial-of-operation vector.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-002 — AuthorizationGate lax-mode no longer overrides explicit deny.
IsAllowed now switches on the evaluator's AuthorizationVerdict: Allow -> true,
Denied (an authored deny rule matched) -> false in BOTH strict and lax mode,
and only the indeterminate NotGranted case falls through to !_strictMode.
Previously `if (decision.IsAllowed) return true; return !_strictMode;` let lax
mode (the default) nullify authored NodeAcl deny rules for fully-resolved
sessions. The tri-state AuthorizationVerdict.Denied member is now honoured.
Server-009 — LDAP is secure-by-default. LdapOptions.AllowInsecureLdap now
defaults to false (was true) and Program.cs's config fallback reads `?? false`
(was `?? true`), so an LDAP-enabled deployment will not bind credentials over
an unencrypted socket unless an operator explicitly opts in. Program.cs also
logs a startup warning when LDAP is enabled with UseTls=false and
AllowInsecureLdap=true, flagging the clear-text server->LDAP credential hop.
Regression tests: AuthorizationGateTests covers all four verdict x mode
combinations via a fixed-verdict evaluator stub; new LdapOptionsTests asserts
the secure defaults. Both Server and Server.Tests build clean; the 15 targeted
tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ForbiddenTypeAnalyzer syntax walker only inspected four node kinds
(ObjectCreation, Invocation-with-member-access, MemberAccess, bare
Identifier), so a forbidden type named through typeof, a generic type
argument, a cast, an is/as type pattern, default(T), an array-creation
element type, or an explicitly-typed local declaration produced no
examined node and bypassed the sandbox check.
Analyze now runs a second pass that resolves GetTypeInfo on every
TypeSyntax node and recursively unwraps array element types and generic
type arguments, so forbidden types nested at any depth are rejected at
compile. The original member/call node-kind switch is kept deliberately
narrow (rather than resolving GetSymbolInfo on every node) to avoid
flagging harmless inherited members such as typeof(int).Name, whose Name
property is declared by System.Reflection.MemberInfo. A span+type dedupe
keeps the two passes from emitting duplicate rejections.
Regression tests added in ScriptSandboxTests cover typeof, generic type
arguments, casts, default(T), is/as patterns, array element types, and
typed local declarations with forbidden types, plus over-block guards
asserting allowed generics and typeof still compile.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.
EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.
This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.
Resolves code-review finding Driver.Galaxy-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every
row but events.Add was guarded by `if (evt is not null)`. A corrupt /
null-deserializing payload desynced the lists, so DrainOnceAsync applied
each outcome to the wrong RowId — an Ack could delete an un-sent event
(silent alarm-event data loss) and the corrupt row stalled the queue
head forever.
ReadBatch now returns a single list of QueueRow(long RowId,
AlarmHistorianEvent? Event) records so a rowId can never drift from its
event; deserialization is wrapped to yield null on JsonException.
DrainOnceAsync immediately dead-letters rows whose payload is
null/un-deserializable and forwards only well-formed events to the
writer, mapping outcomes by RowId.
Regression tests cover a corrupt row mid-batch and at the queue head.
Core.AlarmHistorian suite: 16/16 pass.
Resolves code-review finding Core.AlarmHistorian-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.
Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.
Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.
Resolves code-review finding Core.Scripting-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Admin-001: Routes.razor used a plain RouteView, so the page-level
[Authorize] attributes on 11 pages were inert — every page, including
mutating ones, was reachable fully unauthenticated.
Admin-002: several pages (e.g. NewCluster, which writes config rows)
carried no auth attribute at all.
- Routes.razor: RouteView → AuthorizeRouteView with NotAuthorized /
Authorizing slots; add RedirectToLogin component.
- Program.cs: SetFallbackPolicy(RequireAuthenticatedUser) — secure by
default for new pages/endpoints.
- Login.razor: [AllowAnonymous] so login stays reachable; login page,
/auth/* endpoints and static assets remain anonymous.
- Add [Authorize] to the previously un-gated pages; NewCluster gated to
the CanPublish (FleetAdmin) policy.
Regression tests in PageAuthorizationTests pin that anonymous requests
to protected/mutating routes are rejected and that login + static
assets stay anonymously reachable. Admin test suite: 210/210 pass.
Resolves code-review findings Admin-001 and Admin-002 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).
Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.
Resolves code-review finding Server-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.
334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.
Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
a transient gateway drop permanently kills the event stream.
All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adapts the code-review procedure, folder layout, template, and tooling
from the sibling mxaccessgw repo to lmxopcua.
- REVIEW-PROCESS.md: per-module review workflow — a module is one src/
or tests/ project (ZB.MOM.WW.OtOpcUa. prefix stripped); 10-category
checklist; finding IDs/severities/statuses; re-review rules.
- code-reviews/_template/findings.md: per-module findings template.
- code-reviews/regen-readme.py: generates the cross-module README.md
index from the per-module findings.md files; --check gates staleness
and consistency.
- code-reviews/test_regen_readme.py: dependency-free generator tests.
- code-reviews/prompt.md: orchestration prompt for clearing the backlog.
- code-reviews/README.md: generated index (no modules reviewed yet).
- scripts/check-code-reviews-readme.ps1: CI / pre-commit check wrapper.
Adapted to this repo: ZB.MOM.WW.OtOpcUa module naming, OtOpcUa
conventions checklist (in-process GalaxyDriver + mxaccessgw,
contained-name vs tag-name, ACL at DriverNodeManager), single .NET
solution build/test commands, and the lmxopcua design docs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.
- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
of StreamAlarms that decodes the active-alarm snapshot plus live
transitions into GalaxyAlarmTransition and reopens the stream on
transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.
Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.
The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.
Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.
Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.
The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.
Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task #19 (alarms C.1) — complete PR C.1 test coverage for the Wonderware
historian alarm-event writer (100/1000-event batching + cluster-failover).
The C.1 code structure was already on master; the live aahClientManaged
SDK binding in SdkAlarmHistorianWriteBackend remains rig-gated (D.1).
Add the spec-required 100/1000-event batching tests and cluster-failover
tests that were missing from the existing C.1 suite:
- AahClientManagedAlarmEventWriterTests: add Large_batch_all_ack_returns_all_true
(batchSize 100 + 1000) and Large_batch_alternating_outcomes_are_positionally_correct
(batchSize 100 + 1000) to satisfy the "1 / 100 / 1000 events" spec requirement;
add Backend_retry_then_succeed_simulates_cluster_failover to cover the
RetryPlease-then-Ack sequence at the IPC layer (unit-level stand-in for the
rig-gated live cluster-failover path).
- SdkAlarmHistorianWriteBackendTests (new file): unit tests that pin the
placeholder backend's RetryPlease-for-every-slot contract (preserves queued
events while D.1 is unresolved); plus two Skip("rig-required") integration
tests covering the live SDK single-event roundtrip and cluster failover via
HistorianClusterEndpointPicker — remove the Skip in PR D.1.
Feasibility note: aahClientManaged.dll IS present in lib/ and referenced in
the csproj; the SDK call site is isolated behind IAlarmHistorianWriteBackend
in SdkAlarmHistorianWriteBackend.WriteBatchAsync (single method, D.1 seam).
The full AahClientManagedAlarmEventWriter implementation was already complete.
Build: 0 errors, 0 warnings.
Tests: 64 passed, 2 skipped (rig-gated), 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.
#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.
#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gap 2 (#25): VirtualTagsTab.razor + /virtual-tags global page — list/create/toggle
virtual tags per draft generation with DataType, Script, trigger, Historize, Enabled
fields. Tab wired into DraftEditor.
Gap 3 (#26): ScriptedAlarmsTab.razor + /scripted-alarms global page — list/create
scripted alarms with AlarmType, Severity, MessageTemplate, PredicateScript,
HistorizeToAveva, Retain. SeverityBand helper shows Low/Medium/High/Critical label.
Tab wired into DraftEditor.
Gap 4 (#27): ScriptLogHub (SignalR IAsyncEnumerable stream) tails scripts-*.log with
optional ScriptName filter; ScriptLog.razor provides Start/Stop/Clear controls plus
level filter dropdown. Hub registered at /hubs/script-log in Program.cs.
Nav rail gains a "Scripting" eyebrow with entries for all three pages.
19 new unit tests for ScriptLogHub parse/filter/tail helpers (Category=Unit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Phase 7 Gap 5: VirtualTagEngine called IHistoryWriter.Record per evaluation
when Historize=true but Phase7EngineComposer always passed NullHistoryWriter, so
virtual-tag history was computed but never persisted.
The fix:
- New RingBufferHistoryWriter implements both IHistoryWriter (write port for the
evaluation pipeline) and IHistorianDataSource (read port for IHistoryRouter so
OPC UA HistoryRead on virtual-tag nodes resolves here). Maintains one bounded
ring buffer (1000 samples, configurable) per tag path; Record() is O(1) and
never blocks evaluation.
- Phase7EngineComposer.Compose now accepts IHistoryRouter? and, when any
VirtualTagDefinition.Historize=true, creates a RingBufferHistoryWriter, passes
it to VirtualTagEngine as historyWriter, adds it to the disposables list, and
registers it under the "virtual:" prefix in the router for HistoryRead dispatch.
- Phase7Composer accepts IHistoryRouter? from DI (already registered as singleton
in Program.cs) and threads it through to Phase7EngineComposer.Compose.
- NullHistoryWriter remains as fallback when no tags request historization.
- 16 new unit tests in RingBufferHistoryWriterTests.cs cover ring-buffer semantics,
eviction, per-tag isolation, ReadRawAsync windowing, IHistorianDataSource stubs,
router registration, and the Historize=false / null-router fallback paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gap 1 of phase-7-status.md. Intercepts AcknowledgeableConditionType_Acknowledge and
AcknowledgeableConditionType_Confirm calls in DriverNodeManager.Call and dispatches
them to ScriptedAlarmEngine so OPC UA HMI clients can acknowledge/confirm scripted alarms
in addition to the existing Admin UI path. Shelve methods deferred (per-instance NodeIds,
not well-known type MethodIds — follow-up task). AlarmEngine is now exposed through
Phase7ComposedSources so the server wire-up passes it to every DriverNodeManager. 13 new
unit tests cover dispatch kernel, identity fallback, batch handling, and error paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
phase-6-1-compliance.ps1 asserted CircuitBreaker.cs / Backoff.cs exist
under src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Supervisor/ — but the
Galaxy.Proxy project was retired in PR 7.2 with the legacy in-process
Galaxy stack. Circuit-breaker + backoff resilience is now the Core
pipeline (DriverResiliencePipelineBuilder, per-device-keyed), which the
same gate already asserts and passes. The two stale checks always fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).
Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
concrete test matrix (A/B/C blocks), and a step-by-step cutover
runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
worker wiring in the mxaccessgw sibling repo, documenting the
discovered AVEVA API surface, the architectural decision that blocks
A.2, the dependency order, and what each item needs to unblock
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>