Core.VirtualTags-002: cold-start guard publishes BadWaitingForInitialData
instead of silently returning a stale value.
Core.VirtualTags-003: Load detects duplicate Path values and keys the
upstream-subscription loop off the registered tag set.
Core.VirtualTags-005: VirtualTagSource fires the initial-data callback per
path before registering the change observer, fixing an ordering race.
Core.VirtualTags-008: DependencyGraph caches topological rank, lowering
per-change-event cost from O(V+E) to O(closure).
Core.VirtualTags-012: added 9 engine tests; CoerceResult null-return now
maps to BadInternalError as the code comment intended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Abstractions-001: PollGroupEngine compares array values with structural
equality so a driver returning a fresh T[] each poll no longer fires spuriously.
Core.Abstractions-002: PollOnceAsync guards reader result cardinality and
throws a descriptive InvalidOperationException on mismatch instead of a
swallowed ArgumentOutOfRangeException that stalled the subscription.
Core.Abstractions-003: the poll loop Task is tracked; Unsubscribe/DisposeAsync
await loop completion before disposing the CTS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add engine-level tests covering the six gaps identified in the finding:
(1) timed-shelve auto-expiry driven via injectable clock + RunShelvingCheckForTest
hook so timer tests are deterministic;
(2) ConfirmAsync, TimedShelveAsync/UnshelveAsync round-trip, EnableAsync engine
methods exercised end-to-end;
(3) OnEvent subscriber-throws isolation — engine state advances and stays
operational after a subscriber throws;
(4) IAlarmStateStore.SaveAsync failure leaves in-memory state unchanged (locks in
the persist-before-update invariant from finding-007);
(5) second LoadAsync does not leak the old timer (regression for finding-002);
(6) AreInputsReady cold-start guard correctly blocks on Bad/missing inputs and
allows Uncertain-quality inputs through.
Expose RunShelvingCheckForTest() internal method on ScriptedAlarmEngine to
support deterministic timer tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SubscribeAsync now wraps each driver handle in a private HostBoundHandle
that carries the resolved host name. UnsubscribeAsync unwraps it and
routes through the recorded host's resilience pipeline, correctly
charging the subscription's originating host's circuit breaker/bulkhead
instead of always using the default host. Falls back to the default
host for handles not created by this invoker. Two regression tests
added; update findings.md Open count from 10 to 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BuildAddressSpaceAsync now checks _disposed (throws ObjectDisposedException)
and tears down the previous alarm forwarder + clears the sink registry
before re-walking, so a Galaxy-redeploy rebuild does not leak the old
forwarder and double-deliver alarm transitions. Three regression tests
added: double-build does not double-fire, sink count is correct after
rebuild, and post-dispose call throws.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Change ClusterEntry from sealed record to sealed class so TryUpdate
uses reference equality for the CAS comparison. Prune now uses a
read-compute-TryUpdate retry loop that restarts when a concurrent
Install updates the entry between the read and the write, preventing
a race that could silently drop the just-installed newest generation.
Two regression tests added to PermissionTrieCacheTests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add FolderSegment member to NodeAclScopeKind; update WalkSystemPlatform
to report NodeAclScopeKind.FolderSegment (not Equipment) for each
visited Galaxy folder level, so MatchedGrant.Scope in
AuthorizationDecision.Provenance correctly distinguishes Galaxy folder
grants from UNS Equipment grants in the audit trail and Admin UI
diagnostics. Three regression tests added to PermissionTrieTests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Configuration-002: sp_PublishGeneration is transaction-nesting aware
(BEGIN TRANSACTION vs SAVE TRANSACTION on @@TRANCOUNT) so a caller's outer
transaction survives a publish failure; sp_ValidateDraft wrapped in TRY/CATCH.
Configuration-003: ValidatePathLength uses the cluster's actual Enterprise/Site
lengths when available, falling back to the conservative approximation.
Configuration-006: ResilientConfigReader treats a command-timeout
TaskCanceledException as a fault (not caller cancellation) and falls back.
Configuration-009: removed the checked-in plaintext sa connection string;
CreateDbContext now requires OTOPCUA_CONFIG_CONNECTION.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Timeout_maps_to_BadInternalError_without_killing_the_engine test's
"Hang" script busy-looped on Environment.TickCount64. Commit cfb9ff1
(Core.Scripting-001) added System.Environment to the script-sandbox
deny-list, so the script now fails sandbox validation instead of
reaching the timeout path. Switch the busy-loop to DateTime.UtcNow
(an allowed type) to preserve the test's intent — a self-terminating
~5s hang that overruns the 30ms script timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OnScriptSetVirtualTag updated the value cache, notified observers, and
recorded history for the written path but never scheduled a cascade for
tags depending on that path. This contradicts docs/VirtualTags.md, which
states ctx.SetVirtualTag writes "still participate in change-trigger
cascades": a change-triggered virtual tag reading a script-written tag
went stale until an unrelated trigger fired.
OnScriptSetVirtualTag now launches a fire-and-forget CascadeAsync for the
written path, mirroring OnUpstreamChange. The cascade is scheduled rather
than invoked inline because the callback runs inside EvaluateInternalAsync
while the non-reentrant _evalGate semaphore is held.
Added regression test
SetVirtualTag_within_script_cascades_to_dependents_of_the_written_tag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_alarms was a plain Dictionary<string, AlarmState> mutated under the
_evalGate semaphore, but four read paths (GetState, GetAllStates, the
LoadedAlarmIds property, and RunShelvingCheck) touched it from arbitrary
threads with no synchronisation. A Dictionary read concurrent with a
writer's entry reassignment can throw InvalidOperationException or return
torn state.
Switched _alarms to ConcurrentDictionary<string, AlarmState>. The only
write shapes are indexer-set and Clear, both atomic on ConcurrentDictionary,
so all mutations stay correct without further change; reads now get safe
snapshot semantics. LoadedAlarmIds materialises the key snapshot to keep
its IReadOnlyCollection<string> return type. This matches _valueCache,
which is already a ConcurrentDictionary.
Added a regression test (Concurrent_reads_during_mutation_do_not_throw)
that hammers the engine with state mutations while four reader threads
continuously call the three unguarded read paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.AlarmHistorian-002 — drain loop now honors exponential backoff:
StartDrainLoop arms a self-rescheduling one-shot Timer. RescheduleDrain
sets the next due-time to max(tickInterval, CurrentBackoff) while the
sink is BackingOff, so a historian outage genuinely slows the cadence
down the 1s->2s->5s->15s->60s ladder instead of hammering at the fixed
tick. Class doc-comment updated.
Core.AlarmHistorian-004 — SQLite busy handling: the connection string
is built via SqliteConnectionStringBuilder with DefaultTimeout=5, and a
new OpenConnection helper applies PRAGMA busy_timeout=5000 and
PRAGMA journal_mode=WAL on every open. A concurrent enqueue-vs-drain
file-lock collision now waits the lock out instead of failing fast with
SQLITE_BUSY. All connection open sites switched to the helper.
Core.AlarmHistorian-006 — drain-loop faults are no longer unobserved:
the timer callback (DrainTimerCallback) awaits DrainOnceAsync inside a
try/catch that logs via _logger.Error, records the message into
_lastError, and sets _drainState=BackingOff so a stalled drain is
visible on GetStatus; a finally always re-arms the timer.
Regression tests added to SqliteStoreAndForwardSinkTests:
StartDrainLoop_honors_backoff_and_slows_cadence_under_retry,
StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy,
StartDrainLoop_records_drain_fault_and_keeps_running,
Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy.
findings.md: 002/004/006 marked Resolved; open count 10 -> 7.
Build: clean (0 warnings). Tests: 20/20 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core-001: swap the authorization-cache defaults so
MembershipFreshnessInterval (5 min, inner re-resolve trigger) is
strictly less than AuthCacheMaxStaleness (15 min, fail-closed
ceiling), so NeedsRefresh's warm-refresh path is reachable.
Core-002: TriePermissionEvaluator.Authorize now compares the trie's
GenerationId against the session's AuthGenerationId and re-fetches the
session's bound generation on mismatch, failing closed when that
generation has been pruned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Configuration-001: wrap the EXEC dbo.sp_ValidateDraft call in
sp_PublishGeneration in a BEGIN TRY/CATCH ROLLBACK; THROW block so a
validation RAISERROR aborts the publish instead of being ignored.
Configuration-008: route caller-supplied strings interpolated into
ConfigAuditLog.DetailsJson through STRING_ESCAPE(@x, 'json') and emit
sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number,
closing the JSON-injection / denial-of-operation vector.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ForbiddenTypeAnalyzer syntax walker only inspected four node kinds
(ObjectCreation, Invocation-with-member-access, MemberAccess, bare
Identifier), so a forbidden type named through typeof, a generic type
argument, a cast, an is/as type pattern, default(T), an array-creation
element type, or an explicitly-typed local declaration produced no
examined node and bypassed the sandbox check.
Analyze now runs a second pass that resolves GetTypeInfo on every
TypeSyntax node and recursively unwraps array element types and generic
type arguments, so forbidden types nested at any depth are rejected at
compile. The original member/call node-kind switch is kept deliberately
narrow (rather than resolving GetSymbolInfo on every node) to avoid
flagging harmless inherited members such as typeof(int).Name, whose Name
property is declared by System.Reflection.MemberInfo. A span+type dedupe
keeps the two passes from emitting duplicate rejections.
Regression tests added in ScriptSandboxTests cover typeof, generic type
arguments, casts, default(T), is/as patterns, array element types, and
typed local declarations with forbidden types, plus over-block guards
asserting allowed generics and typeof still compile.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every
row but events.Add was guarded by `if (evt is not null)`. A corrupt /
null-deserializing payload desynced the lists, so DrainOnceAsync applied
each outcome to the wrong RowId — an Ack could delete an un-sent event
(silent alarm-event data loss) and the corrupt row stalled the queue
head forever.
ReadBatch now returns a single list of QueueRow(long RowId,
AlarmHistorianEvent? Event) records so a rowId can never drift from its
event; deserialization is wrapped to yield null on JsonException.
DrainOnceAsync immediately dead-letters rows whose payload is
null/un-deserializable and forwards only well-formed events to the
writer, mapping outcomes by RowId.
Regression tests cover a corrupt row mid-batch and at the queue head.
Core.AlarmHistorian suite: 16/16 pass.
Resolves code-review finding Core.AlarmHistorian-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.
Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.
Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.
Resolves code-review finding Core.Scripting-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.
#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.
#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).
Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.
- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
integration, install).
Build green (0 errors); unit tests pass. Docs left for a separate pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>