Commit Graph

32 Commits

Author SHA1 Message Date
Joseph Doherty
7b6ab2ec6f fix(scripting): unload compiled-script assemblies via collectible ALC
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.

Why this was needed:
  Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
  the default AssemblyLoadContext, which is non-collectible. Across config-
  publish generations each Clear() drops dictionary entries but the emitted
  assemblies stay loaded for process lifetime, so memory grows steadily on
  long-running servers with frequent publishes. The accepted-limitation note
  in docs/VirtualTags.md recommended scheduled restarts as the workaround;
  operator feedback was that restarts are difficult, so the underlying
  limitation was the right thing to fix.

Implementation:
  - New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
    script assembly per evaluator.
  - ScriptEvaluator.Compile synthesises a wrapper class around the user source
    (CompiledScript.Run(globals) — explicit return required per ordinary C#
    semantics, which every existing script already uses), builds a
    CSharpCompilation against the sandbox references, runs the
    ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
    in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
    and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
    via reflection.
  - ScriptEvaluator now implements IDisposable — Dispose calls
    AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
    for GC at the next collection cycle.
  - CompiledScriptCache.Clear() disposes every materialised evaluator before
    dropping its dictionary entry; CompiledScriptCache itself is now
    IDisposable for graceful server shutdown.
  - ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
    instead of a Roslyn ScriptOptions; references now span BCL via the
    TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
    Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
    ForbiddenTypeAnalyzer is the sole security gate (consistent with the
    Core.Scripting-001 / -002 model — references-list-only restriction is
    porous against type forwarding, so the analyzer must be the real gate).

Verification:
  - All 104 Core.Scripting tests pass (was 101 — three new regression tests
    locking the unload contract).
  - All 56 VirtualTags tests pass (unchanged).
  - All 63 ScriptedAlarms tests pass (unchanged).
  - New CompiledScriptCacheTests:
    - Dispose_unloads_compiled_script_assembly_load_context — proves single-
      evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
    - Clear_disposes_every_materialised_evaluator — proves publish-replace
      releases every prior generation's ALC.
    - GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
      post-dispose contract.

Docs:
  - docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
    limitation note replaced with the unload contract + the new authoring
    convention (explicit return).
  - docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
    restart guidance.
  - code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
    Won't Fix → Resolved with the implementation summary.
  - code-reviews/README.md regenerated.

Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:55:04 -04:00
Joseph Doherty
3f01a24b45 fix(core-virtual-tags): resolve Low code-review findings (Core.VirtualTags-004,006,007,009,010,011,013)
- Core.VirtualTags-004: CoerceResult now covers every scalar
  DriverDataType and throws on the default arm; Load rejects unsupported
  declared types.
- Core.VirtualTags-006: Subscribe/Unsub prune empty observer-list
  entries from _observers under the same lock with a reconfirm-on-add
  race guard.
- Core.VirtualTags-007: rewrote TimerTriggerScheduler so each TickGroup
  tracks an InFlight flag (Interlocked CAS); ticks that overlap a still-
  running tick for the same group are skipped + counted.
- Core.VirtualTags-009: DirectDependencies / DirectDependents return a
  shared static empty set on miss instead of allocating per call.
- Core.VirtualTags-010: corrected XML docs to reference the real engine
  symbols (OnUpstreamChange, CascadeAsync, etc.) instead of phantom types.
- Core.VirtualTags-011: Load now rejects scripts whose declared Writes
  target a non-registered virtual-tag path.
- Core.VirtualTags-013: DependencyCycleException renders SCC members as
  a set rather than a fabricated arrow-traversal edge path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:23:53 -04:00
Joseph Doherty
0a20de728d fix(core-scripting): resolve Low code-review findings (Core.Scripting-005,006,008,009,011)
- Core.Scripting-005: DependencyExtractor.HandleTagCall now recognises
  raw-string literal paths by checking the StringLiteralExpression node
  kind instead of the legacy StringLiteralToken kind.
- Core.Scripting-006: scope CompiledScriptCache failed-compile eviction
  with TryRemove(KeyValuePair) so a racing retry entry is not evicted.
- Core.Scripting-008: document the per-publish assembly accretion as an
  accepted limitation in docs/VirtualTags.md.
- Core.Scripting-009: enumerate the authoritative deny-list (namespace
  prefixes + type-granular denies) in the Phase 7 decision-#6 entry to
  match ForbiddenTypeAnalyzer.
- Core.Scripting-011: pin ScriptSandbox.Build, ScriptContext.Deadband
  boundary semantics, and end-to-end factory + companion-sink
  integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:23:42 -04:00
Joseph Doherty
99354bfaf2 fix(core-scripted-alarms): resolve Low code-review findings (Core.ScriptedAlarms-003,006,008,010,011; -009 documented)
- Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting
  pending emissions during the gate-held section and flushing them after
  release; eliminates re-entrancy deadlock the docs already promised.
- Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate /
  ShelvingCheck task in _inFlight; Dispose drains the set so the engine
  no longer races store writes against teardown.
- Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment>
  so AppendComment is O(log n) instead of O(n).
- Core.ScriptedAlarms-010: document the deliberate input-quality
  asymmetry (Uncertain drives the predicate, renders {?} in the message)
  in docs/ScriptedAlarms.md and on MessageTemplate.Resolve remarks.
- Core.ScriptedAlarms-011: propagate the no-op reason through
  TransitionResult.NoOp(state, reason) and log it from
  ScriptedAlarmEngine.ApplyAsync.
- Core.ScriptedAlarms-009 (Won't Fix per recommendation): documented the
  per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md
  with a mitigation path if a future soak surfaces pressure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:23:31 -04:00
Joseph Doherty
0da4f3b63a fix(core-alarm-historian): resolve Low code-review findings (Core.AlarmHistorian-008,011)
- Core.AlarmHistorian-008: cache queue depth in an Interlocked counter so
  EnqueueAsync no longer runs COUNT(*) on every alarm; consolidate
  DrainOnceAsync onto a single SqliteConnection per tick (purge, batch
  read, dead-letter, and outcome transaction all share it).
- Core.AlarmHistorian-011: confirm the stale Galaxy.Host XML doc
  references were already fixed under earlier commits; flip to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:26 -04:00
Joseph Doherty
b92fea15d4 fix(configuration): resolve Low code-review findings (Configuration-004,005,007,010,011)
- Configuration-004: NodePermissions stored as int to match the EF
  HasConversion<int>() in OtOpcUaConfigDbContext.ConfigureNodeAcl.
- Configuration-005: serialise LiteDbConfigCache.PutAsync so concurrent
  Put for the same (ClusterId, GenerationId) cannot duplicate rows.
- Configuration-007: rethrow OperationCanceledException from
  GenerationApplier.ApplyPass when the caller's token is cancelled.
- Configuration-010: scrub secrets and drop the full exception object
  from the ResilientConfigReader fallback warning log.
- Configuration-011: pin the previously-uncovered GenerationApplier
  cancellation and path-length / publish-validation paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:18 -04:00
Joseph Doherty
8be6afbda4 fix(core): resolve Low code-review findings (Core-004,008,009,010,011,012)
- Core-004: add ConfigureAwait(false) to DriverHost.RegisterAsync /
  UnregisterAsync / DisposeAsync.
- Core-008: rewrite the BuildAddressSpaceAsync XML doc to correctly name
  the caller (OpcUaApplicationHost.PopulateAddressSpaces) that owns the
  per-driver isolation.
- Core-009: snapshot DriverResilienceOptions once per non-idempotent write
  in CapabilityInvoker.ExecuteWriteAsync.
- Core-010: switch DriverResilienceOptions.Resolve to TryGetValue with a
  diagnostic error message when a tier table is missing a capability.
- Core-011: add an optional diagnostic callback to PermissionTrieBuilder
  so production callers can surface scope-path mismatches.
- Core-012: correct the stale WedgeDetector ctor summary and add the
  Reconnecting row to DriverHealthReport's state matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:09 -04:00
Joseph Doherty
ff2e75ab98 fix(core-abstractions): resolve Low code-review findings (Core.Abstractions-004,005,006,007,008)
- Core.Abstractions-004: guard DriverTypeRegistry.Register with a Lock so
  concurrent registrations are atomic.
- Core.Abstractions-005: narrow PollGroupEngine catch blocks to non-fatal
  exceptions, add optional onError callback, tolerate disposed-CTS races.
- Core.Abstractions-006: document the deliberate int-vs-uint asymmetry on
  IHistoryProvider.ReadEventsAsync / IHistorianDataSource.ReadEventsAsync.
- Core.Abstractions-007: pin the gaps with PollGroupEngine + DriverHealth
  contract tests.
- Core.Abstractions-008: correct XML docs on DriverHealth.LastError and
  the optional / required asymmetry on the history-read surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:37:54 -04:00
Joseph Doherty
6d520c6756 fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-005)
Status fields (_lastDrainUtc, _lastSuccessUtc, _lastError, _drainState,
_evictedCount) were written by the drain timer thread and read by
GetStatus() / health-check threads with no memory barrier, risking torn
DateTime? reads and stale DrainState observations.

- Added _statusLock object; all writes to status fields now happen inside
  lock(_statusLock) blocks in DrainOnceAsync and DrainTimerCallback.
- GetStatus() snapshots all fields atomically under the same lock so the
  Admin UI / /healthz endpoint always sees a consistent view.
- Regression test GetStatus_snapshot_is_consistent_under_concurrent_drain
  drives status writes and reads from concurrent threads; asserts no throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:31 -04:00
Joseph Doherty
a17de80cdb fix(scripting): resolve Medium code-review finding (Core.Scripting-010)
Add ScriptSandboxTests cases for all forbidden-namespace deny-list
vectors that lacked test coverage: System.Threading.Thread,
System.Threading.Tasks.Task.Run (newly denied per Core.Scripting-003),
System.Runtime.InteropServices.Marshal, and Microsoft.Win32.Registry.
The 001/002 type-granular and node-form vectors were already covered by
the -001/-002 resolution commits. All 79 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:29 -04:00
Joseph Doherty
a6de04a297 fix(scripting): resolve Medium code-review finding (Core.Scripting-007)
In TimedScriptEvaluator.RunAsync, the catch (TimeoutException) block
now checks ct.IsCancellationRequested before throwing
ScriptTimeoutException, so a caller cancellation that races a timeout
deterministically surfaces as OperationCanceledException regardless of
which WaitAsync observes first. Regression test
Caller_cancellation_wins_even_when_timeout_fires_first added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:20 -04:00
Joseph Doherty
2c571001ca fix(scripting): resolve Medium code-review finding (Core.Scripting-004)
DependencyExtractor.VisitInvocationExpression now additionally checks
that the member-access receiver is the identifier "ctx" before treating
a GetTag / SetVirtualTag call as a ScriptContext dependency. This
prevents spurious dependencies when a script defines a local helper type
with a matching method name and calls it as other.GetTag("X"). Test
Ignores_member_access_GetTag_on_non_ctx_receiver added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:12 -04:00
Joseph Doherty
3d8c285034 fix(virtual-tags): resolve Medium code-review findings (Core.VirtualTags-002, -003, -005, -008, -012)
Core.VirtualTags-002: cold-start guard publishes BadWaitingForInitialData
instead of silently returning a stale value.
Core.VirtualTags-003: Load detects duplicate Path values and keys the
upstream-subscription loop off the registered tag set.
Core.VirtualTags-005: VirtualTagSource fires the initial-data callback per
path before registering the change observer, fixing an ordering race.
Core.VirtualTags-008: DependencyGraph caches topological rank, lowering
per-change-event cost from O(V+E) to O(closure).
Core.VirtualTags-012: added 9 engine tests; CoerceResult null-return now
maps to BadInternalError as the code comment intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:31:49 -04:00
Joseph Doherty
11612900ba fix(core-abstractions): resolve Medium code-review findings (Core.Abstractions-001, -002, -003)
Core.Abstractions-001: PollGroupEngine compares array values with structural
equality so a driver returning a fresh T[] each poll no longer fires spuriously.
Core.Abstractions-002: PollOnceAsync guards reader result cardinality and
throws a descriptive InvalidOperationException on mismatch instead of a
swallowed ArgumentOutOfRangeException that stalled the subscription.
Core.Abstractions-003: the poll loop Task is tracked; Unsubscribe/DisposeAsync
await loop completion before disposing the CTS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:29:49 -04:00
Joseph Doherty
69994f9cf6 fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-012)
Add engine-level tests covering the six gaps identified in the finding:
(1) timed-shelve auto-expiry driven via injectable clock + RunShelvingCheckForTest
    hook so timer tests are deterministic;
(2) ConfirmAsync, TimedShelveAsync/UnshelveAsync round-trip, EnableAsync engine
    methods exercised end-to-end;
(3) OnEvent subscriber-throws isolation — engine state advances and stays
    operational after a subscriber throws;
(4) IAlarmStateStore.SaveAsync failure leaves in-memory state unchanged (locks in
    the persist-before-update invariant from finding-007);
(5) second LoadAsync does not leak the old timer (regression for finding-002);
(6) AreInputsReady cold-start guard correctly blocks on Bad/missing inputs and
    allows Uncertain-quality inputs through.

Expose RunShelvingCheckForTest() internal method on ScriptedAlarmEngine to
support deterministic timer tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:19 -04:00
Joseph Doherty
ce86deca62 fix(core): resolve Medium code-review finding (Core-007)
SubscribeAsync now wraps each driver handle in a private HostBoundHandle
that carries the resolved host name.  UnsubscribeAsync unwraps it and
routes through the recorded host's resilience pipeline, correctly
charging the subscription's originating host's circuit breaker/bulkhead
instead of always using the default host.  Falls back to the default
host for handles not created by this invoker.  Two regression tests
added; update findings.md Open count from 10 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:17 -04:00
Joseph Doherty
6cec98caef fix(core): resolve Medium code-review finding (Core-006)
BuildAddressSpaceAsync now checks _disposed (throws ObjectDisposedException)
and tears down the previous alarm forwarder + clears the sink registry
before re-walking, so a Galaxy-redeploy rebuild does not leak the old
forwarder and double-deliver alarm transitions.  Three regression tests
added: double-build does not double-fire, sink count is correct after
rebuild, and post-dispose call throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:08 -04:00
Joseph Doherty
debe163f4d fix(core): resolve Medium code-review finding (Core-005)
Change ClusterEntry from sealed record to sealed class so TryUpdate
uses reference equality for the CAS comparison.  Prune now uses a
read-compute-TryUpdate retry loop that restarts when a concurrent
Install updates the entry between the read and the write, preventing
a race that could silently drop the just-installed newest generation.
Two regression tests added to PermissionTrieCacheTests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:23:52 -04:00
Joseph Doherty
09cd579220 fix(core): resolve Medium code-review finding (Core-003)
Add FolderSegment member to NodeAclScopeKind; update WalkSystemPlatform
to report NodeAclScopeKind.FolderSegment (not Equipment) for each
visited Galaxy folder level, so MatchedGrant.Scope in
AuthorizationDecision.Provenance correctly distinguishes Galaxy folder
grants from UNS Equipment grants in the audit trail and Admin UI
diagnostics.  Three regression tests added to PermissionTrieTests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:23:45 -04:00
Joseph Doherty
c126fc7a7d fix(configuration): resolve Medium code-review findings (Configuration-002, -003, -006, -009)
Configuration-002: sp_PublishGeneration is transaction-nesting aware
(BEGIN TRANSACTION vs SAVE TRANSACTION on @@TRANCOUNT) so a caller's outer
transaction survives a publish failure; sp_ValidateDraft wrapped in TRY/CATCH.
Configuration-003: ValidatePathLength uses the cluster's actual Enterprise/Site
lengths when available, falling back to the conservative approximation.
Configuration-006: ResilientConfigReader treats a command-timeout
TaskCanceledException as a fault (not caller cancellation) and falls back.
Configuration-009: removed the checked-in plaintext sa connection string;
CreateDbContext now requires OTOPCUA_CONFIG_CONNECTION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:13:27 -04:00
Joseph Doherty
40f06314fb test(virtual-tags): unbreak script-timeout test after sandbox deny-list change
The Timeout_maps_to_BadInternalError_without_killing_the_engine test's
"Hang" script busy-looped on Environment.TickCount64. Commit cfb9ff1
(Core.Scripting-001) added System.Environment to the script-sandbox
deny-list, so the script now fails sandbox validation instead of
reaching the timeout path. Switch the busy-loop to DateTime.UtcNow
(an allowed type) to preserve the test's intent — a self-terminating
~5s hang that overruns the 30ms script timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:06:27 -04:00
Joseph Doherty
66e8bfbab3 fix(virtual-tags): resolve High code-review finding (Core.VirtualTags-001)
OnScriptSetVirtualTag updated the value cache, notified observers, and
recorded history for the written path but never scheduled a cascade for
tags depending on that path. This contradicts docs/VirtualTags.md, which
states ctx.SetVirtualTag writes "still participate in change-trigger
cascades": a change-triggered virtual tag reading a script-written tag
went stale until an unrelated trigger fired.

OnScriptSetVirtualTag now launches a fire-and-forget CascadeAsync for the
written path, mirroring OnUpstreamChange. The cascade is scheduled rather
than invoked inline because the callback runs inside EvaluateInternalAsync
while the non-reentrant _evalGate semaphore is held.

Added regression test
SetVirtualTag_within_script_cascades_to_dependents_of_the_written_tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:40 -04:00
Joseph Doherty
e3f8fa535a fix(scripted-alarms): resolve High code-review finding (Core.ScriptedAlarms-001)
_alarms was a plain Dictionary<string, AlarmState> mutated under the
_evalGate semaphore, but four read paths (GetState, GetAllStates, the
LoadedAlarmIds property, and RunShelvingCheck) touched it from arbitrary
threads with no synchronisation. A Dictionary read concurrent with a
writer's entry reassignment can throw InvalidOperationException or return
torn state.

Switched _alarms to ConcurrentDictionary<string, AlarmState>. The only
write shapes are indexer-set and Clear, both atomic on ConcurrentDictionary,
so all mutations stay correct without further change; reads now get safe
snapshot semantics. LoadedAlarmIds materialises the key snapshot to keep
its IReadOnlyCollection<string> return type. This matches _valueCache,
which is already a ConcurrentDictionary.

Added a regression test (Concurrent_reads_during_mutation_do_not_throw)
that hammers the engine with state mutations while four reader threads
continuously call the three unguarded read paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:40 -04:00
Joseph Doherty
4638366b77 fix(alarm-historian): resolve High code-review findings (Core.AlarmHistorian-002, -004, -006)
Core.AlarmHistorian-002 — drain loop now honors exponential backoff:
StartDrainLoop arms a self-rescheduling one-shot Timer. RescheduleDrain
sets the next due-time to max(tickInterval, CurrentBackoff) while the
sink is BackingOff, so a historian outage genuinely slows the cadence
down the 1s->2s->5s->15s->60s ladder instead of hammering at the fixed
tick. Class doc-comment updated.

Core.AlarmHistorian-004 — SQLite busy handling: the connection string
is built via SqliteConnectionStringBuilder with DefaultTimeout=5, and a
new OpenConnection helper applies PRAGMA busy_timeout=5000 and
PRAGMA journal_mode=WAL on every open. A concurrent enqueue-vs-drain
file-lock collision now waits the lock out instead of failing fast with
SQLITE_BUSY. All connection open sites switched to the helper.

Core.AlarmHistorian-006 — drain-loop faults are no longer unobserved:
the timer callback (DrainTimerCallback) awaits DrainOnceAsync inside a
try/catch that logs via _logger.Error, records the message into
_lastError, and sets _drainState=BackingOff so a stalled drain is
visible on GetStatus; a finally always re-arms the timer.

Regression tests added to SqliteStoreAndForwardSinkTests:
StartDrainLoop_honors_backoff_and_slows_cadence_under_retry,
StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy,
StartDrainLoop_records_drain_fault_and_keeps_running,
Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy.

findings.md: 002/004/006 marked Resolved; open count 10 -> 7.

Build: clean (0 warnings). Tests: 20/20 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:39 -04:00
Joseph Doherty
abbf49141c fix(core): resolve High code-review findings (Core-001, Core-002)
Core-001: swap the authorization-cache defaults so
MembershipFreshnessInterval (5 min, inner re-resolve trigger) is
strictly less than AuthCacheMaxStaleness (15 min, fail-closed
ceiling), so NeedsRefresh's warm-refresh path is reachable.

Core-002: TriePermissionEvaluator.Authorize now compares the trie's
GenerationId against the session's AuthGenerationId and re-fetches the
session's bound generation on mismatch, failing closed when that
generation has been pruned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:13:01 -04:00
Joseph Doherty
ee51878c08 fix(configuration): resolve High code-review findings (Configuration-001, Configuration-008)
Configuration-001: wrap the EXEC dbo.sp_ValidateDraft call in
sp_PublishGeneration in a BEGIN TRY/CATCH ROLLBACK; THROW block so a
validation RAISERROR aborts the publish instead of being ignored.

Configuration-008: route caller-supplied strings interpolated into
ConfigAuditLog.DetailsJson through STRING_ESCAPE(@x, 'json') and emit
sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number,
closing the JSON-injection / denial-of-operation vector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:12:00 -04:00
Joseph Doherty
7bb21c2aa2 fix(scripting): resolve High code-review finding (Core.Scripting-002)
The ForbiddenTypeAnalyzer syntax walker only inspected four node kinds
(ObjectCreation, Invocation-with-member-access, MemberAccess, bare
Identifier), so a forbidden type named through typeof, a generic type
argument, a cast, an is/as type pattern, default(T), an array-creation
element type, or an explicitly-typed local declaration produced no
examined node and bypassed the sandbox check.

Analyze now runs a second pass that resolves GetTypeInfo on every
TypeSyntax node and recursively unwraps array element types and generic
type arguments, so forbidden types nested at any depth are rejected at
compile. The original member/call node-kind switch is kept deliberately
narrow (rather than resolving GetSymbolInfo on every node) to avoid
flagging harmless inherited members such as typeof(int).Name, whose Name
property is declared by System.Reflection.MemberInfo. A span+type dedupe
keeps the two passes from emitting duplicate rejections.

Regression tests added in ScriptSandboxTests cover typeof, generic type
arguments, casts, default(T), is/as patterns, array element types, and
typed local declarations with forbidden types, plus over-block guards
asserting allowed generics and typeof still compile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:08:08 -04:00
Joseph Doherty
796871c210 fix(alarm-historian): keep queue rows aligned to events on drain (Core.AlarmHistorian-001)
ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every
row but events.Add was guarded by `if (evt is not null)`. A corrupt /
null-deserializing payload desynced the lists, so DrainOnceAsync applied
each outcome to the wrong RowId — an Ack could delete an un-sent event
(silent alarm-event data loss) and the corrupt row stalled the queue
head forever.

ReadBatch now returns a single list of QueueRow(long RowId,
AlarmHistorianEvent? Event) records so a rowId can never drift from its
event; deserialization is wrapped to yield null on JsonException.
DrainOnceAsync immediately dead-letters rows whose payload is
null/un-deserializable and forwards only well-formed events to the
writer, mapping outcomes by RowId.

Regression tests cover a corrupt row mid-batch and at the queue head.
Core.AlarmHistorian suite: 16/16 pass.

Resolves code-review finding Core.AlarmHistorian-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:20 -04:00
Joseph Doherty
cfb9ff1032 fix(scripting): block dangerous System types in the script sandbox (Core.Scripting-001)
ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.

Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.

Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.

Resolves code-review finding Core.Scripting-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:08 -04:00
Joseph Doherty
392b219233 fix(tests): stabilize three flaky tests under parallel full-solution load
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.

#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.

#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:59:00 -04:00
Joseph Doherty
41da84293a fix(tests): point DB-test fixture defaults at the migrated SQL host
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).

Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:12:20 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00