Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.
Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).
Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:
- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map
Each links back to the design doc for depth. Architecture-v2 cross-references
the other three + ServiceHosting + Redundancy + security.
- Redundancy.md: full rewrite — Akka-leader-driven ServiceLevel replaces
operator-managed RedundancyRole. Documents the 5-tier ServiceLevelCalculator,
RedundancyStateActor cluster singleton, and the DPS data flow.
- ServiceHosting.md: full rewrite — single fused OtOpcUa.Host binary with
OTOPCUA_ROLES env gating. Documents the conditional DI graph and the new
health endpoints (/health/ready, /health/active, /healthz).
- security.md: v2 banner at top covering path/project renames + new JWT bearer
+ DataProtection persisted to ConfigDb. Body unchanged because the 4-concern
security model is unchanged in v2; full per-section rewrite waits for F15
(Admin pages migration) since security.md references many pages that move.
- README.md: platform overview updated to v2 (fused Host + role gating).
The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 13 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.
New decomposition (~85 min total, replacing the original 5-min estimate):
14a standard 10m Add RowVersion to live-edit entities
14b high-risk 30m Drop GenerationId FK from those entities
14c high-risk 20m Obsolete GenerationApplier/Diff/SealedCache
14d standard 5m Drop ClusterNode.RedundancyRole
14e small 5m Delete ConfigGeneration + ClusterNodeGenerationState
14f high-risk 15m Consolidator: generate V2HostingAlignment migration
Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.
Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.
Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".
Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).
Captures the brainstormed design to align OtOpcUa with ScadaLink:
single role-gated binary, Akka.NET cluster with admin/driver roles,
cluster singletons for control plane, per-node actor hierarchy for
OPC UA runtime, dual-endpoint warm redundancy preserved with
ServiceLevel driven by Akka leader, cookie+JWT auth, Traefik routing,
and ScadaLink-style live-edit + deploy model replacing the
draft/publish ConfigGeneration lifecycle.
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.
1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
Refreshed the deployment description from "three cooperating
processes (Server, Admin, Galaxy.Host)" to "two cooperating
Windows services (Server, Admin)". The legacy x86 TopShelf
Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
access now flows through the in-process Tier-A GalaxyDriver
talking gRPC to the sibling mxaccessgw gateway. Also called out
decision #30 (AddWindowsService replacing TopShelf) inline.
2. docs/VirtualTags.md:
- Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
regular compiler — Core.Scripting-008 / -016 retired the
CSharpScript/ScriptRunner path).
- Line 39: orphan-thread leak description rewritten. The
CSharp.Scripting-era "underlying ScriptRunner keeps running on
its thread-pool thread until the Roslyn runtime returns" is no
longer accurate — the new pipeline binds the script as a
regular C# Func<> delegate, so the leak is now "synchronous
CPU-bound work on a pool thread" (same operator-visible
effect, different mechanism).
3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
service"):
Annotated both the decision body and the decision-log table row
with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
replacement architecture. The original reasoning is preserved as
audit trail per the decision-log convention.
4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
Added an Implementation note describing the
Core.Scripting-008 / -016 supersession of the original
CSharpScript pipeline. The historical record stays; the note
points future readers at docs/VirtualTags.md "Compile cache"
for the current contract.
5. docs/plans/alarms-over-gateway.md "Files" section under client
regeneration:
Updated the .NET regeneration instructions to point at the new
ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
clients/dotnet/MxGateway.Client.csproj no longer exists in the
sibling repo (restructure after this plan was written) and the
vendored-binaries situation in
src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
so a reader following the plan won't chase a deleted path.
Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-012 (High, Security) resolution.
The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:
- System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
background-fanout threat Core.Scripting-003 closed against
System.Threading.Tasks.
- System.Threading.Timer — schedules unbounded callback work that
outlives the per-evaluation timeout.
- System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
Defense-in-depth gap; invocation needs reflection (already denied)
but the load itself was reachable.
Fix:
- Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
(preferred over type-granular per the recommendation so future BCL
additions to that namespace are denied by default).
- Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
to ForbiddenFullTypeNames — both live in System.Threading shared
with allowed primitives so they must be type-granular.
Regression tests added to ScriptSandboxTests:
Rejects_ThreadPool_QueueUserWorkItem_at_compile
Rejects_Timer_new_at_compile
Rejects_AssemblyLoadContext_at_compile
Docs:
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
and the Sandbox-escape compliance-check row both updated to enumerate
the new entries per the Core.Scripting-009 doc-sync convention.
Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.
Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The task-galaxy-e2e branch was merged + deleted; the durable reference
is PR #205 alone. Tidies a dangling pointer that future readers might
chase looking for a branch that no longer exists.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.
Why this matters:
On a busy line where many tags feeding many alarms change frequently,
the old BuildReadCache allocated a fresh dictionary + context on every
predicate evaluation — a steady stream of short-lived allocations the
GC eventually has to reclaim. With the reuse, the dictionary and
context are allocated once per alarm (on first evaluation) and refilled
in place across every subsequent re-eval.
Implementation:
- New private AlarmScratch class holds the reusable
Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
reference. The context observes refilled values without being
re-created.
- ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
engine, cleared in LoadAsync alongside _alarms so a config-publish
drops the prior generation's scratch (Inputs / Logger may change).
- EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
repopulate the dictionary in place, then runs the predicate against
the reused context.
- BuildReadCache removed.
Safety:
Reuse is serialised under _evalGate which guarantees no two threads
ever observe the same scratch in a half-refilled state. The
AlarmPredicateContext is bound to the scratch dictionary by reference,
so the predicate's ctx.GetTag(path) sees the freshly-refilled values
rather than a stale snapshot.
Verification:
- All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
locking the reuse contract).
- All 56 VirtualTags tests still pass (unchanged).
- All 104 Core.Scripting tests still pass (unchanged).
New tests in ScriptedAlarmEngineTests:
- Reevaluation_reuses_the_same_read_cache_dictionary — asserts
ReferenceEquals(scratch_before, scratch_after) across two
evaluations of the same alarm.
- Reevaluation_reuses_the_same_predicate_context — same, for the
context.
- LoadAsync_drops_the_prior_generations_scratch — asserts a config
publish wipes the prior scratch (so a stale Logger / Inputs can't
leak into the new generation).
Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.
Docs:
- docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
rewritten as "hot-path allocation reuse" documenting the new
contract + reuse safety reasoning + the three regression tests.
- code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
Won't Fix → Resolved.
- code-reviews/README.md regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.
Why this was needed:
Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
the default AssemblyLoadContext, which is non-collectible. Across config-
publish generations each Clear() drops dictionary entries but the emitted
assemblies stay loaded for process lifetime, so memory grows steadily on
long-running servers with frequent publishes. The accepted-limitation note
in docs/VirtualTags.md recommended scheduled restarts as the workaround;
operator feedback was that restarts are difficult, so the underlying
limitation was the right thing to fix.
Implementation:
- New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
script assembly per evaluator.
- ScriptEvaluator.Compile synthesises a wrapper class around the user source
(CompiledScript.Run(globals) — explicit return required per ordinary C#
semantics, which every existing script already uses), builds a
CSharpCompilation against the sandbox references, runs the
ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
via reflection.
- ScriptEvaluator now implements IDisposable — Dispose calls
AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
for GC at the next collection cycle.
- CompiledScriptCache.Clear() disposes every materialised evaluator before
dropping its dictionary entry; CompiledScriptCache itself is now
IDisposable for graceful server shutdown.
- ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
instead of a Roslyn ScriptOptions; references now span BCL via the
TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
ForbiddenTypeAnalyzer is the sole security gate (consistent with the
Core.Scripting-001 / -002 model — references-list-only restriction is
porous against type forwarding, so the analyzer must be the real gate).
Verification:
- All 104 Core.Scripting tests pass (was 101 — three new regression tests
locking the unload contract).
- All 56 VirtualTags tests pass (unchanged).
- All 63 ScriptedAlarms tests pass (unchanged).
- New CompiledScriptCacheTests:
- Dispose_unloads_compiled_script_assembly_load_context — proves single-
evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
- Clear_disposes_every_materialised_evaluator — proves publish-replace
releases every prior generation's ALC.
- GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
post-dispose contract.
Docs:
- docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
limitation note replaced with the unload contract + the new authoring
convention (explicit return).
- docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
restart guidance.
- code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
Won't Fix → Resolved with the implementation summary.
- code-reviews/README.md regenerated.
Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Client.UI-003: wire Serilog properly per CLAUDE.md — console sink +
rolling daily file sink in Program.Main, Log.CloseAndFlush in finally,
per-VM Log.ForContext<> loggers.
- Client.UI-004: migrate the cert-store folder picker from the obsolete
OpenFolderDialog to StorageProvider.OpenFolderPickerAsync (with
TryGetFolderFromPathAsync seed + TryGetLocalPath extraction).
- Client.UI-006: surface formerly silent catch blocks via an observable
StatusMessage on the Subscriptions / Alarms VMs that bubbles up into
the shell's status bar; soft fallbacks log at Information level so
hard failures stay distinguishable.
- Client.UI-009: docs/Client.UI.md now lists Standard Deviation in the
Aggregate row of the Query Options table.
- Client.UI-010: removed the unused MinDateTimeProperty /
MaxDateTimeProperty styled properties from DateTimeRangePicker.
- Client.UI-011: updated the cert-store TextBox watermark from the
legacy AppData/LmxOpcUaClient/pki to the canonical
AppData/OtOpcUaClient/pki.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Client.CLI-002: SubscribeCommand's neverWentBad list now requires the
node to be present in lastStatus (i.e. received at least one update)
so the 'suspect' bucket only contains observed nodes.
- Client.CLI-003: every long-running command validates numeric option
ranges (Interval / Depth / MaxDepth / Duration / Max) and throws
CliFx CommandException on out-of-range values.
- Client.CLI-004: SubscribeCommand carries XML summary docs on the
type, ctor, every [CommandOption] property, and ExecuteAsync —
matching the sibling commands' style.
- Client.CLI-006: HistoryReadCommand parses --start / --end with
InvariantCulture+UTC and surfaces FormatException as CommandException;
every NodeIdParser.ParseRequired call wraps FormatException /
ArgumentException as CommandException.
- Client.CLI-007: CommandBase.ConfigureLogging calls Log.CloseAndFlush()
before assigning a new Log.Logger so prior sinks are disposed.
- Client.CLI-008: rewrote the subscribe and historyread sections of
docs/Client.CLI.md (every flag documented, summary-bucket vocabulary,
StandardDeviation aggregate, UTC --start/--end convention).
- Client.CLI-009: SubscribeCommand / AlarmsCommand use named local
handlers and detach them via -= after UnsubscribeAsync so no
notification reaches the console after the command's output phase
ends.
- Client.CLI-010: added CommandRangeValidationTests,
EventHandlerLifecycleTests, InputValidationErrorsTests,
LoggerLifecycleTests, and SubscribeCommandSummaryTests pinning every
Low fix; FakeOpcUaClientService gained AddDiscoveredVariable +
RaiseDataChanged + BrowseResultsByParent helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.Modbus.Cli-003: ModbusCommandBase.ValidateEndpoint rejects
--port outside 1..65535, non-positive --timeout-ms, and --unit-id
outside 1..247.
- Driver.Modbus.Cli-004: wrapped SubscribeCommand's OnDataChange handler
body in a try/catch (warn-and-swallow) and serialised the console
write through a lock.
- Driver.Modbus.Cli-005: Probe / Read / Write now catch the
cancellation-during-init OperationCanceledException and print
'Cancelled.' instead of dumping a stack trace.
- Driver.Modbus.Cli-006: ProbeCommand.ComputeVerdict derives the headline
from BOTH the driver state and the probe snapshot's OPC UA quality
class so the headline can't disagree with the wire result.
- Driver.Modbus.Cli-007: docs/Driver.Modbus.Cli.md carries an explicit
'CLI scope' callout — the address-string grammar is a DriverConfig
JSON feature; the CLI takes the structured triple only.
- Driver.Modbus.Cli-008: pinned BuildOptions, ValidateEndpoint, the
region-validation guards, ComputeVerdict, and the cancellation-during-
initialize paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.AbLegacy.Cli-002: WriteCommand.Value description lists the full
true/false, 1/0, on/off, yes/no alias set.
- Driver.AbLegacy.Cli-003: SubscribeCommand serialises every WriteLine
via a per-execution consoleGate lock so the poll-thread OnDataChange
handler can't interleave with the banner.
- Driver.AbLegacy.Cli-004: dropped 'await using var driver' in favour of
a plain 'var driver' + explicit await ShutdownAsync in finally; the
driver is no longer shut down twice.
- Driver.AbLegacy.Cli-005: SubscribeCommand.IntervalMs description
carries the PollGroupEngine 250ms-floor caveat; docs/Driver.AbLegacy.Cli.md
spells out the same.
- Driver.AbLegacy.Cli-006: ProbeCommand --type now carries the short
alias 't' to match the other commands.
- Driver.AbLegacy.Cli-007: BuildOptionsTests cover the probe-disabled,
device-shape, tag-passthrough, timeout-propagation, and empty-tag-list
paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.AbCip.Cli-003: SubscribeCommand prints the 'Subscribed' banner
BEFORE wiring OnDataChange so the main thread can't interleave its
write with the poll-thread handler.
- Driver.AbCip.Cli-004: AbCipCommandBase.Timeout and SubscribeCommand
validate TimeoutMs / IntervalMs and throw CommandException on
non-positive values.
- Driver.AbCip.Cli-005: every command now calls FlushLogging() in its
finally block.
- Driver.AbCip.Cli-006: Timeout init throws NotSupportedException with a
pointer at TimeoutMs instead of silently swallowing assignments.
- Driver.AbCip.Cli-007: added AbCipCommandBaseTests covering BuildOptions
shape, probe / controller-browse / alarm toggles, host address, family
selection, tag list passthrough.
- Driver.AbCip.Cli-008: rewrote the opening paragraph in
docs/Driver.AbCip.Cli.md to credit the six-CLI roster with a pointer
at docs/DriverClis.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.Modbus.Addressing-006: broaden the catch in TryParseFamilyNative
so a future helper throwing a non-Argument/Overflow type still satisfies
the try-parse contract.
- Driver.Modbus.Addressing-007: document that the address grammar does
not carry ModbusStringByteOrder (the structured-tag path does);
add a 'Grammar scope' bullet to docs/v2/dl205.md.
- Driver.Modbus.Addressing-009: reword the ModbusModiconAddress comments
so they don't imply a leading-digit invariant the parser doesn't
enforce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core.Scripting-005: DependencyExtractor.HandleTagCall now recognises
raw-string literal paths by checking the StringLiteralExpression node
kind instead of the legacy StringLiteralToken kind.
- Core.Scripting-006: scope CompiledScriptCache failed-compile eviction
with TryRemove(KeyValuePair) so a racing retry entry is not evicted.
- Core.Scripting-008: document the per-publish assembly accretion as an
accepted limitation in docs/VirtualTags.md.
- Core.Scripting-009: enumerate the authoritative deny-list (namespace
prefixes + type-granular denies) in the Phase 7 decision-#6 entry to
match ForbiddenTypeAnalyzer.
- Core.Scripting-011: pin ScriptSandbox.Build, ScriptContext.Deadband
boundary semantics, and end-to-end factory + companion-sink
integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting
pending emissions during the gate-held section and flushing them after
release; eliminates re-entrancy deadlock the docs already promised.
- Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate /
ShelvingCheck task in _inFlight; Dispose drains the set so the engine
no longer races store writes against teardown.
- Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment>
so AppendComment is O(log n) instead of O(n).
- Core.ScriptedAlarms-010: document the deliberate input-quality
asymmetry (Uncertain drives the predicate, renders {?} in the message)
in docs/ScriptedAlarms.md and on MessageTemplate.Resolve remarks.
- Core.ScriptedAlarms-011: propagate the no-op reason through
TransitionResult.NoOp(state, reason) and log it from
ScriptedAlarmEngine.ApplyAsync.
- Core.ScriptedAlarms-009 (Won't Fix per recommendation): documented the
per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md
with a mitigation path if a future soak surfaces pressure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add #pragma warning disable xUnit1051 at the top of ContractsWireParityTests.cs.
The xUnit1051 analyser fires on MessagePack's Serialize/Deserialize overloads that
have an optional CancellationToken parameter; these are synchronous parity tests
where the token is not meaningful — the suppression is scoped to this file only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add System.Threading.Tasks to ForbiddenNamespacePrefixes so scripts
cannot use Task.Run / Parallel to spawn background work that outlives
the per-evaluation timeout. Document the unbounded-memory accepted
trade-off and the Task denial rationale in docs/VirtualTags.md (new
"Known resource limits" subsection) and cross-reference from
docs/ScriptedAlarms.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.
The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.
Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.
Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.
The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.
Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>