The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 13 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.
New decomposition (~85 min total, replacing the original 5-min estimate):
14a standard 10m Add RowVersion to live-edit entities
14b high-risk 30m Drop GenerationId FK from those entities
14c high-risk 20m Obsolete GenerationApplier/Diff/SealedCache
14d standard 5m Drop ClusterNode.RedundancyRole
14e small 5m Delete ConfigGeneration + ClusterNodeGenerationState
14f high-risk 15m Consolidator: generate V2HostingAlignment migration
Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.
Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.
Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".
Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).
Captures the brainstormed design to align OtOpcUa with ScadaLink:
single role-gated binary, Akka.NET cluster with admin/driver roles,
cluster singletons for control plane, per-node actor hierarchy for
OPC UA runtime, dual-endpoint warm redundancy preserved with
ServiceLevel driven by Akka leader, cookie+JWT auth, Traefik routing,
and ScadaLink-style live-edit + deploy model replacing the
draft/publish ConfigGeneration lifecycle.
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.
1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
Refreshed the deployment description from "three cooperating
processes (Server, Admin, Galaxy.Host)" to "two cooperating
Windows services (Server, Admin)". The legacy x86 TopShelf
Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
access now flows through the in-process Tier-A GalaxyDriver
talking gRPC to the sibling mxaccessgw gateway. Also called out
decision #30 (AddWindowsService replacing TopShelf) inline.
2. docs/VirtualTags.md:
- Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
regular compiler — Core.Scripting-008 / -016 retired the
CSharpScript/ScriptRunner path).
- Line 39: orphan-thread leak description rewritten. The
CSharp.Scripting-era "underlying ScriptRunner keeps running on
its thread-pool thread until the Roslyn runtime returns" is no
longer accurate — the new pipeline binds the script as a
regular C# Func<> delegate, so the leak is now "synchronous
CPU-bound work on a pool thread" (same operator-visible
effect, different mechanism).
3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
service"):
Annotated both the decision body and the decision-log table row
with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
replacement architecture. The original reasoning is preserved as
audit trail per the decision-log convention.
4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
Added an Implementation note describing the
Core.Scripting-008 / -016 supersession of the original
CSharpScript pipeline. The historical record stays; the note
points future readers at docs/VirtualTags.md "Compile cache"
for the current contract.
5. docs/plans/alarms-over-gateway.md "Files" section under client
regeneration:
Updated the .NET regeneration instructions to point at the new
ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
clients/dotnet/MxGateway.Client.csproj no longer exists in the
sibling repo (restructure after this plan was written) and the
vendored-binaries situation in
src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
so a reader following the plan won't chase a deleted path.
Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-012 (High, Security) resolution.
The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:
- System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
background-fanout threat Core.Scripting-003 closed against
System.Threading.Tasks.
- System.Threading.Timer — schedules unbounded callback work that
outlives the per-evaluation timeout.
- System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
Defense-in-depth gap; invocation needs reflection (already denied)
but the load itself was reachable.
Fix:
- Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
(preferred over type-granular per the recommendation so future BCL
additions to that namespace are denied by default).
- Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
to ForbiddenFullTypeNames — both live in System.Threading shared
with allowed primitives so they must be type-granular.
Regression tests added to ScriptSandboxTests:
Rejects_ThreadPool_QueueUserWorkItem_at_compile
Rejects_Timer_new_at_compile
Rejects_AssemblyLoadContext_at_compile
Docs:
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
and the Sandbox-escape compliance-check row both updated to enumerate
the new entries per the Core.Scripting-009 doc-sync convention.
Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.
Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The task-galaxy-e2e branch was merged + deleted; the durable reference
is PR #205 alone. Tidies a dangling pointer that future readers might
chase looking for a branch that no longer exists.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.
Why this matters:
On a busy line where many tags feeding many alarms change frequently,
the old BuildReadCache allocated a fresh dictionary + context on every
predicate evaluation — a steady stream of short-lived allocations the
GC eventually has to reclaim. With the reuse, the dictionary and
context are allocated once per alarm (on first evaluation) and refilled
in place across every subsequent re-eval.
Implementation:
- New private AlarmScratch class holds the reusable
Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
reference. The context observes refilled values without being
re-created.
- ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
engine, cleared in LoadAsync alongside _alarms so a config-publish
drops the prior generation's scratch (Inputs / Logger may change).
- EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
repopulate the dictionary in place, then runs the predicate against
the reused context.
- BuildReadCache removed.
Safety:
Reuse is serialised under _evalGate which guarantees no two threads
ever observe the same scratch in a half-refilled state. The
AlarmPredicateContext is bound to the scratch dictionary by reference,
so the predicate's ctx.GetTag(path) sees the freshly-refilled values
rather than a stale snapshot.
Verification:
- All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
locking the reuse contract).
- All 56 VirtualTags tests still pass (unchanged).
- All 104 Core.Scripting tests still pass (unchanged).
New tests in ScriptedAlarmEngineTests:
- Reevaluation_reuses_the_same_read_cache_dictionary — asserts
ReferenceEquals(scratch_before, scratch_after) across two
evaluations of the same alarm.
- Reevaluation_reuses_the_same_predicate_context — same, for the
context.
- LoadAsync_drops_the_prior_generations_scratch — asserts a config
publish wipes the prior scratch (so a stale Logger / Inputs can't
leak into the new generation).
Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.
Docs:
- docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
rewritten as "hot-path allocation reuse" documenting the new
contract + reuse safety reasoning + the three regression tests.
- code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
Won't Fix → Resolved.
- code-reviews/README.md regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.
Why this was needed:
Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
the default AssemblyLoadContext, which is non-collectible. Across config-
publish generations each Clear() drops dictionary entries but the emitted
assemblies stay loaded for process lifetime, so memory grows steadily on
long-running servers with frequent publishes. The accepted-limitation note
in docs/VirtualTags.md recommended scheduled restarts as the workaround;
operator feedback was that restarts are difficult, so the underlying
limitation was the right thing to fix.
Implementation:
- New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
script assembly per evaluator.
- ScriptEvaluator.Compile synthesises a wrapper class around the user source
(CompiledScript.Run(globals) — explicit return required per ordinary C#
semantics, which every existing script already uses), builds a
CSharpCompilation against the sandbox references, runs the
ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
via reflection.
- ScriptEvaluator now implements IDisposable — Dispose calls
AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
for GC at the next collection cycle.
- CompiledScriptCache.Clear() disposes every materialised evaluator before
dropping its dictionary entry; CompiledScriptCache itself is now
IDisposable for graceful server shutdown.
- ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
instead of a Roslyn ScriptOptions; references now span BCL via the
TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
ForbiddenTypeAnalyzer is the sole security gate (consistent with the
Core.Scripting-001 / -002 model — references-list-only restriction is
porous against type forwarding, so the analyzer must be the real gate).
Verification:
- All 104 Core.Scripting tests pass (was 101 — three new regression tests
locking the unload contract).
- All 56 VirtualTags tests pass (unchanged).
- All 63 ScriptedAlarms tests pass (unchanged).
- New CompiledScriptCacheTests:
- Dispose_unloads_compiled_script_assembly_load_context — proves single-
evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
- Clear_disposes_every_materialised_evaluator — proves publish-replace
releases every prior generation's ALC.
- GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
post-dispose contract.
Docs:
- docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
limitation note replaced with the unload contract + the new authoring
convention (explicit return).
- docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
restart guidance.
- code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
Won't Fix → Resolved with the implementation summary.
- code-reviews/README.md regenerated.
Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Client.UI-003: wire Serilog properly per CLAUDE.md — console sink +
rolling daily file sink in Program.Main, Log.CloseAndFlush in finally,
per-VM Log.ForContext<> loggers.
- Client.UI-004: migrate the cert-store folder picker from the obsolete
OpenFolderDialog to StorageProvider.OpenFolderPickerAsync (with
TryGetFolderFromPathAsync seed + TryGetLocalPath extraction).
- Client.UI-006: surface formerly silent catch blocks via an observable
StatusMessage on the Subscriptions / Alarms VMs that bubbles up into
the shell's status bar; soft fallbacks log at Information level so
hard failures stay distinguishable.
- Client.UI-009: docs/Client.UI.md now lists Standard Deviation in the
Aggregate row of the Query Options table.
- Client.UI-010: removed the unused MinDateTimeProperty /
MaxDateTimeProperty styled properties from DateTimeRangePicker.
- Client.UI-011: updated the cert-store TextBox watermark from the
legacy AppData/LmxOpcUaClient/pki to the canonical
AppData/OtOpcUaClient/pki.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Client.CLI-002: SubscribeCommand's neverWentBad list now requires the
node to be present in lastStatus (i.e. received at least one update)
so the 'suspect' bucket only contains observed nodes.
- Client.CLI-003: every long-running command validates numeric option
ranges (Interval / Depth / MaxDepth / Duration / Max) and throws
CliFx CommandException on out-of-range values.
- Client.CLI-004: SubscribeCommand carries XML summary docs on the
type, ctor, every [CommandOption] property, and ExecuteAsync —
matching the sibling commands' style.
- Client.CLI-006: HistoryReadCommand parses --start / --end with
InvariantCulture+UTC and surfaces FormatException as CommandException;
every NodeIdParser.ParseRequired call wraps FormatException /
ArgumentException as CommandException.
- Client.CLI-007: CommandBase.ConfigureLogging calls Log.CloseAndFlush()
before assigning a new Log.Logger so prior sinks are disposed.
- Client.CLI-008: rewrote the subscribe and historyread sections of
docs/Client.CLI.md (every flag documented, summary-bucket vocabulary,
StandardDeviation aggregate, UTC --start/--end convention).
- Client.CLI-009: SubscribeCommand / AlarmsCommand use named local
handlers and detach them via -= after UnsubscribeAsync so no
notification reaches the console after the command's output phase
ends.
- Client.CLI-010: added CommandRangeValidationTests,
EventHandlerLifecycleTests, InputValidationErrorsTests,
LoggerLifecycleTests, and SubscribeCommandSummaryTests pinning every
Low fix; FakeOpcUaClientService gained AddDiscoveredVariable +
RaiseDataChanged + BrowseResultsByParent helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.Modbus.Cli-003: ModbusCommandBase.ValidateEndpoint rejects
--port outside 1..65535, non-positive --timeout-ms, and --unit-id
outside 1..247.
- Driver.Modbus.Cli-004: wrapped SubscribeCommand's OnDataChange handler
body in a try/catch (warn-and-swallow) and serialised the console
write through a lock.
- Driver.Modbus.Cli-005: Probe / Read / Write now catch the
cancellation-during-init OperationCanceledException and print
'Cancelled.' instead of dumping a stack trace.
- Driver.Modbus.Cli-006: ProbeCommand.ComputeVerdict derives the headline
from BOTH the driver state and the probe snapshot's OPC UA quality
class so the headline can't disagree with the wire result.
- Driver.Modbus.Cli-007: docs/Driver.Modbus.Cli.md carries an explicit
'CLI scope' callout — the address-string grammar is a DriverConfig
JSON feature; the CLI takes the structured triple only.
- Driver.Modbus.Cli-008: pinned BuildOptions, ValidateEndpoint, the
region-validation guards, ComputeVerdict, and the cancellation-during-
initialize paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.AbLegacy.Cli-002: WriteCommand.Value description lists the full
true/false, 1/0, on/off, yes/no alias set.
- Driver.AbLegacy.Cli-003: SubscribeCommand serialises every WriteLine
via a per-execution consoleGate lock so the poll-thread OnDataChange
handler can't interleave with the banner.
- Driver.AbLegacy.Cli-004: dropped 'await using var driver' in favour of
a plain 'var driver' + explicit await ShutdownAsync in finally; the
driver is no longer shut down twice.
- Driver.AbLegacy.Cli-005: SubscribeCommand.IntervalMs description
carries the PollGroupEngine 250ms-floor caveat; docs/Driver.AbLegacy.Cli.md
spells out the same.
- Driver.AbLegacy.Cli-006: ProbeCommand --type now carries the short
alias 't' to match the other commands.
- Driver.AbLegacy.Cli-007: BuildOptionsTests cover the probe-disabled,
device-shape, tag-passthrough, timeout-propagation, and empty-tag-list
paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.AbCip.Cli-003: SubscribeCommand prints the 'Subscribed' banner
BEFORE wiring OnDataChange so the main thread can't interleave its
write with the poll-thread handler.
- Driver.AbCip.Cli-004: AbCipCommandBase.Timeout and SubscribeCommand
validate TimeoutMs / IntervalMs and throw CommandException on
non-positive values.
- Driver.AbCip.Cli-005: every command now calls FlushLogging() in its
finally block.
- Driver.AbCip.Cli-006: Timeout init throws NotSupportedException with a
pointer at TimeoutMs instead of silently swallowing assignments.
- Driver.AbCip.Cli-007: added AbCipCommandBaseTests covering BuildOptions
shape, probe / controller-browse / alarm toggles, host address, family
selection, tag list passthrough.
- Driver.AbCip.Cli-008: rewrote the opening paragraph in
docs/Driver.AbCip.Cli.md to credit the six-CLI roster with a pointer
at docs/DriverClis.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.Modbus.Addressing-006: broaden the catch in TryParseFamilyNative
so a future helper throwing a non-Argument/Overflow type still satisfies
the try-parse contract.
- Driver.Modbus.Addressing-007: document that the address grammar does
not carry ModbusStringByteOrder (the structured-tag path does);
add a 'Grammar scope' bullet to docs/v2/dl205.md.
- Driver.Modbus.Addressing-009: reword the ModbusModiconAddress comments
so they don't imply a leading-digit invariant the parser doesn't
enforce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core.Scripting-005: DependencyExtractor.HandleTagCall now recognises
raw-string literal paths by checking the StringLiteralExpression node
kind instead of the legacy StringLiteralToken kind.
- Core.Scripting-006: scope CompiledScriptCache failed-compile eviction
with TryRemove(KeyValuePair) so a racing retry entry is not evicted.
- Core.Scripting-008: document the per-publish assembly accretion as an
accepted limitation in docs/VirtualTags.md.
- Core.Scripting-009: enumerate the authoritative deny-list (namespace
prefixes + type-granular denies) in the Phase 7 decision-#6 entry to
match ForbiddenTypeAnalyzer.
- Core.Scripting-011: pin ScriptSandbox.Build, ScriptContext.Deadband
boundary semantics, and end-to-end factory + companion-sink
integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting
pending emissions during the gate-held section and flushing them after
release; eliminates re-entrancy deadlock the docs already promised.
- Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate /
ShelvingCheck task in _inFlight; Dispose drains the set so the engine
no longer races store writes against teardown.
- Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment>
so AppendComment is O(log n) instead of O(n).
- Core.ScriptedAlarms-010: document the deliberate input-quality
asymmetry (Uncertain drives the predicate, renders {?} in the message)
in docs/ScriptedAlarms.md and on MessageTemplate.Resolve remarks.
- Core.ScriptedAlarms-011: propagate the no-op reason through
TransitionResult.NoOp(state, reason) and log it from
ScriptedAlarmEngine.ApplyAsync.
- Core.ScriptedAlarms-009 (Won't Fix per recommendation): documented the
per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md
with a mitigation path if a future soak surfaces pressure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add #pragma warning disable xUnit1051 at the top of ContractsWireParityTests.cs.
The xUnit1051 analyser fires on MessagePack's Serialize/Deserialize overloads that
have an optional CancellationToken parameter; these are synchronous parity tests
where the token is not meaningful — the suppression is scoped to this file only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add System.Threading.Tasks to ForbiddenNamespacePrefixes so scripts
cannot use Task.Run / Parallel to spawn background work that outlives
the per-evaluation timeout. Document the unbounded-memory accepted
trade-off and the Task denial rationale in docs/VirtualTags.md (new
"Known resource limits" subsection) and cross-reference from
docs/ScriptedAlarms.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.
The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.
Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.
Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.
The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.
Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
concrete test matrix (A/B/C blocks), and a step-by-step cutover
runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
worker wiring in the mxaccessgw sibling repo, documenting the
discovered AVEVA API surface, the architectural decision that blocks
A.2, the dependency order, and what each item needs to unblock
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audits every Phase 7 plan stream (A-H) against the repo, confirms the
exit gate is fully closed, and records the five genuine remaining gaps:
OPC UA method-call dispatch for alarm Ack/Confirm/Shelve, the
/virtual-tags and /scripted-alarms Admin UI tabs, the script log viewer,
and the missing production IHistoryWriter for virtual-tag historization.
Also notes that docs/v2/v2-release-readiness.md carries a stale "out of
scope" label — Phase 7 shipped completely after that doc was last updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every item in lmx-followups.md was marked DONE and rooted in the
retired GalaxyProxyDriver / OtOpcUaGalaxyHost named-pipe architecture
deleted in PR 7.2. No live or unique content remains: v2-release-readiness.md
is the canonical open-work tracker. Remove the file and drop the
now-dead link from docs/README.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document the ZTag / SAPID external-ID reservation subsystem: what a
reservation is, why it sits outside the generation flow (decision #124),
the ExternalIdReservation table, the lifecycle (author → publish
precheck → publish-time MERGE → FleetAdmin release), and the
/reservations Admin page. Linked from the docs README Operational table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.
Two operator-facing paths forward documented inline:
1. Stay on the value-driven sub-attribute path (current production
behaviour). Operator-comment fidelity is the only v1 regression.
2. Add an x64 alarm-helper sub-process alongside the worker that
loads aaAlarmManagedClient and forwards transitions over a
named-pipe IPC. Recovers full v1 fidelity but adds operational
complexity.
The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seventeenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the script that the
plan calls for in Track D — the actual smoke-run validation
on the dev rig (publish, restart, fire alarms, capture artifacts)
remains operator work; this PR ships the automation that the
operator drives.
scripts/install/Refresh-Services.ps1 — single-shot refresh
script. Designed to run elevated on the deploy host
(DESKTOP-6JL3KKO today; production uses a separate runbook).
The script:
- Stops services in reverse-dependency order (OtOpcUa →
OtOpcUaWonderwareHistorian → MxAccessGw) and force-kills any
residual processes (avoids the publish-time MSB3027 file-lock
the original install script hit).
- Snapshots existing C:\publish trees to
C:\publish\.backup-YYYY-MM-DD-HHMMSS\ for rollback (skip with
-SkipBackup).
- Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
binaries from the sibling repo.
- Publishes OtOpcUa Server + Wonderware historian sidecar from
this repo.
- Ensures OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true is set on
the historian service env block (PR C.2 toggle).
- Starts services in forward-dependency order with the
inter-service waits the original install used.
- Smoke-verifies (service status, listening ports 5120 / 4840
/ 4841, recent log tails).
Supports -WhatIf for dry-run inspection without touching the
running services.
docs/v2/dev-environment.md — new "Service Refresh —
Refresh-Services.ps1" section between Credential Management
and Test Data Seed. Cross-references the plan's Track D
functional verification scenarios.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Closes the documentation sweep
the plan calls for.
- docs/AlarmTracking.md — promoted top-level v2-final architecture
doc (was a worktree-only draft pre-epic). Covers the three alarm
sources (Galaxy MxAccess driver-native / Galaxy sub-attribute
fallback / scripted alarms), how they converge on
AlarmConditionService, the Acknowledge routing decision in
DriverNodeManager (driver-native preferred over IWritable
sub-attribute fallback), the sidecar historian write-back path
for non-Galaxy producers, and cross-references to the plan +
v1 archive.
- docs/v1/AlarmTracking.md — banner pointing readers at the v2
doc; preserved as historical record.
- docs/drivers/Galaxy.md — capability list updated to include
IAlarmSource (now eight capabilities, restored by B.2). Replaced
the "IAlarmSource retired in 7.2" sentence with the restoration
note + cross-link to docs/AlarmTracking.md.
- docs/plans/alarms-over-gateway.md — completion banner at the
top of the plan, marking 14 of 16 PRs shipped 2026-04-30 and
noting that A.2 + A.4 + D.1 are the hardware-gated follow-up.
Memory entries updated separately:
- project_alarms_over_gateway_epic.md (new) — epic summary +
per-PR digest.
- project_galaxy_via_mxgateway.md — added "Alarms restored"
bullet pointing at the new architecture.
- project_server_history_alarm_subsystems.md — bullet 2 updated
to describe the new ack-routing decision (B.3) + bullet 3
added describing the historian write-back path that B.4 + C.1
+ C.2 light up.
- MEMORY.md index — new pointer entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover both client surfaces that become user-visible when the alarm
path lights up:
- mxaccessgw client SDKs in 5 languages (.NET, Python, Go, Java,
Rust). E.1 regens proto across all of them; E.2-E.6 add per-language
alarms helpers (subscribe / acknowledge / query-active) plus matching
CLI verbs.
- lmxopcua OPC UA-facing clients (Client.CLI, Client.UI). E.7 extends
AlarmEventArgs with the new optional fields, surfaces them in the
CLI's --verbose / --json output and the UI's Show-details toggle,
and updates ClientRequirements + Client.{CLI,UI}.md.
Sequencing: E.1 first (mechanical regen), then E.2-E.7 in parallel.
E.2 (.NET) is on the critical path because lmxopcua consumes it; the
other-language SDKs can ship asynchronously without gating D.1.
12 PRs grew to 19 total: 4 in A, 5 in B, 2 in C, 7 in E, 1 in D.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After A/B/C all merge, the running services on C:\publish need to be
refreshed before the Galaxy alarm-event family flows end-to-end. Add
PR D.1: a Refresh-Services.ps1 script + runbook for stopping in
reverse-dependency order, restaging binaries from the build outputs,
restarting in forward-dependency order, and capturing a smoke-run
artifact.
D.1 gates B.5 (docs sweep) — the documentation records the
as-deployed shape, so the deployment has to be live first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revise the alarms-over-gateway plan based on review feedback:
The gateway is for MxAccess (live data + Galaxy hierarchy); the
Wonderware historian sidecar is for aahClientManaged (time-series +
alarms historian). Two SDKs, two concerns. Routing alarm-historian
write-back through the gateway would force coupling that doesn't need
to exist — the sidecar already has a dormant WriteAlarmEvents IPC slot
ready to wire.
Drop A.5 (gateway WriteHistorianEvent RPC). Add Track C — two PRs in
the historian sidecar that complete the dormant slot:
C.1 AahClientManagedAlarmEventWriter implementation
C.2 Program.cs wires the writer into HistorianFrameHandler
B.4 reverses from "delete the IPC slot" to "consume the IPC slot" via
a new SidecarAlarmHistorianWriter on the lmxopcua side.
Also tightens Why-section #3 + D5 to make explicit that the path is
exclusively for non-Galaxy alarm producers (scripted alarms today, AB
CIP ALMD or others future). Galaxy-native alarms reach AVEVA Historian
via System Platform's own HistorizeToAveva toggle, independent of
anything in our stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coordinated cross-repo epic to restore the three v1 alarm capabilities
that PR 7.2 regressed: rich MxAccess alarm-event metadata, native
Acknowledge semantics, and the IAlarmHistorianWriter write-back path.
Architectural split: gateway owns MxAccess transport (new
OnAlarmTransition event family + AcknowledgeAlarm / QueryActiveAlarms /
WriteHistorianEvent RPCs); lmxopcua keeps the OPC UA Part 9 state
machine, ACL/role enforcement, and multi-source aggregation. The
existing value-driven sub-attribute path stays as fallback.
10 PRs total — 5 in mxaccessgw, 5 in lmxopcua — sequenced so each
side's work is independently reviewable. End-of-epic gate is a parity
matrix run with five new alarm scenarios.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Delete _p54.json / _p55.json (PR-body snapshots for the shipped S7
+ Mitsubishi research docs).
- Delete session.dat (38-byte CLI runtime cache, not produced by any
current source code) and add it to .gitignore so it doesn't come
back.
- Delete lmx_backend.md / lmx_mxgw.md / lmx_mxgw_impl.md. All three
carried "✅ Completed 2026-04-30" historical-record banners — the
v2-mxgw migration shipped + merged to master, so the design plans
served their purpose. Drop the cross-refs from CLAUDE.md and
docs/v1/README.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/drivers/FOCAS.md and docs/v2/implementation/focas-wire-protocol.md
pointed at focas-deployment.md and focas-simulator-plan.md, both of
which were untracked drafts that have since been removed. Drop the
refs (the wire-protocol companion now stands on its own; deployment
guidance lives inline in the FOCAS driver doc).
- Link the orphan v2 design docs from docs/README.md (multi-host
dispatch, v2 release readiness, the historical lmx-followups tracker)
and from modbus-test-plan.md (s7.md, mitsubishi.md per-family quirk
catalogs, sibling to dl205.md).
Surfaced by the doc audit; no content changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- gr/ folder moved to sibling repo at C:\Users\dohertj2\Desktop\graccess\gr;
the SQL queries + DDL captures belong with the graccess CLI work, not
with the OPC UA server. PR 7.2 retired direct Galaxy-DB access from this
repo (mxaccessgw owns those queries server-side now).
- Drop the now-obsolete "Galaxy Repository Database" section in CLAUDE.md
for the same reason — server no longer queries the DB directly.
- Delete root scratch files surfaced by the doc audit (runtimestatus.md,
service_info.md) — abandoned plan + operational scratch.
- Delete docs/v2/implementation/pr-{1,2,4}-body.md — ephemeral PR-body
snapshots from the v2-mxgw rollout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.
Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
text; leads with the multi-driver .NET 10 server identity and points
at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
Tier-C out-of-process spec with a Tier-A in-process description
matching the current GalaxyDriver code, with the four-section
GalaxyDriverOptions JSON shape pulled verbatim from
Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
current Browse/Runtime/Health/Config sub-folders.
Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
docs/v2/Galaxy.ParityMatrix.md,
docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
"✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
also fixes two dead links (`docs/Galaxy.Driver.md` and
`docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.
Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
Subscriptions (top-level); drivers/Galaxy-Repository,
drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
v2 two-process deploy shape (Server + Admin + optional
OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.
The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v2-mxgw migration's housekeeping debt now that PR 7.2 has
retired the legacy projects + service.
Repo docs:
- CLAUDE.md: rewrote the Galaxy section + reference-impl + MXAccess
documentation pointers; replaced .NET 4.8 x86 / COM apartment
constraints with .NET 10 AnyCPU + a pointer to the gateway. Dropped
the "Service hosting (Galaxy.Host)" library-preferences row.
- docs/ServiceHosting.md: rewrote (was 156 lines of Galaxy.Host pipe
IPC details). Now reflects the v2 process shape: OtOpcUa.Server +
OtOpcUa.Admin + optional OtOpcUaWonderwareHistorian, with Galaxy
access via the in-process driver → mxaccessgw.
- docs/v2/dev-environment.md: scrubbed four Galaxy.Host references
(TwinCAT/Galaxy.Host shared-host note; .NET 4.8 SDK row; install
step #2; risks table). The .NET 4.8 SDK is now correctly framed as
"optional, only needed when building the mxaccessgw worker".
- mxaccess_documentation.md: deleted from the repo root (obsolete; the
gateway repo is the canonical MxAccess API doc).
Memory housekeeping (under ~/.claude/projects/.../memory/):
- Retired: project_galaxy_host_service.md,
project_galaxy_host_installed.md, reference_impl.md (the LmxProxy
Host MXAccess reference is no longer the design pattern this repo
uses).
- Revised: project_overview.md (now describes the .NET 10 + mxaccessgw
shape), project_aveva_platform_installed.md (AVEVA still required
on the dev box but consumed by the gateway worker, not by anything
here), project_galaxy_via_mxgateway.md (post-7.2 state — flagged as
the only Galaxy backend), project_server_history_alarm_subsystems.md
(per-driver fallbacks retired in PR 7.2).
- MEMORY.md index updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The parity matrix gate is the precondition for retiring the legacy
Galaxy projects. The 24h × 50k soak run and 2-week production pilot
were sketched in early planning as additional safety nets but aren't
operationally applicable for this deployment — there's no separate
production fleet to pilot against, and the soak harness's value is as
ongoing diagnostic infrastructure (still shipped in PR 6.4) rather
than a one-shot release gate.
PR 7.2's only remaining precondition is the matrix being fully green
or carrying documented accepted-deltas — verified 2026-04-30 on the
dev rig: 14 passed / 1 skipped / 0 failed.
Affected:
- docs/v2/Galaxy.ParityMatrix.md "Outstanding deltas" — flips to
"PR 7.2 is unblocked"
- docs/v2/Galaxy.ParityRig.md "After the rig is green" — drops the
three-step soak+pilot flow, keeps only the matrix-doc bookkeeping
follow-up
- lmx_mxgw_impl.md PR 7.2 "Depends on" — replaces "fully soaked"
with the matrix-green precondition + the verification date
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end run on the live ZB galaxy with mxaccessgw on
http://localhost:5120: 14 passed / 1 skipped / 0 failed in 18m53s.
PR 7.2's matrix-gate condition met. Three resolution patches in this
commit; the matrix doc records the new state.
1. Discoverer: defensive `[]` array-suffix strip
----------------------------------------------------
The gw's GalaxyRepository.cs:173-175 appends `[]` to
array-typed full_tag_reference values, but MxAccess COM
IInstance.AddItem doesn't accept `[]`-suffixed addresses.
GalaxyDiscoverer.StripArraySuffix removes the suffix client-side
so SubscribeBulk / Read / Write paths see the canonical form.
Tracked in mxaccessgw/requirements-array-suffix-fix.md; this
workaround is removed when the gw fix lands.
2. WriteByClassification: pin status class, not exact code
---------------------------------------------------------
Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every
failure to BadInternalError (0x80020000); mxgw's
GatewayGalaxyDataWriter.TranslateReply uses
MxStatusProxy.RawDetectedBy to distinguish gw-layer faults
(BadCommunicationError, 0x80050000) from MxAccess HRESULT
faults. Both yield Bad-status — the parity invariant is the
status class (Good/Uncertain/Bad), not the exact code. Both
write tests now use AssertStatusClassMatches; legacy mapping
retires alongside GalaxyProxyDriver in PR 7.2.
3. BrowseAndReadParity Read scenario: drop CLR-type assertion
------------------------------------------------------------
Legacy returns the raw VARIANT (e.g. byte[]) for an attribute
that hasn't received its first value cycle from MxAccess yet,
while mxgw returns the typed value (Single, Int32, etc.). Once
a real value is written or scanned, both converge. Pinning
CLR-type equality across the uninitialized window adds noise
without a real parity invariant — the StatusCode-class
assertion already covers the "did the read succeed" question.
The test still pins StatusCode-class parity per scenario.
4. Galaxy.ParityMatrix.md — first-rig results captured
-----------------------------------------------------
Per-row status flipped from "n/a unverified" to actual
green / yellow / deferred outcomes from this run. Four new
accepted-deltas added (read-value CLR type, write-status code
mapping, single-platform ScanState scope, gw `[]` suffix
workaround), bringing the total to nine. Outstanding deltas
section flipped to "none as of 2026-04-30."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the placeholder "configure an API key per gateway.md" with
the actual commands that worked end-to-end on this dev box:
- Build both halves (Worker x86 net48, Server net10)
- apikey init-db + apikey create-key with the seven scopes the parity
test exercises (session:*, invoke:*, events:read, metadata:read)
- Three env-var overrides at server startup — capturing real lessons
learned standing the rig up:
* Kestrel__Endpoints__Http__Url = http://localhost:5120
* Kestrel__Endpoints__Http__Protocols = Http2 (gRPC needs h2c on
plain HTTP — without this flag the client gets HTTP_1_1_REQUIRED)
* MxGateway__Worker__ExecutablePath = absolute path to the built
worker (appsettings.json's relative path drops \net48 and the
server can't resolve it)
- Note that workers spawn lazily on first OpenSession, not at server
startup — so port-listening is necessary but not sufficient
evidence the gateway is healthy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>