ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.
Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.
Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.
Resolves code-review finding Core.Scripting-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Admin-001: Routes.razor used a plain RouteView, so the page-level
[Authorize] attributes on 11 pages were inert — every page, including
mutating ones, was reachable fully unauthenticated.
Admin-002: several pages (e.g. NewCluster, which writes config rows)
carried no auth attribute at all.
- Routes.razor: RouteView → AuthorizeRouteView with NotAuthorized /
Authorizing slots; add RedirectToLogin component.
- Program.cs: SetFallbackPolicy(RequireAuthenticatedUser) — secure by
default for new pages/endpoints.
- Login.razor: [AllowAnonymous] so login stays reachable; login page,
/auth/* endpoints and static assets remain anonymous.
- Add [Authorize] to the previously un-gated pages; NewCluster gated to
the CanPublish (FleetAdmin) policy.
Regression tests in PageAuthorizationTests pin that anonymous requests
to protected/mutating routes are rejected and that login + static
assets stay anonymously reachable. Admin test suite: 210/210 pass.
Resolves code-review findings Admin-001 and Admin-002 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).
Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.
Resolves code-review finding Server-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.
- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
of StreamAlarms that decodes the active-alarm snapshot plus live
transitions into GalaxyAlarmTransition and reopens the stream on
transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.
Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.
The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.
Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.
Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.
The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.
Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gap 2 (#25): VirtualTagsTab.razor + /virtual-tags global page — list/create/toggle
virtual tags per draft generation with DataType, Script, trigger, Historize, Enabled
fields. Tab wired into DraftEditor.
Gap 3 (#26): ScriptedAlarmsTab.razor + /scripted-alarms global page — list/create
scripted alarms with AlarmType, Severity, MessageTemplate, PredicateScript,
HistorizeToAveva, Retain. SeverityBand helper shows Low/Medium/High/Critical label.
Tab wired into DraftEditor.
Gap 4 (#27): ScriptLogHub (SignalR IAsyncEnumerable stream) tails scripts-*.log with
optional ScriptName filter; ScriptLog.razor provides Start/Stop/Clear controls plus
level filter dropdown. Hub registered at /hubs/script-log in Program.cs.
Nav rail gains a "Scripting" eyebrow with entries for all three pages.
19 new unit tests for ScriptLogHub parse/filter/tail helpers (Category=Unit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Phase 7 Gap 5: VirtualTagEngine called IHistoryWriter.Record per evaluation
when Historize=true but Phase7EngineComposer always passed NullHistoryWriter, so
virtual-tag history was computed but never persisted.
The fix:
- New RingBufferHistoryWriter implements both IHistoryWriter (write port for the
evaluation pipeline) and IHistorianDataSource (read port for IHistoryRouter so
OPC UA HistoryRead on virtual-tag nodes resolves here). Maintains one bounded
ring buffer (1000 samples, configurable) per tag path; Record() is O(1) and
never blocks evaluation.
- Phase7EngineComposer.Compose now accepts IHistoryRouter? and, when any
VirtualTagDefinition.Historize=true, creates a RingBufferHistoryWriter, passes
it to VirtualTagEngine as historyWriter, adds it to the disposables list, and
registers it under the "virtual:" prefix in the router for HistoryRead dispatch.
- Phase7Composer accepts IHistoryRouter? from DI (already registered as singleton
in Program.cs) and threads it through to Phase7EngineComposer.Compose.
- NullHistoryWriter remains as fallback when no tags request historization.
- 16 new unit tests in RingBufferHistoryWriterTests.cs cover ring-buffer semantics,
eviction, per-tag isolation, ReadRawAsync windowing, IHistorianDataSource stubs,
router registration, and the Historize=false / null-router fallback paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gap 1 of phase-7-status.md. Intercepts AcknowledgeableConditionType_Acknowledge and
AcknowledgeableConditionType_Confirm calls in DriverNodeManager.Call and dispatches
them to ScriptedAlarmEngine so OPC UA HMI clients can acknowledge/confirm scripted alarms
in addition to the existing Admin UI path. Shelve methods deferred (per-instance NodeIds,
not well-known type MethodIds — follow-up task). AlarmEngine is now exposed through
Phase7ComposedSources so the server wire-up passes it to every DriverNodeManager. 13 new
unit tests cover dispatch kernel, identity fallback, batch handling, and error paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ApplyReservationPreCheckAsync on EquipmentImportBatchService queries active
ExternalIdReservation rows in a single round-trip at parse time; rows whose ZTag
or SAPID is claimed by a different EquipmentUuid are moved from AcceptedRows to
RejectedRows with a descriptive reason. ImportEquipment.razor calls the check
after EquipmentCsvImporter.Parse so conflicts appear in the preview before the
operator clicks Stage + Finalise. Updated notice banner to reflect the pre-check
is now live; 6 new unit tests cover conflict, no-conflict, same-UUID, released-
reservation, and empty-input paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ModbusAddressPreview (@bind on dropdowns + child ModbusAddressEditor with @oninput),
ModbusDiagnostics (@onclick Refresh), and NewCluster (EditForm with Nav.NavigateTo on submit)
were missed in the first pass — all three require interactivity but had no @rendermode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight pages were using @onclick handlers, Timers, or HubConnections but had no @rendermode,
causing interactivity to be silently dead under static SSR. Added @rendermode RenderMode.InteractiveServer
(with the required @using Microsoft.AspNetCore.Components.Web) to: AlarmsHistorian, Certificates,
Fleet, Home, Hosts, Reservations, DraftEditor, and ImportEquipment.
Also fixed two hub URL bugs: AclsTab and RedundancyTab were connecting to the non-existent
/hubs/fleet-status path; corrected to /hubs/fleet which matches the MapHub<FleetStatusHub>
call in Program.cs. Build: 0 errors, 0 warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add ResilientLdapGroupRoleMappingService — a singleton decorator that wraps the
hot-path GetByGroupsAsync call in a Polly pipeline (timeout 2s → retry 3× jittered
→ fallback to in-memory sealed snapshot) so a transient Config DB outage at
Admin sign-in falls back to the last-known-good mapping set rather than denying
every login. The static LdapOptions.GroupToRole bootstrap dictionary in
AdminRoleGrantResolver remains the lock-out-proof floor regardless of DB state.
DI wiring uses keyed services: LdapGroupRoleMappingService (EF, scoped) is
registered under key "LdapGroupRoleMappingService.Inner"; the resilient singleton
decorator is the primary ILdapGroupRoleMappingService binding. The singleton
avoids the captive-dependency anti-pattern by using IServiceScopeFactory to open
a short-lived scope for each DB call.
Write methods (CreateAsync, DeleteAsync, ListAllAsync) pass through unchanged —
resilience is read-path only per Phase 6.1 design decision.
15 new unit tests cover: DB success/failure/retry paths, snapshot sealing and
per-group-set isolation, order-independent cache key normalisation, cancellation
propagation, and pass-through method routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Role grants: drop the page notice describing the LDAP-group → role
mapping semantics; this is moving to the user instructions.
- Certificates: drop the trailing "operators should retry the rejected
client's connection" note from the trust notice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The role-grants page is the authoring surface for LdapGroupRoleMapping
rows, but it had no @rendermode — so it rendered as static SSR and its
@onclick handlers (Add grant, Revoke) never fired. App.razor's <Routes/>
sets no global render mode; only ClusterDetail opted in.
- Add @rendermode RenderMode.InteractiveServer.
- Fix the SignalR hub URL: the page connected to /hubs/fleet-status,
but FleetStatusHub is mapped at /hubs/fleet. Static SSR masked this
(OnAfterRenderAsync never ran); enabling interactivity surfaced the
404 that terminated the circuit.
Verified in-browser: Add grant opens the form, a cluster-scoped grant
saves and lists, Revoke removes it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The role-grants page authored LdapGroupRoleMapping rows but nothing
consumed them — sign-in only read the static appsettings GroupToRole
dictionary. Wire the DB-backed grants into the auth path.
- AdminRoleGrantResolver merges the static bootstrap dictionary (always
fleet-wide, lock-out-proof) with DB grants; system-wide rows fold into
fleet roles, cluster-scoped rows become (cluster, role) grants.
- Login emits a ClaimTypes.Role claim per fleet role and a cluster_role
claim per cluster-scoped grant; lock-out check spans both scopes.
- ClusterRoleClaims + ClaimsPrincipal extensions resolve the effective
role for a cluster (highest of fleet-wide and cluster-scoped).
- ClusterAuthorizeView gates cluster pages: ClusterDetail (view +
ConfigEditor draft actions), DraftEditor (ConfigEditor / FleetAdmin
publish), DiffViewer (ConfigViewer), ImportEquipment (ConfigEditor).
- RoleGrants page is now FleetAdmin-only; Account surfaces fleet-wide
and cluster-scoped grants separately.
Control-plane only — decision #150 holds, NodeAcl is untouched.
Tests: AdminRoleGrantResolverTests + ClusterRoleClaimsTests (22).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs blocked sign-in entirely:
- Login.razor is static-SSR but its form model lacked
[SupplyParameterFromForm], so the posted username/password never
bound — SignInAsync saw empty fields and bailed before LDAP was
contacted. Annotate the model; seed it in OnInitialized since
BL0008 forbids an initializer on a [SupplyParameterFromForm]
property.
- appsettings.json ServiceAccountDn used ou=svcaccts, which GLAuth
reads as a (non-existent) group — the service-account bind failed
with "Group not found". Use cn=serviceaccount,dc=lmxopcua,dc=local.
- LdapAuthService resolved the user DN by searching (uid=...), but
GLAuth keys users by cn. Add an LdapOptions.UserNameAttribute knob
(default cn for GLAuth; set sAMAccountName for Active Directory)
and use it for the search filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adopt the technical-light design system across the Admin web UI:
- Vendor theme.css + IBM Plex woff2 fonts into wwwroot; include
theme.css globally after Bootstrap.
- Rebuild MainLayout: top app-bar (brand mark, breadcrumb, connection
pill) + hairline-ruled side rail with accent-bordered active link.
- Convert all 33 pages to the component catalog — tables to
panel + data-table (num/mono columns), KPI cards to agg-grid,
detail blocks to metric-card/kv rows, badges to chips, alerts to
panel notice, headings to page-title/panel-head, .rise reveals.
- Buttons/forms stay on Bootstrap; theme.css restyles them via
--bs-* overrides. View-specific layout lives in app.css; all
colour/type comes from theme.css tokens.
Also fix a pre-existing /fleet 500: the node-state query ordered on
a property of a constructed FleetNodeRow record, which EF Core
cannot translate. Order the join's columns before projecting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Admin appsettings.json still carried Server=localhost,14330 — a
straggler from before the 2026-04-28 Docker migration that moved SQL
Server onto the shared Linux host. Every other checked-in appsettings
was rewritten then; this one was missed, so the Admin web UI returned
HTTP 500 on every page (SqlException, connection timeout). Repoint it
at 10.100.0.35,14330 to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.
- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
integration, install).
Build green (0 errors); unit tests pass. Docs left for a separate pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged) and B.3 (DriverNodeManager prefers
driver-native ack, merged).
Three new optional fields on Core.Abstractions.AlarmEventArgs:
- OperatorComment — populated by the driver-native gateway path on
Acknowledge transitions. Null on raise / clear, and null on the
sub-attribute fallback path where the comment collapses into a
single string write.
- OriginalRaiseTimestampUtc — preserved across Acknowledge so OPC
UA Part 9 conditions keep the original raise time.
- AlarmCategory — taxonomy bucket from the upstream alarm system.
Maps to ConditionClassName downstream when a class mapping is
configured.
GalaxyDriver.OnPumpAlarmTransition populates the new fields from
GalaxyAlarmTransition (PR B.1). Empty strings collapse to null so
consumers can use is-null rather than is-null-or-empty checks.
Client.Shared mirror DTO (Client.Shared/Models/AlarmEventArgs)
gains the same three properties so the Client.UI / Client.CLI
surfaces can reflect the rich payload — the actual UI/CLI
verbose-output and Show-Details rendering ship as a follow-up
PR; this PR locks in the payload contract.
Tests:
- 2 new tests in Driver.Galaxy.Tests pin the populated-vs-null
behaviour for full-payload Acknowledge and bare-bones Raise
transitions respectively.
- Solution build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thirteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged).
When DriverNodeManager registers an AlarmConditionState with
AlarmConditionService, it now picks the acknowledger:
- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger
routes the operator comment through IAlarmSource.AcknowledgeAsync
via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline,
no-retry per decision #143). Preserves operator-comment fidelity
end-to-end — the value-driven sub-attribute write collapses the
comment into a single string write that loses MxAccess metadata.
- Driver does not implement IAlarmSource →
DriverWritableAcknowledger fallback (existing behaviour for
AbCip / Modbus / S7 / etc).
The dedup logic that prefers driver-native transitions over
sub-attribute synthesis lives in AlarmConditionService and is
already in place — drivers that surface OnAlarmEvent (B.2) feed
the service directly, while sub-attribute writes still flow
through DriverNodeManager's ConditionSink so a Galaxy template
without $Alarm extensions stays functional.
Tests:
- 2 new routing-decision tests in
DriverAlarmSourceAcknowledgerRoutingTests pin the
IAlarmSource detection used at registration time.
- Server build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.1 (EventPump
dispatch, merged) and PR E.2 (.NET SDK alarm methods, merged).
Restores the v1 IAlarmSource capability that PR 7.2 retired with the
legacy Galaxy.Host / Galaxy.Proxy projects.
GalaxyDriver gains:
- IAlarmSource on the class declaration → eight capabilities total
(IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable /
IRediscoverable / IHostConnectivityProbe / IAlarmSource).
- SubscribeAlarmsAsync — returns a sentinel handle and starts the
shared EventPump (alarm wiring is lazy on first sub).
Multiple handles share the same gateway stream; the server-side
AlarmConditionService dispatches per-source-node downstream.
- UnsubscribeAlarmsAsync — symmetric handle removal; rejects
handles not issued by this driver.
- AcknowledgeAsync — issues one gateway RPC per acknowledgement
through IGalaxyAlarmAcknowledger. ConditionId carries the alarm
full reference; falls back to SourceNodeId when empty.
- OnAlarmEvent — bridges EventPump.OnAlarmTransition (B.1) onto
AlarmEventArgs. Suppressed when no alarm subscription is active so
untracked transitions don't leak through.
New runtime types:
- IGalaxyAlarmAcknowledger — test seam.
- GatewayGalaxyAlarmAcknowledger — production wrapper around
MxGatewayClient.AcknowledgeAlarmAsync (PR E.2). Maps native
MxStatus failures to a logged warning rather than a thrown
exception so a transient MxAccess hiccup doesn't fail the
operator's Acknowledge.
- GalaxyAlarmSubscriptionHandle — driver-side IAlarmSubscriptionHandle.
Production runtime construction in BuildProductionRuntimeAsync wires
the acknowledger when not pre-injected; tests inject a fake via the
internal ctor.
Tests:
- 7 new tests in GalaxyDriverAlarmSourceTests — subscribe → event
fire path, suppress without subscription, unsubscribe stops flow,
foreign-handle rejection, ack routes per-request, ack falls back
to SourceNodeId, ack throws NotSupported without acknowledger.
- Full Driver.Galaxy.Tests: 203 passed (was 196; 7 new).
Operates as a "stub-ready" surface — runtime ack calls will return
PERMISSION_DENIED until A.3 ships the gateway-side dispatch, and no
alarm transitions will arrive until A.2 adds the worker MxAccess
subscription. Both will activate this code path automatically when
the gateway side lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.2 (sidecar
serves IAlarmEventWriter when enabled), already merged.
Today Phase7Composer.ResolveHistorianSink only scans drivers for an
IAlarmHistorianWriter — no Galaxy driver provides one since PR 7.2,
so the resolution falls through to NullAlarmHistorianSink and
scripted-alarm transitions are silently discarded.
WonderwareHistorianClient already implements IAlarmHistorianWriter
and Program.cs:178 already registers it as a singleton when
Historian:Wonderware:Enabled=true. The gap was that Phase7Composer
ignored DI: this PR adds an optional injectedWriter constructor
parameter, and ASP.NET Core DI resolves it from the same
registration when present.
- Phase7Composer constructor: new optional IAlarmHistorianWriter?
injectedWriter parameter (default null). Backward-compatible —
existing callers don't need to change; DI populates it
automatically when the singleton is registered.
- New static SelectAlarmHistorianWriter helper — resolution order
is driver → DI → null. Drivers win when both are present so a
future GalaxyDriver-as-IAlarmHistorianWriter takes the write
path directly, preserving the v1 invariant where a driver that
natively owns the historian client doesn't bounce through the
sidecar IPC.
- ResolveHistorianSink uses the helper + emits a structured log
line identifying which source provided the writer.
Tests:
- 4 SelectAlarmHistorianWriter precedence tests — no source / DI
only / driver wins over DI / first-driver-with-writer wins.
- Pre-existing 4 HostStatusPublisherTests SQL failures unrelated
to this change (require the docker-host SQL Server at
10.100.0.35,14330 per CLAUDE.md). Phase7 + alarm tests all
green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fifth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.1
(AahClientManagedAlarmEventWriter), already merged.
Today HistorianFrameHandler is constructed at Program.cs line 57
without an alarmWriter, so every WriteAlarmEvents frame replies
"Sidecar not configured with an alarm-event writer" and the lmxopcua
side keeps the row queued. C.2 wires a real writer behind a new
OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED toggle.
- Program.BuildAlarmWriter — gated on the env var (default true,
fail-open under accidental misconfiguration). Constructs an
AahClientManagedAlarmEventWriter wrapping a
SdkAlarmHistorianWriteBackend with the same connection config the
read path uses.
- Install-Services.ps1 — appends OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
to the OtOpcUaWonderwareHistorian service env block when the
sidecar is installed. Read-only deployments flip it to false at
service-config edit time without re-installing.
- HistorianFrameHandler signature already accepts
IAlarmEventWriter? — supplying non-null at line 57 lights up
the WriteAlarmEvents reply path that's been dormant since PR 3.3.
Until PR D.1 pins the live aahClientManaged entry point, the
SdkAlarmHistorianWriteBackend reports RetryPlease for every event
with a structured diagnostic. The lmxopcua-side
SqliteStoreAndForwardSink retains queued events; same effective
behaviour as today's NullAlarmHistorianSink fallback but with
visible diagnostics rather than silent discard.
Tests:
- 6 BuildAlarmWriter env-var cases — unset / true / false /
unrecognized → default-on / capitalization variants.
- Full sidecar test suite: 56 passed (was 48; 8 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Independent of Tracks A and B —
the sidecar slot defined in HistorianFrameHandler line 242 is unwired
today; PR C.2 (next) flips it on in Program.cs.
- AlarmHistorianWriteOutcome (sidecar-local, net48 — twin of
Core.AlarmHistorian.HistorianWriteOutcome which is net10): Ack /
RetryPlease / PermanentFail.
- IAlarmHistorianWriteBackend abstraction so the SDK call can be
faked in unit tests.
- AahClientManagedAlarmEventWriter implements IAlarmEventWriter,
delegates to the backend, maps Ack→true / Retry|Permanent→false
for the IPC bool[] reply contract. Backend exception → whole
batch RetryPlease (preserves the sender's queue across transients
rather than dropping). Wrong-count return defends against a
backend bug desyncing queue accounting.
- SdkAlarmHistorianWriteBackend — production binding skeleton.
Reports RetryPlease for every event and logs a structured
diagnostic until PR D.1 pins the live aahClientManaged entry
point against the dev rig. The sender's SqliteStoreAndForwardSink
retains queued events, mirroring today's NullAlarmHistorianSink
behaviour but with visible diagnostics instead of silent discard.
- MapOutcome shared helper — pinned via theory tests so the D.1
swap can change the SDK call site without reshuffling the
HRESULT → outcome mapping.
Tests:
- 6 writer tests — empty batch / single Ack / mixed Ack-Retry-
Permanent-Ack ordering / backend-throw → RetryPlease batch /
cancellation propagates / wrong-count defensive degrade.
- 5 outcome theory cases — hresult 0 → Ack, malformed wins over
hresult 0, comm error → Retry, unknown failure → Retry,
malformed + comm → Permanent.
- Full sidecar test suite: 48 passed (was 42; 6 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR A.1 in mxaccessgw
(merged) which added the OnAlarmTransitionEvent body + family. No
runtime impact yet — the gateway doesn't emit the new family until
A.3 ships; this PR just stops dropping it on the floor.
EventPump.Dispatch becomes a switch on MxEventFamily. The new
DispatchAlarmTransition decodes the proto event, runs the raw severity
through MxAccessSeverityMapper (the same four-bucket ladder v1 used —
250/500/750/1000 boundaries per docs/v1/AlarmTracking.md), and fires
an internal OnAlarmTransition event with a GalaxyAlarmTransition
record carrying the full payload.
Body absent or transition-kind unspecified → counted via
galaxy.alarm_transitions.decoding_failures and dropped. Gateway
version skew or worker malformed event therefore degrades to "fall
back to the sub-attribute path" rather than crashing the pump.
GalaxyDriver consumes the internal event in PR B.2 (next), wrapping
it onto IAlarmSource.OnAlarmEvent. The richer fields (operator user
+ comment, original raise time, category) become visible on the OPC
UA Part 9 condition once AlarmEventArgs gets extended in E.7.
Tests:
- MxAccessSeverityMapperTests — full bucket ladder + clamp behaviour
for negative + out-of-range inputs.
- EventPumpAlarmTests — raise/ack/clear sequence dispatches in order
with operator metadata + original-raise preserved; unspecified
kind drops; missing body drops; mixed data-change + alarm streams
dispatch independently; OnWriteComplete / OperationComplete
filtered out.
Full Driver.Galaxy.Tests suite: 196 passed (was 191 — 5 new tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sidecar was set to PlatformTarget=x86 + Prefer32Bit=true to mirror
v1's Driver.Galaxy.Host bitness, which itself was x86 only because of
MXAccess COM. PR 7.2 retired Galaxy.Host, so that constraint is gone.
AVEVA Historian 2020 ships an x64 build of every SDK assembly the
sidecar needs (lib\aahClientManaged.dll + aahClient.dll + aahClientCommon.dll
sourced from C:\Program Files (x86)\Wonderware\Historian\x64\; the
remaining three SDK assemblies — Historian.CBE / DPAPI /
ArchestrA.CloudHistorian.Contract — are pure-managed AnyCPU and load
in either bitness). Drop PlatformTarget to x64 on both the sidecar
project and its test project; running 37/37 historian tests + the
live install confirms the SDK loads and serves the named pipe in a
64-bit-native process.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Microsoft.NET.Sdk doesn't auto-include appsettings.json the way Web SDK
does, so dotnet publish was leaving it behind. Without it next to the
EXE the Windows-service-mode host can't find Node + ConfigDb config and
the install scripts had to copy it by hand.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matrix-gate satisfied (14 passed / 1 skipped / 0 failed on 2026-04-30
per docs/v2/Galaxy.ParityMatrix.md). Galaxy access flows through the
in-process GalaxyDriver → mxaccessgw exclusively. Legacy infrastructure
deleted in this commit:
Source projects (6):
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host (.NET 4.8 x86 + MXAccess COM)
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy (in-process pipe client)
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared (pipe-IPC contracts)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests
Test projects with no consumer after legacy retired (3):
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E (drove Galaxy.Host EXE)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests (drove both backends)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport (only consumed by Host/Proxy tests)
Edits:
- ZB.MOM.WW.OtOpcUa.slnx: drop nine project entries
- Server.csproj: drop Driver.Galaxy.Proxy ProjectReference
- Server/Program.cs: drop GalaxyProxyDriverFactoryExtensions.Register
+ the parallel-registration comment block; only GalaxyDriverFactoryExtensions
registers now under DriverType "GalaxyMxGateway"
- Install-Services.ps1: rewrite to drop OtOpcUaGalaxyHost service install +
the GalaxySharedSecret/ZbConnection/GalaxyClientName/GalaxyPipeName/
AvevaServiceDependencies/MxAccessInitialConnect* parameters that only
applied to the legacy host. Adds a closing note pointing operators at
the separate mxaccessgw install
- Uninstall-Services.ps1: keep OtOpcUaGalaxyHost in the cleanup loop so
pre-7.2 rigs upgrade-uninstall cleanly, plus add OtOpcUaWonderwareHistorian
- scripts/e2e/test-galaxy.ps1: deleted (drove the legacy E2E)
- scripts/e2e/e2e-config.sample.json: rewrite the galaxy section comment
to reflect the GalaxyMxGateway-only path
- scripts/e2e/README.md: drop OtOpcUaGalaxyHost references
- scripts/compliance/phase-7-compliance.ps1: drop Galaxy.Shared
HistorianAlarms* checks (those contracts moved to
Driver.Historian.Wonderware.Client in PR 3.4)
Live state: OtOpcUaGalaxyHost Windows service stopped + removed via
NSSM before this commit. The dev box's Galaxy access is now exclusively
through the running mxaccessgw (separate repo).
Stays out of scope for PR 7.2 (PR 7.3 territory):
- CLAUDE.md Galaxy section rewrite
- mxaccess_documentation.md deletion
- Memory entries for the now-retired Galaxy.Host service
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end run on the live ZB galaxy with mxaccessgw on
http://localhost:5120: 14 passed / 1 skipped / 0 failed in 18m53s.
PR 7.2's matrix-gate condition met. Three resolution patches in this
commit; the matrix doc records the new state.
1. Discoverer: defensive `[]` array-suffix strip
----------------------------------------------------
The gw's GalaxyRepository.cs:173-175 appends `[]` to
array-typed full_tag_reference values, but MxAccess COM
IInstance.AddItem doesn't accept `[]`-suffixed addresses.
GalaxyDiscoverer.StripArraySuffix removes the suffix client-side
so SubscribeBulk / Read / Write paths see the canonical form.
Tracked in mxaccessgw/requirements-array-suffix-fix.md; this
workaround is removed when the gw fix lands.
2. WriteByClassification: pin status class, not exact code
---------------------------------------------------------
Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every
failure to BadInternalError (0x80020000); mxgw's
GatewayGalaxyDataWriter.TranslateReply uses
MxStatusProxy.RawDetectedBy to distinguish gw-layer faults
(BadCommunicationError, 0x80050000) from MxAccess HRESULT
faults. Both yield Bad-status — the parity invariant is the
status class (Good/Uncertain/Bad), not the exact code. Both
write tests now use AssertStatusClassMatches; legacy mapping
retires alongside GalaxyProxyDriver in PR 7.2.
3. BrowseAndReadParity Read scenario: drop CLR-type assertion
------------------------------------------------------------
Legacy returns the raw VARIANT (e.g. byte[]) for an attribute
that hasn't received its first value cycle from MxAccess yet,
while mxgw returns the typed value (Single, Int32, etc.). Once
a real value is written or scanned, both converge. Pinning
CLR-type equality across the uninitialized window adds noise
without a real parity invariant — the StatusCode-class
assertion already covers the "did the read succeed" question.
The test still pins StatusCode-class parity per scenario.
4. Galaxy.ParityMatrix.md — first-rig results captured
-----------------------------------------------------
Per-row status flipped from "n/a unverified" to actual
green / yellow / deferred outcomes from this run. Four new
accepted-deltas added (read-value CLR type, write-status code
mapping, single-platform ScanState scope, gw `[]` suffix
workaround), bringing the total to nine. Outstanding deltas
section flipped to "none as of 2026-04-30."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the five concrete code-level follow-ups identified after Phase 7.1:
#1 GalaxyDriver.ReadAsync now works in production. Previously threw
NotSupportedException when no test reader was injected. New path
subscribes through the existing SubscriptionRegistry + EventPump,
waits for the first OnDataChange per item handle (gw pushes the
initial value after SubscribeBulk), then unsubscribes. Tags the gw
rejects up front, or that don't publish before the caller's CT
fires, return Bad-status snapshots in input order so callers still
get one snapshot per requested reference.
#2 ResolveApiKey() routes Gateway.ApiKeySecretRef through three forms:
env:NAME, file:PATH, or literal-string fallback. A future DPAPI arm
slots in here without touching the call site.
#3 GatewayGalaxySubscriber actually honors bufferedUpdateIntervalMs now
(was being silently dropped). Calls SetBufferedUpdateInterval via
the gw's MxCommandKind.SetBufferedUpdateInterval before SubscribeBulk
when the requested interval differs from the cached last-applied
value. Soft-fails on a non-Ok protocol status (the SubscribeBulk
still succeeds at gw cadence).
#4 GalaxyMxAccessOptions.EventPumpChannelCapacity surfaces the bounded-
channel size through DriverConfig JSON, defaulting to 50_000.
#5 Stale doc-comments in HostStatusAggregator and GatewayGalaxySubscriber
describing follow-ups that already shipped.
Tests: +6 (read subscribe-once happy path + rejected-tag fallback;
five resolver scenarios). Total Galaxy driver tests now 180/180 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Galaxy.DefaultBackend = "GalaxyMxGateway" to the server
appsettings as the forward-looking default for tooling and migration
scripts that author new Galaxy DriverInstance rows. No runtime
behavior change — both factories register independently at startup,
so existing rows keep working until PR 7.2 retires the legacy
registration (gated on the parity matrix in
docs/v2/Galaxy.ParityMatrix.md going fully green on the parity rig).
The e2e-config.sample.json comment is updated to reflect the new
default endpoint (http://localhost:5120 mxaccessgw) while still
pointing pre-flip rigs at the legacy OtOpcUaGalaxyHost path.
Install-Services.ps1's OtOpcUaGalaxyHost registration is intentionally
unchanged — yanking that mid-flight without a soaked parity rig would
leave any in-progress installation without a Galaxy backend at all.
PR 7.2 retires it alongside the legacy projects.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps DefaultCallTimeoutSeconds from 5 → 30. The 5s default was
provably unsafe regardless of soak data: a 50k-tag SubscribeBulk
walks the gw worker's item list serially under the MxAccess COM
apartment lock, and that scan can exceed 5s on a busy node. 30s
leaves comfortable headroom for the legitimate worst case while
still failing fast on a wedged worker.
ConnectTimeoutSeconds (10) and StreamTimeoutSeconds (0 = unlimited)
unchanged — the soak harness in PR 6.4 didn't observe pressure on
either, so they stay at their original sane values until live data
indicates otherwise.
Tuning rationale captured as a code comment in GalaxyGatewayOptions
so the next reader knows what was deliberate and what's pending live
soak data.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires MxAccess.PublishingIntervalMs into the gw's SubscribeBulk
bufferedUpdateIntervalMs parameter on both subscribe paths:
- GalaxyDriver.SubscribeAsync — when the caller passes TimeSpan.Zero
(typical for infrastructure callers like the deploy watcher), the
driver substitutes _options.MxAccess.PublishingIntervalMs. When the
caller sets a non-zero interval (the server's UA subscription
publishingInterval), that wins.
- PerPlatformProbeWatcher — new bufferedUpdateIntervalMs ctor parameter
defaulting to 0 (gw default cadence). GalaxyDriver passes
_options.MxAccess.PublishingIntervalMs so probe ScanState changes
publish at the configured rate.
Tests: caller-wins-when-non-zero, fallback-to-config-when-zero on the
driver; default-zero, configured-forwarded, negative-rejected on the
probe watcher.
A session-level SetBufferedUpdateInterval RPC exists in the gw protocol
(MxCommandKind.SetBufferedUpdateInterval) but the .NET client doesn't
expose a typed helper yet — adjusting an existing subscription's
interval is a follow-up. Today's path subscribes once with the right
interval, which covers the common case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decouples the gw stream-read loop from the listener-fanout loop with a
bounded Channel<MxEvent> (default capacity 50_000) sitting between them.
When a slow listener fills the channel, the producer's TryWrite returns
false and we count the drop rather than back-pressuring the gw stream.
Three counters on the ZB.MOM.WW.OtOpcUa.Driver.Galaxy meter expose the
pressure curve before it manifests as user-visible loss:
- galaxy.events.received — MxEvents read from StreamEvents
- galaxy.events.dispatched — MxEvents that made it through to OnDataChange
- galaxy.events.dropped — MxEvents discarded because the channel was full
Each measurement carries a galaxy.client tag so multi-driver hosts can
split by source. The driver wires _options.MxAccess.ClientName into the
new EventPump constructor parameter.
Tests: drop-newest under pressure, capacity validation, and per-pump
measurement filtering (xUnit can run other pump tests in parallel and
their measurements land on the same listener — the test filters to its
own client name).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In-box ActivitySource ("ZB.MOM.WW.OtOpcUa.Driver.Galaxy") wrapped around
the three gw-facing seams via decorators:
- TracedGalaxySubscriber — galaxy.subscribe_bulk / galaxy.unsubscribe_bulk
/ galaxy.stream_events spans. Stream span covers the entire stream
lifetime with a galaxy.event_count tag (per-event spans would dominate
the trace volume at 50k tags / 1Hz; PR 6.2 owns per-event metrics).
- TracedGalaxyDataWriter — galaxy.write spans tagged with
galaxy.tag_count, galaxy.secured_write_count (split between FreeAccess
/Operate vs Tune/Configure/VerifiedWrite, computed only when a listener
is recording so the hot path stays free), galaxy.success_count.
- TracedGalaxyHierarchySource — galaxy.get_hierarchy spans tagged with
galaxy.object_count.
GalaxyDriver.BuildProductionRuntimeAsync wraps the production seams in
the decorators. The driver itself doesn't take an OpenTelemetry package
dependency — System.Diagnostics.ActivitySource is in-box; the host
process picks the listener.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GalaxyDriver.InitializeAsync now builds the production gw runtime (MxGatewayClient,
GalaxyMxSession, GatewayGalaxySubscriber, GatewayGalaxyDataWriter,
ReconnectSupervisor, HostConnectivityForwarder, PerPlatformProbeWatcher) when no
test seams are pre-injected; Dispose tears the chain down in order.
- GetHealth surfaces supervisor.IsDegraded as DriverState.Degraded so a transport
drop is observable without polling the supervisor directly.
- DiscoverAsync now refreshes the per-platform probe watcher's membership against
$WinPlatform / $AppEngine objects after every discovery pass.
- OnPumpDataChange routes ScanState changes through the probe watcher in addition
to fanning out OnDataChange to ISubscribable consumers.
- Server registers GalaxyDriver under "GalaxyMxGateway" alongside the legacy
"Galaxy" GalaxyProxyDriver factory so DriverInstance rows can opt in.
- Bumped Server.Tests' Microsoft.Extensions.Logging.Abstractions to 10.0.7 to
resolve the downgrade pulled in transitively via MxGateway.Client.
- Lifecycle factory tests switched to the internal seam-injection ctor so they
no longer attempt a real gRPC connect during InitializeAsync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HostStatusAggregator merges transport + per-platform host entries with
change-event diffing (re-asserting same state is a no-op so a stable
ScanState=Running burst doesn't fan out duplicates). PerPlatformProbeWatcher
ports the legacy GalaxyRuntimeProbeManager state machine onto the gw
subscription path: SubscribeBulk for `<tag>.ScanState`, idempotent
SyncPlatformsAsync (subscribe new, unsubscribe dropped), and a
DecodeState helper pinning bool/int/string ScanState values + bad-quality
fallback. HostConnectivityForwarder is the skeleton for the gw-6
StreamSessionHealth signal — until that mxaccessgw RPC ships, PR 4.5's
ReconnectSupervisor pushes transport state by calling SetTransport on
session connect/disconnect.
GalaxyDriver wiring (implement IHostConnectivityProbe, route OnDataChange
to PerPlatformProbeWatcher, expose GetHostStatuses() / OnHostStatusChanged,
push transport from supervisor) is deferred to PR 4.W to avoid conflict
with the rest of the Phase 4 deferred wiring (4.5 supervisor + 4.6
DeployWatcher).
Tests: 19 new
- HostStatusAggregatorTests (9): empty snapshot, new-host change with
Unknown predecessor, same-state silence, transition diff, snapshot
reflects every host, case-insensitive host names, Remove returns true
for tracked, Remove false for unknown, concurrent updates don't corrupt.
- HostConnectivityForwarderTests (5): SetTransport routes under client
name, transitions fire change, repeated same-state silent, empty client
name throws, post-dispose throws.
- PerPlatformProbeWatcherTests (5 + theory pinning DecodeState's full
truth table): subscribe N platforms, idempotent re-sync, removed
platforms unsubscribed + dropped from aggregator, OnProbeValueChanged
routing for Running/Stopped/bad-quality/foreign-ref, Dispose
unsubscribes everything.
NOTE: build is currently broken because mxaccessgw/clients/dotnet/ has
been removed from C:\Users\dohertj2\Desktop\mxaccessgw — this PR's source
is internally consistent and isolated from the missing dependency, but the
existing Driver.Galaxy code (PRs 4.1–4.6) can't compile until the .NET
client is restored. Once it is, expect 116 + 19 = 135 tests in the
Driver.Galaxy.Tests project.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
State machine that drives GalaxyDriver's recovery from gw transport
failure. Healthy → TransportLost → Reopening → Replaying → Healthy. Drivers
report failure signals; the supervisor runs reopen + replay with capped
exponential backoff (default 500ms → 30s) until both succeed.
Files:
- Runtime/ReconnectSupervisor.cs — state machine with snapshot, change
event, last-error tracking, and a one-attempt-at-a-time recovery loop.
Idempotent ReportTransportFailure: repeated failure reports during an
in-flight recovery do not spawn parallel loops. Reopen + replay are
caller-supplied callbacks (the driver injects them in the wire-up PR);
reopen re-Registers the gw session, replay re-establishes every active
subscription via gw's ReplaySubscriptionsCommand (mxaccessgw issue gw-3)
or the SubscribeBulk fallback. Dispose cancels the loop cleanly.
- Public StateTransition record + IsDegraded predicate the driver maps
to DriverState.Degraded for health snapshots.
Wiring (GalaxyDriver subscribes the supervisor to its EventPump's
transport-failure signal, exposes IsDegraded through GetHealth(), routes
reopen/replay callbacks through GalaxyMxSession + SubscriptionRegistry)
lands in PR 4.W to avoid conflict with the parallel host-probe track
(PR 4.7) and align the wire-up with the rest of Phase 4's plumbing.
9 supervisor tests (full state-machine traversal, retry-until-success on
both reopen and replay failures, idempotent failure reports, last-error
propagation, Dispose mid-recovery, post-dispose throws, fast-path Healthy
WaitForHealthy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeployWatcher consumes GalaxyRepositoryClient.WatchDeployEventsAsync,
suppresses the bootstrap event, and raises RediscoveryEventArgs whenever
time_of_last_deploy actually changes. Reconnect-on-error with capped
exponential backoff. GalaxyDriver wiring (IRediscoverable.OnRediscoveryNeeded
event + StartAsync inside InitializeAsync) lands in a follow-up so this PR
doesn't conflict with the parallel runtime track.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Subscription path online. GalaxyDriver implements ISubscribable; subscribes
batches via gw SubscribeBulkAsync, runs a single shared EventPump consumer
of StreamEventsAsync, fans out OnDataChange events to every driver
subscription that observes the changed gw item handle.
Files:
- Runtime/GalaxySubscriptionHandle.cs — record implementing ISubscriptionHandle.
- Runtime/SubscriptionRegistry.cs — bookkeeping with forward (subscriptionId
→ bindings) and reverse (itemHandle → list of subscriptionIds) maps. The
reverse map is the fan-out index so a single OnDataChange dispatches to
every subscription that observes the changed handle.
- Runtime/IGalaxySubscriber.cs — driver-side seam: SubscribeBulk +
UnsubscribeBulk + StreamEventsAsync. Production wraps GalaxyMxSession;
tests substitute a fake driving synthetic MxEvents.
- Runtime/GatewayGalaxySubscriber.cs — production. Forwards to
MxGatewaySession; bufferedUpdateIntervalMs is captured for now and
becomes a SetBufferedUpdateInterval call once gw issue #102 / gw-9 lands
(PR 6.3 picks this up).
- Runtime/EventPump.cs — long-running background consumer of
StreamEventsAsync. Decodes MxValue + maps quality byte/MxStatusProxy via
StatusCodeMap. Fan-out per subscriber resolves through the registry; bad
handler exceptions are caught + logged, never break the dispatch loop.
Filters out non-OnDataChange families (write-complete and operation-
complete come back via InvokeAsync's reply path, not the event stream).
GalaxyDriver:
- Adds ISubscribable. SubscribeAsync allocates a subscription id,
SubscribeBulks, builds the binding list (failed gw entries get
ItemHandle=0 + a per-tag warn log), registers, and returns the handle.
EventPump is started lazily on first subscribe; one pump per driver
shared across all subscriptions.
- UnsubscribeAsync removes from the registry first (so stale events are
filtered immediately) then calls UnsubscribeBulk best-effort. Foreign
handles throw ArgumentException.
- ReadAsync NotSupportedException message updated: PR 4.4 no longer the
pointer (deferred to a small follow-up that wraps the pump as a
one-shot reader).
- Dispose tears down the pump first, then the repository client, then
clears state.
- Internal ctor extended with optional subscriber parameter.
Tests (15 new, 109 Galaxy total):
- SubscriptionRegistryTests: monotonic id allocation, single+multi
subscription fan-out, failed-handle exclusion, removal isolation, count
invariants.
- GalaxyDriverSubscribeTests: handle allocation + value-change dispatch,
multi-subscription fan-out, failed-tag silence, unsubscribe drops gw
handle and stops dispatch, foreign handle throws, no-subscriber throws,
empty-tag-list returns handle without calling gw.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Write path online. GalaxyDriver implements IWritable; routes by
SecurityClassification — SecuredWrite / VerifiedWrite tags go through
MxCommandKind.WriteSecured, everything else through MxGatewaySession.
WriteAsync. Per-tag classifications are captured during ITagDiscovery via a
SecurityCapturingBuilder wrapper that intercepts Variable() calls without
the discoverer needing to know about the driver's internal state.
Files:
- Runtime/MxValueEncoder.cs — boxed CLR value → MxValue. Covers seven Galaxy
scalar types (bool/int8-32/uint8-32 → Int32, int64/uint64 → Int64, float,
double, string, DateTime/DateTimeOffset → Timestamp) and 1-D array
variants. Inverse of MxValueDecoder; round-trip pinned by tests.
DateTime.Local converts to UTC; unsupported types throw ArgumentException.
- Runtime/IGalaxyDataWriter.cs — driver-side seam. Tests inject a fake to
capture routing decisions; production path uses GatewayGalaxyDataWriter.
- Runtime/GatewayGalaxyDataWriter.cs — production. Lazy-AddItem caches
itemHandles, encodes value, routes Write vs WriteSecured, translates
MxCommandReply (ProtocolStatus → BadCommunicationError; first
MxStatusProxy in statuses[] via StatusCodeMap.FromMxStatus). Per-tag
exception isolation: one bad write doesn't fail the batch.
- GalaxyDriver: now implements IWritable. Discovery wraps the supplied
IAddressSpaceBuilder in SecurityCapturingBuilder which records each
attribute's SecurityClass into _securityByFullRef before delegating.
WriteAsync resolves classification per tag (FreeAccess default for
unknown tags — matches the legacy backend), routes through the injected
writer. Throws NotSupportedException with PR 4.4 pointer when no writer
is wired (production path requires GalaxyMxSession.Connect from PR 4.4).
Tests (32 new, 94 Galaxy total):
- MxValueEncoder: every scalar type, narrowing checks (sbyte/short/byte/
ushort fit Int32; uint within Int32 range; ulong within Int64),
DateTime.Local → UTC conversion, array variants for bool/double/string/
DateTime, Dimensions populated, unsupported-type throws ArgumentException,
encoder/decoder round-trip pin.
- GalaxyDriverWriteTests: WriteAsync routes through fake writer with
values intact; theory exercises every SecurityClassification value through
the discovery-then-write path; unknown-tag defaults to FreeAccess; empty-
request short-circuit; no-writer fail-loud; post-dispose throws.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read path scaffold + the byte→uint quality mapping table that the parity
matrix (PR 5.x) pins. PR 4.4 supplies the production GW-backed reader; this
PR ships the abstraction and the supporting infrastructure so 4.4 just
plugs the implementation in.
Files:
- Runtime/StatusCodeMap.cs — explicit OPC DA quality byte → OPC UA
StatusCode uint mapping. Extends the legacy Galaxy.Host
HistorianQualityMapper with named constants (Good / GoodLocalOverride,
Uncertain + 4 substatuses, Bad + 7 substatuses, BadInternalError) and an
MxStatusProxy → uint helper that honors success flag → detail byte →
detected_by transport-error fallback. Unknown bytes fall back to category
bucket with a once-per-session diagnostic log so field captures can
extend the table.
- Runtime/MxValueDecoder.cs — gateway MxValue → boxed CLR value for the
seven Galaxy data types (Boolean, Int32, Int64, Float32, Float64, String,
DateTime) plus their array variants. Honors MxValue.IsNull and
RawValue passthrough.
- Runtime/IGalaxyDataReader.cs — driver-side seam for one-shot reads. PR
4.4 ships the production wrapper around MxGatewaySession.SubscribeBulk +
StreamEvents + UnsubscribeBulk; this PR exposes the contract so
GalaxyDriver.ReadAsync wires through it.
- Runtime/GalaxyMxSession.cs — wrapper around MxGatewaySession that owns
the Register handle. ConnectAsync opens session + Register; AttachForTests
lets tests bypass real gw construction. PR 4.3/4.4/4.5 add write,
subscribe, and reconnect surfaces.
GalaxyDriver:
- Implements IReadable. ReadAsync routes through the injected
IGalaxyDataReader (test seam) when present; production path throws
NotSupportedException pointing at PR 4.4 — protects deployments running
this PR from silent wrong reads while signaling that the legacy-host
backend (Galaxy:Backend=legacy-host) handles reads in the meantime.
- Internal ctor extended with optional dataReader parameter (default null,
preserves PR 4.0/4.1 callers).
Tests: 42 new — exhaustive byte→uint table for StatusCodeMap (15 known
codes + category-bucket fallback for unknowns + MxStatusProxy precedence
rules + OPC UA top-byte invariants), every MxValue oneof case for the
decoder (bool/int32/int64/float/double/string/timestamp/3 array variants/
raw bytes/null), GalaxyDriver IReadable wiring (route-through, empty-
request, no-reader-throws, post-dispose-throws, status-code preservation).
62 Galaxy tests total pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Browse path online. GalaxyDriver now implements ITagDiscovery against the
gateway's GalaxyRepositoryClient (PR 0.1's mxaccessgw browse RPC) and feeds
the address-space builder one folder per gobject + one variable per dynamic
attribute, with alarm-bearing attributes carrying all five sub-attribute refs
the server-level AlarmConditionService (PR 2.2) needs.
Files:
- Browse/IGalaxyHierarchySource.cs — driver-side seam between the discoverer
and the gateway. Test fakes return canned hierarchies so the discoverer's
translation logic is exercised without a real gRPC channel.
- Browse/GatewayGalaxyHierarchySource.cs — production wrapper around
GalaxyRepositoryClient.DiscoverHierarchyAsync (paged internally).
- Browse/GalaxyDiscoverer.cs — translates GalaxyObject → IAddressSpaceBuilder
calls. Browse name = contained_name (falls back to tag_name); full
reference = attr.full_tag_reference when set, else tag_name + "." +
attribute_name. Skips objects/attributes with empty identity.
- Browse/DataTypeMap.cs — mx_data_type → DriverDataType (port from legacy
GalaxyProxyDriver.MapDataType, same fallback to String for unknown codes).
- Browse/SecurityMap.cs — security_classification → SecurityClassification
(port from legacy GalaxyProxyDriver.MapSecurity).
- Browse/AlarmRefBuilder.cs — populates the five sub-attribute refs by
Galaxy convention (.InAlarm/.Priority/.DescAttrName/.Acked/.AckMsg). The
same convention the legacy GalaxyAlarmTracker hard-coded; concentrated
here so PR 2.2's service receives complete AlarmConditionInfo rows.
GalaxyDriver:
- Added internal ctor accepting IGalaxyHierarchySource? for test injection.
Default lazily builds GatewayGalaxyHierarchySource around a
GalaxyRepositoryClient constructed from options on first DiscoverAsync.
- Owned GalaxyRepositoryClient disposed in Dispose.
- ApiKey resolution is currently a passthrough of ApiKeySecretRef — PR 4.W
(or follow-up) wires DPAPI-backed secret resolution.
csproj: path-based ProjectReference to mxaccessgw (the user is shipping
that repo on a parallel track; both repos sit side-by-side on the dev box).
Tests project also references MxGateway.Contracts directly to construct
GalaxyObject / GalaxyAttribute fixtures.
Tests: 10 new in Browse/GalaxyDiscovererTests.cs covering folder-per-object,
variable-per-attribute, full-ref defaulting + gw-supplied override, browse-
name fallback, every metadata field propagation, alarm sub-attribute ref
population, non-alarm rows skip MarkAsAlarmCondition, empty-identity skips,
empty-attribute-name skips, end-to-end through GalaxyDriver.DiscoverAsync.
20 total Galaxy tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New in-process .NET 10 driver project at
src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/. The Tier-A replacement for
Driver.Galaxy.Host + Driver.Galaxy.Proxy. PR 4.0 ships only the IDriver
shape + factory + options; capability bodies (browse, read, write,
subscribe, deploy-watch, host probes) land in PRs 4.1–4.7.
Files:
- Driver.Galaxy.csproj — net10 x64, AnyCPU+x64 platforms, references
Core.Abstractions + Core. No MxGatewayClient ProjectReference yet — that
comes in PR 4.2 once the gw NuGet package is wired (the user is
shipping mxaccessgw on a parallel track).
- Config/GalaxyDriverOptions.cs — nested record hierarchy
(Gateway/MxAccess/Repository/Reconnect) mirroring the JSON shape spelled
out in lmx_mxgw_impl.md PR 4.0 acceptance section.
- GalaxyDriver.cs — minimal IDriver impl. Initialize/Shutdown toggle
DriverHealth between Healthy/Unknown; Reinitialize bumps the timestamp;
GetMemoryFootprint=0 (PR 4.4 wires SubscriptionRegistry size);
FlushOptionalCachesAsync no-op. Logs intent on lifecycle calls so
partial deployments are diagnosable.
- GalaxyDriverFactoryExtensions.cs — JSON parser, default fill-ins,
validation throw on missing required fields. Driver type name
"GalaxyMxGateway" intentionally distinct from legacy "Galaxy" so both
factories coexist during parity testing (Phase 5). PR 4.W's
Galaxy:Backend switch picks one or the other.
Tests:
- 10 tests in Driver.Galaxy.Tests covering minimal-config defaults, full
override path, three required-field error cases, factory registration
via DriverFactoryRegistry.TryGet, lifecycle health transitions
(Init → Shutdown → Reinit), Dispose idempotency, and post-disposal
ObjectDisposedException.
slnx: registers the new Driver.Galaxy + Driver.Galaxy.Tests projects.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Solution + DI plumbing to complete Phase 3. With this PR the .NET 10 server
can boot with the Wonderware historian sidecar in the loop, gated by config
so existing deployments are unaffected.
slnx: registers Driver.Historian.Wonderware (net48 sidecar),
Driver.Historian.Wonderware.Client (net10 client), and both test projects.
Server.csproj: adds ProjectReference to the .NET 10 client.
Program.cs: reads Historian:Wonderware:* configuration. When Enabled=true,
constructs a WonderwareHistorianClient singleton and:
- Registers it as IAlarmHistorianWriter so the SqliteStoreAndForwardSink
drain (task #248) can pick it up.
- Registers a WonderwareHistorianBootstrap hosted service that, on
StartAsync, calls IHistoryRouter.Register(prefix, client) under the
configured DriverInstancePrefix (default "galaxy") — lets the
HistoryRead* dispatch in DriverNodeManager find the sidecar via
longest-prefix-match resolution.
When Enabled=false (the default), DriverNodeManager keeps using its
internal LegacyDriverHistoryAdapter for the read path and the existing
NullAlarmHistorianSink stays in place — drop-in compatible with every
deployment that hasn't moved off Galaxy.Host yet.
42 server integration tests + 10 client tests pass. Full solution build
clean (0/0).
Note: scripts/install/Install-Services.ps1 and
src/.../Server/appsettings.json carry intermixed user WIP and are NOT
committed in this PR. Equivalent edits applied locally:
Install-Services.ps1: new -InstallWonderwareHistorian switch installs the
OtOpcUaWonderwareHistorian service alongside OtOpcUaGalaxyHost;
generates a fresh historian shared secret; OtOpcUa service depends on
both when historian sidecar is installed.
Server/appsettings.json: new Historian.Wonderware section with
Enabled=false default, PipeName/SharedSecret/PeerName/
DriverInstancePrefix/ConnectTimeoutSeconds/CallTimeoutSeconds keys.
Both pieces should land in a follow-up commit once the user's WIP on those
files clears.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New project Driver.Historian.Wonderware.Client (net10 x64) implements both
Core.Abstractions.IHistorianDataSource (read paths consumed by the server's
IHistoryRouter) and Core.AlarmHistorian.IAlarmHistorianWriter (alarm-event
drain consumed by SqliteStoreAndForwardSink) against the sidecar's PR 3.3
pipe protocol.
Wire-format files (Framing/MessageKind, Hello, Contracts, FrameReader,
FrameWriter) are byte-identical mirrors of the sidecar's net48 originals —
the sidecar can't be referenced as a ProjectReference because of the
runtime/bitness gap, so we duplicate and pin the wire bytes via tests.
PipeChannel owns one bidirectional NamedPipeClientStream + Hello handshake +
serializes calls. Single in-flight at a time (semaphore); transport failures
trigger one in-flight reconnect-and-retry before propagating. Connect is
abstracted behind a Func<CancellationToken, Task<Stream>> so tests inject
in-process pipes.
WonderwareHistorianClient maps:
- HistorianSampleDto.Quality (raw OPC DA byte) → OPC UA StatusCode uint via
QualityMapper (port of HistorianQualityMapper from sidecar).
- HistorianAggregateSampleDto.Value=null → BadNoData (0x800E0000).
- WriteAlarmEventsReply.PerEventOk[i]=true → Ack, false → RetryPlease.
Whole-call failure or transport exception → RetryPlease for every event in
the batch (drain worker handles backoff).
- AlarmHistorianEvent → AlarmHistorianEventDto with severity bucketed via
AlarmSeverity-to-ushort mapping (Low=250, Medium=500, High=700, Crit=900).
GetHealthSnapshot tracks transport success + sidecar-reported failure
separately; ConsecutiveFailures rises on operation-level errors, not just
transport drops.
10 round-trip tests via FakeSidecarServer (in-process net10 fake using the
client's own framing): byte→uint quality mapping, null-bucket BadNoData,
at-time order preservation, event-field round-trip, sidecar error surfacing,
WriteBatch per-event status, whole-call retry-please mapping, Hello
shared-secret rejection, transport-drop reconnect-and-retry, health snapshot
counters.
PR 3.W will register this client as IHistorianDataSource + IAlarmHistorianWriter
in OpcUaServerService DI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>