HistorianGateway is now the sole historian backend (read + alarm SendEvent +
continuous WriteLiveValues). Document the final state and retire the Wonderware
sidecar from the docs/config/labels:
- CLAUDE.md: rewrite the Historian section — ServerHistorian /
ContinuousHistorization / AlarmHistorian config keys, the IHistorianProvisioning
EnsureTags hook, the GatewayAlarmHistorianWriter SendEvent path + ReadEvents
dependency on gateway RuntimeDb:EventReadsEnabled=true, gateway-side
prerequisites (RuntimeDb flags + historian:read/write/tags:write scopes),
migration note, and two KNOWN-LIMITATION callouts (live-validation gate +
empty historized-ref-set recorder follow-on).
- appsettings.json: fix the stale ServerHistorian block (Host/Port/SharedSecret/
ServerCertThumbprint -> Endpoint/ApiKey/UseTls/AllowUntrustedServerCertificate/
CaCertificatePath/CallTimeout, keep MaxTieClusterOverfetch); add a disabled
ContinuousHistorization block; prune the orphaned Wonderware keys from
AlarmHistorian (keep the SQLite knobs). ApiKey env-supplied via
ServerHistorian__ApiKey (commented; valid strict JSON via _comment keys).
- README.md + docs (Historian.md, AlarmHistorian.md, Configuration.md,
ServiceHosting.md, DriverLifecycle.md, drivers/README.md, Uns.md, VirtualTags.md,
AlarmTracking.md, Client.UI.md, README.md, TestConnectProbes.md): retire the
Wonderware historian backend from current-backend descriptions; fix the stale
ServerHistorian/AlarmHistorian config tables (now gateway shape); convert
drivers/Historian.Wonderware.md to a retired stub pointing at the gateway.
- Source/UI labels (descriptive text only, no behavior change):
OtOpcUaServerHostedService.cs, HistoryPaging.cs, OtOpcUaSdkServer.cs,
HistorianAdapterActor.cs, VirtualTagModal.razor, ScriptedAlarmModal.razor,
AlarmsHistorian.razor now name the HistorianGateway backend.
Build clean (0 errors); AdminUI.Tests green (514 passed).
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
The HistorianGateway driver is now the sole historian read/write+alarm backend, so the
Wonderware sidecar projects are dead code. Removes the 5 Wonderware projects (driver,
.Client, .Client.Contracts, + their 2 test projects) from the solution and tree, and fully
retires the vestigial 'Historian.Wonderware' driver type (UI/probe-only; it had no driver
factory): the Host probe registration, the AdminUI driver-config surface (driver page,
tag-config editor/model/validator entry, address picker/builder, driver-type catalog +
dropdown + edit-router entries), and their tests. Prunes the now-unused Wonderware
connection fields (Host/Port/UseTls/ServerCertThumbprint/SharedSecret) from
AlarmHistorianOptions (keeping Enabled + the SQLite store-and-forward knobs) and refreshes
the stale XML docs that named Wonderware as the production backend.
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
Bind ContinuousHistorizationOptions (Enabled/OutboxPath/CommitMode/
CommitIntervalMs/DrainBatchSize/DrainIntervalSeconds/Capacity/backoff) with a
warn-only Validate(); gated on Enabled AND the ServerHistorian gateway being
configured, the Host registers the durable FasterLogHistorizationOutbox (container
-disposed) + a gateway-backed GatewayHistorianValueWriter, and binds outbox
depth/dropped observable gauges on the central scraped meter. WithOtOpcUaRuntimeActors
spawns the recorder (over the same dependency-mux ref) when the options + writer +
outbox resolve, registering ContinuousHistorizationRecorderKey. Spawned with an EMPTY
historized-ref set: the deployed address space builds later, so ref population is a
documented follow-on (a later SetHistorizedRefs feed) — T18 wires the actor + outbox
+ writer + meters; the ref feed is the known remaining gap.
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
DriverHostActor/DriverInstanceActor and cluster events log via Akka's ILoggingAdapter, which had no Serilog bridge — under the Windows service host the default StandardOutLogger output is discarded, making the driver-role actor graph invisible (this masked the data-plane diagnosis).
- Add Akka.Logger.Serilog 1.5.60 (deps satisfied by Akka 1.5.62 / Serilog 4.3.1).
- WithOtOpcUaClusterBootstrap: ConfigureLoggers(DebugLevel; ClearLoggers(); AddLogger<SerilogLogger>()) — Akka.Hosting owns logger setup, so HOCON akka.loggers alone is not honored.
- Program.cs: set static Serilog.Log.Logger from the DI root logger after Build() (AddZbSerilog registers the MEL provider but not the static logger, which the Akka SerilogLogger and the startup banner both need).
Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
All five suppressed advisories are now resolved at baseline/resolved versions,
so every NuGetAuditSuppress is removed repo-wide:
- System.Security.Cryptography.Xml (GHSA-37gx-xxp4-5rgx / GHSA-w3x6-4m5h-cxqf)
-> fixed by the .NET 10 baseline (10.0.6)
- OPCFoundation Opc.Ua.Core (GHSA-h958-fxgg-g7w3) -> fixed at resolved 1.5.378.106
Two were still live and are now patched via direct security pins:
- OpenTelemetry.Api 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j) pinned in Cluster;
Runtime/ControlPlane/AdminUI + tests inherit via project reference
- Tmds.DBus.Protocol 0.20.0 -> 0.21.3 (GHSA-xrw6-gwf8-vvr9) pinned in Client.UI
Also correct the Historian sidecar runtime comments (x86 -> x64, matching the
csproj PlatformTarget). Solution audit: 0 vulnerable packages; full build clean.
Wire the materialised AlarmConditionState method handlers so a client calling
Acknowledge/Confirm/Shelve/AddComment is gated on the AlarmAck data-plane role
and, when allowed, routed back to the scripted-alarm engine via a new
`alarm-commands` DistributedPubSub topic.
- Commons: new AlarmCommand DTO (AlarmId/Operation/User/Comment/UnshelveAtUtc).
- ScriptedAlarmHostActor: add AlarmCommandsTopic const.
- OtOpcUaNodeManager: settable AlarmCommandRouter + wire OnAcknowledge/OnConfirm/
OnAddComment/OnShelve/OnTimedUnshelve. Each resolves the principal off
ISessionOperationContext.UserIdentity as RoleCarryingUserIdentity, fails closed
(BadUserAccessDenied) when the AlarmAck role is absent or no identity, else maps
+ routes an AlarmCommand and returns Good. OnShelve discriminates OneShotShelve/
TimedShelve/Unshelve from the SDK flags; TimedShelve expiry = UtcNow + ms.
No Akka/IActorRef handle — only the Action<AlarmCommand> delegate. T20 de-dup
note left; WriteAlarmCondition untouched.
- OpcUaServer.Security: OpcUaDataPlaneRoles.AlarmAck shared const (the role was a
bare string everywhere; introduced one symbol for the gate + tests).
- OtOpcUaSdkServer: SetAlarmCommandRouter pass-through.
- Host: boot wiring publishes each command via mediator.Tell(Publish(...)) using a
lazy ActorSystem accessor (mirrors DpsScriptLogPublisher).
- Tests: 11 new gate + mapping tests (OpcUaServer.Tests 88->99, all green).
A thin gateway over the admin-operations cluster singleton so CI/scripts can trigger a
deployment without the Blazor button. Forwards to the same IAdminOperationsClient.
StartDeploymentAsync; mounted on admin-role nodes. Auth is a fixed-time X-Api-Key check
against Security:DeployApiKey (orthogonal to the cookie-only web auth); AllowAnonymous so the
auth fallback doesn't 401 it, self-disabling (503) until the key is set. Outcome->status:
202/200/409/422. Unit tests for the key check + outcome mapping; HTTP E2E (real auth + real
deploy via the 2-node harness). Documented in docs/security.md.
The per-role overlay (appsettings.{role}.json) was appended after WebApplicationBuilder's
default sources, so it outranked environment variables — a baked role file could not be
overridden by a deployment env var. In otopcua-dev this meant appsettings.admin.json's
Security:Ldap:DevStubMode=false beat the compose's DevStub override, so every AdminUI login
attempted a real LDAP bind against a non-existent server and failed with 'Unexpected
authentication error'.
- Program.cs: re-append AddEnvironmentVariables() + AddCommandLine(args) after the role
overlay so deployment overrides keep top precedence (overlay still beats base appsettings).
- docker-dev/docker-compose.yml: the DevStub env var targeted the stale 'Authentication:Ldap'
section; the code reads 'Security:Ldap'. Corrected the prefix on every host node (+ comment).
Dev AdminUI login now signs in as Administrator via the DevStub bypass.
Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.
- Test Connect returned "No probe registered" for every driver: the
IDriverProbe set was registered only under the driver role, but the
admin-operations singleton that consumes it is pinned to admin. Extract
AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
in the hasAdmin path too.
- Live driver-status/alerts/script-log panels showed "SignalR error:
Connection refused": these Blazor Server components opened a HubConnection
to their own hub via the browser's public URL, which server-side code
can't reach behind Traefik (host :9200 -> container :9000). Read the
in-process source directly instead -- DriverStatus via
IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).
Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
Cheap-and-fast probe: open TCP socket to the configured endpoint,
close immediately. Surfaces SocketError on failure, latency on
success, "timed out" on caller cancel. Sufficient for the AdminUI
Test Connect "can we reach the host?" question. Richer protocol-
level probes (OPC UA session open, FOCAS handshake, gRPC ping)
are a documented follow-up. Each probe registered as
AddSingleton<IDriverProbe, X> in DriverFactoryBootstrap so they
flow through DI into AdminOperationsActor.
Historian.Wonderware returns a clean "TCP probe not applicable"
result because it communicates over a Windows named pipe, not TCP.
Also adds OpcUaClient + Historian.Wonderware.Client project
references to Host.csproj (both were missing from the driver
ItemGroup).
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
The v2 plan's blessed install path (scripts/install/Install-Services.ps1)
registers the host via `sc.exe create binPath=...OtOpcUa.Host.exe`, but the
binary never called `UseWindowsService`. Without it, the Service Control
Manager waits ~30s for the process to call SetServiceStatus(Running) and
then kills it — the install script's design was incomplete.
Two changes:
- Host.csproj: drop the `IsOSPlatform('Windows')` condition on the
Microsoft.Extensions.Hosting.WindowsServices package reference so the
package is always available. The runtime helper used by
UseWindowsService gates on WindowsServiceHelpers.IsWindowsService()
internally, so it's a no-op when running as a console app or under
Linux/macOS — the binary stays cross-platform-buildable.
- Program.cs: call builder.Host.UseWindowsService(options =>
options.ServiceName = "OtOpcUaHost") immediately after CreateBuilder.
When the host is launched by SCM, WindowsServiceLifetime takes over
the IHostLifetime slot and reports START/STOP correctly. When launched
by `dotnet run` or `OtOpcUa.Host.exe` from a console, it's a no-op.
Verified end-to-end on wonder-app-vd03.zmr.zimmer.com: `sc.exe create`
followed by `sc.exe start OtOpcUaHost` transitions from START_PENDING to
RUNNING; /login + /health/ready + /health/active all return 200; service
survives SSH session close and auto-starts on boot per the AUTO_START
flag set by the installer script.
Two fixes surfaced while bringing up the docker-dev stack end-to-end:
- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
/health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
every probe and Traefik marks every backend unhealthy → all three cluster
routes return 503.
- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
the schema and applies row inserts. Migrations remain the operator's job:
dotnet ef database update --project src/Core/.../Configuration \
--startup-project src/Server/.../Host
Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).
Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
Six interlocking fixes surfaced while smoke-testing the fused Host in a browser:
- Host/Program.cs: UseStaticWebAssets() opts into the RCL static-asset pipeline
in any environment (auto-only in Development), MapStaticAssets().AllowAnonymous()
exempts CSS/JS from the AddOtOpcUaAuth fallback policy, and
AddCascadingAuthenticationState() lets <AuthorizeView/> work inside interactive
components (NavSidebar's session block).
- Security/ServiceCollectionExtensions: ILdapAuthService Scoped → Singleton —
consumed by the Singleton LdapOpcUaUserAuthenticator on driver-role nodes.
Crash only surfaced in Development (ValidateOnBuild=true).
- Security/Endpoints/AuthEndpoints: /auth/login now dispatches on Content-Type —
application/json keeps the original 204/401/503 contract for tests, and
application/x-www-form-urlencoded (the browser <form>) gets a redirect dance.
DisableAntiforgery on the login endpoint (it's the entry point, no prior session)
and AllowAnonymous to override the fallback policy.
- Security/Ldap/LdapOptions + LdapAuthService: real DevStubMode property; when
true the auth service bypasses the LDAP bind and returns a FleetAdmin role so
dev/test can navigate the full Admin UI without GLAuth running.
- AdminUI/EndpointRouteBuilderExtensions: doc-comment update about static-asset
flow (the actual MapStaticAssets call lives in Host/Program.cs).