Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor
over-redactions as a Site Health metric so a misconfigured /
catastrophic regex shows up on /monitoring/health rather than
disappearing into a NoOp sink.
- SiteHealthReport: new 'AuditRedactionFailure' int field
(defaulted to 0 for back-compat with existing producers/tests).
- ISiteHealthCollector / SiteHealthCollector:
new IncrementAuditRedactionFailure() — per-interval atomic
counter with Interlocked, reset on CollectReport, mirroring
the M2 Bundle G SiteAuditWriteFailures pattern.
- HealthMetricsAuditRedactionFailureCounter: new bridge in
ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter
increments to ISiteHealthCollector — mirrors
HealthMetricsAuditWriteFailureCounter one-for-one.
- AddAuditLogHealthMetricsBridge: now ALSO Replaces the
NoOpAuditRedactionFailureCounter binding with the health-metrics
bridge, so a single AddAuditLogHealthMetricsBridge() call wires
both the M2 Bundle G write-failure counter and the M5 Bundle C
redaction-failure counter into the health report.
Site-side only for M5 — the filter also runs on CentralAuditWriter
and AuditLogIngestActor (where it just keeps the NoOp default), but
a central-side health-metric surface for AuditRedactionFailure is
deferred to M6 alongside the rest of the central health collector
work.
Tests:
- AuditRedactionFailureMetricTests (HealthMonitoring) covers the
SiteHealthCollector increment/report/reset shape (3 tests).
- HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers
the AuditLog → HealthMonitoring bridge (3 tests).
- Existing CountCapturingHealthCollector stub in
DeploymentManagerRedeployTests extended with the new no-op
interface method.
Verified: dotnet build clean, all 24 test projects green
(the only Failed at first ScadaLink.SiteRuntime.Tests run was the
known-flaky InstanceActorChildAttributeRaceTests; passes on re-run
in isolation and full suite, unrelated to these changes).
Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.
- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
IncrementSiteAuditWriteFailures() counter, per-interval reset
semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
registration for the real bridge; called from
SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
HealthMetricsAuditWriteFailureCounter (not NoOp).
Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
A heartbeat-registered site that has never sent a full report now has
LastReportReceivedAt = null instead of the year-0001 sentinel. TimestampDisplay
accepts DateTimeOffset? and renders null as a placeholder ('awaiting first
report') rather than a ~2000-year-stale date. Cross-module: HealthMonitoring +
CentralUI.
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.
- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
accept an anonymous object instead of a hand-built dictionary, via a
shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
Attributes/CallScript route to it cross-site; adds site-side
RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
never existed.
Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
CentralCommunicationActor.HandleHeartbeat was forwarding each incoming
HeartbeatMessage to Context.Parent, which resolves to the /user
guardian — a non-actor. Every site heartbeat went straight to dead
letters (~1026 per central node per 30 minutes at the default ~2s
interval across three sites).
The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which
bumps LastReportReceivedAt on already-known sites (and clears IsOnline
if it had flipped) without touching LatestReport. Heartbeats from
unregistered sites are dropped — first registration still happens on
the first full report. CentralCommunicationActor calls this in place
of the no-op Tell.
The result: heartbeats now serve their stated health-monitoring
purpose (per CLAUDE.md) by keeping a site marked online between the
30s full reports if a single report is briefly delayed, and the dead
letter noise disappears entirely.
Restore inside the docker build was failing because TreatWarningsAsErrors
promotes NU1902/NU1903/NU1904 (vulnerable package warnings) to errors.
Bump the flagged packages to advisory-free versions:
- MailKit 4.15.1 -> 4.16.0 (GHSA-9j88-vvj5-vhgr)
- Microsoft.AspNetCore.DataProtection.EFCore 10.0.5 -> 10.0.7 (GHSA-9mv3-2cwr-p262, transitively pulls fixed System.Security.Cryptography.Xml — GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf)
- OpenTelemetry.Api (transitive via Akka.Hosting) 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6) — added as a direct PackageReference in ScadaLink.Host to override the Akka.Hosting pin
To resolve the NU1605 downgrade chain triggered by DataProtection.EFCore
10.0.7 (which transitively requires Microsoft.EntityFrameworkCore >= 10.0.7
and friends), bump every Microsoft.* 10.0.5 reference across src/ and
tests/ to 10.0.7 in lockstep.
Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider
that queries Akka cluster membership for all site-role nodes. HealthReportSender
populates ClusterNodes before each report. UI shows a row per node with
hostname, Online/Offline badge, and Primary/Standby badge. Falls back to
single-node display if ClusterNodes is not populated.
- AkkaHostedService: SetNodeHostname from NodeOptions
- DataConnectionActor: UpdateConnectionEndpoint on state transitions,
track per-tag quality counts and UpdateTagQuality on value changes
- HealthReportSender: query StoreAndForwardStorage for parked message count
- StoreAndForwardStorage: add GetParkedMessageCountAsync()
New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints
(primary/secondary), DataConnectionTagQuality (good/bad/uncertain),
ParkedMessageCount. New collector methods to populate them.
Health dashboard redesigned to match mockup: Nodes | Data Connections
(with per-connection tag quality) | Instances + S&F Buffers | Error
Counts + Parked Messages. Site names resolved from repository.
- Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime)
- Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads,
preventing Self.Tell failure in Disconnected event handler
- Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect
- Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]"
- Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync
- Update test_infra_opcua.md with JoeAppEngine documentation
Both nodes of a site cluster were sending health reports. The standby
node (without the DeploymentManager singleton) reported 0 instances and
no connections, overwriting the active node's data in the aggregator.
Added IsActiveNode flag to ISiteHealthCollector, set by
DeploymentManagerActor on PreStart/PostStop. HealthReportSender skips
sending when the node is not active. Also ensured EnsureDclConnections
is called during startup batch creation so data connections survive
container restarts.
Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor),
alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and
S&F buffer depth placeholder. Added instance count tracking (deployed/
enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated
Health Dashboard UI to show instance counts per site. All metrics flow
through the existing health report pipeline via ClusterClient.
- Rider launch profiles: "ScadaLink Central" and "ScadaLink Site"
- appsettings.Central.json: correct test_infra credentials (ScadaLink_Dev1#,
scadalink_app user, GLAuth on 3893, Mailpit on 1025)
- Fix HealthMonitoring DI: split site vs central registration to avoid
missing IHealthReportTransport on central
- Regenerate single clean EF migration (InitialSchema) covering all entities
- Suppress PendingModelChangesWarning in dev mode
- Fix isDevelopment check for ASPNETCORE_ENVIRONMENT propagation
Verified: Host starts, connects to SQL Server, applies migrations, boots
Akka.NET cluster, LDAP auth works (admin/password via GLAuth), health
endpoint returns Healthy.
- WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host),
15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs
- WP-0.11: 12 per-component options classes with config binding
- WP-0.12: Sample appsettings for central and site topologies
- Add execution procedure and checklist template to generate_plans.md
- Add phase-0-checklist.md for execution tracking
- Resolve all 21 open questions from plan generation
- Update IDataConnection with batch ops and IAsyncDisposable
57 tests pass, zero warnings.
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.