Replace ASCII-art diagrams across the README and docs/ with editable
.drawio sources plus exported PNGs, so the diagrams display cleanly in
rendered markdown and can be maintained or regenerated instead of being
hand-edited as fragile text art. Non-diagram blocks (code, folder
trees, UI wireframes) were left as text.
Renames the 13 SCADALINK_* runtime env vars → SCADABRIDGE_*, the ScadaLink__
.NET config keys → ScadaBridge__, the stale ScadaLink.Host.exe assembly name
→ ZB.MOM.WW.ScadaBridge.Host.exe, the scadalink_app SQL login → scadabridge_app,
and residual identifiers/comments/docs. Migration records (prior rename
tooling/design, DB-rename helper, this scrub script) are carved out and
left untouched.
Adds tools/scrub-scadalink-refs.sh.
The native alarms feature merged with 7 component docs updated, but the
spec layer drifted: HighLevelReqs, Commons, and ManagementService had no
native-alarm coverage and the README table flagged it on only one row.
Add HighLevelReqs §3.4.2 (+ validation), document the Commons
types/entities/messages and the 7 ManagementService commands, sync the
README rows + link the TreeView sub-component, fix 2 broken plan links,
and drop the one-off native-alarms RESUME scratchpad.
Expanding a Galaxy object in the tag picker hung on "loading…": the browse
reply inlined every child's full attribute set (~152 KB), exceeding Akka's
128 KB remote frame, and remoting silently discarded the oversized reply.
Browse path (DataConnectionLayer):
- RealMxGatewayClient: navigation now uses
BrowseChildren(include_attributes=false) (child objects only), and an
object's own attributes load lazily via DiscoverHierarchy(root, max_depth=0)
when it's expanded. Payload drops from ~152 KB/level to a few KB. Seam
contract unchanged.
- DataConnectionActor.CapBrowseChildren: protocol-agnostic byte-budget cap
(~100 KB) on every BrowseNodeResult before it crosses the site→central
frame, OR-ing the adapter's own Truncated flag. Byte budget, not a count —
the only bound that holds regardless of NodeId/attribute-name length.
- RealOpcUaClient: requestedMaxReferencesPerNode 1000 → 500 to narrow the
window before the byte budget applies.
- gRPC Unimplemented (older gateway builds lacking BrowseChildren) is now
handled gracefully: it maps to NotSupportedException and then to
BrowseFailureKind.NotBrowsable with an actionable message.
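The byte-budget cap above can be sketched as follows. This is a minimal, hypothetical Python stand-in for DataConnectionActor.CapBrowseChildren, not the real code. BrowseNodeResult, cap_browse_children and BYTE_BUDGET are illustrative names. Measuring each child as UTF-8 JSON is also an assumption, since the real reply is sized by Akka's serializer. The sketch shows why a byte budget bounds the frame regardless of NodeId or attribute-name length, and how the adapter's own Truncated flag is OR-ed in.

```python
import json
from dataclasses import dataclass
from typing import List

# Illustrative budget ("~100 KB"), chosen to stay under the 128 KB remote frame.
BYTE_BUDGET = 100_000

@dataclass
class BrowseNodeResult:
    """Hypothetical stand-in for the site-to-central browse reply."""
    children: List[dict]
    truncated: bool = False

def cap_browse_children(result: BrowseNodeResult,
                        budget: int = BYTE_BUDGET) -> BrowseNodeResult:
    """Keep children in order until the next one would exceed the byte budget."""
    kept: List[dict] = []
    used = 0
    cut = False
    for child in result.children:
        # Count encoded bytes, so long NodeIds/names are charged in full.
        # Approximate: array delimiters are ignored; the budget leaves headroom.
        size = len(json.dumps(child).encode("utf-8"))
        if used + size > budget:
            cut = True
            break
        kept.append(child)
        used += size
    # Truncated if either the adapter or this cap dropped children.
    return BrowseNodeResult(kept, truncated=result.truncated or cut)
```

A count-based cap (e.g. "at most N children") cannot give this guarantee, because one child with a very long NodeId can be many times larger than another.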
Picker UI (CentralUI):
- NodeBrowserDialog: modal-lg → modal-xl; new scoped .razor.css caps the tree
at 55vh with its own scrollbar so manual entry + Select/Cancel stay visible.
- Protocol-agnostic failure messages (was hardcoded "OPC UA …"); renamed the
leftover opcua-browser-tree class to node-browser-tree.
Tests: new frame-budget cap test + NotSupported=>NotBrowsable mapping test;
DCL suite 88/88. Doc: Component-DataConnectionLayer.md records the lazy
attribute-light browse and the frame-size guard.
Adds MxGateway under Supported Protocols, an MxGateway Settings config table,
notes that IBrowsableDataConnection now backs both protocols via
BrowseNodeCommand/BrowseService, and updates the README component table.
Final themed batch. 5 well-localised correctness fixes.
Serialisation precision:
- ESG-020: DatabaseGateway.JsonElementToParameterValue probes
TryGetInt64 → TryGetDecimal → GetDouble, so a script's high-precision
decimal SQL parameter survives the cached-write retry round-trip
without silent precision loss. 3 new regression tests.
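The ESG-020 probe order can be modelled in Python as below. This is an illustration of the idea, not DatabaseGateway.JsonElementToParameterValue itself, and json_number_to_param is a hypothetical name. The sketch mirrors .NET's range limits for Int64 and System.Decimal, but it does not model decimal's 28-29 significant-digit rounding. The point is the ordering: exact integer first, then exact decimal, and double only as the last resort.

```python
from decimal import Decimal, InvalidOperation

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
# Largest magnitude a .NET System.Decimal can hold (2**96 - 1).
DOTNET_DECIMAL_MAX = Decimal(2**96 - 1)

def json_number_to_param(raw: str):
    """Probe int64, then decimal, then double, so precision is kept when possible."""
    try:
        value = int(raw)
        if INT64_MIN <= value <= INT64_MAX:
            return value
    except ValueError:
        pass  # Not an integer literal (contains '.', 'e', ...).
    try:
        dec = Decimal(raw)
        if dec.is_finite() and abs(dec) <= DOTNET_DECIMAL_MAX:
            return dec
    except InvalidOperation:
        pass
    # Only values outside both ranges fall back to a lossy double.
    return float(raw)
```

For example, "12345678901234567.891" comes back as an exact Decimal. Going straight to double would round it to 12345678901234568.0.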
Template engine correctness:
- TE-018: DiffService gains ComputeConnectionsDiff over
FlattenedConfiguration.Connections, mirroring the existing entity-diff
shape and pairing with the Theme 1 TE-017 hash-coverage fix. A
ConfigurationDiff record extension in Commons is flagged as a follow-up.
- TE-019: TemplateResolver.BuildInheritanceChain now walks via the
int? ParentTemplateId directly — only null means "no parent". A real
Id of 0 (the prior special-cased sentinel) now walks the chain like
any other node, matching the TemplateEngine-013 CycleDetector fix.
Regression of TE-013 closed.
- TE-020: All 5 Create* paths in TemplateService + SharedScriptService
re-ordered to save-first → log-with-real-Id → save-audit (matching
the InstanceService pattern). Create* audit rows no longer carry a
literal "0" EntityId.
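The TE-019 rule can be sketched as follows: walk the chain by following a nullable parent id, and treat only null (None) as "no parent". This is a hypothetical Python illustration, not TemplateResolver.BuildInheritanceChain. The parent_of map is an assumed input shape. The cycle guard is an added defensive assumption; in the codebase, cycle detection lives in CycleDetector.

```python
from typing import Dict, List, Optional

def build_inheritance_chain(template_id: int,
                            parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return [template, parent, grandparent, ...] by following nullable parent ids."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = template_id
    # Only None ends the walk; an Id of 0 is an ordinary template, not a sentinel.
    while current is not None:
        if current in seen:
            # Defensive guard (assumption): a cycle would otherwise loop forever.
            raise ValueError(f"inheritance cycle at template {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of[current]
    return chain
```

With parent_of = {5: 0, 0: None}, the chain for template 5 is [5, 0]. Under the old sentinel rule, the walk would have stopped before template 0.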
Doc deferral:
- Transport-012: Component-Transport.md §Audit Trail now spells out that
the BundleImportId repository filter IS wired (in CentralUiRepository),
but the Audit-Log-Viewer UI dropdown + summary-row hyperlink are a
deferred CentralUI follow-up. CLI workaround documented
(audit query --bundle-import-id).
15 new regression tests (3 ESG, 4 DiffService, 3 TemplateResolver, 4
TemplateService, 1 SharedScriptService). Build clean; ESG 72/72,
TemplateEngine 324/324. README regenerated: 1 pending of 481 total.
Session-to-date: 135 of 136 originally-open Theme findings closed
across 10 themes in 10 commits.
The largest themed batch — small mechanical fixes across 11 modules.
API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection; it
reaches AuditingDbConnection.Inner through an internal API surface.
Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
"throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
(dead under Serilog).
Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
(ScadaLink remains the source of truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
+ constructor normalisation.
Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
ApplyArtifactDataConnectionsToDcl message after the SQLite write so
system-wide artifact-deploy data-connection changes go live
immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
local write so a concurrent delete can't skip standby replication.
Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central" (it cannot
collide with a real site, since a leading $ is forbidden in SiteIdentifiers).
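The sentinel convention (HM-021's "$central", and SnF-023's "$unknown-site" with constructor normalisation) can be sketched as below. The validation rule "non-blank, no leading $" is taken from the note above. The function names are hypothetical Python illustrations, not the real validators. Because validation rejects any identifier starting with $, no configured site can ever equal a sentinel.

```python
from typing import Optional

CENTRAL_SITE_ID = "$central"
UNKNOWN_SITE_ID = "$unknown-site"

def is_valid_site_identifier(value: Optional[str]) -> bool:
    """Assumed rule: a real SiteIdentifier is non-blank and never starts with '$'."""
    return bool(value and value.strip()) and not value.startswith("$")

def normalise_site_id(value: Optional[str]) -> str:
    """Constructor-style normalisation: blank or missing becomes the unknown-site sentinel."""
    return value if value and value.strip() else UNKNOWN_SITE_ID
```

A site legitimately named "central" is now valid and distinct from CENTRAL_SITE_ID, which was the collision the old "central" value allowed.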
Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
JsonException / KeyNotFoundException / FormatException — emits a
clean INVALID_RESPONSE exit instead of a stack trace.
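The parking rule documented for SnF-019 can be expressed as a small predicate: a uniform retry cap, where maxRetries:0 is the unbounded escape hatch. This is a hypothetical sketch. DEFAULT_MAX_RETRIES = 5 is a placeholder, not the real DefaultMaxRetries value. The sketch also assumes failed_attempts counts retries already made.

```python
# Placeholder value for illustration only; the real DefaultMaxRetries is not stated here.
DEFAULT_MAX_RETRIES = 5

def should_park(failed_attempts: int, max_retries: int = DEFAULT_MAX_RETRIES) -> bool:
    """Park once the retry cap is reached; max_retries == 0 means retry forever."""
    if max_retries == 0:
        return False
    return failed_attempts >= max_retries
```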
Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
removed (was pointing at the SITE's own port); doc-key explains how
to extend.
- Host-018: NodeName added to both shipped per-role configs (was
causing SourceNode to be null on audit rows).
UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
cursor stack.
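A Previous button over a forward-only cursor API works by remembering the cursors that produced earlier pages, as CentralUI-032 describes. The sketch below is a hypothetical Python version. CursorPager and the fetch signature (cursor in; rows and next cursor out, with None meaning the first or last page) are assumptions, not the AuditResultsGrid code.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# fetch(cursor) -> (rows, next_cursor); cursor None = first page, next_cursor None = last page.
Fetch = Callable[[Optional[str]], Tuple[Sequence[dict], Optional[str]]]

class CursorPager:
    """Hypothetical Previous/Next paging on top of a forward-only cursor API."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._stack: List[Optional[str]] = []  # cursors of the pages before the current one
        self._current: Optional[str] = None    # cursor that produced the current page
        self.rows, self._next = fetch(None)

    @property
    def has_previous(self) -> bool:
        return bool(self._stack)

    @property
    def has_next(self) -> bool:
        return self._next is not None

    def next_page(self) -> None:
        if self._next is None:
            return
        self._stack.append(self._current)  # remember how to get back here
        self._current = self._next
        self.rows, self._next = self._fetch(self._current)

    def previous_page(self) -> None:
        if not self._stack:
            return
        self._current = self._stack.pop()  # re-fetch the page we came from
        self.rows, self._next = self._fetch(self._current)
```

Re-fetching on Previous, instead of caching earlier pages, keeps the grid consistent with rows that arrived while the user was paging.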
10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).
Session-to-date: 130 of 136 originally-open Theme findings closed.
Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.
SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.
DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.
ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.
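The ESG-018 change rests on the difference between an exception, which is retried indefinitely, and a False return, which parks the row. The Python sketch below illustrates that difference. deliver_buffered and the send callback are hypothetical. It assumes the caller parks on False and treats True as done, and that transient send failures raise and are retried.

```python
import json
import logging

log = logging.getLogger("esg")

def deliver_buffered(payload: str, send) -> bool:
    """Return True when delivered, False to have the caller park the row (assumed contract)."""
    try:
        request = json.loads(payload)
    except json.JSONDecodeError:
        # A corrupt row will never parse, so retrying is pointless; park it instead.
        log.warning("corrupt buffered payload; parking")
        return False
    send(request)  # Assumption: transient failures raise here and are retried.
    return True
```

Notifications take the opposite choice (SnF-018): they return True to discard the corrupt row, because notifications do not park.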
InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.
NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.
SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.
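Comparing ISO strings only orders correctly when both sides share the same offset, which is why SEL-016 normalises the filters to UTC. The sketch below assumes events are stored as fixed-width UTC strings with a trailing Z. to_utc_iso and in_range are hypothetical helpers, not the SiteEventLogger code.

```python
from datetime import datetime, timezone

def to_utc_iso(value: datetime) -> str:
    """Normalise an aware datetime to UTC before formatting, so string order matches time order."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

def in_range(event_utc_iso: str, start: datetime, end: datetime) -> bool:
    """Inclusive From/To filter using plain string comparison on normalised values."""
    return to_utc_iso(start) <= event_utc_iso <= to_utc_iso(end)
```

Take an event at 10:30Z and a client filter of 12:00 to 13:00 at +02:00, which is 10:00Z to 11:00Z. The event matches after normalisation. Comparing the raw "+02:00" string against it excludes the event.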
TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.
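The TE-017 idea is that anything not in the hashed projection cannot mark a revision stale. The Python sketch below hashes a canonical JSON projection with SHA-256. The field names (name, value, description, protocol, endpoint, failover_count) and the choice of hash are assumptions, and a real revision hash covers far more fields. It shows only how adding Description and a connections projection makes description-only and endpoint-only edits change the hash.

```python
import hashlib
import json

def revision_hash(attributes, alarms, connections) -> str:
    """Hash a canonical JSON projection of the fields that should mark a revision stale."""
    projection = {
        # Description is part of the hashed shape for attributes and alarms.
        "attributes": sorted(
            ({"name": a["name"], "value": a.get("value"), "description": a.get("description")}
             for a in attributes),
            key=lambda a: a["name"]),
        "alarms": sorted(
            ({"name": a["name"], "description": a.get("description")} for a in alarms),
            key=lambda a: a["name"]),
        # HashableConnections-style projection: protocol, endpoint JSON, failover count.
        "connections": sorted(
            ({"name": c["name"], "protocol": c["protocol"],
              "endpoint": json.dumps(c["endpoint"], sort_keys=True),
              "failover_count": c["failover_count"]} for c in connections),
            key=lambda c: c["name"]),
    }
    # Sorted keys and fixed separators make the serialisation deterministic.
    canonical = json.dumps(projection, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```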
Transport-001 and Transport-002 (also High) remain Open — they're being
handled in a follow-up batch because both touch BundleImporter.cs and
must serialise.
Reflect this session's implementation work in the Transport (#24)
component spec:
- New 'CLI' section covering bundle export / preview / import
commands, the base64-over-JSON wire format, the 200 MB request-body
cap, and the 5-minute per-command timeout. Authorization table +
Interactions section updated to mention ManagementActor handlers.
- Import wizard nav placement corrected from Design to Admin (already
the case in code; the spec lagged).
- Blocker-scan heuristic boundaries documented under Import Flow:
the '.' skip, the DataSourceReference exclusion, and the
KnownNonReferenceNames denylist. Both DetectBlockersAsync and
RunSemanticValidationAsync Pass 1 share the filter.
- Adds SourceNode varchar(64) NULL to AuditLog, Notifications, and SiteCalls
tables with role-name semantics: node-a/node-b for site rows (qualified by
SourceSiteId), central-a/central-b for central direct-write rows.
- New IX_AuditLog_Node_Occurred (SourceNode, OccurredAtUtc) index.
- Reframes CLAUDE.md from a documentation-only project to an implementation project.
- Adds docs/plans/2026-05-23-audit-source-node.md + tasks.json companion.
The M1 implementation (Bundle A) committed concrete AuditChannel /
AuditKind / AuditStatus enums that reflect CLAUDE.md's locked
cached-call lifecycle decisions. The older alog.md and
Component-AuditLog.md narratives still used pre-M1 vocabulary
(Success / TransientFailure / PermanentFailure / Enqueued / Retrying /
SyncCall / CachedEnqueued / Attempt / Terminal / Completed). This
commit reconciles both docs to the M1 vocabulary:
AuditChannel (4):  ApiOutbound, DbOutbound, Notification, ApiInbound
AuditKind (10):    ApiCall, ApiCallCached, DbWrite, DbWriteCached,
                   NotifySend, NotifyDeliver, InboundRequest,
                   InboundAuthFailure, CachedSubmit, CachedResolve
AuditStatus (8):   Submitted, Forwarded, Attempted, Delivered, Failed,
                   Parked, Discarded, Skipped
Updates:
- Status column description + worked examples use the new 8 values.
- Kind table flattened from per-channel groupings to a single flat
list of the 10 discriminators (no more SyncCall / Cached* /
Attempt / Terminal / Completed).
- Cached-call lifecycle examples rewritten to the
CachedSubmit -> Forwarded -> Attempted... -> CachedResolve shape.
- Notification lifecycle examples rewritten to
NotifySend(Submitted) -> NotifyDeliver(Attempted) ->
NotifyDeliver(Delivered/Parked/Discarded).
- Inbound API examples split into InboundRequest (success path) and
InboundAuthFailure (401 path).
- 'Errors only' UI toggle, audit-error-rate KPI, and payload-cap
decision (#6 in §16) all switched from 'non-Success' to
Status IN ('Failed', 'Parked', 'Discarded').
- Per-site event-rate table in §13.1 renamed to the new kinds.
Pure design correction; no operational behavior change. Per the
goal-prompt invariant #6, alog.md may change when a design correction
is committed before the affected code change — this commit is that
correction, landed ahead of the M1 merge so the merge order reads
design-first, code-second.
No code, test, or infra file changes.
Final cross-bundle reviewer identified 7 inconsistencies that the per-bundle
reviewers couldn't see; all fixed in one logical commit.
Critical:
- HighLevelReqs AL-3: drop 'then upsert-on-newer-status'. AuditLog is
strictly append-only: upserting is correct for SiteCalls/Notifications but
wrong for the immutable AuditLog shadow.
- Component-AuditLog Error rate KPI: align with HealthMonitoring's
exclusion list (Success/Delivered/Enqueued) rather than just non-Success;
otherwise every Delivered notification or Enqueued cached call would be
counted as an error.
Important:
- Component-AuditLog line 154: ISiteAuditWriter -> IAuditWriter (canonical
name per Commons and the rest of this doc).
- Component-AuditLog Central direct-write paragraph: convert remaining
slash notation (ApiInbound/Completed, Notification/Attempt,
Notification/Terminal) to dot notation used everywhere else.
- Component-ClusterInfrastructure: scope SiteCallAuditActor to
reconciliation + KPIs + Retry/Discard relay; cached-telemetry ingest is
AuditLogIngestActor's role per Combined Telemetry contract.
- Component-CentralUI Audit Log page: state the OperationalAudit read
permission and the read-vs-export split (matching CLI doc).
- Component-NotificationOutbox: add never-fail-the-action invariant for
dispatcher audit writes.
Minor:
- Component-InboundAPI: 'Non-blocking semantics' was ambiguous (could be
read as async); reword to 'Fail-soft' — the write is still synchronous
before flush, but failures are caught and don't change the response.
- Component-CLI: realign audit-query/audit-export flags to actually match
the Central UI Audit Log filter set (channel, kind, status, site,
instance, target, actor, correlation-id, errors-only); drop --user and
--entity-id which are IAuditService concepts, not Audit Log columns.
- Component-AuditLog KPI tile names: 'Volume/Error rate/Backlog' ->
'Audit volume/Audit error rate/Audit backlog' (matches Central UI and
Health Monitoring); drop the two orphan KPIs (Top inbound callers, Top
outbound 5xx) that were never surfaced anywhere.
- Component-AuditLog Interactions: re-attribute DbOutbound emissions to
ESG (where Database.* lives) with a note that Site Runtime is the API
surface for scripts.
- HighLevelReqs AL-12: drop 'and reconciliation operations' (CLI has no
reconcile command; reconciliation is an internal self-healing pull).
Also add a note that verify-chain becomes operational once AL-11's hash
chain ships.
Task 10's reviewer noted that Component-CentralUI.md renamed the
IAuditService page from 'Audit Log Viewer' to 'Configuration Audit Log
Viewer' to avoid collision with the new operational Audit Log page (#23).
Two stale lowercased refs in Component-ConfigurationDatabase.md needed
the same disambiguation.
Bundle D code-review feedback on 0ae1a25 and e6f7a7f:
- Audit error rate (HealthMonitoring tile) was described as a combined
view of CentralAuditWriteFailures + AuditRedactionFailure (writer
health). Per alog.md §10.3 / §14.1 it is the operational error rate
of audited operations: % of central AuditLog rows with Status not
in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit
writer issues surface separately via the dedicated metrics.
- Audit volume description gains the spec-mandated 'events/min, global
+ per-site sparkline' shape.
- CLI: scadalink audit was claiming all three subcommands need both
OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read
(query, verify-chain) needs OperationalAudit; bulk export
additionally requires AuditExport. Restored the spec's split.
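The Audit error rate described above is the percentage of central AuditLog rows in a rolling 5-minute window whose Status is outside the exclusion list. It can be sketched as below. The (occurred_at_utc, status) row shape and audit_error_rate are hypothetical. The status set is held in one constant so it can follow the vocabulary in force; this commit cites Success/Delivered/Enqueued.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# Statuses this commit treats as non-errors; everything else counts as an error.
NON_ERROR_STATUSES = frozenset({"Success", "Delivered", "Enqueued"})
WINDOW = timedelta(minutes=5)

def audit_error_rate(rows: Iterable[Tuple[datetime, str]],
                     now: Optional[datetime] = None) -> Optional[float]:
    """Percent of rows in the trailing 5-minute window whose Status is an error.

    Returns None for an empty window, so a tile can show "no data" instead of 0%.
    """
    now = now or datetime.now(timezone.utc)
    window_start = now - WINDOW
    total = errors = 0
    for occurred_at_utc, status in rows:
        if window_start < occurred_at_utc <= now:
            total += 1
            if status not in NON_ERROR_STATUSES:
                errors += 1
    return None if total == 0 else 100.0 * errors / total
```

This is the operational rate of audited operations. Writer health (CentralAuditWriteFailures, AuditRedactionFailure) stays out of it and surfaces through its own metrics.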
Reviewer flag on 1bbfad3: "per Component-AuditLog.md, §6.2" pointed at
alog.md numbering, not at any anchor in Component-AuditLog.md (which uses
prose subsection titles). Switch to the prose anchor (Ingestion Paths →
Telemetry forward) so the link resolves.
Code-review feedback on c334de0:
- Ingestion Paths intro said 'Three write paths' but the section has four
subsections (site hot-path append + 3 central writers). Reword to 'Four
paths feed the central AuditLog -- one site originator and three central
writers'.
- Purpose: 'dashboards plus drilldowns plus filter queries' read awkwardly;
switch to a standard comma-separated list.