Brainstormed design: generate 25 StyleGuide-conformant developer-reference
docs derived from src/ code (pilot AuditLog, then parallel fan-out, then
accuracy/conformance verification). Complements the requirements specs;
leaves src/, XML docs, and specs untouched.
Replace dc=scadabridge,dc=local with dc=zb,dc=local in all dev/test LDAP
references — app config, docker test-cluster node configs (docker/ and
docker-env2/), GLAuth fixture, dev tooling, Host.Tests fixtures,
IntegrationTests factory, and operational test_infra docs. OU structure
(ou=SCADA-Admins,ou=users,etc.) preserved throughout. Email domains
(@scadabridge.local), hostnames, and container names are untouched.
Historical plan docs (2026-05-24-second-environment.md,
2026-05-31-folder-repo-rename-scadabridge-design.md) excluded as
point-in-time records. No synthetic dc=example,dc=com placeholders touched.
Gitea renders mermaid inline, so the flow/state/hierarchy/DAG diagrams
move to text-in-markdown: auto-layout (removes the manual overlap-prone
draw.io step), diffable source, no committed binaries, and a dark-text
theme so labels stay legible. Keep draw.io PNGs only for the two complex
bespoke diagrams (logical architecture, env2 topology) where pixel
control still wins. All 24 mermaid blocks validated by rendering.
Add explicit dark text color (per-class color + base theme override) to
the store-and-forward mermaid diagram so node/edge labels read clearly
regardless of gitea's page theme.
Swap the store-and-forward Message Lifecycle PNG embed for an inline
mermaid block to verify whether gitea renders mermaid in markdown. If it
does, the standard flow/state/hierarchy diagrams can move to inline
mermaid (text-only, auto-layout) instead of draw.io source + PNG.
Replace ASCII-art diagrams across the README and docs/ with editable
.drawio sources plus exported PNGs, so the diagrams render clearly in
rendered markdown and can be maintained/regenerated instead of being
hand-edited as fragile text art. Non-diagram blocks (code, folder
trees, UI wireframes) were left as text.
Renames the 13 SCADALINK_* runtime env vars → SCADABRIDGE_*, the ScadaLink__
.NET config keys → ScadaBridge__, the stale ScadaLink.Host.exe assembly name
→ ZB.MOM.WW.ScadaBridge.Host.exe, the scadalink_app SQL login → scadabridge_app,
and residual identifiers/comments/docs. Migration records (prior rename
tooling/design, DB-rename helper, this scrub script) carved out.
Adds tools/scrub-scadalink-refs.sh.
The native alarms feature merged with 7 component docs updated, but the
spec layer drifted: HighLevelReqs, Commons, and ManagementService had no
native-alarm coverage and the README table flagged it on only one row.
Add HighLevelReqs §3.4.2 (+ validation), document the Commons
types/entities/messages and the 7 ManagementService commands, sync the
README rows + link the TreeView sub-component, fix 2 broken plan links,
and drop the one-off native-alarms RESUME scratchpad.
Read-only mirror of native alarm sources into a unified A&C-style state
model (severity + active/acked/shelved/suppressed). Instance-bound source
discovery, site-only SQLite state with live central query (no central
tables), DebugView enrichment. OPC UA A&C events + ConditionRefresh and
MxGateway session-less StreamAlarms via a new IAlarmSubscribableConnection
seam routed connection-level by source reference; new NativeAlarmActor peer
to computed AlarmActor.
Expanding a Galaxy object in the tag picker hung on "loading…": the browse
reply inlined every child's full attribute set (~152 KB), exceeding Akka's
128 KB remote frame, and remoting silently discarded the oversized reply.
Browse path (DataConnectionLayer):
- RealMxGatewayClient: navigation now uses BrowseChildren(include_attributes=
false) — child objects only — and an object's own attributes load lazily via
DiscoverHierarchy(root, max_depth=0) when it's expanded. Payload drops from
~152 KB/level to a few KB. Seam contract unchanged.
- DataConnectionActor.CapBrowseChildren: protocol-agnostic byte-budget cap
(~100 KB) on every BrowseNodeResult before it crosses the site→central
frame, OR-ing the adapter's own Truncated flag. Byte budget, not a count —
the only bound that holds regardless of NodeId/attribute-name length.
- RealOpcUaClient: requestedMaxReferencesPerNode 1000 → 500 to narrow the
window before the byte budget applies.
- Graceful gRPC Unimplemented handling → NotSupportedException →
BrowseFailureKind.NotBrowsable with an actionable message (older gateway
builds lacking BrowseChildren).
Picker UI (CentralUI):
- NodeBrowserDialog: modal-lg → modal-xl; new scoped .razor.css caps the tree
at 55vh with its own scrollbar so manual entry + Select/Cancel stay visible.
- Protocol-agnostic failure messages (was hardcoded "OPC UA …"); renamed the
leftover opcua-browser-tree class to node-browser-tree.
Tests: new frame-budget cap test + NotSupported=>NotBrowsable mapping test;
DCL suite 88/88. Doc: Component-DataConnectionLayer.md records the lazy
attribute-light browse and the frame-size guard.
Adds MxGateway under Supported Protocols, an MxGateway Settings config table,
notes IBrowsableDataConnection now backs both protocols via BrowseNodeCommand/
BrowseService, and updates the README component table.
Add design doc for a second data-connection protocol, MxGateway, alongside
the OPC UA client. New IDataConnection adapter behind the existing
DataConnectionFactory extension point; tag pipe (read/subscribe/write) plus
Galaxy hierarchy browse, optional 2nd endpoint for failover. Generalizes the
OPC-UA-named browse plumbing to protocol-agnostic browse via
IBrowsableDataConnection. No entity/schema changes.
Five phases, PR-shippable per phase: schema/contracts, DCL browse capability,
flattening uses override, Central UI popup + integration, docs. Per-task
classification, time estimates, and parallelism declared.
Per-instance address override + live ClusterClient-based browse via a new
IBrowsableDataConnection capability on RealOpcUaClient. Lazy-loaded tree
with manual-paste fallback; offline-safe.
Decisions: full prefix in csproj names + namespaces, full runtime
artifact rename (containers/network/DBs), staged commits on main,
in-place MS SQL DB rename, wipe site SQLite on cutover.
Final themed batch. 5 well-localised correctness fixes.
Serialisation precision:
- ESG-020: DatabaseGateway.JsonElementToParameterValue probes
TryGetInt64 → TryGetDecimal → GetDouble, so a script's high-precision
decimal SQL parameter survives the cached-write retry round-trip
without silent precision loss. 3 new regression tests.
Template engine correctness:
- TE-018: DiffService gains ComputeConnectionsDiff over
FlattenedConfiguration.Connections, mirroring the existing entity-diff
shape and pairing with the Theme 1 TE-017 hash-coverage fix. A
ConfigurationDiff record extension in Commons is flagged as a follow-up.
- TE-019: TemplateResolver.BuildInheritanceChain now walks via the
int? ParentTemplateId directly — only null means "no parent". A real
Id of 0 (the prior special-cased sentinel) now walks the chain like
any other node, matching the TemplateEngine-013 CycleDetector fix.
Regression of TE-013 closed.
- TE-020: All 5 Create* paths in TemplateService + SharedScriptService
re-ordered to save-first → log-with-real-Id → save-audit (matching
the InstanceService pattern). Create* audit rows no longer carry a
literal "0" EntityId.
Doc deferral:
- Transport-012: Component-Transport.md §Audit Trail now spells out that
the BundleImportId repository filter IS wired (in CentralUiRepository),
but the Audit-Log-Viewer UI dropdown + summary-row hyperlink are a
deferred CentralUI follow-up. CLI workaround documented
(audit query --bundle-import-id).
11+ new regression tests (3 ESG, 4 DiffService, 3 TemplateResolver, 4
TemplateService, 1 SharedScriptService). Build clean; ESG 72/72,
TemplateEngine 324/324. README regenerated: 1 pending of 481 total.
Session-to-date: 135 of 136 originally-open Theme findings closed
across 10 themes in 10 commits.
The largest themed batch — small mechanical fixes across 11 modules.
API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
exposes AuditingDbConnection.Inner via internal API surface.
Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
"throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
(dead under Serilog).
Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
(ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
+ constructor normalisation.
Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
ApplyArtifactDataConnectionsToDcl message after the SQLite write so
system-wide artifact-deploy data-connection changes go live
immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
local write so a concurrent delete can't skip standby replication.
Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
(uncollideable — leading $ is forbidden in real SiteIdentifiers).
Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
JsonException / KeyNotFoundException / FormatException — emits a
clean INVALID_RESPONSE exit instead of a stack trace.
Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
removed (was pointing at the SITE's own port); doc-key explains how
to extend.
- Host-018: NodeName added to both shipped per-role configs (was
causing SourceNode to be null on audit rows).
UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
cursor stack.
10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).
Session-to-date: 130 of 136 originally-open Theme findings closed.
Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.
SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.
DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.
ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.
InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.
NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.
SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.
TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.
Transport-001 and Transport-002 (also High) remain Open — they're being
handled in a follow-up batch because both touch BundleImporter.cs and
must serialise.
Reflect this session's implementation work in the Transport (#24)
component spec:
- New 'CLI' section covering bundle export / preview / import
commands, the base64-over-JSON wire format, the 200 MB request-body
cap, and the 5-minute per-command timeout. Authorization table +
Interactions section updated to mention ManagementActor handlers.
- Import wizard nav placement corrected from Design to Admin (already
the case in code; the spec lagged).
- Blocker-scan heuristic boundaries documented under Import Flow:
the '.' skip, the DataSourceReference exclusion, and the
KnownNonReferenceNames denylist. Both DetectBlockersAsync and
RunSemanticValidationAsync Pass 1 share the filter.