Adds a "Dev Disable-Login Flag" subsection to Component-Security.md covering
ScadaBridge:Security:Auth:DisableLogin / User, the AutoLoginAuthenticationHandler
mechanism, and the no-environment-guard / startup-warning production risk.
Ships DisableLogin: false under ScadaBridge → Security → Auth in:
- src/.../Host/appsettings.json (canonical default)
- docker/central-node-a/appsettings.Central.json
- docker/central-node-b/appsettings.Central.json
Also records DL-3 commit SHAs in the plan tasks file.
Faithful copy (warning only, no env guard); custom AuthenticationHandler under the
cookie scheme; reuses M2.19 SessionClaimBuilder for an all-roles system-wide principal.
Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external
NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless
service-account group-search. A live LDAP group re-query for an active session therefore
requires a new lib method and is OUT OF SCOPE (cannot modify the external package).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):
- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a
zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too.
- SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the
refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims
with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin
CookieAuthenticationEvents.OnValidatePrincipal adapter:
* idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut;
consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value).
* role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option,
default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt
via the shared builder, the anchor advances, principal is replaced + cookie renewed.
Revoked DB mappings drop the user's roles mid-session.
* fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws)
— mirrors the documented "LDAP failure: active sessions continue with current roles".
- Documented residual limitation in Component-Security.md: central role-mapping/scope
changes apply within ~15 min without LDAP; live directory group-membership changes are
picked up only at next login (needs a passwordless group-search on the external
ZB.MOM.WW.Auth.Ldap lib — tracked follow-up).
Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests
— idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold
no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.
Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no
attribute/alarm event occurring during the snapshot-build + network-transit
window is lost (#26).
Lifecycle change:
- PreStart now opens the gRPC subscription FIRST (alongside sending the
SubscribeDebugViewRequest), so live events start flowing immediately.
- Phase model via a single _snapshotDelivered flag (mutated only on the actor
thread). While buffering (snapshot not yet delivered), AttributeValueChanged/
AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of
being delivered. After snapshot+flush, the same handlers pass through directly.
- On DebugViewSnapshot: deliver snapshot, then flush the buffer in arrival order
with per-entity dedup, then set _snapshotDelivered=true (pass-through).
Dedup rule (exactly-once):
- Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName);
alarms by (InstanceUniqueName, AlarmName, SourceReference) so native
per-condition alarms are not conflated. Keys joined with a NUL delimiter
(declared as an escaped char constant; no raw NUL in source) so distinct
identities never collide on a space within a name.
- Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for
the same entity is already reflected -> DROP; strictly-newer (>) -> DELIVER;
entity absent from the snapshot -> DELIVER (genuine gap-window event).
Preserved paths:
- M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so
the not-found path now tears it down (CleanupGrpc) + clears the buffer, does
NOT enter pass-through, delivers the not-found snapshot, and stops cleanly.
- Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase
flag: a mid-session reconnect resumes pass-through; a reconnect during the
buffering phase stays buffering until the snapshot arrives.
- Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net
unchanged. Duplicate/late snapshot after delivery is ignored defensively.
Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup
drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound
teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the
M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor
class green: 23/23.
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).
Probe: for each required central singleton, Ask its local ClusterSingletonProxy
an Identify with a short bounded per-singleton timeout (~2s, probes run
concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the
timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves
ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up).
Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.
Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.
Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
OPC UA (RealOpcUaClient):
- Append 5 new SelectClauses at indices 13–17 (never renumber 0–12):
- 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime
- 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue
- New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo
priority order, InvariantCulture-formatted; empty string for non-limit alarm types.
- HandleAlarmEvent reads new indices with fields.Count > N guards; hard minimum (6)
unchanged so base ConditionType events still process without the limit fields.
- Document unavailable-by-protocol fields (Category, Description, OperatorUser,
CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent.
MxGateway (MxGatewayAlarmMapper):
- MapTransition: CurrentValue and LimitValue now populated via MxValueToString
(uses MxValueExtensions.ToClrValue + InvariantCulture) from OnAlarmTransitionEvent
proto fields current_value/limit_value.
- MapSnapshot: same — populated from ActiveAlarmSnapshot.current_value/limit_value.
- MxValueToString helper (internal): null-safe MxValue → string conversion.
Tests (17 new, 40 total pass):
- OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case.
- MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string
MxValue; absent fields yield empty strings.
- RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index
TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.
Adds AuditLogAppendOnlyGuardTests.cs to
tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/ — a code-level backstop
for the DB-role DENY UPDATE / DENY DELETE control established in migration
20260602174346_CollapseAuditLogToCanonical.
The guard scans every non-Designer, non-Snapshot *.cs file in the
ConfigurationDatabase source tree and fails the test run if any line matches the
DML-syntax pattern:
UPDATE\s+(?:dbo\.)?AuditLog\b
DELETE\s+(?:FROM\s+)?(?:dbo\.)?AuditLog\b
The tight DML-syntax pattern naturally excludes false positives without extra
exclusion checks: DENY UPDATE ON dbo.AuditLog is not matched (UPDATE is followed
by ON, not the table name); ALTER TABLE … SWITCH and TRUNCATE contain no UPDATE/
DELETE keyword; comments with UPDATE/AuditLog in separate clauses are not matched.
Self-verifying unit tests (ContainsAuditLogMutation_*) prove the helper:
- returns false on clean-source lines (INSERT, SELECT, DENY DDL, ALTER SWITCH,
TRUNCATE, DELETE FROM Notifications);
- returns TRUE on planted violations (UPDATE AuditLog SET …, DELETE FROM
dbo.AuditLog WHERE …, lower-case variants);
- returns false on the exact DENY/GRANT/partition-switch strings from the
production migration files.
All 256 ConfigurationDatabase.Tests pass; solution builds 0 W / 0 E.
Pre-deployment validation only WARNED when a data-sourced attribute had no
connection binding, so an instance with unresolved bindings still passed IsValid
and could deploy. There was also no check that a binding resolves to a connection
that actually exists at the target site.
- ValidationService.Validate gains an opt-in `enforceConnectionBindings` flag
(default false) plus a `siteConnectionNames` set. Default-false keeps the
template DESIGN-TIME path (ManagementActor.HandleValidateTemplate) non-blocking,
since bindings are legitimately set later at instance/deploy time. The DEPLOY
path (FlatteningPipeline) opts in (true) so:
* a data-sourced attribute with no binding is now a deploy-gating Error;
* a binding to a connection that does not exist on the target site is an Error.
Static (non-data-sourced) attributes are never flagged.
- FlatteningPipeline computes the site-connection-names set from the loaded site
data connections (mirroring M2.1's alarmCapableConnectionNames) and threads it in.
- Tests: TemplateEngine.Tests covers design-time warning / deploy-time error /
static-ok / exists-at-site / non-existent-connection. New
FlatteningPipelineConnectionBindingTests proves the deploy path enforces it.
Mark M2.7 + M2.8 completed in the plan task tracker.
Object/List parameters and return values were shape-validated only (object vs
array), with no field-level/nested type checks — type-wrong nested data passed
inbound validation and failed only at script runtime. Add recursive type
validation (declared Object field types, List element type, scalars at any depth)
with path-qualified errors, symmetric across ParameterValidator and ReturnValueValidator.
Both validators now parse the canonical JSON Schema definition format (the
Central UI / MigrateParametersToJsonSchema output) via a shared recursive engine,
Commons.Types.InboundApi.InboundApiSchema, instead of the legacy flat
[{name,type}] array which they could not even deserialize from migrated rows.
The legacy flat-array form is still accepted on read for transition safety.
Undeclared fields are rejected at every level (consistent with the existing
top-level unexpected-parameter rejection); a present-but-null value satisfies
any type, only absence of a required field is an error.
Bite-sized TDD plan. M1 (runtime wiring) fully detailed across 10 tasks
after verifying the purge/reconciliation actors already exist and only
need Host wiring + a gRPC pull client + event-logger injection. M2/M3/M4
as right-sized task inventories with files, classification, and AC.
Co-located .tasks.json for executing-plans resume.
Captures the 2026-06-05 coverage audit's gaps and the approved approach for
closing them: ephemeral CLI-provisioned fixtures with outcome-tolerant asserts
for the mutating suites (deploy lifecycle, retry/discard, transport import),
UI CRUD round-trips, nav render hardening, a Health KPI load test, and a
standardized skip-and-log policy. Next: writing-plans turns this into tasks.
Brainstormed design: generate 25 StyleGuide-conformant developer-reference
docs derived from src/ code (pilot AuditLog, then parallel fan-out, then
accuracy/conformance verification). Complements the requirements specs;
leaves src/, XML docs, and specs untouched.
Gitea renders mermaid inline, so the flow/state/hierarchy/DAG diagrams
move to text-in-markdown: auto-layout (removes the manual overlap-prone
draw.io step), diffable source, no committed binaries, and a dark-text
theme so labels stay legible. Keep draw.io PNGs only for the two complex
bespoke diagrams (logical architecture, env2 topology) where pixel
control still wins. All 24 mermaid blocks validated by rendering.
Replace ASCII-art diagrams across the README and docs/ with editable
.drawio sources plus exported PNGs, so the diagrams render clearly in
rendered markdown and can be maintained/regenerated instead of being
hand-edited as fragile text art. Non-diagram blocks (code, folder
trees, UI wireframes) were left as text.
Renames the 13 SCADALINK_* runtime env vars → SCADABRIDGE_*, the ScadaLink__
.NET config keys → ScadaBridge__, the stale ScadaLink.Host.exe assembly name
→ ZB.MOM.WW.ScadaBridge.Host.exe, the scadalink_app SQL login → scadabridge_app,
and residual identifiers/comments/docs. Migration records (prior rename
tooling/design, DB-rename helper, this scrub script) carved out.
Adds tools/scrub-scadalink-refs.sh.