List values now encode as native-typed JSON ([10,20], [true,false], ISO dates;
strings stay quoted) via AttributeValueCodec; Decode reads both native and the
earlier array-of-strings form for every element type. Already-persisted old-form
data is normalized on the fly: idempotent central startup normalizer
(ListValueNormalizer), active site-SQLite normalization on InstanceActor
override-load, and normalize-on-import in the bundle importer. Instance-override
writes now stamp ElementDataType (#93/M3). Full solution 0/0; feature-targeted
tests green. Plan: docs/plans/2026-06-16-native-typed-json.md.
Set existingOverride.ElementDataType and newOverride.ElementDataType from
templateAttr.ElementDataType in both the update and create branches of
SetAttributeOverrideAsync, so the persisted InstanceAttributeOverride row
always carries the element type for later central normalizer use (#93/M3).
DisableLogin only swapped the cookie auth scheme (AutoLoginAuthenticationHandler),
which covers the interactive UI. The CLI authenticates POST /management, the audit
REST endpoints, and the SignalR debug-stream hub with HTTP Basic, and each ran its
own hardcoded Basic->LDAP check that ignored DisableLogin. In a login-disabled (e.g.
no-LDAP) deployment that locked the CLI out: every call returned 401 AUTH_FAILED.
Add ManagementAuthenticator, which centralizes the management/CLI auth flow:
when ScadaBridge:Security:Auth:DisableLogin is true it synthesizes the same dev
principal as AutoLoginAuthenticationHandler (configured user, all roles, system-wide)
and bypasses Basic->LDAP; otherwise the unchanged Basic->LDAP flow runs. Wired into
ManagementEndpoints (delegates), AuditEndpoints (delegates), and DebugStreamHub
(bypass branch). +6 unit tests; ManagementService.Tests green (140).
When overriding a List attribute, render the shared AttributeListEditor
(whole-list replacement; element type fixed by the base, shown read-only via
ShowElementType=false) instead of the single-line input. Loading an existing
override decodes its JSON into rows (malformed -> empty); saving encodes rows to
canonical JSON with a pre-submit Decode round-trip guard surfacing element
errors inline. Clearing removes the InstanceAttributeOverride row
(repository-direct, mirroring native-alarm-source overrides). Non-List override
UX unchanged.
Replace value?.ToString() with AttributeValueCodec.Encode(value) in
AttributeAccessor indexer set and SetAsync, so a List<string>{"a","b"}
encodes to ["a","b"] instead of the garbage ToString representation.
Add using ZB.MOM.WW.ScadaBridge.Commons.Types. Tests verify the codec
contract (list→JSON array, scalar passthrough, null); full round-trip
through the accessor is not viable without a live Akka ActorSystem —
noted in-test with explanation.
Replace ValueFormatter.FormatDisplayValue with AttributeValueCodec.Encode
in StreamRelayActor so List<T> attribute values cross the gRPC wire as a
JSON array (e.g. ["a","b"]) rather than a comma-joined display string.
Scalars and null values are unaffected. Tests cover List→JSON, scalar
string pass-through, and null→empty-string.
- MV-2: guard unsupported element type before parse (no misleading re-wrap); add Float round-trip test
- MV-4: carry ElementDataType through the two validation-flatten ResolvedAttribute sites (ManagementActor.HandleValidateTemplate, BundleImporter.BuildFlattenedConfigForValidation) so MV-5 validation sees element type via every entry point
- MV-12: include ElementDataType in TemplateAttribute add/update audit payloads + fix stale docstring
Add DataType? ElementDataType to TemplateAttributeDto (optional, default null
for backward-compat with old bundles). Map it in both directions in
EntitySerializer (export + FromBundleContent) and in all three
TemplateAttribute construction sites in BundleImporter (BuildTemplate,
SyncTemplateAttributesAsync add-path, and SyncTemplateAttributesAsync
update-path including change-detection). Two new round-trip tests in
EntitySerializerTests confirm List attributes survive export→import and that
old DTOs with null ElementDataType import cleanly.
- Add WhitespaceAuthorization_ValidXApiKey_Returns200: pins the IsNullOrWhiteSpace
fall-through — a present-but-blank Authorization header is treated as absent so a
valid X-API-Key still authenticates (200).
- Remove MissingBearer_Returns401 (added in 510559e): identical path to
NeitherHeader_Returns401 (no Authorization + no X-API-Key → 401); keep the
descriptively-named NeitherHeader variant.
- Change "legacy 'X-API-Key'" -> "alternate 'X-API-Key'" in EndpointExtensions.cs and
the BuildPostWithApiKeyHeader/HappyPath doc comments to avoid implying Bearer is
the older transport (Bearer was itself introduced by the prior auth re-arch).
- Add SecurityOptionsValidator (IValidateOptions<SecurityOptions>) enforcing
RoleRefreshThresholdMinutes < IdleTimeoutMinutes; registered with ValidateOnStart in
AddSecurity — startup FAILS if threshold >= idle, so the invariant cannot be silently
misconfigured away.
- Update SecurityOptions XML-docs: class-level summary distinguishes JWT Bearer path
(JwtSigningKey/JwtExpiryMinutes) from Blazor cookie session path (IdleTimeoutMinutes/
RoleRefreshThresholdMinutes); both time fields document the ~45-min effective idle window
and the new cross-field constraint.
- Remove dead jwtService variable from /auth/login lambda in AuthEndpoints.cs (resolved
but never used since login moved to SessionClaimBuilder).
- Extract ApplyValidationResultAsync helper from OnValidatePrincipalAsync (pure
decision-application step); add 3 adapter tests covering Reject → RejectPrincipal +
SignOutAsync; Replace → ReplacePrincipal + ShouldRenew; Keep → no-op.
- Fix inaccurate TryRefreshAsync comment (dropped "OR last-activity needs advancing" —
the code only returns non-null when roleRefreshDue).
- Add InternalsVisibleTo for Security.Tests in Security.csproj.
- Add IsRoleRefreshDue tests: missing claim → due; unparsable claim → due; plus integration
test covering the full ValidateAsync path for a principal missing zb:lastrolerefresh
(triggers refresh + re-stamps anchor rather than keeping stale principal forever).
- Add SecurityOptionsValidatorConfigGuardTests: default succeeds; equal fails; greater fails;
boundary (idle-1) succeeds; wiring confirmed via AddSecurity container.
Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external
NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless
service-account group-search. A live LDAP group re-query for an active session therefore
requires a new lib method and is OUT OF SCOPE (cannot modify the external package).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):
- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a
zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too.
- SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the
refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims
with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin
CookieAuthenticationEvents.OnValidatePrincipal adapter:
* idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut;
consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value).
* role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option,
default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt
via the shared builder, the anchor advances, principal is replaced + cookie renewed.
Revoked DB mappings drop the user's roles mid-session.
* fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws)
— mirrors the documented "LDAP failure: active sessions continue with current roles".
- Documented residual limitation in Component-Security.md: central role-mapping/scope
changes apply within ~15 min without LDAP; live directory group-membership changes are
picked up only at next login (needs a passwordless group-search on the external
ZB.MOM.WW.Auth.Ldap lib — tracked follow-up).
Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests
— idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold
no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.
- MockSiteStreamGrpcClient.SubscribeCalls and UnsubscribedCorrelationIds
switched from bare List<T> to lock-guarded backing fields with snapshot
accessors, eliminating the actor-thread/test-thread data race (matches
the existing lock(events) pattern for ReceivedEvents)
- AttributeKey and AlarmKey null-guard each component with ?? string.Empty
so a null SourceReference/AlarmName/etc. cannot silently collide with an
empty-string component in the dedup dictionary
- On_Snapshot_Opens_GrpcStream renamed to
On_Snapshot_Does_Not_Open_Additional_GrpcStream; assertion updated to
confirm exactly one subscribe (the PreStart stream-first open) with no
second subscribe after snapshot delivery
- _stopped ordering in InstanceNotFound path moved after CleanupGrpc()
for consistency with DebugStreamTerminated and ReceiveTimeout handlers
Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no
attribute/alarm event occurring during the snapshot-build + network-transit
window is lost (#26).
Lifecycle change:
- PreStart now opens the gRPC subscription FIRST (alongside sending the
SubscribeDebugViewRequest), so live events start flowing immediately.
- Phase model via a single _snapshotDelivered flag (mutated only on the actor
thread). While buffering (snapshot not yet delivered), AttributeValueChanged/
AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of
being delivered. After snapshot+flush, the same handlers pass through directly.
- On DebugViewSnapshot: deliver snapshot, then flush the buffer in arrival order
with per-entity dedup, then set _snapshotDelivered=true (pass-through).
Dedup rule (exactly-once):
- Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName);
alarms by (InstanceUniqueName, AlarmName, SourceReference) so native
per-condition alarms are not conflated. Keys joined with a NUL delimiter
(declared as an escaped char constant; no raw NUL in source) so distinct
identities never collide on a space within a name.
- Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for
the same entity is already reflected -> DROP; strictly-newer (>) -> DELIVER;
entity absent from the snapshot -> DELIVER (genuine gap-window event).
Preserved paths:
- M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so
the not-found path now tears it down (CleanupGrpc) + clears the buffer, does
NOT enter pass-through, delivers the not-found snapshot, and stops cleanly.
- Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase
flag: a mid-session reconnect resumes pass-through; a reconnect during the
buffering phase stays buffering until the snapshot arrives.
- Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net
unchanged. Duplicate/late snapshot after delivery is ignored defensively.
Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup
drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound
teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the
M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor
class green: 23/23.
AddSiteEventLogHealthMetricsBridge registered via AddHostedService(factory-lambda),
which sets ImplementationFactory and leaves ImplementationType null. The prior
ImplementationType == guard was therefore silently dead — a second call would spin
up a second SiteEventLogFailureCountReporter. Fix: add a private
SiteEventLogHealthMetricsBridgeMarker singleton and guard on its ServiceType instead.
Also corrects the cycle-path comment in both ServiceCollectionExtensions.cs and
SiteEventLogFailureCountReporter.cs: StoreAndForward.csproj does reference
SiteEventLogging.csproj, so the transitive path HealthMonitoring → StoreAndForward →
SiteEventLogging is real, but adding a direct HealthMonitoring → SiteEventLogging
reference would NOT create a cycle (SiteEventLogging has no back-edge to HealthMonitoring).
The Func<long> seam is a coupling-avoidance measure, not a cycle-breaker.
Adds AddSiteEventLogHealthMetricsBridgeTests.AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService
as a regression test (builds provider and asserts exactly one reporter via GetServices<IHostedService>().OfType<T>()).
- Remove redundant linked CancellationTokenSource in ProbeAsync; pass the
framework cancellationToken and ProbeTimeout directly to Ask (the two-CTS
pattern was redundant — Ask already honours both the timeout and the token).
- Add EchoActor XML <remarks> explaining why no Receive<Identify> handler is
needed (ActorBase answers Identify automatically).
- Add PreCancelledToken_ReportsUnhealthy_DoesNotThrow test: verifies the
never-throws guarantee on the shutdown-race path (token already cancelled
before CheckHealthAsync is invoked).
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).
Probe: for each required central singleton, Ask its local ClusterSingletonProxy
an Identify with a short bounded per-singleton timeout (~2s, probes run
concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the
timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves
ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up).
Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.
Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.
Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
OPC UA (RealOpcUaClient):
- Append 5 new SelectClauses at indices 13–17 (never renumber 0–12):
- 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime
- 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue
- New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo
priority order, InvariantCulture-formatted; empty string for non-limit alarm types.
- HandleAlarmEvent reads new indices with fields.Count > N guards; hard minimum (6)
unchanged so base ConditionType events still process without the limit fields.
- Document unavailable-by-protocol fields (Category, Description, OperatorUser,
CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent.
MxGateway (MxGatewayAlarmMapper):
- MapTransition: CurrentValue and LimitValue now populated via MxValueToString
(uses MxValueExtensions.ToClrValue + InvariantCulture) from OnAlarmTransitionEvent
proto fields current_value/limit_value.
- MapSnapshot: same — populated from ActiveAlarmSnapshot.current_value/limit_value.
- MxValueToString helper (internal): null-safe MxValue → string conversion.
Tests (17 new, 40 total pass):
- OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case.
- MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string
MxValue; absent fields yield empty strings.
- RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index
TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.
Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a
faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task
firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful
"InstanceScript:{instanceName}" identifier (matching the site-event-log source convention).
Update the method doc-comment to state the best-effort contract explicitly. Pin the new
fallback value in the shape-precision test.
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor
param, defaulted null, all existing callers source-compatible). Add a single
private EmitRecursionLimitEventAsync helper that fires-and-forgets a
"script"/Error site event; called at both recursion guard sites (CallScript
at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor
threads the already-resolved siteEventLogger singleton into the context;
AlarmExecutionActor leaves it null (no siteEventLogger wired there).
Existing _logger.LogError + throw behaviour unchanged.
Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and
CallShared (ISiteEventLogger.LogEventAsync called once with category "script",
severity "Error"; null logger path does not throw).
- Add DebugStreamBridgeActorTests: On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Does_Not_Open_Stream_And_Terminates — asserts _onEvent receives the not-found snapshot, SubscribeCalls remains empty, and the actor terminates cleanly via Watch/ExpectTerminated.
- Add comment in DebugStreamBridgeActor near Context.Stop(Self) explaining that the subsequent StopDebugStream Tell from DebugStreamService.StopStream produces a benign expected dead-letter.
- Reword not-found toast in DebugView.razor to "Instance not found on the selected site — check the deployment target." (accurate when the instance may be deployed to a different site).
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor
previously returned an empty DebugViewSnapshot for unknown instances,
indistinguishable from a deployed-but-empty instance. Callers had no way
to differentiate "not deployed here" from "deployed, no data yet."
Approach — additive field on existing message contract:
Added `bool InstanceNotFound = false` as an optional trailing parameter
to DebugViewSnapshot (Commons). All existing positional constructor calls
and serialized wire frames are unaffected (default = false). A dedicated
new message type was considered but rejected: the ClusterClient channel
and DebugStreamService TCS are already typed on DebugViewSnapshot, and a
second reply union would require wider changes for zero additive-safety
gain.
Changes:
- Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
- DeploymentManagerActor: set InstanceNotFound=true in both unknown-
instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
- DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to
_onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened
- DebugView.razor: check session.InitialSnapshot.InstanceNotFound after
connect and show a clear "not deployed on this site" error toast
- 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot,
unknown→subscribe, known-empty→InstanceNotFound stays false
Extends ContainsAuditLogMutation regex to match T-SQL bracketed forms
([AuditLog], [dbo].[AuditLog]) that SSMS-generated SQL produces; the
prior optional-schema pattern only matched bare/dbo-prefixed names,
silently missing these real violation forms.
Changes:
- Schema sub-pattern (?:dbo\.)? → (?:\[?dbo\]?\.)? (matches dbo. and [dbo].)
- Table sub-pattern AuditLog\b → \[?AuditLog\]?\b (matches AuditLog and [AuditLog])
- Pattern compiled as static readonly Regex field for clarity/performance
- Adds 4 new planted-positive cases: UPDATE [dbo].[AuditLog], UPDATE [AuditLog],
DELETE FROM [dbo].[AuditLog], DELETE FROM [AuditLog]
- Retains all existing negatives; adds DELETE FROM [dbo].[Notifications] negative
- Fixes misleading "reverse order" comment on the comment-prefix positive case
- Documents scan limitations (EF Core bulk methods; multi-line DML) in class XML doc
Adds AuditLogAppendOnlyGuardTests.cs to
tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/ — a code-level backstop
for the DB-role DENY UPDATE / DENY DELETE control established in migration
20260602174346_CollapseAuditLogToCanonical.
The guard scans every non-Designer, non-Snapshot *.cs file in the
ConfigurationDatabase source tree and fails the test run if any line matches the
DML-syntax pattern:
UPDATE\s+(?:dbo\.)?AuditLog\b
DELETE\s+(?:FROM\s+)?(?:dbo\.)?AuditLog\b
The tight DML-syntax pattern naturally excludes false positives without extra
exclusion checks: DENY UPDATE ON dbo.AuditLog is not matched (UPDATE is followed
by ON, not the table name); ALTER TABLE … SWITCH and TRUNCATE contain no UPDATE/
DELETE keyword; comments with UPDATE/AuditLog in separate clauses are not matched.
Self-verifying unit tests (ContainsAuditLogMutation_*) prove the helper:
- returns false on clean-source lines (INSERT, SELECT, DENY DDL, ALTER SWITCH,
TRUNCATE, DELETE FROM Notifications);
- returns TRUE on planted violations (UPDATE AuditLog SET …, DELETE FROM
dbo.AuditLog WHERE …, lower-case variants);
- returns false on the exact DENY/GRANT/partition-switch strings from the
production migration files.
All 256 ConfigurationDatabase.Tests pass; solution builds 0 W / 0 E.
REQ-HOST-3/REQ-HOST-4 require a MachineDataDb connection string for Central nodes.
The shipped docker appsettings (docker/central-node-a/appsettings.Central.json and
central-node-b) already carry the key. Host-008 had removed the fail-fast Require
because MachineDataDb had no consumer yet; this commit reverses that decision so a
misconfigured or missing connection string is caught at startup with a clear error.
Changes:
- DatabaseOptions: add MachineDataDb property with XML doc comment
- StartupValidator: add .Require for ScadaBridge:Database:MachineDataDb inside the
existing Central .When block, immediately after the ConfigurationDb Require
- StartupValidatorTests: rename Central_MissingMachineDataDb_PassesValidation ->
FailsValidation and flip to Assert.Throws; update comment to cite REQ-HOST-3/4,
shipped docker appsettings, and the Host-008 reversal; add MachineDataDb to
ValidCentralConfig() so all other Central tests remain green
- CentralDbTestEnvironment: supply ScadaBridge__Database__MachineDataDb env var
(mirrors ConfigurationDb pattern) so HostStartupTests, HealthCheckTests, and
MetricsEndpointTests pass through the new Require
- CompositionRootTests, AkkaHostedServiceAuditWiringTests, ActorPathTests: set
ScadaBridge__Database__MachineDataDb env var alongside the pepper env var and
clear it in Dispose, matching the existing pepper handling pattern
Build: 0 warnings, 0 errors. dotnet test Host.Tests: 233/233 passed.