Replace value?.ToString() with AttributeValueCodec.Encode(value) in
AttributeAccessor indexer set and SetAsync, so a List<string>{"a","b"}
encodes to ["a","b"] instead of the type-name string that
List<T>.ToString() returns.
Add using ZB.MOM.WW.ScadaBridge.Commons.Types. Tests verify the codec
contract (list→JSON array, scalar passthrough, null); full round-trip
through the accessor is not viable without a live Akka ActorSystem —
noted in-test with explanation.
Replace ValueFormatter.FormatDisplayValue with AttributeValueCodec.Encode
in StreamRelayActor so List<T> attribute values cross the gRPC wire as a
JSON array (e.g. ["a","b"]) rather than a comma-joined display string.
Scalars and null values are unaffected. Tests cover List→JSON, scalar
string pass-through, and null→empty-string.
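The codec contract described above (List→JSON array, scalar pass-through, null→empty-string) can be sketched as follows. This is a minimal Python illustration, not the real AttributeValueCodec; the function name encode and the compact JSON separators are assumptions chosen to match the ["a","b"] example.

```python
import json


def encode(value):
    """Hypothetical stand-in for AttributeValueCodec.Encode (assumption, not the real code)."""
    if value is None:
        # null -> empty string
        return ""
    if isinstance(value, (list, tuple)):
        # Lists become a compact JSON array, e.g. ["a","b"].
        return json.dumps(list(value), separators=(",", ":"))
    # Scalars pass through as their plain string form.
    return str(value)


print(encode(["a", "b"]))
print(encode("plain"))
```

The point of the sketch is that a list never reaches a generic to-string call, so the receiver can always parse the value back as a JSON array.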
- MV-2: guard unsupported element type before parse (no misleading re-wrap); add Float round-trip test
- MV-4: carry ElementDataType through the two validation-flatten ResolvedAttribute sites (ManagementActor.HandleValidateTemplate, BundleImporter.BuildFlattenedConfigForValidation) so MV-5 validation sees element type via every entry point
- MV-12: include ElementDataType in TemplateAttribute add/update audit payloads + fix stale docstring
Add DataType? ElementDataType to TemplateAttributeDto (optional, default null
for backward-compat with old bundles). Map it in both directions in
EntitySerializer (export + FromBundleContent) and in all three
TemplateAttribute construction sites in BundleImporter (BuildTemplate,
SyncTemplateAttributesAsync add-path, and SyncTemplateAttributesAsync
update-path including change-detection). Two new round-trip tests in
EntitySerializerTests confirm List attributes survive export→import and that
old DTOs with null ElementDataType import cleanly.
Add a first-class DataType.List + ElementDataType companion so object
attributes can store homogeneous scalar lists (e.g. MoveInWorkOrderNumbers,
MoveInPartNumbers) across all four lifecycle paths: script write/read,
static authored default, OPC UA array read, OPC UA array write.
Canonical JSON value codec; whole-list override; element type fixed by base;
idempotent migration widening Value to nvarchar(max) + adding ElementDataType.
Approved via brainstorming.
Inbound API now accepts the credential from either Authorization: Bearer sbk_... OR
X-API-Key: sbk_... (raw token), via the SAME peppered-HMAC verifier (Authorization
precedence preserved; failure path / scope checks unchanged). 16/16 inbound-auth tests pass.
- Add WhitespaceAuthorization_ValidXApiKey_Returns200: pins the IsNullOrWhiteSpace
fall-through — a present-but-blank Authorization header is treated as absent so a
valid X-API-Key still authenticates (200).
- Remove MissingBearer_Returns401 (added in 510559e): identical path to
NeitherHeader_Returns401 (no Authorization + no X-API-Key → 401); keep the
descriptively-named NeitherHeader variant.
- Change "legacy 'X-API-Key'" -> "alternate 'X-API-Key'" in EndpointExtensions.cs and
the BuildPostWithApiKeyHeader/HappyPath doc comments to avoid implying Bearer is
the older transport (Bearer was itself introduced by the prior auth re-arch).
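The header fall-through pinned by these tests can be sketched roughly as below. This is a Python illustration only: the function name extract_credential and the plain-dict header model are hypothetical, and the handling of a non-blank but malformed Authorization header (no fallback to X-API-Key) is an assumption the commits above do not state.

```python
def extract_credential(headers):
    """Return the raw token, or None when neither header carries one (hypothetical helper)."""
    auth = headers.get("Authorization")
    if auth is not None and auth.strip():
        # A present, non-blank Authorization header takes precedence.
        scheme, _, token = auth.strip().partition(" ")
        if scheme.lower() == "bearer" and token.strip():
            return token.strip()
        # Assumption: a malformed Authorization header does not fall back.
        return None
    # Authorization absent or blank: fall through to X-API-Key (raw token).
    api_key = headers.get("X-API-Key")
    if api_key is not None and api_key.strip():
        return api_key.strip()
    return None


print(extract_credential({"Authorization": "   ", "X-API-Key": "sbk_example"}))
```

Whichever header supplies the token, it would then go through the same verifier, which is why the failure path stays unchanged.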
Faithful port of OtOpcUa: when the flag is true, AutoLoginAuthenticationHandler runs
under the cookie scheme and signs in an all-roles, system-wide multi-role principal.
Logs a loud startup warning; no environment guard.
Full-solution build green; Security suite 136/136.
Adds a "Dev Disable-Login Flag" subsection to Component-Security.md covering
ScadaBridge:Security:Auth:DisableLogin / User, the AutoLoginAuthenticationHandler
mechanism, and the no-environment-guard / startup-warning production risk.
Ships DisableLogin: false under ScadaBridge → Security → Auth in:
- src/.../Host/appsettings.json (canonical default)
- docker/central-node-a/appsettings.Central.json
- docker/central-node-b/appsettings.Central.json
Also records DL-3 commit SHAs in the plan tasks file.
Faithful copy (warning only, no env guard); custom AuthenticationHandler under the
cookie scheme; reuses M2.19 SessionClaimBuilder for an all-roles system-wide principal.
20 tasks (M2.0-M2.19), each through its classification-driven review chain.
Full-solution build green (0 warnings, TreatWarningsAsErrors). Per-task targeted
suites all passed. Known pre-existing: 2 partition-purge E2E failures (follow-up #52).
- Add SecurityOptionsValidator (IValidateOptions<SecurityOptions>) enforcing
RoleRefreshThresholdMinutes < IdleTimeoutMinutes; registered with ValidateOnStart in
AddSecurity — startup FAILS if threshold >= idle, so the invariant cannot be silently
misconfigured away.
- Update SecurityOptions XML-docs: class-level summary distinguishes JWT Bearer path
(JwtSigningKey/JwtExpiryMinutes) from Blazor cookie session path (IdleTimeoutMinutes/
RoleRefreshThresholdMinutes); both time fields document the ~45-min effective idle window
and the new cross-field constraint.
- Remove dead jwtService variable from /auth/login lambda in AuthEndpoints.cs (resolved
but never used since login moved to SessionClaimBuilder).
- Extract ApplyValidationResultAsync helper from OnValidatePrincipalAsync (pure
decision-application step); add 3 adapter tests covering Reject → RejectPrincipal +
SignOutAsync; Replace → ReplacePrincipal + ShouldRenew; Keep → no-op.
- Fix inaccurate TryRefreshAsync comment (dropped "OR last-activity needs advancing" —
the code only returns non-null when roleRefreshDue).
- Add InternalsVisibleTo for Security.Tests in Security.csproj.
- Add IsRoleRefreshDue tests: missing claim → due; unparsable claim → due; plus integration
test covering the full ValidateAsync path for a principal missing zb:lastrolerefresh
(triggers refresh + re-stamps anchor rather than keeping stale principal forever).
- Add SecurityOptionsValidatorConfigGuardTests: default succeeds; equal fails; greater fails;
boundary (idle-1) succeeds; wiring confirmed via AddSecurity container.
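The cross-field rule enforced at startup is small enough to sketch. The Python below is an illustration of the invariant only (threshold strictly less than idle timeout); the function name and error text are hypothetical and do not mirror the real IValidateOptions implementation.

```python
def validate_security_options(role_refresh_threshold_minutes, idle_timeout_minutes):
    """Return a list of error messages; an empty list means the options are valid."""
    errors = []
    if role_refresh_threshold_minutes >= idle_timeout_minutes:
        # Equal is rejected too: the refresh must be able to fire before idle expiry.
        errors.append(
            "RoleRefreshThresholdMinutes must be less than IdleTimeoutMinutes"
        )
    return errors


print(validate_security_options(15, 30))  # defaults: valid
print(validate_security_options(30, 30))  # equal: invalid
```

The four cases in the config-guard tests (default, equal, greater, idle-1) map directly onto calls to this function.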
Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external
NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless
service-account group-search. A live LDAP group re-query for an active session therefore
requires a new lib method and is OUT OF SCOPE (cannot modify the external package).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):
- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a
zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too.
- SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the
refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims
with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin
CookieAuthenticationEvents.OnValidatePrincipal adapter:
* idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut;
consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value).
* role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option,
default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt
via the shared builder, the anchor advances, principal is replaced + cookie renewed.
Revoked DB mappings drop the user's roles mid-session.
* fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws)
— mirrors the documented "LDAP failure: active sessions continue with current roles".
- Documented residual limitation in Component-Security.md: central role-mapping/scope
changes apply within ~15 min without LDAP; live directory group-membership changes are
picked up only at next login (needs a passwordless group-search on the external
ZB.MOM.WW.Auth.Ldap lib — tracked follow-up).
Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests
— idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold
no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.
- MockSiteStreamGrpcClient.SubscribeCalls and UnsubscribedCorrelationIds
switched from bare List<T> to lock-guarded backing fields with snapshot
accessors, eliminating the actor-thread/test-thread data race (matches
the existing lock(events) pattern for ReceivedEvents)
- AttributeKey and AlarmKey null-guard each component with ?? string.Empty
so a null SourceReference/AlarmName/etc. cannot silently collide with an
empty-string component in the dedup dictionary
- On_Snapshot_Opens_GrpcStream renamed to
On_Snapshot_Does_Not_Open_Additional_GrpcStream; assertion updated to
confirm exactly one subscribe (the PreStart stream-first open) with no
second subscribe after snapshot delivery
- _stopped ordering in InstanceNotFound path moved after CleanupGrpc()
for consistency with DebugStreamTerminated and ReceiveTimeout handlers
Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no
attribute/alarm event occurring during the snapshot-build + network-transit
window is lost (#26).
Lifecycle change:
- PreStart now opens the gRPC subscription FIRST (alongside sending the
SubscribeDebugViewRequest), so live events start flowing immediately.
- Phase model via a single _snapshotDelivered flag (mutated only on the actor
thread). While buffering (snapshot not yet delivered), AttributeValueChanged/
AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of
being delivered. After snapshot+flush, the same handlers pass through directly.
- On DebugViewSnapshot: deliver snapshot, then flush the buffer in arrival order
with per-entity dedup, then set _snapshotDelivered=true (pass-through).
Dedup rule (exactly-once):
- Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName);
alarms by (InstanceUniqueName, AlarmName, SourceReference) so native
per-condition alarms are not conflated. Keys joined with a NUL delimiter
(declared as an escaped char constant; no raw NUL in source) so distinct
identities never collide on a space within a name.
- Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for
the same entity is already reflected -> DROP; strictly-newer (>) -> DELIVER;
entity absent from the snapshot -> DELIVER (genuine gap-window event).
Preserved paths:
- M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so
the not-found path now tears it down (CleanupGrpc) + clears the buffer, does
NOT enter pass-through, delivers the not-found snapshot, and stops cleanly.
- Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase
flag: a mid-session reconnect resumes pass-through; a reconnect during the
buffering phase stays buffering until the snapshot arrives.
- Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net
unchanged. Duplicate/late snapshot after delivery is ignored defensively.
Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup
drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound
teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the
M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor
class green: 23/23.
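The flush-with-dedup rule described above can be sketched in a few lines. The Python below is illustrative only: make_key and flush_buffer are hypothetical names, timestamps are modelled as plain integers, and events are simple tuples rather than the actor's real message types.

```python
NUL = "\x00"  # escaped constant; no raw NUL character appears in the source


def make_key(*parts):
    """Join identity components with NUL; a null component becomes an empty string."""
    return NUL.join(p if p is not None else "" for p in parts)


def flush_buffer(buffer, snapshot_timestamps):
    """Return the buffered events to deliver after the snapshot, in arrival order.

    buffer: list of (key, timestamp, payload) tuples.
    snapshot_timestamps: dict mapping key -> that entity's timestamp in the snapshot.
    """
    delivered = []
    for key, timestamp, payload in buffer:
        snapshot_ts = snapshot_timestamps.get(key)
        if snapshot_ts is not None and timestamp <= snapshot_ts:
            continue  # already reflected in the snapshot: drop
        delivered.append((key, timestamp, payload))  # strictly newer, or absent entity
    return delivered


pump = make_key("Site1.Pump", "Status", "Speed")
valve = make_key("Site1.Valve", "Status", "Open")
events = [(pump, 5, "old"), (pump, 11, "new"), (valve, 3, "gap")]
print(flush_buffer(events, {pump: 10}))
```

In this example the first pump event is dropped, the second is delivered because it is strictly newer than the snapshot, and the valve event is delivered because the valve does not appear in the snapshot at all.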
git blame shows commit 1d5465f3 deliberately added NotDeployed to CanDelete so an
undeployed instance can have its orphan record fully removed. Code + tests already
permit it; the spec matrix said 'No'. Per M2.17, reconcile doc→code (not the reverse):
matrix now reads 'Delete from Not deployed = Yes (removes the orphan record)' with a
note, and CanDelete carries a remark citing the rationale + origin commit.
AddSiteEventLogHealthMetricsBridge registers its hosted service via AddHostedService(factory-lambda),
which sets ImplementationFactory and leaves ImplementationType null. The prior
ImplementationType == guard was therefore silently dead: a second call would spin
up a second SiteEventLogFailureCountReporter. Fix: add a private
SiteEventLogHealthMetricsBridgeMarker singleton and guard on its ServiceType instead.
Also corrects the cycle-path comment in both ServiceCollectionExtensions.cs and
SiteEventLogFailureCountReporter.cs: StoreAndForward.csproj does reference
SiteEventLogging.csproj, so the transitive path HealthMonitoring → StoreAndForward →
SiteEventLogging is real, but adding a direct HealthMonitoring → SiteEventLogging
reference would NOT create a cycle (SiteEventLogging has no back-edge to HealthMonitoring).
The Func<long> seam is a coupling-avoidance measure, not a cycle-breaker.
Adds AddSiteEventLogHealthMetricsBridgeTests.AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService
as a regression test (builds provider and asserts exactly one reporter via GetServices<IHostedService>().OfType<T>()).
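Why the old guard never fired, and why a marker registration fixes it, can be shown with a toy model of a service collection. This Python sketch is an analogy only: the Descriptor class and both add_bridge_* functions are hypothetical and stand in for the .NET service descriptor shape described above.

```python
class Descriptor:
    """Toy service descriptor: a factory registration leaves implementation_type None."""

    def __init__(self, service_type, implementation_type=None, factory=None):
        self.service_type = service_type
        self.implementation_type = implementation_type
        self.factory = factory


def add_bridge_buggy(services):
    # Dead guard: the factory registration below never sets implementation_type.
    if any(d.implementation_type == "Reporter" for d in services):
        return
    services.append(Descriptor("IHostedService", factory=lambda: "Reporter"))


def add_bridge_fixed(services):
    # Guard on a marker's service type, which is always set.
    if any(d.service_type == "BridgeMarker" for d in services):
        return
    services.append(Descriptor("BridgeMarker", implementation_type="BridgeMarker"))
    services.append(Descriptor("IHostedService", factory=lambda: "Reporter"))


def hosted_count(services):
    return sum(1 for d in services if d.service_type == "IHostedService")


buggy, fixed = [], []
for _ in range(2):
    add_bridge_buggy(buggy)
    add_bridge_fixed(fixed)
print(hosted_count(buggy), hosted_count(fixed))
```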
- Remove redundant linked CancellationTokenSource in ProbeAsync; pass the
framework cancellationToken and ProbeTimeout directly to Ask (the two-CTS
pattern was redundant — Ask already honours both the timeout and the token).
- Add EchoActor XML <remarks> explaining why no Receive<Identify> handler is
needed (ActorBase answers Identify automatically).
- Add PreCancelledToken_ReportsUnhealthy_DoesNotThrow test: verifies the
never-throws guarantee on the shutdown-race path (token already cancelled
before CheckHealthAsync is invoked).
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).
Probe: for each required central singleton, Ask its local ClusterSingletonProxy
an Identify with a short bounded per-singleton timeout (~2s, probes run
concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the
timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves
ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up).
Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.
Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.
Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
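The probe shape (concurrent per-singleton asks, bounded timeout, never throws) can be sketched with asyncio. This is a Python analogy under stated assumptions: identify is a hypothetical async callable standing in for an Ask of Identify through a proxy, and the result tuple is an illustrative stand-in for a health check result.

```python
import asyncio


async def probe(name, identify, timeout):
    """Return (name, reachable). Any error or timeout counts as unreachable."""
    try:
        subject = await asyncio.wait_for(identify(name), timeout)
        return name, subject is not None  # null subject means unreachable
    except Exception:
        return name, False  # catch-all: the check itself never throws


async def check_ready(names, identify, timeout=2.0):
    # Probes run concurrently, so total latency is bounded by one timeout.
    results = await asyncio.gather(*(probe(n, identify, timeout) for n in names))
    unreachable = [name for name, ok in results if not ok]
    if unreachable:
        return "Unhealthy", unreachable  # name the unreachable singleton(s)
    return "Healthy", []


async def fake_identify(name):
    # Hypothetical stand-in: one singleton is missing, the rest answer.
    return None if name == "audit-log-purge" else object()


print(asyncio.run(check_ready(["notification-outbox", "audit-log-purge"], fake_identify)))
```

Because there are no retries, a singleton that is mid-handover simply shows up in the unreachable list for that one probe.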
OPC UA (RealOpcUaClient):
- Append 5 new SelectClauses at indices 13–17 (never renumber 0–12):
  - 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime
  - 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue
- New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo
priority order, InvariantCulture-formatted; empty string for non-limit alarm types.
- HandleAlarmEvent reads new indices with fields.Count > N guards; hard minimum (6)
unchanged so base ConditionType events still process without the limit fields.
- Document unavailable-by-protocol fields (Category, Description, OperatorUser,
CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent.
MxGateway (MxGatewayAlarmMapper):
- MapTransition: CurrentValue and LimitValue now populated via MxValueToString
(uses MxValueExtensions.ToClrValue + InvariantCulture) from OnAlarmTransitionEvent
proto fields current_value/limit_value.
- MapSnapshot: same — populated from ActiveAlarmSnapshot.current_value/limit_value.
- MxValueToString helper (internal): null-safe MxValue → string conversion.
Tests (17 new, 40 total pass):
- OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case.
- MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string
MxValue; absent fields yield empty strings.
- RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index
TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.
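The limit-selection and index-guard logic on the OPC UA side can be sketched as below. This is a Python illustration, not the real OpcUaAlarmMapper: pick_limit_value and field_at are hypothetical names, and Python's locale-independent str() stands in for InvariantCulture formatting.

```python
def field_at(fields, index):
    """Return fields[index], or None when the event carries fewer fields."""
    return fields[index] if len(fields) > index else None


def pick_limit_value(high_high, high, low, low_low):
    """First non-null limit in HiHi -> Hi -> Lo -> LoLo order; "" when none is set."""
    for candidate in (high_high, high, low, low_low):
        if candidate is not None:
            return str(candidate)  # str() does not depend on the process locale
    return ""  # non-limit alarm types carry no limit fields


# A base ConditionType event with only 13 fields still maps cleanly.
short_event = list(range(13))
limits = [field_at(short_event, i) for i in (14, 15, 16, 17)]
print(repr(pick_limit_value(*limits)))
print(pick_limit_value(None, 80.5, 10.0, None))
```

Appending the new clauses at indices 13 to 17 rather than renumbering is what lets the guard be a simple length check.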
Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a
faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task
firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful
"InstanceScript:{instanceName}" identifier (matching the site-event-log source convention).
Update the method doc-comment to state the best-effort contract explicitly. Pin the new
fallback value in the shape-precision test.
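The "log and swallow on fault" pattern for a fire-and-forget call can be sketched with asyncio as an analogy to the ContinueWith(OnlyOnFaulted) continuation. The helper name fire_and_forget, the logger interface, and the log message are all hypothetical; the sketch only shows the best-effort contract.

```python
import asyncio


def fire_and_forget(coro, logger):
    """Schedule coro without awaiting it; log and swallow any fault (best-effort)."""
    task = asyncio.get_running_loop().create_task(coro)

    def _observe(done):
        if done.cancelled():
            return
        exc = done.exception()  # retrieving the exception marks it as observed
        if exc is not None:
            logger.warning("site event emit failed: %s", exc)

    task.add_done_callback(_observe)
    return task


class ListLogger:
    """Minimal logger stand-in that records formatted warnings."""

    def __init__(self):
        self.messages = []

    def warning(self, fmt, *args):
        self.messages.append(fmt % args)


async def main():
    async def failing_emit():
        raise RuntimeError("logger unavailable")

    logger = ListLogger()
    task = fire_and_forget(failing_emit(), logger)
    await asyncio.wait([task])  # only so the demo can show the logged message
    await asyncio.sleep(0)
    return logger.messages


print(asyncio.run(main()))
```

The caller never sees the exception, so the existing log-and-throw behaviour at the recursion guard is unaffected by a broken event logger.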
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor
param, defaulted null, all existing callers source-compatible). Add a single
private EmitRecursionLimitEventAsync helper that fires-and-forgets a
"script"/Error site event; called at both recursion guard sites (CallScript
at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor
threads the already-resolved siteEventLogger singleton into the context;
AlarmExecutionActor leaves it null (no siteEventLogger wired there).
Existing _logger.LogError + throw behaviour unchanged.
Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and
CallShared (ISiteEventLogger.LogEventAsync called once with category "script",
severity "Error"; null logger path does not throw).
- Add DebugStreamBridgeActorTests: On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Does_Not_Open_Stream_And_Terminates — asserts _onEvent receives the not-found snapshot, SubscribeCalls remains empty, and the actor terminates cleanly via Watch/ExpectTerminated.
- Add comment in DebugStreamBridgeActor near Context.Stop(Self) explaining that the subsequent StopDebugStream Tell from DebugStreamService.StopStream produces a benign expected dead-letter.
- Reword not-found toast in DebugView.razor to "Instance not found on the selected site — check the deployment target." (accurate when the instance may be deployed to a different site).
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor
previously returned an empty DebugViewSnapshot for unknown instances,
indistinguishable from a deployed-but-empty instance. Callers had no way
to differentiate "not deployed here" from "deployed, no data yet."
Approach — additive field on existing message contract:
Added `bool InstanceNotFound = false` as an optional trailing parameter
to DebugViewSnapshot (Commons). All existing positional constructor calls
and serialized wire frames are unaffected (default = false). A dedicated
new message type was considered but rejected: the ClusterClient channel
and DebugStreamService TCS are already typed on DebugViewSnapshot, and a
second reply union would require wider changes for zero additive-safety
gain.
Changes:
- Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
- DeploymentManagerActor: set InstanceNotFound=true in both unknown-
instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
- DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to
_onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened
- DebugView.razor: check session.InitialSnapshot.InstanceNotFound after
connect and show a clear "not deployed on this site" error toast
- 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot,
unknown→subscribe, known-empty→InstanceNotFound stays false
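The additive-field approach can be illustrated with a small model. This Python sketch is an analogy only: the snapshot's other fields and the route_debug_snapshot function are hypothetical, chosen to show that existing constructor calls keep working and that "not deployed here" becomes distinguishable from "deployed, no data yet".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugViewSnapshot:
    instance_unique_name: str
    attribute_values: tuple = ()
    # Additive trailing field: existing callers that omit it get False.
    instance_not_found: bool = False


def route_debug_snapshot(deployed, instance_name):
    """deployed: dict mapping instance name -> tuple of current attribute values."""
    if instance_name not in deployed:
        # Unknown instance: still an empty snapshot, but flagged explicitly.
        return DebugViewSnapshot(instance_name, instance_not_found=True)
    return DebugViewSnapshot(instance_name, deployed[instance_name])


deployed = {"Pump01": ()}
print(route_debug_snapshot(deployed, "Pump01").instance_not_found)
print(route_debug_snapshot(deployed, "Pump99").instance_not_found)
```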