Faithful copy (warning only, no env guard); custom AuthenticationHandler under the
cookie scheme; reuses M2.19 SessionClaimBuilder for an all-roles system-wide principal.
20 tasks (M2.0-M2.19), each through its classification-driven review chain.
Full-solution build green (0 warnings, TreatWarningsAsErrors). Per-task targeted
suites all passed. Known pre-existing: 2 partition-purge E2E failures (follow-up #52).
- Add SecurityOptionsValidator (IValidateOptions<SecurityOptions>) enforcing
RoleRefreshThresholdMinutes < IdleTimeoutMinutes; registered with ValidateOnStart in
AddSecurity — startup FAILS if threshold >= idle, so the invariant cannot be silently
misconfigured away.
- Update SecurityOptions XML-docs: class-level summary distinguishes JWT Bearer path
(JwtSigningKey/JwtExpiryMinutes) from Blazor cookie session path (IdleTimeoutMinutes/
RoleRefreshThresholdMinutes); both time fields document the ~45-min effective idle window
and the new cross-field constraint.
- Remove dead jwtService variable from /auth/login lambda in AuthEndpoints.cs (resolved
but never used since login moved to SessionClaimBuilder).
- Extract ApplyValidationResultAsync helper from OnValidatePrincipalAsync (pure
decision-application step); add 3 adapter tests covering Reject → RejectPrincipal +
SignOutAsync; Replace → ReplacePrincipal + ShouldRenew; Keep → no-op.
- Fix inaccurate TryRefreshAsync comment (dropped "OR last-activity needs advancing" —
the code only returns non-null when roleRefreshDue).
- Add InternalsVisibleTo for Security.Tests in Security.csproj.
- Add IsRoleRefreshDue tests: missing claim → due; unparsable claim → due; plus integration
test covering the full ValidateAsync path for a principal missing zb:lastrolerefresh
(triggers refresh + re-stamps anchor rather than keeping stale principal forever).
- Add SecurityOptionsValidatorConfigGuardTests: default succeeds; equal fails; greater fails;
boundary (idle-1) succeeds; wiring confirmed via AddSecurity container.
Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external
NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless
service-account group-search. A live LDAP group re-query for an active session therefore
requires a new lib method and is OUT OF SCOPE (cannot modify the external package).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):
- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a
zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too.
- SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the
refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims
with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin
CookieAuthenticationEvents.OnValidatePrincipal adapter:
* idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut;
consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value).
* role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option,
default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt
via the shared builder, the anchor advances, principal is replaced + cookie renewed.
Revoked DB mappings drop the user's roles mid-session.
* fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws)
— mirrors the documented "LDAP failure: active sessions continue with current roles".
- Documented residual limitation in Component-Security.md: central role-mapping/scope
changes apply within ~15 min without LDAP; live directory group-membership changes are
picked up only at next login (needs a passwordless group-search on the external
ZB.MOM.WW.Auth.Ldap lib — tracked follow-up).
Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests
— idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold
no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.
- MockSiteStreamGrpcClient.SubscribeCalls and UnsubscribedCorrelationIds
switched from bare List<T> to lock-guarded backing fields with snapshot
accessors, eliminating the actor-thread/test-thread data race (matches
the existing lock(events) pattern for ReceivedEvents)
- AttributeKey and AlarmKey null-guard each component with ?? string.Empty
so a null SourceReference/AlarmName/etc. cannot silently collide with an
empty-string component in the dedup dictionary
- On_Snapshot_Opens_GrpcStream renamed to
On_Snapshot_Does_Not_Open_Additional_GrpcStream; assertion updated to
confirm exactly one subscribe (the PreStart stream-first open) with no
second subscribe after snapshot delivery
- _stopped ordering in InstanceNotFound path moved after CleanupGrpc()
for consistency with DebugStreamTerminated and ReceiveTimeout handlers
Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no
attribute/alarm event occurring during the snapshot-build + network-transit
window is lost (#26).
Lifecycle change:
- PreStart now opens the gRPC subscription FIRST (alongside sending the
SubscribeDebugViewRequest), so live events start flowing immediately.
- Phase model via a single _snapshotDelivered flag (mutated only on the actor
thread). While buffering (snapshot not yet delivered), AttributeValueChanged/
AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of
being delivered. After snapshot+flush, the same handlers pass through directly.
- On DebugViewSnapshot: deliver snapshot, then flush the buffer in arrival order
with per-entity dedup, then set _snapshotDelivered=true (pass-through).
Dedup rule (exactly-once):
- Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName);
alarms by (InstanceUniqueName, AlarmName, SourceReference) so native
per-condition alarms are not conflated. Keys joined with a NUL delimiter
(declared as an escaped char constant; no raw NUL in source) so distinct
identities never collide on a space within a name.
- Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for
the same entity is already reflected -> DROP; strictly-newer (>) -> DELIVER;
entity absent from the snapshot -> DELIVER (genuine gap-window event).
Preserved paths:
- M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so
the not-found path now tears it down (CleanupGrpc) + clears the buffer, does
NOT enter pass-through, delivers the not-found snapshot, and stops cleanly.
- Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase
flag: a mid-session reconnect resumes pass-through; a reconnect during the
buffering phase stays buffering until the snapshot arrives.
- Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net
unchanged. Duplicate/late snapshot after delivery is ignored defensively.
Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup
drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound
teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the
M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor
class green: 23/23.
git blame shows commit 1d5465f3 deliberately added NotDeployed to CanDelete so an
undeployed instance can have its orphan record fully removed. Code + tests already
permit it; the spec matrix said 'No'. Per M2.17, reconcile doc→code (not the reverse):
matrix now reads 'Delete from Not deployed = Yes (removes the orphan record)' with a
note, and CanDelete carries a remark citing the rationale + origin commit.
AddSiteEventLogHealthMetricsBridge registered via AddHostedService(factory-lambda),
which sets ImplementationFactory and leaves ImplementationType null. The prior
ImplementationType == guard was therefore silently dead — a second call would spin
up a second SiteEventLogFailureCountReporter. Fix: add a private
SiteEventLogHealthMetricsBridgeMarker singleton and guard on its ServiceType instead.
Also corrects the cycle-path comment in both ServiceCollectionExtensions.cs and
SiteEventLogFailureCountReporter.cs: StoreAndForward.csproj does reference
SiteEventLogging.csproj, so the transitive path HealthMonitoring → StoreAndForward →
SiteEventLogging is real, but adding a direct HealthMonitoring → SiteEventLogging
reference would NOT create a cycle (SiteEventLogging has no back-edge to HealthMonitoring).
The Func<long> seam is a coupling-avoidance measure, not a cycle-breaker.
Adds AddSiteEventLogHealthMetricsBridgeTests.AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService
as a regression test (builds provider and asserts exactly one reporter via GetServices<IHostedService>().OfType<T>()).
- Remove redundant linked CancellationTokenSource in ProbeAsync; pass the
framework cancellationToken and ProbeTimeout directly to Ask (the two-CTS
pattern was redundant — Ask already honours both the timeout and the token).
- Add EchoActor XML <remarks> explaining why no Receive<Identify> handler is
needed (ActorBase answers Identify automatically).
- Add PreCancelledToken_ReportsUnhealthy_DoesNotThrow test: verifies the
never-throws guarantee on the shutdown-race path (token already cancelled
before CheckHealthAsync is invoked).
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).
Probe: for each required central singleton, Ask its local ClusterSingletonProxy
an Identify with a short bounded per-singleton timeout (~2s, probes run
concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the
timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves
ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up).
Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.
Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.
Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
OPC UA (RealOpcUaClient):
- Append 5 new SelectClauses at indices 13–17 (never renumber 0–12):
- 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime
- 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue
- New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo
priority order, InvariantCulture-formatted; empty string for non-limit alarm types.
- HandleAlarmEvent reads new indices with fields.Count > N guards; hard minimum (6)
unchanged so base ConditionType events still process without the limit fields.
- Document unavailable-by-protocol fields (Category, Description, OperatorUser,
CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent.
MxGateway (MxGatewayAlarmMapper):
- MapTransition: CurrentValue and LimitValue now populated via MxValueToString
(uses MxValueExtensions.ToClrValue + InvariantCulture) from OnAlarmTransitionEvent
proto fields current_value/limit_value.
- MapSnapshot: same — populated from ActiveAlarmSnapshot.current_value/limit_value.
- MxValueToString helper (internal): null-safe MxValue → string conversion.
Tests (17 new, 40 total pass):
- OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case.
- MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string
MxValue; absent fields yield empty strings.
- RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index
TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.
Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a
faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task
firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful
"InstanceScript:{instanceName}" identifier (matching the site-event-log source convention).
Update the method doc-comment to state the best-effort contract explicitly. Pin the new
fallback value in the shape-precision test.
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor
param, defaulted null, all existing callers source-compatible). Add a single
private EmitRecursionLimitEventAsync helper that fires-and-forgets a
"script"/Error site event; called at both recursion guard sites (CallScript
at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor
threads the already-resolved siteEventLogger singleton into the context;
AlarmExecutionActor leaves it null (no siteEventLogger wired there).
Existing _logger.LogError + throw behaviour unchanged.
Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and
CallShared (ISiteEventLogger.LogEventAsync called once with category "script",
severity "Error"; null logger path does not throw).
- Add DebugStreamBridgeActorTests: On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Does_Not_Open_Stream_And_Terminates — asserts _onEvent receives the not-found snapshot, SubscribeCalls remains empty, and the actor terminates cleanly via Watch/ExpectTerminated.
- Add comment in DebugStreamBridgeActor near Context.Stop(Self) explaining that the subsequent StopDebugStream Tell from DebugStreamService.StopStream produces a benign expected dead-letter.
- Reword not-found toast in DebugView.razor to "Instance not found on the selected site — check the deployment target." (accurate when the instance may be deployed to a different site).
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor
previously returned an empty DebugViewSnapshot for unknown instances,
indistinguishable from a deployed-but-empty instance. Callers had no way
to differentiate "not deployed here" from "deployed, no data yet."
Approach — additive field on existing message contract:
Added `bool InstanceNotFound = false` as an optional trailing parameter
to DebugViewSnapshot (Commons). All existing positional constructor calls
and serialized wire frames are unaffected (default = false). A dedicated
new message type was considered but rejected: the ClusterClient channel
and DebugStreamService TCS are already typed on DebugViewSnapshot, and a
second reply union would require wider changes for zero additive-safety
gain.
Changes:
- Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
- DeploymentManagerActor: set InstanceNotFound=true in both unknown-
instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
- DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to
_onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened
- DebugView.razor: check session.InitialSnapshot.InstanceNotFound after
connect and show a clear "not deployed on this site" error toast
- 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot,
unknown→subscribe, known-empty→InstanceNotFound stays false
Extends ContainsAuditLogMutation regex to match T-SQL bracketed forms
([AuditLog], [dbo].[AuditLog]) that SSMS-generated SQL produces; the
prior optional-schema pattern only matched bare/dbo-prefixed names,
silently missing these real violation forms.
Changes:
- Schema sub-pattern (?:dbo\.)? → (?:\[?dbo\]?\.)? (matches dbo. and [dbo].)
- Table sub-pattern AuditLog\b → \[?AuditLog\]?\b (matches AuditLog and [AuditLog])
- Pattern compiled as static readonly Regex field for clarity/performance
- Adds 4 new planted-positive cases: UPDATE [dbo].[AuditLog], UPDATE [AuditLog],
DELETE FROM [dbo].[AuditLog], DELETE FROM [AuditLog]
- Retains all existing negatives; adds DELETE FROM [dbo].[Notifications] negative
- Fixes misleading "reverse order" comment on the comment-prefix positive case
- Documents scan limitations (EF Core bulk methods; multi-line DML) in class XML doc
Adds AuditLogAppendOnlyGuardTests.cs to
tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/ — a code-level backstop
for the DB-role DENY UPDATE / DENY DELETE control established in migration
20260602174346_CollapseAuditLogToCanonical.
The guard scans every non-Designer, non-Snapshot *.cs file in the
ConfigurationDatabase source tree and fails the test run if any line matches the
DML-syntax pattern:
UPDATE\s+(?:dbo\.)?AuditLog\b
DELETE\s+(?:FROM\s+)?(?:dbo\.)?AuditLog\b
The tight DML-syntax pattern naturally excludes false positives without extra
exclusion checks: DENY UPDATE ON dbo.AuditLog is not matched (UPDATE is followed
by ON, not the table name); ALTER TABLE … SWITCH and TRUNCATE contain no UPDATE/
DELETE keyword; comments with UPDATE/AuditLog in separate clauses are not matched.
Self-verifying unit tests (ContainsAuditLogMutation_*) prove the helper:
- returns false on clean-source lines (INSERT, SELECT, DENY DDL, ALTER SWITCH,
TRUNCATE, DELETE FROM Notifications);
- returns TRUE on planted violations (UPDATE AuditLog SET …, DELETE FROM
dbo.AuditLog WHERE …, lower-case variants);
- returns false on the exact DENY/GRANT/partition-switch strings from the
production migration files.
All 256 ConfigurationDatabase.Tests pass; solution builds 0 W / 0 E.
REQ-HOST-3/REQ-HOST-4 require a MachineDataDb connection string for Central nodes.
The shipped docker appsettings (docker/central-node-a/appsettings.Central.json and
central-node-b) already carry the key. Host-008 had removed the fail-fast Require
because MachineDataDb had no consumer yet; this commit reverses that decision so a
misconfigured or missing connection string is caught at startup with a clear error.
Changes:
- DatabaseOptions: add MachineDataDb property with XML doc comment
- StartupValidator: add .Require for ScadaBridge:Database:MachineDataDb inside the
existing Central .When block, immediately after the ConfigurationDb Require
- StartupValidatorTests: rename Central_MissingMachineDataDb_PassesValidation ->
FailsValidation and flip to Assert.Throws; update comment to cite REQ-HOST-3/4,
shipped docker appsettings, and the Host-008 reversal; add MachineDataDb to
ValidCentralConfig() so all other Central tests remain green
- CentralDbTestEnvironment: supply ScadaBridge__Database__MachineDataDb env var
(mirrors ConfigurationDb pattern) so HostStartupTests, HealthCheckTests, and
MetricsEndpointTests pass through the new Require
- CompositionRootTests, AkkaHostedServiceAuditWiringTests, ActorPathTests: set
ScadaBridge__Database__MachineDataDb env var alongside the pepper env var and
clear it in Dispose, matching the existing pepper handling pattern
Build: 0 warnings, 0 errors. dotnet test Host.Tests: 233/233 passed.
Add code comments in ValidateConnectionBindingCompleteness explaining
that the unbound-attribute branch also covers the silently-dropped
stale-binding case (cross-reference FlatteningService.ApplyConnectionBindings),
and that the `continue` skips the exists-at-site check for unbound attrs.
Add two new tests:
- FlatteningPipelineConnectionBindingTests: stale DataConnectionId (999)
not present in site connections → flattener drops it silently →
validator reports ConnectionBinding Error, IsValid false.
- ValidationServiceTests: enforce:true + siteConnectionNames:null on a
properly-bound attribute → no ConnectionBinding error (exists-at-site
check stays inert when site set is not supplied).
Pre-deployment validation only WARNED when a data-sourced attribute had no
connection binding, so an instance with unresolved bindings still passed IsValid
and could deploy. There was also no check that a binding resolves to a connection
that actually exists at the target site.
- ValidationService.Validate gains an opt-in `enforceConnectionBindings` flag
(default false) plus a `siteConnectionNames` set. Default-false keeps the
template DESIGN-TIME path (ManagementActor.HandleValidateTemplate) non-blocking,
since bindings are legitimately set later at instance/deploy time. The DEPLOY
path (FlatteningPipeline) opts in (true) so:
* a data-sourced attribute with no binding is now a deploy-gating Error;
* a binding to a connection that does not exist on the target site is an Error.
Static (non-data-sourced) attributes are never flagged.
- FlatteningPipeline computes the site-connection-names set from the loaded site
data connections (mirroring M2.1's alarmCapableConnectionNames) and threads it in.
- Tests: TemplateEngine.Tests covers design-time warning / deploy-time error /
static-ok / exists-at-site / non-existent-connection. New
FlatteningPipelineConnectionBindingTests proves the deploy path enforces it.
Mark M2.7 + M2.8 completed in the plan task tracker.
SplitCallArguments now skips C# line (`//`) and block (`/* */`) comments when
tokenizing the argument list, so a comma inside a comment no longer produces a
spurious arg-count mismatch. IsNumericLiteral now explicitly rejects tokens
whose first non-sign character is `_` or a letter (e.g. `_2`), and restricts
underscore digit-separators to positions after at least one digit, preventing
identifier-shaped tokens from being inferred as Integer/Float.
#20 return-type: when a CallScript/CallShared result is assigned directly into
a typed local declaration (optionally awaited, optionally via an Instance./
Scripts./Parent./Children["x"]. receiver), compare the LHS declared type
against the target script's declared ReturnDefinition and flag clear
cross-category mismatches (ReturnTypeMismatch). Previously BuildReturnMap was
built but never read.
#21 argument-type: positional call arguments are now split (paren/brace/bracket
+ string-literal aware) and each literal-inferable argument is checked against
the target's declared parameter type (ParameterMismatch), not just the count.
Conservative — only CLEAR primitive mismatches (String/Integer/Float/Boolean)
are flagged; Integer<->Float widening is tolerated. Unknown/Object/List
declarations, var/untyped/unused/expression-embedded assignments, and
non-literal arguments (variables, member access, method/await chains, casts,
object/array initializers, compound or concatenated expressions, interpolated
strings) are never flagged. Inference limits documented in code.
Adds 16 SemanticValidatorTests covering mismatch detection, correct-call pass,
and the dynamic/unknown no-false-positive cases.
Object/List parameters and return values were shape-validated only (object vs
array), with no field-level/nested type checks — type-wrong nested data passed
inbound validation and failed only at script runtime. Add recursive type
validation (declared Object field types, List element type, scalars at any depth)
with path-qualified errors, symmetric across ParameterValidator and ReturnValueValidator.
Both validators now parse the canonical JSON Schema definition format (the
Central UI / MigrateParametersToJsonSchema output) via a shared recursive engine,
Commons.Types.InboundApi.InboundApiSchema, instead of the legacy flat
[{name,type}] array which they could not even deserialize from migrated rows.
The legacy flat-array form is still accepted on read for transition safety.
Undeclared fields are rejected at every level (consistent with the existing
top-level unexpected-parameter rejection); a present-but-null value satisfies
any type, only absence of a required field is an error.
The UI script editor has no ExecutionTimeoutSeconds control (authoring deferred),
so a body edit silently cleared a timeout set via Transport import. Round-trip the
loaded value so UI edits preserve it. Add the missing AlarmExecutionActor null/<=0
fallback tests for symmetry with ScriptExecutionActor.
Spec promised a per-script timeout but only the global ScriptExecutionTimeoutSeconds
existed. Add nullable TemplateScript.ExecutionTimeoutSeconds threaded through EF +
flattening (ResolvedScript) to ScriptExecutionActor/AlarmExecutionActor, which use
perScript ?? global for the execution CTS. Includes the EF migration for the new column.
HandleAlarmEvent set AlarmTypeName to the event-type NodeId string ("i=9341"),
but the client-side conditionFilter gate (and the OPC UA WhereClause) use friendly
type names — so a friendly-name filter built a correct server WhereClause yet the
client gate dropped every event (zero alarms delivered). Resolve the event-type
NodeId to its friendly name via an inverse of KnownConditionTypeIds (NodeId-string
fallback for custom types) so both sides agree. Also fix a dead-code ternary in
the SourceName derivation.
conditionFilter was plumbed end-to-end but applied nowhere — a filtered source
silently mirrored all conditions. Define the filter as a comma-separated,
case-insensitive list of condition type names (blank = all); enforce it
authoritatively client-side in DataConnectionActor routing (uniform across OPC UA
+ MxGateway) and, for OPC UA, additionally build a server-side EventFilter
WhereClause as a bandwidth optimization.
ExecuteWriteAsync only caught SqlException, so a live outage surfacing as
InvalidOperationException/SocketException/IOException/TimeoutException escaped
unclassified and crashed the script actor instead of buffering. Mirror the HTTP
path: propagate OperationCanceledException on cancellation, classify transport
exceptions as transient (buffer+retry), let unexpected exceptions propagate.
CachedWrite buffered ALL write failures and retried forever, never returning a
synchronous failure to the script — permanent SQL errors (constraint/syntax/
permission) were treated as transient. Mirror the External-System API path:
attempt immediately, return Failed synchronously on permanent SQL errors (no
buffering), buffer only transient errors; the S&F retry path parks permanent
failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.
- FormatConnection now includes BackupConfigurationJson so a backup-only change
no longer renders identical Before/After cells (covers all 4 ConnectionsEqual fields)
- add ComputeConnectionsDiff(null, newConfig) first-deploy unit test
ComputeConnectionsDiff existed with tests but was never called and ConfigurationDiff
had no slot for it, so standalone connection endpoint/protocol/failover drift never
appeared in the deployment diff (only per-attribute binding drift did). Add a
ConnectionChanges slot, wire ComputeConnectionsDiff into ComputeDiff, and render the
connection section in the deployment diff UI.
- connection-name capable-set comparer kept as StringComparer.Ordinal:
FlatteningService and SemanticValidator use all-ordinal name-keyed
dictionaries throughout; OrdinalIgnoreCase would be inconsistent with
the rest of the binding-resolution path — added comment documenting this
- IsAlarmCapable protocol-match confirmed consistent with DataConnectionFactory
(both OrdinalIgnoreCase); added case-insensitive InlineData variants
(OPCUA, opcua, mxgateway, MXGATEWAY) to lock the contract
- clarified FlatteningPipeline comment: "filters connections by alarm-capable
protocol, then collects their names" (was "maps from the protocol string")
- added DataConnectionLayer/DataConnectionFactory.cs path reference to
AlarmCapableProtocols sync-risk comment
FlatteningPipeline loaded data connections but never passed the alarm-capable
connection set to SemanticValidator, so the native-alarm-source capability check
(built but inert) never ran — a source bound to a non-alarm-capable connection
deployed silently. Compute the capable set (IAlarmSubscribableConnection: OPC UA
+ MxGateway) and thread it through ValidationService to SemanticValidator.
The actual drift was NOT OccurredAtUtc's converter (a same-CLR-type
DateTime->DateTime ValueConverter emits no snapshot annotation and never
triggers PendingModelChangesWarning). The real pending change was a HasData
seed row: SecurityConfiguration adds LdapGroupMapping Id=5 (SCADA-Viewers ->
Viewer) but the model snapshot omitted it, so MsSqlMigrationFixture's
MigrateAsync threw PendingModelChangesWarning and failed every fixture-backed
AuditLog MSSQL test (~57).
Generated via `dotnet ef migrations add`; Up/Down are seed-data DML only
(InsertData/DeleteData of the single reference row) -- no schema DDL. The
snapshot now carries the Id=5 seed and has-pending-model-changes is clean.
Closes the Tier-1 silent gaps from the stillpending.md audit (#3-#6):
- AuditLog 365-day purge actor + reconciliation self-heal now actually
start and run on the central node (were dead code).
- SiteCall reconciliation pull (new PullSiteCalls RPC + plumbing) + daily
terminal-row purge scheduler.
- Site Event Logging now emits all 5 previously-missing categories
(alarm, deployment, instance_lifecycle, store_and_forward, notification,
script started/completed).
14 commits, each implement->review->fix. Build 0/0; cluster verified
healthy with the new singletons starting cleanly (bash docker/deploy.sh).
ScriptExecutionActor previously emitted only an Error 'script' event on failure.
It now also fire-and-forgets an Info 'script' event when execution starts (right
before RunAsync) and when it completes successfully — giving the operational log
the full started/completed/failed lifecycle. Uses the already-resolved
siteEventLogger; fire-and-forget so the event log can never block or fault the
script's own run.
Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory
(returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope()
reaches the logging hot path in tests instead of throwing into the catch.
StoreAndForwardService gains an optional ISiteEventLogger? ctor param (default
null so the many direct-construction tests still compile) and, when wired,
mirrors its own buffer/retry/park activity onto site operational events via the
existing OnActivity hook (which already isolates a throwing subscriber, so a
failing event log can never be misclassified as a transient delivery failure):
- store_and_forward (ExternalSystem / CachedDbWrite): queued/retried/delivered/
parked. Warning on buffer/retry, Error on park, Info on retry-recovery; an
immediate-success delivery is the hot path and is not logged.
- notification (the site forward-to-central path): logged ONLY on forward
FAILURE (buffered after the immediate forward threw) and on park, per the
Component-SiteEventLogging spec — routine enqueue and forward-success are
deliberately not logged (central's Notifications table is the audit record).
Wired through AddStoreAndForward (resolves ISiteEventLogger optionally from DI);
StoreAndForward project now references SiteEventLogging (acyclic: SiteEventLogging
references only Commons). Also documents the 'notification' category on the
ISiteEventLogger.LogEventAsync eventType param (folds in M1.8 doc fix).