fix(dcl): robust static-tag seeding — bounded-retry+log initial seed (#1) and re-seed on reconnect (#3)

STATIC tags (no further OnDataChange after advise) depend entirely on the
seed read. Pre-fix HandleSubscribe seeded only on Success && Value != null,
silently dropping a seed that raced the just-created advise (VT_EMPTY) — so a
static tag stayed Uncertain forever while the source read Good. ReSubscribeAll
did no seeding at all, so a static tag could not self-heal across reconnect.

- New SeedTagsAsync helper: per-tag ReadAsync (not a bulk read — some gateways
  time out on large batches) with round-based bounded retry
  (SeedReadMaxAttempts/SeedReadRetryDelay), logging any tag that never yields a
  value (named — previously zero log trace).
- HandleSubscribe seed loop delegates to SeedTagsAsync.
- ReSubscribeAll re-seeds re-advised tags after reconnect via the
  generation-guarded TagValueReceived path (fan-out keys off
  _subscriptionsByInstance, preserved across reconnect).
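
The round-based bounded retry described above can be sketched in isolation as follows. This is a minimal illustration, not the code in this commit: `SeedRetrySketch`, `SeedAsync`, and the `read` delegate are hypothetical names, and a `null` result stands in for a read that returned no usable value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of round-based bounded retry; names are illustrative only.
public static class SeedRetrySketch
{
    // 'read' is an assumed stand-in for a per-tag read: it returns null (or throws)
    // when the read yields no usable value.
    public static async Task<(Dictionary<string, object> Seeds, List<string> Unseeded)> SeedAsync(
        IReadOnlyCollection<string> tags,
        Func<string, Task<object>> read,
        int maxAttempts,
        TimeSpan retryDelay)
    {
        var seeds = new Dictionary<string, object>();
        var pending = new HashSet<string>(tags);
        var attempts = Math.Max(1, maxAttempts); // at least one read per tag

        for (var attempt = 1; attempt <= attempts && pending.Count > 0; attempt++)
        {
            // Snapshot the pending set so it can be mutated while iterating.
            foreach (var tag in pending.ToList())
            {
                try
                {
                    var value = await read(tag);
                    if (value != null)
                    {
                        seeds[tag] = value;
                        pending.Remove(tag);
                    }
                }
                catch (Exception)
                {
                    // Failed read: the tag stays pending and is retried next round.
                }
            }

            // One delay per round, shared by every tag that is still pending.
            if (pending.Count > 0 && attempt < attempts)
                await Task.Delay(retryDelay);
        }

        // Whatever is still pending is what a caller would log by name.
        return (seeds, pending.ToList());
    }
}
```

Because the delay is taken once per round rather than once per tag, the waiting time does not grow with the number of tags; only the reads themselves do.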

Diagnosed live on wonder-app-vd03 2026-06-17 (see scadabridge-dcl-static-tag-false-bad).
Mechanism #2 (single transient-bad push) left as a follow-up.
Joseph Doherty
2026-06-18 09:23:23 -04:00
parent 3782ebdadb
commit 72aec3b4d4
3 changed files with 229 additions and 9 deletions
@@ -744,25 +744,73 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
 // MES field that never changes) the dropped seed is the only value it will
 // ever produce — leaving the attribute Uncertain forever. So the seeds ride
 // back on SubscribeCompleted and are delivered after registration.
-var seedValues = new List<SeededValue>(tagsToSeed.Count);
-foreach (var tagPath in tagsToSeed)
+// DataConnectionLayer-027: SeedTagsAsync retries the still-empty reads so a
+// seed that races the just-created advise (returns VT_EMPTY) is not silently
+// dropped, and logs any tag that never yields a value.
+var seedValues = await SeedTagsAsync(_adapter, tagsToSeed);
+return new SubscribeCompleted(request, sender, results, seedValues);
+}).PipeTo(self);
+}
+/// <summary>
+/// DataConnectionLayer-027: reads the current value of each tag so the Instance Actor
+/// has an initial value, retrying the still-empty subset a bounded number of times. A
+/// STATIC tag (one that emits no further OnDataChange after the advise) depends
+/// entirely on this seed; on a cold/fresh advise the read can race the just-created
+/// subscription and return an empty/failed result, which pre-fix was swallowed —
+/// leaving the attribute Uncertain forever even though the source reads Good. Per-tag
+/// <see cref="IDataConnection.ReadAsync"/> (not a single bulk read) is deliberate:
+/// some gateways time out on a large batch. The retry delay is applied once per round
+/// across the whole pending subset, so total added latency is bounded to
+/// <c>(attempts - 1) × SeedReadRetryDelay</c> regardless of tag count. Tags still
+/// empty after the budget are logged (named) and left to heal from a future change.
+/// Runs on a background task: it reads only the supplied <paramref name="adapter"/>
+/// and returns the seeds — all actor-state mutation/delivery stays on the actor thread.
+/// </summary>
+private async Task<List<SeededValue>> SeedTagsAsync(IDataConnection adapter, IReadOnlyCollection<string> tags)
+{
+var seedValues = new List<SeededValue>(tags.Count);
+if (tags.Count == 0)
+return seedValues;
+var pending = new HashSet<string>(tags);
+var attempts = Math.Max(1, _options.SeedReadMaxAttempts);
+for (var attempt = 1; attempt <= attempts && pending.Count > 0; attempt++)
+{
+foreach (var tagPath in pending.ToList())
 {
 try
 {
-var readResult = await _adapter.ReadAsync(tagPath);
-if (readResult.Success && readResult.Value != null)
+var readResult = await adapter.ReadAsync(tagPath);
+if (readResult.Success && readResult.Value is { Value: not null } value)
 {
-seedValues.Add(new SeededValue(tagPath, readResult.Value));
+seedValues.Add(new SeededValue(tagPath, value));
+pending.Remove(tagPath);
 }
 }
 catch
 {
-// Best-effort — subscription will deliver subsequent changes
+// Best-effort read — retried below, or logged once the budget is spent.
 }
 }
-return new SubscribeCompleted(request, sender, results, seedValues);
-}).PipeTo(self);
+if (pending.Count > 0 && attempt < attempts)
+await Task.Delay(_options.SeedReadRetryDelay);
+}
+if (pending.Count > 0)
+{
+const int maxNamed = 20;
+var named = string.Join(", ", pending.Take(maxNamed));
+if (pending.Count > maxNamed)
+named += $", … (+{pending.Count - maxNamed} more)";
+_log.Warning(
+"[{0}] Seed read returned no value for {1} tag(s) after {2} attempt(s); they stay Uncertain until a change notification arrives: {3}",
+_connectionName, pending.Count, attempts, named);
+}
+return seedValues;
 }
 /// <summary>
@@ -1542,6 +1590,36 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
 return new TagResolutionFailed(tagPath, t.Exception?.GetBaseException().Message ?? "Unknown error");
 }).PipeTo(self);
 }
+// DataConnectionLayer-027: re-advising alone does NOT restore a STATIC tag's value.
+// PushBadQualityForAllTags flipped every tag Bad on disconnect, and a tag that fires
+// no further OnDataChange would stay Bad/Uncertain across the reconnect even though
+// the source reads Good — the same seed gap as the initial subscribe (DCL-026/027),
+// which this path previously did not cover. Mirror HandleSubscribe and re-seed: read
+// each re-advised tag's current value and deliver it through the normal
+// generation-guarded path. Capture the adapter + generation up front so a later
+// failover (which swaps _adapter and bumps _adapterGeneration) cannot misroute the
+// reads or deliver a stale value — HandleTagValueReceived drops any value whose
+// generation no longer matches. Delivery fans out via _subscriptionsByInstance
+// (preserved across reconnect), so it does not depend on _subscriptionIds being
+// repopulated for the new adapter generation. Seeds are delivered after registration
+// is already established, so the DCL-026 ordering hazard does not apply here.
+var reseedAdapter = _adapter;
+Task.Run(async () =>
+{
+try
+{
+var seeds = await SeedTagsAsync(reseedAdapter, allTags);
+foreach (var seed in seeds)
+self.Tell(new TagValueReceived(seed.TagPath, seed.Value, generation));
+}
+catch (Exception ex)
+{
+// Fire-and-forget: guarantee a trace rather than an unobserved task fault.
+// SeedTagsAsync already swallows per-read errors, so reaching here is unexpected.
+_log.Warning("[{0}] Reconnect re-seed task faulted: {1}", _connectionName, ex.Message);
+}
+});
 }
 // ── Health Reporting (WP-13) ──
@@ -20,4 +20,24 @@ public class DataConnectionOptions
 /// disconnect toward the failover retry count (DataConnectionLayer-009).
 /// </summary>
 public TimeSpan StableConnectionThreshold { get; set; } = TimeSpan.FromSeconds(60);
+/// <summary>
+/// DataConnectionLayer-027: total number of attempts the seed read makes per tag
+/// before giving up. The initial subscribe seed (and the reconnect re-seed) reads
+/// each just-advised tag to capture its current value; on a cold/fresh advise the
+/// read can race the subscription and return an empty/failed result. A STATIC tag
+/// (one that never fires another OnDataChange) would then stay Uncertain forever,
+/// because its bridge value depends entirely on that seed. Re-reading the still-empty
+/// subset a bounded number of times lets the advise warm up. Minimum 1 (1 = single
+/// read, no retry).
+/// </summary>
+public int SeedReadMaxAttempts { get; set; } = 3;
+/// <summary>
+/// DataConnectionLayer-027: delay between seed-read attempts for the tags that have
+/// not yet returned a usable value. Applied once per round across the whole pending
+/// subset, so the total added latency is bounded to
+/// <c>(SeedReadMaxAttempts - 1) × SeedReadRetryDelay</c> regardless of tag count.
+/// </summary>
+public TimeSpan SeedReadRetryDelay { get; set; } = TimeSpan.FromMilliseconds(250);
 }
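
The latency bound stated for SeedReadRetryDelay can be worked through with a small calculation. The sketch below is illustrative only: `SeedDelayBudget` and `MaxRetryDelay` are hypothetical names that do not exist in this change. It covers only the time spent waiting between rounds; the reads themselves take additional time that depends on the gateway.

```csharp
using System;

// Hypothetical helper; not part of DataConnectionOptions in this commit.
public static class SeedDelayBudget
{
    // Upper bound on the time spent in retry delays:
    // (attempts - 1) x retryDelay, with attempts clamped to a minimum of 1.
    public static TimeSpan MaxRetryDelay(int maxAttempts, TimeSpan retryDelay)
    {
        var attempts = Math.Max(1, maxAttempts);
        return TimeSpan.FromTicks(retryDelay.Ticks * (attempts - 1));
    }
}
```

With the defaults above (3 attempts, 250 ms) the bound is 2 × 250 ms = 500 ms, and with a single attempt no delay is added at all.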