ScadaBridge/docs/plans/2026-06-16-native-typed-json-design.md
Joseph Doherty 91b1aa1275 docs: design for native-typed JSON List attribute values + data normalization
Encode emits native-typed JSON ([10,20], [true,false], ISO dates); Decode reads
both old (array-of-strings) and new forms. Existing data normalized via an
idempotent central MS SQL startup normalizer, active site SQLite normalization in
the InstanceActor override-load path, and normalize-on-import for bundles.
Approved via brainstorming (Approach B, thorough).
2026-06-16 17:08:38 -04:00


Native-Typed JSON for List Attribute Values — Design

Date: 2026-06-16
Status: Approved (brainstorming), ready for implementation plan
Branch: feature/native-typed-json

Problem

The multi-value (List) attribute feature (shipped 2026-06-16, branch feature/multivalue-attribute) stores List values via AttributeValueCodec as a JSON array of strings — e.g. an Int32 list is ["10","20","30"] and a Boolean list is ["True","False"]. This is internally consistent and round-trips, but it is not "native-typed" JSON: numbers and booleans are quoted, and DateTime uses a US-invariant format rather than ISO-8601. We want the canonical form to be native-typed ([10,20,30], [true,false], ISO dates), while existing persisted data is normalized to the new form (no dual-format data left behind).

Decisions

| Decision | Choice |
| --- | --- |
| Encode form | Native-typed JSON: numbers/bools unquoted, strings quoted, DateTime as ISO-8601 string |
| Decode | Read both old (array-of-strings) and new (native) forms; backward compatible |
| Existing data | Migrate to native form across MS SQL + site SQLite + on bundle import (Approach B, thorough) |
| MS SQL mechanism | Idempotent C# startup normalizer (not T-SQL; type-aware JSON re-emission is fragile in SQL) |
| Site SQLite mechanism | Active normalization in the InstanceActor override-load path (it already has the element type) |
| Bundles | Normalize on import (already-exported files are external/unreachable) |

Reality note: the List feature shipped this session and was not deployed to the docker cluster, so there is almost certainly zero old-form List data in any store yet. The migration is a safety net guaranteeing no dual-format data ever lingers, not a fix for existing broken data.

Architecture

1. Codec (AttributeValueCodec, Commons) — foundation

  • Encode — only the list branch changes. Instead of mapping each element to an invariant string then serializing, serialize the typed CLR collection directly (JsonSerializer.Serialize<object>(enumerable, …)) so System.Text.Json emits each element in its native JSON kind. STJ numbers/bools are culture-invariant by spec; DateTime serializes as round-trippable ISO-8601. Scalars (the string / IFormattable branches) are untouched.
  • Decode — read both forms. Deserialize to JsonElement[] (instead of string?[]); for each element feed ParseScalar either GetString() (JSON string element) or GetRawText() (number/bool element). So [10,20] and ["10","20"] both decode to List<int>{10,20}; ISO and old US-invariant DateTime strings both parse via the existing DateTime.Parse(…, RoundtripKind). A JSON null element still throws FormatException ("elements may not be null"), unchanged.
  • Because Decode accepts both forms and Encode is deterministic, the migration is idempotent: an already-native value re-encodes to the same bytes, so a second pass changes nothing.
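The list branch described above can be sketched roughly as follows. This is a minimal illustration, not the actual AttributeValueCodec: the `ElementType` enum, the `ListCodecSketch` class, and the body of `ParseScalar` are hypothetical stand-ins, and only four element types are modelled. The System.Text.Json calls (`JsonSerializer.Serialize<object>`, `Deserialize<JsonElement[]>`, `GetString`, `GetRawText`) are the ones the design names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for the codec's element data types.
public enum ElementType { Int32, Boolean, DateTime, String }

public static class ListCodecSketch
{
    // Encode: serialize the typed values directly so each element keeps
    // its native JSON kind (number, bool, or string).
    public static string Encode(IEnumerable<object> items) =>
        JsonSerializer.Serialize<object>(items.ToArray());

    // Decode: accept both the old array-of-strings form and the native form.
    public static List<object> Decode(string json, ElementType type)
    {
        JsonElement[]? elements;
        try
        {
            elements = JsonSerializer.Deserialize<JsonElement[]>(json);
        }
        catch (JsonException ex)
        {
            throw new FormatException("value is not a valid JSON array", ex);
        }
        if (elements is null)
            throw new FormatException("value is not a JSON array");

        var result = new List<object>(elements.Length);
        foreach (JsonElement e in elements)
        {
            if (e.ValueKind == JsonValueKind.Null)
                throw new FormatException("elements may not be null");

            // JSON string element: use its unquoted text.
            // Number/bool element: use the raw JSON token.
            string text = e.ValueKind == JsonValueKind.String
                ? e.GetString()!
                : e.GetRawText();
            result.Add(ParseScalar(text, type));
        }
        return result;
    }

    // Assumed scalar parser; the real ParseScalar is not shown in this design.
    private static object ParseScalar(string text, ElementType type) => type switch
    {
        ElementType.Int32 => (object)int.Parse(text, CultureInfo.InvariantCulture),
        ElementType.Boolean => bool.Parse(text),
        ElementType.DateTime => DateTime.Parse(
            text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
        _ => text,
    };
}
```

Under these assumptions, `Encode(Decode("[\"10\",\"20\"]", ElementType.Int32))` and `Encode(Decode("[10,20]", ElementType.Int32))` both produce `[10,20]`, which is the property the normalizers rely on.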

2. MS SQL — idempotent central startup normalizer

A normalization step invoked once after dbContext.Database.MigrateAsync(...) in MigrationHelper.ApplyOrValidateMigrationsAsync (active central node only). For each List row:

  • TemplateAttributes where DataType = 'List': read Value + the row's own ElementDataType; compute Encode(Decode(value, List, elementType)); if it differs from the stored string, UPDATE.
  • InstanceAttributeOverrides for List attributes: these rows may have a null ElementDataType (it is currently informational; see follow-up #93/M3), so resolve the element type via the owning instance's template attribute (instance → TemplateId → TemplateAttribute by name → ElementDataType). Then re-encode as above.

The step is idempotent: a native value re-encodes to itself, so the UPDATE is skipped. It is therefore safe to leave in permanently and cheap on every subsequent startup (a scan, no writes). Per-row failures (malformed JSON, unresolved element type) are logged and skipped; normalization NEVER aborts startup (mirrors the audit/best-effort principle). The scan is bounded to List rows only.
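The per-row loop might look like the sketch below. It deliberately leaves out EF Core and the element-type join: `ListRow`, `NormalizerSketch`, and the `reencode` and `log` delegates are hypothetical, with `reencode` standing in for Encode(Decode(value, List, elementType)) and the in-memory assignment standing in for the UPDATE.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a List-typed attribute row.
public sealed class ListRow
{
    public int Id { get; init; }
    public string Value { get; set; } = "";
}

public static class NormalizerSketch
{
    // Returns the number of rows rewritten. A second run over the same rows
    // should return 0, because native values re-encode to themselves.
    public static int Normalize(
        IEnumerable<ListRow> rows,
        Func<ListRow, string> reencode,
        Action<string> log)
    {
        int updated = 0;
        foreach (ListRow row in rows)
        {
            try
            {
                string native = reencode(row);
                if (!string.Equals(native, row.Value, StringComparison.Ordinal))
                {
                    row.Value = native; // the real step would issue an UPDATE here
                    updated++;
                }
            }
            catch (Exception ex)
            {
                // Per-row failure: log and move on, never abort startup.
                log($"Skipping row {row.Id}: {ex.Message}");
            }
        }
        return updated;
    }
}
```

Comparing with an ordinal string comparison keeps the write decision byte-exact, so a steady-state startup performs the scan and nothing else.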

3. Site SQLite — active normalization on override load

Site static-override values (SiteStorageService) are keyed by (instance, canonicalName) and carry no element type — the element type lives in the instance's flattened config. The natural normalization point is therefore the InstanceActor override-load path (HandleOverridesLoaded, added in MV-7), which already decodes both forms using the ResolvedAttribute's ElementDataType. Extend it so that when a List override's stored string is in old form (i.e. Encode(decoded) differs from the stored string), it re-persists the native form via SiteStorageService.SetStaticOverrideAsync. This actively normalizes on every instance load (site startup / failover), reuses the existing decode + element type, and is idempotent.
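As a rough sketch of that extension: after decoding, compare the re-encoded string with the stored one and persist only on a difference. `IStaticOverrideStore`, `RecordingStore`, `OverrideLoadSketch`, and the three-argument shape of `SetStaticOverrideAsync` are assumptions made for illustration; the real SiteStorageService signature is not given in this design and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical slice of the site storage API.
public interface IStaticOverrideStore
{
    Task SetStaticOverrideAsync(string instance, string canonicalName, string value);
}

// Simple recording fake, useful for asserting what was re-persisted.
public sealed class RecordingStore : IStaticOverrideStore
{
    public List<string> Writes { get; } = new();

    public Task SetStaticOverrideAsync(string instance, string canonicalName, string value)
    {
        Writes.Add(value);
        return Task.CompletedTask;
    }
}

public static class OverrideLoadSketch
{
    // Returns true when the stored string was old-form and has been
    // re-persisted in native form; false when nothing was written.
    public static async Task<bool> NormalizeOnLoadAsync(
        IStaticOverrideStore store,
        string instance,
        string canonicalName,
        string stored,
        Func<string, string> reencode) // stands in for Encode(Decode(stored, ...))
    {
        string native;
        try
        {
            native = reencode(stored);
        }
        catch (FormatException)
        {
            return false; // malformed value: skip, do not crash the actor
        }

        if (string.Equals(native, stored, StringComparison.Ordinal))
            return false; // already native: no write

        await store.SetStaticOverrideAsync(instance, canonicalName, native);
        return true;
    }
}
```

A test in the style described under Testing could pass a `RecordingStore` and assert that `Writes` holds exactly one native-form entry for an old-form override and stays empty for a native one.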

4. Bundles — normalize on import

Already-exported .bundle files are external artifacts we cannot reach to rewrite; import already reads both forms via the codec. To ensure imported List values land in native form in the DB, the importer re-encodes List attribute Values through the codec when writing (and the MS SQL normalizer is a backstop on next startup). No file rewriting.

Error handling

  • Decode of a genuinely malformed value still throws FormatException; the normalizers catch it per-row, log, and skip (no startup abort, no actor crash).
  • The codec change is additive on the wire (gRPC string value field unchanged; List is a new type with no external wire consumer relying on the old quoted form).

Testing

  • Codec: native-form encode per element type ([10,20,30], [true,false], ISO DateTime, strings stay ["a","b"]); old-form backward-compat decode (["10","20"] → List<int>); round-trip for every element type; malformed still throws; culture-invariance preserved.
  • MS SQL normalizer: old-form row → rewritten to native; native row → untouched (idempotency); malformed row → skipped + logged, other rows still processed; override row element-type resolved via template attribute.
  • Site SQLite / InstanceActor: an old-form List override on load → re-persisted native (assert SetStaticOverrideAsync called with native form); a native override → not re-persisted (idempotent); scalar overrides unaffected.
  • Bundle import: importing an old-form bundle lands native-form Values in the DB.

Out of scope / follow-ups

  • Rewriting already-exported bundle files (unreachable).
  • This pairs naturally with follow-up #93/M3 (populate InstanceAttributeOverride.ElementDataType on write); if done, the override normalizer could read the column directly instead of joining to the template attribute. Not required here.