docs: design for native-typed JSON List attribute values + data normalization
Encode emits native-typed JSON ([10,20], [true,false], ISO dates); Decode reads both old (array-of-strings) and new forms. Existing data normalized via an idempotent central MS SQL startup normalizer, active site SQLite normalization in the InstanceActor override-load path, and normalize-on-import for bundles. Approved via brainstorming (Approach B, thorough).
# Native-Typed JSON for List Attribute Values — Design

**Date:** 2026-06-16
**Status:** Approved (brainstorming) — ready for implementation plan
**Branch:** `feature/native-typed-json`

## Problem

The multi-value (List) attribute feature (shipped 2026-06-16, branch `feature/multivalue-attribute`) stores List values via `AttributeValueCodec` as a JSON **array of strings**: an `Int32` list is stored as `["10","20","30"]` and a `Boolean` list as `["True","False"]`. This is internally consistent and round-trips, but it is not native-typed JSON: numbers and booleans are quoted, and `DateTime` uses a US-style invariant format rather than ISO-8601. We want the canonical form to be native-typed (`[10,20,30]`, `[true,false]`, ISO dates), and we want existing persisted data normalized to that form so that no dual-format data is left behind.

## Decisions

| Decision | Choice |
|---|---|
| Encode form | Native-typed JSON: numbers/bools unquoted, strings quoted, `DateTime` as ISO-8601 string |
| Decode | **Read both** old (array-of-strings) and new (native) forms — backward compatible |
| Existing data | **Migrate** to native form across MS SQL + site SQLite + on bundle import (Approach B, thorough) |
| MS SQL mechanism | Idempotent C# **startup normalizer** (not T-SQL — type-aware JSON re-emission is fragile in SQL) |
| Site SQLite mechanism | **Active** normalization in the InstanceActor override-load path (it already has the element type) |
| Bundles | Normalize **on import** (already-exported files are external/unreachable) |

**Reality note:** the List feature shipped this session and was not deployed to the docker cluster, so there is almost certainly **zero** old-form List data in any store yet. The migration is a safety net that guarantees no dual-format data ever lingers, not a fix for existing broken data.

## Architecture

### 1. Codec (`AttributeValueCodec`, Commons) — foundation

- **`Encode`**: only the list branch changes. Instead of mapping each element to an invariant string and then serializing, serialize the typed CLR collection directly (`JsonSerializer.Serialize<object>(enumerable, …)`). Because the declared type is `object`, `System.Text.Json` serializes the runtime collection type and emits each element in its native JSON kind. STJ writes numbers and booleans culture-invariantly (JSON has a single, culture-independent number syntax) and writes `DateTime` as a round-trippable ISO-8601 string. Scalars (the `string` / `IFormattable` branches) are untouched.
- **`Decode`**: reads both forms. Deserialize to `JsonElement[]` (instead of `string?[]`), then feed each element to `ParseScalar` as either `GetString()` (JSON string element) or `GetRawText()` (number/bool element). So `[10,20]` and `["10","20"]` both decode to `List<int>{10,20}`, and both ISO and old US-invariant `DateTime` strings parse via the existing `DateTime.Parse(…, RoundtripKind)`. A JSON `null` element still throws `FormatException` ("elements may not be null"), unchanged.
- The read-both `Decode` is also what makes the migration idempotent: re-encoding a value that is already in native form yields the identical string, so a second pass finds nothing to rewrite.
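
A minimal sketch of the read-both codec, assuming simplified stand-ins: `EncodeList`, `DecodeList`, and the string-keyed `ParseScalar` below are hypothetical names rather than the real `AttributeValueCodec` API, and decoded values come back as `List<object>` instead of a typed `List<int>`. It shows the old and native forms decoding to the same values and re-encoding to the same native string.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical stand-in for the List branch of AttributeValueCodec.Encode.
// Declaring the value as `object` makes System.Text.Json serialize the runtime
// collection type, so ints/bools come out unquoted and DateTime as ISO-8601.
static string EncodeList(IEnumerable values) =>
    JsonSerializer.Serialize<object>(values);

// Hypothetical stand-in for ParseScalar; element types are plain strings here
// only to keep the sketch self-contained.
static object ParseScalar(string text, string elementType) => elementType switch
{
    "Int32" => (object)int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture),
    "Boolean" => bool.Parse(text),
    "DateTime" => DateTime.Parse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
    "String" => text,
    _ => throw new FormatException($"Unsupported element type '{elementType}'."),
};

// Read-both decode: accepts the old array-of-strings form and the native form.
static List<object> DecodeList(string json, string elementType)
{
    JsonElement[]? elements;
    try { elements = JsonSerializer.Deserialize<JsonElement[]>(json); }
    catch (JsonException ex) { throw new FormatException("List value is not a valid JSON array.", ex); }
    if (elements is null) throw new FormatException("List value must be a JSON array.");

    var result = new List<object>();
    foreach (JsonElement e in elements)
    {
        string text = e.ValueKind switch
        {
            JsonValueKind.String => e.GetString()!,          // old form, strings, dates
            JsonValueKind.Number or JsonValueKind.True or JsonValueKind.False
                => e.GetRawText(),                           // native form: "10", "true"
            JsonValueKind.Null => throw new FormatException("List elements may not be null."),
            _ => throw new FormatException($"Unexpected JSON element kind {e.ValueKind}."),
        };
        result.Add(ParseScalar(text, elementType));
    }
    return result;
}

Console.WriteLine(EncodeList(new List<int> { 10, 20, 30 }));            // [10,20,30]
Console.WriteLine(EncodeList(new List<bool> { true, false }));          // [true,false]
Console.WriteLine(EncodeList(DecodeList("[\"10\",\"20\"]", "Int32")));  // [10,20]
Console.WriteLine(EncodeList(DecodeList("[\"06/16/2026 10:30:00\"]", "DateTime")));
```

Because `EncodeList(DecodeList(x))` returns `x` unchanged once `x` is in native form, the same pair of calls can double as the migration's change detector.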
### 2. MS SQL — idempotent central startup normalizer

A normalization step runs once per startup, immediately after `dbContext.Database.MigrateAsync(...)` in `MigrationHelper.ApplyOrValidateMigrationsAsync` (active central node only). For each List row:

- **`TemplateAttributes`** where `DataType = 'List'`: read `Value` and the row's own `ElementDataType`, compute `Encode(Decode(value, List, elementType))`, and issue an `UPDATE` only if the result differs from the stored string.
- **`InstanceAttributeOverrides`** for List attributes: these rows may have a null `ElementDataType` (the column is currently informational; see follow-up #93/M3), so resolve the element type through the owning instance's template attribute (instance → `TemplateId` → `TemplateAttribute` by name → `ElementDataType`), then re-encode as above.

The step is idempotent: a row already in native form re-encodes to the same string, so no `UPDATE` is issued. It is therefore safe to leave in permanently, and every later startup costs only a scan with no writes. Per-row failures (malformed JSON, unresolved element type) are logged and skipped; normalization NEVER aborts startup (mirroring the audit/best-effort principle). The scan is bounded to List rows only.
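
The per-row pass can be sketched as follows, assuming rows are held in memory: the tuples, `Reencode`, and `NormalizeRow` are hypothetical stand-ins (only `Int32` and `Boolean` elements are handled), and an `UPDATE` is represented by appending to a list rather than by touching MS SQL. It shows the changed-only write, the override element-type lookup through the template, and per-row skip-and-log.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;

// Hypothetical in-memory stand-ins for the List rows of the two tables.
var templateAttrs = new List<(int Id, int TemplateId, string Name, string Value, string ElementType)>
{
    (1, 100, "Setpoints", "[\"10\",\"20\"]", "Int32"),  // old form -> rewritten
    (2, 100, "Flags", "[true,false]", "Boolean"),       // already native -> skipped
};
var instanceTemplateId = new Dictionary<int, int> { [500] = 100 };  // instance -> TemplateId
var overrides = new List<(int Id, int InstanceId, string Name, string Value, string? ElementType)>
{
    (7, 500, "Setpoints", "[\"30\"]", null),  // element type resolved via the template
    (8, 500, "Flags", "[\"maybe\"]", null),   // malformed -> logged and skipped
};

var updates = new List<string>();  // stands in for issued UPDATE statements
var log = new List<string>();

// Hypothetical stand-in for Encode(Decode(value, List, elementType)).
static string Reencode(string json, string elementType)
{
    JsonElement[] elements = JsonSerializer.Deserialize<JsonElement[]>(json)
        ?? throw new FormatException("List value must be a JSON array.");
    List<object> typed = elements.Select(e =>
    {
        string text = e.ValueKind == JsonValueKind.String ? e.GetString()! : e.GetRawText();
        return elementType switch
        {
            "Int32" => (object)int.Parse(text, CultureInfo.InvariantCulture),
            "Boolean" => bool.Parse(text),
            _ => throw new FormatException($"Unsupported element type '{elementType}'."),
        };
    }).ToList();
    return JsonSerializer.Serialize<object>(typed);
}

// Best-effort per row: a failure is logged and never aborts the pass.
void NormalizeRow(string table, int id, string value, string? elementType)
{
    try
    {
        if (elementType is null) throw new FormatException("Unresolved element type.");
        string native = Reencode(value, elementType);
        if (!string.Equals(native, value, StringComparison.Ordinal))
            updates.Add($"{table}#{id} -> {native}");  // only changed rows are written
    }
    catch (Exception ex) when (ex is FormatException or JsonException)
    {
        log.Add($"skipped {table}#{id}: {ex.Message}");
    }
}

foreach (var row in templateAttrs)
    NormalizeRow("TemplateAttributes", row.Id, row.Value, row.ElementType);

foreach (var o in overrides)
{
    // Override rows may lack ElementDataType: instance -> TemplateId -> attribute by name.
    string? resolvedType = o.ElementType;
    if (resolvedType is null && instanceTemplateId.TryGetValue(o.InstanceId, out int templateId))
        resolvedType = templateAttrs
            .Where(a => a.TemplateId == templateId && a.Name == o.Name)
            .Select(a => a.ElementType)
            .FirstOrDefault();
    NormalizeRow("InstanceAttributeOverrides", o.Id, o.Value, resolvedType);
}

updates.ForEach(Console.WriteLine);
log.ForEach(Console.WriteLine);
```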
### 3. Site SQLite — active normalization on override load

Site static-override values (`SiteStorageService`) are keyed by `(instance, canonicalName)` and carry no element type; the element type lives in the instance's flattened config. The natural normalization point is therefore the **InstanceActor override-load path** (`HandleOverridesLoaded`, added in MV-7), which already decodes both forms using the `ResolvedAttribute`'s `ElementDataType`. Extend it so that when a List override's stored string is in old form (i.e. `Encode(decoded)` differs from the stored string), it re-persists the native form via `SiteStorageService.SetStaticOverrideAsync`. This normalizes actively on every instance load (site startup or failover), reuses the existing decode and element type, and is idempotent.
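
A sketch of the extended load handler, assuming dictionaries stand in for the flattened config and the loaded overrides, and a recording delegate stands in for `SiteStorageService.SetStaticOverrideAsync`. `HandleOverridesLoadedAsync` and `Reencode` are hypothetical names, not the actor's actual members. Only the old-form List override is written back; the native List override and the scalar are left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical slice of the flattened config: name -> (DataType, ElementDataType).
var resolved = new Dictionary<string, (string DataType, string? ElementType)>
{
    ["Setpoints"] = ("List", "Int32"),
    ["Flags"] = ("List", "Boolean"),
    ["Label"] = ("String", null),
};

// Static overrides as loaded from site SQLite: canonical name -> stored string.
var loaded = new Dictionary<string, string>
{
    ["Setpoints"] = "[\"10\",\"20\"]",  // old form
    ["Flags"] = "[true]",               // already native
    ["Label"] = "hello",                // scalar
};

var persisted = new List<(string Instance, string Name, string Value)>();

// Hypothetical stand-in for SiteStorageService.SetStaticOverrideAsync (records the call).
Task SetStaticOverrideAsync(string instance, string name, string value)
{
    persisted.Add((instance, name, value));
    return Task.CompletedTask;
}

// Hypothetical stand-in for Encode(Decode(stored, List, elementType)).
static string Reencode(string json, string elementType)
{
    JsonElement[] elements = JsonSerializer.Deserialize<JsonElement[]>(json)
        ?? throw new FormatException("List value must be a JSON array.");
    List<object> typed = elements.Select(e =>
    {
        string text = e.ValueKind == JsonValueKind.String ? e.GetString()! : e.GetRawText();
        return elementType switch
        {
            "Int32" => (object)int.Parse(text, CultureInfo.InvariantCulture),
            "Boolean" => bool.Parse(text),
            _ => throw new FormatException($"Unsupported element type '{elementType}'."),
        };
    }).ToList();
    return JsonSerializer.Serialize<object>(typed);
}

// Sketch of the extended override-load handler.
async Task HandleOverridesLoadedAsync(string instance)
{
    foreach (var (name, stored) in loaded)
    {
        if (!resolved.TryGetValue(name, out var attr) || attr.DataType != "List" || attr.ElementType is null)
            continue;  // scalar overrides are not touched
        string native;
        try { native = Reencode(stored, attr.ElementType); }
        catch (Exception ex) when (ex is FormatException or JsonException)
        {
            Console.WriteLine($"skipping {instance}/{name}: {ex.Message}");  // log, don't crash
            continue;
        }
        if (native != stored)  // old form: write the native form back
            await SetStaticOverrideAsync(instance, name, native);
    }
}

await HandleOverridesLoadedAsync("Pump01");
foreach (var p in persisted)
    Console.WriteLine($"{p.Instance}/{p.Name} = {p.Value}");  // Pump01/Setpoints = [10,20]
```

On the next load the stored string is already native, so the comparison fails and nothing is re-persisted.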
### 4. Bundles — normalize on import

Already-exported `.bundle` files are external artifacts that we cannot reach to rewrite, and import already reads both forms via the codec. To ensure imported List values land in native form in the DB, the importer re-encodes List attribute `Value`s through the codec when writing them (with the MS SQL normalizer as a backstop on the next startup). No bundle files are rewritten.
## Error handling

- Decoding a genuinely malformed value still throws `FormatException`; the normalizers catch it per row, log it, and skip the row (no startup abort, no actor crash).
- The codec change leaves the wire contract intact: the gRPC `string value` field is unchanged, and because `List` is a new type, no external wire consumer relies on the old quoted form.

## Testing

- **Codec:** native-form encode per element type (`[10,20,30]`, `[true,false]`, ISO `DateTime`, strings stay `["a","b"]`); backward-compatible decode of the old form (`["10","20"]` → `List<int>`); round-trip for every element type; malformed input still throws; culture invariance preserved.
- **MS SQL normalizer:** old-form row → rewritten to native; native row → untouched (idempotency); malformed row → skipped and logged while other rows are still processed; override row element type resolved via the template attribute.
- **Site SQLite / InstanceActor:** an old-form List override on load → re-persisted in native form (assert `SetStaticOverrideAsync` is called with the native form); a native override → not re-persisted (idempotent); scalar overrides unaffected.
- **Bundle import:** importing an old-form bundle lands native-form Values in the DB.

## Out of scope / follow-ups

- Rewriting already-exported bundle files (unreachable).
- This pairs naturally with follow-up **#93/M3** (populate `InstanceAttributeOverride.ElementDataType` on write); if done, the override normalizer could read the column directly instead of joining to the template attribute. Not required here.