Files
ScadaBridge/docs/plans/2026-06-18-m8-transport-design.md
T

190 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Design: M8 — Transport (T18, T20)
**Date:** 2026-06-18
**Status:** Approved (brainstorming session) — ready for implementation planning
**Milestone:** M8 of `docs/plans/2026-06-15-stillpending-completion-design.md` (Phase 2 — Expand)
**Component:** #24 Transport (extends; component count stays 26)
**Source items:** `stillpending.md` Transport (#24): T18 (site/instance-scoped artifact transport + name-mapping), T20 (per-line/Myers diff for Modified artifacts); plus Tier-2 defect #16 (stale-instance enumeration stub).
## Goal
Extend the file-based, encrypted Transport bundle subsystem — today **central-config-only** (templates, shared scripts, external systems, central DB connections, notification lists, SMTP configs, API methods) — to also promote **site-scoped** and **instance-scoped** artifacts across environments, via a **name-mapping subsystem** that reconciles source-environment site/connection identities to the target environment. Replace the coarse line-count delta in the Modified-artifact diff with a real **per-line Myers diff**. Make the stale-instance enumeration real.
## Scope (locked decisions)
- **D-Scope** — Full M8: T18 + T20 + fix #16.
- **D1 (T18 granularity)** — **Sites + individual instances.** An operator can select a whole `Site` (pulling its `DataConnection`s + all its `Instance`s) and/or cherry-pick individual `Instance`s. Instances carry their `InstanceAttributeOverride`s, `InstanceAlarmOverride`s, `InstanceNativeAlarmSourceOverride`s, `InstanceConnectionBinding`s, and optional `Area` membership (by name).
- **D2 (name-mapping UX)** — **Interactive wizard step + CLI flags.** Auto-match source→target by `SiteIdentifier` / connection `Name`; operator overrides to an existing target or "create new". Unmapped references become blocker rows. CLI mirrors via `--map-site src=dst` and `--map-connection site/src=dst`.
- **D3 (environment-specific data)** — **Carry full config (encrypted secrets).** Site `NodeA/B` + `GrpcNodeA/B` addresses travel; new `DataConnection` `PrimaryConfiguration`/`BackupConfiguration` (endpoint + credentials) travel inside the encrypted `SecretsBlock`; operator overwrites target values post-import as needed.
### Out of scope (deferred to their own brainstorm, per completion design)
- **T19** — direct cluster-to-cluster pull; asymmetric bundle signing; differential/incremental bundles.
## Background — current Transport surface (as audited)
- **Content container:** `BundleContentDto` (`Transport/Serialization/EntityDtos.cs`): `TemplateFolders, Templates, SharedScripts, ExternalSystems, DatabaseConnections, NotificationLists, SmtpConfigs, ApiMethods` (+ legacy read-only `ApiKeys`). Persistence-shaped twin `EntityAggregate`.
- **Selection:** `ExportSelection` (Commons/Types/Transport) carries ID lists per type + `IncludeDependencies`. Across the wire (CLI/ManagementActor) selection is **name**-based; resolved to IDs in `ManagementActor`.
- **Flow:** `BundleExporter.ExportAsync``DependencyResolver.ResolveAsync``EntitySerializer.ToBundleContent``ManifestBuilder.Build``BundleSerializer.Pack`. Import: `BundleImporter.LoadAsync` (envelope/zip-bomb guards, manifest+hash validation, decrypt) → `PreviewAsync` (per-type `ArtifactDiff.Compare*` + `DetectBlockersAsync`) → `ApplyAsync` (single EF transaction; per-type `Apply*Async`; intermediate flush; second-pass rewire `ResolveAlarmScriptLinksAsync`/`ResolveCompositionEdgesAsync`; `BundleImported` audit; commit).
- **Conflict model:** `ImportPreviewItem(EntityType, Name, ExistingVersion?, IncomingVersion?, ConflictKind{Identical,Modified,New,Blocker}, FieldDiffJson?, BlockerReason?)`; `ImportResolution(EntityType, Name, ResolutionAction{Add,Overwrite,Skip,Rename}, RenameTo?)`; `ImportResult(BundleImportId, Added, Overwritten, Skipped, Renamed, StaleInstanceIds, AuditEventCorrelation, ApiKeysIgnored)`.
- **Diff:** `ArtifactDiff` emits coarse field changes + `<N lines>` markers for code (`AddCodeChangeIfDifferent`). Per-line diff explicitly deferred (T20).
- **`#16`:** `ImportResult.StaleInstanceIds = Array.Empty<int>()` (BundleImporter.cs ~L736) — known stub.
- **Plumbing:** `TransportCommands.cs` (`ExportBundleCommand`/`PreviewBundleCommand`/`ImportBundleCommand`, base64 envelope); `ManagementActor` handlers ~L23012468; CLI `BundleCommands.cs`.
## Site-scoped entity model (as audited)
- **`Site`** (`Commons/Entities/Sites/Site.cs`): `Id`, `Name`, `SiteIdentifier` (unique routing key), `Description?`, `NodeAAddress?`, `NodeBAddress?`, `GrpcNodeAAddress?`, `GrpcNodeBAddress?`. Repo `ISiteRepository` (`GetSiteByIdentifierAsync`, `GetAllSitesAsync`, `GetDataConnectionsBySiteIdAsync`, …).
- **`DataConnection`** (`Commons/Entities/Sites/DataConnection.cs`): `Id`, `SiteId` (FK), `Name` (unique within site), `Protocol`, `PrimaryConfiguration?` (JSON: endpoint+creds), `BackupConfiguration?`, `FailoverRetryCount`. Repo `ISiteRepository`.
- **`Instance`** (`Commons/Entities/Instances/Instance.cs`): `Id`, `TemplateId` (FK), `SiteId` (FK), `AreaId?`, `UniqueName` (globally unique), `State{NotDeployed,Enabled,Disabled}`, + collections `AttributeOverrides`, `AlarmOverrides`, `ConnectionBindings`, `NativeAlarmSourceOverrides`. Repos `ITemplateEngineRepository` (CRUD) + `IDeploymentManagerRepository`.
- **Overrides/bindings:**
- `InstanceAttributeOverride(Id, InstanceId, AttributeName, OverrideValue?, ElementDataType?)`.
- `InstanceAlarmOverride(Id, InstanceId, AlarmCanonicalName, TriggerConfigurationOverride?, PriorityLevelOverride?)`.
- `InstanceNativeAlarmSourceOverride(Id, InstanceId, SourceCanonicalName, ConnectionNameOverride?, SourceReferenceOverride?, ConditionFilterOverride?)`**name-based** connection ref.
- `InstanceConnectionBinding(Id, InstanceId, AttributeName, DataConnectionId, DataSourceReferenceOverride?)`**numeric FK** connection ref.
- **Staleness:** `DeploymentService.GetDeploymentComparisonAsync` compares `DeployedConfigSnapshot.RevisionHash` to the freshly-flattened hash. Deployments page surfaces drift; redeploy is the standard flow (Transport never bypasses it).
- **Audited handlers:** `ManagementActor` `HandleCreate/Update/DeleteSite`, `…DataConnection`, `HandleCreateInstance` (`InstanceService`), `HandleSetConnectionBindings`, `HandleSetInstanceOverrides`. Site-scope enforced via `EnforceSiteScope` / `EnforceSiteScopeForInstance`.
## Architecture
### A. Bundle format extension (additive)
Extend the existing format — **no second bundle type** (rejected: duplicates machinery, splits UX).
- `BundleContentDto` gains additive arrays:
- `Sites: IReadOnlyList<SiteDto>``SiteDto(SiteIdentifier, Name, Description?, NodeAAddress?, NodeBAddress?, GrpcNodeAAddress?, GrpcNodeBAddress?)`.
- `DataConnections: IReadOnlyList<DataConnectionDto>``DataConnectionDto(SiteIdentifier, Name, Protocol, int FailoverRetryCount, SecretsBlock? Secrets)`. `PrimaryConfiguration`/`BackupConfiguration` (endpoint+creds) ride **`Secrets`** (presence-only diff, encrypted-at-rest in the bundle). Named to avoid collision with the existing `DatabaseConnectionDto` (External-System DB defs).
- `Instances: IReadOnlyList<InstanceDto>``InstanceDto(UniqueName, TemplateName, SiteIdentifier, AreaName?, InstanceState State, AttributeOverrides[], AlarmOverrides[], NativeAlarmSourceOverrides[], ConnectionBindings[])`, with child DTOs:
- `InstanceAttributeOverrideDto(AttributeName, OverrideValue?, DataType? ElementDataType)`.
- `InstanceAlarmOverrideDto(AlarmCanonicalName, TriggerConfigurationOverride?, PriorityLevelOverride?)`.
- `InstanceNativeAlarmSourceOverrideDto(SourceCanonicalName, ConnectionNameOverride?, SourceReferenceOverride?, ConditionFilterOverride?)`.
- `InstanceConnectionBindingDto(AttributeName, ConnectionName, DataSourceReferenceOverride?)` — note: **`ConnectionName`** (resolved from the source FK at export) replaces the numeric FK, so it is portable.
- `BundleSummary` + manifest `contents[]` extended with `Sites`, `DataConnections`, `Instances` (counts + `dependsOn` edges, e.g. `Instance:Tank01 dependsOn ["Template:Tank","Site:plant-a","DataConnection:plant-a/opc-main"]`).
- **Versioning:** `bundleFormatVersion` stays **1**; `schemaVersion` **1.0 → 1.1** (additive-only; older importers accept minor increments and list the new entity entries as "skipped — unsupported in this version" per the existing forward-compat rule).
### B. Name-mapping subsystem (core of T18)
New Commons type:
```
BundleNameMap(
IReadOnlyList<SiteMapping> Sites,
IReadOnlyList<ConnectionMapping> Connections)
SiteMapping(string SourceSiteIdentifier, MappingAction Action, string? TargetSiteIdentifier)
ConnectionMapping(string SourceSiteIdentifier, string SourceConnectionName, MappingAction Action, string? TargetConnectionName)
enum MappingAction { MapToExisting, CreateNew }
```
- **Detection (PreviewAsync):** after deserialize, collect the distinct source `Site`s and `DataConnection`s referenced by the bundle's instances (via `ConnectionBindings.ConnectionName`, `NativeAlarmSourceOverrides.ConnectionNameOverride`, and `Instance.SiteIdentifier`) **and** the sites/connections carried in the bundle directly. Return them as `RequiredSiteMappings` / `RequiredConnectionMappings` on the preview, each pre-populated with an auto-match: `MapToExisting` when a target `SiteIdentifier` (or `(site,name)` connection) already exists; otherwise `CreateNew`.
- **Wizard Map step:** renders the required mappings as a table; per row a dropdown `{ <existing target> | Create new }`. Connections are grouped under their (mapped) target site.
- **CLI:** `--map-site srcIdentifier=dstIdentifier` (repeatable), `--map-connection srcSiteIdentifier/srcName=dstName` (repeatable); a `--create-missing-sites`/`--create-missing-connections` convenience defaults unmatched rows to `CreateNew` (otherwise unmatched → blocker, preserving fail-safe behaviour for automation).
- **Blockers:** any referenced site/connection that resolves to neither a mapping target nor an in-bundle artifact → blocker row (`BundleImportConnectionUnresolved` / unresolved-site reason). Apply disabled until resolved.
### C. Identity vs conflict (two orthogonal axes)
1. **Identity resolution** — name-map for `Site`/`DataConnection`; `UniqueName` for `Instance`; `Name` for all existing central types. This decides *which* target row an artifact corresponds to.
2. **Conflict resolution**`ResolutionAction{Add,Overwrite,Skip,Rename}` then applies per artifact exactly as today:
- `MapToExisting` + `Skip` → leave target untouched.
- `MapToExisting` + `Overwrite` → apply bundle config to the existing target.
- `CreateNew``Add`.
- `Rename` for instances uses a new `UniqueName`; for sites a new `SiteIdentifier` (rare).
### D. Export flow
- `ExportSelection` gains `SiteIds: IReadOnlyList<int>` + `InstanceIds: IReadOnlyList<int>`.
- `DependencyResolver` expansion (when `IncludeDependencies`):
- selected **Site** → its `DataConnection`s + all its `Instance`s;
- selected **Instance** → its `Site`, its `DataConnection`s (referenced by bindings + native-alarm overrides), and its `Template` (existing template-closure expansion);
- `Instance``Template` → existing shared-script / external-system expansion already covers transitive deps.
- `EntitySerializer.ToBundleContent` resolves each instance's `TemplateId`→name, `SiteId``SiteIdentifier`, `AreaId`→name, and each `ConnectionBinding.DataConnectionId``(siteIdentifier, connectionName)`; carries connection config into `Secrets`.
- Export wizard Step 1 adds a **Sites** group (with nested instances) + a flat **Instances** selector; Step 2 dependency review shows the new edges. `RequireDesign` unchanged.
### E. Import flow + wizard
- Wizard: Upload → Passphrase → **Map** *(new; shown only when the bundle contains site/instance-scoped artifacts)* → Diff & resolve → Confirm → Result.
- `IBundleImporter.PreviewAsync` returns the diff **plus** required mappings. `ApplyAsync(sessionId, resolutions, nameMap, user, ct)` (signature extended with `nameMap`) runs in the existing single EF transaction:
1. Resolve/create target `Site`s per the site-map (CreateNew → full `SiteDto` config; MapToExisting → use target, honour conflict action).
2. Resolve/create target `DataConnection`s per the connection-map (config from `Secrets`).
3. Existing central-config apply (`ApplyTemplatesAsync`, …) — unchanged.
4. Upsert `Instance`s: resolve `TemplateName`→target template id (blocker if unresolved), `SiteIdentifier`→target site id (from map), `AreaName`→area (create-if-missing within target site), `State` forced to `NotDeployed` on import; write overrides; write `ConnectionBinding`s with `ConnectionName`→target `DataConnectionId` (from map) in the second-pass rewire; rewrite `NativeAlarmSourceOverride.ConnectionNameOverride` to the mapped target name.
5. Semantic validation (existing two-tier) — unchanged.
6. `BundleImported` audit + `BundleImportId` correlation (existing) — now also covers the per-entity Site/DataConnection/Instance rows because we route through the audited repository methods.
7. Commit.
- Reuses audited repository paths so per-entity audit rows + `BundleImportId` correlation flow automatically; site-scope (`EnforceSiteScope`) honoured for each target site at import. `RequireAdmin` unchanged.
- **Blockers:** instance whose `TemplateName` resolves in neither bundle nor target; unresolved connection/site references (see B).
### F. T20 — per-line Myers diff
- New pure helper `LineDiffer` (Transport project or Commons/Types/Transport) implementing **Myers O(ND)** LCS over `\n`-split lines (custom, no third-party library — matches the custom-`KpiTrendChart` precedent; central package management discourages new deps).
- Output: structured hunks of `{ Op: Context|Add|Remove, Line, OldLineNo?, NewLineNo? }`, **size-capped** by `TransportOptions` (`MaxDiffLines` default ~400, `MaxDiffHunks`); beyond the cap, emit a `truncated:true` marker + a final `<+N / -M more lines>` summary so very large scripts don't bloat `FieldDiffJson`.
- `ArtifactDiff.AddCodeChangeIfDifferent` (and the script-children path) replace the `<N lines>` marker with the hunk diff embedded in `FieldDiffJson` for `TemplateScript.Code`, `SharedScript.Code`, `ApiMethod.Script` (and `ExternalSystemMethod` body if present). Non-code fields keep coarse value diffs.
- Import wizard Modified rows render the `+/-` line view (fulfils the spec's Step-3 "shows `+/-/~` line diff" promise that the implementation never met). bUnit + Playwright coverage for the rendered diff.
### G. #16 — real stale-instance enumeration
- `ImportResult.StaleInstanceIds` becomes real. Inside `ApplyAsync`, before commit: for every **Template Overwritten** by the import, enumerate existing **deployed** target instances of that template and compute hash drift (reuse the flatten + `DeployedConfigSnapshot.RevisionHash` comparison the Deployments page already uses; called inside the open transaction against staged changes, or post-flush pre-commit). Return the drifting instance ids.
- Directly-imported instances land `NotDeployed` → surfaced as **new** in the result (not "stale"; stale = deployed-and-drifted).
- Step 4 "N instances will become stale" shows the real count; Step 5's existing Deployments deep-link gets real ids. UI already wired — this fills it with data.
## Error handling (additions to the existing table)
| Where | Failure | Surfaced as |
|---|---|---|
| Map | Source site referenced, no auto-match, not mapped, `CreateNew` not chosen | Blocker row: "Site '{id}' not mapped to a target" |
| Map | Connection referenced by an instance binding, unmapped & not in bundle | Blocker row (`BundleImportConnectionUnresolved`) |
| Apply | Instance's `TemplateName` resolves nowhere | Blocker row before Apply enabled |
| Apply | Create-new site/connection but config absent (unencrypted bundle stripped) | Created with available fields; warning event; operator completes config post-import |
All-or-nothing per bundle (unchanged): the EF transaction wraps site/connection/instance writes + audit; rollback removes everything.
## Security
- DataConnection endpoints + credentials ride the **encrypted** `SecretsBlock`; unencrypted export still trips `UnencryptedBundleExport`. Diffs are presence-only for secrets (never echo endpoints/creds).
- Import remains `RequireAdmin`; site-scope enforced per target site.
- No new HTTP channel; bundle bytes ride the existing base64 `/management` envelope (already raised to 200 MB).
## Authorization (unchanged)
| Operation | Role |
|---|---|
| Export page / `bundle export` (incl. sites/instances) | `RequireDesign` |
| Import page / `bundle preview`+`import` (incl. sites/instances) | `RequireAdmin` |
## Audit
- Reuse existing per-entity audited repository methods for Site/DataConnection/Instance create/update so rows carry `BundleImportId`.
- New warning events (mirroring existing `BundleImportAlarmScriptUnresolved` / `…CompositionUnresolved`): `BundleImportConnectionUnresolved`, `BundleImportSiteUnresolved` (when a non-blocking soft-reference can't bind).
## Testing strategy
- **Unit:** `LineDiffer` (Myers correctness + size-cap + edge cases: empty, identical, all-add, all-remove, CRLF); `BundleNameMap` auto-match + CLI flag parsing; `EntitySerializer` round-trip for `SiteDto`/`DataConnectionDto`/`InstanceDto` (incl. connection-name resolution); `ArtifactDiff` Site/Connection/Instance compare.
- **Integration:** export a site (+connections+instances) → import to a fresh target with a name-map (create-new) and to a populated target (map-to-existing + Overwrite/Skip); FK remapping of `ConnectionBinding`/`NativeAlarmSourceOverride`; blocker on unresolved template/connection; `#16` stale enumeration (overwrite a template that has deployed instances → assert returned ids drift); rollback leaves nothing partial.
- **bUnit:** Map step rendering + auto-match; Modified row `+/-` diff render.
- **Playwright:** export-with-site → import-with-mapping happy path; diff line view.
- **Targeted builds/tests per task; full-solution build + `bash docker/deploy.sh` rebuild + live smoke at integration.**
## Forward compatibility
- Older importers (schemaVersion 1.0) reading a 1.1 bundle: unknown `Site`/`DataConnection`/`Instance` entries listed as "skipped — unsupported in this version" (existing rule). Central-config artifacts still import.
- `bundleFormatVersion` unchanged (1) — no hard refusal.
## Execution plan (waves — for writing-plans)
- **Wave A (format + name-map foundation):** Commons DTOs/types (`SiteDto`/`DataConnectionDto`/`InstanceDto` + children, `BundleNameMap`, extended `ExportSelection`/`BundleSummary`/`ManifestContentEntry` usage, extended `ImportResult`/preview required-mappings), `LineDiffer` pure helper. *(high-risk: data contracts; standard: LineDiffer)*
- **Wave B (export):** `DependencyResolver` site/instance expansion; `EntitySerializer` site/connection/instance mapping (FK→name); `ManifestBuilder`/summary counts; `ExportSelection` wiring + `ManagementActor.HandleExportBundle` name resolution; CLI `bundle export` site/instance flags. *(high-risk)*
- **Wave C (diff + preview):** `ArtifactDiff` Myers integration (T20) + Site/Connection/Instance compare; `BundleImporter.PreviewAsync` required-mapping detection + blocker scan extension. *(standard/high-risk)*
- **Wave D (apply + name-map + #16):** `BundleImporter.ApplyAsync` `nameMap` param + site/connection resolve-or-create + instance upsert + FK rewire; `#16` stale enumeration; `ManagementActor.HandleImportBundle`/`PreviewBundle` plumbing; CLI `bundle import` `--map-*` flags. *(high-risk; serial reviews)*
- **Wave E (UI):** Export wizard Sites/Instances selection; Import wizard Map step + Modified `+/-` diff render. *(standard)*
- **Integration:** docs (Component-Transport.md, Component-CLI.md, README, stillpending.md, completion-design.md), full build, docker rebuild, Playwright, live smoke, end-to-end cross-cluster trace.
## Open items / risks
- The name-map detection must be exhaustive: every place a transported instance points at a connection (`ConnectionBinding`, `NativeAlarmSourceOverride`) and at a site must be covered, or an import silently drops a binding. Integration tests assert FK remap explicitly.
- `#16` hash-drift computation inside the open transaction must use the post-staged template state; verify it doesn't double-count directly-imported (NotDeployed) instances.
- `Area` create-if-missing is the lightest reasonable choice; if areas grow fields later, revisit.
## Next step
Hand off to writing-plans to produce the bite-sized, task-metadata'd implementation plan + `.tasks.json`, then execute subagent-driven in this worktree.