diff --git a/upcoming.md b/upcoming.md
new file mode 100644
index 0000000..b7aac1e
--- /dev/null
+++ b/upcoming.md
@@ -0,0 +1,87 @@
+# Upcoming normalization candidates
+
+Backlog of cross-cutting concerns that are candidates for the
+[component-normalization](components/README.md) → shared-library treatment already applied to
+**Auth** ([`components/auth/`](components/auth/) → `ZB.MOM.WW.Auth`) and **UI Theme**
+([`components/ui-theme/`](components/ui-theme/) → `ZB.MOM.WW.Theme`).
+
+**Fit criteria** (what made Auth and UI Theme good fits): re-implemented in ≥2 of the three
+sister repos · a stable common core that can be extracted · per-project specifics that should
+*stay* per-project · drift that causes real bugs or blocks fleet-wide work.
+
+> Findings below are from a code scan of OtOpcUa, MxAccessGateway ("MxGateway" below), and
+> ScadaBridge on 2026-06-01. Re-verify against current code before starting a pass.
+
+## Scan summary
+
+| Concern | OtOpcUa | MxAccessGateway | ScadaBridge | Fit |
+|---|---|---|---|---|
+| **Health checks** | 3-tier `ready/active/healthz` + Database/AkkaCluster/leader probes | liveness only (`/health/live`) | 3-tier `ready/active` + Database/AkkaCluster/ActiveNode | ⭐ Best |
+| **Observability / telemetry** | OTel + Prometheus exporter | hand-rolled `System.Diagnostics.Metrics`, no exporter | `OpenTelemetry.Api` only, no instrumentation | ⭐ Strong |
+| **Audit logging** | Akka broadcast → singleton batcher → `ConfigAuditLog` | SQLite, API-key denials only | Large Akka site/central pipeline (~3k LOC) | ⭐ Strong (model) |
+| **Logging** | Serilog | **MS.Extensions.Logging** + correlation scope | Serilog + enrichers | Good (needs convergence) |
+| **Config + validation** | IOptions + `DraftValidator` | IOptions + ~330-LOC validator | IOptions + `IValidateOptions` + `StartupValidator` | Good |
+| **gRPC contracts / clients** | consumes MxGateway proto pkg | **owns the 3 `.proto` (the break surface)** | own `sitestream.proto` + channel/keepalive factory | Strategic |
+| Result / error types | `BrowseResult` etc. | exception → `RpcException` mapping | custom `Result` | Low (bikeshed) |
+| Hosting/startup · bg services · resilience | role-driven Akka host | 2-process, modular DI | role-branching Akka host | Low / fold-in |
+
+## Ranked recommendations
+
+### 1. Health checks — `ZB.MOM.WW.Health` (do first)
+Clearest duplication after Theme. OtOpcUa and ScadaBridge have **near-identical** three-tier
+health (`/health/ready`, `/health/active`, `/healthz`) with the same probe set
+(`DatabaseHealthCheck`, `AkkaClusterHealthCheck`, an active/leader check) — both descend from
+the same "ScadaLink three-tier pattern" (a comment in OtOpcUa's `Health/HealthEndpoints.cs` says so).
+MxGateway has only bare liveness (`GatewayApplication.cs`). Two diverging copies of a shared
+ancestor + one under-built = textbook fit.
+- **Extract:** the tier conventions (ready/active/healthz semantics + tagging) and reusable probes
+  (Database, AkkaCluster, leader/active, gRPC-dependency).
+- **Per-project:** which probes each app registers; orchestrator wiring.
+- Evidence: OtOpcUa `src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs`; ScadaBridge `src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck}.cs` + `HealthMonitoring/`; MxGateway `src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`.
+
+### 2. Observability — `ZB.MOM.WW.Telemetry`
+Three different, incompatible approaches mean you can't scrape or dashboard the fleet uniformly.
+- **Extract:** `Meter`/`ActivitySource` naming conventions + OTel + Prometheus-exporter bootstrap +
+  a standard SCADA metric set.
+- **Per-project:** the actual instruments each app emits.
+- Pairs with Health into one "operability" cluster.
+- Evidence: OtOpcUa `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs`
+  (+ `Commons` `OtOpcUaTelemetry`); MxGateway `src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs`
+  (~470 LOC, no exporter); ScadaBridge `OpenTelemetry.Api` dep only (no instrumentation).
+
+### 3. Audit — shared event *model* + writer seam
+All three repos audit; the who-did-what record (actor / action / target / time / correlationId /
+detailsJson) is genuinely common, and it **closes the loop on the Auth component** (audit's
+"who" = identity). Transport differs (Akka cluster vs SQLite vs none), so extract only the
+**model + `IAuditWriter` + redaction-filter seam**; leave transport per-project. Scope tightly —
+ScadaBridge's pipeline is ~3k LOC.
+- Evidence: OtOpcUa `Commons/Messages/Audit/AuditEvent.cs` + `ControlPlane/Audit/AuditWriterActor.cs` +
+  `Configuration/Entities/ConfigAuditLog.cs`; ScadaBridge `src/ZB.MOM.WW.ScadaBridge.AuditLog/`
+  (site + central) + `Commons/Entities/Audit/`; MxGateway `Security/Authentication/SqliteApiKeyAuditStore.cs`.
+
+### Strategic — the gRPC `.proto` break surface
+[`CLAUDE.md`](CLAUDE.md) names these `.proto` files as *the* cross-repo break surface ("a green build in one
+repo does not prove the others still interoperate"). Less a code library than a **versioned
+contract package + interop tests** (partly done — OtOpcUa already consumes
+`ZB.MOM.WW.MxGateway.Client`). Worth a dedicated pass focused on contract versioning and
+cross-repo interop checks, distinct from the others.
+- Surface: MxGateway `src/ZB.MOM.WW.MxGateway.Contracts/Protos/{mxaccess_gateway,mxaccess_worker,galaxy_repository}.proto`; ScadaBridge `Communication/Protos/sitestream.proto`.
+
+### Tier 2 (good, with caveats)
+- **Logging — `ZB.MOM.WW.Logging`:** strong overlap (Serilog bootstrap + enrichers SiteId/NodeRole/Host +
+  correlation scope), but MxGateway uses MS.Extensions.Logging, so step 1 is converging it onto Serilog.
+  Natural to bundle with Telemetry as "observability".
+- **Config validation conventions:** all three use IOptions + `IValidateOptions` + `ValidateOnStart`;
+  a shared validation base + startup-validation helper is reusable and pairs with the Auth options pattern.
+
+## Skip / defer
+- **Result/error primitives** — trivially shareable but low-stakes and bikeshed-prone.
+- **Hosting/startup, background services, resilience** — bodies are too Akka-/app-specific; resilience
+  even splits Polly (OtOpcUa, MxGateway) vs Akka supervision (ScadaBridge). A `ZB.MOM.WW.Hosting`
+  *aggregator* makes sense only **after** Health/Telemetry/Logging exist, to bundle them behind one
+  `AddZbDefaults()`-style call.
+
+## Suggested order
+1. **Health checks** (cleanest duplication, bounded).
+2. **Observability/telemetry** (completes the operability cluster with Health).
+3. **Audit model** (ties back to Auth). After these three, revisit the gRPC contract surface and the `Hosting` aggregator.