Upcoming normalization candidates

Backlog of cross-cutting concerns that are candidates for the component-normalization → shared-library treatment already applied to Auth (components/auth/ZB.MOM.WW.Auth) and UI Theme (components/ui-theme/ZB.MOM.WW.Theme).

Fit criteria (what made Auth and UI-Theme good fits): re-implemented in ≥2 of the three sister repos · a stable common core that can be extracted · per-project specifics that should stay per-project · drift that causes real bugs or blocks fleet-wide work.

Findings below are from a code scan of OtOpcUa, MxAccessGateway, and ScadaBridge on 2026-06-01. Re-verify against current code before starting a pass.

Scan summary

| Concern | OtOpcUa | MxAccessGateway | ScadaBridge | Fit |
| --- | --- | --- | --- | --- |
| Health checks | 3-tier ready/active/healthz + Database/AkkaCluster/leader probes | liveness only (/health/live) | 3-tier ready/active + Database/AkkaCluster/ActiveNode | Best |
| Observability / telemetry | OTel + Prometheus exporter | hand-rolled System.Diagnostics.Metrics, no exporter | OpenTelemetry.Api only, no instrumentation | Strong |
| Audit logging | Akka broadcast → singleton batcher → ConfigAuditLog | SQLite, API-key denials only | Large Akka site/central pipeline (~3k LOC) | Strong (model) |
| Logging | Serilog | MS.Extensions.Logging + correlation scope | Serilog + enrichers | Good (needs convergence) |
| Config + validation | IOptions + DraftValidator | IOptions + ~330-LOC validator | IOptions + IValidateOptions + StartupValidator | Good |
| gRPC contracts / clients | consumes MxGateway proto pkg | owns the 3 .proto (the break surface) | own sitestream.proto + channel/keepalive factory | Strategic |
| Result / error types | BrowseResult etc. | exc→RpcException map | custom Result<T> | Low (bikeshed) |
| Hosting/startup · bg services · resilience | role-driven Akka host | 2-process, modular DI | role-branching Akka host | Low / fold-in |

Ranked recommendations

1. Health checks — ZB.MOM.WW.Health (do first)

Clearest duplication after Theme. OtOpcUa and ScadaBridge have near-identical three-tier health (/health/ready, /health/active, /healthz) with the same probe set (DatabaseHealthCheck, AkkaClusterHealthCheck, an active/leader check) — both descend from the same "ScadaLink three-tier pattern" (OtOpcUa's Health/HealthEndpoints.cs comment says so). MxGateway has only bare liveness (GatewayApplication.cs). Two diverging copies of a shared ancestor + one under-built = textbook fit.

  • Extract: the tier conventions (ready/active/healthz semantics + tagging) and reusable probes (Database, AkkaCluster, leader/active, gRPC-dependency).
  • Per-project: which probes each app registers; orchestrator wiring.
  • Evidence: OtOpcUa src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs; ScadaBridge src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck}.cs + HealthMonitoring/; MxGateway src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs.
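The tier conventions above could be expressed with ASP.NET Core's built-in health-check tags. The following is a minimal sketch, not the ZB.MOM.WW.Health API: the `HealthTiers` class, the tag names, the assignment of probes to tiers, and the placeholder lambda probes are all illustrative assumptions. Only the endpoint paths come from the scan.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: names and tier-to-probe mapping are assumptions.
public static class HealthTiers
{
    public const string Ready = "ready";
    public const string Active = "active";

    // Endpoint options that run only the checks carrying the given tag.
    public static HealthCheckOptions ForTag(string tag) =>
        new HealthCheckOptions { Predicate = registration => registration.Tags.Contains(tag) };

    // Liveness: run no checks, so the endpoint only proves the process responds.
    public static HealthCheckOptions LivenessOnly() =>
        new HealthCheckOptions { Predicate = _ => false };

    // Placeholder probes; a real app would register its own
    // DatabaseHealthCheck / AkkaClusterHealthCheck style classes here.
    public static IHealthChecksBuilder AddPlaceholderProbes(this IServiceCollection services) =>
        services.AddHealthChecks()
            .AddCheck("database", () => HealthCheckResult.Healthy(), tags: new[] { Ready })
            .AddCheck("akka-cluster", () => HealthCheckResult.Healthy(), tags: new[] { Ready })
            .AddCheck("active-node", () => HealthCheckResult.Healthy(), tags: new[] { Active });

    // Maps the three tier endpoints named in the scan.
    public static void MapHealthTiers(this IEndpointRouteBuilder endpoints)
    {
        endpoints.MapHealthChecks("/health/ready", ForTag(Ready));
        endpoints.MapHealthChecks("/health/active", ForTag(Active));
        endpoints.MapHealthChecks("/healthz", LivenessOnly());
    }
}
```

Under these assumptions an app would call `builder.Services.AddPlaceholderProbes()` before `builder.Build()` and `app.MapHealthTiers()` after it. Which probes it registers stays a per-project decision, as the split above suggests.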

2. Observability — ZB.MOM.WW.Telemetry

Three different, incompatible approaches mean you can't scrape or dashboard the fleet uniformly.

  • Extract: Meter/ActivitySource naming conventions + OTel + Prometheus-exporter bootstrap + a standard SCADA metric set.
  • Per-project: the actual instruments each app emits.
  • Pairs with Health into one "operability" cluster.
  • Evidence: OtOpcUa src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs (+ Commons OtOpcUaTelemetry); MxGateway src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs (~470 LOC, no exporter); ScadaBridge OpenTelemetry.Api dep only (no instrumentation).
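One way to picture the naming-convention half of the extraction is a small helper that gives every app's Meter and ActivitySource the same name shape. This is a sketch under stated assumptions: the `ZB.MOM.WW` prefix rule, the `TelemetryNames` and `ComponentTelemetry` types, and the version string are hypothetical and are not taken from ZB.MOM.WW.Telemetry.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical convention: every instrument source is named "<prefix>.<component>".
public static class TelemetryNames
{
    public const string Prefix = "ZB.MOM.WW"; // assumed prefix

    public static string ForComponent(string component) => $"{Prefix}.{component}";
}

// Bundles the Meter and ActivitySource for one component under the same name,
// so a fleet-wide exporter bootstrap can subscribe to both by convention.
public sealed class ComponentTelemetry : IDisposable
{
    public Meter Meter { get; }
    public ActivitySource ActivitySource { get; }

    public ComponentTelemetry(string component, string version)
    {
        var name = TelemetryNames.ForComponent(component);
        Meter = new Meter(name, version);
        ActivitySource = new ActivitySource(name, version);
    }

    public void Dispose()
    {
        Meter.Dispose();
        ActivitySource.Dispose();
    }
}
```

Each app would still create its own instruments from `Meter` (for example with `Meter.CreateCounter<long>(...)`), which matches the per-project split above. A shared bootstrap could then subscribe to the agreed names in one place.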

3. Audit — shared event model + writer seam DONE

All three audit; the who-did-what record (actor / action / target / time / correlationId / detailsJson) is genuinely common, and it closes the loop on the Auth component (audit's "who" = identity). Transport differs (Akka cluster vs SQLite vs none), so extract only the model + IAuditWriter + redaction-filter seam; leave transport per-project. Scope tightly — ScadaBridge's pipeline is ~3k LOC.

  • Evidence: OtOpcUa Commons/Messages/Audit/AuditEvent.cs + ControlPlane/Audit/AuditWriterActor.cs + Configuration/Entities/ConfigAuditLog.cs; ScadaBridge src/ZB.MOM.WW.ScadaBridge.AuditLog/ (site + central) + Commons/Entities/Audit/; MxGateway Security/Authentication/SqliteApiKeyAuditStore.cs.
  • Delivered: shared library built at ZB.MOM.WW.Audit/ (1 nupkg @ 0.1.0); design at components/audit/; adoption backlog in components/audit/GAPS.md.
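The model + writer + redaction seam can be illustrated roughly as follows. This is a sketch only. The field list comes from the description above, and `IAuditWriter` is named there, but every signature here, along with `IAuditRedactor`, `RedactingAuditWriter`, `DropDetailsRedactor`, and `InMemoryAuditWriter`, is a hypothetical shape rather than the delivered ZB.MOM.WW.Audit API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// The common who-did-what record; field names follow the list in the text.
public sealed record AuditEvent(
    string Actor,
    string Action,
    string Target,
    DateTimeOffset Time,
    string CorrelationId,
    string? DetailsJson);

// Transport seam: each project supplies its own implementation (Akka, SQLite, ...).
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default);
}

// Redaction seam (hypothetical name): applied before an event reaches any transport.
public interface IAuditRedactor
{
    AuditEvent Redact(AuditEvent auditEvent);
}

// Decorator that redacts, then delegates to the project-specific writer.
public sealed class RedactingAuditWriter : IAuditWriter
{
    private readonly IAuditWriter _inner;
    private readonly IAuditRedactor _redactor;

    public RedactingAuditWriter(IAuditWriter inner, IAuditRedactor redactor)
    {
        _inner = inner;
        _redactor = redactor;
    }

    public Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default) =>
        _inner.WriteAsync(_redactor.Redact(auditEvent), cancellationToken);
}

// Simplest possible redactor for illustration: drops the details payload entirely.
public sealed class DropDetailsRedactor : IAuditRedactor
{
    public AuditEvent Redact(AuditEvent auditEvent) => auditEvent with { DetailsJson = null };
}

// Stand-in transport that collects events in memory.
public sealed class InMemoryAuditWriter : IAuditWriter
{
    public List<AuditEvent> Events { get; } = new();

    public Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default)
    {
        Events.Add(auditEvent);
        return Task.CompletedTask;
    }
}
```

Keeping redaction in a decorator means the transport implementations never see unredacted details, whichever pipeline a project keeps.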

Strategic — the gRPC .proto break surface

CLAUDE.md names these as the cross-repo break surface ("a green build in one repo does not prove the others still interoperate"). Less a code library than a versioned contract package + interop tests (partly done — OtOpcUa already consumes ZB.MOM.WW.MxGateway.Client). Worth a dedicated pass focused on contract versioning and cross-repo interop checks, distinct from the others.

  • Surface: MxGateway src/ZB.MOM.WW.MxGateway.Contracts/Protos/{mxaccess_gateway,mxaccess_worker,galaxy_repository}.proto; ScadaBridge Communication/Protos/sitestream.proto.

Tier 2 (good, with caveats)

  • Logging — ZB.MOM.WW.Logging: strong overlap (Serilog bootstrap + enrichers SiteId/NodeRole/Host + correlation scope), but MxGateway uses MS.Extensions.Logging, so step 1 is converging on Serilog. Natural to bundle with Telemetry as "observability". Note: the logging work is being delivered as ZB.MOM.WW.Telemetry.Serilog by the health/observability normalization pass (components/health/), not as a standalone ZB.MOM.WW.Logging lib, so a separate Logging candidate is not expected.
  • Config validation conventions: Done. ZB.MOM.WW.Configuration built @ 0.1.0 (1 package, 27 tests): OptionsValidatorBase + ValidationBuilder primitives + AddValidatedOptions (ValidateOnStart) + pre-host ConfigPreflight (generalizes ScadaBridge's StartupValidator). Design: components/configuration/; implementation: ../ZB.MOM.WW.Configuration/. Adoption tracked in components/configuration/GAPS.md.
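For readers unfamiliar with the underlying pattern, this is roughly what IOptions + IValidateOptions + ValidateOnStart looks like with only the stock Microsoft.Extensions.Options APIs. It deliberately does not use OptionsValidatorBase, ValidationBuilder, or AddValidatedOptions, whose signatures are not shown in this document. `GatewayOptions`, its properties, and the `"Gateway"` section name are hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// Hypothetical options type used only for illustration.
public sealed class GatewayOptions
{
    public string? Endpoint { get; set; }
    public int KeepAliveSeconds { get; set; }
}

// Collects every failure instead of stopping at the first one.
public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
{
    public ValidateOptionsResult Validate(string? name, GatewayOptions options)
    {
        var failures = new List<string>();

        if (string.IsNullOrWhiteSpace(options.Endpoint))
            failures.Add("Endpoint is required.");
        if (options.KeepAliveSeconds <= 0)
            failures.Add("KeepAliveSeconds must be positive.");

        return failures.Count == 0
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(failures);
    }
}

public static class GatewayOptionsRegistration
{
    // Binds the (assumed) "Gateway" section and validates when the host starts,
    // rather than on first use of the options.
    public static IServiceCollection AddGatewayOptions(this IServiceCollection services)
    {
        services.AddSingleton<IValidateOptions<GatewayOptions>, GatewayOptionsValidator>();
        services.AddOptions<GatewayOptions>()
            .BindConfiguration("Gateway")
            .ValidateOnStart();
        return services;
    }
}
```

Validating at startup makes a bad configuration fail the deployment immediately instead of surfacing at the first request that touches the options.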

Skip / defer

  • Result/error primitives — trivially shareable but low-stakes and bikeshed-prone.
  • Hosting/startup, background services, resilience — bodies are too Akka-/app-specific; resilience even splits Polly (OtOpcUa, MxGateway) vs Akka supervision (ScadaBridge). A ZB.MOM.WW.Hosting aggregator makes sense only after Health/Telemetry/Logging exist, to bundle them behind one AddZbDefaults()-style call.

Suggested order

  1. Health checks (cleanest duplication, bounded). Done: ZB.MOM.WW.Health built @ 0.1.0 (3 packages, 58 tests). Design: components/health/; implementation: ../ZB.MOM.WW.Health/. Adoption tracked in components/health/GAPS.md.
  2. Observability/telemetry (completes the operability cluster with Health). Done: ZB.MOM.WW.Telemetry built @ 0.1.0 (2 packages, 19 tests). Design: components/observability/; implementation: ../ZB.MOM.WW.Telemetry/. MxAccessGateway MEL → Serilog logging adoption done on its own branch. Broader OtOpcUa/ScadaBridge telemetry adoption tracked in components/observability/GAPS.md.
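  3. Audit model (ties back to Auth). Done: ZB.MOM.WW.Audit built @ 0.1.0 (1 package). Design: components/audit/. Adoption tracked in components/audit/GAPS.md. Then revisit the gRPC contract surface and the Hosting aggregator.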