docs: implementation plans for ZB.MOM.WW.Health + ZB.MOM.WW.Telemetry
Two TDD plans (one per library, per house precedent) derived from the approved design, with co-located .tasks.json execution-persistence:

- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals + Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the executing-plans workflow.
@@ -0,0 +1,265 @@
# ZB.MOM.WW.Health Shared Library Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Author the `components/health/` normalization docs and build the `ZB.MOM.WW.Health` shared library set (3 dependency-split NuGet packages) that normalizes the three-tier health pattern (`/health/ready` · `/health/active` · `/healthz`) and its reusable probes, so OtOpcUa, MxAccessGateway, and ScadaBridge can stop re-implementing health checks and MxGateway can gain readiness/active tiers it lacks today.
**Architecture:** A new standalone nested repo (`~/Desktop/scadaproj/ZB.MOM.WW.Health`), .NET 10, three library projects — `ZB.MOM.WW.Health` (core: tier mapping, JSON writer, `IActiveNodeGate`, gRPC-dependency probe; depends only on ASP.NET Core HealthChecks abstractions), `ZB.MOM.WW.Health.Akka` (Akka cluster + leader/active probes; opt-in Akka dep), `ZB.MOM.WW.Health.EntityFrameworkCore` (`DatabaseHealthCheck<TContext>`; opt-in EF dep). Heavy deps live in satellites so MxGateway (non-Akka, non-EF) references core only. Reference implementations are lifted and generalized from OtOpcUa and ScadaBridge. **Consumer adoption (wiring into the three apps) is a separate follow-on `GAPS.md` item — this plan delivers docs + library + tests + packages only**, matching where Auth/Theme sit.
**Tech Stack:** .NET 10, C#; xUnit + coverlet; `Microsoft.AspNetCore.Mvc.Testing` (WebApplicationFactory); `Microsoft.Extensions.Diagnostics.HealthChecks`; `AspNetCore.HealthChecks.UI.Client` (response-writer style); `Akka.Cluster` 1.5.62; `Microsoft.EntityFrameworkCore` 10.0.7 (+ `.Sqlite`/`.InMemory` for tests); `Grpc.Net.Client`; central package management (`Directory.Packages.props`); `.slnx` solution; `Version` 0.1.0 lockstep.
**Source references (read-only, to port/generalize from):**
- Three-tier + probes: OtOpcUa `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs`
- Probes + gate + JSON writer: ScadaBridge `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck,ActiveNodeGate}.cs` and `Program.cs:114-117,222-233`
- The gap to fill: MxGateway `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:61,139-145`
- Design: `~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md`
**Conventions for every task:** TDD (@superpowers-extended-cc:test-driven-development) — failing test first, minimal impl, green, commit. File-scoped namespaces, `sealed` by default, `Async` suffix on Task-returning methods. Commit after each green task. The `Files:` block of each task is the `files_to_edit` contract — touch exactly those paths; if more are needed, that's a plan defect to surface.
---
## Phase 0 — Normalization docs (spec drives the API)
### Task 1: components/health spec + shared-contract
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 2
**Files:**
- Create: `components/health/spec/SPEC.md`
- Create: `components/health/shared-contract/ZB.MOM.WW.Health.md`
**Steps:**
1. `spec/SPEC.md` — follow `components/auth/spec/SPEC.md` shape. Section 0 **Scope**: normalized = tier semantics (`ready`/`active`/`live`), canonical tag set, JSON response shape, configurable Akka status policy, role-filtered active/leader probe, `DatabaseHealthCheck<T>`, gRPC-dependency probe, `IActiveNodeGate`. Explicitly NOT normalized = which probes each app registers, Traefik/orchestrator wiring, ScadaBridge's `HealthMonitoring/` domain aggregation pipeline. Then a section per: tier table, probe catalog (with the two real Akka-policy variants + the role-filter unification), response-writer contract.
2. `shared-contract/ZB.MOM.WW.Health.md` — the paper public API of the 3 packages: `MapZbHealth`, `AddZbHealthChecks`, `ZbHealthTags`, `IActiveNodeGate`, `GrpcDependencyHealthCheck`(options), `AkkaClusterHealthCheck`(+`AkkaClusterStatusPolicy`), `ActiveNodeHealthCheck`(+role option), `DatabaseHealthCheck<TContext>`(+probe delegate). Valid C# signatures, package-by-package, like `components/auth/shared-contract/ZB.MOM.WW.Auth.md`.
**Acceptance:** Both files exist; `SPEC.md` has an explicit normalized-vs-per-project Section 0; the shared-contract reads as valid C# signatures on inspection (no behavior, just the surface). No tests (docs).
---
### Task 2: components/health current-state ×3 + GAPS + README
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1
**Files:**
- Create: `components/health/current-state/otopcua/CURRENT-STATE.md`
- Create: `components/health/current-state/mxaccessgw/CURRENT-STATE.md`
- Create: `components/health/current-state/scadabridge/CURRENT-STATE.md`
- Create: `components/health/GAPS.md`
- Create: `components/health/README.md`
**Steps:**
1. Transcribe the code-verified scan from the design doc's "Health" current-state table + key refs into the three `CURRENT-STATE.md` files at full `file:line` depth (re-verify each ref against the live repo). Each ends in an **Adoption plan** (what that app deletes/replaces to reach the spec). MxGateway's plan = "register the tier mapping + a worker gRPC-dependency probe; today `AddHealthChecks()` at `GatewayApplication.cs:61` is unused."
2. `GAPS.md` — per-project divergences + prioritized extraction/adoption backlog. Top entries: MxGateway has no probes/tiers (P1); Akka status-policy divergence (OtOpcUa vs ScadaBridge); DB-probe technique (`CanConnectAsync` vs `Deployments` query); `/healthz` present in OtOpcUa, absent in ScadaBridge.
3. `README.md` — overview + per-project status table (like `components/auth/README.md`).
**Acceptance:** Five files exist; each current-state cites real `file:line` refs; `GAPS.md` lists adoption as a follow-on. No tests (docs).
---
## Phase 1 — Scaffold
### Task 3: Create repo, solution, and project shells
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (gates all impl tasks)
**Files:**
- Create: `ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx`
- Create: `ZB.MOM.WW.Health/Directory.Build.props`
- Create: `ZB.MOM.WW.Health/Directory.Packages.props`
- Create: `ZB.MOM.WW.Health/.gitignore`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ZB.MOM.WW.Health.Akka.csproj`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.EntityFrameworkCore/ZB.MOM.WW.Health.EntityFrameworkCore.csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Tests/…csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Akka.Tests/…csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/…csproj`
**Steps:**
1. `cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Health && cd ZB.MOM.WW.Health && git init && dotnet new gitignore`
2. `dotnet new sln -n ZB.MOM.WW.Health --format slnx` (fallback to `.sln` if the SDK rejects `.slnx`).
3. `dotnet new classlib -f net10.0 -o src/<Name>` for the 3 libs; `dotnet new xunit -f net10.0 -o tests/<Name>.Tests` for the 3 test projects. Delete default `Class1.cs`/`UnitTest1.cs`.
4. Project refs: `.Akka` & `.EntityFrameworkCore` → core `ZB.MOM.WW.Health`; each test project → its lib (core test also refs `Microsoft.AspNetCore.Mvc.Testing`).
5. Copy `Directory.Build.props` verbatim from `ZB.MOM.WW.Auth/Directory.Build.props` (net10.0, Nullable, ImplicitUsings, LangVersion latest, Version 0.1.0, central PM).
6. `Directory.Packages.props`: pin `Microsoft.Extensions.Diagnostics.HealthChecks` 10.0.7, `Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions` 10.0.7, `AspNetCore.HealthChecks.UI.Client` 9.0.0, `Akka.Cluster` 1.5.62, `Akka.TestKit.Xunit2` 1.5.62, `Microsoft.EntityFrameworkCore` 10.0.7, `Microsoft.EntityFrameworkCore.Sqlite` 10.0.7, `Microsoft.EntityFrameworkCore.InMemory` 10.0.7, `Grpc.Net.Client` (latest 2.x), test pkgs (`Microsoft.NET.Test.Sdk` 17.14.1, `xunit` 2.9.3, `xunit.runner.visualstudio` 3.1.4, `coverlet.collector` 6.0.4, `Microsoft.AspNetCore.Mvc.Testing` 10.0.7). Do not pin `Microsoft.AspNetCore.Http.Abstractions`/`Microsoft.AspNetCore.Routing`: the core lib gets them through `<FrameworkReference Include="Microsoft.AspNetCore.App"/>` rather than individual ASP.NET packages.
7. `dotnet sln add` all 6 projects; `dotnet build`.
8. **Commit:** `git add -A && git commit -m "chore: scaffold ZB.MOM.WW.Health solution and projects"`
**Acceptance:** `dotnet build` green; 6 projects in the solution.
---
## Phase 2 — Core package (`ZB.MOM.WW.Health`)
### Task 4: Canonical tags + `MapZbHealth` tier mapping
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (Tasks 5-7 build on this)
**Files:**
- Create: `src/ZB.MOM.WW.Health/ZbHealthTags.cs`
- Create: `src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/TierMappingTests.cs`
**Step 1 — failing test:** `WebApplicationFactory`-based test boots a minimal app that registers a fake check tagged `ready`, another tagged `active`, calls `app.MapZbHealth()`, then asserts: `GET /health/ready` runs ready-tagged only, `/health/active` runs active-tagged only, `/healthz` returns 200 with no checks executed (predicate `_ => false`). Use a counter check to prove which tiers ran which checks.
**Step 2 — run, expect FAIL** (`MapZbHealth` undefined). `dotnet test --filter TierMappingTests`.
**Step 3 — implement:** `ZbHealthTags` = `public const string Ready="ready", Active="active", Live="live";`. `ZbHealthEndpointExtensions.MapZbHealth(this IEndpointRouteBuilder, ZbHealthEndpointOptions? = null)` maps the three endpoints with predicates `c => c.Tags.Contains(Ready)`, `…Active`, and `_ => false`; each `AllowAnonymous()`; response writer = the Task 5 writer (stub to default `WriteMinimalPlaintext` for now, replaced in Task 5).
**Step 4 — run, expect PASS.**
**Step 5 — commit:** `feat(health): canonical tags + three-tier MapZbHealth`
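
The tier mapping in Step 3 can be sketched roughly as follows. This is a minimal illustration, not the final API: it omits the optional `ZbHealthEndpointOptions` parameter and the response writer, and the `IEndpointRouteBuilder` return type is an assumption. `MapHealthChecks`, `HealthCheckOptions.Predicate`, and `AllowAnonymous` are the standard ASP.NET Core calls.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

namespace ZB.MOM.WW.Health;

public static class ZbHealthTags
{
    public const string Ready = "ready", Active = "active", Live = "live";
}

public static class ZbHealthEndpointExtensions
{
    // Sketch: one endpoint per tier, each selecting checks by tag.
    public static IEndpointRouteBuilder MapZbHealth(this IEndpointRouteBuilder endpoints)
    {
        endpoints.MapHealthChecks("/health/ready", new HealthCheckOptions
        {
            Predicate = c => c.Tags.Contains(ZbHealthTags.Ready),
        }).AllowAnonymous();

        endpoints.MapHealthChecks("/health/active", new HealthCheckOptions
        {
            Predicate = c => c.Tags.Contains(ZbHealthTags.Active),
        }).AllowAnonymous();

        // Liveness: the predicate rejects every registration, so no check runs
        // and the endpoint answers 200 as long as the process can serve HTTP.
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions
        {
            Predicate = _ => false,
        }).AllowAnonymous();

        return endpoints;
    }
}
```

A consumer would register its checks with `AddHealthChecks().AddCheck(..., tags: new[] { ZbHealthTags.Ready })` and then call `app.MapZbHealth()`.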
---
### Task 5: Canonical JSON response writer
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (slots into Task 4's mapper)
**Files:**
- Create: `src/ZB.MOM.WW.Health/ZbHealthResponseWriter.cs`
- Modify: `src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs` (wire writer as default)
- Test: `tests/ZB.MOM.WW.Health.Tests/ResponseWriterTests.cs`
**Step 1 — failing test:** assert `/health/ready` body is JSON with `status`, per-check `name`/`status`/`description`, and `totalDurationMs`; content-type `application/json`; HTTP 200 when Healthy, 503 when any check Unhealthy, 200 when Degraded. Port the shape from ScadaBridge's `UIResponseWriter.WriteHealthCheckUIResponse` usage (`Program.cs:222-233`) but as our own writer (no UI dependency required at runtime — keep `AspNetCore.HealthChecks.UI.Client` optional).
**Step 2 — run, expect FAIL.**
**Step 3 — implement** `ZbHealthResponseWriter.WriteJsonAsync(HttpContext, HealthReport)`; set it as the default `ResponseWriter` in `MapZbHealth`.
**Step 4 — run, expect PASS.**
**Step 5 — commit:** `feat(health): canonical JSON health response writer`
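
One possible shape for the writer, assuming the per-check entries are emitted under a `checks` array (that property name is an assumption; the plan fixes only `status`, `name`/`status`/`description`, and `totalDurationMs`). The writer does not set the status code: the health-check middleware's default `ResultStatusCodes` already map Healthy and Degraded to 200 and Unhealthy to 503, which matches the expectations in Step 1.

```csharp
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

public static class ZbHealthResponseWriter
{
    // Signature matches HealthCheckOptions.ResponseWriter (Func<HttpContext, HealthReport, Task>).
    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";

        var body = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            // "checks" is a hypothetical property name for the per-check list.
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
            }),
        };

        return JsonSerializer.SerializeAsync(
            context.Response.Body, body, cancellationToken: context.RequestAborted);
    }
}
```

Wiring it in is a one-line change per tier: `ResponseWriter = ZbHealthResponseWriter.WriteJsonAsync` on each `HealthCheckOptions`.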
---
### Task 6: `IActiveNodeGate` seam + route gating
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 7
**Files:**
- Create: `src/ZB.MOM.WW.Health/IActiveNodeGate.cs`
- Create: `src/ZB.MOM.WW.Health/ActiveNodeGateEndpointFilter.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/ActiveNodeGateTests.cs`
**Step 1 — failing test:** register a fake `IActiveNodeGate` whose `IsActiveNode` toggles; a route guarded by `.RequireActiveNode()` returns 200 when active, 503 (`Retry-After`) when standby. Port the gate contract from ScadaBridge `ActiveNodeGate.cs:24-57` (generalized — no Akka in core; the implementation is provided by `.Akka` or the consumer).
**Step 2 — FAIL. Step 3 — implement** `IActiveNodeGate { bool IsActiveNode { get; } }` + an `IEndpointConventionBuilder RequireActiveNode(this …)` filter that 503s when not active. **Step 4 — PASS. Step 5 — commit:** `feat(health): IActiveNodeGate seam + RequireActiveNode filter`
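
A minimal sketch of the gate seam and filter, assuming a generic `TBuilder` signature (the plan names `IEndpointConventionBuilder`; the generic form is an illustrative variant) and a fixed `Retry-After` of 5 seconds, which is a placeholder value. It relies on ASP.NET Core's `AddEndpointFilter`, so it applies to minimal-API routes such as `MapGet`.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

namespace ZB.MOM.WW.Health;

public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

public static class ActiveNodeGateEndpointFilter
{
    public static TBuilder RequireActiveNode<TBuilder>(this TBuilder builder)
        where TBuilder : IEndpointConventionBuilder
        => builder.AddEndpointFilter(async (ctx, next) =>
        {
            // Resolved per request so a standby node that becomes active is
            // picked up without restarting the app.
            var gate = ctx.HttpContext.RequestServices.GetRequiredService<IActiveNodeGate>();
            if (!gate.IsActiveNode)
            {
                // Assumed hint value; the real default is a design decision.
                ctx.HttpContext.Response.Headers["Retry-After"] = "5";
                return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
            }

            return await next(ctx);
        });
}
```

Usage would look like `app.MapGet("/work", () => "ok").RequireActiveNode();`, with the Akka package or the consumer registering the `IActiveNodeGate` implementation.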
---
### Task 7: `GrpcDependencyHealthCheck`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 6
**Files:**
- Create: `src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health/GrpcDependencyOptions.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/GrpcDependencyHealthCheckTests.cs`
**Step 1 — failing test:** with a stub probe delegate that resolves a `GrpcChannel`/health RPC, `CheckHealthAsync` returns Healthy on success, Unhealthy on `RpcException`/timeout. Make the actual probe a `Func<GrpcChannel, CancellationToken, Task<bool>>` injected via options so it's testable without a live server (default uses the standard gRPC Health Checking `Health.Check`).
**Step 2 — FAIL. Step 3 — implement.** Tags default `[Ready, Active]`. **Step 4 — PASS. Step 5 — commit:** `feat(health): gRPC dependency health check`
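
The injected-probe design might look like the sketch below. The option names `Address` and `Timeout` are assumptions, and the probe is made `required` here for brevity; the plan's default probe (the standard gRPC `Health.Check` call) is left out. Creating a channel per check is also a simplification, since a real implementation would likely reuse one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

public sealed class GrpcDependencyOptions
{
    public required string Address { get; init; }                 // hypothetical option
    public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(2); // hypothetical default
    public required Func<GrpcChannel, CancellationToken, Task<bool>> Probe { get; init; }
}

public sealed class GrpcDependencyHealthCheck(GrpcDependencyOptions options) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        using var channel = GrpcChannel.ForAddress(options.Address);

        // Bound the probe so a hung dependency cannot stall the health endpoint.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(options.Timeout);

        try
        {
            return await options.Probe(channel, cts.Token)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("gRPC dependency reported not serving");
        }
        catch (Exception ex) when (ex is RpcException or OperationCanceledException)
        {
            return HealthCheckResult.Unhealthy("gRPC dependency unreachable", ex);
        }
    }
}
```

In tests the probe can be a lambda such as `(_, _) => Task.FromResult(true)` or one that throws `RpcException`, so no live server is needed.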
---
## Phase 3 — `ZB.MOM.WW.Health.Akka`
### Task 8: `AkkaClusterHealthCheck` + configurable status policy
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 9, Task 10
**Files:**
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaClusterStatusPolicy.cs`
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs`
- Test: `tests/ZB.MOM.WW.Health.Akka.Tests/AkkaClusterStatusPolicyTests.cs`
**Step 1 — failing test:** table-driven over `MemberStatus` → `HealthStatus` for both presets. `Default` (ScadaBridge `AkkaClusterHealthCheck.cs:29-51`): `Up`/`Joining`→Healthy, `Leaving`/`Exiting`→Degraded, else Unhealthy. `OtOpcUaCompat` (OtOpcUa `AkkaClusterHealthCheck.cs:25-34`): self-`Up`-among-members→Healthy else Degraded. Test the **policy function** directly (pure), no live cluster.
**Step 2 — FAIL. Step 3 — implement** `AkkaClusterStatusPolicy` as a `Func<MemberStatus, HealthStatus>` with `Default`/`OtOpcUaCompat` static presets; `AkkaClusterHealthCheck` reads `Cluster.Get(system).SelfMember.Status` and applies the configured policy. Tags `[Ready, Active]`.
**Step 4 — PASS. Step 5 — commit:** `feat(health.akka): cluster health check with configurable status policy`
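
The two presets from Step 1 can be written as pure functions, which is what makes the table-driven test possible without a live cluster. In this sketch the presets live on a static class; reading `OtOpcUaCompat` as "`Up` is Healthy, anything else Degraded" is an interpretation of the plan's wording, not a port of the OtOpcUa source.

```csharp
using System;
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

public static class AkkaClusterStatusPolicy
{
    // ScadaBridge-style mapping: converging states are healthy, departing
    // states are degraded, everything else (Down, Removed, ...) is unhealthy.
    public static readonly Func<MemberStatus, HealthStatus> Default = status => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };

    // OtOpcUa-style mapping (assumed reading): never Unhealthy, only Up counts as Healthy.
    public static readonly Func<MemberStatus, HealthStatus> OtOpcUaCompat = status =>
        status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
}
```

The health check itself would then reduce to `policy(Cluster.Get(system).SelfMember.Status)`.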
---
### Task 9: `ActiveNodeHealthCheck` with optional role filter
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 8, Task 10
**Files:**
- Create: `src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaActiveNodeGate.cs` (implements core `IActiveNodeGate`)
- Test: `tests/ZB.MOM.WW.Health.Akka.Tests/ActiveNodeHealthCheckTests.cs`
**Step 1 — failing test:** with faked cluster state, role-less → Healthy iff `SelfMember.Status==Up && Leader==self` (ScadaBridge `ActiveNodeHealthCheck.cs:25-44`); with `role="admin"` → Healthy when node lacks the role (OtOpcUa `AdminRoleLeaderHealthCheck.cs:30`), Healthy when admin-leader, Degraded when admin-but-not-leader. Tags `[Active]` only.
**Step 2 — FAIL. Step 3 — implement** both the check and `AkkaActiveNodeGate : IActiveNodeGate` (so `RequireActiveNode` works in Akka apps). **Step 4 — PASS. Step 5 — commit:** `feat(health.akka): active/leader check with role filter + IActiveNodeGate impl`
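
The decision table in Step 1 can be isolated from Akka as a pure function, which keeps the "faked cluster state" test simple. `ActiveNodeDecision` and its parameters are hypothetical names for illustration. The plan does not say what the role-less case returns when the node is not the leader; this sketch assumes Unhealthy so that a standby node fails `/health/active`.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

public static class ActiveNodeDecision
{
    /// <param name="selfIsUp">SelfMember.Status == Up.</param>
    /// <param name="selfIsLeader">Self is the cluster leader, or the role leader when a role is set.</param>
    /// <param name="selfHasRole">Self carries the configured role (ignored when role is null).</param>
    /// <param name="role">Optional role filter, e.g. "admin".</param>
    public static HealthStatus Evaluate(bool selfIsUp, bool selfIsLeader, bool selfHasRole, string? role)
    {
        if (role is null)
        {
            // Role-less: Healthy iff Up and leader. Unhealthy otherwise is an assumption.
            return selfIsUp && selfIsLeader ? HealthStatus.Healthy : HealthStatus.Unhealthy;
        }

        // Nodes outside the role are not candidates, so they do not fail the tier.
        if (!selfHasRole) return HealthStatus.Healthy;

        return selfIsLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```

The real check would feed this from the cluster state, and `AkkaActiveNodeGate.IsActiveNode` could reuse the same inputs.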
---
## Phase 4 — `ZB.MOM.WW.Health.EntityFrameworkCore`
### Task 10: `DatabaseHealthCheck<TContext>`
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 8, Task 9
**Files:**
- Create: `src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheckOptions.cs`
- Test: `tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/DatabaseHealthCheckTests.cs`
**Step 1 — failing test:** SQLite in-memory `DbContext` → Healthy (default probe `CanConnectAsync`, ScadaBridge `DatabaseHealthCheck.cs:27-42`); a deliberately broken context → Unhealthy; a custom `Func<TContext,CancellationToken,Task>` probe delegate (OtOpcUa's "query `Deployments`" style, `DatabaseHealthCheck.cs:25-37`) runs, and the check reports Unhealthy if it throws. Resolve `TContext` via `IDbContextFactory<TContext>` (matches OtOpcUa) with a fallback to scoped `TContext`.
**Step 2 — FAIL. Step 3 — implement** generic check + options carrying the optional probe delegate. Tags `[Ready, Active]`. **Step 4 — PASS. Step 5 — commit:** `feat(health.ef): generic DatabaseHealthCheck<TContext>`
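
A sketch of the generic check under two simplifications: the options type is generic here (the plan's `DatabaseHealthCheckOptions` may not be), and only the `IDbContextFactory<TContext>` path is shown, without the scoped-`TContext` fallback.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.EntityFrameworkCore;

public sealed class DatabaseHealthCheckOptions<TContext> where TContext : DbContext
{
    // Optional custom probe; when null the check falls back to CanConnectAsync.
    public Func<TContext, CancellationToken, Task>? Probe { get; init; }
}

public sealed class DatabaseHealthCheck<TContext>(
    IDbContextFactory<TContext> factory,
    DatabaseHealthCheckOptions<TContext> options) : IHealthCheck
    where TContext : DbContext
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            await using var db = await factory.CreateDbContextAsync(cancellationToken);

            if (options.Probe is { } probe)
            {
                // A custom probe signals failure by throwing.
                await probe(db, cancellationToken);
                return HealthCheckResult.Healthy();
            }

            return await db.Database.CanConnectAsync(cancellationToken)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("Database connection failed");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database probe threw", ex);
        }
    }
}
```

An OtOpcUa-style probe would then be supplied as something like `Probe = (db, ct) => db.Deployments.AnyAsync(ct)`, where `Deployments` is that app's own `DbSet`.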
---
## Phase 5 — Package & register
### Task 11: Pack, README, register in indexes
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (final)
**Files:**
- Create: `ZB.MOM.WW.Health/README.md`
- Modify: each `.csproj` (PackageId/Description/Authors metadata)
- Modify: `components/README.md` (registry row)
- Modify: `CLAUDE.md` (Component-normalization table row)
- Modify: `upcoming.md` (check off Health in "Suggested order")
**Steps:**
1. Add NuGet metadata to the 3 lib `.csproj`s (`<PackageId>`, `<Description>`, `<Authors>`, `<PackageTags>`); leave `GeneratePackageOnBuild` off (pack explicitly, like Auth).
2. `dotnet test` (all 3 test projects green) — record counts.
3. `dotnet pack -c Release -o ./artifacts` → confirm 3 `*.0.1.0.nupkg`.
4. `README.md` — packages, consumer matrix (MxGateway → core only; OtOpcUa/ScadaBridge → all three), `dotnet test` instructions, the "built, not yet adopted" note.
5. Register: `components/README.md` registry row (status `Draft`), `CLAUDE.md` table row, tick Health in `upcoming.md`.
6. **Commit:** `git -C ZB.MOM.WW.Health add -A && git -C ZB.MOM.WW.Health commit -m "docs: README + pack metadata"`, then in scadaproj `git add components/health CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(health): ZB.MOM.WW.Health library + health normalization component"`
**Acceptance:** 3 nupkgs @ 0.1.0; all tests green (counts recorded); indexes updated; design-doc build-order step for Health complete.
---
## Summary of parallelism
- **Phase 0** docs: Task 1 ∥ Task 2.
- **Phase 1** scaffold: Task 3 (barrier — everything below needs it).
- **Phase 2** core: Task 4 → Task 5 (sequential, same mapper); Task 6 ∥ Task 7.
- **Phase 3/4**: Task 8 ∥ Task 9 ∥ Task 10 (different packages/files).
- **Phase 5**: Task 11 (barrier — needs all above green).
@@ -0,0 +1,17 @@
{
"planPath": "docs/plans/2026-06-01-zb-mom-ww-health-shared-library.md",
"tasks": [
{"id": 1, "subject": "Task 1: components/health spec + shared-contract", "status": "pending"},
{"id": 2, "subject": "Task 2: components/health current-state x3 + GAPS + README", "status": "pending"},
{"id": 3, "subject": "Task 3: scaffold solution + 3 libs + 3 tests", "status": "pending"},
{"id": 4, "subject": "Task 4: canonical tags + MapZbHealth tier mapping", "status": "pending", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: canonical JSON response writer", "status": "pending", "blockedBy": [4]},
{"id": 6, "subject": "Task 6: IActiveNodeGate seam + RequireActiveNode filter", "status": "pending", "blockedBy": [3]},
{"id": 7, "subject": "Task 7: GrpcDependencyHealthCheck", "status": "pending", "blockedBy": [3]},
{"id": 8, "subject": "Task 8: AkkaClusterHealthCheck + status policy", "status": "pending", "blockedBy": [3]},
{"id": 9, "subject": "Task 9: ActiveNodeHealthCheck + role filter + AkkaActiveNodeGate", "status": "pending", "blockedBy": [3, 6]},
{"id": 10, "subject": "Task 10: DatabaseHealthCheck<TContext> (EF)", "status": "pending", "blockedBy": [3]},
{"id": 11, "subject": "Task 11: pack + README + register indexes", "status": "pending", "blockedBy": [1, 2, 4, 5, 6, 7, 8, 9, 10]}
],
"lastUpdated": "2026-06-01"
}
@@ -0,0 +1,281 @@
# ZB.MOM.WW.Telemetry Shared Library Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Author the `components/observability/` normalization docs and build the `ZB.MOM.WW.Telemetry` shared library (2 NuGet packages) that gives the fleet one OpenTelemetry bootstrap across all three signals (metrics + traces + logs) with a shared `Resource` and a shared Serilog logging stack, then migrate MxAccessGateway's logging from `Microsoft.Extensions.Logging` onto that shared stack — the one sister-repo adoption that proves the contract.
**Architecture:** A new standalone nested repo (`~/Desktop/scadaproj/ZB.MOM.WW.Telemetry`), .NET 10, two library projects — `ZB.MOM.WW.Telemetry` (OTel metrics+traces bootstrap, shared `Resource`, standard instrumentation, Prometheus/OTLP exporters) and `ZB.MOM.WW.Telemetry.Serilog` (shared Serilog bootstrap, `SiteId`/`NodeRole`/`NodeHostname` enrichers, a new `TraceContextEnricher`, OTel log export, `ILogRedactor` seam). The unifying hinge: one `ZbTelemetryOptions` identity triple (`ServiceName`/`SiteId`/`NodeRole`) feeds **both** the OTel Resource and the Serilog enrichers. Reference implementations: OTel bootstrap from OtOpcUa `ObservabilityExtensions`, Serilog bootstrap + enrichers from ScadaBridge `LoggerConfigurationFactory`, redaction from MxGateway `GatewayLogRedactor`. **Health/Telemetry wiring into OtOpcUa & ScadaBridge stays a future `GAPS.md` item; the ONLY app touched here is MxGateway's logging.**
**Tech Stack:** .NET 10, C#; xUnit + coverlet; OpenTelemetry SDK 1.15.3 (`OpenTelemetry.Extensions.Hosting`), `OpenTelemetry.Exporter.Prometheus.AspNetCore` 1.15.3-beta.1, `OpenTelemetry.Exporter.OpenTelemetryProtocol` 1.15.3, `OpenTelemetry.Instrumentation.{AspNetCore,Http,GrpcNetClient,Runtime,Process}` (~1.12–1.15); Serilog 4.3.1, `Serilog.AspNetCore` (see version note), `Serilog.Settings.Configuration`, `Serilog.Sinks.{Console,File,OpenTelemetry}`; central package management; `.slnx`; `Version` 0.1.0 lockstep.
**Version note (a real convergence item):** OtOpcUa pins `Serilog.AspNetCore` 9.0.0, ScadaBridge 10.0.0. Pin **9.0.0** in this library (works on net10, lowest common); record the 9↔10 split in `GAPS.md` as a convergence task. Consumers' central package management governs the final version at adoption.
**Source references (read-only, to port/generalize from):**
- OTel bootstrap + Meter/ActivitySource: OtOpcUa `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs` + `~/Desktop/OtOpcUa/src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`
- Serilog bootstrap + enrichers: ScadaBridge `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/{LoggerConfigurationFactory,LoggingOptions}.cs` + `appsettings.json:3-23`
- Hand-rolled metrics to re-home onto OTel export: MxGateway `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs`
- Logging to migrate: MxGateway `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/{GatewayRequestLoggingMiddlewareExtensions,GatewayLogScope,GatewayLoggerExtensions,GatewayLogRedactor}.cs` + `GatewayApplication.cs:34,61`
- Design: `~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md`
**Conventions for every task:** TDD — failing test first, minimal impl, green, commit. File-scoped namespaces, `sealed` by default. **Never log secrets.** Commit after each green task. The `Files:` block is the `files_to_edit` contract.
---
## Phase 0 — Normalization docs (spec drives the API)
### Task 1: components/observability spec + METRIC-CONVENTIONS + shared-contract
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 2
**Files:**
- Create: `components/observability/spec/SPEC.md`
- Create: `components/observability/spec/METRIC-CONVENTIONS.md`
- Create: `components/observability/shared-contract/ZB.MOM.WW.Telemetry.md`
**Steps:**
1. `spec/SPEC.md` — Section 0 **Scope**: normalized = OTel bootstrap (3 signals), the shared `Resource` attribute set, standard instrumentation, exporter conventions (Prometheus default / OTLP opt-in), Serilog bootstrap + enrichers + trace↔log correlation + redaction seam. NOT normalized = each app's actual instruments (`otopcua.*`, `mxgateway.*`), redaction policy (which fields), the net48 worker's `IWorkerLogger`.
2. `spec/METRIC-CONVENTIONS.md` (mirrors auth `CANONICAL-ROLES.md` / theme `DESIGN-TOKENS.md`): Meter name = app namespace; instrument name = `<app>.<subsystem>.<event>`; **duration unit = seconds** (OTel semconv — flag MxGateway's `ms` histograms); the Resource attribute list (`service.name`, `service.namespace=ZB.MOM.WW`, `service.version`, `site.id`, `node.role`, `host.name`); the standard instrumentation everyone enables.
3. `shared-contract/ZB.MOM.WW.Telemetry.md` — paper API of both packages: `ZbTelemetryOptions`, `AddZbTelemetry`, `MapZbMetrics`, `ZbExporter` enum; `AddZbSerilog`, `ZbLogEnricherNames`, `TraceContextEnricher`, `ILogRedactor`.
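
To make the naming and unit rules from `METRIC-CONVENTIONS.md` concrete, here is a small sketch using only `System.Diagnostics.Metrics`. The meter name `ZB.MOM.WW.Example` and the instrument `example.session.open` are hypothetical; the point is the `<app>.<subsystem>.<event>` shape and recording durations as seconds rather than milliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class MetricConventionSketch
{
    // Meter name = app namespace (hypothetical app here).
    public static readonly Meter AppMeter = new("ZB.MOM.WW.Example");

    // Instrument name = <app>.<subsystem>.<event>; unit "s" per the seconds rule.
    public static readonly Histogram<double> SessionOpenDuration =
        AppMeter.CreateHistogram<double>(
            "example.session.open", unit: "s", description: "Time taken to open a session.");

    // Converts a Stopwatch timestamp into fractional seconds before recording,
    // so callers never hand milliseconds to the histogram by accident.
    public static void RecordSince(long startTimestamp) =>
        SessionOpenDuration.Record(Stopwatch.GetElapsedTime(startTimestamp).TotalSeconds);
}
```

A call site would capture `long start = Stopwatch.GetTimestamp();` before the operation and call `MetricConventionSketch.RecordSince(start);` afterwards.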
**Acceptance:** Three files exist; `SPEC.md` has explicit Section 0; `METRIC-CONVENTIONS.md` states the seconds rule and the Resource set. No tests (docs).
---
### Task 2: components/observability current-state ×3 + GAPS + README
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1
**Files:**
- Create: `components/observability/current-state/otopcua/CURRENT-STATE.md`
- Create: `components/observability/current-state/mxaccessgw/CURRENT-STATE.md`
- Create: `components/observability/current-state/scadabridge/CURRENT-STATE.md`
- Create: `components/observability/GAPS.md`
- Create: `components/observability/README.md`
**Steps:**
1. Transcribe the design doc's "Telemetry" + "Logging" current-state into the three docs at full `file:line` depth (re-verify against live repos). OtOpcUa = full OTel + Prometheus + Serilog (no Resource, no trace↔log correlation). MxGateway = hand-rolled `GatewayMetrics` (no export) + MEL logging — its Adoption plan = the migration in Phase 4. ScadaBridge = `OpenTelemetry.Api` dangling CVE-patch ref + Serilog (cleanest enrichers).
2. `GAPS.md` — top entries: no `Resource`/`service.name` anywhere (P1); MxGateway metrics never export (P1); MxGateway MEL→Serilog (P1, done here); `ms`→`s` unit convergence; no trace↔log correlation anywhere; `Serilog.AspNetCore` 9↔10 split; ScadaBridge has zero instrumentation.
3. `README.md` — overview + per-project status table.
**Acceptance:** Five files exist; current-states cite real `file:line`; `GAPS.md` lists the migration + convergence items. No tests (docs).
---
## Phase 1 — Scaffold
### Task 3: Create repo, solution, and project shells
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (gates impl tasks)
**Files:**
- Create: `ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.slnx`
- Create: `ZB.MOM.WW.Telemetry/Directory.Build.props`
- Create: `ZB.MOM.WW.Telemetry/Directory.Packages.props`
- Create: `ZB.MOM.WW.Telemetry/.gitignore`
- Create: `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.csproj`
- Create: `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZB.MOM.WW.Telemetry.Serilog.csproj`
- Create: `ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Tests/…csproj`
- Create: `ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/…csproj`
**Steps:**
1. `cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Telemetry && cd ZB.MOM.WW.Telemetry && git init && dotnet new gitignore`
2. `dotnet new sln -n ZB.MOM.WW.Telemetry --format slnx` (fallback `.sln`).
3. `dotnet new classlib -f net10.0` ×2 libs; `dotnet new xunit -f net10.0` ×2 tests. Delete default classes.
4. Refs: `.Serilog` → core `ZB.MOM.WW.Telemetry`; each test → its lib; core lib `<FrameworkReference Include="Microsoft.AspNetCore.App"/>` (for `MapZbMetrics` / instrumentation).
5. Copy `Directory.Build.props` from `ZB.MOM.WW.Auth`.
6. `Directory.Packages.props` — pin the OTel + Serilog versions from the Tech Stack/Version-note above + test pkgs (`Microsoft.NET.Test.Sdk` 17.14.1, `xunit` 2.9.3, `xunit.runner.visualstudio` 3.1.4, `coverlet.collector` 6.0.4) + `Serilog.Sinks.InMemory` or `Serilog.Sinks.TestCorrelator` for tests.
7. `dotnet sln add` all 4; `dotnet build`.
8. **Commit:** `chore: scaffold ZB.MOM.WW.Telemetry solution and projects`
**Acceptance:** `dotnet build` green; 4 projects.
---
## Phase 2 — `ZB.MOM.WW.Telemetry` (metrics + traces)
### Task 4: `ZbTelemetryOptions` + shared `Resource` builder
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (Tasks 5-6 build on it)
**Files:**
- Create: `src/ZB.MOM.WW.Telemetry/ZbTelemetryOptions.cs`
- Create: `src/ZB.MOM.WW.Telemetry/ZbResource.cs`
- Test: `tests/ZB.MOM.WW.Telemetry.Tests/ZbResourceTests.cs`
**Step 1 — failing test:** `ZbResource.Build(options)` returns a `ResourceBuilder` whose attributes include `service.name` (= `ServiceName`), `service.namespace` (= `ServiceNamespace`, default `"ZB.MOM.WW"`), `service.version`, `site.id` (= `SiteId`), `node.role` (= `NodeRole`), `host.name`. Assert all six present with expected values (build the `Resource`, inspect `Attributes`).
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement** `ZbTelemetryOptions` (ServiceName, ServiceNamespace=ZB.MOM.WW, ServiceVersion, SiteId, NodeRole, `string[] Meters`, `string[] ActivitySources`, `ZbExporter Exporter=Prometheus`, OTLP endpoint) + `ZbResource.Build`. **Step 4 — PASS. Step 5 — commit:** `feat(telemetry): options + shared OTel Resource`
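A minimal sketch of what Step 3 could look like, assuming the `OpenTelemetry` package's `ResourceBuilder` API. The property name `OtlpEndpoint` is hypothetical (the plan only says "OTLP endpoint"), and whether null identity fields are omitted or rejected is a choice this sketch makes, not something the plan specifies.

```csharp
using System;
using System.Collections.Generic;
using OpenTelemetry.Resources;

public enum ZbExporter { Prometheus, Otlp }

public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string? ServiceVersion { get; set; }
    public string? SiteId { get; set; }
    public string? NodeRole { get; set; }
    public string[] Meters { get; set; } = Array.Empty<string>();
    public string[] ActivitySources { get; set; } = Array.Empty<string>();
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
    public Uri? OtlpEndpoint { get; set; } // hypothetical name, not from the plan
}

public static class ZbResource
{
    // Returns a builder (not a built Resource) so metrics, traces and logs
    // can all be configured from the same description.
    public static ResourceBuilder Build(ZbTelemetryOptions options)
    {
        var attributes = new List<KeyValuePair<string, object>>
        {
            new("host.name", Environment.MachineName),
        };

        // Assumption: unset identity fields are omitted rather than emitted empty.
        if (!string.IsNullOrEmpty(options.SiteId))
            attributes.Add(new("site.id", options.SiteId));
        if (!string.IsNullOrEmpty(options.NodeRole))
            attributes.Add(new("node.role", options.NodeRole));

        // AddService sets service.name / service.namespace / service.version;
        // service.version is only emitted when ServiceVersion is non-null.
        return ResourceBuilder.CreateDefault()
            .AddService(
                serviceName: options.ServiceName,
                serviceNamespace: options.ServiceNamespace,
                serviceVersion: options.ServiceVersion)
            .AddAttributes(attributes);
    }
}
```

The Step 1 test can then call `ZbResource.Build(options).Build().Attributes` and look up the six keys.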
|
||||
|
||||
---
|
||||
|
||||
### Task 5: `AddZbTelemetry` (metrics + traces wiring)
|
||||
|
||||
**Classification:** high-risk
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs`
|
||||
- Test: `tests/ZB.MOM.WW.Telemetry.Tests/AddZbTelemetryTests.cs`
|
||||
|
||||
**Step 1 — failing test:** build a host with `AddZbTelemetry(o => { o.ServiceName="t"; o.Meters=["Test.Meter"]; })` using an **in-memory metrics exporter** (`MetricReader`/`InMemoryExporter` test harness); emit a counter on `Test.Meter`; assert the metric is collected and the export carries the Resource `service.name="t"`. Port the builder shape from OtOpcUa `ObservabilityExtensions.cs:18-25` (`AddOpenTelemetry().WithMetrics(...).WithTracing(...)`), generalized to register `o.Meters`/`o.ActivitySources` by name, add the standard instrumentation (`AddAspNetCoreInstrumentation` and `AddHttpClientInstrumentation` on both signals; `AddGrpcClientInstrumentation` on tracing only; `AddRuntimeInstrumentation` and `AddProcessInstrumentation` on metrics only), and switch the exporter (Prometheus default, OTLP when `o.Exporter==Otlp`).
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement.** Classification high-risk → executor runs spec+code review serially (this is the fleet's telemetry front door). **Step 4 — PASS. Step 5 — commit:** `feat(telemetry): AddZbTelemetry metrics+traces bootstrap`
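The wiring described in Step 1 might take roughly the following shape. This is a sketch under assumptions, not the OtOpcUa code being ported: it redeclares a trimmed `ZbTelemetryOptions` so the block stands alone, configures the resource inline instead of through `ZbResource`, and assumes the `OpenTelemetry.Extensions.Hosting`, `OpenTelemetry.Exporter.Prometheus.AspNetCore`, `OpenTelemetry.Exporter.OpenTelemetryProtocol` and `OpenTelemetry.Instrumentation.*` (AspNetCore, Http, GrpcNetClient, Runtime, Process) packages are referenced.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public enum ZbExporter { Prometheus, Otlp }

// Trimmed stand-in for the Task 4 options type, repeated to keep the sketch self-contained.
public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string[] Meters { get; set; } = Array.Empty<string>();
    public string[] ActivitySources { get; set; } = Array.Empty<string>();
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
}

public static class ZbTelemetryExtensions
{
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services, Action<ZbTelemetryOptions> configure)
    {
        var options = new ZbTelemetryOptions();
        configure(options);
        var useOtlp = options.Exporter == ZbExporter.Otlp;

        services.AddOpenTelemetry()
            // One resource description shared by both signals.
            .ConfigureResource(r => r.AddService(
                options.ServiceName, serviceNamespace: options.ServiceNamespace))
            .WithMetrics(metrics =>
            {
                metrics
                    .AddMeter(options.Meters)
                    .AddAspNetCoreInstrumentation()
                    .AddHttpClientInstrumentation()
                    .AddRuntimeInstrumentation()   // metrics-only instrumentation
                    .AddProcessInstrumentation();  // metrics-only instrumentation

                if (useOtlp) metrics.AddOtlpExporter();
                else metrics.AddPrometheusExporter();
            })
            .WithTracing(tracing =>
            {
                tracing
                    .AddSource(options.ActivitySources)
                    .AddAspNetCoreInstrumentation()
                    .AddHttpClientInstrumentation()
                    .AddGrpcClientInstrumentation(); // tracing-only instrumentation

                // Prometheus carries metrics only, so traces are exported
                // only when OTLP is selected in this sketch.
                if (useOtlp) tracing.AddOtlpExporter();
            });

        return services;
    }
}
```

One consequence worth deciding explicitly during review: with the Prometheus default, spans are created but not exported anywhere unless the consumer adds its own trace exporter.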
|
||||
|
||||
---
|
||||
|
||||
### Task 6: `MapZbMetrics` Prometheus endpoint
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~3 min
|
||||
**Parallelizable with:** Task 7, Task 8 (different package)
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs`
|
||||
- Test: `tests/ZB.MOM.WW.Telemetry.Tests/MapZbMetricsTests.cs`
|
||||
|
||||
**Step 1 — failing test:** `WebApplicationFactory` app with `AddZbTelemetry(Prometheus)` + `app.MapZbMetrics()` → `GET /metrics` returns 200 with `text/plain; version=0.0.4` Prometheus exposition. Port `/metrics` mapping from OtOpcUa `ObservabilityExtensions.cs:36-38`.
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement** `MapZbMetrics` delegating to `MapPrometheusScrapingEndpoint`. **Step 4 — PASS. Step 5 — commit:** `feat(telemetry): MapZbMetrics Prometheus scrape endpoint`
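As a rough illustration of Step 3, the delegation can be a one-liner over the `MapPrometheusScrapingEndpoint` extension from `OpenTelemetry.Exporter.Prometheus.AspNetCore`. The optional `path` parameter is an assumption added here for illustration; the plan only names `/metrics`.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Routing;

public static class ZbMetricsEndpointExtensions
{
    // Maps the Prometheus scrape endpoint. The `path` parameter is hypothetical;
    // the plan itself only specifies the fixed /metrics route.
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints, string path = "/metrics")
        => endpoints.MapPrometheusScrapingEndpoint(path);
}
```

A consumer would call `app.MapZbMetrics();` after `AddZbTelemetry(...)` has been configured with the Prometheus exporter. Returning the `IEndpointConventionBuilder` lets the caller chain conventions such as `RequireAuthorization()` if the scrape endpoint needs protecting.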
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — `ZB.MOM.WW.Telemetry.Serilog` (logs signal)
|
||||
|
||||
### Task 7: Identity enrichers + `AddZbSerilog` bootstrap
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** Task 6
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.Telemetry.Serilog/ZbLogEnricherNames.cs`
|
||||
- Create: `src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs`
|
||||
- Test: `tests/ZB.MOM.WW.Telemetry.Serilog.Tests/EnricherTests.cs`
|
||||
|
||||
**Step 1 — failing test:** using `Serilog.Sinks.InMemory`, configure via `AddZbSerilog(options)` with `SiteId="s1"`, `NodeRole="Central"` and log one event; assert the event carries properties `SiteId=s1`, `NodeRole=Central`, `NodeHostname=<machine>`. Bind these from the **same `ZbTelemetryOptions`** (reference the core package) so the dimensions match the Resource. Port two-stage bootstrap + `MinimumLevel.Is` override + `ReadFrom.Configuration` from ScadaBridge `LoggerConfigurationFactory.cs:62-88`.
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement** `AddZbSerilog(this IHostApplicationBuilder, Action<ZbTelemetryOptions>)` (or `LoggerConfiguration` factory mirroring ScadaBridge) with `Enrich.WithProperty` for the triple. **Step 4 — PASS. Step 5 — commit:** `feat(telemetry.serilog): AddZbSerilog bootstrap + identity enrichers`
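One possible shape for Step 3, shown only as a sketch: it assumes the `Serilog`, `Serilog.Extensions.Hosting` (8.x or later, for `AddSerilog`) and `Serilog.Settings.Configuration` packages, redeclares a trimmed `ZbTelemetryOptions`, and introduces a helper named `WithZbIdentity` that is hypothetical. The two-stage bootstrap and `MinimumLevel.Is` override from ScadaBridge are not reproduced here.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Serilog;

// Trimmed stand-in for the core package's options type.
public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string? SiteId { get; set; }
    public string? NodeRole { get; set; }
}

public static class ZbSerilogExtensions
{
    // Hypothetical helper: attaches the identity triple to every event.
    public static LoggerConfiguration WithZbIdentity(
        this LoggerConfiguration configuration, ZbTelemetryOptions options)
        => configuration
            .Enrich.FromLogContext()
            .Enrich.WithProperty("SiteId", options.SiteId ?? "")
            .Enrich.WithProperty("NodeRole", options.NodeRole ?? "")
            .Enrich.WithProperty("NodeHostname", Environment.MachineName);

    public static IHostApplicationBuilder AddZbSerilog(
        this IHostApplicationBuilder builder, Action<ZbTelemetryOptions> configure)
    {
        var options = new ZbTelemetryOptions();
        configure(options);

        // Sinks and minimum levels come from the "Serilog" configuration section;
        // the identity enrichers are added in code so they cannot be forgotten.
        builder.Services.AddSerilog(lc => lc
            .ReadFrom.Configuration(builder.Configuration)
            .WithZbIdentity(options));

        return builder;
    }
}
```

Keeping the enrichers in a `LoggerConfiguration` extension makes the Step 1 test straightforward: it can build a logger with `new LoggerConfiguration().WithZbIdentity(options).WriteTo.InMemory()` without spinning up a host.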
|
||||
|
||||
---
|
||||
|
||||
### Task 8: `TraceContextEnricher` (trace↔log correlation)
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** Task 6
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.Telemetry.Serilog/TraceContextEnricher.cs`
|
||||
- Modify: `src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs` (register enricher)
|
||||
- Test: `tests/ZB.MOM.WW.Telemetry.Serilog.Tests/TraceContextEnricherTests.cs`
|
||||
|
||||
**Step 1 — failing test:** with an active `Activity` (start an `ActivitySource` span), a logged event carries `trace_id` and `span_id` equal to `Activity.Current.TraceId`/`SpanId`; with no active Activity, neither property is added (clean omission). This is **new shared glue** — no existing app has it.
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement** `ILogEventEnricher` reading `Activity.Current`; add to `AddZbSerilog`. **Step 4 — PASS. Step 5 — commit:** `feat(telemetry.serilog): TraceContextEnricher for trace<->log correlation`
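The enricher in Step 3 can be quite small. The following is a sketch assuming Serilog's `ILogEventEnricher` contract and W3C-format activities (the default on modern .NET); emitting the ids as lowercase hex strings is an assumption, since the plan only says the values must equal `Activity.Current.TraceId`/`SpanId`.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;

        // No ambient span: add nothing, so the properties are cleanly omitted.
        if (activity is null) return;

        // AddPropertyIfAbsent keeps any value the caller set explicitly.
        logEvent.AddPropertyIfAbsent(new LogEventProperty(
            "trace_id", new ScalarValue(activity.TraceId.ToHexString())));
        logEvent.AddPropertyIfAbsent(new LogEventProperty(
            "span_id", new ScalarValue(activity.SpanId.ToHexString())));
    }
}
```

It would be registered inside `AddZbSerilog` with `.Enrich.With<TraceContextEnricher>()`. Because the enricher reads `Activity.Current` at the moment the event is written, the test only needs a started `Activity` on the calling thread.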
|
||||
|
||||
---
|
||||
|
||||
### Task 9: `ILogRedactor` seam + OTel log export
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.Telemetry.Serilog/ILogRedactor.cs`
|
||||
- Create: `src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs`
|
||||
- Modify: `src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs` (optional `WriteTo.OpenTelemetry` with shared Resource)
|
||||
- Test: `tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs`
|
||||
|
||||
**Step 1 — failing test:** register a fake `ILogRedactor` that masks a property named `apiKey`; log an event with `apiKey="mxgw_secret"`; assert the sink sees it masked. The **seam** is shared; policy is the consumer's (generalize MxGateway `GatewayLogRedactor.cs`). Also assert (separate test) that when `o.Exporter` routes logs to OTLP, the log records carry the same Resource as metrics/traces.
|
||||
|
||||
**Step 2 — FAIL. Step 3 — implement** `ILogRedactor { void Redact(IDictionary<string,object?> properties); }` + a `RedactionEnricher` that applies the registered redactor; wire optional `WriteTo.OpenTelemetry(resource: ZbResource…)`. **Step 4 — PASS. Step 5 — commit:** `feat(telemetry.serilog): ILogRedactor seam + OTel log export`
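To make the seam concrete, here is a sketch of the interface from Step 3 with one way a `RedactionEnricher` could bridge it to Serilog. Two things are assumptions of this sketch rather than decisions in the plan: only scalar properties are handed to the redactor, and `ApiKeyRedactor` is a hypothetical test double, not MxGateway's `GatewayLogRedactor` policy. The optional `WriteTo.OpenTelemetry` wiring is left out.

```csharp
using System.Collections.Generic;
using Serilog.Core;
using Serilog.Events;

public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

public sealed class RedactionEnricher : ILogEventEnricher
{
    private readonly ILogRedactor _redactor;

    public RedactionEnricher(ILogRedactor redactor) => _redactor = redactor;

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Assumption: only scalar values are exposed; structured values pass through untouched.
        var current = new Dictionary<string, object?>();
        foreach (var (name, value) in logEvent.Properties)
        {
            if (value is ScalarValue scalar) current[name] = scalar.Value;
        }

        var original = new Dictionary<string, object?>(current);
        _redactor.Redact(current);

        // Write back anything the redactor changed or added.
        foreach (var (name, value) in current)
        {
            if (!original.TryGetValue(name, out var before) || !Equals(before, value))
                logEvent.AddOrUpdateProperty(new LogEventProperty(name, new ScalarValue(value)));
        }

        // Drop anything the redactor removed outright.
        foreach (var name in original.Keys)
        {
            if (!current.ContainsKey(name)) logEvent.RemovePropertyIfPresent(name);
        }
    }
}

// Hypothetical fake for the Step 1 test: masks a property named "apiKey".
public sealed class ApiKeyRedactor : ILogRedactor
{
    public void Redact(IDictionary<string, object?> properties)
    {
        if (properties.ContainsKey("apiKey")) properties["apiKey"] = "***";
    }
}
```

Registering this enricher after the identity and trace enrichers lets the redactor see their properties as well as those bound from the message template.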
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — MxGateway MEL → Serilog migration (the one sister-repo touch)
|
||||
|
||||
> Touches `~/Desktop/MxAccessGateway`. Prereq: Phase 3 complete, with `ZB.MOM.WW.Telemetry.Serilog` either packed to a `.nupkg` or available as a local project. Consume it via a `PackageReference` against a local NuGet source, or via a `ProjectReference` to the library's `.csproj`. The net48 x86 **worker** is OUT of scope: leave `WorkerConsoleLogger`/`IWorkerLogger` untouched.
|
||||
|
||||
### Task 10: Swap gateway bootstrap to `AddZbSerilog`
|
||||
|
||||
**Classification:** high-risk
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:**
|
||||
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (replace default MEL logging with `AddZbSerilog`)
|
||||
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (add package ref)
|
||||
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/appsettings.json` (Serilog section: Console+File sinks, MinimumLevel)
|
||||
- Test: existing `~/Desktop/MxAccessGateway/src/MxGateway.Tests/` (fake worker — no MXAccess needed)
|
||||
|
||||
**Step 1 — failing/red state:** add a focused test (or reuse an existing logging test) asserting the host builds with Serilog as the provider and a log event carries `SiteId`/`NodeRole`. **Step 2 — run, expect FAIL** (still MEL).
|
||||
**Step 3 — implement:** reference `ZB.MOM.WW.Telemetry.Serilog`; call `AddZbSerilog` mapping `o.ServiceName="mxgateway"`, `SiteId`/`NodeRole` from config; add the `Serilog` config section. Remove the default logging assumptions.
|
||||
**Step 4 — run, expect PASS;** then `dotnet build src/MxGateway.sln` + `dotnet test src/MxGateway.Tests` green.
|
||||
**Step 5 — commit (in MxGateway repo):** `refactor(logging): adopt ZB.MOM.WW.Telemetry.Serilog bootstrap`
|
||||
|
||||
---
|
||||
|
||||
### Task 11: Re-express correlation scope + redactor on the shared seam
|
||||
|
||||
**Classification:** high-risk
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:**
|
||||
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayRequestLoggingMiddlewareExtensions.cs` (BeginScope → `LogContext.PushProperty`)
|
||||
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogScope.cs` (emit via LogContext)
|
||||
- Create: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorAdapter.cs` (implements `ILogRedactor`, delegates to existing `GatewayLogRedactor` policy)
|
||||
- Test: existing `MxGateway.Tests` correlation/redaction tests
|
||||
|
||||
**Step 1 — failing test:** assert (a) a request still emits the correlation properties (`SessionId`/`CorrelationId`/etc.) now via Serilog `LogContext`, and (b) a `mxgw_`-prefixed secret is still redacted through the registered `ILogRedactor`. **Step 2 — FAIL** (still MEL `BeginScope`/old redactor path).
|
||||
**Step 3 — implement:** convert the scope middleware to push Serilog `LogContext` properties (keep header parsing from `GatewayRequestLoggingMiddlewareExtensions.cs:22-41`); register `GatewayLogRedactorAdapter : ILogRedactor` wrapping the existing `GatewayLogRedactor` field/command policy.
|
||||
**Step 4 — PASS;** full `dotnet test src/MxGateway.Tests` green (record counts); verify no secret leakage.
|
||||
**Step 5 — commit (MxGateway repo):** `refactor(logging): correlation scope + redaction on shared ILogRedactor seam`
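The `BeginScope` to `LogContext.PushProperty` conversion in Step 3 could look something like the sketch below. The header name `X-Correlation-Id`, the fallback to `TraceIdentifier`, and the extension name `UseCorrelationLogContext` are all hypothetical; the real header parsing should be kept from `GatewayRequestLoggingMiddlewareExtensions.cs` as the step says.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

public static class GatewayRequestLoggingSketch
{
    // Hypothetical name; stands in for the existing gateway middleware extension.
    public static IApplicationBuilder UseCorrelationLogContext(this IApplicationBuilder app)
        => app.Use(async (HttpContext context, RequestDelegate next) =>
        {
            // Hypothetical header; the gateway's existing parsing logic belongs here.
            var correlationId = context.Request.Headers["X-Correlation-Id"].ToString();
            if (string.IsNullOrEmpty(correlationId))
                correlationId = context.TraceIdentifier;

            // The property stays attached to every event logged while the rest of
            // the pipeline runs and is popped when the using block is disposed.
            using (LogContext.PushProperty("CorrelationId", correlationId))
            {
                await next(context);
            }
        });
}
```

Pushed properties only reach log events when the logger is configured with `Enrich.FromLogContext()`, so the correlation test in Step 1 also guards against that enricher being dropped from the shared bootstrap.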
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — Package & register
|
||||
|
||||
### Task 12: Pack, README, register in indexes
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** none (final)
|
||||
|
||||
**Files:**
|
||||
- Create: `ZB.MOM.WW.Telemetry/README.md`
|
||||
- Modify: both lib `.csproj` (PackageId/Description/metadata)
|
||||
- Modify: `components/README.md` (registry row)
|
||||
- Modify: `CLAUDE.md` (Component-normalization table row)
|
||||
- Modify: `upcoming.md` (check off Observability)
|
||||
|
||||
**Steps:**
|
||||
1. NuGet metadata on both lib `.csproj`s.
|
||||
2. `dotnet test` (both test projects green) — record counts.
|
||||
3. `dotnet pack -c Release -o ./artifacts` → confirm 2 `*.0.1.0.nupkg`.
|
||||
4. `README.md` — packages, the identity-triple hinge, exporter options (Prometheus default / OTLP opt-in), consumer matrix, "built; MxGateway logging adopted; broader adoption deferred" note.
|
||||
5. Register: `components/README.md` row (status `Draft`), `CLAUDE.md` row, tick Observability in `upcoming.md`.
|
||||
6. **Commit:** lib repo `docs: README + pack metadata`; scadaproj `git add components/observability CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(telemetry): ZB.MOM.WW.Telemetry library + observability normalization component + MxGateway logging adoption"`
|
||||
|
||||
**Acceptance:** 2 nupkgs @ 0.1.0; all library tests green + MxGateway tests green (counts recorded); indexes updated; design-doc build-order steps 2-6 (telemetry side) complete.
|
||||
|
||||
---
|
||||
|
||||
## Summary of parallelism
|
||||
|
||||
- **Phase 0** docs: Task 1 ∥ Task 2.
|
||||
- **Phase 1** scaffold: Task 3 (barrier).
|
||||
- **Phase 2** core: Task 4 → Task 5 (sequential); Task 6 ∥ the Serilog tasks.
|
||||
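- **Phase 3** serilog: Task 7 → Task 8 → Task 9 (sequential: Task 8 modifies the `ZbSerilogExtensions.cs` file that Task 7 creates, and the `.tasks.json` marks Task 8 as blocked by Task 7); Tasks 7 and 8 may each overlap with Task 6.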
|
||||
- **Phase 4** migration: Task 10 → Task 11 (serial, same repo; needs Phase 3).
|
||||
- **Phase 5**: Task 12 (barrier).
|
||||
@@ -0,0 +1,18 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-06-01-zb-mom-ww-telemetry-shared-library.md",
|
||||
"tasks": [
|
||||
{"id": 1, "subject": "Task 1: components/observability spec + METRIC-CONVENTIONS + shared-contract", "status": "pending"},
|
||||
{"id": 2, "subject": "Task 2: components/observability current-state x3 + GAPS + README", "status": "pending"},
|
||||
{"id": 3, "subject": "Task 3: scaffold solution + 2 libs + 2 tests", "status": "pending"},
|
||||
{"id": 4, "subject": "Task 4: ZbTelemetryOptions + shared OTel Resource", "status": "pending", "blockedBy": [3]},
|
||||
{"id": 5, "subject": "Task 5: AddZbTelemetry metrics+traces bootstrap", "status": "pending", "blockedBy": [4]},
|
||||
{"id": 6, "subject": "Task 6: MapZbMetrics Prometheus endpoint", "status": "pending", "blockedBy": [5]},
|
||||
{"id": 7, "subject": "Task 7: identity enrichers + AddZbSerilog bootstrap", "status": "pending", "blockedBy": [3, 4]},
|
||||
{"id": 8, "subject": "Task 8: TraceContextEnricher (trace<->log correlation)", "status": "pending", "blockedBy": [7]},
|
||||
{"id": 9, "subject": "Task 9: ILogRedactor seam + OTel log export", "status": "pending", "blockedBy": [8]},
|
||||
{"id": 10, "subject": "Task 10: MxGateway swap bootstrap to AddZbSerilog (sister-repo)", "status": "pending", "blockedBy": [9]},
|
||||
{"id": 11, "subject": "Task 11: MxGateway correlation scope + redactor on shared seam (sister-repo)", "status": "pending", "blockedBy": [10]},
|
||||
{"id": 12, "subject": "Task 12: pack + README + register indexes + upcoming.md", "status": "pending", "blockedBy": [1, 2, 6, 9, 11]}
|
||||
],
|
||||
"lastUpdated": "2026-06-01"
|
||||
}
|
||||