natsnet/docs/plans/2026-02-27-batch-22-monitoring-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00


Batch 22 Monitoring Design

Date: 2026-02-27
Batch: 22 (Monitoring)
Scope: Design only. No implementation in this document.

Problem

Batch 22 ports NATS server monitoring behavior from server/monitor.go into .NET. The batch is large and mixed:

  • Features: 70 (all currently deferred)
  • Tests: 29 (all currently deferred)
  • Dependencies: batches 18, 19
  • Go source: golang/nats-server/server/monitor.go

This batch includes both core data endpoints (/connz, /routez, /subsz, /varz) and broader operational surfaces (/gatewayz, /leafz, /accountz, /jsz, /healthz, /raftz, /debug/vars, profiling).
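To make the endpoint surface concrete, here is a minimal Go-flavored sketch of the dispatch shape being ported: each monitor path maps to a handler that renders a JSON payload. The handler names and payloads are illustrative only, not the real monitor.go API.

```go
package main

import "fmt"

// handler renders a JSON body for one monitor endpoint given a raw query.
type handler func(query string) (string, error)

// newMonitorMux registers a few of the endpoints named above.
// Payloads here are placeholders, not real server state.
func newMonitorMux() map[string]handler {
	return map[string]handler{
		"/connz":  func(q string) (string, error) { return `{"connections":[]}`, nil },
		"/routez": func(q string) (string, error) { return `{"routes":[]}`, nil },
		"/varz":   func(q string) (string, error) { return `{"server_id":"demo"}`, nil },
	}
}

// dispatch looks up the endpoint and invokes its handler.
func dispatch(mux map[string]handler, path, query string) (string, error) {
	h, ok := mux[path]
	if !ok {
		return "", fmt.Errorf("unknown monitor endpoint: %s", path)
	}
	return h(query)
}

func main() {
	mux := newMonitorMux()
	body, err := dispatch(mux, "/connz", "")
	fmt.Println(body, err)
}
```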

Context Findings

From tracker and codebase inspection:

  • Batch metadata confirmed with:
    • dotnet run --project tools/NatsNet.PortTracker -- batch show 22 --db porting.db
    • dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
    • dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
  • .NET currently has monitoring constants and partial DTOs (MonitorTypes.cs, MonitorSortOptions.cs, monitor paths in NatsServer.Listeners.cs), but most mapped runtime methods are not yet implemented.
  • Existing mapped test files are mostly placeholder-style in ImplBacklog and need behavioral rewrites for the 29 mapped test IDs.
  • Batch 22 tests span multiple classes (MonitoringHandlerTests, RouteHandlerTests, LeafNodeHandlerTests, AccountTests, EventsHandlerTests, JetStreamJwtTests, ConfigReloaderTests), so verification cannot be isolated to one test class.

Constraints and Success Criteria

  • Must preserve Go behavior semantics while writing idiomatic .NET 10 C#.
  • Must follow project standards (xUnit 3, Shouldly, NSubstitute; no FluentAssertions/Moq).
  • Must avoid stubs and fake-pass tests.
  • Feature status can move to verified only after related test gate is green.
  • Group work in chunks no larger than ~20 features.
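The ~20-feature grouping rule amounts to splitting the 70 feature IDs into bounded work slices. A small Go sketch of that chunking (the feature IDs here are synthetic):

```go
package main

import "fmt"

// chunk splits a list of feature IDs into groups of at most size,
// mirroring the "no more than ~20 features per group" constraint.
func chunk(ids []int, size int) [][]int {
	var groups [][]int
	for len(ids) > size {
		groups = append(groups, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		groups = append(groups, ids)
	}
	return groups
}

func main() {
	ids := make([]int, 70) // stand-in for the 70 batch features
	for i := range ids {
		ids[i] = i + 1
	}
	fmt.Println(len(chunk(ids, 20))) // 70 features -> 4 groups
}
```

With 70 features and a cap of 20, this yields four groups (20 + 20 + 20 + 10), which lines up with the six domain slices below once slices are sized unevenly.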

Success looks like:

  • 70 features implemented and verified (or explicitly deferred with reason where truly blocked).
  • 29 mapped tests verified with real Arrange/Act/Assert behavior, not placeholders.
  • Batch can be closed with batch complete 22 once statuses satisfy PortTracker rules.

Approaches

Approach A: Single-file monitoring implementation

Implement all monitoring behavior in one or two large files (for example, NatsServer.Monitoring.cs + one test file wave).

Trade-offs:

  • Pros: fewer new files.
  • Cons: poor reviewability, high merge risk, difficult to verify incrementally, very high chance of hidden stubs in large diff.

Approach B: Domain-sliced implementation

Split monitoring into focused runtime domains with dedicated partial files and matching test waves.

Trade-offs:

  • Pros: matches the natural endpoint domains in monitor.go, enables strong per-group build/test gating, easier status evidence collection.
  • Cons: adds several files, requires deliberate file map upfront.

Approach C: Test-first across all 29 tests before feature work

Rewrite all 29 tests first, then implement features until all pass.

Trade-offs:

  • Pros: very fast signal on regressions.
  • Cons: test set under-represents some large feature surfaces (healthz, raftz, gateway/account internals), so feature quality still needs per-feature validation loops.

Decision: use Approach B.

Architecture

Implement monitoring in six domain slices, each with a bounded feature group:

  1. Connz core + query decoders + connz handler
  2. Routez/Subsz/Stacksz/IPQueuesz
  3. Varz/root/runtime/config helpers
  4. Gatewayz + Leafz + AccountStatz + response helpers + closed-state rendering
  5. Accountz (account/detail views) + Jsz endpoint
  6. Healthz + expvarz/profilez + raftz
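The "query decoders" in slice 1 follow a common shape: parse the raw query string into a typed options struct with defaults, rejecting malformed values before any state is gathered. A hedged Go sketch of that shape (the ConnzOptions fields and the 1024 default are illustrative, not the full real option set):

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ConnzOptions is a cut-down stand-in for the real Connz query options.
type ConnzOptions struct {
	Limit  int
	Offset int
	Sort   string
}

// decodeConnzOptions parses limit/offset/sort, applying a default page
// size and failing fast on malformed or negative numeric values.
func decodeConnzOptions(rawQuery string) (ConnzOptions, error) {
	opts := ConnzOptions{Limit: 1024} // assumed default page size
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return opts, err
	}
	if v := q.Get("limit"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 0 {
			return opts, fmt.Errorf("invalid limit: %q", v)
		}
		opts.Limit = n
	}
	if v := q.Get("offset"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 0 {
			return opts, fmt.Errorf("invalid offset: %q", v)
		}
		opts.Offset = n
	}
	opts.Sort = q.Get("sort")
	return opts, nil
}

func main() {
	opts, err := decodeConnzOptions("limit=5&offset=2&sort=cid")
	fmt.Printf("%+v %v\n", opts, err)
}
```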

Each slice follows the same loop: port features -> build -> run related tests -> stub scan -> status updates.

Proposed File Map

Primary production files to create/modify:

  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.Connz.cs
  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.RouteSub.cs
  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.Varz.cs
  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.GatewayLeaf.cs
  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.AccountJsz.cs
  • Create/Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.HealthRaft.cs
  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/Monitor/MonitorTypes.cs
  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/Monitor/MonitorSortOptions.cs
  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/ClientTypes.cs (for ClosedState.String parity)
  • Modify (if needed for DTO placement):
    • dotnet/src/ZB.MOM.NatsNet.Server/Routes/RouteTypes.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/LeafNode/LeafNodeTypes.cs

Primary mapped test files:

  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/MonitoringHandlerTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/RouteHandlerTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/AccountTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/EventsHandlerTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamJwtTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConfigReloaderTests.Impltests.cs

Data Flow

  • HTTP monitor request -> query decode/validation -> domain endpoint function (Connz, Routez, Varz, etc.) -> DTO response projection -> unified response writer.
  • Endpoint functions read server/account/client state under required locks and project immutable response objects.
  • Sort and pagination apply after candidate gathering, matching Go behavior by endpoint.
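The sort-then-paginate step above can be sketched in Go; the sort keys ("cid", "pending") are illustrative stand-ins for the per-endpoint option sets, not the full list:

```go
package main

import (
	"fmt"
	"sort"
)

// conn is a stand-in for a gathered connection snapshot.
type conn struct {
	CID     uint64
	Pending int
}

// sortAndPage orders the gathered candidates, then applies offset and
// limit, matching the "sort and pagination after gathering" flow.
func sortAndPage(conns []conn, sortBy string, offset, limit int) []conn {
	switch sortBy {
	case "pending":
		sort.Slice(conns, func(i, j int) bool { return conns[i].Pending > conns[j].Pending })
	default: // order by client ID
		sort.Slice(conns, func(i, j int) bool { return conns[i].CID < conns[j].CID })
	}
	if offset >= len(conns) {
		return nil
	}
	conns = conns[offset:]
	if limit < len(conns) {
		conns = conns[:limit]
	}
	return conns
}

func main() {
	conns := []conn{{CID: 3, Pending: 10}, {CID: 1, Pending: 30}, {CID: 2, Pending: 20}}
	fmt.Println(sortAndPage(conns, "cid", 1, 1)) // [{2 20}]
}
```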

Error Handling Strategy

  • Invalid query params return bad request through shared response helper.
  • Unsupported combinations (for example, sort options that are invalid for the requested connection state) return explicit errors.
  • Infra-unavailable behavior in tests remains deferred with explicit reason instead of placeholder implementations.

Testing Strategy

  • Rewrite only the 29 mapped test IDs as behavior-valid tests, class by class.
  • Each feature group uses targeted test filters tied to that domain.
  • Keep full unit test checkpoint between tasks to catch regressions outside monitor-specific tests.

Risks and Mitigations

  • Risk: fake tests pass while behavior is unimplemented.
    • Mitigation: explicit anti-stub scans for placeholder signatures and literal-only assertions.
  • Risk: large healthz/raftz surfaces with sparse mapped tests.
    • Mitigation: per-feature read/port/build loop plus grouped sanity tests and status evidence requirements.
  • Risk: lock-sensitive endpoint logic causes race regressions.
    • Mitigation: keep route/leaf/account race tests in the required per-group gate.

Design Approval Basis

This design is based on the explicit user-provided constraints (planning-only, mandatory guardrails, group size limits, and required tracker commands) and is ready for implementation planning with writeplan.