natsnet/docs/plans/2026-02-27-batch-22-monitoring-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00


# Batch 22 Monitoring Design
**Date:** 2026-02-27
**Batch:** 22 (`Monitoring`)
**Scope:** Design only. No implementation in this document.
## Problem
Batch 22 ports NATS server monitoring behavior from `server/monitor.go` into .NET. The batch is large and mixed:
- Features: `70` (all currently `deferred`)
- Tests: `29` (all currently `deferred`)
- Dependencies: batches `18`, `19`
- Go source: `golang/nats-server/server/monitor.go`
This batch includes both core data endpoints (`/connz`, `/routez`, `/subsz`, `/varz`) and broader operational surfaces (`/gatewayz`, `/leafz`, `/accountz`, `/jsz`, `/healthz`, `/raftz`, `/debug/vars`, profiling).
## Context Findings
From tracker and codebase inspection:
- Batch metadata confirmed with:
  - `dotnet run --project tools/NatsNet.PortTracker -- batch show 22 --db porting.db`
  - `dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db`
  - `dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db`
- .NET currently has monitoring constants and partial DTOs (`MonitorTypes.cs`, `MonitorSortOptions.cs`, monitor paths in `NatsServer.Listeners.cs`), but most mapped runtime methods are not yet implemented.
- Existing mapped test files are mostly placeholder-style in `ImplBacklog` and need behavioral rewrites for the 29 mapped test IDs.
- Batch 22 tests span multiple classes (`MonitoringHandlerTests`, `RouteHandlerTests`, `LeafNodeHandlerTests`, `AccountTests`, `EventsHandlerTests`, `JetStreamJwtTests`, `ConfigReloaderTests`), so verification cannot be isolated to one test class.
## Constraints and Success Criteria
- Must preserve Go behavior semantics while writing idiomatic .NET 10 C#.
- Must follow project standards (`xUnit 3`, `Shouldly`, `NSubstitute`; no FluentAssertions/Moq).
- Must avoid stubs and fake-pass tests.
- Feature status can move to `verified` only after related test gate is green.
- Group work in chunks no larger than ~20 features.
Success looks like:
- 70 features implemented and verified (or explicitly deferred with reason where truly blocked).
- 29 mapped tests verified with real Arrange/Act/Assert behavior, not placeholders.
- Batch can be closed with `batch complete 22` once statuses satisfy PortTracker rules.
## Approaches
### Approach A: Single-file monitoring implementation
Implement all monitoring behavior in one or two large files (for example, `NatsServer.Monitoring.cs` + one test file wave).
Trade-offs:
- Pros: fewer new files.
- Cons: poor reviewability, high merge risk, difficult to verify incrementally, and a very high chance of hidden stubs slipping through a single large diff.
### Approach B (Recommended): Domain-segmented partials and DTO blocks
Split monitoring into focused runtime domains with dedicated partial files and matching test waves.
Trade-offs:
- Pros: matches the natural endpoint domains in `monitor.go`, enables strong per-group build/test gating, easier status evidence collection.
- Cons: adds several files, requires deliberate file map upfront.
### Approach C: Test-first across all 29 tests before feature work
Rewrite all 29 tests first, then implement features until all pass.
Trade-offs:
- Pros: very fast signal on regressions.
- Cons: test set under-represents some large feature surfaces (`healthz`, `raftz`, gateway/account internals), so feature quality still needs per-feature validation loops.
## Recommended Design
Use **Approach B**.
### Architecture
Implement monitoring in six domain slices, each with a bounded feature group:
1. Connz core + query decoders + connz handler
2. Routez/Subsz/Stacksz/IPQueuesz
3. Varz/root/runtime/config helpers
4. Gatewayz + Leafz + AccountStatz + response helpers + closed-state rendering
5. Accountz + Jsz account/detail helpers + `/jsz` endpoint
6. Healthz + expvarz/profilez + raftz
Each slice follows the same loop: port features -> build -> run related tests -> stub scan -> status updates.
### Proposed File Map
Primary production files to create/modify:
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.Connz.cs`
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.RouteSub.cs`
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.Varz.cs`
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.GatewayLeaf.cs`
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.AccountJsz.cs`
- Create/Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Monitoring.HealthRaft.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/Monitor/MonitorTypes.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/Monitor/MonitorSortOptions.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/ClientTypes.cs` (for `ClosedState.String` parity)
- Modify (if needed for DTO placement):
  - `dotnet/src/ZB.MOM.NatsNet.Server/Routes/RouteTypes.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/LeafNode/LeafNodeTypes.cs`
Primary mapped test files:
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/MonitoringHandlerTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/RouteHandlerTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/AccountTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/EventsHandlerTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamJwtTests.Impltests.cs`
- `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConfigReloaderTests.Impltests.cs`
### Data Flow
- HTTP monitor request -> query decode/validation -> domain endpoint function (`Connz`, `Routez`, `Varz`, etc.) -> DTO response projection -> unified response writer.
- Endpoint functions read server/account/client state under required locks and project immutable response objects.
- Sort and pagination apply after candidate gathering, matching Go behavior by endpoint.
### Error Handling Strategy
- Invalid query params return bad request through shared response helper.
- Unsupported combinations (for example, a sort option that is not valid for the requested connection state) return explicit errors.
- Tests blocked on unavailable infrastructure stay `deferred` with an explicit reason rather than being backed by placeholder implementations.
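As a concrete illustration of the unsupported-combination rule, a hedged Go sketch; the option names and the closed-only set below are illustrative, not the server's actual sort-option table in `monitor.go`:

```go
package main

import "fmt"

// Illustrative subset of connz sort options. "stop" and "reason" are
// only meaningful for closed connections, which is the kind of
// state-dependent combination that must return an explicit error.
var closedOnlySorts = map[string]bool{"stop": true, "reason": true}
var allSorts = map[string]bool{"cid": true, "subs": true, "stop": true, "reason": true}

// checkConnzSort mirrors the validation step: unknown options and
// options invalid for the requested connection state both fail fast.
func checkConnzSort(sortBy string, closedState bool) error {
	if !allSorts[sortBy] {
		return fmt.Errorf("invalid sort option: %q", sortBy)
	}
	if closedOnlySorts[sortBy] && !closedState {
		return fmt.Errorf("sort by %q is only valid for closed connections", sortBy)
	}
	return nil
}
```

In the .NET port the equivalent check would live next to `MonitorSortOptions.cs`, with the shared bad-request helper turning the error into the HTTP response.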
### Testing Strategy
- Rewrite only the 29 mapped test IDs as behavior-valid tests, class by class.
- Each feature group uses targeted test filters tied to that domain.
- Keep full unit test checkpoint between tasks to catch regressions outside monitor-specific tests.
### Risks and Mitigations
- Risk: fake tests pass while behavior is unimplemented.
- Mitigation: explicit anti-stub scans for placeholder signatures and literal-only assertions.
- Risk: large healthz/raftz surfaces with sparse mapped tests.
- Mitigation: per-feature read/port/build loop plus grouped sanity tests and status evidence requirements.
- Risk: lock-sensitive endpoint logic causes race regressions.
- Mitigation: keep route/leaf/account race tests in the required per-group gate.
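The anti-stub scan mentioned above amounts to flagging known placeholder signatures in the rewritten test sources. A minimal Go sketch; the pattern list is illustrative and would be project-defined in practice:

```go
package main

import (
	"regexp"
	"strings"
)

// Illustrative placeholder signatures an anti-stub scan might flag in
// rewritten xUnit/Shouldly tests: tautological assertions and TODO
// markers that let a test pass without exercising behavior.
var stubPatterns = []*regexp.Regexp{
	regexp.MustCompile(`true\.ShouldBeTrue\(\)`),
	regexp.MustCompile(`Assert\.True\(true\)`),
	regexp.MustCompile(`(?i)//\s*TODO:?\s*implement`),
}

// scanForStubs returns the 1-based line numbers of suspected
// placeholder lines in a test source file.
func scanForStubs(source string) []int {
	var hits []int
	for i, line := range strings.Split(source, "\n") {
		for _, p := range stubPatterns {
			if p.MatchString(line) {
				hits = append(hits, i+1)
				break
			}
		}
	}
	return hits
}
```

Run against each `ImplBacklog` test file after a rewrite wave, a non-empty hit list blocks the status move to `verified`.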
## Design Approval Basis
This design is based on the explicit user-provided constraints (planning-only, mandatory guardrails, group size limits, and required tracker commands) and is ready for implementation planning with `writeplan`.