natsnet/docs/plans/2026-02-27-batch-21-events-msgtrace-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00

Batch 21 Events + MsgTrace Design

Date: 2026-02-27
Batch: 21 (Events + MsgTrace)
Scope: Design only. No implementation in this document.

Problem

Batch 21 contains high-fanout eventing and tracing behavior used by system account messaging, server stats, admin requests, and distributed message tracing.

Current tracker scope, as reported by batch show 21:

  • Features: 118 (all deferred)
  • Tests: 9 (all deferred)
  • Dependencies: batches 18, 19
  • Go sources: server/events.go, server/msgtrace.go

Context Findings

  • Existing .NET code already has many event and trace DTO/types in:
    • dotnet/src/ZB.MOM.NatsNet.Server/Events/EventTypes.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/MessageTrace/MsgTraceTypes.cs
  • Runtime behavior methods mapped in Batch 21 are largely missing from NatsServer and ClientConnection.
  • Existing backlog tests for Events/MsgTrace include placeholder-style assertions and must be replaced with behavior-valid tests for mapped Batch 21 IDs.
  • The test show mapping confirms the target classes and methods for all 9 tests:
    • Events: 6 methods in EventsHandlerTests
    • MsgTrace: 2 methods in MessageTracerTests
    • Concurrency: 1 method in ConcurrencyTests1

Assumptions

  • Batch 21 work begins only after dependencies (batches 18 and 19) are complete enough to compile and run related tests.
  • We preserve existing project structure and naming patterns (NatsServer.*.cs, ClientConnection.*.cs, ImplBacklog/*.Impltests.cs).
  • No new integration infrastructure is introduced in this batch; infra-blocked items remain deferred with explicit reasons.

Approaches

Approach A: Monolithic implementation in existing large files

Implement all methods directly in NatsServer.cs, ClientConnection.cs, ClientTypes.cs, and NatsServerTypes.cs.

Trade-offs:

  • Pros: Minimal file creation.
  • Cons: Very high merge conflict risk, poor reviewability, difficult verification per feature group.

Approach B: Focused partial/runtime files for events and msgtrace

Add focused partial/runtime files for events and msgtrace behavior while leaving DTO/type files intact.

Trade-offs:

  • Pros: Enables clear feature-group boundaries, easier per-group build/test loops, lower risk of accidental regressions.
  • Cons: Requires a few new files and up-front structure decisions.

Approach C: Test-only first, then backfill features

Start by rewriting the 9 tests to force implementation behavior.

Trade-offs:

  • Pros: Fast feedback for mapped tests.
  • Cons: Batch has 118 features and only 9 mapped tests; test-first alone leaves large behavior surface unvalidated.

Recommended design: use Approach B.

Code organization

Implement runtime behavior in small, targeted files:

  • dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.cs (core eventing/send loops)
  • dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.System.cs (system subscriptions/requests)
  • dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.Admin.cs (reload/kick/ldm/debug/OCSP event paths)
  • dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.MsgTrace.cs (client-side trace enable/init helpers)
  • dotnet/src/ZB.MOM.NatsNet.Server/MessageTrace/MsgTraceRuntime.cs (runtime trace mutation/send behavior)
  • plus targeted edits in:
    • dotnet/src/ZB.MOM.NatsNet.Server/NatsServerTypes.cs (ServerInfo capability helpers)
    • dotnet/src/ZB.MOM.NatsNet.Server/ClientTypes.cs (ClientInfo helper projections)
    • dotnet/src/ZB.MOM.NatsNet.Server/Accounts/Account.cs (Account.statz)
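The file split above relies on C# partial classes, which let one type be declared across several files. The following is a minimal sketch under stated assumptions: DemoServer, SendInternalEvent, and the subject string are hypothetical placeholders, not the real NatsServer surface. Both declarations appear in one block only so the example compiles on its own; in the repository each would sit in its own file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the server type (would live in DemoServer.cs).
public sealed partial class DemoServer
{
    // Shared state visible to every partial declaration of the type.
    private readonly List<string> _published = new List<string>();

    public IReadOnlyList<string> Published => _published;
}

// Hypothetical events partial (would live in DemoServer.Events.cs).
public sealed partial class DemoServer
{
    // Placeholder for an internal event send path; records the subject only.
    public void SendInternalEvent(string subject)
    {
        if (string.IsNullOrWhiteSpace(subject))
            throw new ArgumentException("Subject is required.", nameof(subject));

        _published.Add(subject);
    }
}
```

Because both declarations compile into a single type, the events partial can use private state from the core file without widening its visibility, which is what keeps each feature group reviewable in isolation.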

Behavior domains

  • Domain 1: Capability and metadata helpers (ServerInfo, ClientInfo, hashing helpers).
  • Domain 2: Internal system message send/receive loops and event state lifecycle.
  • Domain 3: Remote server/account tracking and statsz/advisory publication.
  • Domain 4: System subscription wiring and request handlers (connsz/statsz/idz/nsubs/reload/kick/ldm).
  • Domain 5: OCSP advisory events and misc utility wrappers.
  • Domain 6: Message trace runtime (trace enablement, header extraction, event aggregation, publish path).

Test design

Replace mapped placeholder tests with behavior checks that assert:

  • System subscription registration and unsubscribe behavior.
  • Connection update timer/sweep behavior under local/remote account changes.
  • Remote latency update validation and bad payload handling.
  • MsgTrace connection-name normalization and trace-header parsing correctness.
  • No-race JetStream compact scenario behavior for mapped test ID 2412.
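To illustrate what a behavior check for trace-header parsing might assert, here is a small sketch. TraceHeaderSketch and FindHeader are hypothetical helpers, not the port's implementation. The header block layout (a NATS/1.0 status line followed by Key: Value lines separated by CRLF) and the Nats-Trace-Dest header name follow the NATS message-tracing documentation. Case-sensitive key matching is an assumption.

```csharp
#nullable enable
using System;

public static class TraceHeaderSketch
{
    // Returns the trimmed value of the named header, or null when it is absent.
    // Assumption: keys are compared case-sensitively.
    public static string? FindHeader(string headerBlock, string name)
    {
        foreach (var line in headerBlock.Split("\r\n", StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // skips the status line, which has no colon

            var key = line.Substring(0, colon).Trim();
            if (string.Equals(key, name, StringComparison.Ordinal))
                return line.Substring(colon + 1).Trim();
        }

        return null;
    }
}
```

A placeholder assertion, such as checking only that the result is not null, would pass for any non-empty value. Asserting the exact parsed destination, and that a missing header yields null, ties the test to actual parsing behavior.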

Execution model

Port features in 6 groups (<=20 IDs each), then tests in 2 waves, with strict per-feature verification and anti-stub gates.
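As a quick check on the group sizing, the sketch below splits a list of IDs into consecutive groups with a size cap. GroupPlanner is a hypothetical helper used only for the arithmetic; the actual grouping is presumably by behavior domain rather than by ID order.

```csharp
using System;
using System.Collections.Generic;

public static class GroupPlanner
{
    // Splits ids into consecutive groups of at most maxPerGroup items.
    public static List<List<int>> Split(IReadOnlyList<int> ids, int maxPerGroup)
    {
        if (ids is null) throw new ArgumentNullException(nameof(ids));
        if (maxPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerGroup));

        var groups = new List<List<int>>();
        for (int i = 0; i < ids.Count; i += maxPerGroup)
        {
            int count = Math.Min(maxPerGroup, ids.Count - i);
            var group = new List<int>(count);
            for (int j = 0; j < count; j++)
                group.Add(ids[i + j]);
            groups.Add(group);
        }

        return groups;
    }
}
```

With 118 features and a cap of 20, an even fill needs six groups (five of 20 and one of 18), so the six-group plan leaves only two spare slots under the cap.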

Risks and Mitigations

  • Risk: Placeholder tests can pass while behavior is wrong.
    • Mitigation: Mandatory anti-stub checks and per-test evidence before status updates.
  • Risk: Large eventing surface can regress unrelated server behavior.
    • Mitigation: Build gate after each feature group + full unit-test checkpoint between tasks.
  • Risk: Some tests require runtime topology not available in unit test harness.
    • Mitigation: Keep such tests deferred with an explicit blocker reason; do not stub them.
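One way an anti-stub check could work is a text scan over touched source files. The sketch below is a heuristic only, and StubScanner is a hypothetical helper rather than the project's actual gate. Its patterns can miss stubs (for example, a body that only returns default) and can flag legitimate empty blocks.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StubScanner
{
    private static readonly Regex NotImplemented = new Regex(@"\bNotImplementedException\b");

    // A TODO marker anywhere after "//" on the same line.
    private static readonly Regex TodoComment = new Regex(@"//.*\bTODO\b");

    // A closing parenthesis followed by an empty brace pair, e.g. "Foo() { }".
    private static readonly Regex EmptyBody = new Regex(@"\)\s*\{\s*\}");

    // Returns a label for each stub pattern found in the given source text.
    public static List<string> Scan(string source)
    {
        var findings = new List<string>();
        if (NotImplemented.IsMatch(source)) findings.Add("NotImplementedException");
        if (TodoComment.IsMatch(source)) findings.Add("TODO comment");
        if (EmptyBody.IsMatch(source)) findings.Add("empty body");
        return findings;
    }
}
```

A gate built on such a scan would fail the feature group whenever Scan returns any finding for a touched file, leaving the final judgment on false positives to review.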

Success Criteria

  • All 118 Batch 21 feature IDs move through stub -> complete -> verified, and only with build + test evidence.
  • All 9 Batch 21 test IDs either verified with real assertions or deferred with explicit blocker reason.
  • No new stubs (NotImplementedException, empty bodies, TODO placeholders) in touched feature/test files.
  • Batch 21 can be completed with batch complete 21 once all IDs satisfy status requirements.
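The status progression in the first criterion can be pictured as a small transition gate. This is a sketch under assumptions: FeatureStatus and StatusGate are hypothetical names, the deferred -> stub step is inferred from the tracker's "all deferred" starting state, and requiring evidence for both later steps is one reading of "only with build + test evidence". The tracker's own rules take precedence.

```csharp
using System;

public enum FeatureStatus { Deferred, Stub, Complete, Verified }

public static class StatusGate
{
    // Returns true only for the next step in the sequence, and only when the
    // required build and test evidence exists for that step.
    public static bool CanAdvance(FeatureStatus from, FeatureStatus to, bool buildPassed, bool testsPassed)
    {
        bool hasEvidence = buildPassed && testsPassed;

        return (from, to) switch
        {
            (FeatureStatus.Deferred, FeatureStatus.Stub) => true,            // assumed: no evidence needed to start
            (FeatureStatus.Stub, FeatureStatus.Complete) => hasEvidence,
            (FeatureStatus.Complete, FeatureStatus.Verified) => hasEvidence,
            _ => false,                                                      // no skipping and no moving backwards
        };
    }
}
```

Under this reading, a feature cannot jump from stub straight to verified, and a passing build without passing tests is not enough to advance.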