# Batch 21 Events + MsgTrace Design

**Date:** 2026-02-27
**Batch:** 21 (`Events + MsgTrace`)
**Scope:** Design only. No implementation in this document.

## Problem

Batch 21 contains high-fanout eventing and tracing behavior used by system account messaging, server stats, admin requests, and distributed message tracing.

Current tracker scope from `batch show 21`:
- Features: `118` (all `deferred`)
- Tests: `9` (all `deferred`)
- Dependencies: batches `18`, `19`
- Go sources: `server/events.go`, `server/msgtrace.go`

## Context Findings

- The existing .NET code already defines many event and trace DTOs and types in:
  - `dotnet/src/ZB.MOM.NatsNet.Server/Events/EventTypes.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/MessageTrace/MsgTraceTypes.cs`
- Runtime behavior methods mapped in Batch 21 are largely missing from `NatsServer` and `ClientConnection`.
- Existing backlog tests for Events/MsgTrace include placeholder-style assertions and must be replaced with behavior-valid tests for mapped Batch 21 IDs.
- The `test show` mapping confirms the target class and methods for all 9 tests:
  - Events: 6 methods in `EventsHandlerTests`
  - MsgTrace: 2 methods in `MessageTracerTests`
  - Concurrency: 1 method in `ConcurrencyTests1`

## Assumptions

- Batch 21 work begins only after dependencies (batches 18 and 19) are complete enough to compile and run related tests.
- We preserve existing project structure and naming patterns (`NatsServer.*.cs`, `ClientConnection.*.cs`, `ImplBacklog/*.Impltests.cs`).
- No new integration infrastructure is introduced in this batch; infra-blocked items remain deferred with explicit reasons.

## Approaches

### Approach A: Monolithic implementation in existing large files

Implement all methods directly in `NatsServer.cs`, `ClientConnection.cs`, `ClientTypes.cs`, and `NatsServerTypes.cs`.

Trade-offs:
- Pros: Minimal file creation.
- Cons: Very high merge conflict risk, poor reviewability, difficult verification per feature group.

### Approach B (Recommended): Partial-file segmentation by runtime domain

Add focused partial/runtime files for events and msgtrace behavior while leaving DTO/type files intact.

Trade-offs:
- Pros: Enables clear feature-group boundaries, easier per-group build/test loops, lower risk of accidental regressions.
- Cons: Requires a few new files and up-front structure decisions.

### Approach C: Test-only first, then backfill features

Start by rewriting the 9 tests to force implementation behavior.

Trade-offs:
- Pros: Fast feedback for mapped tests.
- Cons: The batch has 118 features and only 9 mapped tests; test-first alone leaves a large behavior surface unvalidated.

## Recommended Design

Use **Approach B**.

### Code organization

Implement runtime behavior in small, targeted files:
- `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.cs` (core eventing/send loops)
- `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.System.cs` (system subscriptions/requests)
- `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Events.Admin.cs` (reload/kick/ldm/debug/OCSP event paths)
- `dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.MsgTrace.cs` (client-side trace enable/init helpers)
- `dotnet/src/ZB.MOM.NatsNet.Server/MessageTrace/MsgTraceRuntime.cs` (runtime trace mutation/send behavior)
- plus targeted edits in:
  - `dotnet/src/ZB.MOM.NatsNet.Server/NatsServerTypes.cs` (ServerInfo capability helpers)
  - `dotnet/src/ZB.MOM.NatsNet.Server/ClientTypes.cs` (ClientInfo helper projections)
  - `dotnet/src/ZB.MOM.NatsNet.Server/Accounts/Account.cs` (`Account.statz`)

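The file split above relies on C# partial classes. The sketch below is illustrative only and assumes `NatsServer` is (or can be made) a `partial` class, which the existing `NatsServer.*.cs` naming suggests. The namespace, the queue field, and every member name are hypothetical placeholders, not part of the real code base.

```csharp
using System.Collections.Generic;

// Hypothetical namespace so the sketch compiles on its own; the real files
// would use the project's existing namespace.
namespace PartialSegmentationSketch
{
    // Would live in NatsServer.Events.cs (core eventing/send loops).
    public partial class NatsServer
    {
        // Hypothetical stand-in for the internal system send queue.
        private readonly Queue<string> _pendingEvents = new Queue<string>();

        public void EnqueueEvent(string subject) => _pendingEvents.Enqueue(subject);

        public int PendingEventCount => _pendingEvents.Count;
    }

    // Would live in NatsServer.Events.Admin.cs (reload/kick/ldm/debug/OCSP event paths).
    public partial class NatsServer
    {
        // Hypothetical admin handler; it reuses a member declared in the other partial.
        public void HandleReloadRequest() => EnqueueEvent("hypothetical.reload.ack");
    }
}
```

Because both declarations compile into one type, each file can be reviewed and built per feature group without changing the public shape of `NatsServer`.
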
### Behavior domains

- Domain 1: Capability and metadata helpers (`ServerInfo`, `ClientInfo`, hashing helpers).
- Domain 2: Internal system message send/receive loops and event state lifecycle.
- Domain 3: Remote server/account tracking and statsz/advisory publication.
- Domain 4: System subscription wiring and request handlers (connsz/statsz/idz/nsubs/reload/kick/ldm).
- Domain 5: OCSP advisory events and misc utility wrappers.
- Domain 6: Message trace runtime (trace enablement, header extraction, event aggregation, publish path).

### Test design

Replace mapped placeholder tests with behavior checks that assert:
- System subscription registration and unsubscribe behavior.
- Connection update timer/sweep behavior under local/remote account changes.
- Remote latency update validation and bad payload handling.
- MsgTrace connection-name normalization and trace-header parsing correctness.
- No-race JetStream compact scenario behavior for mapped test ID 2412.

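To make "behavior check" concrete for the trace-header case, here is a minimal sketch of the kind of logic such a test could exercise. It assumes the NATS header block format (`NATS/1.0` status line followed by `Key: Value` lines) and the upstream `Nats-Trace-Dest` header name; the class and method names are hypothetical, exact-case key matching is an assumption, and the real port in `msgtrace.go` handles more cases than this.

```csharp
using System;

// Hypothetical helper, not the ported implementation.
public static class TraceHeaderSketch
{
    private const string StatusLine = "NATS/1.0";
    private const string TraceDestKey = "Nats-Trace-Dest";

    // Returns true and sets dest when the header block carries a non-empty
    // Nats-Trace-Dest value; returns false for malformed or untraced input.
    public static bool TryGetTraceDest(string rawHeaders, out string dest)
    {
        dest = string.Empty;
        if (string.IsNullOrEmpty(rawHeaders)) return false;

        var lines = rawHeaders.Split(new[] { "\r\n" }, StringSplitOptions.None);
        if (!lines[0].StartsWith(StatusLine, StringComparison.Ordinal)) return false;

        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue; // blank terminator or malformed line

            var key = lines[i].Substring(0, colon);
            if (!string.Equals(key, TraceDestKey, StringComparison.Ordinal)) continue;

            var value = lines[i].Substring(colon + 1).Trim();
            if (value.Length == 0) return false;

            dest = value;
            return true;
        }
        return false;
    }
}
```

A behavior-valid test would feed this a real header block and assert on the extracted destination, and would also assert that a payload without the `NATS/1.0` status line is rejected, rather than asserting only that the call does not throw.
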
### Execution model

Port features in 6 groups (<=20 IDs each), then tests in 2 waves, with strict per-feature verification and anti-stub gates.

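The group count follows from the cap: 118 IDs at no more than 20 per group needs at least six groups. The sketch below only checks that arithmetic with a sequential split; the helper name is hypothetical, the IDs are placeholders, and the real groups are expected to follow the behavior domains rather than ID order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for sanity-checking group sizes.
public static class FeatureGrouping
{
    // Splits ids into consecutive groups of at most maxPerGroup entries.
    public static List<List<int>> Chunk(IReadOnlyList<int> ids, int maxPerGroup)
    {
        if (maxPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerGroup));

        var groups = new List<List<int>>();
        for (int i = 0; i < ids.Count; i += maxPerGroup)
        {
            groups.Add(ids.Skip(i).Take(maxPerGroup).ToList());
        }
        return groups;
    }
}
```

Calling `FeatureGrouping.Chunk(Enumerable.Range(1, 118).ToList(), 20)` yields six groups, five of 20 and a final one of 18.
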
## Risks and Mitigations

- Risk: Placeholder tests can pass while behavior is wrong.
  - Mitigation: Mandatory anti-stub checks and per-test evidence before status updates.
- Risk: Large eventing surface can regress unrelated server behavior.
  - Mitigation: Build gate after each feature group + full unit-test checkpoint between tasks.
- Risk: Some tests require runtime topology not available in the unit test harness.
  - Mitigation: Keep them deferred with an explicit blocker reason; do not stub.

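One way an anti-stub check could be automated is a text scan of touched files for stub markers such as `NotImplementedException`, TODO placeholders, and empty bodies. The sketch below is a heuristic under that assumption; the class name and patterns are hypothetical, and the empty-body pattern will also flag intentionally empty blocks, so hits need human review.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical scanner, not an existing project tool.
public static class AntiStubScan
{
    private static readonly (string Name, Regex Pattern)[] Checks =
    {
        ("NotImplementedException", new Regex(@"\bNotImplementedException\b")),
        ("TODO placeholder", new Regex(@"\bTODO\b")),
        // Heuristic: a closing parenthesis followed by an empty brace pair.
        ("empty body", new Regex(@"\)\s*\{\s*\}")),
    };

    // Returns the name of every check whose pattern occurs in the source text.
    public static List<string> FindStubMarkers(string source)
    {
        var hits = new List<string>();
        foreach (var (name, pattern) in Checks)
        {
            if (pattern.IsMatch(source)) hits.Add(name);
        }
        return hits;
    }
}
```

Running this over each touched file and failing the gate on any hit gives reviewers a cheap first filter before the per-test evidence is examined.
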
## Success Criteria

- All 118 Batch 21 feature IDs move through `stub -> complete -> verified`, and only with build + test evidence.
- All 9 Batch 21 test IDs are either `verified` with real assertions or `deferred` with an explicit blocker reason.
- No new stubs (`NotImplementedException`, empty bodies, TODO placeholders) in touched feature/test files.
- Batch 21 can be completed with `batch complete 21` once all IDs satisfy status requirements.
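
The status rule in the first criterion can be read as a small gate: a feature advances one step at a time, and only when both build and test evidence exist. The sketch below models that reading; the enum, class, and method names are hypothetical, and the tracker's actual enforcement rules are not described in this document.

```csharp
// Hypothetical model of the status sequence named in the success criteria.
public enum FeatureStatus { Deferred, Stub, Complete, Verified }

public static class StatusGate
{
    // Allows only stub -> complete and complete -> verified, and only when
    // both build and test evidence are present.
    public static bool CanAdvance(FeatureStatus from, FeatureStatus to, bool buildPassed, bool testsPassed)
    {
        bool nextInSequence = (from, to) switch
        {
            (FeatureStatus.Stub, FeatureStatus.Complete) => true,
            (FeatureStatus.Complete, FeatureStatus.Verified) => true,
            _ => false,
        };
        return nextInSequence && buildPassed && testsPassed;
    }
}
```

Under this model a jump from `stub` straight to `verified` is rejected even with passing evidence, which matches the intent that no step is skipped.
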