natsnet/docs/plans/2026-02-27-batch-14-filestore-write-lifecycle-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00

Batch 14 FileStore Write/Lifecycle Design

Context

  • Batch: 14 (FileStore Write/Lifecycle)
  • Dependency: Batch 13 (FileStore Read/Query)
  • Scope: 76 features + 64 tests
  • Go reference: golang/nats-server/server/filestore.go (primarily lines 4394-12549)
  • .NET target surface:
    • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs

Current implementation state:

  • JetStreamFileStore is still a delegation shell to JetStreamMemStore for most IStreamStore behavior.
  • Most Batch 14 methods are not yet present in .NET.
  • Most mapped Batch 14 tests are not implemented in backlog files and need real behavioral coverage.

Problem Statement

Batch 14 is the first large FileStore execution batch: the write path, retention/compaction, purge/reset, state flush, snapshot, and shutdown lifecycle all need file-backed behavior instead of memory-store delegation. Without strict lock discipline and anti-stub verification, downstream batches (15, 36, 37) will inherit brittle storage behavior and unreliable test evidence.

Clarified Constraints

  • Keep Batch 14 scoped to mapped methods/tests only; do not pull Batch 15 MessageBlock + ConsumerFileStore work forward.
  • Use evidence-backed status updates in PortTracker (max 15 IDs per update).
  • Keep tests real: no placeholders, no always-pass assertions, no non-behavioral smoke tests.
  • If a test requires unavailable runtime infrastructure, keep it deferred with a concrete reason instead of stubbing.
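The 15-ID cap on PortTracker updates amounts to chunking the ID list before each status call. A minimal sketch, assuming IDs are plain integers; StatusUpdateBatcher and its members are hypothetical names for illustration, not part of PortTracker:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits feature/test IDs into update-sized batches.
public static class StatusUpdateBatcher
{
    // The design caps each evidence-backed status update at 15 IDs.
    public const int MaxIdsPerUpdate = 15;

    public static List<int[]> Split(IEnumerable<int> ids) =>
        ids.Distinct()               // never report the same ID twice
           .Chunk(MaxIdsPerUpdate)   // Enumerable.Chunk requires .NET 6 or later
           .ToList();
}
```

For the 76 features in scope, Split(Enumerable.Range(1, 76)) yields six updates: five of 15 IDs and one of a single ID.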

Approaches Considered

Approach 1: Four feature groups with interleaved test waves

Implement Batch 14 in four functional feature groups (18 + 18 + 20 + 20 = 76), each followed by a targeted test wave that promotes only tests whose feature dependencies are already implemented and verified.

Pros:

  • Keeps each cycle below the complexity threshold for file-store concurrency code.
  • Makes failures local and debuggable.
  • Aligns naturally with mandatory build/test/status checkpoints.

Cons:

  • Requires careful bookkeeping of cross-group tests.
  • More commits and checkpoint overhead.

Approach 2: Implement all 76 features first, then all 64 tests

Complete production surface in one pass, then backfill all tests at the end.

Pros:

  • Fewer context switches.

Cons:

  • High risk of broad regressions discovered late.
  • Weak traceability between feature status and test evidence.
  • Creates pressure near the end to mark stubbed work as complete.

Approach 3: Test-first only with synthetic wrappers over memstore delegation

Attempt to satisfy mapped tests through wrapper behavior while delaying real file-backed implementation.

Pros:

  • Fast initial green tests.

Cons:

  • Violates batch intent (real FileStore write/lifecycle parity).
  • Produces fragile tests that validate wrappers, not storage behavior.
  • Increases later rework and hidden defects.

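Recommended Design

Adopt the four-group approach with interleaved test waves; the numbered sections below define it.

1) Implementation topology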

Use four feature groups with bounded scope:

  • Group 1 (18): write-path foundation, per-subject totals, limits/removal entrypoints.
  • Group 2 (18): age/scheduling loops, record/tombstone writes, block sync/select helpers.
  • Group 3 (20): seq/read helpers, cache/state counters, purge/compact/reset and block-list mutation.
  • Group 4 (20): purge-block/global subject info, stream-state write loop, stop/snapshot/delete-map lifecycle.
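The group sizes above, together with the rule that a test wave promotes only tests whose feature dependencies are verified, can be modeled in a few lines. This is a sketch under assumptions: TestWavePlanner, the test names, and the integer feature IDs are hypothetical and do not reflect the real mapping data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planner for test waves; not part of the actual tooling.
public static class TestWavePlanner
{
    // Feature counts for Groups 1-4, taken from the topology above.
    public static readonly int[] GroupSizes = { 18, 18, 20, 20 };

    // Returns, in ordinal name order, the tests whose every feature
    // dependency is already in the verified set.
    public static List<string> Promotable(
        IReadOnlyDictionary<string, int[]> testDependencies,
        ISet<int> verifiedFeatures) =>
        testDependencies
            .Where(kv => kv.Value.All(verifiedFeatures.Contains))
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
}
```

A test that spans two groups stays out of the wave until both groups are verified, which is the cross-group bookkeeping cost this approach accepts.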

2) Locking and lifecycle model

  • Preserve ReaderWriterLockSlim ownership boundaries in JetStreamFileStore.
  • Keep timer/background loop ownership explicit (_ageChk, _syncTmr, _qch, _fsld).
  • Ensure stop/flush/snapshot/delete paths are idempotent and race-safe under repeated calls.
  • Treat file writes and state writes as durability boundaries, and keep the optional-sync behavior explicit and at parity with the Go reference.
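One way to make the stop path idempotent under repeated calls is to guard a closed flag with the write lock and detach the timers exactly once. The sketch below is illustrative only: LifecycleSketch is a hypothetical stand-in, not the JetStreamFileStore implementation, and the final state write is reduced to a counter.

```csharp
#nullable enable
using System.Threading;

// Hypothetical stand-in for the lifecycle part of a file store.
public sealed class LifecycleSketch
{
    private readonly ReaderWriterLockSlim _mu = new ReaderWriterLockSlim();
    private Timer? _ageChk;
    private Timer? _syncTmr;
    private bool _closed;
    private int _finalFlushes;

    public LifecycleSketch()
    {
        // Timers are created disarmed; real loops would schedule work here.
        _ageChk = new Timer(_ => { }, null, Timeout.Infinite, Timeout.Infinite);
        _syncTmr = new Timer(_ => { }, null, Timeout.Infinite, Timeout.Infinite);
    }

    public int FinalFlushes
    {
        get
        {
            _mu.EnterReadLock();
            try { return _finalFlushes; }
            finally { _mu.ExitReadLock(); }
        }
    }

    // Returns true only for the call that actually performed the shutdown.
    public bool Stop()
    {
        Timer? ageChk = null, syncTmr = null;
        _mu.EnterWriteLock();
        try
        {
            if (_closed) return false;   // repeated calls are no-ops
            _closed = true;
            ageChk = _ageChk; _ageChk = null;
            syncTmr = _syncTmr; _syncTmr = null;
            _finalFlushes++;             // placeholder for the final state write
        }
        finally { _mu.ExitWriteLock(); }

        // Dispose the timers outside the lock to keep the critical section short.
        ageChk?.Dispose();
        syncTmr?.Dispose();
        return true;
    }
}
```

Calling Stop() twice returns true and then false, and FinalFlushes stays at 1, which is the property a repeated-stop test would assert.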

3) Test model

Implement backlog tests as real behavioral tests in:

  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamFileStoreTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests1.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests2.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/RouteHandlerTests.Impltests.cs

Create missing backlog classes for mapped tests:

  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeProxyTests.Impltests.cs
  • dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamClusterLongTests.Impltests.cs

4) Status strategy

  • Features: deferred -> stub -> complete -> verified.
  • Tests: deferred -> stub -> verified or deferred with explicit blocker reason.
  • Promote only IDs backed by all three kinds of evidence: a direct read of the Go source, a passing build, and a passing targeted test.
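The status ladders above can be encoded as an explicit transition check. The sketch below is one interpretation, not the PortTracker rules: it assumes a test in stub state may return to deferred only when a blocker reason is supplied, and the names PortStatus, ItemKind, and StatusRules are hypothetical.

```csharp
#nullable enable

public enum PortStatus { Deferred, Stub, Complete, Verified }
public enum ItemKind { Feature, Test }

// Hypothetical encoding of the allowed status moves.
public static class StatusRules
{
    public static bool CanMove(
        ItemKind kind, PortStatus from, PortStatus to, string? blockerReason = null) =>
        (kind, from, to) switch
        {
            // Features: deferred -> stub -> complete -> verified.
            (ItemKind.Feature, PortStatus.Deferred, PortStatus.Stub) => true,
            (ItemKind.Feature, PortStatus.Stub, PortStatus.Complete) => true,
            (ItemKind.Feature, PortStatus.Complete, PortStatus.Verified) => true,
            // Tests: deferred -> stub -> verified.
            (ItemKind.Test, PortStatus.Deferred, PortStatus.Stub) => true,
            (ItemKind.Test, PortStatus.Stub, PortStatus.Verified) => true,
            // Assumption: a blocked test goes back to deferred only with a reason.
            (ItemKind.Test, PortStatus.Stub, PortStatus.Deferred) =>
                !string.IsNullOrWhiteSpace(blockerReason),
            _ => false,
        };
}
```

Skipping a rung, such as moving a feature from stub straight to verified, is rejected by the default arm.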

5) Risk controls

  • Mandatory stub scans after each feature/test wave.
  • Build gate after each feature group.
  • Related test gate before any verified promotion.
  • Full checkpoint (build + full unit tests + commit) between groups.
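A stub scan can be as simple as a line-by-line pattern match over changed source files. The sketch below assumes three marker patterns (NotImplementedException throws, Assert.True(true), and TODO comments); the actual scan used by the verification protocol may look for a different set, and StubScanner is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical scanner for stub and always-pass markers.
public static class StubScanner
{
    // Assumed markers; adjust to match the project's real stub-scan rules.
    private static readonly Regex[] Markers =
    {
        new Regex(@"throw\s+new\s+NotImplementedException"),
        new Regex(@"Assert\.True\(\s*true\s*\)"),
        new Regex(@"//\s*TODO", RegexOptions.IgnoreCase),
    };

    // Returns the 1-based numbers of lines in `source` that match any marker.
    public static List<int> FindStubLines(string source)
    {
        var hits = new List<int>();
        var lines = source.Split('\n');
        for (var i = 0; i < lines.Length; i++)
        {
            foreach (var marker in Markers)
            {
                if (marker.IsMatch(lines[i]))
                {
                    hits.Add(i + 1);
                    break; // one hit per line is enough
                }
            }
        }
        return hits;
    }
}
```

A wave would pass this gate only when FindStubLines returns an empty list for every file it touched.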

Non-Goals

  • Port Batch 15 (MessageBlock + ConsumerFileStore) behaviors beyond what Batch 14 methods directly require.
  • Convert integration-only tests into unit tests by weakening assertions.
  • Mark blocked runtime-heavy tests verified without executable evidence.

Acceptance Criteria

  • All 76 Batch 14 features implemented with non-stub behavior and verified evidence.
  • All implementable mapped tests in Batch 14 converted to real behavioral tests and verified.
  • Runtime-blocked tests remain deferred with concrete blocker notes.
  • Batch 14 can be closed with the batch complete 14 command after status/audit validation.