natsnet/docs/plans/2026-02-27-batch-25-gateways-design.md
Joseph Doherty c05d93618e Add batch plans for batches 23-30 (rounds 12-15)
Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 16:33:10 -05:00

Batch 25 Gateways Design

Date: 2026-02-27
Batch: 25 (Gateways)
Scope: Design only. No implementation in this document.

Problem

Batch 25 ports gateway connection handling from golang/nats-server/server/gateway.go into the .NET server.

  • Features: 86 (all currently deferred)
  • Tests: 59 (all currently deferred)
  • Dependencies: batches 19 and 23
  • Go source: server/gateway.go
  • Batch status: pending

This batch is one of the highest fan-in networking areas: it touches server startup, inbound/outbound connection lifecycle, route gossip, account interest propagation, and reply-subject mapping.

Context Findings

Collected with:

  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 25 --db porting.db
  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch ready --db porting.db

Repository observations:

  • dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs already contains gateway data types, but method behavior is mostly unimplemented and includes at least one explicit TODO (GwReplyMapping.Get).
  • NatsServer.Init.cs still comments gateway initialization as deferred (s.NewGateway(opts)).
  • NatsServer.Lifecycle.cs includes gateway placeholders (GetAllGatewayConnections, RemoveRemoteGatewayConnection, GatewayUpdateSubInterest) that need full parity behavior.
  • ClientConnection.cs does not yet contain gateway protocol/message handlers (ProcessGatewayConnect, ProcessGatewayInfo, ProcessGatewayRSub, SendMsgToGateways, etc.).
  • GatewayHandlerTests.Impltests.cs currently has many placeholder-style tests (for example assertions against literal "Should" strings), so Batch 25 test work must include anti-stub cleanup.
  • batch ready currently does not list Batch 25 as ready, so implementation must not start until dependencies are complete.

Constraints and Success Criteria

Constraints:

  • Follow docs/standards/dotnet-standards.md (.NET 10, nullable enabled, xUnit 3 + Shouldly + NSubstitute).
  • Keep behavior equivalent to the Go intent in gateway.go; do not transliterate goroutine patterns line by line.
  • No stubs, no fake-pass tests, and no status promotion without evidence.
  • Execute in feature groups of at most ~20 features.
  • Dependency order is strict: Batch 25 execution only starts when batches 19 and 23 are complete.
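
The strict dependency rule can be expressed as a small gate. The following Go sketch is illustrative only: the `batchStatus` type, the `ready` function, and the status snapshot are hypothetical and are not part of PortTracker.

```go
package main

import "fmt"

// batchStatus maps a batch number to whether it is complete.
// Hypothetical type; the real tracker stores this in porting.db.
type batchStatus map[int]bool

// ready reports whether every dependency is complete and returns
// the list of dependencies that still block the batch.
func ready(deps []int, done batchStatus) (bool, []int) {
	var blockers []int
	for _, d := range deps {
		if !done[d] {
			blockers = append(blockers, d)
		}
	}
	return len(blockers) == 0, blockers
}

func main() {
	// Illustrative snapshot, not real tracker data.
	done := batchStatus{19: true, 23: false}
	ok, blockers := ready([]int{19, 23}, done)
	fmt.Println("batch 25 ready:", ok, "blocked by:", blockers)
}
```

A batch missing from the snapshot counts as incomplete, which errs on the side of not starting early.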

Success criteria:

  • All 86 features are either implemented and verified, or explicitly deferred with concrete blocker reasons.
  • All 59 mapped tests are real behavioral tests and pass.
  • Gateway regression paths remain green across server/account/route/leaf/monitoring touchpoints.
  • Batch 25 can be completed in PortTracker without audit override due to placeholder code.

Approaches

Approach A: Single-file gateway implementation

Place nearly all gateway methods into one large server file and keep client changes in ClientConnection.cs.

Trade-offs:

  • Pros: fewer files.
  • Cons: very high review risk, poor parallelism, hard evidence tracking, difficult rollback.

Approach B: Focused partial files

Split the implementation into focused partials:

  1. Config/bootstrap/listener and outbound solicitation
  2. Handshake/INFO/gossip and connection registry
  3. Interest/subscription propagation
  4. Reply-map and inbound/outbound message processing

Trade-offs:

  • Pros: maps directly to gateway.go sections, clean feature grouping, tighter verification loops, lower merge risk.
  • Cons: more files and coordination.

Approach C: Test-first backlog replacement then backfill feature methods

Replace all gateway-related placeholder tests first, then implement production behavior until tests pass.

Trade-offs:

  • Pros: immediate feedback.
  • Cons: current mapped tests underrepresent some feature paths unless supplemented with targeted regression gates.

Recommendation

Use Approach B with five feature groups (19/18/17/16/16) and four test waves (15/16/18/10) tied to gateway.go line ranges and mapped test IDs.
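
As a quick sanity check, the group sizes should cover all 86 features and 59 tests, and no feature group should exceed the ~20-feature limit. The Go sketch below verifies that arithmetic; `checkPlan` is a hypothetical helper. The document states no limit for test waves, so the test call passes the total as the limit.

```go
package main

import "fmt"

// checkPlan verifies that no group exceeds maxPerGroup and that the
// groups together cover exactly total items.
func checkPlan(groups []int, total, maxPerGroup int) error {
	sum := 0
	for i, g := range groups {
		if g > maxPerGroup {
			return fmt.Errorf("group %d has %d items, limit is %d", i+1, g, maxPerGroup)
		}
		sum += g
	}
	if sum != total {
		return fmt.Errorf("groups cover %d items, expected %d", sum, total)
	}
	return nil
}

func main() {
	features := []int{19, 18, 17, 16, 16} // feature groups from the plan
	tests := []int{15, 16, 18, 10}        // test waves from the plan

	fmt.Println("features:", checkPlan(features, 86, 20))
	fmt.Println("tests:", checkPlan(tests, 59, 59)) // no per-wave limit stated
}
```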

Architecture

  • NatsServer partials own gateway server state transitions: initialization, accept loop, solicitation/reconnect, remote registration, route gossip, account-level interest operations, and reply-map lifecycle.
  • ClientConnection partials own gateway protocol operations: CONNECT/INFO handling, RSub/RUnsub/account commands, inbound gateway message pipeline, and outbound gateway send decisions.
  • Gateway helper surface owns deterministic utilities: hash/prefix helpers, routed-reply parsing, string/proto builders, and gateway option validation.
  • Existing GatewayTypes remains the source of gateway state objects but receives method implementations and lock-safe helpers.

Proposed File Map

Primary production files:

  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayHandler.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConfigAndStartup.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConnectionsAndGossip.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.Interest.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ReplyMap.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Protocol.cs
  • Create: dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Messages.cs
  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Init.cs
  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Lifecycle.cs
  • Modify (as needed): dotnet/src/ZB.MOM.NatsNet.Server/Protocol/ProtocolParser.cs

Mapped test files:

  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/GatewayHandlerTests.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/MonitoringHandlerTests.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamEngineTests.Impltests.cs
  • Create/Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamSuperClusterTests.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests1.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests2.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConfigReloaderTests.Impltests.cs
  • Modify: dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/NatsServerTests.Impltests.cs

Data and Control Flow

  • Startup flow: validate options -> initialize gateway state (NewGateway) -> start accept loop -> solicit remotes after configured delay.
  • Handshake flow: outbound connection receives INFO -> emits CONNECT -> registers outbound; inbound accepts CONNECT -> validates/authorizes -> registers inbound.
  • Gossip flow: new gateway URLs and cluster metadata propagate to routes and inbound gateways.
  • Interest flow: account/subscription changes update per-account gateway state and trigger RS+/RS-/A+/A- commands.
  • Message flow: publish routing checks local interest, gateway interest mode, queue weights, and reply mapping before fanout.
  • Reply flow: _GR_/legacy reply prefixes are tracked, mapped, and expired through periodic cleanup.

Error Handling Strategy

  • Preserve Go behavior for protocol errors (wrong gateway, malformed gateway commands, invalid account commands).
  • Keep explicit guard clauses and structured log messages around malformed INFO/CONNECT and URL validation failures.
  • On transient dial failures use reconnect/backoff paths; on unrecoverable config violations fail fast.
  • For blocked items, defer with concrete reason instead of placeholder logic.

Verification Strategy

  • Use per-feature and per-test loops (read Go source, implement, build, run targeted tests).
  • Enforce mandatory stub scans for both production and test files after each group.
  • Require a build gate after each feature group and a full gateway-related test gate before any item is marked verified.
  • Enforce checkpoint protocol between tasks: full build + full unit test sweep + commit.
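
A stub scan can be as simple as a line-by-line search for forbidden markers. The Go sketch below shows the idea under stated assumptions: the `forbidden` list is hypothetical (this document does not enumerate the patterns), and the scanned snippet is an invented C# fragment rather than real repository code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// forbidden is a hypothetical list of stub markers; the real list
// would come from the project's anti-stub guardrails.
var forbidden = []string{
	"NotImplementedException",
	"TODO",
	`.ShouldContain("Should")`, // placeholder-style assertion
}

// finding records a 1-based line number and the pattern that matched.
type finding struct {
	line    int
	pattern string
}

// scan returns one finding per (line, pattern) match in src.
func scan(src string, patterns []string) []finding {
	var out []finding
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		for _, p := range patterns {
			if strings.Contains(sc.Text(), p) {
				out = append(out, finding{line: n, pattern: p})
			}
		}
	}
	return out
}

func main() {
	// Invented C# fragment for illustration only.
	sample := `public string Get() { throw new NotImplementedException(); }
public int Count => 0;
// TODO: port from gateway.go
"Should".ShouldContain("Should");`

	for _, f := range scan(sample, forbidden) {
		fmt.Printf("line %d: forbidden pattern %q\n", f.line, f.pattern)
	}
}
```

A plain substring match will also flag legitimate mentions of a marker, so a real gate would likely need an allowlist or a review step for hits.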

Risks and Mitigations

  • Risk: race regressions in gateway interest maps and reply map.
    • Mitigation: include no-race mapped tests (2376, 2490) in mandatory verification wave.
  • Risk: mismatch between route gossip and gateway URL updates.
    • Mitigation: include monitoring and reload mapped tests (2127, 2131, 2747) in verification gate.
  • Risk: placeholder test drift in GatewayHandlerTests.
    • Mitigation: anti-stub guardrails with explicit forbidden patterns and evidence-based status updates.

Design Approval Basis

This design is based on Batch 25 tracker metadata, current repository state, and the mandatory verification/anti-stub guardrail model from docs/plans/2026-02-27-batch-0-implementable-tests-plan.md, adapted for both features and tests.