natsnet/docs/plans/2026-02-27-batch-25-gateways-design.md
Joseph Doherty c05d93618e Add batch plans for batches 23-30 (rounds 12-15)
Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 16:33:10 -05:00


# Batch 25 Gateways Design
**Date:** 2026-02-27
**Batch:** 25 (`Gateways`)
**Scope:** Design only. No implementation in this document.
## Problem
Batch 25 ports gateway connection handling from `golang/nats-server/server/gateway.go` into the .NET server.
- Features: `86` (all currently `deferred`)
- Tests: `59` (all currently `deferred`)
- Dependencies: batches `19` and `23`
- Go source: `server/gateway.go`
- Batch status: `pending`
This batch is one of the highest fan-in networking areas: it touches server startup, inbound/outbound connection lifecycle, route gossip, account interest propagation, and reply-subject mapping.
## Context Findings
Collected with:
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 25 --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch ready --db porting.db`
Repository observations:
- `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs` already contains gateway data types, but method behavior is mostly unimplemented and includes at least one explicit TODO (`GwReplyMapping.Get`).
- `NatsServer.Init.cs` still has gateway initialization commented out as deferred (`s.NewGateway(opts)`).
- `NatsServer.Lifecycle.cs` includes gateway placeholders (`GetAllGatewayConnections`, `RemoveRemoteGatewayConnection`, `GatewayUpdateSubInterest`) that need full parity behavior.
- `ClientConnection.cs` does not yet contain gateway protocol/message handlers (`ProcessGatewayConnect`, `ProcessGatewayInfo`, `ProcessGatewayRSub`, `SendMsgToGateways`, etc.).
- `GatewayHandlerTests.Impltests.cs` currently has many placeholder-style tests (for example, assertions against literal `"Should"` strings), so Batch 25 test work must include anti-stub cleanup.
- `batch ready` currently does not list Batch 25 as ready, so implementation must not start until dependencies are complete.
## Constraints and Success Criteria
Constraints:
- Follow `docs/standards/dotnet-standards.md` (.NET 10, nullable enabled, xUnit 3 + Shouldly + NSubstitute).
- Keep behavior equivalent to the Go intent in `gateway.go`; do not transliterate goroutine patterns line by line.
- No stubs, no fake-pass tests, and no status promotion without evidence.
- Execute in feature groups of roughly 20 features at most.
- Dependency order is strict: Batch 25 execution only starts when batches 19 and 23 are complete.
Success criteria:
- All 86 features are either implemented and verified, or explicitly deferred with concrete blocker reasons.
- All 59 mapped tests are real behavioral tests and pass.
- Gateway regression paths remain green across server/account/route/leaf/monitoring touchpoints.
- Batch 25 can be completed in PortTracker without needing an audit override for placeholder code.
## Approaches
### Approach A: Single-file gateway implementation
Place nearly all gateway methods into one large server file and keep client changes in `ClientConnection.cs`.
Trade-offs:
- Pros: fewer files.
- Cons: very high review risk, poor parallelism, hard evidence tracking, difficult rollback.
### Approach B (Recommended): Domain-segmented partials by gateway lifecycle
Split implementation into focused partials:
1. Config/bootstrap/listener and outbound solicitation
2. Handshake/INFO/gossip and connection registry
3. Interest/subscription propagation
4. Reply-map and inbound/outbound message processing
Trade-offs:
- Pros: maps directly to `gateway.go` sections, clean feature grouping, tighter verification loops, lower merge risk.
- Cons: more files and coordination.
### Approach C: Test-first backlog replacement then backfill feature methods
Replace all gateway-related placeholder tests first, then implement production behavior until tests pass.
Trade-offs:
- Pros: immediate feedback.
- Cons: current mapped tests underrepresent some feature paths unless supplemented with targeted regression gates.
## Recommended Design
Use **Approach B** with five feature groups (`19/18/17/16/16`, totaling 86 features) and four test waves (`15/16/18/10`, totaling 59 tests), each tied to `gateway.go` line ranges and mapped test IDs.
### Architecture
- `NatsServer` partials own gateway server state transitions: initialization, accept loop, solicitation/reconnect, remote registration, route gossip, account-level interest operations, and reply-map lifecycle.
- `ClientConnection` partials own gateway protocol operations: CONNECT/INFO handling, RSub/RUnsub/account commands, inbound gateway message pipeline, and outbound gateway send decisions.
- `Gateway` helper surface owns deterministic utilities: hash/prefix helpers, routed-reply parsing, string/proto builders, and gateway option validation.
- Existing `GatewayTypes` remains the source of gateway state objects but receives method implementations and lock-safe helpers.
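
The deterministic helper surface described above could start from something like the following. This is a minimal sketch, not the planned implementation: the class and method names are hypothetical, and the `$GR.` legacy prefix and the `RS+`/`RS-`/`A+`/`A-` line layouts are assumptions that should be confirmed against `gateway.go` before anything is built on them.

```csharp
#nullable enable
using System;

// Hypothetical helper surface; none of these names come from the repository.
public static class GatewayProtoSketch
{
    // Routed-reply prefix named in this design.
    public const string ReplyPrefix = "_GR_.";

    // ASSUMPTION: the legacy prefix; verify against gateway.go.
    public const string OldReplyPrefix = "$GR.";

    // True when the subject starts with either routed-reply prefix.
    public static bool IsRoutedReply(string subject) =>
        subject.StartsWith(ReplyPrefix, StringComparison.Ordinal)
        || subject.StartsWith(OldReplyPrefix, StringComparison.Ordinal);

    // ASSUMPTION: "RS+ <account> <subject> [<queue> <weight>]\r\n".
    public static string BuildRSub(string account, string subject, string? queue = null, int weight = 1) =>
        queue is null
            ? $"RS+ {account} {subject}\r\n"
            : $"RS+ {account} {subject} {queue} {weight}\r\n";

    // ASSUMPTION: "RS- <account> <subject> [<queue>]\r\n".
    public static string BuildRUnsub(string account, string subject, string? queue = null) =>
        queue is null
            ? $"RS- {account} {subject}\r\n"
            : $"RS- {account} {subject} {queue}\r\n";

    // ASSUMPTION: "A+ <account>\r\n" and "A- <account>\r\n".
    public static string BuildAccountSub(string account) => $"A+ {account}\r\n";
    public static string BuildAccountUnsub(string account) => $"A- {account}\r\n";
}
```

Keeping these builders as pure static functions means they can be unit tested without a running server or a live connection.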
### Proposed File Map
Primary production files:
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayHandler.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConfigAndStartup.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConnectionsAndGossip.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.Interest.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ReplyMap.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Protocol.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Messages.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Init.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Lifecycle.cs`
- Modify (as needed): `dotnet/src/ZB.MOM.NatsNet.Server/Protocol/ProtocolParser.cs`
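
To illustrate how the partial files listed above would share state, here is a small sketch of two partial declarations of one class. The class name `GatewayServerSketch` and its members are hypothetical stand-ins for `NatsServer`; in the real project each declaration would live in its own file.

```csharp
using System.Collections.Generic;

// Would live in a file such as NatsServer.Gateways.ConnectionsAndGossip.cs.
public partial class GatewayServerSketch
{
    // Shared state is declared once and is visible to every partial.
    private readonly object _gwLock = new();
    private readonly Dictionary<string, string> _outbound = new();

    // Records (or replaces) the URL for a named outbound gateway.
    public void RegisterOutbound(string name, string url)
    {
        lock (_gwLock) { _outbound[name] = url; }
    }
}

// Would live in a second file; same class, different concern.
public partial class GatewayServerSketch
{
    // Reads the shared registry under the same lock.
    public int OutboundCount()
    {
        lock (_gwLock) { return _outbound.Count; }
    }
}
```

The compiler merges the declarations, so splitting by lifecycle stage changes review and merge boundaries without changing the type's public shape.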
Mapped test files:
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/GatewayHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/MonitoringHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamEngineTests.Impltests.cs`
- Create/Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamSuperClusterTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests1.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests2.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConfigReloaderTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/NatsServerTests.Impltests.cs`
### Data and Control Flow
- Startup flow: validate options -> initialize gateway state (`NewGateway`) -> start accept loop -> solicit remotes after configured delay.
- Handshake flow: outbound connection receives INFO -> emits CONNECT -> registers outbound; inbound accepts CONNECT -> validates/authorizes -> registers inbound.
- Gossip flow: new gateway URLs and cluster metadata propagate to routes and inbound gateways.
- Interest flow: account/subscription changes update per-account gateway state and trigger RS+/RS-/A+/A- commands.
- Message flow: publish routing checks local interest, gateway interest mode, queue weights, and reply mapping before fanout.
- Reply flow: `_GR_`/legacy reply prefixes are tracked, mapped, and expired through periodic cleanup.
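
The reply flow above (track, map, expire through periodic cleanup) can be sketched as a small lock-protected map with a time-to-live. This is an illustrative sketch only: `ReplyMapSketch`, its members, and the fixed TTL are hypothetical, and the actual mapping direction and expiry rules should be taken from `gateway.go`. Time is passed in explicitly so the behavior is deterministic under test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reply map with expiry; not the repository's GwReplyMapping.
public sealed class ReplyMapSketch
{
    private readonly object _lock = new();
    private readonly Dictionary<string, (string Mapped, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;

    public ReplyMapSketch(TimeSpan ttl) => _ttl = ttl;

    // Tracks a mapping that stays valid for one TTL from `now`.
    public void Track(string reply, string mapped, DateTimeOffset now)
    {
        lock (_lock) { _entries[reply] = (mapped, now + _ttl); }
    }

    // Returns the mapped subject only while the entry is unexpired.
    public bool TryGet(string reply, DateTimeOffset now, out string mapped)
    {
        lock (_lock)
        {
            if (_entries.TryGetValue(reply, out var entry) && entry.ExpiresAt > now)
            {
                mapped = entry.Mapped;
                return true;
            }
        }
        mapped = string.Empty;
        return false;
    }

    // Periodic cleanup: removes expired entries and reports how many.
    public int RemoveExpired(DateTimeOffset now)
    {
        lock (_lock)
        {
            var expired = new List<string>();
            foreach (var (key, value) in _entries)
            {
                if (value.ExpiresAt <= now) expired.Add(key);
            }
            foreach (var key in expired) _entries.Remove(key);
            return expired.Count;
        }
    }
}
```

A timer in the server partial could call `RemoveExpired` on an interval; injecting `now` rather than reading the clock inside the map keeps race and expiry tests repeatable.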
### Error Handling Strategy
- Preserve Go behavior for protocol errors (`wrong gateway`, malformed gateway commands, invalid account commands).
- Keep explicit guard clauses and structured log messages around malformed INFO/CONNECT and URL validation failures.
- On transient dial failures, use reconnect/backoff paths; on unrecoverable config violations, fail fast.
- For blocked items, defer with concrete reason instead of placeholder logic.
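
The split between transient dial failures and unrecoverable config violations might look like the sketch below. Everything here is an assumption made for illustration: `SolicitSketch`, `GatewayConfigException`, the fixed delay, and the attempt cap are hypothetical, and the real retry policy should follow `gateway.go`.

```csharp
using System;
using System.IO;

// Hypothetical exception type for violations that must not be retried.
public sealed class GatewayConfigException : Exception
{
    public GatewayConfigException(string message) : base(message) { }
}

public static class SolicitSketch
{
    // Calls `dial` up to maxAttempts times, invoking `wait` between attempts.
    // IOException is treated as transient and retried; any other exception,
    // including GatewayConfigException, propagates at once (fail fast).
    public static bool TryConnect(Func<bool> dial, int maxAttempts, TimeSpan delay, Action<TimeSpan> wait)
    {
        if (maxAttempts <= 0) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                if (dial()) return true;
            }
            catch (IOException)
            {
                // Transient failure: fall through to the retry delay.
            }

            if (attempt < maxAttempts) wait(delay);
        }

        return false;
    }
}
```

Passing `wait` as a delegate lets tests count delays instead of sleeping.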
### Verification Strategy
- Use per-feature and per-test loops (read Go source, implement, build, run targeted tests).
- Enforce mandatory stub scans for both production and test files after each group.
- Require a build gate after each feature group and a full gateway-related test gate before promoting any item to `verified`.
- Enforce checkpoint protocol between tasks: full build + full unit test sweep + commit.
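
A stub scan of the kind required above could be as simple as the following sketch. The forbidden patterns are assumptions drawn from the placeholders this document mentions (explicit TODOs, assertions on literal `"Should"` strings) plus `NotImplementedException`; the authoritative list belongs to the project's guardrail documents, and `StubScanSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StubScanSketch
{
    // ASSUMED forbidden patterns; replace with the project's real list.
    private static readonly Regex[] Forbidden =
    {
        new Regex(@"NotImplementedException"),
        new Regex(@"//\s*TODO", RegexOptions.IgnoreCase),
        new Regex(@"""Should""\s*\.\s*Should"),
    };

    // Returns the 1-based line number and trimmed text of each offending line.
    public static List<(int Line, string Text)> Scan(string source)
    {
        var hits = new List<(int Line, string Text)>();
        var lines = source.Split('\n');

        for (var i = 0; i < lines.Length; i++)
        {
            foreach (var pattern in Forbidden)
            {
                if (pattern.IsMatch(lines[i]))
                {
                    hits.Add((i + 1, lines[i].Trim()));
                    break; // Report each line once, even if several patterns match.
                }
            }
        }

        return hits;
    }
}
```

Running such a scan over both production and test files after each group, and failing the gate on any hit, gives the evidence trail that status promotion depends on.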
### Risks and Mitigations
- Risk: race regressions in gateway interest maps and reply map.
  - Mitigation: include no-race mapped tests (`2376`, `2490`) in mandatory verification wave.
- Risk: mismatch between route gossip and gateway URL updates.
  - Mitigation: include monitoring and reload mapped tests (`2127`, `2131`, `2747`) in verification gate.
- Risk: placeholder test drift in `GatewayHandlerTests`.
  - Mitigation: anti-stub guardrails with explicit forbidden patterns and evidence-based status updates.
## Design Approval Basis
This design is based on Batch 25 tracker metadata, current repository state, and the mandatory verification/anti-stub guardrail model from `docs/plans/2026-02-27-batch-0-implementable-tests-plan.md`, adapted for both features and tests.