# Batch 25 Gateways Design

**Date:** 2026-02-27
**Batch:** 25 (`Gateways`)
**Scope:** Design only. No implementation in this document.

## Problem

Batch 25 ports gateway connection handling from `golang/nats-server/server/gateway.go` into the .NET server.

- Features: `86` (all currently `deferred`)
- Tests: `59` (all currently `deferred`)
- Dependencies: batches `19` and `23`
- Go source: `server/gateway.go`
- Batch status: `pending`

This batch is one of the highest fan-in networking areas: it touches server startup, inbound/outbound connection lifecycle, route gossip, account interest propagation, and reply-subject mapping.

## Context Findings

Collected with:

- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 25 --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch ready --db porting.db`

Repository observations:

- `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs` already contains gateway data types, but method behavior is mostly unimplemented and includes at least one explicit TODO (`GwReplyMapping.Get`).
- `NatsServer.Init.cs` still comments gateway initialization as deferred (`s.NewGateway(opts)`).
- `NatsServer.Lifecycle.cs` includes gateway placeholders (`GetAllGatewayConnections`, `RemoveRemoteGatewayConnection`, `GatewayUpdateSubInterest`) that need full parity behavior.
- `ClientConnection.cs` does not yet contain gateway protocol/message handlers (`ProcessGatewayConnect`, `ProcessGatewayInfo`, `ProcessGatewayRSub`, `SendMsgToGateways`, etc.).
- `GatewayHandlerTests.Impltests.cs` currently has many placeholder-style tests (for example assertions against literal `"Should"` strings), so Batch 25 test work must include anti-stub cleanup.
- `batch ready` currently does not list Batch 25 as ready, so implementation must not start until dependencies are complete.

## Constraints and Success Criteria

Constraints:

- Follow `docs/standards/dotnet-standards.md` (.NET 10, nullable enabled, xUnit 3 + Shouldly + NSubstitute).
- Keep behavior equivalent to Go intent in `gateway.go`; do not transliterate goroutine patterns line by line.
- No stubs, no fake-pass tests, and no status promotion without evidence.
- Execute in feature groups of roughly 20 features or fewer.
- Dependency order is strict: Batch 25 execution only starts when batches 19 and 23 are complete.

Success criteria:

- All 86 features are either implemented and verified, or explicitly deferred with concrete blocker reasons.
- All 59 mapped tests are real behavioral tests and pass.
- Gateway regression paths remain green across server/account/route/leaf/monitoring touchpoints.
- Batch 25 can be marked complete in PortTracker without an audit override for placeholder code.

## Approaches

### Approach A: Single-file gateway implementation

Place nearly all gateway methods into one large server file and keep client changes in `ClientConnection.cs`.

Trade-offs:

- Pros: fewer files.
- Cons: very high review risk, poor parallelism, hard evidence tracking, difficult rollback.

### Approach B (Recommended): Domain-segmented partials by gateway lifecycle

Split implementation into focused partials:

1. Config/bootstrap/listener and outbound solicitation
2. Handshake/INFO/gossip and connection registry
3. Interest/subscription propagation
4. Reply-map and inbound/outbound message processing

Trade-offs:

- Pros: maps directly to `gateway.go` sections, clean feature grouping, tighter verification loops, lower merge risk.
- Cons: more files and coordination.
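
To make the partial split concrete, here is a minimal sketch of how one type can be spread across lifecycle-focused partial declarations. `SketchServer` and every member name in it are hypothetical stand-ins, not the real `NatsServer` API; in the actual layout each partial would sit in its own file.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for NatsServer; all member names are illustrative assumptions.
public sealed partial class SketchServer
{
    // Partial 1: config/bootstrap/listener and outbound solicitation.
    private readonly List<string> _trace = new();
    public IReadOnlyList<string> Trace => _trace;

    public void StartGateways(string remoteName)
    {
        _trace.Add("startup");
        RegisterOutbound(remoteName);
    }
}

public sealed partial class SketchServer
{
    // Partial 2: handshake/INFO/gossip and connection registry.
    private void RegisterOutbound(string remoteName)
    {
        _trace.Add($"registered:{remoteName}");
        SendInterest(remoteName);
    }
}

public sealed partial class SketchServer
{
    // Partial 3: interest/subscription propagation.
    private void SendInterest(string remoteName) => _trace.Add($"interest:{remoteName}");
}

public sealed partial class SketchServer
{
    // Partial 4: reply-map and inbound/outbound message processing.
    public void RouteMessage(string subject) => _trace.Add($"route:{subject}");
}
```

Because all partials compile into one type, private members remain shared across files, which keeps each file reviewable on its own without widening visibility.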

### Approach C: Test-first backlog replacement then backfill feature methods

Replace all gateway-related placeholder tests first, then implement production behavior until tests pass.

Trade-offs:

- Pros: immediate feedback.
- Cons: current mapped tests underrepresent some feature paths unless supplemented with targeted regression gates.

## Recommended Design

Use **Approach B** with five feature groups (`19/18/17/16/16`) and four test waves (`15/16/18/10`) tied to `gateway.go` line ranges and mapped test IDs.
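
The group sizes can be sanity-checked against the batch totals and the group-size constraint. The helper below is a small sketch with a hypothetical name (`BatchPlanCheck`); it only confirms that a split adds up to the expected total and that no group exceeds a given cap.

```csharp
using System.Linq;

// Hypothetical helper for checking a planned split; not part of PortTracker.
public static class BatchPlanCheck
{
    public static bool IsValidSplit(int[] groupSizes, int expectedTotal, int maxPerGroup) =>
        groupSizes.Sum() == expectedTotal
        && groupSizes.All(size => size > 0 && size <= maxPerGroup);
}
```

With the numbers above, `IsValidSplit(new[] { 19, 18, 17, 16, 16 }, 86, 20)` holds for the feature groups, and the test waves `15/16/18/10` sum to 59.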

### Architecture

- `NatsServer` partials own gateway server state transitions: initialization, accept loop, solicitation/reconnect, remote registration, route gossip, account-level interest operations, and reply-map lifecycle.
- `ClientConnection` partials own gateway protocol operations: CONNECT/INFO handling, RSub/RUnsub/account commands, inbound gateway message pipeline, and outbound gateway send decisions.
- `Gateway` helper surface owns deterministic utilities: hash/prefix helpers, routed-reply parsing, string/proto builders, and gateway option validation.
- Existing `GatewayTypes` remains the source of gateway state objects but receives method implementations and lock-safe helpers.
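
As an illustration of the kind of deterministic utility the helper surface would hold, the sketch below parses a routed reply subject. It assumes a layout of `_GR_.<clusterHash>.<serverHash>.<originalReply>`; that layout, the type name `GatewayReplySketch`, and the method signature are assumptions for illustration and must be confirmed against `gateway.go` before any real implementation. The legacy prefix is not handled here.

```csharp
using System;

// Hypothetical helper; the subject layout is an assumption to verify against gateway.go.
public static class GatewayReplySketch
{
    public const string RoutedPrefix = "_GR_.";

    public static bool TryParseRoutedReply(
        string subject, out string clusterHash, out string serverHash, out string originalReply)
    {
        clusterHash = serverHash = originalReply = string.Empty;
        if (!subject.StartsWith(RoutedPrefix, StringComparison.Ordinal))
        {
            return false;
        }

        // Split into at most three parts so dots inside the original reply are preserved.
        string[] parts = subject.Substring(RoutedPrefix.Length).Split('.', 3);
        if (parts.Length != 3 || parts[0].Length == 0 || parts[1].Length == 0 || parts[2].Length == 0)
        {
            return false;
        }

        clusterHash = parts[0];
        serverHash = parts[1];
        originalReply = parts[2];
        return true;
    }
}
```

Keeping helpers like this free of server state makes them easy to cover with plain unit tests before the connection-level code exists.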

### Proposed File Map

Primary production files:

- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayTypes.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/Gateway/GatewayHandler.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConfigAndStartup.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ConnectionsAndGossip.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.Interest.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Gateways.ReplyMap.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Protocol.cs`
- Create: `dotnet/src/ZB.MOM.NatsNet.Server/ClientConnection.Gateways.Messages.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Init.cs`
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.Lifecycle.cs`
- Modify (as needed): `dotnet/src/ZB.MOM.NatsNet.Server/Protocol/ProtocolParser.cs`

Mapped test files:

- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/GatewayHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/LeafNodeHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/MonitoringHandlerTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamEngineTests.Impltests.cs`
- Create/Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamSuperClusterTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests1.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConcurrencyTests2.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/ConfigReloaderTests.Impltests.cs`
- Modify: `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/NatsServerTests.Impltests.cs`

### Data and Control Flow

- Startup flow: validate options -> initialize gateway state (`NewGateway`) -> start accept loop -> solicit remotes after configured delay.
- Handshake flow: outbound connection receives INFO -> emits CONNECT -> registers outbound; inbound accepts CONNECT -> validates/authorizes -> registers inbound.
- Gossip flow: new gateway URLs and cluster metadata propagate to routes and inbound gateways.
- Interest flow: account/subscription changes update per-account gateway state and trigger RS+/RS-/A+/A- commands.
- Message flow: publish routing checks local interest, gateway interest mode, queue weights, and reply mapping before fanout.
- Reply flow: `_GR_`/legacy reply prefixes are tracked, mapped, and expired through periodic cleanup.
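
The reply flow in the last bullet (track, map, expire) can be sketched as a small lock-guarded map with time-based cleanup. This is not the `GwReplyMapping` implementation; `ReplyMapSketch`, its method names, and the caller-supplied clock are assumptions chosen to keep the example deterministic and testable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reply map with expiry; names and TTL handling are illustrative assumptions.
public sealed class ReplyMapSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<string, (string Mapped, DateTime ExpiresUtc)> _entries = new();

    // Records a mapping that stays valid until nowUtc + ttl.
    public void Track(string reply, string mapped, DateTime nowUtc, TimeSpan ttl)
    {
        lock (_gate)
        {
            _entries[reply] = (mapped, nowUtc + ttl);
        }
    }

    // Returns the mapped subject only while the entry has not expired.
    public bool TryGet(string reply, DateTime nowUtc, out string mapped)
    {
        lock (_gate)
        {
            if (_entries.TryGetValue(reply, out var entry) && entry.ExpiresUtc > nowUtc)
            {
                mapped = entry.Mapped;
                return true;
            }
        }

        mapped = string.Empty;
        return false;
    }

    // Meant to be driven by a periodic timer; returns the number of removed entries.
    public int RemoveExpired(DateTime nowUtc)
    {
        lock (_gate)
        {
            var expired = new List<string>();
            foreach (var pair in _entries)
            {
                if (pair.Value.ExpiresUtc <= nowUtc)
                {
                    expired.Add(pair.Key);
                }
            }

            foreach (string key in expired)
            {
                _entries.Remove(key);
            }

            return expired.Count;
        }
    }
}
```

Passing the current time in as a parameter, rather than reading the system clock inside the map, lets expiry tests run without sleeps.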

### Error Handling Strategy

- Preserve Go behavior for protocol errors (`wrong gateway`, malformed gateway commands, invalid account commands).
- Keep explicit guard clauses and structured log messages around malformed INFO/CONNECT and URL validation failures.
- On transient dial failures use reconnect/backoff paths; on unrecoverable config violations fail fast.
- For blocked items, defer with concrete reason instead of placeholder logic.
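
For the transient-failure path, a reconnect delay calculation might look like the sketch below. Capped exponential backoff is an assumption made here for illustration; the actual reconnect delay policy must be taken from `gateway.go`, and `ReconnectBackoffSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical delay policy: capped exponential backoff (an assumption, not the Go behavior).
public static class ReconnectBackoffSketch
{
    public static TimeSpan DelayForAttempt(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt), "Attempts are 1-based.");
        }

        // Clamp the exponent so very large attempt counts cannot overflow the multiplication.
        int exponent = Math.Min(attempt - 1, 30);
        double milliseconds = baseDelay.TotalMilliseconds * Math.Pow(2, exponent);
        return TimeSpan.FromMilliseconds(Math.Min(milliseconds, maxDelay.TotalMilliseconds));
    }
}
```

Unrecoverable configuration violations would bypass this path entirely and fail at startup instead of retrying.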

### Verification Strategy

- Use per-feature and per-test loops (read Go source, implement, build, run targeted tests).
- Enforce mandatory stub scans for both production and test files after each group.
- Require build gate after each feature group and full gateway-related test gate before `verified`.
- Enforce checkpoint protocol between tasks: full build + full unit test sweep + commit.
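
A stub scan can be as simple as a line-by-line search for forbidden markers. The sketch below is one possible shape; `StubScanSketch` and its marker list are hypothetical, and the authoritative list of forbidden patterns belongs to the guardrail plan rather than to this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scanner; the marker list is an illustrative assumption.
public static class StubScanSketch
{
    private static readonly string[] Forbidden =
    {
        "NotImplementedException",
        "TODO",
        "\"Should\"",
    };

    // Returns one hit per (1-based line number, marker) pair found in the source text.
    public static List<(int Line, string Marker)> Scan(string source)
    {
        var hits = new List<(int Line, string Marker)>();
        string[] lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            foreach (string marker in Forbidden)
            {
                if (lines[i].Contains(marker, StringComparison.Ordinal))
                {
                    hits.Add((i + 1, marker));
                }
            }
        }

        return hits;
    }
}
```

A group would pass this gate only when the scan returns no hits for the production and test files it touched.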

### Risks and Mitigations

- Risk: race regressions in gateway interest maps and reply map.
- Mitigation: include no-race mapped tests (`2376`, `2490`) in mandatory verification wave.
- Risk: mismatch between route gossip and gateway URL updates.
- Mitigation: include monitoring and reload mapped tests (`2127`, `2131`, `2747`) in verification gate.
- Risk: placeholder test drift in `GatewayHandlerTests`.
- Mitigation: anti-stub guardrails with explicit forbidden patterns and evidence-based status updates.
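
To illustrate the first risk, the sketch below shows one lock-safe way to count interest per account and subject, reporting only the transitions on which a protocol update would plausibly be sent. `InterestCounterSketch` is a hypothetical type, and the rule "emit only on 0 to 1 and 1 to 0 transitions" is an assumption for illustration that must be checked against the interest-mode logic in `gateway.go`.

```csharp
using System.Collections.Generic;

// Hypothetical interest counter; the transition rule is an assumption, not confirmed Go behavior.
public sealed class InterestCounterSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<(string Account, string Subject), int> _counts = new();

    // Returns true only when interest goes from zero to one.
    public bool Add(string account, string subject)
    {
        lock (_gate)
        {
            var key = (account, subject);
            _counts.TryGetValue(key, out int current);
            _counts[key] = current + 1;
            return current == 0;
        }
    }

    // Returns true only when the last remaining interest is removed.
    public bool Remove(string account, string subject)
    {
        lock (_gate)
        {
            var key = (account, subject);
            if (!_counts.TryGetValue(key, out int current))
            {
                return false;
            }

            if (current <= 1)
            {
                _counts.Remove(key);
                return true;
            }

            _counts[key] = current - 1;
            return false;
        }
    }
}
```

Doing the read, update, and transition decision under one lock is what prevents two concurrent subscribers from both observing a zero count.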

## Design Approval Basis

This design is based on Batch 25 tracker metadata, current repository state, and the mandatory verification/anti-stub guardrail model from `docs/plans/2026-02-27-batch-0-implementable-tests-plan.md`, adapted for both features and tests.