# Batch 41 MQTT Client/IO Design

**Date:** 2026-02-27
**Batch:** 41 (`MQTT Client/IO`)
**Scope:** 74 features + 28 unit tests
**Dependency:** Batch `40` (`MQTT Server/JSA`)
**Go source:** `golang/nats-server/server/mqtt.go` (lines ~3503 through ~5882)

## Problem

Batch 41 is the MQTT client protocol and I/O execution surface:

- CONNECT/PUBLISH/SUBSCRIBE/UNSUBSCRIBE parse and response paths
- QoS1/QoS2 flow control and PUBREL lifecycle
- retain handling and retained-message permission checks
- topic/filter <-> subject conversion logic
- byte-level reader/writer utilities used across MQTT packet handling

This batch could easily look "complete" on the strength of placeholders: many mapped tests are currently template-style, and several mapped test IDs require non-MQTT infrastructure. The design must therefore demand evidence for every completion and record explicit deferrals instead of stubs.
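The byte-level reader/writer utilities include MQTT's variable-length "Remaining Length" field (MQTT 3.1.1, section 2.2.3): 7 data bits per byte, least significant group first, with bit `0x80` set when another byte follows, and at most four bytes. The Go sketch below illustrates that encoding so the port has a concrete reference point. The function names are hypothetical and are not taken from `mqtt.go` or from the .NET `MqttReader`/`MqttWriter`.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRemainingLength is the largest value the four-byte MQTT encoding can carry.
const maxRemainingLength = 268_435_455

var errMalformedVarint = errors.New("malformed remaining length")

// encodeRemainingLength (hypothetical name) writes n with 7 data bits per byte,
// setting the continuation bit 0x80 on every byte except the last.
func encodeRemainingLength(n int) ([]byte, error) {
	if n < 0 || n > maxRemainingLength {
		return nil, fmt.Errorf("remaining length %d out of range", n)
	}
	out := make([]byte, 0, 4)
	for {
		b := byte(n % 128)
		n /= 128
		if n > 0 {
			b |= 0x80 // more bytes follow
		}
		out = append(out, b)
		if n == 0 {
			return out, nil
		}
	}
}

// decodeRemainingLength (hypothetical name) reads a value from the front of buf
// and returns it with the number of bytes consumed.
func decodeRemainingLength(buf []byte) (int, int, error) {
	value, multiplier := 0, 1
	for i := 0; i < 4; i++ {
		if i >= len(buf) {
			return 0, 0, errMalformedVarint // input ended mid-value
		}
		b := buf[i]
		value += int(b&0x7F) * multiplier
		if b&0x80 == 0 {
			return value, i + 1, nil
		}
		multiplier *= 128
	}
	return 0, 0, errMalformedVarint // continuation bit still set on the fourth byte
}

func main() {
	for _, n := range []int{0, 127, 128, 16383, 16384, maxRemainingLength} {
		enc, _ := encodeRemainingLength(n)
		dec, used, _ := decodeRemainingLength(enc)
		fmt.Printf("%d -> % X -> %d (%d bytes)\n", n, enc, dec, used)
	}
}
```

Rejecting a fifth continuation byte, rather than reading on, keeps a malformed packet from driving the reader past the four-byte limit.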

## Context Findings

### Required command outputs

Executed with an explicit runtime path because `dotnet` is not on `PATH` in this shell:

- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 41 --db porting.db`
  - Status: `pending`
  - Features: `74` (all currently `deferred`)
  - Tests: `28` (all currently `deferred`)
  - Depends on: `40`
  - Go file: `server/mqtt.go`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db`
  - Confirms Batch 41 is the final batch and depends only on Batch 40.
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db`
  - Current snapshot: `1924/6942 (27.7%)`.

### Current .NET baseline

- MQTT types/constants exist in:
  - `dotnet/src/ZB.MOM.NatsNet.Server/Mqtt/MqttConstants.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/Mqtt/MqttTypes.cs`
- The methods mapped to Batch 41 are not implemented yet, and many target methods and classes do not exist.
- `MqttHandler.cs` still contains many `NotImplemented` stubs across its server extension methods.
- `MqttHandlerTests.Impltests.cs` and related backlog files still contain placeholder assertions.
- The mapped test set includes cross-module classes (`AuthCalloutTests`, `NatsConsumerTests`, `LeafNodeHandlerTests`, `WebSocketHandlerTests`, `JetStreamClusterTests3`) in addition to MQTT-specific classes.

## Approach Options

### Approach A: Single-file incremental port in `MqttHandler.cs`

- Pros: minimal file churn.
- Cons: high merge conflict risk, poor reviewability, weak ownership boundaries for 74 methods.

### Approach B: Tests-first for all 28 IDs before feature work

- Pros: strict red/green discipline.
- Cons: so many method surfaces are missing that the red phase would be very noisy, and many test IDs depend on deferred features outside Batch 41.

### Approach C (Recommended): Feature-sliced implementation groups + test waves with explicit deferred protocol

- Pros: aligns with the Go function clusters, keeps each group at <=20 features, enables mandatory checkpoint evidence, and handles cross-module tests without pressure to fake passes.
- Cons: requires an initial mapping-alignment step and disciplined status batching.

**Decision:** Approach C.

## Proposed Design

### 1. File and component layout

Implement Batch 41 in focused MQTT slices (new files/partials as needed), not a single monolith:

- `ClientConnection` MQTT packet handlers/parsers
  - `MqttParseConnect`, `MqttParsePub`, `MqttParseSubsOrUnsubs`, `MqttProcessPublishReceived`, enqueue acks
- `NatsServer` MQTT publish/session bridge handlers
  - `MqttProcessConnect`, `MqttProcessPub`, `MqttProcessPubRel`, retained permissions audit
- `MqttSession` QoS2 consumer/pubrel helpers
  - `TrackAsPubRel`, `UntrackPubRel`, `DeleteConsumer`, `ProcessJSConsumer`
- `MqttReader` / `MqttWriter` byte I/O utilities
  - varint, length-prefixed strings/bytes, publish header encoding
- Stateless conversion/trace/helper methods
  - topic/filter conversion, reserve-sub logic, trace formatters, Sparkplug-B helpers
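As a rough illustration of the topic/filter conversion slice, the Go sketch below maps MQTT levels to NATS tokens: `/` becomes `.`, `+` becomes `*`, and `#` becomes `>`, with wildcards accepted only in filters (MQTT forbids them in PUBLISH topic names). `topicToSubject` and its error values are hypothetical. The sketch deliberately rejects `.`, spaces, NATS wildcard characters, and empty levels instead of reproducing whatever escaping `mqtt.go` performs, so it is a reading aid, not a substitute for porting the Go logic.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errWildcardInTopic = errors.New("wildcards are not allowed in a publish topic")
	errUnsupported     = errors.New("unsupported character or empty level")
	errBadWildcard     = errors.New("wildcard must fill a whole level and '#' must be last")
)

// topicToSubject (hypothetical) converts an MQTT topic name (allowWildcards=false)
// or topic filter (allowWildcards=true) into a NATS subject.
func topicToSubject(topic string, allowWildcards bool) (string, error) {
	if topic == "" {
		return "", errUnsupported
	}
	levels := strings.Split(topic, "/")
	for i, lvl := range levels {
		switch {
		case lvl == "" || strings.ContainsAny(lvl, ". *>"):
			// Simplification: reject instead of escaping these cases.
			return "", errUnsupported
		case lvl == "+" || lvl == "#":
			if !allowWildcards {
				return "", errWildcardInTopic
			}
			if lvl == "#" && i != len(levels)-1 {
				return "", errBadWildcard
			}
			if lvl == "+" {
				levels[i] = "*" // single-level wildcard
			} else {
				levels[i] = ">" // multi-level wildcard
			}
		case strings.ContainsAny(lvl, "+#"):
			if !allowWildcards {
				return "", errWildcardInTopic
			}
			return "", errBadWildcard // e.g. "a+" mixes text and a wildcard
		}
	}
	return strings.Join(levels, "."), nil
}

func main() {
	for _, f := range []string{"sport/tennis/+", "sport/#", "a/b+"} {
		s, err := topicToSubject(f, true)
		fmt.Printf("%q -> %q, %v\n", f, s, err)
	}
}
```

A plain token mapping also leaves one semantic gap: in MQTT, `sport/#` matches `sport` itself, whereas the NATS subject `sport.>` does not, so multi-level filters need extra handling in the port.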

### 2. Feature grouping model (max ~20 each)

- Group A (19): session + connect + initial publish path (`2331-2349`)
- Group B (19): retained/perms + QoS ack processing + subscribe callbacks (`2350-2368`)
- Group C (17): reserved-sub/sparkplug + subscribe processing + unsubscribe/ping (`2369-2385`)
- Group D (19): conversion + reader/writer I/O (`2386-2404`)

### 3. Test-wave model (28 tests)

- Wave T1 (10): deterministic MQTT parser/conversion tests (`2170,2171,2190,2191,2194,2195,2196,2199,2200,2229`)
- Wave T2 (11): publish/retain/session behavior tests (`2182,2204,2234,2235,2236,2237,2238,2246,2251,2253,2285`)
- Wave T3 (4): cross-module integration-touching tests (`115,1258,1924,3095`)
- Wave T4 (3): cluster-dependent tests (`1055,1056,1113`)
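Because the feature groups and test waves above are hand-written ID lists, a small consistency check can catch gaps, overlaps, or miscounts before any status work begins. This Go sketch (the `idRange` type and both check functions are hypothetical) confirms that the four feature groups are contiguous, each at most 20 IDs wide, and cover 74 features, and that the four waves hold 28 distinct test IDs.

```go
package main

import "fmt"

// idRange is an inclusive range of feature IDs.
type idRange struct {
	name        string
	first, last int
}

var batch41Groups = []idRange{{"A", 2331, 2349}, {"B", 2350, 2368}, {"C", 2369, 2385}, {"D", 2386, 2404}}

var batch41Waves = map[string][]int{
	"T1": {2170, 2171, 2190, 2191, 2194, 2195, 2196, 2199, 2200, 2229},
	"T2": {2182, 2204, 2234, 2235, 2236, 2237, 2238, 2246, 2251, 2253, 2285},
	"T3": {115, 1258, 1924, 3095},
	"T4": {1055, 1056, 1113},
}

// checkGroups verifies that ranges are contiguous and at most maxSize wide,
// and returns the total number of IDs they cover.
func checkGroups(groups []idRange, maxSize int) (int, error) {
	total := 0
	for i, g := range groups {
		if i > 0 && g.first != groups[i-1].last+1 {
			return 0, fmt.Errorf("group %s does not continue from group %s", g.name, groups[i-1].name)
		}
		size := g.last - g.first + 1
		if size < 1 || size > maxSize {
			return 0, fmt.Errorf("group %s has %d features", g.name, size)
		}
		total += size
	}
	return total, nil
}

// checkWaves counts test IDs across waves, rejecting any ID listed twice.
func checkWaves(waves map[string][]int) (int, error) {
	seen := map[int]string{}
	for name, ids := range waves {
		for _, id := range ids {
			if prev, ok := seen[id]; ok {
				return 0, fmt.Errorf("test %d is in waves %s and %s", id, prev, name)
			}
			seen[id] = name
		}
	}
	return len(seen), nil
}

func main() {
	features, err := checkGroups(batch41Groups, 20)
	fmt.Println("features:", features, err)
	tests, err := checkWaves(batch41Waves)
	fmt.Println("tests:", tests, err)
}
```

Run as written, it prints `features: 74 <nil>` and `tests: 28 <nil>`, matching the batch scope.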

### 4. Deferred strategy by design

Because some mapped tests depend on deferred features outside Batch 41, the completion criteria are:

- implement and verify everything that is truly executable in a local unit-test context,
- mark blockers `deferred` with a specific reason and supporting evidence,
- never substitute placeholders for real implementations or tests.
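One way to make these criteria mechanical is to validate every proposed status change before it reaches the tracker. In this Go sketch, `StatusChange`, `validate`, and the `verified` status name are assumptions for illustration; only `deferred` appears in the tracker output quoted earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a tracker status value; "verified" is an assumed name.
type Status string

const (
	Deferred Status = "deferred"
	Verified Status = "verified"
)

// StatusChange is a hypothetical record of one proposed status update.
type StatusChange struct {
	ID       int
	To       Status
	Reason   string   // required when deferring
	Evidence []string // e.g. build or test-run excerpts
}

// validate rejects changes that lack evidence, and deferrals without a reason.
func validate(c StatusChange) error {
	if len(c.Evidence) == 0 {
		return fmt.Errorf("id %d: no evidence attached", c.ID)
	}
	switch c.To {
	case Deferred:
		if strings.TrimSpace(c.Reason) == "" {
			return fmt.Errorf("id %d: deferral needs a specific reason", c.ID)
		}
	case Verified:
		// Evidence is mandatory (checked above); no extra fields are needed here.
	default:
		return fmt.Errorf("id %d: unknown status %q", c.ID, c.To)
	}
	return nil
}

func main() {
	changes := []StatusChange{
		{ID: 1055, To: Deferred, Reason: "requires a running JetStream cluster", Evidence: []string{"test run excerpt"}},
		{ID: 2170, To: Verified},
	}
	for _, c := range changes {
		fmt.Println(c.ID, validate(c))
	}
}
```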

### 5. Verification architecture

The implementation plan will enforce:

- per-feature verification loop before promotion,
- per-test verification loop with evidence that each test is discovered and passes,
- stub detection checks on source and tests,
- build/test gates before every status update,
- status updates limited to <=15 IDs per batch-update,
- mandatory checkpoints between task groups.
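The <=15-IDs-per-update limit is easy to enforce by chunking the ID list before issuing updates. The hypothetical `chunkIDs` below splits a list into consecutive batches of at most `size` IDs; applied to the 74 feature IDs `2331-2404`, it yields four batches of 15 and one of 14.

```go
package main

import "fmt"

// chunkIDs splits ids into consecutive batches of at most size elements,
// so no single status update touches more than size IDs.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		panic("chunk size must be positive")
	}
	var out [][]int
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		out = append(out, ids[:n:n]) // cap the sub-slice so appends cannot clobber the next batch
		ids = ids[n:]
	}
	return out
}

func main() {
	ids := make([]int, 0, 74)
	for id := 2331; id <= 2404; id++ {
		ids = append(ids, id)
	}
	for _, batch := range chunkIDs(ids, 15) {
		fmt.Println(len(batch), batch[0], batch[len(batch)-1])
	}
}
```

Chunking per feature group instead of across the whole batch keeps each update aligned with a checkpoint; Group A's 19 IDs would then go out as 15 + 4.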

## Risks and Mitigations

- Risk: class/method mapping drift (mapped methods not yet present).
  - Mitigation: dedicated mapping-alignment preflight task before feature status changes.
- Risk: placeholder tests pass without exercising MQTT logic.
  - Mitigation: anti-stub scans, minimum-assertion checks, and required evidence from single-test runs.
- Risk: cluster/integration tests block throughput.
  - Mitigation: an explicit deferred-with-reason path, so work continues on unblocked items.
- Risk: the large parser/I/O surface hides regressions.
  - Mitigation: incremental group checkpoints with a full build plus targeted and full test gates.
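The anti-stub scan could start as a line-oriented pattern search over the C# sources and tests. This Go sketch is illustrative only: `stubPatterns`, `scanForStubs`, and the specific patterns (`NotImplementedException`, `Assert.True(true)`, `// TODO`) are assumptions, and the real list should come from the placeholder patterns actually present in files such as `MqttHandlerTests.Impltests.cs`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stubPatterns are assumed markers of placeholder C# code.
var stubPatterns = []*regexp.Regexp{
	regexp.MustCompile(`throw\s+new\s+NotImplementedException\s*\(`),
	regexp.MustCompile(`Assert\.True\(\s*true\s*\)`),
	regexp.MustCompile(`(?i)//\s*TODO\b`),
}

// Finding records one suspicious line (1-based line number).
type Finding struct {
	Line int
	Text string
}

// scanForStubs returns every line of src that matches any stub pattern.
func scanForStubs(src string) []Finding {
	var out []Finding
	for i, line := range strings.Split(src, "\n") {
		for _, re := range stubPatterns {
			if re.MatchString(line) {
				out = append(out, Finding{Line: i + 1, Text: strings.TrimSpace(line)})
				break // report each line at most once
			}
		}
	}
	return out
}

func main() {
	src := "public void MqttParsePub() {\n    throw new NotImplementedException();\n}\n"
	for _, f := range scanForStubs(src) {
		fmt.Printf("line %d: %s\n", f.Line, f.Text)
	}
}
```

On the sample input it reports `line 2: throw new NotImplementedException();`. A pattern scan only flags candidates; the single-test evidence requirement still decides whether a test really exercises MQTT logic.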

## Non-Goals

- Executing Batch 41 during this planning session.
- Marking statuses without verification evidence.
- Re-scoping features outside Batch 41, except for documented test-dependency blockers.