mbproxy: initial commit through Phase 9 (TxId multiplexing)

Adds the mbproxy service end-to-end. Phases 00-08 implement the
production-ready single-listener / 1:1-backend transparent Modbus TCP
proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260
fleet. Phase 9 replaces the connection layer with a single backend
socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's
4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing
  for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream
  on BackendRequestTimeoutMs, defending against lost responses,
  dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-
  request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps /
  disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly
concurrent backend traffic is proved against a stub backend in
PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus
3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-14 01:49:35 -04:00
parent 2e937228a0
commit 56eee3c563
105 changed files with 18430 additions and 0 deletions
@@ -0,0 +1,158 @@
# Phase 06 — Configuration hot-reload
Subscribe to `IOptionsMonitor<MbproxyOptions>.OnChange` and reconcile the running supervisors + per-PLC tag maps + connection settings against the new config — without restarting the host.
**Depends on:** Phase 05 (supervisor lifecycle).
**Parallel-safe with:** nothing (touches the widest cross-cut: supervisors + tag maps + counters + DI options).
## Goal
An `appsettings.json` save propagates according to the design's reconcile table:
| Change | Action |
|--------|--------|
| `BcdTags.Global` add/remove/width | Rebuild every PLC's `BcdTagMap`, swap atomically. Next PDU sees it. |
| `Plcs[i].BcdTags.{Add,Remove}` | Rebuild that PLC's `BcdTagMap` only. |
| New `Plcs[i]` | Create supervisor + context, start it. |
| Removed `Plcs[i]` | Stop supervisor, close all client connections to it. |
| Changed `ListenPort` / `Host` | Stop + start the supervisor (remove + add semantics). |
| `Connection.Backend*TimeoutMs` | Take effect on the next backend connect / request. |
| Invalid reload | Reject as a whole; keep current state; log `mbproxy.config.reload.rejected`. |
Validation runs FIRST. A reload that would produce duplicate `ListenPort` values, or a `BcdTagMapBuilder.Build` error for any PLC, is rejected atomically before any state mutates.
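To make the validate-first rule concrete, here is a minimal sketch of the cross-PLC checks. The `PlcSketch` and `OptionsSketch` records are hypothetical, trimmed-down stand-ins for the real option types, ordinal name comparison is an assumption, and the `BcdTagMapBuilder.Build` check is left out because its signature is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down option shapes (assumption: the real
// MbproxyOptions / PlcOptions carry more fields than this).
public sealed record PlcSketch(string Name, int ListenPort);
public sealed record OptionsSketch(int AdminPort, IReadOnlyList<PlcSketch> Plcs);

public static class ReloadValidatorSketch
{
    // Returns true when `next` is safe to apply. Every problem is collected
    // (not just the first) so the rejected-reload log entry is complete.
    public static bool Validate(OptionsSketch next, out IReadOnlyList<string> errors)
    {
        var found = new List<string>();

        foreach (var plc in next.Plcs)
        {
            if (string.IsNullOrEmpty(plc.Name))
                found.Add("PLC name must be non-empty.");
            if (plc.ListenPort is < 1 or > 65535)
                found.Add($"PLC '{plc.Name}': ListenPort {plc.ListenPort} is outside [1, 65535].");
        }

        if (next.AdminPort is < 1 or > 65535)
            found.Add($"AdminPort {next.AdminPort} is outside [1, 65535].");

        // Assumption: names are compared ordinally (case-sensitive).
        foreach (var g in next.Plcs.Where(p => !string.IsNullOrEmpty(p.Name))
                                   .GroupBy(p => p.Name, StringComparer.Ordinal)
                                   .Where(g => g.Count() > 1))
            found.Add($"Duplicate PLC name '{g.Key}'.");

        foreach (var g in next.Plcs.GroupBy(p => p.ListenPort).Where(g => g.Count() > 1))
            found.Add($"Duplicate ListenPort {g.Key}.");

        if (next.Plcs.Any(p => p.ListenPort == next.AdminPort))
            found.Add($"AdminPort {next.AdminPort} collides with a PLC ListenPort.");

        errors = found;
        return found.Count == 0;
    }
}
```

Because the method only reads `next` and returns a list, the reconciler can call it before touching any supervisor, which is what keeps a rejected reload atomic.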
## Outputs
```
src/Mbproxy/Configuration/ConfigReconciler.cs # OnChange handler; orchestrates the apply
src/Mbproxy/Configuration/ReloadValidator.cs # cross-PLC validation (duplicate ports, etc.)
src/Mbproxy/Configuration/ReloadPlan.cs # immutable diff record between current and new
tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs
tests/Mbproxy.Tests/Configuration/ReloadPlanTests.cs
tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs
tests/Mbproxy.Tests/Configuration/HotReloadE2ETests.cs # real appsettings.json mutation, real host
```
Modifications:
- `src/Mbproxy/Proxy/ProxyWorker.cs` — accept a `ConfigReconciler` and forward `IOptionsMonitor.OnChange` to it; on startup, also seed the reconciler with the initial snapshot.
- `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs` — expose a `Task ReplaceContextAsync(PerPlcContext newCtx, CancellationToken ct)` that atomically swaps the BCD tag map and counters without restarting the listener. Old in-flight connections finish on the old map; new connections use the new map. (Document the brief transition window in comments.)
- Add `mbproxy.config.reload.applied` and `mbproxy.config.reload.rejected` `[LoggerMessage]` events.
- `src/Mbproxy/Options/MbproxyOptions.cs` — wire `IValidateOptions<MbproxyOptions>` to call the schema-level validator only. Cross-PLC validation (duplicate ports, etc.) is handled by `ReloadValidator` because it requires inspecting multiple `Plcs[i]` together, which `IValidateOptions` doesn't naturally express.
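The atomic swap behind `ReplaceContextAsync` can be sketched as a single reference exchange over an immutable context. `ContextSketch` and `ContextHolderSketch` are hypothetical names for illustration only; the real `PerPlcContext` also carries counters and is not reproduced here.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for PerPlcContext: immutable once constructed.
public sealed record ContextSketch(IReadOnlySet<ushort> BcdAddresses);

public sealed class ContextHolderSketch
{
    private ContextSketch _current;

    public ContextHolderSketch(ContextSketch initial) => _current = initial;

    // A reader takes one snapshot and keeps using that reference, so it can
    // never observe a half-updated tag map.
    public ContextSketch Capture() => Volatile.Read(ref _current);

    // Publishes the new context with one reference write and returns the
    // previous one (useful for logging what was replaced).
    public ContextSketch Replace(ContextSketch next) =>
        Interlocked.Exchange(ref _current, next);
}
```

Where `Capture()` is called determines the transition window: once per accepted connection gives the "old connections finish on the old map" behavior, while once per PDU would make the next PDU see the new map.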
## Tasks
1. **`ReloadPlan.cs`** — immutable record describing the diff:
```csharp
public sealed record ReloadPlan(
    IReadOnlyList<PlcOptions> ToAdd,
    IReadOnlyList<string> ToRemove,                          // PLC names
    IReadOnlyList<(string Name, PlcOptions New)> ToRestart,  // port or host changed
    IReadOnlyList<(string Name, BcdTagMap NewMap)> ToReseat, // tag map changed
    ConnectionOptions Connection);
```
Computed by a pure function `ReloadPlan.Compute(MbproxyOptions current, MbproxyOptions next)`; PLC identity is keyed on `Name` (NOT on `ListenPort`, which is mutable).
2. **`ReloadValidator.cs`** — single static method `Validate(MbproxyOptions next, out IReadOnlyList<string> errors)`:
- PLC names are unique and non-empty.
- `ListenPort` values are unique.
- For each PLC, `BcdTagMapBuilder.Build(global, perPlc).Errors` is empty.
- `AdminPort` doesn't collide with any `Plcs[i].ListenPort`.
- All ports are in `[1, 65535]`.
3. **`ConfigReconciler.cs`** — subscribes via constructor-injected `IOptionsMonitor<MbproxyOptions>.OnChange`. On change:
- Snapshot the new options.
- Run `ReloadValidator.Validate`. On failure: log `mbproxy.config.reload.rejected` with the error list, bump the rejected-reload counter (`service.config.reloadRejectedCount`), and leave all running state untouched.
- Compute `ReloadPlan` against the current snapshot.
- Apply the plan in order:
  1. Stop supervisors in `ToRemove` (concurrently).
  2. Stop+restart supervisors in `ToRestart` (concurrently).
  3. Build new `PerPlcContext` for each `ToReseat` entry and call `supervisor.ReplaceContextAsync(newCtx)`.
  4. Build supervisors for `ToAdd`, start them.
- On success: log `mbproxy.config.reload.applied` with summary (`PlcsAdded`, `PlcsRemoved`, `PlcsReseated`, `TagListDelta`). Record `lastReloadUtc` and bump `reloadCount` on a service-wide counter (consumed by phase 07).
- If any step throws: log the partial-apply state at Error (best effort), then continue with the remaining steps. The host stays up. (The validator should have caught most failure modes; a runtime failure here is a true bug.)
4. **`ProxyWorker.cs`** updates — register the reconciler with the host and wire startup to use it for the initial snapshot.
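The apply order from task 3 can be sketched as below. This is an illustration under stated assumptions, not the reconciler itself: `PlanSketch` reduces the plan to PLC names, the `act` callback is a hypothetical stand-in for the real supervisor calls, and running the reseat and add steps one at a time is a choice made here because the plan only requires concurrency for stop and restart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reduced plan: PLC names only.
public sealed record PlanSketch(
    IReadOnlyList<string> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<string> ToRestart,
    IReadOnlyList<string> ToReseat);

public sealed class ApplySketch
{
    private readonly SemaphoreSlim _gate = new(1, 1); // one reload at a time
    private readonly Func<string, string, CancellationToken, Task> _act;
    private readonly Action<Exception> _logError;

    // `act(operation, plcName, ct)` stands in for the supervisor calls.
    public ApplySketch(Func<string, string, CancellationToken, Task> act,
                       Action<Exception> logError)
    {
        _act = act;
        _logError = logError;
    }

    public async Task ApplyAsync(PlanSketch plan, CancellationToken ct)
    {
        await _gate.WaitAsync(ct); // serialises overlapping reloads
        try
        {
            await Step(() => Task.WhenAll(plan.ToRemove.Select(n => _act("stop", n, ct))));
            await Step(() => Task.WhenAll(plan.ToRestart.Select(n => _act("restart", n, ct))));
            foreach (var n in plan.ToReseat) await Step(() => _act("reseat", n, ct));
            foreach (var n in plan.ToAdd) await Step(() => _act("start", n, ct));
        }
        finally
        {
            _gate.Release();
        }
    }

    // Runs one step; a failure is logged and the remaining steps still run.
    private async Task Step(Func<Task> step)
    {
        try { await step(); }
        catch (Exception ex) when (ex is not OperationCanceledException) { _logError(ex); }
    }
}
```

Wrapping each step in a `Func<Task>` means an exception thrown synchronously while building the task list is logged the same way as an asynchronous failure.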
## Public surface declared in this phase
```csharp
namespace Mbproxy.Configuration;
internal sealed class ConfigReconciler : IDisposable {
    public ConfigReconciler(IOptionsMonitor<MbproxyOptions> monitor, /* dependencies */);
    public Task ApplyAsync(MbproxyOptions next, CancellationToken ct); // exposed for tests
    public void Dispose();
}

public sealed record ReloadPlan(
    IReadOnlyList<PlcOptions> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<(string Name, PlcOptions New)> ToRestart,
    IReadOnlyList<(string Name, BcdTagMap NewMap)> ToReseat,
    ConnectionOptions Connection) {
    public static ReloadPlan Compute(MbproxyOptions current, MbproxyOptions next);
}

internal static class ReloadValidator {
    public static bool Validate(MbproxyOptions next, out IReadOnlyList<string> errors);
}
```
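A name-keyed diff in the spirit of `ReloadPlan.Compute` might look like the following sketch. `PlcCfgSketch` and `DiffSketch` are hypothetical simplifications: `TagKey` stands in for whatever identifies a PLC's effective (global plus per-PLC) tag list, and the diff returns names rather than the full tuples in the declared record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened PLC shape for illustration.
public sealed record PlcCfgSketch(string Name, string Host, int ListenPort, string TagKey);

public sealed record DiffSketch(
    IReadOnlyList<string> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<string> ToRestart,
    IReadOnlyList<string> ToReseat);

public static class ReloadDiffSketch
{
    // Pure function: reads both snapshots, mutates neither.
    public static DiffSketch Compute(IEnumerable<PlcCfgSketch> current,
                                     IEnumerable<PlcCfgSketch> next)
    {
        // Identity is the PLC name, never the (mutable) listen port.
        var before = current.ToDictionary(p => p.Name, StringComparer.Ordinal);
        var after = next.ToDictionary(p => p.Name, StringComparer.Ordinal);

        var toAdd = after.Keys.Where(n => !before.ContainsKey(n)).ToList();
        var toRemove = before.Keys.Where(n => !after.ContainsKey(n)).ToList();
        var toRestart = new List<string>();
        var toReseat = new List<string>();

        foreach (var (name, now) in after)
        {
            if (!before.TryGetValue(name, out var was)) continue; // new PLC, handled above

            if (was.Host != now.Host || was.ListenPort != now.ListenPort)
                toRestart.Add(name);  // remove + add semantics; restart wins over reseat
            else if (was.TagKey != now.TagKey)
                toReseat.Add(name);   // tag map only; listener keeps running
        }

        return new DiffSketch(toAdd, toRemove, toRestart, toReseat);
    }
}
```

A PLC whose port and tag list both change lands only in the restart list, which matches the intent of `Compute_ChangePort_GoesToToRestart_NotToReseat`: the restarted supervisor is built from the new options anyway.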
## Tests required
### Unit (`Category = Unit`)
`ReloadValidatorTests` (≥ 6 tests):
1. `Validate_DuplicatePlcName_Fails`
2. `Validate_DuplicateListenPort_Fails`
3. `Validate_AdminPortCollidesWith_PlcListenPort_Fails`
4. `Validate_PerPlc_BcdMapBuildError_Fails`
5. `Validate_PortOutOfRange_Fails`
6. `Validate_HappyPath_Passes`
`ReloadPlanTests` (≥ 5 tests):
1. `Compute_AddOnePlc_OnlyToAddPopulated`
2. `Compute_RemoveOnePlc_OnlyToRemovePopulated`
3. `Compute_ChangePort_GoesToToRestart_NotToReseat`
4. `Compute_ChangePerPlcTagOverride_GoesToToReseat`
5. `Compute_ChangeGlobalTagList_AllPlcsReseat_NoRestart`
`ConfigReconcilerTests` (≥ 4 tests, using a fake `IOptionsMonitor` + fake supervisor factory):
1. `Apply_HappyPath_StartsAndStopsSupervisors_PerPlan`
2. `Apply_ValidationFails_NoMutationOccurs_AndLogsRejected`
3. `Apply_ReseatTagMap_DoesNotRestartSupervisor`
4. `Apply_ConcurrentReloads_Are_Serialised` — two rapid changes get processed in order, no interleaving.
### E2E (`Category = E2E`)
`HotReloadE2ETests` (≥ 4 tests, using a real `Host.CreateApplicationBuilder` + temp appsettings.json file):
1. `E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable` — start the host with one PLC, write a new appsettings adding a second PLC pointing at the simulator on a fresh listen port, drive NModbus against the new proxy port within 2 s.
2. `E2E_RemovePlcAtRuntime_ClosesUpstreamConnections` — start with two PLCs and a connected client, write appsettings removing one; client's socket closes within 1 s.
3. `E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediately` — start with addr 1072 NOT in BCD list, read raw 0x1234. Write appsettings adding it. Read again, get decoded 1234.
4. `E2E_InvalidReload_DoesNotMutateRunningState` — start happy, write a broken appsettings (duplicate ListenPort), assert the host keeps running with the OLD config and `mbproxy.config.reload.rejected` is logged.
## Phase gate
- [ ] Zero-warnings build.
- [ ] All phase 00-05 tests still green.
- [ ] All new unit tests green.
- [ ] All e2e hot-reload tests green when the simulator is available.
- [ ] `mbproxy.config.reload.applied` / `.rejected` events match the design's properties list.
- [ ] A misconfigured reload (duplicate ports) is rejected atomically; `E2E_InvalidReload_DoesNotMutateRunningState` (E2E test 4) asserts that no partial mutation occurs.
- [ ] The reconciler serializes concurrent `OnChange` notifications (`SemaphoreSlim` or equivalent) so two file saves in quick succession don't race.
- [ ] Counters `service.config.reloadCount` and `service.config.reloadRejectedCount` are bumped correctly.
## Out of scope
- Watching for files OTHER than `appsettings.json` (env files, dotnet user-secrets, etc.). The default config source set established in phase 00 is the contract.
- Reloading Serilog log levels at runtime. Possible but not in this phase.
- A reload audit log file. The accept/reject events are sufficient.
- Online schema migrations (e.g., renaming a key in an older config to a new one). Reject-the-whole-thing is the simpler contract.
## Notes for the subagent
- `IOptionsMonitor.OnChange` can fire MULTIPLE times for a single file save on some platforms (text editors saving via rename-and-replace can trigger 2-3 events). Debounce inside the reconciler — a 250 ms quiescent window after the last `OnChange` before computing the plan. Document the choice in code.
- The reconciler must NOT block the `OnChange` callback thread for I/O (`StopAsync` etc.). Use `Channel<ReloadRequest>` or a `Task.Run`-style hand-off so the callback returns immediately.
- When a supervisor restart is in progress (e.g., port changed), there are two options: briefly reject further reloads and queue a "retry after the current apply finishes", or serialise everything through a single semaphore and accept that a backed-up reload queue applies all changes eventually. Pick the simpler option (the semaphore) and document it.
- `BcdTagMapBuilder.Build` is the validator for tag-list well-formedness; do not duplicate that validation in `ReloadValidator`. The validator just calls `Build` and checks the `Errors` list.
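
The debounce and hand-off notes above can be combined in one small pump, sketched here under assumptions: `DebouncedReloadSketch` is a hypothetical name, the channel carries a placeholder `int` rather than a real `ReloadRequest`, and `apply` stands in for "snapshot, validate, compute the plan, apply". The quiet window is a lower bound: a reload runs only after at least one full window passes with no new signal.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class DebouncedReloadSketch : IAsyncDisposable
{
    private readonly Channel<int> _signals = Channel.CreateUnbounded<int>();
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _pump;

    public DebouncedReloadSketch(Func<Task> apply, TimeSpan quiet)
    {
        _pump = Task.Run(() => PumpAsync(apply, quiet, _cts.Token));
    }

    // Safe to call from the OnChange callback: TryWrite on an unbounded
    // channel returns immediately and performs no I/O.
    public void Signal() => _signals.Writer.TryWrite(0);

    private async Task PumpAsync(Func<Task> apply, TimeSpan quiet, CancellationToken ct)
    {
        try
        {
            while (await _signals.Reader.WaitToReadAsync(ct))
            {
                // Drain the burst, wait one quiet window, and repeat for as
                // long as more signals arrived while waiting.
                do
                {
                    while (_signals.Reader.TryRead(out _)) { }
                    await Task.Delay(quiet, ct);
                } while (_signals.Reader.TryRead(out _));

                // One reload per burst, off the callback thread. A real
                // implementation would catch and log failures here.
                await apply();
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        await _pump;
        _cts.Dispose();
    }
}
```

Because a single pump task performs every apply, reloads are also serialised as a side effect; a `SemaphoreSlim` is still useful if `ApplyAsync` can be called directly, for example from tests.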