# Phase 06 — Configuration hot-reload

Subscribe to `IOptionsMonitor<MbproxyOptions>.OnChange` and reconcile the running supervisors, per-PLC tag maps, and connection settings against the new config, without restarting the host.

**Depends on:** Phase 05 (supervisor lifecycle).
**Parallel-safe with:** nothing (touches the widest cross-cut: supervisors + tag maps + counters + DI options).

## Goal

An `appsettings.json` save propagates according to the design's reconcile table:

| Change | Action |
|--------|--------|
| `BcdTags.Global` add/remove/width | Rebuild every PLC's `BcdTagMap`, swap atomically. Next PDU sees it. |
| `Plcs[i].BcdTags.{Add,Remove}` | Rebuild that PLC's `BcdTagMap` only. |
| New `Plcs[i]` | Create supervisor + context, start it. |
| Removed `Plcs[i]` | Stop supervisor, close all client connections to it. |
| Changed `ListenPort` / `Host` | Stop + start the supervisor (remove + add semantics). |
| `Connection.Backend*TimeoutMs` | Take effect on the next backend connect / request. |
| Invalid reload | Reject as a whole; keep current state; log `mbproxy.config.reload.rejected`. |

Validation runs FIRST. A reload that would produce duplicate `ListenPort` values, or a `BcdTagMapBuilder.Build` error for any PLC, is rejected atomically before any state mutates.

## Outputs

```
src/Mbproxy/Configuration/ConfigReconciler.cs           # OnChange handler; orchestrates the apply
src/Mbproxy/Configuration/ReloadValidator.cs            # cross-PLC validation (duplicate ports, etc.)
src/Mbproxy/Configuration/ReloadPlan.cs                 # immutable diff record between current and new

tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs
tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs
tests/Mbproxy.Tests/Configuration/HotReloadE2ETests.cs  # real appsettings.json mutation, real host
```

Modifications:

- `src/Mbproxy/Proxy/ProxyWorker.cs`: accept a `ConfigReconciler` and forward `IOptionsMonitor.OnChange` to it; on startup, also seed the reconciler with the initial snapshot.
- `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`: expose a `Task ReplaceContextAsync(PerPlcContext newCtx, CancellationToken ct)` that atomically swaps the BCD tag map and counters without restarting the listener. Old in-flight connections finish on the old map; new connections use the new map. (Document the brief transition window in comments.)
- Add `mbproxy.config.reload.applied` and `mbproxy.config.reload.rejected` `[LoggerMessage]` events.
- `src/Mbproxy/Options/MbproxyOptions.cs`: wire `IValidateOptions<MbproxyOptions>` to call the schema-level validator only. Cross-PLC validation (duplicate ports, etc.) is handled by `ReloadValidator` because it requires inspecting multiple `Plcs[i]` together, which `IValidateOptions` doesn't naturally express.

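
The atomic swap described for `ReplaceContextAsync` can be approached by keeping the whole per-PLC context behind a single reference and exchanging that reference. The following is a minimal sketch under that assumption; `ContextSnapshot` and `ContextHolder` are hypothetical names standing in for `PerPlcContext` and whatever field the supervisor actually owns.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for PerPlcContext: an immutable snapshot of one PLC's
// BCD tag addresses. The real type is defined elsewhere in the codebase.
public sealed record ContextSnapshot(IReadOnlySet<ushort> BcdAddresses);

// Hypothetical holder a supervisor could own. The context lives behind a
// single reference, so replacing it is one atomic reference swap.
public sealed class ContextHolder
{
    private ContextSnapshot _current;

    public ContextHolder(ContextSnapshot initial) => _current = initial;

    // A connection calls this once at accept time and keeps the result, so an
    // in-flight connection keeps using the map it started with.
    public ContextSnapshot Capture() => Volatile.Read(ref _current);

    // Swap in the new snapshot and return the old one. The listener socket
    // is not touched.
    public ContextSnapshot Replace(ContextSnapshot next) =>
        Interlocked.Exchange(ref _current, next);
}
```

Capturing once per connection gives the "old connections finish on the old map" behaviour; calling `Capture()` per PDU instead would make a swap visible on the next PDU. Which of the two the supervisor uses is a design decision this sketch does not settle.
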
## Tasks

1. **`ReloadPlan.cs`**: immutable record describing the diff:
   ```csharp
   public sealed record ReloadPlan(
       IReadOnlyList<PlcOptions> ToAdd,
       IReadOnlyList<string> ToRemove,                            // PLC names
       IReadOnlyList<(string Name, PlcOptions New)> ToRestart,    // port or host changed
       IReadOnlyList<(string Name, BcdTagMap NewMap)> ToReseat,   // tag map changed
       ConnectionOptions Connection);
   ```
   Computed by a pure function `ReloadPlan.Compute(MbproxyOptions current, MbproxyOptions next)`; PLC identity is keyed on `Name` (NOT on `ListenPort`, which is mutable).
2. **`ReloadValidator.cs`**: single static method `Validate(MbproxyOptions next, out IReadOnlyList<string> errors)`:
   - PLC names are unique and non-empty.
   - `ListenPort` values are unique.
   - For each PLC, `BcdTagMapBuilder.Build(global, perPlc).Errors` is empty.
   - `AdminPort` doesn't collide with any `Plcs[i].ListenPort`.
   - All ports are in `[1, 65535]`.
3. **`ConfigReconciler.cs`**: subscribes via constructor-injected `IOptionsMonitor<MbproxyOptions>.OnChange`. On change:
   - Snapshot the new options.
   - Run `ReloadValidator.Validate`. On failure: log `mbproxy.config.reload.rejected` with the error list; do nothing else.
   - Compute `ReloadPlan` against the current snapshot.
   - Apply the plan in order:
     1. Stop supervisors in `ToRemove` (concurrently).
     2. Stop+restart supervisors in `ToRestart` (concurrently).
     3. Build new `PerPlcContext` for each `ToReseat` entry and call `supervisor.ReplaceContextAsync(newCtx)`.
     4. Build supervisors for `ToAdd`, start them.
   - On success: log `mbproxy.config.reload.applied` with summary (`PlcsAdded`, `PlcsRemoved`, `PlcsReseated`, `TagListDelta`). Record `lastReloadUtc` and bump `reloadCount` on a service-wide counter (consumed by phase 07).
   - On any step throwing: best-effort log the partial-apply state at Error, then continue. The host stays up. (The validator should have caught most failure modes; a runtime failure here is a true bug.)
4. **`ProxyWorker.cs`** updates: register the reconciler with the host and wire startup to use it for the initial snapshot.

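
The diff in task 1 can be sketched as a pure function over two PLC lists keyed on `Name`. This is an illustration only: `PlcSketch` and `PlanSketch` are hypothetical, trimmed-down shapes (the real `ReloadPlan` carries tuples and `ConnectionOptions`), and `TagSignature` is an invented placeholder for "the effective tag list of this PLC after merging global and per-PLC entries".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down PLC entry. TagSignature is an assumed stand-in
// for the PLC's effective (global + per-PLC) tag list.
public sealed record PlcSketch(string Name, string Host, int ListenPort, string TagSignature);

public sealed record PlanSketch(
    IReadOnlyList<PlcSketch> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<PlcSketch> ToRestart,
    IReadOnlyList<PlcSketch> ToReseat)
{
    // Pure diff. Identity is the Name, never the (mutable) ListenPort.
    // Assumes names are already unique, i.e. validation ran first;
    // ToDictionary throws on a duplicate key otherwise.
    public static PlanSketch Compute(IEnumerable<PlcSketch> current, IEnumerable<PlcSketch> next)
    {
        var old = current.ToDictionary(p => p.Name, StringComparer.Ordinal);
        var nextList = next.ToList();
        var nextNames = nextList.Select(p => p.Name).ToHashSet(StringComparer.Ordinal);

        var toAdd = new List<PlcSketch>();
        var toRestart = new List<PlcSketch>();
        var toReseat = new List<PlcSketch>();

        foreach (var plc in nextList)
        {
            if (!old.TryGetValue(plc.Name, out var before))
                toAdd.Add(plc);          // unknown name: new PLC
            else if (before.Host != plc.Host || before.ListenPort != plc.ListenPort)
                toRestart.Add(plc);      // endpoint moved: stop + start
            else if (before.TagSignature != plc.TagSignature)
                toReseat.Add(plc);       // only the tag map changed
        }

        var toRemove = old.Keys.Where(n => !nextNames.Contains(n)).ToList();
        return new PlanSketch(toAdd, toRemove, toRestart, toReseat);
    }
}
```

In this sketch a PLC whose endpoint and tag list both change lands only in `ToRestart`, on the assumption that a restarted supervisor is built from the new options and therefore picks up the new map anyway.
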
## Public surface declared in this phase

```csharp
namespace Mbproxy.Configuration;

internal sealed class ConfigReconciler : IDisposable {
    public ConfigReconciler(IOptionsMonitor<MbproxyOptions> monitor, /* dependencies */);
    public Task ApplyAsync(MbproxyOptions next, CancellationToken ct);   // exposed for tests
    public void Dispose();
}

public sealed record ReloadPlan(
    IReadOnlyList<PlcOptions> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<(string Name, PlcOptions New)> ToRestart,
    IReadOnlyList<(string Name, BcdTagMap NewMap)> ToReseat,
    ConnectionOptions Connection) {
    public static ReloadPlan Compute(MbproxyOptions current, MbproxyOptions next);
}

internal static class ReloadValidator {
    public static bool Validate(MbproxyOptions next, out IReadOnlyList<string> errors);
}
```

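
To make the `Validate(next, out errors)` contract concrete, here is a minimal sketch of the cross-PLC checks. `PlcEntry`, `OptionsSketch`, and `ReloadValidatorSketch` are hypothetical shapes, not the real option types; the `BcdTagMapBuilder.Build` check is left out because its signature is not shown here, and ordinal (case-sensitive) name comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes; the real MbproxyOptions / PlcOptions differ.
public sealed record PlcEntry(string Name, int ListenPort);
public sealed record OptionsSketch(int AdminPort, IReadOnlyList<PlcEntry> Plcs);

public static class ReloadValidatorSketch
{
    // Returns true when the candidate config passes every check. All errors
    // are collected (not just the first) so a rejected-reload log entry can
    // list them together.
    public static bool Validate(OptionsSketch next, out IReadOnlyList<string> errors)
    {
        var found = new List<string>();

        if (next.Plcs.Any(p => string.IsNullOrWhiteSpace(p.Name)))
            found.Add("PLC name must be non-empty.");

        foreach (var g in next.Plcs.GroupBy(p => p.Name, StringComparer.Ordinal).Where(g => g.Count() > 1))
            found.Add($"Duplicate PLC name '{g.Key}'.");

        foreach (var g in next.Plcs.GroupBy(p => p.ListenPort).Where(g => g.Count() > 1))
            found.Add($"Duplicate ListenPort {g.Key}.");

        if (next.Plcs.Any(p => p.ListenPort == next.AdminPort))
            found.Add($"AdminPort {next.AdminPort} collides with a PLC ListenPort.");

        foreach (var port in next.Plcs.Select(p => p.ListenPort).Append(next.AdminPort).Distinct())
            if (port < 1 || port > 65535)
                found.Add($"Port {port} is outside [1, 65535].");

        errors = found;
        return found.Count == 0;
    }
}
```

Because the method only reads `next` and never touches running state, calling it before computing the plan keeps the "reject as a whole" behaviour simple: a `false` result means nothing has been mutated yet.
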
## Tests required

### Unit (`Category = Unit`)

`ReloadValidatorTests` (≥ 6 tests):

1. `Validate_DuplicatePlcName_Fails`
2. `Validate_DuplicateListenPort_Fails`
3. `Validate_AdminPortCollidesWith_PlcListenPort_Fails`
4. `Validate_PerPlc_BcdMapBuildError_Fails`
5. `Validate_PortOutOfRange_Fails`
6. `Validate_HappyPath_Passes`

`ReloadPlanTests` (≥ 5 tests):

1. `Compute_AddOnePlc_OnlyToAddPopulated`
2. `Compute_RemoveOnePlc_OnlyToRemovePopulated`
3. `Compute_ChangePort_GoesToToRestart_NotToReseat`
4. `Compute_ChangePerPlcTagOverride_GoesToToReseat`
5. `Compute_ChangeGlobalTagList_AllPlcsReseat_NoRestart`

`ConfigReconcilerTests` (≥ 4 tests, using a fake `IOptionsMonitor` + fake supervisor factory):

1. `Apply_HappyPath_StartsAndStopsSupervisors_PerPlan`
2. `Apply_ValidationFails_NoMutationOccurs_AndLogsRejected`
3. `Apply_ReseatTagMap_DoesNotRestartSupervisor`
4. `Apply_ConcurrentReloads_Are_Serialised`: two rapid changes get processed in order, no interleaving.

### E2E (`Category = E2E`)

`HotReloadE2ETests` (≥ 4 tests, using a real `Host.CreateApplicationBuilder` + temp appsettings.json file):

1. `E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable`: start the host with one PLC, write a new appsettings adding a second PLC pointing at the simulator on a fresh listen port, then drive NModbus against the new proxy port within 2 s.
2. `E2E_RemovePlcAtRuntime_ClosesUpstreamConnections`: start with two PLCs and a connected client, write appsettings removing one; the client's socket closes within 1 s.
3. `E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediately`: start with addr 1072 NOT in the BCD list and read raw 0x1234. Write appsettings adding it. Read again and get decoded 1234.
4. `E2E_InvalidReload_DoesNotMutateRunningState`: start with a valid config, write a broken appsettings (duplicate ListenPort), and assert the host keeps running with the OLD config and `mbproxy.config.reload.rejected` is logged.

## Phase gate

- [ ] Zero-warnings build.
- [ ] All phase 00–05 tests still green.
- [ ] All new unit tests green.
- [ ] All e2e hot-reload tests green when the simulator is available.
- [ ] `mbproxy.config.reload.applied` / `.rejected` events match the design's properties list.
- [ ] A misconfigured reload (duplicate ports) is rejected atomically; the assertion in E2E test 4 (`E2E_InvalidReload_DoesNotMutateRunningState`) verifies no partial mutation.
- [ ] The reconciler serializes concurrent `OnChange` notifications (`SemaphoreSlim` or equivalent) so two file saves in quick succession don't race.
- [ ] Counters `service.config.reloadCount` and `service.config.reloadRejectedCount` are bumped correctly.

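
For the counter item above, a thread-safe holder could look like the sketch below. `ReloadCountersSketch` and its member names are hypothetical; only the counter meanings (`reloadCount`, `reloadRejectedCount`, `lastReloadUtc`) come from this phase, and how they are exposed in phase 07 is not assumed here.

```csharp
using System;
using System.Threading;

// Hypothetical service-wide reload counters, safe to bump from the
// reconciler while a status endpoint reads them on another thread.
public sealed class ReloadCountersSketch
{
    private long _reloadCount;
    private long _reloadRejectedCount;
    private long _lastReloadUtcTicks; // 0 means "never reloaded"

    public long ReloadCount => Interlocked.Read(ref _reloadCount);
    public long ReloadRejectedCount => Interlocked.Read(ref _reloadRejectedCount);

    public DateTime? LastReloadUtc
    {
        get
        {
            var ticks = Interlocked.Read(ref _lastReloadUtcTicks);
            return ticks == 0 ? (DateTime?)null : new DateTime(ticks, DateTimeKind.Utc);
        }
    }

    // Call after a plan has been applied successfully.
    public void RecordApplied(DateTime utcNow)
    {
        Interlocked.Exchange(ref _lastReloadUtcTicks, utcNow.Ticks);
        Interlocked.Increment(ref _reloadCount);
    }

    // Call when validation rejects a reload.
    public void RecordRejected() => Interlocked.Increment(ref _reloadRejectedCount);
}
```
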
## Out of scope

- Watching for files OTHER than `appsettings.json` (env files, dotnet user-secrets, etc.). The default config source set established in phase 00 is the contract.
- Reloading Serilog log levels at runtime. Possible but not in this phase.
- A reload audit log file. The accept/reject events are sufficient.
- Online schema migrations (e.g., renaming a key in an older config to a new one). Reject-the-whole-thing is the simpler contract.

## Notes for the subagent

- `IOptionsMonitor.OnChange` can fire MULTIPLE times for a single file save on some platforms (text editors saving via rename-and-replace can trigger 2-3 events). Debounce inside the reconciler: wait for a 250 ms quiescent window after the last `OnChange` before computing the plan. Document the choice in code.
- The reconciler must NOT block the `OnChange` callback thread for I/O (`StopAsync` etc.). Use `Channel<ReloadRequest>` or a `Task.Run`-style hand-off so the callback returns immediately.
- When a supervisor restart is in progress (e.g., port changed), either reject further reloads briefly with a queued "retry after current applies", OR serialise everything via a single semaphore and accept that a backed-up reload queue gets all changes eventually. Pick the simpler option (semaphore); document it.
- `BcdTagMapBuilder.Build` is the validator for tag-list well-formedness; do not duplicate that validation in `ReloadValidator`. The validator just calls `Build` and checks the `Errors` list.
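
The debounce and hand-off notes above can be combined in one small pump. This is a sketch under stated assumptions, not the reconciler itself: `DebouncedReloadPump<T>` is a hypothetical helper, the quiet window is passed in by the caller (250 ms in this phase), and serialisation comes from having a single consumer loop. If `ApplyAsync` can also be called directly (as the test surface allows), a `SemaphoreSlim` around the apply would still be needed.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical pump: the OnChange callback calls Signal(), which never
// blocks. One consumer loop waits for a quiet period, then runs one apply
// at a time, so applies cannot interleave.
public sealed class DebouncedReloadPump<T> where T : class
{
    private readonly Channel<T> _channel = Channel.CreateUnbounded<T>(
        new UnboundedChannelOptions { SingleReader = true });
    private readonly Func<T, CancellationToken, Task> _apply;
    private readonly TimeSpan _quietWindow;

    public DebouncedReloadPump(Func<T, CancellationToken, Task> apply, TimeSpan quietWindow)
    {
        _apply = apply;
        _quietWindow = quietWindow;
    }

    // Safe inside the OnChange callback: TryWrite on an unbounded channel
    // completes synchronously.
    public void Signal(T snapshot) => _channel.Writer.TryWrite(snapshot);

    public void Complete() => _channel.Writer.Complete();

    public async Task RunAsync(CancellationToken ct)
    {
        var reader = _channel.Reader;
        while (await reader.WaitToReadAsync(ct))
        {
            if (!reader.TryRead(out var latest)) continue;

            // Restart the quiet window whenever another signal arrives,
            // keeping only the newest snapshot.
            while (true)
            {
                using var window = CancellationTokenSource.CreateLinkedTokenSource(ct);
                window.CancelAfter(_quietWindow);
                try
                {
                    if (!await reader.WaitToReadAsync(window.Token)) break; // writer completed
                    while (reader.TryRead(out var newer)) latest = newer;
                }
                catch (OperationCanceledException) when (!ct.IsCancellationRequested)
                {
                    break; // quiet window elapsed with no new signal
                }
            }

            await _apply(latest, ct);
        }
    }
}
```

A burst of signals inside the window collapses into one apply of the newest snapshot; a signal that arrives while an apply is running stays queued and is picked up by the next loop iteration.
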