mbproxy/docs: split deep docs into focused PascalCase files per StyleGuide
Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.
Architecture/
- Overview.md (150 lines) — listener topology, request flow, per-PLC isolation
- ConnectionModel.md (247 lines) — TxId multiplexer, watchdog, disconnect cascade
- ReadCoalescing.md (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
- ResponseCache.md (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation
Features/
- BcdRewriting.md (252 lines) — codec, CDAB, FC scope, partial-overlap policy
- HotReload.md (189 lines) — IOptionsMonitor + per-change-kind reconcile rules
Operations/
- Configuration.md (422 lines) — every Mbproxy:* option + validation rules
- StatusPage.md (334 lines) — admin endpoint surface, every JSON field
- Troubleshooting.md (364 lines) — diagnosis playbook keyed to log events
Reference/
- LogEvents.md (499 lines) — 28 events across 7 categories, grep-verified
Testing/
- Simulator.md (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk
Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.
Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,252 @@
# BCD Rewriting

The BCD rewriter is the inline codec that translates DirectLOGIC's native Binary-Coded Decimal register values to and from plain binary integers on every relevant Modbus TCP PDU. It is the one place in the proxy that knows which registers are BCD, so upstream consumers can treat the wire as plain `Int16` / `Int32`.

## Why BCD Rewriting Exists

The DL205 / DL260 family stores numeric V-memory register values in native BCD, not binary. The decimal integer `1234` in `V2000` lands on the Modbus wire as `0x1234` (nibbles `1`, `2`, `3`, `4`) — not as the binary `0x04D2`. See [`../../DL260/dl205.md`](../../DL260/dl205.md) for the device-side rationale and the V-memory ↔ Modbus translation rules.

Upstream consumers (Wonderware, Historian, OPC UA gateways, generic Modbus clients written against the standard) expect plain binary integers. Asking every consumer to BCD-decode the wire is brittle: each consumer would carry the same tag list, the same word-order quirks, and the same risk of drift. The rewriter centralises that translation so the rest of the world sees plain `Int16` / `Int32` and the proxy is the single source of truth for "which addresses are BCD."

The rewriter touches only the BCD slots declared in configuration. Every other byte of the PDU — non-BCD registers, coils, discrete inputs, diagnostic function codes, exception responses — passes through unchanged. MBAP transaction IDs, unit IDs, and the MBAP length field are preserved end-to-end; the rewriter only re-encodes payload bytes whose width does not change.

## CDAB Word Order for 32-Bit Values

A 32-bit BCD value spans a register pair at `Address` and `Address+1` in CDAB (low-word-first) order:

- The register at `Address` holds the **low 4 BCD digits**.
- The register at `Address+1` holds the **high 4 BCD digits**.
- Decoded decimal = `Decode16(high) * 10_000 + Decode16(low)`.

This follows directly from DirectLOGIC's CDAB word convention (see [`../../DL260/dl205.md`](../../DL260/dl205.md) → Word Order).

Worked example — the register pair `[0x1234][0x5678]` reads on the wire as the low word `0x1234` first and the high word `0x5678` second:

```text
Address:   raw 0x1234 → low 4 digits  = 1234
Address+1: raw 0x5678 → high 4 digits = 5678

Decoded decimal = 5678 * 10_000 + 1234 = 56_781_234
```

`BcdCodec.Encode32` and `BcdCodec.Decode32` in [`../../src/Mbproxy/Bcd/BcdCodec.cs`](../../src/Mbproxy/Bcd/BcdCodec.cs) implement this in both directions. `Encode32(12_345_678)` returns `(low: 0x5678, high: 0x1234)`.

The 16-bit codec is a straight nibble pack / unpack:

```csharp
// From BcdCodec.cs — Encode16 packs four decimal digits into four BCD nibbles.
int d3 = value / 1000;
int d2 = (value / 100) % 10;
int d1 = (value / 10) % 10;
int d0 = value % 10;
return (ushort)((d3 << 12) | (d2 << 8) | (d1 << 4) | d0);
```

`Decode16` is the reverse, with a `HasBadNibble` guard that throws `FormatException` if any nibble is `>= 0xA`. The Phase-04 rewrite pipeline catches the exception and surfaces it as a `mbproxy.rewrite.invalid_bcd` warning event instead of corrupting the payload.

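
The two codec directions described above can be sketched end to end. The following is a minimal, hypothetical reimplementation and not the contents of `BcdCodec.cs`: the class name `BcdCodecSketch` and the use of `ArgumentOutOfRangeException` for out-of-range input are assumptions. It mirrors the nibble packing, the bad-nibble guard, and the low-word-first pairing.

```csharp
using System;

// Hypothetical sketch, not the contents of BcdCodec.cs.
public static class BcdCodecSketch
{
    // True when any of the four nibbles is 0xA..0xF, i.e. not a decimal digit.
    public static bool HasBadNibble(ushort raw)
    {
        for (int shift = 0; shift < 16; shift += 4)
        {
            if (((raw >> shift) & 0xF) >= 0xA) return true;
        }
        return false;
    }

    // Packs 0..9999 into four BCD nibbles, most significant digit first.
    public static ushort Encode16(int value)
    {
        // Exception type is an assumption; the source only says the value is rejected.
        if (value < 0 || value > 9999)
            throw new ArgumentOutOfRangeException(nameof(value));
        return (ushort)(((value / 1000) << 12) | (((value / 100) % 10) << 8)
                      | (((value / 10) % 10) << 4) | (value % 10));
    }

    // Unpacks four BCD nibbles; throws FormatException on a non-decimal nibble.
    public static int Decode16(ushort raw)
    {
        if (HasBadNibble(raw))
            throw new FormatException($"0x{raw:X4} contains a non-decimal nibble.");
        return ((raw >> 12) & 0xF) * 1000 + ((raw >> 8) & 0xF) * 100
             + ((raw >> 4) & 0xF) * 10 + (raw & 0xF);
    }

    // CDAB: the low four digits go to Address, the high four to Address+1.
    public static (ushort Low, ushort High) Encode32(int value)
    {
        if (value < 0 || value > 99_999_999)
            throw new ArgumentOutOfRangeException(nameof(value));
        return (Encode16(value % 10_000), Encode16(value / 10_000));
    }

    public static int Decode32(ushort low, ushort high) =>
        Decode16(high) * 10_000 + Decode16(low);
}
```

With these definitions, `Decode32(0x1234, 0x5678)` evaluates to `56_781_234` and `Encode32(12_345_678)` yields `(0x5678, 0x1234)`, matching the worked example.
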
## BCD Tag Configuration Shape

Every BCD register the rewriter handles is described by a `BcdTag` record from [`../../src/Mbproxy/Bcd/BcdTag.cs`](../../src/Mbproxy/Bcd/BcdTag.cs):

```csharp
public sealed record BcdTag(ushort Address, byte Width, int CacheTtlMs = 0)
{
    public bool IsThirtyTwoBit => Width == 32;
    public ushort HighRegister => /* Address + 1 for 32-bit tags */;
}
```

- `Address` is the **Modbus PDU register address** (zero-based, decimal). Configuration must translate from octal V-memory to PDU-decimal before reaching this record — `V2000` octal = decimal 1024 = `0x0400`. The proxy does not perform that translation itself.
- `Width` is `16` (single register) or `32` (CDAB register pair at `Address` and `Address+1`). `BcdTag.Create` rejects any other width.
- `CacheTtlMs` is the Phase-11 response-cache opt-in (covered separately in [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md)); it has no effect on rewriter behaviour.

The wire-format options shape lives in [`../../src/Mbproxy/Options/BcdTagOptions.cs`](../../src/Mbproxy/Options/BcdTagOptions.cs) and [`../../src/Mbproxy/Options/BcdTagListOptions.cs`](../../src/Mbproxy/Options/BcdTagListOptions.cs). Configured tags resolve through `BcdTagMapBuilder.Build` (see [`../../src/Mbproxy/Bcd/BcdTagMapBuilder.cs`](../../src/Mbproxy/Bcd/BcdTagMapBuilder.cs)) into an immutable `BcdTagMap` ([`../../src/Mbproxy/Bcd/BcdTagMap.cs`](../../src/Mbproxy/Bcd/BcdTagMap.cs)) per PLC.

Holding-register (FC03) and input-register (FC04) addresses share the **same** configured tag space. The DL205 / DL260 surfaces V-memory through both tables, so the rewriter applies the configured tag list against both FC03 and FC04 responses.

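
Because the proxy leaves the octal-to-decimal translation to whoever writes the configuration, a small helper can be handy when preparing tag lists. The sketch below is hypothetical tooling, not part of the proxy, and it assumes the plain octal-to-decimal mapping used in the `V2000` example above; consult `dl205.md` for the full V-memory translation rules before relying on it.

```csharp
using System;

// Hypothetical config-authoring helper; not part of mbproxy.
public static class VMemorySketch
{
    // Converts a V-memory label such as "V2000" (octal) to a zero-based
    // PDU register address, assuming a direct octal-to-decimal mapping.
    public static ushort ToPduAddress(string vAddress)
    {
        if (string.IsNullOrEmpty(vAddress) || (vAddress[0] != 'V' && vAddress[0] != 'v'))
            throw new FormatException("Expected a label of the form V<octal digits>.");

        // Convert.ToUInt16(string, 8) parses the digits as octal and
        // throws FormatException if a digit is 8 or 9.
        return Convert.ToUInt16(vAddress.Substring(1), 8);
    }
}
```

For example, `VMemorySketch.ToPduAddress("V2000")` returns `1024`, the value that would go into a tag's `Address`.
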
## Function-Code Scope Table

The rewriter touches payloads only for the function codes below. Every other FC — coils (FC01, FC05, FC15), discrete inputs (FC02), diagnostics, exception responses — passes through byte-for-byte.

| FC | Direction | Action |
|----|-----------|--------|
| 03 | Request | Pass through (read; no payload rewrite needed) |
| 03 | Response | Re-encode covered BCD slots from raw nibbles → binary integer |
| 04 | Request | Pass through |
| 04 | Response | Same as FC03 response |
| 06 | Request | Re-encode binary integer → BCD nibbles before forwarding |
| 06 | Response | Decode BCD nibbles → binary integer on the echo (NModbus-style clients validate the echo and would throw otherwise) |
| 16 | Request | Re-encode binary integer → BCD nibbles, per register, over the configured slots |
| 16 | Response | Pass through (the response carries only start+qty, not values) |

The FC06 response decode is non-obvious: the PLC echoes back the value it actually wrote, which is now BCD-encoded because the proxy rewrote the request on the way in. Clients that validate the echo equals the value they sent (NModbus and similar libraries do this) would throw on the round-trip if the proxy did not decode the echo back.

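
The FC06 round trip can be illustrated with a short sketch. It assumes the caller has already established that the target address is a configured 16-bit BCD tag, and it relies only on the standard FC06 layout (function code, two address bytes, two value bytes, big-endian). The class and method names are hypothetical, and the bad-nibble and range guards are omitted for brevity.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch of the FC06 round trip for one 16-bit BCD tag.
public static class Fc06Sketch
{
    // FC06 PDU layout: [fc][addrHi][addrLo][valHi][valLo]
    private const int ValueOffset = 3;

    // Request path: the client sent a binary integer; rewrite it to BCD in place.
    public static void RewriteRequest(Span<byte> pdu)
    {
        Span<byte> slot = pdu.Slice(ValueOffset, 2);
        int binary = BinaryPrimitives.ReadUInt16BigEndian(slot);
        BinaryPrimitives.WriteUInt16BigEndian(slot, Encode16(binary));
    }

    // Response path: the PLC echoed BCD; decode it so the echo matches the request.
    public static void RewriteResponse(Span<byte> pdu)
    {
        Span<byte> slot = pdu.Slice(ValueOffset, 2);
        ushort bcd = BinaryPrimitives.ReadUInt16BigEndian(slot);
        BinaryPrimitives.WriteUInt16BigEndian(slot, (ushort)Decode16(bcd));
    }

    private static ushort Encode16(int v) =>
        (ushort)(((v / 1000) << 12) | (((v / 100) % 10) << 8) | (((v / 10) % 10) << 4) | (v % 10));

    private static int Decode16(ushort raw) =>
        ((raw >> 12) & 0xF) * 1000 + ((raw >> 8) & 0xF) * 100 + ((raw >> 4) & 0xF) * 10 + (raw & 0xF);
}
```

A client writing `1234` sends value bytes `04 D2`; the backend sees `12 34`; the decoded echo is `04 D2` again, so an echo-validating client sees exactly what it sent. The PDU length never changes.
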
`BcdPduPipeline.Process` dispatches on direction first, then on FC:

```csharp
public void Process(MbapDirection direction, ReadOnlySpan<byte> mbapHeader,
                    Span<byte> pdu, PduContext context)
{
    if (context is not PerPlcContext ctx) return;
    if (pdu.Length < 1) return;

    byte fc = pdu[0];
    ctx.Counters.IncrementPdusForwarded();
    ctx.Counters.IncrementFcCount(fc);

    if (direction == MbapDirection.RequestToBackend)
        ProcessRequest(fc, pdu, ctx);
    else
        ProcessResponse(fc, pdu, ctx);
}
```

`PerPlcContext` carries the `BcdTagMap`, the per-PLC `ProxyCounters`, the logger, and the matched `InFlightRequest` from the multiplexer's correlation map. If a caller passes a plain `PduContext` (e.g. a test harness using `NoopPduPipeline` alongside the BCD pipeline), the rewriter returns without touching the PDU.

## Partial-Overlap Policy

A request that touches only **one** register of a configured 32-bit BCD pair cannot be re-encoded correctly. There are two shapes:

1. An FC03 / FC04 read whose range covers the low address but not the high address (`qty=1` at the low address) or vice versa.
2. An FC06 write to either the low or high address of a 32-bit pair, or an FC16 write whose range covers only one of the two registers.

In every case the rewriter **passes the PDU through raw** and emits a `mbproxy.rewrite.partial_bcd` warning. The `PartialBcdWarnings` counter increments per occurrence.

The proxy never synthesises a Modbus exception for a partial-overlap. Exception response codes are reserved for transport failure (the per-request watchdog manufactures `0x0B` Gateway Target Device Failed To Respond; the PLC itself produces `0x01`–`0x04`). Using an exception code to signal a configuration / client mismatch would conflate "the device or the path failed" with "the client straddled a 32-bit boundary," and operators chasing the exception would look at the wrong layer.

The rationale for warn-plus-passthrough rather than silent rewrite: silently rewriting only the half the client touched would corrupt the value (a 16-bit BCD encode of a 32-bit binary integer is meaningless). A warning-plus-raw passthrough surfaces the misconfiguration loudly while leaving the client to discover the mismatch in its own data path.

The FC16 request path makes the partial-overlap decision per-tag inside its loop over `TryGetForRange` hits:

```csharp
if (tag.IsThirtyTwoBit)
{
    bool lowInRange = offsetWords >= 0 && offsetWords < qty;
    bool highInRange = (offsetWords + 1) >= 0 && (offsetWords + 1) < qty;

    if (!lowInRange || !highInRange)
    {
        RewriterLogEvents.PartialBcd(ctx.Logger, ctx.PlcName,
            tag.Address, startAddress, qty);
        ctx.Counters.IncrementPartialBcd();
        continue;
    }
    // ...both registers in range — reconstruct, encode, write back...
}
```

For a 32-bit FC16 write where both registers are in range, the rewriter reconstructs the client's 32-bit binary value from the CDAB pair (`clientHigh * 10_000 + clientLow`), runs `BcdCodec.Encode32` to produce the BCD register pair, and writes both registers back to the PDU buffer in place.

## Unsigned Only

DL205 / DL260 BCD is non-negative in the default ladder pattern. `BcdCodec.Encode16` rejects values outside `[0, 9999]`; `BcdCodec.Encode32` rejects values outside `[0, 99_999_999]`. The rewriter does not implement signed BCD; signed conventions vary by site and any value out of range surfaces as `mbproxy.rewrite.invalid_bcd` rather than being silently coerced.

## Exception Pass-Through

Modbus exception responses pass through unchanged. The rewriter detects an exception response by the high bit of the function code (`(fc & 0x80) != 0`; the parentheses matter in C#, where `!=` binds tighter than `&`), emits a `mbproxy.rewrite.exception_passthrough` event, increments the per-FC exception counter, and returns without touching the payload.

Covered exception codes:

- `0x01` Illegal Function
- `0x02` Illegal Data Address
- `0x03` Illegal Data Value
- `0x04` Server Device Failure
- `0x0B` Gateway Target Device Failed To Respond — manufactured by the per-request watchdog when a correlation entry ages past `Connection.BackendRequestTimeoutMs`. The rewriter does not distinguish proxy-manufactured from PLC-originated exception codes; both pass through identically.

The rewriter calls `Counters.IncrementBackendException(exceptionCode)` once per exception so the four common codes surface on the status page through `ExceptionCounts` (`Code01`, `Code02`, `Code03`, `Code04`). The Gateway-Target `0x0B` is also recorded, but it is more usefully traced through the watchdog log events than through the per-code counter slot.

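
The exception check itself is small enough to show in full. This is a minimal sketch with a hypothetical class and method name; it relies only on the standard Modbus exception layout, in which the response function code is the request code with bit 7 set and the exception code follows in the next byte.

```csharp
using System;

// Hypothetical sketch of exception-response detection on a raw PDU.
public static class ExceptionSketch
{
    // Returns true when the PDU is an exception response, and reports the
    // original function code (high bit cleared) and the exception code.
    public static bool TryGetException(ReadOnlySpan<byte> pdu,
                                       out byte originalFc, out byte exceptionCode)
    {
        originalFc = 0;
        exceptionCode = 0;

        // An exception PDU is [fc | 0x80][exceptionCode].
        if (pdu.Length < 2 || (pdu[0] & 0x80) == 0) return false;

        originalFc = (byte)(pdu[0] & 0x7F);
        exceptionCode = pdu[1];
        return true;
    }
}
```

A response of `83 02` is reported as function code `0x03` with exception `0x02` (Illegal Data Address); a normal FC03 response starting with `03` is not treated as an exception.
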
## Where the Rewriter Runs in the Pipeline

The rewriter is implemented as `BcdPduPipeline` in [`../../src/Mbproxy/Proxy/BcdPduPipeline.cs`](../../src/Mbproxy/Proxy/BcdPduPipeline.cs), registered as the singleton `IPduPipeline` in production. The class is stateless; per-call state arrives via the `PerPlcContext` passed into `Process`, which carries the `BcdTagMap`, the per-PLC counters, the logger, and (on the response path) the matched `InFlightRequest` from the multiplexer's correlation map.

Per-PLC pipeline ordering:

```text
Upstream request →
  [cache lookup (Phase 11)] →
  [coalesce check (Phase 10)] →
  [BCD rewriter — request path] →
  backend send

Backend response →
  [BCD rewriter — response path] →
  [response-cache populate (Phase 11)] →
  [fanout to all coalesced parties]
```

The rewriter runs **once per request** on the multiplexer's outbound path and **once per response** on the inbound path. Per-party MBAP TxId restoration happens after the rewriter on fanout, so the rewriter only ever sees the canonical (shared) PDU buffer.

For Phase-11 cache hits, the response cache stores **POST-rewriter bytes** — the rewriter is bypassed on hits, both as a CPU optimisation and as a correctness guarantee (a future rewriter change does not retroactively re-transform an entry that was decoded against an earlier rewriter version). See [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md).

On the response path, the rewriter cannot infer the original `(StartAddress, Qty)` of an FC03 / FC04 read from the response alone — the response carries only `[fc][byteCount][reg0Hi][reg0Lo]...`. The multiplexer's `CorrelationMap` keys the matched `InFlightRequest` to the response and attaches it to `PerPlcContext.CurrentRequest` before invoking the rewriter, so concurrent responses from different upstream clients each decode against their own request range without cross-talk. If `CurrentRequest` is null (e.g. a unit-test fixture invoking the pipeline directly) the rewriter passes the response bytes through unchanged.

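
To make the dependency on the request range concrete, here is a rough sketch of an FC03 response decode for 16-bit tags only. The start address is passed in explicitly, standing in for what the text above says arrives through `PerPlcContext.CurrentRequest`. The class name, the `ISet<ushort>` stand-in for the tag map, and the omission of 32-bit handling and bad-nibble guards are all simplifying assumptions.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical sketch: decode covered 16-bit BCD slots in an FC03 response.
public static class Fc03ResponseSketch
{
    // Response layout: [fc][byteCount][reg0Hi][reg0Lo]...
    // Returns the number of slots rewritten in place.
    public static int DecodeCoveredSlots(Span<byte> pdu, ushort startAddress,
                                         ISet<ushort> bcd16Addresses)
    {
        if (pdu.Length < 2 || pdu[0] != 0x03) return 0;
        int qty = pdu[1] / 2;
        if (pdu.Length < 2 + 2 * qty) return 0;

        int rewritten = 0;
        for (int i = 0; i < qty; i++)
        {
            // Only the request's start address tells us which register slot i is.
            if (!bcd16Addresses.Contains((ushort)(startAddress + i))) continue;

            Span<byte> slot = pdu.Slice(2 + 2 * i, 2);
            ushort raw = BinaryPrimitives.ReadUInt16BigEndian(slot);
            BinaryPrimitives.WriteUInt16BigEndian(slot, (ushort)Decode16(raw));
            rewritten++;
        }
        return rewritten;
    }

    private static int Decode16(ushort raw) =>
        ((raw >> 12) & 0xF) * 1000 + ((raw >> 8) & 0xF) * 100 + ((raw >> 4) & 0xF) * 10 + (raw & 0xF);
}
```

The same response bytes decode differently depending on the start address supplied, which is why the correlation step has to happen before the rewriter runs.
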
## Hybrid Tag Resolution

For each PLC, the effective BCD tag list is `(Global − Remove) ∪ Add`, resolved by `BcdTagMapBuilder.Build` in this order:

1. Seed the working set from `BcdTagListOptions.Global`.
2. Apply `PlcBcdOverrides.Remove` — drop every address listed. `Remove` matches by address only; width is irrelevant.
3. Apply `PlcBcdOverrides.Add` — insert each entry into the working set. If an address already exists from `Global`, the `Add` entry **wins** (this is how a per-PLC width override is expressed: list the same address in `Add` with a different `Width`).

The shapes are declared in [`../../src/Mbproxy/Options/BcdTagListOptions.cs`](../../src/Mbproxy/Options/BcdTagListOptions.cs):

```csharp
public sealed class BcdTagListOptions
{
    public IReadOnlyList<BcdTagOptions> Global { get; init; } = [];
}

public sealed class PlcBcdOverrides
{
    public IReadOnlyList<BcdTagOptions> Add { get; init; } = [];
    public IReadOnlyList<ushort> Remove { get; init; } = [];
}
```

Resolution produces a `ValidationResult` carrying the resolved `BcdTagMap`, a list of `BcdError` entries, and a list of `BcdWarning` entries. Callers treat any non-empty `Errors` list as a fatal configuration problem for that PLC.

The user-facing syntax for `Global` + per-PLC `Add` / `Remove` is documented in [`../Operations/Configuration.md`](../Operations/Configuration.md).

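
The seed, remove, add order can be sketched with a plain dictionary keyed by address. This is an illustration only: the class name and the tuple-based inputs are hypothetical, and unlike the real builder it silently overwrites duplicates instead of reporting errors or warnings.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the seed -> remove -> add resolution order.
public static class TagResolutionSketch
{
    // Returns address -> width for one PLC.
    public static SortedDictionary<ushort, byte> Resolve(
        IEnumerable<(ushort Address, byte Width)> global,
        IEnumerable<ushort> remove,
        IEnumerable<(ushort Address, byte Width)> add)
    {
        var working = new SortedDictionary<ushort, byte>();

        // 1. Seed from the global list.
        foreach (var (address, width) in global) working[address] = width;

        // 2. Remove matches by address only; width is irrelevant.
        foreach (ushort address in remove) working.Remove(address);

        // 3. Add wins over any surviving global entry at the same address.
        foreach (var (address, width) in add) working[address] = width;

        return working;
    }
}
```

Because `Add` is applied after `Remove`, an address listed in both ends up present with the `Add` width, which is the difference between `(Global − Remove) ∪ Add` and `(Global ∪ Add) − Remove`.
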
`BcdTagMap.TryGetForRange` is the hot-path range scan used by both the request and response paths. It returns every `BcdTag` whose register footprint intersects `[startAddress, startAddress + qty)`, each carrying its zero-based word `OffsetWords` relative to `startAddress`. A 32-bit tag whose low word starts **before** the range but whose high word lies inside the range returns with a **negative** `OffsetWords` — that is the partial-overlap signal the rewriter consumes when deciding whether to re-encode or warn. The no-hit path returns the empty-list singleton without allocating.

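
The intersection rule and the negative-offset signal are easier to see in code. The sketch below is a naive linear scan with hypothetical names (`RangeScanSketch`, `Hit`, `IsPartial`); it says nothing about how `BcdTagMap` indexes its tags or avoids allocation, and only reproduces the semantics described above.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of range-scan semantics; not the BcdTagMap implementation.
public static class RangeScanSketch
{
    public readonly record struct Hit(ushort Address, byte Width, int OffsetWords);

    public static List<Hit> Scan(IEnumerable<(ushort Address, byte Width)> tags,
                                 ushort startAddress, ushort qty)
    {
        var hits = new List<Hit>();
        int rangeEnd = startAddress + qty; // exclusive

        foreach (var (address, width) in tags)
        {
            int words = width == 32 ? 2 : 1;
            // Footprint [address, address + words) intersects [startAddress, rangeEnd).
            bool intersects = address < rangeEnd && address + words > startAddress;
            if (intersects)
                hits.Add(new Hit(address, width, address - startAddress));
        }
        return hits;
    }

    // A 32-bit hit is partial when either of its two words falls outside the range.
    public static bool IsPartial(Hit hit, ushort qty) =>
        hit.Width == 32 && (hit.OffsetWords < 0 || hit.OffsetWords + 1 >= qty);
}
```

Scanning `[1025, 1027)` against a 32-bit tag at `1024` yields a hit with `OffsetWords == -1`, which `IsPartial` flags; scanning `[1024, 1026)` yields offset `0` and a full overlap.
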
## Validation at Startup and Hot-Reload

`BcdTagMapBuilder.Build` runs the same validation pipeline at process start and on every hot-reload of `appsettings.json`. The results fall into three fatal error kinds, defined in [`../../src/Mbproxy/Bcd/BcdValidationError.cs`](../../src/Mbproxy/Bcd/BcdValidationError.cs), plus one non-fatal warning:

- `BcdValidationError.DuplicateAddress` — the same address appears more than once in the **resolved** list (after `Remove` and `Add` have been applied). Fatal error; the entry is excluded from the map.
- `BcdValidationError.OverlappingHighRegister` — a 32-bit entry's high register (`Address+1`) collides with the `Address` of a separate entry in the resolved list. Fatal error.
- `BcdValidationError.InvalidWidth` — an entry's `Width` is not `16` or `32`. Fatal error; the entry is excluded.
- `BcdWarning` — a `Remove` entry whose address does not appear in `Global`. Non-fatal, but typically indicates stale configuration (the global entry was removed without cleaning up the per-PLC override).

A successful hot-reload that changes the resolved tag list reseats the per-PLC `BcdTagMap` and, for Phase 11, flushes the entire PLC response cache (see [`./HotReload.md`](./HotReload.md)). In-flight requests already past the rewriter are not retroactively re-rewritten; the next PDU sees the new map. A failed validation rejects the reload as a whole and the previous map stays in effect.

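
The three fatal checks can be approximated in a few lines. This is a sketch under stated assumptions: the class name and the string error format are hypothetical, the input is an already-resolved list of `(Address, Width)` pairs, and the sketch only reports problems, whereas the text above says the real builder also excludes offending entries from the map.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the three fatal checks on a resolved tag list.
public static class TagValidationSketch
{
    public static List<string> Validate(IReadOnlyList<(ushort Address, byte Width)> resolved)
    {
        var errors = new List<string>();
        var seen = new HashSet<ushort>();

        foreach (var (address, width) in resolved)
        {
            if (width != 16 && width != 32)
                errors.Add($"InvalidWidth@{address}");
            if (!seen.Add(address))
                errors.Add($"DuplicateAddress@{address}");
        }

        foreach (var (address, width) in resolved)
        {
            // Guard against wrap-around at the top of the 16-bit address space.
            if (width == 32 && address < ushort.MaxValue
                && seen.Contains((ushort)(address + 1)))
            {
                errors.Add($"OverlappingHighRegister@{address}");
            }
        }
        return errors;
    }
}
```

An empty result corresponds to a tag list that would pass; any entry in the list corresponds to a reload that would be rejected as a whole.
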
## Counter Accounting

The rewriter feeds two counters that surface on the status page:

- `pdus.rewrittenSlots` — `RewrittenSlots` on `PlcPdusStatus`, incremented per re-encoded register. A 32-bit BCD pair counts as 2 slots; a 16-bit tag counts as 1. The FC06 echo decode is **not** counted to avoid double-counting the FC06 request that already incremented the slot on the way out.
- `pdus.partialBcdWarnings` — `PartialBcdWarnings` on `PlcPdusStatus`, incremented once per partial-overlap event (request or response path).

An out-of-range value (`< 0` or `> 9999` for 16-bit; `< 0` or `> 99_999_999` for 32-bit) on a write, or a bad nibble (`>= 0xA`) on a read, increments an internal invalid-BCD counter and emits `mbproxy.rewrite.invalid_bcd` at warning. The PDU passes through raw in that case; the rewriter never substitutes a value the client did not send (writes) or the PLC did not return (reads).

Both counters are exposed on the status page; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md). The corresponding log events (`mbproxy.rewrite.partial_bcd`, `mbproxy.rewrite.invalid_bcd`, `mbproxy.rewrite.exception_passthrough`) are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md). Partial-overlap troubleshooting is covered in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).

The `dl205.json` pymodbus simulator profile encodes BCD test fixtures used by the integration test suite; see [`../Testing/Simulator.md`](../Testing/Simulator.md).

A few invariants the rewriter relies on and the test suite enforces:

- The MBAP length field is **never** modified. Every re-encoded slot is the same byte width as the original (16-bit register in, 16-bit register out), so the PDU length is byte-stable.
- The rewriter is **stateless** at the class level. `BcdPduPipeline` holds no fields; everything per-call arrives via `PerPlcContext`. The same instance is safe to call concurrently from multiple upstream-read tasks and the single backend reader task on a given multiplexer.
- The rewriter operates on the canonical (shared) PDU buffer. Per-party MBAP TxId restoration on coalesced fanout happens **after** the rewriter, so any per-party byte copy only happens when fanout has more than one party.

## Related Documentation

- [`../Architecture/Overview.md`](../Architecture/Overview.md) — service-wide architecture and per-PLC pipeline shape
- [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md) — Phase-11 response cache; the cache stores post-rewriter bytes and bypasses the rewriter on hits
- [`./HotReload.md`](./HotReload.md) — hot-reload semantics for BCD tag-list changes
- [`../Operations/Configuration.md`](../Operations/Configuration.md) — `BcdTags.Global` and per-PLC `Add` / `Remove` syntax
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — `pdus.rewrittenSlots` and `pdus.partialBcdWarnings` exposure
- [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md) — diagnosing partial-overlap warnings
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — `mbproxy.rewrite.*` event catalogue
- [`../Testing/Simulator.md`](../Testing/Simulator.md) — the `dl205.json` simulator profile that encodes BCD test fixtures
- [`../../DL260/dl205.md`](../../DL260/dl205.md) — DL205 / DL260 BCD encoding, CDAB word order, and V-memory ↔ Modbus translation

@@ -0,0 +1,189 @@
# Hot Reload

A save to `appsettings.json` propagates to a running `mbproxy` without restarting the service. This document explains the mechanism, the reconcile pipeline, and what each configuration change does to the running state.

## How Reload Works

`Microsoft.Extensions.Configuration` loads `appsettings.json` with `reloadOnChange: true`. Every consumer reads its options through `IOptionsMonitor<MbproxyOptions>` instead of capturing a one-shot `IOptions<T>` snapshot at construction. When the framework's `FileSystemWatcher` sees the file change, it re-parses the JSON, re-binds the option tree, and notifies subscribers through `IOptionsMonitor.OnChange`.

The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively or as a Windows Service under the SCM.

The `OnChange` callback can fire multiple times for a single logical save because text editors on Windows commonly use a rename-and-replace pattern that produces two or three `FileSystemWatcher` events. The reconciler debounces these inside its own background loop with a 250 ms quiescent window so a single save produces a single apply.

### Debounce window

The debounce window is held in `ConfigReconciler.DebounceWindow = TimeSpan.FromMilliseconds(250)`. The loop reads from the change channel, then keeps re-arming a linked `CancellationTokenSource` with a 250 ms expiry and waits again. As long as new signals keep arriving inside the window, the loop drains them and keeps waiting. When the window elapses with no new signal the loop falls through and calls `ApplyAsync` against `IOptionsMonitor.CurrentValue`. The window is short enough that operators perceive saves as instant and long enough to absorb every editor save pattern observed in practice (rename-and-replace, write-truncate-write, Notepad, Visual Studio Code, PowerShell `Set-Content`).

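
One way to picture the debounce loop is the sketch below. It is a hypothetical reconstruction from the description above, not the `ConfigReconciler` source: the class name, the channel capacity of 1, and the `applyAsync` delegate are assumptions. It uses a bounded drop-oldest channel for signals and a linked `CancellationTokenSource` that is re-armed each time a signal arrives inside the window.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical debounce sketch; not the ConfigReconciler implementation.
public sealed class DebounceSketch
{
    public static readonly TimeSpan DebounceWindow = TimeSpan.FromMilliseconds(250);

    // Capacity 1 is an assumption; the source only says the channel is bounded.
    private readonly Channel<bool> _signals = Channel.CreateBounded<bool>(
        new BoundedChannelOptions(1) { FullMode = BoundedChannelFullMode.DropOldest });

    // Safe to call from an OnChange callback: TryWrite never blocks.
    public void Signal() => _signals.Writer.TryWrite(true);

    public async Task RunAsync(Func<Task> applyAsync, CancellationToken stopping)
    {
        while (await _signals.Reader.WaitToReadAsync(stopping))
        {
            while (true)
            {
                // Drain everything that has arrived so far.
                while (_signals.Reader.TryRead(out _)) { }

                // Re-arm a fresh quiet window linked to the shutdown token.
                using var window = CancellationTokenSource.CreateLinkedTokenSource(stopping);
                window.CancelAfter(DebounceWindow);
                try
                {
                    // A new signal inside the window loops back and re-arms.
                    if (!await _signals.Reader.WaitToReadAsync(window.Token)) break;
                }
                catch (OperationCanceledException) when (!stopping.IsCancellationRequested)
                {
                    break; // the window elapsed with no new signal
                }
            }
            await applyAsync();
        }
    }
}
```

A burst of `Signal()` calls that all land within 250 ms of each other results in a single `applyAsync` invocation once the burst goes quiet. Cancelling `stopping` ends the loop with an `OperationCanceledException`.
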
## The Reconcile Pipeline

Three types in `src/Mbproxy/Configuration/` carry the reload contract from "framework noticed the file changed" to "the running service matches the new file":

- `ReloadValidator` (`src/Mbproxy/Configuration/ReloadValidator.cs`) — runs cross-PLC and per-PLC checks before the reload is allowed to take effect. The validator is a static gate: `Validate(MbproxyOptions next, out IReadOnlyList<string> errors)` returns `false` and a list of error strings if the snapshot is malformed, and the apply step bails out before touching any state.
- `ReloadPlan` (`src/Mbproxy/Configuration/ReloadPlan.cs`) — an immutable record produced by the pure function `ReloadPlan.Compute(MbproxyOptions current, MbproxyOptions next)`. It buckets PLCs into `ToAdd`, `ToRemove`, `ToRestart` (network identity changed), and `ToReseat` (only the resolved `BcdTagMap` changed). PLC identity is keyed on `Name`, not `ListenPort`, so a port change is still the same PLC and goes to `ToRestart` rather than `ToRemove` + `ToAdd`.
- `ConfigReconciler` (`src/Mbproxy/Configuration/ConfigReconciler.cs`) — subscribes to `IOptionsMonitor.OnChange`, debounces and serialises change events through a bounded `Channel<bool>` and a `SemaphoreSlim(1, 1)`, then runs the plan: removes go first (concurrent), restarts next (concurrent), reseats apply via `PlcListenerSupervisor.ReplaceContextAsync`, and adds finish last.

The reconciler's `OnChange` handler does not block. It writes to a `Channel<bool>` with `BoundedChannelFullMode.DropOldest` so a busy reload queue never stalls the configuration framework. A dedicated background loop drains the channel, applies the 250 ms debounce, and then calls `ApplyAsync` on the latest snapshot exposed by `IOptionsMonitor.CurrentValue`. The last enqueued change wins.

The apply itself runs under `_applySemaphore` (a `SemaphoreSlim(1, 1)`) so two saves arriving in rapid succession are serialised and never interleave. If a second save lands while the first apply is still running, it queues at the semaphore and runs against whatever `CurrentValue` exposes when its turn comes — which is the freshest options snapshot, not necessarily the one that caused the wake-up.

### Apply order

`ApplyUnderLockAsync` runs the steps in this order against the latest snapshot:

1. **Validate.** If `ReloadValidator.Validate` reports errors, log `mbproxy.config.reload.rejected`, increment the rejected counter, and return without mutating state.
2. **Compute.** Call `ReloadPlan.Compute(_currentOptions, next)` to bucket PLCs into `ToAdd`, `ToRemove`, `ToRestart`, and `ToReseat`.
3. **Remove.** Stop every supervisor in `ToRemove` concurrently with a 10-second stop timeout, then dispose.
4. **Restart.** Stop the old supervisor, build a fresh `PerPlcContext` (which includes a new `ResponseCache` when any resolved tag opts in), and start a new `PlcListenerSupervisor` on the new endpoint. Restarts run concurrently across affected PLCs.
5. **Reseat.** For each PLC in `ToReseat`, build a new context that preserves the existing `Counters` (so operators see real history across the reseat) and call `PlcListenerSupervisor.ReplaceContextAsync` with a 5-second timeout.
6. **Add.** Build and start a new supervisor for every PLC in `ToAdd` concurrently.
7. **Record.** Update `_currentOptions` to `next`, call `ServiceCounters.RecordReloadApplied`, and log `mbproxy.config.reload.applied` with the apply counts and the global tag delta.

If a step throws, the exception is logged at Error and the loop continues with the remaining steps. The validator catches every precondition that can be checked from the configuration alone, so a runtime exception here is a true bug worth surfacing. The host stays up regardless.

## Per-Change-Kind Reconcile Table

| Change in `appsettings.json` | Propagation |
|------------------------------|-------------|
| `BcdTags.Global` add / remove / width | The reconciler rebuilds the resolved `BcdTagMap` and reseats the `PerPlcContext` of each affected PLC. The next PDU sees the new map. In-flight requests are not retroactively touched. |
| `Plcs[i].BcdTags.Add` or `Plcs[i].BcdTags.Remove` | Same as above — next-PDU resolution against the rebuilt map. |
| New `Plcs[i]` entry | `ConfigReconciler` builds a fresh `PerPlcContext` and `PlcListenerSupervisor`, which binds the new port under the same eager-then-auto-recover policy used at service startup. |
| `Plcs[i]` removed | The supervisor for that PLC is stopped (10 s stop timeout) and disposed, which closes every upstream client connection bound to that listener. |
| `Plcs[i].Host`, `ListenPort`, or backend `Port` changed | Routed to `ToRestart`, which behaves like remove + add for the same PLC `Name`. The supervisor stops the old listener, the reconciler rebuilds the context, and a new supervisor starts on the new endpoint. |
| `Connection.BackendConnectTimeoutMs` and the other `Backend*TimeoutMs` values | The next backend connect or request reads the new value through the monitor. In-flight operations keep their already-applied timeout. |
| `BcdTags.*.CacheTtlMs` or `Plcs[i].DefaultCacheTtlMs` | A tag-map reseat constructs a fresh `ResponseCache` for that PLC, which drops every cached entry for that PLC. Entries re-populate on demand under the new TTL. Per-tag flush granularity is intentionally not implemented. |
| `Cache.AllowLongTtl` | Enforced at the next reload validation. A change that depends on it (for example a `CacheTtlMs` above `60_000`) must be saved in the same edit. |
| `Cache.MaxEntriesPerPlc` | Applies to subsequent inserts. Existing entries are not pruned. |
| `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
| `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
| `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
| Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |

The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It reads the map from the current `PerPlcContext` on every PDU, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.

### Reseat vs. restart

The `ReloadPlan` distinguishes two kinds of "PLC is still here but changed":

- **Restart** is triggered when `Host`, `ListenPort`, or backend `Port` differ between the old and new `PlcOptions`. The TCP socket has to close and reopen on a new endpoint, so there is no way to preserve the listener: the supervisor stops and a brand-new one starts.
- **Reseat** is triggered when only the resolved `BcdTagMap` differs. `ReloadPlan.Compute` checks this structurally through `TagMapsEqual`, which compares the set of `(Address, Width, CacheTtlMs)` triples. The listener socket and the upstream pipes stay open. Only the `PerPlcContext` swaps.

`TagMapsEqual` includes `BcdTag.CacheTtlMs` in the comparison, so a per-tag TTL change or a `Plcs[i].DefaultCacheTtlMs` change (which folds into per-tag TTLs through `BcdTagMapBuilder.Build`) also routes to `ToReseat` and therefore drops the cache. A `Plcs[i]` whose options are byte-identical to the previous snapshot lands in neither bucket and the supervisor is left alone.

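
The restart/reseat split above can be sketched as a small classifier. This is a minimal illustration, not the actual `ReloadPlan.Compute`: the `TagTriple`, `PlcEndpointSketch`, `ChangeKind`, and `ReloadClassifierSketch` names are hypothetical stand-ins, and the only behaviour taken from the text is that an endpoint difference means restart while a difference in the set of `(Address, Width, CacheTtlMs)` triples means reseat.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; the real PlcOptions and BcdTag types live in src/.
public sealed record TagTriple(int Address, int Width, int? CacheTtlMs);

public sealed record PlcEndpointSketch(
    string Host, int Port, int ListenPort, IReadOnlyList<TagTriple> ResolvedTags);

public enum ChangeKind { Unchanged, Reseat, Restart }

public static class ReloadClassifierSketch
{
    public static ChangeKind Classify(PlcEndpointSketch oldPlc, PlcEndpointSketch newPlc)
    {
        // Any endpoint difference forces the listener socket to close and reopen.
        bool endpointChanged =
            oldPlc.Host != newPlc.Host ||
            oldPlc.Port != newPlc.Port ||
            oldPlc.ListenPort != newPlc.ListenPort;
        if (endpointChanged) return ChangeKind.Restart;

        // Structural comparison: records compare by value, so this checks
        // that both sides hold the same set of (Address, Width, CacheTtlMs) triples.
        var oldSet = new HashSet<TagTriple>(oldPlc.ResolvedTags);
        bool sameTags = oldSet.SetEquals(newPlc.ResolvedTags);
        return sameTags ? ChangeKind.Unchanged : ChangeKind.Reseat;
    }
}
```

Because the TTL is part of the triple, a TTL-only edit classifies as a reseat in this sketch, which matches the cache-drop behaviour described above.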
### Tag map resolution

`BcdTagMapBuilder.Build` is the single source of truth for what the resolved tag list looks like for one PLC. The hybrid resolution it implements is:

1. Start with `BcdTags.Global` from the root options.
2. Remove every address present in `Plcs[i].BcdTags.Remove`.
3. Merge in `Plcs[i].BcdTags.Add` entries. If an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.

The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa. There is no second validator that could disagree with the first.

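
The four resolution steps can be sketched as follows. This is an illustration under stated assumptions, not the real `BcdTagMapBuilder.Build`: `ResolvedTagSketch` and `TagResolutionSketch` are hypothetical names, the `Remove` list is assumed to be a list of addresses, and the error reporting the real builder performs (duplicate addresses, 32-bit overlap) is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for illustration only; not the real BcdTagOptions.
public sealed record ResolvedTagSketch(int Address, int Width, int? CacheTtlMs);

public static class TagResolutionSketch
{
    public static IReadOnlyList<ResolvedTagSketch> Resolve(
        IEnumerable<ResolvedTagSketch> global,
        IEnumerable<int> remove,
        IEnumerable<ResolvedTagSketch> add,
        int? defaultCacheTtlMs)
    {
        // Step 1: start from the global list, keyed by address.
        var working = new SortedDictionary<int, ResolvedTagSketch>();
        foreach (var tag in global) working[tag.Address] = tag;

        // Step 2: drop every address named in the per-PLC Remove list.
        foreach (var address in remove) working.Remove(address);

        // Step 3: per-PLC Add entries win over anything already present.
        foreach (var tag in add) working[tag.Address] = tag;

        // Step 4: tags without an explicit TTL inherit the per-PLC default.
        return working.Values
            .Select(t => t.CacheTtlMs is null ? t with { CacheTtlMs = defaultCacheTtlMs } : t)
            .ToList();
    }
}
```

Keying the working set by address is what makes the "`Add` wins" rule a plain overwrite rather than a special case.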
## Validation Rules

`ReloadValidator.Validate` is the gate the hot-reload path consults directly. It runs the following checks in order:

1. PLC names are non-empty and unique under ordinal comparison.
2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list.
3. `AdminPort` is in `[1, 65535]` and does not collide with any `ListenPort`.
4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks (duplicate addresses within a single resolved list, and 32-bit entries whose high register, `Address + 1`, overlaps a separate 16-bit entry) to the single source of truth used at startup.
5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` must be `>= 0`.

A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates.

Schema-level checks (invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON) are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording.

### Rejected-reload example

A duplicate `ListenPort` in the saved file produces an error like the following on the rejected log line:

```text
Config reload rejected — Errors=Plc 'plc-02': Duplicate ListenPort 5020 (already used by 'plc-01').
```

When several rules trip on the same save, the validator joins them with `; ` so the operator sees every problem from one file save. The current in-memory configuration is unchanged, every supervisor keeps running on its existing context, and the next valid save will replay the whole apply against the now-current state.

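
The run-to-completion behaviour can be sketched with an accumulating error list. This is a hedged illustration, not the real `ReloadValidator.Validate`: `PlcPortSketch` and `ReloadValidationSketch` are hypothetical names, only two of the documented rules are shown, and apart from the duplicate-port message (copied from the example above) the error wording is invented for the sketch.

```csharp
using System.Collections.Generic;

// Hypothetical option shape for illustration; the real validator works on MbproxyOptions.
public sealed record PlcPortSketch(string Name, int ListenPort, int? DefaultCacheTtlMs);

public static class ReloadValidationSketch
{
    public const int LongTtlThresholdMs = 60_000;

    public static List<string> Validate(IReadOnlyList<PlcPortSketch> plcs, bool allowLongTtl)
    {
        var errors = new List<string>();
        var portOwners = new Dictionary<int, string>();

        foreach (var plc in plcs)
        {
            // Port range and uniqueness. A failure is recorded and validation continues.
            if (plc.ListenPort < 1 || plc.ListenPort > 65535)
                errors.Add($"Plc '{plc.Name}': ListenPort {plc.ListenPort} is outside [1, 65535].");
            else if (portOwners.TryGetValue(plc.ListenPort, out var owner))
                errors.Add($"Plc '{plc.Name}': Duplicate ListenPort {plc.ListenPort} (already used by '{owner}').");
            else
                portOwners[plc.ListenPort] = plc.Name;

            // TTL bounds: non-negative, and long TTLs need the explicit opt-in flag.
            if (plc.DefaultCacheTtlMs is int ttl)
            {
                if (ttl < 0)
                    errors.Add($"Plc '{plc.Name}': DefaultCacheTtlMs {ttl} must be >= 0.");
                else if (ttl > LongTtlThresholdMs && !allowLongTtl)
                    errors.Add($"Plc '{plc.Name}': DefaultCacheTtlMs {ttl} requires Cache.AllowLongTtl = true.");
            }
        }

        return errors;
    }

    // All errors travel on a single rejected event, joined with "; ".
    public static string Join(IEnumerable<string> errors) => string.Join("; ", errors);
}
```

An empty list means the reload may proceed; any entry rejects the whole save, so nothing is applied piecemeal.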
## What Stays vs. What Changes Mid-Flight

The reload contract is built around a simple invariant: a Modbus request that has already started routing keeps the configuration it started with. The next request after the reload picks up the new values.

The rewriter is the clearest example. `BcdPduPipeline` dereferences the tag map at the start of every PDU. A request that is already in the multiplexer's outbound queue is rewritten against the map that was current when it arrived. The very next request on the same TCP connection sees the new map. This avoids a torn behaviour where one PDU is half-rewritten under the old tag list and half under the new: every PDU is fully consistent with exactly one snapshot of the map.

The same principle applies to timeouts. `Connection.BackendConnectTimeoutMs` and the per-operation timeout values are read through `IOptionsMonitor.CurrentValue` at the point the operation starts. A backend connect that has already entered its retry pipeline keeps its already-applied timeout for the remainder of that attempt. The next backend connect reads the new value.

The reseat path is the only place where running state changes mid-connection. A reseat swaps the entire `PerPlcContext` (`TagMap`, `Counters`, `Cache`) via `PlcListenerSupervisor.ReplaceContextAsync`. The listener socket and the existing upstream pipes survive the swap. The brief transition window between the old context and the new is documented in code: any PDU mid-flight at the swap point may observe the boundary, but the rewriter only consults the map at PDU start, so the practical effect is the same next-PDU resolution rule.

Counters are explicitly preserved across a reseat. The reconciler reads `supervisor.CurrentCounters` and passes the same `ProxyCounters` instance into the new context so request counts, rewrite counts, and error counts do not reset to zero every time an operator tweaks a tag. A restart, by contrast, constructs a brand-new `ProxyCounters` because the supervisor itself is brand new.

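
The snapshot-per-PDU rule and the counter hand-over can be illustrated with a reference swap. This is a minimal sketch under assumptions, not the real `PlcListenerSupervisor`: `CountersSketch`, `ContextSketch`, and `SupervisorSketch` are hypothetical names, the tag map is reduced to a set of addresses, and the cache is omitted.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical miniature of ProxyCounters, reduced to one counter.
public sealed class CountersSketch
{
    private long _requests;
    public long Requests => Interlocked.Read(ref _requests);
    public void RecordRequest() => Interlocked.Increment(ref _requests);
}

// Hypothetical miniature of PerPlcContext; the real one also carries a Cache.
public sealed record ContextSketch(IReadOnlySet<int> BcdAddresses, CountersSketch Counters);

public sealed class SupervisorSketch
{
    private ContextSketch _current;

    public SupervisorSketch(ContextSketch initial) => _current = initial;

    // Reseat: swap the tag map but hand the same counters instance to the new context.
    public void Reseat(IReadOnlySet<int> newAddresses)
    {
        var old = Volatile.Read(ref _current);
        Volatile.Write(ref _current, new ContextSketch(newAddresses, old.Counters));
    }

    // One PDU: capture the context once at the start, then use only that snapshot.
    public bool HandlePdu(int address)
    {
        var snapshot = Volatile.Read(ref _current);
        snapshot.Counters.RecordRequest();
        return snapshot.BcdAddresses.Contains(address);
    }
}
```

Because `HandlePdu` reads the reference exactly once, a concurrent `Reseat` can only affect the next call, never the one already running.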
### Effect on upstream sockets

The fate of an open upstream client socket depends on which bucket its PLC lands in:

- **Reseat.** The socket stays open. The client never notices the reload happened; only its next request frame resolves against the new tag map.
- **Restart.** The old listener stops, which closes every upstream socket bound to it. The client sees a TCP close and is expected to reconnect (Wonderware DAServer, generic Modbus masters, and the supported gateways all do this automatically). When it reconnects, it lands on the new listener at the new endpoint.
- **Remove.** Same as a restart from the client's perspective: the listener stops and every connection closes. If the operator also removed the IP from the upstream client's configuration, the client stops reconnecting; otherwise the reconnect attempts simply fail with `ECONNREFUSED` until the PLC reappears.
- **Add.** No effect on any existing socket. The new listener simply starts accepting on its `ListenPort`.

## Cache and Hot-Reload

Any tag-list change that affects a PLC drops the entire `ResponseCache` for that PLC. The reseat path constructs a fresh cache through `ConfigReconciler.BuildCacheIfNeeded`, which inspects the resolved map and returns a new `ResponseCache` when at least one tag opts in, or `null` otherwise. The supervisor disposes the old cache during `ReplaceContextAsync`.

Per-tag granular flush is intentionally not implemented. The reasoning is correctness over micro-optimisation:

- A width change between 16-bit and 32-bit can invalidate cached entries at neighbouring addresses, not just at the changed tag.
- A tag removal means a cached value is no longer rewritten on the way out, so the cached entry that was valid one millisecond ago is now serving the wrong shape.
- A TTL change on one tag does not influence neighbouring entries, but the cost of tracking per-entry TTL versions and replaying flushes outweighs the cost of repopulating on demand.

A wholesale drop is the simple correct move. Entries repopulate on demand at the next read against the new TTL, and a 54-PLC fleet with second-scale TTLs warms back to steady state within a handful of poll intervals.

`Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` deliberately do **not** trigger a reseat. A change to either value is structurally invisible to `TagMapsEqual` (which only inspects the resolved tag triples), so no cache rebuild happens. `MaxEntriesPerPlc` is enforced on subsequent inserts only: existing entries above the new cap stay until natural LRU eviction reaches them. `EvictionIntervalMs` is sampled at each tick of the eviction loop, so a change takes effect once the tick already scheduled under the old interval has fired.

## Reload Events

One of two events surfaces in the structured log every time the reconciler runs:

```csharp
[LoggerMessage(EventId = 60, EventName = "mbproxy.config.reload.applied",
    Level = LogLevel.Information,
    Message = "Config reload applied — PlcsAdded={PlcsAdded} PlcsRemoved={PlcsRemoved} " +
              "PlcsRestarted={PlcsRestarted} PlcsReseated={PlcsReseated} GlobalTagDelta={GlobalTagDelta}")]
private static partial void LogReloadApplied(
    ILogger logger, int plcsAdded, int plcsRemoved, int plcsRestarted, int plcsReseated, int globalTagDelta);

[LoggerMessage(EventId = 61, EventName = "mbproxy.config.reload.rejected",
    Level = LogLevel.Error,
    Message = "Config reload rejected — Errors={Errors}")]
private static partial void LogReloadRejected(ILogger logger, string errors);
```

`mbproxy.config.reload.applied` carries the counts from the executed `ReloadPlan` plus a `GlobalTagDelta` computed by `ConfigReconciler.ComputeGlobalTagDelta`, which counts how many global tag entries differ between the old and new options snapshots (added, removed, or width-changed).

`mbproxy.config.reload.rejected` carries the joined error string from `ReloadValidator.Validate`. The reconciler also increments service-wide counters through `ServiceCounters.RecordReloadApplied` and `ServiceCounters.RecordReloadRejected`, which surface on the status page as `config.reloadCount`, `config.reloadRejectedCount`, and `config.lastReloadUtc`. Both event names are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md).

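
The `GlobalTagDelta` count described above can be sketched as a per-address comparison. This is an illustration only, not the real `ConfigReconciler.ComputeGlobalTagDelta`: `GlobalTagSketch` and `GlobalTagDeltaSketch` are hypothetical names, and the sketch assumes both lists have already passed validation, so no address appears twice within one snapshot.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for illustration; only the fields the delta looks at.
public sealed record GlobalTagSketch(int Address, int Width);

public static class GlobalTagDeltaSketch
{
    public static int Compute(IEnumerable<GlobalTagSketch> oldTags, IEnumerable<GlobalTagSketch> newTags)
    {
        // Assumes unique addresses per snapshot; ToDictionary throws on duplicates.
        var oldByAddress = oldTags.ToDictionary(t => t.Address, t => t.Width);
        var newByAddress = newTags.ToDictionary(t => t.Address, t => t.Width);

        // Every address seen in either snapshot is considered exactly once.
        return oldByAddress.Keys.Union(newByAddress.Keys).Count(address =>
        {
            bool inOld = oldByAddress.TryGetValue(address, out var oldWidth);
            bool inNew = newByAddress.TryGetValue(address, out var newWidth);
            // Added or removed (present on one side only), or width changed.
            return inOld != inNew || oldWidth != newWidth;
        });
    }
}
```

Counting by address rather than by change kind means a tag that is both moved to a new width and kept at the same address contributes one, not two.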
### Reading the events

A healthy reload looks like this in the log stream:

```text
INFO mbproxy.config.reload.applied — PlcsAdded=1 PlcsRemoved=0 PlcsRestarted=0 PlcsReseated=2 GlobalTagDelta=3
```

The properties answer four questions at a glance: how many new listeners were brought up, how many old listeners were torn down, how many existing listeners moved to a new endpoint (and therefore disconnected their clients), and how many existing listeners had their tag maps swapped underneath open connections. `GlobalTagDelta` reports the number of `BcdTags.Global` entries that differ between snapshots; it counts each address once whether the difference is "added", "removed", or "width changed".

A rejected reload looks like this:

```text
ERROR mbproxy.config.reload.rejected — Errors=Plc 'plc-02': Duplicate ListenPort 5020 (already used by 'plc-01').; Plc 'plc-03': BCD tag map error (DuplicateAddress): Address 1072 appears twice in resolved tag list.
```

Every error from the validator concatenates with `; ` so a single rejected event captures every problem. The matching `config.reloadRejectedCount` counter on the status page increments by one per rejected save, not per error inside the save.

## Related Documentation

- [Architecture Overview](../Architecture/Overview.md)
- [Response Cache](../Architecture/ResponseCache.md)
- [BCD Rewriting](./BcdRewriting.md)
- [Configuration Reference](../Operations/Configuration.md)
- [Log Events](../Reference/LogEvents.md)
- [Status Page](../Operations/StatusPage.md)