# Hot Reload

A save to `appsettings.json` propagates to a running `mbproxy` without restarting the service. This document explains the mechanism, the reconcile pipeline, and what each configuration change does to the running state.

## How Reload Works

`Microsoft.Extensions.Configuration` loads `appsettings.json` with `reloadOnChange: true`. Every consumer reads its options through `IOptionsMonitor` instead of capturing a one-shot `IOptions` snapshot at construction. When the framework's `FileSystemWatcher` sees the file change, it re-parses the JSON, re-binds the option tree, and notifies subscribers through `IOptionsMonitor.OnChange`.

The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively, as a Windows Service under the SCM, or as a Linux systemd unit.

The `OnChange` callback can fire multiple times for a single logical save because text editors on Windows commonly use a rename-and-replace pattern that produces two or three `FileSystemWatcher` events. The reconciler debounces these inside its own background loop with a 250 ms quiescent window so a single save produces a single apply.

### Debounce window

The debounce window is held in `ConfigReconciler.DebounceWindow = TimeSpan.FromMilliseconds(250)`. The loop reads from the change channel, then keeps re-arming a linked `CancellationTokenSource` with a 250 ms expiry and waits again. As long as new signals keep arriving inside the window, the loop drains them and keeps waiting. When the window elapses with no new signal, the loop falls through and calls `ApplyAsync` against `IOptionsMonitor.CurrentValue`.
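The drain-and-re-arm loop described above can be sketched roughly as follows. This is a minimal illustration, not the real `ConfigReconciler`: the `DebouncedApplier` class, its `Signal` and `RunAsync` members, and the single-slot channel capacity are all assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the reconciler's background loop.
public sealed class DebouncedApplier
{
    // Capacity 1 with DropOldest is an assumption; a signal carries no payload.
    private readonly Channel<bool> _signals = Channel.CreateBounded<bool>(
        new BoundedChannelOptions(1) { FullMode = BoundedChannelFullMode.DropOldest });
    private readonly TimeSpan _window;
    private readonly Func<CancellationToken, Task> _apply;

    public DebouncedApplier(TimeSpan window, Func<CancellationToken, Task> apply)
    {
        _window = window;
        _apply = apply;
    }

    // Safe to call from a change callback: TryWrite never blocks.
    public void Signal() => _signals.Writer.TryWrite(true);

    public async Task RunAsync(CancellationToken stop)
    {
        while (await _signals.Reader.WaitToReadAsync(stop))
        {
            while (true)
            {
                // Drain every signal queued so far.
                while (_signals.Reader.TryRead(out _)) { }

                // Re-arm the quiescent window for this wait.
                using var quiet = CancellationTokenSource.CreateLinkedTokenSource(stop);
                quiet.CancelAfter(_window);
                try
                {
                    // A new signal inside the window restarts the wait.
                    if (!await _signals.Reader.WaitToReadAsync(quiet.Token)) break;
                }
                catch (OperationCanceledException) when (!stop.IsCancellationRequested)
                {
                    break; // window elapsed with no new signal
                }
            }
            await _apply(stop); // one apply per burst of signals
        }
    }
}
```

With a 250 ms window, three `Signal()` calls a few milliseconds apart would be expected to produce a single `_apply` invocation. Cancelling `stop` surfaces as an `OperationCanceledException` from `RunAsync`, which the caller is assumed to handle.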
The window is short enough that operators perceive saves as instant and long enough to absorb every editor save pattern observed in practice (rename-and-replace, write-truncate-write, Notepad, Visual Studio Code, PowerShell `Set-Content`).

## The Reconcile Pipeline

Three types in `src/Mbproxy/Configuration/` carry the reload contract from "framework noticed the file changed" to "the running service matches the new file":

- `ReloadValidator` (`src/Mbproxy/Configuration/ReloadValidator.cs`): runs cross-PLC and per-PLC checks before the reload is allowed to take effect. The validator is a static gate: `Validate(MbproxyOptions next, out IReadOnlyList<string> errors)` returns `false` and a list of error strings if the snapshot is malformed, and the apply step bails out before touching any state.
- `ReloadPlan` (`src/Mbproxy/Configuration/ReloadPlan.cs`): an immutable record produced by the pure function `ReloadPlan.Compute(MbproxyOptions current, MbproxyOptions next)`. It buckets PLCs into `ToAdd`, `ToRemove`, `ToRestart` (network identity changed), and `ToReseat` (only the resolved `BcdTagMap` changed). PLC identity is keyed on `Name`, not `ListenPort`, so a port change is still the same PLC and goes to `ToRestart` rather than `ToRemove` + `ToAdd`.
- `ConfigReconciler` (`src/Mbproxy/Configuration/ConfigReconciler.cs`): subscribes to `IOptionsMonitor.OnChange`, debounces and serialises change events through a bounded `Channel` and a `SemaphoreSlim(1, 1)`, then runs the plan: removes go first (concurrent), restarts next (concurrent), reseats apply via `PlcListenerSupervisor.ReplaceContextAsync`, and adds finish last.

The reconciler's `OnChange` handler does not block. It writes to a `Channel` with `BoundedChannelFullMode.DropOldest` so a busy reload queue never stalls the configuration framework. A dedicated background loop drains the channel, applies the 250 ms debounce, and then calls `ApplyAsync` on the latest snapshot exposed by `IOptionsMonitor.CurrentValue`.
The last enqueued change wins. The apply itself runs under `_applySemaphore` (a `SemaphoreSlim(1, 1)`) so two saves arriving in rapid succession are serialised and never interleave. If a second save lands while the first apply is still running, it queues at the semaphore and runs against whatever `CurrentValue` exposes when its turn comes, which is the freshest options snapshot, not necessarily the one that caused the wake-up.

### Apply order

`ApplyUnderLockAsync` runs the steps in this order against the latest snapshot:

1. **Validate.** If `ReloadValidator.Validate` returns errors, log `mbproxy.config.reload.rejected`, increment the rejected counter, and return without mutating state.
2. **Compute.** Call `ReloadPlan.Compute(_currentOptions, next)` to bucket PLCs into `ToAdd`, `ToRemove`, `ToRestart`, and `ToReseat`.
3. **Remove.** Stop every supervisor in `ToRemove` concurrently with a 10-second stop timeout, then dispose.
4. **Restart.** Stop the old supervisor, build a fresh `PerPlcContext` (which includes a new `ResponseCache` when any resolved tag opts in), and start a new `PlcListenerSupervisor` on the new endpoint. Restarts run concurrently across affected PLCs.
5. **Reseat.** For each PLC in `ToReseat`, build a new context that preserves the existing `Counters` (so operators see real history across the reseat) and call `PlcListenerSupervisor.ReplaceContextAsync` with a 5-second timeout.
6. **Add.** Build and start a new supervisor for every PLC in `ToAdd` concurrently.
7. **Record.** Update `_currentOptions` to `next`, call `ServiceCounters.RecordReloadApplied`, and log `mbproxy.config.reload.applied` with the apply counts and the global tag delta.

If a step throws, the exception is logged at Error and the loop continues with the remaining steps. The validator catches every precondition that can be checked from the configuration alone, so a runtime exception here is a true bug worth surfacing. The host stays up regardless.
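The Compute step can be illustrated with a small sketch of the bucketing rules. The `TagSketch`, `PlcSketch`, and `PlanSketch` types below are hypothetical, trimmed-down stand-ins (the real `PlcOptions` and `ReloadPlan` are assumed to carry more fields), and the buckets hold PLC names only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; records give value equality for free.
public sealed record TagSketch(int Address, int Width, int? CacheTtlMs);

public sealed record PlcSketch(
    string Name, string Host, int Port, int ListenPort,
    IReadOnlyList<TagSketch> ResolvedTags);

public sealed record PlanSketch(
    IReadOnlyList<string> ToAdd,
    IReadOnlyList<string> ToRemove,
    IReadOnlyList<string> ToRestart,
    IReadOnlyList<string> ToReseat)
{
    public static PlanSketch Compute(IEnumerable<PlcSketch> current, IEnumerable<PlcSketch> next)
    {
        // Identity is keyed on Name under ordinal comparison, never on ListenPort.
        var before = current.ToDictionary(p => p.Name, StringComparer.Ordinal);
        var after = next.ToDictionary(p => p.Name, StringComparer.Ordinal);

        var add = after.Keys.Where(n => !before.ContainsKey(n)).ToList();
        var remove = before.Keys.Where(n => !after.ContainsKey(n)).ToList();
        var restart = new List<string>();
        var reseat = new List<string>();

        foreach (var (name, now) in after)
        {
            if (!before.TryGetValue(name, out var old)) continue; // already in ToAdd

            if (old.Host != now.Host || old.Port != now.Port || old.ListenPort != now.ListenPort)
                restart.Add(name); // network identity changed
            else if (!TagMapsEqual(old.ResolvedTags, now.ResolvedTags))
                reseat.Add(name);  // only the resolved tag map changed
            // otherwise unchanged: the supervisor is left alone
        }

        return new PlanSketch(add, remove, restart, reseat);
    }

    // Structural comparison over (Address, Width, CacheTtlMs) triples.
    private static bool TagMapsEqual(IReadOnlyList<TagSketch> a, IReadOnlyList<TagSketch> b) =>
        a.Count == b.Count && new HashSet<TagSketch>(a).SetEquals(b);
}
```

Because the function is pure, it can be exercised in a unit test with two hand-built snapshots and no running listeners.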
## Per-Change-Kind Reconcile Table

| Change in `appsettings.json` | Propagation |
|------------------------------|-------------|
| `BcdTags.Global` add / remove / width | The rewriter dereferences `IOptionsMonitor` per PDU. The next PDU sees the new map. In-flight requests are not retroactively touched. |
| `Plcs[i].BcdTags.Add` or `Plcs[i].BcdTags.Remove` | Same as above: next-PDU resolution against the rebuilt map. |
| New `Plcs[i]` entry | `ConfigReconciler` builds a fresh `PerPlcContext` and `PlcListenerSupervisor`, which binds the new port under the same eager-then-auto-recover policy used at service startup. |
| `Plcs[i]` removed | The supervisor for that PLC is stopped (10 s stop timeout) and disposed, which closes every upstream client connection bound to that listener. |
| `Plcs[i].ListenPort`, `Host`, or backend `Port` changed | Routed to `ToRestart`, which has the same effect on clients as remove + add. The supervisor stops the old listener, the reconciler rebuilds the context, and a new supervisor starts on the new endpoint. |
| `Connection.BackendConnectTimeoutMs` and the other `Backend*TimeoutMs` values | The next backend connect or request reads the new value through the monitor. In-flight operations keep their already-applied timeout. |
| `BcdTags.*.CacheTtlMs` or `Plcs[i].DefaultCacheTtlMs` | A tag-map reseat constructs a fresh `ResponseCache` for that PLC, which drops every cached entry for that PLC. Entries re-populate on demand under the new TTL. Per-tag flush granularity is intentionally not implemented. |
| `Cache.AllowLongTtl` | Enforced at the next reload validation. A TTL change that depends on it must be written in the same save as the flag. |
| `Cache.MaxEntriesPerPlc` | Applies to subsequent inserts. Existing entries are not pruned. |
| `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
| `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
| `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
| Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |

The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It resolves the map for the active PLC at the start of every request frame, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.

### Reseat vs. restart

The `ReloadPlan` distinguishes two kinds of "PLC is still here but changed":

- **Restart** is triggered when `Host`, `ListenPort`, or backend `Port` differ between the old and new `PlcOptions`. The TCP socket has to close and reopen on a new endpoint, so there is no way to preserve the listener: the supervisor stops and a brand-new one starts.
- **Reseat** is triggered when only the resolved `BcdTagMap` differs (which `ReloadPlan.Compute` checks structurally through `TagMapsEqual`: same set of `(Address, Width, CacheTtlMs)` triples). The listener socket and the upstream pipes stay open. Only the `PerPlcContext` swaps.

`TagMapsEqual` includes `BcdTag.CacheTtlMs` in the comparison so a per-tag TTL change or a `Plcs[i].DefaultCacheTtlMs` change (which folds into per-tag TTLs through `BcdTagMapBuilder.Build`) also routes to `ToReseat` and so also drops the cache.

A `Plcs[i]` whose options are byte-identical to the previous snapshot lands in neither bucket and the supervisor is left alone.

### Tag map resolution

`BcdTagMapBuilder.Build` is the single source of truth for what the resolved tag list looks like for one PLC. The hybrid resolution it implements is:

1. Start with `BcdTags.Global` from the root options.
2. Remove every address present in `Plcs[i].BcdTags.Remove`.
3. Merge in `Plcs[i].BcdTags.Add` entries. If an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.

The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa. There is no second validator that could disagree with the first.

## Validation Rules

`ReloadValidator.Validate` is the gate the hot-reload path consults directly. It runs the following checks in order:

1. PLC names are non-empty and unique under ordinal comparison.
2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list.
3. `AdminPort` is in `[1, 65535]` and does not collide with any `ListenPort`.
4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks (duplicate addresses within a single resolved list, and 32-bit entries whose high register, `Address + 1`, overlaps a separate 16-bit entry) to the single source of truth used at startup.
5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` must be `>= 0`.

A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates.

Schema-level checks (invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON) are also enforced by `MbproxyOptionsValidator` (`IValidateOptions`) at bind time.
The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording.

### Rejected-reload example

A duplicate `ListenPort` in the saved file produces an error like the following on the rejected log line:

```text
Config reload rejected — Errors=Plc 'plc-02': Duplicate ListenPort 5020 (already used by 'plc-01').
```

When several rules trip on the same save, the validator joins them with `; ` so the operator sees every problem from one file save. The current in-memory configuration is unchanged, every supervisor keeps running on its existing context, and the next valid save will replay the whole apply against the now-current state.

## What Stays vs. What Changes Mid-Flight

The reload contract is built around a simple invariant: a Modbus request that has already started routing keeps the configuration it started with. The next request after the reload picks up the new values.

The rewriter is the clearest example. `BcdPduPipeline` dereferences the tag map at the start of every PDU. A request that is already in the multiplexer's outbound queue is rewritten against the map that was current when it arrived. The very next request on the same TCP connection sees the new map. This avoids a torn behaviour where one PDU is half-rewritten under the old tag list and half under the new: every PDU is fully consistent with exactly one snapshot of the map.

The same principle applies to timeouts. `Connection.BackendConnectTimeoutMs` and the per-operation timeout values are read through `IOptionsMonitor.CurrentValue` at the point the operation starts. A backend connect that has already entered its retry pipeline keeps its already-applied timeout for the remainder of that attempt. The next backend connect reads the new value.

The reseat path is the only place where running state changes mid-connection. A reseat swaps the entire `PerPlcContext` (`TagMap`, `Counters`, `Cache`) via `PlcListenerSupervisor.ReplaceContextAsync`.
The listener socket and the existing upstream pipes survive the swap. The brief transition window between the old context and the new is documented in code: any PDU mid-flight at the swap point may observe the boundary, but the rewriter only consults the map at PDU start, so the practical effect is the same next-PDU resolution rule.

Counters are explicitly preserved across a reseat. The reconciler reads `supervisor.CurrentCounters` and passes the same `ProxyCounters` instance into the new context so request counts, rewrite counts, and error counts do not reset to zero every time an operator tweaks a tag. A restart, by contrast, constructs a brand-new `ProxyCounters` because the supervisor itself is brand new.

### Effect on upstream sockets

The fate of an open upstream client socket depends on which bucket its PLC lands in:

- **Reseat.** The socket stays open. The client never notices the reload happened; only its next request frame resolves against the new tag map.
- **Restart.** The old listener stops, which closes every upstream socket bound to it. The client sees a TCP close and is expected to reconnect (Wonderware DAServer, generic Modbus masters, and the supported gateways all do this automatically). When it reconnects, it lands on the new listener at the new endpoint.
- **Remove.** Same as a restart from the client's perspective: the listener stops and every connection closes. If the operator also removed the IP from the upstream client's configuration, the client stops reconnecting; otherwise the reconnect attempts simply fail with `ECONNREFUSED` until the PLC reappears.
- **Add.** No effect on any existing socket. The new listener simply starts accepting on its `ListenPort`.

## Cache and Hot-Reload

Any tag-list change that affects a PLC drops the entire `ResponseCache` for that PLC.
The reseat path constructs a fresh cache through `ConfigReconciler.BuildCacheIfNeeded`, which inspects the resolved map and returns a new `ResponseCache` when at least one tag opts in, or `null` otherwise. The supervisor disposes the old cache during `ReplaceContextAsync`.

Per-tag granular flush is intentionally not implemented. The reasoning is correctness over micro-optimisation:

- A width change between 16-bit and 32-bit can invalidate cached entries at neighbouring addresses, not just at the changed tag.
- A tag removal means a cached value is no longer rewritten on the way out, so the cached entry that was valid one millisecond ago is now serving the wrong shape.
- A TTL change on one tag does not influence neighbouring entries, but the cost of tracking per-entry TTL versions and replaying flushes outweighs the cost of repopulating on demand.

A wholesale drop is the simple correct move. Entries repopulate on demand at the next read against the new TTL, and a 54-PLC fleet with second-scale TTLs warms back to steady state within a handful of poll intervals.

`Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` deliberately do **not** trigger a reseat. A change to either value is structurally invisible to `TagMapsEqual` (which only inspects the resolved tag triples), so no cache rebuild happens. `MaxEntriesPerPlc` is enforced on subsequent inserts only; existing entries above the new cap stay until natural LRU eviction reaches them. `EvictionIntervalMs` is sampled by each fresh tick of the eviction loop, so a change takes effect at the next tick of the old interval.
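The wholesale-drop behaviour can be sketched as below. Everything here is illustrative: `CachedTag`, `CacheSketch`, and `CacheReseatSketch` are hypothetical names, and the rule that a tag "opts in" when its resolved `CacheTtlMs` is positive is an assumption, since this document does not spell out the opt-in condition.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record CachedTag(int Address, int Width, int? CacheTtlMs);

// Hypothetical stand-in for ResponseCache: tracks entries and disposal only.
public sealed class CacheSketch : IDisposable
{
    private readonly Dictionary<int, ushort[]> _entries = new();
    public int Count => _entries.Count;
    public bool Disposed { get; private set; }
    public void Put(int address, ushort[] registers) => _entries[address] = registers;
    public void Dispose() { _entries.Clear(); Disposed = true; }
}

public static class CacheReseatSketch
{
    // Assumption: a tag opts in when its resolved CacheTtlMs is positive.
    public static CacheSketch? BuildCacheIfNeeded(IEnumerable<CachedTag> resolved) =>
        resolved.Any(t => t.CacheTtlMs > 0) ? new CacheSketch() : null;

    // Wholesale drop: the old cache is disposed and the new one starts empty.
    public static CacheSketch? Reseat(CacheSketch? old, IEnumerable<CachedTag> resolved)
    {
        var fresh = BuildCacheIfNeeded(resolved);
        old?.Dispose();
        return fresh;
    }
}
```

The sketch never copies entries from the old cache to the new one, which is the point: no per-entry bookkeeping is needed to stay correct across a width change, a removal, or a TTL change.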
## Reload Events

Every reconciler run emits one of two events in the structured log:

```csharp
[LoggerMessage(EventId = 60, EventName = "mbproxy.config.reload.applied",
    Level = LogLevel.Information,
    Message = "Config reload applied — PlcsAdded={PlcsAdded} PlcsRemoved={PlcsRemoved} " +
              "PlcsRestarted={PlcsRestarted} PlcsReseated={PlcsReseated} GlobalTagDelta={GlobalTagDelta}")]
private static partial void LogReloadApplied(
    ILogger logger, int plcsAdded, int plcsRemoved, int plcsRestarted,
    int plcsReseated, int globalTagDelta);

[LoggerMessage(EventId = 61, EventName = "mbproxy.config.reload.rejected",
    Level = LogLevel.Error,
    Message = "Config reload rejected — Errors={Errors}")]
private static partial void LogReloadRejected(ILogger logger, string errors);
```

`mbproxy.config.reload.applied` carries the counts from the executed `ReloadPlan` plus a `GlobalTagDelta` computed by `ConfigReconciler.ComputeGlobalTagDelta`, which counts how many global tag entries differ between the old and new options snapshots (added, removed, or width-changed). `mbproxy.config.reload.rejected` carries the joined error string from `ReloadValidator.Validate`.

The reconciler also increments service-wide counters through `ServiceCounters.RecordReloadApplied` and `ServiceCounters.RecordReloadRejected`, which surface on the status page as `config.reloadCount`, `config.reloadRejectedCount`, and `config.lastReloadUtc`.

Both event names are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md).
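The `GlobalTagDelta` count described above could be computed along these lines. This is a sketch under stated assumptions rather than the actual `ComputeGlobalTagDelta` body: the `GlobalTag` record and the `GlobalTagDeltaSketch` class are hypothetical, and the inputs are assumed to have passed validation, so no address appears twice in either list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of a global tag entry.
public sealed record GlobalTag(int Address, int Width);

public static class GlobalTagDeltaSketch
{
    // Counts each address once if it was added, removed, or changed width.
    public static int Compute(IEnumerable<GlobalTag> oldTags, IEnumerable<GlobalTag> newTags)
    {
        // ToDictionary throws on duplicate addresses; validated input has none.
        var before = oldTags.ToDictionary(t => t.Address, t => t.Width);
        var after = newTags.ToDictionary(t => t.Address, t => t.Width);

        return before.Keys.Union(after.Keys).Count(address =>
            !before.TryGetValue(address, out var oldWidth)    // added
            || !after.TryGetValue(address, out var newWidth)  // removed
            || oldWidth != newWidth);                         // width changed
    }
}
```

For example, going from tags at addresses 100, 101, and 102 to tags at 100, 101 (with a different width), and 103 gives a delta of 3: one width change, one removal, and one addition.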
### Reading the events

A healthy reload looks like this in the log stream:

```text
INFO mbproxy.config.reload.applied — PlcsAdded=1 PlcsRemoved=0 PlcsRestarted=0 PlcsReseated=2 GlobalTagDelta=3
```

The properties answer four questions at a glance: how many new listeners were brought up, how many old listeners were torn down, how many existing listeners moved to a new endpoint (and therefore disconnected their clients), and how many existing listeners had their tag maps swapped underneath open connections. `GlobalTagDelta` reports the number of `BcdTags.Global` entries that differ between snapshots; it counts each address once whether the difference is "added", "removed", or "width changed".

A rejected reload looks like this:

```text
ERROR mbproxy.config.reload.rejected — Errors=Plc 'plc-02': Duplicate ListenPort 5020 (already used by 'plc-01').; Plc 'plc-03': BCD tag map error (DuplicateAddress): Address 1072 appears twice in resolved tag list.
```

Every error from the validator concatenates with `; ` so a single rejected event captures every problem. The matching `config.reloadRejectedCount` counter on the status page increments by one per rejected save, not per error inside the save.

## Related Documentation

- [Architecture Overview](../Architecture/Overview.md)
- [Response Cache](../Architecture/ResponseCache.md)
- [BCD Rewriting](./BcdRewriting.md)
- [Configuration Reference](../Operations/Configuration.md)
- [Log Events](../Reference/LogEvents.md)
- [Status Page](../Operations/StatusPage.md)