1eeee1e2929259b158fb6e012e29bb80f2c36f43
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b222362ce0 |
mbproxy: remediate the 2026-05-16 code-review findings
Fixes every finding from the codereviews/2026-05-16 multi-agent review (2 Critical, 20 Major, 38 Minor) and adds that review to the repo.
Highlights: dashboard XSS escape; response cache invalidated on the write request (not just the response); ReloadValidator now runs at startup so port collisions / duplicate names / malformed Resilience profiles fail fast; AdminPort 0 genuinely disables the admin endpoint; PlcListener accept-loop faults propagate to the supervisor's faulted path; reconciler Restart builds before removing; Resilience pipelines are restart-only from a frozen snapshot; multiplexer connect-race leak, watchdog party-list snapshot, backend-response and FC16 framing validation; frontend reconnect retry and util.js load guard; plus the log-event/doc drift sweep and test-port hygiene.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0308490aef |
mbproxy: close out the dashboard code-review minor findings
Resolves the remaining Minor items from the 2026-05-15 review so the web-UI dashboard work has no open follow-ups: a real-HubConnection end-to-end test for the SignalR feed, stable mbproxy.admin.broadcast.* log-event names, keyboard/aria accessibility on the fleet table, frontend JS hardening (URL-decode guard, NaN guards, shared util.js), reconciler<->capture-registry coverage, throwing-sink and embedded-asset tests, broadcaster polish, and a soft upper bound on AdminPushIntervalMs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e719dd51c1 |
mbproxy: replace status page with a live SignalR web dashboard
The single auto-refreshing zero-JS status page gave operators a 25-column wall and no way to drill into one connection. This adds a Bootstrap fleet dashboard (filterable/sortable KPI table) and a per-PLC detail page with a real-time debug view of raw PLC-side BCD vs. decoded client-side values, streamed live over a SignalR feed. The debug view is fed by an on-demand per-tag value capture, armed only while a detail page is open. All assets (Bootstrap, SignalR client, fonts) are embedded so the UI works unchanged on firewalled networks; GET /status.json is untouched for scrapers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0868613890 |
mbproxy: add keepalive / connection monitoring
The DL205/DL260 ECOM emits no TCP keepalives, so an idle backend socket can be silently dropped by a middlebox (switch, firewall, NAT) after 2-5 minutes. Enable OS SO_KEEPALIVE on backend and accepted upstream sockets, and drive a periodic synthetic FC03 heartbeat on each idle backend socket so a dead path is detected before a real client request hits it. Controlled by Connection.Keepalive (ON by default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1a2856526a |
mbproxy: strip historical phase/wave/plan references from source comments
Comments described the *history* of how the code arrived (phase numbers, wave IDs, review IDs, dated TODOs) instead of what it does today. That scaffolding rotted as the codebase evolved. Cleaned 60 source files + .gitignore; behaviour unchanged (387/387 tests still pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e66b17fe5f |
mbproxy: Wave 2 fixes from 2026-05-14 code review
Resolves the 21 Major findings catalogued in
codereviews/2026-05-14/RemediationPlan.md (Wave 2). Tests: 370 pass / 0 fail
(baseline 363 + 7 new W2 regression tests).
Multiplexer / concurrency:
W2.1 ConfigReconciler.Attach now threads the live coalescingAccessor through
to add/restart-built supervisors so a hot-reload of
ReadCoalescing.{Enabled,MaxParties} propagates to PLCs added or
restarted via reload.
W2.2 PlcMultiplexer._disposed and UpstreamPipe._disposed are now volatile
for ARM/portability defense.
W2.3 ProxyWorker._supervisors / ConfigReconciler._supervisors switched from
Dictionary to ConcurrentDictionary; reconciler uses TryRemove. The
outer Apply is serialised by a semaphore but the inner Add/Remove/
Restart Task.WhenAll continuations run in parallel.
W2.4 Counter parity for cache miss + coalescing-saturation miss documented
inline (per-design contract; behavior unchanged).
W2.5 _disposeCts.Dispose() and _connectGate.Dispose() guarded against late
watchdog ticks.
W2.6 _connectGate disposed in DisposeAsync.
W2.7 Inline doc clarifying the post-rewriter FC byte read.
Cache / hot-reload:
W2.8 PlcListenerSupervisor.ReplaceContextAsync now calls Clear() to capture
the entry count, emits mbproxy.cache.flushed, then disposes the old
cache. Previously the event was defined but never emitted.
W2.9 Inline doc explaining the implicit "skip cache invalidation while
recovering" gating (no backend reader during recovery → no FC06/FC16
response → no invalidation).
W2.10 ReloadValidator now re-checks resolved per-tag CacheTtlMs against
Cache.AllowLongTtl after BcdTagMapBuilder folds the per-PLC default.
BCD rewriter:
W2.11 Duplicate addresses detected within Global itself and within the per-PLC
Add list itself, BEFORE the working dictionary collapses keys. Cross-list
collisions (Global vs Add) remain the documented width-override pattern.
Previously the DuplicateAddress error was unreachable dead code.
W2.12 OverlappingHighRegister reports each colliding pair exactly once
(canonicalised low/high pair tracked in a HashSet).
W2.13 FC16 32-bit write rejects clientLow > 9999 or clientHigh > 9999 BEFORE
the high*10000+low reconstruction. Without this guard, (high=9999,
low=9999) silently re-encoded as (high=9998, low=9999), losing 1 from
the high word.
W2.14 FC16 validates pdu.Length >= 6 + qty*2 upfront — no half-rewritten
requests when a malformed client claims more registers than it ships.
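For illustration only, the two FC16 guards described in W2.13 and W2.14 can be sketched in Python. This is not the mbproxy implementation: the function names `fc16_length_ok` and `combine_bcd_words` are hypothetical, and the sketch assumes the standard Modbus FC16 request layout (function code, start address, quantity, byte count, then two data bytes per register).

```python
import struct
from typing import Optional

MAX_BCD_WORD = 9999  # largest value that four BCD digits can hold


def fc16_length_ok(pdu: bytes) -> bool:
    """Return True when an FC16 request PDU ships every register it claims.

    Assumed layout: function code (1) + start address (2) + quantity (2)
    + byte count (1) + quantity * 2 data bytes, i.e. 6 + qty * 2 in total.
    """
    if len(pdu) < 6 or pdu[0] != 0x10:
        return False
    qty = struct.unpack(">H", pdu[3:5])[0]
    return len(pdu) >= 6 + qty * 2


def combine_bcd_words(high: int, low: int) -> Optional[int]:
    """Combine two client words into one value, or None if either is out of range.

    The range check runs BEFORE the high * 10000 + low reconstruction, so an
    oversized word can never carry into its neighbour.
    """
    if not (0 <= high <= MAX_BCD_WORD and 0 <= low <= MAX_BCD_WORD):
        return None
    return high * 10000 + low
```

Without the range check, an out-of-range pair such as (high=1, low=10000) would combine to 20000, which splits back into (high=2, low=0), a different pair from the one the client sent.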
Supervisor:
W2.15 WaitForInitialBindAttemptAsync now backed by TaskCompletionSource
instead of 10ms busy-poll. Resolves race against fast
Stopped→Bound→Stopped transitions and hangs when the supervisor task throws.
W2.16 StartAsync refuses re-entry on a non-Stopped supervisor (was leaking
the previous _supervisorCts).
W2.17 New TransitionTo helper writes _state, _lastBindError, and (optionally)
_recoveryAttempts under one lock. Snapshot() reads under the same lock
so the status page never reports an inconsistent triple. Truncate
helper extracted (was copy-pasted across three sites).
W2.18 MbproxyOptionsValidator + ReloadValidator reject
Connection.{BackendConnectTimeoutMs, BackendRequestTimeoutMs, GracefulShutdownTimeoutMs}
<= 0. Misconfigured 0 produces immediate CancelAfter(0) failures.
Hosting / diagnostics:
W2.20 ProxyWorker.StopAsync supervisor-stop deadline now reads from
IOptionsMonitor.CurrentValue.Connection.GracefulShutdownTimeoutMs
(was hard-coded 5s).
W2.21 src/Mbproxy/appsettings.json deleted; the published file is now a Link
to install/mbproxy.config.template.json so the binary ships with a
usable, fully-commented example config instead of an empty stub. Tests
strip the inherited file from their bin via an AfterTargets="Build"
Target so they don't pick up the template's example PLCs.
W2.22 invalidBcdWarnings (PlcPdusStatus) and codeOther (ExceptionCounts)
added to StatusDto, plumbed through StatusSnapshotBuilder, surfaced
in StatusHtmlRenderer table cells.
W2.23 EventLogBridge caches EventLog.SourceExists at construction so Emit
doesn't hit the registry on every Error+ log line.
New regression tests:
ReloadValidatorTests:
Validate_PerTagCacheTtl_Above60s_Without_AllowLongTtl_Fails
Validate_PerTagCacheTtl_Above60s_With_AllowLongTtl_Passes
Validate_ResolvedTtl_FromPerPlcDefault_AboveCap_Fails
Validate_ZeroBackendConnectTimeoutMs_Fails
Validate_NegativeGracefulShutdownTimeoutMs_Fails
BcdPduPipelineTests:
FC16_32Bit_ClientHighOrLowAbove9999_PassesThroughRaw_WithInvalidBcdWarning
FC16_TruncatedRegisterData_PassesThroughRaw_NoPartialRewrite
Reworked tests in BcdTagMapBuilderTests for the W2.11 contract (Global dup,
Add dup, Add-overrides-Global accepted as width override).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1db900edef |
mbproxy: add opt-in response cache (Phase 11)
Layers a per-PLC, per-tag response cache on top of Phase 10's coalescing.
Cache is OFF by default per tag (CacheTtlMs = 0); a fresh deployment with no
TTL config behaves identically to Phase 10. Operators opt tags in by setting
CacheTtlMs > 0 on a BcdTagOptions entry (or DefaultCacheTtlMs > 0 on a
PlcOptions entry), explicitly acknowledging the staleness window.
Cache lookup order: cache -> coalesce -> backend. A cache hit short-circuits
both Phase 10's coalescing path and Phase 9's backend send. Cache stores
POST-rewriter PDU bytes so hits never re-invoke the BCD rewriter. FC06/FC16
write responses invalidate every cached entry whose address range overlaps
the write (half-open interval math).
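The half-open interval test behind write invalidation can be sketched as follows. This is a minimal Python illustration, not the CacheInvalidator code: the function name `ranges_overlap` is hypothetical, and treating a zero-quantity range as overlapping nothing is an assumption.

```python
def ranges_overlap(start_a: int, qty_a: int, start_b: int, qty_b: int) -> bool:
    """Return True when [start_a, start_a + qty_a) and [start_b, start_b + qty_b) intersect.

    Both ranges are half-open: the end address is exclusive, so two ranges
    that merely touch (one ends where the other begins) do not overlap.
    """
    if qty_a <= 0 or qty_b <= 0:
        # Assumption: an empty range overlaps nothing.
        return False
    return start_a < start_b + qty_b and start_b < start_a + qty_a
```

With a cached read of registers 100..103 (start 100, quantity 4), a write to register 102 overlaps and would invalidate the entry, while a write starting at 104 is adjacent and would leave it in place.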
New types (Mbproxy.Proxy.Cache, all internal):
- CacheKey (record-struct, same shape as CoalescingKey but kept SEPARATE so
the two phases evolve independently).
- CacheEntry, ResponseCache (IDisposable; LRU + PeriodicTimer eviction
loop), CacheInvalidator (pure overlap matcher), CacheLogEvents (stable
mbproxy.cache.* names).
Multi-tag range TTL = min(TTLs); any tag with TTL = 0 in the range disables
caching for the whole read (conservative-by-design).
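As a small Python sketch of the rule above (illustrative only; `effective_ttl_ms` is a hypothetical name, and returning 0 for an empty tag list is an assumption):

```python
from typing import Iterable


def effective_ttl_ms(tag_ttls_ms: Iterable[int]) -> int:
    """Return the TTL for a read spanning several tags; 0 means "do not cache".

    The read takes the smallest TTL of the tags it covers, and a single
    uncached tag (TTL = 0) disables caching for the whole read.
    """
    ttls = list(tag_ttls_ms)
    if not ttls or any(ttl <= 0 for ttl in ttls):
        return 0
    return min(ttls)
```

A read covering tags with TTLs of 1000, 250 and 500 ms would be cached for 250 ms; adding one tag with TTL 0 to the same read would make it uncacheable.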
Options surface:
- BcdTagOptions.CacheTtlMs (nullable int; null = fall through to PLC default)
- PlcOptions.DefaultCacheTtlMs
- MbproxyOptions.Cache.{AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs}
- TTL > 60_000 ms requires Cache.AllowLongTtl = true (reload validation).
Admin counters (Tier 1.8 + Tier 2 cache-memory KPIs from docs/kpi.md):
- CacheHitCount, CacheMissCount, CacheInvalidations on ProxyCounters.
- CacheEntryCount, CacheBytes via a new ICacheStatsProvider snapshot path.
- /status.json and the HTML page surface a new Cache cell per PLC row.
Hot-reload: any tag-list change to a PLC reseats the per-PLC context with a
fresh cache; the old cache is disposed inside ReplaceContextAsync. Per-tag
flush granularity is intentionally not implemented in v1.
PLCs with no cache-eligible tags (every resolved tag has CacheTtlMs = 0)
get Cache = null on the context and skip the eviction timer entirely, so
the no-cache path is byte-identical to Phase 10.
Tests (32 new unit + 5 new E2E = 37 new; suite now 314 unit + 48 E2E):
- CacheKeyTests, CacheEntryTests (records + boundary semantics).
- CacheInvalidatorTests: full overlap, both partials, adjacent-not-overlapping,
disjoint, different unit ID + auxiliary FC-filter / zero-qty.
- ResponseCacheTests: round-trip, lazy expiry, range invalidation,
unit-id filter, LRU bound, LRU access tracking, concurrent get/set,
dispose, clear, approximate-bytes accounting.
- ResponseCacheMultiplexerTests (stub-backend): hit short-circuits
coalescing, BCD-decoded bytes are cached not raw, FC06 invalidates
overlapping, non-overlapping write does not invalidate, multi-tag
TTL=min rule, regression-cache-disabled-by-default-is-Phase-10, hit
works even when backend unreachable.
- ResponseCacheE2ETests (pymodbus DL205 sim, sequential reads):
* Headline: 10 reads with TTL=1000 ms -> 9 hits, 1 miss, 1 backend trip.
* TTL expiry path with sleep > TTL.
* Write invalidation through the proxy on a scratch register.
* BCD-decoded bytes are cached, not raw BCD nibbles.
* Regression: Cache disabled by default -> behaviour byte-identical to
Phase 10.
Pre-existing flake hardened: BackendDisconnect_CascadesToAllUpstreams now
polls briefly for the cascade counter to absorb the inherent scheduling
gap between "upstream EOF observed" and "counter incremented inside
TearDownBackendAsync." Counter semantics unchanged.
Phase doc updated with implementation clarifications discovered during
this work (CacheKey kept separate from CoalescingKey, LastUsedTick is
long, FC06/FC16 startAddr/qty parsing extension, cache-pre-connect
short-circuit, write-invalidation only on successful responses).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
56eee3c563 |
mbproxy: initial commit through Phase 9 (TxId multiplexing)
Adds the mbproxy service end-to-end. Phases 00-08 implement the production-ready single-listener / 1:1-backend transparent Modbus TCP proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260 fleet. Phase 9 replaces the connection layer with a single backend socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's 4-concurrent-client cap as an operational ceiling.
Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing for Phase 10 read coalescing; do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream on BackendRequestTimeoutMs, defending against lost responses, dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps / disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)
Tests: 263 unit + 38 E2E. Multiplexer correctness under truly concurrent backend traffic is proved against a stub backend in PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus 3.13's single-PDU framer stays in known-good mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |