wwtools/mbproxy/plans/2026-05-15-multiplatform.md
Joseph Doherty b330faff03 mbproxy: cross-platform support — Linux/systemd alongside Windows
Make the service build, run, and install on Linux as a first-class
target while keeping the Windows Service + Event Log behaviour intact.

- Build: drop the hardcoded win-x64 RID — single-file publish now works
  for any RID. publish.ps1 gains -Rid; new publish.sh for Linux hosts.
- Diagnostics: DiagnosticSinkSelector picks the Error+ sink per host —
  Windows Event Log under the SCM, local syslog under systemd
  (Serilog.Sinks.SyslogMessages), none for interactive runs. The
  EventLog truncation helper is extracted so it is testable cross-OS.
- Host: Program.cs registers AddSystemd() alongside AddWindowsService().
- Config: a RID-conditioned appsettings template ships Windows or Unix
  paths; both templates are schema-validated by a test.
- Install: systemd unit (Type=exec) plus install.sh / uninstall.sh.
  Also fixes two cross-platform bugs found while testing: install.ps1
  and uninstall.ps1 used New-EventLog / Remove-EventLog (absent in
  PowerShell 7), and the E2E sim launcher hardcoded Windows venv paths.
- Docs updated across README, CLAUDE.md, and docs/ for dual-platform.

413 tests pass on Windows; 374 (all non-simulator) on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:41:59 -04:00

# mbproxy Multiplatform Implementation Plan
**Created:** 2026-05-15
**Status:** All six phases implemented. 413 tests green on Windows; Windows Service and
Linux systemd install E2E both green. Two findings (pymodbus-sim-on-Linux, `AddSystemd()`
notify) logged as orthogonal follow-ups. Working tree only — nothing committed.
**Working artifact** — not part of the `docs/` source-of-truth tree (per `../DOCS-GUIDE.md`).
Delete or archive once the work lands.
### Progress log
- **2026-05-15 — Phase 1 done, Gate 1 green.** RID removed from `csproj`
(single-file settings now gated on `'$(RuntimeIdentifier)' != ''`);
`publish.ps1` gained `-Rid`; `publish.sh` added. `dotnet build -c Debug` 0
warnings; `dotnet test` **398 passed / 0 failed** (baseline 325 → 398, the
Keepalive feature added tests); `win-x64` → `Mbproxy.exe` 100.1 MB,
`linux-x64` → `Mbproxy` ELF 97.2 MB. ELF launch-smoked on `10.100.0.35`:
full startup, listeners bound, `mbproxy.startup.ready` + admin endpoint up,
no errors. Box prep done (.NET SDK 10.0.300, shellcheck 0.10.0 installed).
- **2026-05-15 — Phases 2 + 3 code done (combined integrator pass).** Packages
added: `Microsoft.Extensions.Hosting.Systemd` 10.0.8,
`Serilog.Sinks.SyslogMessages` 4.1.0 (the maintained IonxSolutions package —
the bare `Serilog.Sinks.Syslog` ID is a near-abandoned 0.2.0 package; same
approved intent). New `DiagnosticSink` enum + `DiagnosticSinkSelector` (pure);
new `SyslogBridge`; `EventLogBridge` truncation extracted to a non-annotated
`EventLogMessage` type (testable cross-OS). `AddMbproxySerilog` now selects
the sink internally; `Program.cs` calls `AddSystemd()` + `AddWindowsService()`.
13 new tests. **411 passed / 0 failed on Windows**; on `10.100.0.35`
**372 passed / 39 skipped / 0 failed** — all 39 skips are simulator-backed
E2E (see finding below), every host/diagnostic/smoke test green on Linux.
- **2026-05-15 — Two cross-platform bugs found and fixed in install tooling.**
(1) `tests/sim/run-dl205-sim.ps1` was Windows-only — hardcoded venv paths
`Scripts\*.exe`; now branches `Scripts`/`.exe` vs `bin`/no suffix on `$IsWindows`
and adds `python3` to the interpreter candidates. (2) `install.ps1` /
`uninstall.ps1` used `New-EventLog` / `Remove-EventLog`, which exist only in
Windows PowerShell 5.1 — they fail under PowerShell 7+. Switched to the .NET
API (`[EventLog]::CreateEventSource` / `DeleteEventSource`), symmetric with
the `SourceExists` calls already in those scripts.
- **2026-05-15 — Windows Service E2E green (local, admin).** Republished
`win-x64`; `install.ps1 -Start` installs + starts the service; verified
Running/Automatic, `status.json` served, listeners bound,
`mbproxy.startup.ready` logged, Event Log source registered,
`WindowsServiceLifetime` wrote "Service started successfully" (proves the
process runs under the SCM). `uninstall.ps1` stopped/deleted the service,
archived logs, removed the Event Log source. Box left clean. (A forced
`EventLogBridge` Error+ write was not pursued — `Emit` is unchanged code,
covered by `EventLogMessageTests`; sink selection is covered by
`DiagnosticSinkSelectorTests`.)
- **2026-05-15 — Linux systemd E2E done.** The `linux-x64` ELF runs under a
real systemd unit on `10.100.0.35`: starts, binds listeners, serves the
admin endpoint, and `systemctl stop` → graceful SIGTERM drain
(`mbproxy.shutdown.complete` in the journal). `Type=notify` does not work
(see Findings) → Phase 5 will ship `Type=exec`. Box prep this session:
`dotnet-sdk-10.0`, `shellcheck`, `python3-venv`, pwsh 7.6.1 (dotnet global
tool), pymodbus 3.13.0 venv.
- **2026-05-15 — Phases 4-6 done.** Phase 4: new `install/mbproxy.linux.config.template.json`
(Unix log path `/var/log/mbproxy`, systemd-oriented comments); `csproj` links the
platform-correct template into the published `appsettings.json` by RID
(`win-*`/RID-less → Windows, else Unix) — verified by publishing both RIDs;
`MbproxyOptionsBindingTests` extended to load + schema-validate both templates
(now 413 tests on Windows). Phase 5: `install/mbproxy.service` (`Type=exec`,
hardened, `mbproxy` service account), `install/install.sh`, `install/uninstall.sh`
— `shellcheck` clean; install→active→`status.json` served→uninstall→clean E2E
passed on `10.100.0.35`. Phase 6: `README.md`, `mbproxy/CLAUDE.md`,
`../CLAUDE.md`, `docs/Operations/Configuration.md`, `docs/Reference/LogEvents.md`,
`docs/Operations/Troubleshooting.md`, `docs/Architecture/Overview.md`,
`docs/Features/HotReload.md` updated for the dual-platform reality.
### Findings
- **Linux full run: 374 passed / 37 failed / 0 skipped.** With the simulator
launcher fixed and pymodbus provisioned, the simulator-backed E2E tests now
*run* on Linux (0 skipped) but **37 fail** with `IOException: Broken pipe`
(`SocketException`) when the NModbus client writes through the proxy. The
failures are broad across all simulator-backed E2E (cache, forwarding,
rewriter, supervision). **Not a Phases 1-3 regression:** the multiplatform
work touches only build config, diagnostic sinks, and host registration —
none of the Modbus proxy data path. The same 37 tests pass on Windows
(411/411), and every non-E2E test — including all 13 new diagnostic tests —
passes on Linux. **Root cause isolated:** the `SimulatorSmokeTests` — which
connect *directly to the pymodbus simulator with no proxy in the path* — also
fail (TCP connect error). So the fault is the pymodbus 3.13.0 simulator
itself on this box, not mbproxy's proxy code. Likely pymodbus 3.13.0 vs
Python 3.13.5 (both very new), or the box's Docker-host networking. Treated
as a **separate investigation** (pymodbus-simulator-on-Linux), entirely
orthogonal to the multiplatform service work — see the session report.
- The `run-dl205-sim.ps1` idempotency check keys on `Test-Path $venvDir` only;
a venv left structurally broken by a killed run (no `bin/`) is not detected,
so it is never re-created. Pre-existing latent gap, not platform-specific; noted, not
fixed (out of scope; a clean run is unaffected).
- **`AddSystemd()` does not deliver `sd_notify(READY=1)` here → Phase 5 uses
`Type=exec`.** mbproxy runs correctly under systemd (starts, binds, serves,
and SIGTERM → graceful drain all work — verified in the journal), but a
`Type=notify` unit never receives `READY=1` and times out. Isolated step by
step: `SystemdHelpers.IsSystemdService()` correctly returns `True` under
systemd; a *minimal* `Host.CreateApplicationBuilder()` + `AddSystemd()` host
reproduces the failure; both a `systemd-run` transient unit and a real
`Type=notify` unit file fail identically. So it is **not an mbproxy bug** —
it is a `HostApplicationBuilder` + `Microsoft.Extensions.Hosting.Systemd`
10.0.8 (minimal-hosting) issue. **Resolution:** the Phase 5 unit uses
`Type=exec` — mbproxy is a leaf service that nothing orders against, so the
readiness signal is unnecessary; `Type=exec` + the generic host's built-in
POSIX `SIGTERM` handling (independent of `SystemdLifetime`) gives a fully
working unit with `Restart=on-failure`. `AddSystemd()` stays in `Program.cs`
(correct, documented, forward-compatible, harmless). Root-causing the .NET
notify gap is logged as a separate follow-up.
A plan to make mbproxy run on Linux (and incidentally macOS) as a first-class
target while keeping the Windows Service + Event Log behavior intact and adding
systemd + journald/syslog equivalents.
The hosting model (`Host.CreateApplicationBuilder` + `IHostedService` + Kestrel)
is already portable, so the work is narrow: generalize the build, abstract one
diagnostic sink, add one package + one call, and add Linux tooling/docs.
---
## 0. Test Environments
Both platforms can be exercised fully — no environment is simulated or
deferred.
### 0.1 Windows (the dev box — local)
The dev box runs **with administrator rights**, so every Windows gate runs
locally with no separate test machine:
- `install.ps1` (requires elevation) installs the real Windows Service.
- The Event Log source `mbproxy` can be registered and `EventLogBridge` writes
verified against the Application log.
- Install → start → stop → uninstall is a full local round-trip.
> Windows Service E2E mutates machine state (a registered service + Event Log
> source). It is **integrator-only** and the integrator always runs
> `uninstall.ps1` to leave the box clean after each gate.
### 0.2 Linux
**Host:** `dohertj2@10.100.0.35` — Debian 13 (trixie), amd64, kernel 6.12,
hostname `DOCKER`. systemd 257.
- **Access:** passwordless SSH from the Windows dev box; passwordless `sudo`
(verified 2026-05-15).
- **Reachable** on `10.100.0.35` (also `10.50.0.35`, `10.200.0.35`).
- **One-time prep** (run once before Wave 1 gates):
```
ssh dohertj2@10.100.0.35 'sudo apt-get update && \
sudo apt-get install -y dotnet-sdk-10.0 shellcheck'
```
`dotnet-sdk-10.0` candidate is `10.0.203` — matches the `net10.0` target.
- **Docker is installed** on the box (the user is in the `docker` group). Use
ephemeral Debian containers to isolate per-subagent E2E runs so parallel
Wave-4 agents don't collide on the host's systemd / ports (see section 3,
rule 8).
**How the integrator uses the box per gate:**
- Push the integration branch (or `rsync` the worktree) to the box, then run
`dotnet build` / `dotnet test` / `dotnet publish -r linux-x64` over SSH.
- Run the *actual* `linux-x64` ELF binary, the systemd unit, and `shellcheck`
here — Windows can cross-*publish* a `linux-x64` binary but cannot *run* or
service-host it.
> The box is a **shared mutable resource**. Host-level mutations (apt installs,
> `systemctl` on the real host, privileged-port binds) are integrator-only and
> run serially between waves. Subagents that need Linux E2E use throwaway
> Docker containers, never the host's init system directly.
---
## 1. Scope
**In scope**
- Linux (`linux-x64`) as a supported runtime target alongside `win-x64`.
- systemd integration with SIGTERM drain (planned as `Type=notify` with sd_notify
readiness; the shipped unit uses `Type=exec`, see Findings).
- A Linux-appropriate error-event diagnostic sink (syslog, severity-mapped).
- RID-agnostic build + dual-RID publish tooling.
- Linux install tooling (systemd unit + shell scripts).
- Docs/README/CLAUDE.md updates.
**Out of scope (state explicitly in docs)**
- macOS `launchd` integration — mbproxy will *run* on macOS as a console
process but ships no service-manager integration.
- ARM RIDs (`linux-arm64`) — the build will not *forbid* them, but they are
untested.
- Container/Docker packaging — separate future effort.
**Locked design decisions**
- Reference `Microsoft.Extensions.Hosting.WindowsServices` *and*
`Microsoft.Extensions.Hosting.Systemd` unconditionally; both packages are
portable and both helpers self-detect their host. No conditional
`<PackageReference>`.
- All Windows API calls (`System.Diagnostics.EventLog`) stay behind
`OperatingSystem.IsWindows()` + `[SupportedOSPlatform("windows")]`; CA1416
(already enforced via `TreatWarningsAsErrors`) is the safety net.
- Diagnostic sink selection happens **once**, at the composition root
(`AddMbproxySerilog`). No OS branching anywhere else.
- Prefer **new files** over editing shared files, to keep parallel work
conflict-free.
- **Linux error-event sink: `Serilog.Sinks.Syslog`** (decided 2026-05-15; implemented
with the maintained `Serilog.Sinks.SyslogMessages` package, see Progress log).
Error+ events get RFC 5424 severity mapping on Linux, mirroring the Windows
Event Log behavior where Error+ is surfaced distinctly.
`DiagnosticSinkSelector` returns `EventLog | Syslog | None`.
---
## 2. Phase Breakdown
Each phase lists its **owned file set** (the parallel-safety contract),
changes, tests, and a **gate** that must be green before the next phase starts.
### Phase 1 — Build & publish generalization (foundation)
**Objective:** Remove the hardcoded RID so the project builds/publishes for any
runtime; keep the Windows output byte-identical.
**Owned files**
- `src/Mbproxy/Mbproxy.csproj`
- `install/publish.ps1`
- `install/publish.sh` *(new)*
**Changes**
- `Mbproxy.csproj`: delete `<RuntimeIdentifier>win-x64</RuntimeIdentifier>`
from the Release `PropertyGroup`; keep `PublishSingleFile` / `SelfContained`
/ `IncludeNativeLibrariesForSelfExtract`. RID becomes a publish-time `-r`
argument.
- `publish.ps1`: add a `-Rid` parameter (default `win-x64`), keep the
two-flavor logic.
- `publish.sh`: Linux counterpart producing `linux-x64` self-contained +
framework-dependent builds.
- (The RID-conditioned `appsettings.json` content item is Phase 4; in Phase 1
just confirm the build works without a baked RID.)
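A minimal sketch of what a `publish.sh` in this spirit could look like. The output layout (`out/<rid>-<flavor>`), the function name `publish_args`, and the flavor names `self-contained` / `framework-dependent` are assumptions for illustration, not the script that shipped. The function only prints the `dotnet publish` command line, so it can be inspected on a machine without the SDK.

```shell
#!/usr/bin/env bash
# Sketch only: prints the dotnet publish command for one RID + flavor.
# PROJECT matches the path in this plan; OUT_ROOT and the flavor names are hypothetical.
PROJECT="src/Mbproxy/Mbproxy.csproj"
OUT_ROOT="out"

publish_args() {
    local rid="${1:-linux-x64}"          # the RID is now a publish-time argument
    local flavor="${2:-self-contained}"  # or: framework-dependent
    local self_contained
    case "$flavor" in
        self-contained)      self_contained=true ;;
        framework-dependent) self_contained=false ;;
        *) echo "unknown flavor: $flavor" >&2; return 2 ;;
    esac
    echo "dotnet publish $PROJECT -c Release -r $rid" \
         "--self-contained $self_contained -o $OUT_ROOT/$rid-$flavor"
}
```

Calling `publish_args linux-x64 self-contained` and `publish_args linux-x64 framework-dependent` prints the two command lines for the Linux RID; a real script would execute them instead of echoing.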
**Tests**
- No xunit tests (build-config change). Gate is publish success on both RIDs.
**Gate 1**
- `dotnet build -c Debug` green; `dotnet test` full suite green (unchanged
count).
- `dotnet publish -c Release -r win-x64` produces a single-file `Mbproxy.exe`
(same size class as before).
- `dotnet publish -c Release -r linux-x64` produces a single-file `Mbproxy`
ELF binary. Cross-published from the Windows dev box; the ELF is then copied
to `10.100.0.35` and confirmed to launch (`./Mbproxy --version`-class smoke).
- Zero new analyzer warnings.
---
### Phase 2 — Diagnostic sink abstraction
**Objective:** Make error-event delivery a platform-selected sink. Windows
keeps `EventLogBridge`; Linux gets a syslog sink.
**Owned files**
- `src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs` *(new — pure selection
logic)*
- `src/Mbproxy/Diagnostics/SyslogBridge.cs` *(new)*
- `src/Mbproxy/Diagnostics/EventLogBridge.cs` *(minor: extract the 32 KB
truncation helper into a testable static method)*
- `src/Mbproxy/HostingExtensions.cs` *(only `AddMbproxySerilog`)*
- `src/Mbproxy/Mbproxy.csproj` *(add `Serilog.Sinks.Syslog` package)*
- New test files (see below)
> `HostingExtensions.cs` and `Mbproxy.csproj` are also touched by Phase 3.
> **Phases 2 and 3 must not run in parallel** (see section 3). They are
> sequential.
**Changes**
- `DiagnosticSinkSelector` — a pure function taking
`(bool isWindows, bool isWindowsService, bool isSystemd)` and returning an
enum (`EventLog | Syslog | None`). No I/O, fully unit-testable.
- `SyslogBridge`: Serilog `ILogEventSink` wrapping `Serilog.Sinks.Syslog`,
active for Error+ only, mirroring `EventLogBridge`'s contract (silent no-op
if syslog unavailable).
- `AddMbproxySerilog`: replace the `addEventLogBridge` bool parameter with a
`DiagnosticSinkSelector` result; wire the chosen sink. Keep the
`OperatingSystem.IsWindows()` guard around `EventLogBridge`.
- Extract `EventLogBridge`'s message-truncation into
`internal static string TruncateToEventLogLimit(string)` so it can be tested
OS-independently.
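The selection rule can be written as a small decision table. The sketch below uses shell only to keep the examples in this plan in one language; the real `DiagnosticSinkSelector` is C#. The function name `select_sink` and the 0/1 flag encoding are assumptions, and the rule itself is inferred from the test cases listed for this phase, not copied from the shipped code.

```shell
#!/usr/bin/env bash
# Sketch of the sink decision table (hypothetical shell rendering).
# Arguments are 1/0 flags: is_windows, is_windows_service, is_systemd.
select_sink() {
    local is_windows="$1" is_windows_service="$2" is_systemd="$3"
    if [ "$is_windows" = 1 ] && [ "$is_windows_service" = 1 ]; then
        echo "EventLog"   # Windows under the SCM
    elif [ "$is_windows" != 1 ] && [ "$is_systemd" = 1 ]; then
        echo "Syslog"     # non-Windows host under systemd
    else
        echo "None"       # interactive/console runs, including macOS
    fi
}
```

Because the function takes only flags and performs no I/O, every row of the table can be checked on any OS, which is the property the plan wants from the C# version.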
**Tests** (`tests/Mbproxy.Tests/Diagnostics/`)
- `DiagnosticSinkSelectorTests` — table-driven: Windows+service→`EventLog`;
Windows console→`None`; Linux+systemd→`Syslog`; Linux console→`None`;
macOS→`None`.
- `EventLogBridgeTests` — `[Trait("Category","Unit")]`, Windows-guarded facts:
source-missing → silent no-op; truncation helper caps at 32 KB and appends
`...` (this fact runs on all OSes since the helper is pure).
- `SyslogBridgeTests` — Error+ filter; no-throw when transport unavailable.
**Gate 2**
- Full test suite green on Windows (local); full suite green on Linux —
integrator runs `dotnet test` over SSH on `10.100.0.35`.
- `EventLogBridge` emits to the Application log — verified locally via a real
Windows Service install (`install.ps1`, admin rights available), then
`uninstall.ps1` to clean up.
- CA1416: zero warnings.
---
### Phase 3 — Service host integration (systemd)
**Objective:** Register both init-system integrations; the host correctly
reports readiness to whichever launched it.
**Owned files**
- `src/Mbproxy/Program.cs`
- `src/Mbproxy/HostingExtensions.cs` *(call-site update only)*
- `src/Mbproxy/Mbproxy.csproj` *(add `Microsoft.Extensions.Hosting.Systemd`)*
**Changes**
- `csproj`: add
`<PackageReference Include="Microsoft.Extensions.Hosting.Systemd" />` (pin to
the 10.0.x line matching the existing Windows-services package).
- `Program.cs`: call `builder.Services.AddSystemd();` alongside
`AddWindowsService();`. Compute `isSystemd` via
`SystemdHelpers.IsSystemdService()` and feed `DiagnosticSinkSelector`
together with `isWindowsService`.
- Confirm SIGTERM → host shutdown → existing
`Connection.GracefulShutdownTimeoutMs` drain path works (it does — POSIX
signal handling is built into the generic host; just verify).
**Tests** (`tests/Mbproxy.Tests/HostSmokeTests.cs` — extend existing file)
- `HostSmoke_RegistersBothServiceIntegrations_StartsAndStops` — builds the host
with both `AddWindowsService` + `AddSystemd`, asserts no throw, asserts
`mbproxy.startup.ready` still logged.
- Existing two smoke tests must remain green.
**Gate 3**
- Full suite green on Windows (local) and Linux (`10.100.0.35` via SSH).
- Windows Service E2E, run locally with admin rights: `install.ps1` → service
starts, logs `mbproxy.startup.ready` + writes to Event Log, `Stop-Service`
drains cleanly, `uninstall.ps1` removes it. **No regression** in Windows
behavior is the hard requirement of this gate.
- Linux systemd E2E on `10.100.0.35` — **done.** The `linux-x64` binary runs
under a real systemd unit: it starts, binds listeners, serves the admin
endpoint, and `systemctl stop` (SIGTERM) drains gracefully
(`mbproxy.shutdown.complete` in the journal). `Type=notify` was found not to
deliver `READY=1` (Findings) → the Phase 5 unit uses `Type=exec`, under which
the service is fully functional.
---
### Phase 4 — Config & filesystem portability
**Objective:** No Windows-only paths in the shipped/installed config.
**Owned files**
- `install/mbproxy.config.template.json` *(Windows — keep `C:\ProgramData\...`
path)*
- `install/mbproxy.linux.config.template.json` *(new — `/var/log/mbproxy/...`,
Linux syslog `Using` entry)*
- `src/Mbproxy/Mbproxy.csproj` *(condition the linked `appsettings.json`
content item by `$(RuntimeIdentifier)`)*
> Touches `csproj`. Must run after Phase 3's csproj edit is merged (sequential
> w.r.t. csproj), but is otherwise independent of Phase 5/6.
**Changes**
- New Linux template: log path `/var/log/mbproxy/mbproxy-.log`; Serilog
`Using` array includes the syslog sink; comment header points at
`/etc/mbproxy/appsettings.json`.
- `csproj`: link the win template for `win-*` RIDs and the linux template for
`linux-*` RIDs into the published `appsettings.json` (RID-conditioned
`<Content>` items).
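The RID-to-template rule is small enough to state as a lookup. A sketch, assuming the as-implemented mapping recorded in the progress log (`win-*` or no RID selects the Windows template, anything else the Unix one); the function name `template_for_rid` is hypothetical, and the real selection lives in MSBuild conditions in the `csproj`, not in shell.

```shell
#!/usr/bin/env bash
# Sketch: which install/ template becomes the published appsettings.json.
# Mirrors the rule from the progress log; the function itself is illustrative.
template_for_rid() {
    local rid="${1:-}"
    case "$rid" in
        ""|win-*) echo "install/mbproxy.config.template.json" ;;
        *)        echo "install/mbproxy.linux.config.template.json" ;;
    esac
}
```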
**Tests** (`tests/Mbproxy.Tests/Options/`)
- Extend `MbproxyOptionsBindingTests`: load **each** shipped template through
the config binder + `MbproxyOptionsValidator`; assert both bind and validate
cleanly. Catches a malformed Linux template at build time.
**Gate 4**
- Both templates bind + validate (new test green).
- `dotnet publish -r linux-x64` ships the Linux template as `appsettings.json`;
`-r win-x64` ships the Windows one. Verify by inspecting publish output.
---
### Phase 5 — Linux install tooling
**Objective:** Parity with `install.ps1` for systemd hosts.
**Owned files** (all new, fully disjoint from all other phases)
- `install/mbproxy.service` — systemd unit, **`Type=exec`** (not `Type=notify` —
see Findings: `AddSystemd()` does not deliver `READY=1` for the minimal
hosting model), `Restart=on-failure`, `User=mbproxy`, `ExecStart` pointing at
the installed binary; sets `DOTNET_BUNDLE_EXTRACT_BASE_DIR`.
- `install/install.sh` — creates `mbproxy` service account, lays down binary +
`/etc/mbproxy/appsettings.json` (preserve-if-exists, matching `install.ps1`
semantics), creates `/var/log/mbproxy`, installs + `systemctl enable --now`.
- `install/uninstall.sh` — `systemctl disable --now`, archives logs (mirror the
`.archived-<ts>` convention), removes unit.
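A sketch of the two pieces of `install.sh` that carry the most behavior: writing the unit and the preserve-if-exists config copy. The install paths (`/opt/mbproxy/Mbproxy`, `/var/lib/mbproxy/bundle`), the `After=` and `WantedBy=` lines, and the function names are assumptions for illustration; `Type=exec`, `User=mbproxy`, `Restart=on-failure`, and `DOTNET_BUNDLE_EXTRACT_BASE_DIR` are the directives named above. Both functions take their target path as an argument so they can be tried against a scratch directory without touching `/etc`.

```shell
#!/usr/bin/env bash
# Sketch only. Paths inside the unit are hypothetical, not the shipped layout.

# Write a Type=exec unit to the given path.
write_unit() {
    local unit_path="$1"
    cat > "$unit_path" <<'EOF'
[Unit]
Description=mbproxy Modbus proxy
After=network.target

[Service]
Type=exec
User=mbproxy
ExecStart=/opt/mbproxy/Mbproxy
Restart=on-failure
Environment=DOTNET_BUNDLE_EXTRACT_BASE_DIR=/var/lib/mbproxy/bundle

[Install]
WantedBy=multi-user.target
EOF
}

# Copy the config only if the operator has not already placed one.
install_config_if_absent() {
    local src="$1" dest="$2"
    if [ -e "$dest" ]; then
        echo "kept existing $dest"
    else
        install -m 0644 "$src" "$dest"
        echo "installed $dest"
    fi
}
```

Keeping an existing `appsettings.json` means a reinstall or upgrade never overwrites operator edits, which is the behavior the plan attributes to `install.ps1`.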
**Tests**
- Not xunit. Gate = `shellcheck` clean + a dry-run inside a throwaway Debian
container on `10.100.0.35`.
**Gate 5**
- `shellcheck install/*.sh` clean — run on `10.100.0.35` (shellcheck installed
in the one-time prep).
- End-to-end on `10.100.0.35`, inside a throwaway Debian container:
`install.sh` → service active → proxy answers Modbus on a configured port →
`uninstall.sh` → service gone, logs archived. Container isolation keeps the
`mbproxy` service account / unit off the real host.
---
### Phase 6 — Documentation
**Objective:** Docs reflect dual-platform reality; doctrine in `DOCS-GUIDE.md`
respected.
**Owned files**
- `README.md` — rewrite "Hard constraints / prerequisites" (drop "No Linux or
Docker support"); add Linux install path; document both publish flavors ×
both RIDs.
- `docs/Operations/Configuration.md` — both config templates, log-path
differences, syslog vs Event Log.
- `docs/Operations/Troubleshooting.md` — `journalctl` guidance alongside Event
Viewer.
- `docs/Architecture/Overview.md` — note dual init-system hosting (only if it
shifts a headline bullet).
- `docs/Reference/LogEvents.md` — note Error+ events route to Event Log
(Windows) / syslog (Linux).
- `mbproxy/CLAUDE.md` — correct the implied Windows-only framing.
- `wwtools/CLAUDE.md` — broaden the mbproxy index row if the task→tool mapping
changed.
**Tests**
- Markdown link-check across touched files.
**Gate 6**
- All internal doc links resolve.
- README "Hard constraints" no longer contradicts the shipped tooling.
---
## 3. Parallel Subagent Execution Plan
### Dependency graph
```
Phase 1 (build) ──> Phase 2 (diagnostics) ──> Phase 3 (host) ──┬─> Phase 4 (config)
                                                               ├─> Phase 5 (install)
                                                               └─> Phase 6 (docs)
```
Phases 2 and 3 are **strictly sequential**: Phase 3 calls the new
`AddMbproxySerilog` signature Phase 2 defines, and both edit
`HostingExtensions.cs` + `csproj`. Phases 4, 5, 6 are **mutually independent**
and parallelizable once Phase 3 is merged.
### Wave plan
| Wave | Phases | Agents | Mode |
| ---- | --------- | ------------------- | ----------------------------------------------- |
| W1 | Phase 1 | 1 agent | Single — touches `csproj` |
| W2 | Phase 2 | 1 agent | Single — touches `csproj` + `HostingExtensions` |
| W3 | Phase 3 | 1 agent | Single — touches `csproj` + `HostingExtensions` + `Program.cs` |
| W4 | 4, 5, 6 | 3 agents (parallel) | Parallel — disjoint file sets |
> Phase 4 touches `csproj` but no other W4 phase does, so within W4 the file
> sets are still disjoint. Safe.
### File-ownership matrix (the parallel-safety contract)
| File | P1 | P2 | P3 | P4 | P5 | P6 |
| --------------------------------------------- | -- | -- | -- | -- | -- | -- |
| `Mbproxy.csproj` | x | x | x | x | | |
| `HostingExtensions.cs` | | x | x | | | |
| `Program.cs` | | | x | | | |
| `Diagnostics/*` (new + EventLogBridge) | | x | | | | |
| `install/publish.*` | x | | | | | |
| `install/*.config.template.json` | | | | x | | |
| `install/install.sh`, `uninstall.sh`, `.service` | | | | | x | |
| `tests/**` | | x | x | x | | |
| docs / READMEs / CLAUDE.md | | | | | | x |
No column in W4 (P4/P5/P6) shares a row. Confirmed conflict-free.
### Subagent rules (enforce in every dispatch prompt)
1. **One git worktree per subagent** — dispatch each `Agent` call with
`isolation: "worktree"`. Physical isolation means even a stray edit can't
corrupt a sibling's tree.
2. **Owned-file contract** — each subagent is told its exact owned file set
from the matrix and instructed to edit nothing outside it. A subagent that
discovers it needs an out-of-set file must stop and report, not edit.
3. **No intra-wave API coupling** — subagents in the same wave may only depend
on public APIs from *already-merged* prior waves, never on a sibling's
in-progress work. (This is why P2→P3 are separate waves, not parallel.)
4. **Tests ship with code** — the subagent that writes a phase's code also
writes that phase's tests and runs `dotnet test` green *in its own
worktree* before reporting done. No separate "test agent."
5. **Integrator merges in declared order** — the main agent merges each
worktree, runs the full build + test suite, and only then declares the
phase gate met. A failed gate blocks the next wave.
6. **High-contention files are single-agent-only** — `csproj`,
`HostingExtensions.cs`, `Program.cs`, `CLAUDE.md` are never edited by two
agents in the same wave (the matrix guarantees this).
7. **Prefer new files** — `DiagnosticSinkSelector.cs`, `SyslogBridge.cs`,
`mbproxy.linux.config.template.json`, the shell scripts, the unit file are
all new — new files can't merge-conflict, maximizing safe parallelism.
8. **Shared test hosts are integrator-only for mutations** — subagents may run
`dotnet build` / `dotnet test` (read-mostly) but must **not** install a
Windows Service, register an Event Log source, or `systemctl` against the
real `10.100.0.35` host. Service-level E2E is the integrator's job at gate
time; if a subagent needs Linux E2E it spins an ephemeral Docker container
on the box (named per-agent, `--rm`) so parallel agents never collide on
ports, the init system, or service accounts.
### Merge protocol per wave
```
for each wave:
    dispatch agent(s) with isolation: worktree + owned-file list
    on completion:
        integrator: merge worktree(s) in matrix order
        integrator: dotnet build -c Debug (must be green)
        integrator: dotnet test (green, count >= prior)
        integrator: dotnet publish -r win-x64 AND -r linux-x64 (must succeed)
        integrator: verify phase-specific gate checklist
    gate green? -> next wave. gate red? -> fix in a single-agent pass, re-gate.
```
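The build, test, and publish steps common to every gate could be collected into one script. A sketch, assuming hypothetical `run_step` / `run_gate` functions and a `DRY_RUN` switch that prints each command instead of running it; the `dotnet` invocations are the ones in the protocol above, with `-c Release` taken from Gate 1. It does not compare the test count against the prior run, and the phase-specific checklist is still verified by hand.

```shell
#!/usr/bin/env bash
# Sketch of the common gate steps. DRY_RUN=1 prints commands instead of running them.
run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stops at the first failing step so a red gate is reported immediately.
run_gate() {
    run_step dotnet build -c Debug                  || return 1
    run_step dotnet test                            || return 1
    run_step dotnet publish -c Release -r win-x64   || return 1
    run_step dotnet publish -c Release -r linux-x64 || return 1
    echo "common gate steps passed"
}
```

Setting `DRY_RUN=1` before calling `run_gate` lists the commands a gate would execute, which is a cheap way to review the sequence before running it on the shared Linux box.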
---
## 4. Cross-Cutting Test Strategy
- **Existing baseline (325 = 282 unit + 43 E2E) must never regress.** Every
gate re-runs the full suite.
- **New tests target pure logic** — `DiagnosticSinkSelector` is a pure function
precisely so platform-selection is testable without being a service. Highest-
value new test.
- **OS-conditional tests** use `[Trait]` + a runtime `OperatingSystem.IsWindows()`
skip so the suite is green on both Windows and Linux.
- **Both platforms are exercised every gate, no simulation.** Windows runs
locally (admin rights → real Windows Service install). Linux runs on
`dohertj2@10.100.0.35` (Debian 13, systemd 257) — the integrator drives
`dotnet build` / `dotnet test` / publish / systemd E2E over SSH.
- **CI** (if/when a pipeline exists): add a `linux-x64` build+test leg, ideally
pointed at the same box or an equivalent image. Until then the integrator's
per-gate SSH run on `10.100.0.35` is the Linux leg.
- **CA1416 platform analyzer** is treated as a test — `TreatWarningsAsErrors`
already fails the build if a Windows API escapes its guard.
---
## 5. Risk Register
| Risk | Phase | Mitigation |
| --------------------------------------------- | ----- | -------------------------------------------------------------------------- |
| Windows Service behavior regresses unnoticed | P3 | Gate 3 mandates a real Windows Service install/start/stop smoke check |
| `Serilog.Sinks.Syslog` version drift | P2 | Pin the version; `SyslogBridge` is isolated behind `DiagnosticSinkSelector` |
| Linux publish ships Windows config path | P4 | RID-conditioned `<Content>` item + `MbproxyOptionsBindingTests` on both templates |
| Self-extracting single-file temp-dir perms | P1/P5 | Document + set `DOTNET_BUNDLE_EXTRACT_BASE_DIR` in the systemd unit |
| Two agents racing `csproj` | all | Matrix forbids it: `csproj` edited only in single-agent waves W1-W3 + lone P4 |
| Hidden Windows path elsewhere in code | all | `Grep` sweep for `C:\\`, `ProgramData`, `\\\\` before Gate 6 |
| Parallel Wave-4 agents collide on the shared `10.100.0.35` host | W4 | Rule 8 — service-level E2E is integrator-only and serial; subagent E2E uses per-agent `--rm` Docker containers |
| Windows Service E2E leaves stale service/Event Log source | P2/P3 | Integrator always runs `uninstall.ps1` after each Windows gate |
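The hidden-Windows-path sweep from the register can be run as a fixed-string search. A sketch, assuming a hypothetical `find_windows_paths` helper; the three fragments are the ones listed in the table, written unescaped because `grep -F` matches them literally. Hits in the Windows config template and the PowerShell scripts are expected; anything under `src/` deserves a look.

```shell
#!/usr/bin/env bash
# Sketch: list lines that contain Windows-style path fragments.
# -F treats each pattern as a fixed string, so backslashes need no escaping;
# -I skips binary files, -r recurses, -n prints line numbers.
find_windows_paths() {
    local root="${1:-.}" rc=0
    grep -rnIF -e 'C:\' -e 'ProgramData' -e '\\' "$root" || rc=$?
    # grep exits 1 when nothing matched; treat that as a clean sweep.
    [ "$rc" -le 1 ]
}
```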
---
## 6. Deliverable Summary
- **3 modified source files** (`csproj`, `HostingExtensions.cs`, `Program.cs`) and
**2 new** (`DiagnosticSinkSelector.cs`, `SyslogBridge.cs`), plus the
truncation-helper extraction in `EventLogBridge.cs`.
- **2 new packages** (`Microsoft.Extensions.Hosting.Systemd`,
`Serilog.Sinks.SyslogMessages`).
- **5 new install/tooling files** (`publish.sh`, Linux config template,
`mbproxy.service`, `install.sh`, `uninstall.sh`).
- **~6-8 new tests** across 3 new/extended test files; baseline 325 preserved.
- **7 doc files** updated.
- **4 waves**, max 3 concurrent subagents, conflict-free by construction.