diff --git a/.gitignore b/.gitignore index c6c3a4a..f0b93db 100644 --- a/.gitignore +++ b/.gitignore @@ -45,6 +45,7 @@ build/ out/ tmp/ temp/ +install/ # .NET **/bin/ diff --git a/REVIEW-PROCESS.md b/REVIEW-PROCESS.md new file mode 100644 index 0000000..f53825d --- /dev/null +++ b/REVIEW-PROCESS.md @@ -0,0 +1,140 @@ +# Code Review Process + +This document describes how to perform a comprehensive, per-module code review of +the `mxaccessgw` codebase and how to track findings to resolution. + +A **module** is one buildable project under `src/` (e.g. `src/ZB.MOM.WW.MxGateway.Worker`) +or one language client under `clients/` (e.g. `clients/rust`). Each module has +its own folder under `code-reviews/` containing a single `findings.md`. + +## 1. Before you start + +1. Pick the module to review. Its folder is `code-reviews//`: + - For a `src/` project, `` is the project name with the `ZB.MOM.WW.MxGateway.` + prefix stripped — `src/ZB.MOM.WW.MxGateway.Server` is reviewed in `code-reviews/Server/`. + - For a language client, `` is `Client.` — `clients/rust` is + reviewed in `code-reviews/Client.Rust/`. +2. Identify the design context for the module: + - `gateway.md` — top-level architecture, command/event surface, IPC envelope, + STA thread model, fault handling. + - The relevant component design docs under `docs/` (e.g. + `docs/MxAccessWorkerInstanceDesign.md`, `docs/GatewayProcessDesign.md`, + `docs/Sessions.md`, `docs/Authentication.md`, `docs/GalaxyRepository.md`). + - `docs/DesignDecisions.md` for the v1 design choices. + - The **Repository-Specific Conventions** and **Process / Platform Notes** in + `CLAUDE.md`. +3. Record the exact commit being reviewed: `git rev-parse --short HEAD`. Every + review is a snapshot — a finding only means something relative to a known + commit. +4. Open `code-reviews//findings.md` and fill in the header table + (reviewer, date, commit SHA, status). + +## 2. Review checklist + +Work through **every** category below for the module. A comprehensive review +means the checklist is completed even where it produces no findings — record +"No issues found" for a category rather than leaving it ambiguous. + +1. **Correctness & logic bugs** — off-by-one, null handling, incorrect + conditionals, misuse of APIs, broken edge cases. +2. **mxaccessgw conventions** — the rules in `CLAUDE.md` and the style guides + under `docs/style-guides/`: the gateway never instantiates MXAccess COM + directly; all MXAccess COM calls run on the worker's dedicated STA thread and + the STA loop pumps Windows messages; IPC uses one bidirectional named pipe per + worker carrying length-prefixed `WorkerEnvelope` protobuf frames; MXAccess + parity is the contract (don't "fix" surprising MXAccess behaviour, never + synthesize events); one worker and one event subscriber per session; the + gateway terminates orphan workers on startup and does not reattach; C# style + (file-scoped namespaces, `sealed` by default, `Async` suffix, MXAccess-aligned + names); no Blazor UI component libraries; no logging of secrets or full tag + values; generated code is never hand-edited. +3. **Concurrency & thread safety** — shared mutable state, STA affinity, race + conditions, correct use of `async`/`await`, locking, disposal races. +4. **Error handling & resilience** — exception paths, worker crash / reconnect + handling, fail-fast event backpressure, transient vs permanent error + classification, graceful degradation, correct gRPC status codes. +5. **Security** — authentication/authorization checks, API-key scope enforcement, + input validation, SQL injection in the Galaxy Repository RPCs, secret + handling, the dashboard anonymous-localhost bypass, logging of sensitive data. +6. **Performance & resource management** — `IDisposable` disposal, pipe / stream + / COM lifetimes, buffering and back-pressure, unnecessary allocations on hot + paths, N+1 queries. +7. **Design-document adherence** — does the code match `gateway.md`, the relevant + `docs/` component designs, `docs/DesignDecisions.md`, and `CLAUDE.md`? Flag + both code that drifts from the design and design docs that are now stale. +8. **Code organization & conventions** — namespace hierarchy, project layout, the + Options pattern, separation of concerns, additive-only contract evolution. +9. **Testing coverage** — are the module's behaviours covered by tests + (`src/ZB.MOM.WW.MxGateway.Tests`, `src/ZB.MOM.WW.MxGateway.Worker.Tests`, + `src/ZB.MOM.WW.MxGateway.IntegrationTests`)? Note untested critical paths and missing + edge-case tests. +10. **Documentation & comments** — XML doc accuracy, misleading or stale comments, + undocumented non-obvious behaviour. + +## 3. Recording findings + +Add one entry per finding to the `## Findings` section of the module's +`findings.md`, using the entry format in +[`_template/findings.md`](code-reviews/_template/findings.md). + +- **Finding ID** — `-NNN`, numbered sequentially within the module and + never reused (e.g. `Worker-001`). IDs are permanent even after resolution. +- **Severity:** + - **Critical** — data loss, security breach, crash/deadlock, or outage. + - **High** — incorrect behaviour with significant impact; no safe workaround. + - **Medium** — incorrect or risky behaviour with limited impact or a workaround. + - **Low** — minor issues, style, maintainability, documentation. +- **Category** — one of the 10 checklist categories above. +- **Location** — `file:line` (clickable), or a list of locations. +- **Description** — what is wrong and why it matters. +- **Recommendation** — concrete suggested fix. + +After recording findings, update the module header table (status, open-finding +count) and regenerate the base README (step 5). + +## 4. Marking an item resolved + +Findings are **never deleted** — they are an audit trail. To close one, change +its **Status** and complete the **Resolution** field: + +- `Open` — newly recorded, not yet addressed. +- `In Progress` — a fix is actively being worked on. +- `Resolved` — fixed. The Resolution field must state the fixing commit SHA, the + date, and a one-line description of the fix. +- `Won't Fix` — intentionally not fixed. The Resolution field must justify why. +- `Deferred` — valid but postponed. The Resolution field must say what it is + waiting on (e.g. a tracked issue or a later milestone). + +`Resolved`, `Won't Fix`, and `Deferred` findings are all considered **closed**. +`Open` and `In Progress` are **pending** and appear in the base README's Pending +Findings table. + +## 5. Updating the base README + +`code-reviews/README.md` holds the single cross-module view (the Module Status +table and the Pending / Closed Findings tables). It is **generated** from the +per-module `findings.md` files — do not edit it by hand. + +After any review or status change, regenerate it: + +``` +python code-reviews/regen-readme.py +``` + +`regen-readme.py --check` exits non-zero if `README.md` is stale, if a module +header's `Open findings` count disagrees with its finding statuses, or if a +finding carries an unrecognised Status value. The PowerShell wrapper +`scripts/check-code-reviews-readme.ps1` runs that check and is the intended hook +for CI or a pre-commit step. + +> The repo's installed `python` is the real interpreter; the bare `python3` +> alias resolves to the Windows Store stub and fails. Use `python`. + +The per-module `findings.md` files are the source of truth; `README.md` is the +aggregated index and must always agree with them — which the script guarantees. + +## 6. Re-reviewing a module + +Re-reviews append to the same `findings.md`. Update the header to the new commit +and date, continue the finding numbering from the last used ID, and leave prior +findings (including closed ones) in place as history. diff --git a/glauth.md b/glauth.md new file mode 100644 index 0000000..9ade93a --- /dev/null +++ b/glauth.md @@ -0,0 +1,280 @@ +# GLAuth — LDAP authn reference for mxaccessgw + +GLAuth is a lightweight LDAP server installed on this dev box at +`C:\publish\glauth\` and run as a Windows service via NSSM. It already +backs the LmxOpcUa OPC UA server's UserName-token authn and the LmxOpcUa +Admin UI's cookie login; this doc captures everything mxaccessgw needs +to consume the same directory so a single set of dev credentials covers +both stacks. + +The authoritative copy of LmxOpcUa's reference lives at +`C:\publish\glauth\auth.md`. This doc is a redistilled view tailored to +mxaccessgw — what users + groups are already provisioned, how to bind +against them, and what's needed to add a gw-specific role. + +## Connection details + +| Setting | Value | +|---|---| +| Protocol | LDAP (unencrypted) | +| Host | `localhost` | +| Port | `3893` | +| LDAPS | disabled in dev (set `[ldaps]` block to enable) | +| Base DN | `dc=lmxopcua,dc=local` | +| Bind DN format | `cn={username},dc=lmxopcua,dc=local` | +| Group OU | `ou=,ou=groups,dc=lmxopcua,dc=local` | +| Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) | + +## Pre-existing groups (LmxOpcUa role taxonomy) + +These map cleanly onto MxAccess capability boundaries — mxaccessgw +should reuse them rather than define parallel groups so an operator with +LmxOpcUa write rights doesn't need a second account for the gw. + +| Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping | +|---|---|---|---|---| +| ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) | +| WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=lmxopcua,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) | +| WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=lmxopcua,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) | +| WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=lmxopcua,dc=local` | Write Configure attrs | `WriteSecured` (Configure) | +| AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=lmxopcua,dc=local` | Acknowledge alarms | gw alarm-ack RPC, when added | + +**A user can be in multiple groups** — `othergroups = [...]` in the +config is a list. `admin` is the canonical example (in every role +group below). + +## Pre-provisioned users + +| Username | Password | UID | Primary group | Other groups | Capabilities | +|---|---|---|---|---|---| +| `readonly` | `readonly123` | 5001 | ReadOnly | — | Browse, read | +| `writeop` | `writeop123` | 5002 | WriteOperate | — | + plain Write | +| `writetune` | `writetune123` | 5005 | WriteTune | — | + WriteSecured (Tune) | +| `writeconfig` | `writeconfig123` | 5006 | WriteConfigure | — | + WriteSecured (Configure) | +| `alarmack` | `alarmack123` | 5003 | AlarmAck | — | Alarm acknowledgment | +| `admin` | `admin123` | 5004 | ReadOnly | WriteOperate, AlarmAck, WriteTune, WriteConfigure | All roles | +| `serviceaccount` | `serviceaccount123` | 5999 | ReadOnly | — | LDAP search capability (for bind-then-search) | + +For mxaccessgw dev, `admin` covers every gw-side capability test; +`readonly` is the right "negative" case for proving Browse-OK / +Write-denied. + +The gateway dashboard adds one role beyond this LmxOpcUa taxonomy: +`GwAdmin`. `LdapOptions.RequiredGroup` defaults to `GwAdmin`, so the +dashboard login and `DashboardLdapLiveTests` require `admin` to be a +member of a `GwAdmin` group. `GwAdmin` is **not** in the baseline +GLAuth config — it must be provisioned before dashboard authn or the +LDAP live tests work. See [Provisioning the GwAdmin +group](#provisioning-the-gwadmin-group) below. + +## Two bind patterns + +### 1. Direct bind (simplest) + +``` +DN: cn=admin,dc=lmxopcua,dc=local +Password: admin123 +``` + +Construct the DN from the username; bind. Works on GLAuth because +`backend.nameformat = "cn"` and `groupformat = "ou"` are set in the +config. **Doesn't translate to Active Directory** — AD users are keyed +by `sAMAccountName`, not `cn`. Use this only for dev convenience. + +### 2. Bind-then-search (production-grade) + +``` +1. Bind as the service account (cn=serviceaccount,dc=lmxopcua,dc=local + / serviceaccount123). +2. Search under dc=lmxopcua,dc=local with filter + (uid=) — or any attribute the deployment + identifies users by. GLAuth populates uid + cn. +3. Read the returned entry's DN + memberOf list (groups). +4. Bind again as the discovered DN with the entered password. If that + succeeds, authn passes; the memberOf values become the role set. +``` + +The second bind is the actual password check — the search is just a DN +discovery. This is the AD-friendly path: AD's +`tokenGroups` / `LDAP_MATCHING_RULE_IN_CHAIN` flatten nested groups, but +that's an enhancement, not required for first-pass dev. + +LmxOpcUa's `Server/Security/LdapUserAuthenticator.cs` ships a working +implementation of this pattern using `Novell.Directory.Ldap.NETStandard` +v3.6.0 — copy the bind-then-search loop from there if mxaccessgw wants +to avoid re-deriving the LDAP escape-string handling. + +## Suggested mxgw configuration shape + +A YAML/JSON section for mxaccessgw that mirrors LmxOpcUa's `LdapOptions` +record: + +```yaml +ldap: + enabled: true + server: localhost + port: 3893 + useTls: false + allowInsecureLdap: true # dev only + searchBase: "dc=lmxopcua,dc=local" + serviceAccountDn: "cn=serviceaccount,dc=lmxopcua,dc=local" + serviceAccountPassword: "serviceaccount123" + userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName + displayNameAttribute: "cn" + groupAttribute: "memberOf" + groupToRole: + ReadOnly: "Browse" + WriteOperate: "Write" + WriteTune: "WriteSecured" + WriteConfigure: "WriteSecured" + AlarmAck: "AlarmAck" +``` + +`groupAttribute` returns full DNs like +`ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` — the authenticator +should strip the leading `ou=` (or `cn=` against AD) RDN value and +look that up in `groupToRole`. + +## Provisioning the GwAdmin group + +`GwAdmin` is the gateway-specific dashboard-admin role. It is the +default `LdapOptions.RequiredGroup`, so the dashboard cookie login and +`DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject +`admin` until a `GwAdmin` group exists and `admin` is a member. +GLAuth's baseline config ships only the five LmxOpcUa role groups, so +`GwAdmin` must be added to GLAuth rather than run from a separate LDAP +server: + +1. Edit `C:\publish\glauth\glauth.cfg` +2. Append the group: + +```toml +[[groups]] + name = "GwAdmin" + gidnumber = 5510 # pick the next free GID +``` + +3. Add `5510` to `admin`'s `othergroups` list so `admin` resolves the + `GwAdmin` role. Add it to any other user that needs dashboard-admin + rights. Or create a dedicated user: + +```toml +[[users]] + name = "gwadmin" + givenname = "Gateway" + sn = "Admin" + mail = "gwadmin@lmxopcua.local" + uidnumber = 5010 + primarygroup = 5510 + passsha256 = "" +``` + +4. `nssm restart GLAuth` + +After the restart, `admin`'s `memberOf` includes +`ou=GwAdmin,ou=groups,dc=lmxopcua,dc=local`, which the authenticator +strips to `GwAdmin` and matches against `RequiredGroup`. The same +pattern applies to any future permission that doesn't fit the existing +five roles. + +Generate `passsha256` from a plaintext password: + +```powershell +# Windows / PowerShell +$bytes = [System.Text.Encoding]::UTF8.GetBytes("yourpassword") +$hash = [System.Security.Cryptography.SHA256]::Create().ComputeHash($bytes) +-join ($hash | ForEach-Object { $_.ToString("x2") }) +``` + +```bash +# WSL / git-bash +echo -n "yourpassword" | openssl dgst -sha256 +``` + +## Quick verification + +From mxaccessgw's dev box, prove the directory is reachable: + +```powershell +# Plain bind via PowerShell + System.DirectoryServices.Protocols +$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("localhost:3893") +$ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic +$ldap.SessionOptions.ProtocolVersion = 3 +$ldap.SessionOptions.SecureSocketLayer = $false +$cred = New-Object System.Net.NetworkCredential("cn=admin,dc=lmxopcua,dc=local","admin123") +$ldap.Bind($cred) +"Bind OK" +``` + +Or via `ldapsearch` if you have OpenLDAP CLI tools: + +```bash +ldapsearch -x -H ldap://localhost:3893 \ + -D "cn=admin,dc=lmxopcua,dc=local" -w admin123 \ + -b "dc=lmxopcua,dc=local" "(uid=admin)" +``` + +The response should list `admin`'s entry with `memberOf` populated for +all five role groups — plus `GwAdmin` once the gateway-specific group +is provisioned. + +## Service management + +```powershell +# Status / start / stop / restart +nssm status GLAuth +nssm start GLAuth +nssm stop GLAuth +nssm restart GLAuth + +# Inspect what NSSM was told to launch +nssm get GLAuth Parameters +``` + +Logs: + +| File | Purpose | +|---|---| +| `C:\publish\glauth\logs\stdout.log` | Bind events, search responses | +| `C:\publish\glauth\logs\stderr.log` | Startup errors, config parse failures | + +After editing `glauth.cfg`, always tail `stderr.log` after the restart +to catch a fat-fingered TOML before it bites at first bind: + +```powershell +nssm restart GLAuth +Get-Content C:\publish\glauth\logs\stderr.log -Tail 20 -Wait +``` + +## Active Directory migration cheat-sheet + +LmxOpcUa's `LdapOptions` xml-doc captures the AD overrides; same set +applies to mxaccessgw verbatim. Keys that change: + +| Field | GLAuth dev value | AD production value | +|---|---|---| +| `Server` | `localhost` | a domain controller FQDN, or the domain itself | +| `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement | +| `UseTls` | `false` | `true` | +| `AllowInsecureLdap` | `true` | `false` | +| `SearchBase` | `dc=lmxopcua,dc=local` | `DC=corp,DC=example,DC=com` | +| `ServiceAccountDn` | `cn=serviceaccount,dc=lmxopcua,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` | +| `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) | +| `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) | + +`memberOf` returns full DNs; the authenticator strips the leading +`CN=` value and uses it as the lookup key in `groupToRole`. Nested +groups are **not** auto-expanded; either flatten in the directory or +add a `tokenGroups` query as an enhancement. + +## Security notes for production + +- **Plaintext passwords in `glauth.cfg` are dev-only.** The config is + unencrypted on disk; anyone with read access to `C:\publish\glauth\` + can SHA256-rainbow-table the entries. Treat the dev creds as + throwaway. Production LDAP is Active Directory. +- The 3-fail / 10-minute lockout is per source IP, not per user — a + shared NAT can lock out a whole office. Tunable in `[behaviors]`. +- LDAPS isn't enabled in dev; binding sends passwords cleartext on the + wire. Fine for `localhost`, never expose port 3893 off-box without + enabling TLS first. diff --git a/scripts/bench-read-bulk.ps1 b/scripts/bench-read-bulk.ps1 new file mode 100644 index 0000000..28e4cb3 --- /dev/null +++ b/scripts/bench-read-bulk.ps1 @@ -0,0 +1,389 @@ +<# +.SYNOPSIS +Cross-language ReadBulk stress benchmark driver. + +.DESCRIPTION +Launches the bench-read-bulk subcommand of every client CLI (.NET, Go, Rust, +Python, Java) concurrently against a running gateway and worker. Each client +opens its own session, subscribes to -BulkSize tags so the worker's per-session +MxAccessValueCache populates from real OnDataChange events, then hammers +ReadBulk in a tight in-process loop for -DurationSeconds with per-call +high-resolution latency capture. Each emits a single JSON stats object on +stdout; this script collates the five into a comparison table. + +The gateway and worker are assumed to be running at -Endpoint with the API +key in $env:. + +.PARAMETER Clients +Which clients to run. Defaults to all five. + +.PARAMETER Endpoint +gRPC endpoint of the gateway. Default localhost:5120. + +.PARAMETER ApiKeyEnv +Environment variable holding the API key. Default MXGATEWAY_API_KEY. + +.PARAMETER DurationSeconds +Steady-state measurement window per client. + +.PARAMETER WarmupSeconds +Warm-up window per client (calls during this window are discarded). + +.PARAMETER BulkSize +Number of tags per ReadBulk call. + +.PARAMETER TagStart +First machine number per client. Each client uses a contiguous range starting +here, so machine ranges do not overlap when -DistinctTags is set. + +.PARAMETER TagPrefix +Tag prefix (machine number is appended as %03d). + +.PARAMETER TagAttribute +Attribute appended to each tag. + +.PARAMETER DistinctTags +When set, each client uses its own slice of tags (clients[i] starts at +TagStart + i * BulkSize). When unset (default), all clients hit the same +tags to maximise contention on the worker's value cache. + +.PARAMETER ReportPath +Where to persist the combined report. Defaults to artifacts/bench/... +#> +[CmdletBinding()] +param( + [string[]]$Clients = @("dotnet", "go", "rust", "python", "java"), + [string]$Endpoint = "localhost:5120", + [string]$ApiKeyEnv = "MXGATEWAY_API_KEY", + [int]$DurationSeconds = 30, + [int]$WarmupSeconds = 3, + [int]$BulkSize = 6, + [int]$TagStart = 1, + [string]$TagPrefix = "TestMachine_", + [string]$TagAttribute = "TestChangingInt", + [int]$TimeoutMs = 1500, + [switch]$DistinctTags, + [string]$ReportPath +) + +Set-StrictMode -Version Latest +$ErrorActionPreference = "Stop" + +$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..") +$validClients = @("dotnet", "go", "rust", "python", "java") +foreach ($c in $Clients) { + if ($validClients -notcontains $c) { + throw "Unsupported client '$c'. Valid: $($validClients -join ', ')." + } +} + +if ([string]::IsNullOrWhiteSpace($ReportPath)) { + $timestamp = Get-Date -Format "yyyyMMdd-HHmmss" + $ReportPath = Join-Path $repoRoot "artifacts/bench/bench-read-bulk-$timestamp.json" +} +$reportDir = Split-Path -Parent $ReportPath +if (-not (Test-Path $reportDir)) { + New-Item -ItemType Directory -Path $reportDir -Force | Out-Null +} + +$apiKeyValue = (Get-Item -Path "Env:$ApiKeyEnv" -ErrorAction SilentlyContinue).Value +if ([string]::IsNullOrWhiteSpace($apiKeyValue)) { + throw "The API key environment variable '$ApiKeyEnv' is not set. Define it before running the bench." +} + +# Temp dir for per-client stdout/stderr capture + (Java only) a one-shot +# wrapper .bat that handles cmd.exe's quoting rules for `gradle --args="..."`. +$tmpDir = Join-Path ([System.IO.Path]::GetTempPath()) "mxgw-bench-$([guid]::NewGuid())" +New-Item -ItemType Directory -Path $tmpDir -Force | Out-Null + +function ConvertTo-HttpEndpoint { + param([string]$Value) + if ($Value -match '^https?://') { return $Value } + return "http://$Value" +} + +function ConvertTo-HostEndpoint { + param([string]$Value) + return ($Value -replace '^https?://', '') +} + +# Build the per-client command array. Each client gets its own tag range when +# -DistinctTags is set so the workers race against distinct cache slices. +function Get-ClientCommand { + param( + [string]$Client, + [int]$ClientIndex + ) + + $effectiveTagStart = if ($DistinctTags) { $TagStart + ($ClientIndex * $BulkSize) } else { $TagStart } + $httpEndpoint = ConvertTo-HttpEndpoint -Value $Endpoint + $hostEndpoint = ConvertTo-HostEndpoint -Value $Endpoint + $clientName = "mxgw-$Client-bench" + + # Per-call gRPC timeout must exceed (DurationSeconds + WarmupSeconds + slack) + # — otherwise the channel-wide timeout cancels the bench mid-loop. + $callTimeoutSeconds = [int]([Math]::Max(60, $DurationSeconds + $WarmupSeconds + 30)) + + switch ($Client) { + "dotnet" { + # -c Release matches the rest of the matrix: HotSpot/Tier 1 JIT + # closes most of the debug/release gap for .NET on its own, but + # Release also disables JIT inline thresholds that hurt Stopwatch- + # bracketed measurement noise. Same project must have been built + # in Release at least once before this --no-build invocation. + $cliArgs = @( + "run", "--project", "clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli", "-c", "Release", "--no-build", "--", + "bench-read-bulk", + "--endpoint", $httpEndpoint, + "--api-key-env", $ApiKeyEnv, + "--timeout", "${callTimeoutSeconds}s", + "--client-name", $clientName, + "--duration-seconds", "$DurationSeconds", + "--warmup-seconds", "$WarmupSeconds", + "--bulk-size", "$BulkSize", + "--tag-start", "$effectiveTagStart", + "--tag-prefix", $TagPrefix, + "--tag-attribute", $TagAttribute, + "--timeout-ms", "$TimeoutMs", + "--json" + ) + return [pscustomobject]@{ file = "dotnet"; args = $cliArgs; cwd = $repoRoot } + } + "go" { + $cliArgs = @( + "run", "./cmd/mxgw-go", "bench-read-bulk", + "-endpoint", $hostEndpoint, + "-api-key-env", $ApiKeyEnv, + "-plaintext", + "-json", + "-client-name", $clientName, + "-duration-seconds", "$DurationSeconds", + "-warmup-seconds", "$WarmupSeconds", + "-bulk-size", "$BulkSize", + "-tag-start", "$effectiveTagStart", + "-tag-prefix", $TagPrefix, + "-tag-attribute", $TagAttribute, + "-timeout-ms", "$TimeoutMs" + ) + return [pscustomobject]@{ file = "go"; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/go") } + } + "rust" { + # --release is essential: Rust debug builds disable inlining and + # add overflow checks, which costs the bench ~45% of throughput + # and ~3x of p99 latency vs release. The other compiled clients + # don't have this gap (go run is optimized, dotnet/java run JIT- + # optimized after the warmup window). + $cliArgs = @( + "run", "--release", "--quiet", "-p", "mxgw-cli", "--", + "bench-read-bulk", + "--endpoint", $httpEndpoint, + "--api-key-env", $ApiKeyEnv, + "--client-name", $clientName, + "--duration-seconds", "$DurationSeconds", + "--warmup-seconds", "$WarmupSeconds", + "--bulk-size", "$BulkSize", + "--tag-start", "$effectiveTagStart", + "--tag-prefix", $TagPrefix, + "--tag-attribute", $TagAttribute, + "--timeout-ms", "$TimeoutMs", + "--json" + ) + return [pscustomobject]@{ file = "cargo"; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/rust") } + } + "python" { + $cliArgs = @( + "-m", "mxgateway_cli", "bench-read-bulk", + "--endpoint", $hostEndpoint, + "--api-key-env", $ApiKeyEnv, + "--plaintext", + "--client-name", $clientName, + "--duration-seconds", "$DurationSeconds", + "--warmup-seconds", "$WarmupSeconds", + "--bulk-size", "$BulkSize", + "--tag-start", "$effectiveTagStart", + "--tag-prefix", $TagPrefix, + "--tag-attribute", $TagAttribute, + "--timeout-ms", "$TimeoutMs", + "--json" + ) + $python = 'C:\Users\dohertj2\AppData\Local\Programs\Python\Python312\python.exe' + return [pscustomobject]@{ file = $python; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/python"); pythonpath = (Join-Path $repoRoot "clients/python/src") } + } + "java" { + $inner = @( + "bench-read-bulk", + "--endpoint", $hostEndpoint, + "--api-key-env", $ApiKeyEnv, + "--plaintext", + "--json", + "--client-name", $clientName, + "--duration-seconds", "$DurationSeconds", + "--warmup-seconds", "$WarmupSeconds", + "--bulk-size", "$BulkSize", + "--tag-start", "$effectiveTagStart", + "--tag-prefix", $TagPrefix, + "--tag-attribute", $TagAttribute, + "--timeout-ms", "$TimeoutMs" + ) + $gradle = (Get-Command "gradle.bat", "gradle.cmd", "gradle.exe", "gradle" -ErrorAction SilentlyContinue | Select-Object -First 1) + if ($null -eq $gradle) { throw "gradle not on PATH; required for the Java bench." } + # Start-Process with ArgumentList mangles the `--args="..."` quoting + # cmd.exe needs to keep the whole bench-args expression as a single + # gradle argument. Workaround: write a one-shot .bat that contains + # the literal gradle command line and invoke that batch via cmd. + $batPath = Join-Path $tmpDir "java-bench.bat" + $batContent = '@echo off' + "`r`n" + + '"' + $gradle.Source + '" --quiet :mxgateway-cli:run "--args=' + ($inner -join ' ') + '"' + "`r`n" + Set-Content -Path $batPath -Value $batContent -Encoding ASCII + return [pscustomobject]@{ file = "cmd.exe"; args = @("/c", $batPath); cwd = (Join-Path $repoRoot "clients/java") } + } + } +} + +# Start one detached process per client and wait for all. Stdout (the JSON +# stats line) is captured to a per-client tmp file; stderr is captured too in +# case a bench crashed. +$jobs = @() + +Write-Host "Launching $($Clients.Count) concurrent benches against $Endpoint (duration=$($DurationSeconds)s, warmup=$($WarmupSeconds)s, bulkSize=$BulkSize, distinctTags=$([bool]$DistinctTags))" + +for ($i = 0; $i -lt $Clients.Count; $i++) { + $client = $Clients[$i] + $cmd = Get-ClientCommand -Client $client -ClientIndex $i + $stdoutPath = Join-Path $tmpDir "$client.out" + $stderrPath = Join-Path $tmpDir "$client.err" + $startArgs = @{ + FilePath = $cmd.file + ArgumentList = $cmd.args + WorkingDirectory = $cmd.cwd + RedirectStandardOutput = $stdoutPath + RedirectStandardError = $stderrPath + NoNewWindow = $true + PassThru = $true + } + if ($cmd.PSObject.Properties['pythonpath']) { + # Python needs PYTHONPATH so the editable mxgateway_cli module resolves. + $env:PYTHONPATH = $cmd.pythonpath + } + $process = Start-Process @startArgs + $jobs += [pscustomobject]@{ client = $client; process = $process; stdoutPath = $stdoutPath; stderrPath = $stderrPath } + Write-Host " [$client] pid=$($process.Id)" +} + +foreach ($job in $jobs) { + $job.process.WaitForExit() +} + +# Parse one JSON line per client. The line is typically the last +# `{`-prefixed line in stdout (gradle, dotnet run, cargo run can emit log +# noise before it). +function Get-JsonStats { + param([string]$Path) + if (-not (Test-Path $Path)) { return $null } + $content = Get-Content -Path $Path -Raw + if ([string]::IsNullOrWhiteSpace($content)) { return $null } + + # Scan from the LAST top-level `{` (the bench JSON is the final structured + # output line; earlier text may be log noise from `dotnet run` / `cargo + # run` / `gradle :run`). Walk forward counting braces to locate the + # matching `}` so nested objects like `latencyMs` don't confuse the parser. + $startIndex = -1 + $depth = 0 + for ($i = $content.Length - 1; $i -ge 0; $i--) { + $ch = $content[$i] + if ($ch -eq '}') { $depth++ } + elseif ($ch -eq '{') { + $depth-- + if ($depth -eq 0) { $startIndex = $i; break } + } + } + if ($startIndex -lt 0) { return $null } + + $endIndex = -1 + $depth = 0 + for ($i = $startIndex; $i -lt $content.Length; $i++) { + $ch = $content[$i] + if ($ch -eq '{') { $depth++ } + elseif ($ch -eq '}') { + $depth-- + if ($depth -eq 0) { $endIndex = $i; break } + } + } + if ($endIndex -lt 0) { return $null } + + $json = $content.Substring($startIndex, $endIndex - $startIndex + 1) + try { return $json | ConvertFrom-Json } + catch { return $null } +} + +$results = @() +foreach ($job in $jobs) { + $stats = Get-JsonStats -Path $job.stdoutPath + if ($null -eq $stats) { + $stderr = if (Test-Path $job.stderrPath) { (Get-Content -Path $job.stderrPath -Raw) } else { "" } + Write-Warning "[$($job.client)] no JSON stats parsed; exit=$($job.process.ExitCode); stderr=$([string]::IsNullOrWhiteSpace($stderr) ? '(empty)' : $stderr.Substring(0, [Math]::Min(300, $stderr.Length)))" + $results += [pscustomobject]@{ client = $job.client; exitCode = $job.process.ExitCode; stats = $null; stderr = $stderr } + } else { + $results += [pscustomobject]@{ client = $job.client; exitCode = $job.process.ExitCode; stats = $stats; stderr = $null } + } +} + +# Pretty-print a side-by-side table. +$rows = foreach ($r in $results) { + if ($null -eq $r.stats) { + [pscustomobject]@{ + client = $r.client + "calls/sec" = "ERR" + "total" = "-" + "ok" = "-" + "fail" = "-" + "cached/total" = "-" + "p50 ms" = "-" + "p95 ms" = "-" + "p99 ms" = "-" + "max ms" = "-" + "mean ms" = "-" + } + } else { + $s = $r.stats + [pscustomobject]@{ + client = $s.language + "calls/sec" = $s.callsPerSecond + "total" = $s.totalCalls + "ok" = $s.successfulCalls + "fail" = $s.failedCalls + "cached/total" = "$($s.cachedReadResults)/$($s.totalReadResults)" + "p50 ms" = $s.latencyMs.p50 + "p95 ms" = $s.latencyMs.p95 + "p99 ms" = $s.latencyMs.p99 + "max ms" = $s.latencyMs.max + "mean ms" = $s.latencyMs.mean + } + } +} + +$rows | Format-Table -AutoSize | Out-Host + +$report = [pscustomobject]@{ + schemaVersion = 1 + endpoint = $Endpoint + apiKeyEnv = $ApiKeyEnv + durationSeconds = $DurationSeconds + warmupSeconds = $WarmupSeconds + bulkSize = $BulkSize + distinctTags = [bool]$DistinctTags + tagPrefix = $TagPrefix + tagAttribute = $TagAttribute + startedAt = (Get-Date).ToUniversalTime().ToString("o") + clients = $results | ForEach-Object { + [ordered]@{ + client = $_.client + exitCode = $_.exitCode + stats = $_.stats + } + } +} +$report | ConvertTo-Json -Depth 12 | Set-Content -Path $ReportPath -Encoding UTF8 +Write-Host "Combined report written to: $ReportPath" + +Remove-Item -Path $tmpDir -Recurse -Force -ErrorAction SilentlyContinue diff --git a/scripts/check-code-reviews-readme.ps1 b/scripts/check-code-reviews-readme.ps1 new file mode 100644 index 0000000..78f5a65 --- /dev/null +++ b/scripts/check-code-reviews-readme.ps1 @@ -0,0 +1,20 @@ +# Verifies code-reviews/README.md is regenerated from, and consistent with, the +# per-module findings.md files. Intended as a CI / pre-commit gate. +# +# Exits non-zero when README.md is stale, when a module header's "Open findings" +# count disagrees with its finding statuses, or when a finding carries an +# unrecognised Status value. See REVIEW-PROCESS.md section 5. + +[CmdletBinding()] +param() + +Set-StrictMode -Version Latest +$ErrorActionPreference = "Stop" + +$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..") +$script = Join-Path $repoRoot "code-reviews/regen-readme.py" + +# The bare `python3` alias on this platform resolves to the Windows Store stub; +# `python` is the real interpreter. +& python $script --check +exit $LASTEXITCODE