Add review-process + glauth design docs, bench scripts; ignore install/
Picks up the missing glauth.md referenced by CLAUDE.md, captures the review workflow alongside the bench-read-bulk and review-readme helper scripts, and excludes the local install/ deployment tree from source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -45,6 +45,7 @@ build/
|
||||
out/
|
||||
tmp/
|
||||
temp/
|
||||
install/
|
||||
|
||||
# .NET
|
||||
**/bin/
|
||||
|
||||
@@ -0,0 +1,140 @@
|
||||
# Code Review Process
|
||||
|
||||
This document describes how to perform a comprehensive, per-module code review of
|
||||
the `mxaccessgw` codebase and how to track findings to resolution.
|
||||
|
||||
A **module** is one buildable project under `src/` (e.g. `src/ZB.MOM.WW.MxGateway.Worker`)
|
||||
or one language client under `clients/` (e.g. `clients/rust`). Each module has
|
||||
its own folder under `code-reviews/` containing a single `findings.md`.
|
||||
|
||||
## 1. Before you start
|
||||
|
||||
1. Pick the module to review. Its folder is `code-reviews/<Module>/`:
|
||||
- For a `src/` project, `<Module>` is the project name with the `ZB.MOM.WW.MxGateway.`
|
||||
prefix stripped — `src/ZB.MOM.WW.MxGateway.Server` is reviewed in `code-reviews/Server/`.
|
||||
- For a language client, `<Module>` is `Client.<Lang>` — `clients/rust` is
|
||||
reviewed in `code-reviews/Client.Rust/`.
|
||||
2. Identify the design context for the module:
|
||||
- `gateway.md` — top-level architecture, command/event surface, IPC envelope,
|
||||
STA thread model, fault handling.
|
||||
- The relevant component design docs under `docs/` (e.g.
|
||||
`docs/MxAccessWorkerInstanceDesign.md`, `docs/GatewayProcessDesign.md`,
|
||||
`docs/Sessions.md`, `docs/Authentication.md`, `docs/GalaxyRepository.md`).
|
||||
- `docs/DesignDecisions.md` for the v1 design choices.
|
||||
- The **Repository-Specific Conventions** and **Process / Platform Notes** in
|
||||
`CLAUDE.md`.
|
||||
3. Record the exact commit being reviewed: `git rev-parse --short HEAD`. Every
|
||||
review is a snapshot — a finding only means something relative to a known
|
||||
commit.
|
||||
4. Open `code-reviews/<Module>/findings.md` and fill in the header table
|
||||
(reviewer, date, commit SHA, status).
|
||||
|
||||
## 2. Review checklist
|
||||
|
||||
Work through **every** category below for the module. A comprehensive review
|
||||
means the checklist is completed even where it produces no findings — record
|
||||
"No issues found" for a category rather than leaving it ambiguous.
|
||||
|
||||
1. **Correctness & logic bugs** — off-by-one, null handling, incorrect
|
||||
conditionals, misuse of APIs, broken edge cases.
|
||||
2. **mxaccessgw conventions** — the rules in `CLAUDE.md` and the style guides
|
||||
under `docs/style-guides/`: the gateway never instantiates MXAccess COM
|
||||
directly; all MXAccess COM calls run on the worker's dedicated STA thread and
|
||||
the STA loop pumps Windows messages; IPC uses one bidirectional named pipe per
|
||||
worker carrying length-prefixed `WorkerEnvelope` protobuf frames; MXAccess
|
||||
parity is the contract (don't "fix" surprising MXAccess behaviour, never
|
||||
synthesize events); one worker and one event subscriber per session; the
|
||||
gateway terminates orphan workers on startup and does not reattach; C# style
|
||||
(file-scoped namespaces, `sealed` by default, `Async` suffix, MXAccess-aligned
|
||||
names); no Blazor UI component libraries; no logging of secrets or full tag
|
||||
values; generated code is never hand-edited.
|
||||
3. **Concurrency & thread safety** — shared mutable state, STA affinity, race
|
||||
conditions, correct use of `async`/`await`, locking, disposal races.
|
||||
4. **Error handling & resilience** — exception paths, worker crash / reconnect
|
||||
handling, fail-fast event backpressure, transient vs permanent error
|
||||
classification, graceful degradation, correct gRPC status codes.
|
||||
5. **Security** — authentication/authorization checks, API-key scope enforcement,
|
||||
input validation, SQL injection in the Galaxy Repository RPCs, secret
|
||||
handling, the dashboard anonymous-localhost bypass, logging of sensitive data.
|
||||
6. **Performance & resource management** — `IDisposable` disposal, pipe / stream
|
||||
/ COM lifetimes, buffering and back-pressure, unnecessary allocations on hot
|
||||
paths, N+1 queries.
|
||||
7. **Design-document adherence** — does the code match `gateway.md`, the relevant
|
||||
`docs/` component designs, `docs/DesignDecisions.md`, and `CLAUDE.md`? Flag
|
||||
both code that drifts from the design and design docs that are now stale.
|
||||
8. **Code organization & conventions** — namespace hierarchy, project layout, the
|
||||
Options pattern, separation of concerns, additive-only contract evolution.
|
||||
9. **Testing coverage** — are the module's behaviours covered by tests
|
||||
(`src/ZB.MOM.WW.MxGateway.Tests`, `src/ZB.MOM.WW.MxGateway.Worker.Tests`,
|
||||
`src/ZB.MOM.WW.MxGateway.IntegrationTests`)? Note untested critical paths and missing
|
||||
edge-case tests.
|
||||
10. **Documentation & comments** — XML doc accuracy, misleading or stale comments,
|
||||
undocumented non-obvious behaviour.
|
||||
|
||||
## 3. Recording findings
|
||||
|
||||
Add one entry per finding to the `## Findings` section of the module's
|
||||
`findings.md`, using the entry format in
|
||||
[`_template/findings.md`](code-reviews/_template/findings.md).
|
||||
|
||||
- **Finding ID** — `<Module>-NNN`, numbered sequentially within the module and
|
||||
never reused (e.g. `Worker-001`). IDs are permanent even after resolution.
|
||||
- **Severity:**
|
||||
- **Critical** — data loss, security breach, crash/deadlock, or outage.
|
||||
- **High** — incorrect behaviour with significant impact; no safe workaround.
|
||||
- **Medium** — incorrect or risky behaviour with limited impact or a workaround.
|
||||
- **Low** — minor issues, style, maintainability, documentation.
|
||||
- **Category** — one of the 10 checklist categories above.
|
||||
- **Location** — `file:line` (clickable), or a list of locations.
|
||||
- **Description** — what is wrong and why it matters.
|
||||
- **Recommendation** — concrete suggested fix.
|
||||
|
||||
After recording findings, update the module header table (status, open-finding
|
||||
count) and regenerate the base README (step 5).
|
||||
|
||||
## 4. Marking an item resolved
|
||||
|
||||
Findings are **never deleted** — they are an audit trail. To close one, change
|
||||
its **Status** and complete the **Resolution** field:
|
||||
|
||||
- `Open` — newly recorded, not yet addressed.
|
||||
- `In Progress` — a fix is actively being worked on.
|
||||
- `Resolved` — fixed. The Resolution field must state the fixing commit SHA, the
|
||||
date, and a one-line description of the fix.
|
||||
- `Won't Fix` — intentionally not fixed. The Resolution field must justify why.
|
||||
- `Deferred` — valid but postponed. The Resolution field must say what it is
|
||||
waiting on (e.g. a tracked issue or a later milestone).
|
||||
|
||||
`Resolved`, `Won't Fix`, and `Deferred` findings are all considered **closed**.
|
||||
`Open` and `In Progress` are **pending** and appear in the base README's Pending
|
||||
Findings table.
|
||||
|
||||
## 5. Updating the base README
|
||||
|
||||
`code-reviews/README.md` holds the single cross-module view (the Module Status
|
||||
table and the Pending / Closed Findings tables). It is **generated** from the
|
||||
per-module `findings.md` files — do not edit it by hand.
|
||||
|
||||
After any review or status change, regenerate it:
|
||||
|
||||
```
|
||||
python code-reviews/regen-readme.py
|
||||
```
|
||||
|
||||
`regen-readme.py --check` exits non-zero if `README.md` is stale, if a module
|
||||
header's `Open findings` count disagrees with its finding statuses, or if a
|
||||
finding carries an unrecognised Status value. The PowerShell wrapper
|
||||
`scripts/check-code-reviews-readme.ps1` runs that check and is the intended hook
|
||||
for CI or a pre-commit step.
|
||||
|
||||
> The repo's installed `python` is the real interpreter; the bare `python3`
|
||||
> alias resolves to the Windows Store stub and fails. Use `python`.
|
||||
|
||||
The per-module `findings.md` files are the source of truth; `README.md` is the
|
||||
aggregated index and must always agree with them — which the script guarantees.
|
||||
|
||||
## 6. Re-reviewing a module
|
||||
|
||||
Re-reviews append to the same `findings.md`. Update the header to the new commit
|
||||
and date, continue the finding numbering from the last used ID, and leave prior
|
||||
findings (including closed ones) in place as history.
|
||||
@@ -0,0 +1,280 @@
|
||||
# GLAuth — LDAP authn reference for mxaccessgw
|
||||
|
||||
GLAuth is a lightweight LDAP server installed on this dev box at
|
||||
`C:\publish\glauth\` and run as a Windows service via NSSM. It already
|
||||
backs the LmxOpcUa OPC UA server's UserName-token authn and the LmxOpcUa
|
||||
Admin UI's cookie login; this doc captures everything mxaccessgw needs
|
||||
to consume the same directory so a single set of dev credentials covers
|
||||
both stacks.
|
||||
|
||||
The authoritative copy of LmxOpcUa's reference lives at
|
||||
`C:\publish\glauth\auth.md`. This doc is a redistilled view tailored to
|
||||
mxaccessgw — what users + groups are already provisioned, how to bind
|
||||
against them, and what's needed to add a gw-specific role.
|
||||
|
||||
## Connection details
|
||||
|
||||
| Setting | Value |
|
||||
|---|---|
|
||||
| Protocol | LDAP (unencrypted) |
|
||||
| Host | `localhost` |
|
||||
| Port | `3893` |
|
||||
| LDAPS | disabled in dev (set `[ldaps]` block to enable) |
|
||||
| Base DN | `dc=lmxopcua,dc=local` |
|
||||
| Bind DN format | `cn={username},dc=lmxopcua,dc=local` |
|
||||
| Group OU | `ou=<groupname>,ou=groups,dc=lmxopcua,dc=local` |
|
||||
| Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) |
|
||||
|
||||
## Pre-existing groups (LmxOpcUa role taxonomy)
|
||||
|
||||
These map cleanly onto MxAccess capability boundaries — mxaccessgw
|
||||
should reuse them rather than define parallel groups so an operator with
|
||||
LmxOpcUa write rights doesn't need a second account for the gw.
|
||||
|
||||
| Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping |
|
||||
|---|---|---|---|---|
|
||||
| ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) |
|
||||
| WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=lmxopcua,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) |
|
||||
| WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=lmxopcua,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) |
|
||||
| WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=lmxopcua,dc=local` | Write Configure attrs | `WriteSecured` (Configure) |
|
||||
| AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=lmxopcua,dc=local` | Acknowledge alarms | gw alarm-ack RPC, when added |
|
||||
|
||||
**A user can be in multiple groups** — `othergroups = [...]` in the
|
||||
config is a list. `admin` is the canonical example (in every role
|
||||
group below).
|
||||
|
||||
## Pre-provisioned users
|
||||
|
||||
| Username | Password | UID | Primary group | Other groups | Capabilities |
|
||||
|---|---|---|---|---|---|
|
||||
| `readonly` | `readonly123` | 5001 | ReadOnly | — | Browse, read |
|
||||
| `writeop` | `writeop123` | 5002 | WriteOperate | — | + plain Write |
|
||||
| `writetune` | `writetune123` | 5005 | WriteTune | — | + WriteSecured (Tune) |
|
||||
| `writeconfig` | `writeconfig123` | 5006 | WriteConfigure | — | + WriteSecured (Configure) |
|
||||
| `alarmack` | `alarmack123` | 5003 | AlarmAck | — | Alarm acknowledgment |
|
||||
| `admin` | `admin123` | 5004 | ReadOnly | WriteOperate, AlarmAck, WriteTune, WriteConfigure | All roles |
|
||||
| `serviceaccount` | `serviceaccount123` | 5999 | ReadOnly | — | LDAP search capability (for bind-then-search) |
|
||||
|
||||
For mxaccessgw dev, `admin` covers every gw-side capability test;
|
||||
`readonly` is the right "negative" case for proving Browse-OK /
|
||||
Write-denied.
|
||||
|
||||
The gateway dashboard adds one role beyond this LmxOpcUa taxonomy:
|
||||
`GwAdmin`. `LdapOptions.RequiredGroup` defaults to `GwAdmin`, so the
|
||||
dashboard login and `DashboardLdapLiveTests` require `admin` to be a
|
||||
member of a `GwAdmin` group. `GwAdmin` is **not** in the baseline
|
||||
GLAuth config — it must be provisioned before dashboard authn or the
|
||||
LDAP live tests work. See [Provisioning the GwAdmin
|
||||
group](#provisioning-the-gwadmin-group) below.
|
||||
|
||||
## Two bind patterns
|
||||
|
||||
### 1. Direct bind (simplest)
|
||||
|
||||
```
|
||||
DN: cn=admin,dc=lmxopcua,dc=local
|
||||
Password: admin123
|
||||
```
|
||||
|
||||
Construct the DN from the username; bind. Works on GLAuth because
|
||||
`backend.nameformat = "cn"` and `groupformat = "ou"` are set in the
|
||||
config. **Doesn't translate to Active Directory** — AD users are keyed
|
||||
by `sAMAccountName`, not `cn`. Use this only for dev convenience.
|
||||
|
||||
### 2. Bind-then-search (production-grade)
|
||||
|
||||
```
|
||||
1. Bind as the service account (cn=serviceaccount,dc=lmxopcua,dc=local
|
||||
/ serviceaccount123).
|
||||
2. Search under dc=lmxopcua,dc=local with filter
|
||||
(uid=<entered-username>) — or any attribute the deployment
|
||||
identifies users by. GLAuth populates uid + cn.
|
||||
3. Read the returned entry's DN + memberOf list (groups).
|
||||
4. Bind again as the discovered DN with the entered password. If that
|
||||
succeeds, authn passes; the memberOf values become the role set.
|
||||
```
|
||||
|
||||
The second bind is the actual password check — the search is just a DN
|
||||
discovery. This is the AD-friendly path: AD's
|
||||
`tokenGroups` / `LDAP_MATCHING_RULE_IN_CHAIN` flatten nested groups, but
|
||||
that's an enhancement, not required for first-pass dev.
|
||||
|
||||
LmxOpcUa's `Server/Security/LdapUserAuthenticator.cs` ships a working
|
||||
implementation of this pattern using `Novell.Directory.Ldap.NETStandard`
|
||||
v3.6.0 — copy the bind-then-search loop from there if mxaccessgw wants
|
||||
to avoid re-deriving the LDAP escape-string handling.
|
||||
|
||||
## Suggested mxgw configuration shape
|
||||
|
||||
A YAML/JSON section for mxaccessgw that mirrors LmxOpcUa's `LdapOptions`
|
||||
record:
|
||||
|
||||
```yaml
|
||||
ldap:
|
||||
enabled: true
|
||||
server: localhost
|
||||
port: 3893
|
||||
useTls: false
|
||||
allowInsecureLdap: true # dev only
|
||||
searchBase: "dc=lmxopcua,dc=local"
|
||||
serviceAccountDn: "cn=serviceaccount,dc=lmxopcua,dc=local"
|
||||
serviceAccountPassword: "serviceaccount123"
|
||||
userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName
|
||||
displayNameAttribute: "cn"
|
||||
groupAttribute: "memberOf"
|
||||
groupToRole:
|
||||
ReadOnly: "Browse"
|
||||
WriteOperate: "Write"
|
||||
WriteTune: "WriteSecured"
|
||||
WriteConfigure: "WriteSecured"
|
||||
AlarmAck: "AlarmAck"
|
||||
```
|
||||
|
||||
`groupAttribute` returns full DNs like
|
||||
`ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` — the authenticator
|
||||
should strip the leading `ou=` (or `cn=` against AD) RDN value and
|
||||
look that up in `groupToRole`.
|
||||
|
||||
## Provisioning the GwAdmin group
|
||||
|
||||
`GwAdmin` is the gateway-specific dashboard-admin role. It is the
|
||||
default `LdapOptions.RequiredGroup`, so the dashboard cookie login and
|
||||
`DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject
|
||||
`admin` until a `GwAdmin` group exists and `admin` is a member.
|
||||
GLAuth's baseline config ships only the five LmxOpcUa role groups, so
|
||||
`GwAdmin` must be added to GLAuth rather than run from a separate LDAP
|
||||
server:
|
||||
|
||||
1. Edit `C:\publish\glauth\glauth.cfg`
|
||||
2. Append the group:
|
||||
|
||||
```toml
|
||||
[[groups]]
|
||||
name = "GwAdmin"
|
||||
gidnumber = 5510 # pick the next free GID
|
||||
```
|
||||
|
||||
3. Add `5510` to `admin`'s `othergroups` list so `admin` resolves the
|
||||
`GwAdmin` role. Add it to any other user that needs dashboard-admin
|
||||
rights. Or create a dedicated user:
|
||||
|
||||
```toml
|
||||
[[users]]
|
||||
name = "gwadmin"
|
||||
givenname = "Gateway"
|
||||
sn = "Admin"
|
||||
mail = "gwadmin@lmxopcua.local"
|
||||
uidnumber = 5010
|
||||
primarygroup = 5510
|
||||
passsha256 = "<sha256 of the password — see below>"
|
||||
```
|
||||
|
||||
4. `nssm restart GLAuth`
|
||||
|
||||
After the restart, `admin`'s `memberOf` includes
|
||||
`ou=GwAdmin,ou=groups,dc=lmxopcua,dc=local`, which the authenticator
|
||||
strips to `GwAdmin` and matches against `RequiredGroup`. The same
|
||||
pattern applies to any future permission that doesn't fit the existing
|
||||
five roles.
|
||||
|
||||
Generate `passsha256` from a plaintext password:
|
||||
|
||||
```powershell
|
||||
# Windows / PowerShell
|
||||
$bytes = [System.Text.Encoding]::UTF8.GetBytes("yourpassword")
|
||||
$hash = [System.Security.Cryptography.SHA256]::Create().ComputeHash($bytes)
|
||||
-join ($hash | ForEach-Object { $_.ToString("x2") })
|
||||
```
|
||||
|
||||
```bash
|
||||
# WSL / git-bash
|
||||
echo -n "yourpassword" | openssl dgst -sha256
|
||||
```
|
||||
|
||||
## Quick verification
|
||||
|
||||
From mxaccessgw's dev box, prove the directory is reachable:
|
||||
|
||||
```powershell
|
||||
# Plain bind via PowerShell + System.DirectoryServices.Protocols
|
||||
$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("localhost:3893")
|
||||
$ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic
|
||||
$ldap.SessionOptions.ProtocolVersion = 3
|
||||
$ldap.SessionOptions.SecureSocketLayer = $false
|
||||
$cred = New-Object System.Net.NetworkCredential("cn=admin,dc=lmxopcua,dc=local","admin123")
|
||||
$ldap.Bind($cred)
|
||||
"Bind OK"
|
||||
```
|
||||
|
||||
Or via `ldapsearch` if you have OpenLDAP CLI tools:
|
||||
|
||||
```bash
|
||||
ldapsearch -x -H ldap://localhost:3893 \
|
||||
-D "cn=admin,dc=lmxopcua,dc=local" -w admin123 \
|
||||
-b "dc=lmxopcua,dc=local" "(uid=admin)"
|
||||
```
|
||||
|
||||
The response should list `admin`'s entry with `memberOf` populated for
|
||||
all five role groups — plus `GwAdmin` once the gateway-specific group
|
||||
is provisioned.
|
||||
|
||||
## Service management
|
||||
|
||||
```powershell
|
||||
# Status / start / stop / restart
|
||||
nssm status GLAuth
|
||||
nssm start GLAuth
|
||||
nssm stop GLAuth
|
||||
nssm restart GLAuth
|
||||
|
||||
# Inspect what NSSM was told to launch
|
||||
nssm get GLAuth Parameters
|
||||
```
|
||||
|
||||
Logs:
|
||||
|
||||
| File | Purpose |
|
||||
|---|---|
|
||||
| `C:\publish\glauth\logs\stdout.log` | Bind events, search responses |
|
||||
| `C:\publish\glauth\logs\stderr.log` | Startup errors, config parse failures |
|
||||
|
||||
After editing `glauth.cfg`, always tail `stderr.log` after the restart
|
||||
to catch a fat-fingered TOML before it bites at first bind:
|
||||
|
||||
```powershell
|
||||
nssm restart GLAuth
|
||||
Get-Content C:\publish\glauth\logs\stderr.log -Tail 20 -Wait
|
||||
```
|
||||
|
||||
## Active Directory migration cheat-sheet
|
||||
|
||||
LmxOpcUa's `LdapOptions` xml-doc captures the AD overrides; same set
|
||||
applies to mxaccessgw verbatim. Keys that change:
|
||||
|
||||
| Field | GLAuth dev value | AD production value |
|
||||
|---|---|---|
|
||||
| `Server` | `localhost` | a domain controller FQDN, or the domain itself |
|
||||
| `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement |
|
||||
| `UseTls` | `false` | `true` |
|
||||
| `AllowInsecureLdap` | `true` | `false` |
|
||||
| `SearchBase` | `dc=lmxopcua,dc=local` | `DC=corp,DC=example,DC=com` |
|
||||
| `ServiceAccountDn` | `cn=serviceaccount,dc=lmxopcua,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` |
|
||||
| `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) |
|
||||
| `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) |
|
||||
|
||||
`memberOf` returns full DNs; the authenticator strips the leading
|
||||
`CN=` value and uses it as the lookup key in `groupToRole`. Nested
|
||||
groups are **not** auto-expanded; either flatten in the directory or
|
||||
add a `tokenGroups` query as an enhancement.
|
||||
|
||||
## Security notes for production
|
||||
|
||||
- **Plaintext passwords in `glauth.cfg` are dev-only.** The config is
|
||||
unencrypted on disk; anyone with read access to `C:\publish\glauth\`
|
||||
can SHA256-rainbow-table the entries. Treat the dev creds as
|
||||
throwaway. Production LDAP is Active Directory.
|
||||
- The 3-fail / 10-minute lockout is per source IP, not per user — a
|
||||
shared NAT can lock out a whole office. Tunable in `[behaviors]`.
|
||||
- LDAPS isn't enabled in dev; binding sends passwords cleartext on the
|
||||
wire. Fine for `localhost`, never expose port 3893 off-box without
|
||||
enabling TLS first.
|
||||
@@ -0,0 +1,389 @@
|
||||
<#
|
||||
.SYNOPSIS
|
||||
Cross-language ReadBulk stress benchmark driver.
|
||||
|
||||
.DESCRIPTION
|
||||
Launches the bench-read-bulk subcommand of every client CLI (.NET, Go, Rust,
|
||||
Python, Java) concurrently against a running gateway and worker. Each client
|
||||
opens its own session, subscribes to -BulkSize tags so the worker's per-session
|
||||
MxAccessValueCache populates from real OnDataChange events, then hammers
|
||||
ReadBulk in a tight in-process loop for -DurationSeconds with per-call
|
||||
high-resolution latency capture. Each emits a single JSON stats object on
|
||||
stdout; this script collates the five into a comparison table.
|
||||
|
||||
The gateway and worker are assumed to be running at -Endpoint with the API
|
||||
key in $env:<ApiKeyEnv>.
|
||||
|
||||
.PARAMETER Clients
|
||||
Which clients to run. Defaults to all five.
|
||||
|
||||
.PARAMETER Endpoint
|
||||
gRPC endpoint of the gateway. Default localhost:5120.
|
||||
|
||||
.PARAMETER ApiKeyEnv
|
||||
Environment variable holding the API key. Default MXGATEWAY_API_KEY.
|
||||
|
||||
.PARAMETER DurationSeconds
|
||||
Steady-state measurement window per client.
|
||||
|
||||
.PARAMETER WarmupSeconds
|
||||
Warm-up window per client (calls during this window are discarded).
|
||||
|
||||
.PARAMETER BulkSize
|
||||
Number of tags per ReadBulk call.
|
||||
|
||||
.PARAMETER TagStart
|
||||
First machine number per client. Each client uses a contiguous range starting
|
||||
here, so machine ranges do not overlap when -DistinctTags is set.
|
||||
|
||||
.PARAMETER TagPrefix
|
||||
Tag prefix (machine number is appended as %03d).
|
||||
|
||||
.PARAMETER TagAttribute
|
||||
Attribute appended to each tag.
|
||||
|
||||
.PARAMETER DistinctTags
|
||||
When set, each client uses its own slice of tags (clients[i] starts at
|
||||
TagStart + i * BulkSize). When unset (default), all clients hit the same
|
||||
tags to maximise contention on the worker's value cache.
|
||||
|
||||
.PARAMETER ReportPath
|
||||
Where to persist the combined report. Defaults to artifacts/bench/...
|
||||
#>
|
||||
[CmdletBinding()]
|
||||
param(
|
||||
[string[]]$Clients = @("dotnet", "go", "rust", "python", "java"),
|
||||
[string]$Endpoint = "localhost:5120",
|
||||
[string]$ApiKeyEnv = "MXGATEWAY_API_KEY",
|
||||
[int]$DurationSeconds = 30,
|
||||
[int]$WarmupSeconds = 3,
|
||||
[int]$BulkSize = 6,
|
||||
[int]$TagStart = 1,
|
||||
[string]$TagPrefix = "TestMachine_",
|
||||
[string]$TagAttribute = "TestChangingInt",
|
||||
[int]$TimeoutMs = 1500,
|
||||
[switch]$DistinctTags,
|
||||
[string]$ReportPath
|
||||
)
|
||||
|
||||
Set-StrictMode -Version Latest
|
||||
$ErrorActionPreference = "Stop"
|
||||
|
||||
$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..")
|
||||
$validClients = @("dotnet", "go", "rust", "python", "java")
|
||||
foreach ($c in $Clients) {
|
||||
if ($validClients -notcontains $c) {
|
||||
throw "Unsupported client '$c'. Valid: $($validClients -join ', ')."
|
||||
}
|
||||
}
|
||||
|
||||
if ([string]::IsNullOrWhiteSpace($ReportPath)) {
|
||||
$timestamp = Get-Date -Format "yyyyMMdd-HHmmss"
|
||||
$ReportPath = Join-Path $repoRoot "artifacts/bench/bench-read-bulk-$timestamp.json"
|
||||
}
|
||||
$reportDir = Split-Path -Parent $ReportPath
|
||||
if (-not (Test-Path $reportDir)) {
|
||||
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
|
||||
}
|
||||
|
||||
$apiKeyValue = (Get-Item -Path "Env:$ApiKeyEnv" -ErrorAction SilentlyContinue).Value
|
||||
if ([string]::IsNullOrWhiteSpace($apiKeyValue)) {
|
||||
throw "The API key environment variable '$ApiKeyEnv' is not set. Define it before running the bench."
|
||||
}
|
||||
|
||||
# Temp dir for per-client stdout/stderr capture + (Java only) a one-shot
|
||||
# wrapper .bat that handles cmd.exe's quoting rules for `gradle --args="..."`.
|
||||
$tmpDir = Join-Path ([System.IO.Path]::GetTempPath()) "mxgw-bench-$([guid]::NewGuid())"
|
||||
New-Item -ItemType Directory -Path $tmpDir -Force | Out-Null
|
||||
|
||||
function ConvertTo-HttpEndpoint {
|
||||
param([string]$Value)
|
||||
if ($Value -match '^https?://') { return $Value }
|
||||
return "http://$Value"
|
||||
}
|
||||
|
||||
function ConvertTo-HostEndpoint {
|
||||
param([string]$Value)
|
||||
return ($Value -replace '^https?://', '')
|
||||
}
|
||||
|
||||
# Build the per-client command array. Each client gets its own tag range when
|
||||
# -DistinctTags is set so the workers race against distinct cache slices.
|
||||
function Get-ClientCommand {
|
||||
param(
|
||||
[string]$Client,
|
||||
[int]$ClientIndex
|
||||
)
|
||||
|
||||
$effectiveTagStart = if ($DistinctTags) { $TagStart + ($ClientIndex * $BulkSize) } else { $TagStart }
|
||||
$httpEndpoint = ConvertTo-HttpEndpoint -Value $Endpoint
|
||||
$hostEndpoint = ConvertTo-HostEndpoint -Value $Endpoint
|
||||
$clientName = "mxgw-$Client-bench"
|
||||
|
||||
# Per-call gRPC timeout must exceed (DurationSeconds + WarmupSeconds + slack)
|
||||
# — otherwise the channel-wide timeout cancels the bench mid-loop.
|
||||
$callTimeoutSeconds = [int]([Math]::Max(60, $DurationSeconds + $WarmupSeconds + 30))
|
||||
|
||||
switch ($Client) {
|
||||
"dotnet" {
|
||||
# -c Release matches the rest of the matrix: HotSpot/Tier 1 JIT
|
||||
# closes most of the debug/release gap for .NET on its own, but
|
||||
# Release also disables JIT inline thresholds that hurt Stopwatch-
|
||||
# bracketed measurement noise. Same project must have been built
|
||||
# in Release at least once before this --no-build invocation.
|
||||
$cliArgs = @(
|
||||
"run", "--project", "clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli", "-c", "Release", "--no-build", "--",
|
||||
"bench-read-bulk",
|
||||
"--endpoint", $httpEndpoint,
|
||||
"--api-key-env", $ApiKeyEnv,
|
||||
"--timeout", "${callTimeoutSeconds}s",
|
||||
"--client-name", $clientName,
|
||||
"--duration-seconds", "$DurationSeconds",
|
||||
"--warmup-seconds", "$WarmupSeconds",
|
||||
"--bulk-size", "$BulkSize",
|
||||
"--tag-start", "$effectiveTagStart",
|
||||
"--tag-prefix", $TagPrefix,
|
||||
"--tag-attribute", $TagAttribute,
|
||||
"--timeout-ms", "$TimeoutMs",
|
||||
"--json"
|
||||
)
|
||||
return [pscustomobject]@{ file = "dotnet"; args = $cliArgs; cwd = $repoRoot }
|
||||
}
|
||||
"go" {
|
||||
$cliArgs = @(
|
||||
"run", "./cmd/mxgw-go", "bench-read-bulk",
|
||||
"-endpoint", $hostEndpoint,
|
||||
"-api-key-env", $ApiKeyEnv,
|
||||
"-plaintext",
|
||||
"-json",
|
||||
"-client-name", $clientName,
|
||||
"-duration-seconds", "$DurationSeconds",
|
||||
"-warmup-seconds", "$WarmupSeconds",
|
||||
"-bulk-size", "$BulkSize",
|
||||
"-tag-start", "$effectiveTagStart",
|
||||
"-tag-prefix", $TagPrefix,
|
||||
"-tag-attribute", $TagAttribute,
|
||||
"-timeout-ms", "$TimeoutMs"
|
||||
)
|
||||
return [pscustomobject]@{ file = "go"; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/go") }
|
||||
}
|
||||
"rust" {
|
||||
# --release is essential: Rust debug builds disable inlining and
|
||||
# add overflow checks, which costs the bench ~45% of throughput
|
||||
# and ~3x of p99 latency vs release. The other compiled clients
|
||||
# don't have this gap (go run is optimized, dotnet/java run JIT-
|
||||
# optimized after the warmup window).
|
||||
$cliArgs = @(
|
||||
"run", "--release", "--quiet", "-p", "mxgw-cli", "--",
|
||||
"bench-read-bulk",
|
||||
"--endpoint", $httpEndpoint,
|
||||
"--api-key-env", $ApiKeyEnv,
|
||||
"--client-name", $clientName,
|
||||
"--duration-seconds", "$DurationSeconds",
|
||||
"--warmup-seconds", "$WarmupSeconds",
|
||||
"--bulk-size", "$BulkSize",
|
||||
"--tag-start", "$effectiveTagStart",
|
||||
"--tag-prefix", $TagPrefix,
|
||||
"--tag-attribute", $TagAttribute,
|
||||
"--timeout-ms", "$TimeoutMs",
|
||||
"--json"
|
||||
)
|
||||
return [pscustomobject]@{ file = "cargo"; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/rust") }
|
||||
}
|
||||
"python" {
|
||||
$cliArgs = @(
|
||||
"-m", "mxgateway_cli", "bench-read-bulk",
|
||||
"--endpoint", $hostEndpoint,
|
||||
"--api-key-env", $ApiKeyEnv,
|
||||
"--plaintext",
|
||||
"--client-name", $clientName,
|
||||
"--duration-seconds", "$DurationSeconds",
|
||||
"--warmup-seconds", "$WarmupSeconds",
|
||||
"--bulk-size", "$BulkSize",
|
||||
"--tag-start", "$effectiveTagStart",
|
||||
"--tag-prefix", $TagPrefix,
|
||||
"--tag-attribute", $TagAttribute,
|
||||
"--timeout-ms", "$TimeoutMs",
|
||||
"--json"
|
||||
)
|
||||
$python = 'C:\Users\dohertj2\AppData\Local\Programs\Python\Python312\python.exe'
|
||||
return [pscustomobject]@{ file = $python; args = $cliArgs; cwd = (Join-Path $repoRoot "clients/python"); pythonpath = (Join-Path $repoRoot "clients/python/src") }
|
||||
}
|
||||
"java" {
|
||||
$inner = @(
|
||||
"bench-read-bulk",
|
||||
"--endpoint", $hostEndpoint,
|
||||
"--api-key-env", $ApiKeyEnv,
|
||||
"--plaintext",
|
||||
"--json",
|
||||
"--client-name", $clientName,
|
||||
"--duration-seconds", "$DurationSeconds",
|
||||
"--warmup-seconds", "$WarmupSeconds",
|
||||
"--bulk-size", "$BulkSize",
|
||||
"--tag-start", "$effectiveTagStart",
|
||||
"--tag-prefix", $TagPrefix,
|
||||
"--tag-attribute", $TagAttribute,
|
||||
"--timeout-ms", "$TimeoutMs"
|
||||
)
|
||||
$gradle = (Get-Command "gradle.bat", "gradle.cmd", "gradle.exe", "gradle" -ErrorAction SilentlyContinue | Select-Object -First 1)
|
||||
if ($null -eq $gradle) { throw "gradle not on PATH; required for the Java bench." }
|
||||
# Start-Process with ArgumentList mangles the `--args="..."` quoting
|
||||
# cmd.exe needs to keep the whole bench-args expression as a single
|
||||
# gradle argument. Workaround: write a one-shot .bat that contains
|
||||
# the literal gradle command line and invoke that batch via cmd.
|
||||
$batPath = Join-Path $tmpDir "java-bench.bat"
|
||||
$batContent = '@echo off' + "`r`n" +
|
||||
'"' + $gradle.Source + '" --quiet :mxgateway-cli:run "--args=' + ($inner -join ' ') + '"' + "`r`n"
|
||||
Set-Content -Path $batPath -Value $batContent -Encoding ASCII
|
||||
return [pscustomobject]@{ file = "cmd.exe"; args = @("/c", $batPath); cwd = (Join-Path $repoRoot "clients/java") }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Start one detached process per client and wait for all. Stdout (the JSON
|
||||
# stats line) is captured to a per-client tmp file; stderr is captured too in
|
||||
# case a bench crashed.
|
||||
$jobs = @()
|
||||
|
||||
Write-Host "Launching $($Clients.Count) concurrent benches against $Endpoint (duration=$($DurationSeconds)s, warmup=$($WarmupSeconds)s, bulkSize=$BulkSize, distinctTags=$([bool]$DistinctTags))"
|
||||
|
||||
for ($i = 0; $i -lt $Clients.Count; $i++) {
|
||||
$client = $Clients[$i]
|
||||
$cmd = Get-ClientCommand -Client $client -ClientIndex $i
|
||||
$stdoutPath = Join-Path $tmpDir "$client.out"
|
||||
$stderrPath = Join-Path $tmpDir "$client.err"
|
||||
$startArgs = @{
|
||||
FilePath = $cmd.file
|
||||
ArgumentList = $cmd.args
|
||||
WorkingDirectory = $cmd.cwd
|
||||
RedirectStandardOutput = $stdoutPath
|
||||
RedirectStandardError = $stderrPath
|
||||
NoNewWindow = $true
|
||||
PassThru = $true
|
||||
}
|
||||
if ($cmd.PSObject.Properties['pythonpath']) {
|
||||
# Python needs PYTHONPATH so the editable mxgateway_cli module resolves.
|
||||
$env:PYTHONPATH = $cmd.pythonpath
|
||||
}
|
||||
$process = Start-Process @startArgs
|
||||
$jobs += [pscustomobject]@{ client = $client; process = $process; stdoutPath = $stdoutPath; stderrPath = $stderrPath }
|
||||
Write-Host " [$client] pid=$($process.Id)"
|
||||
}
|
||||
|
||||
foreach ($job in $jobs) {
|
||||
$job.process.WaitForExit()
|
||||
}
|
||||
|
||||
# Parse one JSON line per client. The line is typically the last
|
||||
# `{`-prefixed line in stdout (gradle, dotnet run, cargo run can emit log
|
||||
# noise before it).
|
||||
function Get-JsonStats {
|
||||
param([string]$Path)
|
||||
if (-not (Test-Path $Path)) { return $null }
|
||||
$content = Get-Content -Path $Path -Raw
|
||||
if ([string]::IsNullOrWhiteSpace($content)) { return $null }
|
||||
|
||||
# Scan from the LAST top-level `{` (the bench JSON is the final structured
|
||||
# output line; earlier text may be log noise from `dotnet run` / `cargo
|
||||
# run` / `gradle :run`). Walk forward counting braces to locate the
|
||||
# matching `}` so nested objects like `latencyMs` don't confuse the parser.
|
||||
$startIndex = -1
|
||||
$depth = 0
|
||||
for ($i = $content.Length - 1; $i -ge 0; $i--) {
|
||||
$ch = $content[$i]
|
||||
if ($ch -eq '}') { $depth++ }
|
||||
elseif ($ch -eq '{') {
|
||||
$depth--
|
||||
if ($depth -eq 0) { $startIndex = $i; break }
|
||||
}
|
||||
}
|
||||
if ($startIndex -lt 0) { return $null }
|
||||
|
||||
$endIndex = -1
|
||||
$depth = 0
|
||||
for ($i = $startIndex; $i -lt $content.Length; $i++) {
|
||||
$ch = $content[$i]
|
||||
if ($ch -eq '{') { $depth++ }
|
||||
elseif ($ch -eq '}') {
|
||||
$depth--
|
||||
if ($depth -eq 0) { $endIndex = $i; break }
|
||||
}
|
||||
}
|
||||
if ($endIndex -lt 0) { return $null }
|
||||
|
||||
$json = $content.Substring($startIndex, $endIndex - $startIndex + 1)
|
||||
try { return $json | ConvertFrom-Json }
|
||||
catch { return $null }
|
||||
}
|
||||
|
||||
$results = @()
|
||||
foreach ($job in $jobs) {
|
||||
$stats = Get-JsonStats -Path $job.stdoutPath
|
||||
if ($null -eq $stats) {
|
||||
$stderr = if (Test-Path $job.stderrPath) { (Get-Content -Path $job.stderrPath -Raw) } else { "" }
|
||||
Write-Warning "[$($job.client)] no JSON stats parsed; exit=$($job.process.ExitCode); stderr=$([string]::IsNullOrWhiteSpace($stderr) ? '(empty)' : $stderr.Substring(0, [Math]::Min(300, $stderr.Length)))"
|
||||
$results += [pscustomobject]@{ client = $job.client; exitCode = $job.process.ExitCode; stats = $null; stderr = $stderr }
|
||||
} else {
|
||||
$results += [pscustomobject]@{ client = $job.client; exitCode = $job.process.ExitCode; stats = $stats; stderr = $null }
|
||||
}
|
||||
}
|
||||
|
||||
# Pretty-print a side-by-side table.
|
||||
$rows = foreach ($r in $results) {
|
||||
if ($null -eq $r.stats) {
|
||||
[pscustomobject]@{
|
||||
client = $r.client
|
||||
"calls/sec" = "ERR"
|
||||
"total" = "-"
|
||||
"ok" = "-"
|
||||
"fail" = "-"
|
||||
"cached/total" = "-"
|
||||
"p50 ms" = "-"
|
||||
"p95 ms" = "-"
|
||||
"p99 ms" = "-"
|
||||
"max ms" = "-"
|
||||
"mean ms" = "-"
|
||||
}
|
||||
} else {
|
||||
$s = $r.stats
|
||||
[pscustomobject]@{
|
||||
client = $s.language
|
||||
"calls/sec" = $s.callsPerSecond
|
||||
"total" = $s.totalCalls
|
||||
"ok" = $s.successfulCalls
|
||||
"fail" = $s.failedCalls
|
||||
"cached/total" = "$($s.cachedReadResults)/$($s.totalReadResults)"
|
||||
"p50 ms" = $s.latencyMs.p50
|
||||
"p95 ms" = $s.latencyMs.p95
|
||||
"p99 ms" = $s.latencyMs.p99
|
||||
"max ms" = $s.latencyMs.max
|
||||
"mean ms" = $s.latencyMs.mean
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
$rows | Format-Table -AutoSize | Out-Host
|
||||
|
||||
$report = [pscustomobject]@{
|
||||
schemaVersion = 1
|
||||
endpoint = $Endpoint
|
||||
apiKeyEnv = $ApiKeyEnv
|
||||
durationSeconds = $DurationSeconds
|
||||
warmupSeconds = $WarmupSeconds
|
||||
bulkSize = $BulkSize
|
||||
distinctTags = [bool]$DistinctTags
|
||||
tagPrefix = $TagPrefix
|
||||
tagAttribute = $TagAttribute
|
||||
startedAt = (Get-Date).ToUniversalTime().ToString("o")
|
||||
clients = $results | ForEach-Object {
|
||||
[ordered]@{
|
||||
client = $_.client
|
||||
exitCode = $_.exitCode
|
||||
stats = $_.stats
|
||||
}
|
||||
}
|
||||
}
|
||||
$report | ConvertTo-Json -Depth 12 | Set-Content -Path $ReportPath -Encoding UTF8
|
||||
Write-Host "Combined report written to: $ReportPath"
|
||||
|
||||
Remove-Item -Path $tmpDir -Recurse -Force -ErrorAction SilentlyContinue
|
||||
@@ -0,0 +1,20 @@
|
||||
# Verifies code-reviews/README.md is regenerated from, and consistent with, the
|
||||
# per-module findings.md files. Intended as a CI / pre-commit gate.
|
||||
#
|
||||
# Exits non-zero when README.md is stale, when a module header's "Open findings"
|
||||
# count disagrees with its finding statuses, or when a finding carries an
|
||||
# unrecognised Status value. See REVIEW-PROCESS.md section 5.
|
||||
|
||||
[CmdletBinding()]
|
||||
param()
|
||||
|
||||
Set-StrictMode -Version Latest
|
||||
$ErrorActionPreference = "Stop"
|
||||
|
||||
$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..")
|
||||
$script = Join-Path $repoRoot "code-reviews/regen-readme.py"
|
||||
|
||||
# The bare `python3` alias on this platform resolves to the Windows Store stub;
|
||||
# `python` is the real interpreter.
|
||||
& python $script --check
|
||||
exit $LASTEXITCODE
|
||||
Reference in New Issue
Block a user