docs: TLS auto-cert and lenient client trust
@@ -229,6 +229,185 @@ behavior.
The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
`StreamAlarms` are session-less RPCs served by the monitor.

## Host Endpoints and Transport Security (Kestrel)

The listening endpoints are **not** part of the `MxGateway` section. The gateway
uses the stock ASP.NET Core host (`WebApplication.CreateBuilder`) with no
`ConfigureKestrel` call in code, so endpoints come entirely from the standard
`Kestrel` configuration section. On the deployed hosts these values are supplied
as NSSM environment variables (`Kestrel__Endpoints__...`), not from
`appsettings.json`.

Two named endpoints are bound:

| Endpoint name | Purpose | Protocol requirement |
|---|---|---|
| `Http` | Public gRPC API (sessions, invoke, events, Galaxy browse) | HTTP/2 |
| `Dashboard` | Blazor dashboard and SignalR hubs | HTTP/1.1 (HTTP/2 optional) |

Both endpoints share one routing pipeline; the names only select which TCP port
serves which traffic. The gRPC endpoint must negotiate **HTTP/2**, which drives
the protocol settings below.
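
For readers unfamiliar with the `Kestrel__Endpoints__...` form: the ASP.NET Core
environment-variable configuration provider treats a double underscore as the
`:` section separator. The Python sketch below only models that naming rule for
illustration; `env_to_config_key` and `parse_env_lines` are hypothetical helpers,
not part of the gateway.

```python
def env_to_config_key(name: str) -> str:
    """Map an environment-variable name to its configuration key.

    Models the ASP.NET Core convention that "__" in an environment
    variable name stands for the ":" hierarchy separator.
    """
    return name.replace("__", ":")


def parse_env_lines(lines):
    """Turn NAME=value lines into a {config key: value} dict."""
    settings = {}
    for line in lines:
        # Split on the first "=" only, so values may contain "=" themselves.
        name, _, value = line.partition("=")
        settings[env_to_config_key(name.strip())] = value.strip()
    return settings
```

Under this rule `Kestrel__Endpoints__Http__Url` addresses the same setting as
`Kestrel:Endpoints:Http:Url` in `appsettings.json`.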

### Plaintext (current deployments)

Both running hosts (`10.100.0.48` and `wonder-app-vd03`) serve the gRPC port in
**cleartext HTTP/2 (`h2c`)**. A cleartext connection has no TLS handshake and
therefore no ALPN to negotiate the protocol, so the gRPC endpoint must be pinned
to `Http2`; clients then speak HTTP/2 with prior knowledge:

```text
Kestrel__Endpoints__Http__Url=http://0.0.0.0:5120
Kestrel__Endpoints__Http__Protocols=Http2
Kestrel__Endpoints__Dashboard__Url=http://0.0.0.0:5130
```

In this mode all client↔gateway traffic, including the
`authorization: Bearer mxgw_...` API key and any `WriteSecured` / `AuthenticateUser`
payloads, crosses the network **unencrypted**. This is acceptable only on a
trusted/isolated network segment. Prefer TLS for anything else.
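
The pinning rule for the cleartext gRPC endpoint is easy to check mechanically
before deploying. The following is a minimal Python sketch of such a check,
assuming the endpoint settings have already been read into a plain dictionary;
`cleartext_grpc_problems` is a hypothetical helper and not part of the gateway.

```python
from urllib.parse import urlsplit


def cleartext_grpc_problems(endpoints: dict, grpc_endpoint: str = "Http") -> list:
    """Return human-readable problems with the gRPC endpoint's protocol pin.

    `endpoints` maps endpoint name -> {"Url": ..., "Protocols": ...}.
    """
    problems = []
    endpoint = endpoints.get(grpc_endpoint)
    if endpoint is None:
        return [f"endpoint '{grpc_endpoint}' is not configured"]
    scheme = urlsplit(endpoint.get("Url", "")).scheme.lower()
    protocols = endpoint.get("Protocols")
    # Without TLS there is no ALPN, so HTTP/2 must be pinned explicitly.
    if scheme == "http" and protocols != "Http2":
        problems.append(
            f"'{grpc_endpoint}' is cleartext but Protocols is {protocols!r}; "
            "gRPC needs Protocols=Http2"
        )
    return problems
```

An empty list means the `Http` endpoint is either served over `https://` or
correctly pinned to `Http2`.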

### TLS

To encrypt the gRPC channel, give the `Http` endpoint an `https://` URL and a
certificate. Over TLS, ALPN negotiates HTTP/2, so the explicit `Protocols=Http2`
pin is no longer required (the default `Http1AndHttp2` works for gRPC over TLS).

`appsettings.json` form:

```json
{
  "Kestrel": {
    "Endpoints": {
      "Http": {
        "Url": "https://0.0.0.0:5120",
        "Certificate": {
          "Path": "C:\\ProgramData\\MxGateway\\certs\\gateway.pfx",
          "Password": "<pfx-password>"
        }
      },
      "Dashboard": {
        "Url": "https://0.0.0.0:5130",
        "Certificate": {
          "Path": "C:\\ProgramData\\MxGateway\\certs\\gateway.pfx",
          "Password": "<pfx-password>"
        }
      }
    }
  }
}
```

Equivalent NSSM environment-variable form (how config is delivered on the hosts;
see [server deploy mechanics in the project notes]):

```text
Kestrel__Endpoints__Http__Url=https://0.0.0.0:5120
Kestrel__Endpoints__Http__Certificate__Path=C:\ProgramData\MxGateway\certs\gateway.pfx
Kestrel__Endpoints__Http__Certificate__Password=<pfx-password>
Kestrel__Endpoints__Dashboard__Url=https://0.0.0.0:5130
Kestrel__Endpoints__Dashboard__Certificate__Path=C:\ProgramData\MxGateway\certs\gateway.pfx
Kestrel__Endpoints__Dashboard__Certificate__Password=<pfx-password>
```
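
The two forms above carry the same settings, and one can be derived from the
other. As an illustration only, this Python sketch flattens a nested
`appsettings.json` fragment into `__`-joined variable names; `flatten_to_env` is
a hypothetical helper, and it handles nested objects and scalar values but not
JSON arrays.

```python
import json


def flatten_to_env(config: dict, prefix: str = "") -> dict:
    """Flatten nested configuration into NAME -> value environment variables,
    joining section names with "__"."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_to_env(value, name))
        else:
            flat[name] = str(value)
    return flat


# A trimmed copy of the JSON form above (only the Http endpoint).
appsettings = json.loads(r"""
{
  "Kestrel": {
    "Endpoints": {
      "Http": {
        "Url": "https://0.0.0.0:5120",
        "Certificate": {
          "Path": "C:\\ProgramData\\MxGateway\\certs\\gateway.pfx",
          "Password": "<pfx-password>"
        }
      }
    }
  }
}
""")

for name, value in flatten_to_env(appsettings).items():
    print(f"{name}={value}")
```

Note that the doubled backslashes are JSON escaping only; the environment
variable carries the path with single backslashes.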

Certificate sourcing options (any standard ASP.NET Core form is accepted):

| Form | Keys |
|---|---|
| PFX file | `Certificate:Path` (+ `Certificate:Password` if encrypted) |
| PEM pair | `Certificate:Path` (cert) + `Certificate:KeyPath` (private key) |
| Windows cert store | `Certificate:Subject`, `Certificate:Store` (e.g. `My`), `Certificate:Location` (`LocalMachine`), `Certificate:AllowInvalid` |

The certificate's CN/SAN must cover the host name clients dial (or clients must
set a server-name override; see "Client side" below). The dashboard endpoint can
keep its own certificate independent of the gRPC endpoint. When the dashboard is
served over HTTPS in production, also set
`MxGateway:Dashboard:RequireHttpsCookie` to `true`.

### Automatic self-signed certificate

`mxaccessgw` is an internal tool with no PKI to issue certificates, so requiring
an operator to supply one before TLS works pushed deployments toward plaintext.
To avoid that, the gateway fills in a self-signed certificate when an HTTPS
endpoint is configured without one.

**Trigger.** At startup the gateway inspects `Kestrel:Endpoints:*`. If any
endpoint has an `https://` URL and no `Certificate` subsection of its own, and no
`Kestrel:Certificates:Default` is set, the gateway generates (or loads) a
persisted self-signed certificate and wires it in as the HTTPS *default* via
`ConfigureHttpsDefaults`. All-plaintext deployments are untouched: when no HTTPS
endpoint is configured, no certificate or key material is generated or written.
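
The trigger rule reduces to a small predicate over the `Kestrel` section. The
sketch below restates it in Python for illustration, assuming the section has
been loaded into nested dictionaries; `needs_self_signed_default` is a
hypothetical name, and the gateway's actual implementation may differ in detail.

```python
def needs_self_signed_default(kestrel: dict) -> bool:
    """Decide whether the auto-generated certificate should be installed
    as the HTTPS default, following the trigger rule described above."""
    # A global default certificate means the operator already supplied one.
    if kestrel.get("Certificates", {}).get("Default"):
        return False
    for endpoint in kestrel.get("Endpoints", {}).values():
        is_https = str(endpoint.get("Url", "")).lower().startswith("https://")
        has_own_cert = bool(endpoint.get("Certificate"))
        if is_https and not has_own_cert:
            return True
    # All-plaintext, or every HTTPS endpoint brings its own certificate.
    return False
```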

**Generated certificate.** ECDSA P-256, `serverAuth` EKU, validity of about
`ValidityYears` (default 10 years), with `notBefore` backdated one day as
clock-skew slack. SANs cover `localhost`, the machine name (and its FQDN when
resolvable), each entry in `AdditionalDnsNames`, and the loopback addresses
`127.0.0.1` and `::1`.
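
To make the SAN set and validity window concrete, here is a minimal Python
sketch that computes both without generating a certificate. The function names
are hypothetical, and two details are assumptions rather than documented
behavior: that duplicate names are dropped case-insensitively, and that
`notAfter` is counted from the generation time rather than from `notBefore`.

```python
from datetime import datetime, timedelta, timezone


def build_san_entries(machine_name, fqdn, additional_dns_names):
    """Return (dns_names, ip_addresses) without duplicates, in stable order."""
    dns_names = []
    for name in ["localhost", machine_name, fqdn, *additional_dns_names]:
        # fqdn may be None or empty when it cannot be resolved.
        if name and name.lower() not in (n.lower() for n in dns_names):
            dns_names.append(name)
    return dns_names, ["127.0.0.1", "::1"]


def validity_window(now, validity_years):
    """notBefore is backdated one day; notAfter is `validity_years` after now."""
    not_before = now - timedelta(days=1)
    try:
        not_after = now.replace(year=now.year + validity_years)
    except ValueError:
        # now is Feb 29 and the target year is not a leap year.
        not_after = now.replace(year=now.year + validity_years, month=2, day=28)
    return not_before, not_after
```

A caller would typically pass `socket.gethostname()` and `socket.getfqdn()` for
the first two arguments and `datetime.now(timezone.utc)` for `now`.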

**`MxGateway:Tls:*` options.** All optional; the zero-config path needs none of
them.

| Option | Default | Purpose |
|---|---|---|
| `Tls:SelfSignedCertPath` | `C:\ProgramData\MxGateway\certs\gateway-selfsigned.pfx` | Where the generated certificate is persisted |
| `Tls:ValidityYears` | `10` | Lifetime of the generated certificate (validated 1–100) |
| `Tls:AdditionalDnsNames` | `[]` | Extra DNS SANs (e.g. a load-balancer name) |
| `Tls:RegenerateIfExpired` | `true` | Replace an expired persisted certificate instead of failing |

`ValidityYears` is validated by `GatewayOptionsValidator` (range 1–100); the
"HTTPS endpoint configured but no certificate available" fail-fast check lives in
the bootstrap/provider, because the validator only sees the `MxGateway` section,
not `Kestrel:Endpoints`.
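
As a compact restatement of the defaults and the range check, the following
Python sketch mirrors the option table. It is illustrative only: `TlsOptions`
and its `validate` method are hypothetical stand-ins, not the gateway's
`GatewayOptionsValidator`, and the error wording is invented.

```python
from dataclasses import dataclass, field


@dataclass
class TlsOptions:
    """Illustrative mirror of the `MxGateway:Tls` defaults in the table above."""
    self_signed_cert_path: str = r"C:\ProgramData\MxGateway\certs\gateway-selfsigned.pfx"
    validity_years: int = 10
    additional_dns_names: list = field(default_factory=list)
    regenerate_if_expired: bool = True

    def validate(self) -> list:
        """Return validation errors; an empty list means the options are valid."""
        errors = []
        if not 1 <= self.validity_years <= 100:
            errors.append("Tls:ValidityYears must be between 1 and 100")
        return errors
```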

**Persistence.** The PFX is written with an **empty** export password, because a
random in-memory password could not be reused across restarts, which the
persist-and-reuse model requires. The private key is instead protected at rest by
filesystem permissions: a restrictive ACL on Windows (SYSTEM + Administrators,
inherited ACEs stripped) on the `certs` directory and file, and mode `0600` on
non-Windows. The write is atomic (hardened temp file, then move). The persisted
certificate is reused across restarts (stable thumbprint, so CA-pinning clients
keep working) and regenerated only when it is missing, expired (and
`RegenerateIfExpired` is `true`), or unreadable/corrupt. If the directory is not
writable or the ACL cannot be applied, the gateway fails fast with a diagnostic
naming the path rather than falling back to an in-memory certificate.
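
The "hardened temp file, then move" pattern and the reuse rule can be sketched
as follows. This is a minimal Python illustration of the non-Windows path only;
it does not model the Windows ACL, and both function names are hypothetical.

```python
import os
import tempfile


def write_private_file_atomically(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by the current user
    # (mode 0600 on POSIX) in the destination directory, so the final rename
    # stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".cert-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, path)  # atomic rename over any existing file
    except BaseException:
        # Do not leave key material behind in a stray temp file.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def should_regenerate(exists: bool, readable: bool, expired: bool,
                      regenerate_if_expired: bool = True) -> bool:
    """Missing or unreadable certificates are always replaced; expired ones
    only when `regenerate_if_expired` is set."""
    if not exists or not readable:
        return True
    return expired and regenerate_if_expired
```

If the directory is missing or not writable, `mkstemp` raises before anything is
written, which matches the fail-fast behavior described above. When
`should_regenerate` returns `False` for an expired certificate, the caller is
expected to fail rather than serve it.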

**Logging.** On generate or load, the gateway logs the certificate thumbprint,
SAN list, and `notAfter` at Information. The PFX bytes, export password, and
private key are never logged.

**Operator override.** The generated certificate is only the HTTPS *default*. To
use a real certificate, configure one explicitly, either per endpoint via
`Kestrel:Endpoints:<name>:Certificate` (`Path`, `KeyPath`, `Subject`, etc., as
in the table above) or globally via `Kestrel:Certificates:Default`. An
explicitly configured certificate takes precedence, and the gateway then writes
no self-signed material.

### Client side

Each official client opts into TLS explicitly. For the .NET client
(`MxGatewayClientOptions`):

| Option | Effect |
|---|---|
| `UseTls` (default `false`) | Enables TLS. Requires an `https://` endpoint; an `https://` endpoint without `UseTls` fails validation, and vice versa. |
| `CaCertificatePath` | Pins a custom root (self-signed / private CA) using `CustomRootTrust` chain validation instead of the OS trust store; the .NET client also enforces the certificate hostname/SAN match on this path. |
| `RequireCertificateValidation` (default `false`) | Forces OS/system-trust verification on a TLS connection with no pinned CA. Leave `false` for the lenient default. |
| `ServerNameOverride` | SNI / certificate host name override when the dialed host differs from the certificate CN/SAN. |

To pair with the auto-generated self-signed certificate above, the clients are
**lenient by default**: a TLS connection with no pinned CA accepts whatever
certificate the gateway presents. That encrypts the channel but does not
authenticate the gateway. Pin `CaCertificatePath` to verify, or set
`RequireCertificateValidation` to force system-trust verification without
pinning. The other language clients expose the equivalent options; the exact
behavior differs per stack: Python uses trust-on-first-use and Rust is pin-only.
See each client README for the as-built behavior.
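
The option table for the .NET client implies a precedence order between the
trust settings. The Python sketch below models that decision for illustration
only; the class, field names, and mode labels are hypothetical, and it does not
describe the Python client, whose behavior differs as noted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientTlsOptions:
    endpoint: str
    use_tls: bool = False
    ca_certificate_path: Optional[str] = None
    require_certificate_validation: bool = False


def trust_mode(options: ClientTlsOptions) -> str:
    """Return "plaintext", "pinned-ca", "system-trust", or "lenient"."""
    is_https = options.endpoint.lower().startswith("https://")
    # The scheme and UseTls must agree in both directions.
    if is_https != options.use_tls:
        raise ValueError("UseTls must be set exactly when the endpoint is https://")
    if not options.use_tls:
        return "plaintext"
    if options.ca_certificate_path:
        return "pinned-ca"      # custom root plus hostname/SAN match
    if options.require_certificate_validation:
        return "system-trust"   # OS trust store
    return "lenient"            # accepts whatever certificate is presented
```

A pinned CA is checked first because `RequireCertificateValidation` is
documented as applying only to connections with no pinned CA.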

### Gateway↔worker IPC

Transport security here applies only to the public gRPC channel. The
gateway↔worker link is a per-session **named pipe**
(`mxaccess-gateway-{gatewayPid}-{sessionId}`), not a network socket. It is not
TLS-encrypted and does not need to be: it never leaves the local Windows host and
is secured by the OS pipe ACL. See [Worker Frame Protocol](./WorkerFrameProtocol.md).

## Related Documentation

- [Gateway Process Detailed Design](./GatewayProcessDesign.md)