docs(audit): apply per-cluster judgment fixes across living docs

Resolve audit findings: correct WorkerEnvelope proto/route/metric/session
facts; rewrite auth (ZB.MOM.WW.Auth migration), dashboard (ZB.MOM.WW.Theme),
and StyleGuide (foreign-project copy-paste); document alarm subsystem, Ldap
options, and gateway alarm broker; fix client CLI flags and package paths.
This commit is contained in:
Joseph Doherty
2026-06-03 16:01:28 -04:00
parent f84e0c3474
commit e541339c07
29 changed files with 1102 additions and 432 deletions
+68 -50
View File
@@ -2,11 +2,13 @@
The gateway authentication subsystem verifies inbound API key credentials against a SQLite-backed key store, hashes secrets with a configurable pepper, and records administrative and verification events to an audit trail.
The peppered-HMAC API-key pipeline — token format, parsing, secret generation and hashing, constant-time comparison, the SQLite schema, the stores, the verifier, and the migrator — lives in the shared `ZB.MOM.WW.Auth.ApiKeys` package (with abstractions in `ZB.MOM.WW.Auth.Abstractions`), of which this gateway is the donor. The gateway references the package and binds the library's `ApiKeyOptions` from its own `MxGateway:Authentication` section through `AddSqliteAuthStore`, then layers the gateway-specific pieces on top: constraint enforcement, the gRPC authorization interceptor, the admin CLI, the dashboard API Keys page, and canonical audit forwarding. Types whose code is shown below for reference are owned by the shared package unless noted; the gateway does not re-implement them.
## Token Format
API keys travel in the HTTP `Authorization` header as a bearer token shaped `mxgw_<keyId>_<secret>`. The `mxgw_` prefix scopes parsing to gateway tokens, the `<keyId>` segment is the public identifier used for lookup, and `<secret>` is the high-entropy portion that the gateway verifies against a stored hash.
`ApiKeyParser` enforces the format and rejects malformed tokens before any database round-trip:
The shared library's `ApiKeyParser` enforces the format and rejects malformed tokens before any database round-trip:
```csharp
public bool TryParseAuthorizationHeader(string? authorizationHeader, out ParsedApiKey? apiKey)
@@ -50,7 +52,7 @@ public static string Generate()
### Peppered hashing
`ApiKeySecretHasher` (registered behind `IApiKeySecretHasher`) hashes secrets with `HMACSHA256` keyed by a server-side pepper. The pepper lives outside the database and is resolved by `IConfiguration` lookup against the configured `PepperSecretName`:
The shared library's `ApiKeySecretHasher` (behind `IApiKeySecretHasher`) hashes secrets with `HMACSHA256` keyed by a server-side pepper. The pepper lives outside the database and is resolved through an `IApiKeyPepperProvider` — the gateway wires the configuration-backed provider so the pepper comes from `IConfiguration` lookup against `MxGateway:ApiKeyPepper` (`PepperSecretName`):
```csharp
public byte[] HashSecret(string secret)
@@ -69,37 +71,29 @@ The pepper is intentionally not stored alongside the hash: an attacker who exfil
## Verification
`ApiKeyVerifier` (`IApiKeyVerifier`) implements the verification flow:
The shared library's `IApiKeyVerifier.VerifyAsync(authorizationHeader, cancellationToken)` owns the whole verification flow — the gateway interceptor hands it the raw `authorization` header value and never parses the token itself:
1. Parse the `Authorization` header into a `ParsedApiKey`.
2. Look up the `ApiKeyRecord` by `KeyId` through `IApiKeyStore.FindByKeyIdAsync`.
3. Reject revoked records (`RevokedUtc is not null`).
1. Parse the `Authorization` header into the key id and secret.
2. Look up the record by key id.
3. Reject revoked records.
4. Hash the presented secret with the configured pepper.
5. Compare hashes with `CryptographicOperations.FixedTimeEquals` to avoid timing oracles.
6. Record a `LastUsedUtc` timestamp via `MarkKeyUsedAsync` and return an `ApiKeyIdentity`.
6. Stamp `last_used_utc` and return an identity.
`VerifyAsync` returns an `ApiKeyVerification` value with a `Succeeded` flag and a nullable `Identity`. On failure the result is discriminated so the caller can tell parse errors, missing pepper, missing or revoked keys, and secret mismatch apart for audit detail — without leaking which check failed to the client. The gateway interceptor treats any non-success uniformly as `Unauthenticated` (see [Authorization](./Authorization.md)):
```csharp
if (!CryptographicOperations.FixedTimeEquals(presentedHash, storedKey.SecretHash))
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.SecretMismatch);
}
await keyStore.MarkKeyUsedAsync(storedKey.KeyId, DateTimeOffset.UtcNow, cancellationToken)
ApiKeyVerification verification = await apiKeyVerifier
.VerifyAsync(authorizationHeader ?? string.Empty, context.CancellationToken)
.ConfigureAwait(false);
return ApiKeyVerificationResult.Success(new ApiKeyIdentity(
KeyId: storedKey.KeyId,
KeyPrefix: storedKey.KeyPrefix,
DisplayName: storedKey.DisplayName,
Scopes: storedKey.Scopes,
Constraints: storedKey.Constraints));
if (!verification.Succeeded || verification.Identity is null)
{
throw new RpcException(new Status(StatusCode.Unauthenticated, "Missing or invalid API key."));
}
```
`ApiKeyVerificationResult` carries either an `ApiKeyIdentity` or a discriminated `ApiKeyVerificationFailure` value. The failure enum distinguishes parse errors, missing pepper, missing or revoked keys, and secret mismatch so the calling middleware can emit precise audit detail without leaking which check failed to the client.
`ApiKeyIdentity` exposes only non-secret fields (`KeyId`, `KeyPrefix`,
`DisplayName`, `Scopes`, and `Constraints`) and is the type downstream
authorization code consumes.
The shared verifier returns `ZB.MOM.WW.Auth.Abstractions.ApiKeys.ApiKeyIdentity`, which carries the persisted constraints as an opaque JSON string. The gateway's `GatewayApiKeyIdentityMapper.ToGatewayIdentity` projects it onto the gateway-local `ApiKeyIdentity` record, which exposes only non-secret fields (`KeyId`, `KeyPrefix`, `DisplayName`, `Scopes`) plus the deserialized `Constraints`, and is the type downstream authorization code consumes.
## Storage
@@ -107,7 +101,7 @@ The gateway keeps API key state in a dedicated SQLite database. SQLite is suffic
### Connection factory
`AuthSqliteConnectionFactory` reads `GatewayOptions.Authentication.SqlitePath`, ensures the parent directory exists, and builds a connection string in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning. Connection pooling is enabled and the connection string carries a non-zero `DefaultTimeout`:
The shared library's `AuthSqliteConnectionFactory` (registered by `AddZbApiKeyAuth`) reads the bound `ApiKeyOptions.SqlitePath` — which the gateway populates from `MxGateway:Authentication:SqlitePath` ensures the parent directory exists, and builds a connection string in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning. Connection pooling is enabled and the connection string carries a non-zero `DefaultTimeout`:
```csharp
SqliteConnectionStringBuilder builder = new()
@@ -119,21 +113,22 @@ SqliteConnectionStringBuilder builder = new()
};
```
Every store opens its connection through `OpenConnectionAsync`, which opens the connection and then applies `PRAGMA journal_mode=WAL` and `PRAGMA busy_timeout`. WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op; `busy_timeout` is per-connection state. Because `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial, this lets concurrent readers and writers retry briefly instead of surfacing `SQLITE_BUSY` as a hard failure on the request path.
Every store opens its connection through `OpenConnectionAsync`, which opens the connection and then applies `PRAGMA journal_mode=WAL` and `PRAGMA busy_timeout`. WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op; `busy_timeout` is per-connection state. Because `MarkKeyUsedAsync` runs on every authenticated request and the canonical audit writer appends to the same file, this lets concurrent readers and writers retry briefly instead of surfacing `SQLITE_BUSY` as a hard failure on the request path.
### Schema
`SqliteAuthSchema` declares table names and the current schema version as constants. Three tables are involved:
The shared library's `SqliteAuthSchema` declares the API-key table names and the current schema version as constants. Four tables live in the database file:
- `api_keys` stores `key_id`, `key_prefix`, the `secret_hash` blob,
`display_name`, serialized `scopes`, optional serialized `constraints`, and
the `created_utc`, `last_used_utc`, and `revoked_utc` timestamps.
- `api_key_audit` is an append-only log keyed by an autoincrement `audit_id` with `key_id`, `event_type`, `remote_address`, `created_utc`, and `details` columns.
- `api_key_audit` is the shared library's append-only audit log keyed by an autoincrement `audit_id` with `key_id`, `event_type`, `remote_address`, `created_utc`, and `details` columns. The gateway overrides the library audit store (see [Audit trail](#audit-trail)), so this table is **left in place but unused** at runtime — nothing writes to it.
- `audit_event` is the gateway-owned canonical audit table written by `SqliteCanonicalAuditStore`. It lives in the same SQLite file (reusing the library's `AuthSqliteConnectionFactory`) and is where every gateway audit event actually lands. See [Audit trail](#audit-trail).
- `schema_version` carries a single row whose `version` column is matched against `SqliteAuthSchema.CurrentVersion`.
### Read paths
`SqliteApiKeyStore` (`IApiKeyStore`) handles the two reads needed at request time: `FindByKeyIdAsync` returns any record (so revoked keys can be reported distinctly) and `FindActiveByKeyIdAsync` filters to non-revoked rows. `MarkKeyUsedAsync` updates `last_used_utc` only for non-revoked rows so a freshly revoked key cannot have its timestamp refreshed by a racing verification.
The shared library's `SqliteApiKeyStore` (`IApiKeyStore`) handles the two reads needed at request time: `FindByKeyIdAsync` returns any record (so revoked keys can be reported distinctly) and `FindActiveByKeyIdAsync` filters to non-revoked rows. `MarkKeyUsedAsync` updates `last_used_utc` only for non-revoked rows so a freshly revoked key cannot have its timestamp refreshed by a racing verification.
`ApiKeyRecord` is the in-memory projection. `ApiKeyRecordReader.Read` is shared by every read path so column ordering is defined in one place:
@@ -155,17 +150,21 @@ public static ApiKeyRecord Read(SqliteDataReader reader)
### Write paths
`SqliteApiKeyAdminStore` (`IApiKeyAdminStore`) implements administrative mutations: `CreateAsync` accepts an `ApiKeyCreateRequest`, `RevokeAsync` sets `revoked_utc` only when not already revoked, `RotateAsync` replaces `secret_hash`, clears `last_used_utc`, and clears `revoked_utc` so a rotated key is immediately usable, and `DeleteAsync` permanently removes a row but only when `revoked_utc IS NOT NULL` — active keys are untouched (returns false) so the revoke event lands in the audit log before the row disappears.
The shared library's `SqliteApiKeyAdminStore` (`IApiKeyAdminStore`) implements administrative mutations: `CreateAsync` accepts an `ApiKeyCreateRequest`, `RevokeAsync` sets `revoked_utc` only when not already revoked, `RotateAsync` replaces `secret_hash`, clears `last_used_utc`, and clears `revoked_utc` so a rotated key is immediately usable, and `DeleteAsync` permanently removes a row but only when `revoked_utc IS NOT NULL` — active keys are untouched (returns false) so the revoke event lands in the audit log before the row disappears.
Because `RotateAsync` clears `revoked_utc`, rotating a previously revoked key reactivates it. The dashboard API Keys page therefore offers the Rotate (and Revoke) actions only for keys whose status is `Active`; revoked keys instead show a Delete action that calls `DeleteAsync`, so an operator can permanently remove a revoked row without ever risking un-revocation as a side effect of a rotation.
### Audit trail
`SqliteApiKeyAuditStore` (`IApiKeyAuditStore`) appends `ApiKeyAuditEntry` values to the `api_key_audit` table and stamps each row with a UTC timestamp inside the store rather than trusting the caller. `ListRecentAsync` returns the most recent rows ordered by `audit_id` descending and projects them into `ApiKeyAuditRecord`. Rows are kept even after the referenced key is revoked because the audit history is the durable record of administrative action; the `key_id` column is nullable to accommodate non-key-scoped events such as `init-db`.
All gateway audit flows through a single canonical `AuditEvent` written to the gateway-owned `audit_event` table, not the shared library's `api_key_audit` table. The gateway adopts `ZB.MOM.WW.Audit` and **overrides** the library's `IApiKeyAuditStore` registration with `CanonicalForwardingApiKeyAuditStore`. That adapter receives each library-emitted `ApiKeyAuditEntry` — including the library-internal admin-command verbs (`create-key`, `revoke-key`, `rotate-key`, `init-db`) the gateway cannot edit — canonicalizes it onto an `AuditEvent`, and forwards it through `IAuditWriter` (`CanonicalAuditWriter`), which persists to `audit_event` via `SqliteCanonicalAuditStore`.
Because the adapter is registered after `AddZbApiKeyAuth`, it is the `IApiKeyAuditStore` that the admin commands resolve and that the dashboard "recent audit" view reads through `IApiKeyAuditStore.ListRecentAsync`. The library's own `SqliteApiKeyAuditStore` and its `api_key_audit` table are therefore unused at runtime — the override is the only writer. Audit rows are kept even after the referenced key is revoked because the audit history is the durable record of administrative action; non-key-scoped events such as `init-db` carry no key id.
This canonical-forwarding wiring lives under `src/ZB.MOM.WW.MxGateway.Server/Security/Audit/`; the audit store override and writer are gateway types, while the entry shape and admin verbs originate in the shared library.
## Migration
Schema bring-up is centralised behind `IAuthStoreMigrator`. `SqliteAuthStoreMigrator` executes the migration inside a single transaction so a partial failure leaves the database untouched, refuses to start when the on-disk schema version is newer than the binary supports, and idempotently creates the v1 schema:
Schema bring-up for the API-key tables is owned by the shared library's `SqliteAuthStoreMigrator`, wired by `AddZbApiKeyAuth` along with its migration hosted service. It executes the migration inside a single transaction so a partial failure leaves the database untouched, refuses to start when the on-disk schema version is newer than the binary supports, and idempotently creates the schema:
```csharp
if (existingVersion > SqliteAuthSchema.CurrentVersion)
@@ -179,13 +178,11 @@ await ApplyVersionOneAsync(connection, transaction, cancellationToken).Configure
await transaction.CommitAsync(cancellationToken).ConfigureAwait(false);
```
`AuthStoreMigrationHostedService` runs the migrator at startup, but only when API-key authentication is enabled and `RunMigrationsOnStartup` is true. Operators who manage schema out-of-band can disable the hosted run and use the admin CLI's `init-db` command instead.
`AuthStoreMigrationException` is a sealed `InvalidOperationException` so it can be caught precisely without swallowing unrelated failures.
The library's migration hosted service runs the migrator at startup. Operators who manage schema out-of-band can use the admin CLI's `init-db` command instead.
## Admin CLI
`ApiKeyAdminCommandLineParser.Parse` recognises a leading `apikey` argument and dispatches to one of the subcommands declared by `ApiKeyAdminCommandKind`. Each parsed invocation produces an `ApiKeyAdminCommand` (or an `ApiKeyAdminParseResult` carrying an error). `ApiKeyAdminCliRunner` then executes the command, runs the migrator first, calls the relevant store method, appends an audit row, and writes either text or JSON output via `ApiKeyAdminOutput`. The returned `ApiKeyAdminListedKey` projection deliberately omits the `secret_hash` so listing a database does not surface hash material.
`ApiKeyAdminCommandLineParser.Parse` (a gateway type) recognises a leading `apikey` argument and dispatches to one of the subcommands declared by `ApiKeyAdminCommandKind`. Each parsed invocation produces an `ApiKeyAdminCommand` (or an `ApiKeyAdminParseResult` carrying an error). The parser validates requested `--scopes` against `GatewayScopes.All` (see [Authorization](./Authorization.md#scope-catalog)) so a non-canonical scope string cannot be persisted on a key. `ApiKeyAdminCliRunner` then drives the shared library's `ApiKeyAdminCommands` — which the gateway registers over the already-wired stores, pepper provider, and migrator — to execute the command, and writes either text or JSON output via `ApiKeyAdminOutput`. The returned `ApiKeyAdminListedKey` projection deliberately omits the `secret_hash` so listing a database does not surface hash material.
The supported subcommands match `ApiKeyAdminCommandKind` exactly:
@@ -201,7 +198,7 @@ Examples:
```bash
mxgateway apikey init-db
mxgateway apikey create-key --key-id ops.alice --display-name "Alice (ops)" --scopes read,write
mxgateway apikey create-key --key-id ops.alice --display-name "Alice (ops)" --scopes invoke:read,invoke:write
mxgateway apikey create-key --key-id area1.reader --display-name "Area 1 reader" --scopes invoke:read,metadata:read --read-subtree "Area1/*" --browse-subtree "Area1/*"
mxgateway apikey list-keys --json
mxgateway apikey revoke-key --key-id ops.alice
@@ -226,7 +223,7 @@ confirmation dialog and emits its own audit event
## Scope Serialization
Scopes are persisted as a single TEXT column rather than a join table because the set is small, never queried by membership at the database level, and changes atomically with the owning row. `ApiKeyScopeSerializer.Serialize` writes a JSON array sorted with `StringComparer.Ordinal` so equivalent scope sets produce byte-identical column values, which makes audit diffing and database comparisons deterministic:
Scopes are persisted as a single TEXT column rather than a join table because the set is small, never queried by membership at the database level, and changes atomically with the owning row. The shared library's `ApiKeyScopeSerializer.Serialize` writes a JSON array sorted with `StringComparer.Ordinal` so equivalent scope sets produce byte-identical column values, which makes audit diffing and database comparisons deterministic:
```csharp
public static string Serialize(IReadOnlySet<string> scopes)
@@ -249,29 +246,50 @@ public static IReadOnlySet<string> Deserialize(string value)
`Deserialize` tolerates an empty column by returning an empty set so older rows or hand-edited records do not crash the verifier.
## Dashboard Cookie and Hub Token
The API-key model above guards the gRPC surface. Interactive dashboard requests use a separate LDAP-backed cookie scheme (see [Gateway Dashboard Design](./GatewayDashboardDesign.md)). Two timeouts and a few configuration knobs govern that cookie:
- **Cookie idle timeout — 8 hours.** `DashboardServiceCollectionExtensions` applies the shared `ZbCookieDefaults.Apply` hardened cookie defaults (HttpOnly, `SameSite=Strict`, secure policy, sliding expiration) but overrides the library's 30-minute default with an 8-hour idle timeout, so an active operator is not signed out mid-shift. The expiration is sliding, so each authenticated request resets the window.
- **Hub bearer token — 30 minutes.** SignalR hub connections cannot always carry the HttpOnly cookie (the client SignalR JS may resolve the cookie scope to loopback), so the dashboard mints a short-lived data-protected bearer at `/hubs/token` via `HubTokenService`. The token lifetime is 30 minutes; the hubs accept either it or the cookie.
- **`MxGateway:Dashboard:CookieName`** overrides the cookie name (default `MxGatewayDashboard`, from `DashboardAuthenticationDefaults.CookieName`). Two gateway instances on the same host but different ports share a cookie scope — host+path, not port — so giving each a distinct name keeps their dashboard sessions from clobbering each other. Changing it signs out existing sessions on next deploy.
- **`MxGateway:Dashboard:RequireHttpsCookie`** (default `true`) restricts the cookie to HTTPS via `CookieSecurePolicy.Always`. Set it to `false` for plain-HTTP dev so the cookie uses `SameAsRequest`; leaving it `true` while serving the dashboard over plain HTTP from a non-localhost host breaks login, because browsers drop Secure cookies set over HTTP.
The dashboard issues claims through the shared `ZB.MOM.WW.Auth.AspNetCore.ZbClaimTypes` (e.g. `ZbClaimTypes.Username` = `zb:username`, `ZbClaimTypes.Name` = `ClaimTypes.Name` so `Identity.Name` resolves, `ZbClaimTypes.Role` = `ClaimTypes.Role` so `IsInRole`/`[Authorize(Roles=...)]` work). Cookie hardening defaults come from `ZbCookieDefaults`. Both live in the shared Auth packages, not the gateway.
## Registration
`AuthStoreServiceCollectionExtensions.AddSqliteAuthStore` wires every service in this subsystem as a singleton and registers the migration hosted service:
`AuthStoreServiceCollectionExtensions.AddSqliteAuthStore` is the gateway entry point. It does not register the parser, hasher, verifier, stores, or migrator directly — those come from the shared package. Instead it delegates to the package's `AddZbApiKeyAuth` and then layers the gateway-specific audit and CLI services:
```csharp
public static IServiceCollection AddSqliteAuthStore(this IServiceCollection services)
public static IServiceCollection AddSqliteAuthStore(
this IServiceCollection services,
IConfiguration configuration)
{
services.AddSingleton<IApiKeyParser, ApiKeyParser>();
services.AddSingleton<IApiKeySecretHasher, ApiKeySecretHasher>();
services.AddSingleton<IApiKeyVerifier, ApiKeyVerifier>();
// Register the shared API-key provider: binds ApiKeyOptions from MxGateway:Authentication,
// wires up the SQLite stores, the configuration-backed pepper provider, the verifier, the
// migrator and the migration hosted service.
services.AddZbApiKeyAuth(effectiveConfig, AuthenticationSectionPath);
// Gateway-owned canonical audit (ZB.MOM.WW.Audit) in the same SQLite file.
services.AddSingleton(sp =>
new SqliteCanonicalAuditStore(sp.GetRequiredService<AuthSqliteConnectionFactory>()));
services.AddSingleton<IAuditWriter>(sp => new CanonicalAuditWriter(/* ... */));
// Override the library's IApiKeyAuditStore so every audit lands in audit_event.
services.AddSingleton<IApiKeyAuditStore, CanonicalForwardingApiKeyAuditStore>();
// The shared admin command set, driven by the gateway CLI and dashboard.
services.AddSingleton(sp => new ApiKeyAdminCommands(/* ... */));
services.AddSingleton<ApiKeyAdminCliRunner>();
services.AddSingleton<AuthSqliteConnectionFactory>();
services.AddSingleton<IAuthStoreMigrator, SqliteAuthStoreMigrator>();
services.AddSingleton<IApiKeyStore, SqliteApiKeyStore>();
services.AddSingleton<IApiKeyAdminStore, SqliteApiKeyAdminStore>();
services.AddSingleton<IApiKeyAuditStore, SqliteApiKeyAuditStore>();
services.AddHostedService<AuthStoreMigrationHostedService>();
return services;
}
```
Singletons are safe because each operation opens its own short-lived `SqliteConnection` through the factory; there is no shared mutable state inside the services.
The gateway pins its own API-key contract — token prefix `mxgw` and the pepper key `MxGateway:ApiKeyPepper` — by layering those as fallback defaults under the supplied configuration before calling `AddZbApiKeyAuth`, because `ApiKeyOptions` is an init-only record that must be bound with those values present rather than mutated afterward. Explicit configuration still wins. `AddZbApiKeyAuth` binds `ApiKeyOptions` from the `MxGateway:Authentication` section and registers the connection factory, stores, pepper provider, verifier, migrator, and migration hosted service.
The audit-store override is registered *after* `AddZbApiKeyAuth` so it replaces the library's `TryAddSingleton` registration. The shared admin command set is not auto-registered by `AddZbApiKeyAuth`, so the gateway registers `ApiKeyAdminCommands` itself over the wired stores; the CLI and dashboard drive it. Library services are singletons and safe because each operation opens its own short-lived `SqliteConnection` through the factory.
## Related Documentation