Commit Graph

11 Commits

Author SHA1 Message Date
Joseph Doherty 46cb6965ac fix(security): close Theme 7 — 8 secrets / redaction / append-only findings
Security-sensitive batch, handled main-thread for careful judgment on
secret-leak and pepper-bypass paths.

Secret leak / pepper bypass:
- CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no
  longer hashes the candidate with the unpeppered ApiKeyHasher.Default —
  ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition
  roots without a pepper still bring up the repository), and the DI
  registration wires sp.GetService<IApiKeyHasher>() so the production
  peppered hasher matches the stored KeyHash. Regression test asserts
  positive (peppered roundtrip) AND negative (Default hasher misses the
  same key — proving the lookup uses the injected hasher).
- MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs
  now project through SmtpConfigPublicShape so the response payload and
  audit-row afterState never carry the Credentials field — only a
  HasCredentials bool. The SMTP password / OAuth2 client secret no
  longer leaves the Admin-only UpdateSmtpConfig boundary the caller
  already supplied it to.

Redaction:
- AuditLog-008 (test-fixture under-redact): new
  SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header
  redaction for the always-sensitive defaults (Authorization, X-Api-Key,
  Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and
  AuditLogIngestActor (both ingest paths) default to it instead of null
  — composition roots that bypass AddAuditLog can no longer write
  unredacted auth headers to the audit store.
- NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks
  the last colon-separated component (password / clientSecret) AND only
  if it's >= 12 chars (typical password heuristic). Short user names
  like "root" no longer become global redaction tokens that eat unrelated
  diagnostic text. The full packed string is always masked regardless of
  length. 3 new negative tests pin the no-over-mask contract.

Audit-row correctness / fail-loud:
- InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit,
  /api/management, /api/centralui, /api/script-analysis AND requires POST
  — the AuditWriteMiddleware no longer emits spurious ApiInbound rows
  for audit-log query/export endpoints (write-on-read recursion broken).
- ESG-021: ApplyAuth now logs Warning (not silent) on empty
  AuthConfiguration for apikey/basic, unknown AuthType, and malformed
  Basic config. AuthConfiguration value NEVER logged. AuthType=none
  remains silent (documented unauthenticated sentinel).
- Security-021: AddSecurity now logs a startup Warning when
  RequireHttpsCookie=false — an HTTP-only deployment that previously
  transmitted the cookie-embedded JWT silently in cleartext is now
  audible in the log.

Defensive:
- CD-021: SwitchOutPartitionAsync's monthBoundary format string now
  yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future
  sub-second / non-midnight boundary doesn't silently round to the
  wrong partition.

Plus reconciled stale per-module Open-findings counters that had drifted
from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService,
Security).

Build clean; all affected test projects green (Host 208, ConfigDB 242,
ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog
247/248 — 1 pre-existing date-sensitive integration test flake on
PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54).
2026-05-28 08:04:10 -04:00
Joseph Doherty 55f46e7c92 perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.

Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
  (+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
  ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
  contend with the hot-path INSERT lock — backlog probes on a 30s timer
  can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
  string (single-connection logger; mode was dormant config).

Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
  Convert.TryFromBase64Chars straight into the FileStream — no more
  full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
  temp file under Path.GetTempPath() (replaces in-memory byte[] field);
  page implements IDisposable to delete the temp file on reset / new
  upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.

N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
  bulk method; BundleImporter.PreviewAsync calls it once instead of per-
  template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
  a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
  DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
  GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
  GetInstanceByIdAsync per record).

Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
  RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
  requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
  _adaptersCache (built lazily on first sweep) + actor-lifetime
  _adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.

Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
  commit ac96b83 — marked Resolved with that attribution. No code change.

Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
  connection / per-delivery-factory contract (option (b) — transient
  client per Send — rejected because it re-handshakes TLS per email).

10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
2026-05-28 07:47:24 -04:00
Joseph Doherty 487859bff0 docs+code: close Theme 1 — 24 design-doc / XML-doc drift findings
Doc/XML-comment drift + small adherence fixes across 17 modules. Highlights:
- Host-017: site CoordinatedShutdown ordering — SiteStreamGrpcServer gains
  CancelAllStreams() (refuse new streams, cancel active), wired into
  Program.cs site branch via ApplicationStopping.
- InboundAPI-021: ParentExecutionId now travels on RouteToGet/SetAttributes
  symmetric with RouteToCallRequest; RouteHelper stamps from _parentExecutionId.
- ClusterInfra-012: ClusterOptionsValidator now requires both seed nodes.
- Comm-018: SiteCommunicationActor.HeartbeatMessage.IsActive derived from
  cluster leader check (was hardcoded true).
- DM-020: reconciliation audit row attributes the current user, not prior deployer.
- SEL-019: EventLogPurgeService early-exits on standby via active-node check.
- Plus comment/XML-doc accuracy fixes across AuditLog, ConfigurationDatabase,
  NotificationOutbox, SiteRuntime, SiteCallAudit; doc refreshes for Component-
  Commons / -ManagementService / -CLI / -ExternalSystemGateway / -HealthMonitoring
  / -Transport / -ConfigurationDatabase; CD-023 index-name doc alignment.

11 new regression tests (RouteHelper x4, SiteStreamGrpcServer x2,
ClusterOptionsValidator x1, SiteCommunicationActor x1, DeploymentService x1,
EventLogPurgeService x3). Build clean (0 warnings); InboundAPI/Communication/
Host suites all green. README regenerated: 112 open (was 136).
2026-05-28 06:28:31 -04:00
Joseph Doherty e536178323 fix(security): close auth & site-scoping gaps across 8 findings
Resolves the auth-theme batch from the 2026-05-28 baseline review (8 findings
across Security/CentralUI/ManagementService/CLI). The most consequential gaps:
NotificationReport + SiteCallsReport now route through SiteScopeService so a
site-scoped Deployment user cannot see or act on other sites' rows (CUI-028);
QueryAuditLogCommand is no longer "any authenticated user" — gated Admin-only
to match /api/audit/query's strictness (MS-018); RoleMapper preserves the
broader grant when a user is in both an unscoped and scoped Deployment LDAP
group, instead of silently narrowing to the scoped set (Sec-016); and the
dead SiteScopeRequirement/Handler are deleted so SiteScopeService is
unambiguously the sole site-scoping mechanism (Sec-017). Pending findings:
172 → 164.
2026-05-28 03:35:29 -04:00
Joseph Doherty f93b7b99bb code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
2026-05-28 02:55:47 -04:00
Joseph Doherty bf6bd8de5a fix(management-service): resolve ManagementService-014..017 — site-scope enforcement on QueryDeployments, atomic override validation, curated fault messages, test coverage 2026-05-17 03:18:33 -04:00
Joseph Doherty 3b3760f026 docs(code-reviews): re-review batch 3 at 39d737e — Host, InboundAPI, ManagementService, NotificationService, Security
21 new findings: Host-012..015, InboundAPI-014..017, ManagementService-014..017, NotificationService-014..018, Security-012..015.
2026-05-17 00:48:25 -04:00
Joseph Doherty dab0056d1b fix(management-service): resolve ManagementService-005,008,010,011 — supervision strategy, configured command timeout, remove stale ResolveRoles path; ManagementService-012 deferred 2026-05-16 22:24:03 -04:00
Joseph Doherty 57679d49f2 fix(management-service): resolve ManagementService-004,006,007,013 — PipeTo dispatch, JsonDocument disposal, unified serialization, endpoint tests; re-triage MS-009 2026-05-16 21:22:01 -04:00
Joseph Doherty b249ca3bf7 fix(management-service): resolve ManagementService-001/002/003 — enforce site scope on query/snapshot handlers and DebugStreamHub 2026-05-16 19:47:17 -04:00
Joseph Doherty 977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00