9ad17e2964483d29ecd13e1f24a60bc06bfbecd2
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4d77279e7e |
Resolve Server-044..050: KillWorker accounting + admin service hardening
Server-044 KillWorkerAsync catch path now calls _metrics.SessionRemoved
so the open-session gauge does not leak when KillWorker throws.
Server-045 KillWorkerAsync routes through a new
GatewaySession.KillWorkerWithCloseGateAsync that takes the
per-session close lock, so concurrent kills count SessionsClosed
exactly once.
Server-046 CloseSessionCoreAsync's SessionCloseStartedException branch and
ShutdownAsync's kill fallback both increment SessionsClosed (not
just the gauge), so the counter and gauge stay consistent.
Server-047 ApiKeysPage.ConfirmPendingAsync holds PendingAction across the
awaited action and clears it in finally, matching the sessions
pages.
Server-048 Closed: the 044/045 regression tests cover the previously-
untested kill paths.
Server-049 IDashboardSessionAdminService + DashboardSessionAdminService
now carry XML docs that pin the Admin gate, missing-session
return-Fail semantics, and the dashboard-admin-kill reason.
Server-050 CloseSessionAsync and KillWorkerAsync catch unexpected
exceptions after the SessionManagerException catches and return
a friendly Fail; OperationCanceledException tied to the caller
token still propagates.
All resolved at 2026-05-24; 503/503 gateway tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
056f0d8808 |
code-reviews: re-review Server at 42b0037
Append 7 new findings (Server-044..050) covering the destructive-action wave: KillWorkerAsync metric/state leaks, ShutdownAsync kill-fallback gauge leak, inconsistent ConfirmDialog cleanup across pages, missing XML docs on the new DashboardSessionAdmin surface, and unhandled RemoveSessionAsync exception paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
327e9c5f94 |
Resolve Server-031..032 (re-triaged) + Server-038..043
Server-031: re-triaged. The recommended gateway-side "skip-while-command-in-flight" guard is already in place at WorkerClient.HeartbeatLoopAsync via WorkerClientOptions.HeartbeatStuckCeiling (default 75s = 5× HeartbeatGrace). Two regression tests pin the behaviour. Recommendation #1 (decouple worker-side _writeLock) is a Worker-module concern (Worker-017 / Worker-023) and out of scope here. Server-032: re-triaged. Recommendation #2 (rich diagnostic) is already in EnqueueWorkerEventAsync, with #3 (overflow grace) absorbed by the TryWrite → WriteAsync-with-timeout fall-through. Test EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic pins the diagnostic string. Recommendation #1 (prose contract in gateway.md / docs) is deferred — outside this pass's edit scope. Server-038 (Security): EventsHub.SubscribeSession's missing per-session ACL is documented with a TODO(per-session-acl) and a <remarks> block explaining the v1 acceptance (any dashboard role can subscribe to any session — non-secret metadata, redacted value logging). The per-session ACL design lands in a follow-up once a session-scoped role exists. Server-039 (Error handling): HubTokenService.Validate now rejects a deserialized payload where both Name and NameIdentifier are null/empty. New test file HubTokenServiceTests.cs covers the regression and five sanity cases. TDD confirmed. Server-040 (Conventions): MapGroupsToRoles gains a precedence comment explaining "full literal match first, leading-RDN fallback; OrdinalIgnoreCase via DashboardOptions.GroupToRole". Documentation-only. Server-041 (Design adherence): EventStreamService.ProduceEventsAsync wraps the broadcaster.Publish call in try/catch (Exception). The producer loop and gRPC stream are no longer at the mercy of the broadcaster's never-throw discipline. New regression test StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession. Server-042 (Performance): DashboardSnapshotPublisher.ExecuteAsync now mirrors AlarmsHubPublisher's reconnect loop — wraps the await foreach in a while-not-cancelled, catches general exceptions, and Task.Delays 5s before retrying. An internal ctor accepts a shorter delay for the test. New test file DashboardSnapshotPublisherTests.cs covers the throw-then-yield reconnect path and the normal-completion case. Server-043 (Documentation): HubTokenService class XML doc gains a <remarks> describing the singleton lifetime, the two consumer scopes (DashboardHubConnectionFactory scoped, HubTokenAuthenticationHandler transient), and the thread-safety contract. Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean (0 warnings / 0 errors); src/ZB.MOM.WW.MxGateway.Tests 486/486 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d2d2e5f68f |
code-review 2026-05-24: re-review at d692232 across all 11 modules
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).
Module status
| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server | 8 | 43 | +6 |
| Contracts | 2 | 17 | +2 |
| Tests | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java | 5 | 31 | +5 |
| Client.Rust | 1 | 21 | +1 |
| Worker | 0 | 25 | 0 (rename-only diff, clean) |
| Worker.Tests | 0 | 30 | 0 (rename-only diff, clean) |
| Client.Dotnet | 0 | 17 | 0 (rename + alarm-fix diff, clean) |
| Client.Python | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
| Client.Go | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.
New findings
* Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
any session id from any Viewer; no per-session ACL guards the
EventsHub group fan-out.
* Server-039 (Low / Error handling) — HubTokenService.Validate
accepts a payload with null Name/NameIdentifier.
* Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
full-vs-RDN lookup precedence.
* Server-041 (Low / Design adherence) — EventStreamService calls
IDashboardEventBroadcaster.Publish without a try/catch — fragile
seam relying on the never-throw contract.
* Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
* Server-043 (Low / Documentation) — HubTokenService singleton
sharing across login + hub-token validation undocumented.
* Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
reserved-for-future-use ambiguity.
* Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
omits the alarm_filter_prefix filter description.
* Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
* Tests-026 (Medium / Testing coverage) — no test proves
EventStreamService actually calls IDashboardEventBroadcaster.Publish.
* IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
silent fallback to Directory.GetCurrentDirectory().
* IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
success-path asserts ldap_group but not the Role claim.
* IntegrationTests-024 (Low / Conventions) — inline
NullDashboardEventBroadcaster fake duplicates Tests-side copies.
* Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
Gradle task names still use the old short project names.
* Client.Java-028 (Medium / Design adherence) — JavaClientDesign
build-layout shows the old `com/dohertylan/mxgateway/` package paths.
* Client.Java-029 (Low / Documentation) — README installDist path
cites the wrong directory.
* Client.Java-030 (Low / Testing coverage) — no Java test exercises
the regenerated QueryActiveAlarmsRequest RPC.
* Client.Java-031 (Low / Conventions) — README prose uses old short
project names instead of canonical prefixed ones.
* Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
"Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
that does not exist; actual layout is the flat top-level crate.
Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.
Process notes
- The `code-reviews/` tree was not in this working copy's git
history (the local extract pre-dates the divergent branch that
carried the reviews). Restored from `dd7ca16` via
`git checkout
|