Commit Graph

125 Commits

Author SHA1 Message Date
Joseph Doherty a894717319 docs(plan): dashboard disable-login implementation plan + tasks 2026-06-16 07:54:43 -04:00
Joseph Doherty 0856cd4f93 docs: dashboard disable-login design + save session-resilience tasklist snapshot 2026-06-16 07:47:31 -04:00
Joseph Doherty c446bef64f chore(plan): mark session-resilience tasks 1-12 complete 2026-06-16 07:29:17 -04:00
Joseph Doherty 36ab8d15f1 feat(sessions): replay-on-reconnect with ReplayGap sentinel 2026-06-16 07:22:19 -04:00
Joseph Doherty 042f5e3d82 fix(sessions): expose DetachGraceSeconds in effective-config; single clock; close reconnect-vs-sweep race
- EffectiveSessionConfiguration: add DetachGraceSeconds field; GatewayConfigurationProvider
  forwards value.Sessions.DetachGraceSeconds (blocker fix).
- GatewaySession.InvokeAsync and ReadEventsAsync: switch TouchClientActivity calls from
  DateTimeOffset.UtcNow to _eventStreaming.TimeProvider.GetUtcNow() so Task 12 fake-clock
  control works end-to-end (split-clock fix).
- TOCTOU fix: add TryBeginCloseIfExpired(now, out alreadyClosing) to GatewaySession that
  re-checks IsLeaseExpiredCore/IsDetachGraceExpiredCore AND _activeEventSubscriberCount==0
  under _syncRoot before transitioning to Closing; CloseExpiredLeasesAsync calls it before
  CloseSessionCoreAsync so a reattach that wins the race leaves the session Ready/usable.
- Minors: lease-expiry-takes-precedence comment in CloseExpiredLeasesAsync; TOCTOU comment
  block; sweep-cycle latency note added to SessionOptions.DetachGraceSeconds XML doc and to
  GatewayConfiguration.md DetachGraceSeconds row.
- New tests: TryBeginCloseIfExpired_ReattachedSubscriberWinsRace_DeclinesClose (GatewaySession),
  CloseExpiredLeasesAsync_DoesNotCloseSessionThatReattachedBeforeSweepCloses (SessionManager),
  plus IsLeaseExpiredCore/IsDetachGraceExpiredCore private helpers used by the guard.
2026-06-16 07:11:59 -04:00
Joseph Doherty db95f8644f feat(sessions): detach-grace retention window for reconnect 2026-06-16 06:15:46 -04:00
Joseph Doherty ac42783e36 feat(sessions): multi-subscriber cap enforcement + mode-gated FailFast 2026-06-15 15:32:08 -04:00
Joseph Doherty 7b12eebbd1 docs: add MaxEventSubscribersPerSession to config shape example; assert its default
Add MaxEventSubscribersPerSession (value 8) to the Sessions block of the
Configuration Shape JSON example in GatewayConfiguration.md, matching the
appsettings.json default the options table already documents. Assert both
MaxEventSubscribersPerSession (8) and MaxPendingCommandsPerSession (128)
defaults in GatewayOptionsTests.OptionsBinding_UsesDesignDefaults.
2026-06-15 15:16:44 -04:00
Joseph Doherty bd190ab012 feat(config): allow multiple event subscribers + add MaxEventSubscribersPerSession cap
Remove the hard-rejection of AllowMultipleEventSubscribers=true in GatewayOptionsValidator
(fan-out is now implemented via SessionEventDistributor). Add MaxEventSubscribersPerSession
(default 8, must be >= 1) to SessionOptions, validate it, expose it in
EffectiveSessionConfiguration / GatewayConfigurationProvider, document it in
GatewayConfiguration.md and appsettings.json. Tests cover the no-error path for
AllowMultipleEventSubscribers=true, the 0/-1 rejection, positive pass, and default pass.
2026-06-15 15:13:21 -04:00
Joseph Doherty 1ea08c3b10 feat(dashboard): mirror events via SessionEventDistributor subscriber (fixes dark feed without gRPC client) 2026-06-15 14:42:32 -04:00
Joseph Doherty 039111ca05 feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor 2026-06-15 13:39:25 -04:00
Joseph Doherty e962737d2c feat(sessions): add bounded replay ring buffer to SessionEventDistributor 2026-06-15 12:42:15 -04:00
Joseph Doherty 00c849e63b docs: session-resilience implementation plan (28 tasks, 5 phases) 2026-06-15 12:15:34 -04:00
Joseph Doherty 3fc6ccad30 docs: session-resilience epic design (fan-out, reconnect, ACL, reattach) 2026-06-15 12:11:46 -04:00
Joseph Doherty a56ce0ddbd docs: refresh stillpending.md for completed work; record residuals (§7/E2)
Mark §1.1 (11 worker commands), §1.2 (audit CorrelationId), and §4 client
CLI/helper parity as Resolved with commit refs; correct §4.4 (dotnet version
already worked). Record open residuals: §1.3 live failover counter, §3.2
multi-sample buffered conversion, §1.4 vendor-stub ack, DrainEvents snapshot
semantics.
2026-06-15 11:35:48 -04:00
Joseph Doherty bd46ba1270 fix(test): drop removed logger arg from GalaxyRepositoryGrpcService test call sites; docs: STA phrasing
Remove the trailing NullLogger<GalaxyRepositoryGrpcService>.Instance argument
from all four CreateService/inline constructions in GalaxyRepositoryGrpcServiceTests
and GalaxyFilterInputSafetyTests, matching the now-4-param constructor after the
dead logger parameter was removed in 0032d2d. Also drop the now-unused
Microsoft.Extensions.Logging.Abstractions using from both files.

Rephrase the §5 STA blurb in docs/AlarmClientDiscovery.md: GatewayAlarmMonitor
routes polling *through* the worker's StaRuntime (which owns the STA pump) rather
than owning the pump itself.
2026-06-15 09:52:07 -04:00
Joseph Doherty 0032d2dc44 docs+chore: fix stale prose, project names, remove dead MapSqlException (§7)
- docs/plans/2026-06-14-deferred-followups.md: mark D1 as executed
  (commit 4af24b9; metric emitted at DashboardSnapshotService.cs:198);
  note D2 resolved as no-op; D3-D5 remain pending
- docs/AlarmClientDiscovery.md §5: rewrite STA "production fix needed"
  to past tense — alarms now route through GatewayAlarmMonitor/worker STA
- EventsHub.cs: replace stale "publisher side is a future follow-up"
  comment; DashboardEventBroadcaster is live and DI-registered
- CLAUDE.md: fix all project-name drift (src/MxGateway.* →
  src/ZB.MOM.WW.MxGateway.*; MxGateway.sln → ZB.MOM.WW.MxGateway.slnx;
  clients/dotnet/MxGateway.Client.sln → ZB.MOM.WW.MxGateway.Client.slnx)
- GalaxyRepositoryGrpcService.cs: remove dead MapSqlException method and
  its IDE0051 suppression pragma; drop now-unused ILogger ctor param and
  Microsoft.Data.SqlClient using; build confirmed 0 warnings/errors
2026-06-15 09:43:00 -04:00
Joseph Doherty 883557fc8a docs: implementation plan for stillpending.md completion
28 tasks across 5 workstreams (A worker control cmds, B worker COM cmds,
C audit CorrelationId, D client CLI parity, E docs). Zero proto changes;
worker net48/x86 + Java on windev, rest local.
2026-06-15 09:35:50 -04:00
Joseph Doherty 4a00b1bdc1 docs: design for completing stillpending.md actionable items
Covers the 11 worker command kinds (§1.1), audit CorrelationId threading
(§1.2), client CLI/helper parity (§4), and doc hygiene (§7). Key finding:
all 11 commands already have proto/validation/scope/routing, so this is a
worker-executor + COM-wrapper + client-CLI effort with zero contract changes.
2026-06-15 09:32:01 -04:00
Joseph Doherty d2c776901b fix(integrationtests): repair GatewayAlarmMonitor ctor build break; LDAP bind + docs (IntegrationTests-026..029) 2026-06-15 02:39:11 -04:00
Joseph Doherty 258e09e0de fix(server): propagate watch-list cancellation; doc + test gaps (Server-051..053) 2026-06-15 02:39:11 -04:00
Joseph Doherty b40aaeef05 docs: forced-subtag fix resolution (#1 proto artifact, #2 fixed) 2026-06-15 01:33:14 -04:00
Joseph Doherty c6f17557f6 docs: forced-subtag mode fix plan 2026-06-15 01:04:46 -04:00
Joseph Doherty bbbef4d098 D2: document no current duplicate / endpoint (no-op) 2026-06-14 23:49:47 -04:00
Joseph Doherty 371ce53409 docs: add deferred follow-ups plan (D1-D5) 2026-06-14 23:46:21 -04:00
Joseph Doherty 37aadf72b3 docs(alarms): clarify resolver cancellation contract; mark design doc superseded
C6b: IAlarmWatchListResolver.ResolveAsync doc now notes that while discovery being
unavailable never throws, a triggered cancellation token still propagates.
C7: annotate the original design doc where it drifted from the shipped code — metric
names / unimplemented watch-list gauges, and the proto-type location (gateway proto, not
worker proto).
2026-06-14 02:33:14 -04:00
Joseph Doherty 5b31e99ab6 alarms: compose subtag reference from object's real Galaxy area for exact alarmmgr parity 2026-06-14 02:12:11 -04:00
Joseph Doherty 64db828d71 docs(alarms): record live confirmation of subtag path + ack; advise-before-write requirement 2026-06-13 11:26:08 -04:00
Joseph Doherty e72763d703 alarms: use confirmed AVEVA AlarmExtension subtag names (InAlarm/Acked/AckMsg/Priority) 2026-06-13 11:07:22 -04:00
Joseph Doherty 3c9becc8d6 docs(plan): mark all alarm-subtag-fallback tasks completed 2026-06-13 10:55:18 -04:00
Joseph Doherty 2f30f0c7c0 docs(alarms): document alarmmgr->subtag fallback (providers, failover, config, contract, parity) 2026-06-13 10:43:37 -04:00
Joseph Doherty c75920c620 docs(plan): correct alarm proto location to mxaccess_gateway.proto (Tasks 1-2) 2026-06-13 09:18:11 -04:00
Joseph Doherty 5ea5618315 docs: implementation plan for alarm subtag-monitoring fallback
18 TDD tasks across contracts, worker (SubtagAlarmConsumer + FailoverAlarmConsumer),
gateway (GR-SQL watch-list discovery, monitor mode reflection, metrics, dashboard),
and docs. Grounded in current signatures; parity-preserving (worker-side synthesis).
2026-06-13 08:44:42 -04:00
Joseph Doherty 38a0ad8ab4 docs: design for alarmmgr→subtag alarm-provider fallback
Auto-failover/failback between the wnwrap alarmmgr consumer and a new
worker-side SubtagAlarmConsumer that advises alarm subtags and synthesizes
transitions. GR-SQL+config watch-list discovery, ack via ack-comment write,
degraded state surfaced in the gRPC contract and dashboard/metrics.
2026-06-13 08:35:18 -04:00
Joseph Doherty e57d864ab2 fix(dashboard): make dashboard auth cookie name configurable
The dashboard auth cookie name was hardcoded to the constant
DashboardAuthenticationDefaults.CookieName (MxGatewayDashboard). Browser
cookies are scoped by host+path but NOT by port, so two gateway instances
sharing a hostname would clobber each other's dashboard session under the
shared name.

Add DashboardOptions.CookieName (MxGateway:Dashboard:CookieName); null/blank
keeps the canonical default. Applied in the existing dashboard cookie
PostConfigure (runs after the inline AddCookie default, so it wins). Behaviour
is unchanged when unset. Adds a Tests case for the override.
2026-06-03 13:11:29 -04:00
Joseph Doherty dbf550da8b docs(mxgateway): sync Metrics.md to renamed meter + seconds histogram units 2026-06-01 16:48:46 -04:00
Joseph Doherty 2eb81379e4 docs: TLS auto-cert and lenient client trust 2026-06-01 07:43:13 -04:00
Joseph Doherty c4e7ddea70 docs: implementation plan for gateway TLS auto-cert and lenient client trust 2026-06-01 07:01:58 -04:00
Joseph Doherty 6bfa4fe884 docs: design for gateway TLS auto-cert and lenient client trust 2026-06-01 06:54:23 -04:00
Joseph Doherty 97e583e96b docs: implementation plan for per-language LazyBrowseNode walker
9 tasks: Java toolchain install (Homebrew), 5 parallel per-language
walker implementations, README updates, final verification. Java
walker is gated on toolchain bootstrap success; other languages
proceed independently if Java fails.
2026-05-28 14:17:52 -04:00
Joseph Doherty eaf479349d docs: design for client-side LazyBrowseNode walker + per-language tests
Adds one high-level walker per client (.NET/Python/Rust/Go/Java) plus
six unit tests each against existing fake transports. One-shot idempotent
Expand semantics; pagination hidden inside the helper. Includes Java
toolchain bootstrap (Homebrew Temurin + Gradle) so the Java client can
build locally on the macOS dev host.
2026-05-28 14:12:03 -04:00
Joseph Doherty 83a4d41fce docs: align design doc test-plan with InvalidArgument error mapping 2026-05-28 13:30:19 -04:00
Joseph Doherty 0b389f5a97 docs: document BrowseChildren RPC and lazy browse architecture 2026-05-28 13:19:08 -04:00
Joseph Doherty cf54a278e1 docs: record lazy-browse stays wire-only; align error mapping 2026-05-28 13:18:23 -04:00
Joseph Doherty b3ebf583ad docs: implementation plan for lazy-browse BrowseChildren RPC
12-task bite-sized plan executing the approved design.
Includes native task persistence file.
2026-05-28 12:41:11 -04:00
Joseph Doherty edb812d859 docs: design for lazy-browse BrowseChildren RPC
OPC UA-style level-at-a-time browse across gRPC, dashboard, and the
shared cache projector. Server still loads the full Galaxy hierarchy;
laziness is wire-side and UI-side only.
2026-05-28 12:34:37 -04:00
Joseph Doherty 615b487a77 docs+ui: backfill XML doc comments and finish dashboard layout pass
Adds missing <summary>/<param> XML docs across 99 server, worker, and test
files so CommentChecker reports zero issues (TreatWarningsAsErrors needs the
analyzer clean). Bundles in WIP dashboard work: NavSection extraction,
MainLayout/site.css/js styling alignment, and DashboardOptions/Auth tweaks.
2026-05-27 14:20:10 -04:00
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00
Joseph Doherty d692232191 dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

  * IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
    take a direct dependency on SignalR.
  * DashboardEventBroadcaster — singleton implementation that hands the
    SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
    logged at debug and dropped so the source gRPC stream is never
    blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:
  * implements IAsyncDisposable
  * opens a second HubConnection via DashboardHubConnectionFactory targeting
    /hubs/events
  * calls SubscribeSession(SessionId) on Start
  * renders the most recent 50 events in a small table (worker seq, family,
    server/item handle, alarm reference when the event is OnAlarmTransition)
  * shows a live/offline conn-pill driven by HubConnection.Closed /
    Reconnected events

The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

  * CLAUDE.md — Authentication section now explains LDAP bind +
    GroupToRole + HubToken bearer flow.
  * gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
    events SignalR hubs, LDAP cookie + bearer scheme.
  * docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
    add GroupToRole row, append "Authorization policies" and "SignalR hubs"
    subsections describing the three policies and the /hubs/* endpoints.
  * docs/GatewayDashboardDesign.md — hosting model (root mount, new
    endpoint layout), Realtime Updates rewritten as a hub table
    (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
    and routing), Authentication And Authorization rewritten around LDAP +
    role mapping + the hub bearer flow, Configuration block updated.
  * docs/GatewayProcessDesign.md — security-section dashboard paragraph
    and the example config block both refreshed to LDAP/role auth.
  * docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
    updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
    API-key login flow).
  * docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
    GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
    RequiredGroup default; success-path assertion explains the role-claim
    check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:07:30 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00