Commit Graph

223 Commits

Author SHA1 Message Date
Joseph Doherty bb5139fec2 test(gateway): fake worker responds to control commands (A6)
Add RespondToControlCommandAsync to FakeWorkerHarness so scripted fake
workers can auto-reply to the five control command kinds (Ping,
GetSessionState, GetWorkerInfo, DrainEvents, ShutdownWorker) with canned
replies whose shapes match the real WorkerPipeSession helpers.

Add five unit tests in FakeWorkerHarnessTests covering each control
command kind through the WorkerClient→pipe roundtrip, and one gateway
E2E test (GatewayService_WithFakeWorker_ControlCommandsRoundtripThroughGateway)
that exercises Ping, GetWorkerInfo, and DrainEvents through the full
gRPC→SessionManager→WorkerClient→named-pipe path using a scripted
ControlCommandFakeWorkerProcessLauncher.
2026-06-15 10:56:56 -04:00
Joseph Doherty dde9934e60 test(worker): silence CS0649 on reflection-only FakeMxStatus fields 2026-06-15 10:42:59 -04:00
Joseph Doherty 29399325d5 feat(worker): implement 6 MXAccess COM commands in executor
Wire up the previously-unimplemented Suspend, Activate, AuthenticateUser,
ArchestrAUserToId, AddBufferedItem, and SetBufferedUpdateInterval command
kinds in MxAccessCommandExecutor. These are real COM calls and run on the
STA via the executor.

- IMxAccessServer gains the 6 methods; MxAccessComServer routes them to the
  right interface version (Suspend/Activate -> ILMXProxyServer4 out MxStatus,
  AuthenticateUser -> base ILMXProxyServer, ArchestrAUserToId ->
  ILMXProxyServer2, AddBufferedItem/SetBufferedUpdateInterval ->
  ILMXProxyServer5).
- Suspend/Activate surface the native MxStatus, converted to MxStatusProxy
  via the existing MxStatusProxyConverter.
- AuthenticateUser hands the credential straight to MXAccess and never logs
  it; native HResult failures propagate via the dispatcher.
- MxAccessSession gains matching pass-throughs; AddBufferedItem registers
  the item handle in the handle registry.
- Unit tests (fake IMxAccessServer / fake COM object) cover each arm plus a
  password-non-leak assertion; existing IMxAccessServer fakes updated.

No proto changes (all request/reply messages already exist).
2026-06-15 10:41:22 -04:00
Joseph Doherty f94c206489 test(worker): use Register (a real STA command) for STA-dispatch race tests
Ping is now intercepted as a worker control command and answered on the
message-loop thread, so the dispatch/heartbeat/shutdown-race tests must use a
genuine STA-dispatched command kind to keep exercising DispatchAsync.
2026-06-15 10:25:34 -04:00
Joseph Doherty 72e1aca716 test(worker): fix control-command test helpers (CreatePipeSession overload, drop ConfigureAwait) 2026-06-15 10:23:45 -04:00
Joseph Doherty bf72cd8961 feat(worker): implement Ping/GetSessionState/GetWorkerInfo/DrainEvents/ShutdownWorker control commands
Answer the five worker control/lifecycle commands at the WorkerPipeSession
message-loop layer instead of the STA-bound MxAccessCommandExecutor. These
replies are built from process-level state (worker pid, assembly version,
worker lifecycle, the runtime session's event queue) the executor cannot see,
and ShutdownWorker must emit its OK reply before the graceful shutdown joins
the STA thread - dispatching it onto the STA would deadlock.

- Ping: OK reply, echoes message into diagnostic_message.
- GetSessionState: maps WorkerState to proto SessionState.
- GetWorkerInfo: pid, worker version, MXAccess ProgID/CLSID.
- DrainEvents: drains the runtime event queue into DrainEventsReply.
- ShutdownWorker: OK reply, then graceful shutdown, then stops the loop.

Tests added in WorkerPipeSessionTests; FakeRuntimeSession gains a
batch-size drain suppressor so DrainEvents does not race the background
drain loop.
2026-06-15 10:20:51 -04:00
Joseph Doherty 55526d5e56 fix(gateway): preserve raw client correlation id in denial audit DetailsJson + add wiring test (§1.2) 2026-06-15 09:56:24 -04:00
Joseph Doherty bd46ba1270 fix(test): drop removed logger arg from GalaxyRepositoryGrpcService test call sites; docs: STA phrasing
Remove the trailing NullLogger<GalaxyRepositoryGrpcService>.Instance argument
from all four CreateService/inline constructions in GalaxyRepositoryGrpcServiceTests
and GalaxyFilterInputSafetyTests, matching the now-4-param constructor after the
dead logger parameter was removed in 0032d2d. Also drop the now-unused
Microsoft.Extensions.Logging.Abstractions using from both files.

Rephrase the §5 STA blurb in docs/AlarmClientDiscovery.md: GatewayAlarmMonitor
routes polling *through* the worker's StaRuntime (which owns the STA pump) rather
than owning the pump itself.
2026-06-15 09:52:07 -04:00
Joseph Doherty 0032d2dc44 docs+chore: fix stale prose, project names, remove dead MapSqlException (§7)
- docs/plans/2026-06-14-deferred-followups.md: mark D1 as executed
  (commit 4af24b9; metric emitted at DashboardSnapshotService.cs:198);
  note D2 resolved as no-op; D3-D5 remain pending
- docs/AlarmClientDiscovery.md §5: rewrite STA "production fix needed"
  to past tense — alarms now route through GatewayAlarmMonitor/worker STA
- EventsHub.cs: replace stale "publisher side is a future follow-up"
  comment; DashboardEventBroadcaster is live and DI-registered
- CLAUDE.md: fix all project-name drift (src/MxGateway.* →
  src/ZB.MOM.WW.MxGateway.*; MxGateway.sln → ZB.MOM.WW.MxGateway.slnx;
  clients/dotnet/MxGateway.Client.sln → ZB.MOM.WW.MxGateway.Client.slnx)
- GalaxyRepositoryGrpcService.cs: remove dead MapSqlException method and
  its IDE0051 suppression pragma; drop now-unused ILogger ctor param and
  Microsoft.Data.SqlClient using; build confirmed 0 warnings/errors
2026-06-15 09:43:00 -04:00
Joseph Doherty 8415f35abd feat(gateway): thread ClientCorrelationId into constraint-denial audit (§1.2) 2026-06-15 09:42:40 -04:00
Joseph Doherty 144c293f05 chore(clients): bump all five clients 0.1.0 -> 0.1.1 for release 2026-06-15 05:07:17 -04:00
Joseph Doherty cebe67e9bd fix(worker): resilient failover switch; FIPS-safe synthetic GUID; dup-reference guard + tests (Worker-026..028, Worker.Tests-031..033) 2026-06-15 02:56:15 -04:00
Joseph Doherty ddf2d84fbc contracts: round-trip degraded provenance/watch-list/mode-changed; proto doc (Contracts-018,019) 2026-06-15 02:46:06 -04:00
Joseph Doherty 56dd56954b test(gateway): cover failback reason, FromFeed/SinceUtc badge paths; style + bounded drain (Tests-032..035) 2026-06-15 02:46:06 -04:00
Joseph Doherty d2c776901b fix(integrationtests): repair GatewayAlarmMonitor ctor build break; LDAP bind + docs (IntegrationTests-026..029) 2026-06-15 02:39:11 -04:00
Joseph Doherty 258e09e0de fix(server): propagate watch-list cancellation; doc + test gaps (Server-051..053) 2026-06-15 02:39:11 -04:00
Joseph Doherty 410acc92eb feat(dashboard): distinct 'forced' subtag provider badge
Render Fallback:Mode=ForceSubtag as a cyan 'Subtag monitoring (forced)'
badge, distinct from the amber failover 'degraded' badge, so an intentional
configuration isn't shown as a fault. Distinguished by the shared
AlarmProviderReasons.ForcedSubtag reason carried on the provider-status feed.
2026-06-15 01:43:17 -04:00
Joseph Doherty 9208225f9c fix: gateway reflects configured forced provider mode into gauge/feed (#2) 2026-06-15 01:10:04 -04:00
Joseph Doherty 4af24b9518 D1: surface AlarmProviderSwitchCount on dashboard metric list 2026-06-14 23:49:02 -04:00
Joseph Doherty 393e326275 docs(alarms): note operator/IDE toggle drives the live subtag smoke test
C6a: the rig's TestAlarm attributes are object-driven; a flip script OR a manual
operator/IDE toggle drives them (confirmed live 2026-06-14). Update the how-to-run
comments and Skip reason accordingly.
2026-06-14 02:35:59 -04:00
Joseph Doherty 986dcee14a worker(alarms): UnAdvise only advised handles in LmxSubtagAlarmSource teardown
B3: track advised handles separately from added handles so Dispose only UnAdvises
items that were actually advised — a write-only subtag (e.g. ack-comment added by
Write, never advised) is removed but not unadvised. Add Dispose tests covering the
advised/write-only split, full removal, single Unregister, and double-dispose
idempotency.
2026-06-14 02:35:59 -04:00
Joseph Doherty a3752799de worker(alarms): remove dead FailoverAlarmConsumer.subscriptionExpression
B4: the field was stored in Subscribe but never read — the primary is never
re-subscribed during probing. Drop it and keep the rationale as a comment.
2026-06-14 02:35:59 -04:00
Joseph Doherty 37aadf72b3 docs(alarms): clarify resolver cancellation contract; mark design doc superseded
C6b: IAlarmWatchListResolver.ResolveAsync doc now notes that while discovery being
unavailable never throws, a triggered cancellation token still propagates.
C7: annotate the original design doc where it drifted from the shipped code — metric
names / unimplemented watch-list gauges, and the proto-type location (gateway proto, not
worker proto).
2026-06-14 02:33:14 -04:00
Joseph Doherty 5573f2a229 galaxy(alarms): drop dead primitive branch from AlarmAttributesSql
B5: the candidate CTE's src_pri=1 (primitive-instance) UNION ALL branch was always
excluded by the final WHERE r.src_pri=0, so it added work with no output change. Remove
the branch and the now-constant src_pri column/filter. An alarm anchor is always a user
attribute, so output is identical.
2026-06-14 02:33:14 -04:00
Joseph Doherty 56abd64c6c metrics(alarms): expose provider-switch count in snapshot, bound the reason tag
B1: add AlarmProviderSwitchCount to GatewayMetricsSnapshot so the switch total is
readable without scraping the OTEL counter.
B2: replace the free-text reason tag on mxgateway.alarms.provider_switches with a
bounded AlarmProviderSwitchReason enum (failover/failback/unknown); the human-readable
reason stays in the structured log.
2026-06-14 02:33:02 -04:00
Joseph Doherty 5b31e99ab6 alarms: compose subtag reference from object's real Galaxy area for exact alarmmgr parity 2026-06-14 02:12:11 -04:00
Joseph Doherty 1a9367b5de worker(alarms): advise ack-comment subtag so the ack write targets an active MXAccess item 2026-06-13 11:23:39 -04:00
Joseph Doherty 98e997b573 test(alarms): probe writes evidence log to PROBE_LOG file 2026-06-13 11:15:05 -04:00
Joseph Doherty 0e8d911fd8 test(alarms): live runtime-path resolution probe (LiveMxAccessFact) for alarm subtags 2026-06-13 11:14:12 -04:00
Joseph Doherty e72763d703 alarms: use confirmed AVEVA AlarmExtension subtag names (InAlarm/Acked/AckMsg/Priority) 2026-06-13 11:07:22 -04:00
Joseph Doherty ec88532fe4 alarms: propagate degraded/source_provider through snapshot + gateway cache paths (integration fix I1/I2) 2026-06-13 10:53:55 -04:00
Joseph Doherty 27f6c9e6b7 dashboard(alarms): provider-status badge (alarmmgr vs degraded subtag) 2026-06-13 10:37:37 -04:00
Joseph Doherty 29bd504a99 test(alarms): end-to-end provider failover/failback lifecycle through GatewayAlarmMonitor 2026-06-13 10:34:24 -04:00
Joseph Doherty e10b252e3a test(alarms): drop unsupported Assert.Equal message args in live subtag smoke test (xUnit) 2026-06-13 10:30:39 -04:00
Joseph Doherty bcc54ca56b server(alarms): provider-mode gauge startup baseline; reconcile-lock comment; de-flake monitor test 2026-06-13 10:29:13 -04:00
Joseph Doherty ee459f43e1 test(alarms): opt-in live subtag-fallback smoke test (Skip by default)
Adds AlarmSubtagLiveSmokeTests to validate the open design item from Task 17:
confirms that LmxSubtagAlarmSource (real MxAccessComObjectFactory) wired to
SubtagAlarmConsumer synthesizes degraded Raise transitions with stable synthetic
GUIDs from Galaxy alarm subtags, and that AcknowledgeByName writes the
ack-comment subtag (rc=0). PLACEHOLDER_* subtag addresses are best-guess and
must be verified against MXAccess-Public-API.md + live Galaxy before flipping Skip.
2026-06-13 10:26:28 -04:00
Joseph Doherty ebf1d95f72 server(alarms): monitor resolves watch-list, sends ForcedMode/failover, reflects provider mode into feed + metrics 2026-06-13 10:20:03 -04:00
Joseph Doherty 3ccf0b5f9e server(alarms): honor ExcludeAttributes GR-only contract; warn on empty config-only watch-list 2026-06-13 10:12:58 -04:00
Joseph Doherty f7ccfd678e server(alarms): watch-list resolver merging GR discovery + config override 2026-06-13 10:09:10 -04:00
Joseph Doherty 3f5e5fc0b3 worker(alarms): route ForcedMode/watch-list/failover via AlarmCommandHandler; emit provider-mode-changed event 2026-06-13 10:04:33 -04:00
Joseph Doherty 7241a4fb9c worker(alarms): net48 index fix; enforce ProbeIntervalSeconds; OOM-safe catch; reset-on-failure test 2026-06-13 09:55:07 -04:00
Joseph Doherty d6c0bb41ca worker(alarms): failback probe re-polls the still-subscribed primary (no re-Subscribe) 2026-06-13 09:49:38 -04:00
Joseph Doherty 0a54c0bc4b worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine 2026-06-13 09:46:47 -04:00
Joseph Doherty fd64b9260c worker(alarms): exact-match ack resolution (no substring false-match) + ack-by-guid tests 2026-06-13 09:42:00 -04:00
Joseph Doherty 4bd757a136 worker(alarms): SubtagAlarmConsumer synthesizing degraded transitions; dispatcher propagates Degraded 2026-06-13 09:35:49 -04:00
Joseph Doherty 1e2ed6d1ea worker(alarms): WriteRecord as class not positional record (net48 has no IsExternalInit) 2026-06-13 09:30:52 -04:00
Joseph Doherty 5f6655de27 server(alarms): drop redundant null-coalesce; tidy validator tests (review fixes) 2026-06-13 09:27:37 -04:00
Joseph Doherty fbc9cf56df worker(alarms): SyntheticAlarmGuid internal + alarmmgr-parity assertion (review fixes) 2026-06-13 09:26:52 -04:00
Joseph Doherty 4c0e14fc5d worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags 2026-06-13 09:24:09 -04:00
Joseph Doherty a46ce90e6f server(metrics): alarm provider mode gauge + provider switch counter (Task 13) 2026-06-13 09:18:11 -04:00