Compare commits

..

80 Commits

Author SHA1 Message Date
Joseph Doherty 64db828d71 docs(alarms): record live confirmation of subtag path + ack; advise-before-write requirement 2026-06-13 11:26:08 -04:00
Joseph Doherty 1a9367b5de worker(alarms): advise ack-comment subtag so the ack write targets an active MXAccess item 2026-06-13 11:23:39 -04:00
Joseph Doherty 98e997b573 test(alarms): probe writes evidence log to PROBE_LOG file 2026-06-13 11:15:05 -04:00
Joseph Doherty 0e8d911fd8 test(alarms): live runtime-path resolution probe (LiveMxAccessFact) for alarm subtags 2026-06-13 11:14:12 -04:00
Joseph Doherty e72763d703 alarms: use confirmed AVEVA AlarmExtension subtag names (InAlarm/Acked/AckMsg/Priority) 2026-06-13 11:07:22 -04:00
Joseph Doherty 3c9becc8d6 docs(plan): mark all alarm-subtag-fallback tasks completed 2026-06-13 10:55:18 -04:00
Joseph Doherty ec88532fe4 alarms: propagate degraded/source_provider through snapshot + gateway cache paths (integration fix I1/I2) 2026-06-13 10:53:55 -04:00
Joseph Doherty 2f30f0c7c0 docs(alarms): document alarmmgr->subtag fallback (providers, failover, config, contract, parity) 2026-06-13 10:43:37 -04:00
Joseph Doherty 27f6c9e6b7 dashboard(alarms): provider-status badge (alarmmgr vs degraded subtag) 2026-06-13 10:37:37 -04:00
Joseph Doherty 29bd504a99 test(alarms): end-to-end provider failover/failback lifecycle through GatewayAlarmMonitor 2026-06-13 10:34:24 -04:00
Joseph Doherty e10b252e3a test(alarms): drop unsupported Assert.Equal message args in live subtag smoke test (xUnit) 2026-06-13 10:30:39 -04:00
Joseph Doherty bcc54ca56b server(alarms): provider-mode gauge startup baseline; reconcile-lock comment; de-flake monitor test 2026-06-13 10:29:13 -04:00
Joseph Doherty ee459f43e1 test(alarms): opt-in live subtag-fallback smoke test (Skip by default)
Adds AlarmSubtagLiveSmokeTests to validate the open design item from Task 17:
confirms that LmxSubtagAlarmSource (real MxAccessComObjectFactory) wired to
SubtagAlarmConsumer synthesizes degraded Raise transitions with stable synthetic
GUIDs from Galaxy alarm subtags, and that AcknowledgeByName writes the
ack-comment subtag (rc=0). PLACEHOLDER_* subtag addresses are best-guess and
must be verified against MXAccess-Public-API.md + live Galaxy before flipping Skip.
2026-06-13 10:26:28 -04:00
Joseph Doherty ebf1d95f72 server(alarms): monitor resolves watch-list, sends ForcedMode/failover, reflects provider mode into feed + metrics 2026-06-13 10:20:03 -04:00
Joseph Doherty 3ccf0b5f9e server(alarms): honor ExcludeAttributes GR-only contract; warn on empty config-only watch-list 2026-06-13 10:12:58 -04:00
Joseph Doherty f7ccfd678e server(alarms): watch-list resolver merging GR discovery + config override 2026-06-13 10:09:10 -04:00
Joseph Doherty 3f5e5fc0b3 worker(alarms): route ForcedMode/watch-list/failover via AlarmCommandHandler; emit provider-mode-changed event 2026-06-13 10:04:33 -04:00
Joseph Doherty 7241a4fb9c worker(alarms): net48 index fix; enforce ProbeIntervalSeconds; OOM-safe catch; reset-on-failure test 2026-06-13 09:55:07 -04:00
Joseph Doherty d6c0bb41ca worker(alarms): failback probe re-polls the still-subscribed primary (no re-Subscribe) 2026-06-13 09:49:38 -04:00
Joseph Doherty 0a54c0bc4b worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine 2026-06-13 09:46:47 -04:00
Joseph Doherty fd64b9260c worker(alarms): exact-match ack resolution (no substring false-match) + ack-by-guid tests 2026-06-13 09:42:00 -04:00
Joseph Doherty 4bd757a136 worker(alarms): SubtagAlarmConsumer synthesizing degraded transitions; dispatcher propagates Degraded 2026-06-13 09:35:49 -04:00
Joseph Doherty 1e2ed6d1ea worker(alarms): WriteRecord as class not positional record (net48 has no IsExternalInit) 2026-06-13 09:30:52 -04:00
Joseph Doherty 5f6655de27 server(alarms): drop redundant null-coalesce; tidy validator tests (review fixes) 2026-06-13 09:27:37 -04:00
Joseph Doherty fbc9cf56df worker(alarms): SyntheticAlarmGuid internal + alarmmgr-parity assertion (review fixes) 2026-06-13 09:26:52 -04:00
Joseph Doherty 4c0e14fc5d worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags 2026-06-13 09:24:09 -04:00
Joseph Doherty c75920c620 docs(plan): correct alarm proto location to mxaccess_gateway.proto (Tasks 1-2) 2026-06-13 09:18:11 -04:00
Joseph Doherty a46ce90e6f server(metrics): alarm provider mode gauge + provider switch counter (Task 13) 2026-06-13 09:18:11 -04:00
Joseph Doherty f113ca53a1 server(galaxy): GetAlarmAttributesAsync discovery query + alarm-attribute row mapping (Task 11) 2026-06-13 09:18:11 -04:00
Joseph Doherty f3616cc7fa server(alarms): AlarmFallbackOptions + ForceSubtag/threshold validation (Task 10) 2026-06-13 09:18:11 -04:00
Joseph Doherty 57d5a8725f worker(alarms): synthetic GUID + degraded/source_provider on emitted transitions 2026-06-13 09:14:23 -04:00
Joseph Doherty 60d35a914f contracts: regenerate Generated/ for alarm provider mode + subtag types
Keeps committed generated C# in sync with the .proto change in 1d85db7
(AlarmProviderMode, AlarmSubtagTarget, AlarmFailoverConfig, AlarmProviderStatus,
OnAlarmProviderModeChangedEvent, degraded/source_provider fields).
2026-06-13 09:10:08 -04:00
Joseph Doherty b10e103bcf worker(alarms): fix net48 build (init->set, usings), token-boundary name parse, acked latch, dup-address guard, tests 2026-06-13 09:05:58 -04:00
Joseph Doherty 348ab16456 worker(alarms): subtag value-source seam + synthesis state machine 2026-06-13 08:57:28 -04:00
Joseph Doherty c16f016f0a test(contracts): round-trip provider status + degraded provenance 2026-06-13 08:56:13 -04:00
Joseph Doherty 1d85db7b4e contracts(gateway): AlarmProviderMode, subtag watch-list, provider status, degraded provenance, mode-changed event 2026-06-13 08:53:02 -04:00
Joseph Doherty 5ea5618315 docs: implementation plan for alarm subtag-monitoring fallback
18 TDD tasks across contracts, worker (SubtagAlarmConsumer + FailoverAlarmConsumer),
gateway (GR-SQL watch-list discovery, monitor mode reflection, metrics, dashboard),
and docs. Grounded in current signatures; parity-preserving (worker-side synthesis).
2026-06-13 08:44:42 -04:00
Joseph Doherty 38a0ad8ab4 docs: design for alarmmgr→subtag alarm-provider fallback
Auto-failover/failback between the wnwrap alarmmgr consumer and a new
worker-side SubtagAlarmConsumer that advises alarm subtags and synthesizes
transitions. GR-SQL+config watch-list discovery, ack via ack-comment write,
degraded state surfaced in the gRPC contract and dashboard/metrics.
2026-06-13 08:35:18 -04:00
Joseph Doherty 5df2ef0d1e chore(theme): bump ZB.MOM.WW.Theme 0.3.0 -> 0.3.1 (interactive-render nav fix) 2026-06-05 07:19:11 -04:00
Joseph Doherty e5785fd769 chore(theme): consume ZB.MOM.WW.Theme 0.3.0 (nav/login kit fixes) 2026-06-05 05:13:06 -04:00
Joseph Doherty 22370ca4da docs(glauth): repoint glauth.md at the shared GLAuth on 10.100.0.35
No more per-box C:\publish\glauth NSSM service — dev/test LDAP is the shared
zb-shared-glauth on 10.100.0.35:3893 (dc=zb,dc=local). Provisioning now via
scadaproj/infra/glauth/config.toml. Old localhost/NSSM procedures kept as
retired reference; test users multi-role/gw-viewer.
2026-06-04 16:38:24 -04:00
Joseph Doherty e0a3fbf35b fix(dashboard)!: move login POST to /auth/login to resolve AmbiguousMatchException
The themed Blazor <LoginCard> page (Components/Pages/Login.razor, @page "/login")
registers a Razor Components endpoint that matches ALL HTTP methods. The credential
form POSTed to /login, where MapPost("/login") also matched — so every POST /login
threw Microsoft.AspNetCore.Routing.Matching.AmbiguousMatchException (HTTP 500),
breaking dashboard login for every user. It was latent because the dashboard was only
ever reached via the AllowAnonymousLocalhost bypass on the host box.

Move the credential POST to a distinct /auth/login route (mirroring ScadaBridge, which
never collided because it posts to /auth/login). GET /login stays the Blazor page; the
cookie LoginPath stays /login. Adds a registration assertion pinning DashboardLoginPost
to /auth/login as the regression guard.

Files: Login.razor (LoginCard Action), DashboardEndpointRouteBuilderExtensions (MapPost
route), GatewayApplicationTests (route assertion).
2026-06-04 14:01:05 -04:00
Joseph Doherty 161ed6f80d chore(theme): bump ZB.MOM.WW.Theme 0.2.0 -> 0.2.1 (desktop app-shell render fix) 2026-06-04 10:23:44 -04:00
Joseph Doherty e57d864ab2 fix(dashboard): make dashboard auth cookie name configurable
The dashboard auth cookie name was hardcoded to the constant
DashboardAuthenticationDefaults.CookieName (MxGatewayDashboard). Browser
cookies are scoped by host+path but NOT by port, so two gateway instances
sharing a hostname would clobber each other's dashboard session under the
shared name.

Add DashboardOptions.CookieName (MxGateway:Dashboard:CookieName); null/blank
keeps the canonical default. Applied in the existing dashboard cookie
PostConfigure (runs after the inline AddCookie default, so it wins). Behaviour
is unchanged when unset. Adds a Tests case for the override.
2026-06-03 13:11:29 -04:00
Joseph Doherty 5539ec8542 chore(dashboard): prune dead sidebar + orphaned login CSS from site.css
Removed the dead .sidebar nav block (replaced by the kit's .side-rail shell) and
the orphaned .dashboard-login/.login-card rules (the /login page now uses the
kit's <LoginCard>). Kept .app-bar (still used by the /denied page header) and the
.chip white-space override (emitted by StatusPill); corrected the now-stale
app-bar comment. 106 lines removed; builds clean.
2026-06-03 04:37:23 -04:00
Joseph Doherty 73e54e252d feat(dashboard): Blazor LoginCard page reusing the hardened /login endpoint 2026-06-03 03:56:51 -04:00
Joseph Doherty 70d959bd9b refactor(dashboard): StatusBadge delegates to ZB.MOM.WW.Theme StatusPill 2026-06-03 03:51:45 -04:00
Joseph Doherty 0c5b796e2e feat(dashboard): split MainLayout into ZB.MOM.WW.Theme ThemeShell + kit nav 2026-06-03 03:49:34 -04:00
Joseph Doherty 47dc9d865f refactor(dashboard): drop vendored theme.css/fonts/nav-state.js; keep app-only CSS in site.css
Repoint the server-rendered sign-in/fallback HTML (DashboardEndpointRouteBuilderExtensions) from /css/theme.css to the kit's _content/ZB.MOM.WW.Theme/css/{theme,layout}.css, mirroring ThemeHead, since that static page cannot use the Razor component.
2026-06-03 03:46:37 -04:00
Joseph Doherty 4f757e3c0c feat(dashboard): use ZB.MOM.WW.Theme ThemeHead + ThemeScripts 2026-06-03 03:44:18 -04:00
Joseph Doherty 2f0ee4c961 build(server): reference ZB.MOM.WW.Theme 0.2.0 2026-06-03 03:43:17 -04:00
Joseph Doherty 0859d47f75 feat(audit): MxGateway IAuditActorAccessor + dashboard audit Actor = operator principal (keyId→Target) (Phase 3)
Introduce IAuditActorAccessor seam + HttpAuditActorAccessor impl (reads ZbClaimTypes.Username
from IHttpContextAccessor; falls back to Identity.Name / ZbClaimTypes.Name; null when
unauthenticated). Register in DI via DashboardServiceCollectionExtensions.

Wire DashboardApiKeyManagementService: WriteDashboardAuditAsync now accepts the ClaimsPrincipal
user already in scope at each call site; ResolveOperatorActor extracts ZbClaimTypes.Username
(preferred) or Identity.Name. All four dashboard-* events now emit Actor = LDAP operator
username and Target = managed keyId, fixing the semantic gap where both fields held the keyId.

ConstraintEnforcer (gRPC / API-key actor) and CanonicalForwardingApiKeyAuditStore (CLI /
"system"/"cli" fallback) are unchanged.

Tests: DashboardApiKeyManagementServiceTests updated — CreateAuthorizedUser adds ZbClaimTypes.Username
("alice"), all dashboard-* audit assertions updated to Actor = "alice" / Target = "operator01";
new CreateAsync_AuthorizedUser_CanonicalAuditEventHasOperatorAsActorAndKeyIdAsTarget verifies the
canonical AuditEvent directly. New HttpAuditActorAccessorTests (4 cases: username claim, Identity.Name
fallback, unauthenticated → null, no context → null). ConstraintEnforcer tests still assert API-key/anonymous actor.
2026-06-02 15:25:39 -04:00
Joseph Doherty 7ea8358c06 feat(audit): MxGateway local producers (dashboard + constraint-denial) emit canonical AuditEvent with Target/CorrelationId (Task 2.3 #6) 2026-06-02 10:13:54 -04:00
Joseph Doherty a5944bbe5d feat(audit): MxGateway canonical SQLite audit_event store + IAuditWriter + IApiKeyAuditStore->canonical adapter (Task 2.3) 2026-06-02 10:10:38 -04:00
Joseph Doherty 04bce3ff9f feat(auth)!: MxGateway canonical dashboard roles — Admin→Administrator (Task 1.7)
Standardize the dashboard role VALUE on the canonical six: Admin→Administrator
(Viewer unchanged). Pure value rename via DashboardRoles.Admin constant +
appsettings GroupToRole; the GatewayOptionsValidator allowed-set/message track
the constant so they now require 'Administrator' or 'Viewer'. Enforcement is
unchanged — Administrator authorizes exactly what Admin did.

Dashboard roles are derived at login from LDAP groups via GroupToRole and are
never persisted to the SQLite auth store, so no DB migration/seed change.

UNTOUCHED: the separate gRPC API-key scope GatewayScopes.Admin = "admin"
(lowercase) and every "admin" scope literal — a distinct data-plane system.
2026-06-02 07:22:42 -04:00
Joseph Doherty 9572045787 chore(auth): MxGateway unify dev LDAP base DN to dc=zb,dc=local (Task 1.6) 2026-06-02 06:44:38 -04:00
Joseph Doherty 7e1af37eb1 feat(auth): MxGateway dashboard adopt ZbClaimTypes + ZbCookieDefaults, keep cookie name (Task 1.5)
- DashboardAuthenticator.CreatePrincipal: emit ZbClaimTypes.Username ("zb:username") with
  the login username, ZbClaimTypes.DisplayName ("zb:displayname") with the display name,
  ZbClaimTypes.Name (== ClaimTypes.Name) for Identity.Name resolution, ZbClaimTypes.Role
  (== ClaimTypes.Role) for IsInRole/[Authorize]. Keep ClaimTypes.NameIdentifier for back-compat
  read-sites; keep mxgateway:ldap_group unchanged (MxGateway-specific, no ZbClaimType for groups).
  ClaimsIdentity built with nameType=ZbClaimTypes.Name, roleType=ZbClaimTypes.Role.
- DashboardServiceCollectionExtensions.AddGatewayDashboard: route cookie hardening through
  ZbCookieDefaults.Apply(requireHttps:true, idleTimeout:8h); set cookie name/path/redirects
  after Apply; PostConfigure still overrides SecurePolicy per RequireHttpsCookie setting.
- DashboardAuthenticatorTests: add AuthenticateAsync_Success_EmitsCanonicalZbClaims asserting
  zb:username, zb:displayname, ZbClaimTypes.Role per role, Identity.Name, and ldap_group preserved.
2026-06-02 06:10:48 -04:00
Joseph Doherty 05009d7370 feat(auth): cut MxGateway API keys over to ZB.MOM.WW.Auth.ApiKeys 0.1.2; keep constraint enforcement+gRPC+CLI on top (Task 1.3) 2026-06-02 02:08:38 -04:00
Joseph Doherty f4dc11bae4 fix(auth): MxGateway 1.2 review fixes — group-claim doc, dedup LdapOptions, 0.1.1 pin 2026-06-02 01:28:57 -04:00
Joseph Doherty c3b466e13d feat(auth): cut MxGateway dashboard LDAP over to ZB.MOM.WW.Auth.Ldap; roles via IGroupRoleMapper (Task 1.2/1.4) 2026-06-02 00:51:10 -04:00
Joseph Doherty 792e3f9445 feat(auth): add IGroupRoleMapper<string> seam (Task 1.1) 2026-06-02 00:31:00 -04:00
Joseph Doherty ae281d06bb build: add ZB.MOM.WW.Auth/Audit feed mapping
Maps ZB.MOM.WW.Auth, ZB.MOM.WW.Auth.*, ZB.MOM.WW.Audit to the gitea feed.
PackageReferences (inline Version=) added during Phase 1/2 adoption.
2026-06-02 00:17:10 -04:00
Joseph Doherty 3ca2799c90 fix: tighten MxGateway Ldap:Port to 1-65535; catch IOException in path validation
Defect 1: ValidateLdap used AddIfNotPositive for Port, accepting any value
> 0 including 70000. Replaced with builder.Port() from the shared
ZB.MOM.WW.Configuration library, which enforces the 1-65535 TCP range and
emits "MxGateway:Ldap:Port must be between 1 and 65535 (was {value})".

Defect 2: AddIfInvalidPath only caught ArgumentException, NotSupportedException,
and PathTooLongException from Path.GetFullPath. On macOS/Linux a path containing
an embedded null throws IOException, which escaped the catch block and caused
Validate() to throw instead of returning a failure. Added catch (IOException).

Tests: added Validate_Fails_WhenLdapPortIsZero, Validate_Fails_WhenLdapPortExceedsMaximum,
and Validate_Succeeds_WhenLdapEnabledWithValidPort to cover the new range boundary.
2026-06-01 22:45:16 -04:00
Joseph Doherty 459a88b3e7 refactor: adopt ZB.MOM.WW.Configuration in MxGateway (behaviour-preserving) 2026-06-01 18:22:21 -04:00
Joseph Doherty 437ab65fc1 build: add ZB.MOM.WW.Configuration feed mapping + version pin 2026-06-01 18:10:27 -04:00
Joseph Doherty 679562e5ed Merge feat/telemetry-followons: telemetry follow-ons for MxAccessGateway
Metric normalization: meter MxGateway.Server -> ZB.MOM.WW.MxGateway and the 3
duration histograms ms -> s (safe: never Prometheus-exported before). Config-driven
OTLP exporter opt-in (default Prometheus). Metrics.md synced; doc-review artifacts
gitignored.
2026-06-01 17:17:31 -04:00
Joseph Doherty dbf550da8b docs(mxgateway): sync Metrics.md to renamed meter + seconds histogram units 2026-06-01 16:48:46 -04:00
Joseph Doherty 3965a7741e feat(mxgateway): config-driven OTLP exporter opt-in (default Prometheus) 2026-06-01 16:44:40 -04:00
Joseph Doherty abb2cfb84b feat(mxgateway): normalize metrics — meter ZB.MOM.WW.MxGateway + histograms in seconds 2026-06-01 16:39:56 -04:00
Joseph Doherty 4e0d8ccfed chore(mxgateway): gitignore CommentChecker doc-review artifacts 2026-06-01 16:34:46 -04:00
Joseph Doherty a935aa8b7c Merge feat/adopt-zb-telemetry: adopt ZB.MOM.WW.Telemetry across MxAccessGateway
Full MEL->Serilog migration via AddZbSerilog; GatewayLogRedactor exposed through
the shared ILogRedactor seam; GatewayMetrics now exports via AddZbTelemetry + new
/metrics (meter name MxGateway.Server + ms histogram units unchanged; rename/unit
conversion deferred). Behaviour-preserving.
2026-06-01 16:05:41 -04:00
Joseph Doherty 9912389fa1 feat(mxgateway): export GatewayMetrics via AddZbTelemetry + /metrics (name/units unchanged) 2026-06-01 15:53:46 -04:00
Joseph Doherty f1129b969d feat(mxgateway): expose GatewayLogRedactor via shared ILogRedactor seam 2026-06-01 15:49:32 -04:00
Joseph Doherty c51b6f9ce4 feat(mxgateway): adopt AddZbSerilog — MEL→Serilog provider swap (behaviour-preserving) 2026-06-01 15:43:10 -04:00
Joseph Doherty e39972357b config(mxgateway): translate MEL Logging section to Serilog 2026-06-01 15:32:38 -04:00
Joseph Doherty 9ad17e2964 build(mxgateway): reference ZB.MOM.WW.Telemetry + Serilog packages 2026-06-01 15:29:43 -04:00
Joseph Doherty ef0a883a81 Merge feat/adopt-zb-health: ZB.MOM.WW.Health adoption + TLS auto-cert/lenient-client-trust feature 2026-06-01 14:09:24 -04:00
Joseph Doherty 62ba5e9487 feat: map canonical ZB health tiers; replace bypassing /health/live 2026-06-01 13:44:13 -04:00
Joseph Doherty 136614be94 feat: add AuthStoreHealthCheck readiness probe 2026-06-01 13:33:54 -04:00
Joseph Doherty a912bffad5 build: reference ZB.MOM.WW.Health from the Gitea feed 2026-06-01 13:29:39 -04:00
156 changed files with 13140 additions and 3494 deletions
+5
View File
@@ -147,3 +147,8 @@ generated-scratch/
# Keep empty directories with .gitkeep files when needed # Keep empty directories with .gitkeep files when needed
!.gitkeep !.gitkeep
# Documentation review artifacts (CommentChecker output)
*-docs-issues.md
*-docs-fixed.md
*-docs-final.md
+1 -1
View File
@@ -100,7 +100,7 @@ When source code changes, build and test the affected component before reporting
## Design Sources To Consult Before Non-Trivial Changes ## Design Sources To Consult Before Non-Trivial Changes
- `gateway.md` — top-level architecture, command/event surface, IPC envelope, STA thread model, fault handling. - `gateway.md` — top-level architecture, command/event surface, IPC envelope, STA thread model, fault handling.
- `glauth.md`local LDAP server (GLAuth on `localhost:3893`, base DN `dc=lmxopcua,dc=local`) used for dev authn. Pre-provisioned users (`admin/admin123`, `readonly/readonly123`, etc.) and the role→capability mapping live there. - `glauth.md`shared GLAuth LDAP server (`10.100.0.35:3893`, base DN `dc=zb,dc=local`, source of truth `scadaproj/infra/glauth/`) used for dev authn. Dashboard test users (`multi-role`/`password` = Administrator, `gw-viewer`/`password` = Viewer) and the role→capability mapping live there.
- `docs/DesignDecisions.md` — v1 choices (MXAccess COM target `LMXProxyServerClass` from `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`, API-key-in-SQLite auth, fail-fast event backpressure, etc.). - `docs/DesignDecisions.md` — v1 choices (MXAccess COM target `LMXProxyServerClass` from `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`, API-key-in-SQLite auth, fail-fast event backpressure, etc.).
- `docs/GatewayProcessDesign.md`, `docs/MxAccessWorkerInstanceDesign.md`, `docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md` — detailed component designs. - `docs/GatewayProcessDesign.md`, `docs/MxAccessWorkerInstanceDesign.md`, `docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md` — detailed component designs.
- `docs/GatewayConfiguration.md` — full `MxGateway:*` options bound by `GatewayOptions` and validated at startup by `GatewayOptionsValidator`. - `docs/GatewayConfiguration.md` — full `MxGateway:*` options bound by `GatewayOptions` and validated at startup by `GatewayOptionsValidator`.
+146
View File
@@ -790,3 +790,149 @@ Post-ack transition: kind=Clear …
10s cadence held throughout; full proto fields populated correctly; 10s cadence held throughout; full proto fields populated correctly;
ack registered server-side without errors. ack registered server-side without errors.
## Subtag-monitoring fallback provider
When the wnwrap alarm-manager source fails, the gateway worker switches to
`SubtagAlarmConsumer` — a synthetic alarm source that advises each alarm
attribute's subtags via the existing MXAccess `AddItem`/`Advise` pipeline and
derives alarm transitions from the resulting value-change stream. This is a
non-parity, degraded-mode source; every transition and snapshot it produces
carries `degraded = true`.
### Watch-list discovery
`GatewayAlarmMonitor` resolves the subtag watch-list at subscribe time by
calling `IAlarmWatchListResolver.GetAlarmAttributesAsync`. The resolver merges:
1. Galaxy Repository SQL (`GetAlarmAttributesAsync`) — objects that have alarm
extensions in the configured area.
2. Config overrides — `IncludeAttributes` adds explicit entries;
`ExcludeAttributes` removes Repository-derived ones. The config list takes
effect even when `UseGalaxyRepository` is `false`.
The resolved list is a set of `AlarmSubtagTarget` messages sent to the worker
inside `SubscribeAlarmsCommand.watch_list`. Each target carries the composed
MXAccess item addresses for the `InAlarm`, `Acked`, `AckMsg`, and `Priority`
subtags (confirmed AVEVA `AlarmExtension` field names, verified against the live
ZB Galaxy `attribute_definition` rows). The gateway re-runs discovery on its
reconcile cadence and pushes an updated watch-list when the model changes.
### Subtag advise and `LmxSubtagAlarmSource`
`LmxSubtagAlarmSource` (implements `ISubtagAlarmSource`) owns a separate
`LMXProxyServerClass` instance on the worker STA — it does not share the
session's main MXAccess object. For each watch-list target it calls
`AddItem`/`Advise` on the configured subtag addresses. When a subtag value
changes, it raises `ValueChanged` on the STA and `SubtagAlarmConsumer`
forwards it to `SubtagAlarmStateMachine`.
`PollOnce()` on the subtag consumer is a no-op — the path is event-driven
through `Advise`, not poll-driven.
### Synthesis rules
`SubtagAlarmStateMachine` tracks `(active, acked)` per watch-list entry and
emits `MxAlarmTransitionEvent` records on change:
| Subtag change | Emitted transition | Notes |
|---|---|---|
| `InAlarm` false → true | Raise (`UNACK_ALM`) | `original_raise_timestamp` = first observed active time for this episode |
| `Acked` false → true, while `InAlarm` | Acknowledge (`ACK_ALM`) | `AckedDuringEpisode` latch set |
| `InAlarm` true → false | Clear | `AckRtn` if `AckedDuringEpisode` is set, else `UnackRtn` |
| `Acked` true → false, while `InAlarm` | (none) | Latch is NOT cleared; the episode retains its acknowledged status at clear |
The `AckedDuringEpisode` latch addresses out-of-order subtag delivery:
MXAccess does not guarantee the `Acked = false` update arrives before the
`InAlarm = false` update. The latch ensures a clear always emits `ACK_RTN`
when the alarm was acknowledged at any point during the active episode.
`SnapshotActive()` returns one `MxAlarmSnapshotRecord` per currently-active
alarm. State mapping:
- `InAlarm && !Acked` → `UNACK_ALM`
- `InAlarm && Acked` → `ACK_ALM`
- `!InAlarm` → not included in the snapshot
### Synthetic GUID
The alarmmgr provider supplies a native GUID per alarm record. The subtag
provider has no native GUID. `SubtagAlarmConsumer` derives a deterministic
GUID by hashing `alarm_full_reference` (via `SyntheticAlarmGuid.ForReference`).
The same reference always produces the same GUID within a session, so
GUID-based ack routing resolves correctly. The GUID is not stable across
different alarm references or gateway restarts in the sense of matching any
AVEVA-internal GUID.
### Acknowledge in subtag mode
`AlarmDispatcher` routes ack calls by active provider mode:
- **Alarm-manager mode:** `AlarmAckByName` on `wwAlarmConsumerClass` (unchanged).
- **Subtag mode:** `SubtagAlarmConsumer.AcknowledgeByName` resolves the
watch-list entry's `ack_comment_subtag` and issues a `Write(comment)` on
the STA via `LmxSubtagAlarmSource`. Writing the `AckMsg` subtag performs
the acknowledge in AVEVA (`AckMsg` is the confirmed `AlarmExtension` ack-comment
write target).
If the alarm has no writable ack-comment subtag (`AckComment` config key is
empty, or the entry's `ack_comment_subtag` field is empty), the ack call
returns a failure code that the gateway surfaces as `FailedPrecondition`.
`AcknowledgeByGuid` maps the synthetic GUID back to its reference via an
internal dictionary, then calls the same write path.
`SubtagAlarmConsumer.Subscribe` advises the ack-comment subtag alongside the
observed ones (active/acked/priority). This is required: MXAccess rejects a
write to an item that has been added but not advised with `E_INVALIDARG`
("Value does not fall within the expected range"). Advising it at subscribe
time makes it an active item so the later ack write succeeds — its value
changes carry no transition (the state machine ignores unmapped addresses).
### Live validation
The subtag path was validated against live MXAccess on the dev rig
(`DESKTOP-6JL3KKO`, Galaxy `DEV`, `TestMachine_001.TestAlarm001`):
- `…​.InAlarm` → `True` (Boolean), `…​.Acked` → `False` (Boolean),
`…​.Priority` → `500` (Int32), `…​.AckMsg` → string — confirming the field
names **and** the runtime reference shape `<Object>.<AlarmAttr>.<field>`
with **no** intermediate alarm-condition segment.
- `AcknowledgeByName` (AckMsg write) returned `0` once the ack-comment subtag
was advised — confirming the ack-by-comment-write mechanism end to end.
### Fidelity limitations
The following fields are not available or have lower quality in subtag mode:
| Field | Subtag-mode behavior |
|-------|---------------------|
| `alarm_guid` | Synthetic deterministic GUID from `alarm_full_reference`; not an AVEVA-native GUID |
| `original_raise_timestamp` | First observed `active = true` time; no AVEVA-native raise time |
| `transition_timestamp` | `OnDataChange` source timestamp from MXAccess |
| `severity` | From priority subtag if advised; 0 otherwise |
| `category` / `description` | Not populated (no subtag for these) |
| `current_value` / `limit_value` | Not populated unless corresponding subtags are in the watch-list |
| `alarm_type_name` | Not populated |
| `operator_user` / `operator_comment` | Not populated on synthesized raise/clear transitions |
| `retrigger` transition | Not synthesized (no re-alarm counter subtag is observed) |
Every transition and snapshot record carries `degraded = true` and
`source_provider = ALARM_PROVIDER_MODE_SUBTAG`. Clients that require full
fidelity must wait for failback to the alarm manager.
### Provider mode reflection
When `FailoverAlarmConsumer` switches between providers, it raises
`ProviderModeChanged`. `AlarmDispatcher` enqueues an
`OnAlarmProviderModeChangedEvent` (carried as an `MxEvent`), which the
gateway receives and reflects into:
- `AlarmFeedMessage.provider_status` emitted to every `StreamAlarms`
subscriber.
- The `/hubs/alarms` SignalR hub for the dashboard.
- Metrics: `mxgateway.alarms.provider_mode` gauge and
`mxgateway.alarms.provider_switches` counter.
On every switch `GatewayAlarmMonitor` also forces a reconcile
(`QueryActiveAlarms`) against the now-active provider so the gateway cache
reflects the post-switch state without a spurious raise/clear storm.
+52
View File
@@ -411,6 +411,58 @@ a per-channel skip-verify hook:
See [Gateway Configuration — Automatic self-signed certificate](./GatewayConfiguration.md#automatic-self-signed-certificate) See [Gateway Configuration — Automatic self-signed certificate](./GatewayConfiguration.md#automatic-self-signed-certificate)
and the per-client READMEs for the as-built behavior. and the per-client READMEs for the as-built behavior.
## Alarm-Manager to Subtag Fallback
Decision: add a second alarm provider (subtag monitoring) that the worker
activates automatically when the native wnwrap alarm manager fails, and fails
back to automatically when the manager recovers.
### Worker-side synthesis
Synthesis of alarm transitions from subtag value changes happens entirely in
the worker (`SubtagAlarmConsumer` / `SubtagAlarmStateMachine`). The gateway
still forwards only events the worker emits and synthesizes nothing itself.
This satisfies the parity rule even though the subtag path is inherently
non-parity: the parity rule governs where synthesis lives, not whether
synthesis is permitted when the native source is unavailable.
### Degraded is explicit
Every subtag-mode transition carries `degraded = true` on the
`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` proto messages, and the
`AlarmFeedMessage` feed carries an `AlarmProviderStatus` payload on stream
open and on every switch. No client can mistake a subtag-mode alarm for an
authoritative alarmmgr record. Subtag mode has lower fidelity: synthetic
deterministic GUID (SHA-derived from the alarm reference), best-effort
original-raise timestamp, narrower field set. Clients that need full fidelity
must wait for failback.
### Failover trigger
The failover trigger is N consecutive wnwrap COM failures — a `COMException`
thrown by `Subscribe` or `PollOnce`, or a failure HRESULT from
`GetXmlCurrentAlarms2`. A single poll failure does not trigger a switch; the
threshold (default 3, floored at 1) guards against transient COM hiccups. The
counter resets on any clean poll so a flapping provider does not permanently
latch in subtag mode.
### Acknowledge via ack-comment write
In subtag mode, `AcknowledgeAlarm` writes the operator comment to the alarm
attribute's ack-comment subtag (`Fallback:Subtags:AckComment`). The write
performs the native ack in AVEVA. This differs from alarmmgr mode, where
`AlarmAckByName` on `wwAlarmConsumerClass` is called directly. The `AckComment`
subtag name is empty by default; configuring it is required for ack to work in
subtag mode. The exact AVEVA subtag names are not hard-coded — the `Subtags`
config block exists precisely so names are not guessed without validation
against the live MXAccess attribute set.
### Related documentation
- [Gateway Configuration — Alarm Fallback options](./GatewayConfiguration.md#alarm-fallback-options)
- [Alarm Client Discovery — Subtag provider](./AlarmClientDiscovery.md)
- [gRPC Contract — provider_status and degraded fields](./Grpc.md)
## Later Revisit Items ## Later Revisit Items
These are explicit post-v1 revisit items, not open blockers: These are explicit post-v1 revisit items, not open blockers:
+70
View File
@@ -148,6 +148,7 @@ the affected stream while the MXAccess session remains active.
| `MxGateway:Dashboard:Enabled` | `true` | Enables Blazor Server dashboard route mapping. The dashboard mounts at the host root (`/`); there is no separate path-base prefix. | | `MxGateway:Dashboard:Enabled` | `true` | Enables Blazor Server dashboard route mapping. The dashboard mounts at the host root (`/`); there is no separate path-base prefix. |
| `MxGateway:Dashboard:AllowAnonymousLocalhost` | `true` | Allows loopback dashboard requests to bypass the dashboard cookie requirement for local development. Remote requests still require dashboard authentication. | | `MxGateway:Dashboard:AllowAnonymousLocalhost` | `true` | Allows loopback dashboard requests to bypass the dashboard cookie requirement for local development. Remote requests still require dashboard authentication. |
| `MxGateway:Dashboard:RequireHttpsCookie` | `true` | Sets the dashboard auth cookie's secure policy. `true` keeps `CookieSecurePolicy.Always` — the cookie is only sent over HTTPS, which matches a production HTTPS deployment. Set to `false` for plain-HTTP dev deployments to use `CookieSecurePolicy.SameAsRequest`; the cookie is still flagged Secure on HTTPS requests, but it can round-trip over HTTP. Browsers drop Secure cookies set over HTTP from non-localhost hosts, so leaving this `true` while serving the dashboard over plain HTTP will break login from any remote browser. | | `MxGateway:Dashboard:RequireHttpsCookie` | `true` | Sets the dashboard auth cookie's secure policy. `true` keeps `CookieSecurePolicy.Always` — the cookie is only sent over HTTPS, which matches a production HTTPS deployment. Set to `false` for plain-HTTP dev deployments to use `CookieSecurePolicy.SameAsRequest`; the cookie is still flagged Secure on HTTPS requests, but it can round-trip over HTTP. Browsers drop Secure cookies set over HTTP from non-localhost hosts, so leaving this `true` while serving the dashboard over plain HTTP will break login from any remote browser. |
| `MxGateway:Dashboard:CookieName` | `MxGatewayDashboard` | Dashboard auth cookie name. Leave unset (null/blank) to use the default. Override it to give a distinct name to a gateway that shares a hostname with another gateway instance: browser cookies are scoped by host+path but **not** by port, so two instances on the same host would otherwise clobber each other's dashboard session under a shared cookie name. Changing it signs out existing dashboard sessions on next deploy. |
| `MxGateway:Dashboard:SnapshotIntervalMilliseconds` | `1000` | Dashboard snapshot refresh interval used by the snapshot SignalR hub and the pages that subscribe to it. | | `MxGateway:Dashboard:SnapshotIntervalMilliseconds` | `1000` | Dashboard snapshot refresh interval used by the snapshot SignalR hub and the pages that subscribe to it. |
| `MxGateway:Dashboard:RecentFaultLimit` | `100` | Maximum number of fault summaries projected into each dashboard snapshot. | | `MxGateway:Dashboard:RecentFaultLimit` | `100` | Maximum number of fault summaries projected into each dashboard snapshot. |
| `MxGateway:Dashboard:RecentSessionLimit` | `200` | Maximum number of session summaries projected into each dashboard snapshot. | | `MxGateway:Dashboard:RecentSessionLimit` | `200` | Maximum number of session summaries projected into each dashboard snapshot. |
@@ -229,6 +230,75 @@ behavior.
The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
`StreamAlarms` are session-less RPCs served by the monitor. `StreamAlarms` are session-less RPCs served by the monitor.
### Alarm fallback options
The `Fallback` sub-section controls how the alarm feed selects between the
native wnwrap alarm-manager provider and the subtag-monitoring fallback.
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Alarms:Fallback:Mode` | `Auto` | Provider selection mode. `Auto` uses the alarm manager as primary and fails over to subtag monitoring after consecutive COM failures, then fails back automatically. `ForceAlarmManager` disables failover. `ForceSubtag` forces subtag monitoring on from startup. Values are case-insensitive. |
| `MxGateway:Alarms:Fallback:ConsecutiveFailureThreshold` | `3` | Number of consecutive wnwrap COM failures (`COMException` or failure HRESULT from `Subscribe` / `GetXmlCurrentAlarms2`) before the monitor switches to subtag mode. Floored at 1. |
| `MxGateway:Alarms:Fallback:FailbackProbeIntervalSeconds` | `30` | While in subtag mode, how often (in seconds) the monitor probes the wnwrap provider to detect recovery. Floored at 1. |
| `MxGateway:Alarms:Fallback:FailbackStableProbes` | `3` | Number of consecutive clean wnwrap probes required before the monitor switches back to the alarm manager. Floored at 1. |
| `MxGateway:Alarms:Fallback:Discovery:UseGalaxyRepository` | `true` | When `true`, the monitor queries the Galaxy Repository SQL database to build the subtag watch-list for the configured area. |
| `MxGateway:Alarms:Fallback:Discovery:Area` | _(empty)_ | Galaxy area to scope the Repository query to. Falls back to `MxGateway:Alarms:DefaultArea` when empty. Ignored when `UseGalaxyRepository` is `false`. |
| `MxGateway:Alarms:Fallback:Discovery:IncludeAttributes` | _(empty)_ | Explicit MXAccess attribute paths to add to the subtag watch-list, supplementing (or replacing, when `UseGalaxyRepository` is `false`) the Repository-derived list. |
| `MxGateway:Alarms:Fallback:Discovery:ExcludeAttributes` | _(empty)_ | Attribute paths to remove from the Repository-derived watch-list. Ignored when `UseGalaxyRepository` is `false`. |
| `MxGateway:Alarms:Fallback:Subtags:Active` | `InAlarm` | Subtag name for the in-alarm boolean. Confirmed AVEVA `AlarmExtension` field name. |
| `MxGateway:Alarms:Fallback:Subtags:Acked` | `Acked` | Subtag name for the acknowledged boolean. Confirmed AVEVA `AlarmExtension` field name. |
| `MxGateway:Alarms:Fallback:Subtags:AckComment` | `AckMsg` | Subtag name for the acknowledgement comment write target. Writing this subtag performs the acknowledge in AVEVA. Confirmed AVEVA `AlarmExtension` field name. When empty, the ack-comment write path is disabled. |
| `MxGateway:Alarms:Fallback:Subtags:Priority` | `Priority` | Subtag name for the alarm priority / severity value. Confirmed AVEVA `AlarmExtension` field name. |
Validation rules:
- `Mode` must be `Auto`, `ForceAlarmManager`, or `ForceSubtag` (case-insensitive).
- `Mode = ForceSubtag` with both `UseGalaxyRepository = false` and an empty
`IncludeAttributes` list produces a startup validation warning: the subtag
provider has no attributes to advise.
- `ConsecutiveFailureThreshold`, `FailbackProbeIntervalSeconds`, and
`FailbackStableProbes` are floored at 1 by `GatewayOptionsValidator`.
Full example with non-default fallback settings:
```json
{
"MxGateway": {
"Alarms": {
"Enabled": true,
"SubscriptionExpression": "\\\\SCADA01\\Galaxy!PlantArea",
"DefaultArea": "PlantArea",
"ReconcileIntervalSeconds": 30,
"Fallback": {
"Mode": "Auto",
"ConsecutiveFailureThreshold": 3,
"FailbackProbeIntervalSeconds": 30,
"FailbackStableProbes": 3,
"Discovery": {
"UseGalaxyRepository": true,
"Area": "",
"IncludeAttributes": [],
"ExcludeAttributes": []
},
"Subtags": {
"Active": "InAlarm",
"Acked": "Acked",
"AckComment": "AckMsg",
"Priority": "Priority"
}
}
}
}
}
```
The defaults (`InAlarm`/`Acked`/`AckMsg`/`Priority`) are the confirmed AVEVA
`AlarmExtension` primitive field names, verified by querying the live ZB Galaxy
`attribute_definition` rows. The `Subtags` block exists so names can be
overridden without a code change if a site's alarm template uses different
attribute names. See `docs/AlarmClientDiscovery.md` for the synthesis rules that
depend on these names.
## Host Endpoints and Transport Security (Kestrel) ## Host Endpoints and Transport Security (Kestrel)
The listening endpoints are **not** part of the `MxGateway` section. The gateway The listening endpoints are **not** part of the `MxGateway` section. The gateway
+67
View File
@@ -94,6 +94,73 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree. `StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.
#### Provider status on the alarm feed
`AlarmFeedMessage` has a fourth `payload` case, `provider_status`, carrying
an `AlarmProviderStatus` message:
```protobuf
message AlarmProviderStatus {
AlarmProviderMode mode = 1;
bool degraded = 2; // true whenever mode == SUBTAG
string reason = 3; // human-readable switch reason
google.protobuf.Timestamp since = 4;
}
```
The gateway emits `provider_status` once when a client first subscribes
(immediately after the initial snapshot and before the first live transition)
and again on every failover or failback. A late-joining client therefore
always learns the current provider mode without waiting for the next switch.
`AlarmProviderMode` is an enum with three values:
| Value | Meaning |
|-------|---------|
| `ALARM_PROVIDER_MODE_UNSPECIFIED` (0) | Default / unset |
| `ALARM_PROVIDER_MODE_ALARMMGR` (1) | Native wnwrap alarm-manager source |
| `ALARM_PROVIDER_MODE_SUBTAG` (2) | Subtag-monitoring fallback (degraded) |
#### Degraded and source-provider fields on transitions and snapshots
`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` both carry two new fields:
- `bool degraded` (field 14) — `true` when the record came from the subtag
fallback, not the native alarmmgr.
- `AlarmProviderMode source_provider` (field 15) — which provider produced
this record (`ALARMMGR` or `SUBTAG`).
Both fields are proto3 defaults (`false` / `UNSPECIFIED`) in alarmmgr mode,
so existing clients that do not read them continue to function without change.
Clients that care about provenance — for example, an OPC UA server that
applies different quality flags to degraded alarms — should inspect `degraded`
before consuming the transition.
Subtag-mode records are a non-parity source. They carry synthetic GUIDs,
best-effort timestamps, and reduced field coverage. See
`docs/AlarmClientDiscovery.md` for the full fidelity table.
#### Provider-mode-changed event
The worker emits `OnAlarmProviderModeChangedEvent` (family
`MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED`) on each switch between
providers:
```protobuf
message OnAlarmProviderModeChangedEvent {
AlarmProviderMode mode = 1;
string reason = 2;
int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
google.protobuf.Timestamp at = 4;
}
```
This event arrives on the `StreamEvents` stream of the alarm monitor's
internal gateway session (not on client sessions). `GatewayAlarmMonitor`
consumes it and reflects the new mode into the `StreamAlarms` feed's
`provider_status`, the dashboard hub, and metrics. Client sessions do not
receive this event directly.
## Validation Rules ## Validation Rules
`MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free. `MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free.
+6 -6
View File
@@ -4,7 +4,7 @@ The metrics subsystem exposes counters, histograms, and observable gauges that d
## Overview ## Overview
`GatewayMetrics` is a singleton (registered in `GatewayApplication.cs`) that owns a single `Meter` named `ZB.MOM.WW.MxGateway.Server` and a set of synchronised counters, histograms, and observable gauges. Subsystems call typed mutator methods (`SessionOpened`, `CommandFailed`, `EventReceived`, etc.) rather than touching the `Meter` directly, which keeps the OpenTelemetry instrument names and tag conventions in one place. A `lock (_syncRoot)` block guards the scalar fields used by `GetSnapshot`, while per-event maps use `ConcurrentDictionary<string, long>` so the hot event path avoids the lock. `GatewayMetrics` is a singleton (registered in `GatewayApplication.cs`) that owns a single `Meter` named `ZB.MOM.WW.MxGateway` and a set of synchronised counters, histograms, and observable gauges. Subsystems call typed mutator methods (`SessionOpened`, `CommandFailed`, `EventReceived`, etc.) rather than touching the `Meter` directly, which keeps the OpenTelemetry instrument names and tag conventions in one place. A `lock (_syncRoot)` block guards the scalar fields used by `GetSnapshot`, while per-event maps use `ConcurrentDictionary<string, long>` so the hot event path avoids the lock.
## Meter and OpenTelemetry Compatibility ## Meter and OpenTelemetry Compatibility
@@ -13,7 +13,7 @@ The meter name is exposed as a constant so that hosting code can register it wit
```csharp ```csharp
public sealed class GatewayMetrics : IDisposable public sealed class GatewayMetrics : IDisposable
{ {
public const string MeterName = "ZB.MOM.WW.MxGateway.Server"; public const string MeterName = "ZB.MOM.WW.MxGateway";
public GatewayMetrics() public GatewayMetrics()
{ {
@@ -50,12 +50,12 @@ All counters are `Counter<long>`. Tag values come from the call sites listed und
### Histograms ### Histograms
Histograms record durations in milliseconds (the `unit` argument on `CreateHistogram`): Histograms record durations in seconds (the `unit` argument on `CreateHistogram`):
```csharp ```csharp
_workerStartupLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.workers.startup.duration", "ms"); _workerStartupLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.workers.startup.duration", "s");
_commandLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.commands.duration", "ms"); _commandLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.commands.duration", "s");
_eventStreamSendLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.events.stream_send.duration", "ms"); _eventStreamSendLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.events.stream_send.duration", "s");
``` ```
| Instrument | Tags | What it measures | | Instrument | Tags | What it measures |
@@ -0,0 +1,302 @@
# Alarm Subtag-Monitoring Fallback — Design
**Date:** 2026-06-13
**Status:** Approved (brainstorming), ready for implementation planning
**Branch:** `feat/alarm-subtag-fallback`
## Problem
The gateway's central alarm feed (`GatewayAlarmMonitor` → worker
`WnWrapAlarmConsumer`) depends on the AVEVA wnwrap COM consumer
(`WNWRAPCONSUMERLib.wwAlarmConsumerClass`), which polls `GetXmlCurrentAlarms2`
on the worker STA. That provider can fail at the COM boundary (the older
`aaAlarmManagedClient` crashed on FILETIME marshaling; wnwrap can still return
failure HRESULTs or throw `COMException`). When it does, the gateway loses all
alarm visibility.
This design adds a **second alarm source** — direct monitoring of each alarm
attribute's subtags (`.active`, `.acked`, …) via the existing MXAccess
`AddItem`/`Advise` pipeline — and **fails over to it automatically when the
wnwrap provider breaks, then fails back automatically when it recovers**. The
subtag source can also be forced on by config.
## Decisions (locked during brainstorming)
| Decision | Choice |
|---|---|
| Failover model | **Auto-failover + auto-failback** (both directions, runtime) |
| Watch-list source | **Galaxy Repository SQL discovery + config override** |
| Acknowledge in subtag mode | **Write the operator comment to the alarm's ack-comment subtag** (the write performs the ack) |
| Failure signal | **N consecutive wnwrap COM failures** (Subscribe / `GetXmlCurrentAlarms2` throws or returns a failure HRESULT) |
| Degraded-state visibility | **Both** — explicit field in the gRPC contract **and** dashboard + metrics |
| Synthesis location | **Worker-side** (`Approach A`) — keeps the parity rule "the gateway forwards only events the worker emits; it never synthesizes events" |
## Core principle
Subtag monitoring is, by definition, a **non-parity, lower-fidelity** alarm
source: it synthesizes alarm transitions from raw data changes, has no native
alarm GUID, no native original-raise timestamp, and a narrower field set. Per
`CLAUDE.md`, synthesizing events is allowed only as an explicit opt-in
non-parity mode. This design satisfies that by (a) doing the synthesis **inside
the worker** (so the gateway still only forwards worker-emitted events) and
(b) marking every degraded event and the whole feed as degraded so no client
mistakes it for the authoritative alarmmgr feed.
## Architecture
```
GATEWAY (.NET 10, x64)
┌─────────────────────────────────────────────────────────────────┐
│ GatewayAlarmMonitor (BackgroundService) │
│ • resolves watch-list: Galaxy Repository SQL + config override │
│ • arms the worker with the watch-list at subscribe time │
│ • consumes AlarmProviderModeChanged → reflects mode into feed, │
│ /hubs/alarms dashboard hub, and metrics │
│ • forces a cache reconcile (QueryActiveAlarms) on every switch │
└───────────────────────────────┬───────────────────────────────────┘
│ IPC (WorkerEnvelope frames)
│ · SubscribeAlarms{ watch_list, failover cfg }
│ · AlarmProviderModeChanged{ mode, reason, hresult }
│ · OnAlarmTransitionEvent (degraded flag set in subtag mode)
WORKER (.NET FW 4.8, x86, STA)
┌─────────────────────────────────────────────────────────────────┐
│ AlarmDispatcher → FailoverAlarmConsumer : IMxAccessAlarmConsumer │
│ ├─ primary : WnWrapAlarmConsumer (wnwrap COM poll, unchanged) │
│ └─ standby : SubtagAlarmConsumer (AddItem/Advise on subtags) │
│ │
│ FailoverAlarmConsumer owns the state machine: │
│ PrimaryActive ──(N consecutive wnwrap COM failures)──▶ Degraded │
│ Degraded ──(M consecutive clean wnwrap probe polls)──▶ Primary │
│ on each switch: snapshot the now-active provider, hand off │
└─────────────────────────────────────────────────────────────────┘
```
The failover state machine lives **worker-local** so the switch is instant — no
IPC round-trip at the moment alarmmgr dies. The gateway *arms* the standby
consumer up front (passes the watch-list at subscribe time) so it is ready
before it is ever needed.
## Components
### Worker (`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/`)
**`SubtagAlarmConsumer : IMxAccessAlarmConsumer` (new)** — the standby provider.
- On `Subscribe`, instead of wnwrap registration it `AddItem`/`Advise`s the
configured subtags for each watch-list entry on the existing STA (reuses the
worker's item-subscription machinery). Per attribute it advises at minimum
`.active` and `.acked`; optionally `.priority`/severity, `.descr`, value/limit
if present.
- Converts each `OnDataChange` into the same `MxAlarmTransitionEvent` the wnwrap
consumer emits, via the synthesis rules below, and raises
`AlarmTransitionEmitted`. Marks each as **degraded**.
- `SnapshotActiveAlarms()` returns the currently-active set computed from
last-known subtag values.
- `AcknowledgeByName(...)` resolves the watch-list entry's ack-comment subtag and
issues a `Write(comment)` on the STA. `AcknowledgeByGuid(...)` maps the
synthetic GUID (see below) back to a reference, then does the same. If the
attribute exposes no writable ack-comment subtag, returns a failure code that
the gateway surfaces as `FailedPrecondition`.
- `PollOnce()` is a no-op (subtag mode is event-driven via Advise).
**`FailoverAlarmConsumer : IMxAccessAlarmConsumer` (new)** — composite + state
machine. Owns the wnwrap consumer (primary) and the subtag consumer (standby),
forwards `AlarmTransitionEmitted` from whichever child is active, and raises a
new `ProviderModeChanged` event on every switch.
- **Failure counting:** wraps `Subscribe`/`PollOnce` on the primary; a thrown
`COMException` or a failure HRESULT increments a consecutive-failure counter,
reset to zero on any clean poll.
- **Failover** (`PrimaryActive → Degraded`): at `ConsecutiveFailureThreshold`
(default 3), ensures the standby is subscribed (it was armed at startup), sets
active = standby, snapshots the standby's active set for hand-off, and emits
`ProviderModeChanged(SUBTAG, reason, hresult)`.
- **Failback probe** (`Degraded → PrimaryActive`): while degraded, every
`FailbackProbeIntervalSeconds` (default 30) it re-attempts wnwrap
`Subscribe`+`PollOnce` on the STA. After `FailbackStableProbes` (default 3)
consecutive clean polls it switches active = primary, returns the standby to
standby, and emits `ProviderModeChanged(ALARMMGR, "recovered")`.
- **Hand-off:** on every switch it takes `SnapshotActiveAlarms()` from the
now-active provider so the gateway can reconcile and avoid spurious
raise/clear storms.
**`AlarmDispatcher` / `MxAccessAlarmEventSink` / `AlarmCommandHandler`
(changed, minimal)** — `AlarmDispatcher` holds a `FailoverAlarmConsumer` instead
of a bare `WnWrapAlarmConsumer`; it subscribes to `ProviderModeChanged` and
enqueues a mode-changed worker event. The ack path routes by active mode (native
wnwrap ack in alarmmgr mode; ack-comment write in subtag mode), but that routing
is entirely inside the consumer — the dispatcher just calls
`AcknowledgeByName`/`AcknowledgeByGuid`.
### Gateway (`src/ZB.MOM.WW.MxGateway.Server/`)
**Galaxy Repository discovery (new query)** — alongside the existing GR SQL
browse RPCs, a query "attributes that have alarms configured, with their
ack-comment subtag and area", scoped to the configured area. Merged with the
config override (explicit includes/excludes). Produces the watch-list of
`AlarmSubtagTarget`s.
**`GatewayAlarmMonitor` (changed)** — resolves the watch-list at subscribe time
and passes it to the worker; consumes `AlarmProviderModeChanged` and reflects
the current provider mode into (a) the `AlarmFeedMessage` provider-status,
(b) the `/hubs/alarms` dashboard hub, and (c) metrics; forces a reconcile
(`QueryActiveAlarms`) on every switch. Re-runs discovery on its existing
reconcile cadence and pushes an updated watch-list when the model changes.
**`AlarmsOptions` (extended)** — new `Fallback` sub-section (below).
### Contract (`src/ZB.MOM.WW.MxGateway.Contracts/Protos/`)
**`mxaccess_gateway.proto`:**
- `enum AlarmProviderMode { ALARM_PROVIDER_MODE_UNSPECIFIED = 0; ALARMMGR = 1; SUBTAG = 2; }`
- New `AlarmFeedMessage` oneof case `AlarmProviderStatus provider_status`,
carrying `{ AlarmProviderMode mode; bool degraded; string reason;
google.protobuf.Timestamp since; }`. Emitted on stream open and on every
change so a late-joining client immediately learns the mode.
- Add `bool degraded` + `AlarmProviderMode source_provider` to
`OnAlarmTransitionEvent` **and** `ActiveAlarmSnapshot`, so per-item provenance
is visible even mid-stream. All additions are new field numbers — backward
compatible; existing clients ignore them and keep seeing alarms.
**`mxaccess_worker.proto`:**
- Extend the alarm-subscribe command with: `AlarmProviderMode forced_mode`
(`UNSPECIFIED` = auto), `int32 consecutive_failure_threshold`,
`int32 failback_probe_interval_seconds`, `int32 failback_stable_probes`, and
`repeated AlarmSubtagTarget watch_list`, where `AlarmSubtagTarget =
{ string alarm_full_reference; string source_object_reference;
string active_subtag; string acked_subtag; string ack_comment_subtag;
string priority_subtag; }`.
- New worker→gateway event `AlarmProviderModeChanged { AlarmProviderMode mode;
string reason; int32 hresult; google.protobuf.Timestamp at; }`.
> Generated code under `Generated/` and `clients/*/generated*/` is rebuilt from
> these `.proto` files — never hand-edited. Every generated client touched by
> the contract is rebuilt per the source-update workflow.
## Data flow
### Subtag synthesis rules
`SubtagAlarmConsumer` keeps last-known `(active, acked)` per watch-list entry and
emits transitions on change:
| Subtag change | Emitted transition | Notes |
|---|---|---|
| `active` false → true | `RAISE` (state `UNACK_ALM`) | `original_raise_timestamp` = first-observed active time |
| `acked` false → true while `active` | `ACKNOWLEDGE` | `operator_user`/`operator_comment` from ack-comment subtag if advised |
| `active` true → false | `CLEAR` | maps to `AckRtn` if acked at clear, else `UnackRtn` |
| `active` stays true, re-alarm | `RETRIGGER` | **only** if a re-alarm counter subtag exists; otherwise not synthesized (documented limitation) |
Snapshot state mapping for `ActiveAlarmSnapshot.current_state`:
`active && !acked → ACTIVE`, `active && acked → ACTIVE_ACKED`,
`!active → INACTIVE`.
Field degradation in subtag mode:
- `alarm_full_reference` — from the watch-list entry (stable, drives ack-by-ref).
- Synthetic, deterministic GUID derived by hashing `alarm_full_reference` so
GUID-based ack still resolves; flagged `degraded = true`.
- `severity` — from the priority subtag if advised, else 0.
- `original_raise_timestamp` — first-observed active time (best effort).
- `transition_timestamp` — the `OnDataChange` timestamp.
- `category`/`description`/`current_value`/`limit_value` — populated only if the
corresponding subtag is advised; otherwise empty.
### Acknowledge
`AcknowledgeAlarm`/`AcknowledgeAlarmByName` are unchanged at the RPC surface.
`AlarmDispatcher` routes by active provider mode:
- **alarmmgr mode:** native wnwrap `AlarmAckByName`/`AlarmAckByGUID` (unchanged).
- **subtag mode:** resolve the target's `ack_comment_subtag`, `Write` the
operator comment via the existing worker write path on the STA. No writable
ack-comment subtag → `FailedPrecondition`.
### Provider-mode reflection
Worker `AlarmProviderModeChanged` → `GatewayAlarmMonitor` → (a) emit/refresh
`AlarmFeedMessage.provider_status` to every `StreamAlarms` subscriber, (b) push
to `/hubs/alarms`, (c) update metrics, (d) force a reconcile.
## Error handling
- **Both providers down** (subtag advise also failing): the monitor stays
faulted and keeps retrying both; acknowledge returns `Unavailable`. No silent
data loss — the feed reports degraded with reason.
- **Empty watch-list in subtag mode** (GR SQL unavailable, no config override):
log + metric `alarm_fallback_watchlist_empty`; the feed reports degraded +
empty; the gateway keeps re-running discovery on its reconcile cadence and
pushes an updated watch-list when one becomes available.
- **Switch hand-off:** every switch snapshots the now-active provider and
reconciles against the gateway cache to avoid a raise/clear storm.
- **STA affinity:** all subtag advise/write and wnwrap probe calls run on the
worker STA (reuse the existing affinity guard) to satisfy
`ThreadingModel=Apartment`.
### Metrics
- `mxgateway_alarm_provider_mode` (gauge: 1 = alarmmgr, 2 = subtag)
- `mxgateway_alarm_provider_switch_total{from,to,reason}` (counter)
- `mxgateway_alarm_fallback_watchlist_size` (gauge)
## Configuration
```jsonc
"MxGateway": {
"Alarms": {
"Enabled": true,
"SubscriptionExpression": "\\\\DESKTOP-6JL3KKO\\Galaxy!DEV",
"DefaultArea": "DEV",
"ReconcileIntervalSeconds": 30,
"Fallback": {
"Mode": "Auto", // Auto | ForceAlarmManager | ForceSubtag
"ConsecutiveFailureThreshold": 3,
"FailbackProbeIntervalSeconds": 30,
"FailbackStableProbes": 3,
"Discovery": {
"UseGalaxyRepository": true,
"Area": "", // defaults to Alarms.DefaultArea
"IncludeAttributes": [], // explicit additions
"ExcludeAttributes": []
},
"Subtags": {
"Active": "active",
"Acked": "acked",
"AckComment": "", // verified against MXAccess analysis
"Priority": "priority"
}
}
}
}
```
`GatewayOptionsValidator` additions: `Mode = ForceSubtag` with empty discovery
result and no explicit `IncludeAttributes` → startup validation warning;
threshold/interval/probe values floored at sane minimums.
## Open item to confirm during implementation
The exact AVEVA subtag names (`.active`, `.acked`, the ack-comment attribute,
priority) must be confirmed against the MXAccess analysis project
(`C:\Users\dohertj2\Desktop\mxaccess`, `docs/MXAccess-Public-API.md`) and the
live Galaxy before wiring `SubtagAlarmConsumer`. The config `Subtags` block
exists precisely so the resolved names are not hard-coded.
## Testing
| Layer | Tests |
|---|---|
| Worker unit (`MxGateway.Worker.Tests`, x86) | `SubtagAlarmConsumer` synthesis — feed `OnDataChange` sequences, assert raise/ack/clear transitions, snapshot states, degraded flag, synthetic-GUID stability, ack-comment write routing |
| Worker unit | `FailoverAlarmConsumer` state machine — fake wnwrap throwing after K polls: assert switch at threshold, failback after stable probes, `ProviderModeChanged` emitted, no duplicate transitions across switch (hand-off reconcile) |
| Gateway unit (`MxGateway.Tests`, fake worker) | discovery + config-override merge; `GatewayAlarmMonitor` reflects mode into feed + hub; metrics increment on switch |
| Contract | proto round-trip for new fields; existing alarm tests unchanged (alarmmgr-mode regression — parity preserved) |
| Live (opt-in, `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`) | real subtag advise + ack-comment write against a live alarm; GR SQL discovery query against the `ZB` DB (gated like existing GR tests) |
## Docs to update in the same change
`gateway.md` (alarm provider section), `docs/DesignDecisions.md` (record the
fallback decision), `docs/GatewayConfiguration.md` (the `Fallback` block),
`docs/AlarmClientDiscovery.md` (subtag provider + synthesis rules),
`docs/Grpc.md` (the new `provider_status` / `degraded` fields), and any client
READMEs whose generated alarm types gain fields.
@@ -0,0 +1,860 @@
# Alarm Subtag-Monitoring Fallback — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
**Goal:** Add a second alarm source — direct MXAccess subtag monitoring — that the gateway auto-fails-over to when the wnwrap alarmmgr provider breaks, auto-fails-back to when it recovers, and can be forced on by config.
**Architecture:** Worker-side synthesis (parity rule preserved). A new `SubtagAlarmConsumer` (own `LMXProxyServerClass`, `AddItem`/`Advise` on alarm subtags) and a `FailoverAlarmConsumer` composite (state machine over the wnwrap primary + subtag standby) both implement the existing `IMxAccessAlarmConsumer` seam. The gateway resolves the subtag watch-list (Galaxy Repository SQL + config override), arms the worker at subscribe time, and reflects the live provider mode into the gRPC alarm feed, the dashboard hub, and metrics.
**Tech Stack:** .NET 10 (gateway, x64) + .NET Framework 4.8 (worker, x86, STA), protobuf/gRPC, `Microsoft.Data.SqlClient` (Galaxy Repository), SignalR (dashboard), `System.Diagnostics.Metrics`, xUnit (plain `Assert`, no FluentAssertions).
**Design source:** `docs/plans/2026-06-13-alarm-subtag-fallback-design.md`
**Branch:** `feat/alarm-subtag-fallback` (already created)
---
## Conventions for every task
- **TDD:** write the failing test, run it red, implement, run it green, commit.
- **xUnit, plain `Assert.*`**, naming `Subject_Condition_Expected`. Worker fakes are sealed private nested classes that raise events.
- **Build/test commands:**
- Contracts regen: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
- Gateway: `dotnet build src/ZB.MOM.WW.MxGateway.Server` ; `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj`
- Worker (x86): `dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86` ; `dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86`
- Single test: append `--filter FullyQualifiedName~<ClassOrMethod>`
- **Build is strict:** `TreatWarningsAsErrors=true`, nullable enabled. Add XML doc comments on public members (the repo runs a doc checker).
- **Generated code** under `Generated/` is never hand-edited — rebuild the contracts project to regenerate.
- **Namespaces:** worker MxAccess types live in `ZB.MOM.WW.MxGateway.Worker.MxAccess`; proto C# types in `ZB.MOM.WW.MxGateway.Contracts.Proto`.
---
## Phase 0 — Contracts
### Task 1: Worker proto — subtag watch-list, failover config, provider-mode enum
**Classification:** high-risk
**Estimated implement time:** ~4 min
**Parallelizable with:** none (Task 2 imports these types)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` (real `SubscribeAlarmsCommand` at ~line 324; `MxCommand` references it at 123-125)
> **CORRECTION (execution):** The alarm command messages and `MxCommand` live in **`mxaccess_gateway.proto`**, not the worker proto. `mxaccess_worker.proto` *imports* the gateway proto (`WorkerCommand.command` is `mxaccess_gateway.v1.MxCommand`), so the gateway proto is the base and the worker proto needs **no** change. `AlarmProviderMode` and the new types are added to the gateway proto and are visible to worker code as `mxaccess_gateway.v1` types. Tasks 1 and 2 are executed as a single combined edit on this one file.
**Step 1: Add the enum and messages.** In `mxaccess_gateway.proto`, extend the existing `SubscribeAlarmsCommand` message (line 324) and add the new types after it:
```protobuf
// Provider selection / current provider for the alarm feed. Defined here in
// the worker contract because the worker SubscribeAlarmsCommand references it;
// mxaccess_gateway.proto imports this file and reuses the same enum.
enum AlarmProviderMode {
ALARM_PROVIDER_MODE_UNSPECIFIED = 0; // auto: alarmmgr primary, subtag fallback
ALARM_PROVIDER_MODE_ALARMMGR = 1;
ALARM_PROVIDER_MODE_SUBTAG = 2;
}
message SubscribeAlarmsCommand {
string subscription_expression = 1; // existing field — keep
// UNSPECIFIED = auto-failover/failback. ALARMMGR/SUBTAG force one provider.
AlarmProviderMode forced_mode = 2;
// Subtag watch-list resolved by the gateway (GR SQL + config). Empty in pure
// alarmmgr mode; in subtag mode it bounds what the consumer can observe.
repeated AlarmSubtagTarget watch_list = 3;
AlarmFailoverConfig failover = 4;
}
// One alarm attribute the subtag consumer advises. Addresses are full MXAccess
// item references the worker passes straight to AddItem.
message AlarmSubtagTarget {
string alarm_full_reference = 1; // e.g. "Galaxy!Area.Tank01.Level.HiHi"
string source_object_reference = 2; // e.g. "Tank01"
string active_subtag = 3; // item address of the in-alarm boolean
string acked_subtag = 4; // item address of the acknowledged boolean
string ack_comment_subtag = 5; // writable ack-comment attribute (ack write target)
string priority_subtag = 6; // optional severity source; empty if absent
}
message AlarmFailoverConfig {
int32 consecutive_failure_threshold = 1; // wnwrap COM failures before switching (>=1)
int32 failback_probe_interval_seconds = 2; // probe cadence while degraded (>=1)
int32 failback_stable_probes = 3; // clean probes before switching back (>=1)
}
```
`UnsubscribeAlarmsCommand` and `AcknowledgeAlarmCommand` are unchanged.
**Step 2: Regenerate & verify it compiles.**
Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
Expected: build succeeds; generated `AlarmProviderMode`, `AlarmSubtagTarget`, `AlarmFailoverConfig` types appear.
**Step 3: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto
git commit -m "contracts(worker): subtag watch-list + failover config + AlarmProviderMode"
```
---
### Task 2: Gateway proto — provider status on the feed, degraded provenance, mode-changed event
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 1; Task 3 tests both)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` (`OnAlarmTransitionEvent` ~719-771, `ActiveAlarmSnapshot` ~783-803, `AlarmFeedMessage` ~860-870, `MxEvent` family enum + body oneof, `MxEventFamily` enum)
**Step 1: Add degraded provenance to the two alarm payloads.** Append to `OnAlarmTransitionEvent` (next free field 14):
```protobuf
// True when this transition came from the subtag-monitoring fallback rather
// than the native alarmmgr provider — i.e. it was synthesized from data
// changes and carries reduced fidelity (synthetic GUID, no native raise time).
bool degraded = 14;
// Which provider produced this transition.
AlarmProviderMode source_provider = 15;
```
Append the identical two fields to `ActiveAlarmSnapshot` (next free field 14):
```protobuf
bool degraded = 14;
AlarmProviderMode source_provider = 15;
```
**Step 2: Add provider status to the feed oneof.** Add a new oneof case to `AlarmFeedMessage` (next free field 4) and a new message:
```protobuf
message AlarmFeedMessage {
oneof payload {
ActiveAlarmSnapshot active_alarm = 1;
bool snapshot_complete = 2;
OnAlarmTransitionEvent transition = 3;
// Provider-mode status. Emitted once on stream open and again on every
// failover/failback so late joiners learn the current mode immediately.
AlarmProviderStatus provider_status = 4;
}
}
message AlarmProviderStatus {
AlarmProviderMode mode = 1;
bool degraded = 2; // true whenever mode == SUBTAG
string reason = 3; // human-readable switch reason
google.protobuf.Timestamp since = 4;
}
```
**Step 3: Add the worker→gateway mode-changed event to `MxEvent`.** Find the `MxEventFamily` enum and the `MxEvent` body oneof. Add a family member and a body message + oneof case (use the next free family value and the next free `MxEvent` body field number — check the file):
```protobuf
// in MxEventFamily enum:
MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED = <next>;
// new message near OnAlarmTransitionEvent:
message OnAlarmProviderModeChangedEvent {
AlarmProviderMode mode = 1;
string reason = 2;
int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
google.protobuf.Timestamp at = 4;
}
// in MxEvent body oneof:
OnAlarmProviderModeChangedEvent on_alarm_provider_mode_changed = <next>;
```
`AlarmProviderMode` is defined in `mxaccess_worker.proto`; confirm `mxaccess_gateway.proto` already has `import "mxaccess_worker.proto";` (it references `SubscribeAlarmsCommand`, so it does) and reference the enum unqualified or via its package as the existing references do.
**Step 4: Regenerate & verify.**
Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
Expected: build succeeds.
**Step 5: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
git commit -m "contracts(gateway): AlarmProviderStatus feed case, degraded provenance, mode-changed event"
```
---
### Task 3: Proto round-trip tests for the new alarm fields
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (depends on Tasks 1-2)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs`
**Step 1: Add tests** mirroring the existing `Event_RoundTripsOnAlarmTransitionWithFullPayload` style:
```csharp
[Fact]
public void Feed_RoundTripsProviderStatus()
{
var since = Timestamp.FromDateTime(new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));
var original = new AlarmFeedMessage
{
ProviderStatus = new AlarmProviderStatus
{
Mode = AlarmProviderMode.Subtag,
Degraded = true,
Reason = "wnwrap poll failed 3x (HRESULT 0x80004005)",
Since = since,
},
};
var parsed = AlarmFeedMessage.Parser.ParseFrom(original.ToByteArray());
Assert.Equal(original, parsed);
Assert.Equal(AlarmFeedMessage.PayloadOneofCase.ProviderStatus, parsed.PayloadCase);
Assert.True(parsed.ProviderStatus.Degraded);
Assert.Equal(AlarmProviderMode.Subtag, parsed.ProviderStatus.Mode);
}
[Fact]
public void Transition_RoundTripsDegradedProvenance()
{
var t = new OnAlarmTransitionEvent
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Degraded = true,
SourceProvider = AlarmProviderMode.Subtag,
};
var parsed = OnAlarmTransitionEvent.Parser.ParseFrom(t.ToByteArray());
Assert.True(parsed.Degraded);
Assert.Equal(AlarmProviderMode.Subtag, parsed.SourceProvider);
}
```
**Step 2: Run red→green.**
Run: `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ProtobufContractRoundTripTests`
Expected: PASS.
**Step 3: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs
git commit -m "test(contracts): round-trip provider status + degraded provenance"
```
---
## Phase 1 — Worker: subtag consumer + failover
### Task 4: Subtag value-source abstraction + synthesis state holder
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (Task 5 builds on it)
A testable seam so synthesis logic is unit-tested without COM. The COM wiring lands in Task 6.
**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs`
**Step 1: Define the source abstraction.** `ISubtagAlarmSource` advises subtag addresses and raises a normalized value-change callback on the STA:
```csharp
namespace ZB.MOM.WW.MxGateway.Worker.MxAccess;
/// <summary>A change in one advised subtag value, normalized off the COM boundary.</summary>
public sealed class SubtagValueChange
{
/// <summary>The full item address that changed (matches an AlarmSubtagTarget subtag).</summary>
public string ItemAddress { get; init; } = string.Empty;
/// <summary>The new value (boolean for .active/.acked, numeric for priority).</summary>
public object? Value { get; init; }
/// <summary>The change timestamp in UTC.</summary>
public DateTime TimestampUtc { get; init; }
}
/// <summary>
/// Advises a set of MXAccess subtag addresses and surfaces value changes.
/// The production implementation (Task 6) owns its own LMXProxyServerClass;
/// tests substitute a fake that pushes <see cref="SubtagValueChange"/>s.
/// </summary>
public interface ISubtagAlarmSource : IDisposable
{
/// <summary>Raised on the STA when an advised subtag's value changes.</summary>
event EventHandler<SubtagValueChange>? ValueChanged;
/// <summary>Advises every subtag in the supplied addresses; idempotent per address.</summary>
void Advise(IReadOnlyCollection<string> itemAddresses);
/// <summary>Writes a value to an item address (used for the ack-comment write).</summary>
void Write(string itemAddress, object? value);
}
```
**Step 2: Write the state-machine tests first.** `SubtagAlarmStateMachine` maps `(active, acked)` changes per target to `MxAlarmTransitionEvent`s. Test the four core transitions:
```csharp
namespace ZB.MOM.WW.MxGateway.Worker.Tests.MxAccess;
public sealed class SubtagAlarmStateMachineTests
{
private static AlarmSubtagTarget Target() => new()
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
ActiveSubtag = "Tank01.Level.HiHi.active",
AckedSubtag = "Tank01.Level.HiHi.acked",
AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
};
[Fact]
public void ActiveFalseToTrue_EmitsRaise_FlaggedDegraded()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
var events = sm.Apply("Tank01.Level.HiHi.active", true, ts);
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.UnackAlm, e.Record.State);
Assert.Equal(MxAlarmStateKind.Unspecified, e.PreviousState);
Assert.Equal("Tank01.Level.HiHi", e.Record.TagName); // reference minus provider/area
}
[Fact]
public void AckedTrueWhileActive_EmitsAckTransition()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
var events = sm.Apply("Tank01.Level.HiHi.acked", true, ts.AddSeconds(5));
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.AckAlm, e.Record.State);
Assert.Equal(MxAlarmStateKind.UnackAlm, e.PreviousState);
}
[Fact]
public void ActiveTrueToFalse_WhileUnacked_EmitsUnackRtn()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
var events = sm.Apply("Tank01.Level.HiHi.active", false, ts.AddSeconds(10));
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.UnackRtn, e.Record.State);
}
[Fact]
public void Snapshot_ReflectsActiveAndAckedState()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
sm.Apply("Tank01.Level.HiHi.acked", true, ts);
var snap = Assert.Single(sm.SnapshotActive());
Assert.Equal(MxAlarmStateKind.AckAlm, snap.State);
}
}
```
Run: `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmStateMachineTests` → FAIL (type missing).
**Step 3: Implement `SubtagAlarmStateMachine`.** Build an address→target index (active/acked/priority/comment addresses), hold per-reference `(bool active, bool acked, DateTime firstRaiseUtc, int priority)`, and emit on change:
- active `false→true``UnackAlm`, set `firstRaiseUtc`, `PreviousState` from prior state.
- acked `false→true` while active ⇒ `AckAlm`.
- active `true→false``AckRtn` if currently acked else `UnackRtn`; then reset acked.
- priority change ⇒ update stored priority, no transition.
- `TagName` = `alarm_full_reference` with any `Provider!Area.` prefix stripped (match `WnWrapAlarmConsumer`'s reference shape so `GatewayAlarmMonitor` keys align). Set `ProviderName`, `Group`, `Priority`, `AlarmComment` from the target/last values. Mark a `Degraded`/source flag (carried via a new field — see Task 5 wiring).
- `SnapshotActive()` returns `MxAlarmSnapshotRecord` for references whose active is true.
**Step 4: Run green.** Expected: PASS.
**Step 5: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs
git commit -m "worker(alarms): subtag value-source seam + synthesis state machine"
```
---
### Task 5: `SubtagAlarmConsumer` over the source seam (no COM yet)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 4)
**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs`
**Step 1: Test with a fake `ISubtagAlarmSource`.** Drive value changes through the source, assert `AlarmTransitionEmitted` fires with synthesized records and that ack writes the comment to the ack-comment subtag:
```csharp
public sealed class SubtagAlarmConsumerTests
{
private sealed class FakeSource : ISubtagAlarmSource
{
public event EventHandler<SubtagValueChange>? ValueChanged;
public List<string> Advised { get; } = new();
public (string Address, object? Value)? LastWrite { get; private set; }
public void Advise(IReadOnlyCollection<string> a) => Advised.AddRange(a);
public void Write(string a, object? v) => LastWrite = (a, v);
public void Raise(string addr, object? val, DateTime ts) =>
ValueChanged?.Invoke(this, new SubtagValueChange { ItemAddress = addr, Value = val, TimestampUtc = ts });
public void Dispose() { }
}
private static AlarmSubtagTarget Target() => new()
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
ActiveSubtag = "Tank01.Level.HiHi.active",
AckedSubtag = "Tank01.Level.HiHi.acked",
AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
};
[Fact]
public void Subscribe_AdvisesAllSubtags()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("ignored-in-subtag-mode");
Assert.Contains("Tank01.Level.HiHi.active", src.Advised);
Assert.Contains("Tank01.Level.HiHi.acked", src.Advised);
}
[Fact]
public void ValueChange_RaisesSynthesizedTransition()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("x");
MxAlarmTransitionEvent? seen = null;
c.AlarmTransitionEmitted += (_, e) => seen = e;
src.Raise("Tank01.Level.HiHi.active", true, new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));
Assert.NotNull(seen);
Assert.Equal(MxAlarmStateKind.UnackAlm, seen!.Record.State);
}
[Fact]
public void AcknowledgeByName_WritesCommentToAckCommentSubtag()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("x");
int rc = c.AcknowledgeByName("Tank01.Level.HiHi", "Galaxy", "Area",
"ack from HMI", "op1", "node", "dom", "Op One");
Assert.Equal(0, rc);
Assert.Equal(("Tank01.Level.HiHi.ackmsg", (object?)"ack from HMI"), src.LastWrite);
}
}
```
**Step 2: Implement `SubtagAlarmConsumer : IMxAccessAlarmConsumer`.**
- Constructor `(ISubtagAlarmSource source, IReadOnlyList<AlarmSubtagTarget> watchList)`; build a `SubtagAlarmStateMachine`; index `alarm_full_reference`→target for ack routing.
- `Subscribe(_)`: call `source.Advise(<all subtag addresses>)`; subscribe to `source.ValueChanged`, feed each into the state machine, and re-raise each produced `MxAlarmTransitionEvent` via `AlarmTransitionEmitted` (mark degraded).
- `AcknowledgeByName(alarmName, …, comment, …)`: resolve the target by reference; if no `AckCommentSubtag`, return a non-zero failure code; else `source.Write(target.AckCommentSubtag, comment)` and return 0.
- `AcknowledgeByGuid(guid, …)`: map the synthetic GUID (deterministic hash of reference — see Task 8 helper, or a local copy) back to a reference, then delegate to the name path; unknown GUID ⇒ non-zero.
- `SnapshotActiveAlarms()`: from the state machine.
- `PollOnce()`: no-op.
- `Dispose()`: unsubscribe + dispose source.
**Step 3: Run green.** `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmConsumerTests`.
**Step 4: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs
git commit -m "worker(alarms): SubtagAlarmConsumer synthesizing transitions over the source seam"
```
---
### Task 6: COM-backed `LmxSubtagAlarmSource` (own LMXProxyServerClass)
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none
The only piece that touches live COM. Like `WnWrapAlarmConsumer`, it owns its own MXAccess server object so the subtag source is self-contained and isolated from the session's item pipeline. Logic stays thin (advise/write/marshal); real verification is the live smoke test in Task 17.
**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs` (constructor/guard tests only; COM path is live-gated)
**Step 1: Implement `LmxSubtagAlarmSource : ISubtagAlarmSource`.**
- Own an `LMXProxyServerClass` (reuse the worker's `IMxAccessServer`/`MxAccessComServer` wrapper + `IMxAccessComObjectFactory` so it is fakeable; constructor takes the factory).
- `Advise(addresses)`: `RegisterServer` (topic) once; per address `AddItem``itemHandle`, `Advise`, and record `itemHandle→address`. Subscribe to the proxy's `OnDataChange`; in the handler, look up the address by `phItemHandle`, normalize `pvItemValue` (VARIANT→bool/double) and `pftItemTimeStamp`→UTC, and raise `ValueChanged`. All calls run on the STA (the worker STA pumps messages, so `OnDataChange` delivers).
- `Write(address, value)`: resolve/create the item handle, `server.Write(serverHandle, itemHandle, value, userId: 0)`.
- `Dispose()`: `UnAdvise`/`RemoveItem`/`Unregister`/release COM.
**Step 2: Tests** — only the non-COM guards (null factory throws; `Write` before `Advise` resolves a handle or throws a clear error). Mark the COM round-trip `[LiveMxAccessFact]` and `Skip` per the `AlarmsLiveSmokeTests` precedent.
**Step 3: Build x86 + run unit tests.**
`dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86`
`dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~LmxSubtagAlarmSourceTests`
**Step 4: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs
git commit -m "worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags"
```
---
### Task 7: `FailoverAlarmConsumer` state machine
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 5)
**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs` (small EventArgs)
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs`
**Step 1: Test the switch/failback with a fake primary that throws.**
```csharp
public sealed class FailoverAlarmConsumerTests
{
private sealed class FlakyPrimary : IMxAccessAlarmConsumer
{
public event EventHandler<MxAlarmTransitionEvent>? AlarmTransitionEmitted;
public int PollsUntilHeal = int.MaxValue; // becomes healthy after N polls while degraded
public bool ThrowOnPoll = true;
private int _polls;
public void Subscribe(string s) { if (ThrowOnPoll) throw new COMException("boom", unchecked((int)0x80004005)); }
public void PollOnce()
{
_polls++;
if (ThrowOnPoll && _polls < PollsUntilHeal) throw new COMException("boom", unchecked((int)0x80004005));
}
public int AcknowledgeByGuid(Guid g, string c, string a, string b, string d, string e) => 0;
public int AcknowledgeByName(string n, string p, string gr, string c, string a, string b, string d, string e) => 0;
public IReadOnlyList<MxAlarmSnapshotRecord> SnapshotActiveAlarms() => Array.Empty<MxAlarmSnapshotRecord>();
public void Dispose() { }
}
private sealed class StubStandby : IMxAccessAlarmConsumer { /* records Subscribe, no-op rest */ }
[Fact]
public void Primary_FailsThresholdTimes_SwitchesToSubtagAndEmitsModeChange()
{
var primary = new FlakyPrimary();
var standby = new StubStandby();
using var c = new FailoverAlarmConsumer(primary, standby,
new FailoverSettings(threshold: 3, probeIntervalSeconds: 30, stableProbes: 3));
AlarmProviderModeChange? change = null;
c.ProviderModeChanged += (_, e) => change = e;
c.Subscribe("\\\\host\\Galaxy!Area"); // primary.Subscribe throws -> counts as failure 1
c.PollOnce(); // failure 2
c.PollOnce(); // failure 3 -> switch
Assert.NotNull(change);
Assert.Equal(AlarmProviderMode.Subtag, change!.Mode);
}
[Fact]
public void WhileDegraded_PrimaryHeals_FailsBackAfterStableProbes()
{
var primary = new FlakyPrimary { PollsUntilHeal = 0 }; // will heal once we stop throwing
var standby = new StubStandby();
using var c = new FailoverAlarmConsumer(primary, standby,
new FailoverSettings(threshold: 1, probeIntervalSeconds: 0, stableProbes: 2));
var modes = new List<AlarmProviderMode>();
c.ProviderModeChanged += (_, e) => modes.Add(e.Mode);
c.Subscribe("x"); // failure -> switch to subtag
primary.ThrowOnPoll = false;
c.ProbeOnce(); // clean probe 1
c.ProbeOnce(); // clean probe 2 -> failback
Assert.Equal(AlarmProviderMode.Subtag, modes[0]);
Assert.Equal(AlarmProviderMode.Alarmmgr, modes[^1]);
}
}
```
**Step 2: Implement.**
- `record FailoverSettings(int threshold, int probeIntervalSeconds, int stableProbes)`; `AlarmProviderModeChange : EventArgs { AlarmProviderMode Mode; string Reason; int HResult; DateTime AtUtc; }`.
- Constructor `(IMxAccessAlarmConsumer primary, IMxAccessAlarmConsumer standby, FailoverSettings settings)`; forced-mode variants handled in Task 9 wiring (forced ⇒ skip the other consumer).
- Forward `AlarmTransitionEmitted` from the **active** child only (swap the subscription on switch).
- Wrap `Subscribe`/`PollOnce` on the primary: on `COMException` (or a failure HRESULT) while `PrimaryActive`, increment a counter; at `threshold`, ensure standby `Subscribe`d, set active=standby, snapshot standby for hand-off, raise `ProviderModeChanged(Subtag, reason, hresult, now)`. Reset counter on any clean primary poll.
- `ProbeOnce()` (driven by the poll loop while degraded, gated by `probeIntervalSeconds`): try primary `Subscribe`+`PollOnce`; count consecutive clean probes; at `stableProbes`, set active=primary, return standby to standby, raise `ProviderModeChanged(Alarmmgr, "recovered", 0, now)`.
- `Acknowledge*` / `SnapshotActiveAlarms` delegate to the **active** child.
- `PollOnce()` drives the active child's poll, and—while degraded—also drives the failback probe cadence.
**Step 3: Run green** (x86 filter `FailoverAlarmConsumerTests`).
**Step 4: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs
git commit -m "worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine"
```
---
### Task 8: Synthetic-GUID helper + degraded flag on the event sink path
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 9
Carry `degraded` + `source_provider` from the worker synthesis into the emitted `OnAlarmTransitionEvent`.
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs` (add `bool Degraded`)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` (`EnqueueTransition` carries degraded)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs` (`CreateOnAlarmTransition` sets `Degraded`/`SourceProvider`)
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs`
- Test: add cases to `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmDispatcherTests.cs` and a new `SyntheticAlarmGuidTests.cs`
**Step 1: `SyntheticAlarmGuid.ForReference(string reference)`** — deterministic GUID from a stable hash (e.g. MD5 of the UTF-8 reference → `new Guid(bytes)`), so subtag-mode acks resolve by GUID. Test determinism + difference:
```csharp
[Fact] public void SameReference_SameGuid() =>
Assert.Equal(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.C"));
[Fact] public void DifferentReference_DifferentGuid() =>
Assert.NotEqual(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.D"));
```
**Step 2: Thread `degraded`** through `MxAlarmSnapshotRecord.Degraded`, `EnqueueTransition(... bool degraded)`, and `CreateOnAlarmTransition(... bool degraded, AlarmProviderMode sourceProvider)`. Default `degraded=false`, `sourceProvider=Alarmmgr` so the wnwrap path is unchanged (regression: existing `AlarmDispatcherTests` still pass with `Degraded=false`).
**Step 3: Tests** — extend `AlarmDispatcherTests` with a subtag-style transition asserting `body.Degraded == true` and `SourceProvider == Subtag`.
**Step 4: Build x86 + run** worker tests for `AlarmDispatcherTests`, `SyntheticAlarmGuidTests`.
**Step 5: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): synthetic GUID + degraded provenance on emitted transitions"
```
---
### Task 9: Wire watch-list + failover config through `AlarmCommandHandler`; emit mode-changed event
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Tasks 5, 7, 8)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs` (`ExecuteSubscribeAlarms`, ~lines 588-616)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessStaSession.cs` (consumer factory wiring; mode-change → event queue)
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs` (extend or create)
**Step 1: Carry the subscribe payload.** Change the alarm subscribe entry point from `Subscribe(string subscription)` to `Subscribe(SubscribeAlarmsCommand command)` (the command now has `ForcedMode`, `WatchList`, `Failover`). In `AlarmCommandHandler.Subscribe`:
- Build the active provider per `ForcedMode`:
- `ALARMMGR``WnWrapAlarmConsumer` only.
- `SUBTAG``SubtagAlarmConsumer(new LmxSubtagAlarmSource(factory), watchList)` only.
- `UNSPECIFIED``FailoverAlarmConsumer(primary: wnwrap, standby: subtag, settings-from-Failover)`.
- Use the existing `consumerFactory` seam but widen it to `Func<SubscribeAlarmsCommand, IMxAccessAlarmConsumer>` so tests inject fakes and production builds the failover composite. Subscribe to `FailoverAlarmConsumer.ProviderModeChanged` and enqueue an `OnAlarmProviderModeChangedEvent` MxEvent via the event queue (new mapper method `CreateOnAlarmProviderModeChanged`).
**Step 2: Executor + STA wiring.** `ExecuteSubscribeAlarms` passes the full `SubscribeAlarmsCommand` (not just the expression). In `MxAccessStaSession`, the `alarmCommandHandlerFactory` must give the handler access to the `IMxAccessComObjectFactory` so the subtag source can create its own proxy server on the STA; keep the `EnsureOnAlarmConsumerThread` affinity guard on every path.
**Step 3: Test** — fake consumer factory; assert that a `SUBTAG` forced command builds the subtag consumer and advises; that an auto command building a fake failover composite, when it raises `ProviderModeChanged`, enqueues an `OnAlarmProviderModeChangedEvent` on the queue.
**Step 4: Build x86 + worker tests.**
**Step 5: Commit.**
```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): route watch-list/failover config; emit provider-mode-changed event"
```
---
## Phase 2 — Gateway: discovery, options, monitor, metrics, dashboard
### Task 10: `AlarmsOptions.Fallback` + validation
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 11, Task 13
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmFallbackOptions.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs` (`ValidateAlarms`, ~lines 234-258)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` (extend)
**Step 1:** Add `AlarmFallbackOptions Fallback { get; init; } = new();` to `AlarmsOptions`. `AlarmFallbackOptions`: `string Mode = "Auto"` (`Auto|ForceAlarmManager|ForceSubtag`), `int ConsecutiveFailureThreshold = 3`, `int FailbackProbeIntervalSeconds = 30`, `int FailbackStableProbes = 3`, a `Discovery` sub-object (`bool UseGalaxyRepository = true`, `string Area = ""`, `string[] IncludeAttributes = []`, `string[] ExcludeAttributes = []`), and a `Subtags` sub-object (`Active="active"`, `Acked="acked"`, `AckComment=""`, `Priority="priority"`).
**Step 2:** In `ValidateAlarms`, when `Enabled` and `Mode == "ForceSubtag"` and `Discovery.UseGalaxyRepository == false` and `IncludeAttributes` empty ⇒ add a validation error ("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list"). Floor the three numeric values at 1. Validate `Mode` is one of the three literals.
**Step 3-5:** Test the new validation cases (red→green), build the server, commit.
---
### Task 11: Galaxy Repository "alarm attributes" discovery query
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 10, Task 13
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyRepository.cs` (add `GetAlarmAttributesAsync` + SQL constant, following `GetAttributesAsync` ~lines 86-115 and `AttributesSql` ~line 176)
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/IGalaxyRepository.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyAlarmAttributeRow.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Galaxy/` (projection unit test; live SQL gated)
**Step 1:** `GalaxyAlarmAttributeRow { string FullTagReference; string SourceObjectReference; string AckCommentSubtag; }` (and any priority subtag). `GetAlarmAttributesAsync` reuses the existing `is_alarm` detection (the `AlarmExtension` primitive join already in `AttributesSql`) filtered to `is_alarm = 1`, projecting the alarm reference + its ack-comment attribute. Follow the exact `SqlConnection`/`SqlCommand`/`SqlDataReader` pattern from `GetAttributesAsync`.
**Step 2:** Unit-test the row→`AlarmSubtagTarget` mapping (a pure mapper function); gate any live-DB test like the existing Galaxy live tests (or `Skip` with a note, matching `AlarmsLiveSmokeTests`).
**Step 3-5:** red→green, build server, commit.
---
### Task 12: Watch-list resolver (GR SQL + config override → `AlarmSubtagTarget[]`)
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on Tasks 10, 11)
**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/IAlarmWatchListResolver.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmWatchListResolverTests.cs`
**Step 1: Test the merge** with a fake `IGalaxyRepository`:
- discovery rows + `IncludeAttributes` are unioned; `ExcludeAttributes` removed; each becomes an `AlarmSubtagTarget` with `.active`/`.acked`/`.ackmsg` addresses composed from the configured `Subtags` names (`<reference>.<Active>`, etc.); empty config subtag names fall back to defaults; GR unavailable + no includes ⇒ empty list + a logged warning flag.
**Step 2: Implement** `ResolveAsync(AlarmsOptions, CancellationToken) → IReadOnlyList<AlarmSubtagTarget>`.
**Step 3-5:** red→green, build, commit.
---
### Task 13: Gateway metrics — provider-mode gauge + switch counter
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 10, Task 11
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs` (ctor ~lines 55-79; add counter + observable gauge following the existing pattern)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Metrics/GatewayMetricsTests.cs` (if present; else assert via a `MeterListener`)
**Step 1:** Add `mxgateway.alarms.provider_switches` counter (tagged `from`,`to`,`reason`) and `mxgateway.alarms.provider_mode` observable gauge (1=alarmmgr, 2=subtag), plus `AlarmProviderSwitched(int from, int to, string reason)` and a private `GetAlarmProviderMode()` (lock on `_syncRoot` like the others).
**Step 2-4:** test, build, commit.
---
### Task 14: `GatewayAlarmMonitor` — arm watch-list, reflect provider mode, reconcile on switch
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Tasks 9, 12, 13)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs` (ctor ~41-49; `SubscribeAlarmsAsync` ~210-233; event-drain loop; `StreamAsync` ~386-434)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/GatewayAlarmMonitorProviderModeTests.cs` (new, using `FakeWorkerHarness`)
**Step 1:** Inject `IAlarmWatchListResolver` and `GatewayMetrics`. In `SubscribeAlarmsAsync`, resolve the watch-list and build the `SubscribeAlarmsCommand` with `ForcedMode` (from `Fallback.Mode`), `WatchList`, and `Failover` populated from options — instead of the bare `{ SubscriptionExpression }`.
**Step 2:** In the worker-event drain path, handle `OnAlarmProviderModeChangedEvent`: update a `_providerStatus` field (mode/degraded/reason/since), `Broadcast(new AlarmFeedMessage { ProviderStatus = … })` to every subscriber, call `metrics.AlarmProviderSwitched(...)`, and force a `ReconcileAsync` so the cache re-seeds from the now-active provider (avoids raise/clear storms).
**Step 3:** In `StreamAsync`, emit the current `provider_status` as the **first** message (before the snapshot) so a late joiner immediately knows the mode.
**Step 4: Test** — stand up the monitor with `FakeWorkerHarness`; emit an `OnAlarmProviderModeChangedEvent(Subtag)`; assert a `StreamAsync` subscriber receives a `ProviderStatus{ Mode=Subtag, Degraded=true }` and that the switch counter incremented. Also assert a transition emitted in subtag mode flows through with `Degraded=true`.
**Step 5:** build server, run the new test, commit.
---
### Task 15: Dashboard — push provider status to `/hubs/alarms` + UI indicator
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 14)
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/AlarmsHubPublisher.cs` (forward `ProviderStatus` messages — they already flow through `StreamAsync`, so confirm the existing `SendAsync(AlarmMessage, message)` carries them; add a dedicated `"ProviderModeChanged"` client method if the dashboard needs a distinct channel)
- Modify: the alarms dashboard page/component (Bootstrap-only badge: green "alarmmgr" / amber "degraded — subtag") — find under `src/ZB.MOM.WW.MxGateway.Server/Dashboard/`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/` dashboard model test (e.g. a `DashboardAlarmProviderStatus.FromFeed` mapper, mirroring `DashboardActiveAlarm.FromSnapshot`)
**Constraint:** Bootstrap CSS/JS only — no MudBlazor/Radzen/FluentUI.
**Steps:** TDD the model mapper, wire the publisher + badge, build, commit.
---
## Phase 3 — Integration, docs, live smoke
### Task 16: End-to-end fake-worker failover test
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 18
**Files:**
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmFailoverEndToEndTests.cs`
Drive the full gateway path with `FakeWorkerHarness`: subscribe (assert the `SubscribeAlarmsCommand` carries a watch-list), emit a wnwrap-style transition (assert `Degraded=false`), emit `OnAlarmProviderModeChangedEvent(Subtag)`, emit a synthesized transition (assert `Degraded=true`, `SourceProvider=Subtag`), then `OnAlarmProviderModeChangedEvent(Alarmmgr)` and assert the feed reports recovery. Build, run, commit.
---
### Task 17: Live subtag smoke test (opt-in)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 18
**Files:**
- Test: `src/ZB.MOM.WW.MxGateway.IntegrationTests/...AlarmSubtagLiveSmokeTests.cs` (or the worker live suite)
A `[LiveMxAccessFact]`, `Skip`-by-default test (per `AlarmsLiveSmokeTests` precedent) that, against a live Galaxy + alarm flip script: advises the real `.active`/`.acked` subtags via `LmxSubtagAlarmSource`, asserts a synthesized raise/clear, and performs an ack via the ack-comment write. Document the exact subtag names discovered (resolves the design's open item). Commit.
---
### Task 18: Documentation
**Classification:** trivial
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 16, Task 17
**Files:**
- Modify: `gateway.md` (alarm provider section: dual provider + auto-failover/failback)
- Modify: `docs/DesignDecisions.md` (record the fallback decision + parity rationale)
- Modify: `docs/GatewayConfiguration.md` (the `MxGateway:Alarms:Fallback` block)
- Modify: `docs/AlarmClientDiscovery.md` (subtag provider, synthesis rules, ack-comment write)
- Modify: `docs/Grpc.md` (new `provider_status` feed case + `degraded`/`source_provider` fields)
Follow `StyleGuide.md` (PascalCase filenames, present tense, explain *why*). No code; commit.
---
## Execution order & parallelism summary
- **Serial spine:** 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8/9 → 10/11 → 12 → 13 → 14 → 15 → 16 → 17/18.
- **Parallelizable clusters:** {8, 9 partially}, {10, 11, 13}, {16, 17, 18}.
- **High-risk tasks** (full review chain): 1, 2, 6, 7, 9, 14. **Standard:** 4, 5, 8, 10, 11, 12, 15, 16. **Small/trivial:** 3, 13, 17, 18.
## Risk notes for the executor
- **Field-number collisions:** Task 2 must read the live `MxEvent`/`MxEventFamily` numbers before adding — the agent map gave alarm-payload maxima but not `MxEvent`'s. Verify before editing.
- **STA discipline:** every COM call in `LmxSubtagAlarmSource` and every consumer swap runs on the worker STA; keep the `EnsureOnAlarmConsumerThread` guard. The worker STA already pumps Windows messages, which is required for the subtag `OnDataChange` to deliver.
- **Parity regression:** alarmmgr-mode output must be byte-for-byte unchanged. Existing `AlarmDispatcherTests` and `ProtobufContractRoundTripTests` are the guardrail — they must stay green with `Degraded=false` defaults.
- **Subtag names unverified:** the design leaves exact AVEVA subtag names (`.active`, `.acked`, ack-comment) to confirm against `C:\Users\dohertj2\Desktop\mxaccess` + a live Galaxy (Task 17). The config `Subtags` block exists so names are not hard-coded.
@@ -0,0 +1,147 @@
{
"planPath": "docs/plans/2026-06-13-alarm-subtag-fallback.md",
"tasks": [
{
"id": 54,
"subject": "Task 1: Worker proto \u2014 watch-list, failover config, AlarmProviderMode",
"status": "completed"
},
{
"id": 55,
"subject": "Task 2: Gateway proto \u2014 provider status, degraded provenance, mode-changed event",
"status": "completed",
"blockedBy": [
54
]
},
{
"id": 56,
"subject": "Task 3: Proto round-trip tests for new alarm fields",
"status": "completed",
"blockedBy": [
54,
55
]
},
{
"id": 57,
"subject": "Task 4: Subtag value-source abstraction + synthesis state machine",
"status": "completed",
"blockedBy": [
54
]
},
{
"id": 58,
"subject": "Task 5: SubtagAlarmConsumer over the source seam",
"status": "completed",
"blockedBy": [
57
]
},
{
"id": 59,
"subject": "Task 6: COM-backed LmxSubtagAlarmSource",
"status": "completed",
"blockedBy": [
57
]
},
{
"id": 60,
"subject": "Task 7: FailoverAlarmConsumer state machine",
"status": "completed",
"blockedBy": [
58
]
},
{
"id": 61,
"subject": "Task 8: Synthetic GUID + degraded flag on event sink path",
"status": "completed",
"blockedBy": [
55
]
},
{
"id": 62,
"subject": "Task 9: Wire watch-list/failover through AlarmCommandHandler; emit mode-changed",
"status": "completed",
"blockedBy": [
58,
60,
61
]
},
{
"id": 63,
"subject": "Task 10: AlarmsOptions.Fallback + validation",
"status": "completed"
},
{
"id": 64,
"subject": "Task 11: Galaxy Repository alarm-attributes discovery query",
"status": "completed"
},
{
"id": 65,
"subject": "Task 12: Watch-list resolver (GR SQL + config override)",
"status": "completed",
"blockedBy": [
54,
63,
64
]
},
{
"id": 66,
"subject": "Task 13: Metrics \u2014 provider-mode gauge + switch counter",
"status": "completed"
},
{
"id": 67,
"subject": "Task 14: GatewayAlarmMonitor \u2014 arm watch-list, reflect mode, reconcile on switch",
"status": "completed",
"blockedBy": [
55,
62,
65,
66
]
},
{
"id": 68,
"subject": "Task 15: Dashboard \u2014 push provider status + UI badge",
"status": "completed",
"blockedBy": [
67
]
},
{
"id": 69,
"subject": "Task 16: End-to-end fake-worker failover test",
"status": "completed",
"blockedBy": [
67
]
},
{
"id": 70,
"subject": "Task 17: Live subtag smoke test (opt-in)",
"status": "completed",
"blockedBy": [
59,
62
]
},
{
"id": 71,
"subject": "Task 18: Documentation",
"status": "completed",
"blockedBy": [
67
]
}
],
"lastUpdated": "2026-06-13T13:30:00Z"
}
+57
View File
@@ -143,6 +143,63 @@ session if the worker faults. Gated by `MxGateway:Alarms:Enabled` — see
`docs/DesignDecisions.md` for why this reverses the v1 single-subscriber rule `docs/DesignDecisions.md` for why this reverses the v1 single-subscriber rule
for the alarm subsystem. for the alarm subsystem.
### Alarm providers and failover
The alarm feed has two providers, both implemented worker-side:
- **Alarm manager (primary):** `WnWrapAlarmConsumer` polls
`wwAlarmConsumerClass.GetXmlCurrentAlarms2` on the worker STA. This is the
authoritative native source.
- **Subtag monitoring (standby):** `SubtagAlarmConsumer` advises each alarm
attribute's subtags (`.active`, `.acked`, optionally `.priority`) via the
existing `AddItem`/`Advise` pipeline through `LmxSubtagAlarmSource` and
synthesizes alarm transitions with `SubtagAlarmStateMachine`. This is a
non-parity, lower-fidelity source — synthetic GUIDs, no native raise
timestamps, narrower fields.
`FailoverAlarmConsumer` wraps both and owns the state machine:
- **Auto-failover:** after `ConsecutiveFailureThreshold` (default 3)
consecutive wnwrap COM failures — `Subscribe` or `PollOnce` throws or
returns a failure HRESULT — it activates the standby. The standby is armed
(subscribed and adviseing) from the start so its state is warm at the moment
of switch.
- **Auto-failback:** while degraded, every `FailbackProbeIntervalSeconds`
(default 30) it re-probes the still-subscribed primary. After
`FailbackStableProbes` (default 3) consecutive clean polls it switches back
to the alarm manager.
- **On every switch:** the consumer snapshots the now-active provider and
emits `OnAlarmProviderModeChangedEvent` so the gateway can reconcile its
cache without a raise/clear storm.
Synthesis is worker-side. This preserves the parity rule — the gateway
forwards only events the worker emits and never synthesizes transitions
itself. The synthesis rules are documented in
`docs/AlarmClientDiscovery.md`.
**Acknowledge in subtag mode:** the ack-by-name path writes the operator
comment to the alarm attribute's ack-comment subtag. The write performs the
ack. If the attribute has no writable ack-comment subtag configured, the RPC
returns `FailedPrecondition`. In alarm-manager mode, `AlarmAckByName` is
used as before.
**Degraded state visibility:** every subtag-mode transition carries
`degraded = true` and `source_provider = ALARM_PROVIDER_MODE_SUBTAG` on the
`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` proto fields. The
`AlarmFeedMessage` feed emits an `AlarmProviderStatus` message (the
`provider_status` oneof case) on stream open and on every switch. The
dashboard shows a Bootstrap badge (green for alarm manager, amber when
degraded). Metrics: `mxgateway.alarms.provider_mode` gauge (1 = alarmmgr,
2 = subtag) and `mxgateway.alarms.provider_switches` counter.
Forced modes are available via `MxGateway:Alarms:Fallback:Mode`:
`ForceAlarmManager` disables failover; `ForceSubtag` forces the standby
on from startup; `Auto` (default) enables failover and failback. Watch-list
discovery for the subtag provider uses Galaxy Repository SQL with config
overrides. See `docs/GatewayConfiguration.md` for the full `Fallback` option
block and `docs/AlarmClientDiscovery.md` for synthesis rules and fidelity
limitations.
Dashboard authentication is LDAP-backed (distinct from the API-key model on Dashboard authentication is LDAP-backed (distinct from the API-key model on
the gRPC API). `/login` accepts username and password in a form body, binds the gRPC API). `/login` accepts username and password in a form body, binds
against `MxGateway:Ldap`, maps the user's LDAP groups to `Admin` or `Viewer` against `MxGateway:Ldap`, maps the user's LDAP groups to `Admin` or `Viewer`
+104 -57
View File
@@ -1,28 +1,37 @@
# GLAuth — LDAP authn reference for mxaccessgw # GLAuth — LDAP authn reference for mxaccessgw
GLAuth is a lightweight LDAP server installed on this dev box at > **UPDATED 2026-06-04 — mxaccessgw no longer uses a per-box GLAuth at `C:\publish\glauth`.
`C:\publish\glauth\` and run as a Windows service via NSSM. It already > Dev/test LDAP is now the SHARED GLAuth on `10.100.0.35:3893` (`dc=zb,dc=local`);
backs the LmxOpcUa OPC UA server's UserName-token authn and the LmxOpcUa > the single source of truth is `scadaproj/infra/glauth/` (`config.toml` + `README`).
Admin UI's cookie login; this doc captures everything mxaccessgw needs > The localhost/NSSM/`glauth.cfg` procedures below are RETIRED, kept for reference/rollback.**
to consume the same directory so a single set of dev credentials covers
both stacks.
The authoritative copy of LmxOpcUa's reference lives at GLAuth is a lightweight LDAP server. It already backs all three sister apps (MxAccessGateway,
`C:\publish\glauth\auth.md`. This doc is a redistilled view tailored to OtOpcUa, ScadaBridge) through a **shared container** (`zb-shared-glauth`) running on the Linux
mxaccessgw — what users + groups are already provisioned, how to bind docker host at **`10.100.0.35:3893`**. This doc captures everything mxaccessgw needs to consume
against them, and what's needed to add a gw-specific role. that directory so a single set of dev credentials covers all stacks.
~~GLAuth is installed on this dev box at `C:\publish\glauth\` and run as a Windows service via
NSSM.~~ *(RETIRED — the per-box Windows service has been stopped and set to Manual startup;
kept only as a rollback option. Do not edit or restart it for new work.)*
The single source of truth for the shared GLAuth is
**`~/Desktop/scadaproj/infra/glauth/config.toml`** (deploy/verify runbook:
`scadaproj/infra/glauth/README.md`). This doc is a redistilled view tailored to mxaccessgw —
what users + groups are provisioned, how to bind against them, and what's needed to add a
gw-specific role.
## Connection details ## Connection details
| Setting | Value | | Setting | Value |
|---|---| |---|---|
| Protocol | LDAP (unencrypted) | | Protocol | LDAP (unencrypted) |
| Host | `localhost` | | Host | **`10.100.0.35`** (shared docker host — ~~`localhost`~~ retired) |
| Port | `3893` | | Port | `3893` |
| LDAPS | disabled in dev (set `[ldaps]` block to enable) | | LDAPS | disabled in dev (`Transport=None`, `AllowInsecure=true`) |
| Base DN | `dc=lmxopcua,dc=local` | | Base DN | `dc=zb,dc=local` |
| Bind DN format | `cn={username},dc=lmxopcua,dc=local` | | Bind DN format | `cn={username},dc=zb,dc=local` |
| Group OU | `ou=<groupname>,ou=groups,dc=lmxopcua,dc=local` | | Service account DN | `cn=serviceaccount,dc=zb,dc=local` / `serviceaccount123` |
| Group OU | `ou=<groupname>,ou=groups,dc=zb,dc=local` |
| Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) | | Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) |
## Pre-existing groups (LmxOpcUa role taxonomy) ## Pre-existing groups (LmxOpcUa role taxonomy)
@@ -33,11 +42,11 @@ LmxOpcUa write rights doesn't need a second account for the gw.
| Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping | | Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping |
|---|---|---|---|---| |---|---|---|---|---|
| ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) | | ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=zb,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) |
| WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=lmxopcua,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) | | WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=zb,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) |
| WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=lmxopcua,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) | | WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=zb,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) |
| WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=lmxopcua,dc=local` | Write Configure attrs | `WriteSecured` (Configure) | | WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=zb,dc=local` | Write Configure attrs | `WriteSecured` (Configure) |
| AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=lmxopcua,dc=local` | Acknowledge alarms | gw alarm-ack RPC, when added | | AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=zb,dc=local` | Acknowledge alarms | gw alarm-ack RPC, when added |
**A user can be in multiple groups**`othergroups = [...]` in the **A user can be in multiple groups**`othergroups = [...]` in the
config is a list. `admin` is the canonical example (in every role config is a list. `admin` is the canonical example (in every role
@@ -59,20 +68,26 @@ For mxaccessgw dev, `admin` covers every gw-side capability test;
`readonly` is the right "negative" case for proving Browse-OK / `readonly` is the right "negative" case for proving Browse-OK /
Write-denied. Write-denied.
The gateway dashboard adds one role beyond this LmxOpcUa taxonomy: The gateway dashboard uses two gateway-specific groups beyond the LmxOpcUa taxonomy:
`GwAdmin`. `LdapOptions.RequiredGroup` defaults to `GwAdmin`, so the `GwAdmin` (gid 5610 → role `Administrator`) and `GwReader` (gid 5611 → role `Viewer`).
dashboard login and `DashboardLdapLiveTests` require `admin` to be a These are already provisioned in the shared `scadaproj/infra/glauth/config.toml`.
member of a `GwAdmin` group. `GwAdmin` is **not** in the baseline The dashboard test users are **`multi-role`/`password`** (Administrator) and
GLAuth config — it must be provisioned before dashboard authn or the **`gw-viewer`/`password`** (Viewer). `LdapOptions.RequiredGroup` defaults to `GwAdmin`.
LDAP live tests work. See [Provisioning the GwAdmin See [Provisioning the GwAdmin group](#provisioning-the-gwadmin-group) below for the
group](#provisioning-the-gwadmin-group) below. (now-retired) per-box procedure and for the shared-config equivalent.
> **Dashboard role value (Task 1.7):** the LDAP `GwAdmin` group now maps to
> the canonical dashboard role **`Administrator`** (was `Admin`); `GwReader`
> maps to `Viewer`. This is a pure value rename via
> `MxGateway:Dashboard:GroupToRole` — same operations are authorized. (This
> dashboard role is distinct from the lowercase gRPC `admin` *API-key scope*.)
## Two bind patterns ## Two bind patterns
### 1. Direct bind (simplest) ### 1. Direct bind (simplest)
``` ```
DN: cn=admin,dc=lmxopcua,dc=local DN: cn=admin,dc=zb,dc=local
Password: admin123 Password: admin123
``` ```
@@ -84,9 +99,9 @@ by `sAMAccountName`, not `cn`. Use this only for dev convenience.
### 2. Bind-then-search (production-grade) ### 2. Bind-then-search (production-grade)
``` ```
1. Bind as the service account (cn=serviceaccount,dc=lmxopcua,dc=local 1. Bind as the service account (cn=serviceaccount,dc=zb,dc=local
/ serviceaccount123). / serviceaccount123).
2. Search under dc=lmxopcua,dc=local with filter 2. Search under dc=zb,dc=local with filter
(uid=<entered-username>) — or any attribute the deployment (uid=<entered-username>) — or any attribute the deployment
identifies users by. GLAuth populates uid + cn. identifies users by. GLAuth populates uid + cn.
3. Read the returned entry's DN + memberOf list (groups). 3. Read the returned entry's DN + memberOf list (groups).
@@ -112,12 +127,12 @@ record:
```yaml ```yaml
ldap: ldap:
enabled: true enabled: true
server: localhost server: 10.100.0.35 # shared GLAuth on docker host (was localhost)
port: 3893 port: 3893
useTls: false useTls: false
allowInsecureLdap: true # dev only allowInsecureLdap: true # dev only
searchBase: "dc=lmxopcua,dc=local" searchBase: "dc=zb,dc=local"
serviceAccountDn: "cn=serviceaccount,dc=lmxopcua,dc=local" serviceAccountDn: "cn=serviceaccount,dc=zb,dc=local"
serviceAccountPassword: "serviceaccount123" serviceAccountPassword: "serviceaccount123"
userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName
displayNameAttribute: "cn" displayNameAttribute: "cn"
@@ -131,19 +146,35 @@ ldap:
``` ```
`groupAttribute` returns full DNs like `groupAttribute` returns full DNs like
`ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` — the authenticator `ou=ReadOnly,ou=groups,dc=zb,dc=local` — the authenticator
should strip the leading `ou=` (or `cn=` against AD) RDN value and should strip the leading `ou=` (or `cn=` against AD) RDN value and
look that up in `groupToRole`. look that up in `groupToRole`.
## Provisioning the GwAdmin group ## Provisioning the GwAdmin group
> **UPDATED 2026-06-04 — RETIRED per-box procedure.** `GwAdmin` (gid 5610) and `GwReader`
> (gid 5611) are already present in the shared GLAuth. To add or modify users/groups,
> edit **`~/Desktop/scadaproj/infra/glauth/config.toml`** on host `10.100.0.35` and run:
>
> ```bash
> cd ~/Desktop/scadaproj/infra/glauth
> docker compose up -d --force-recreate
> ```
>
> The per-box `C:\publish\glauth\glauth.cfg` + NSSM procedure below is kept for
> rollback reference only — do not use it for new provisioning.
`GwAdmin` is the gateway-specific dashboard-admin role. It is the `GwAdmin` is the gateway-specific dashboard-admin role. It is the
default `LdapOptions.RequiredGroup`, so the dashboard cookie login and default `LdapOptions.RequiredGroup`, so the dashboard cookie login and
`DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject `DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject
`admin` until a `GwAdmin` group exists and `admin` is a member. logins unless the user is a member of `GwAdmin`.
GLAuth's baseline config ships only the five LmxOpcUa role groups, so The `GwAdmin` (gid 5610) and `GwReader` (gid 5611) groups already exist in the shared
`GwAdmin` must be added to GLAuth rather than run from a separate LDAP config at `scadaproj/infra/glauth/config.toml`. Dashboard test users are
server: `multi-role`/`password` (Administrator) and `gw-viewer`/`password` (Viewer).
---
**RETIRED — per-box provisioning (reference/rollback only):**
1. Edit `C:\publish\glauth\glauth.cfg` 1. Edit `C:\publish\glauth\glauth.cfg`
2. Append the group: 2. Append the group:
@@ -172,7 +203,7 @@ server:
4. `nssm restart GLAuth` 4. `nssm restart GLAuth`
After the restart, `admin`'s `memberOf` includes After the restart, `admin`'s `memberOf` includes
`ou=GwAdmin,ou=groups,dc=lmxopcua,dc=local`, which the authenticator `ou=GwAdmin,ou=groups,dc=zb,dc=local`, which the authenticator
strips to `GwAdmin` and matches against `RequiredGroup`. The same strips to `GwAdmin` and matches against `RequiredGroup`. The same
pattern applies to any future permission that doesn't fit the existing pattern applies to any future permission that doesn't fit the existing
five roles. five roles.
@@ -193,15 +224,16 @@ echo -n "yourpassword" | openssl dgst -sha256
## Quick verification ## Quick verification
From mxaccessgw's dev box, prove the directory is reachable: From mxaccessgw's dev box, prove the shared directory is reachable:
```powershell ```powershell
# Plain bind via PowerShell + System.DirectoryServices.Protocols # Plain bind via PowerShell + System.DirectoryServices.Protocols
$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("localhost:3893") # (shared GLAuth on 10.100.0.35 — was localhost, now the docker host)
$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("10.100.0.35:3893")
$ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic $ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic
$ldap.SessionOptions.ProtocolVersion = 3 $ldap.SessionOptions.ProtocolVersion = 3
$ldap.SessionOptions.SecureSocketLayer = $false $ldap.SessionOptions.SecureSocketLayer = $false
$cred = New-Object System.Net.NetworkCredential("cn=admin,dc=lmxopcua,dc=local","admin123") $cred = New-Object System.Net.NetworkCredential("cn=multi-role,dc=zb,dc=local","password")
$ldap.Bind($cred) $ldap.Bind($cred)
"Bind OK" "Bind OK"
``` ```
@@ -209,17 +241,32 @@ $ldap.Bind($cred)
Or via `ldapsearch` if you have OpenLDAP CLI tools: Or via `ldapsearch` if you have OpenLDAP CLI tools:
```bash ```bash
ldapsearch -x -H ldap://localhost:3893 \ ldapsearch -x -H ldap://10.100.0.35:3893 \
-D "cn=admin,dc=lmxopcua,dc=local" -w admin123 \ -D "cn=serviceaccount,dc=zb,dc=local" -w serviceaccount123 \
-b "dc=lmxopcua,dc=local" "(uid=admin)" -b "dc=zb,dc=local" "(uid=multi-role)"
``` ```
The response should list `admin`'s entry with `memberOf` populated for The response should list `multi-role`'s entry with `memberOf` including
all five role groups — plus `GwAdmin` once the gateway-specific group `ou=GwAdmin,ou=groups,dc=zb,dc=local`.
is provisioned.
## Service management ## Service management
> **RETIRED — per-box NSSM service (reference/rollback only).** The shared GLAuth is
> managed via `docker compose` on `10.100.0.35` (`scadaproj/infra/glauth/`). The
> Windows NSSM `GLAuth` service on the dev box has been stopped and set to
> `StartupType=Manual`; only restart it if you need to roll back to a local directory.
>
> **Active (shared) management:**
> ```bash
> ssh 10.100.0.35
> cd ~/Desktop/scadaproj/infra/glauth
> docker compose ps # check container status
> docker compose up -d --force-recreate # apply config.toml changes
> docker compose logs -f # tail logs
> ```
**RETIRED — per-box NSSM commands (rollback reference):**
```powershell ```powershell
# Status / start / stop / restart # Status / start / stop / restart
nssm status GLAuth nssm status GLAuth
@@ -253,12 +300,12 @@ applies to mxaccessgw verbatim. Keys that change:
| Field | GLAuth dev value | AD production value | | Field | GLAuth dev value | AD production value |
|---|---|---| |---|---|---|
| `Server` | `localhost` | a domain controller FQDN, or the domain itself | | `Server` | `10.100.0.35` (shared docker host) | a domain controller FQDN, or the domain itself |
| `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement | | `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement |
| `UseTls` | `false` | `true` | | `UseTls` | `false` | `true` |
| `AllowInsecureLdap` | `true` | `false` | | `AllowInsecureLdap` | `true` | `false` |
| `SearchBase` | `dc=lmxopcua,dc=local` | `DC=corp,DC=example,DC=com` | | `SearchBase` | `dc=zb,dc=local` | `DC=corp,DC=example,DC=com` |
| `ServiceAccountDn` | `cn=serviceaccount,dc=lmxopcua,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` | | `ServiceAccountDn` | `cn=serviceaccount,dc=zb,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` |
| `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) | | `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) |
| `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) | | `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) |
@@ -269,12 +316,12 @@ add a `tokenGroups` query as an enhancement.
## Security notes for production ## Security notes for production
- **Plaintext passwords in `glauth.cfg` are dev-only.** The config is - **Plaintext passwords in `config.toml` are dev-only.** The shared config is in
unencrypted on disk; anyone with read access to `C:\publish\glauth\` `scadaproj/infra/glauth/config.toml` (unencrypted); restrict filesystem access on
can SHA256-rainbow-table the entries. Treat the dev creds as `10.100.0.35` accordingly. Treat the dev creds as throwaway. Production LDAP is Active
throwaway. Production LDAP is Active Directory. Directory. *(The retired per-box `C:\publish\glauth\glauth.cfg` has the same caveat.)*
- The 3-fail / 10-minute lockout is per source IP, not per user — a - The 3-fail / 10-minute lockout is per source IP, not per user — a
shared NAT can lock out a whole office. Tunable in `[behaviors]`. shared NAT can lock out a whole office. Tunable in `[behaviors]`.
- LDAPS isn't enabled in dev; binding sends passwords cleartext on the - LDAPS isn't enabled in dev; binding sends passwords cleartext on the
wire. Fine for `localhost`, never expose port 3893 off-box without wire. The shared GLAuth listens only on the LAN (`10.100.0.35`); never
enabling TLS first. expose port 3893 externally without enabling TLS first.
+26
View File
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<clear />
<add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
<add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
</packageSources>
<!-- nuget.org serves everything; the Gitea feed serves only the ZB.MOM.WW.* shared libs.
Credentials are NOT committed: they are provided per-developer at the user level. -->
<packageSourceMapping>
<packageSource key="nuget.org">
<package pattern="*" />
</packageSource>
<packageSource key="dohertj2-gitea">
<package pattern="ZB.MOM.WW.Health" />
<package pattern="ZB.MOM.WW.Health.*" />
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
<package pattern="ZB.MOM.WW.Configuration" />
<package pattern="ZB.MOM.WW.Auth" />
<package pattern="ZB.MOM.WW.Auth.*" />
<package pattern="ZB.MOM.WW.Audit" />
<package pattern="ZB.MOM.WW.Theme" />
</packageSource>
</packageSourceMapping>
</configuration>
File diff suppressed because it is too large Load Diff
@@ -315,6 +315,14 @@ message SubscribeBulkCommand {
repeated string tag_addresses = 2; repeated string tag_addresses = 2;
} }
// Provider selection / current provider for the alarm feed. UNSPECIFIED on a
// SubscribeAlarmsCommand means auto: alarmmgr primary with subtag fallback.
enum AlarmProviderMode {
ALARM_PROVIDER_MODE_UNSPECIFIED = 0;
ALARM_PROVIDER_MODE_ALARMMGR = 1;
ALARM_PROVIDER_MODE_SUBTAG = 2;
}
// Subscribe the worker's alarm consumer to an AVEVA alarm provider. // Subscribe the worker's alarm consumer to an AVEVA alarm provider.
// Subscription expression follows the canonical // Subscription expression follows the canonical
// `\\<machine>\Galaxy!<area>` format (literal "Galaxy" provider). The // `\\<machine>\Galaxy!<area>` format (literal "Galaxy" provider). The
@@ -323,6 +331,12 @@ message SubscribeBulkCommand {
// SubscribeAlarms to reconfigure). // SubscribeAlarms to reconfigure).
message SubscribeAlarmsCommand { message SubscribeAlarmsCommand {
string subscription_expression = 1; string subscription_expression = 1;
// UNSPECIFIED = auto-failover/failback. ALARMMGR/SUBTAG force one provider.
AlarmProviderMode forced_mode = 2;
// Subtag watch-list resolved by the gateway (GR SQL + config). Empty in pure
// alarmmgr mode; in subtag mode it bounds what the consumer can observe.
repeated AlarmSubtagTarget watch_list = 3;
AlarmFailoverConfig failover = 4;
} }
// Tear down the worker's alarm consumer. No-op if no subscription is // Tear down the worker's alarm consumer. No-op if no subscription is
@@ -330,6 +344,23 @@ message SubscribeAlarmsCommand {
message UnsubscribeAlarmsCommand { message UnsubscribeAlarmsCommand {
} }
// One alarm attribute the subtag fallback consumer advises. Addresses are full
// MXAccess item references the worker passes straight to AddItem.
message AlarmSubtagTarget {
string alarm_full_reference = 1; // e.g. "Galaxy!Area.Tank01.Level.HiHi"
string source_object_reference = 2; // e.g. "Tank01"
string active_subtag = 3; // item address of the in-alarm boolean
string acked_subtag = 4; // item address of the acknowledged boolean
string ack_comment_subtag = 5; // writable ack-comment attribute (ack write target)
string priority_subtag = 6; // optional severity source; empty if absent
}
message AlarmFailoverConfig {
int32 consecutive_failure_threshold = 1; // wnwrap COM failures before switching (>=1)
int32 failback_probe_interval_seconds = 2; // probe cadence while degraded (>=1)
int32 failback_stable_probes = 3; // clean probes before switching back (>=1)
}
// Acknowledge a single alarm by its GUID. Operator identity fields are // Acknowledge a single alarm by its GUID. Operator identity fields are
// recorded atomically with the ack transition in the alarm-history log. // recorded atomically with the ack transition in the alarm-history log.
// The reply's hresult / native_status surfaces AVEVA's // The reply's hresult / native_status surfaces AVEVA's
@@ -684,6 +715,7 @@ message MxEvent {
OperationCompleteEvent operation_complete = 22; OperationCompleteEvent operation_complete = 22;
OnBufferedDataChangeEvent on_buffered_data_change = 23; OnBufferedDataChangeEvent on_buffered_data_change = 23;
OnAlarmTransitionEvent on_alarm_transition = 24; OnAlarmTransitionEvent on_alarm_transition = 24;
OnAlarmProviderModeChangedEvent on_alarm_provider_mode_changed = 25;
} }
} }
@@ -694,6 +726,7 @@ enum MxEventFamily {
MX_EVENT_FAMILY_OPERATION_COMPLETE = 3; MX_EVENT_FAMILY_OPERATION_COMPLETE = 3;
MX_EVENT_FAMILY_ON_BUFFERED_DATA_CHANGE = 4; MX_EVENT_FAMILY_ON_BUFFERED_DATA_CHANGE = 4;
MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5; MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED = 6;
} }
message OnDataChangeEvent { message OnDataChangeEvent {
@@ -768,6 +801,20 @@ message OnAlarmTransitionEvent {
// Limit/threshold value that triggered the transition for limit alarms. // Limit/threshold value that triggered the transition for limit alarms.
// Optional; populated for AnalogLimitAlarm-family transitions. // Optional; populated for AnalogLimitAlarm-family transitions.
MxValue limit_value = 13; MxValue limit_value = 13;
// True when this transition came from the subtag-monitoring fallback rather
// than the native alarmmgr provider synthesized from data changes, reduced
// fidelity (synthetic GUID, no native raise time).
bool degraded = 14;
// Which provider produced this transition.
AlarmProviderMode source_provider = 15;
}
message OnAlarmProviderModeChangedEvent {
AlarmProviderMode mode = 1;
string reason = 2;
int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
google.protobuf.Timestamp at = 4;
} }
enum AlarmTransitionKind { enum AlarmTransitionKind {
@@ -800,6 +847,8 @@ message ActiveAlarmSnapshot {
string operator_comment = 11; string operator_comment = 11;
MxValue current_value = 12; MxValue current_value = 12;
MxValue limit_value = 13; MxValue limit_value = 13;
bool degraded = 14;
AlarmProviderMode source_provider = 15;
} }
enum AlarmConditionState { enum AlarmConditionState {
@@ -866,9 +915,19 @@ message AlarmFeedMessage {
bool snapshot_complete = 2; bool snapshot_complete = 2;
// A live alarm state change (raise / acknowledge / clear). // A live alarm state change (raise / acknowledge / clear).
OnAlarmTransitionEvent transition = 3; OnAlarmTransitionEvent transition = 3;
// Provider-mode status. Emitted once on stream open and again on every
// failover/failback so late joiners learn the current mode immediately.
AlarmProviderStatus provider_status = 4;
} }
} }
message AlarmProviderStatus {
AlarmProviderMode mode = 1;
bool degraded = 2; // true whenever mode == SUBTAG
string reason = 3; // human-readable switch reason
google.protobuf.Timestamp since = 4;
}
message MxStatusProxy { message MxStatusProxy {
// Mirrors the `success` member of the MXAccess MXSTATUS_PROXY struct // Mirrors the `success` member of the MXAccess MXSTATUS_PROXY struct
// (a 16-bit signed value in the COM struct, widened to int32 on the // (a 16-bit signed value in the COM struct, widened to int32 on the
@@ -1,8 +1,11 @@
using System.Security.Claims; using System.Security.Claims;
using Microsoft.Extensions.Logging.Abstractions; using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options; using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Ldap;
using ZB.MOM.WW.Auth.Ldap;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Dashboard; using ZB.MOM.WW.MxGateway.Server.Dashboard;
using LibraryLdapOptions = ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions;
namespace ZB.MOM.WW.MxGateway.IntegrationTests; namespace ZB.MOM.WW.MxGateway.IntegrationTests;
@@ -28,12 +31,11 @@ public sealed class DashboardLdapLiveTests
claim.Type == DashboardAuthenticationDefaults.LdapGroupClaimType claim.Type == DashboardAuthenticationDefaults.LdapGroupClaimType
&& claim.Value.Contains("GwAdmin", StringComparison.OrdinalIgnoreCase)); && claim.Value.Contains("GwAdmin", StringComparison.OrdinalIgnoreCase));
// IntegrationTests-023: DashboardAuthenticator.CreatePrincipal emits a // IntegrationTests-023: DashboardAuthenticator builds the principal with a
// ClaimTypes.Role claim derived from MapGroupsToRoles. The seeded // ClaimTypes.Role claim resolved from the LDAP groups via the
// GroupToRole map (GwAdmin -> Admin) means the admin principal must // DashboardGroupRoleMapper. The seeded GroupToRole map (GwAdmin -> Admin)
// carry Role=Admin alongside the raw LDAP-group claim. A regression in // means the admin principal must carry Role=Admin alongside the raw LDAP-group
// MapGroupsToRoles (returning an empty list, missing the RDN fallback) // claim. A regression in the group→role mapping would fail this assertion.
// would silently pass without this assertion.
Assert.Contains(result.Principal.Claims, claim => Assert.Contains(result.Principal.Claims, claim =>
claim.Type == ClaimTypes.Role claim.Type == ClaimTypes.Role
&& claim.Value == DashboardRoles.Admin); && claim.Value == DashboardRoles.Admin);
@@ -59,7 +61,7 @@ public sealed class DashboardLdapLiveTests
[LiveLdapFact] [LiveLdapFact]
public async Task AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword() public async Task AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword()
{ {
// Exercises the LdapException branch: the user exists and the service // Exercises the user-bind-failure branch: the user exists and the service
// account search succeeds, but the candidate bind is rejected. // account search succeeds, but the candidate bind is rejected.
const string wrongPassword = "definitely-not-the-admin-password"; const string wrongPassword = "definitely-not-the-admin-password";
DashboardAuthenticator authenticator = CreateAuthenticator(); DashboardAuthenticator authenticator = CreateAuthenticator();
@@ -78,8 +80,8 @@ public sealed class DashboardLdapLiveTests
[LiveLdapFact] [LiveLdapFact]
public async Task AuthenticateAsync_UnknownUsername_Fails() public async Task AuthenticateAsync_UnknownUsername_Fails()
{ {
// Exercises the `candidate is null` branch: the service-account search // Exercises the user-not-found branch: the service-account search returns no
// returns no entry, so no candidate bind is attempted. // entry, so no candidate bind is attempted.
DashboardAuthenticator authenticator = CreateAuthenticator(); DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync( DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
@@ -96,18 +98,13 @@ public sealed class DashboardLdapLiveTests
public async Task AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing() public async Task AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing()
{ {
// Exercises the connect-failure path: a closed loopback port produces a // Exercises the connect-failure path: a closed loopback port produces a
// connection error that DashboardAuthenticator must absorb into a Fail // connection error that the shared LdapAuthService must absorb into a Fail
// result rather than propagating an exception to the dashboard. // result rather than propagating an exception to the dashboard.
DashboardAuthenticator authenticator = new( DashboardAuthenticator authenticator = CreateAuthenticator(LibraryOptions() with
Options.Create(new GatewayOptions {
{ // 1 is a reserved port number that no LDAP server listens on.
Ldap = new LdapOptions Port = 1,
{ });
// 1 is a reserved port number that no LDAP server listens on.
Port = 1,
},
}),
NullLogger<DashboardAuthenticator>.Instance);
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync( DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin", "admin",
@@ -118,19 +115,48 @@ public sealed class DashboardLdapLiveTests
Assert.Null(result.Principal); Assert.Null(result.Principal);
} }
private static DashboardAuthenticator CreateAuthenticator() private static DashboardAuthenticator CreateAuthenticator() => CreateAuthenticator(LibraryOptions());
private static DashboardAuthenticator CreateAuthenticator(LibraryLdapOptions ldapOptions)
{ {
return new DashboardAuthenticator( GatewayOptions gatewayOptions = new()
Options.Create(new GatewayOptions {
Dashboard = new DashboardOptions
{ {
Dashboard = new DashboardOptions GroupToRole = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{ {
GroupToRole = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) ["GwAdmin"] = DashboardRoles.Admin,
{
["GwAdmin"] = DashboardRoles.Admin,
},
}, },
}), },
};
return new DashboardAuthenticator(
new LdapAuthService(ldapOptions),
new DashboardGroupRoleMapper(Options.Create(gatewayOptions)),
NullLogger<DashboardAuthenticator>.Instance); NullLogger<DashboardAuthenticator>.Instance);
} }
/// <summary>
/// Builds the shared library <see cref="LibraryLdapOptions"/> from the gateway's
/// default LDAP settings so the live tests exercise the same seeded directory the
/// gateway connects to (localhost:3893, plaintext, with AllowInsecure for dev).
/// </summary>
private static LibraryLdapOptions LibraryOptions()
{
ZB.MOM.WW.MxGateway.Server.Configuration.LdapOptions gateway = new();
return new LibraryLdapOptions
{
Enabled = gateway.Enabled,
Server = gateway.Server,
Port = gateway.Port,
Transport = gateway.Transport,
AllowInsecure = gateway.AllowInsecure,
SearchBase = gateway.SearchBase,
ServiceAccountDn = gateway.ServiceAccountDn,
ServiceAccountPassword = gateway.ServiceAccountPassword,
UserNameAttribute = gateway.UserNameAttribute,
DisplayNameAttribute = gateway.DisplayNameAttribute,
GroupAttribute = gateway.GroupAttribute,
};
}
} }
@@ -0,0 +1,168 @@
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Galaxy;
namespace ZB.MOM.WW.MxGateway.Server.Alarms;
/// <summary>
/// Default <see cref="IAlarmWatchListResolver"/>. Merges Galaxy Repository
/// alarm-attribute discovery with the configured include/exclude overrides
/// and composes the per-attribute subtag item addresses from the configured
/// subtag names.
/// </summary>
// NOTE: The exact subtag names and the canonical AlarmFullReference shape
// ("Galaxy!{area}.{reference}") are validated against a live Galaxy in the
// Task 17 live smoke test. The config Subtags block exists precisely so these
// names are not hard-coded here.
public sealed class AlarmWatchListResolver : IAlarmWatchListResolver
{
private const string ProviderLiteral = "Galaxy";
private const string DefaultActiveSubtag = "InAlarm";
private const string DefaultAckedSubtag = "Acked";
private readonly IGalaxyRepository _repository;
private readonly ILogger<AlarmWatchListResolver> _logger;
/// <summary>Initializes the watch-list resolver.</summary>
/// <param name="repository">Galaxy Repository used for alarm-attribute discovery.</param>
/// <param name="logger">Diagnostic logger.</param>
public AlarmWatchListResolver(
IGalaxyRepository repository,
ILogger<AlarmWatchListResolver> logger)
{
_repository = repository ?? throw new ArgumentNullException(nameof(repository));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <inheritdoc />
public async Task<IReadOnlyList<AlarmSubtagTarget>> ResolveAsync(
AlarmsOptions options,
CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(options);
AlarmDiscoveryOptions discovery = options.Fallback.Discovery;
// 1. Build the ordered, de-duplicated attribute reference set.
// Each entry carries the reference plus the source-object reference.
List<(string Reference, string SourceObject)> ordered = [];
HashSet<string> seen = new(StringComparer.OrdinalIgnoreCase);
if (discovery.UseGalaxyRepository)
{
List<GalaxyAlarmAttributeRow> rows;
try
{
rows = await _repository.GetAlarmAttributesAsync(cancellationToken).ConfigureAwait(false);
}
catch (Exception ex)
{
// Discovery being unavailable must not crash the resolver: log and
// continue with an empty discovery set. The caller decides what to
// do with the (possibly config-only) result.
_logger.LogWarning(
ex,
"Galaxy Repository alarm-attribute discovery failed; continuing with configuration-only watch-list.");
rows = [];
}
foreach (GalaxyAlarmAttributeRow row in rows)
{
if (string.IsNullOrEmpty(row.FullTagReference) || !seen.Add(row.FullTagReference))
{
continue;
}
ordered.Add((row.FullTagReference, row.SourceObjectReference));
}
}
foreach (string include in discovery.IncludeAttributes)
{
if (string.IsNullOrEmpty(include) || !seen.Add(include))
{
continue;
}
ordered.Add((include, DeriveSourceObject(include)));
}
// Remove excluded references (case-insensitive), but only when GR discovery
// is active. ExcludeAttributes is documented as "Ignored when
// UseGalaxyRepository is false" (AlarmDiscoveryOptions.ExcludeAttributes).
// Whitespace-only entries are skipped, consistent with the include guard above.
if (discovery.UseGalaxyRepository)
{
HashSet<string> excluded = new(
discovery.ExcludeAttributes.Where(e => !string.IsNullOrWhiteSpace(e)),
StringComparer.OrdinalIgnoreCase);
if (excluded.Count > 0)
{
ordered.RemoveAll(e => excluded.Contains(e.Reference));
}
}
// 2. Resolve subtag names with safe fallbacks.
string active = string.IsNullOrEmpty(options.Fallback.Subtags.Active)
? DefaultActiveSubtag
: options.Fallback.Subtags.Active;
string acked = string.IsNullOrEmpty(options.Fallback.Subtags.Acked)
? DefaultAckedSubtag
: options.Fallback.Subtags.Acked;
string priority = options.Fallback.Subtags.Priority;
string ackComment = options.Fallback.Subtags.AckComment;
// 3. Resolve the area: discovery area, else the default area (may be empty).
string area = string.IsNullOrEmpty(discovery.Area) ? options.DefaultArea : discovery.Area;
// 4. Compose one target per reference.
List<AlarmSubtagTarget> targets = new(ordered.Count);
foreach ((string reference, string sourceObject) in ordered)
{
targets.Add(new AlarmSubtagTarget
{
AlarmFullReference = ComposeFullReference(area, reference),
SourceObjectReference = sourceObject,
ActiveSubtag = $"{reference}.{active}",
AckedSubtag = $"{reference}.{acked}",
PrioritySubtag = string.IsNullOrEmpty(priority) ? string.Empty : $"{reference}.{priority}",
AckCommentSubtag = string.IsNullOrEmpty(ackComment) ? string.Empty : $"{reference}.{ackComment}",
});
}
// 5. Report the resolved count; warn when subtag mode was expected to cover
// something (GR enabled, or explicit includes were configured) but resolved
// to nothing. Only emit the Debug line when there is at least one target,
// to avoid a confusing "0 target(s)" noise line.
if (targets.Count == 0 && (discovery.UseGalaxyRepository || discovery.IncludeAttributes.Length > 0))
{
_logger.LogWarning(
"Alarm subtag watch-list resolved to zero targets; subtag-polling fallback will cover no alarms.");
}
else if (targets.Count > 0)
{
_logger.LogDebug("Resolved alarm subtag watch-list with {TargetCount} target(s).", targets.Count);
}
return targets;
}
/// <summary>
/// Derives the source-object reference for a configuration entry: the
/// substring before the first '.', or the whole string when there is no dot.
/// </summary>
private static string DeriveSourceObject(string reference)
{
int dot = reference.IndexOf('.', StringComparison.Ordinal);
return dot < 0 ? reference : reference[..dot];
}
/// <summary>
/// Composes the canonical alarm full reference: <c>Galaxy!{area}.{reference}</c>
/// when an area is set, otherwise <c>Galaxy!{reference}</c>.
/// </summary>
private static string ComposeFullReference(string area, string reference) =>
string.IsNullOrEmpty(area)
? $"{ProviderLiteral}!{reference}"
: $"{ProviderLiteral}!{area}.{reference}";
}
@@ -13,6 +13,7 @@ public static class AlarmsServiceCollectionExtensions
/// <returns>The service collection for chaining.</returns> /// <returns>The service collection for chaining.</returns>
public static IServiceCollection AddGatewayAlarms(this IServiceCollection services) public static IServiceCollection AddGatewayAlarms(this IServiceCollection services)
{ {
services.AddSingleton<IAlarmWatchListResolver, AlarmWatchListResolver>();
services.AddSingleton<GatewayAlarmMonitor>(); services.AddSingleton<GatewayAlarmMonitor>();
services.AddSingleton<IGatewayAlarmService>(provider => provider.GetRequiredService<GatewayAlarmMonitor>()); services.AddSingleton<IGatewayAlarmService>(provider => provider.GetRequiredService<GatewayAlarmMonitor>());
services.AddHostedService(provider => provider.GetRequiredService<GatewayAlarmMonitor>()); services.AddHostedService(provider => provider.GetRequiredService<GatewayAlarmMonitor>());
@@ -1,7 +1,9 @@
using System.Threading.Channels; using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using Microsoft.Extensions.Options; using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Contracts.Proto; using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Metrics;
using ZB.MOM.WW.MxGateway.Server.Sessions; using ZB.MOM.WW.MxGateway.Server.Sessions;
namespace ZB.MOM.WW.MxGateway.Server.Alarms; namespace ZB.MOM.WW.MxGateway.Server.Alarms;
@@ -23,6 +25,8 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
private static readonly TimeSpan StartupGrace = TimeSpan.FromSeconds(2); private static readonly TimeSpan StartupGrace = TimeSpan.FromSeconds(2);
private readonly ISessionManager _sessionManager; private readonly ISessionManager _sessionManager;
private readonly IAlarmWatchListResolver _watchListResolver;
private readonly GatewayMetrics _metrics;
private readonly AlarmsOptions _options; private readonly AlarmsOptions _options;
private readonly ILogger<GatewayAlarmMonitor> _logger; private readonly ILogger<GatewayAlarmMonitor> _logger;
@@ -30,20 +34,34 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
private readonly Dictionary<string, ActiveAlarmSnapshot> _alarms = new(StringComparer.Ordinal); private readonly Dictionary<string, ActiveAlarmSnapshot> _alarms = new(StringComparer.Ordinal);
private readonly List<Subscriber> _subscribers = []; private readonly List<Subscriber> _subscribers = [];
// Current provider status (mode + degraded + reason + since), guarded by _sync.
// Initialized to the alarm-manager, not-degraded baseline so a late joiner sees
// a sensible status even before any OnAlarmProviderModeChanged event arrives.
private AlarmProviderMode _providerMode = AlarmProviderMode.Alarmmgr;
private bool _providerDegraded;
private string _providerReason = string.Empty;
private DateTimeOffset _providerSince = DateTimeOffset.UtcNow;
private volatile GatewayAlarmMonitorState _state = GatewayAlarmMonitorState.Disabled; private volatile GatewayAlarmMonitorState _state = GatewayAlarmMonitorState.Disabled;
private volatile string? _lastError; private volatile string? _lastError;
private GatewaySession? _session; private GatewaySession? _session;
/// <summary>Initializes the gateway alarm monitor.</summary> /// <summary>Initializes the gateway alarm monitor.</summary>
/// <param name="sessionManager">Gateway session manager.</param> /// <param name="sessionManager">Gateway session manager.</param>
/// <param name="watchListResolver">Resolver for the subtag-fallback watch-list.</param>
/// <param name="metrics">Gateway metrics sink.</param>
/// <param name="options">Gateway options carrying the alarm configuration.</param> /// <param name="options">Gateway options carrying the alarm configuration.</param>
/// <param name="logger">Diagnostic logger.</param> /// <param name="logger">Diagnostic logger.</param>
public GatewayAlarmMonitor( public GatewayAlarmMonitor(
ISessionManager sessionManager, ISessionManager sessionManager,
IAlarmWatchListResolver watchListResolver,
GatewayMetrics metrics,
IOptions<GatewayOptions> options, IOptions<GatewayOptions> options,
ILogger<GatewayAlarmMonitor> logger) ILogger<GatewayAlarmMonitor> logger)
{ {
_sessionManager = sessionManager ?? throw new ArgumentNullException(nameof(sessionManager)); _sessionManager = sessionManager ?? throw new ArgumentNullException(nameof(sessionManager));
_watchListResolver = watchListResolver ?? throw new ArgumentNullException(nameof(watchListResolver));
_metrics = metrics ?? throw new ArgumentNullException(nameof(metrics));
_options = (options ?? throw new ArgumentNullException(nameof(options))).Value.Alarms; _options = (options ?? throw new ArgumentNullException(nameof(options))).Value.Alarms;
_logger = logger ?? throw new ArgumentNullException(nameof(logger)); _logger = logger ?? throw new ArgumentNullException(nameof(logger));
} }
@@ -139,6 +157,20 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
private async Task RunMonitorAsync(string subscription, CancellationToken stoppingToken) private async Task RunMonitorAsync(string subscription, CancellationToken stoppingToken)
{ {
_state = GatewayAlarmMonitorState.Starting; _state = GatewayAlarmMonitorState.Starting;
lock (_sync)
{
// Re-baseline the provider status for this lifecycle so a restarted
// monitor advertises alarm-manager/not-degraded until told otherwise.
_providerMode = AlarmProviderMode.Alarmmgr;
_providerDegraded = false;
_providerReason = string.Empty;
_providerSince = DateTimeOffset.UtcNow;
}
// Align the observable gauge with the Alarmmgr baseline without recording
// a switch — the gauge was 0 (unknown) from construction until now.
_metrics.SetAlarmProviderMode(ModeToInt(AlarmProviderMode.Alarmmgr));
GatewaySession session = await _sessionManager.OpenSessionAsync( GatewaySession session = await _sessionManager.OpenSessionAsync(
new SessionOpenRequest(BackendName, MonitorClientName, Guid.NewGuid().ToString("N"), CommandTimeout: null), new SessionOpenRequest(BackendName, MonitorClientName, Guid.NewGuid().ToString("N"), CommandTimeout: null),
MonitorClientName, MonitorClientName,
@@ -173,6 +205,15 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
{ {
ApplyTransition(mxEvent.OnAlarmTransition); ApplyTransition(mxEvent.OnAlarmTransition);
} }
else if (mxEvent is { BodyCase: MxEvent.BodyOneofCase.OnAlarmProviderModeChanged }
&& mxEvent.OnAlarmProviderModeChanged is not null)
{
await ApplyProviderModeChangeAsync(
session.SessionId,
mxEvent.OnAlarmProviderModeChanged,
linked.Token)
.ConfigureAwait(false);
}
} }
} }
finally finally
@@ -209,6 +250,29 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
private async Task SubscribeAlarmsAsync(string sessionId, string subscription, CancellationToken cancellationToken) private async Task SubscribeAlarmsAsync(string sessionId, string subscription, CancellationToken cancellationToken)
{ {
IReadOnlyList<AlarmSubtagTarget> watchList = await _watchListResolver
.ResolveAsync(_options, cancellationToken)
.ConfigureAwait(false);
AlarmProviderMode forcedMode = MapForcedMode(_options.Fallback.Mode);
// When the forced mode is Unspecified (the "Auto" case) and the resolved
// watch-list is empty — the common alarmmgr-only deployment — the command
// is identical-in-effect to the historical SubscribeAlarms (wnwrap only):
// the worker builds the wnwrap consumer and no subtag watch-list.
SubscribeAlarmsCommand command = new()
{
SubscriptionExpression = subscription,
ForcedMode = forcedMode,
Failover = new AlarmFailoverConfig
{
ConsecutiveFailureThreshold = _options.Fallback.ConsecutiveFailureThreshold,
FailbackProbeIntervalSeconds = _options.Fallback.FailbackProbeIntervalSeconds,
FailbackStableProbes = _options.Fallback.FailbackStableProbes,
},
};
command.WatchList.AddRange(watchList);
WorkerCommandReply reply = await _sessionManager.InvokeAsync( WorkerCommandReply reply = await _sessionManager.InvokeAsync(
sessionId, sessionId,
new WorkerCommand new WorkerCommand
@@ -216,7 +280,7 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
Command = new MxCommand Command = new MxCommand
{ {
Kind = MxCommandKind.SubscribeAlarms, Kind = MxCommandKind.SubscribeAlarms,
SubscribeAlarms = new SubscribeAlarmsCommand { SubscriptionExpression = subscription }, SubscribeAlarms = command,
}, },
}, },
cancellationToken) cancellationToken)
@@ -310,6 +374,98 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
} }
} }
// Handles the worker's provider-mode-change event: updates the stored provider
// status, broadcasts it to every subscriber (provider status is global, not
// alarm-scoped), records the switch metric, and forces a cache reconcile so the
// active-alarm set reflects whatever the new mode reports.
private async Task ApplyProviderModeChangeAsync(
string sessionId,
OnAlarmProviderModeChangedEvent change,
CancellationToken cancellationToken)
{
AlarmProviderMode toMode = change.Mode;
string reason = change.Reason ?? string.Empty;
AlarmProviderStatus status;
int fromModeInt;
lock (_sync)
{
fromModeInt = ModeToInt(_providerMode);
_providerMode = toMode;
_providerDegraded = toMode == AlarmProviderMode.Subtag;
_providerReason = reason;
_providerSince = DateTimeOffset.UtcNow;
status = BuildProviderStatus();
BroadcastToAll(new AlarmFeedMessage { ProviderStatus = status });
}
_metrics.AlarmProviderSwitched(fromModeInt, ModeToInt(toMode), reason);
_logger.LogInformation(
"Alarm provider mode changed to {Mode} (degraded={Degraded}): {Reason}",
toMode,
status.Degraded,
reason);
try
{
// Intentionally awaited OUTSIDE _sync: ReconcileAsync acquires _sync itself,
// so holding it across the await here would deadlock. Subscribers therefore
// see the ProviderStatus push (above) slightly before the cache is re-seeded
// by the reconcile — an accepted brief inconsistency.
await ReconcileAsync(sessionId, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException)
{
throw;
}
catch (Exception exception)
{
_logger.LogDebug(
exception,
"Reconcile after alarm provider mode change failed; keeping the current cache.");
}
}
// Caller holds _sync. Builds an AlarmProviderStatus snapshot of the current state.
private AlarmProviderStatus BuildProviderStatus()
{
return new AlarmProviderStatus
{
Mode = _providerMode,
Degraded = _providerDegraded,
Reason = _providerReason,
Since = Timestamp.FromDateTimeOffset(_providerSince),
};
}
// Maps the configured fallback mode string to the forced provider mode the
// worker honours. Case-insensitive; anything other than the two force values
// (including the default "Auto") yields Unspecified ("let the worker decide").
private static AlarmProviderMode MapForcedMode(string? mode)
{
if (string.Equals(mode, "ForceAlarmManager", StringComparison.OrdinalIgnoreCase))
{
return AlarmProviderMode.Alarmmgr;
}
if (string.Equals(mode, "ForceSubtag", StringComparison.OrdinalIgnoreCase))
{
return AlarmProviderMode.Subtag;
}
return AlarmProviderMode.Unspecified;
}
// Maps the provider-mode enum to the integer the metric expects
// (alarmmgr=1, subtag=2, unknown/unspecified=0).
private static int ModeToInt(AlarmProviderMode mode) => mode switch
{
AlarmProviderMode.Alarmmgr => 1,
AlarmProviderMode.Subtag => 2,
_ => 0,
};
// Replaces the cache with the worker's authoritative snapshot, broadcasting // Replaces the cache with the worker's authoritative snapshot, broadcasting
// a synthetic transition for any alarm the live stream missed. // a synthetic transition for any alarm the live stream missed.
private void ApplyReconcile(IEnumerable<ActiveAlarmSnapshot> snapshots) private void ApplyReconcile(IEnumerable<ActiveAlarmSnapshot> snapshots)
@@ -374,6 +530,23 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
} }
} }
// Caller holds _sync. Pushes a feed message to every subscriber regardless of
// its alarm-filter prefix. Used for provider-status messages, which are global
// rather than scoped to a single alarm reference.
private void BroadcastToAll(AlarmFeedMessage message)
{
for (int index = _subscribers.Count - 1; index >= 0; index--)
{
Subscriber subscriber = _subscribers[index];
if (!subscriber.Channel.Writer.TryWrite(message))
{
subscriber.Channel.Writer.TryComplete(new InvalidOperationException(
"Alarm feed subscriber fell behind and was dropped; reconnect to re-snapshot."));
_subscribers.RemoveAt(index);
}
}
}
private void ClearCache() private void ClearCache()
{ {
lock (_sync) lock (_sync)
@@ -398,11 +571,14 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
Subscriber subscriber = new(channel, prefix); Subscriber subscriber = new(channel, prefix);
ActiveAlarmSnapshot[] snapshot; ActiveAlarmSnapshot[] snapshot;
AlarmProviderStatus providerStatus;
lock (_sync) lock (_sync)
{ {
// Register before snapshotting under the same lock so no transition // Register before snapshotting under the same lock so neither a
// can slip between the snapshot and the live stream. // transition nor a provider-mode change can slip between the snapshot
// and the live stream.
_subscribers.Add(subscriber); _subscribers.Add(subscriber);
providerStatus = BuildProviderStatus();
snapshot = _alarms.Values snapshot = _alarms.Values
.Where(alarm => prefix.Length == 0 .Where(alarm => prefix.Length == 0
|| alarm.AlarmFullReference.StartsWith(prefix, StringComparison.Ordinal)) || alarm.AlarmFullReference.StartsWith(prefix, StringComparison.Ordinal))
@@ -412,6 +588,10 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
try try
{ {
// Emit the current provider status first so a late joiner immediately
// learns the mode (and whether the feed is degraded) before any alarms.
yield return new AlarmFeedMessage { ProviderStatus = providerStatus };
foreach (ActiveAlarmSnapshot alarm in snapshot) foreach (ActiveAlarmSnapshot alarm in snapshot)
{ {
yield return new AlarmFeedMessage { ActiveAlarm = alarm }; yield return new AlarmFeedMessage { ActiveAlarm = alarm };
@@ -624,6 +804,8 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
Description = transition.Description, Description = transition.Description,
OperatorUser = transition.OperatorUser, OperatorUser = transition.OperatorUser,
OperatorComment = transition.OperatorComment, OperatorComment = transition.OperatorComment,
Degraded = transition.Degraded,
SourceProvider = transition.SourceProvider,
}; };
if (transition.OriginalRaiseTimestamp is not null) if (transition.OriginalRaiseTimestamp is not null)
{ {
@@ -660,6 +842,8 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
Description = snapshot.Description, Description = snapshot.Description,
OperatorUser = snapshot.OperatorUser, OperatorUser = snapshot.OperatorUser,
OperatorComment = snapshot.OperatorComment, OperatorComment = snapshot.OperatorComment,
Degraded = snapshot.Degraded,
SourceProvider = snapshot.SourceProvider,
}; };
if (snapshot.OriginalRaiseTimestamp is not null) if (snapshot.OriginalRaiseTimestamp is not null)
{ {
@@ -0,0 +1,28 @@
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Alarms;
/// <summary>
/// Resolves the subtag watch-list the gateway sends to the worker when the
/// central alarm monitor operates in subtag-polling fallback mode. Merges
/// Galaxy Repository alarm-attribute discovery with the configured
/// include/exclude overrides and composes the per-attribute subtag item
/// addresses from the configured subtag names.
/// </summary>
public interface IAlarmWatchListResolver
{
/// <summary>
/// Builds the subtag watch-list for the supplied alarm configuration.
/// </summary>
/// <param name="options">Alarm configuration carrying discovery and subtag-name settings.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>
/// The resolved <see cref="AlarmSubtagTarget"/> watch-list, possibly empty.
/// Discovery being unavailable never throws; the caller decides what to do
/// with an empty list.
/// </returns>
Task<IReadOnlyList<AlarmSubtagTarget>> ResolveAsync(
AlarmsOptions options,
CancellationToken cancellationToken = default);
}
@@ -0,0 +1,131 @@
namespace ZB.MOM.WW.MxGateway.Server.Configuration;
/// <summary>
/// Controls how the central alarm monitor selects between the MXAccess
/// alarm-manager subscription and the subtag-polling fallback, and
/// governs the failure-detection thresholds used when switching.
/// </summary>
public sealed class AlarmFallbackOptions
{
/// <summary>
/// Selects the operating mode for the alarm-manager ↔ subtag fallback
/// mechanism. Accepted values (case-insensitive):
/// <list type="bullet">
/// <item><c>Auto</c> — use the alarm manager; switch to subtag polling
/// automatically when <see cref="ConsecutiveFailureThreshold"/> failures
/// are detected, and probe for failback.</item>
/// <item><c>ForceAlarmManager</c> — always use the alarm manager;
/// never fall back.</item>
/// <item><c>ForceSubtag</c> — always use subtag polling;
/// never try the alarm manager.</item>
/// </list>
/// Default is <c>Auto</c>.
/// </summary>
public string Mode { get; init; } = "Auto";
/// <summary>
/// Number of consecutive alarm-manager failures before the monitor
/// switches to subtag-polling fallback. Must be at least 1. Default 3.
/// </summary>
public int ConsecutiveFailureThreshold { get; init; } = 3;
/// <summary>
/// How often (in seconds) the monitor sends a probe to the alarm manager
/// while operating in subtag-polling fallback mode, to detect recovery.
/// Must be at least 1. Default 30.
/// </summary>
public int FailbackProbeIntervalSeconds { get; init; } = 30;
/// <summary>
/// Number of consecutive successful probes required before the monitor
/// considers the alarm manager recovered and switches back. Must be at
/// least 1. Default 3.
/// </summary>
public int FailbackStableProbes { get; init; } = 3;
/// <summary>
/// Controls how the monitor discovers the set of objects to poll when
/// operating in subtag-polling fallback mode.
/// </summary>
public AlarmDiscoveryOptions Discovery { get; init; } = new();
/// <summary>
/// Configures the subtag names the monitor reads when polling alarm state
/// in subtag-fallback mode.
/// </summary>
public AlarmSubtagNameOptions Subtags { get; init; } = new();
}
/// <summary>
/// Governs how the alarm monitor discovers objects to include in subtag-polling
/// fallback mode. Either the Galaxy Repository query (when
/// <see cref="UseGalaxyRepository"/> is <c>true</c>) or an explicit
/// <see cref="IncludeAttributes"/> list must be supplied when
/// <c>MxGateway:Alarms:Fallback:Mode</c> is <c>ForceSubtag</c>.
/// </summary>
public sealed class AlarmDiscoveryOptions
{
/// <summary>
/// When <c>true</c> the monitor queries the Galaxy Repository SQL database
/// to enumerate alarm objects for the configured area. Default <c>true</c>.
/// </summary>
public bool UseGalaxyRepository { get; init; } = true;
/// <summary>
/// Galaxy area to scope the Repository query to. When empty the monitor
/// falls back to <see cref="AlarmsOptions.DefaultArea"/>. Ignored when
/// <see cref="UseGalaxyRepository"/> is <c>false</c>.
/// </summary>
public string Area { get; init; } = string.Empty;
/// <summary>
/// Explicit list of MXAccess attribute paths to include in subtag polling,
/// supplementing (or replacing, when <see cref="UseGalaxyRepository"/> is
/// <c>false</c>) the Repository-derived list. Default empty.
/// </summary>
public string[] IncludeAttributes { get; init; } = Array.Empty<string>();
/// <summary>
/// Attribute paths to exclude from the Repository-derived poll list.
/// Ignored when <see cref="UseGalaxyRepository"/> is <c>false</c>.
/// Default empty.
/// </summary>
public string[] ExcludeAttributes { get; init; } = Array.Empty<string>();
}
/// <summary>
/// Configures the subtag names read by the alarm monitor when it is operating
/// in subtag-polling fallback mode. Names are matched against MXAccess item
/// handles; validation against the live MXAccess attribute list occurs at
/// runtime, not at startup.
/// Defaults are the confirmed AVEVA <c>AlarmExtension</c> primitive field names,
/// verified against the live ZB Galaxy <c>attribute_definition</c> rows.
/// </summary>
public sealed class AlarmSubtagNameOptions
{
/// <summary>
/// Subtag name for the in-alarm boolean. Confirmed AVEVA <c>AlarmExtension</c>
/// field name. Default <c>InAlarm</c>.
/// </summary>
public string Active { get; init; } = "InAlarm";
/// <summary>
/// Subtag name for the acknowledged boolean. Confirmed AVEVA <c>AlarmExtension</c>
/// field name. Default <c>Acked</c>.
/// </summary>
public string Acked { get; init; } = "Acked";
/// <summary>
/// Subtag name for the acknowledgement comment write target. Writing this subtag
/// performs the acknowledge in AVEVA. Confirmed AVEVA <c>AlarmExtension</c>
/// field name. When empty the ack-comment write path is disabled.
/// Default <c>AckMsg</c>.
/// </summary>
public string AckComment { get; init; } = "AckMsg";
/// <summary>
/// Subtag name for the alarm priority / severity. Confirmed AVEVA
/// <c>AlarmExtension</c> field name. Default <c>Priority</c>.
/// </summary>
public string Priority { get; init; } = "Priority";
}
@@ -45,4 +45,12 @@ public sealed class AlarmsOptions
/// the monitor floors it at 5 seconds. /// the monitor floors it at 5 seconds.
/// </summary> /// </summary>
public int ReconcileIntervalSeconds { get; init; } = 30; public int ReconcileIntervalSeconds { get; init; } = 30;
/// <summary>
/// Configuration for the alarm-manager ↔ subtag fallback mechanism:
/// operating mode, failure-detection thresholds, discovery, and subtag
/// names. Defaults (Mode = "Auto") preserve behaviour when the section is
/// omitted from configuration.
/// </summary>
public AlarmFallbackOptions Fallback { get; init; } = new();
} }
@@ -21,6 +21,17 @@ public sealed class DashboardOptions
/// </summary> /// </summary>
public bool RequireHttpsCookie { get; init; } = true; public bool RequireHttpsCookie { get; init; } = true;
/// <summary>
/// Dashboard auth cookie name. When null/blank (the default) the canonical
/// <see cref="ZB.MOM.WW.MxGateway.Server.Dashboard.DashboardAuthenticationDefaults.CookieName"/>
/// is used. Override it (<c>MxGateway:Dashboard:CookieName</c>) to give a distinct name to a
/// gateway that shares a hostname with another gateway instance — browser cookies are scoped
/// by host+path but NOT by port, so two instances on the same host would otherwise clobber
/// each other's dashboard session under a shared cookie name. Changing this signs out
/// existing dashboard sessions on next deploy.
/// </summary>
public string? CookieName { get; init; }
/// <summary>Gets the dashboard snapshot update interval in milliseconds.</summary> /// <summary>Gets the dashboard snapshot update interval in milliseconds.</summary>
public int SnapshotIntervalMilliseconds { get; init; } = 1_000; public int SnapshotIntervalMilliseconds { get; init; } = 1_000;
@@ -4,8 +4,8 @@ public sealed record EffectiveLdapConfiguration(
bool Enabled, bool Enabled,
string Server, string Server,
int Port, int Port,
bool UseTls, string Transport,
bool AllowInsecureLdap, bool AllowInsecure,
string SearchBase, string SearchBase,
string ServiceAccountDn, string ServiceAccountDn,
string ServiceAccountPassword, string ServiceAccountPassword,
@@ -23,8 +23,8 @@ public sealed class GatewayConfigurationProvider(IOptions<GatewayOptions> option
Enabled: value.Ldap.Enabled, Enabled: value.Ldap.Enabled,
Server: value.Ldap.Server, Server: value.Ldap.Server,
Port: value.Ldap.Port, Port: value.Ldap.Port,
UseTls: value.Ldap.UseTls, Transport: value.Ldap.Transport.ToString(),
AllowInsecureLdap: value.Ldap.AllowInsecureLdap, AllowInsecure: value.Ldap.AllowInsecure,
SearchBase: value.Ldap.SearchBase, SearchBase: value.Ldap.SearchBase,
ServiceAccountDn: value.Ldap.ServiceAccountDn, ServiceAccountDn: value.Ldap.ServiceAccountDn,
ServiceAccountPassword: RedactedValue, ServiceAccountPassword: RedactedValue,
@@ -1,4 +1,5 @@
using Microsoft.Extensions.Options; using Microsoft.Extensions.Configuration;
using ZB.MOM.WW.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Configuration; namespace ZB.MOM.WW.MxGateway.Server.Configuration;
@@ -6,15 +7,14 @@ public static class GatewayConfigurationServiceCollectionExtensions
{ {
/// <summary>Registers gateway configuration services in the dependency injection container.</summary> /// <summary>Registers gateway configuration services in the dependency injection container.</summary>
/// <param name="services">The service collection.</param> /// <param name="services">The service collection.</param>
/// <param name="configuration">The configuration to bind gateway options from.</param>
/// <returns>The service collection for chaining.</returns> /// <returns>The service collection for chaining.</returns>
public static IServiceCollection AddGatewayConfiguration(this IServiceCollection services) public static IServiceCollection AddGatewayConfiguration(
this IServiceCollection services, IConfiguration configuration)
{ {
services services.AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>(
.AddOptions<GatewayOptions>() configuration, GatewayOptions.SectionName);
.BindConfiguration(GatewayOptions.SectionName)
.ValidateOnStart();
services.AddSingleton<IValidateOptions<GatewayOptions>, GatewayOptionsValidator>();
services.AddSingleton<IGatewayConfigurationProvider, GatewayConfigurationProvider>(); services.AddSingleton<IGatewayConfigurationProvider, GatewayConfigurationProvider>();
return services; return services;
@@ -1,9 +1,10 @@
using Microsoft.Extensions.Options; using ZB.MOM.WW.Auth.Abstractions.Ldap;
using ZB.MOM.WW.Configuration;
using ZB.MOM.WW.MxGateway.Contracts; using ZB.MOM.WW.MxGateway.Contracts;
namespace ZB.MOM.WW.MxGateway.Server.Configuration; namespace ZB.MOM.WW.MxGateway.Server.Configuration;
public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions> public sealed class GatewayOptionsValidator : OptionsValidatorBase<GatewayOptions>
{ {
private const int MinimumMaxMessageBytes = 1024; private const int MinimumMaxMessageBytes = 1024;
private const int MaximumMaxMessageBytes = 256 * 1024 * 1024; private const int MaximumMaxMessageBytes = 256 * 1024 * 1024;
@@ -11,33 +12,26 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
/// <summary> /// <summary>
/// Validates gateway configuration options. /// Validates gateway configuration options.
/// </summary> /// </summary>
/// <param name="name">Options name.</param> /// <param name="builder">The accumulator to record failures on.</param>
/// <param name="options">Gateway options to validate.</param> /// <param name="options">Gateway options to validate.</param>
/// <returns>Validation result.</returns> protected override void Validate(ValidationBuilder builder, GatewayOptions options)
public ValidateOptionsResult Validate(string? name, GatewayOptions options)
{ {
List<string> failures = []; ValidateAuthentication(options.Authentication, builder);
ValidateLdap(options.Ldap, builder);
ValidateAuthentication(options.Authentication, failures); ValidateWorker(options.Worker, builder);
ValidateLdap(options.Ldap, failures); ValidateSessions(options.Sessions, builder);
ValidateWorker(options.Worker, failures); ValidateEvents(options.Events, builder);
ValidateSessions(options.Sessions, failures); ValidateDashboard(options.Dashboard, builder);
ValidateEvents(options.Events, failures); ValidateProtocol(options.Protocol, builder);
ValidateDashboard(options.Dashboard, failures); ValidateAlarms(options.Alarms, builder);
ValidateProtocol(options.Protocol, failures); ValidateTls(options.Tls, builder);
ValidateAlarms(options.Alarms, failures);
ValidateTls(options.Tls, failures);
return failures.Count == 0
? ValidateOptionsResult.Success
: ValidateOptionsResult.Fail(failures);
} }
private static void ValidateAuthentication(AuthenticationOptions options, List<string> failures) private static void ValidateAuthentication(AuthenticationOptions options, ValidationBuilder builder)
{ {
if (!Enum.IsDefined(options.Mode)) if (!Enum.IsDefined(options.Mode))
{ {
failures.Add("MxGateway:Authentication:Mode must be a supported authentication mode."); builder.Add("MxGateway:Authentication:Mode must be a supported authentication mode.");
return; return;
} }
@@ -46,67 +40,67 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
AddIfBlank( AddIfBlank(
options.SqlitePath, options.SqlitePath,
"MxGateway:Authentication:SqlitePath is required when API-key authentication is enabled.", "MxGateway:Authentication:SqlitePath is required when API-key authentication is enabled.",
failures); builder);
AddIfInvalidPath( AddIfInvalidPath(
options.SqlitePath, options.SqlitePath,
"MxGateway:Authentication:SqlitePath must be a valid filesystem path.", "MxGateway:Authentication:SqlitePath must be a valid filesystem path.",
failures); builder);
AddIfBlank( AddIfBlank(
options.PepperSecretName, options.PepperSecretName,
"MxGateway:Authentication:PepperSecretName is required when API-key authentication is enabled.", "MxGateway:Authentication:PepperSecretName is required when API-key authentication is enabled.",
failures); builder);
} }
} }
private static void ValidateLdap(LdapOptions options, List<string> failures) private static void ValidateLdap(LdapOptions options, ValidationBuilder builder)
{ {
if (!options.Enabled) if (!options.Enabled)
{ {
return; return;
} }
AddIfBlank(options.Server, "MxGateway:Ldap:Server is required when LDAP login is enabled.", failures); AddIfBlank(options.Server, "MxGateway:Ldap:Server is required when LDAP login is enabled.", builder);
AddIfBlank(options.SearchBase, "MxGateway:Ldap:SearchBase is required when LDAP login is enabled.", failures); AddIfBlank(options.SearchBase, "MxGateway:Ldap:SearchBase is required when LDAP login is enabled.", builder);
AddIfBlank( AddIfBlank(
options.ServiceAccountDn, options.ServiceAccountDn,
"MxGateway:Ldap:ServiceAccountDn is required when LDAP login is enabled.", "MxGateway:Ldap:ServiceAccountDn is required when LDAP login is enabled.",
failures); builder);
AddIfBlank( AddIfBlank(
options.ServiceAccountPassword, options.ServiceAccountPassword,
"MxGateway:Ldap:ServiceAccountPassword is required when LDAP login is enabled.", "MxGateway:Ldap:ServiceAccountPassword is required when LDAP login is enabled.",
failures); builder);
AddIfBlank( AddIfBlank(
options.UserNameAttribute, options.UserNameAttribute,
"MxGateway:Ldap:UserNameAttribute is required when LDAP login is enabled.", "MxGateway:Ldap:UserNameAttribute is required when LDAP login is enabled.",
failures); builder);
AddIfBlank( AddIfBlank(
options.DisplayNameAttribute, options.DisplayNameAttribute,
"MxGateway:Ldap:DisplayNameAttribute is required when LDAP login is enabled.", "MxGateway:Ldap:DisplayNameAttribute is required when LDAP login is enabled.",
failures); builder);
AddIfBlank( AddIfBlank(
options.GroupAttribute, options.GroupAttribute,
"MxGateway:Ldap:GroupAttribute is required when LDAP login is enabled.", "MxGateway:Ldap:GroupAttribute is required when LDAP login is enabled.",
failures); builder);
AddIfNotPositive(options.Port, "MxGateway:Ldap:Port must be greater than zero.", failures); builder.Port(options.Port, "MxGateway:Ldap:Port");
if (!options.UseTls && !options.AllowInsecureLdap) if (options.Transport == LdapTransport.None && !options.AllowInsecure)
{ {
failures.Add("MxGateway:Ldap:AllowInsecureLdap must be true when UseTls is false."); builder.Add("MxGateway:Ldap:AllowInsecure must be true when Transport is None (plaintext).");
} }
} }
private static void ValidateWorker(WorkerOptions options, List<string> failures) private static void ValidateWorker(WorkerOptions options, ValidationBuilder builder)
{ {
AddIfBlank(options.ExecutablePath, "MxGateway:Worker:ExecutablePath is required.", failures); AddIfBlank(options.ExecutablePath, "MxGateway:Worker:ExecutablePath is required.", builder);
AddIfInvalidPath( AddIfInvalidPath(
options.ExecutablePath, options.ExecutablePath,
"MxGateway:Worker:ExecutablePath must be a valid filesystem path.", "MxGateway:Worker:ExecutablePath must be a valid filesystem path.",
failures); builder);
if (!string.IsNullOrWhiteSpace(options.ExecutablePath) if (!string.IsNullOrWhiteSpace(options.ExecutablePath)
&& !string.Equals(Path.GetExtension(options.ExecutablePath), ".exe", StringComparison.OrdinalIgnoreCase)) && !string.Equals(Path.GetExtension(options.ExecutablePath), ".exe", StringComparison.OrdinalIgnoreCase))
{ {
failures.Add("MxGateway:Worker:ExecutablePath must point to a .exe file."); builder.Add("MxGateway:Worker:ExecutablePath must point to a .exe file.");
} }
if (!string.IsNullOrWhiteSpace(options.WorkingDirectory)) if (!string.IsNullOrWhiteSpace(options.WorkingDirectory))
@@ -114,94 +108,94 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
AddIfInvalidPath( AddIfInvalidPath(
options.WorkingDirectory, options.WorkingDirectory,
"MxGateway:Worker:WorkingDirectory must be a valid filesystem path.", "MxGateway:Worker:WorkingDirectory must be a valid filesystem path.",
failures); builder);
} }
if (!Enum.IsDefined(options.RequiredArchitecture)) if (!Enum.IsDefined(options.RequiredArchitecture))
{ {
failures.Add("MxGateway:Worker:RequiredArchitecture must be a supported worker architecture."); builder.Add("MxGateway:Worker:RequiredArchitecture must be a supported worker architecture.");
} }
AddIfNotPositive( AddIfNotPositive(
options.StartupTimeoutSeconds, options.StartupTimeoutSeconds,
"MxGateway:Worker:StartupTimeoutSeconds must be greater than zero.", "MxGateway:Worker:StartupTimeoutSeconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.StartupProbeRetryAttempts, options.StartupProbeRetryAttempts,
"MxGateway:Worker:StartupProbeRetryAttempts must be greater than zero.", "MxGateway:Worker:StartupProbeRetryAttempts must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.StartupProbeRetryDelayMilliseconds, options.StartupProbeRetryDelayMilliseconds,
"MxGateway:Worker:StartupProbeRetryDelayMilliseconds must be greater than zero.", "MxGateway:Worker:StartupProbeRetryDelayMilliseconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.PipeConnectAttemptTimeoutMilliseconds, options.PipeConnectAttemptTimeoutMilliseconds,
"MxGateway:Worker:PipeConnectAttemptTimeoutMilliseconds must be greater than zero.", "MxGateway:Worker:PipeConnectAttemptTimeoutMilliseconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.ShutdownTimeoutSeconds, options.ShutdownTimeoutSeconds,
"MxGateway:Worker:ShutdownTimeoutSeconds must be greater than zero.", "MxGateway:Worker:ShutdownTimeoutSeconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.HeartbeatIntervalSeconds, options.HeartbeatIntervalSeconds,
"MxGateway:Worker:HeartbeatIntervalSeconds must be greater than zero.", "MxGateway:Worker:HeartbeatIntervalSeconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.HeartbeatGraceSeconds, options.HeartbeatGraceSeconds,
"MxGateway:Worker:HeartbeatGraceSeconds must be greater than zero.", "MxGateway:Worker:HeartbeatGraceSeconds must be greater than zero.",
failures); builder);
if (options.HeartbeatGraceSeconds < options.HeartbeatIntervalSeconds) if (options.HeartbeatGraceSeconds < options.HeartbeatIntervalSeconds)
{ {
failures.Add( builder.Add(
"MxGateway:Worker:HeartbeatGraceSeconds must be greater than or equal to HeartbeatIntervalSeconds."); "MxGateway:Worker:HeartbeatGraceSeconds must be greater than or equal to HeartbeatIntervalSeconds.");
} }
if (options.MaxMessageBytes is < MinimumMaxMessageBytes or > MaximumMaxMessageBytes) if (options.MaxMessageBytes is < MinimumMaxMessageBytes or > MaximumMaxMessageBytes)
{ {
failures.Add( builder.Add(
$"MxGateway:Worker:MaxMessageBytes must be between {MinimumMaxMessageBytes} and {MaximumMaxMessageBytes}."); $"MxGateway:Worker:MaxMessageBytes must be between {MinimumMaxMessageBytes} and {MaximumMaxMessageBytes}.");
} }
} }
private static void ValidateSessions(SessionOptions options, List<string> failures) private static void ValidateSessions(SessionOptions options, ValidationBuilder builder)
{ {
AddIfNotPositive( AddIfNotPositive(
options.DefaultCommandTimeoutSeconds, options.DefaultCommandTimeoutSeconds,
"MxGateway:Sessions:DefaultCommandTimeoutSeconds must be greater than zero.", "MxGateway:Sessions:DefaultCommandTimeoutSeconds must be greater than zero.",
failures); builder);
AddIfNotPositive(options.MaxSessions, "MxGateway:Sessions:MaxSessions must be greater than zero.", failures); AddIfNotPositive(options.MaxSessions, "MxGateway:Sessions:MaxSessions must be greater than zero.", builder);
AddIfNotPositive( AddIfNotPositive(
options.MaxPendingCommandsPerSession, options.MaxPendingCommandsPerSession,
"MxGateway:Sessions:MaxPendingCommandsPerSession must be greater than zero.", "MxGateway:Sessions:MaxPendingCommandsPerSession must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.DefaultLeaseSeconds, options.DefaultLeaseSeconds,
"MxGateway:Sessions:DefaultLeaseSeconds must be greater than zero.", "MxGateway:Sessions:DefaultLeaseSeconds must be greater than zero.",
failures); builder);
AddIfNotPositive( AddIfNotPositive(
options.LeaseSweepIntervalSeconds, options.LeaseSweepIntervalSeconds,
"MxGateway:Sessions:LeaseSweepIntervalSeconds must be greater than zero.", "MxGateway:Sessions:LeaseSweepIntervalSeconds must be greater than zero.",
failures); builder);
if (options.AllowMultipleEventSubscribers) if (options.AllowMultipleEventSubscribers)
{ {
failures.Add( builder.Add(
"MxGateway:Sessions:AllowMultipleEventSubscribers is not supported until event fan-out is implemented."); "MxGateway:Sessions:AllowMultipleEventSubscribers is not supported until event fan-out is implemented.");
} }
} }
private static void ValidateEvents(EventOptions options, List<string> failures) private static void ValidateEvents(EventOptions options, ValidationBuilder builder)
{ {
AddIfNotPositive(options.QueueCapacity, "MxGateway:Events:QueueCapacity must be greater than zero.", failures); AddIfNotPositive(options.QueueCapacity, "MxGateway:Events:QueueCapacity must be greater than zero.", builder);
if (!Enum.IsDefined(options.BackpressurePolicy)) if (!Enum.IsDefined(options.BackpressurePolicy))
{ {
failures.Add("MxGateway:Events:BackpressurePolicy must be a supported backpressure policy."); builder.Add("MxGateway:Events:BackpressurePolicy must be a supported backpressure policy.");
} }
} }
private static void ValidateDashboard(DashboardOptions options, List<string> failures) private static void ValidateDashboard(DashboardOptions options, ValidationBuilder builder)
{ {
// GroupToRole shape is validated even when the dashboard is disabled so // GroupToRole shape is validated even when the dashboard is disabled so
// misconfiguration surfaces at startup; emptiness is allowed, with the // misconfiguration surfaces at startup; emptiness is allowed, with the
@@ -212,13 +206,13 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
{ {
if (string.IsNullOrWhiteSpace(entry.Key)) if (string.IsNullOrWhiteSpace(entry.Key))
{ {
failures.Add("MxGateway:Dashboard:GroupToRole keys (LDAP group names) must be non-blank."); builder.Add("MxGateway:Dashboard:GroupToRole keys (LDAP group names) must be non-blank.");
} }
if (!string.Equals(entry.Value, Dashboard.DashboardRoles.Admin, StringComparison.Ordinal) if (!string.Equals(entry.Value, Dashboard.DashboardRoles.Admin, StringComparison.Ordinal)
&& !string.Equals(entry.Value, Dashboard.DashboardRoles.Viewer, StringComparison.Ordinal)) && !string.Equals(entry.Value, Dashboard.DashboardRoles.Viewer, StringComparison.Ordinal))
{ {
failures.Add( builder.Add(
$"MxGateway:Dashboard:GroupToRole['{entry.Key}'] must be '{Dashboard.DashboardRoles.Admin}' or '{Dashboard.DashboardRoles.Viewer}'."); $"MxGateway:Dashboard:GroupToRole['{entry.Key}'] must be '{Dashboard.DashboardRoles.Admin}' or '{Dashboard.DashboardRoles.Viewer}'.");
} }
} }
@@ -226,18 +220,20 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
AddIfNotPositive( AddIfNotPositive(
options.SnapshotIntervalMilliseconds, options.SnapshotIntervalMilliseconds,
"MxGateway:Dashboard:SnapshotIntervalMilliseconds must be greater than zero.", "MxGateway:Dashboard:SnapshotIntervalMilliseconds must be greater than zero.",
failures); builder);
AddIfNegative( AddIfNegative(
options.RecentFaultLimit, options.RecentFaultLimit,
"MxGateway:Dashboard:RecentFaultLimit must be greater than or equal to zero.", "MxGateway:Dashboard:RecentFaultLimit must be greater than or equal to zero.",
failures); builder);
AddIfNegative( AddIfNegative(
options.RecentSessionLimit, options.RecentSessionLimit,
"MxGateway:Dashboard:RecentSessionLimit must be greater than or equal to zero.", "MxGateway:Dashboard:RecentSessionLimit must be greater than or equal to zero.",
failures); builder);
} }
private static void ValidateAlarms(AlarmsOptions options, List<string> failures) private static readonly string[] ValidAlarmFallbackModes = ["Auto", "ForceAlarmManager", "ForceSubtag"];
private static void ValidateAlarms(AlarmsOptions options, ValidationBuilder builder)
{ {
if (!options.Enabled) if (!options.Enabled)
{ {
@@ -251,26 +247,66 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
if (string.IsNullOrWhiteSpace(options.SubscriptionExpression) if (string.IsNullOrWhiteSpace(options.SubscriptionExpression)
&& string.IsNullOrWhiteSpace(options.DefaultArea)) && string.IsNullOrWhiteSpace(options.DefaultArea))
{ {
failures.Add( builder.Add(
"MxGateway:Alarms requires either a non-blank SubscriptionExpression or a non-blank DefaultArea when Enabled is true."); "MxGateway:Alarms requires either a non-blank SubscriptionExpression or a non-blank DefaultArea when Enabled is true.");
} }
if (!string.IsNullOrWhiteSpace(options.SubscriptionExpression) if (!string.IsNullOrWhiteSpace(options.SubscriptionExpression)
&& !options.SubscriptionExpression.StartsWith(@"\\", StringComparison.Ordinal)) && !options.SubscriptionExpression.StartsWith(@"\\", StringComparison.Ordinal))
{ {
failures.Add( builder.Add(
@"MxGateway:Alarms:SubscriptionExpression must start with '\\' (canonical \\<host>\Galaxy!<area> shape)."); @"MxGateway:Alarms:SubscriptionExpression must start with '\\' (canonical \\<host>\Galaxy!<area> shape).");
} }
ValidateAlarmFallback(options.Fallback, builder);
}
private static void ValidateAlarmFallback(AlarmFallbackOptions fallback, ValidationBuilder builder)
{
// Validate Mode is one of the recognised values (case-insensitive).
bool modeValid = Array.Exists(
ValidAlarmFallbackModes,
m => string.Equals(m, fallback.Mode, StringComparison.OrdinalIgnoreCase));
if (!modeValid)
{
builder.Add(
$"MxGateway:Alarms:Fallback:Mode must be one of: {string.Join(", ", ValidAlarmFallbackModes)} (was '{fallback.Mode}').");
}
// ForceSubtag requires either Galaxy Repository discovery or an explicit IncludeAttributes list.
if (modeValid
&& string.Equals(fallback.Mode, "ForceSubtag", StringComparison.OrdinalIgnoreCase)
&& !fallback.Discovery.UseGalaxyRepository
&& fallback.Discovery.IncludeAttributes.Length == 0)
{
builder.Add(
"MxGateway:Alarms:Fallback ForceSubtag requires Galaxy Repository discovery or a non-empty Discovery:IncludeAttributes list.");
}
// Floor validation: numeric thresholds must be at least 1.
AddIfNotPositive(
fallback.ConsecutiveFailureThreshold,
"MxGateway:Alarms:Fallback:ConsecutiveFailureThreshold must be greater than zero.",
builder);
AddIfNotPositive(
fallback.FailbackProbeIntervalSeconds,
"MxGateway:Alarms:Fallback:FailbackProbeIntervalSeconds must be greater than zero.",
builder);
AddIfNotPositive(
fallback.FailbackStableProbes,
"MxGateway:Alarms:Fallback:FailbackStableProbes must be greater than zero.",
builder);
} }
private const int MinimumCertValidityYears = 1; private const int MinimumCertValidityYears = 1;
private const int MaximumCertValidityYears = 100; private const int MaximumCertValidityYears = 100;
private static void ValidateTls(TlsOptions options, List<string> failures) private static void ValidateTls(TlsOptions options, ValidationBuilder builder)
{ {
if (options.ValidityYears is < MinimumCertValidityYears or > MaximumCertValidityYears) if (options.ValidityYears is < MinimumCertValidityYears or > MaximumCertValidityYears)
{ {
failures.Add( builder.Add(
$"MxGateway:Tls:ValidityYears must be between {MinimumCertValidityYears} and {MaximumCertValidityYears}."); $"MxGateway:Tls:ValidityYears must be between {MinimumCertValidityYears} and {MaximumCertValidityYears}.");
} }
@@ -278,61 +314,52 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
AddIfBlank( AddIfBlank(
options.SelfSignedCertPath, options.SelfSignedCertPath,
"MxGateway:Tls:SelfSignedCertPath must not be blank.", "MxGateway:Tls:SelfSignedCertPath must not be blank.",
failures); builder);
AddIfInvalidPath( AddIfInvalidPath(
options.SelfSignedCertPath, options.SelfSignedCertPath,
"MxGateway:Tls:SelfSignedCertPath must be a valid filesystem path.", "MxGateway:Tls:SelfSignedCertPath must be a valid filesystem path.",
failures); builder);
foreach (string dns in options.AdditionalDnsNames) foreach (string dns in options.AdditionalDnsNames)
{ {
if (string.IsNullOrWhiteSpace(dns)) if (string.IsNullOrWhiteSpace(dns))
{ {
failures.Add("MxGateway:Tls:AdditionalDnsNames entries must be non-blank."); builder.Add("MxGateway:Tls:AdditionalDnsNames entries must be non-blank.");
} }
} }
} }
private static void ValidateProtocol(ProtocolOptions options, List<string> failures) private static void ValidateProtocol(ProtocolOptions options, ValidationBuilder builder)
{ {
if (options.WorkerProtocolVersion != GatewayContractInfo.WorkerProtocolVersion) if (options.WorkerProtocolVersion != GatewayContractInfo.WorkerProtocolVersion)
{ {
failures.Add( builder.Add(
$"MxGateway:Protocol:WorkerProtocolVersion must be {GatewayContractInfo.WorkerProtocolVersion}."); $"MxGateway:Protocol:WorkerProtocolVersion must be {GatewayContractInfo.WorkerProtocolVersion}.");
} }
if (options.MaxGrpcMessageBytes is < MinimumMaxMessageBytes or > MaximumMaxMessageBytes) if (options.MaxGrpcMessageBytes is < MinimumMaxMessageBytes or > MaximumMaxMessageBytes)
{ {
failures.Add( builder.Add(
$"MxGateway:Protocol:MaxGrpcMessageBytes must be between {MinimumMaxMessageBytes} and {MaximumMaxMessageBytes}."); $"MxGateway:Protocol:MaxGrpcMessageBytes must be between {MinimumMaxMessageBytes} and {MaximumMaxMessageBytes}.");
} }
} }
private static void AddIfBlank(string? value, string message, List<string> failures) private static void AddIfBlank(string? value, string message, ValidationBuilder builder)
{ {
if (string.IsNullOrWhiteSpace(value)) builder.RequireThat(!string.IsNullOrWhiteSpace(value), message);
{
failures.Add(message);
}
} }
private static void AddIfNotPositive(int value, string message, List<string> failures) private static void AddIfNotPositive(int value, string message, ValidationBuilder builder)
{ {
if (value <= 0) builder.RequireThat(value > 0, message);
{
failures.Add(message);
}
} }
private static void AddIfNegative(int value, string message, List<string> failures) private static void AddIfNegative(int value, string message, ValidationBuilder builder)
{ {
if (value < 0) builder.RequireThat(value >= 0, message);
{
failures.Add(message);
}
} }
private static void AddIfInvalidPath(string? value, string message, List<string> failures) private static void AddIfInvalidPath(string? value, string message, ValidationBuilder builder)
{ {
if (string.IsNullOrWhiteSpace(value)) if (string.IsNullOrWhiteSpace(value))
{ {
@@ -345,15 +372,19 @@ public sealed class GatewayOptionsValidator : IValidateOptions<GatewayOptions>
} }
catch (ArgumentException) catch (ArgumentException)
{ {
failures.Add(message); builder.Add(message);
} }
catch (NotSupportedException) catch (NotSupportedException)
{ {
failures.Add(message); builder.Add(message);
} }
catch (PathTooLongException) catch (PathTooLongException)
{ {
failures.Add(message); builder.Add(message);
}
catch (IOException)
{
builder.Add(message);
} }
} }
} }
@@ -1,5 +1,32 @@
using ZB.MOM.WW.Auth.Abstractions.Ldap;
namespace ZB.MOM.WW.MxGateway.Server.Configuration; namespace ZB.MOM.WW.MxGateway.Server.Configuration;
/// <summary>
/// Gateway-side view of the <c>MxGateway:Ldap</c> section. This is a SHADOW of the
/// shared <see cref="ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions"/> type and is NOT
/// used to perform LDAP authentication at runtime — runtime bind/search is done by the
/// shared <c>ZB.MOM.WW.Auth.Ldap</c> provider, whose options are bound directly from the
/// same <c>MxGateway:Ldap</c> section by <c>AddZbLdapAuth</c> (see
/// <see cref="ZB.MOM.WW.MxGateway.Server.Dashboard.DashboardServiceCollectionExtensions"/>).
/// <para>
/// This shadow exists for three things only: (1) startup validation via
/// <see cref="GatewayOptionsValidator"/>; (2) the redacted effective-config display
/// (<see cref="EffectiveLdapConfiguration"/> / <see cref="GatewayConfigurationProvider"/>);
/// and (3) it is the single home of the gateway's dev/default LDAP values, which the
/// integration live-test helper copies onto the shared options.
/// </para>
/// <para>
/// Review C2 — DRIFT WARNING: this class MUST stay field-compatible with the shared
/// <see cref="ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions"/> so the one config section
/// binds cleanly onto both. The two are intentionally NOT merged because their defaults
/// differ on purpose: this shadow ships dev-friendly defaults (plaintext localhost,
/// <c>AllowInsecure=true</c>, populated <c>SearchBase</c>/<c>ServiceAccount*</c>), whereas
/// the shared type is secure-by-default (<c>Transport=Ldaps</c>, <c>AllowInsecure=false</c>,
/// empty DN fields). If you add/rename/remove a field on the shared type, mirror it here
/// (and in the validator + effective-config) so the section keeps binding to both.
/// </para>
/// </summary>
public sealed class LdapOptions public sealed class LdapOptions
{ {
/// <summary>Gets a value indicating whether LDAP authentication is enabled.</summary> /// <summary>Gets a value indicating whether LDAP authentication is enabled.</summary>
@@ -11,17 +38,24 @@ public sealed class LdapOptions
/// <summary>Gets the LDAP server port.</summary> /// <summary>Gets the LDAP server port.</summary>
public int Port { get; init; } = 3893; public int Port { get; init; } = 3893;
/// <summary>Gets a value indicating whether TLS is required for the connection.</summary> /// <summary>
public bool UseTls { get; init; } /// Gets the transport/TLS mode for the LDAP connection. Replaces the former
/// boolean <c>UseTls</c> (true ≈ <see cref="LdapTransport.Ldaps"/>, false =
/// <see cref="LdapTransport.None"/>). <see cref="LdapTransport.StartTls"/> upgrades
/// a plaintext connection to TLS. Matches the shared
/// <see cref="ZB.MOM.WW.Auth.Abstractions.Ldap.LdapOptions.Transport"/> field so the
/// <c>MxGateway:Ldap</c> section binds straight onto the shared options.
/// </summary>
public LdapTransport Transport { get; init; } = LdapTransport.None;
/// <summary>Gets a value indicating whether insecure LDAP connections are allowed.</summary> /// <summary>Gets a value indicating whether insecure (plaintext) LDAP connections are allowed.</summary>
public bool AllowInsecureLdap { get; init; } = true; public bool AllowInsecure { get; init; } = true;
/// <summary>Gets the LDAP search base distinguished name.</summary> /// <summary>Gets the LDAP search base distinguished name.</summary>
public string SearchBase { get; init; } = "dc=lmxopcua,dc=local"; public string SearchBase { get; init; } = "dc=zb,dc=local";
/// <summary>Gets the service account distinguished name.</summary> /// <summary>Gets the service account distinguished name.</summary>
public string ServiceAccountDn { get; init; } = "cn=serviceaccount,dc=lmxopcua,dc=local"; public string ServiceAccountDn { get; init; } = "cn=serviceaccount,dc=zb,dc=local";
/// <summary>Gets the service account password.</summary> /// <summary>Gets the service account password.</summary>
public string ServiceAccountPassword { get; init; } = "serviceaccount123"; public string ServiceAccountPassword { get; init; } = "serviceaccount123";
@@ -5,14 +5,14 @@
<meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1" />
<base href="/" /> <base href="/" />
<link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" /> <link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" />
<link rel="stylesheet" href="/css/theme.css" /> <ThemeHead />
<link rel="stylesheet" href="/css/site.css" /> <link rel="stylesheet" href="/css/site.css" />
<HeadOutlet @rendermode="InteractiveServer" /> <HeadOutlet @rendermode="InteractiveServer" />
</head> </head>
<body class="dashboard-body"> <body class="dashboard-body">
<Routes @rendermode="InteractiveServer" /> <Routes @rendermode="InteractiveServer" />
<script src="/lib/bootstrap/js/bootstrap.bundle.min.js"></script> <script src="/lib/bootstrap/js/bootstrap.bundle.min.js"></script>
<script src="/js/nav-state.js"></script> <ThemeScripts />
<script src="/_framework/blazor.web.js"></script> <script src="/_framework/blazor.web.js"></script>
</body> </body>
</html> </html>
@@ -0,0 +1,6 @@
@inherits LayoutComponentBase
@* Minimal layout for the login page: no side rail, no brand block. The page
renders its own centred card via the shared kit's <LoginCard>. Mirrors
OtOpcUa AdminUI's LoginLayout. *@
@Body
@@ -1,210 +1,40 @@
@using System.Linq
@using Microsoft.AspNetCore.Components.Routing
@using Microsoft.JSInterop
@implements IDisposable
@inherits LayoutComponentBase @inherits LayoutComponentBase
@inject NavigationManager Navigation
@inject IJSRuntime JS
<div class="d-flex flex-column flex-lg-row" style="min-height: 100vh;"> @* Thin layout: delegates the side-rail chassis (hamburger, brand, responsive
@* Hamburger toggle: visible only on viewports <lg. Bootstrap collapse JS collapse) to the shared ZB.MOM.WW.Theme <ThemeShell>. The nav is reproduced
lives in bootstrap.bundle.min.js (loaded in App.razor). *@ with the kit's NavRailSection / NavRailItem; section expand-state persistence
<button class="btn btn-outline-secondary btn-sm d-lg-none m-2 align-self-start" is owned by the kit's <details> + ThemeScripts (no JS interop here). *@
type="button" <ThemeShell Product="MXAccess Gateway" Accent="#2f5fd0">
data-bs-toggle="collapse" <Nav>
data-bs-target="#sidebar-collapse" <NavRailItem Href="/" Text="Dashboard" Match="NavLinkMatch.All" />
aria-controls="sidebar-collapse" <NavRailSection Title="Runtime" Key="runtime">
aria-expanded="false" <NavRailItem Href="/sessions" Text="Sessions" />
aria-label="Toggle navigation"> <NavRailItem Href="/workers" Text="Workers" />
&#9776; <NavRailItem Href="/events" Text="Events" />
</button> <NavRailItem Href="/alarms" Text="Alarms" />
</NavRailSection>
<div class="collapse d-lg-block" id="sidebar-collapse"> <NavRailSection Title="Galaxy" Key="galaxy">
<nav class="sidebar d-flex flex-column"> <NavRailItem Href="/galaxy" Text="Repository" />
<a class="brand" href="/"><span class="mark">&#9646;</span> MXAccess Gateway</a> <NavRailItem Href="/browse" Text="Browse" />
</NavRailSection>
<div style="overflow-y:auto; flex:1 1 auto; min-height:0;"> <NavRailSection Title="Admin" Key="admin">
<ul class="nav flex-column"> <NavRailItem Href="/apikeys" Text="API Keys" />
<li class="nav-item"> <NavRailItem Href="/settings" Text="Settings" />
<NavLink class="nav-link" href="/" Match="NavLinkMatch.All">Dashboard</NavLink> </NavRailSection>
</li> </Nav>
<RailFooter>
<NavSection Title="Runtime" <AuthorizeView>
Expanded="@_expanded.Contains("runtime")" <Authorized Context="authState">
OnToggle="@(() => ToggleAsync("runtime"))"> <span class="rail-user">@authState.User.Identity?.Name</span>
<li class="nav-item"> <form method="post" action="/logout" data-enhance="false">
<NavLink class="nav-link" href="/sessions" Match="NavLinkMatch.Prefix">Sessions</NavLink> <AntiforgeryToken />
</li> <button class="rail-btn" type="submit">Sign Out</button>
<li class="nav-item"> </form>
<NavLink class="nav-link" href="/workers" Match="NavLinkMatch.Prefix">Workers</NavLink> </Authorized>
</li> <NotAuthorized>
<li class="nav-item"> <a class="rail-btn" href="/login">Sign In</a>
<NavLink class="nav-link" href="/events" Match="NavLinkMatch.Prefix">Events</NavLink> </NotAuthorized>
</li> </AuthorizeView>
<li class="nav-item"> </RailFooter>
<NavLink class="nav-link" href="/alarms" Match="NavLinkMatch.Prefix">Alarms</NavLink> <ChildContent>@Body</ChildContent>
</li> </ThemeShell>
</NavSection>
<NavSection Title="Galaxy"
Expanded="@_expanded.Contains("galaxy")"
OnToggle="@(() => ToggleAsync("galaxy"))">
<li class="nav-item">
<NavLink class="nav-link" href="/galaxy" Match="NavLinkMatch.Prefix">Repository</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="/browse" Match="NavLinkMatch.Prefix">Browse</NavLink>
</li>
</NavSection>
<NavSection Title="Admin"
Expanded="@_expanded.Contains("admin")"
OnToggle="@(() => ToggleAsync("admin"))">
<li class="nav-item">
<NavLink class="nav-link" href="/apikeys" Match="NavLinkMatch.Prefix">API Keys</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="/settings" Match="NavLinkMatch.Prefix">Settings</NavLink>
</li>
</NavSection>
</ul>
</div>
<AuthorizeView>
<Authorized Context="authState">
<div class="border-top px-3 py-2">
<div class="d-flex justify-content-between align-items-center">
<span class="text-body-secondary small">@authState.User.Identity?.Name</span>
<form method="post" action="/logout" data-enhance="false">
<AntiforgeryToken />
<button type="submit" class="btn btn-outline-secondary btn-sm py-0 px-2">Sign Out</button>
</form>
</div>
</div>
</Authorized>
<NotAuthorized>
<div class="border-top px-3 py-2">
<a href="/login" class="btn btn-outline-secondary btn-sm py-0 px-2 w-100">Sign In</a>
</div>
</NotAuthorized>
</AuthorizeView>
</nav>
</div>
<main class="page flex-grow-1">
@Body
</main>
</div>
@code {
// Sections whose collapsed/expanded state we persist. Acts as the allow-list
// when parsing the cookie so stale or attacker-supplied ids are ignored.
private static readonly string[] SectionIds = { "runtime", "galaxy", "admin" };
// The currently-expanded sections. Populated from the cookie on first
// render; mutated by ToggleAsync and by navigating into a section.
private readonly HashSet<string> _expanded = new(StringComparer.Ordinal);
protected override void OnInitialized()
{
Navigation.LocationChanged += OnLocationChanged;
}
protected override async Task OnAfterRenderAsync(bool firstRender)
{
if (!firstRender)
{
return;
}
// Hydrate from the cookie. Until this completes the sidebar paints
// collapsed, matching the CentralUI behaviour.
string saved;
try
{
saved = await JS.InvokeAsync<string>("navState.get") ?? string.Empty;
}
catch (JSDisconnectedException)
{
return;
}
foreach (var id in saved.Split(
',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
{
if (Array.IndexOf(SectionIds, id) >= 0)
{
_expanded.Add(id);
}
}
// The section of the page we loaded on is always expanded.
if (EnsureCurrentSectionExpanded())
{
await PersistAsync();
}
StateHasChanged();
}
private void OnLocationChanged(object? sender, LocationChangedEventArgs e)
{
if (EnsureCurrentSectionExpanded())
{
_ = PersistAsync();
_ = InvokeAsync(StateHasChanged);
}
}
private async Task ToggleAsync(string id)
{
if (!_expanded.Remove(id))
{
_expanded.Add(id);
}
await PersistAsync();
}
// Adds the current page's section to _expanded; returns true if it changed.
private bool EnsureCurrentSectionExpanded()
{
var section = CurrentSection();
return section is not null && _expanded.Add(section);
}
// Maps the current URL's first path segment to a section id, or null for
// sectionless pages (Dashboard, Login).
private string? CurrentSection()
{
var relative = Navigation.ToBaseRelativePath(Navigation.Uri);
var firstSegment = relative.Split('?', '#')[0]
.Split('/', StringSplitOptions.RemoveEmptyEntries)
.FirstOrDefault();
return firstSegment switch
{
"sessions" or "workers" or "events" or "alarms" => "runtime",
"galaxy" or "browse" => "galaxy",
"apikeys" or "settings" => "admin",
_ => null,
};
}
private async Task PersistAsync()
{
try
{
await JS.InvokeVoidAsync("navState.set", string.Join(',', _expanded));
}
catch (JSDisconnectedException)
{
// The circuit is gone — nothing to persist to.
}
}
public void Dispose()
{
Navigation.LocationChanged -= OnLocationChanged;
}
}
@@ -1,35 +0,0 @@
@* A collapsible sidebar nav section. The header is a full-width button that
toggles ChildContent visibility. Pattern lifted from ScadaLink CentralUI
(Components/Layout/NavSection.razor) — see [[project-deployed-service]]. *@
<li class="nav-item">
<button type="button"
class="nav-section-toggle"
@onclick="OnToggle"
aria-expanded="@(Expanded ? "true" : "false")">
<span class="chevron" aria-hidden="true">@(Expanded ? "▾" : "▸")</span>
<span>@Title</span>
</button>
</li>
@if (Expanded)
{
@ChildContent
}
@code {
/// <summary>Section label shown in the header (e.g. "Runtime").</summary>
[Parameter, EditorRequired]
public string Title { get; set; } = string.Empty;
/// <summary>Whether the section is expanded — its items rendered.</summary>
[Parameter]
public bool Expanded { get; set; }
/// <summary>Raised when the header button is clicked.</summary>
[Parameter]
public EventCallback OnToggle { get; set; }
/// <summary>The section's nav items, rendered only while expanded.</summary>
[Parameter]
public RenderFragment? ChildContent { get; set; }
}
@@ -1,7 +1,10 @@
@page "/alarms" @page "/alarms"
@implements IAsyncDisposable @implements IAsyncDisposable
@using Microsoft.AspNetCore.SignalR.Client
@using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs
@inject IDashboardLiveDataService LiveData @inject IDashboardLiveDataService LiveData
@inject IOptions<GatewayOptions> GatewayOptions @inject IOptions<GatewayOptions> GatewayOptions
@inject DashboardHubConnectionFactory HubFactory
<PageTitle>Dashboard Alarms</PageTitle> <PageTitle>Dashboard Alarms</PageTitle>
@@ -10,6 +13,12 @@
<h1>Alarms</h1> <h1>Alarms</h1>
<div class="text-secondary">@HeaderLine()</div> <div class="text-secondary">@HeaderLine()</div>
</div> </div>
<div class="d-flex align-items-center gap-2">
<span class="badge @_providerStatus.BadgeCssClass"
title="@ProviderStatusTitle()">
@_providerStatus.Label
</span>
</div>
</div> </div>
@if (!GatewayOptions.Value.Alarms.Enabled) @if (!GatewayOptions.Value.Alarms.Enabled)
@@ -163,10 +172,44 @@
private readonly CancellationTokenSource _cts = new(); private readonly CancellationTokenSource _cts = new();
private Task? _pollTask; private Task? _pollTask;
private DashboardAlarmProviderStatus _providerStatus = DashboardAlarmProviderStatus.Healthy;
private HubConnection? _alarmsHub;
/// <inheritdoc /> /// <inheritdoc />
protected override void OnInitialized() protected override void OnInitialized()
{ {
_pollTask = PollLoopAsync(); _pollTask = PollLoopAsync();
_ = AttachAlarmsHubAsync();
}
private string? ProviderStatusTitle()
{
return _providerStatus.IsDegraded && !string.IsNullOrWhiteSpace(_providerStatus.Reason)
? _providerStatus.Reason
: null;
}
private async Task AttachAlarmsHubAsync()
{
_alarmsHub = HubFactory.Create("/hubs/alarms");
_alarmsHub.On<AlarmFeedMessage>(AlarmsHub.AlarmMessage, async message =>
{
if (message.PayloadCase == AlarmFeedMessage.PayloadOneofCase.ProviderStatus)
{
_providerStatus = DashboardAlarmProviderStatus.FromFeed(message);
await InvokeAsync(StateHasChanged).ConfigureAwait(false);
}
});
try
{
await _alarmsHub.StartAsync(_cts.Token).ConfigureAwait(false);
}
catch
{
// The badge is best-effort; it stays at the healthy default until
// the hub reconnects and delivers a fresh provider-status message.
}
} }
private string HeaderLine() private string HeaderLine()
@@ -268,6 +311,19 @@
public async ValueTask DisposeAsync() public async ValueTask DisposeAsync()
{ {
await _cts.CancelAsync(); await _cts.CancelAsync();
if (_alarmsHub is not null)
{
try
{
await _alarmsHub.DisposeAsync();
}
catch
{
// Disposal-time errors are best-effort.
}
}
if (_pollTask is not null) if (_pollTask is not null)
{ {
try try
@@ -0,0 +1,33 @@
@page "/login"
@layout LoginLayout
@using Microsoft.AspNetCore.Authorization
@* Login MUST stay anonymously reachable — [AllowAnonymous] overrides the
RequireAuthorization(ViewerPolicy) that MapRazorComponents<App>() applies, so the
cookie scheme's LoginPath="/login" redirect lands here for unauthenticated users.
The card is the shared kit's <LoginCard>: it renders a NATIVE static
<form method="post" action="/auth/login"> (username/password + hidden returnUrl). A native
form submit is not a Blazor event, so it reaches the minimal-API POST /auth/login endpoint
regardless of this app's InteractiveServer render mode. <AntiforgeryToken/> supplies the
token that PostLoginAsync's antiforgery.ValidateRequestAsync checks.
NOTE: the POST target is /auth/login, NOT /login. This @page lives at "/login" and the
Razor Components endpoint matches ALL methods, so a POST to /login collided with the
minimal-API MapPost("/login") and threw AmbiguousMatchException (HTTP 500). Posting to a
distinct /auth/login path (mirroring ScadaBridge) keeps the GET page and POST handler from
sharing a route. *@
@attribute [AllowAnonymous]
<LoginCard Product="MXAccess Gateway" Action="/auth/login" ReturnUrl="@ReturnUrl" Error="@Error">
<AntiforgeryToken />
</LoginCard>
@code {
/// <summary>Original protected URL the operator was bounced from; round-tripped to POST /login.</summary>
[SupplyParameterFromQuery(Name = "returnUrl")]
private string? ReturnUrl { get; set; }
/// <summary>Failure message surfaced by POST /login after a failed authentication.</summary>
[SupplyParameterFromQuery(Name = "error")]
private string? Error { get; set; }
}
@@ -26,7 +26,7 @@ else
<tr><th scope="row">Run migrations</th><td>@Snapshot.Configuration.Authentication.RunMigrationsOnStartup</td></tr> <tr><th scope="row">Run migrations</th><td>@Snapshot.Configuration.Authentication.RunMigrationsOnStartup</td></tr>
<tr><th scope="row">LDAP enabled</th><td>@Snapshot.Configuration.Ldap.Enabled</td></tr> <tr><th scope="row">LDAP enabled</th><td>@Snapshot.Configuration.Ldap.Enabled</td></tr>
<tr><th scope="row">LDAP server</th><td>@Snapshot.Configuration.Ldap.Server:@Snapshot.Configuration.Ldap.Port</td></tr> <tr><th scope="row">LDAP server</th><td>@Snapshot.Configuration.Ldap.Server:@Snapshot.Configuration.Ldap.Port</td></tr>
<tr><th scope="row">LDAP TLS</th><td>@Snapshot.Configuration.Ldap.UseTls</td></tr> <tr><th scope="row">LDAP transport</th><td>@Snapshot.Configuration.Ldap.Transport</td></tr>
<tr><th scope="row">LDAP search base</th><td><code>@Snapshot.Configuration.Ldap.SearchBase</code></td></tr> <tr><th scope="row">LDAP search base</th><td><code>@Snapshot.Configuration.Ldap.SearchBase</code></td></tr>
<tr><th scope="row">LDAP service account</th><td><code>@Snapshot.Configuration.Ldap.ServiceAccountDn</code></td></tr> <tr><th scope="row">LDAP service account</th><td><code>@Snapshot.Configuration.Ldap.ServiceAccountDn</code></td></tr>
<tr><th scope="row">LDAP service password</th><td>@Snapshot.Configuration.Ldap.ServiceAccountPassword</td></tr> <tr><th scope="row">LDAP service password</th><td>@Snapshot.Configuration.Ldap.ServiceAccountPassword</td></tr>
@@ -1,16 +1,16 @@
<span class="chip @CssClass">@Text</span> @* Thin adapter: maps MxGateway runtime state text → kit StatusPill state.
The bespoke .chip rendering now lives in the kit; only the app's domain
text→state vocabulary remains here. Call sites (<StatusBadge Text="..."/>) unchanged. *@
<StatusPill State="MapState(Text)">@Text</StatusPill>
@code { @code {
[Parameter] [Parameter] public string? Text { get; set; }
public string? Text { get; set; }
private string CssClass => Text switch private static StatusState MapState(string? text) => text switch
{ {
"Ready" or "Healthy" or "Active" => "chip-ok", "Ready" or "Healthy" or "Active" => StatusState.Ok,
"Creating" or "StartingWorker" or "WaitingForPipe" or "InitializingWorker" or "Closing" => "chip-warn", "Creating" or "StartingWorker" or "WaitingForPipe" or "InitializingWorker" or "Closing"
"Stale" or "Degraded" => "chip-warn", or "Stale" or "Degraded" => StatusState.Warn,
"Faulted" or "Unavailable" => "chip-bad", "Faulted" or "Unavailable" => StatusState.Bad,
"Closed" or "Revoked" or "Unknown" => "chip-idle", _ => StatusState.Idle,
_ => "chip-idle"
}; };
} }
@@ -10,4 +10,5 @@
@using ZB.MOM.WW.MxGateway.Server.Dashboard.Components.Shared @using ZB.MOM.WW.MxGateway.Server.Dashboard.Components.Shared
@using ZB.MOM.WW.MxGateway.Server.Security.Authorization @using ZB.MOM.WW.MxGateway.Server.Security.Authorization
@using ZB.MOM.WW.MxGateway.Server.Workers @using ZB.MOM.WW.MxGateway.Server.Workers
@using ZB.MOM.WW.Theme
@using static Microsoft.AspNetCore.Components.Web.RenderMode @using static Microsoft.AspNetCore.Components.Web.RenderMode
@@ -0,0 +1,78 @@
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Dashboard projection of an <see cref="AlarmProviderStatus" /> message
/// carried on the alarm feed. Maps the protobuf provider mode / degraded
/// flag into Bootstrap-only display fields so the Alarms page can render a
/// status badge without touching protobuf types.
/// </summary>
public sealed record DashboardAlarmProviderStatus(
AlarmProviderMode Mode,
bool IsDegraded,
string Label,
string BadgeCssClass,
string Reason,
DateTimeOffset? SinceUtc)
{
/// <summary>Badge label shown when the alarm-manager provider is healthy.</summary>
public const string AlarmManagerLabel = "Alarm Manager";
/// <summary>Badge label shown when the feed has fallen back to subtag monitoring.</summary>
public const string DegradedLabel = "Subtag monitoring (degraded)";
private const string HealthyBadge = "bg-success";
private const string DegradedBadge = "bg-warning text-dark";
/// <summary>
/// The default status assumed before the first provider-status message
/// arrives: healthy alarm-manager mode.
/// </summary>
public static DashboardAlarmProviderStatus Healthy { get; } = new(
Mode: AlarmProviderMode.Alarmmgr,
IsDegraded: false,
Label: AlarmManagerLabel,
BadgeCssClass: HealthyBadge,
Reason: string.Empty,
SinceUtc: null);
/// <summary>Projects an alarm-feed provider-status payload into a dashboard badge model.</summary>
/// <param name="status">The provider-status payload from an <see cref="AlarmFeedMessage" />.</param>
/// <returns>The projected dashboard status.</returns>
public static DashboardAlarmProviderStatus FromProviderStatus(AlarmProviderStatus status)
{
ArgumentNullException.ThrowIfNull(status);
// Treat the explicit degraded flag and the SUBTAG mode as equivalent;
// the contract sets degraded=true whenever mode == SUBTAG, but guard
// against either being set independently.
bool degraded = status.Degraded || status.Mode == AlarmProviderMode.Subtag;
return new DashboardAlarmProviderStatus(
Mode: status.Mode,
IsDegraded: degraded,
Label: degraded ? DegradedLabel : AlarmManagerLabel,
BadgeCssClass: degraded ? DegradedBadge : HealthyBadge,
Reason: status.Reason ?? string.Empty,
SinceUtc: status.Since?.ToDateTimeOffset());
}
/// <summary>Projects an alarm-feed message into a dashboard badge model.</summary>
/// <param name="message">An alarm-feed message whose payload is a provider status.</param>
/// <returns>The projected dashboard status.</returns>
/// <exception cref="ArgumentException">The message does not carry a provider-status payload.</exception>
public static DashboardAlarmProviderStatus FromFeed(AlarmFeedMessage message)
{
ArgumentNullException.ThrowIfNull(message);
if (message.PayloadCase != AlarmFeedMessage.PayloadOneofCase.ProviderStatus)
{
throw new ArgumentException(
"Alarm-feed message does not carry a provider-status payload.",
nameof(message));
}
return FromProviderStatus(message.ProviderStatus);
}
}
@@ -1,5 +1,10 @@
using System.Security.Claims; using System.Security.Claims;
using System.Text.Json;
using Microsoft.Data.Sqlite; using Microsoft.Data.Sqlite;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
using ZB.MOM.WW.Auth.ApiKeys.Admin;
using ZB.MOM.WW.MxGateway.Server.Security.Audit;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication; using ZB.MOM.WW.MxGateway.Server.Security.Authentication;
using ZB.MOM.WW.MxGateway.Server.Security.Authorization; using ZB.MOM.WW.MxGateway.Server.Security.Authorization;
@@ -7,12 +12,13 @@ namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
public sealed class DashboardApiKeyManagementService( public sealed class DashboardApiKeyManagementService(
DashboardApiKeyAuthorization authorization, DashboardApiKeyAuthorization authorization,
ApiKeyAdminCommands adminCommands,
IApiKeyAdminStore adminStore, IApiKeyAdminStore adminStore,
IApiKeyAuditStore auditStore, IAuditWriter auditWriter,
IApiKeySecretHasher hasher,
IHttpContextAccessor httpContextAccessor) : IDashboardApiKeyManagementService IHttpContextAccessor httpContextAccessor) : IDashboardApiKeyManagementService
{ {
private const string UnauthorizedMessage = "Sign in with an authorized LDAP account to manage API keys."; private const string UnauthorizedMessage = "Sign in with an authorized LDAP account to manage API keys.";
private const string PepperUnavailableMarker = "pepper unavailable";
/// <summary>Determines whether the user can manage API keys.</summary> /// <summary>Determines whether the user can manage API keys.</summary>
/// <param name="user">The authenticated user principal.</param> /// <param name="user">The authenticated user principal.</param>
@@ -42,28 +48,31 @@ public sealed class DashboardApiKeyManagementService(
} }
string keyId = request.KeyId.Trim(); string keyId = request.KeyId.Trim();
string secret = ApiKeySecretGenerator.Generate();
string apiKey = FormatApiKey(keyId, secret);
try try
{ {
await adminStore.CreateAsync( // The shared command set generates the secret, hashes it with the pepper, persists the
new ApiKeyCreateRequest( // record and assembles the mxgw_<id>_<secret> token (shown once). It also appends its own
KeyId: keyId, // "create-key" audit entry (now canonicalized through the IApiKeyAuditStore->IAuditWriter
KeyPrefix: $"mxgw_{keyId}", // adapter); the dashboard layers a richer "dashboard-create-key" canonical AuditEvent
SecretHash: hasher.HashSecret(secret), // (Target + CorrelationId + remote address) on top via IAuditWriter to preserve the
DisplayName: request.DisplayName.Trim(), // dashboard audit vocabulary — both rows land in the canonical audit_event store.
Scopes: request.Scopes, CreateKeyResult created = await adminCommands.CreateKeyAsync(
Constraints: request.Constraints, keyId,
CreatedUtc: DateTimeOffset.UtcNow), request.DisplayName.Trim(),
request.Scopes,
ApiKeyConstraintSerializer.Serialize(request.Constraints),
RemoteAddress(),
cancellationToken) cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync(keyId, "dashboard-create-key", null, cancellationToken).ConfigureAwait(false); await WriteDashboardAuditAsync(user, keyId, "dashboard-create-key", null, cancellationToken).ConfigureAwait(false);
return DashboardApiKeyManagementResult.Success("API key created. Copy the key now; it will not be shown again.", apiKey); return DashboardApiKeyManagementResult.Success(
"API key created. Copy the key now; it will not be shown again.",
created.Token);
} }
catch (ApiKeyPepperUnavailableException) catch (InvalidOperationException exception) when (IsPepperUnavailable(exception))
{ {
return DashboardApiKeyManagementResult.Fail("API key pepper is not configured."); return DashboardApiKeyManagementResult.Fail("API key pepper is not configured.");
} }
@@ -94,18 +103,19 @@ public sealed class DashboardApiKeyManagementService(
} }
string normalizedKeyId = keyId.Trim(); string normalizedKeyId = keyId.Trim();
bool revoked = await adminStore KeyActionResult result = await adminCommands
.RevokeAsync(normalizedKeyId, DateTimeOffset.UtcNow, cancellationToken) .RevokeKeyAsync(normalizedKeyId, RemoteAddress(), cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync( await WriteDashboardAuditAsync(
user,
normalizedKeyId, normalizedKeyId,
"dashboard-revoke-key", "dashboard-revoke-key",
revoked ? "revoked" : "not-found-or-already-revoked", result.Succeeded ? "revoked" : "not-found-or-already-revoked",
cancellationToken) cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
return revoked return result.Succeeded
? DashboardApiKeyManagementResult.Success("API key revoked.") ? DashboardApiKeyManagementResult.Success("API key revoked.")
: DashboardApiKeyManagementResult.Fail("API key was not found or is already revoked."); : DashboardApiKeyManagementResult.Fail("API key was not found or is already revoked.");
} }
@@ -131,27 +141,30 @@ public sealed class DashboardApiKeyManagementService(
} }
string normalizedKeyId = keyId.Trim(); string normalizedKeyId = keyId.Trim();
string secret = ApiKeySecretGenerator.Generate();
string apiKey = FormatApiKey(normalizedKeyId, secret);
try try
{ {
bool rotated = await adminStore CreateKeyResult rotated = await adminCommands
.RotateAsync(normalizedKeyId, hasher.HashSecret(secret), DateTimeOffset.UtcNow, cancellationToken) .RotateKeyAsync(normalizedKeyId, RemoteAddress(), cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync( bool succeeded = rotated.Token is not null;
await WriteDashboardAuditAsync(
user,
normalizedKeyId, normalizedKeyId,
"dashboard-rotate-key", "dashboard-rotate-key",
rotated ? "rotated" : "not-found", succeeded ? "rotated" : "not-found",
cancellationToken) cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
return rotated return succeeded
? DashboardApiKeyManagementResult.Success("API key rotated. Copy the key now; it will not be shown again.", apiKey) ? DashboardApiKeyManagementResult.Success(
"API key rotated. Copy the key now; it will not be shown again.",
rotated.Token)
: DashboardApiKeyManagementResult.Fail("API key was not found."); : DashboardApiKeyManagementResult.Fail("API key was not found.");
} }
catch (ApiKeyPepperUnavailableException) catch (InvalidOperationException exception) when (IsPepperUnavailable(exception))
{ {
return DashboardApiKeyManagementResult.Fail("API key pepper is not configured."); return DashboardApiKeyManagementResult.Fail("API key pepper is not configured.");
} }
@@ -182,7 +195,8 @@ public sealed class DashboardApiKeyManagementService(
.DeleteAsync(normalizedKeyId, cancellationToken) .DeleteAsync(normalizedKeyId, cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync( await WriteDashboardAuditAsync(
user,
normalizedKeyId, normalizedKeyId,
"dashboard-delete-key", "dashboard-delete-key",
deleted ? "deleted" : "not-found-or-active", deleted ? "deleted" : "not-found-or-active",
@@ -194,22 +208,92 @@ public sealed class DashboardApiKeyManagementService(
: DashboardApiKeyManagementResult.Fail("API key was not found, or is still active. Revoke it before deleting."); : DashboardApiKeyManagementResult.Fail("API key was not found, or is still active. Revoke it before deleting.");
} }
private async Task AppendAuditAsync( private string? RemoteAddress() =>
string? keyId, httpContextAccessor.HttpContext?.Connection.RemoteIpAddress?.ToString();
string eventType,
string? details, /// <summary>
/// Resolves the operator's username from the authenticated dashboard principal.
/// </summary>
/// <remarks>
/// The passed <paramref name="user"/> is preferred over the ambient HTTP context because it
/// is already in scope at every call site (the callers gate on <see cref="CanManage"/> using
/// it) and is unambiguous. Falls back to <see cref="IAuditActorAccessor.CurrentActor"/> for
/// defensive coverage, then to <c>"unknown"</c> when neither is available.
/// </remarks>
private static string ResolveOperatorActor(ClaimsPrincipal user)
{
// ZbClaimTypes.Username = "zb:username" — the canonical LDAP login name.
string? username = user.FindFirstValue(ZB.MOM.WW.Auth.AspNetCore.ZbClaimTypes.Username);
if (!string.IsNullOrWhiteSpace(username))
{
return username;
}
// Framework fallback: Identity.Name is driven by the nameClaimType on the ClaimsIdentity
// (set to ZbClaimTypes.Name = ClaimTypes.Name by DashboardAuthenticator → display name).
string? identityName = user.Identity?.Name;
if (!string.IsNullOrWhiteSpace(identityName))
{
return identityName;
}
return "unknown";
}
/// <summary>
/// Emits the dashboard's own canonical <see cref="AuditEvent"/> for a <c>dashboard-*</c> op
/// directly through the best-effort <see cref="IAuditWriter"/> (Task 2.3 #6). This is in
/// addition to the <c>create/revoke/rotate-key</c> event that <see cref="ApiKeyAdminCommands"/>
/// emits via the canonical-forwarding <c>IApiKeyAuditStore</c> adapter — the doubled-audit
/// behaviour is preserved, both rows now land in the canonical <c>audit_event</c> store.
/// </summary>
/// <remarks>
/// Phase 3 (Actor = operator principal): <c>Actor</c> is the LDAP operator who performed the
/// action (resolved from the <paramref name="user"/> principal); <c>Target</c> is the managed
/// API key id. This fixes the pre-Phase-3 semantic gap where both fields held the keyId.
/// </remarks>
private async Task WriteDashboardAuditAsync(
ClaimsPrincipal user,
string keyId,
string action,
string? detail,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
await auditStore.AppendAsync( AuditEvent auditEvent = new()
new ApiKeyAuditEntry( {
KeyId: keyId, EventId = Guid.NewGuid(),
EventType: eventType, OccurredAtUtc = DateTimeOffset.UtcNow,
RemoteAddress: httpContextAccessor.HttpContext?.Connection.RemoteIpAddress?.ToString(), Actor = ResolveOperatorActor(user),
Details: details), Action = action,
cancellationToken) Outcome = AuditOutcome.Success,
.ConfigureAwait(false); Category = CanonicalForwardingApiKeyAuditStore.ApiKeyCategory,
Target = keyId,
SourceNode = RemoteAddress(),
CorrelationId = ParseCorrelationId(),
DetailsJson = WrapDetail(detail),
};
await auditWriter.WriteAsync(auditEvent, cancellationToken).ConfigureAwait(false);
} }
/// <summary>
/// Derives a correlation id from the ASP.NET Core request trace identifier when it is a
/// well-formed GUID; otherwise null (the default <c>HttpContext.TraceIdentifier</c> is the
/// connection:request form, not a GUID, so it correlates to null rather than fabricating one).
/// </summary>
private Guid? ParseCorrelationId() =>
Guid.TryParse(httpContextAccessor.HttpContext?.TraceIdentifier, out Guid correlationId)
? correlationId
: null;
private static string? WrapDetail(string? detail) =>
detail is null
? null
: JsonSerializer.Serialize(new Dictionary<string, string> { ["detail"] = detail });
private static bool IsPepperUnavailable(InvalidOperationException exception) =>
exception.Message.Contains(PepperUnavailableMarker, StringComparison.OrdinalIgnoreCase);
private static string? ValidateCreateRequest(DashboardApiKeyManagementRequest request) private static string? ValidateCreateRequest(DashboardApiKeyManagementRequest request)
{ {
string? keyIdValidation = ValidateKeyId(request.KeyId); string? keyIdValidation = ValidateKeyId(request.KeyId);
@@ -248,9 +332,4 @@ public sealed class DashboardApiKeyManagementService(
? null ? null
: "API key id may contain only letters, numbers, periods, and hyphens."; : "API key id may contain only letters, numbers, periods, and hyphens.";
} }
private static string FormatApiKey(string keyId, string secret)
{
return $"mxgw_{keyId}_{secret}";
}
} }
@@ -1,14 +1,26 @@
using System.Security.Claims; using System.Security.Claims;
using System.Text; using ZB.MOM.WW.Auth.Abstractions.Ldap;
using Microsoft.Extensions.Options; using ZB.MOM.WW.Auth.Abstractions.Roles;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.Auth.AspNetCore;
using ZB.MOM.WW.MxGateway.Server.Security.Authorization;
using Novell.Directory.Ldap;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard; namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Authenticates interactive dashboard logins against LDAP. The bind/search
/// mechanics are delegated to the shared <see cref="ILdapAuthService"/>
/// (<c>ZB.MOM.WW.Auth.Ldap</c>), which performs bind-then-search, fails closed,
/// and never throws — returning the user's display name and LDAP groups on
/// success. This class keeps the dashboard-specific policy: groups are resolved
/// to dashboard roles via <see cref="IGroupRoleMapper{TRole}"/>, a login with no
/// matching role is denied, and the resulting <see cref="ClaimsPrincipal"/> is
/// shaped exactly as before (see <see cref="CreatePrincipal"/>).
/// </summary>
/// <param name="ldapAuthService">Shared LDAP bind-then-search provider.</param>
/// <param name="roleMapper">Maps LDAP groups to dashboard roles (Task 1.1 seam).</param>
/// <param name="logger">Logger for diagnostic, credential-free login outcomes.</param>
public sealed class DashboardAuthenticator( public sealed class DashboardAuthenticator(
IOptions<GatewayOptions> options, ILdapAuthService ldapAuthService,
IGroupRoleMapper<string> roleMapper,
ILogger<DashboardAuthenticator> logger) : IDashboardAuthenticator ILogger<DashboardAuthenticator> logger) : IDashboardAuthenticator
{ {
private const string GenericFailureMessage = "The username or password is invalid, or the user is not authorized."; private const string GenericFailureMessage = "The username or password is invalid, or the user is not authorized.";
@@ -19,240 +31,72 @@ public sealed class DashboardAuthenticator(
string? password, string? password,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
LdapOptions ldapOptions = options.Value.Ldap; if (string.IsNullOrWhiteSpace(username) || string.IsNullOrWhiteSpace(password))
DashboardOptions dashboardOptions = options.Value.Dashboard;
if (!ldapOptions.Enabled
|| string.IsNullOrWhiteSpace(username)
|| string.IsNullOrWhiteSpace(password))
{
return DashboardAuthenticationResult.Fail(GenericFailureMessage);
}
if (!ldapOptions.UseTls && !ldapOptions.AllowInsecureLdap)
{ {
return DashboardAuthenticationResult.Fail(GenericFailureMessage); return DashboardAuthenticationResult.Fail(GenericFailureMessage);
} }
string normalizedUsername = username.Trim(); string normalizedUsername = username.Trim();
try // The shared service owns connect/bind/search and the fail-closed contract:
// it returns Fail(Disabled) when LDAP is off, enforces TLS-or-AllowInsecure via
// its startup validator, and never throws. We only translate its outcome into a
// dashboard principal here.
LdapAuthResult ldapResult = await ldapAuthService
.AuthenticateAsync(normalizedUsername, password, cancellationToken)
.ConfigureAwait(false);
if (!ldapResult.Succeeded)
{ {
using LdapConnection connection = new(); return DashboardAuthenticationResult.Fail(GenericFailureMessage);
connection.SecureSocketLayer = ldapOptions.UseTls;
await Task.Run(
() => connection.Connect(ldapOptions.Server, ldapOptions.Port),
cancellationToken)
.ConfigureAwait(false);
await BindServiceAccountAsync(connection, ldapOptions, cancellationToken).ConfigureAwait(false);
LdapEntry? candidate = await SearchUserAsync(
connection,
ldapOptions,
normalizedUsername,
cancellationToken)
.ConfigureAwait(false);
if (candidate is null)
{
return DashboardAuthenticationResult.Fail(GenericFailureMessage);
}
await Task.Run(
() => connection.Bind(candidate.Dn, password),
cancellationToken)
.ConfigureAwait(false);
await BindServiceAccountAsync(connection, ldapOptions, cancellationToken).ConfigureAwait(false);
LdapEntry? authenticatedEntry = await SearchUserAsync(
connection,
ldapOptions,
normalizedUsername,
cancellationToken)
.ConfigureAwait(false);
if (authenticatedEntry is null)
{
return DashboardAuthenticationResult.Fail(GenericFailureMessage);
}
string displayName = ReadAttribute(authenticatedEntry, ldapOptions.DisplayNameAttribute)
?? normalizedUsername;
IReadOnlyList<string> groups = ReadAttributeValues(authenticatedEntry, ldapOptions.GroupAttribute);
IReadOnlyList<string> roles = MapGroupsToRoles(groups, dashboardOptions.GroupToRole);
if (roles.Count == 0)
{
logger.LogInformation(
"LDAP dashboard login denied for user {User}: no GroupToRole mapping matched their LDAP groups.",
normalizedUsername);
return DashboardAuthenticationResult.Fail(GenericFailureMessage);
}
return DashboardAuthenticationResult.Success(CreatePrincipal(
normalizedUsername,
displayName,
groups,
roles));
} }
catch (OperationCanceledException)
{ GroupRoleMapping<string> mapping = await roleMapper
throw; .MapAsync(ldapResult.Groups, cancellationToken)
} .ConfigureAwait(false);
catch (LdapException ex)
IReadOnlyList<string> roles = mapping.Roles;
if (roles.Count == 0)
{ {
// Preserve the long-standing "no roles matched -> login denied" rule.
logger.LogInformation( logger.LogInformation(
"LDAP dashboard login rejected for user {User}: result code {ResultCode}.", "LDAP dashboard login denied for user {User}: no GroupToRole mapping matched their LDAP groups.",
normalizedUsername, ldapResult.Username);
ex.ResultCode);
return DashboardAuthenticationResult.Fail(GenericFailureMessage); return DashboardAuthenticationResult.Fail(GenericFailureMessage);
} }
catch (Exception ex)
{
logger.LogError(ex, "Unexpected LDAP dashboard login error for user {User}.", normalizedUsername);
return DashboardAuthenticationResult.Fail(GenericFailureMessage); return DashboardAuthenticationResult.Success(CreatePrincipal(
} ldapResult.Username,
} ldapResult.DisplayName,
ldapResult.Groups,
/// <summary>Escapes special characters in LDAP filter strings.</summary> roles));
/// <param name="value">The string value to escape.</param>
internal static string EscapeLdapFilter(string value)
{
StringBuilder builder = new(value.Length);
foreach (char character in value)
{
builder.Append(character switch
{
'\\' => @"\5c",
'*' => @"\2a",
'(' => @"\28",
')' => @"\29",
'\0' => @"\00",
_ => character.ToString()
});
}
return builder.ToString();
} }
/// <summary> /// <summary>
/// Maps the user's LDAP groups to dashboard roles. A user can pick up /// Builds the dashboard <see cref="ClaimsPrincipal"/> from the LDAP outcome.
/// multiple roles; Admin and Viewer are the only legal values. Returns
/// an empty list when no group matches (caller rejects the login).
/// </summary> /// </summary>
/// <param name="groups">The collection of LDAP groups the user belongs to.</param> /// <param name="username">
/// <param name="groupToRole">The mapping from group names to dashboard role names.</param> /// The (trimmed) login name. Emitted as <see cref="ClaimTypes.NameIdentifier"/> (kept for
internal static IReadOnlyList<string> MapGroupsToRoles( /// back-compat reads) and as the canonical <see cref="ZbClaimTypes.Username"/> ("zb:username").
IEnumerable<string> groups, /// </param>
IReadOnlyDictionary<string, string> groupToRole) /// <param name="displayName">
{ /// The user's display name. Emitted as <see cref="ZbClaimTypes.Name"/> (= <see cref="ClaimTypes.Name"/>
if (groupToRole.Count == 0) /// so <c>Identity.Name</c> resolves) and as <see cref="ZbClaimTypes.DisplayName"/> ("zb:displayname")
{ /// for cross-app consistency.
return []; /// </param>
} /// <param name="groups">
/// The user's LDAP groups, as returned by <see cref="ILdapAuthService"/>. NOTE
HashSet<string> roles = new(StringComparer.Ordinal); /// (review C1): these are <b>already-normalized short RDN names</b> (e.g.
foreach (string group in groups) /// <c>GwAdmin</c>), not raw distinguished names. The shared
{ /// <c>ZB.MOM.WW.Auth.Ldap</c> provider strips each group DN to its first RDN
string normalizedGroup = group.Trim(); /// value before returning it, so the <see cref="DashboardAuthenticationDefaults.LdapGroupClaimType"/>
/// claim carries the short name. This differs from the pre-cutover behaviour,
// Lookup precedence (Server-040): the full literal group string is /// which surfaced the raw <c>memberOf</c> values (full DNs) on the claim; the
// tried first; only if that misses do we fall back to the leading /// claim is informational only (no policy or UI reads its value — authorization
// RDN value (e.g. "GwAdmin" extracted from /// is role-based), so the shape change is non-breaking for dashboard consumers.
// "ou=GwAdmin,ou=groups,..."). The map's comparer is /// </param>
// OrdinalIgnoreCase (see DashboardOptions.GroupToRole), so e.g. /// <param name="roles">The dashboard roles resolved from <paramref name="groups"/>.</param>
// "GwAdmin" and "gwadmin" both match.
if (groupToRole.TryGetValue(normalizedGroup, out string? mapped)
|| groupToRole.TryGetValue(ExtractFirstRdnValue(normalizedGroup), out mapped))
{
roles.Add(mapped);
}
}
return [.. roles];
}
/// <summary>Extracts the first RDN value from a distinguished name.</summary>
/// <param name="distinguishedName">The LDAP distinguished name.</param>
internal static string ExtractFirstRdnValue(string distinguishedName)
{
int equalsIndex = distinguishedName.IndexOf('=');
if (equalsIndex < 0)
{
return distinguishedName;
}
int valueStart = equalsIndex + 1;
int commaIndex = distinguishedName.IndexOf(',', valueStart);
return commaIndex > valueStart
? distinguishedName[valueStart..commaIndex]
: distinguishedName[valueStart..];
}
private static Task BindServiceAccountAsync(
LdapConnection connection,
LdapOptions ldapOptions,
CancellationToken cancellationToken)
{
return Task.Run(
() => connection.Bind(ldapOptions.ServiceAccountDn, ldapOptions.ServiceAccountPassword),
cancellationToken);
}
private static async Task<LdapEntry?> SearchUserAsync(
LdapConnection connection,
LdapOptions ldapOptions,
string username,
CancellationToken cancellationToken)
{
string filter = $"({ldapOptions.UserNameAttribute}={EscapeLdapFilter(username)})";
ILdapSearchResults results = await Task.Run(
() => connection.Search(
ldapOptions.SearchBase,
LdapConnection.ScopeSub,
filter,
attrs: null,
typesOnly: false),
cancellationToken)
.ConfigureAwait(false);
LdapEntry? entry = null;
while (results.HasMore())
{
LdapEntry next = results.Next();
if (entry is not null)
{
return null;
}
entry = next;
}
return entry;
}
private static string? ReadAttribute(LdapEntry entry, string attributeName)
{
return ReadLdapAttribute(entry, attributeName)?.StringValue;
}
private static IReadOnlyList<string> ReadAttributeValues(LdapEntry entry, string attributeName)
{
LdapAttribute? attribute = ReadLdapAttribute(entry, attributeName);
return attribute?.StringValueArray ?? [];
}
private static LdapAttribute? ReadLdapAttribute(LdapEntry entry, string attributeName)
{
return entry.GetAttribute(attributeName)
?? entry.GetAttribute(attributeName.ToLowerInvariant())
?? entry.GetAttribute(attributeName.ToUpperInvariant());
}
private static ClaimsPrincipal CreatePrincipal( private static ClaimsPrincipal CreatePrincipal(
string username, string username,
string displayName, string displayName,
@@ -261,11 +105,21 @@ public sealed class DashboardAuthenticator(
{ {
List<Claim> claims = List<Claim> claims =
[ [
// Keep NameIdentifier so any existing read-site that uses it continues to work.
new Claim(ClaimTypes.NameIdentifier, username), new Claim(ClaimTypes.NameIdentifier, username),
new Claim(ClaimTypes.Name, displayName), // Canonical login-username claim (Task 1.5).
new Claim(ZbClaimTypes.Username, username),
// ZbClaimTypes.Name == ClaimTypes.Name — drives Identity.Name resolution.
new Claim(ZbClaimTypes.Name, displayName),
// Canonical display-name claim for cross-app consistency (Task 1.5).
new Claim(ZbClaimTypes.DisplayName, displayName),
]; ];
claims.AddRange(roles.Select(role => new Claim(ClaimTypes.Role, role))); // ZbClaimTypes.Role == ClaimTypes.Role — drives IsInRole and [Authorize(Roles=...)].
claims.AddRange(roles.Select(role => new Claim(ZbClaimTypes.Role, role)));
// Groups are short RDN names from ILdapAuthService (see param doc above), so
// this claim value is the short group name, not the original DN.
// LdapGroupClaimType is MxGateway-specific ("mxgateway:ldap_group") — no ZbClaimType for groups.
claims.AddRange(groups.Select(group => new Claim( claims.AddRange(groups.Select(group => new Claim(
DashboardAuthenticationDefaults.LdapGroupClaimType, DashboardAuthenticationDefaults.LdapGroupClaimType,
group))); group)));
@@ -273,8 +127,8 @@ public sealed class DashboardAuthenticator(
ClaimsIdentity claimsIdentity = new( ClaimsIdentity claimsIdentity = new(
claims, claims,
DashboardAuthenticationDefaults.AuthenticationScheme, DashboardAuthenticationDefaults.AuthenticationScheme,
ClaimTypes.Name, ZbClaimTypes.Name,
ClaimTypes.Role); ZbClaimTypes.Role);
return new ClaimsPrincipal(claimsIdentity); return new ClaimsPrincipal(claimsIdentity);
} }
@@ -1,7 +1,6 @@
using System.Text.Encodings.Web; using System.Text.Encodings.Web;
using Microsoft.AspNetCore.Antiforgery; using Microsoft.AspNetCore.Antiforgery;
using Microsoft.AspNetCore.Authentication; using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Http.HttpResults;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Components; using ZB.MOM.WW.MxGateway.Server.Dashboard.Components;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs; using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
@@ -25,14 +24,19 @@ public static class DashboardEndpointRouteBuilderExtensions
return endpoints; return endpoints;
} }
endpoints.MapGet( // GET /login is served by the [AllowAnonymous] Blazor <Login> component
"/login", // (Components/Pages/Login.razor → @page "/login"), which renders the shared
(HttpContext httpContext, IAntiforgery antiforgery) => GetLoginAsync(httpContext, antiforgery)) // kit's <LoginCard>. Its [AllowAnonymous] attribute overrides the
.AllowAnonymous() // RequireAuthorization(ViewerPolicy) that MapRazorComponents<App>() applies,
.WithName("DashboardLogin"); // so the cookie scheme's LoginPath="/login" redirect resolves for anonymous users.
//
// The credential POST is mapped to /auth/login, NOT /login. The @page "/login"
// Razor Components endpoint matches ALL HTTP methods, so a MapPost("/login") shared
// the "/login" route with it and every POST threw AmbiguousMatchException (HTTP 500).
// A distinct /auth/login path (as ScadaBridge does) keeps the GET page and the POST
// handler on separate routes. The <LoginCard Action="/auth/login"> form posts here.
endpoints.MapPost( endpoints.MapPost(
"/login", "/auth/login",
(HttpContext httpContext, IAntiforgery antiforgery, IDashboardAuthenticator authenticator) => (HttpContext httpContext, IAntiforgery antiforgery, IDashboardAuthenticator authenticator) =>
PostLoginAsync(httpContext, antiforgery, authenticator)) PostLoginAsync(httpContext, antiforgery, authenticator))
.AllowAnonymous() .AllowAnonymous()
@@ -92,17 +96,6 @@ public static class DashboardEndpointRouteBuilderExtensions
return endpoints; return endpoints;
} }
private static Task<ContentHttpResult> GetLoginAsync(
HttpContext httpContext,
IAntiforgery antiforgery)
{
string returnUrl = SanitizeReturnUrl(httpContext.Request.Query["returnUrl"].ToString());
return Task.FromResult(TypedResults.Content(
RenderLoginPage(httpContext, antiforgery, returnUrl, failureMessage: null),
"text/html"));
}
private static async Task<IResult> PostLoginAsync( private static async Task<IResult> PostLoginAsync(
HttpContext httpContext, HttpContext httpContext,
IAntiforgery antiforgery, IAntiforgery antiforgery,
@@ -124,10 +117,13 @@ public static class DashboardEndpointRouteBuilderExtensions
if (!result.Succeeded || result.Principal is null) if (!result.Succeeded || result.Principal is null)
{ {
return TypedResults.Content( // Round-trip the failure back to the anonymous Blazor /login page, carrying
RenderLoginPage(httpContext, antiforgery, returnUrl, result.FailureMessage), // the (sanitized) returnUrl so a successful retry still lands on the target.
"text/html", string failureMessage = result.FailureMessage
statusCode: StatusCodes.Status401Unauthorized); ?? "The username or password is invalid, or the user is not authorized.";
return Results.Redirect(
$"/login?error={Uri.EscapeDataString(failureMessage)}"
+ $"&returnUrl={Uri.EscapeDataString(returnUrl)}");
} }
await httpContext await httpContext
@@ -158,42 +154,6 @@ public static class DashboardEndpointRouteBuilderExtensions
return Results.LocalRedirect("/login"); return Results.LocalRedirect("/login");
} }
private static string RenderLoginPage(
HttpContext httpContext,
IAntiforgery antiforgery,
string returnUrl,
string? failureMessage)
{
AntiforgeryTokenSet tokens = antiforgery.GetAndStoreTokens(httpContext);
string requestToken = tokens.RequestToken ?? string.Empty;
string alert = string.IsNullOrWhiteSpace(failureMessage)
? string.Empty
: $"<p class=\"alert alert-danger\" role=\"alert\">{HtmlEncoder.Default.Encode(failureMessage)}</p>";
string body = $"""
<section class="dashboard-login">
{alert}
<form method="post" action="/login" class="card login-card">
<div class="card-body">
<input name="{tokens.FormFieldName}" type="hidden" value="{HtmlEncoder.Default.Encode(requestToken)}" />
<input name="returnUrl" type="hidden" value="{HtmlEncoder.Default.Encode(returnUrl)}" />
<div class="mb-3">
<label for="username" class="form-label">Username</label>
<input id="username" name="username" type="text" autocomplete="username" class="form-control" />
</div>
<div class="mb-3">
<label for="password" class="form-label">Password</label>
<input id="password" name="password" type="password" autocomplete="current-password" class="form-control" />
</div>
<button type="submit" class="btn btn-primary">Sign in</button>
</div>
</form>
</section>
""";
return RenderPage("Dashboard Sign In", heading: null, body);
}
private static string RenderPage(string title, string body) private static string RenderPage(string title, string body)
=> RenderPage(title, heading: title, body); => RenderPage(title, heading: title, body);
@@ -215,7 +175,8 @@ public static class DashboardEndpointRouteBuilderExtensions
<meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1" />
<title>{HtmlEncoder.Default.Encode(title)}</title> <title>{HtmlEncoder.Default.Encode(title)}</title>
<link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" /> <link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" />
<link rel="stylesheet" href="/css/theme.css" /> <link rel="stylesheet" href="/_content/ZB.MOM.WW.Theme/css/theme.css" />
<link rel="stylesheet" href="/_content/ZB.MOM.WW.Theme/css/layout.css" />
<link rel="stylesheet" href="/css/site.css" /> <link rel="stylesheet" href="/css/site.css" />
</head> </head>
<body class="dashboard-body"> <body class="dashboard-body">
@@ -0,0 +1,31 @@
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
using ZB.MOM.WW.MxGateway.Server.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Shared-Auth <see cref="IGroupRoleMapper{TRole}"/> seam over the dashboard's
/// LDAP-group → role mapping. Roles are plain strings
/// (<see cref="DashboardRoles.Admin"/> / <see cref="DashboardRoles.Viewer"/>),
/// so <c>TRole</c> is <see cref="string"/>. The mapping rules (full-DN first,
/// leading-RDN fallback, case-insensitive) live in
/// <see cref="DashboardGroupRoleMapping"/>, shared with
/// <see cref="DashboardAuthenticator"/> so behaviour stays identical.
/// </summary>
/// <param name="options">Gateway options supplying the dashboard GroupToRole map.</param>
public sealed class DashboardGroupRoleMapper(IOptions<GatewayOptions> options)
: IGroupRoleMapper<string>
{
/// <inheritdoc />
public Task<GroupRoleMapping<string>> MapAsync(
IReadOnlyList<string> groups,
CancellationToken ct)
{
IReadOnlyList<string> roles = DashboardGroupRoleMapping.MapGroupsToRoles(
groups,
options.Value.Dashboard.GroupToRole);
return Task.FromResult(new GroupRoleMapping<string>(roles, Scope: null));
}
}
@@ -0,0 +1,79 @@
namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Single source of truth for mapping a user's LDAP groups to dashboard roles.
/// Both <see cref="DashboardAuthenticator"/> (the existing login flow) and
/// <see cref="DashboardGroupRoleMapper"/> (the shared-Auth
/// <see cref="ZB.MOM.WW.Auth.Abstractions.Roles.IGroupRoleMapper{TRole}"/> seam)
/// delegate here so the precedence and case rules stay identical.
/// </summary>
internal static class DashboardGroupRoleMapping
{
/// <summary>
/// Maps the user's LDAP groups to dashboard roles. A user can pick up
/// multiple roles; Admin and Viewer are the only legal values. Returns
/// an empty list when no group matches (caller rejects the login).
/// </summary>
/// <param name="groups">The collection of LDAP groups the user belongs to.</param>
/// <param name="groupToRole">The mapping from group names to dashboard role names.</param>
internal static IReadOnlyList<string> MapGroupsToRoles(
IEnumerable<string> groups,
IReadOnlyDictionary<string, string> groupToRole)
{
if (groupToRole.Count == 0)
{
return [];
}
HashSet<string> roles = new(StringComparer.Ordinal);
foreach (string group in groups)
{
string normalizedGroup = group.Trim();
// Lookup precedence (Server-040): the full literal group string is
// tried first; only if that misses do we fall back to the leading
// RDN value (e.g. "GwAdmin" extracted from
// "ou=GwAdmin,ou=groups,..."). The map's comparer is
// OrdinalIgnoreCase (see DashboardOptions.GroupToRole), so e.g.
// "GwAdmin" and "gwadmin" both match.
//
// Review C1: with the shared ZB.MOM.WW.Auth.Ldap provider, groups
// arrive here already stripped to short RDN names (the library calls
// FirstRdnValue before returning them). So through the live login path
// the full-string branch only ever sees short names and the RDN
// fallback is effectively a no-op — they collapse to the same key.
// The fallback is retained because this mapping is also reachable
// directly via the IGroupRoleMapper<string> seam (DashboardGroupRoleMapper),
// where a caller could still pass a full DN. CONSEQUENCE: configuring a
// full-DN GroupToRole *key* (e.g. "ou=GwAdmin,ou=groups,...") is
// UNSUPPORTED with the shared library — the incoming group is a short
// name, so it will never equal a full-DN key. Keep GroupToRole keys as
// short group names.
if (groupToRole.TryGetValue(normalizedGroup, out string? mapped)
|| groupToRole.TryGetValue(ExtractFirstRdnValue(normalizedGroup), out mapped))
{
roles.Add(mapped);
}
}
return [.. roles];
}
/// <summary>Extracts the first RDN value from a distinguished name.</summary>
/// <param name="distinguishedName">The LDAP distinguished name.</param>
internal static string ExtractFirstRdnValue(string distinguishedName)
{
int equalsIndex = distinguishedName.IndexOf('=');
if (equalsIndex < 0)
{
return distinguishedName;
}
int valueStart = equalsIndex + 1;
int commaIndex = distinguishedName.IndexOf(',', valueStart);
return commaIndex > valueStart
? distinguishedName[valueStart..commaIndex]
: distinguishedName[valueStart..];
}
}
@@ -8,8 +8,10 @@ public static class DashboardRoles
{ {
/// <summary> /// <summary>
/// Read-write access: API-key CRUD, settings, any state-changing UI. /// Read-write access: API-key CRUD, settings, any state-changing UI.
/// Canonical role value (Task 1.7); formerly <c>"Admin"</c> — pure value
/// rename, the operations this role authorizes are unchanged.
/// </summary> /// </summary>
public const string Admin = "Admin"; public const string Admin = "Administrator";
/// <summary> /// <summary>
/// Read-only access: all pages render but write affordances are hidden. /// Read-only access: all pages render but write affordances are hidden.
@@ -1,8 +1,12 @@
using Microsoft.AspNetCore.Authentication; using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies; using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authorization; using Microsoft.AspNetCore.Authorization;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Options; using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
using ZB.MOM.WW.Auth.AspNetCore;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Security.Audit;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard; namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
@@ -15,11 +19,25 @@ public static class DashboardServiceCollectionExtensions
/// Registers all dashboard services, authentication, and Razor components. /// Registers all dashboard services, authentication, and Razor components.
/// </summary> /// </summary>
/// <param name="services">Service collection to register services.</param> /// <param name="services">Service collection to register services.</param>
public static IServiceCollection AddGatewayDashboard(this IServiceCollection services) /// <param name="configuration">
/// Application configuration, used to bind the shared LDAP provider's options
/// from the <c>MxGateway:Ldap</c> section.
/// </param>
public static IServiceCollection AddGatewayDashboard(
this IServiceCollection services,
IConfiguration configuration)
{ {
// Dashboard logins delegate bind/search to the shared ZB.MOM.WW.Auth.Ldap
// provider. Its LdapOptions bind straight from MxGateway:Ldap (the gateway's
// LdapOptions field names match the shared options: Transport / AllowInsecure /
// SearchBase / ServiceAccount* / *Attribute). AddZbLdapAuth also adds a
// ValidateOnStart() so an insecure-transport misconfiguration fails fast at boot.
services.AddZbLdapAuth(configuration, "MxGateway:Ldap");
services.AddSingleton<IDashboardSnapshotService, DashboardSnapshotService>(); services.AddSingleton<IDashboardSnapshotService, DashboardSnapshotService>();
services.AddSingleton<IDashboardLiveDataService, DashboardLiveDataService>(); services.AddSingleton<IDashboardLiveDataService, DashboardLiveDataService>();
services.AddSingleton<IDashboardAuthenticator, DashboardAuthenticator>(); services.AddSingleton<IDashboardAuthenticator, DashboardAuthenticator>();
services.AddSingleton<IGroupRoleMapper<string>, DashboardGroupRoleMapper>();
services.AddSingleton<DashboardApiKeyAuthorization>(); services.AddSingleton<DashboardApiKeyAuthorization>();
services.AddSingleton<IDashboardApiKeyManagementService, DashboardApiKeyManagementService>(); services.AddSingleton<IDashboardApiKeyManagementService, DashboardApiKeyManagementService>();
services.AddSingleton<IDashboardSessionAdminService, DashboardSessionAdminService>(); services.AddSingleton<IDashboardSessionAdminService, DashboardSessionAdminService>();
@@ -30,6 +48,7 @@ public static class DashboardServiceCollectionExtensions
services.AddHostedService<Hubs.DashboardSnapshotPublisher>(); services.AddHostedService<Hubs.DashboardSnapshotPublisher>();
services.AddHostedService<Hubs.AlarmsHubPublisher>(); services.AddHostedService<Hubs.AlarmsHubPublisher>();
services.AddHttpContextAccessor(); services.AddHttpContextAccessor();
services.AddSingleton<IAuditActorAccessor, HttpAuditActorAccessor>();
services.AddAntiforgery(); services.AddAntiforgery();
services.AddCascadingAuthenticationState(); services.AddCascadingAuthenticationState();
services.AddRazorComponents() services.AddRazorComponents()
@@ -40,29 +59,42 @@ public static class DashboardServiceCollectionExtensions
.AddAuthentication(DashboardAuthenticationDefaults.AuthenticationScheme) .AddAuthentication(DashboardAuthenticationDefaults.AuthenticationScheme)
.AddCookie(DashboardAuthenticationDefaults.AuthenticationScheme, cookieOptions => .AddCookie(DashboardAuthenticationDefaults.AuthenticationScheme, cookieOptions =>
{ {
// Hardened defaults (HttpOnly, SameSite=Strict, SecurePolicy, SlidingExpiration,
// ExpireTimeSpan) via the shared ZbCookieDefaults.Apply. requireHttps is set to
// its default (true / Always) here and overridden per-environment by the
// PostConfigure below; the 8-hour idle timeout is preserved (not the 30-min default).
ZbCookieDefaults.Apply(cookieOptions, requireHttps: true, idleTimeout: TimeSpan.FromHours(8));
// Cookie name, path, and redirect paths are MxGateway-specific — set after Apply
// so they are never overwritten by the shared helper (Apply intentionally skips name).
// This is the canonical default; it is overridden per-environment from
// DashboardOptions.CookieName by the PostConfigure below.
cookieOptions.Cookie.Name = DashboardAuthenticationDefaults.CookieName; cookieOptions.Cookie.Name = DashboardAuthenticationDefaults.CookieName;
cookieOptions.Cookie.HttpOnly = true;
cookieOptions.Cookie.SameSite = SameSiteMode.Strict;
// SecurePolicy is bound via PostConfigure below so it can honour
// DashboardOptions.RequireHttpsCookie (default Always; dev HTTP
// deployments set RequireHttpsCookie=false to use SameAsRequest).
cookieOptions.Cookie.Path = "/"; cookieOptions.Cookie.Path = "/";
cookieOptions.LoginPath = "/login"; cookieOptions.LoginPath = "/login";
cookieOptions.LogoutPath = "/logout"; cookieOptions.LogoutPath = "/logout";
cookieOptions.AccessDeniedPath = "/denied"; cookieOptions.AccessDeniedPath = "/denied";
cookieOptions.ExpireTimeSpan = TimeSpan.FromHours(8);
cookieOptions.SlidingExpiration = true;
}) })
.AddScheme<AuthenticationSchemeOptions, HubTokenAuthenticationHandler>( .AddScheme<AuthenticationSchemeOptions, HubTokenAuthenticationHandler>(
DashboardAuthenticationDefaults.HubAuthenticationScheme, DashboardAuthenticationDefaults.HubAuthenticationScheme,
_ => { }); _ => { });
// Honour DashboardOptions.RequireHttpsCookie (default true / Always; set false for dev
// HTTP deployments → SameAsRequest) and the optional per-environment cookie-name
// override. Both run after the inline AddCookie config above, so they win.
services.AddOptions<CookieAuthenticationOptions>(DashboardAuthenticationDefaults.AuthenticationScheme) services.AddOptions<CookieAuthenticationOptions>(DashboardAuthenticationDefaults.AuthenticationScheme)
.Configure<IOptions<GatewayOptions>>((cookieOptions, gatewayOptions) => .Configure<IOptions<GatewayOptions>>((cookieOptions, gatewayOptions) =>
{ {
cookieOptions.Cookie.SecurePolicy = gatewayOptions.Value.Dashboard.RequireHttpsCookie cookieOptions.Cookie.SecurePolicy = gatewayOptions.Value.Dashboard.RequireHttpsCookie
? CookieSecurePolicy.Always ? CookieSecurePolicy.Always
: CookieSecurePolicy.SameAsRequest; : CookieSecurePolicy.SameAsRequest;
// Config-driven cookie name (MxGateway:Dashboard:CookieName). Null/blank keeps
// the canonical default set above, so a misconfiguration cannot unname the cookie.
var cookieName = gatewayOptions.Value.Dashboard.CookieName;
if (!string.IsNullOrWhiteSpace(cookieName))
{
cookieOptions.Cookie.Name = cookieName;
}
}); });
services.AddAuthorization(authorization => services.AddAuthorization(authorization =>
@@ -2,6 +2,7 @@ using System.Runtime.CompilerServices;
using Microsoft.Extensions.Logging; using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions; using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options; using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Galaxy; using ZB.MOM.WW.MxGateway.Server.Galaxy;
using ZB.MOM.WW.MxGateway.Server.Metrics; using ZB.MOM.WW.MxGateway.Server.Metrics;
@@ -242,7 +243,7 @@ public sealed class DashboardSnapshotService : IDashboardSnapshotService
KeyId: key.KeyId, KeyId: key.KeyId,
DisplayName: key.DisplayName, DisplayName: key.DisplayName,
Scopes: key.Scopes, Scopes: key.Scopes,
Constraints: key.Constraints, Constraints: ApiKeyConstraintSerializer.Deserialize(key.ConstraintsJson),
CreatedUtc: key.CreatedUtc, CreatedUtc: key.CreatedUtc,
LastUsedUtc: key.LastUsedUtc, LastUsedUtc: key.LastUsedUtc,
RevokedUtc: key.RevokedUtc)) RevokedUtc: key.RevokedUtc))
@@ -0,0 +1,40 @@
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Auth.ApiKeys.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;
/// <summary>
/// Readiness probe: verifies the SQLite authentication store is reachable. The gateway
/// authenticates every gRPC call against this store, so its reachability gates readiness.
/// </summary>
public sealed class AuthStoreHealthCheck : IHealthCheck
{
private readonly AuthSqliteConnectionFactory _connectionFactory;
public AuthStoreHealthCheck(AuthSqliteConnectionFactory connectionFactory) =>
_connectionFactory = connectionFactory ?? throw new ArgumentNullException(nameof(connectionFactory));
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
try
{
await using SqliteConnection connection =
await _connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = "SELECT 1;";
await command.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);
return HealthCheckResult.Healthy("Auth store is reachable.");
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (Exception ex)
{
return HealthCheckResult.Unhealthy("Auth store is unreachable.", ex);
}
}
}
@@ -0,0 +1,28 @@
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;
/// <summary>
/// Adapts the static <see cref="GatewayLogRedactor"/> to the shared <see cref="ILogRedactor"/> seam
/// so the telemetry RedactionEnricher masks API-key/credential material on every log event.
/// </summary>
public sealed class GatewayLogRedactorSeam : ILogRedactor
{
private static readonly string[] IdentityKeys = ["ClientIdentity", "authorization", "Authorization"];
/// <summary>
/// Masks API-key/credential material in known identity-bearing log properties.
/// </summary>
/// <param name="properties">The log event property dictionary to redact in place.</param>
public void Redact(IDictionary<string, object?> properties)
{
ArgumentNullException.ThrowIfNull(properties);
foreach (var key in IdentityKeys)
{
if (properties.TryGetValue(key, out var value) && value is string s)
{
properties[key] = GatewayLogRedactor.RedactClientIdentity(s);
}
}
}
}
@@ -0,0 +1,36 @@
namespace ZB.MOM.WW.MxGateway.Server.Galaxy;
/// <summary>
/// One alarm-bearing attribute discovered by
/// <see cref="GalaxyRepository.GetAlarmAttributesAsync"/>: an attribute whose owning
/// object configures an <c>AlarmExtension</c> primitive (the same
/// <c>is_alarm</c> detection used by <see cref="GalaxyRepository.GetAttributesAsync"/>).
/// Used to build the subtag-fallback watch-list.
/// </summary>
public sealed class GalaxyAlarmAttributeRow
{
/// <summary>
/// Gets the alarm-bearing attribute reference (e.g. <c>Tank01.Level.HiHi</c>),
/// matching the <c>full_tag_reference</c> projection of
/// <see cref="GalaxyRepository.GetAttributesAsync"/>.
/// </summary>
public string FullTagReference { get; init; } = string.Empty;
/// <summary>
/// Gets the owning object reference (e.g. <c>Tank01</c>). This is the Galaxy
/// <c>tag_name</c> — the segment that precedes the first attribute dot in
/// <see cref="FullTagReference"/>.
/// </summary>
public string SourceObjectReference { get; init; } = string.Empty;
/// <summary>
/// Gets the writable ack-comment attribute address.
/// <para>
/// The Galaxy Repository schema does not expose an ack-comment subtag address
/// directly, so this is always <see cref="string.Empty"/> here. The watch-list
/// resolver (a later task) composes the concrete address from configuration plus
/// <see cref="SourceObjectReference"/> / <see cref="FullTagReference"/>.
/// </para>
/// </summary>
public string AckCommentSubtag { get; init; } = string.Empty;
}
@@ -114,6 +114,56 @@ public sealed class GalaxyRepository(GalaxyRepositoryOptions options) : IGalaxyR
return rows; return rows;
} }
/// <summary>
/// Retrieves only the alarm-bearing attributes for the subtag-fallback watch-list.
/// Alarm detection is identical to <see cref="GetAttributesAsync"/>: a row is
/// alarm-bearing when its owning object configures an <c>AlarmExtension</c>
/// primitive (the same <c>is_alarm</c> projection, here applied as a SQL filter).
/// </summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
public async Task<List<GalaxyAlarmAttributeRow>> GetAlarmAttributesAsync(CancellationToken ct = default)
{
List<GalaxyAlarmAttributeRow> rows = new();
using SqlConnection conn = new(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using SqlCommand cmd = new(AlarmAttributesSql, conn) { CommandTimeout = options.CommandTimeoutSeconds };
using SqlDataReader reader = await cmd.ExecuteReaderAsync(ct).ConfigureAwait(false);
while (await reader.ReadAsync(ct).ConfigureAwait(false))
{
rows.Add(MapAlarmRow(
fullTagReference: reader.GetString(0),
sourceObjectReference: reader.GetString(1)));
}
return rows;
}
/// <summary>
/// Maps a raw alarm-attribute reader row to a <see cref="GalaxyAlarmAttributeRow"/>.
/// <para>
/// <paramref name="sourceObjectReference"/> is the Galaxy <c>tag_name</c> (the
/// owning object), and <paramref name="fullTagReference"/> is
/// <c>tag_name + '.' + attribute_name</c> — the same composition the
/// <c>full_tag_reference</c> projection of <see cref="AttributesSql"/> produces.
/// <see cref="GalaxyAlarmAttributeRow.AckCommentSubtag"/> is left empty here; the
/// schema does not expose an ack-comment address and the watch-list resolver
/// composes it later.
/// </para>
/// Exposed internally so the derivation can be unit-tested without a database.
/// </summary>
/// <param name="fullTagReference">The alarm-bearing attribute reference.</param>
/// <param name="sourceObjectReference">The owning object reference (tag name).</param>
internal static GalaxyAlarmAttributeRow MapAlarmRow(
string fullTagReference,
string sourceObjectReference) => new()
{
FullTagReference = fullTagReference,
SourceObjectReference = sourceObjectReference,
AckCommentSubtag = string.Empty,
};
private const string HierarchySql = @" private const string HierarchySql = @"
;WITH template_chain AS ( ;WITH template_chain AS (
SELECT g.gobject_id AS instance_gobject_id, t.gobject_id AS template_gobject_id, SELECT g.gobject_id AS instance_gobject_id, t.gobject_id AS template_gobject_id,
@@ -248,5 +298,71 @@ SELECT
FROM ranked r FROM ranked r
LEFT JOIN data_type dt ON dt.mx_data_type = r.mx_data_type LEFT JOIN data_type dt ON dt.mx_data_type = r.mx_data_type
WHERE r.rn = 1 WHERE r.rn = 1
ORDER BY r.tag_name, r.attribute_name";
// Alarm-only discovery for the subtag-fallback watch-list. This deliberately reuses the
// exact candidate/ranked CTE structure and the same `AlarmExtension`-based is_alarm
// detection as AttributesSql so the two queries cannot drift: a row qualifies only when
// its user attribute (src_pri 0) anchors an `AlarmExtension` primitive on the owning
// object. It projects just what the watch-list needs — full_tag_reference (tag_name +
// '.' + attribute_name, matching AttributesSql) and the owning object's tag_name as
// source_object_reference. The array `[]` suffix is intentionally omitted: an
// alarm-bearing attribute is a scalar anchor, not an array body.
private const string AlarmAttributesSql = @"
;WITH deployed_package_chain AS (
SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
FROM gobject g
INNER JOIN package p ON p.package_id = g.deployed_package_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0
UNION ALL
SELECT dpc.gobject_id, p.package_id, p.derived_from_package_id, dpc.depth + 1
FROM deployed_package_chain dpc
INNER JOIN package p ON p.package_id = dpc.derived_from_package_id
WHERE dpc.derived_from_package_id <> 0 AND dpc.depth < 10
),
candidate AS (
SELECT
dpc.gobject_id, g.tag_name, da.attribute_name, dpc.depth, 0 AS src_pri
FROM deployed_package_chain dpc
INNER JOIN dynamic_attribute da ON da.package_id = dpc.package_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
INNER JOIN template_definition td ON td.template_definition_id = g.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND da.attribute_name NOT LIKE '[_]%'
AND da.attribute_name NOT LIKE '%.Description'
AND da.mx_attribute_category IN (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24)
UNION ALL
SELECT
dpc.gobject_id, g.tag_name,
CASE WHEN pi.primitive_name IS NULL OR pi.primitive_name = ''
THEN ad.attribute_name
ELSE pi.primitive_name + '.' + ad.attribute_name END AS attribute_name,
dpc.depth, 1 AS src_pri
FROM deployed_package_chain dpc
INNER JOIN primitive_instance pi ON pi.package_id = dpc.package_id
INNER JOIN attribute_definition ad ON ad.primitive_definition_id = pi.primitive_definition_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
INNER JOIN template_definition td ON td.template_definition_id = g.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND ad.attribute_name NOT LIKE '[_]%'
AND ad.attribute_name NOT LIKE '%.Description'
),
ranked AS (
SELECT c.*, ROW_NUMBER() OVER (
PARTITION BY c.gobject_id, c.attribute_name ORDER BY c.src_pri, c.depth) AS rn
FROM candidate c
)
SELECT
r.tag_name + '.' + r.attribute_name AS full_tag_reference,
r.tag_name AS source_object_reference
FROM ranked r
WHERE r.rn = 1
AND r.src_pri = 0
AND EXISTS (
SELECT 1 FROM deployed_package_chain dpc2
INNER JOIN primitive_instance pi ON pi.package_id = dpc2.package_id AND pi.primitive_name = r.attribute_name
INNER JOIN primitive_definition pd ON pd.primitive_definition_id = pi.primitive_definition_id AND pd.primitive_name = 'AlarmExtension'
WHERE dpc2.gobject_id = r.gobject_id
)
ORDER BY r.tag_name, r.attribute_name"; ORDER BY r.tag_name, r.attribute_name";
} }
@@ -27,4 +27,12 @@ public interface IGalaxyRepository
/// <summary>Retrieves all attributes for Galaxy objects from the repository.</summary> /// <summary>Retrieves all attributes for Galaxy objects from the repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param> /// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default); Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default);
/// <summary>
/// Retrieves only the alarm-bearing attributes (those whose owning object
/// configures an <c>AlarmExtension</c> primitive) for building the
/// subtag-fallback watch-list.
/// </summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<List<GalaxyAlarmAttributeRow>> GetAlarmAttributesAsync(CancellationToken ct = default);
} }
@@ -2,6 +2,7 @@ using System.Security.Cryptography.X509Certificates;
using Microsoft.AspNetCore.Hosting.StaticWebAssets; using Microsoft.AspNetCore.Hosting.StaticWebAssets;
using Microsoft.Extensions.Logging; using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Configuration; using Microsoft.Extensions.Logging.Configuration;
using ZB.MOM.WW.Health;
using ZB.MOM.WW.MxGateway.Contracts; using ZB.MOM.WW.MxGateway.Contracts;
using ZB.MOM.WW.MxGateway.Server.Alarms; using ZB.MOM.WW.MxGateway.Server.Alarms;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
@@ -14,6 +15,8 @@ using ZB.MOM.WW.MxGateway.Server.Security.Authentication;
using ZB.MOM.WW.MxGateway.Server.Security.Authorization; using ZB.MOM.WW.MxGateway.Server.Security.Authorization;
using ZB.MOM.WW.MxGateway.Server.Sessions; using ZB.MOM.WW.MxGateway.Server.Sessions;
using ZB.MOM.WW.MxGateway.Server.Workers; using ZB.MOM.WW.MxGateway.Server.Workers;
using ZB.MOM.WW.Telemetry;
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.MxGateway.Server; namespace ZB.MOM.WW.MxGateway.Server;
@@ -60,18 +63,35 @@ public static class GatewayApplication
ConfigureSelfSignedTls(builder); ConfigureSelfSignedTls(builder);
builder.Services.AddGatewayConfiguration(); builder.AddZbSerilog(o => o.ServiceName = "mxgateway");
builder.Services.AddSqliteAuthStore();
builder.Services.AddGatewayConfiguration(builder.Configuration);
builder.Services.AddSqliteAuthStore(builder.Configuration);
builder.Services.AddGatewayGrpcAuthorization(); builder.Services.AddGatewayGrpcAuthorization();
builder.Services.AddHealthChecks(); builder.Services.AddHealthChecks()
.AddTypeActivatedCheck<AuthStoreHealthCheck>(
"auth-store",
failureStatus: null,
tags: new[] { ZbHealthTags.Ready });
builder.Services.AddSingleton<GatewayMetrics>(); builder.Services.AddSingleton<GatewayMetrics>();
builder.AddZbTelemetry(o =>
{
o.ServiceName = "mxgateway";
o.Meters = [GatewayMetrics.MeterName]; // "MxGateway.Server" — name unchanged
if (Enum.TryParse<ZbExporter>(builder.Configuration["MxGateway:Telemetry:Exporter"], ignoreCase: true, out var exporter))
o.Exporter = exporter;
var otlp = builder.Configuration["MxGateway:Telemetry:OtlpEndpoint"];
if (!string.IsNullOrWhiteSpace(otlp))
o.OtlpEndpoint = otlp;
});
builder.Services.AddSingleton<ILogRedactor, GatewayLogRedactorSeam>();
builder.Services.AddSingleton<MxAccessGrpcMapper>(); builder.Services.AddSingleton<MxAccessGrpcMapper>();
builder.Services.AddSingleton<MxAccessGrpcRequestValidator>(); builder.Services.AddSingleton<MxAccessGrpcRequestValidator>();
builder.Services.AddSingleton<IEventStreamService, EventStreamService>(); builder.Services.AddSingleton<IEventStreamService, EventStreamService>();
builder.Services.AddWorkerProcessLauncher(); builder.Services.AddWorkerProcessLauncher();
builder.Services.AddGatewaySessions(); builder.Services.AddGatewaySessions();
builder.Services.AddGatewayAlarms(); builder.Services.AddGatewayAlarms();
builder.Services.AddGatewayDashboard(); builder.Services.AddGatewayDashboard(builder.Configuration);
builder.Services.AddGalaxyRepository(); builder.Services.AddGalaxyRepository();
return builder; return builder;
@@ -169,13 +189,8 @@ public static class GatewayApplication
{ {
endpoints.MapStaticAssets(ResolveStaticAssetsManifestPath()); endpoints.MapStaticAssets(ResolveStaticAssetsManifestPath());
endpoints.MapGet( endpoints.MapZbHealth();
"/health/live", endpoints.MapZbMetrics();
() => Results.Ok(new GatewayHealthReply(
Status: "Healthy",
DefaultBackend: GatewayContractInfo.DefaultBackendName,
WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
.WithName("LiveHealth");
endpoints.MapGrpcService<MxAccessGatewayService>(); endpoints.MapGrpcService<MxAccessGatewayService>();
endpoints.MapGrpcService<GalaxyRepositoryGrpcService>(); endpoints.MapGrpcService<GalaxyRepositoryGrpcService>();
@@ -1,11 +1,12 @@
using System.Collections.Concurrent; using System.Collections.Concurrent;
using System.Diagnostics.Metrics; using System.Diagnostics.Metrics;
using System.Globalization;
namespace ZB.MOM.WW.MxGateway.Server.Metrics; namespace ZB.MOM.WW.MxGateway.Server.Metrics;
public sealed class GatewayMetrics : IDisposable public sealed class GatewayMetrics : IDisposable
{ {
public const string MeterName = "MxGateway.Server"; public const string MeterName = "ZB.MOM.WW.MxGateway";
private readonly object _syncRoot = new(); private readonly object _syncRoot = new();
private readonly Meter _meter; private readonly Meter _meter;
@@ -22,6 +23,7 @@ public sealed class GatewayMetrics : IDisposable
private readonly Counter<long> _heartbeatFailuresCounter; private readonly Counter<long> _heartbeatFailuresCounter;
private readonly Counter<long> _streamDisconnectsCounter; private readonly Counter<long> _streamDisconnectsCounter;
private readonly Counter<long> _retryAttemptsCounter; private readonly Counter<long> _retryAttemptsCounter;
private readonly Counter<long> _alarmProviderSwitchesCounter;
private readonly Histogram<double> _workerStartupLatencyHistogram; private readonly Histogram<double> _workerStartupLatencyHistogram;
private readonly Histogram<double> _commandLatencyHistogram; private readonly Histogram<double> _commandLatencyHistogram;
private readonly Histogram<double> _eventStreamSendLatencyHistogram; private readonly Histogram<double> _eventStreamSendLatencyHistogram;
@@ -34,6 +36,7 @@ public sealed class GatewayMetrics : IDisposable
private int _workersRunning; private int _workersRunning;
private int _workerEventQueueDepth; private int _workerEventQueueDepth;
private int _grpcEventStreamQueueDepth; private int _grpcEventStreamQueueDepth;
private int _alarmProviderMode;
private long _sessionsOpened; private long _sessionsOpened;
private long _sessionsClosed; private long _sessionsClosed;
private long _commandsStarted; private long _commandsStarted;
@@ -68,14 +71,16 @@ public sealed class GatewayMetrics : IDisposable
_heartbeatFailuresCounter = _meter.CreateCounter<long>("mxgateway.heartbeats.failed"); _heartbeatFailuresCounter = _meter.CreateCounter<long>("mxgateway.heartbeats.failed");
_streamDisconnectsCounter = _meter.CreateCounter<long>("mxgateway.grpc.streams.disconnected"); _streamDisconnectsCounter = _meter.CreateCounter<long>("mxgateway.grpc.streams.disconnected");
_retryAttemptsCounter = _meter.CreateCounter<long>("mxgateway.retries.attempted"); _retryAttemptsCounter = _meter.CreateCounter<long>("mxgateway.retries.attempted");
_workerStartupLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.workers.startup.duration", "ms"); _alarmProviderSwitchesCounter = _meter.CreateCounter<long>("mxgateway.alarms.provider_switches");
_commandLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.commands.duration", "ms"); _workerStartupLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.workers.startup.duration", "s");
_eventStreamSendLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.events.stream_send.duration", "ms"); _commandLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.commands.duration", "s");
_eventStreamSendLatencyHistogram = _meter.CreateHistogram<double>("mxgateway.events.stream_send.duration", "s");
_meter.CreateObservableGauge("mxgateway.sessions.open", GetOpenSessions); _meter.CreateObservableGauge("mxgateway.sessions.open", GetOpenSessions);
_meter.CreateObservableGauge("mxgateway.workers.running", GetWorkersRunning); _meter.CreateObservableGauge("mxgateway.workers.running", GetWorkersRunning);
_meter.CreateObservableGauge("mxgateway.events.worker_queue.depth", GetWorkerEventQueueDepth); _meter.CreateObservableGauge("mxgateway.events.worker_queue.depth", GetWorkerEventQueueDepth);
_meter.CreateObservableGauge("mxgateway.events.grpc_stream_queue.depth", GetGrpcEventStreamQueueDepth); _meter.CreateObservableGauge("mxgateway.events.grpc_stream_queue.depth", GetGrpcEventStreamQueueDepth);
_meter.CreateObservableGauge("mxgateway.alarms.provider_mode", GetAlarmProviderMode);
} }
/// <summary> /// <summary>
@@ -144,7 +149,7 @@ public sealed class GatewayMetrics : IDisposable
_workersRunning++; _workersRunning++;
} }
_workerStartupLatencyHistogram.Record(startupDuration.TotalMilliseconds); _workerStartupLatencyHistogram.Record(startupDuration.TotalSeconds);
} }
/// <summary> /// <summary>
@@ -208,7 +213,7 @@ public sealed class GatewayMetrics : IDisposable
KeyValuePair<string, object?> methodTag = new("method", method); KeyValuePair<string, object?> methodTag = new("method", method);
_commandsSucceededCounter.Add(1, methodTag); _commandsSucceededCounter.Add(1, methodTag);
_commandLatencyHistogram.Record(duration.TotalMilliseconds, methodTag); _commandLatencyHistogram.Record(duration.TotalSeconds, methodTag);
} }
/// <summary> /// <summary>
@@ -228,7 +233,7 @@ public sealed class GatewayMetrics : IDisposable
KeyValuePair<string, object?> methodTag = new("method", method); KeyValuePair<string, object?> methodTag = new("method", method);
KeyValuePair<string, object?> categoryTag = new("category", category); KeyValuePair<string, object?> categoryTag = new("category", category);
_commandsFailedCounter.Add(1, methodTag, categoryTag); _commandsFailedCounter.Add(1, methodTag, categoryTag);
_commandLatencyHistogram.Record(duration.TotalMilliseconds, methodTag, categoryTag); _commandLatencyHistogram.Record(duration.TotalSeconds, methodTag, categoryTag);
} }
/// <summary> /// <summary>
@@ -255,7 +260,7 @@ public sealed class GatewayMetrics : IDisposable
public void RecordEventStreamSend(string family, TimeSpan duration) public void RecordEventStreamSend(string family, TimeSpan duration)
{ {
_eventStreamSendLatencyHistogram.Record( _eventStreamSendLatencyHistogram.Record(
duration.TotalMilliseconds, duration.TotalSeconds,
new KeyValuePair<string, object?>("family", family)); new KeyValuePair<string, object?>("family", family));
} }
@@ -377,6 +382,32 @@ public sealed class GatewayMetrics : IDisposable
_retryAttemptsCounter.Add(1, new KeyValuePair<string, object?>("area", area)); _retryAttemptsCounter.Add(1, new KeyValuePair<string, object?>("area", area));
} }
/// <summary>
/// Records that the alarm provider switched modes and updates the current provider mode gauge.
/// </summary>
/// <param name="fromMode">Provider mode before the switch (1=alarmmgr, 2=subtag, 0=unknown).</param>
/// <param name="toMode">Provider mode after the switch (1=alarmmgr, 2=subtag, 0=unknown).</param>
/// <param name="reason">Human-readable reason for the switch.</param>
public void AlarmProviderSwitched(int fromMode, int toMode, string reason)
{
lock (_syncRoot)
{
_alarmProviderMode = toMode;
}
_alarmProviderSwitchesCounter.Add(
1,
new KeyValuePair<string, object?>("from", fromMode.ToString(CultureInfo.InvariantCulture)),
new KeyValuePair<string, object?>("to", toMode.ToString(CultureInfo.InvariantCulture)),
new KeyValuePair<string, object?>("reason", reason));
}
/// <summary>Sets the current alarm provider-mode gauge without recording a switch (e.g. startup baseline).</summary>
public void SetAlarmProviderMode(int mode)
{
lock (_syncRoot) { _alarmProviderMode = mode; }
}
/// <summary> /// <summary>
/// Returns a snapshot of all current metric values. /// Returns a snapshot of all current metric values.
/// </summary> /// </summary>
@@ -455,6 +486,14 @@ public sealed class GatewayMetrics : IDisposable
} }
} }
private int GetAlarmProviderMode()
{
lock (_syncRoot)
{
return _alarmProviderMode;
}
}
private static void Increment(Dictionary<string, long> values, string key) private static void Increment(Dictionary<string, long> values, string key)
{ {
values.TryGetValue(key, out long currentValue); values.TryGetValue(key, out long currentValue);
@@ -0,0 +1,42 @@
using ZB.MOM.WW.Audit;
namespace ZB.MOM.WW.MxGateway.Server.Security.Audit;
/// <summary>
/// Best-effort <see cref="IAuditWriter"/> over the MxGateway-owned
/// <see cref="SqliteCanonicalAuditStore"/>. It honours the canonical
/// <see cref="IAuditWriter"/> contract: a failed audit write is swallowed and logged
/// rather than propagated, so it can never abort the user-facing action that produced it.
/// </summary>
/// <remarks>
/// This is the single sink through which ALL MxGateway audit flows — the library admin
/// verbs (via <see cref="CanonicalForwardingApiKeyAuditStore"/>) and the gateway's own
/// dashboard / constraint-denial producers, which write canonical events directly. The
/// best-effort wrapping here also closes the gap that the library's
/// <c>SqliteApiKeyAuditStore.AppendAsync</c> propagated exceptions.
/// </remarks>
public sealed class CanonicalAuditWriter(
SqliteCanonicalAuditStore store,
ILogger<CanonicalAuditWriter> logger) : IAuditWriter
{
/// <inheritdoc />
public async Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(auditEvent);
try
{
await store.InsertAsync(auditEvent, cancellationToken).ConfigureAwait(false);
}
catch (Exception exception)
{
// Best-effort: a failed audit write must never abort the action that produced it.
// Swallow everything (including OperationCanceledException) and log for diagnosis.
logger.LogWarning(
exception,
"Failed to persist audit event {EventId} (action {Action}); audit write is best-effort and was suppressed.",
auditEvent.EventId,
auditEvent.Action);
}
}
}
@@ -0,0 +1,143 @@
using System.Text.Json;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
namespace ZB.MOM.WW.MxGateway.Server.Security.Audit;
/// <summary>
/// Adapter that overrides the shared library's <see cref="IApiKeyAuditStore"/> so that
/// library-emitted API-key audit events (CLI / admin verbs from
/// <c>ApiKeyAdminCommands</c>) are canonicalized onto <see cref="AuditEvent"/> and routed
/// through the gateway's <see cref="IAuditWriter"/> into the canonical
/// <c>audit_event</c> store.
/// </summary>
/// <remarks>
/// Overriding the registered <see cref="IApiKeyAuditStore"/> is the ONLY way to
/// canonicalize the library-internal <c>ApiKeyAdminCommands</c> events, since that type
/// cannot be edited. <see cref="ListRecentAsync"/> reads back from the canonical store
/// and maps each <see cref="AuditEvent"/> to an <see cref="ApiKeyAuditEntry"/> so the
/// existing dashboard "recent audit" view (and the CLI/store tests) keep working through
/// this same seam, unchanged.
/// <para>
/// The library's own <c>api_key_audit</c> table is left in place but UNUSED — nothing
/// writes to it once this adapter overrides the library's <c>SqliteApiKeyAuditStore</c>
/// registration.
/// </para>
/// </remarks>
public sealed class CanonicalForwardingApiKeyAuditStore(
IAuditWriter auditWriter,
SqliteCanonicalAuditStore store) : IApiKeyAuditStore
{
/// <summary>The canonical <see cref="AuditEvent.Category"/> assigned to API-key events.</summary>
public const string ApiKeyCategory = "ApiKey";
/// <summary>Actor used for the library's keyless <c>init-db</c> event.</summary>
private const string SystemActor = "system";
/// <summary>Actor used for any other keyless (CLI-originated) library event.</summary>
private const string CliActor = "cli";
/// <summary>The library event type that denotes a constraint denial.</summary>
private const string ConstraintDeniedEventType = "constraint-denied";
/// <summary>The library's keyless schema-init event type.</summary>
private const string InitDbEventType = "init-db";
/// <inheritdoc />
public async Task AppendAsync(ApiKeyAuditEntry entry, CancellationToken ct)
{
ArgumentNullException.ThrowIfNull(entry);
AuditEvent auditEvent = new()
{
EventId = Guid.NewGuid(),
OccurredAtUtc = entry.CreatedUtc,
// Keyless library events: init-db is system-originated; any other keyless event
// is a CLI/admin verb run without an authenticated principal.
Actor = entry.KeyId
?? (entry.EventType == InitDbEventType ? SystemActor : CliActor),
Action = entry.EventType,
Outcome = entry.EventType == ConstraintDeniedEventType
? AuditOutcome.Denied
: AuditOutcome.Success,
Category = ApiKeyCategory,
Target = entry.KeyId,
SourceNode = entry.RemoteAddress,
CorrelationId = null,
DetailsJson = WrapDetails(entry.Details),
};
// Best-effort: IAuditWriter swallows/logs failures, so this never throws.
await auditWriter.WriteAsync(auditEvent, ct).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<ApiKeyAuditEntry>> ListRecentAsync(int limit, CancellationToken ct)
{
IReadOnlyList<AuditEvent> events = await store.ListRecentAsync(limit, ct).ConfigureAwait(false);
ApiKeyAuditEntry[] entries = new ApiKeyAuditEntry[events.Count];
for (int index = 0; index < events.Count; index++)
{
AuditEvent auditEvent = events[index];
entries[index] = new ApiKeyAuditEntry(
KeyId: auditEvent.Actor switch
{
// Keyless library events were mapped to the system/cli sentinel actors on the
// way in; map them back to a null KeyId so the dashboard view is faithful.
SystemActor or CliActor => null,
string actor => actor,
},
EventType: auditEvent.Action,
RemoteAddress: auditEvent.SourceNode,
CreatedUtc: auditEvent.OccurredAtUtc,
Details: UnwrapDetails(auditEvent.DetailsJson));
}
return entries;
}
/// <summary>
/// Wraps a free-form library detail string into the canonical
/// <c>{"detail": "&lt;escaped&gt;"}</c> JSON envelope, or null when there is no detail.
/// </summary>
private static string? WrapDetails(string? details)
{
if (details is null)
{
return null;
}
return JsonSerializer.Serialize(new Dictionary<string, string> { ["detail"] = details });
}
/// <summary>
/// Unwraps the canonical detail envelope back to the original free-form string. Falls
/// back to the raw JSON when it is not a recognised <c>{"detail": ...}</c> envelope, so
/// directly-emitted canonical events (whose DetailsJson is richer) still surface text.
/// </summary>
private static string? UnwrapDetails(string? detailsJson)
{
if (string.IsNullOrEmpty(detailsJson))
{
return null;
}
try
{
using JsonDocument document = JsonDocument.Parse(detailsJson);
if (document.RootElement.ValueKind == JsonValueKind.Object
&& document.RootElement.TryGetProperty("detail", out JsonElement detail)
&& detail.ValueKind == JsonValueKind.String)
{
return detail.GetString();
}
}
catch (JsonException)
{
// Not JSON we recognise; surface the raw payload below.
}
return detailsJson;
}
}
@@ -0,0 +1,51 @@
using System.Security.Claims;
using ZB.MOM.WW.Auth.AspNetCore;
namespace ZB.MOM.WW.MxGateway.Server.Security.Audit;
/// <summary>
/// HTTP-context-backed implementation of <see cref="IAuditActorAccessor"/> that reads the
/// dashboard operator's identity from the current <see cref="IHttpContextAccessor"/>.
/// </summary>
/// <remarks>
/// Claim resolution order:
/// <list type="number">
/// <item><see cref="ZbClaimTypes.Username"/> ("zb:username") — the canonical LDAP login name.</item>
/// <item><see cref="ClaimsPrincipal.Identity"/>.<see cref="System.Security.Principal.IIdentity.Name"/> — framework fallback (= <see cref="ZbClaimTypes.Name"/> = <see cref="ClaimTypes.Name"/> = display name).</item>
/// <item><see cref="ZbClaimTypes.Name"/> — explicit fallback matching the claim emitted by <c>DashboardAuthenticator.CreatePrincipal</c>.</item>
/// </list>
/// Returns <see langword="null"/> when there is no HTTP context or the user is not authenticated.
/// </remarks>
public sealed class HttpAuditActorAccessor(IHttpContextAccessor httpContextAccessor) : IAuditActorAccessor
{
/// <inheritdoc />
public string? CurrentActor
{
get
{
ClaimsPrincipal? user = httpContextAccessor.HttpContext?.User;
if (user?.Identity?.IsAuthenticated != true)
{
return null;
}
// Prefer the canonical login-username claim (set by DashboardAuthenticator).
string? username = user.FindFirstValue(ZbClaimTypes.Username);
if (!string.IsNullOrWhiteSpace(username))
{
return username;
}
// Framework fallback: Identity.Name is driven by the ClaimsIdentity nameClaimType,
// which DashboardAuthenticator sets to ZbClaimTypes.Name (= ClaimTypes.Name = display name).
string? identityName = user.Identity?.Name;
if (!string.IsNullOrWhiteSpace(identityName))
{
return identityName;
}
// Final explicit fallback — ZbClaimTypes.Name claim value directly.
return user.FindFirstValue(ZbClaimTypes.Name);
}
}
}
@@ -0,0 +1,18 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Audit;
/// <summary>
/// Returns the current actor name for use in audit events.
/// </summary>
/// <remarks>
/// Implementations resolve the actor from the ambient request context. For the dashboard
/// this is the authenticated LDAP operator; for non-HTTP contexts (gRPC, CLI) the caller
/// provides the actor directly and this seam is not used.
/// </remarks>
public interface IAuditActorAccessor
{
/// <summary>
/// Gets the current actor's username, or <see langword="null"/> when there is no
/// authenticated principal in scope (e.g. an anonymous or unauthenticated request).
/// </summary>
string? CurrentActor { get; }
}
@@ -0,0 +1,138 @@
using System.Globalization;
using Microsoft.Data.Sqlite;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.Auth.ApiKeys.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Audit;
/// <summary>
/// MxGateway-owned, append-only SQLite store for canonical
/// <see cref="AuditEvent"/>s. It writes to a NEW <c>audit_event</c> table in the
/// SAME database file as the shared <c>ZB.MOM.WW.Auth.ApiKeys</c> stores: both share
/// the library's <see cref="AuthSqliteConnectionFactory"/> (so they target the same
/// <c>ApiKeyOptions.SqlitePath</c> with the same WAL/busy-timeout connection config).
/// </summary>
/// <remarks>
/// This store is the canonical sink for ALL MxGateway audit. The library's own
/// <c>api_key_audit</c> table is left in place but UNUSED after adoption — the library's
/// <c>IApiKeyAuditStore</c> registration is overridden by
/// <see cref="CanonicalForwardingApiKeyAuditStore"/>, which forwards onto this store via
/// <see cref="CanonicalAuditWriter"/>. The library's <c>schema_version</c> /
/// <c>api_key_audit</c> tables are not touched here; the <c>audit_event</c> table is
/// created idempotently (<c>CREATE TABLE IF NOT EXISTS</c>) on each write so it
/// self-bootstraps regardless of migration ordering.
/// </remarks>
public sealed class SqliteCanonicalAuditStore(AuthSqliteConnectionFactory connectionFactory)
{
private const string CreateTableSql =
"""
CREATE TABLE IF NOT EXISTS audit_event (
event_id TEXT PRIMARY KEY,
occurred_at_utc TEXT NOT NULL,
actor TEXT NOT NULL,
action TEXT NOT NULL,
outcome TEXT NOT NULL,
category TEXT NULL,
target TEXT NULL,
source_node TEXT NULL,
correlation_id TEXT NULL,
details_json TEXT NULL
);
""";
/// <summary>Inserts a canonical audit event into the <c>audit_event</c> table.</summary>
/// <param name="auditEvent">The canonical event to persist.</param>
/// <param name="cancellationToken">Token to observe for cancellation.</param>
public async Task InsertAsync(AuditEvent auditEvent, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(auditEvent);
await using SqliteConnection connection =
await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await EnsureTableAsync(connection, cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText =
"""
INSERT INTO audit_event
(event_id, occurred_at_utc, actor, action, outcome,
category, target, source_node, correlation_id, details_json)
VALUES
($event_id, $occurred_at_utc, $actor, $action, $outcome,
$category, $target, $source_node, $correlation_id, $details_json);
""";
command.Parameters.AddWithValue("$event_id", auditEvent.EventId.ToString());
command.Parameters.AddWithValue("$occurred_at_utc", auditEvent.OccurredAtUtc.ToString("O", CultureInfo.InvariantCulture));
command.Parameters.AddWithValue("$actor", auditEvent.Actor);
command.Parameters.AddWithValue("$action", auditEvent.Action);
command.Parameters.AddWithValue("$outcome", auditEvent.Outcome.ToString());
command.Parameters.AddWithValue("$category", (object?)auditEvent.Category ?? DBNull.Value);
command.Parameters.AddWithValue("$target", (object?)auditEvent.Target ?? DBNull.Value);
command.Parameters.AddWithValue("$source_node", (object?)auditEvent.SourceNode ?? DBNull.Value);
command.Parameters.AddWithValue("$correlation_id", (object?)auditEvent.CorrelationId?.ToString() ?? DBNull.Value);
command.Parameters.AddWithValue("$details_json", (object?)auditEvent.DetailsJson ?? DBNull.Value);
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
/// <summary>Returns the most recent canonical audit events, newest first.</summary>
/// <param name="limit">Maximum number of events to return.</param>
/// <param name="cancellationToken">Token to observe for cancellation.</param>
public async Task<IReadOnlyList<AuditEvent>> ListRecentAsync(int limit, CancellationToken cancellationToken)
{
if (limit <= 0)
{
return [];
}
await using SqliteConnection connection =
await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await EnsureTableAsync(connection, cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText =
"""
SELECT event_id, occurred_at_utc, actor, action, outcome,
category, target, source_node, correlation_id, details_json
FROM audit_event
ORDER BY rowid DESC
LIMIT $limit;
""";
command.Parameters.AddWithValue("$limit", limit);
List<AuditEvent> events = [];
await using SqliteDataReader reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
events.Add(new AuditEvent
{
EventId = Guid.Parse(reader.GetString(0)),
OccurredAtUtc = ParseUtc(reader.GetString(1)),
Actor = reader.GetString(2),
Action = reader.GetString(3),
Outcome = Enum.Parse<AuditOutcome>(reader.GetString(4)),
Category = reader.IsDBNull(5) ? null : reader.GetString(5),
Target = reader.IsDBNull(6) ? null : reader.GetString(6),
SourceNode = reader.IsDBNull(7) ? null : reader.GetString(7),
CorrelationId = reader.IsDBNull(8) ? null : Guid.Parse(reader.GetString(8)),
DetailsJson = reader.IsDBNull(9) ? null : reader.GetString(9),
});
}
return events;
}
private static async Task EnsureTableAsync(SqliteConnection connection, CancellationToken cancellationToken)
{
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = CreateTableSql;
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
private static DateTimeOffset ParseUtc(string value) =>
DateTimeOffset.Parse(value, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
}
@@ -1,15 +1,19 @@
using System.Text.Json; using System.Text.Json;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
using ZB.MOM.WW.Auth.ApiKeys.Admin;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication; namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary> /// <summary>
/// Executes API key administration commands from the CLI. /// Executes API key administration commands from the CLI.
/// </summary> /// </summary>
public sealed class ApiKeyAdminCliRunner( /// <remarks>
IAuthStoreMigrator migrator, /// The create/revoke/rotate/list/init-db verbs (secret generation, peppered hashing, token
IApiKeyAdminStore adminStore, /// assembly and per-action audit) are delegated to the shared
IApiKeyAuditStore auditStore, /// <see cref="ApiKeyAdminCommands"/>. This runner adapts the gateway's strongly-typed command and
IApiKeySecretHasher hasher) /// output DTOs (which carry <see cref="ApiKeyConstraints"/>) onto the library's JSON-based contract.
/// </remarks>
public sealed class ApiKeyAdminCliRunner(ApiKeyAdminCommands commands)
{ {
private static readonly JsonSerializerOptions JsonOptions = new() private static readonly JsonSerializerOptions JsonOptions = new()
{ {
@@ -44,8 +48,7 @@ public sealed class ApiKeyAdminCliRunner(
private async Task<ApiKeyAdminOutput> InitDbAsync(CancellationToken cancellationToken) private async Task<ApiKeyAdminOutput> InitDbAsync(CancellationToken cancellationToken)
{ {
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false); await commands.InitDbAsync(remoteAddress: null, cancellationToken).ConfigureAwait(false);
await AppendAuditAsync(null, "init-db", null, cancellationToken).ConfigureAwait(false);
return new ApiKeyAdminOutput("init-db", "initialized", null, []); return new ApiKeyAdminOutput("init-db", "initialized", null, []);
} }
@@ -54,33 +57,26 @@ public sealed class ApiKeyAdminCliRunner(
ApiKeyAdminCommand command, ApiKeyAdminCommand command,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false); // The shared command set requires the schema to exist; init-db is idempotent.
await commands.InitDbAsync(remoteAddress: null, cancellationToken).ConfigureAwait(false);
string keyId = Required(command.KeyId); string keyId = Required(command.KeyId);
string secret = ApiKeySecretGenerator.Generate(); CreateKeyResult created = await commands.CreateKeyAsync(
string apiKey = FormatApiKey(keyId, secret); keyId,
Required(command.DisplayName),
await adminStore.CreateAsync( command.Scopes,
new ApiKeyCreateRequest( ApiKeyConstraintSerializer.Serialize(command.Constraints),
KeyId: keyId, remoteAddress: null,
KeyPrefix: $"mxgw_{keyId}",
SecretHash: hasher.HashSecret(secret),
DisplayName: Required(command.DisplayName),
Scopes: command.Scopes,
Constraints: command.Constraints,
CreatedUtc: DateTimeOffset.UtcNow),
cancellationToken) cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync(keyId, "create-key", null, cancellationToken).ConfigureAwait(false);
return new ApiKeyAdminOutput("create-key", "created", apiKey, []); return new ApiKeyAdminOutput("create-key", "created", created.Token, []);
} }
private async Task<ApiKeyAdminOutput> ListKeysAsync(CancellationToken cancellationToken) private async Task<ApiKeyAdminOutput> ListKeysAsync(CancellationToken cancellationToken)
{ {
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false); await commands.InitDbAsync(remoteAddress: null, cancellationToken).ConfigureAwait(false);
IReadOnlyList<ApiKeyRecord> keys = await adminStore.ListAsync(cancellationToken).ConfigureAwait(false); IReadOnlyList<ApiKeyListItem> keys = await commands.ListKeysAsync(cancellationToken).ConfigureAwait(false);
await AppendAuditAsync(null, "list-keys", null, cancellationToken).ConfigureAwait(false);
return new ApiKeyAdminOutput( return new ApiKeyAdminOutput(
"list-keys", "list-keys",
@@ -93,35 +89,28 @@ public sealed class ApiKeyAdminCliRunner(
ApiKeyAdminCommand command, ApiKeyAdminCommand command,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false); await commands.InitDbAsync(remoteAddress: null, cancellationToken).ConfigureAwait(false);
string keyId = Required(command.KeyId); string keyId = Required(command.KeyId);
bool revoked = await adminStore.RevokeAsync(keyId, DateTimeOffset.UtcNow, cancellationToken) KeyActionResult result = await commands.RevokeKeyAsync(keyId, remoteAddress: null, cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync(keyId, "revoke-key", revoked ? "revoked" : "not-found-or-already-revoked", cancellationToken) return new ApiKeyAdminOutput("revoke-key", result.Succeeded ? "revoked" : "not-found-or-already-revoked", null, []);
.ConfigureAwait(false);
return new ApiKeyAdminOutput("revoke-key", revoked ? "revoked" : "not-found-or-already-revoked", null, []);
} }
private async Task<ApiKeyAdminOutput> RotateKeyAsync( private async Task<ApiKeyAdminOutput> RotateKeyAsync(
ApiKeyAdminCommand command, ApiKeyAdminCommand command,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false); await commands.InitDbAsync(remoteAddress: null, cancellationToken).ConfigureAwait(false);
string keyId = Required(command.KeyId); string keyId = Required(command.KeyId);
string secret = ApiKeySecretGenerator.Generate(); CreateKeyResult rotated = await commands.RotateKeyAsync(keyId, remoteAddress: null, cancellationToken)
string apiKey = FormatApiKey(keyId, secret);
bool rotated = await adminStore.RotateAsync(keyId, hasher.HashSecret(secret), DateTimeOffset.UtcNow, cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
await AppendAuditAsync(keyId, "rotate-key", rotated ? "rotated" : "not-found", cancellationToken) bool succeeded = rotated.Token is not null;
.ConfigureAwait(false);
return new ApiKeyAdminOutput("rotate-key", rotated ? "rotated" : "not-found", rotated ? apiKey : null, []); return new ApiKeyAdminOutput("rotate-key", succeeded ? "rotated" : "not-found", rotated.Token, []);
} }
private static async Task WriteOutputAsync( private static async Task WriteOutputAsync(
@@ -150,40 +139,19 @@ public sealed class ApiKeyAdminCliRunner(
} }
} }
private async Task AppendAuditAsync( private static ApiKeyAdminListedKey ToListedKey(ApiKeyListItem key)
string? keyId,
string eventType,
string? details,
CancellationToken cancellationToken)
{
await auditStore.AppendAsync(
new ApiKeyAuditEntry(
KeyId: keyId,
EventType: eventType,
RemoteAddress: null,
Details: details),
cancellationToken)
.ConfigureAwait(false);
}
private static ApiKeyAdminListedKey ToListedKey(ApiKeyRecord key)
{ {
return new ApiKeyAdminListedKey( return new ApiKeyAdminListedKey(
KeyId: key.KeyId, KeyId: key.KeyId,
KeyPrefix: key.KeyPrefix, KeyPrefix: key.KeyPrefix,
DisplayName: key.DisplayName, DisplayName: key.DisplayName,
Scopes: key.Scopes, Scopes: key.Scopes,
Constraints: key.Constraints, Constraints: ApiKeyConstraintSerializer.Deserialize(key.ConstraintsJson),
CreatedUtc: key.CreatedUtc, CreatedUtc: key.CreatedUtc,
LastUsedUtc: key.LastUsedUtc, LastUsedUtc: key.LastUsedUtc,
RevokedUtc: key.RevokedUtc); RevokedUtc: key.RevokedUtc);
} }
private static string FormatApiKey(string keyId, string secret)
{
return $"mxgw_{keyId}_{secret}";
}
private static string Required(string? value) private static string Required(string? value)
{ {
return value ?? throw new InvalidOperationException("Required command value was not provided."); return value ?? throw new InvalidOperationException("Required command value was not provided.");
@@ -1,7 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ApiKeyAuditEntry(
string? KeyId,
string EventType,
string? RemoteAddress,
string? Details);
@@ -1,9 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ApiKeyAuditRecord(
long AuditId,
string? KeyId,
string EventType,
string? RemoteAddress,
DateTimeOffset CreatedUtc,
string? Details);
@@ -1,10 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ApiKeyCreateRequest(
string KeyId,
string KeyPrefix,
byte[] SecretHash,
string DisplayName,
IReadOnlySet<string> Scopes,
ApiKeyConstraints Constraints,
DateTimeOffset CreatedUtc);
@@ -1,49 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class ApiKeyParser : IApiKeyParser
{
private const string BearerPrefix = "Bearer ";
private const string TokenPrefix = "mxgw_";
/// <summary>Attempts to parse a Bearer token from an Authorization header and extract the API key ID and secret.</summary>
/// <param name="authorizationHeader">Authorization header value to parse.</param>
/// <param name="apiKey">Parsed API key with ID and secret, or null if parsing failed.</param>
/// <returns>True if the header was successfully parsed; otherwise, false.</returns>
public bool TryParseAuthorizationHeader(string? authorizationHeader, out ParsedApiKey? apiKey)
{
apiKey = null;
if (string.IsNullOrWhiteSpace(authorizationHeader)
|| !authorizationHeader.StartsWith(BearerPrefix, StringComparison.OrdinalIgnoreCase))
{
return false;
}
string token = authorizationHeader[BearerPrefix.Length..].Trim();
if (!token.StartsWith(TokenPrefix, StringComparison.OrdinalIgnoreCase))
{
return false;
}
string keyPayload = token[TokenPrefix.Length..];
int separatorIndex = keyPayload.IndexOf('_', StringComparison.Ordinal);
if (separatorIndex <= 0 || separatorIndex == keyPayload.Length - 1)
{
return false;
}
string keyId = keyPayload[..separatorIndex];
string secret = keyPayload[(separatorIndex + 1)..];
if (string.IsNullOrWhiteSpace(keyId) || string.IsNullOrWhiteSpace(secret))
{
return false;
}
apiKey = new ParsedApiKey(keyId, secret);
return true;
}
}
@@ -1,4 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class ApiKeyPepperUnavailableException(string pepperSecretName)
: InvalidOperationException($"API key pepper secret '{pepperSecretName}' is not configured.");
@@ -1,12 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ApiKeyRecord(
string KeyId,
string KeyPrefix,
byte[] SecretHash,
string DisplayName,
IReadOnlySet<string> Scopes,
ApiKeyConstraints Constraints,
DateTimeOffset CreatedUtc,
DateTimeOffset? LastUsedUtc,
DateTimeOffset? RevokedUtc);
@@ -1,31 +0,0 @@
using Microsoft.Data.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>Reads API key records from SQLite query results.</summary>
public static class ApiKeyRecordReader
{
/// <summary>Deserializes a row from the API key table into an ApiKeyRecord.</summary>
/// <param name="reader">The data reader positioned at the API key row.</param>
/// <returns>The deserialized API key record.</returns>
public static ApiKeyRecord Read(SqliteDataReader reader)
{
return new ApiKeyRecord(
KeyId: reader.GetString(0),
KeyPrefix: reader.GetString(1),
SecretHash: (byte[])reader["secret_hash"],
DisplayName: reader.GetString(3),
Scopes: ApiKeyScopeSerializer.Deserialize(reader.GetString(4)),
Constraints: ApiKeyConstraintSerializer.Deserialize(reader.IsDBNull(5) ? null : reader.GetString(5)),
CreatedUtc: DateTimeOffset.Parse(reader.GetString(6), System.Globalization.CultureInfo.InvariantCulture),
LastUsedUtc: ReadNullableDateTimeOffset(reader, 7),
RevokedUtc: ReadNullableDateTimeOffset(reader, 8));
}
private static DateTimeOffset? ReadNullableDateTimeOffset(SqliteDataReader reader, int ordinal)
{
return reader.IsDBNull(ordinal)
? null
: DateTimeOffset.Parse(reader.GetString(ordinal), System.Globalization.CultureInfo.InvariantCulture);
}
}
@@ -1,29 +0,0 @@
using System.Text.Json;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public static class ApiKeyScopeSerializer
{
/// <summary>Serializes scopes to JSON string.</summary>
/// <param name="scopes">The scopes to serialize.</param>
/// <returns>JSON string representation.</returns>
public static string Serialize(IReadOnlySet<string> scopes)
{
return JsonSerializer.Serialize(scopes.Order(StringComparer.Ordinal));
}
/// <summary>Deserializes scopes from JSON string.</summary>
/// <param name="value">The JSON string to deserialize.</param>
/// <returns>Deserialized scopes set.</returns>
public static IReadOnlySet<string> Deserialize(string value)
{
if (string.IsNullOrWhiteSpace(value))
{
return new HashSet<string>(StringComparer.Ordinal);
}
string[]? scopes = JsonSerializer.Deserialize<string[]>(value);
return new HashSet<string>(scopes ?? [], StringComparer.Ordinal);
}
}
@@ -1,19 +0,0 @@
using System.Security.Cryptography;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>Generates cryptographically secure API key secrets.</summary>
public static class ApiKeySecretGenerator
{
/// <summary>Generates a new random API key secret string.</summary>
public static string Generate()
{
Span<byte> bytes = stackalloc byte[32];
RandomNumberGenerator.Fill(bytes);
return Convert.ToBase64String(bytes)
.TrimEnd('=')
.Replace('+', '-')
.Replace('/', '_');
}
}
@@ -1,38 +0,0 @@
using System.Security.Cryptography;
using System.Text;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Server.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class ApiKeySecretHasher(
IConfiguration configuration,
IOptions<GatewayOptions> options) : IApiKeySecretHasher
{
/// <summary>Hashes an API key secret with pepper using HMAC-SHA256.</summary>
/// <param name="secret">The secret to hash.</param>
/// <returns>The hashed secret.</returns>
public byte[] HashSecret(string secret)
{
string pepper = GetPepper();
byte[] pepperBytes = Encoding.UTF8.GetBytes(pepper);
byte[] secretBytes = Encoding.UTF8.GetBytes(secret);
using HMACSHA256 hmac = new(pepperBytes);
return hmac.ComputeHash(secretBytes);
}
private string GetPepper()
{
string pepperSecretName = options.Value.Authentication.PepperSecretName;
string? pepper = configuration[pepperSecretName];
if (string.IsNullOrWhiteSpace(pepper))
{
throw new ApiKeyPepperUnavailableException(pepperSecretName);
}
return pepper;
}
}
@@ -1,11 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public enum ApiKeyVerificationFailure
{
None,
MissingOrMalformedCredentials,
PepperUnavailable,
KeyNotFound,
KeyRevoked,
SecretMismatch
}
@@ -1,33 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ApiKeyVerificationResult(
bool Succeeded,
ApiKeyIdentity? Identity,
ApiKeyVerificationFailure Failure)
{
/// <summary>
/// Creates a successful verification result.
/// </summary>
/// <param name="identity">API key identity.</param>
/// <returns>Success result.</returns>
public static ApiKeyVerificationResult Success(ApiKeyIdentity identity)
{
return new ApiKeyVerificationResult(
Succeeded: true,
Identity: identity,
Failure: ApiKeyVerificationFailure.None);
}
/// <summary>
/// Creates a failed verification result.
/// </summary>
/// <param name="failure">Verification failure reason.</param>
/// <returns>Failure result.</returns>
public static ApiKeyVerificationResult Fail(ApiKeyVerificationFailure failure)
{
return new ApiKeyVerificationResult(
Succeeded: false,
Identity: null,
Failure: failure);
}
}
@@ -1,64 +0,0 @@
using System.Security.Cryptography;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class ApiKeyVerifier(
IApiKeyParser parser,
IApiKeySecretHasher hasher,
IApiKeyStore keyStore) : IApiKeyVerifier
{
/// <summary>
/// Verifies an API key from an authorization header asynchronously.
/// </summary>
/// <param name="authorizationHeader">Authorization header value.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Verification result.</returns>
public async Task<ApiKeyVerificationResult> VerifyAsync(
string? authorizationHeader,
CancellationToken cancellationToken)
{
if (!parser.TryParseAuthorizationHeader(authorizationHeader, out ParsedApiKey? parsedKey)
|| parsedKey is null)
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.MissingOrMalformedCredentials);
}
ApiKeyRecord? storedKey = await keyStore.FindByKeyIdAsync(parsedKey.KeyId, cancellationToken)
.ConfigureAwait(false);
if (storedKey is null)
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.KeyNotFound);
}
if (storedKey.RevokedUtc is not null)
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.KeyRevoked);
}
byte[] presentedHash;
try
{
presentedHash = hasher.HashSecret(parsedKey.Secret);
}
catch (ApiKeyPepperUnavailableException)
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.PepperUnavailable);
}
if (!CryptographicOperations.FixedTimeEquals(presentedHash, storedKey.SecretHash))
{
return ApiKeyVerificationResult.Fail(ApiKeyVerificationFailure.SecretMismatch);
}
await keyStore.MarkKeyUsedAsync(storedKey.KeyId, DateTimeOffset.UtcNow, cancellationToken)
.ConfigureAwait(false);
return ApiKeyVerificationResult.Success(new ApiKeyIdentity(
KeyId: storedKey.KeyId,
KeyPrefix: storedKey.KeyPrefix,
DisplayName: storedKey.DisplayName,
Scopes: storedKey.Scopes,
Constraints: storedKey.Constraints));
}
}
@@ -1,80 +0,0 @@
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Server.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>
/// Factory for creating SQLite connections to the authentication store.
/// </summary>
public sealed class AuthSqliteConnectionFactory(IOptions<GatewayOptions> options)
{
/// <summary>
/// Busy timeout applied to every auth-store connection. SQLite retries a busy
/// database for this long before surfacing <c>SQLITE_BUSY</c>, so the concurrent
/// <c>MarkKeyUsedAsync</c> / audit-append writers degrade gracefully under load
/// instead of failing the request path.
/// </summary>
private static readonly TimeSpan BusyTimeout = TimeSpan.FromSeconds(5);
/// <summary>
/// Creates an unopened SQLite connection to the auth database. Prefer
/// <see cref="OpenConnectionAsync"/>, which also applies WAL journaling and the
/// busy timeout.
/// </summary>
public SqliteConnection CreateConnection()
{
string sqlitePath = options.Value.Authentication.SqlitePath;
string? directory = Path.GetDirectoryName(sqlitePath);
if (!string.IsNullOrWhiteSpace(directory))
{
Directory.CreateDirectory(directory);
}
SqliteConnectionStringBuilder builder = new()
{
DataSource = sqlitePath,
Mode = SqliteOpenMode.ReadWriteCreate,
Pooling = true,
DefaultTimeout = (int)BusyTimeout.TotalSeconds,
};
return new SqliteConnection(builder.ToString());
}
/// <summary>
/// Creates a SQLite connection, opens it, and configures WAL journaling and a
/// non-zero busy timeout so concurrent readers and writers degrade gracefully
/// rather than surfacing <c>SQLITE_BUSY</c> as a hard failure.
/// </summary>
/// <param name="cancellationToken">Cancellation token for the operation.</param>
/// <returns>An opened and configured SQLite connection.</returns>
public async Task<SqliteConnection> OpenConnectionAsync(CancellationToken cancellationToken)
{
SqliteConnection connection = CreateConnection();
try
{
await connection.OpenAsync(cancellationToken).ConfigureAwait(false);
await ConfigureConnectionAsync(connection, cancellationToken).ConfigureAwait(false);
return connection;
}
catch
{
await connection.DisposeAsync().ConfigureAwait(false);
throw;
}
}
private static async Task ConfigureConnectionAsync(
SqliteConnection connection,
CancellationToken cancellationToken)
{
// WAL is a persistent, database-level setting; re-applying it per connection
// is cheap and a no-op once set. busy_timeout is per-connection state.
await using SqliteCommand command = connection.CreateCommand();
command.CommandText =
$"PRAGMA journal_mode=WAL; PRAGMA busy_timeout={(int)BusyTimeout.TotalMilliseconds};";
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
}
@@ -1,3 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class AuthStoreMigrationException(string message) : InvalidOperationException(message);
@@ -1,29 +0,0 @@
using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Server.Configuration;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>
/// Hosted service that runs authentication store migrations on startup.
/// </summary>
public sealed class AuthStoreMigrationHostedService(
IOptions<GatewayOptions> options,
IAuthStoreMigrator migrator) : IHostedService
{
/// <inheritdoc />
public async Task StartAsync(CancellationToken cancellationToken)
{
AuthenticationOptions authentication = options.Value.Authentication;
if (authentication.Mode == AuthenticationMode.ApiKey && authentication.RunMigrationsOnStartup)
{
await migrator.MigrateAsync(cancellationToken).ConfigureAwait(false);
}
}
/// <inheritdoc />
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
}
@@ -1,27 +1,109 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
using ZB.MOM.WW.Auth.ApiKeys;
using ZB.MOM.WW.Auth.ApiKeys.Admin;
using ZB.MOM.WW.Auth.ApiKeys.DependencyInjection;
using ZB.MOM.WW.Auth.ApiKeys.Sqlite;
using ZB.MOM.WW.MxGateway.Server.Security.Audit;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication; namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary> /// <summary>
/// Extension methods for configuring the SQLite authentication store. /// Extension methods for configuring the SQLite authentication store.
/// </summary> /// </summary>
/// <remarks>
/// The peppered-HMAC API-key pipeline (token format, hashing, constant-time compare, SQLite
/// schema, stores, verifier and migration) is provided by the shared
/// <c>ZB.MOM.WW.Auth.ApiKeys</c> library, of which this gateway is the donor. This wiring binds
/// the library's <see cref="ApiKeyOptions"/> from the gateway's <c>MxGateway:Authentication</c>
/// section and layers the gateway-specific constraint enforcement, gRPC interceptor, CLI and
/// dashboard on top.
/// </remarks>
public static class AuthStoreServiceCollectionExtensions public static class AuthStoreServiceCollectionExtensions
{ {
/// <summary>The configuration section the gateway binds API-key options from.</summary>
public const string AuthenticationSectionPath = "MxGateway:Authentication";
/// <summary>The gateway API-key token prefix (token format <c>mxgw_&lt;id&gt;_&lt;secret&gt;</c>).</summary>
public const string TokenPrefix = "mxgw";
/// <summary>The configuration key the API-key pepper is resolved from.</summary>
public const string PepperSecretName = "MxGateway:ApiKeyPepper";
/// <summary> /// <summary>
/// Adds the SQLite authentication store and related services to the dependency container. /// Adds the SQLite authentication store and related services to the dependency container.
/// </summary> /// </summary>
/// <param name="services">Service collection to configure.</param> /// <param name="services">Service collection to configure.</param>
/// <param name="configuration">Application configuration carrying the API-key options.</param>
/// <returns>The service collection for chaining.</returns> /// <returns>The service collection for chaining.</returns>
public static IServiceCollection AddSqliteAuthStore(this IServiceCollection services) public static IServiceCollection AddSqliteAuthStore(
this IServiceCollection services,
IConfiguration configuration)
{ {
services.AddSingleton<IApiKeyParser, ApiKeyParser>(); ArgumentNullException.ThrowIfNull(services);
services.AddSingleton<IApiKeySecretHasher, ApiKeySecretHasher>(); ArgumentNullException.ThrowIfNull(configuration);
services.AddSingleton<IApiKeyVerifier, ApiKeyVerifier>();
// Pin the gateway's API-key contract (token prefix "mxgw"; pepper resolved from
// MxGateway:ApiKeyPepper) by layering fallback defaults UNDER the supplied configuration:
// an in-memory source provides TokenPrefix/PepperSecretName only when the bound
// MxGateway:Authentication section omits them (the section has no TokenPrefix, and the pepper
// is intentionally not in appsettings — it is supplied at runtime). Explicit config wins
// because it is added last. ApiKeyOptions is an init-only record, so the values must be
// present at bind time rather than mutated post-configure.
IConfiguration effectiveConfig = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
[$"{AuthenticationSectionPath}:TokenPrefix"] = TokenPrefix,
[$"{AuthenticationSectionPath}:PepperSecretName"] = PepperSecretName,
})
.AddConfiguration(configuration)
.Build();
// Register the shared API-key provider: binds ApiKeyOptions from MxGateway:Authentication,
// wires up the SQLite stores, the configuration-backed pepper provider, the verifier, the
// migrator and the migration hosted service.
services.AddZbApiKeyAuth(effectiveConfig, AuthenticationSectionPath);
// Canonical audit (Task 2.3 — DEEP adopt ZB.MOM.WW.Audit). All MxGateway audit flows as a
// canonical AuditEvent through the library IAuditWriter, persisted in a NEW gateway-owned
// audit_event table that lives in the SAME SQLite DB file as the api-key stores (it reuses
// the library's AuthSqliteConnectionFactory, registered by AddZbApiKeyAuth above).
services.AddSingleton(sp =>
new SqliteCanonicalAuditStore(sp.GetRequiredService<AuthSqliteConnectionFactory>()));
// Resolve the logger defensively: the production host always registers ILogger<T>, but the
// DI-only auth/CLI/dashboard unit tests build a bare ServiceCollection without AddLogging().
// Fall back to NullLogger there so the audit writer (and the IApiKeyAuditStore override that
// depends on it) still resolve. The write path is best-effort regardless.
services.AddSingleton<IAuditWriter>(sp =>
new CanonicalAuditWriter(
sp.GetRequiredService<SqliteCanonicalAuditStore>(),
sp.GetService<ILogger<CanonicalAuditWriter>>()
?? Microsoft.Extensions.Logging.Abstractions.NullLogger<CanonicalAuditWriter>.Instance));
// OVERRIDE the library's IApiKeyAuditStore (AddZbApiKeyAuth registered the library's
// SqliteApiKeyAuditStore via TryAddSingleton) with an adapter that canonicalizes every
// library-emitted ApiKeyAuditEntry onto AuditEvent and forwards it through IAuditWriter.
// This is the only way to canonicalize the library-internal ApiKeyAdminCommands verbs
// (create/revoke/rotate/init-db, etc.), which we cannot edit. The adapter is registered
// after AddZbApiKeyAuth so it (last registration) is what ApiKeyAdminCommands resolves and
// what the dashboard "recent audit" view reads via IApiKeyAuditStore.ListRecentAsync.
// The library's api_key_audit table is left in place but is now UNUSED — nothing writes to
// it once this adapter replaces the library's audit store.
services.AddSingleton<IApiKeyAuditStore, CanonicalForwardingApiKeyAuditStore>();
// The shared admin command set (create/revoke/rotate/list/init-db with audit) is not
// auto-registered by AddZbApiKeyAuth; the gateway CLI and dashboard drive it, so register
// it here over the already-wired stores, pepper provider and migrator.
services.AddSingleton(sp => new ApiKeyAdminCommands(
sp.GetRequiredService<IOptions<ApiKeyOptions>>().Value,
sp.GetRequiredService<IApiKeyAdminStore>(),
sp.GetRequiredService<IApiKeyAuditStore>(),
sp.GetRequiredService<IApiKeyPepperProvider>(),
sp.GetRequiredService<SqliteAuthStoreMigrator>()));
services.AddSingleton<ApiKeyAdminCliRunner>(); services.AddSingleton<ApiKeyAdminCliRunner>();
services.AddSingleton<AuthSqliteConnectionFactory>();
services.AddSingleton<IAuthStoreMigrator, SqliteAuthStoreMigrator>();
services.AddSingleton<IApiKeyStore, SqliteApiKeyStore>();
services.AddSingleton<IApiKeyAdminStore, SqliteApiKeyAdminStore>();
services.AddSingleton<IApiKeyAuditStore, SqliteApiKeyAuditStore>();
services.AddHostedService<AuthStoreMigrationHostedService>();
return services; return services;
} }
@@ -0,0 +1,43 @@
using LibApiKeyIdentity = ZB.MOM.WW.Auth.Abstractions.ApiKeys.ApiKeyIdentity;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>
/// Maps the shared <see cref="ZB.MOM.WW.Auth.Abstractions.ApiKeys.ApiKeyIdentity"/> (which
/// carries the key's scopes plus the opaque constraints JSON blob) onto the gateway's
/// <see cref="ApiKeyIdentity"/> (which exposes the deserialized
/// <see cref="ApiKeyConstraints"/> the downstream authorization code enforces).
/// </summary>
/// <remarks>
/// The shared verifier does not interpret the constraints column; it returns the stored
/// JSON verbatim in <see cref="ZB.MOM.WW.Auth.Abstractions.ApiKeys.ApiKeyIdentity.Constraints"/>.
/// This mapper re-hydrates it via <see cref="ApiKeyConstraintSerializer"/> so the gateway's
/// constraint enforcement (<c>ConstraintEnforcer</c>) and request-identity accessor continue
/// to operate on the strongly-typed model unchanged.
/// </remarks>
public static class GatewayApiKeyIdentityMapper
{
/// <summary>
/// Converts a shared API-key identity into the gateway identity, deserializing the opaque
/// constraints JSON into <see cref="ApiKeyConstraints"/>.
/// </summary>
/// <param name="identity">The shared identity returned by the library verifier.</param>
/// <returns>The gateway identity carrying the effective constraints.</returns>
public static ApiKeyIdentity ToGatewayIdentity(LibApiKeyIdentity identity)
{
ArgumentNullException.ThrowIfNull(identity);
// The library stores the opaque constraints blob in Constraints as the ConstraintsJson
// string (or null when the key has no constraints).
string? constraintsJson = identity.Constraints as string;
return new ApiKeyIdentity(
KeyId: identity.KeyId,
// The gateway token prefix is fixed ("mxgw"); the key id is its own field. KeyPrefix
// is retained only for surface compatibility with the gateway identity record.
KeyPrefix: "mxgw",
DisplayName: identity.DisplayName,
Scopes: identity.Scopes,
Constraints: ApiKeyConstraintSerializer.Deserialize(constraintsJson));
}
}
@@ -1,53 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public interface IApiKeyAdminStore
{
/// <summary>
/// Creates a new API key asynchronously.
/// </summary>
/// <param name="request">API key creation request.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Completed task.</returns>
Task CreateAsync(ApiKeyCreateRequest request, CancellationToken cancellationToken);
/// <summary>
/// Lists all API keys asynchronously.
/// </summary>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>List of API key records.</returns>
Task<IReadOnlyList<ApiKeyRecord>> ListAsync(CancellationToken cancellationToken);
/// <summary>
/// Revokes an API key asynchronously.
/// </summary>
/// <param name="keyId">Key identifier.</param>
/// <param name="revokedUtc">Revocation timestamp.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>True if revoked; otherwise false.</returns>
Task<bool> RevokeAsync(string keyId, DateTimeOffset revokedUtc, CancellationToken cancellationToken);
/// <summary>
/// Rotates an API key secret asynchronously.
/// </summary>
/// <param name="keyId">Key identifier.</param>
/// <param name="secretHash">New secret hash.</param>
/// <param name="rotatedUtc">Rotation timestamp.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>True if rotated; otherwise false.</returns>
Task<bool> RotateAsync(
string keyId,
byte[] secretHash,
DateTimeOffset rotatedUtc,
CancellationToken cancellationToken);
/// <summary>
/// Permanently deletes an API key, but only if it is already revoked. Active keys are
/// untouched (returns false) so an admin cannot delete a working credential without
/// first revoking it — that preserves the audit trail and forces the revoke event to
/// land in the audit log before the row disappears.
/// </summary>
/// <param name="keyId">Key identifier.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>True if a revoked key was deleted; false if the key is missing or active.</returns>
Task<bool> DeleteAsync(string keyId, CancellationToken cancellationToken);
}
@@ -1,23 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>
/// Stores and retrieves audit events for API key operations.
/// </summary>
public interface IApiKeyAuditStore
{
/// <summary>
/// Appends an audit entry to the audit log.
/// </summary>
/// <param name="entry">Audit entry to record.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Asynchronous task representing the append operation.</returns>
Task AppendAsync(ApiKeyAuditEntry entry, CancellationToken cancellationToken);
/// <summary>
/// Lists the most recent audit entries, up to the specified count.
/// </summary>
/// <param name="count">Maximum number of entries to return.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Asynchronous task returning the list of audit records.</returns>
Task<IReadOnlyList<ApiKeyAuditRecord>> ListRecentAsync(int count, CancellationToken cancellationToken);
}
@@ -1,9 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public interface IApiKeyParser
{
/// <summary>Attempts to parse an authorization header and extract the API key.</summary>
/// <param name="authorizationHeader">Authorization header value to parse.</param>
/// <param name="apiKey">Parsed API key if successful.</param>
bool TryParseAuthorizationHeader(string? authorizationHeader, out ParsedApiKey? apiKey);
}
@@ -1,8 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public interface IApiKeySecretHasher
{
/// <summary>Hashes an API key secret and returns the hash bytes.</summary>
/// <param name="secret">API key secret to hash.</param>
byte[] HashSecret(string secret);
}
@@ -1,21 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>Persists API keys and audit records for authentication and accounting.</summary>
public interface IApiKeyStore
{
/// <summary>Retrieves an API key by ID regardless of revocation status.</summary>
/// <param name="keyId">Identifier of the API key.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task<ApiKeyRecord?> FindByKeyIdAsync(string keyId, CancellationToken cancellationToken);
/// <summary>Retrieves an active (non-revoked) API key by ID.</summary>
/// <param name="keyId">Identifier of the API key.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task<ApiKeyRecord?> FindActiveByKeyIdAsync(string keyId, CancellationToken cancellationToken);
/// <summary>Records that an API key was used for auditing and tracking.</summary>
/// <param name="keyId">Identifier of the API key.</param>
/// <param name="usedUtc">Timestamp when the key was used in UTC.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task MarkKeyUsedAsync(string keyId, DateTimeOffset usedUtc, CancellationToken cancellationToken);
}
@@ -1,12 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>Verifies API key authorization headers and returns the authenticated identity.</summary>
public interface IApiKeyVerifier
{
/// <summary>Parses and verifies an authorization header, returning success with identity or a failure reason.</summary>
/// <param name="authorizationHeader">The authorization header value to verify.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task<ApiKeyVerificationResult> VerifyAsync(
string? authorizationHeader,
CancellationToken cancellationToken);
}
@@ -1,10 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>Migrates authentication storage between versions.</summary>
public interface IAuthStoreMigrator
{
/// <summary>Performs authentication store migration asynchronously.</summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Asynchronous task representing the migration operation.</returns>
Task MigrateAsync(CancellationToken cancellationToken);
}
@@ -1,3 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed record ParsedApiKey(string KeyId, string Secret);
@@ -1,141 +0,0 @@
using Microsoft.Data.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>
/// SQLite-backed storage for API key administration (create, list, revoke, rotate).
/// </summary>
public sealed class SqliteApiKeyAdminStore(AuthSqliteConnectionFactory connectionFactory) : IApiKeyAdminStore
{
/// <inheritdoc />
public async Task CreateAsync(ApiKeyCreateRequest request, CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
INSERT INTO api_keys (
key_id,
key_prefix,
secret_hash,
display_name,
scopes,
constraints,
created_utc,
last_used_utc,
revoked_utc)
VALUES (
$key_id,
$key_prefix,
$secret_hash,
$display_name,
$scopes,
$constraints,
$created_utc,
NULL,
NULL);
""";
AddCreateParameters(command, request);
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<ApiKeyRecord>> ListAsync(CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
SELECT key_id, key_prefix, secret_hash, display_name, scopes, constraints, created_utc, last_used_utc, revoked_utc
FROM api_keys
ORDER BY key_id;
""";
List<ApiKeyRecord> records = [];
await using SqliteDataReader reader = await command.ExecuteReaderAsync(cancellationToken)
.ConfigureAwait(false);
while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
records.Add(ApiKeyRecordReader.Read(reader));
}
return records;
}
/// <inheritdoc />
public async Task<bool> RevokeAsync(string keyId, DateTimeOffset revokedUtc, CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
UPDATE api_keys
SET revoked_utc = $revoked_utc
WHERE key_id = $key_id AND revoked_utc IS NULL;
""";
command.Parameters.AddWithValue("$key_id", keyId);
command.Parameters.AddWithValue("$revoked_utc", revokedUtc.ToString("O"));
int rows = await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> RotateAsync(
string keyId,
byte[] secretHash,
DateTimeOffset rotatedUtc,
CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
UPDATE api_keys
SET secret_hash = $secret_hash,
last_used_utc = NULL,
revoked_utc = NULL
WHERE key_id = $key_id;
""";
command.Parameters.AddWithValue("$key_id", keyId);
command.Parameters.Add("$secret_hash", SqliteType.Blob).Value = secretHash;
int rows = await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> DeleteAsync(string keyId, CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
DELETE FROM api_keys
WHERE key_id = $key_id AND revoked_utc IS NOT NULL;
""";
command.Parameters.AddWithValue("$key_id", keyId);
int rows = await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
return rows > 0;
}
private static void AddCreateParameters(SqliteCommand command, ApiKeyCreateRequest request)
{
command.Parameters.AddWithValue("$key_id", request.KeyId);
command.Parameters.AddWithValue("$key_prefix", request.KeyPrefix);
command.Parameters.Add("$secret_hash", SqliteType.Blob).Value = request.SecretHash;
command.Parameters.AddWithValue("$display_name", request.DisplayName);
command.Parameters.AddWithValue("$scopes", ApiKeyScopeSerializer.Serialize(request.Scopes));
command.Parameters.AddWithValue(
"$constraints",
(object?)ApiKeyConstraintSerializer.Serialize(request.Constraints) ?? DBNull.Value);
command.Parameters.AddWithValue("$created_utc", request.CreatedUtc.ToString("O"));
}
}
@@ -1,65 +0,0 @@
using Microsoft.Data.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class SqliteApiKeyAuditStore(AuthSqliteConnectionFactory connectionFactory) : IApiKeyAuditStore
{
/// <inheritdoc />
public async Task AppendAsync(ApiKeyAuditEntry entry, CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
INSERT INTO api_key_audit (key_id, event_type, remote_address, created_utc, details)
VALUES ($key_id, $event_type, $remote_address, $created_utc, $details);
""";
command.Parameters.AddWithValue("$key_id", (object?)entry.KeyId ?? DBNull.Value);
command.Parameters.AddWithValue("$event_type", entry.EventType);
command.Parameters.AddWithValue("$remote_address", (object?)entry.RemoteAddress ?? DBNull.Value);
command.Parameters.AddWithValue("$created_utc", DateTimeOffset.UtcNow.ToString("O"));
command.Parameters.AddWithValue("$details", (object?)entry.Details ?? DBNull.Value);
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<ApiKeyAuditRecord>> ListRecentAsync(int count, CancellationToken cancellationToken)
{
if (count <= 0)
{
return [];
}
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
SELECT audit_id, key_id, event_type, remote_address, created_utc, details
FROM api_key_audit
ORDER BY audit_id DESC
LIMIT $count;
""";
command.Parameters.AddWithValue("$count", count);
List<ApiKeyAuditRecord> records = [];
await using SqliteDataReader reader = await command.ExecuteReaderAsync(cancellationToken)
.ConfigureAwait(false);
while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
records.Add(new ApiKeyAuditRecord(
AuditId: reader.GetInt64(0),
KeyId: reader.IsDBNull(1) ? null : reader.GetString(1),
EventType: reader.GetString(2),
RemoteAddress: reader.IsDBNull(3) ? null : reader.GetString(3),
CreatedUtc: DateTimeOffset.Parse(
reader.GetString(4),
System.Globalization.CultureInfo.InvariantCulture),
Details: reader.IsDBNull(5) ? null : reader.GetString(5)));
}
return records;
}
}
@@ -1,68 +0,0 @@
using Microsoft.Data.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
/// <summary>SQLite-based store for API key records.</summary>
public sealed class SqliteApiKeyStore(AuthSqliteConnectionFactory connectionFactory) : IApiKeyStore
{
/// <inheritdoc />
public Task<ApiKeyRecord?> FindByKeyIdAsync(string keyId, CancellationToken cancellationToken)
{
return FindByKeyIdAsync(keyId, requireActive: false, cancellationToken);
}
/// <inheritdoc />
public Task<ApiKeyRecord?> FindActiveByKeyIdAsync(string keyId, CancellationToken cancellationToken)
{
return FindByKeyIdAsync(keyId, requireActive: true, cancellationToken);
}
/// <inheritdoc />
public async Task MarkKeyUsedAsync(string keyId, DateTimeOffset usedUtc, CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
UPDATE api_keys
SET last_used_utc = $last_used_utc
WHERE key_id = $key_id AND revoked_utc IS NULL;
""";
command.Parameters.AddWithValue("$key_id", keyId);
command.Parameters.AddWithValue("$last_used_utc", usedUtc.ToString("O"));
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
private async Task<ApiKeyRecord?> FindByKeyIdAsync(
string keyId,
bool requireActive,
CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = requireActive
? """
SELECT key_id, key_prefix, secret_hash, display_name, scopes, constraints, created_utc, last_used_utc, revoked_utc
FROM api_keys
WHERE key_id = $key_id AND revoked_utc IS NULL;
"""
: """
SELECT key_id, key_prefix, secret_hash, display_name, scopes, constraints, created_utc, last_used_utc, revoked_utc
FROM api_keys
WHERE key_id = $key_id;
""";
command.Parameters.AddWithValue("$key_id", keyId);
await using SqliteDataReader reader = await command.ExecuteReaderAsync(cancellationToken)
.ConfigureAwait(false);
if (!await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
return null;
}
return ApiKeyRecordReader.Read(reader);
}
}
@@ -1,12 +0,0 @@
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public static class SqliteAuthSchema
{
public const int CurrentVersion = 2;
public const string SchemaVersionTable = "schema_version";
public const string ApiKeysTable = "api_keys";
public const string ApiKeyAuditTable = "api_key_audit";
}
@@ -1,192 +0,0 @@
using Microsoft.Data.Sqlite;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authentication;
public sealed class SqliteAuthStoreMigrator(AuthSqliteConnectionFactory connectionFactory) : IAuthStoreMigrator
{
/// <summary>Applies database migrations to the authentication store.</summary>
/// <param name="cancellationToken">Cancellation token.</param>
public async Task MigrateAsync(CancellationToken cancellationToken)
{
await using SqliteConnection connection = await connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
await using SqliteTransaction transaction =
(SqliteTransaction)await connection.BeginTransactionAsync(cancellationToken).ConfigureAwait(false);
int existingVersion = await ReadExistingSchemaVersionAsync(connection, transaction, cancellationToken)
.ConfigureAwait(false);
if (existingVersion > SqliteAuthSchema.CurrentVersion)
{
throw new AuthStoreMigrationException(
$"Auth database schema version {existingVersion} is newer than supported version {SqliteAuthSchema.CurrentVersion}.");
}
await ApplyVersionOneAsync(connection, transaction, cancellationToken).ConfigureAwait(false);
await ApplyVersionTwoAsync(connection, transaction, cancellationToken).ConfigureAwait(false);
await WriteSchemaVersionAsync(connection, transaction, cancellationToken).ConfigureAwait(false);
await transaction.CommitAsync(cancellationToken).ConfigureAwait(false);
}
private static async Task<int> ReadExistingSchemaVersionAsync(
SqliteConnection connection,
SqliteTransaction transaction,
CancellationToken cancellationToken)
{
await using SqliteCommand tableExistsCommand = connection.CreateCommand();
tableExistsCommand.Transaction = transaction;
tableExistsCommand.CommandText = """
SELECT COUNT(*)
FROM sqlite_master
WHERE type = 'table' AND name = $table_name;
""";
tableExistsCommand.Parameters.AddWithValue("$table_name", SqliteAuthSchema.SchemaVersionTable);
long tableCount = (long)(await tableExistsCommand.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false) ?? 0L);
if (tableCount == 0)
{
return 0;
}
await using SqliteCommand versionCommand = connection.CreateCommand();
versionCommand.Transaction = transaction;
versionCommand.CommandText = """
SELECT version
FROM schema_version
WHERE id = 1;
""";
object? version = await versionCommand.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);
return version is null || version == DBNull.Value
? 0
: Convert.ToInt32(version, System.Globalization.CultureInfo.InvariantCulture);
}
private static async Task ApplyVersionOneAsync(
SqliteConnection connection,
SqliteTransaction transaction,
CancellationToken cancellationToken)
{
await ExecuteNonQueryAsync(
connection,
transaction,
"""
CREATE TABLE IF NOT EXISTS schema_version (
id INTEGER PRIMARY KEY CHECK (id = 1),
version INTEGER NOT NULL,
applied_utc TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS api_keys (
key_id TEXT PRIMARY KEY,
key_prefix TEXT NOT NULL,
secret_hash BLOB NOT NULL,
display_name TEXT NOT NULL,
scopes TEXT NOT NULL,
constraints TEXT NULL,
created_utc TEXT NOT NULL,
last_used_utc TEXT NULL,
revoked_utc TEXT NULL
);
CREATE TABLE IF NOT EXISTS api_key_audit (
audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
key_id TEXT NULL,
event_type TEXT NOT NULL,
remote_address TEXT NULL,
created_utc TEXT NOT NULL,
details TEXT NULL
);
CREATE INDEX IF NOT EXISTS ix_api_keys_revoked_utc
ON api_keys (revoked_utc);
CREATE INDEX IF NOT EXISTS ix_api_key_audit_key_id_created_utc
ON api_key_audit (key_id, created_utc);
""",
cancellationToken).ConfigureAwait(false);
}
private static async Task ApplyVersionTwoAsync(
SqliteConnection connection,
SqliteTransaction transaction,
CancellationToken cancellationToken)
{
if (await ColumnExistsAsync(connection, transaction, SqliteAuthSchema.ApiKeysTable, "constraints", cancellationToken)
.ConfigureAwait(false))
{
return;
}
await ExecuteNonQueryAsync(
connection,
transaction,
"""
ALTER TABLE api_keys
ADD COLUMN constraints TEXT NULL;
""",
cancellationToken).ConfigureAwait(false);
}
private static async Task WriteSchemaVersionAsync(
SqliteConnection connection,
SqliteTransaction transaction,
CancellationToken cancellationToken)
{
await using SqliteCommand versionCommand = connection.CreateCommand();
versionCommand.Transaction = transaction;
versionCommand.CommandText = """
INSERT INTO schema_version (id, version, applied_utc)
VALUES (1, $version, $applied_utc)
ON CONFLICT(id) DO UPDATE SET
version = excluded.version,
applied_utc = excluded.applied_utc;
""";
versionCommand.Parameters.AddWithValue("$version", SqliteAuthSchema.CurrentVersion);
versionCommand.Parameters.AddWithValue("$applied_utc", DateTimeOffset.UtcNow.ToString("O"));
await versionCommand.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
private static async Task<bool> ColumnExistsAsync(
SqliteConnection connection,
SqliteTransaction transaction,
string tableName,
string columnName,
CancellationToken cancellationToken)
{
await using SqliteCommand command = connection.CreateCommand();
command.Transaction = transaction;
command.CommandText = $"PRAGMA table_info({tableName});";
await using SqliteDataReader reader = await command.ExecuteReaderAsync(cancellationToken)
.ConfigureAwait(false);
while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
if (string.Equals(reader.GetString(1), columnName, StringComparison.OrdinalIgnoreCase))
{
return true;
}
}
return false;
}
private static async Task ExecuteNonQueryAsync(
SqliteConnection connection,
SqliteTransaction transaction,
string commandText,
CancellationToken cancellationToken)
{
await using SqliteCommand command = connection.CreateCommand();
command.Transaction = transaction;
command.CommandText = commandText;
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
}
@@ -1,13 +1,20 @@
using System.Text.Json;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy; using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Server.Galaxy; using ZB.MOM.WW.MxGateway.Server.Galaxy;
using ZB.MOM.WW.MxGateway.Server.Security.Audit;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication; using ZB.MOM.WW.MxGateway.Server.Security.Authentication;
using ZB.MOM.WW.MxGateway.Server.Sessions; using ZB.MOM.WW.MxGateway.Server.Sessions;
// The gateway carries its own constraint-bearing identity downstream; the shared library also
// defines an ApiKeyIdentity (scopes + opaque constraints JSON), so disambiguate to the gateway type.
using ApiKeyIdentity = ZB.MOM.WW.MxGateway.Server.Security.Authentication.ApiKeyIdentity;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authorization; namespace ZB.MOM.WW.MxGateway.Server.Security.Authorization;
public sealed class ConstraintEnforcer( public sealed class ConstraintEnforcer(
IGalaxyHierarchyCache cache, IGalaxyHierarchyCache cache,
IApiKeyAuditStore auditStore) : IConstraintEnforcer IAuditWriter auditWriter) : IConstraintEnforcer
{ {
/// <summary>Checks read constraints on a tag address.</summary> /// <summary>Checks read constraints on a tag address.</summary>
/// <param name="identity">The API key identity to check constraints for.</param> /// <param name="identity">The API key identity to check constraints for.</param>
@@ -121,14 +128,33 @@ public sealed class ConstraintEnforcer(
ConstraintFailure failure, ConstraintFailure failure,
CancellationToken cancellationToken) CancellationToken cancellationToken)
{ {
await auditStore.AppendAsync( // Emit a canonical Denied AuditEvent directly through the best-effort IAuditWriter
new ApiKeyAuditEntry( // (Task 2.3 #6): structured Target ("<commandKind>:<target>") and a richer DetailsJson
KeyId: identity?.KeyId, // envelope carrying constraint/message/commandKind/target.
EventType: "constraint-denied", // TODO(Task 2.3): CorrelationId is left null here. Threading the per-request
RemoteAddress: null, // ClientCorrelationId down to RecordDenialAsync would require an invasive IConstraintEnforcer
Details: $"{commandKind}: {target}: {failure.ConstraintName}: {failure.Message}"), // signature change across the gRPC call path; that is deferred to a follow-up.
cancellationToken) AuditEvent auditEvent = new()
.ConfigureAwait(false); {
EventId = Guid.NewGuid(),
OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = identity?.KeyId ?? "anonymous",
Action = "constraint-denied",
Outcome = AuditOutcome.Denied,
Category = CanonicalForwardingApiKeyAuditStore.ApiKeyCategory,
Target = $"{commandKind}:{target}",
SourceNode = null,
CorrelationId = null,
DetailsJson = JsonSerializer.Serialize(new Dictionary<string, string>
{
["constraint"] = failure.ConstraintName,
["message"] = failure.Message,
["commandKind"] = commandKind,
["target"] = target,
}),
};
await auditWriter.WriteAsync(auditEvent, cancellationToken).ConfigureAwait(false);
} }
private ConstraintFailure? CheckReadTarget( private ConstraintFailure? CheckReadTarget(
@@ -1,9 +1,14 @@
using Grpc.Core; using Grpc.Core;
using Grpc.Core.Interceptors; using Grpc.Core.Interceptors;
using Microsoft.Extensions.Options; using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
using ZB.MOM.WW.MxGateway.Server.Configuration; using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication; using ZB.MOM.WW.MxGateway.Server.Security.Authentication;
// The handler pushes the gateway's constraint-bearing identity; alias away the shared library's
// ApiKeyIdentity so the unqualified name resolves to the gateway type.
using ApiKeyIdentity = ZB.MOM.WW.MxGateway.Server.Security.Authentication.ApiKeyIdentity;
namespace ZB.MOM.WW.MxGateway.Server.Security.Authorization; namespace ZB.MOM.WW.MxGateway.Server.Security.Authorization;
public sealed class GatewayGrpcAuthorizationInterceptor( public sealed class GatewayGrpcAuthorizationInterceptor(
@@ -57,25 +62,31 @@ public sealed class GatewayGrpcAuthorizationInterceptor(
} }
string? authorizationHeader = context.RequestHeaders.GetValue("authorization"); string? authorizationHeader = context.RequestHeaders.GetValue("authorization");
ApiKeyVerificationResult verificationResult = await apiKeyVerifier
.VerifyAsync(authorizationHeader, context.CancellationToken) // The shared verifier owns parse + pepper + lookup + revocation + constant-time compare,
// returning a discriminated failure rather than throwing. Every authentication failure maps
// to Unauthenticated with an opaque message; the client never learns which stage failed.
ApiKeyVerification verification = await apiKeyVerifier
.VerifyAsync(authorizationHeader ?? string.Empty, context.CancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
if (!verificationResult.Succeeded || verificationResult.Identity is null) if (!verification.Succeeded || verification.Identity is null)
{ {
throw new RpcException(new Status( throw new RpcException(new Status(
StatusCode.Unauthenticated, StatusCode.Unauthenticated,
"Missing or invalid API key.")); "Missing or invalid API key."));
} }
ApiKeyIdentity identity = GatewayApiKeyIdentityMapper.ToGatewayIdentity(verification.Identity);
string requiredScope = scopeResolver.ResolveRequiredScope(request); string requiredScope = scopeResolver.ResolveRequiredScope(request);
if (!verificationResult.Identity.Scopes.Contains(requiredScope)) if (!identity.Scopes.Contains(requiredScope))
{ {
throw new RpcException(new Status( throw new RpcException(new Status(
StatusCode.PermissionDenied, StatusCode.PermissionDenied,
$"API key is missing required scope '{requiredScope}'.")); $"API key is missing required scope '{requiredScope}'."));
} }
return verificationResult.Identity; return identity;
} }
} }
@@ -6,10 +6,22 @@
<ItemGroup> <ItemGroup>
<PackageReference Include="Grpc.AspNetCore" Version="2.76.0" /> <PackageReference Include="Grpc.AspNetCore" Version="2.76.0" />
<PackageReference Include="ZB.MOM.WW.Auth.Abstractions" Version="0.1.2" />
<PackageReference Include="ZB.MOM.WW.Auth.Ldap" Version="0.1.2" />
<PackageReference Include="ZB.MOM.WW.Auth.ApiKeys" Version="0.1.2" />
<PackageReference Include="ZB.MOM.WW.Auth.AspNetCore" Version="0.1.2" />
<PackageReference Include="ZB.MOM.WW.Audit" Version="0.1.0" />
<PackageReference Include="ZB.MOM.WW.Theme" Version="0.3.1" />
<PackageReference Include="ZB.MOM.WW.Configuration" Version="0.1.0" />
<PackageReference Include="ZB.MOM.WW.Health" Version="0.1.0" />
<PackageReference Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
<PackageReference Include="Serilog.AspNetCore" Version="10.0.0" />
<PackageReference Include="Serilog.Sinks.Console" Version="6.1.1" />
<PackageReference Include="Serilog.Sinks.File" Version="7.0.0" />
<PackageReference Include="Microsoft.AspNetCore.SignalR.Client" Version="10.0.0" /> <PackageReference Include="Microsoft.AspNetCore.SignalR.Client" Version="10.0.0" />
<PackageReference Include="Microsoft.Data.Sqlite" Version="10.0.7" /> <PackageReference Include="Microsoft.Data.Sqlite" Version="10.0.7" />
<PackageReference Include="Microsoft.Data.SqlClient" Version="6.0.2" /> <PackageReference Include="Microsoft.Data.SqlClient" Version="6.0.2" />
<PackageReference Include="Novell.Directory.Ldap.NETStandard" Version="3.6.0" />
<PackageReference Include="Polly.Core" Version="8.6.6" /> <PackageReference Include="Polly.Core" Version="8.6.6" />
</ItemGroup> </ItemGroup>
@@ -1,8 +1,13 @@
{ {
"Logging": { "Serilog": {
"LogLevel": { "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
"MinimumLevel": {
"Default": "Information", "Default": "Information",
"Microsoft.AspNetCore": "Warning" "Override": { "Microsoft.AspNetCore": "Warning" }
} },
"WriteTo": [
{ "Name": "Console" },
{ "Name": "File", "Args": { "path": "logs/mxgateway-.log", "rollingInterval": "Day" } }
]
} }
} }
@@ -1,9 +1,14 @@
{ {
"Logging": { "Serilog": {
"LogLevel": { "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
"MinimumLevel": {
"Default": "Information", "Default": "Information",
"Microsoft.AspNetCore": "Warning" "Override": { "Microsoft.AspNetCore": "Warning" }
} },
"WriteTo": [
{ "Name": "Console" },
{ "Name": "File", "Args": { "path": "logs/mxgateway-.log", "rollingInterval": "Day" } }
]
}, },
"AllowedHosts": "*", "AllowedHosts": "*",
"MxGateway": { "MxGateway": {
@@ -17,10 +22,10 @@
"Enabled": true, "Enabled": true,
"Server": "localhost", "Server": "localhost",
"Port": 3893, "Port": 3893,
"UseTls": false, "Transport": "None",
"AllowInsecureLdap": true, "AllowInsecure": true,
"SearchBase": "dc=lmxopcua,dc=local", "SearchBase": "dc=zb,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local", "ServiceAccountDn": "cn=serviceaccount,dc=zb,dc=local",
"ServiceAccountPassword": "serviceaccount123", "ServiceAccountPassword": "serviceaccount123",
"UserNameAttribute": "cn", "UserNameAttribute": "cn",
"DisplayNameAttribute": "cn", "DisplayNameAttribute": "cn",
@@ -55,7 +60,7 @@
"RecentSessionLimit": 200, "RecentSessionLimit": 200,
"ShowTagValues": false, "ShowTagValues": false,
"GroupToRole": { "GroupToRole": {
"GwAdmin": "Admin", "GwAdmin": "Administrator",
"GwReader": "Viewer" "GwReader": "Viewer"
} }
}, },
@@ -9,108 +9,13 @@
body.dashboard-body { min-height: 100vh; } body.dashboard-body { min-height: 100vh; }
/* App bar /* App bar
theme.css styles .app-bar / .brand / .mark / .spacer. Here we centre the row The kit's theme.css styles .app-bar / .brand / .mark / .spacer; these rules
and add the meta cluster. Navigation lives in the side rail below. */ centre the row and tweak the brand. Used by the minimal /denied page header
the main dashboard's navigation lives in the kit side-rail shell (ThemeShell). */
.app-bar { align-items: center; gap: 1rem; } .app-bar { align-items: center; gap: 1rem; }
.app-bar .brand { color: var(--ink); } .app-bar .brand { color: var(--ink); }
.app-bar .brand:hover { text-decoration: none; } .app-bar .brand:hover { text-decoration: none; }
/* Sidebar
Left-rail navigation. Pattern lifted from ScadaLink CentralUI: a fixed-width
sidebar that hosts the brand at the top, a scrollable nav region with
collapsible NavSections in the middle, and a sign-in/out footer at the
bottom. The sidebar is wrapped in a Bootstrap .collapse so a hamburger
button can show/hide it on <lg viewports. */
.sidebar {
min-width: 220px;
max-width: 220px;
height: 100vh;
background: var(--card);
border-right: 1px solid var(--rule-strong);
}
/* Pin the sidebar to the viewport on lg+ so it stays visible when the main
content scrolls past 100vh. The wrapper is the flex child of MainLayout;
align-self prevents the flex row from stretching it. */
@media (min-width: 992px) {
#sidebar-collapse {
position: sticky;
top: 0;
height: 100vh;
align-self: flex-start;
z-index: 1020;
}
}
/* When collapsed under <lg viewports the Bootstrap collapse container removes
the fixed width; restore full width on mobile. */
@media (max-width: 991.98px) {
.sidebar {
min-width: 100%;
max-width: 100%;
height: auto;
}
}
.sidebar .brand {
display: block;
color: var(--ink);
font-size: 1.1rem;
font-weight: 600;
letter-spacing: 0.02em;
padding: 1rem;
border-bottom: 1px solid var(--rule);
text-decoration: none;
}
.sidebar .brand:hover { text-decoration: none; }
.sidebar .brand .mark { color: var(--accent); }
.sidebar .nav-link {
color: var(--ink-soft);
padding: 0.4rem 1rem;
font-size: 0.9rem;
}
.sidebar .nav-link:hover {
color: var(--ink);
background-color: var(--paper);
text-decoration: none;
}
.sidebar .nav-link.active {
color: var(--accent-deep);
background-color: var(--paper);
font-weight: 600;
/* Left accent so active state isn't carried by colour alone. */
border-left: 3px solid var(--accent);
padding-left: calc(1rem - 3px);
}
/* Collapsible section header a full-width button styled as an uppercase
eyebrow with a leading expand/collapse chevron. */
.sidebar .nav-section-toggle {
display: flex;
align-items: center;
gap: 0.4rem;
width: 100%;
background: none;
border: 0;
cursor: pointer;
text-align: left;
color: var(--ink-faint);
font-size: 0.7rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.07em;
padding: 0.75rem 1rem 0.25rem;
margin-top: 0.5rem;
}
.sidebar .nav-section-toggle:hover { color: var(--ink); }
.sidebar .nav-section-toggle .chevron {
display: inline-block;
width: 0.8rem;
font-size: 0.8rem;
line-height: 1;
}
/* Page header /* Page header
h1 in sans, the sub-line in monospace as a quiet meta crumb. */ h1 in sans, the sub-line in monospace as a quiet meta crumb. */
.dashboard-page-header { .dashboard-page-header {
@@ -337,14 +242,6 @@ code {
.alert-success { color: var(--ok); background: var(--ok-bg); border-color: #c6e6cd; } .alert-success { color: var(--ok); background: var(--ok-bg); border-color: #c6e6cd; }
.alert-danger { color: var(--bad); background: var(--bad-bg); border-color: #eec3c3; } .alert-danger { color: var(--bad); background: var(--bad-bg); border-color: #eec3c3; }
/* ── Login ───────────────────────────────────────────────────────────────────*/
.dashboard-login { max-width: 24rem; margin: 0 auto; }
.login-card {
background: var(--card);
border: 1px solid var(--rule);
border-radius: 8px;
}
/* ── API key management ──────────────────────────────────────────────────────*/ /* ── API key management ──────────────────────────────────────────────────────*/
.api-key-management-grid { .api-key-management-grid {
display: grid; display: grid;
@@ -1,379 +0,0 @@
/* ============================================================================
Technical-Light design system portable theme layer
----------------------------------------------------------------------------
A refined technical-light aesthetic: warm-neutral paper, hairline rules,
IBM Plex type, monospace tabular numerics, status carried by colour. Built
to layer over Bootstrap 5 via --bs-* overrides, but every rule below works
standalone Bootstrap is optional.
HOW TO ADOPT
1. Serve the three IBM Plex woff2 files (shipped in fonts/) and fix the
@font-face url() paths below to wherever you serve them.
2. Include this file once, globally. Add view-specific rules in a separate
stylesheet never edit the token block per-view.
3. Status is colour, not iconography. Use the .s-* / .chip-* / .kv .v.*
helpers; do not hand-pick hex values in feature CSS.
========================================================================= */
/* Vendored fonts (embedded woff2, no network/CDN fetch)
Adjust these url()s to your asset route. If you cannot vendor the fonts the
--sans / --mono fallback stacks below degrade gracefully to system fonts. */
@font-face {
font-family: 'IBM Plex Sans';
font-style: normal; font-weight: 400; font-display: swap;
src: url('/fonts/ibm-plex-sans-400.woff2') format('woff2');
}
@font-face {
font-family: 'IBM Plex Sans';
font-style: normal; font-weight: 600; font-display: swap;
src: url('/fonts/ibm-plex-sans-600.woff2') format('woff2');
}
@font-face {
font-family: 'IBM Plex Mono';
font-style: normal; font-weight: 500; font-display: swap;
src: url('/fonts/ibm-plex-mono-500.woff2') format('woff2');
}
/* Design tokens
The single source of truth. Re-theme by editing only this block. */
:root {
/* Surfaces & ink */
--paper: #f4f4f1; /* page background — warm off-white, never pure */
--card: #ffffff; /* raised surfaces: cards, bars, table heads */
--ink: #1b1d21; /* primary text */
--ink-soft: #5a6066; /* secondary text, labels */
--ink-faint: #8b9097; /* tertiary text, captions, units */
--rule: #e4e4df; /* hairline borders / row dividers */
--rule-strong: #d2d2cb; /* emphasised hairlines: bar underline, pills */
/* Accent */
--accent: #2f5fd0; /* links, sort arrows, primary actions */
--accent-deep: #1e3f99; /* hover / pressed accent, raw-value emphasis */
/* Status — foreground */
--ok: #2f9e44;
--warn: #e8920c;
--bad: #e03131;
--idle: #868e96;
/* Status — tinted backgrounds (pair with the matching foreground) */
--ok-bg: #e9f6ec;
--warn-bg: #fdf1dd;
--bad-bg: #fceaea;
--idle-bg: #eef0f2;
/* Type stacks — Plex first, graceful system fallback */
--mono: 'IBM Plex Mono', ui-monospace, 'Cascadia Mono', Consolas, monospace;
--sans: 'IBM Plex Sans', system-ui, -apple-system, 'Segoe UI', sans-serif;
/* Bootstrap 5 overrides — harmless if Bootstrap is absent */
--bs-body-bg: var(--paper);
--bs-body-color: var(--ink);
--bs-body-font-family: var(--sans);
--bs-body-font-size: 0.9rem;
--bs-primary: var(--accent);
--bs-border-color: var(--rule);
--bs-emphasis-color: var(--ink);
}
/* Base
The faint top-right radial is the one deliberate flourish a soft sheen,
not a gradient wash. Keep it subtle. */
body {
background:
radial-gradient(1200px 480px at 88% -8%, #ffffff 0%, rgba(255,255,255,0) 70%),
var(--paper);
color: var(--ink);
font-family: var(--sans);
font-size: 0.9rem;
-webkit-font-smoothing: antialiased;
}
/* Any numeric / fixed-width text. Tabular figures so columns of digits align. */
.numeric,
.mono { font-family: var(--mono); font-variant-numeric: tabular-nums; }
a { color: var(--accent); text-decoration: none; }
a:hover { color: var(--accent-deep); text-decoration: underline; }
/* App chrome: top bar
One bar across the top: brand, breadcrumb crumbs, a flex spacer, then meta
text and any status pill pushed hard right. */
.app-bar {
display: flex;
align-items: baseline;
gap: 1rem;
padding: 0.85rem 1.25rem;
background: var(--card);
border-bottom: 1px solid var(--rule-strong);
}
.app-bar .brand {
font-weight: 600;
font-size: 1.05rem;
letter-spacing: 0.02em;
}
.app-bar .brand .mark { color: var(--accent); } /* the one accent glyph */
.app-bar .crumb { color: var(--ink-faint); font-size: 0.85rem; }
.app-bar .spacer { flex: 1; } /* pushes meta/pill right */
.app-bar .meta {
font-family: var(--mono);
font-size: 0.78rem;
color: var(--ink-soft);
}
/* Connection / liveness pill
A rounded pill with a dot, driven entirely by data-state. Use for any
live-link health indicator (websocket, SSE, polling). */
.conn-pill {
display: inline-flex;
align-items: center;
gap: 0.4rem;
font-size: 0.74rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.06em;
padding: 0.2rem 0.6rem;
border-radius: 999px;
border: 1px solid var(--rule-strong);
color: var(--ink-soft);
background: var(--card);
}
.conn-pill .dot {
width: 7px; height: 7px; border-radius: 50%;
background: var(--idle);
}
.conn-pill[data-state="connected"] { color: var(--ok); border-color: #bfe3c6; background: var(--ok-bg); }
.conn-pill[data-state="connected"] .dot { background: var(--ok); }
.conn-pill[data-state="connecting"] { color: var(--warn); border-color: #f0d9ab; background: var(--warn-bg); }
.conn-pill[data-state="connecting"] .dot { background: var(--warn); animation: pulse 1.1s ease-in-out infinite; }
.conn-pill[data-state="disconnected"] { color: var(--bad); border-color: #f0c0c0; background: var(--bad-bg); }
.conn-pill[data-state="disconnected"] .dot { background: var(--bad); }
@keyframes pulse { 0%,100% { opacity: 1; } 50% { opacity: 0.25; } }
/* Status text helpers
Recolour a value in place counts, ratios, error totals. */
.s-ok { color: var(--ok); }
.s-warn { color: var(--warn); }
.s-bad { color: var(--bad); }
.s-idle { color: var(--idle); }
/* State chip
Compact rectangular badge for an enumerated state (bound/recovering/).
Squarer than the pill; use the pill for liveness, the chip for state. */
.chip {
display: inline-block;
font-size: 0.72rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
padding: 0.15rem 0.5rem;
border-radius: 4px;
border: 1px solid transparent;
}
.chip-ok { color: var(--ok); background: var(--ok-bg); border-color: #c6e6cd; }
.chip-warn { color: #b56a00; background: var(--warn-bg); border-color: #efd6a6; }
.chip-bad { color: var(--bad); background: var(--bad-bg); border-color: #eec3c3; }
.chip-idle { color: var(--ink-soft); background: var(--idle-bg); border-color: var(--rule-strong); }
/* Panel the base raised surface
A white card with a hairline border and 8px radius. .panel-head is the
uppercase eyebrow label that sits on top. */
.panel {
background: var(--card);
border: 1px solid var(--rule);
border-radius: 8px;
}
.panel-head {
font-size: 0.74rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.07em;
color: var(--ink-faint);
padding: 0.6rem 0.9rem;
border-bottom: 1px solid var(--rule);
}
/* Page wrapper
Centred, capped width, even gutter. */
.page { padding: 1.25rem; max-width: 1680px; margin: 0 auto; }
/* Reveal-on-paint
Add .rise to top-level sections; stagger with inline animation-delay
(.02s, .08s, .14s ) so panels settle in sequence, not all at once. */
@keyframes rise { from { opacity: 0; transform: translateY(6px); } to { opacity: 1; transform: none; } }
.rise { animation: rise 0.4s ease both; }
/*
COMPONENT LIBRARY
Generic, reusable pieces. View-specific layout belongs in a separate sheet.
*/
/* KPI / aggregate cards
A responsive strip of headline numbers. .agg-card.alert / .caution tint the
whole card when a watched metric goes non-zero. */
.agg-grid {
display: grid;
grid-template-columns: repeat(6, 1fr);
gap: 0.75rem;
margin-bottom: 1rem;
}
@media (max-width: 1100px) { .agg-grid { grid-template-columns: repeat(3, 1fr); } }
@media (max-width: 620px) { .agg-grid { grid-template-columns: repeat(2, 1fr); } }
.agg-card {
background: var(--card);
border: 1px solid var(--rule);
border-radius: 8px;
padding: 0.7rem 0.9rem;
}
.agg-label {
font-size: 0.68rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.07em;
color: var(--ink-faint);
}
.agg-value {
margin-top: 0.25rem;
font-size: 1.5rem;
font-weight: 600;
line-height: 1.1;
display: flex;
align-items: baseline;
gap: 0.35rem;
}
.agg-sub { /* trailing "/ 54", "ms" etc. — quieter */
font-size: 0.85rem;
font-weight: 400;
color: var(--ink-faint);
}
.agg-card.alert { border-color: #eec3c3; background: var(--bad-bg); }
.agg-card.alert .agg-value { color: var(--bad); }
.agg-card.caution { border-color: #efd6a6; background: var(--warn-bg); }
.agg-card.caution .agg-value { color: #b56a00; }
/* Metric card + key/value rows
A .panel-head over a stack of .kv rows: label left, monospace value right.
Zebra striping on even rows. .v.warn / .v.bad / .v.ok recolour a value. */
.card-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(290px, 1fr));
gap: 0.85rem;
margin-bottom: 1rem;
}
.metric-card {
background: var(--card);
border: 1px solid var(--rule);
border-radius: 8px;
overflow: hidden;
}
.metric-card .panel-head { margin: 0; }
.kv {
display: flex;
justify-content: space-between;
align-items: baseline;
gap: 1rem;
padding: 0.32rem 0.9rem;
font-size: 0.85rem;
}
.kv:nth-child(even) { background: #fbfbf9; }
.kv .k { color: var(--ink-soft); }
.kv .v {
font-family: var(--mono);
font-variant-numeric: tabular-nums;
text-align: right;
}
.kv .v.warn { color: var(--warn); }
.kv .v.bad { color: var(--bad); }
.kv .v.ok { color: var(--ok); }
/* Toolbar
Filter/search row that sits inside a .panel above a table. */
.toolbar {
display: flex;
align-items: center;
gap: 0.6rem;
padding: 0.6rem 0.9rem;
border-bottom: 1px solid var(--rule);
}
.toolbar .spacer { flex: 1; }
.tb-search { max-width: 280px; }
.tb-state { max-width: 150px; }
.tb-check {
display: flex; align-items: center; gap: 0.35rem;
font-size: 0.82rem; color: var(--ink-soft); white-space: nowrap;
user-select: none;
}
.tb-count { font-family: var(--mono); font-size: 0.78rem; color: var(--ink-faint); }
/* Data table
Dense, hairline-ruled table. Uppercase sticky head on a faint fill; numeric
columns get .num (right-aligned, monospace). Rows are clickable by default
drop the cursor/hover rules if yours are not. */
.table-wrap { overflow-x: auto; }
.data-table {
width: 100%;
border-collapse: collapse;
font-size: 0.85rem;
}
.data-table th,
.data-table td {
padding: 0.45rem 0.8rem;
text-align: left;
white-space: nowrap;
border-bottom: 1px solid var(--rule);
}
.data-table th {
font-size: 0.7rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--ink-faint);
background: #fbfbf9;
position: sticky;
top: 0;
}
.data-table th.num,
.data-table td.num { text-align: right; font-family: var(--mono); }
.data-table th.sortable { cursor: pointer; user-select: none; }
.data-table th.sortable:hover { color: var(--ink); }
.data-table th.sorted-asc::after { content: ' \2191'; color: var(--accent); }
.data-table th.sorted-desc::after { content: ' \2193'; color: var(--accent); }
.data-table tbody tr { cursor: pointer; transition: background 0.08s; }
.data-table tbody tr:hover { background: #f3f6fd; }
.data-table tbody tr:last-child td { border-bottom: none; }
.empty-row {
text-align: center !important;
color: var(--ink-faint);
padding: 1.6rem !important;
font-style: italic;
}
/* Direction / category tag
Tiny inline tag for a per-row category (e.g. read vs write). */
.dir-tag {
font-size: 0.68rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
padding: 0.1rem 0.4rem;
border-radius: 3px;
}
.dir-read { color: var(--accent-deep); background: #e7ecfb; }
.dir-write { color: #8a5a00; background: var(--warn-bg); }
/* Inline notice
A .panel with a warning tint for "this thing is gone / degraded" banners. */
.notice {
padding: 0.85rem 1.1rem;
margin-bottom: 1rem;
color: #b56a00;
background: var(--warn-bg);
border-color: #efd6a6;
}

Some files were not shown because too many files have changed in this diff Show More