Compare commits


94 Commits

Author SHA1 Message Date
Joseph Doherty 6c2d16d4af docs: refresh HistorianGateway + GalaxyRepository status in index
HistorianGateway is now pushed to gitea (gitea.dohertylan.com/dohertj2/historiangw), and
ZB.MOM.WW.GalaxyRepository is published to the Gitea feed and consumed as a PackageReference
(no longer a cross-repo ProjectReference). Updates the sister-project row, the component
table, and the GalaxyRepository narrative; test figure 584 green -> 590 total (584 on macOS).
2026-06-24 07:28:25 -04:00
Joseph Doherty a08ddab9dd chore: retire unused ZB.MOM.WW.SPHistorianClient (stale partial port; superseded by histsdk vendored in HistorianGateway; no consumers, not on feed) 2026-06-24 06:45:10 -04:00
Joseph Doherty 744eb090ac docs(scadaproj): index ZB.MOM.WW.HistorianGateway sidecar + GalaxyRepository shared lib
- Add HistorianGateway to the Runtime/implementation table (single-process
  .NET 10 x64 gRPC sidecar; no COM/x86; 584 tests; local only, not yet
  pushed to gitea)
- Update "What this repository is" count (five → six pieces of source;
  add GalaxyRepository)
- Add HistorianGateway paragraph to Cross-project relationships / Net effect
  (independent sidecar; no runtime coupling to the other three; depends on
  shared GalaxyRepository lib via ProjectReference)
- Add ZB.MOM.WW.GalaxyRepository row to Component normalization table +
  full description paragraph (built 0.1.0; consumed by HistorianGateway;
  mxaccessgw adoption is a follow-on; not yet published to Gitea feed)
- Add HistorianGateway primary commands block (build/test/run/live-integration)
- Extend Shared GLAuth note to cover HistorianGateway
2026-06-24 00:41:29 -04:00
Joseph Doherty 94512acf1f fix(galaxyrepo): drop no-op ValidateOnStart (consumer owns validation) 2026-06-23 20:36:28 -04:00
Joseph Doherty 2c6c764d3c test(galaxyrepo): projector + cache tests; dispose semaphores; pack 0.1.0 2026-06-23 20:34:32 -04:00
Joseph Doherty a30f8551e9 feat(galaxyrepo): reusable gRPC service + AddZbGalaxyRepository DI 2026-06-23 20:26:59 -04:00
Joseph Doherty afd0287f54 feat(galaxyrepo): hierarchy cache + snapshot + refresh service + projector 2026-06-23 20:22:35 -04:00
Joseph Doherty 1041f87b59 feat(galaxyrepo): SQL browse provider (hierarchy + attributes) 2026-06-23 20:12:33 -04:00
Joseph Doherty 5572edda85 feat(galaxyrepo): canonical galaxy_repository.v1 proto (neutral namespace) 2026-06-23 20:05:39 -04:00
Joseph Doherty aff7264df8 feat(galaxyrepo): scaffold ZB.MOM.WW.GalaxyRepository shared lib 2026-06-23 19:48:43 -04:00
Joseph Doherty 510b0010d6 docs(historian-gateway): implementation plan + task ledger (31 tasks) 2026-06-23 19:43:08 -04:00
Joseph Doherty 42ad31aded docs(historian-gateway): brainstormed design for ZB.MOM.WW.HistorianGateway sidecar 2026-06-23 19:31:54 -04:00
Joseph Doherty e3c0503a4f docs(sphistorianclient): mark RemoteGrpc (2023 R2) live-verified 2026-06-19 06:57:06 -04:00
Joseph Doherty a0527f9b5a fix(sphistorianclient): gRPC auth handshake uses StorageService.ValidateClientCredential
The RemoteGrpc orchestrator drove the SSPI/NTLM token loop through
HistoryService.ExchangeKey, which the 2023 R2 contract analysis shows is a
separate key-exchange/cert op — not the credential handshake. The server
rejected the NTLM Type-1 token at round 0. The Negotiate loop belongs on
StorageService.ValidateClientCredential (Handle/InBuff -> Status/OutBuff;
field names match the 2020 native contract). Live-verified end-to-end against
a 2023 R2 Historian (wonder-sql-vd03): SysTimeSec raw read returns correct
timestamped values.
2026-06-19 06:56:44 -04:00
Joseph Doherty 5f7d7e1b58 docs(sphistorianclient): document HISTORIAN_PORT env var; mark plan tasks complete 2026-06-19 06:09:43 -04:00
Joseph Doherty 78418346df build(sphistorianclient): pack 0.1.0 nupkg 2026-06-19 06:02:05 -04:00
Joseph Doherty 4920b89666 docs(sphistorianclient): correct retrieval-mode count (15) + EnsureTag verification scope 2026-06-19 06:01:07 -04:00
Joseph Doherty 989db9317d docs(sphistorianclient): add CLAUDE.md + README.md 2026-06-19 05:58:13 -04:00
Joseph Doherty 81bf7322f0 feat(sphistorianclient): add AddZbSpHistorianClient DI extension 2026-06-19 05:53:56 -04:00
Joseph Doherty 8033a7f12d fix(sphistorianclient): resolve port build/test fallout 2026-06-19 05:49:22 -04:00
Joseph Doherty 63cddfb65b feat(sphistorianclient): port SDK source + tests, rebrand namespace to ZB.MOM.WW.SPHistorianClient 2026-06-19 05:45:06 -04:00
Joseph Doherty 965f5006f2 feat(sphistorianclient): scaffold shared library skeleton (props, csprojs, slnx) 2026-06-19 05:40:10 -04:00
Joseph Doherty 294da8b2db docs(sphistorianclient): implementation plan + task tracking 2026-06-19 05:36:24 -04:00
Joseph Doherty bbb7942788 docs(sphistorianclient): approved design for ZB.MOM.WW.SPHistorianClient port 2026-06-19 05:29:51 -04:00
Joseph Doherty d5b134b117 docs: add MES + Delmia-DNC integration API/MXAccess specs
mes-delmia-integration-api.md: endpoints, request/response DTOs, and the MXAccess flag handshake for MESAPI (in-repo MesNotifier) and DelmiaIntegration (DNC Downloader.asmx -> WWNotifier /notify -> Galaxy $DelmiaReceiver). mesrec.md / nj.md: live Galaxy receiver + reactor attribute references.
2026-06-17 06:52:36 -04:00
Joseph Doherty eb8b44c29d loader: purge legacy driver in overlay namespace on teardown (self-heal nw-uns-modbus placeholder) 2026-06-08 07:07:22 -04:00
Joseph Doherty a6fa36043a loader: equipment is driver-less (drop Modbus placeholder, NULL DriverInstanceId) 2026-06-08 06:42:31 -04:00
Joseph Doherty 05a4a547f4 feat(loader): canonical EQ-+uuid EquipmentIds (passes OtOpcUa full DraftValidator); clean by UnsLine scope 2026-06-07 11:18:39 -04:00
Joseph Doherty 4d57e34ff3 docs(loader): record live-values verification + 396/1036 explanation for company overlay 2026-06-07 06:08:36 -04:00
Joseph Doherty b3d8990a0f fix(loader): keep empty folderPath distinct in vtag ids; dedupe verify args; readme wait-seconds 2026-06-07 05:07:00 -04:00
Joseph Doherty 5655b75fe6 feat(loader): company overlay as VirtualTags mirroring the galaxy mirror + verify --require-good 2026-06-07 04:59:51 -04:00
Joseph Doherty dce6f83488 loader: add populate-equipment (company-shape Equipment overlay) + scope verify-equipment
populate-equipment loads the Northwind Enterprise/Site/Area/Line/Equipment/Signal
shape from company-uns.json as a second Equipment-kind namespace (nw-uns) alongside
the galaxy mirror — 3 areas / 8 lines / 40 equipment / 1036 signals. Friendly
DisplayName, stable logical-Id NodeId. verify-equipment now scopes to the nw-area-*
overlay by default (--all for the whole tree). Verified live on :4840 against OtOpcUa
master's Equipment-namespace materialization (structure-only; leaves are
BadWaitingForInitialData). clean now drops the overlay too.
2026-06-06 16:19:53 -04:00
Joseph Doherty fd34e25cb1 feat(uns-loader): verify-equipment — recursive Equipment UNS tree browse + leaf count
browse_summary assumes the flat 2-level Galaxy hierarchy; the Equipment tree is deep
(Area/Line/Equipment/[FolderPath]/Signal). Add browse_tree (recursive leaf descent) + a
verify-equipment subcommand that reports/asserts the leaf signal count (--expect N), for
verifying OtOpcUa equipment-namespace structure materialisation. Smoke-tested against a live
:4840 (40 folders / 396 leaf signals).
2026-06-06 15:25:17 -04:00
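The recursive leaf descent described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the nested-dict tree is a hypothetical in-memory stand-in for a live OPC UA browse, and `iter_leaves` / `count_leaves` are illustrative names (only `browse_tree` and `--expect N` appear in the commit message).

```python
from typing import Iterator, Optional


def iter_leaves(node: dict, path: tuple = ()) -> Iterator[tuple]:
    """Yield the full path of every leaf under `node`, descending recursively.

    Assumption: folders are dicts and leaf signals are any non-dict value.
    """
    for name, child in node.items():
        child_path = path + (name,)
        if isinstance(child, dict):
            # Folder at any depth (Area, Line, Equipment, FolderPath): keep descending.
            yield from iter_leaves(child, child_path)
        else:
            yield child_path


def count_leaves(tree: dict, expect: Optional[int] = None) -> int:
    """Count leaf signals; when `expect` is given, fail loudly on a mismatch."""
    total = sum(1 for _ in iter_leaves(tree))
    if expect is not None and total != expect:
        raise AssertionError(f"expected {expect} leaf signals, found {total}")
    return total


# Hypothetical sample tree; names are made up for the example.
SAMPLE_TREE = {
    "Area1": {
        "Line1": {
            "Machine001": {"Speed": 0.0, "Diagnostics": {"Fault": False}},
            "Machine002": {"Speed": 0.0},
        }
    }
}
```

A flat two-level summary would miss `Fault` here because it sits one folder deeper than its siblings, which is the case the recursive walk is meant to cover.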
Joseph Doherty eb26bf3248 Add Galaxy UNS artifacts + reloadable OtOpcUa loader tool
galaxy-hierarchy.json: full AVEVA Galaxy DEV hierarchy pulled live via the
MxGateway .NET client (129 objects, 14k attrs). company-uns.json/.tree.txt +
gen_uns.py: a fake-company (Northwind) ISA-95 UNS modeled on OtOpcUa's
Cluster->Namespace->Area->Line->Equipment->Tag schema, grounded in the 40
TestMachine instances. otopcua-uns-loader/: reloadable generate/populate/verify/
clean tool that recreates + verifies the galaxy mirror (396 live tags across 40
machines) in OtOpcUa's config DB after a rebuild.
2026-06-06 14:22:25 -04:00
Joseph Doherty e5a609be83 docs(theme): mark themeissues #6 resolved in 0.3.1
Interactive-render nav fix (CSS display:none-when-closed + nav-state.js
MutationObserver re-wire) shipped in 0.3.1 and verified — ScadaBridge Central UI
NavCollapseTests now pass. All six issues now resolved (5 fixed, 1 tradeoff).
2026-06-05 08:32:03 -04:00
Joseph Doherty f1efe6e081 fix(theme): 0.3.1 — interactive-render nav backstop (issue #6)
Under an interactive Blazor render mode the runtime replaces the prerendered
<details> after DOMContentLoaded, so nav-state.js (wired on load, re-run only on
'enhancedload') never wires the live rail — no aria sync, no persistence, no
active-reveal — and native <details> content-hiding is unreliable, leaving a
collapsed section's items visible. 0.3.1:
- nav-state.js: add a MutationObserver backstop that re-runs apply() when
  details.rail-section nodes are (re)inserted; idempotent via the per-element
  init guard, loop-safe (childList-only + active-reveal's !open guard).
- layout.css: explicit .rail-section:not([open]) > .rail-section-body{display:none}
  so visual collapse works across all render modes.
- themeissues.md: document issue #6; Directory.Build.props 0.3.0 -> 0.3.1.
48 bUnit tests green.
2026-06-05 07:18:30 -04:00
Joseph Doherty 0e41e7c2e4 fix(theme): resolve nav/login kit issues + bump 0.2.1 -> 0.3.0
Addresses ZB.MOM.WW.Theme/themeissues.md:
- #1 NavRailSection <summary> renders aria-expanded (SSR from Expanded),
  kept in sync by nav-state.js on restore + toggle.
- #2 nav-state.js auto-expands the section holding a.rail-link.active
  (transient via data-zbnav-transient — does not overwrite saved state).
- #3 nav-state.js re-applies on Blazor 'enhancedload' (idempotent via
  per-element init guard).
- #5 LoginCard wraps product in span.login-product + optional Heading
  override param.
- #4 documented as an accepted client-only-persistence tradeoff (no code change).

+4 bUnit tests (48 total, all green).
2026-06-05 04:42:24 -04:00
Joseph Doherty 5f97c9d1ed docs(glauth): point all dev/test LDAP at the shared GLAuth on 10.100.0.35
deployment.md / CLAUDE.md / env_vars.md: the per-app LDAP (scadabridge-ldap
container, OtOpcUa DevStubMode, per-box C:\publish\glauth) is replaced by one
shared zb-shared-glauth on 10.100.0.35:3893 (dc=zb,dc=local); source of truth
infra/glauth/. Fixed stale baseDNs (dc=lmxopcua/dc=otopcua -> dc=zb).
2026-06-04 16:37:52 -04:00
Joseph Doherty 9d373efbe0 docs(glauth): mark shared-GLAuth design implemented + all plan tasks complete 2026-06-04 16:21:13 -04:00
Joseph Doherty 4c0f1eaaf7 fix(glauth): rename OPC/Gw testers to avoid username/group case-collision
glauth exposes each group as cn=<Group> under ou=users, so a case-insensitive
(cn=x) search matched both the user and the group (2 entries -> the shared
ZB.MOM.WW.Auth.Ldap 'exactly one entry' rule failed the bind). Renamed the 4
colliding testers (readonly/writetune/alarmack/gwreader) + the 2 siblings for
consistency: opc-readonly/opc-writeop/opc-writetune/opc-writeconfig/opc-alarmack
and gw-viewer. Verified gw-viewer logs into the MxGateway dashboard as Viewer.
multi-role/admin/designer/etc. were never affected (no case-collision).
2026-06-04 16:19:33 -04:00
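The collision described in the commit above can be illustrated with a small sketch. It assumes a case-insensitive equality match on `cn` and an "exactly one entry" rule as the message describes; the entry dicts, the group name `ReadOnly`, and the function names are hypothetical and are not taken from glauth or ZB.MOM.WW.Auth.Ldap.

```python
def search_by_cn(entries: list, cn: str) -> list:
    """Return every entry whose cn equals `cn`, ignoring case."""
    wanted = cn.casefold()
    return [e for e in entries if e["cn"].casefold() == wanted]


def resolve_single(entries: list, cn: str) -> dict:
    """Apply an 'exactly one entry' rule: zero or several matches is an error."""
    matches = search_by_cn(entries, cn)
    if len(matches) != 1:
        raise LookupError(f"expected exactly one entry for cn={cn}, found {len(matches)}")
    return matches[0]


# Hypothetical directory contents: a user and a group that differ only by case.
BEFORE_RENAME = [
    {"cn": "readonly", "kind": "user"},
    {"cn": "ReadOnly", "kind": "group"},
]

# After the rename the user no longer shares a cn with the group.
AFTER_RENAME = [
    {"cn": "opc-readonly", "kind": "user"},
    {"cn": "ReadOnly", "kind": "group"},
]
```

With `BEFORE_RENAME`, looking up `readonly` matches two entries and the lookup fails; with `AFTER_RENAME`, looking up `opc-readonly` matches only the user.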
Joseph Doherty 0f2b2b8351 feat(glauth): merged shared dev GLAuth directory + compose + runbook (10.100.0.35)
Phase 0 of the shared-GLAuth standardization. config.toml = merged dc=zb,dc=local
directory (15 groups in partitioned 55xx/56xx/57xx families, 14 users incl.
multi-role spanning all groups, serviceaccount search account). compose runs one
glauth/glauth:latest on :3893. README is the deploy/verify runbook. Code-reviewed;
fixed scp -r idempotency in the deploy command (README + plan Task 4).
2026-06-04 15:45:41 -04:00
Joseph Doherty 5be0cec601 docs(glauth): implementation plan + tasks for shared GLAuth standardization
19 tasks across 5 phases: author scadaproj/infra/glauth/ (merged config + compose +
runbook) → deploy/verify on 10.100.0.35 (hard gate, access-prerequisite) → repoint
ScadaBridge (Mac), un-stub OtOpcUa docker-dev, repoint windev MxGateway + OtOpcUa →
retire old glauths → full cross-app verification. Co-located .tasks.json.
2026-06-04 15:37:06 -04:00
Joseph Doherty 106fb8b149 docs(glauth): shared GLAuth standardization design (dev/test consolidation onto 10.100.0.35)
Approved design: consolidate OtOpcUa, MxAccessGateway, ScadaBridge dev/test auth
onto one shared GLAuth at 10.100.0.35:3893 (dc=zb,dc=local, plaintext). App-neutral
source of truth in scadaproj/infra/glauth/; merged directory with gid families
partitioned 55xx/56xx/57xx + multi-role/admin/serviceaccount; per-app Server
repoints; incremental rollout keeping old glauths until verified.
2026-06-04 15:26:32 -04:00
Joseph Doherty b0fe7b15ca fix(theme): render app-shell on desktop Chromium via ::details-content (0.2.1)
Chromium >=121 wraps a <details>'s content in a generated ::details-content
box with content-visibility:hidden while closed. The SSR app-shell ships
closed (no JS) and hides its summary toggle at lg+, so on desktop the rail+page
were invisible and the flex-lg-row layout collapsed to a vertical stack.

Add '.app-shell::details-content { display: contents }' inside the lg+ media
query: dissolving the wrapper box reveals the content regardless of open state
and restores rail/page as direct flex children of .app-shell. Browsers without
::details-content support drop the invalid selector and fall back to the legacy
force-show. Mobile (<lg) and nested NavRailSection disclosures unaffected.

Bump 0.2.0 -> 0.2.1.
2026-06-04 10:23:05 -04:00
Joseph Doherty 3070169e5d docs(ui-theme): record post-adoption site.css prune + reconfirm 0.2.0 on feed
Audit follow-up: the deferred 'dead .sidebar/.nav-link residual' was broader than
logged (OtOpcUa's site.css duplicated and overrode the whole kit shell). Pruned
across all 3 apps on chore/theme-css-prune branches (-167/-95/-106 lines, builds
clean). Note the remaining deferred items (kit layout.css calc review; ScadaBridge
Host transitive kit ref) and reconfirm the Theme 0.2.0 publish is genuine.
2026-06-03 04:38:24 -04:00
Joseph Doherty ea4116cc5b docs(ui-theme): mark merged to local default + pushed to origin (in sync) 2026-06-03 04:15:20 -04:00
Joseph Doherty ca21615090 docs(ui-theme): record 0.2.0 publish + adoption across all 3 apps (local feat branches) 2026-06-03 04:06:20 -04:00
Joseph Doherty a474eb6bd6 chore(theme): bump 0.1.0 -> 0.2.0 (nav persistence + ThemeScripts) 2026-06-03 02:59:27 -04:00
Joseph Doherty 9e4dedc987 fix(theme): guard nav-state.js against duplicate toggle listeners 2026-06-03 02:58:34 -04:00
Joseph Doherty 6aa2ee8095 fix(theme): null/whitespace-safe NavRailSection slug + edge tests 2026-06-03 02:57:07 -04:00
Joseph Doherty e2749b7d69 feat(theme): ThemeScripts + localStorage nav-state enhancer 2026-06-03 02:55:35 -04:00
Joseph Doherty edd49765d6 feat(theme): NavRailSection data-nav-key for persistence 2026-06-03 02:53:15 -04:00
Joseph Doherty 7e11f9aac8 docs(ui-theme): implementation plan + task graph (26 tasks, Phases 0-4) 2026-06-03 02:50:31 -04:00
Joseph Doherty e6e9dbfedb docs(ui-theme): approved adoption design (publish 0.2.0 + full canonical cutover across 3 apps) 2026-06-03 02:35:00 -04:00
Joseph Doherty 6d262f7d7c docs: Auth+Audit normalization PUSHED to origin (gitea) 2026-06-03 — default branches in sync; feat/* kept locally 2026-06-03 00:36:55 -04:00
Joseph Doherty 4b90ebb588 docs: reflect final delivery — Auth+Audit normalization merged to each repo's LOCAL default (main/master) 2026-06-03, NOT pushed (origin untouched), feat/* branches kept 2026-06-03 00:31:07 -04:00
Joseph Doherty 4de61d29f5 docs: PROGRAM COMPLETE — Auth+Audit normalization adopted across all 3 repos (Phases 0-3); mark exit-gate (CLAUDE.md Auth/Audit rows + components/{auth,audit}/GAPS.md adopted, local-only/not-pushed); tasks #10/#30/#31 done 2026-06-02 15:42:23 -04:00
Joseph Doherty 1ec057a32a plan: Task 2.5 (ScadaBridge audit full re-arch C1-C7) DONE+reviewed -> PHASE 2 COMPLETE (audit adopted across all 3 repos, deep/canonical, local-only). Next = Phase 3 Actor->principal wiring 2026-06-02 15:10:54 -04:00
Joseph Doherty a591a9fb47 plan(2.5): ScadaBridge audit C5 done+reviewed (central migration, MSSQL-verified); C6 subsumed (consumer surfaces already canonical via C3 shims); C7 (perf re-baseline + cleanup) in progress 2026-06-02 14:24:32 -04:00
Joseph Doherty e9100d0b74 plan(2.5): ScadaBridge audit C4 done+reviewed (site sidecar); C5 (central migration) in progress 2026-06-02 13:34:12 -04:00
Joseph Doherty 672ac5ff04 plan(2.5): ScadaBridge audit C3 done+reviewed (record swap keystone); C4 (site sidecar) in progress 2026-06-02 13:07:32 -04:00
Joseph Doherty f073241f52 plan(2.5): ScadaBridge audit re-arch C1+C2 done (reviewed); C3 (atomic record swap) in progress 2026-06-02 11:54:57 -04:00
Joseph Doherty 98e957903f plan(2.5): ScadaBridge audit full-rearch design + C1-C7 decomposition (sidecar forwarding, new-table-copy central migration, persisted computed cols, canonical record everywhere) 2026-06-02 10:36:00 -04:00
Joseph Doherty ca2a9ac507 plan(phase2): OtOpcUa 2.1/2.2 + MxGateway 2.3 DONE (deep audit adoption, spec+code reviewed, local-only); ScadaBridge 2.5 pending variant decision 2026-06-02 10:26:55 -04:00
Joseph Doherty abe06a2163 plan(phase2): Task 2.0 gate DONE — verified plan specs materially off (MxGw store moved to lib, OtOpcUa path dormant, SB rename structurally impossible); user chose DEEP adopt + pause; corrected deep design in -phase2-deep.md; PAUSED for review 2026-06-02 09:13:09 -04:00
Joseph Doherty 95681ac0b2 plan(phase1): Tasks 1.5/1.6/1.7 done+reviewed — PHASE 1 COMPLETE across all 3 repos (claims/cookies, dev base DN dc=zb, canonical-six roles + SB SoD collapse + config-DB migrations); next = Phase 2 audit 2026-06-02 08:15:46 -04:00
Joseph Doherty d73762bf76 plan(phase1): ScadaBridge re-arch C5 done+reviewed; Task 1.3 (ApiKeys adopt) COMPLETE across all 3 repos; installer/secret catch noted 2026-06-02 05:51:10 -04:00
Joseph Doherty 02a84b074a plan(phase1): ScadaBridge re-arch C4 done+reviewed (TransportExport excludes keys); C5 (retire entity) next 2026-06-02 05:17:09 -04:00
Joseph Doherty 9b5535ea47 plan(phase1): ScadaBridge re-arch C3 done+reviewed (CentralUI onto seam); C4 next 2026-06-02 04:50:09 -04:00
Joseph Doherty 406ede19dd plan(phase1): ScadaBridge re-arch C2 done+reviewed (mgmt+CLI onto seam); C3 next 2026-06-02 04:25:02 -04:00
Joseph Doherty ba7b38a654 plan(phase1): ScadaBridge re-arch C1 done+reviewed; 2 pre-existing Host.Tests baseline reds fixed; C2 next 2026-06-02 04:03:31 -04:00
Joseph Doherty e69e9c635b plan(phase1): ScadaBridge re-arch discovered architecture (CentralUI direct-repo + TransportExport) + C1-C5 decomposition + transport=exclude-keys 2026-06-02 03:22:19 -04:00
Joseph Doherty a4f9968917 plan(phase1): Auth lib 0.1.3 published (SetScopes/SetEnabled); ScadaBridge re-arch C mapping 2026-06-02 03:14:29 -04:00
Joseph Doherty 290e85cb38 test(auth.apikeys): store-level arg guards + SetEnabledAsync idempotence (review M1/M2) 2026-06-02 03:12:24 -04:00
Joseph Doherty 468959ca8a feat(auth.apikeys): add IApiKeyAdminStore.SetScopesAsync + SetEnabledAsync (editable scopes + reversible enable, no schema change); bump 0.1.3 2026-06-02 03:08:19 -04:00
Joseph Doherty 30c60f9d5f plan(phase1): SB ApiKeys A+B foundation done+reviewed; C/D/E pending 2026-06-02 02:50:57 -04:00
Joseph Doherty d30cdea487 plan(phase1): ScadaBridge ApiKeys full-adopt re-arch spec + sub-task decomposition 2026-06-02 02:29:03 -04:00
Joseph Doherty f2b73367d5 plan(phase1): MxGateway 1.3 done+approved (lib 0.1.2); ScadaBridge 1.3 pending 2026-06-02 02:14:45 -04:00
Joseph Doherty da669bfc9b fix(auth.apikeys): stamp schema version 2 to match donor gateway DBs; bump 0.1.2
The store was extracted from MxAccessGateway, whose deployed gateway-auth.db
is at schema_version=2. The library capped at 1 and threw on a newer on-disk
version -> gateway would fail to boot. Final schema is byte-identical since v1;
stamp 2 so existing deployed DBs interoperate (no key re-issuance). +2 tests.
2026-06-02 01:45:57 -04:00
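The failure mode in the commit above (a library that rejects any on-disk schema newer than the version it knows) can be sketched as below. This is an illustration under assumptions: `check_schema_version`, `SchemaVersionError`, and the returned action strings are hypothetical names, and the "stamp" branch assumes, as the message states, that the schema itself is unchanged between versions.

```python
class SchemaVersionError(RuntimeError):
    """Raised when the on-disk schema is newer than this library understands."""


def check_schema_version(on_disk: int, supported: int) -> str:
    """Decide what to do with a store, given its stamped version.

    Returns "ok" when versions match and "stamp" when the store is older
    (assumed safe here only because the schema did not change).
    """
    if on_disk > supported:
        raise SchemaVersionError(
            f"on-disk schema_version={on_disk} is newer than supported {supported}"
        )
    if on_disk < supported:
        return "stamp"
    return "ok"
```

With `supported=1`, a store stamped 2 raises at startup; raising the supported version to 2 makes the same store open normally.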
Joseph Doherty 2d50d5dcf0 plan(phase1): 1.2/1.4 done across 3 repos (lib 0.1.1); remaining 1.3/1.5-1.7 2026-06-02 01:38:50 -04:00
Joseph Doherty aecc106657 fix(auth.ldap): skip LdapOptionsValidator when Enabled=false; bump 0.1.1
A disabled LDAP provider's connection fields are inert — don't require
Server/SearchBase/ServiceAccountDn at startup when Enabled=false. Surfaced
by the MxGateway 1.2 review (dashboard LDAP can be disabled). +1 test.
2026-06-02 01:17:53 -04:00
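The validation rule from the commit above, skipping connection-field checks when the provider is disabled, might look roughly like this. The dataclass and function are a hypothetical Python rendering for illustration only; the real validator is the .NET `LdapOptionsValidator`, and only the field names Server, SearchBase, ServiceAccountDn and Enabled come from the commit message. The `enabled=True` default is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LdapOptions:
    enabled: bool = True          # assumed default, not confirmed by the source
    server: str = ""
    search_base: str = ""
    service_account_dn: str = ""


def validate(options: LdapOptions) -> list:
    """Return a list of validation errors; empty means the options are acceptable."""
    if not options.enabled:
        # A disabled provider never connects, so its connection fields are inert.
        return []
    required = {
        "Server": options.server,
        "SearchBase": options.search_base,
        "ServiceAccountDn": options.service_account_dn,
    }
    return [
        f"{name} is required when LDAP is enabled"
        for name, value in required.items()
        if not value.strip()
    ]
```

The early return keeps a disabled provider from blocking startup while leaving the enabled path as strict as before.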
Joseph Doherty 0586e64f64 plan(phase1): record Task 1.2 review findings + LdapOptionsValidator 0.1.1 question 2026-06-02 01:12:20 -04:00
Joseph Doherty 37c03e5fc2 plan(phase1): note Roles sub-namespace; Task 1.1 done+approved (3 repos) 2026-06-02 00:34:13 -04:00
Joseph Doherty bea08f9673 plan(phase1): lock resolved decisions (SB ApiKeys full adopt, roles, dev hatches) 2026-06-02 00:25:53 -04:00
Joseph Doherty 32fd953969 plan(phase1): Task 1.0 exploration findings + elaborated Auth cutover
Per-app cutover steps mapped to the library surface; flags 5 findings that
change the plan (OtOpcUa section is Security:Ldap not Authentication:Ldap;
singleton 'bug' already mitigated; ScadaBridge inbound API keys are a
re-architecture not a reformat; OtOpcUa config+DB mapping + DevStubMode +
2nd LDAP consumer; MxGateway ApiKeys is the low-risk donor path).
2026-06-02 00:24:03 -04:00
Joseph Doherty c715565bd2 build(audit): add Gitea push.sh mirroring Auth's 2026-06-02 00:13:24 -04:00
Joseph Doherty f98fa84e4a plan: implementation plan + task graph for Auth+Audit normalization
Phase 0 command-exact (publish + feed-map); Phases 1-3 decomposed into
bite-sized cutover tasks with files-to-edit contracts, classification,
parallelizability, and per-phase explore/elaborate gates. Co-located
.tasks.json mirrors native tasks #7-#31.
2026-06-02 00:11:48 -04:00
Joseph Doherty 6ec1ea7d65 docs: design for full Auth+Audit normalization across 3 sister projects
Approved brainstorming output: two-library program (publish + adopt
ZB.MOM.WW.Auth then ZB.MOM.WW.Audit across OtOpcUa, MxAccessGateway,
ScadaBridge), library-major waterfall, ending with audit Actor wired
from the Auth principal. Local-only delivery; verified feed/source state.
2026-06-02 00:04:33 -04:00
Joseph Doherty c3ab37523a docs: record ZB.MOM.WW.Configuration fleet-wide adoption + add design/plan
Configuration is now adopted across all three sister apps (local branches),
so flip the status lines in CLAUDE.md, components/configuration/GAPS.md, and the
lib README/CLAUDE.md from 'not adopted' to adopted (also corrects 27->42 tests).
Adds the brainstorm design doc + bite-sized implementation plan (+tasks.json)
under docs/plans/ that drove the adoption.
2026-06-01 23:18:02 -04:00
Joseph Doherty 2f124fa02c docs(observability): record telemetry follow-ons DONE (metric normalization, ScadaBridge instruments, OTLP opt-in, site metrics listener, Serilog alignment) 2026-06-01 17:16:46 -04:00
Joseph Doherty 6c2a43a238 docs: plan for ZB.MOM.WW.Telemetry follow-ons (A additive/hygiene, B metric normalization, C ScadaBridge instruments, D OTLP opt-in) 2026-06-01 16:32:57 -04:00
Joseph Doherty dee55aadc6 docs(observability): record ZB.MOM.WW.Telemetry adoption across 3 apps; correct false MxGateway logging-status claim
All 3 apps adopted on branch feat/adopt-zb-telemetry (behaviour-preserving).
Records the per-repo result + accepted scope deviations (ScadaBridge keeps
LoggerConfigurationFactory + TraceContextEnricher instead of AddZbSerilog;
MxGateway keeps GatewayLogScope, exposes redaction via ILogRedactor seam) and
deferred follow-ons (#6 ms->s, #7 meter rename, #9 app instruments, OTLP, and
the new ScadaBridge Site-node HTTP/1.1 metrics-listener item). Corrects the
prior false 'MxGateway logging adopted on its own branch' claim — that migration
actually landed in this pass.
2026-06-01 15:58:10 -04:00
Joseph Doherty 30425726d4 docs: implementation plan for ZB.MOM.WW.Telemetry adoption across the 3 sister apps
13 tasks: Task 0 publishes/verifies the 2 nupkgs on Gitea (gates all); then 3
independent per-repo phases — OtOpcUa (1-3), ScadaBridge (4-6), MxGateway (7-11,
incl. the high-risk MEL->Serilog swap) — and Task 12 scadaproj bookkeeping last.
Records two behaviour-preserving refinements vs the design: ScadaBridge keeps
LoggerConfigurationFactory (+TraceContextEnricher) instead of AddZbSerilog, and
MxGateway keeps GatewayLogScope as-is. Breaking items #6/#7 deferred.
2026-06-01 15:24:28 -04:00
Joseph Doherty 3729ff2152 docs: design for ZB.MOM.WW.Telemetry adoption across the 3 sister apps
Second cross-fleet shared-library adoption (after Health). Full scope:
AddZbTelemetry (OTel Resource identity triple + standard instrumentation +
Prometheus /metrics) on all 3, plus shared Serilog on all 3 — including the
MxGateway MEL->Serilog migration. Records the correction that MxGateway's
logging was NOT actually adopted on main despite the docs' claim. Behaviour-
preserving bar; breaking items (#6 unit, #7 rename) deferred.
2026-06-01 15:11:50 -04:00
113 changed files with 148209 additions and 82 deletions
+105 -27
@@ -6,12 +6,13 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 `scadaproj` is primarily an umbrella/index workspace that aggregates a family of
 related SCADA / OT / Wonderware / OPC UA "sister projects" that live as **sibling
-directories under `~/Desktop/`**. It now also **hosts five pieces of source itself**
+directories under `~/Desktop/`**. It now also **hosts six pieces of source itself**
 the shared [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) library, the shared
 [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) UI kit, the shared
 [`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/) health-check library, the shared
-[`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) observability library, and the shared
-[`ZB.MOM.WW.Configuration/`](ZB.MOM.WW.Configuration/) config-validation library — all the realized output of their
+[`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) observability library, the shared
+[`ZB.MOM.WW.Configuration/`](ZB.MOM.WW.Configuration/) config-validation library, and the new
+[`ZB.MOM.WW.GalaxyRepository/`](ZB.MOM.WW.GalaxyRepository/) Galaxy browse library — all the realized output of their
 respective component normalizations (see [Component normalization](#component-normalization)).
 The point of this file is to give a high-level scan of each sister project — its purpose,
 location, stack, and primary commands — so a fresh Claude Code session can orient across
@@ -30,9 +31,10 @@ own `CLAUDE.md` for the full picture. See [Refreshing this index](#refreshing-th
 | Project | Location | Stack | Repo | Summary |
 |---|---|---|---|---|
-| **OtOpcUa** | `~/Desktop/OtOpcUa` | .NET 10, OPC UA, gRPC | `gitea.dohertylan.com/dohertj2/lmxopcua` | OPC UA server that exposes AVEVA System Platform (Wonderware) Galaxy tags as an OPC UA address space. Galaxy access flows through an in-process `GalaxyDriver` → gRPC → the **mxaccessgw** gateway. |
+| **OtOpcUa** | `~/Desktop/OtOpcUa` | .NET 10, OPC UA, gRPC | `gitea.dohertylan.com/dohertj2/lmxopcua` | OPC UA server that exposes industrial data sources under a **unified Equipment-based address space** — native-protocol drivers (Modbus, S7, AB CIP/Legacy, TwinCAT, FOCAS, OpcUaClient) **and AVEVA System Platform (Wonderware) Galaxy, now a standard Equipment-kind driver** (the old SystemPlatform mirror / alias-tag model was retired ~2026-06-12). Galaxy access flows through the in-process `GalaxyDriver` → gRPC → the **mxaccessgw** gateway. Surfaces live read + authorized write, native OPC UA Part 9 alarms, and server-side HistoryRead. |
 | **MxAccessGateway** (`mxaccessgw`) | `~/Desktop/MxAccessGateway` | .NET 10 gateway (x64) + .NET 4.8 worker (**x86**), gRPC | `gitea.dohertylan.com/dohertj2/mxaccessgw` | gRPC gateway giving modern clients full MXAccess parity without loading 32-bit COM. Two-process: gateway (ASP.NET Core gRPC + Blazor dashboard) + per-session x86 worker that owns the MXAccess COM STA. **OtOpcUa depends on this.** |
 | **ScadaBridge** | `~/Desktop/ScadaBridge` | .NET 10, Akka.NET, Docker | _git_ | Full implementation of the distributed SCADA platform — hub-and-spoke (1 central cluster + N site clusters). Projects prefixed `ZB.MOM.WW.ScadaBridge.*`; solution `ZB.MOM.WW.ScadaBridge.slnx`. Ships `src/`, `tests/`, `docker/` topology, and the design docs that are the spec. |
+| **HistorianGateway** | `~/Desktop/HistorianGateway` | .NET 10 x64, gRPC, Blazor | `gitea.dohertylan.com/dohertj2/historiangw` | Single-process gRPC sidecar exposing (1) full read/write API to the AVEVA Historian (5 gRPC services; 15 retrieval modes; historical/backfill writes; tag-config lifecycle; SQL live-value path; store-forward + redundancy resilience; all default-disabled) and (2) read-only Galaxy object-hierarchy browse via the shared `ZB.MOM.WW.GalaxyRepository` lib (consumed as a Gitea-feed package). No COM, no x86 worker. Dashboard on `:5220` (HTTP/1.1); gRPC h2c on `:5221`. Vendors `AVEVA.Historian.Client` from `histsdk`. 590 tests total — 584 green on macOS; the env-gated live historian + Galaxy integration suite (6 tests) skips without a live server. |
 ## Cross-project relationships
@@ -84,8 +86,10 @@ the gateway uses `MxGateway.*`). The common subject is **AVEVA System Platform (
 `GalaxyRepositoryClient` for the static hierarchy, and an MXAccess session
 (`MxCommand`/`MxEvent` protos) for live read/write/subscribe. A `DeployWatcher` polls the
 gateway's deploy-event signal to rebuild the OPC UA address space on Galaxy redeploy.
-OtOpcUa's job is purely a **protocol bridge**: it republishes Galaxy as an OPC UA address
-space for *any* OPC UA client.
+OtOpcUa's job is a **protocol bridge**: it republishes Galaxy — now bound as a *standard
+Equipment-kind driver* alongside its native-protocol drivers, not a special SystemPlatform
+mirror — as an OPC UA address space (live values, Part 9 alarms, HistoryRead) for *any* OPC
+UA client.
 - **ScadaBridge → OPC UA** (OPC UA client). ScadaBridge's DCL has an OPC UA adapter that
 collects data and mirrors native OPC UA Alarms & Conditions. OtOpcUa is exactly such a
 server, so ScadaBridge can ingest Wonderware data **indirectly via OtOpcUa**.
@@ -101,15 +105,21 @@ the gateway uses `MxGateway.*`). The common subject is **AVEVA System Platform (
- ScadaBridge has **two paths** to the same Wonderware data: (1) OPC UA → OtOpcUa → - ScadaBridge has **two paths** to the same Wonderware data: (1) OPC UA → OtOpcUa →
gateway, or (2) MxGateway adapter → gateway directly. Path 1 gives standards-based OPC UA gateway, or (2) MxGateway adapter → gateway directly. Path 1 gives standards-based OPC UA
decoupling; path 2 gives a more direct/native feed. decoupling; path 2 gives a more direct/native feed.
- **HistorianGateway is a new, independent sidecar** (no runtime coupling to the three above).
It reaches the Historian via its vendored gRPC client and the Galaxy Repository SQL DB directly,
not through `mxaccessgw`. It consumes the shared `ZB.MOM.WW.GalaxyRepository` lib
(a Gitea-feed `PackageReference`, pinned at `0.1.0`). Any client that needs Historian data or Galaxy browse can
target HistorianGateway independently; it is not a dependency of OtOpcUa or ScadaBridge today.
- Coupling is loose: each repo references the others only as **sibling context** (the - Coupling is loose: each repo references the others only as **sibling context** (the
`## Sister Projects` note in ScadaBridge's own `CLAUDE.md` lists `MxAccessGateway` and `## Sister Projects` note in ScadaBridge's own `CLAUDE.md` lists `MxAccessGateway` and
`OtOpcUa` with their Gitea URLs but states they are *not part of its solution*). `OtOpcUa` with their Gitea URLs but states they are *not part of its solution*).
- **The break surface is the wire contracts, not code.** Because coupling is by network - **The break surface is the wire contracts, not code.** Because coupling is by network
protocol, the things that break across repo boundaries are: the gateway's `.proto` files protocol, the things that break across repo boundaries are: the gateway's `.proto` files
(`mxaccess_gateway.proto`, `mxaccess_worker.proto`, `galaxy_repository.proto`), and the (`mxaccess_gateway.proto`, `mxaccess_worker.proto`, `galaxy_repository.proto`), the
OPC UA address-space shape OtOpcUa publishes (browse paths, node IDs, A&C alarm model). `historian_gateway.v1` proto (HistorianGateway's own contract), and the OPC UA address-space
Changes to any of these must be coordinated across the affected repos — a green build in shape OtOpcUa publishes (browse paths, node IDs, A&C alarm model). Changes to any of these
one repo does not prove the others still interoperate. must be coordinated across the affected repos — a green build in one repo does not prove the
others still interoperate.
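
One cheap guard against contract drift is to byte-compare the copies of a shared `.proto` file that live in different checkouts. The sketch below assumes each repo keeps its own copy of the file. The paths you pass in are hypothetical, since the index does not say where the protos live. Byte equality is stricter than wire compatibility, so a reported drift is a prompt to review, not proof of a break.

```bash
#!/usr/bin/env bash
# Sketch: detect drift between two checked-out copies of the same .proto file.
# The file locations are assumptions; pass whatever paths your checkouts use.
set -eu

# Prints "match" and returns 0 when the two files are byte-identical.
# Prints "drift" and returns 1 otherwise (including when either file is missing).
proto_in_sync() {
  if cmp -s "$1" "$2"; then
    printf 'match: %s\n' "$(basename "$1")"
    return 0
  fi
  printf 'drift: %s differs from %s\n' "$1" "$2"
  return 1
}
```

Call it inside an `if` so that a drift result does not abort a script running under `set -e`, for example `if ! proto_in_sync "$a" "$b"; then echo "coordinate the change across repos"; fi`.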
## Component normalization ## Component normalization
@@ -120,12 +130,13 @@ each project's **code-verified current state**, and the **gaps** between. See
| Component | Status | Goal | Design | Implementation | | Component | Status | Goal | Design | Implementation |
|---|---|---|---|---| |---|---|---|---|---|
| Auth (login / identity / authz) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Auth` lib | [`components/auth/`](components/auth/) | [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) | | Auth (login / identity / authz) | Adopted (lib `0.1.3`; all 3 apps, merged to **local default** main/master + **pushed to origin** (gitea)) | Shared `ZB.MOM.WW.Auth` lib | [`components/auth/`](components/auth/) | [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) |
| UI Theme (layout / tokens / components) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Theme` RCL | [`components/ui-theme/`](components/ui-theme/) | [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) | | UI Theme (layout / tokens / components) | Adopted (lib `0.2.0`; all 3 apps, merged to **local default** + **pushed to origin** (gitea)) | Shared `ZB.MOM.WW.Theme` RCL | [`components/ui-theme/`](components/ui-theme/) | [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) |
| Health (readiness / liveness / active-node) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Health` lib | [`components/health/`](components/health/) | [`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/) | | Health (readiness / liveness / active-node) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Health` lib | [`components/health/`](components/health/) | [`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/) |
| Observability (metrics / traces / logs) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Telemetry` lib + `.Serilog` | [`components/observability/`](components/observability/) | [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) | | Observability (metrics / traces / logs) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Telemetry` lib + `.Serilog` | [`components/observability/`](components/observability/) | [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) |
| Config + validation (options / startup validation) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Configuration` lib | [`components/configuration/`](components/configuration/) | [`ZB.MOM.WW.Configuration/`](ZB.MOM.WW.Configuration/) | | Config + validation (options / startup validation) | Adopted (lib `0.1.0`; all 3 apps, local) | Shared `ZB.MOM.WW.Configuration` lib | [`components/configuration/`](components/configuration/) | [`ZB.MOM.WW.Configuration/`](ZB.MOM.WW.Configuration/) |
| Audit (event model + writer seam) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Audit` lib | [`components/audit/`](components/audit/) | [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/) | | Audit (event model + writer seam) | Adopted (lib `0.1.0`; all 3 apps, merged to **local default** main/master + **pushed to origin** (gitea)) | Shared `ZB.MOM.WW.Audit` lib | [`components/audit/`](components/audit/) | [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/) |
| Galaxy Repository (object-hierarchy SQL browse + gRPC service) | Built (lib `0.1.0`, **published to the Gitea feed**; consumed by HistorianGateway as a feed `PackageReference`) | Shared `ZB.MOM.WW.GalaxyRepository` lib | _(design in histsdk + design doc 2026-06-23)_ | [`ZB.MOM.WW.GalaxyRepository/`](ZB.MOM.WW.GalaxyRepository/) |
The auth component is fully populated: a normalized [`spec`](components/auth/spec/SPEC.md), a The auth component is fully populated: a normalized [`spec`](components/auth/spec/SPEC.md), a
proposed [`shared-contract`](components/auth/shared-contract/ZB.MOM.WW.Auth.md), three proposed [`shared-contract`](components/auth/shared-contract/ZB.MOM.WW.Auth.md), three
@@ -137,7 +148,14 @@ The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Auth/`](ZB
(its own nested git repo; .NET 10; 4 packages — `Abstractions`, `Ldap`, `ApiKeys`, `AspNetCore`; (its own nested git repo; .NET 10; 4 packages — `Abstractions`, `Ldap`, `ApiKeys`, `AspNetCore`;
172 tests; `dotnet pack` → 4 nupkgs @ 0.1.0). The implementation plan is at 172 tests; `dotnet pack` → 4 nupkgs @ 0.1.0). The implementation plan is at
[`docs/plans/2026-06-01-zb-mom-ww-auth-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-auth-shared-library.md). [`docs/plans/2026-06-01-zb-mom-ww-auth-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-auth-shared-library.md).
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/auth/GAPS.md`](components/auth/GAPS.md) (#8). **Adopted across all three apps on 2026-06-02** (auth GAPS #1–#8) on each repo's `feat/adopt-zb-auth` branch —
committed + reviewed, then **fast-forward-merged into the repo's local default (main/master) and PUSHED to origin
(gitea) on 2026-06-03** (in sync; the `feat/*` branches kept locally as history). Cutover: shared `Auth.Ldap`,
`Auth.ApiKeys` (ScadaBridge inbound fully re-architected to the keyId/Bearer model), `IGroupRoleMapper<TRole>` seam,
`Transport`-enum config, canonical `ZbClaimTypes`/`ZbCookieDefaults`, unified dev base DN `dc=zb,dc=local`, and the
canonical-six role vocabulary (with ScadaBridge's accepted auditor/admin SoD collapse). Consumer pins: OtOpcUa `0.1.1`,
MxGateway `0.1.2`, ScadaBridge `0.1.3`. Per-repo detail in [`components/auth/GAPS.md`](components/auth/GAPS.md) +
`docs/plans/2026-06-02-auth-audit-normalization*.md`.
Build/test from `ZB.MOM.WW.Auth/`: `dotnet test`. Consumer matrix: OtOpcUa → Abstractions+Ldap+AspNetCore; Build/test from `ZB.MOM.WW.Auth/`: `dotnet test`. Consumer matrix: OtOpcUa → Abstractions+Ldap+AspNetCore;
MxAccessGateway & ScadaBridge → all four (ApiKeys not used by OtOpcUa). MxAccessGateway & ScadaBridge → all four (ApiKeys not used by OtOpcUa).
@@ -149,10 +167,18 @@ backlog. Shared = Technical-Light tokens + IBM Plex fonts + side-rail shell + wi
per-project = each app's `site.css` page layout, route content, scoped `.razor.css`. per-project = each app's `site.css` page layout, route content, scoped `.razor.css`.
The shared RCL is **built and lives in this repo** at [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) The shared RCL is **built and lives in this repo** at [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/)
(.NET 10 Razor Class Library; single package; 32 bUnit tests; `dotnet pack` → 1 nupkg @ 0.1.0). (.NET 10 Razor Class Library; single package; 44 bUnit tests; `dotnet pack` → 1 nupkg @ 0.2.0,
The implementation plan is at **published to the Gitea feed**). The build plan is at
[`docs/plans/2026-06-01-zb-mom-ww-theme-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-theme-shared-library.md). [`docs/plans/2026-06-01-zb-mom-ww-theme-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-theme-shared-library.md);
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/ui-theme/GAPS.md`](components/ui-theme/GAPS.md). the adoption plan at [`docs/plans/2026-06-03-ui-theme-adoption.md`](docs/plans/2026-06-03-ui-theme-adoption.md).
**Adopted across all three apps on 2026-06-03** (full canonical cutover, SPEC §7) on each repo's
`feat/adopt-zb-theme` branch — committed + spec/code-reviewed, then **fast-forward-merged into each repo's local
default (master/main) and PUSHED to origin (gitea)** (in sync; `feat/*` kept locally as history): OtOpcUa
`lmxopcua` `master`@`11de14d`, ScadaBridge `main`@`58352a6`, MxGateway→`mxaccessgw` `main`@`73e54e2`. The `0.1.0 → 0.2.0` bump first promoted nav-expand persistence
into the kit (`NavRailSection.Key`/`data-nav-key` + a localStorage `nav-state.js` enhancer emitted by a new
`<ThemeScripts/>`), so all three apps share one persistence mechanism (OtOpcUa's bespoke cookie/JS-interop nav
island retired); MxGateway additionally gained a net-new Blazor `<LoginCard>` `/login` page over its existing
hardened endpoint. Per-app result in [`components/ui-theme/GAPS.md`](components/ui-theme/GAPS.md).
Build/test from `ZB.MOM.WW.Theme/`: `dotnet test`. Consumer matrix: all three apps consume Build/test from `ZB.MOM.WW.Theme/`: `dotnet test`. Consumer matrix: all three apps consume
the single `ZB.MOM.WW.Theme` package (OtOpcUa AdminUI, MxGateway Server, ScadaBridge Host + CentralUI). the single `ZB.MOM.WW.Theme` package (OtOpcUa AdminUI, MxGateway Server, ScadaBridge Host + CentralUI).
@@ -183,9 +209,14 @@ enrichers, and redaction policies.
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/)
(.NET 10; 2 packages — `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog`; 19 tests; (.NET 10; 2 packages — `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog`; 19 tests;
`dotnet pack` → 2 nupkgs @ 0.1.0). **MxAccessGateway logging adopted** (MEL → Serilog migration done on `dotnet pack` → 2 nupkgs @ 0.1.0). **Adopted across all three apps on 2026-06-01** (branch
its own branch) — the one in-pass adoption. Broader OtOpcUa and ScadaBridge telemetry adoption is `feat/adopt-zb-telemetry` per repo, behaviour-preserving): `AddZbTelemetry` (Resource + standard
follow-on, tracked in [`components/observability/GAPS.md`](components/observability/GAPS.md). instrumentation + Prometheus `/metrics`) everywhere; OtOpcUa + MxGateway on `AddZbSerilog` (MxGateway's
MEL→Serilog migration + metrics export both landed in this pass — they were *not* actually done
beforehand despite an earlier claim); ScadaBridge keeps its `LoggerConfigurationFactory` (min-level
governance) and only adds the shared `TraceContextEnricher`. Deferred: MxGateway `ms`→`s` + Meter
rename, ScadaBridge app instruments + Site-node HTTP/1.1 metrics listener, OTLP wiring. Per-repo
result tracked in [`components/observability/GAPS.md`](components/observability/GAPS.md).
Build/test from `ZB.MOM.WW.Telemetry/`: `dotnet test`. Consumer matrix: all three apps consume both Build/test from `ZB.MOM.WW.Telemetry/`: `dotnet test`. Consumer matrix: all three apps consume both
packages after adoption (OtOpcUa, MxGateway Server, ScadaBridge Host + any instrumented project). packages after adoption (OtOpcUa, MxGateway Server, ScadaBridge Host + any instrumented project).
@@ -203,7 +234,12 @@ The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Configurat
(.NET 10; single package `ZB.MOM.WW.Configuration`; 27 tests; `dotnet pack` → 1 nupkg @ 0.1.0). (.NET 10; single package `ZB.MOM.WW.Configuration`; 27 tests; `dotnet pack` → 1 nupkg @ 0.1.0).
The implementation plan is at The implementation plan is at
[`docs/plans/2026-06-01-zb-mom-ww-configuration-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-configuration-shared-library.md). [`docs/plans/2026-06-01-zb-mom-ww-configuration-shared-library.md`](docs/plans/2026-06-01-zb-mom-ww-configuration-shared-library.md).
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/configuration/GAPS.md`](components/configuration/GAPS.md). **Adopted across all three apps on 2026-06-01** (OtOpcUa, MxAccessGateway, ScadaBridge) on each repo's
local default branch (`main`/`master`) — merged, **not yet pushed** to remotes; the package was first
published to the Gitea feed. Behaviour-preserving onto `OptionsValidatorBase`/`AddValidatedOptions`
for MxGateway + ScadaBridge (validator messages byte-identical), `StartupValidator`→`ConfigPreflight`
for ScadaBridge, and net-new `Ldap`/`OpcUa` validators for OtOpcUa. Per-app result tracked in
[`components/configuration/GAPS.md`](components/configuration/GAPS.md).
Build/test from `ZB.MOM.WW.Configuration/`: `dotnet test`. Consumer matrix: all three apps consume the Build/test from `ZB.MOM.WW.Configuration/`: `dotnet test`. Consumer matrix: all three apps consume the
single package; ScadaBridge is the heaviest adopter (per-module validators + `StartupValidator` → single package; ScadaBridge is the heaviest adopter (per-module validators + `StartupValidator` →
`ConfigPreflight`); OtOpcUa adoption is additive (it has no `IValidateOptions` usage today). `ConfigPreflight`); OtOpcUa adoption is additive (it has no `IValidateOptions` usage today).
@@ -221,10 +257,39 @@ principal. `IAuditRedactor` is aligned with Telemetry's `ILogRedactor` seam conv
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/) The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/)
(.NET 10; 1 package — `ZB.MOM.WW.Audit`; only non-BCL dependency `Microsoft.Extensions.DependencyInjection.Abstractions`; (.NET 10; 1 package — `ZB.MOM.WW.Audit`; only non-BCL dependency `Microsoft.Extensions.DependencyInjection.Abstractions`;
19 tests; `dotnet pack` → 1 nupkg @ 0.1.0). Repo: `https://gitea.dohertylan.com/dohertj2/zb-mom-ww-audit`. 19 tests; `dotnet pack` → 1 nupkg @ 0.1.0). Repo: `https://gitea.dohertylan.com/dohertj2/zb-mom-ww-audit`.
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/audit/GAPS.md`](components/audit/GAPS.md). **Adopted across all three apps on 2026-06-02** (audit GAPS #1–#6) on each repo's `feat/adopt-zb-audit` branch
(stacked on `feat/adopt-zb-auth`) — committed + reviewed, then **merged into the repo's local default (main/master)
and PUSHED to origin (gitea) on 2026-06-03** (in sync). Depth =
**DEEP adopt** (the canonical 9-field `AuditEvent` is the record everywhere; domain fields ride in `DetailsJson`).
OtOpcUa: canonical record + `AuditWriterActor : IAuditWriter` + `Outcome` column/migration + `ClusterAudit` fix.
MxGateway: new canonical SQLite `audit_event` store + `IAuditWriter` + `IApiKeyAuditStore`→canonical adapter.
**ScadaBridge: a full audit-subsystem re-architecture** (the program's largest task) — canonical record everywhere via a
deterministic codec; site SQLite split into `audit_event` + an `audit_forward_state` forwarding sidecar; central
partitioned `dbo.AuditLog` collapsed to 10 canonical cols + persisted computed cols (`CollapseAuditLogToCanonical`
migration, MSSQL-verified). Phase 3 wires `Actor` from the Auth principal at authenticated emit sites (per-app
`IAuditActorAccessor`). Per-repo detail in [`components/audit/GAPS.md`](components/audit/GAPS.md) +
`docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md` + `…-scadabridge-audit-rearch.md`.
Build/test from `ZB.MOM.WW.Audit/`: `dotnet test`. Consumer matrix: all three apps consume the single Build/test from `ZB.MOM.WW.Audit/`: `dotnet test`. Consumer matrix: all three apps consume the single
`ZB.MOM.WW.Audit` package (OtOpcUa, MxAccessGateway, ScadaBridge each map their own audit record/seam `ZB.MOM.WW.Audit` package (OtOpcUa, MxAccessGateway, ScadaBridge — DEEP-adopted as the canonical record).
onto the canonical type at the emit boundary).
The Galaxy Repository component normalizes the **Galaxy object-hierarchy SQL browse + reusable gRPC service**
that was previously embedded in `mxaccessgw`. Shared = canonical `galaxy_repository.v1` proto (wire-compatible
with `mxaccessgw`'s existing contract so OtOpcUa's `GalaxyRepositoryClient` is unaffected), the SQL browse
provider (`HierarchySql` / `AttributesSql` validated reverse-engineered queries), in-memory hierarchy cache +
snapshot + deploy-poll refresh `BackgroundService`, `GalaxyHierarchyProjector`, and `AddZbGalaxyRepository` /
`MapZbGalaxyRepository` DI extension. Left per-consumer = section path, subtree auth filtering, and any
app-specific paging defaults.
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.GalaxyRepository/`](ZB.MOM.WW.GalaxyRepository/)
(.NET 10; single package `ZB.MOM.WW.GalaxyRepository`; `dotnet pack` → 1 nupkg @ 0.1.0, **published to
the Gitea NuGet feed** `gitea.dohertylan.com/api/packages/dohertj2/nuget`). The design doc is at
[`docs/plans/2026-06-23-historian-gateway-design.md`](docs/plans/2026-06-23-historian-gateway-design.md) (§10, component 1).
**Consumed by HistorianGateway** as a `PackageReference` from that Gitea feed, pinned at `0.1.0` (originally a
cross-repo `ProjectReference` to this scadaproj tree; switched to the feed package 2026-06-24).
**mxaccessgw adoption is a tracked follow-on** — once adopted, mxaccessgw's inline Galaxy browse code is replaced
by the shared lib (the `galaxy_repository.v1` wire contract is already identical, so OtOpcUa and ScadaBridge
clients are unaffected). Build/test from `ZB.MOM.WW.GalaxyRepository/`: `dotnet test`.
Consumer matrix: HistorianGateway (initial); mxaccessgw (follow-on adoption).
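
For a new consumer, pulling the package from the feed could look roughly like the sketch below. The feed URL is the `index.json` form used by the pack-and-push script in this compare. The source name `gitea-zb` and the project path argument are assumptions, and a private feed may also need credentials that are not shown here.

```bash
#!/usr/bin/env bash
# Sketch: register the Gitea NuGet feed and pin ZB.MOM.WW.GalaxyRepository 0.1.0.
# FEED_NAME is a hypothetical label; the consumer project path is supplied by the caller.
set -eu

FEED_URL="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json"
FEED_NAME="gitea-zb"

# Echo the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = path to the consuming .csproj
add_galaxy_repository() {
  run dotnet nuget add source "$FEED_URL" --name "$FEED_NAME"
  run dotnet add "$1" package ZB.MOM.WW.GalaxyRepository --version 0.1.0
}
```

Running with `DRY_RUN=1` prints the two `dotnet` commands without touching the machine's NuGet configuration. Note that `dotnet nuget add source` reports an error if a source with the same name is already registered.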
## Per-project primary commands ## Per-project primary commands
@@ -246,9 +311,22 @@ dotnet run --project src/MxGateway.Server/MxGateway.Server.csproj
# ScadaBridge (~/Desktop/ScadaBridge) # ScadaBridge (~/Desktop/ScadaBridge)
dotnet build ZB.MOM.WW.ScadaBridge.slnx dotnet build ZB.MOM.WW.ScadaBridge.slnx
bash docker/deploy.sh # rebuild + redeploy the 8-node cluster bash docker/deploy.sh # rebuild + redeploy the 8-node cluster
cd infra && docker compose up -d # local test services (LDAP, SQL, OPC UA, SMTP, REST, Traefik) cd infra && docker compose up -d # local test services (SQL, OPC UA, SMTP, REST, Traefik) — LDAP is NOT here
# HistorianGateway (~/Desktop/HistorianGateway)
dotnet build ZB.MOM.WW.HistorianGateway.slnx
dotnet test ZB.MOM.WW.HistorianGateway.slnx # unit + golden; live integration tests skip without env vars
dotnet run --project src/ZB.MOM.WW.HistorianGateway.Server/ZB.MOM.WW.HistorianGateway.Server.csproj
# dashboard on :5220, gRPC h2c on :5221
# Live integration (need HISTORIAN_GRPC_HOST + HISTORIAN_GRPC_WRITE_SANDBOX_TAG + GALAXY_SQL_CONNSTR set)
dotnet test ZB.MOM.WW.HistorianGateway.slnx --filter "Category=LiveIntegration"
``` ```
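
Because the live integration tests skip silently when their environment is incomplete, a small preflight can make the skip visible before the filtered `dotnet test` run. This is a minimal sketch: the three variable names come from the comment above, while the function name `missing_live_env` is invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: report which live-integration variables are unset or empty.
set -eu

# Prints a space-separated list of missing variable names (an empty line if none).
missing_live_env() {
  missing=""
  for name in HISTORIAN_GRPC_HOST HISTORIAN_GRPC_WRITE_SANDBOX_TAG GALAXY_SQL_CONNSTR; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  printf '%s\n' "${missing# }"
}
```

A typical use is `m=$(missing_live_env); [ -z "$m" ] || echo "live tests will skip; unset: $m"` just before the `--filter "Category=LiveIntegration"` run.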
> **Shared GLAuth (all three apps + HistorianGateway):** LDAP auth for every local dev/test stack is provided by a
> single `zb-shared-glauth` container on the Linux fixture host **`10.100.0.35:3893`**
> (`baseDN dc=zb,dc=local`, Transport=None). Source of truth and deploy runbook:
> [`scadaproj/infra/glauth/`](infra/glauth/) (`config.toml` + `docker-compose.yml` + `README.md`).
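
A quick reachability check against the shared GLAuth instance might look like the sketch below. The host, port, and base DN are taken from the note above, and `ldap://` is used because the transport is plaintext. The bind DN and password are placeholders to be replaced with real values from `infra/glauth/config.toml`, and it is an assumption that this bind account may run a subtree search.

```bash
#!/usr/bin/env bash
# Sketch: probe the shared GLAuth container with ldapsearch (OpenLDAP client tools).
set -eu

GLAUTH_URI="ldap://10.100.0.35:3893"
GLAUTH_BASE_DN="dc=zb,dc=local"

# Echo the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = bind DN, $2 = bind password (both hypothetical placeholders).
probe_glauth() {
  run ldapsearch -x -H "$GLAUTH_URI" -D "$1" -w "$2" -b "$GLAUTH_BASE_DN" '(objectClass=*)' dn
}
```

Passing the password with `-w` exposes it in the process list; on a shared host, swapping it for `-W` makes `ldapsearch` prompt instead.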
## Refreshing this index ## Refreshing this index
This file is meant to be re-scanned when `scadaproj` is opened in Claude Code: This file is meant to be re-scanned when `scadaproj` is opened in Claude Code:
+24
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# push.sh — pack and push the ZB.MOM.WW.Audit NuGet package to the Gitea feed.
#
# Required environment variables:
# GITEA_NUGET_SOURCE — full URL of the Gitea NuGet feed
# e.g. https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json
# GITEA_NUGET_KEY — Gitea access token with package:write permission
#
# Usage:
# export GITEA_NUGET_SOURCE="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json"
# export GITEA_NUGET_KEY="your-gitea-token"
# ./build/push.sh
set -euo pipefail
: "${GITEA_NUGET_SOURCE:?set GITEA_NUGET_SOURCE to your Gitea NuGet feed URL}"
: "${GITEA_NUGET_KEY:?set GITEA_NUGET_KEY to your Gitea access token}"
dotnet pack -c Release -o ./artifacts
dotnet nuget push "./artifacts/*.nupkg" \
--source "$GITEA_NUGET_SOURCE" \
--api-key "$GITEA_NUGET_KEY" \
--skip-duplicate
+1 -1
@@ -5,7 +5,7 @@
<Nullable>enable</Nullable> <Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings> <ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion> <LangVersion>latest</LangVersion>
<Version>0.1.0</Version> <Version>0.1.3</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally> <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup> </PropertyGroup>
@@ -55,6 +55,12 @@ public interface IApiKeyAdminStore
Task<bool> RotateAsync(string keyId, byte[] newSecretHash, CancellationToken ct); Task<bool> RotateAsync(string keyId, byte[] newSecretHash, CancellationToken ct);
Task<bool> DeleteAsync(string keyId, CancellationToken ct); Task<bool> DeleteAsync(string keyId, CancellationToken ct);
/// <summary>Replaces the scope set on an existing key. Does not touch the secret. Returns false if the key does not exist.</summary>
Task<bool> SetScopesAsync(string keyId, IReadOnlySet<string> scopes, CancellationToken ct);
/// <summary>Enables (clears revoked_utc) or disables (sets revoked_utc) a key WITHOUT changing its secret. Returns false if the key does not exist.</summary>
Task<bool> SetEnabledAsync(string keyId, bool enabled, DateTimeOffset whenUtc, CancellationToken ct);
/// <summary> /// <summary>
/// Enumerates all API keys as hash-free <see cref="ApiKeyListItem"/> projections, newest first. /// Enumerates all API keys as hash-free <see cref="ApiKeyListItem"/> projections, newest first.
/// The secret hash is never selected, so callers cannot use this to recover secret material. /// The secret hash is never selected, so callers cannot use this to recover secret material.
@@ -187,6 +187,53 @@ public sealed class ApiKeyAdminCommands
return new KeyActionResult(deleted, status); return new KeyActionResult(deleted, status);
} }
/// <summary>
/// set-scopes: replaces the scope set on an existing key WITHOUT touching its secret, and
/// appends a <c>set-scopes</c> audit entry. Only the scope count is recorded in the audit
/// details — the scope values themselves are not logged verbatim.
/// All attempts are audited, including failures (key not found) — this is intentional to
/// maintain a complete security trail.
/// </summary>
public async Task<KeyActionResult> SetScopesAsync(
string keyId, IReadOnlySet<string> scopes, string? remoteAddress, CancellationToken ct)
{
ArgumentException.ThrowIfNullOrWhiteSpace(keyId);
ArgumentNullException.ThrowIfNull(scopes);
bool updated = await _adminStore.SetScopesAsync(keyId, scopes, ct).ConfigureAwait(false);
string status = updated ? "scopes-set" : "not-found";
// Record only the count, never the scope contents, to avoid leaking authority detail into audit.
await AppendAuditAsync(keyId, "set-scopes", remoteAddress, $"{status}; count={scopes.Count}", ct)
.ConfigureAwait(false);
return new KeyActionResult(updated, status);
}
/// <summary>
/// enable-key / disable-key: reversibly toggles a key's active state WITHOUT changing its
/// secret, and appends an <c>enable-key</c> (when enabling) or <c>disable-key</c> (when
/// disabling) audit entry.
/// All attempts are audited, including failures (key not found) — this is intentional to
/// maintain a complete security trail.
/// </summary>
public async Task<KeyActionResult> SetEnabledAsync(
string keyId, bool enabled, string? remoteAddress, CancellationToken ct)
{
ArgumentException.ThrowIfNullOrWhiteSpace(keyId);
DateTimeOffset now = _clock.GetUtcNow();
bool updated = await _adminStore.SetEnabledAsync(keyId, enabled, now, ct).ConfigureAwait(false);
string eventType = enabled ? "enable-key" : "disable-key";
string status = updated
? (enabled ? "enabled" : "disabled")
: "not-found";
await AppendAuditAsync(keyId, eventType, remoteAddress, status, ct).ConfigureAwait(false);
return new KeyActionResult(updated, status);
}
private string RequirePepper() private string RequirePepper()
{ {
string? pepper = _pepperProvider.GetPepper(); string? pepper = _pepperProvider.GetPepper();
@@ -4,7 +4,8 @@ using ZB.MOM.WW.Auth.Abstractions.ApiKeys;
namespace ZB.MOM.WW.Auth.ApiKeys.Sqlite; namespace ZB.MOM.WW.Auth.ApiKeys.Sqlite;
/// <summary> /// <summary>
/// SQLite-backed administration store for API keys (create, revoke, rotate, delete). /// SQLite-backed administration store for API keys (create, revoke, rotate, delete,
/// set-scopes, enable/disable).
/// </summary> /// </summary>
public sealed class SqliteApiKeyAdminStore(AuthSqliteConnectionFactory connectionFactory) : IApiKeyAdminStore public sealed class SqliteApiKeyAdminStore(AuthSqliteConnectionFactory connectionFactory) : IApiKeyAdminStore
{ {
@@ -85,6 +86,67 @@ public sealed class SqliteApiKeyAdminStore(AuthSqliteConnectionFactory connectio
return rows > 0; return rows > 0;
} }
/// <inheritdoc />
public async Task<bool> SetScopesAsync(string keyId, IReadOnlySet<string> scopes, CancellationToken ct)
{
ArgumentException.ThrowIfNullOrWhiteSpace(keyId);
ArgumentNullException.ThrowIfNull(scopes);
await using SqliteConnection connection =
await connectionFactory.OpenConnectionAsync(ct).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
command.CommandText = """
UPDATE api_keys
SET scopes = $scopes
WHERE key_id = $key_id;
""";
command.Parameters.AddWithValue("$key_id", keyId);
command.Parameters.AddWithValue("$scopes", ScopeSerializer.Serialize(scopes));
int rows = await command.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> SetEnabledAsync(string keyId, bool enabled, DateTimeOffset whenUtc, CancellationToken ct)
{
ArgumentException.ThrowIfNullOrWhiteSpace(keyId);
await using SqliteConnection connection =
await connectionFactory.OpenConnectionAsync(ct).ConfigureAwait(false);
await using SqliteCommand command = connection.CreateCommand();
// Reversible toggle: NO `revoked_utc IS NULL` guard (unlike RevokeAsync), so it works
// regardless of current state. Deliberately leaves secret_hash and last_used_utc untouched
// — that is what distinguishes re-enable from RotateAsync.
if (enabled)
{
command.CommandText = """
UPDATE api_keys
SET revoked_utc = NULL
WHERE key_id = $key_id;
""";
command.Parameters.AddWithValue("$key_id", keyId);
}
else
{
command.CommandText = """
UPDATE api_keys
SET revoked_utc = $revoked_utc
WHERE key_id = $key_id;
""";
command.Parameters.AddWithValue("$key_id", keyId);
command.Parameters.AddWithValue("$revoked_utc", whenUtc.ToString("O"));
}
int rows = await command.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc /> /// <inheritdoc />
public async Task<bool> DeleteAsync(string keyId, CancellationToken ct) public async Task<bool> DeleteAsync(string keyId, CancellationToken ct)
{ {
@@ -5,8 +5,15 @@ namespace ZB.MOM.WW.Auth.ApiKeys.Sqlite;
/// </summary> /// </summary>
public static class SqliteAuthSchema public static class SqliteAuthSchema
{ {
/// <summary>The schema version this build creates and supports.</summary> /// <summary>
public const int CurrentVersion = 1; /// The schema version this build creates and supports. This is <c>2</c>, not <c>1</c>,
/// to match the deployed databases of the donor (MxAccessGateway) this store was
/// extracted from: that store reached its final shape via a v1→v2 history and stamps
/// <c>version = 2</c> on disk. The final schema has been byte-identical since v1, so a
/// single-shot create stamped as 2 interoperates with existing <c>gateway-auth.db</c>
/// files (the migrator only refuses an on-disk version <em>newer</em> than this).
/// </summary>
public const int CurrentVersion = 2;
/// <summary>Name of the single-row table tracking the applied schema version.</summary> /// <summary>Name of the single-row table tracking the applied schema version.</summary>
public const string SchemaVersionTable = "schema_version"; public const string SchemaVersionTable = "schema_version";
@@ -35,7 +35,7 @@ public sealed class SqliteAuthStoreMigrator(AuthSqliteConnectionFactory connecti
$"Auth database schema version {existingVersion} is newer than supported version {SqliteAuthSchema.CurrentVersion}."); $"Auth database schema version {existingVersion} is newer than supported version {SqliteAuthSchema.CurrentVersion}.");
} }
await ApplyVersionOneAsync(connection, transaction, cancellationToken).ConfigureAwait(false); await ApplySchemaAsync(connection, transaction, cancellationToken).ConfigureAwait(false);
await WriteSchemaVersionAsync(connection, transaction, cancellationToken).ConfigureAwait(false); await WriteSchemaVersionAsync(connection, transaction, cancellationToken).ConfigureAwait(false);
await transaction.CommitAsync(cancellationToken).ConfigureAwait(false); await transaction.CommitAsync(cancellationToken).ConfigureAwait(false);
@@ -78,7 +78,10 @@ public sealed class SqliteAuthStoreMigrator(AuthSqliteConnectionFactory connecti
: Convert.ToInt32(version, CultureInfo.InvariantCulture); : Convert.ToInt32(version, CultureInfo.InvariantCulture);
} }
private static async Task ApplyVersionOneAsync( // Single-shot create of the final schema (all DDL is CREATE ... IF NOT EXISTS, so it is
// idempotent against an already-provisioned database). The applied version is stamped
// separately by WriteSchemaVersionAsync.
private static async Task ApplySchemaAsync(
SqliteConnection connection, SqliteConnection connection,
SqliteTransaction transaction, SqliteTransaction transaction,
CancellationToken cancellationToken) CancellationToken cancellationToken)
@@ -9,7 +9,9 @@ namespace ZB.MOM.WW.Auth.Ldap;
/// low-level error on the first real login attempt. /// low-level error on the first real login attempt.
/// </summary> /// </summary>
/// <remarks> /// <remarks>
/// Four conditions are enforced: /// Validation is skipped entirely when <see cref="LdapOptions.Enabled"/> is <c>false</c>
/// (a disabled provider's connection fields are inert). When enabled, four conditions
/// are enforced:
/// <list type="bullet"> /// <list type="bullet">
/// <item>plaintext transport (<see cref="LdapTransport.None"/>) is rejected unless /// <item>plaintext transport (<see cref="LdapTransport.None"/>) is rejected unless
/// <see cref="LdapOptions.AllowInsecure"/> is explicitly set (dev/test only);</item> /// <see cref="LdapOptions.AllowInsecure"/> is explicitly set (dev/test only);</item>
@@ -27,6 +29,14 @@ public sealed class LdapOptionsValidator : IValidateOptions<LdapOptions>
{
ArgumentNullException.ThrowIfNull(options);
// When LDAP is disabled, its connection fields are inert — do not require them.
// A consumer that turns LDAP off should not have to supply a server/search-base/
// service-account just to satisfy startup validation.
if (!options.Enabled)
{
return ValidateOptionsResult.Success;
}
if (options.Transport == LdapTransport.None && !options.AllowInsecure)
{
return ValidateOptionsResult.Fail(
@@ -292,6 +292,59 @@ public sealed class ApiKeyAdminCommandsTests : IAsyncLifetime
Assert.Equal(auditCountBefore, auditCountAfter);
}
// --- set-scopes / enable-disable ---
[Fact]
public async Task SetEnabledAsync_And_SetScopesAsync_AppendAuditEntries()
{
ApiKeyAdminCommands commands = BuildCommands();
await commands.InitDbAsync(null, CancellationToken.None);
await commands.CreateKeyAsync(
"key-1",
"Service A",
new HashSet<string>(["read"], StringComparer.Ordinal),
null,
null,
CancellationToken.None);
// Disable, then re-enable, then replace scopes.
KeyActionResult disabled =
await commands.SetEnabledAsync("key-1", enabled: false, "10.0.0.1", CancellationToken.None);
Assert.True(disabled.Succeeded);
Assert.Null(await _read.FindActiveByKeyIdAsync("key-1", CancellationToken.None));
KeyActionResult enabled =
await commands.SetEnabledAsync("key-1", enabled: true, "10.0.0.1", CancellationToken.None);
Assert.True(enabled.Succeeded);
Assert.NotNull(await _read.FindActiveByKeyIdAsync("key-1", CancellationToken.None));
KeyActionResult scoped = await commands.SetScopesAsync(
"key-1",
new HashSet<string>(["read", "write"], StringComparer.Ordinal),
"10.0.0.1",
CancellationToken.None);
Assert.True(scoped.Succeeded);
IReadOnlyList<ApiKeyAuditEntry> recent = await _audit.ListRecentAsync(50, CancellationToken.None);
Assert.Single(recent, e => e.EventType == "disable-key");
Assert.Single(recent, e => e.EventType == "enable-key");
Assert.Single(recent, e => e.EventType == "set-scopes");
IReadOnlyList<ApiKeyListItem> listed = await commands.ListKeysAsync(CancellationToken.None);
ApiKeyListItem item = Assert.Single(listed, k => k.KeyId == "key-1");
Assert.True(item.Scopes.SetEquals(new HashSet<string>(["read", "write"], StringComparer.Ordinal)));
}
[Fact]
public async Task SetScopesAsync_NullScopes_Throws()
{
ApiKeyAdminCommands commands = BuildCommands();
await commands.InitDbAsync(null, CancellationToken.None);
await Assert.ThrowsAnyAsync<ArgumentException>(() =>
commands.SetScopesAsync("key-1", null!, null, CancellationToken.None));
}
// --- delete-key ---
[Fact]
@@ -105,6 +105,87 @@ public sealed class SqliteApiKeyAdminStoreTests : IAsyncLifetime
Assert.False(result);
}
// --- SetScopes ---
[Fact]
public async Task SetScopesAsync_ReplacesScopes_AndReturnsTrue()
{
await _admin.CreateAsync(
SampleRecord("key-1") with { Scopes = new HashSet<string>(["a"], StringComparer.Ordinal) },
CancellationToken.None);
bool result = await _admin.SetScopesAsync(
"key-1",
new HashSet<string>(["b", "c"], StringComparer.Ordinal),
CancellationToken.None);
Assert.True(result);
IReadOnlyList<ApiKeyListItem> listed = await _admin.ListAsync(CancellationToken.None);
ApiKeyListItem item = Assert.Single(listed, k => k.KeyId == "key-1");
Assert.True(item.Scopes.SetEquals(new HashSet<string>(["b", "c"], StringComparer.Ordinal)));
}
[Fact]
public async Task SetScopesAsync_UnknownKey_ReturnsFalse()
{
bool result = await _admin.SetScopesAsync(
"missing",
new HashSet<string>(["b"], StringComparer.Ordinal),
CancellationToken.None);
Assert.False(result);
}
// --- SetEnabled ---
[Fact]
public async Task SetEnabledAsync_False_DisablesKey()
{
await _admin.CreateAsync(SampleRecord("key-1"), CancellationToken.None);
var when = new DateTimeOffset(2026, 5, 31, 9, 0, 0, TimeSpan.Zero);
bool result = await _admin.SetEnabledAsync("key-1", enabled: false, when, CancellationToken.None);
Assert.True(result);
Assert.Null(await _read.FindActiveByKeyIdAsync("key-1", CancellationToken.None));
ApiKeyRecord? found = await _read.FindByKeyIdAsync("key-1", CancellationToken.None);
Assert.Equal(when, found!.RevokedUtc);
}
[Fact]
public async Task SetEnabledAsync_True_ReenablesKey_WithoutChangingSecret()
{
ApiKeyRecord original = SampleRecord("key-1");
await _admin.CreateAsync(original, CancellationToken.None);
// Record some usage so we can prove last_used_utc is left untouched on re-enable.
var used = new DateTimeOffset(2026, 5, 20, 12, 0, 0, TimeSpan.Zero);
await _read.MarkUsedAsync("key-1", used, CancellationToken.None);
// Disable, then re-enable.
await _admin.SetEnabledAsync(
"key-1", enabled: false, new DateTimeOffset(2026, 5, 31, 9, 0, 0, TimeSpan.Zero), CancellationToken.None);
bool result = await _admin.SetEnabledAsync(
"key-1", enabled: true, new DateTimeOffset(2026, 6, 1, 9, 0, 0, TimeSpan.Zero), CancellationToken.None);
Assert.True(result);
// Active again, and the secret hash + last-used timestamp are unchanged.
ApiKeyRecord? active = await _read.FindActiveByKeyIdAsync("key-1", CancellationToken.None);
Assert.NotNull(active);
Assert.True(active!.SecretHash.SequenceEqual(original.SecretHash));
Assert.Null(active.RevokedUtc);
Assert.Equal(used, active.LastUsedUtc);
}
[Fact]
public async Task SetEnabledAsync_UnknownKey_ReturnsFalse()
{
bool result = await _admin.SetEnabledAsync(
"missing", enabled: false, DateTimeOffset.UtcNow, CancellationToken.None);
Assert.False(result);
}
// --- Delete ---
[Fact]
@@ -172,6 +253,73 @@ public sealed class SqliteApiKeyAdminStoreTests : IAsyncLifetime
() => _admin.DeleteAsync(keyId!, CancellationToken.None));
}
[Theory]
[InlineData(null)]
[InlineData("")]
[InlineData(" ")]
public async Task SetScopesAsync_NullOrWhitespaceKeyId_ThrowsArgumentException(string? keyId)
{
await Assert.ThrowsAnyAsync<ArgumentException>(
() => _admin.SetScopesAsync(
keyId!,
new HashSet<string>(["read"], StringComparer.Ordinal),
CancellationToken.None));
}
[Theory]
[InlineData(null)]
[InlineData("")]
[InlineData(" ")]
public async Task SetEnabledAsync_NullOrWhitespaceKeyId_ThrowsArgumentException(string? keyId)
{
await Assert.ThrowsAnyAsync<ArgumentException>(
() => _admin.SetEnabledAsync(keyId!, enabled: false, DateTimeOffset.UtcNow, CancellationToken.None));
}
[Fact]
public async Task SetScopesAsync_NullScopes_ThrowsArgumentNullException()
{
await _admin.CreateAsync(SampleRecord("key-1"), CancellationToken.None);
await Assert.ThrowsAsync<ArgumentNullException>(
() => _admin.SetScopesAsync("key-1", null!, CancellationToken.None));
}
// --- SetEnabled idempotence ---
[Fact]
public async Task SetEnabledAsync_OnAlreadyActiveKey_ReturnsTrue()
{
await _admin.CreateAsync(SampleRecord("key-1"), CancellationToken.None);
bool result = await _admin.SetEnabledAsync(
"key-1", enabled: true, DateTimeOffset.UtcNow, CancellationToken.None);
Assert.True(result);
ApiKeyRecord? active = await _read.FindActiveByKeyIdAsync("key-1", CancellationToken.None);
Assert.NotNull(active);
Assert.Null(active!.RevokedUtc);
}
[Fact]
public async Task SetEnabledAsync_OnAlreadyDisabledKey_OverwritesTimestamp_ReturnsTrue()
{
await _admin.CreateAsync(SampleRecord("key-1"), CancellationToken.None);
var t1 = new DateTimeOffset(2026, 5, 1, 10, 0, 0, TimeSpan.Zero);
var t2 = new DateTimeOffset(2026, 5, 15, 10, 0, 0, TimeSpan.Zero);
// Disable at t1.
await _admin.SetEnabledAsync("key-1", enabled: false, t1, CancellationToken.None);
// Disable again at a later t2 (idempotent overwrite — no guard on revoked_utc).
bool result = await _admin.SetEnabledAsync("key-1", enabled: false, t2, CancellationToken.None);
Assert.True(result);
IReadOnlyList<ApiKeyListItem> listed = await _admin.ListAsync(CancellationToken.None);
ApiKeyListItem item = Assert.Single(listed, k => k.KeyId == "key-1");
Assert.Equal(t2, item.RevokedUtc);
}
// --- Audit ---
[Fact]
@@ -34,6 +34,27 @@ public sealed class SqliteMigratorTests : IDisposable
Assert.Equal(1, await CountSchemaVersionRowsAsync());
}
[Fact]
public void CurrentVersion_Is2_ToMatchDonorGatewayDeployedSchema() =>
// The store was extracted from MxAccessGateway, whose deployed gateway-auth.db is
// stamped version 2. The library must stamp 2 (not reset to 1) so it does not refuse
// those existing databases on first boot. Locking this invariant.
Assert.Equal(2, SqliteAuthSchema.CurrentVersion);
[Fact]
public async Task MigrateAsync_AgainstExistingVersion2Db_DoesNotThrow_AndStaysAt2()
{
// The deployed-gateway scenario: a database already provisioned at version 2.
var migrator = new SqliteAuthStoreMigrator(Factory);
await migrator.MigrateAsync(CancellationToken.None);
await SetVersionAsync(2);
await migrator.MigrateAsync(CancellationToken.None); // must not throw
Assert.Equal(2, await ReadVersionAsync());
Assert.True(await TableExistsAsync(SqliteAuthSchema.ApiKeysTable));
}
[Fact]
public async Task MigrateAsync_FutureSchemaVersion_Throws()
{
@@ -72,4 +72,20 @@ public class LdapOptionsValidatorTests
Assert.False(new LdapOptionsValidator()
.Validate(null, Opts())
.Failed);
[Fact]
public void Validator_Skips_AllChecks_WhenDisabled() =>
// When LDAP is disabled its connection fields are inert; an otherwise-invalid
// config (plaintext + blank Server/SearchBase/ServiceAccountDn) must still pass.
Assert.False(new LdapOptionsValidator()
.Validate(null, new LdapOptions
{
Enabled = false,
Transport = LdapTransport.None,
AllowInsecure = false,
Server = "",
SearchBase = "",
ServiceAccountDn = "",
})
.Failed);
}
+2 -2
@@ -4,7 +4,7 @@ Startup configuration-validation library for the **ZB.MOM.WW SCADA family** (OtO
The library normalizes the three-project configuration-validation surface: a failure-accumulating `IValidateOptions` base, reusable rule primitives, a bind+validate+`ValidateOnStart` DI extension, and a pre-host `ConfigPreflight` aggregator for raw `IConfiguration` — so the plumbing is written once and domain rules stay per-project.
-**Built at 0.1.0. Not yet adopted by OtOpcUa, MxAccessGateway, or ScadaBridge.** Adoption tracked in `~/Desktop/scadaproj/components/configuration/GAPS.md`.
+**Built at 0.1.0. Adopted by OtOpcUa, MxAccessGateway, and ScadaBridge on 2026-06-01** (local default branches; not yet pushed to remotes). Adoption tracked in `~/Desktop/scadaproj/components/configuration/GAPS.md`.
---
@@ -66,7 +66,7 @@ ZB.MOM.WW.Configuration/
## Status
-Part of the **scadaproj component-normalization family** — this is the configuration + validation component. Built at **0.1.0**. **Not yet adopted by OtOpcUa, MxAccessGateway, or ScadaBridge** — follow-on adoption is tracked in:
+Part of the **scadaproj component-normalization family** — this is the configuration + validation component. Built at **0.1.0**. **Adopted by OtOpcUa, MxAccessGateway, and ScadaBridge on 2026-06-01** (local default branches; not yet pushed to remotes) — per-app result is tracked in:
- `~/Desktop/scadaproj/components/configuration/GAPS.md`
+1 -1
@@ -101,7 +101,7 @@ No third-party packages; no ASP.NET Core framework reference.
## Status
-**Built at 0.1.0. Not yet adopted by the three apps.** Adoption is tracked in the component backlog:
+**Built at 0.1.0. Adopted across all three apps on 2026-06-01** (local default branches; not yet pushed to remotes). Adoption is tracked in the component backlog:
- `~/Desktop/scadaproj/components/configuration/GAPS.md`
@@ -0,0 +1,11 @@
<Project>
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<Version>0.1.0</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
</PropertyGroup>
</Project>
@@ -0,0 +1,24 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<!-- Library -->
<PackageVersion Include="Microsoft.Data.SqlClient" Version="6.0.2" />
<PackageVersion Include="Grpc.AspNetCore" Version="2.76.0" />
<!-- Google.Protobuf and Grpc.Tools must be >= the minimums Grpc.AspNetCore 2.76.0 requires -->
<PackageVersion Include="Google.Protobuf" Version="3.31.1" />
<PackageVersion Include="Microsoft.Extensions.Hosting.Abstractions" Version="10.0.0" />
<PackageVersion Include="Microsoft.Extensions.Options.ConfigurationExtensions" Version="10.0.0" />
<PackageVersion Include="Grpc.Tools" Version="2.76.0" />
<!-- Test -->
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
</ItemGroup>
</Project>
@@ -0,0 +1,8 @@
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.GalaxyRepository/ZB.MOM.WW.GalaxyRepository.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.GalaxyRepository.Tests/ZB.MOM.WW.GalaxyRepository.Tests.csproj" />
</Folder>
</Solution>
@@ -0,0 +1,71 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository.DependencyInjection;
/// <summary>
/// Dependency-injection and endpoint-routing extensions that register the reusable
/// Galaxy Repository services and map the canonical gRPC service. A consuming gateway
/// calls <see cref="AddZbGalaxyRepository"/> during service registration and
/// <see cref="MapZbGalaxyRepository"/> while building its endpoint pipeline.
/// </summary>
public static class GalaxyRepositoryServiceCollectionExtensions
{
/// <summary>
/// Registers the Galaxy Repository SQL provider, shared hierarchy cache, deploy
/// notifier, on-disk snapshot store, and the background refresh service, binding
/// <see cref="GalaxyRepositoryOptions"/> from the supplied configuration section.
/// </summary>
/// <param name="services">The service collection to add registrations to.</param>
/// <param name="configuration">The application configuration root.</param>
/// <param name="sectionPath">
/// The configuration section path to bind <see cref="GalaxyRepositoryOptions"/> from
/// (for example <c>MxGateway:Galaxy</c> or <c>HistorianGateway:Galaxy</c>).
/// </param>
/// <returns>The service collection for chaining.</returns>
public static IServiceCollection AddZbGalaxyRepository(
this IServiceCollection services,
IConfiguration configuration,
string sectionPath)
{
ArgumentNullException.ThrowIfNull(services);
ArgumentNullException.ThrowIfNull(configuration);
ArgumentException.ThrowIfNullOrWhiteSpace(sectionPath);
// Bind only — this shared lib ships no validator, so a .ValidateOnStart() here
// would be a silent no-op. The consuming application owns option validation
// (e.g. the sidecar's ConfigPreflight / validated-options layer).
services
.AddOptions<GalaxyRepositoryOptions>()
.Bind(configuration.GetSection(sectionPath));
services.AddSingleton(sp =>
new GalaxyRepository(sp.GetRequiredService<IOptions<GalaxyRepositoryOptions>>().Value));
services.AddSingleton<IGalaxyRepository>(sp => sp.GetRequiredService<GalaxyRepository>());
services.AddSingleton<IGalaxyDeployNotifier, GalaxyDeployNotifier>();
services.AddSingleton<IGalaxyHierarchySnapshotStore, GalaxyHierarchySnapshotStore>();
services.AddSingleton<IGalaxyHierarchyCache, GalaxyHierarchyCache>();
services.AddHostedService<GalaxyHierarchyRefreshService>();
return services;
}
/// <summary>
/// Maps the canonical <see cref="GalaxyRepositoryGrpcService"/> onto the consuming
/// application's endpoint pipeline. Call after <see cref="AddZbGalaxyRepository"/> and
/// after gRPC has been added to the application's services.
/// </summary>
/// <param name="endpoints">The endpoint route builder to map the gRPC service onto.</param>
/// <returns>The endpoint route builder for chaining.</returns>
public static IEndpointRouteBuilder MapZbGalaxyRepository(this IEndpointRouteBuilder endpoints)
{
ArgumentNullException.ThrowIfNull(endpoints);
endpoints.MapGrpcService<GalaxyRepositoryGrpcService>();
return endpoints;
}
}
@@ -0,0 +1,41 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>One row from <see cref="GalaxyRepository.GetAttributesAsync"/>.</summary>
public sealed class GalaxyAttributeRow
{
/// <summary>Gets the Galaxy object identifier.</summary>
public int GobjectId { get; init; }
/// <summary>Gets the tag name.</summary>
public string TagName { get; init; } = string.Empty;
/// <summary>Gets the attribute name.</summary>
public string AttributeName { get; init; } = string.Empty;
/// <summary>Gets the full tag reference.</summary>
public string FullTagReference { get; init; } = string.Empty;
/// <summary>Gets the MXAccess data type code.</summary>
public int MxDataType { get; init; }
/// <summary>Gets the data type name.</summary>
public string? DataTypeName { get; init; }
/// <summary>Gets a value indicating whether this is an array.</summary>
public bool IsArray { get; init; }
/// <summary>Gets the array dimension, if applicable.</summary>
public int? ArrayDimension { get; init; }
/// <summary>Gets the MXAccess attribute category code.</summary>
public int MxAttributeCategory { get; init; }
/// <summary>Gets the security classification code.</summary>
public int SecurityClassification { get; init; }
/// <summary>Gets a value indicating whether this is historized.</summary>
public bool IsHistorized { get; init; }
/// <summary>Gets a value indicating whether this is an alarm.</summary>
public bool IsAlarm { get; init; }
}
@@ -0,0 +1,19 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Result of one <see cref="GalaxyBrowseProjector.ProjectChildren"/> call. Holds a
/// materialized page of direct children for the requested parent, along with a
/// parallel-indexed <see cref="ChildHasChildren"/> hint and the total post-filter
/// sibling count for paging.
/// </summary>
/// <param name="Children">The page of direct children, sorted areas-first then by display name.</param>
/// <param name="ChildHasChildren">Parallel array indicating whether each child has at least one matching descendant under the same filter set.</param>
/// <param name="TotalChildCount">Total matching direct children of the parent (post-filter).</param>
/// <param name="FilterSignature">Stable signature of the filter and parent selector, used to bind page tokens.</param>
public sealed record GalaxyBrowseChildrenResult(
IReadOnlyList<GalaxyObject> Children,
IReadOnlyList<bool> ChildHasChildren,
int TotalChildCount,
string FilterSignature);
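Reviewer note: the record above pairs a page of children with a parallel-indexed has-children list and a post-filter total. The following is a minimal sketch of how such a page could be sliced and how a caller might decide whether another page exists. `BrowsePageSketch`, `Slice`, and `HasMorePages` are hypothetical names used only for illustration, and plain strings stand in for `GalaxyObject`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not library code: string names stand in for GalaxyObject.
public static class BrowsePageSketch
{
    // Slices [offset, offset + pageSize) out of two parallel lists.
    public static (List<string> Children, List<bool> ChildHasChildren, int TotalChildCount) Slice(
        IReadOnlyList<string> filteredNames,
        IReadOnlyList<bool> filteredHasChildren,
        int offset,
        int pageSize)
    {
        // Widen to long before adding so a huge pageSize cannot overflow int.
        int end = (int)Math.Min((long)offset + pageSize, filteredNames.Count);
        int capacity = Math.Max(0, end - offset);
        List<string> children = new(capacity);
        List<bool> hasChildren = new(capacity);
        for (int index = offset; index < end; index++)
        {
            // Both lists are filled in lockstep so index i describes the same child.
            children.Add(filteredNames[index]);
            hasChildren.Add(filteredHasChildren[index]);
        }

        return (children, hasChildren, filteredNames.Count);
    }

    // A caller can tell whether another page exists from the total alone.
    public static bool HasMorePages(int offset, int returned, int totalChildCount) =>
        (long)offset + returned < totalChildCount;
}
```

Because the total is the post-filter sibling count, an offset past the end simply yields an empty page with the same total rather than an error.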
@@ -0,0 +1,281 @@
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;
using System.Text;
using Grpc.Core;
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Projects one level of children of a parent object out of an immutable
/// <see cref="GalaxyHierarchyCacheEntry"/>. Pure and side-effect free. Memoizes the
/// filtered child list per cache-entry instance so repeated paging is an O(pageSize)
/// slice rather than an O(siblings) filter scan per page. The memo is keyed on the
/// immutable cache entry, so when the cache publishes a new entry the stale memo
/// becomes unreachable and is reclaimed with it.
/// </summary>
public static class GalaxyBrowseProjector
{
private static readonly ConditionalWeakTable<
GalaxyHierarchyCacheEntry,
ConcurrentDictionary<string, FilteredChildren>> FilteredChildrenCache = new();
/// <summary>Projects one page of direct children of the resolved parent.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry to query.</param>
/// <param name="request">The browse-children request.</param>
/// <param name="browseSubtreeGlobs">Optional API-key browse-subtree constraints.</param>
/// <param name="offset">Zero-based offset into the filtered child list.</param>
/// <param name="pageSize">Maximum number of children to return.</param>
public static GalaxyBrowseChildrenResult ProjectChildren(
GalaxyHierarchyCacheEntry entry,
BrowseChildrenRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int offset,
int pageSize)
{
ArgumentNullException.ThrowIfNull(entry);
ArgumentNullException.ThrowIfNull(request);
if (offset < 0)
{
throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be greater than or equal to zero.");
}
if (pageSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(pageSize), pageSize, "Page size must be greater than zero.");
}
int parentId = ResolveParentId(entry, request);
string filterSignature = ComputeFilterSignature(request, browseSubtreeGlobs, parentId);
FilteredChildren filtered = GetFilteredChildren(entry, request, browseSubtreeGlobs, parentId, filterSignature);
bool includeAttributes = IncludeAttributes(request);
int end = (int)Math.Min((long)offset + pageSize, filtered.Children.Count);
List<GalaxyObject> page = new(Math.Max(0, end - offset));
List<bool> hasChildren = new(Math.Max(0, end - offset));
for (int index = offset; index < end; index++)
{
page.Add(CloneObject(filtered.Children[index].Object, includeAttributes));
hasChildren.Add(filtered.HasMatchingDescendant[index]);
}
return new GalaxyBrowseChildrenResult(page, hasChildren, filtered.Children.Count, filterSignature);
}
/// <summary>
/// Resolves the request's parent oneof to a gobject id, throwing
/// <see cref="RpcException"/> with <see cref="StatusCode.NotFound"/> when the
/// parent does not exist. Public so the gRPC handler can compute the same
/// parent id (needed for the page-token signature) without reimplementing the
/// resolution rules.
/// </summary>
/// <param name="entry">The Galaxy hierarchy cache entry to query.</param>
/// <param name="request">The browse-children request.</param>
public static int ResolveParentId(GalaxyHierarchyCacheEntry entry, BrowseChildrenRequest request)
{
switch (request.ParentCase)
{
case BrowseChildrenRequest.ParentOneofCase.None:
return 0;
case BrowseChildrenRequest.ParentOneofCase.ParentGobjectId:
if (request.ParentGobjectId == 0)
{
return 0;
}
if (!entry.Index.ObjectViewsById.ContainsKey(request.ParentGobjectId))
{
throw new RpcException(new Status(StatusCode.NotFound, "BrowseChildren parent was not found."));
}
return request.ParentGobjectId;
case BrowseChildrenRequest.ParentOneofCase.ParentTagName:
{
if (!entry.Index.ObjectViewsByTagName.TryGetValue(request.ParentTagName, out GalaxyObjectView? match))
{
throw new RpcException(new Status(StatusCode.NotFound, "BrowseChildren parent was not found."));
}
return match.Object.GobjectId;
}
case BrowseChildrenRequest.ParentOneofCase.ParentContainedPath:
{
if (!entry.Index.ObjectViewsByContainedPath.TryGetValue(request.ParentContainedPath, out GalaxyObjectView? match))
{
throw new RpcException(new Status(StatusCode.NotFound, "BrowseChildren parent was not found."));
}
return match.Object.GobjectId;
}
default:
return 0;
}
}
private static FilteredChildren GetFilteredChildren(
GalaxyHierarchyCacheEntry entry,
BrowseChildrenRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int parentId,
string filterSignature)
{
ConcurrentDictionary<string, FilteredChildren> memo =
FilteredChildrenCache.GetValue(entry, static _ => new ConcurrentDictionary<string, FilteredChildren>(StringComparer.Ordinal));
return memo.GetOrAdd(
filterSignature,
static (_, state) =>
{
IReadOnlyDictionary<int, IReadOnlyList<GalaxyObjectView>> map = state.Entry.Index.ChildrenByParent;
IReadOnlyList<GalaxyObjectView> directChildren = map.TryGetValue(state.ParentId, out IReadOnlyList<GalaxyObjectView>? list)
? list
: Array.Empty<GalaxyObjectView>();
List<GalaxyObjectView> matched = [];
List<bool> hasMatching = [];
foreach (GalaxyObjectView view in directChildren)
{
if (!MatchesBrowseSubtrees(view, state.BrowseSubtreeGlobs))
{
continue;
}
if (!MatchesFilters(view.Object, state.Request))
{
// Even if the direct child itself fails the filter, a matching
// descendant should still surface its ancestor — but only when
// there is one. Mirror the dashboard browse-tree semantics: if a
// descendant matches, include the parent with has-children true.
if (HasMatchingDescendant(view, state.Entry.Index, state.Request, state.BrowseSubtreeGlobs))
{
matched.Add(view);
hasMatching.Add(true);
}
continue;
}
matched.Add(view);
hasMatching.Add(HasMatchingDescendant(view, state.Entry.Index, state.Request, state.BrowseSubtreeGlobs));
}
return new FilteredChildren(matched, hasMatching);
},
(Entry: entry, ParentId: parentId, Request: request, BrowseSubtreeGlobs: browseSubtreeGlobs));
}
private static bool HasMatchingDescendant(
GalaxyObjectView parent,
GalaxyHierarchyIndex index,
BrowseChildrenRequest request,
IReadOnlyList<string>? browseSubtreeGlobs)
{
if (!index.ChildrenByParent.TryGetValue(parent.Object.GobjectId, out IReadOnlyList<GalaxyObjectView>? children))
{
return false;
}
// Defend against pathological cycles in Galaxy data (e.g. a corrupt A→B→A chain).
// BuildContainedPath uses the same visited-id pattern; mirror it so this walk
// terminates even when ChildrenByParent forms a cycle.
HashSet<int> visited = new() { parent.Object.GobjectId };
Stack<GalaxyObjectView> stack = new();
foreach (GalaxyObjectView child in children)
{
if (visited.Add(child.Object.GobjectId))
{
stack.Push(child);
}
}
while (stack.Count > 0)
{
GalaxyObjectView candidate = stack.Pop();
if (MatchesBrowseSubtrees(candidate, browseSubtreeGlobs)
&& MatchesFilters(candidate.Object, request))
{
return true;
}
if (index.ChildrenByParent.TryGetValue(candidate.Object.GobjectId, out IReadOnlyList<GalaxyObjectView>? grandchildren))
{
foreach (GalaxyObjectView grandchild in grandchildren)
{
if (visited.Add(grandchild.Object.GobjectId))
{
stack.Push(grandchild);
}
}
}
}
return false;
}
private static bool MatchesBrowseSubtrees(GalaxyObjectView view, IReadOnlyList<string>? browseSubtreeGlobs)
{
return browseSubtreeGlobs is null
|| browseSubtreeGlobs.Count == 0
|| browseSubtreeGlobs.Any(glob => GalaxyGlobMatcher.IsMatch(view.ContainedPath, glob));
}
private static bool MatchesFilters(GalaxyObject obj, BrowseChildrenRequest request)
{
if (request.CategoryIds.Count > 0 && !request.CategoryIds.Contains(obj.CategoryId))
{
return false;
}
foreach (string templateFilter in request.TemplateChainContains)
{
if (!obj.TemplateChain.Any(template => template.Contains(templateFilter, StringComparison.OrdinalIgnoreCase)))
{
return false;
}
}
if (!string.IsNullOrWhiteSpace(request.TagNameGlob)
&& !GalaxyGlobMatcher.IsMatch(obj.TagName, request.TagNameGlob))
{
return false;
}
if (request.AlarmBearingOnly && !obj.Attributes.Any(attribute => attribute.IsAlarm))
{
return false;
}
if (request.HistorizedOnly && !obj.Attributes.Any(attribute => attribute.IsHistorized))
{
return false;
}
return true;
}
private static bool IncludeAttributes(BrowseChildrenRequest request)
{
return !request.HasIncludeAttributes || request.IncludeAttributes;
}
private static GalaxyObject CloneObject(GalaxyObject source, bool includeAttributes)
{
GalaxyObject clone = source.Clone();
if (!includeAttributes)
{
clone.Attributes.Clear();
}
return clone;
}
/// <summary>Computes a stable filter signature for memoization purposes.</summary>
/// <param name="request">The browse-children request.</param>
/// <param name="browseSubtreeGlobs">Optional API-key browse-subtree constraints.</param>
/// <param name="parentId">Resolved parent gobject id (0 for roots).</param>
public static string ComputeFilterSignature(
BrowseChildrenRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int parentId)
{
StringBuilder builder = new();
builder.Append("parent=").Append(parentId.ToString(System.Globalization.CultureInfo.InvariantCulture));
builder.Append("|cat=").AppendJoin(',', request.CategoryIds.Order());
builder.Append("|tpl=").AppendJoin(',', request.TemplateChainContains.Order(StringComparer.OrdinalIgnoreCase));
builder.Append("|glob=").Append(request.TagNameGlob);
builder.Append("|attrs=").Append(request.HasIncludeAttributes ? request.IncludeAttributes.ToString() : "unset");
builder.Append("|alarm=").Append(request.AlarmBearingOnly);
builder.Append("|hist=").Append(request.HistorizedOnly);
builder.Append("|browse=").AppendJoin(',', (browseSubtreeGlobs ?? Array.Empty<string>()).Order(StringComparer.OrdinalIgnoreCase));
byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
return Convert.ToHexString(hash, 0, 12);
}
private sealed record FilteredChildren(
IReadOnlyList<GalaxyObjectView> Children,
IReadOnlyList<bool> HasMatchingDescendant);
}
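Reviewer note: `ComputeFilterSignature` canonicalizes the filter inputs (sorted lists, fixed field order), hashes them with SHA-256, and keeps the first 12 bytes as hex. The sketch below reproduces only that shape on a reduced set of fields to show why two requests that list the same filters in a different order map to the same memo key. `FilterSignatureSketch` and its parameters are assumptions made for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the real method: canonicalize, hash, truncate.
public static class FilterSignatureSketch
{
    public static string Compute(
        int parentId,
        IEnumerable<int> categoryIds,
        IEnumerable<string> browseSubtreeGlobs)
    {
        StringBuilder builder = new();
        builder.Append("parent=").Append(parentId.ToString(CultureInfo.InvariantCulture));
        // Sorting makes the canonical string independent of the caller's list order.
        builder.Append("|cat=").AppendJoin(',', categoryIds.Order());
        builder.Append("|browse=").AppendJoin(',', browseSubtreeGlobs.Order(StringComparer.OrdinalIgnoreCase));

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        // First 12 bytes of the digest, rendered as 24 uppercase hex characters.
        return Convert.ToHexString(hash, 0, 12);
    }
}
```

One consequence of including the resolved parent id in the hashed string is that a page token bound to the signature cannot be replayed against a different parent.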
@@ -0,0 +1,18 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>Freshness state of the shared Galaxy hierarchy cache entry.</summary>
public enum GalaxyCacheStatus
{
/// <summary>Cache has never completed a refresh.</summary>
Unknown = 0,
/// <summary>Cache holds data from a recent successful refresh.</summary>
Healthy = 1,
/// <summary>Cache holds data, but the most recent refresh attempt failed
/// or no successful refresh has happened within the staleness threshold.</summary>
Stale = 2,
/// <summary>Latest refresh failed and no prior data is available.</summary>
Unavailable = 3,
}
@@ -0,0 +1,19 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// A single Galaxy deploy notification. Published by <see cref="GalaxyHierarchyCache"/>
/// whenever a refresh detects that <c>galaxy.time_of_last_deploy</c> has changed (or on
/// the first successful refresh). Consumed by <see cref="IGalaxyDeployNotifier"/>
/// subscribers (the streaming gRPC RPC).
/// </summary>
/// <param name="Sequence">Monotonically increasing per process start; gaps indicate dropped events.</param>
/// <param name="ObservedAt">Server wall-clock when the cache observed the deploy.</param>
/// <param name="TimeOfLastDeploy">The <c>galaxy.time_of_last_deploy</c> value, or <see langword="null"/> when the Galaxy table reports none.</param>
/// <param name="ObjectCount">Number of objects in the hierarchy at the time of the event.</param>
/// <param name="AttributeCount">Number of attributes in the hierarchy at the time of the event.</param>
public sealed record GalaxyDeployEventInfo(
long Sequence,
DateTimeOffset ObservedAt,
DateTimeOffset? TimeOfLastDeploy,
int ObjectCount,
int AttributeCount);
@@ -0,0 +1,79 @@
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Threading.Channels;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Channel-based fan-out of Galaxy deploy events to streaming gRPC subscribers. Each
/// subscriber gets a private bounded channel so a slow client cannot back-pressure
/// other subscribers or the publisher. When a subscriber's channel is full the oldest
/// event is dropped — clients use the sequence field to detect gaps.
/// </summary>
public sealed class GalaxyDeployNotifier : IGalaxyDeployNotifier
{
private const int SubscriberQueueCapacity = 16;
private readonly ConcurrentDictionary<Guid, Channel<GalaxyDeployEventInfo>> _subscribers = new();
private GalaxyDeployEventInfo? _latest;
/// <summary>
/// The most recent deploy event, or <see langword="null"/> if none has been published.
/// </summary>
public GalaxyDeployEventInfo? Latest => Volatile.Read(ref _latest);
/// <inheritdoc />
public void Publish(GalaxyDeployEventInfo info)
{
ArgumentNullException.ThrowIfNull(info);
Volatile.Write(ref _latest, info);
foreach (Channel<GalaxyDeployEventInfo> channel in _subscribers.Values)
{
// BoundedChannelFullMode.DropOldest -> writes never wait; we only fail if the
// channel was completed by the subscriber side, which we ignore.
channel.Writer.TryWrite(info);
}
}
/// <inheritdoc />
public async IAsyncEnumerable<GalaxyDeployEventInfo> SubscribeAsync(
[EnumeratorCancellation] CancellationToken cancellationToken)
{
Guid subscriberId = Guid.NewGuid();
Channel<GalaxyDeployEventInfo> channel = Channel.CreateBounded<GalaxyDeployEventInfo>(
new BoundedChannelOptions(SubscriberQueueCapacity)
{
FullMode = BoundedChannelFullMode.DropOldest,
SingleReader = true,
SingleWriter = false,
});
_subscribers[subscriberId] = channel;
// Bootstrap: emit the latest known event so subscribers don't need to wait for
// the next deploy to know current state.
GalaxyDeployEventInfo? bootstrap = Volatile.Read(ref _latest);
if (bootstrap is not null)
{
channel.Writer.TryWrite(bootstrap);
}
try
{
while (await channel.Reader.WaitToReadAsync(cancellationToken).ConfigureAwait(false))
{
while (channel.Reader.TryRead(out GalaxyDeployEventInfo? next))
{
yield return next;
}
}
}
finally
{
_subscribers.TryRemove(subscriberId, out _);
channel.Writer.TryComplete();
}
}
}
@@ -0,0 +1,131 @@
using System.Collections.Concurrent;
using System.Text;
using System.Text.RegularExpressions;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Anchored, case-insensitive glob matcher (<c>*</c> and <c>?</c> wildcards) used by the
/// hierarchy and browse projectors to filter object tag names and browse subtrees.
/// Compiled regexes are cached and the cache is bounded so an unbounded stream of distinct
/// client-supplied globs cannot grow memory without limit.
/// </summary>
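/// <example>
/// Illustrative calls, derived from the translation rules in this class (<c>*</c> becomes
/// <c>.*</c>, <c>?</c> becomes <c>.</c>, every other character is escaped). The tag names
/// are made up for the example.
/// <code>
/// GalaxyGlobMatcher.IsMatch("Pump_001", "pump_*");   // true: case-insensitive, * matches "001"
/// GalaxyGlobMatcher.IsMatch("Pump_001", "Pump_00?"); // true: ? matches exactly one character
/// GalaxyGlobMatcher.IsMatch("Pump_001", "Pump");     // false: the pattern is anchored at both ends
/// GalaxyGlobMatcher.IsMatch("Pump_001", "");         // true: a blank glob matches everything
/// </code>
/// </example>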
public static class GalaxyGlobMatcher
{
/// <summary>
/// Maximum number of compiled-regex entries retained in <see cref="RegexCache"/>.
/// The cache is keyed by glob pattern and patterns flow in from two sources:
/// admin-controlled API-key constraints (naturally bounded) and the
/// client-supplied <c>DiscoverHierarchyRequest.TagNameGlob</c> (unbounded — a
/// client can iterate through generated names and create millions of distinct
/// globs over the process lifetime). Capping the cache bounds memory while
/// keeping the hot working set hit-cached.
/// </summary>
internal const int RegexCacheCapacity = 256;
/// <summary>
/// Bounded compiled-regex cache keyed by glob pattern. <c>IsMatch</c> is called
/// once per object per <c>DiscoverHierarchy</c>/<c>WatchDeployEvents</c>
/// evaluation, so the same handful of glob patterns are translated
/// repeatedly; caching avoids rebuilding and recompiling the regex on every
/// call. Beyond <see cref="RegexCacheCapacity"/> entries the oldest insertion
/// is evicted so a client cannot grow the cache without bound by submitting
/// unique patterns. Eviction is approximate (FIFO over insertion order, not
/// true LRU) because we only need the bound, not exact recency tracking.
/// </summary>
private static readonly ConcurrentDictionary<string, Regex> RegexCache = new(StringComparer.Ordinal);
/// <summary>
/// Insertion-order queue used to evict the oldest cache entry when the cache
/// exceeds <see cref="RegexCacheCapacity"/>. A separate queue keeps the
/// <see cref="RegexCache"/> reads lock-free; the lock below only guards the
/// eviction path.
/// </summary>
private static readonly ConcurrentQueue<string> InsertionOrder = new();
private static readonly object EvictionLock = new();
/// <summary>
/// Current cache size, exposed for tests asserting the cap is honoured.
/// </summary>
internal static int CurrentCacheSize => RegexCache.Count;
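/// <summary>Determines whether a value matches a glob pattern (<c>*</c> and <c>?</c> wildcards). A null or blank glob matches every value.</summary>
/// <param name="value">The value to test; <see langword="null"/> is treated as an empty string.</param>
/// <param name="glob">The glob pattern with <c>*</c> and <c>?</c> wildcards.</param>
/// <returns><see langword="true"/> when the whole value matches the glob, ignoring case.</returns>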
public static bool IsMatch(string value, string glob)
{
if (string.IsNullOrWhiteSpace(glob))
{
return true;
}
return GetOrCreateRegex(glob).IsMatch(value ?? string.Empty);
}
private static Regex GetOrCreateRegex(string glob)
{
if (RegexCache.TryGetValue(glob, out Regex? existing))
{
return existing;
}
Regex compiled = new(
BuildRegex(glob),
RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Compiled,
TimeSpan.FromMilliseconds(100));
// GetOrAdd atomically returns whichever instance is in the cache after the
// call — either the locally-compiled regex (we won the race) or the regex
// another thread inserted (we lost). It also avoids the TryAdd-then-indexer
// pattern where the key could be evicted between the failed TryAdd and the
// indexer read, producing a KeyNotFoundException under contention near the cap.
Regex result = RegexCache.GetOrAdd(glob, compiled);
if (ReferenceEquals(result, compiled))
{
// We were the inserter — track for FIFO eviction and bound the cache.
InsertionOrder.Enqueue(glob);
EvictIfOverCapacity();
}
return result;
}
private static void EvictIfOverCapacity()
{
if (RegexCache.Count <= RegexCacheCapacity)
{
return;
}
// Serialize eviction so two threads do not race past the cap together.
lock (EvictionLock)
{
while (RegexCache.Count > RegexCacheCapacity && InsertionOrder.TryDequeue(out string? oldest))
{
RegexCache.TryRemove(oldest, out _);
}
}
}
private static string BuildRegex(string glob)
{
StringBuilder builder = new("^", glob.Length + 2);
foreach (char character in glob)
{
switch (character)
{
case '*':
builder.Append(".*");
break;
case '?':
builder.Append('.');
break;
default:
builder.Append(Regex.Escape(character.ToString()));
break;
}
}
builder.Append('$');
return builder.ToString();
}
}
@@ -0,0 +1,365 @@
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Server-side cache of Galaxy Repository browse data. All gRPC clients share the same
/// entry — the materialized object list is produced once per refresh and reused across
/// requests. Refreshes are deploy-time gated: every tick queries
/// <c>galaxy.time_of_last_deploy</c> (cheap), and the heavy hierarchy + attributes rowsets
/// are pulled only when that timestamp has advanced.
/// Each successful heavy refresh is persisted to disk through
/// <see cref="IGalaxyHierarchySnapshotStore"/>; the first refresh restores that
/// snapshot (as <see cref="GalaxyCacheStatus.Stale"/>) so clients can browse
/// last-known data when the Galaxy database is unreachable on a cold start.
/// </summary>
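/// <example>
/// A hedged usage sketch: <c>repository</c>, <c>notifier</c>, and <c>token</c> are assumed
/// to come from the host (normally through DI), and in production the refresh is driven by
/// a background service rather than called inline as shown here.
/// <code>
/// using GalaxyHierarchyCache cache = new(repository, notifier);
/// await cache.RefreshAsync(token);          // non-cancellation failures degrade the status instead of throwing
/// await cache.WaitForFirstLoadAsync(token); // completes after the first refresh attempt or snapshot restore
/// GalaxyHierarchyCacheEntry entry = cache.Current;
/// if (entry.HasData)
/// {
///     // entry.Status is Healthy or Stale here, so entry.Objects can be served.
/// }
/// </code>
/// </example>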
public sealed class GalaxyHierarchyCache : IGalaxyHierarchyCache, IDisposable
{
private static readonly TimeSpan StaleThreshold = TimeSpan.FromMinutes(5);
private readonly IGalaxyRepository _repository;
private readonly IGalaxyDeployNotifier _notifier;
private readonly IGalaxyHierarchySnapshotStore? _snapshotStore;
private readonly TimeProvider _timeProvider;
private readonly ILogger<GalaxyHierarchyCache>? _logger;
private readonly TaskCompletionSource _firstLoad = new(TaskCreationOptions.RunContinuationsAsynchronously);
private readonly SemaphoreSlim _refreshGate = new(1, 1);
private GalaxyHierarchyCacheEntry _current = GalaxyHierarchyCacheEntry.Empty;
private bool _restoreAttempted;
/// <summary>Initializes a new instance of the <see cref="GalaxyHierarchyCache"/> class.</summary>
/// <param name="repository">Galaxy Repository client for SQL queries.</param>
/// <param name="notifier">Galaxy deploy event notifier.</param>
/// <param name="timeProvider">Provider for current time; defaults to system time.</param>
/// <param name="logger">Optional logger for diagnostic output.</param>
/// <param name="snapshotStore">
/// Optional on-disk snapshot store. When supplied, the cache persists each
/// successful refresh and restores the last snapshot on first load.
/// </param>
public GalaxyHierarchyCache(
IGalaxyRepository repository,
IGalaxyDeployNotifier notifier,
TimeProvider? timeProvider = null,
ILogger<GalaxyHierarchyCache>? logger = null,
IGalaxyHierarchySnapshotStore? snapshotStore = null)
{
_repository = repository;
_notifier = notifier;
_timeProvider = timeProvider ?? TimeProvider.System;
_logger = logger;
_snapshotStore = snapshotStore;
}
/// <summary>Gets the current Galaxy hierarchy cache entry with projected status.</summary>
public GalaxyHierarchyCacheEntry Current
{
get
{
GalaxyHierarchyCacheEntry snapshot = Volatile.Read(ref _current);
GalaxyCacheStatus projected = ProjectStatus(snapshot);
return projected == snapshot.Status
? snapshot
: snapshot with { Status = projected };
}
}
/// <summary>Refreshes the Galaxy hierarchy cache if the deploy time has advanced.</summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Asynchronous task representing the refresh operation.</returns>
public async Task RefreshAsync(CancellationToken cancellationToken)
{
await _refreshGate.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
await RefreshCoreAsync(cancellationToken).ConfigureAwait(false);
}
finally
{
_refreshGate.Release();
}
}
/// <summary>Waits for the Galaxy hierarchy cache to complete its first load.</summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Asynchronous task representing the wait operation.</returns>
public Task WaitForFirstLoadAsync(CancellationToken cancellationToken)
{
return _firstLoad.Task.WaitAsync(cancellationToken);
}
/// <summary>
/// Disposes the refresh gate. As a DI singleton the cache is disposed once at host
/// shutdown, after <see cref="GalaxyHierarchyRefreshService"/> has stopped refreshing,
/// so no in-flight refresh can be holding the gate.
/// </summary>
public void Dispose()
{
_refreshGate.Dispose();
}
private async Task RefreshCoreAsync(CancellationToken cancellationToken)
{
// First refresh only: seed the cache from the on-disk snapshot before
// querying SQL, so a cold start with an unreachable Galaxy database can
// still serve last-known browse data. Runs under the refresh gate.
if (!_restoreAttempted)
{
_restoreAttempted = true;
await TryRestoreFromDiskAsync(cancellationToken).ConfigureAwait(false);
}
GalaxyHierarchyCacheEntry previous = Volatile.Read(ref _current);
DateTimeOffset queriedAt = _timeProvider.GetUtcNow();
try
{
DateTime? deployRaw = await _repository.GetLastDeployTimeAsync(cancellationToken).ConfigureAwait(false);
DateTimeOffset? deployTime = deployRaw.HasValue
? new DateTimeOffset(DateTime.SpecifyKind(deployRaw.Value, DateTimeKind.Utc))
: null;
bool hasPriorData = previous.HasData;
bool deployChanged = !hasPriorData || deployTime != previous.LastDeployTime;
if (!deployChanged)
{
// No deploy change: skip the heavy queries and mark the existing entry healthy as of now.
GalaxyHierarchyCacheEntry refreshed = previous with
{
Status = GalaxyCacheStatus.Healthy,
LastQueriedAt = queriedAt,
LastSuccessAt = queriedAt,
LastError = null,
};
Volatile.Write(ref _current, refreshed);
_firstLoad.TrySetResult();
return;
}
Task<List<GalaxyHierarchyRow>> hierarchyTask = _repository.GetHierarchyAsync(cancellationToken);
Task<List<GalaxyAttributeRow>> attributesTask = _repository.GetAttributesAsync(cancellationToken);
await Task.WhenAll(hierarchyTask, attributesTask).ConfigureAwait(false);
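// Both tasks have already completed, so these awaits return immediately.
List<GalaxyHierarchyRow> hierarchy = await hierarchyTask.ConfigureAwait(false);
List<GalaxyAttributeRow> attributes = await attributesTask.ConfigureAwait(false);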
long nextSequence = previous.Sequence + 1;
GalaxyHierarchyCacheEntry next = BuildEntry(
status: GalaxyCacheStatus.Healthy,
sequence: nextSequence,
lastQueriedAt: queriedAt,
lastSuccessAt: queriedAt,
lastDeployTime: deployTime,
lastError: null,
hierarchy: hierarchy,
attributes: attributes);
Volatile.Write(ref _current, next);
_firstLoad.TrySetResult();
_notifier.Publish(new GalaxyDeployEventInfo(
Sequence: nextSequence,
ObservedAt: queriedAt,
TimeOfLastDeploy: deployTime,
ObjectCount: hierarchy.Count,
AttributeCount: attributes.Count));
await PersistSnapshotAsync(deployTime, queriedAt, hierarchy, attributes, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (Exception exception)
{
// Catch every non-cancellation failure — not just SqlException /
// InvalidOperationException. A TimeoutException or Win32Exception
// from connection establishment, or another DbException subtype,
// must still degrade gracefully to Stale/Unavailable and complete
// _firstLoad rather than escape and fault the refresh BackgroundService.
_logger?.LogWarning(exception, "Galaxy hierarchy cache refresh failed.");
GalaxyHierarchyCacheEntry failed = previous with
{
Status = previous.HasData ? GalaxyCacheStatus.Stale : GalaxyCacheStatus.Unavailable,
LastQueriedAt = queriedAt,
LastError = exception.Message,
};
Volatile.Write(ref _current, failed);
_firstLoad.TrySetResult();
}
}
/// <summary>
/// Materializes a complete <see cref="GalaxyHierarchyCacheEntry"/> from raw
/// hierarchy and attribute rowsets. Shared by the live refresh path and the
/// on-disk restore path so both produce an identical object list and index.
/// </summary>
private static GalaxyHierarchyCacheEntry BuildEntry(
GalaxyCacheStatus status,
long sequence,
DateTimeOffset? lastQueriedAt,
DateTimeOffset? lastSuccessAt,
DateTimeOffset? lastDeployTime,
string? lastError,
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes)
{
IReadOnlyList<GalaxyObject> objects = BuildObjects(hierarchy, attributes);
GalaxyHierarchyIndex index = GalaxyHierarchyIndex.Build(objects);
int areaCount = hierarchy.Count(row => row.IsArea);
int historized = attributes.Count(row => row.IsHistorized);
int alarms = attributes.Count(row => row.IsAlarm);
return new GalaxyHierarchyCacheEntry(
Status: status,
Sequence: sequence,
LastQueriedAt: lastQueriedAt,
LastSuccessAt: lastSuccessAt,
LastDeployTime: lastDeployTime,
LastError: lastError,
Objects: objects,
Index: index,
ObjectCount: hierarchy.Count,
AreaCount: areaCount,
AttributeCount: attributes.Count,
HistorizedAttributeCount: historized,
AlarmAttributeCount: alarms);
}
/// <summary>
/// Seeds the cache from the on-disk snapshot when no live data has loaded yet.
/// The restored entry is marked <see cref="GalaxyCacheStatus.Stale"/> — it is
/// last-known data, not live. A later refresh that observes the same deploy
/// time promotes it to healthy; one that observes a newer deploy replaces it.
/// </summary>
private async Task TryRestoreFromDiskAsync(CancellationToken cancellationToken)
{
if (_snapshotStore is null)
{
return;
}
if (Volatile.Read(ref _current).HasData)
{
return;
}
GalaxyHierarchySnapshot? snapshot;
try
{
snapshot = await _snapshotStore.TryLoadAsync(cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (Exception exception)
{
_logger?.LogWarning(exception, "Failed to restore the Galaxy hierarchy from the on-disk snapshot.");
return;
}
if (snapshot is null)
{
return;
}
long sequence = Volatile.Read(ref _current).Sequence + 1;
GalaxyHierarchyCacheEntry restored = BuildEntry(
status: GalaxyCacheStatus.Stale,
sequence: sequence,
lastQueriedAt: snapshot.SavedAt,
lastSuccessAt: snapshot.SavedAt,
lastDeployTime: snapshot.LastDeployTime,
lastError: null,
hierarchy: snapshot.Hierarchy,
attributes: snapshot.Attributes);
Volatile.Write(ref _current, restored);
// Restored data is a valid completed first load: unblock callers waiting on
// the bootstrap gate immediately, rather than making them wait out the full
// wait budget for a live query that — when the database is unreachable, the
// scenario this restore exists for — may not return for seconds.
_firstLoad.TrySetResult();
_notifier.Publish(new GalaxyDeployEventInfo(
Sequence: sequence,
ObservedAt: _timeProvider.GetUtcNow(),
TimeOfLastDeploy: snapshot.LastDeployTime,
ObjectCount: snapshot.Hierarchy.Count,
AttributeCount: snapshot.Attributes.Count));
_logger?.LogInformation(
"Restored Galaxy hierarchy from on-disk snapshot saved {SavedAt:o}: {ObjectCount} objects, {AttributeCount} attributes (status Stale until the Galaxy database confirms).",
snapshot.SavedAt,
snapshot.Hierarchy.Count,
snapshot.Attributes.Count);
}
/// <summary>
/// Persists a successful refresh to disk. Persistence failures are logged and
/// swallowed — a cache that cannot write its backup is still fully usable.
/// </summary>
private async Task PersistSnapshotAsync(
DateTimeOffset? deployTime,
DateTimeOffset savedAt,
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes,
CancellationToken cancellationToken)
{
if (_snapshotStore is null)
{
return;
}
try
{
await _snapshotStore.SaveAsync(
new GalaxyHierarchySnapshot(deployTime, savedAt, hierarchy, attributes),
cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// The refresh was cancelled (service shutdown) before the write finished.
// That is not a persistence failure — do not log it as a warning.
}
catch (Exception exception)
{
_logger?.LogWarning(exception, "Failed to persist the Galaxy hierarchy snapshot to disk.");
}
}
private static IReadOnlyList<GalaxyObject> BuildObjects(
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes)
{
Dictionary<int, List<GalaxyAttributeRow>> attributesByGobjectId = attributes
.GroupBy(a => a.GobjectId)
.ToDictionary(g => g.Key, g => g.ToList());
List<GalaxyObject> objects = new(hierarchy.Count);
foreach (GalaxyHierarchyRow row in hierarchy)
{
objects.Add(GalaxyProtoMapper.MapObject(row, attributesByGobjectId));
}
return objects;
}
private GalaxyCacheStatus ProjectStatus(GalaxyHierarchyCacheEntry snapshot)
{
if (snapshot.Status is GalaxyCacheStatus.Unknown or GalaxyCacheStatus.Unavailable)
{
return snapshot.Status;
}
if (snapshot.LastSuccessAt is { } success
&& _timeProvider.GetUtcNow() - success > StaleThreshold)
{
return GalaxyCacheStatus.Stale;
}
return snapshot.Status;
}
}
@@ -0,0 +1,56 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Immutable snapshot of the Galaxy Repository browse data held by
/// <see cref="GalaxyHierarchyCache"/>. Multiple gRPC clients share the same
/// materialized object list and precomputed hierarchy index.
/// </summary>
/// <param name="Status">The cache freshness state at the time the entry was produced.</param>
/// <param name="Sequence">Monotonically increasing per process start; bumped on each heavy refresh and when the cache is seeded from the on-disk snapshot.</param>
/// <param name="LastQueriedAt">UTC wall-clock of the most recent refresh attempt.</param>
/// <param name="LastSuccessAt">UTC wall-clock of the most recent successful refresh.</param>
/// <param name="LastDeployTime">The <c>galaxy.time_of_last_deploy</c> value observed when the data was pulled.</param>
/// <param name="LastError">The most recent refresh error message, or <see langword="null"/>.</param>
/// <param name="Objects">The materialized Galaxy object list.</param>
/// <param name="Index">Precomputed lookup structures over <paramref name="Objects"/>.</param>
/// <param name="ObjectCount">Number of objects in the hierarchy.</param>
/// <param name="AreaCount">Number of area objects in the hierarchy.</param>
/// <param name="AttributeCount">Number of attributes across all objects.</param>
/// <param name="HistorizedAttributeCount">Number of historized attributes.</param>
/// <param name="AlarmAttributeCount">Number of alarm-bearing attributes.</param>
public sealed record GalaxyHierarchyCacheEntry(
GalaxyCacheStatus Status,
long Sequence,
DateTimeOffset? LastQueriedAt,
DateTimeOffset? LastSuccessAt,
DateTimeOffset? LastDeployTime,
string? LastError,
IReadOnlyList<GalaxyObject> Objects,
GalaxyHierarchyIndex Index,
int ObjectCount,
int AreaCount,
int AttributeCount,
int HistorizedAttributeCount,
int AlarmAttributeCount)
{
/// <summary>Gets an empty Galaxy hierarchy cache entry.</summary>
public static GalaxyHierarchyCacheEntry Empty { get; } = new(
Status: GalaxyCacheStatus.Unknown,
Sequence: 0,
LastQueriedAt: null,
LastSuccessAt: null,
LastDeployTime: null,
LastError: null,
Objects: Array.Empty<GalaxyObject>(),
Index: GalaxyHierarchyIndex.Empty,
ObjectCount: 0,
AreaCount: 0,
AttributeCount: 0,
HistorizedAttributeCount: 0,
AlarmAttributeCount: 0);
/// <summary>Gets a value indicating whether the cache entry contains usable data.</summary>
public bool HasData => Status is GalaxyCacheStatus.Healthy or GalaxyCacheStatus.Stale;
}
@@ -0,0 +1,206 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Precomputed lookup structures over a materialized Galaxy object list. Built once per
/// cache entry so browse/discover handlers can resolve roots/parents by id, tag name, or
/// contained path in O(1), enumerate direct children, and resolve tag addresses to objects
/// or attributes without rescanning the full object list.
/// </summary>
public sealed class GalaxyHierarchyIndex
{
private GalaxyHierarchyIndex(
IReadOnlyList<GalaxyObjectView> objectViews,
IReadOnlyDictionary<int, GalaxyObjectView> objectViewsById,
IReadOnlyDictionary<string, GalaxyTagLookup> tagsByAddress,
IReadOnlyDictionary<int, IReadOnlyList<GalaxyObjectView>> childrenByParent,
IReadOnlyDictionary<string, GalaxyObjectView> objectViewsByTagName,
IReadOnlyDictionary<string, GalaxyObjectView> objectViewsByContainedPath)
{
ObjectViews = objectViews;
ObjectViewsById = objectViewsById;
TagsByAddress = tagsByAddress;
ChildrenByParent = childrenByParent;
ObjectViewsByTagName = objectViewsByTagName;
ObjectViewsByContainedPath = objectViewsByContainedPath;
}
/// <summary>Gets an empty Galaxy hierarchy index.</summary>
public static GalaxyHierarchyIndex Empty { get; } = new(
Array.Empty<GalaxyObjectView>(),
new Dictionary<int, GalaxyObjectView>(),
new Dictionary<string, GalaxyTagLookup>(StringComparer.OrdinalIgnoreCase),
new Dictionary<int, IReadOnlyList<GalaxyObjectView>>(),
new Dictionary<string, GalaxyObjectView>(StringComparer.OrdinalIgnoreCase),
new Dictionary<string, GalaxyObjectView>(StringComparer.OrdinalIgnoreCase));
/// <summary>Gets the object views.</summary>
public IReadOnlyList<GalaxyObjectView> ObjectViews { get; }
/// <summary>Gets the object views indexed by gobject id.</summary>
public IReadOnlyDictionary<int, GalaxyObjectView> ObjectViewsById { get; }
/// <summary>Gets tags indexed by address.</summary>
public IReadOnlyDictionary<string, GalaxyTagLookup> TagsByAddress { get; }
/// <summary>Gets direct children grouped by parent gobject id. Root objects (no parent, self-parented, or orphaned because the parent is absent from the set) live under key 0. Each list is sorted areas-first, then by display name (OrdinalIgnoreCase).</summary>
public IReadOnlyDictionary<int, IReadOnlyList<GalaxyObjectView>> ChildrenByParent { get; }
/// <summary>Gets object views indexed by <see cref="GalaxyObject.TagName"/> (OrdinalIgnoreCase). Lets browse/discover handlers resolve parents/roots by tag name in O(1) instead of scanning <see cref="ObjectViews"/>.</summary>
public IReadOnlyDictionary<string, GalaxyObjectView> ObjectViewsByTagName { get; }
/// <summary>Gets object views indexed by contained path (OrdinalIgnoreCase). Lets browse/discover handlers resolve parents/roots by path in O(1) instead of scanning <see cref="ObjectViews"/>.</summary>
public IReadOnlyDictionary<string, GalaxyObjectView> ObjectViewsByContainedPath { get; }
/// <summary>Builds a Galaxy hierarchy index from the given objects.</summary>
/// <param name="objects">The Galaxy objects to index.</param>
/// <returns>A new Galaxy hierarchy index.</returns>
public static GalaxyHierarchyIndex Build(IReadOnlyList<GalaxyObject> objects)
{
if (objects.Count == 0)
{
return Empty;
}
Dictionary<int, GalaxyObject> objectsById = new();
foreach (GalaxyObject obj in objects)
{
objectsById.TryAdd(obj.GobjectId, obj);
}
List<GalaxyObjectView> views = new(objects.Count);
Dictionary<int, GalaxyObjectView> viewsById = new();
Dictionary<string, GalaxyTagLookup> tagsByAddress = new(StringComparer.OrdinalIgnoreCase);
Dictionary<string, GalaxyObjectView> viewsByTagName = new(StringComparer.OrdinalIgnoreCase);
Dictionary<string, GalaxyObjectView> viewsByContainedPath = new(StringComparer.OrdinalIgnoreCase);
foreach (GalaxyObject obj in objects)
{
string path = BuildContainedPath(obj, objectsById);
int depth = string.IsNullOrWhiteSpace(path) ? 0 : path.Count(character => character == '/');
GalaxyObjectView view = new(obj, path, depth);
views.Add(view);
viewsById.TryAdd(obj.GobjectId, view);
if (!string.IsNullOrWhiteSpace(obj.TagName))
{
tagsByAddress.TryAdd(obj.TagName, new GalaxyTagLookup(obj, Attribute: null, path));
viewsByTagName.TryAdd(obj.TagName, view);
}
if (!string.IsNullOrWhiteSpace(path))
{
viewsByContainedPath.TryAdd(path, view);
}
foreach (GalaxyAttribute attribute in obj.Attributes)
{
if (!string.IsNullOrWhiteSpace(attribute.FullTagReference))
{
tagsByAddress.TryAdd(attribute.FullTagReference, new GalaxyTagLookup(obj, attribute, path));
}
}
}
Dictionary<int, List<GalaxyObjectView>> childrenByParent = new();
foreach (GalaxyObjectView view in views)
{
int parentKey = view.Object.ParentGobjectId;
// Treat self-parented (corrupt) rows as roots.
if (parentKey == view.Object.GobjectId)
{
parentKey = 0;
}
// Re-root orphans whose parent object is absent from the set (e.g. a deleted or
// never-loaded container area). Otherwise they bucket under a phantom parent id
// that is never reached from the root, so they vanish from browse entirely.
else if (parentKey != 0 && !objectsById.ContainsKey(parentKey))
{
parentKey = 0;
}
if (!childrenByParent.TryGetValue(parentKey, out List<GalaxyObjectView>? bucket))
{
bucket = [];
childrenByParent[parentKey] = bucket;
}
bucket.Add(view);
}
foreach (List<GalaxyObjectView> bucket in childrenByParent.Values)
{
bucket.Sort(CompareByAreaThenDisplayName);
}
Dictionary<int, IReadOnlyList<GalaxyObjectView>> readOnlyChildren = new(childrenByParent.Count);
foreach (KeyValuePair<int, List<GalaxyObjectView>> kvp in childrenByParent)
{
readOnlyChildren[kvp.Key] = kvp.Value;
}
return new GalaxyHierarchyIndex(
views,
viewsById,
tagsByAddress,
readOnlyChildren,
viewsByTagName,
viewsByContainedPath);
}
private static string BuildContainedPath(
GalaxyObject obj,
IReadOnlyDictionary<int, GalaxyObject> objectsById)
{
Stack<string> names = new();
HashSet<int> seen = [];
GalaxyObject? current = obj;
while (current is not null && seen.Add(current.GobjectId))
{
names.Push(ResolvePathSegment(current));
current = current.ParentGobjectId != 0
&& objectsById.TryGetValue(current.ParentGobjectId, out GalaxyObject? parent)
? parent
: null;
}
return string.Join('/', names.Where(name => !string.IsNullOrWhiteSpace(name)));
}
private static string ResolvePathSegment(GalaxyObject obj)
{
if (!string.IsNullOrWhiteSpace(obj.ContainedName))
{
return obj.ContainedName;
}
if (!string.IsNullOrWhiteSpace(obj.BrowseName))
{
return obj.BrowseName;
}
return obj.TagName;
}
private static int CompareByAreaThenDisplayName(GalaxyObjectView left, GalaxyObjectView right)
{
if (left.Object.IsArea != right.Object.IsArea)
{
return left.Object.IsArea ? -1 : 1;
}
return string.Compare(DisplayNameOf(left), DisplayNameOf(right), StringComparison.OrdinalIgnoreCase);
}
private static string DisplayNameOf(GalaxyObjectView view)
{
GalaxyObject obj = view.Object;
if (!string.IsNullOrWhiteSpace(obj.BrowseName))
{
return obj.BrowseName;
}
if (!string.IsNullOrWhiteSpace(obj.ContainedName))
{
return obj.ContainedName;
}
return obj.TagName;
}
}
@@ -0,0 +1,317 @@
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;
using System.Text;
using Grpc.Core;
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Projects a <c>DiscoverHierarchy</c> request against an immutable
/// <see cref="GalaxyHierarchyCacheEntry"/>: applies the root/depth/category/template/glob
/// filters, pages the result, and memoizes the filtered list per cache-entry instance so
/// paging is O(pageSize) rather than O(total) per page. Stateless apart from that memo.
/// </summary>
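/// <example>
/// A paging sketch under stated assumptions: <c>cache</c> and <c>request</c> come from the
/// caller, and the page size of 500 is arbitrary. Because the memo is keyed on the
/// cache-entry instance, the sketch reads <c>Current</c> once and reuses that entry for
/// every page, so only the first call builds the filtered list.
/// <code>
/// GalaxyHierarchyCacheEntry entry = cache.Current;
/// GalaxyHierarchyQueryResult first = GalaxyHierarchyProjector.Project(
///     entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 500);
/// GalaxyHierarchyQueryResult second = GalaxyHierarchyProjector.Project(
///     entry, request, browseSubtreeGlobs: null, offset: 500, pageSize: 500);
/// </code>
/// </example>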
public static class GalaxyHierarchyProjector
{
/// <summary>
/// Per-cache-entry memo of filtered, ordered <see cref="GalaxyObjectView"/> lists
/// keyed by filter signature. Without it, paging through a large hierarchy
/// re-applies every filter and re-scans the full <see cref="GalaxyHierarchyIndex.ObjectViews"/>
/// collection on every page — O(total) per page, O(total²/pageSize) end-to-end.
/// With it, the first page builds the filtered list and each subsequent page is an
/// O(pageSize) slice. The table is keyed on the immutable cache-entry instance, so
/// when the cache publishes a new entry the stale memo becomes unreachable and is
/// reclaimed with it — no explicit invalidation needed.
/// </summary>
private static readonly ConditionalWeakTable<GalaxyHierarchyCacheEntry, ConcurrentDictionary<string, IReadOnlyList<GalaxyObjectView>>> FilteredViewCache = new();
/// <summary>Projects a discovery request against a cache entry and returns all matching objects.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry.</param>
/// <param name="request">The discovery hierarchy request.</param>
/// <param name="browseSubtreeGlobs">Optional glob patterns to filter browse subtrees.</param>
public static GalaxyHierarchyQueryResult Project(
GalaxyHierarchyCacheEntry entry,
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs = null)
{
return Project(
entry,
request,
browseSubtreeGlobs,
offset: 0,
pageSize: int.MaxValue);
}
/// <summary>Projects a discovery request with paging against a cache entry and returns a page of matching objects.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry.</param>
/// <param name="request">The discovery hierarchy request.</param>
/// <param name="browseSubtreeGlobs">Optional glob patterns to filter browse subtrees.</param>
/// <param name="offset">The zero-based offset into the result set.</param>
/// <param name="pageSize">The maximum number of results to return.</param>
public static GalaxyHierarchyQueryResult Project(
GalaxyHierarchyCacheEntry entry,
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int offset,
int pageSize)
{
ArgumentNullException.ThrowIfNull(entry);
ArgumentNullException.ThrowIfNull(request);
if (offset < 0)
{
throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be greater than or equal to zero.");
}
if (pageSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(pageSize), pageSize, "Page size must be greater than zero.");
}
int? maxDepth = request.MaxDepth;
if (maxDepth < 0)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"DiscoverHierarchy max_depth must be greater than or equal to zero when provided."));
}
string filterSignature = ComputeFilterSignature(request, browseSubtreeGlobs);
IReadOnlyList<GalaxyObjectView> matchedViews = GetFilteredViews(
entry,
request,
browseSubtreeGlobs,
maxDepth,
filterSignature);
bool includeAttributes = IncludeAttributes(request);
List<GalaxyObject> page = new(Math.Min(pageSize, Math.Max(0, matchedViews.Count - offset)));
int end = (int)Math.Min((long)offset + pageSize, matchedViews.Count);
for (int index = offset; index < end; index++)
{
page.Add(CloneObject(matchedViews[index].Object, includeAttributes));
}
return new GalaxyHierarchyQueryResult(
page,
matchedViews.Count,
filterSignature);
}
private static IReadOnlyList<GalaxyObjectView> GetFilteredViews(
GalaxyHierarchyCacheEntry entry,
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int? maxDepth,
string filterSignature)
{
// ResolveRoot can throw RpcException(NotFound); run it before consulting the
// memo so a bad root surfaces consistently regardless of cache state.
IReadOnlyList<GalaxyObjectView> views = entry.Index.ObjectViews;
GalaxyObjectView? root = ResolveRoot(request, entry.Index);
ConcurrentDictionary<string, IReadOnlyList<GalaxyObjectView>> memo =
FilteredViewCache.GetValue(entry, static _ => new ConcurrentDictionary<string, IReadOnlyList<GalaxyObjectView>>(StringComparer.Ordinal));
return memo.GetOrAdd(
filterSignature,
static (_, state) =>
{
List<GalaxyObjectView> matched = [];
foreach (GalaxyObjectView view in state.Views)
{
if (MatchesRoot(view, state.Root, state.MaxDepth)
&& MatchesBrowseSubtrees(view, state.BrowseSubtreeGlobs)
&& MatchesFilters(view.Object, state.Request))
{
matched.Add(view);
}
}
return matched;
},
(Views: views, Root: root, MaxDepth: maxDepth, BrowseSubtreeGlobs: browseSubtreeGlobs, Request: request));
}
/// <summary>Finds an object in the hierarchy by its tag address.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry.</param>
/// <param name="tagAddress">The tag address to search for.</param>
public static GalaxyObject? FindObjectForTag(
GalaxyHierarchyCacheEntry entry,
string tagAddress)
{
ArgumentNullException.ThrowIfNull(entry);
if (string.IsNullOrWhiteSpace(tagAddress))
{
return null;
}
return entry.Index.TagsByAddress.TryGetValue(tagAddress, out GalaxyTagLookup? lookup)
? lookup.Object
: null;
}
/// <summary>Finds an attribute in the hierarchy by its tag address.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry.</param>
/// <param name="tagAddress">The tag address to search for.</param>
public static GalaxyAttribute? FindAttributeForTag(
GalaxyHierarchyCacheEntry entry,
string tagAddress)
{
ArgumentNullException.ThrowIfNull(entry);
if (string.IsNullOrWhiteSpace(tagAddress))
{
return null;
}
return entry.Index.TagsByAddress.TryGetValue(tagAddress, out GalaxyTagLookup? lookup)
? lookup.Attribute
: null;
}
/// <summary>Gets the contained path for an object by its gobject ID.</summary>
/// <param name="entry">The Galaxy hierarchy cache entry.</param>
/// <param name="gobjectId">The Galaxy object ID.</param>
public static string GetContainedPath(
GalaxyHierarchyCacheEntry entry,
int gobjectId)
{
return entry.Index.ObjectViewsById.TryGetValue(gobjectId, out GalaxyObjectView? view)
? view.ContainedPath
: string.Empty;
}
private static GalaxyObjectView? ResolveRoot(
DiscoverHierarchyRequest request,
GalaxyHierarchyIndex index)
{
GalaxyObjectView? root = request.RootCase switch
{
DiscoverHierarchyRequest.RootOneofCase.None => null,
DiscoverHierarchyRequest.RootOneofCase.RootGobjectId =>
index.ObjectViewsById.TryGetValue(request.RootGobjectId, out GalaxyObjectView? byId) ? byId : null,
DiscoverHierarchyRequest.RootOneofCase.RootTagName =>
index.ObjectViewsByTagName.TryGetValue(request.RootTagName, out GalaxyObjectView? byTag) ? byTag : null,
DiscoverHierarchyRequest.RootOneofCase.RootContainedPath =>
index.ObjectViewsByContainedPath.TryGetValue(request.RootContainedPath, out GalaxyObjectView? byPath) ? byPath : null,
_ => null,
};
if (request.RootCase != DiscoverHierarchyRequest.RootOneofCase.None && root is null)
{
throw new RpcException(new Status(StatusCode.NotFound, "DiscoverHierarchy root was not found."));
}
return root;
}
private static bool MatchesRoot(
GalaxyObjectView view,
GalaxyObjectView? root,
int? maxDepth)
{
if (root is null)
{
return true;
}
bool isRoot = view.Object.GobjectId == root.Object.GobjectId;
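// The trailing "/" keeps a sibling whose path merely shares a prefix
// (e.g. "Area1" vs. "Area10") from being treated as a descendant.
bool isDescendant = view.ContainedPath.StartsWith(root.ContainedPath + "/", StringComparison.OrdinalIgnoreCase);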
if (!isRoot && !isDescendant)
{
return false;
}
return maxDepth is null || view.Depth - root.Depth <= maxDepth.Value;
}
private static bool MatchesBrowseSubtrees(
GalaxyObjectView view,
IReadOnlyList<string>? browseSubtreeGlobs)
{
return browseSubtreeGlobs is null
|| browseSubtreeGlobs.Count == 0
|| browseSubtreeGlobs.Any(glob => GalaxyGlobMatcher.IsMatch(view.ContainedPath, glob));
}
private static bool MatchesFilters(
GalaxyObject obj,
DiscoverHierarchyRequest request)
{
if (request.CategoryIds.Count > 0 && !request.CategoryIds.Contains(obj.CategoryId))
{
return false;
}
foreach (string templateFilter in request.TemplateChainContains)
{
if (!obj.TemplateChain.Any(template => template.Contains(templateFilter, StringComparison.OrdinalIgnoreCase)))
{
return false;
}
}
if (!string.IsNullOrWhiteSpace(request.TagNameGlob)
&& !GalaxyGlobMatcher.IsMatch(obj.TagName, request.TagNameGlob))
{
return false;
}
if (request.AlarmBearingOnly && !obj.Attributes.Any(attribute => attribute.IsAlarm))
{
return false;
}
if (request.HistorizedOnly && !obj.Attributes.Any(attribute => attribute.IsHistorized))
{
return false;
}
return true;
}
private static bool IncludeAttributes(DiscoverHierarchyRequest request)
{
return !request.HasIncludeAttributes || request.IncludeAttributes;
}
private static GalaxyObject CloneObject(GalaxyObject source, bool includeAttributes)
{
GalaxyObject clone = source.Clone();
if (!includeAttributes)
{
clone.Attributes.Clear();
}
return clone;
}
/// <summary>Computes a stable filter signature for memoization purposes.</summary>
/// <param name="request">The discovery hierarchy request.</param>
/// <param name="browseSubtreeGlobs">Optional glob patterns to filter browse subtrees.</param>
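/// <example>
/// An illustrative sketch with made-up request values. Because category IDs, template filters,
/// and browse globs are sorted before hashing, two requests that differ only in the order of
/// those lists should yield the same signature (24 hex characters, from the first 12 hash bytes).
/// <code>
/// var a = new DiscoverHierarchyRequest();
/// a.CategoryIds.AddRange(new[] { 3, 1 });
/// var b = new DiscoverHierarchyRequest();
/// b.CategoryIds.AddRange(new[] { 1, 3 });
/// bool same = GalaxyHierarchyProjector.ComputeFilterSignature(a, null)
///     == GalaxyHierarchyProjector.ComputeFilterSignature(b, null);
/// </code>
/// </example>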
public static string ComputeFilterSignature(
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs)
{
ArgumentNullException.ThrowIfNull(request);
StringBuilder builder = new();
builder.Append("root=").Append(request.RootCase).Append('|');
builder.Append(request.RootCase switch
{
DiscoverHierarchyRequest.RootOneofCase.RootGobjectId => request.RootGobjectId.ToString(
System.Globalization.CultureInfo.InvariantCulture),
DiscoverHierarchyRequest.RootOneofCase.RootTagName => request.RootTagName,
DiscoverHierarchyRequest.RootOneofCase.RootContainedPath => request.RootContainedPath,
_ => string.Empty,
});
builder.Append("|max=").Append(request.MaxDepth?.ToString(System.Globalization.CultureInfo.InvariantCulture) ?? "");
builder.Append("|cat=").AppendJoin(',', request.CategoryIds.Order());
builder.Append("|tpl=").AppendJoin(',', request.TemplateChainContains.Order(StringComparer.OrdinalIgnoreCase));
builder.Append("|glob=").Append(request.TagNameGlob);
builder.Append("|attrs=").Append(request.HasIncludeAttributes ? request.IncludeAttributes.ToString() : "unset");
builder.Append("|alarm=").Append(request.AlarmBearingOnly);
builder.Append("|hist=").Append(request.HistorizedOnly);
builder.Append("|browse=").AppendJoin(',', (browseSubtreeGlobs ?? Array.Empty<string>()).Order(StringComparer.OrdinalIgnoreCase));
byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
return Convert.ToHexString(hash, 0, 12);
}
}
@@ -0,0 +1,16 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Result of one <see cref="GalaxyHierarchyProjector.Project(GalaxyHierarchyCacheEntry, DiscoverHierarchyRequest, System.Collections.Generic.IReadOnlyList{string}, int, int)"/>
/// call: a materialized page of matching objects, the total post-filter object count, and
/// the stable filter signature used to bind page tokens.
/// </summary>
/// <param name="Objects">The page of matching objects.</param>
/// <param name="TotalObjectCount">Total matching objects across the whole hierarchy (post-filter).</param>
/// <param name="FilterSignature">Stable signature of the filter set, used to bind page tokens.</param>
public sealed record GalaxyHierarchyQueryResult(
IReadOnlyList<GalaxyObject> Objects,
int TotalObjectCount,
string FilterSignature);
@@ -0,0 +1,62 @@
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>Background service that periodically refreshes the Galaxy Repository hierarchy cache off the request path.</summary>
public sealed class GalaxyHierarchyRefreshService(
IGalaxyHierarchyCache cache,
IOptions<GalaxyRepositoryOptions> options,
ILogger<GalaxyHierarchyRefreshService> logger,
TimeProvider? timeProvider = null) : BackgroundService
{
private readonly TimeProvider _timeProvider = timeProvider ?? TimeProvider.System;
/// <inheritdoc />
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
TimeSpan interval = TimeSpan.FromSeconds(Math.Max(1, options.Value.DashboardRefreshIntervalSeconds));
try
{
await cache.RefreshAsync(stoppingToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
{
return;
}
catch (Exception exception)
{
// A transient first-load failure (e.g. a TimeoutException or
// Win32Exception from connection establishment, or a DbException
// subtype the cache does not catch) must not fault this
// BackgroundService and stop the whole host. The cache records
// its own Unavailable/Stale status; the periodic tick below retries.
logger.LogWarning(exception, "Initial Galaxy hierarchy cache load failed; will retry on the refresh interval.");
}
using PeriodicTimer timer = new(interval, _timeProvider);
try
{
while (await timer.WaitForNextTickAsync(stoppingToken).ConfigureAwait(false))
{
try
{
await cache.RefreshAsync(stoppingToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
{
return;
}
catch (Exception exception)
{
logger.LogWarning(exception, "Galaxy hierarchy cache refresh tick failed.");
}
}
}
catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
{
}
}
}
@@ -0,0 +1,35 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// One row from <see cref="GalaxyRepository.GetHierarchyAsync"/>: a deployed Galaxy
/// <c>gobject</c> with its hierarchy parent and template-derivation chain.
/// </summary>
public sealed class GalaxyHierarchyRow
{
/// <summary>Gets the Galaxy object identifier.</summary>
public int GobjectId { get; init; }
/// <summary>Gets the tag name.</summary>
public string TagName { get; init; } = string.Empty;
/// <summary>Gets the contained name.</summary>
public string ContainedName { get; init; } = string.Empty;
/// <summary>Gets the browse name.</summary>
public string BrowseName { get; init; } = string.Empty;
/// <summary>Gets the parent Galaxy object identifier.</summary>
public int ParentGobjectId { get; init; }
/// <summary>Gets a value indicating whether this is an area.</summary>
public bool IsArea { get; init; }
/// <summary>Gets the category identifier.</summary>
public int CategoryId { get; init; }
/// <summary>Gets the Galaxy object identifier of the host.</summary>
public int HostedByGobjectId { get; init; }
/// <summary>Gets the template derivation chain.</summary>
public IReadOnlyList<string> TemplateChain { get; init; } = Array.Empty<string>();
}
@@ -0,0 +1,24 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// A serializable point-in-time copy of the Galaxy Repository browse data.
/// Holds the raw hierarchy and attribute rowsets — not the materialized
/// protobuf objects — so the restore path runs the exact same
/// materialization as a live refresh. Persisted by
/// <see cref="IGalaxyHierarchySnapshotStore"/> after a successful refresh
/// and reloaded at startup when the Galaxy database is unreachable.
/// </summary>
/// <param name="LastDeployTime">
/// The <c>galaxy.time_of_last_deploy</c> the rowsets were pulled at, or
/// <see langword="null"/> when the Galaxy table reported no deploy. A later
/// live refresh that observes this same timestamp can promote the restored
/// entry to healthy without re-running the heavy queries.
/// </param>
/// <param name="SavedAt">UTC wall-clock time at which the snapshot was written to disk.</param>
/// <param name="Hierarchy">The persisted object-hierarchy rowset.</param>
/// <param name="Attributes">The persisted attribute rowset.</param>
public sealed record GalaxyHierarchySnapshot(
DateTimeOffset? LastDeployTime,
DateTimeOffset SavedAt,
IReadOnlyList<GalaxyHierarchyRow> Hierarchy,
IReadOnlyList<GalaxyAttributeRow> Attributes);
@@ -0,0 +1,152 @@
using System.Text.Json;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// JSON-file implementation of <see cref="IGalaxyHierarchySnapshotStore"/>.
/// Writes the on-disk snapshot atomically (temp file + rename) so a crash
/// mid-write can never leave a torn file, and ignores files whose schema
/// version it does not recognize. When
/// <see cref="GalaxyRepositoryOptions.PersistSnapshot"/> is <see langword="false"/>
/// — or <see cref="GalaxyRepositoryOptions.SnapshotCachePath"/> is empty —
/// both operations are no-ops. The snapshot path is fully consumer-supplied;
/// this store imposes no platform-specific default, so it is cross-platform.
/// </summary>
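/// <example>
/// A usage sketch under stated assumptions: the file path is hypothetical, and the store is
/// constructed by hand here rather than resolved from DI. <c>TryLoadAsync</c> returns
/// <see langword="null"/> when the file is missing, unreadable, or has a different schema version.
/// <code>
/// var options = Options.Create(new GalaxyRepositoryOptions
/// {
///     PersistSnapshot = true,
///     SnapshotCachePath = Path.Combine(Path.GetTempPath(), "galaxy-snapshot.json"), // hypothetical path
/// });
/// using var store = new GalaxyHierarchySnapshotStore(options);
/// var snapshot = new GalaxyHierarchySnapshot(
///     LastDeployTime: null,
///     SavedAt: DateTimeOffset.UtcNow,
///     Hierarchy: [],
///     Attributes: []);
/// await store.SaveAsync(snapshot, CancellationToken.None);
/// GalaxyHierarchySnapshot? loaded = await store.TryLoadAsync(CancellationToken.None);
/// </code>
/// </example>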
public sealed class GalaxyHierarchySnapshotStore : IGalaxyHierarchySnapshotStore, IDisposable
{
/// <summary>
/// On-disk format version. Bump this whenever the persisted shape changes
/// in a way an older or newer consumer cannot read; a mismatched file is
/// ignored rather than misparsed.
/// </summary>
private const int CurrentSchemaVersion = 1;
private static readonly JsonSerializerOptions SerializerOptions = new()
{
WriteIndented = false,
};
private readonly string? _path;
private readonly TimeSpan _writeTimeout;
private readonly ILogger<GalaxyHierarchySnapshotStore>? _logger;
private readonly SemaphoreSlim _ioGate = new(1, 1);
/// <summary>Initializes a new instance of the <see cref="GalaxyHierarchySnapshotStore"/> class.</summary>
/// <param name="options">Galaxy repository options carrying the snapshot path and enable flag.</param>
/// <param name="logger">Optional logger for diagnostic output.</param>
public GalaxyHierarchySnapshotStore(
IOptions<GalaxyRepositoryOptions> options,
ILogger<GalaxyHierarchySnapshotStore>? logger = null)
{
GalaxyRepositoryOptions value = options.Value;
_path = value.PersistSnapshot && !string.IsNullOrWhiteSpace(value.SnapshotCachePath)
? value.SnapshotCachePath
: null;
_writeTimeout = TimeSpan.FromSeconds(Math.Max(1, value.CommandTimeoutSeconds));
_logger = logger;
}
/// <inheritdoc />
public async Task SaveAsync(GalaxyHierarchySnapshot snapshot, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(snapshot);
if (_path is null)
{
return;
}
PersistedFile file = new(CurrentSchemaVersion, snapshot);
await _ioGate.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
// Bound the write so a stuck disk — e.g. a SnapshotCachePath on an
// unresponsive network share — cannot stall the caller. On the cache
// refresh path that would otherwise pin the whole refresh loop.
using CancellationTokenSource writeCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
writeCts.CancelAfter(_writeTimeout);
string? directory = Path.GetDirectoryName(_path);
if (!string.IsNullOrEmpty(directory))
{
Directory.CreateDirectory(directory);
}
string tempPath = _path + ".tmp";
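// FileOptions.Asynchronous opens the handle for overlapped I/O where the platform supports it,
// so the timeout token can cancel an in-flight write instead of only being checked up front.
await using (FileStream stream = new(tempPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize: 4096, FileOptions.Asynchronous))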
{
await JsonSerializer.SerializeAsync(stream, file, SerializerOptions, writeCts.Token).ConfigureAwait(false);
}
File.Move(tempPath, _path, overwrite: true);
_logger?.LogDebug(
"Persisted Galaxy hierarchy snapshot to {Path} ({ObjectCount} objects, {AttributeCount} attributes).",
_path,
snapshot.Hierarchy.Count,
snapshot.Attributes.Count);
}
finally
{
_ioGate.Release();
}
}
/// <inheritdoc />
public async Task<GalaxyHierarchySnapshot?> TryLoadAsync(CancellationToken cancellationToken)
{
if (_path is null || !File.Exists(_path))
{
return null;
}
await _ioGate.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
PersistedFile? file;
await using (FileStream stream = new(_path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
file = await JsonSerializer.DeserializeAsync<PersistedFile>(
stream, SerializerOptions, cancellationToken).ConfigureAwait(false);
}
if (file is null || file.SchemaVersion != CurrentSchemaVersion || file.Snapshot is null)
{
_logger?.LogWarning(
"Ignoring Galaxy hierarchy snapshot at {Path}: unrecognized schema version or missing snapshot payload.",
_path);
return null;
}
return file.Snapshot;
}
catch (Exception exception) when (exception is JsonException or IOException or UnauthorizedAccessException)
{
// A corrupt, truncated, locked, or access-denied snapshot file is an
// expected failure mode for a disk cache — honor the Try contract and
// return null rather than throwing.
_logger?.LogWarning(
exception,
"Ignoring Galaxy hierarchy snapshot at {Path}: the file is unreadable or not valid JSON.",
_path);
return null;
}
finally
{
_ioGate.Release();
}
}
/// <summary>
/// Disposes the I/O gate. As a DI singleton the store is disposed once at host
/// shutdown, by which point no save/load is in flight.
/// </summary>
public void Dispose()
{
_ioGate.Dispose();
}
/// <summary>On-disk envelope: a schema version plus the snapshot payload.</summary>
private sealed record PersistedFile(int SchemaVersion, GalaxyHierarchySnapshot? Snapshot);
}
@@ -0,0 +1,16 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// A <see cref="GalaxyObject"/> paired with its computed contained path and hierarchy
/// depth. Materialized once per cache entry by <see cref="GalaxyHierarchyIndex"/> so
/// browse/discover projection can filter and page without recomputing paths.
/// </summary>
/// <param name="Object">The projected Galaxy object.</param>
/// <param name="ContainedPath">The slash-delimited contained path from the hierarchy root.</param>
/// <param name="Depth">The number of path segments from the root (zero for top-level objects).</param>
public sealed record GalaxyObjectView(
GalaxyObject Object,
string ContainedPath,
int Depth);
@@ -0,0 +1,76 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Maps <see cref="GalaxyHierarchyRow"/> + <see cref="GalaxyAttributeRow"/> rows produced
/// by <see cref="GalaxyRepository"/> into <c>galaxy_repository.v1</c> proto messages.
/// Pure function, separated so it can be unit-tested without a SQL connection.
/// </summary>
public static class GalaxyProtoMapper
{
/// <summary>Maps Galaxy hierarchy and attribute rows to Galaxy object protos.</summary>
/// <param name="hierarchy">Hierarchy rows from Galaxy Repository.</param>
/// <param name="attributes">Attribute rows from Galaxy Repository.</param>
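/// <example>
/// A small sketch with made-up rows. It assumes <c>GalaxyAttributeRow</c> exposes the same
/// initializable properties that <c>GalaxyRepository.GetAttributesAsync</c> populates. The method
/// is an iterator, so nothing is mapped until the result is enumerated (here via <c>ToList</c>).
/// <code>
/// GalaxyHierarchyRow[] hierarchy =
/// [
///     new() { GobjectId = 1, TagName = "Pump001", BrowseName = "Pump001" },
/// ];
/// GalaxyAttributeRow[] attributes =
/// [
///     new() { GobjectId = 1, TagName = "Pump001", AttributeName = "Speed", FullTagReference = "Pump001.Speed" },
/// ];
/// var objects = GalaxyProtoMapper.MapHierarchy(hierarchy, attributes).ToList();
/// </code>
/// </example>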
public static IEnumerable<GalaxyObject> MapHierarchy(
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes)
{
Dictionary<int, List<GalaxyAttributeRow>> attributesByGobjectId = attributes
.GroupBy(a => a.GobjectId)
.ToDictionary(g => g.Key, g => g.ToList());
foreach (GalaxyHierarchyRow row in hierarchy)
{
yield return MapObject(row, attributesByGobjectId);
}
}
/// <summary>Maps a Galaxy hierarchy row to a Galaxy object proto.</summary>
/// <param name="row">Hierarchy row from Galaxy Repository.</param>
/// <param name="attributesByGobjectId">Attributes indexed by gobject ID.</param>
public static GalaxyObject MapObject(
GalaxyHierarchyRow row,
IReadOnlyDictionary<int, List<GalaxyAttributeRow>> attributesByGobjectId)
{
GalaxyObject obj = new()
{
GobjectId = row.GobjectId,
TagName = row.TagName,
ContainedName = row.ContainedName,
BrowseName = row.BrowseName,
ParentGobjectId = row.ParentGobjectId,
IsArea = row.IsArea,
CategoryId = row.CategoryId,
HostedByGobjectId = row.HostedByGobjectId,
};
obj.TemplateChain.AddRange(row.TemplateChain);
if (attributesByGobjectId.TryGetValue(row.GobjectId, out List<GalaxyAttributeRow>? attrs))
{
foreach (GalaxyAttributeRow attr in attrs)
{
obj.Attributes.Add(MapAttribute(attr));
}
}
return obj;
}
/// <summary>Maps a Galaxy attribute row to a Galaxy attribute proto.</summary>
/// <param name="row">Attribute row from Galaxy Repository.</param>
public static GalaxyAttribute MapAttribute(GalaxyAttributeRow row) => new()
{
AttributeName = row.AttributeName,
FullTagReference = row.FullTagReference,
MxDataType = row.MxDataType,
DataTypeName = row.DataTypeName ?? string.Empty,
IsArray = row.IsArray,
ArrayDimension = row.ArrayDimension ?? 0,
ArrayDimensionPresent = row.ArrayDimension.HasValue,
MxAttributeCategory = row.MxAttributeCategory,
SecurityClassification = row.SecurityClassification,
IsHistorized = row.IsHistorized,
IsAlarm = row.IsAlarm,
};
}
@@ -0,0 +1,257 @@
using Microsoft.Data.SqlClient;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// SQL access to the AVEVA System Platform Galaxy Repository database.
/// <para>
/// <see cref="HierarchySql" /> is the query originally ported from the OtOpcUa
/// project. <see cref="AttributesSql" /> has diverged: it additionally enumerates the
/// built-in attributes contributed by each object's primitives (from
/// <c>attribute_definition</c> via <c>primitive_instance</c>), so engine/platform objects
/// and extension sub-attributes (e.g. <c>TestAlarm001.Acked</c>) are surfaced. The
/// OtOpcUa query is not kept in sync.
/// </para>
/// </summary>
public sealed class GalaxyRepository(GalaxyRepositoryOptions options) : IGalaxyRepository
{
/// <summary>Tests the connection to the Galaxy Repository database.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
public async Task<bool> TestConnectionAsync(CancellationToken ct = default)
{
try
{
using SqlConnection conn = new(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using SqlCommand cmd = new("SELECT 1", conn) { CommandTimeout = options.CommandTimeoutSeconds };
object? result = await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
return result is int i && i == 1;
}
catch (SqlException) { return false; }
catch (InvalidOperationException) { return false; }
}
/// <summary>Retrieves the last deployment time from the Galaxy Repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
public async Task<DateTime?> GetLastDeployTimeAsync(CancellationToken ct = default)
{
using SqlConnection conn = new(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using SqlCommand cmd = new("SELECT time_of_last_deploy FROM galaxy", conn)
{ CommandTimeout = options.CommandTimeoutSeconds };
object? result = await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
return result is DateTime dt ? dt : null;
}
/// <summary>Retrieves the complete hierarchy of Galaxy objects from the repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
public async Task<List<GalaxyHierarchyRow>> GetHierarchyAsync(CancellationToken ct = default)
{
List<GalaxyHierarchyRow> rows = new();
using SqlConnection conn = new(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using SqlCommand cmd = new(HierarchySql, conn) { CommandTimeout = options.CommandTimeoutSeconds };
using SqlDataReader reader = await cmd.ExecuteReaderAsync(ct).ConfigureAwait(false);
while (await reader.ReadAsync(ct).ConfigureAwait(false))
{
string templateChainRaw = reader.IsDBNull(8) ? string.Empty : reader.GetString(8);
string[] templateChain = templateChainRaw.Length == 0
? Array.Empty<string>()
: templateChainRaw.Split(['|'], StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.Where(s => s.Length > 0)
.ToArray();
rows.Add(new GalaxyHierarchyRow
{
GobjectId = Convert.ToInt32(reader.GetValue(0)),
TagName = reader.GetString(1),
ContainedName = reader.IsDBNull(2) ? string.Empty : reader.GetString(2),
BrowseName = reader.GetString(3),
ParentGobjectId = Convert.ToInt32(reader.GetValue(4)),
IsArea = Convert.ToInt32(reader.GetValue(5)) == 1,
CategoryId = Convert.ToInt32(reader.GetValue(6)),
HostedByGobjectId = Convert.ToInt32(reader.GetValue(7)),
TemplateChain = templateChain,
});
}
return rows;
}
/// <summary>Retrieves all attributes for Galaxy objects from the repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
public async Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default)
{
List<GalaxyAttributeRow> rows = new();
using SqlConnection conn = new(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using SqlCommand cmd = new(AttributesSql, conn) { CommandTimeout = options.CommandTimeoutSeconds };
using SqlDataReader reader = await cmd.ExecuteReaderAsync(ct).ConfigureAwait(false);
while (await reader.ReadAsync(ct).ConfigureAwait(false))
{
rows.Add(new GalaxyAttributeRow
{
GobjectId = Convert.ToInt32(reader.GetValue(0)),
TagName = reader.GetString(1),
AttributeName = reader.GetString(2),
FullTagReference = reader.GetString(3),
MxDataType = Convert.ToInt32(reader.GetValue(4)),
DataTypeName = reader.IsDBNull(5) ? null : reader.GetString(5),
IsArray = Convert.ToInt32(reader.GetValue(6)) == 1,
ArrayDimension = reader.IsDBNull(7) ? null : Convert.ToInt32(reader.GetValue(7)),
MxAttributeCategory = Convert.ToInt32(reader.GetValue(8)),
SecurityClassification = Convert.ToInt32(reader.GetValue(9)),
IsHistorized = Convert.ToInt32(reader.GetValue(10)) == 1,
IsAlarm = Convert.ToInt32(reader.GetValue(11)) == 1,
});
}
return rows;
}
// Area objects (category 13) are returned even when undeployed (deployed_package_id = 0):
// they are organizational/model nodes that group deployed objects, so excluding them
// orphans every area whose containing area is not itself deployed. All non-area objects
// still require deployment. Orphans left by a missing/deleted parent area are re-rooted
// by GalaxyHierarchyIndex.Build so nothing disappears from browse.
private const string HierarchySql = @"
;WITH template_chain AS (
SELECT g.gobject_id AS instance_gobject_id, t.gobject_id AS template_gobject_id,
t.tag_name AS template_tag_name, t.derived_from_gobject_id, 0 AS depth
FROM gobject g
INNER JOIN gobject t ON t.gobject_id = g.derived_from_gobject_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0 AND g.derived_from_gobject_id <> 0
UNION ALL
SELECT tc.instance_gobject_id, t.gobject_id, t.tag_name, t.derived_from_gobject_id, tc.depth + 1
FROM template_chain tc
INNER JOIN gobject t ON t.gobject_id = tc.derived_from_gobject_id
WHERE tc.derived_from_gobject_id <> 0 AND tc.depth < 10
)
SELECT DISTINCT
g.gobject_id,
g.tag_name,
g.contained_name,
CASE WHEN g.contained_name IS NULL OR g.contained_name = ''
THEN g.tag_name
ELSE g.contained_name
END AS browse_name,
CASE WHEN g.contained_by_gobject_id = 0
THEN g.area_gobject_id
ELSE g.contained_by_gobject_id
END AS parent_gobject_id,
CASE WHEN td.category_id = 13
THEN 1
ELSE 0
END AS is_area,
td.category_id AS category_id,
g.hosted_by_gobject_id AS hosted_by_gobject_id,
ISNULL(
STUFF((
SELECT '|' + tc.template_tag_name
FROM template_chain tc
WHERE tc.instance_gobject_id = g.gobject_id
ORDER BY tc.depth
FOR XML PATH('')
), 1, 1, ''),
''
) AS template_chain
FROM gobject g
INNER JOIN template_definition td
ON g.template_definition_id = td.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND g.is_template = 0
AND (g.deployed_package_id <> 0 OR td.category_id = 13)
ORDER BY parent_gobject_id, g.tag_name";
// Unlike HierarchySql, this query has diverged from the OtOpcUa original. It returns two
// kinds of attribute: user-configured dynamic attributes (the original `dynamic_attribute`
// body, src_pri 0) and the built-in attributes every object inherits from its primitives
// (`attribute_definition` joined through `primitive_instance`, src_pri 1). Built-in
// attributes are why engine/platform objects and extension sub-attributes such as
// `TestAlarm001.Acked` show up at all. Built-in rows carry no category filter (the
// `attribute_definition` category numbering differs from `dynamic_attribute`'s — only the
// `_`-prefix and `.Description` name exclusions apply) and are never flagged
// `is_historized`/`is_alarm`: those flags describe a user attribute that anchors an
// extension, not the extension's machinery leaves.
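//
// Illustrative note on the array_dimension expression used in both candidate branches. This
// assumes mx_value is stored as a hex-character string (an assumption, not confirmed here):
// the expression takes characters 15..16 followed by 13..14 (1-based), i.e. it swaps two
// little-endian bytes, and parses the result as a 16-bit number. A rough C# equivalent,
// with mxValue as a hypothetical local:
//   string hex = mxValue.Substring(14, 2) + mxValue.Substring(12, 2); // 1-based 15..16, then 13..14
//   int dimension = Convert.ToInt32(hex, 16);                         // "0A00" at 13..16 parses as 10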
private const string AttributesSql = @"
;WITH deployed_package_chain AS (
SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
FROM gobject g
INNER JOIN package p ON p.package_id = g.deployed_package_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0
UNION ALL
SELECT dpc.gobject_id, p.package_id, p.derived_from_package_id, dpc.depth + 1
FROM deployed_package_chain dpc
INNER JOIN package p ON p.package_id = dpc.derived_from_package_id
WHERE dpc.derived_from_package_id <> 0 AND dpc.depth < 10
),
candidate AS (
SELECT
dpc.gobject_id, g.tag_name, da.attribute_name, da.mx_data_type, da.is_array,
CASE WHEN da.is_array = 1
THEN CONVERT(int, CONVERT(varbinary(2),
SUBSTRING(da.mx_value, 15, 2) + SUBSTRING(da.mx_value, 13, 2), 2))
ELSE NULL END AS array_dimension,
da.mx_attribute_category, da.security_classification, dpc.depth, 0 AS src_pri
FROM deployed_package_chain dpc
INNER JOIN dynamic_attribute da ON da.package_id = dpc.package_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
INNER JOIN template_definition td ON td.template_definition_id = g.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND da.attribute_name NOT LIKE '[_]%'
AND da.attribute_name NOT LIKE '%.Description'
AND da.mx_attribute_category IN (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24)
UNION ALL
SELECT
dpc.gobject_id, g.tag_name,
CASE WHEN pi.primitive_name IS NULL OR pi.primitive_name = ''
THEN ad.attribute_name
ELSE pi.primitive_name + '.' + ad.attribute_name END AS attribute_name,
ad.mx_data_type, ad.is_array,
CASE WHEN ad.is_array = 1
THEN CONVERT(int, CONVERT(varbinary(2),
SUBSTRING(ad.mx_value, 15, 2) + SUBSTRING(ad.mx_value, 13, 2), 2))
ELSE NULL END AS array_dimension,
ad.mx_attribute_category, ad.security_classification, dpc.depth, 1 AS src_pri
FROM deployed_package_chain dpc
INNER JOIN primitive_instance pi ON pi.package_id = dpc.package_id
INNER JOIN attribute_definition ad ON ad.primitive_definition_id = pi.primitive_definition_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
INNER JOIN template_definition td ON td.template_definition_id = g.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND ad.attribute_name NOT LIKE '[_]%'
AND ad.attribute_name NOT LIKE '%.Description'
),
ranked AS (
SELECT c.*, ROW_NUMBER() OVER (
PARTITION BY c.gobject_id, c.attribute_name ORDER BY c.src_pri, c.depth) AS rn
FROM candidate c
)
SELECT
r.gobject_id, r.tag_name, r.attribute_name,
r.tag_name + '.' + r.attribute_name
+ CASE WHEN r.is_array = 1 THEN '[]' ELSE '' END AS full_tag_reference,
r.mx_data_type, dt.description AS data_type_name, r.is_array, r.array_dimension,
r.mx_attribute_category, r.security_classification,
CASE WHEN r.src_pri = 0 AND EXISTS (
SELECT 1 FROM deployed_package_chain dpc2
INNER JOIN primitive_instance pi ON pi.package_id = dpc2.package_id AND pi.primitive_name = r.attribute_name
INNER JOIN primitive_definition pd ON pd.primitive_definition_id = pi.primitive_definition_id AND pd.primitive_name = 'HistoryExtension'
WHERE dpc2.gobject_id = r.gobject_id
) THEN 1 ELSE 0 END AS is_historized,
CASE WHEN r.src_pri = 0 AND EXISTS (
SELECT 1 FROM deployed_package_chain dpc2
INNER JOIN primitive_instance pi ON pi.package_id = dpc2.package_id AND pi.primitive_name = r.attribute_name
INNER JOIN primitive_definition pd ON pd.primitive_definition_id = pi.primitive_definition_id AND pd.primitive_name = 'AlarmExtension'
WHERE dpc2.gobject_id = r.gobject_id
) THEN 1 ELSE 0 END AS is_alarm
FROM ranked r
LEFT JOIN data_type dt ON dt.mx_data_type = r.mx_data_type
WHERE r.rn = 1
ORDER BY r.tag_name, r.attribute_name";
}
@@ -0,0 +1,55 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Connection settings for the AVEVA System Platform Galaxy Repository database.
/// <para>
/// <see cref="SectionName"/> is a generic default; the DI extension accepts an explicit
/// configuration section path so a consumer can bind from its own section (e.g.
/// <c>HistorianGateway:Galaxy</c>).
/// </para>
/// </summary>
public sealed class GalaxyRepositoryOptions
{
/// <summary>
/// Generic default configuration section name. The DI extension accepts an explicit
/// section path, so a consumer may bind from a different section (e.g.
/// <c>HistorianGateway:Galaxy</c>).
/// </summary>
public const string SectionName = "GalaxyRepository";
/// <summary>
/// Default SQL Server connection string for the Galaxy Repository database.
/// Single source of truth shared with the integration-test fallback so the
/// production default and the live-test default cannot drift.
/// </summary>
public const string DefaultConnectionString =
"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;";
/// <summary>The SQL Server connection string for the Galaxy Repository database.</summary>
public string ConnectionString { get; init; } = DefaultConnectionString;
/// <summary>The timeout in seconds for SQL commands executed against the Galaxy Repository.</summary>
public int CommandTimeoutSeconds { get; init; } = 60;
/// <summary>
/// Interval (seconds) between background refreshes of the dashboard Galaxy summary
/// cache. SQL is hit at most once per interval regardless of dashboard render rate.
/// </summary>
public int DashboardRefreshIntervalSeconds { get; init; } = 30;
/// <summary>
/// Whether the latest successful Galaxy browse dataset is persisted to disk. When
/// enabled, the cache reloads that snapshot at startup so clients can still browse
/// last-known data while the Galaxy database is unreachable.
/// </summary>
public bool PersistSnapshot { get; init; } = true;
/// <summary>
/// File path for the persisted Galaxy browse snapshot. Ignored when
/// <see cref="PersistSnapshot"/> is <see langword="false"/>. There is no built-in
/// default path — the consumer supplies a cross-platform-friendly path appropriate to
/// its host. When left empty and <see cref="PersistSnapshot"/> is enabled, the
/// snapshot store (<see cref="IGalaxyHierarchySnapshotStore"/>) decides where to write.
/// </summary>
public string SnapshotCachePath { get; init; } = string.Empty;
}
@@ -0,0 +1,16 @@
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Resolution result for a tag address: the owning <see cref="GalaxyObject"/>, the
/// specific <see cref="GalaxyAttribute"/> when the address names an attribute (otherwise
/// <see langword="null"/>), and the object's contained path.
/// </summary>
/// <param name="Object">The Galaxy object that owns the looked-up address.</param>
/// <param name="Attribute">The matched attribute, or <see langword="null"/> when the address names an object.</param>
/// <param name="ContainedPath">The owning object's contained path.</param>
public sealed record GalaxyTagLookup(
GalaxyObject Object,
GalaxyAttribute? Attribute,
string ContainedPath);
@@ -0,0 +1,329 @@
using Google.Protobuf.WellKnownTypes;
using Grpc.Core;
using ProtoGalaxyRepository = ZB.MOM.WW.GalaxyRepository.Grpc.GalaxyRepository;
namespace ZB.MOM.WW.GalaxyRepository.Grpc;
/// <summary>
/// Reusable gRPC surface that exposes the Galaxy Repository to clients. Hosted by any
/// consuming gateway (e.g. MxAccessGateway or the HistorianGateway sidecar) via
/// <see cref="DependencyInjection.GalaxyRepositoryServiceCollectionExtensions.MapZbGalaxyRepository"/>.
/// <para>
/// <c>DiscoverHierarchy</c>, <c>BrowseChildren</c>, and <c>GetLastDeployTime</c> serve from
/// <see cref="IGalaxyHierarchyCache"/> so many clients share a single SQL pull.
/// <c>WatchDeployEvents</c> streams events from <see cref="IGalaxyDeployNotifier"/>.
/// <c>TestConnection</c> remains a direct SQL probe since callers use it as a health check.
/// </para>
/// <para>
/// This service applies <b>no</b> per-identity browse-subtree filtering — the full
/// hierarchy is projected (<c>null</c> subtree globs). Authorization (including any
/// subtree scoping) is the responsibility of the hosting gateway's interceptor layer.
/// </para>
/// </summary>
/// <param name="repository">Direct SQL surface used by <c>TestConnection</c>.</param>
/// <param name="cache">Shared hierarchy cache that <c>DiscoverHierarchy</c>/<c>BrowseChildren</c>/<c>GetLastDeployTime</c> serve from.</param>
/// <param name="notifier">Deploy-event source streamed by <c>WatchDeployEvents</c>.</param>
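/// <example>
/// A minimal client-side sketch of walking <c>DiscoverHierarchy</c> page by page. It assumes
/// the consumer generates client stubs from the same proto (this project compiles
/// <c>GrpcServices="Server"</c> only) and references <c>Grpc.Net.Client</c>; the address is
/// hypothetical.
/// <code>
/// using Grpc.Net.Client;
/// using ZB.MOM.WW.GalaxyRepository.Grpc;
/// using ProtoGalaxyRepository = ZB.MOM.WW.GalaxyRepository.Grpc.GalaxyRepository;
///
/// using GrpcChannel channel = GrpcChannel.ForAddress("https://localhost:5001"); // hypothetical host
/// var client = new ProtoGalaxyRepository.GalaxyRepositoryClient(channel);
///
/// string pageToken = string.Empty;
/// do
/// {
///     // Leaving page_size unset (0) selects the server default; the server caps larger values.
///     DiscoverHierarchyReply page = await client.DiscoverHierarchyAsync(
///         new DiscoverHierarchyRequest { PageToken = pageToken });
///
///     foreach (GalaxyObject galaxyObject in page.Objects)
///     {
///         Console.WriteLine(galaxyObject.TagName);
///     }
///
///     // An empty next_page_token marks the last page.
///     pageToken = page.NextPageToken;
/// }
/// while (pageToken.Length > 0);
/// </code>
/// A token is bound to the cache sequence and the filter set, so a redeploy in the middle of a
/// walk surfaces as <c>InvalidArgument</c>; restarting from an empty token is one reasonable
/// recovery.
/// </example>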
public sealed class GalaxyRepositoryGrpcService(
IGalaxyRepository repository,
IGalaxyHierarchyCache cache,
IGalaxyDeployNotifier notifier) : ProtoGalaxyRepository.GalaxyRepositoryBase
{
private static readonly TimeSpan FirstLoadWaitBudget = TimeSpan.FromSeconds(5);
private const int DefaultDiscoverPageSize = 1000;
private const int MaxDiscoverPageSize = 5000;
private const int DefaultBrowsePageSize = 500;
// MaxBrowsePageSize reuses MaxDiscoverPageSize (5000) — same cap.
/// <inheritdoc />
public override async Task<TestConnectionReply> TestConnection(
TestConnectionRequest request,
ServerCallContext context)
{
bool ok = await repository.TestConnectionAsync(context.CancellationToken).ConfigureAwait(false);
return new TestConnectionReply { Ok = ok };
}
/// <inheritdoc />
public override async Task<GetLastDeployTimeReply> GetLastDeployTime(
GetLastDeployTimeRequest request,
ServerCallContext context)
{
await WaitForCacheBootstrap(context.CancellationToken).ConfigureAwait(false);
GalaxyHierarchyCacheEntry entry = cache.Current;
if (!entry.HasData)
{
throw new RpcException(new Status(
StatusCode.Unavailable,
ResolveUnavailableMessage(entry)));
}
GetLastDeployTimeReply reply = new() { Present = entry.LastDeployTime.HasValue };
if (entry.LastDeployTime.HasValue)
{
reply.TimeOfLastDeploy = Timestamp.FromDateTimeOffset(entry.LastDeployTime.Value);
}
return reply;
}
/// <inheritdoc />
public override async Task<DiscoverHierarchyReply> DiscoverHierarchy(
DiscoverHierarchyRequest request,
ServerCallContext context)
{
await WaitForCacheBootstrap(context.CancellationToken).ConfigureAwait(false);
GalaxyHierarchyCacheEntry entry = cache.Current;
if (!entry.HasData)
{
throw new RpcException(new Status(
StatusCode.Unavailable,
ResolveUnavailableMessage(entry)));
}
int pageSize = ResolvePageSize(request.PageSize);
// The shared library applies no per-identity subtree scoping; the hosting
// gateway enforces authorization at its interceptor layer.
string filterSignature = GalaxyHierarchyProjector.ComputeFilterSignature(request, browseSubtreeGlobs: null);
PageToken pageToken = ParsePageToken(request.PageToken, entry.Sequence, filterSignature);
GalaxyHierarchyQueryResult query = GalaxyHierarchyProjector.Project(
entry,
request,
browseSubtreeGlobs: null,
pageToken.Offset,
pageSize);
int offset = pageToken.Offset;
if (offset > query.TotalObjectCount)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"DiscoverHierarchy page_token is outside the current hierarchy."));
}
DiscoverHierarchyReply reply = new()
{
TotalObjectCount = query.TotalObjectCount,
};
reply.Objects.Add(query.Objects);
int nextOffset = offset + query.Objects.Count;
if (nextOffset < query.TotalObjectCount)
{
reply.NextPageToken = FormatPageToken(entry.Sequence, query.FilterSignature, nextOffset);
}
return reply;
}
/// <inheritdoc />
public override async Task<BrowseChildrenReply> BrowseChildren(
BrowseChildrenRequest request,
ServerCallContext context)
{
await WaitForCacheBootstrap(context.CancellationToken).ConfigureAwait(false);
GalaxyHierarchyCacheEntry entry = cache.Current;
if (!entry.HasData)
{
throw new RpcException(new Status(
StatusCode.Unavailable,
ResolveUnavailableMessage(entry)));
}
int pageSize = ResolveBrowsePageSize(request.PageSize);
// Resolve the parent id once so the page-token signature can include it
// and the projector sees the same resolved id when memoizing. The projector
// re-resolves internally; with the by-name/by-path indexes on
// GalaxyHierarchyIndex that second call is O(1), so the redundancy is cheap
// and keeps the projector self-contained.
int parentId = GalaxyBrowseProjector.ResolveParentId(entry, request);
string filterSignature = GalaxyBrowseProjector.ComputeFilterSignature(
request, browseSubtreeGlobs: null, parentId);
PageToken pageToken = ParsePageToken(request.PageToken, entry.Sequence, filterSignature);
GalaxyBrowseChildrenResult result = GalaxyBrowseProjector.ProjectChildren(
entry,
request,
browseSubtreeGlobs: null,
pageToken.Offset,
pageSize);
if (pageToken.Offset > result.TotalChildCount)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"BrowseChildren page_token is outside the current children set."));
}
BrowseChildrenReply reply = new()
{
TotalChildCount = result.TotalChildCount,
CacheSequence = (ulong)entry.Sequence,
};
reply.Children.Add(result.Children);
reply.ChildHasChildren.Add(result.ChildHasChildren);
int nextOffset = pageToken.Offset + result.Children.Count;
if (nextOffset < result.TotalChildCount)
{
reply.NextPageToken = FormatPageToken(entry.Sequence, result.FilterSignature, nextOffset);
}
return reply;
}
/// <inheritdoc />
public override async Task WatchDeployEvents(
WatchDeployEventsRequest request,
IServerStreamWriter<DeployEvent> responseStream,
ServerCallContext context)
{
DateTimeOffset? lastSeen = request.LastSeenDeployTime?.ToDateTimeOffset();
await foreach (GalaxyDeployEventInfo info in notifier
.SubscribeAsync(context.CancellationToken)
.ConfigureAwait(false))
{
// Suppress the initial bootstrap event when the client already knows about
// this deploy time. We only suppress the first one — subsequent events fire
// on actual changes, so they always pass.
if (lastSeen is { } seen && info.TimeOfLastDeploy == seen)
{
lastSeen = null;
continue;
}
lastSeen = null;
await responseStream.WriteAsync(MapDeployEvent(info), context.CancellationToken).ConfigureAwait(false);
}
}
private async Task WaitForCacheBootstrap(CancellationToken cancellationToken)
{
if (cache.Current.HasData || cache.Current.Status == GalaxyCacheStatus.Unavailable)
{
return;
}
using CancellationTokenSource budget = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
budget.CancelAfter(FirstLoadWaitBudget);
try
{
await cache.WaitForFirstLoadAsync(budget.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (OperationCanceledException)
{
// Budget elapsed; fall through and let the caller see the current
// (possibly Unknown/Unavailable) entry.
}
}
private static DeployEvent MapDeployEvent(GalaxyDeployEventInfo info)
{
DeployEvent ev = new()
{
Sequence = (ulong)info.Sequence,
ObservedAt = Timestamp.FromDateTimeOffset(info.ObservedAt),
ObjectCount = info.ObjectCount,
AttributeCount = info.AttributeCount,
TimeOfLastDeployPresent = info.TimeOfLastDeploy.HasValue,
};
if (info.TimeOfLastDeploy.HasValue)
{
ev.TimeOfLastDeploy = Timestamp.FromDateTimeOffset(info.TimeOfLastDeploy.Value);
}
return ev;
}
private static string ResolveUnavailableMessage(GalaxyHierarchyCacheEntry entry) => entry.Status switch
{
GalaxyCacheStatus.Unknown => "Galaxy cache has not completed its initial load yet.",
GalaxyCacheStatus.Unavailable => "Galaxy repository is unavailable.",
_ => "Galaxy cache has no data available.",
};
private static int ResolvePageSize(int requestedPageSize)
{
if (requestedPageSize < 0)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"DiscoverHierarchy page_size must be greater than zero when provided."));
}
int pageSize = requestedPageSize == 0 ? DefaultDiscoverPageSize : requestedPageSize;
return Math.Min(pageSize, MaxDiscoverPageSize);
}
private static int ResolveBrowsePageSize(int requested)
{
if (requested < 0)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"BrowseChildren page_size must be greater than zero when provided."));
}
int pageSize = requested == 0 ? DefaultBrowsePageSize : requested;
return Math.Min(pageSize, MaxDiscoverPageSize);
}
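// Token layout: "<cache sequence>:<filter signature>:<offset>", formatted with the invariant
// culture. ParsePageToken splits on the first two ':' characters only, so the layout relies on
// the filter signature never containing ':'.
private static string FormatPageToken(long sequence, string filterSignature, int offset)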
{
return string.Concat(
sequence.ToString(System.Globalization.CultureInfo.InvariantCulture),
":",
filterSignature,
":",
offset.ToString(System.Globalization.CultureInfo.InvariantCulture));
}
private static PageToken ParsePageToken(string pageToken, long currentSequence, string currentFilterSignature)
{
if (string.IsNullOrWhiteSpace(pageToken))
{
return new PageToken(currentSequence, currentFilterSignature, Offset: 0);
}
string[] parts = pageToken.Split(':', count: 3);
if (parts.Length != 3
|| !long.TryParse(
parts[0],
System.Globalization.NumberStyles.None,
System.Globalization.CultureInfo.InvariantCulture,
out long sequence)
|| !int.TryParse(
parts[2],
System.Globalization.NumberStyles.None,
System.Globalization.CultureInfo.InvariantCulture,
out int offset)
|| offset < 0)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"page_token is invalid."));
}
if (sequence != currentSequence)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"page_token is stale."));
}
if (!string.Equals(parts[1], currentFilterSignature, StringComparison.Ordinal))
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"page_token does not match the current filters."));
}
return new PageToken(sequence, parts[1], offset);
}
private sealed record PageToken(long Sequence, string FilterSignature, int Offset);
}
@@ -0,0 +1,17 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>Publishes Galaxy repository deploy events to subscribers.</summary>
public interface IGalaxyDeployNotifier
{
/// <summary>The most recently published event, or null if no event has fired yet.</summary>
GalaxyDeployEventInfo? Latest { get; }
/// <summary>Publishes a deploy event to all current subscribers and stores it as Latest.</summary>
/// <param name="info">The deploy event to publish.</param>
void Publish(GalaxyDeployEventInfo info);
/// <summary>Subscribes to deploy events. The sequence yields the latest event first (if available) then streams new events as they fire.</summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>Async enumerable of deploy events.</returns>
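/// <example>
/// A minimal consumption sketch, assuming an <see cref="IGalaxyDeployNotifier"/> instance
/// named <c>notifier</c> and a caller-owned <c>stoppingToken</c> (both hypothetical names):
/// <code>
/// await foreach (GalaxyDeployEventInfo info in notifier.SubscribeAsync(stoppingToken))
/// {
///     // The first item is the latest known event, if any; later items are new deploys.
///     Console.WriteLine($"deploy #{info.Sequence}: {info.ObjectCount} objects");
/// }
/// </code>
/// </example>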
IAsyncEnumerable<GalaxyDeployEventInfo> SubscribeAsync(CancellationToken cancellationToken);
}
@@ -0,0 +1,25 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>Cache for Galaxy Repository hierarchy data.</summary>
public interface IGalaxyHierarchyCache
{
/// <summary>The latest cache entry. Status freshness is recomputed against the clock.</summary>
GalaxyHierarchyCacheEntry Current { get; }
/// <summary>
/// Forces a refresh against the Galaxy Repository. Performs a cheap
/// <c>time_of_last_deploy</c> probe first and only re-queries the heavy hierarchy +
/// attributes rowsets when the deploy time has changed since the last successful
/// refresh.
/// </summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task RefreshAsync(CancellationToken cancellationToken);
/// <summary>
/// Awaits the first completed refresh attempt (success or failure). Useful for
/// gRPC handlers that want to serve from cache without returning Unavailable on the
/// very first request after the service starts.
/// </summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task WaitForFirstLoadAsync(CancellationToken cancellationToken);
}
@@ -0,0 +1,28 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Persists the latest Galaxy Repository browse dataset to disk and reloads
/// it at startup. Lets <see cref="GalaxyHierarchyCache"/> serve last-known
/// browse data when the Galaxy database is unreachable on a cold start.
/// </summary>
public interface IGalaxyHierarchySnapshotStore
{
/// <summary>
/// Writes <paramref name="snapshot"/> to disk, replacing any previous
/// snapshot atomically. A no-op when snapshot persistence is disabled.
/// </summary>
/// <param name="snapshot">The browse dataset to persist.</param>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
Task SaveAsync(GalaxyHierarchySnapshot snapshot, CancellationToken cancellationToken);
/// <summary>
/// Reads the persisted Galaxy browse dataset.
/// </summary>
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>
/// The persisted snapshot, or <see langword="null"/> when none exists,
/// persistence is disabled, or the on-disk file uses an unrecognized
/// schema version.
/// </returns>
Task<GalaxyHierarchySnapshot?> TryLoadAsync(CancellationToken cancellationToken);
}
@@ -0,0 +1,26 @@
namespace ZB.MOM.WW.GalaxyRepository;
/// <summary>
/// Abstraction over <see cref="GalaxyRepository"/>: the read-only SQL surface over the
/// AVEVA System Platform Galaxy Repository database. Exists so consumers (and the cache
/// layer) can be unit-tested against an in-memory fake without standing up a
/// real <c>Microsoft.Data.SqlClient</c> <c>SqlConnection</c> against a bogus host/port.
/// </summary>
public interface IGalaxyRepository
{
/// <summary>Tests the connection to the Galaxy Repository database.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<bool> TestConnectionAsync(CancellationToken ct = default);
/// <summary>Retrieves the last deployment time from the Galaxy Repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<DateTime?> GetLastDeployTimeAsync(CancellationToken ct = default);
/// <summary>Retrieves the complete hierarchy of Galaxy objects from the repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<List<GalaxyHierarchyRow>> GetHierarchyAsync(CancellationToken ct = default);
/// <summary>Retrieves all attributes for Galaxy objects from the repository.</summary>
/// <param name="ct">Token to cancel the asynchronous operation.</param>
Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default);
}
@@ -0,0 +1,190 @@
syntax = "proto3";
package galaxy_repository.v1;
option csharp_namespace = "ZB.MOM.WW.GalaxyRepository.Grpc";
import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto";
// Wire-compatibility policy (ProtobufStyleGuide): this contract evolves
// additively only. Never renumber or repurpose an existing field number or
// enum value. When a field or enum value is removed, add a `reserved` range
// (and `reserved` name) covering it in the same change so a future editor
// cannot accidentally reuse the retired tag. There are no `reserved`
// declarations today because no field or enum value has ever been removed.
// Read-only browse over the AVEVA System Platform Galaxy Repository (ZB SQL
// database). Lets clients enumerate the deployed object hierarchy and each
// object's dynamic attributes so they know what tag references to subscribe
// to via the MxAccessGateway service.
service GalaxyRepository {
rpc TestConnection(TestConnectionRequest) returns (TestConnectionReply);
rpc GetLastDeployTime(GetLastDeployTimeRequest) returns (GetLastDeployTimeReply);
rpc DiscoverHierarchy(DiscoverHierarchyRequest) returns (DiscoverHierarchyReply);
// Server-stream of deploy events. The server emits the current state immediately
// on subscribe (so clients can bootstrap their cache without waiting for the next
// deploy), then emits one event each time the gateway's hierarchy cache observes
// a new galaxy.time_of_last_deploy. The sequence field is monotonically
// increasing per server start; gaps indicate the per-subscriber buffer dropped
// older events because the client was too slow.
rpc WatchDeployEvents(WatchDeployEventsRequest) returns (stream DeployEvent);
// Returns the direct children of a parent object (or the root objects when
// `parent` is unset). Designed for OPC UA-style lazy expand: clients walk
// one level at a time instead of paging the full hierarchy. Filters mirror
// DiscoverHierarchy exactly. Backed by the same shared hierarchy cache.
rpc BrowseChildren(BrowseChildrenRequest) returns (BrowseChildrenReply);
}
message TestConnectionRequest {}
message TestConnectionReply {
bool ok = 1;
}
message GetLastDeployTimeRequest {}
message GetLastDeployTimeReply {
bool present = 1;
google.protobuf.Timestamp time_of_last_deploy = 2;
}
message DiscoverHierarchyRequest {
// Maximum number of objects to return. The server applies its default when
// unset (zero), caps oversized values, and rejects negative values.
int32 page_size = 1;
// Opaque token returned by a previous DiscoverHierarchy response.
string page_token = 2;
// Optional. When set, return only this object and its descendants.
// Empty = full hierarchy.
oneof root {
int32 root_gobject_id = 3;
string root_tag_name = 4;
string root_contained_path = 5;
}
// Optional. Cap on descendant depth from root. Zero returns only the root.
// Unset means unlimited depth.
google.protobuf.Int32Value max_depth = 6;
// Optional object category id filters.
repeated int32 category_ids = 7;
// Optional case-insensitive substring filters against template names.
repeated string template_chain_contains = 8;
// Optional anchored, case-insensitive glob over object tag_name.
string tag_name_glob = 9;
// Optional. Unset or true includes attributes. False returns object skeletons.
optional bool include_attributes = 10;
// Optional. Return only objects with at least one alarm-bearing attribute.
bool alarm_bearing_only = 11;
// Optional. Return only objects with at least one historized attribute.
bool historized_only = 12;
}
message DiscoverHierarchyReply {
repeated GalaxyObject objects = 1;
// Non-empty when another page is available.
string next_page_token = 2;
// Total number of objects in the cached hierarchy at the time of the call.
int32 total_object_count = 3;
}
message WatchDeployEventsRequest {
// Optional. When set, the bootstrap event is suppressed if the cached deploy
// time matches this value. Future events are still emitted normally.
google.protobuf.Timestamp last_seen_deploy_time = 1;
}
message DeployEvent {
// Monotonically increasing per server start. Gaps indicate dropped events.
uint64 sequence = 1;
// Server wall-clock when the cache observed the deploy.
google.protobuf.Timestamp observed_at = 2;
// Galaxy.time_of_last_deploy. Absent only when the Galaxy table reports null.
google.protobuf.Timestamp time_of_last_deploy = 3;
bool time_of_last_deploy_present = 4;
int32 object_count = 5;
int32 attribute_count = 6;
}
message GalaxyObject {
int32 gobject_id = 1;
string tag_name = 2;
string contained_name = 3;
string browse_name = 4;
int32 parent_gobject_id = 5;
bool is_area = 6;
int32 category_id = 7;
int32 hosted_by_gobject_id = 8;
repeated string template_chain = 9;
repeated GalaxyAttribute attributes = 10;
}
message GalaxyAttribute {
string attribute_name = 1;
string full_tag_reference = 2;
// Raw Galaxy SQL `dbo.data_type` identifier, passed through unchanged.
// This is NOT a member of `mxaccess_gateway.v1.MxDataType` — Galaxy's
// type enumeration is distinct from MXAccess's wire data-type enum and
// the two must not be cast or compared. The GalaxyRepository service is
// metadata-only and deliberately does not share types with
// mxaccess_gateway.proto. See docs/GalaxyRepository.md.
int32 mx_data_type = 3;
// Human-readable name from Galaxy's `dbo.data_type` table (e.g. "Float",
// "Integer", "Boolean"). Free-form Galaxy text; not a stable enum.
string data_type_name = 4;
bool is_array = 5;
int32 array_dimension = 6;
bool array_dimension_present = 7;
// Raw Galaxy SQL attribute-category identifier, passed through unchanged.
// Galaxy-specific; not mapped to any gateway enum. See
// docs/GalaxyRepository.md.
int32 mx_attribute_category = 8;
// Raw Galaxy SQL security-classification identifier, passed through
// unchanged. Galaxy-specific; not mapped to any gateway enum. See
// docs/GalaxyRepository.md.
int32 security_classification = 9;
bool is_historized = 10;
bool is_alarm = 11;
}
message BrowseChildrenRequest {
// Parent selector. Empty oneof returns root objects (parent_gobject_id == 0).
oneof parent {
int32 parent_gobject_id = 1;
string parent_tag_name = 2;
string parent_contained_path = 3;
}
// Maximum number of direct children to return. Server default 500; cap 5000.
int32 page_size = 4;
// Opaque token returned by a previous BrowseChildren response. Bound to the
// cache sequence, parent selector, and the filter set; a mismatch returns
// InvalidArgument.
string page_token = 5;
// --- Filter parity with DiscoverHierarchy. AND-combined. ---
repeated int32 category_ids = 6;
repeated string template_chain_contains = 7;
string tag_name_glob = 8;
optional bool include_attributes = 9;
bool alarm_bearing_only = 10;
bool historized_only = 11;
}
message BrowseChildrenReply {
// Direct children matching the filter, sorted areas-first then by
// case-insensitive display name (same order as the dashboard tree).
repeated GalaxyObject children = 1;
// Non-empty when another page of siblings is available.
string next_page_token = 2;
// Total matching direct children of the parent (post-filter).
int32 total_child_count = 3;
// Parallel array, indexed with `children`. True when the child has at least
// one matching descendant under the same filter set. Lets a UI choose
// whether to draw an expand triangle without an extra round trip.
repeated bool child_has_children = 4;
// Cache sequence this reply was projected from. Clients may pass it back as
// part of the page_token contract. Mismatch on the next page -> InvalidArgument.
uint64 cache_sequence = 5;
}
@@ -0,0 +1,30 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.GalaxyRepository</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Read-only Galaxy object-hierarchy browse library for the ZB.MOM.WW SCADA family. Provides a SQL provider for the Galaxy Repository database and a canonical gRPC service for exposing the hierarchy to modern .NET 10 clients — extracted from MxAccessGateway so any consumer can browse the Galaxy without loading 32-bit COM.</Description>
<PackageTags>galaxy;repository;browse;aveva;wonderware;system-platform;scada;grpc;sql;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-galaxyrepository</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-galaxyrepository</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Data.SqlClient" />
<PackageReference Include="Grpc.AspNetCore" />
<PackageReference Include="Google.Protobuf" />
<PackageReference Include="Microsoft.Extensions.Hosting.Abstractions" />
<PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions" />
<PackageReference Include="Grpc.Tools">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
</ItemGroup>
<!-- Generates server-side gRPC stubs for every contract under Protos\. -->
<ItemGroup>
<Protobuf Include="Protos\*.proto" GrpcServices="Server" />
</ItemGroup>
</Project>
@@ -0,0 +1,134 @@
using System.Runtime.CompilerServices;
using ZB.MOM.WW.GalaxyRepository;
namespace ZB.MOM.WW.GalaxyRepository.Tests;
/// <summary>
/// In-memory <see cref="IGalaxyRepository"/> returning canned rowsets. Counts the heavy
/// hierarchy/attribute reads so tests can assert deploy-gated skips, and can be flipped to
/// throw so the failure path is exercisable.
/// </summary>
internal sealed class FakeGalaxyRepository : IGalaxyRepository
{
private readonly IReadOnlyList<GalaxyHierarchyRow> _hierarchy;
private readonly IReadOnlyList<GalaxyAttributeRow> _attributes;
public FakeGalaxyRepository(
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes,
DateTime? deployTime)
{
_hierarchy = hierarchy;
_attributes = attributes;
DeployTime = deployTime;
}
/// <summary>The deploy time returned by <see cref="GetLastDeployTimeAsync"/>; mutate to simulate a redeploy.</summary>
public DateTime? DeployTime { get; set; }
/// <summary>When set, every query throws this exception (simulates an unreachable database).</summary>
public Exception? ThrowOnQuery { get; set; }
public int HierarchyReadCount { get; private set; }
public int AttributeReadCount { get; private set; }
public Task<bool> TestConnectionAsync(CancellationToken ct = default) =>
ThrowOnQuery is null ? Task.FromResult(true) : throw ThrowOnQuery;
public Task<DateTime?> GetLastDeployTimeAsync(CancellationToken ct = default)
{
if (ThrowOnQuery is not null)
{
throw ThrowOnQuery;
}
return Task.FromResult(DeployTime);
}
public Task<List<GalaxyHierarchyRow>> GetHierarchyAsync(CancellationToken ct = default)
{
if (ThrowOnQuery is not null)
{
throw ThrowOnQuery;
}
HierarchyReadCount++;
return Task.FromResult(_hierarchy.ToList());
}
public Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default)
{
if (ThrowOnQuery is not null)
{
throw ThrowOnQuery;
}
AttributeReadCount++;
return Task.FromResult(_attributes.ToList());
}
}
/// <summary>Records published deploy events so tests can assert publication.</summary>
internal sealed class RecordingDeployNotifier : IGalaxyDeployNotifier
{
public List<GalaxyDeployEventInfo> Published { get; } = [];
public GalaxyDeployEventInfo? Latest { get; private set; }
public void Publish(GalaxyDeployEventInfo info)
{
Published.Add(info);
Latest = info;
}
public async IAsyncEnumerable<GalaxyDeployEventInfo> SubscribeAsync(
[EnumeratorCancellation] CancellationToken cancellationToken)
{
if (Latest is { } latest)
{
yield return latest;
}
await Task.CompletedTask.ConfigureAwait(false);
}
}
/// <summary>
/// In-memory <see cref="IGalaxyHierarchySnapshotStore"/>. Pre-seed <see cref="Snapshot"/>
/// to exercise the restore path; read <see cref="Snapshot"/> and <see cref="SaveCount"/> back
/// after a save to assert persistence.
/// </summary>
internal sealed class FakeSnapshotStore : IGalaxyHierarchySnapshotStore
{
public GalaxyHierarchySnapshot? Snapshot { get; set; }
public int SaveCount { get; private set; }
public int LoadCount { get; private set; }
public Task SaveAsync(GalaxyHierarchySnapshot snapshot, CancellationToken cancellationToken)
{
SaveCount++;
Snapshot = snapshot;
return Task.CompletedTask;
}
public Task<GalaxyHierarchySnapshot?> TryLoadAsync(CancellationToken cancellationToken)
{
LoadCount++;
return Task.FromResult(Snapshot);
}
}
/// <summary>
/// A <see cref="TimeProvider"/> whose UTC clock is fixed (and advanceable) so the cache's
/// staleness projection (which fires after a 5-minute threshold) is deterministic.
/// </summary>
internal sealed class StubTimeProvider(DateTimeOffset start) : TimeProvider
{
private DateTimeOffset _now = start;
public override DateTimeOffset GetUtcNow() => _now;
public void Advance(TimeSpan delta) => _now += delta;
}
@@ -0,0 +1,236 @@
using ZB.MOM.WW.GalaxyRepository;
namespace ZB.MOM.WW.GalaxyRepository.Tests;
/// <summary>
/// Tests for <see cref="GalaxyHierarchyCache"/> first-load, deploy-gating, snapshot
/// restore, persistence, and status-transition behavior. Uses an in-memory
/// <see cref="IGalaxyRepository"/> and snapshot store plus a fixed
/// <see cref="StubTimeProvider"/> so no SQL is touched and no asserts are time-sensitive.
/// </summary>
public sealed class GalaxyHierarchyCacheTests
{
private static readonly DateTimeOffset FixedNow = new(2026, 1, 1, 12, 0, 0, TimeSpan.Zero);
private static readonly DateTime DeployTime = new(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
private static List<GalaxyHierarchyRow> SampleHierarchy() =>
[
new() { GobjectId = 1, TagName = "Area1", ContainedName = "Area1", BrowseName = "Area1", IsArea = true },
new() { GobjectId = 2, TagName = "Pump01", ContainedName = "Pump01", BrowseName = "Pump01", ParentGobjectId = 1 },
];
private static List<GalaxyAttributeRow> SampleAttributes() =>
[
new() { GobjectId = 2, AttributeName = "PV", FullTagReference = "Pump01.PV", IsHistorized = true, IsAlarm = true },
];
[Fact]
public async Task RefreshAsync_FirstLoad_PopulatesCurrentWithDataAndUnblocksWaitForFirstLoad()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
RecordingDeployNotifier notifier = new();
using GalaxyHierarchyCache cache = new(repository, notifier, new StubTimeProvider(FixedNow));
// Before refresh, the gate is unset and there is no data.
Assert.False(cache.Current.HasData);
Assert.Equal(GalaxyCacheStatus.Unknown, cache.Current.Status);
await cache.RefreshAsync(CancellationToken.None);
// First load completes (does not hang) and Current now holds usable data.
using CancellationTokenSource firstLoadTimeout = new(TimeSpan.FromSeconds(5));
await cache.WaitForFirstLoadAsync(firstLoadTimeout.Token);
GalaxyHierarchyCacheEntry current = cache.Current;
Assert.True(current.HasData);
Assert.Equal(GalaxyCacheStatus.Healthy, current.Status);
Assert.Equal(2, current.ObjectCount);
Assert.Equal(1, current.AreaCount);
Assert.Equal(1, current.AttributeCount);
Assert.Equal(1, current.HistorizedAttributeCount);
Assert.Equal(1, current.AlarmAttributeCount);
// The heavy queries ran exactly once and a deploy event was published.
Assert.Equal(1, repository.HierarchyReadCount);
Assert.Equal(1, repository.AttributeReadCount);
GalaxyDeployEventInfo published = Assert.Single(notifier.Published);
Assert.Equal(2, published.ObjectCount);
Assert.Equal(1, published.AttributeCount);
}
[Fact]
public async Task RefreshAsync_NoDeployChange_SkipsHeavyQueriesOnSecondRefresh()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
using GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow));
await cache.RefreshAsync(CancellationToken.None);
await cache.RefreshAsync(CancellationToken.None);
// Deploy time unchanged => the heavy hierarchy/attribute reads happened only once.
Assert.Equal(1, repository.HierarchyReadCount);
Assert.Equal(1, repository.AttributeReadCount);
Assert.True(cache.Current.HasData);
Assert.Equal(GalaxyCacheStatus.Healthy, cache.Current.Status);
}
[Fact]
public async Task RefreshAsync_DeployAdvances_RebuildsAndBumpsSequence()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
RecordingDeployNotifier notifier = new();
using GalaxyHierarchyCache cache = new(repository, notifier, new StubTimeProvider(FixedNow));
await cache.RefreshAsync(CancellationToken.None);
long firstSequence = cache.Current.Sequence;
repository.DeployTime = DeployTime.AddHours(1);
await cache.RefreshAsync(CancellationToken.None);
Assert.Equal(2, repository.HierarchyReadCount);
Assert.Equal(firstSequence + 1, cache.Current.Sequence);
Assert.Equal(2, notifier.Published.Count);
}
[Fact]
public async Task RefreshAsync_FirstQueryFailsNoPriorData_StatusUnavailableButFirstLoadStillCompletes()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime)
{
ThrowOnQuery = new TimeoutException("galaxy db unreachable"),
};
using GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow));
await cache.RefreshAsync(CancellationToken.None);
// First load must complete so callers do not hang, even though the query failed.
using CancellationTokenSource firstLoadTimeout = new(TimeSpan.FromSeconds(5));
await cache.WaitForFirstLoadAsync(firstLoadTimeout.Token);
Assert.False(cache.Current.HasData);
Assert.Equal(GalaxyCacheStatus.Unavailable, cache.Current.Status);
Assert.Contains("unreachable", cache.Current.LastError);
}
[Fact]
public async Task RefreshAsync_QueryFailsAfterPriorData_DegradesToStaleAndKeepsData()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
using GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow));
await cache.RefreshAsync(CancellationToken.None);
Assert.True(cache.Current.HasData);
// A later refresh fails: data is retained but flagged Stale.
repository.DeployTime = DeployTime.AddHours(1);
repository.ThrowOnQuery = new InvalidOperationException("transient");
await cache.RefreshAsync(CancellationToken.None);
Assert.True(cache.Current.HasData);
Assert.Equal(GalaxyCacheStatus.Stale, cache.Current.Status);
Assert.Equal(2, cache.Current.ObjectCount);
}
[Fact]
public async Task Current_AfterStalenessThreshold_ProjectsHealthyToStale()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
StubTimeProvider clock = new(FixedNow);
using GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier(), clock);
await cache.RefreshAsync(CancellationToken.None);
Assert.Equal(GalaxyCacheStatus.Healthy, cache.Current.Status);
// Advance past the 5-minute staleness threshold with no successful refresh.
clock.Advance(TimeSpan.FromMinutes(6));
Assert.Equal(GalaxyCacheStatus.Stale, cache.Current.Status);
// Data is still present — Stale means "old", not "gone".
Assert.True(cache.Current.HasData);
}
[Fact]
public async Task RefreshAsync_PersistsSnapshotAfterSuccessfulHeavyRefresh()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
FakeSnapshotStore store = new();
using GalaxyHierarchyCache cache = new(
repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow), logger: null, snapshotStore: store);
await cache.RefreshAsync(CancellationToken.None);
Assert.Equal(1, store.SaveCount);
Assert.NotNull(store.Snapshot);
Assert.Equal(2, store.Snapshot!.Hierarchy.Count);
Assert.Single(store.Snapshot.Attributes);
}
[Fact]
public async Task RefreshAsync_SnapshotRestore_ServesLastKnownDataAsStaleWhenDatabaseUnreachable()
{
// The snapshot store already holds a persisted dataset (last-known browse data).
FakeSnapshotStore store = new()
{
Snapshot = new GalaxyHierarchySnapshot(
LastDeployTime: DeployTime,
SavedAt: FixedNow.AddMinutes(-1),
Hierarchy: SampleHierarchy(),
Attributes: SampleAttributes()),
};
// The Galaxy database is unreachable on this cold start.
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime)
{
ThrowOnQuery = new TimeoutException("cold start, db down"),
};
RecordingDeployNotifier notifier = new();
using GalaxyHierarchyCache cache = new(
repository, notifier, new StubTimeProvider(FixedNow), logger: null, snapshotStore: store);
await cache.RefreshAsync(CancellationToken.None);
// First load is satisfied by the restored snapshot, not by SQL.
using CancellationTokenSource firstLoadTimeout = new(TimeSpan.FromSeconds(5));
await cache.WaitForFirstLoadAsync(firstLoadTimeout.Token);
Assert.Equal(1, store.LoadCount);
GalaxyHierarchyCacheEntry current = cache.Current;
Assert.True(current.HasData);
// Restored data is "last-known", surfaced as Stale until the live DB confirms.
Assert.Equal(GalaxyCacheStatus.Stale, current.Status);
Assert.Equal(2, current.ObjectCount);
Assert.Equal(DeployTime, current.LastDeployTime!.Value.UtcDateTime);
// A deploy event was published for the restored data.
Assert.Single(notifier.Published);
}
[Fact]
public async Task RefreshAsync_SnapshotRestoreThenLiveQuery_PromotesRestoredDataToHealthy()
{
FakeSnapshotStore store = new()
{
Snapshot = new GalaxyHierarchySnapshot(
LastDeployTime: DeployTime,
SavedAt: FixedNow.AddMinutes(-1),
Hierarchy: SampleHierarchy(),
Attributes: SampleAttributes()),
};
// DB is reachable and reports the SAME deploy time the snapshot was pulled at.
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
using GalaxyHierarchyCache cache = new(
repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow), logger: null, snapshotStore: store);
await cache.RefreshAsync(CancellationToken.None);
// Restore seeds Stale data; the same-deploy live query promotes it to Healthy
// without re-running the heavy hierarchy/attribute reads.
Assert.Equal(GalaxyCacheStatus.Healthy, cache.Current.Status);
Assert.Equal(0, repository.HierarchyReadCount);
Assert.True(cache.Current.HasData);
}
[Fact]
public void Dispose_CanBeCalledWithoutHavingRefreshed()
{
FakeGalaxyRepository repository = new(SampleHierarchy(), SampleAttributes(), DeployTime);
GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier(), new StubTimeProvider(FixedNow));
// Dispose must be safe even when no refresh ever ran (semaphore never entered).
cache.Dispose();
}
}
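The cache tests above pin down a deploy-gated refresh: a cheap deploy-time probe decides whether the heavy hierarchy and attribute reads run, and a failure degrades to Stale (prior data kept) or Unavailable (no prior data). The sketch below is one way such gating could look, under stated assumptions. `DeployGatedCache`, `GateStatus`, and the delegate parameters are hypothetical, and the real `GalaxyHierarchyCache` may order its steps differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical status enum for this sketch; not the library's GalaxyCacheStatus.
public enum GateStatus { Unknown, Healthy, Stale, Unavailable }

public sealed class DeployGatedCache
{
    private readonly Func<DateTime> _readDeployTime;          // cheap probe
    private readonly Func<IReadOnlyList<string>> _readHeavy;  // expensive load
    private DateTime? _lastDeploy;

    public DeployGatedCache(Func<DateTime> readDeployTime, Func<IReadOnlyList<string>> readHeavy)
    {
        _readDeployTime = readDeployTime;
        _readHeavy = readHeavy;
    }

    public IReadOnlyList<string>? Data { get; private set; }
    public GateStatus Status { get; private set; } = GateStatus.Unknown;
    public long Sequence { get; private set; }

    public void Refresh()
    {
        try
        {
            DateTime deploy = _readDeployTime();

            // Only pay for the heavy read on first load or when the deploy time moved.
            if (Data is null || deploy != _lastDeploy)
            {
                Data = _readHeavy();
                _lastDeploy = deploy;
                Sequence++;
            }

            Status = GateStatus.Healthy;
        }
        catch (Exception)
        {
            // Keep whatever data we already had; only the status degrades.
            Status = Data is null ? GateStatus.Unavailable : GateStatus.Stale;
        }
    }
}
```

Keeping the old data on failure is what lets a browse client keep working through a transient database outage.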
@@ -0,0 +1,458 @@
using Grpc.Core;
using ZB.MOM.WW.GalaxyRepository;
using ZB.MOM.WW.GalaxyRepository.Grpc;
namespace ZB.MOM.WW.GalaxyRepository.Tests;
/// <summary>
/// Pure-logic tests for <see cref="GalaxyHierarchyProjector"/> and
/// <see cref="GalaxyBrowseProjector"/>. No SQL: the cache entry under test is built
/// from a small hand-made hierarchy through the same materialization the live cache
/// uses (a fake <see cref="IGalaxyRepository"/> driven through
/// <see cref="GalaxyHierarchyCache.RefreshAsync"/>), so the projectors are exercised
/// against a real <see cref="GalaxyHierarchyIndex"/>.
/// </summary>
public sealed class GalaxyHierarchyProjectorTests
{
/// <summary>
/// Builds a realistic cache entry by driving a fake repository through the cache's
/// own refresh path. This goes through <c>BuildEntry</c> + <see cref="GalaxyHierarchyIndex.Build"/>
/// exactly as production does, rather than reaching for an internal factory.
/// </summary>
private static GalaxyHierarchyCacheEntry BuildEntry(
IReadOnlyList<GalaxyHierarchyRow> hierarchy,
IReadOnlyList<GalaxyAttributeRow> attributes)
{
FakeGalaxyRepository repository = new(hierarchy, attributes, deployTime: new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc));
using GalaxyHierarchyCache cache = new(repository, new RecordingDeployNotifier());
cache.RefreshAsync(CancellationToken.None).GetAwaiter().GetResult();
GalaxyHierarchyCacheEntry entry = cache.Current;
Assert.True(entry.HasData);
return entry;
}
// A small but representative galaxy:
// PlantArea (area, id 1)
// ├─ LineA (area, id 2)
// │ ├─ Pump01 (id 10, template "Pump", historized+alarm attr)
// │ └─ Valve01 (id 11, template "Valve", plain attr)
// └─ Mixer01 (id 12, template "Mixer", alarm attr only)
// StandaloneTank (id 20, no parent — a root object)
private static GalaxyHierarchyCacheEntry BuildSampleEntry()
{
List<GalaxyHierarchyRow> hierarchy =
[
Hierarchy(1, "PlantArea", parent: 0, isArea: true, category: 100),
Hierarchy(2, "LineA", parent: 1, isArea: true, category: 100),
Hierarchy(10, "Pump01", parent: 2, category: 200, templates: ["$Pump", "$UserDefined"]),
Hierarchy(11, "Valve01", parent: 2, category: 201, templates: ["$Valve"]),
Hierarchy(12, "Mixer01", parent: 1, category: 202, templates: ["$Mixer"]),
Hierarchy(20, "StandaloneTank", parent: 0, category: 203, templates: ["$Tank"]),
];
List<GalaxyAttributeRow> attributes =
[
// Pump01: historized AND alarm-bearing.
Attribute(10, "Pump01.PV", historized: true, alarm: true),
Attribute(10, "Pump01.SP", historized: false, alarm: false),
// Valve01: plain.
Attribute(11, "Valve01.Cmd", historized: false, alarm: false),
// Mixer01: alarm only.
Attribute(12, "Mixer01.Fault", historized: false, alarm: true),
// StandaloneTank: historized only.
Attribute(20, "StandaloneTank.Level", historized: true, alarm: false),
];
return BuildEntry(hierarchy, attributes);
}
private static GalaxyHierarchyRow Hierarchy(
int id,
string tagName,
int parent,
bool isArea = false,
int category = 0,
IReadOnlyList<string>? templates = null) => new()
{
GobjectId = id,
TagName = tagName,
ContainedName = tagName,
BrowseName = tagName,
ParentGobjectId = parent,
IsArea = isArea,
CategoryId = category,
TemplateChain = templates ?? Array.Empty<string>(),
};
private static GalaxyAttributeRow Attribute(
int gobjectId,
string fullTagReference,
bool historized,
bool alarm) => new()
{
GobjectId = gobjectId,
AttributeName = fullTagReference.Split('.')[^1],
FullTagReference = fullTagReference,
IsHistorized = historized,
IsAlarm = alarm,
};
[Fact]
public void Project_NoFilters_ReturnsEveryObject()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest());
Assert.Equal(6, result.TotalObjectCount);
Assert.Equal(6, result.Objects.Count);
}
[Fact]
public void Project_PageSizeAndOffset_SlicesTheOrderedResult()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new();
GalaxyHierarchyQueryResult full = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: int.MaxValue);
GalaxyHierarchyQueryResult page1 = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 2);
GalaxyHierarchyQueryResult page2 = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 2, pageSize: 2);
GalaxyHierarchyQueryResult page3 = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 4, pageSize: 2);
// Total is unaffected by paging.
Assert.Equal(6, page1.TotalObjectCount);
Assert.Equal(2, page1.Objects.Count);
Assert.Equal(2, page2.Objects.Count);
Assert.Equal(2, page3.Objects.Count);
// The three pages reconstruct the full ordered result with no gaps/dupes.
List<int> paged =
[
.. page1.Objects.Select(o => o.GobjectId),
.. page2.Objects.Select(o => o.GobjectId),
.. page3.Objects.Select(o => o.GobjectId),
];
Assert.Equal(full.Objects.Select(o => o.GobjectId), paged);
}
[Fact]
public void Project_OffsetPastEnd_ReturnsEmptyPageButRealTotal()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(
entry, new DiscoverHierarchyRequest(), browseSubtreeGlobs: null, offset: 999, pageSize: 10);
Assert.Empty(result.Objects);
Assert.Equal(6, result.TotalObjectCount);
}
[Fact]
public void Project_PageSignature_IsStableAcrossPagesAndMatchesComputeFilterSignature()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { TagNameGlob = "Pump*" };
string expected = GalaxyHierarchyProjector.ComputeFilterSignature(request, browseSubtreeGlobs: null);
GalaxyHierarchyQueryResult page1 = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 1);
GalaxyHierarchyQueryResult page2 = GalaxyHierarchyProjector.Project(entry, request, browseSubtreeGlobs: null, offset: 1, pageSize: 1);
// The signature a caller computes to mint a page token round-trips: the projector
// reports the same signature on every page of the same filter set.
Assert.Equal(expected, page1.FilterSignature);
Assert.Equal(expected, page2.FilterSignature);
}
[Fact]
public void ComputeFilterSignature_DiffersWhenAnyFilterChanges()
{
DiscoverHierarchyRequest baseRequest = new() { TagNameGlob = "Pump*" };
DiscoverHierarchyRequest differentGlob = new() { TagNameGlob = "Valve*" };
DiscoverHierarchyRequest differentAlarm = new() { TagNameGlob = "Pump*", AlarmBearingOnly = true };
string baseSig = GalaxyHierarchyProjector.ComputeFilterSignature(baseRequest, null);
Assert.NotEqual(baseSig, GalaxyHierarchyProjector.ComputeFilterSignature(differentGlob, null));
Assert.NotEqual(baseSig, GalaxyHierarchyProjector.ComputeFilterSignature(differentAlarm, null));
Assert.NotEqual(baseSig, GalaxyHierarchyProjector.ComputeFilterSignature(baseRequest, browseSubtreeGlobs: ["PlantArea/*"]));
// Same inputs => same signature (deterministic).
Assert.Equal(baseSig, GalaxyHierarchyProjector.ComputeFilterSignature(new DiscoverHierarchyRequest { TagNameGlob = "Pump*" }, null));
}
[Fact]
public void Project_MaxDepthZero_FromRoot_ReturnsOnlyTheRoot()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { RootGobjectId = 1, MaxDepth = 0 };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
GalaxyObject only = Assert.Single(result.Objects);
Assert.Equal(1, only.GobjectId);
}
[Fact]
public void Project_MaxDepthOne_FromRoot_ReturnsRootAndDirectChildrenOnly()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
// PlantArea(1) depth 0; LineA(2) and Mixer01(12) depth 1; Pump01/Valve01 depth 2.
DiscoverHierarchyRequest request = new() { RootGobjectId = 1, MaxDepth = 1 };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
Assert.Equal([1, 2, 12], result.Objects.Select(o => o.GobjectId).OrderBy(id => id));
}
[Fact]
public void Project_NegativeMaxDepth_ThrowsInvalidArgument()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { MaxDepth = -1 };
RpcException ex = Assert.Throws<RpcException>(() => GalaxyHierarchyProjector.Project(entry, request));
Assert.Equal(StatusCode.InvalidArgument, ex.StatusCode);
}
[Fact]
public void Project_UnknownRoot_ThrowsNotFound()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { RootGobjectId = 99999 };
RpcException ex = Assert.Throws<RpcException>(() => GalaxyHierarchyProjector.Project(entry, request));
Assert.Equal(StatusCode.NotFound, ex.StatusCode);
}
[Fact]
public void Project_HistorizedOnly_ReturnsOnlyObjectsWithAHistorizedAttribute()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { HistorizedOnly = true };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
// Pump01(10) and StandaloneTank(20) carry historized attributes.
Assert.Equal([10, 20], result.Objects.Select(o => o.GobjectId).OrderBy(id => id));
}
[Fact]
public void Project_AlarmBearingOnly_ReturnsOnlyObjectsWithAnAlarmAttribute()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { AlarmBearingOnly = true };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
// Pump01(10) and Mixer01(12) carry alarm attributes.
Assert.Equal([10, 12], result.Objects.Select(o => o.GobjectId).OrderBy(id => id));
}
[Fact]
public void Project_AlarmAndHistorizedTogether_RequiresBoth()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { AlarmBearingOnly = true, HistorizedOnly = true };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
// Only Pump01(10) carries an attribute set that is both historized and alarm-bearing.
GalaxyObject only = Assert.Single(result.Objects);
Assert.Equal(10, only.GobjectId);
}
[Fact]
public void Project_TagNameGlob_MatchesAnchoredCaseInsensitive()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
GalaxyHierarchyQueryResult prefix = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "Pump*" });
Assert.Equal([10], prefix.Objects.Select(o => o.GobjectId));
// Case-insensitive.
GalaxyHierarchyQueryResult lower = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "pump01" });
Assert.Equal([10], lower.Objects.Select(o => o.GobjectId));
// '?' single-char wildcard: "Pump0?" matches "Pump01".
GalaxyHierarchyQueryResult single = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "Pump0?" });
Assert.Equal([10], single.Objects.Select(o => o.GobjectId));
// Anchored: a bare substring that is not a prefix matches nothing.
GalaxyHierarchyQueryResult anchored = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "ump01" });
Assert.Empty(anchored.Objects);
}
[Fact]
public void Project_CategoryIds_FilterByObjectCategory()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { CategoryIds = { 200, 201 } };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
// category 200 = Pump01(10), category 201 = Valve01(11).
Assert.Equal([10, 11], result.Objects.Select(o => o.GobjectId).OrderBy(id => id));
}
[Fact]
public void Project_TemplateChainContains_IsSubstringAndCaseInsensitive()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { TemplateChainContains = { "pump" } };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
GalaxyObject only = Assert.Single(result.Objects);
Assert.Equal(10, only.GobjectId);
}
[Fact]
public void Project_IncludeAttributesDefault_CarriesAttributes()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { TagNameGlob = "Pump*" };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
GalaxyObject pump = Assert.Single(result.Objects);
Assert.Equal(2, pump.Attributes.Count);
}
[Fact]
public void Project_IncludeAttributesFalse_ReturnsSkeletons()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
DiscoverHierarchyRequest request = new() { TagNameGlob = "Pump*", IncludeAttributes = false };
GalaxyHierarchyQueryResult result = GalaxyHierarchyProjector.Project(entry, request);
GalaxyObject pump = Assert.Single(result.Objects);
Assert.Empty(pump.Attributes);
}
[Fact]
public void Project_IncludeAttributesFalse_DoesNotMutateTheCachedEntry()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
// Project with attributes stripped, then again with attributes included.
GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "Pump*", IncludeAttributes = false });
GalaxyHierarchyQueryResult included = GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest { TagNameGlob = "Pump*" });
// The earlier strip cloned the object — the cached entry still holds the attributes.
GalaxyObject pump = Assert.Single(included.Objects);
Assert.Equal(2, pump.Attributes.Count);
}
[Fact]
public void Project_InvalidOffsetOrPageSize_Throws()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
Assert.Throws<ArgumentOutOfRangeException>(() =>
GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest(), null, offset: -1, pageSize: 10));
Assert.Throws<ArgumentOutOfRangeException>(() =>
GalaxyHierarchyProjector.Project(entry, new DiscoverHierarchyRequest(), null, offset: 0, pageSize: 0));
}
// ---- GalaxyBrowseProjector ----
[Fact]
public void ProjectChildren_OfPlantArea_ReturnsDirectChildrenAreasFirst()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
BrowseChildrenRequest request = new() { ParentGobjectId = 1 };
GalaxyBrowseChildrenResult result = GalaxyBrowseProjector.ProjectChildren(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 100);
// Direct children of PlantArea(1) are LineA(2, area) and Mixer01(12, non-area);
// areas sort first.
Assert.Equal([2, 12], result.Children.Select(c => c.GobjectId));
Assert.Equal(2, result.TotalChildCount);
}
[Fact]
public void ProjectChildren_ChildHasChildrenFlag_ReflectsPresenceOfChildren()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
BrowseChildrenRequest request = new() { ParentGobjectId = 1 };
GalaxyBrowseChildrenResult result = GalaxyBrowseProjector.ProjectChildren(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 100);
Dictionary<int, bool> hasChildren = result.Children
.Select((child, index) => (child.GobjectId, result.ChildHasChildren[index]))
.ToDictionary(t => t.GobjectId, t => t.Item2);
// LineA(2) contains Pump01/Valve01 -> true; Mixer01(12) is a leaf -> false.
Assert.True(hasChildren[2]);
Assert.False(hasChildren[12]);
}
[Fact]
public void ProjectChildren_OfRoot_ReturnsTopLevelObjects()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
// Empty parent oneof => roots (parent id 0).
BrowseChildrenRequest request = new();
GalaxyBrowseChildrenResult result = GalaxyBrowseProjector.ProjectChildren(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 100);
// Roots: PlantArea(1, area) and StandaloneTank(20, non-area); areas first.
Assert.Equal([1, 20], result.Children.Select(c => c.GobjectId));
}
[Fact]
public void ProjectChildren_FilterMatchingDescendant_SurfacesNonMatchingAncestor()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
// Pump01 lives two levels under PlantArea. Browsing PlantArea's children with a
// Pump glob should still surface LineA (which itself does not match) because it
// contains a matching descendant.
BrowseChildrenRequest request = new() { ParentGobjectId = 1, TagNameGlob = "Pump*" };
GalaxyBrowseChildrenResult result = GalaxyBrowseProjector.ProjectChildren(entry, request, browseSubtreeGlobs: null, offset: 0, pageSize: 100);
GalaxyObject surfaced = Assert.Single(result.Children);
Assert.Equal(2, surfaced.GobjectId);
Assert.True(result.ChildHasChildren[0]);
}
[Fact]
public void ProjectChildren_UnknownParent_ThrowsNotFound()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
BrowseChildrenRequest request = new() { ParentGobjectId = 99999 };
RpcException ex = Assert.Throws<RpcException>(() =>
GalaxyBrowseProjector.ProjectChildren(entry, request, null, 0, 100));
Assert.Equal(StatusCode.NotFound, ex.StatusCode);
}
[Fact]
public void ProjectChildren_Paging_SlicesAndPreservesTotal()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
// LineA(2) has two direct children: Pump01, Valve01.
BrowseChildrenRequest request = new() { ParentGobjectId = 2 };
GalaxyBrowseChildrenResult page1 = GalaxyBrowseProjector.ProjectChildren(entry, request, null, offset: 0, pageSize: 1);
GalaxyBrowseChildrenResult page2 = GalaxyBrowseProjector.ProjectChildren(entry, request, null, offset: 1, pageSize: 1);
Assert.Equal(2, page1.TotalChildCount);
Assert.Single(page1.Children);
Assert.Single(page2.Children);
Assert.NotEqual(page1.Children[0].GobjectId, page2.Children[0].GobjectId);
// Same filter+parent => same signature on both pages.
Assert.Equal(page1.FilterSignature, page2.FilterSignature);
}
[Fact]
public void ResolveParentId_ByTagName_ResolvesToGobjectId()
{
GalaxyHierarchyCacheEntry entry = BuildSampleEntry();
BrowseChildrenRequest request = new() { ParentTagName = "LineA" };
int id = GalaxyBrowseProjector.ResolveParentId(entry, request);
Assert.Equal(2, id);
}
}
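Two behaviors the projector tests above rely on are anchored, case-insensitive glob matching (`*` and `?`) and offset/page-size slicing that reports the unpaged total. A minimal sketch of both follows, assuming a regex-based glob translation. `ProjectionSketch`, `GlobMatches`, and `Slice` are hypothetical names; the actual `GalaxyHierarchyProjector` implementation is not part of this diff and may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ProjectionSketch
{
    // Translate a glob to an anchored regex: '*' => any run of chars, '?' => exactly one char.
    public static bool GlobMatches(string glob, string value)
    {
        string pattern = "^" + Regex.Escape(glob).Replace("\\*", ".*").Replace("\\?", ".") + "$";
        return Regex.IsMatch(value, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Slice an already-ordered result; Total always reflects the unpaged count.
    public static (IReadOnlyList<T> Page, int Total) Slice<T>(IReadOnlyList<T> ordered, int offset, int pageSize)
    {
        if (offset < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }

        if (pageSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageSize));
        }

        return (ordered.Skip(offset).Take(pageSize).ToList(), ordered.Count);
    }
}
```

An offset past the end yields an empty page with the real total, which lets a caller tell "no more pages" apart from "no matches".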
@@ -0,0 +1,84 @@
using Microsoft.Extensions.Options;
using ZB.MOM.WW.GalaxyRepository;
namespace ZB.MOM.WW.GalaxyRepository.Tests;
/// <summary>
/// Round-trip tests for the real <see cref="GalaxyHierarchySnapshotStore"/> over a temp
/// file path: save then load, no-op when persistence is disabled, and clean disposal.
/// </summary>
public sealed class GalaxyHierarchySnapshotStoreTests : IDisposable
{
private readonly string _path = Path.Combine(
Path.GetTempPath(),
$"galaxyrepo-snap-{Guid.NewGuid():N}.json");
public void Dispose()
{
if (File.Exists(_path))
{
File.Delete(_path);
}
}
private static GalaxyHierarchySnapshot SampleSnapshot() => new(
LastDeployTime: new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero),
SavedAt: new DateTimeOffset(2026, 1, 1, 12, 0, 0, TimeSpan.Zero),
Hierarchy:
[
new GalaxyHierarchyRow { GobjectId = 1, TagName = "Area1", IsArea = true },
new GalaxyHierarchyRow { GobjectId = 2, TagName = "Pump01", ParentGobjectId = 1 },
],
Attributes:
[
new GalaxyAttributeRow { GobjectId = 2, AttributeName = "PV", FullTagReference = "Pump01.PV", IsHistorized = true },
]);
[Fact]
public async Task SaveThenLoad_RoundTripsTheSnapshot()
{
using GalaxyHierarchySnapshotStore store = new(
Options.Create(new GalaxyRepositoryOptions { PersistSnapshot = true, SnapshotCachePath = _path }));
await store.SaveAsync(SampleSnapshot(), CancellationToken.None);
GalaxyHierarchySnapshot? loaded = await store.TryLoadAsync(CancellationToken.None);
Assert.NotNull(loaded);
Assert.Equal(2, loaded!.Hierarchy.Count);
Assert.Single(loaded.Attributes);
Assert.Equal("Pump01.PV", loaded.Attributes[0].FullTagReference);
Assert.True(loaded.Attributes[0].IsHistorized);
Assert.Equal(SampleSnapshot().LastDeployTime, loaded.LastDeployTime);
}
[Fact]
public async Task SaveAndLoad_AreNoOps_WhenPersistenceDisabled()
{
using GalaxyHierarchySnapshotStore store = new(
Options.Create(new GalaxyRepositoryOptions { PersistSnapshot = false, SnapshotCachePath = _path }));
await store.SaveAsync(SampleSnapshot(), CancellationToken.None);
Assert.False(File.Exists(_path));
Assert.Null(await store.TryLoadAsync(CancellationToken.None));
}
[Fact]
public async Task TryLoad_ReturnsNull_WhenNoFileExists()
{
using GalaxyHierarchySnapshotStore store = new(
Options.Create(new GalaxyRepositoryOptions { PersistSnapshot = true, SnapshotCachePath = _path }));
Assert.Null(await store.TryLoadAsync(CancellationToken.None));
}
[Fact]
public async Task TryLoad_ReturnsNull_WhenFileIsNotValidJson()
{
await File.WriteAllTextAsync(_path, "{ this is not valid json");
using GalaxyHierarchySnapshotStore store = new(
Options.Create(new GalaxyRepositoryOptions { PersistSnapshot = true, SnapshotCachePath = _path }));
Assert.Null(await store.TryLoadAsync(CancellationToken.None));
}
}
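The store tests above describe a save/load round trip in which a missing or corrupt file yields `null` instead of an exception. A minimal sketch of that contract follows, assuming `System.Text.Json` as the serializer. `SnapshotSketch` and `SnapshotFileSketch` are hypothetical, and the real `GalaxyHierarchySnapshotStore` (options binding, async I/O, locking) is not shown in this diff.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical, much smaller stand-in for GalaxyHierarchySnapshot.
public sealed record SnapshotSketch(DateTimeOffset SavedAt, string[] Tags);

public static class SnapshotFileSketch
{
    public static void Save(string path, SnapshotSketch snapshot) =>
        File.WriteAllText(path, JsonSerializer.Serialize(snapshot));

    // Returns null when there is nothing usable on disk, so a cold start can fall back to SQL.
    public static SnapshotSketch? TryLoad(string path)
    {
        if (!File.Exists(path))
        {
            return null;
        }

        try
        {
            return JsonSerializer.Deserialize<SnapshotSketch>(File.ReadAllText(path));
        }
        catch (JsonException)
        {
            // A corrupt snapshot is treated the same as a missing one.
            return null;
        }
    }
}
```

Treating a corrupt file as "no snapshot" keeps a damaged cache file from blocking startup.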
@@ -0,0 +1,25 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
<!-- Test project does not ship; no XML docs required (overrides Directory.Build.props). -->
<GenerateDocumentationFile>false</GenerateDocumentationFile>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Microsoft.Data.SqlClient" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.GalaxyRepository\ZB.MOM.WW.GalaxyRepository.csproj" />
</ItemGroup>
</Project>
+13 -10
@@ -4,7 +4,7 @@ Observability libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGat
 The library normalizes the three-project observability surface: a shared OpenTelemetry Resource driven by a single identity triple (`service.name` / `site.id` / `node.role`), standard instrumentation wiring, Prometheus and OTLP export, and a Serilog bootstrap with enrichers and `TraceContextEnricher` for trace↔log correlation.
-**Built at 0.1.0. MxAccessGateway logging adopted (MEL → Serilog migration done on its own branch). OtOpcUa and ScadaBridge telemetry adoption is follow-on.** Adoption tracked in `~/Desktop/scadaproj/components/observability/GAPS.md`.
+**Built at 0.1.0, published to the Gitea NuGet feed, and adopted across all three apps on 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, behaviour-preserving). MxAccessGateway's MEL→Serilog migration + metrics export both landed in this pass — they were *not* actually done beforehand despite the earlier claim. ScadaBridge keeps its `LoggerConfigurationFactory` (min-level governance) and only adds the shared `TraceContextEnricher`; it does not call `AddZbSerilog`. Per-repo result + deferred follow-ons tracked in `~/Desktop/scadaproj/components/observability/GAPS.md`.
 ---
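The README text in this hunk mentions `TraceContextEnricher` for trace↔log correlation. As a rough illustration of what such correlation involves, the sketch below reads the ambient `System.Diagnostics.Activity` and returns the ids a log event would need. `TraceContextSketch` and the property names `trace_id` / `span_id` are assumptions made for this example; this is not the library's enricher, and it deliberately avoids any Serilog API.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class TraceContextSketch
{
    // Capture the current trace/span ids, or nothing when no Activity is in flight.
    public static IReadOnlyDictionary<string, string> Capture()
    {
        Activity? activity = Activity.Current;
        if (activity is null)
        {
            return new Dictionary<string, string>();
        }

        return new Dictionary<string, string>
        {
            // Property names are illustrative assumptions.
            ["trace_id"] = activity.TraceId.ToHexString(),
            ["span_id"] = activity.SpanId.ToHexString(),
        };
    }
}
```

A logging enricher would attach these values to each event so that a log line can be matched to the trace that produced it.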
@@ -21,12 +21,13 @@ The library normalizes the three-project observability surface: a shared OpenTel
 | Consumer | `ZB.MOM.WW.Telemetry` (core) | `ZB.MOM.WW.Telemetry.Serilog` |
 |---|:---:|:---:|
-| **OtOpcUa** | yes (after adoption) | yes (after adoption) |
-| **MxAccessGateway** | yes (after adoption) | yes (MEL→Serilog adopted now) |
-| **ScadaBridge** | yes (after adoption) | yes (after adoption) |
+| **OtOpcUa** | ✅ adopted | ✅ adopted (`AddZbSerilog`) |
+| **MxAccessGateway** | ✅ adopted (`GatewayMetrics` exported) | ✅ adopted (MEL→Serilog migrated in this pass) |
+| **ScadaBridge** | ✅ adopted (both roots) | ⚠️ referenced for `TraceContextEnricher` only — keeps `LoggerConfigurationFactory`, does **not** call `AddZbSerilog` |
-MxAccessGateway's logging adoption is the one in-pass migration. Full metrics/tracing wiring
-for all three apps is follow-on.
+All three adopted on 2026-06-01 (branch `feat/adopt-zb-telemetry` per repo). ScadaBridge's logging
+deviates: it keeps its own `LoggerConfigurationFactory` (min-level governance contract) and only
+adds the shared `TraceContextEnricher`. See `components/observability/GAPS.md` for the full result.
 ---
@@ -60,11 +61,13 @@ All test assemblies run offline:
## Status ## Status
Built at **0.1.0** and published to the Gitea NuGet feed. MxAccessGateway logging (MEL → Serilog) Built at **0.1.0**, published to the Gitea NuGet feed, and **adopted across all three apps on
adopted on its own branch. **OtOpcUa and ScadaBridge telemetry adoption not yet started** 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, behaviour-preserving). MxAccessGateway's
tracked in the component backlog: MEL→Serilog migration and metrics export both landed in this pass (not beforehand, despite the
earlier claim). Deferred follow-ons (MxGateway `ms`→`s` + Meter rename, ScadaBridge app instruments
+ Site-node HTTP/1.1 metrics listener, OTLP wiring) are tracked in the component backlog:
- `~/Desktop/scadaproj/components/observability/GAPS.md` — adoption order, effort, and risk - `~/Desktop/scadaproj/components/observability/GAPS.md` — adoption status + deferred follow-ons
Design documentation: Design documentation:
+1 -1
@@ -4,7 +4,7 @@
<Nullable>enable</Nullable> <Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings> <ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion> <LangVersion>latest</LangVersion>
<Version>0.1.0</Version> <Version>0.3.1</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally> <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup> </PropertyGroup>
</Project> </Project>
@@ -8,7 +8,17 @@
<div class="login-wrap rise"> <div class="login-wrap rise">
<section class="panel"> <section class="panel">
<div class="login-body"> <div class="login-body">
<h1 class="login-title">@Product &mdash; sign in</h1> @* The product token is wrapped in its own span so consumers can restyle
it and tests can assert the product in isolation (kit issue #5). Set
Heading to replace the whole heading copy (e.g. for localization). *@
@if (!string.IsNullOrWhiteSpace(Heading))
{
<h1 class="login-title">@Heading</h1>
}
else
{
<h1 class="login-title"><span class="login-product">@Product</span> &mdash; sign in</h1>
}
<form method="post" action="@Action" data-enhance="false"> <form method="post" action="@Action" data-enhance="false">
@if (!string.IsNullOrEmpty(ReturnUrl)) @if (!string.IsNullOrEmpty(ReturnUrl))
{ {
@@ -36,9 +46,21 @@
</div> </div>
@code { @code {
/// <summary>Product name shown in the card heading. Required.</summary> /// <summary>
/// Product name shown in the card heading (rendered inside a
/// <c>&lt;span class="login-product"&gt;</c>, followed by the &quot;&#8212; sign in&quot;
/// suffix). Required. Ignored when <see cref="Heading"/> is set.
/// </summary>
[Parameter, EditorRequired] public string Product { get; set; } = string.Empty; [Parameter, EditorRequired] public string Product { get; set; } = string.Empty;
/// <summary>
/// Optional full heading override. When set (non-whitespace), it replaces the
/// default <c>&lt;Product&gt; &#8212; sign in</c> heading entirely — use it to
/// localize or fully customize the heading copy. When unset, the default heading
/// (with <see cref="Product"/> in a <c>.login-product</c> span) is rendered.
/// </summary>
[Parameter] public string? Heading { get; set; }
/// <summary> /// <summary>
/// Form <c>action</c> URL the sign-in POST targets. Defaults to <c>/auth/login</c>. /// Form <c>action</c> URL the sign-in POST targets. Defaults to <c>/auth/login</c>.
/// </summary> /// </summary>
@@ -1,8 +1,11 @@
@* Components/NavRailSection.razor — CSS-only collapsible (no JS, works in static SSR). @* Components/NavRailSection.razor — CSS-only collapsible (no JS, works in static SSR).
Apps that want cookie-persisted expand state keep their own interactive NavSection. *@ Apps that want cookie-persisted expand state keep their own interactive NavSection. *@
@namespace ZB.MOM.WW.Theme @namespace ZB.MOM.WW.Theme
<details class="rail-section" open="@Expanded"> <details class="rail-section" open="@Expanded" data-nav-key="@ResolvedKey">
<summary class="rail-eyebrow-toggle">@Title</summary> @* aria-expanded mirrors the native <details open> state so tests and assistive
tech have a stable, queryable attribute (kit issue #1). It is rendered from
Expanded at SSR time and kept in sync by nav-state.js on restore and toggle. *@
<summary class="rail-eyebrow-toggle" aria-expanded="@(Expanded ? "true" : "false")">@Title</summary>
<div class="rail-section-body">@ChildContent</div> <div class="rail-section-body">@ChildContent</div>
</details> </details>
@@ -18,8 +21,24 @@
/// </summary> /// </summary>
[Parameter] public bool Expanded { get; set; } = true; [Parameter] public bool Expanded { get; set; } = true;
/// <summary>
/// Stable identifier used to persist this section's open/closed state in
/// localStorage (via the kit's nav-state.js). Defaults to a slug of <see cref="Title"/>.
/// </summary>
[Parameter] public string? Key { get; set; }
/// <summary> /// <summary>
/// Section content — typically <see cref="NavRailItem"/> children. /// Section content — typically <see cref="NavRailItem"/> children.
/// </summary> /// </summary>
[Parameter] public RenderFragment? ChildContent { get; set; } [Parameter] public RenderFragment? ChildContent { get; set; }
private string ResolvedKey => string.IsNullOrWhiteSpace(Key) ? Slug(Title) : Key!;
private static string Slug(string? s)
{
if (string.IsNullOrWhiteSpace(s)) return string.Empty;
var chars = s.Trim().ToLowerInvariant()
.Select(c => char.IsLetterOrDigit(c) ? c : '-').ToArray();
return string.Join('-', new string(chars).Split('-', StringSplitOptions.RemoveEmptyEntries));
}
} }
@@ -0,0 +1,4 @@
@namespace ZB.MOM.WW.Theme
@* Components/ThemeScripts.razor — drop before </body>. Emits the kit's nav-state
enhancer that persists NavRailSection open/closed state in localStorage. *@
<script src="_content/ZB.MOM.WW.Theme/js/nav-state.js" defer></script>
@@ -47,6 +47,20 @@
and force it shown regardless of the <details> open state (the hamburger and force it shown regardless of the <details> open state (the hamburger
toggle is hidden at this width). */ toggle is hidden at this width). */
@media (min-width: 992px) { @media (min-width: 992px) {
/* Chromium >=121 wraps a <details>'s content in a generated ::details-content
box that carries content-visibility:hidden while the <details> is closed.
Because our app-shell ships closed by default (SSR, no JS) and the toggle
is d-lg-none here, that wrapper would (a) hide the rail+page entirely on
lg+ and (b) sit between .app-shell and its rail/page children, collapsing
the flex-lg-row layout into a vertical stack. Dissolving the wrapper with
display:contents removes its box (so content-visibility no longer hides the
content AND rail/page become direct flex children of .app-shell again).
Browsers that don't support ::details-content treat this as an invalid
selector and drop the rule, falling back to the legacy force-show below. */
.app-shell::details-content {
display: contents;
}
#theme-rail { #theme-rail {
display: block; display: block;
position: sticky; position: sticky;
@@ -196,6 +210,13 @@
.rail-section > summary::-webkit-details-marker { display: none; } .rail-section > summary::-webkit-details-marker { display: none; }
.rail-section > summary::before { content: '\25B6'; font-size: 0.55rem; color: var(--ink-faint); margin-right: 0.4rem; } .rail-section > summary::before { content: '\25B6'; font-size: 0.55rem; color: var(--ink-faint); margin-right: 0.4rem; }
.rail-section[open] > summary::before { content: '\25BC'; } .rail-section[open] > summary::before { content: '\25BC'; }
/* Hide a collapsed section's items explicitly. The browser's built-in
<details> content-hiding (::details-content content-visibility:hidden) is
unreliable once an interactive framework (e.g. Blazor InteractiveServer)
owns/re-renders the native <details> — a closed section can otherwise keep
showing its items under a "collapsed" chevron. An explicit display:none makes
the visual collapse work across all render modes (kit issue #6). */
.rail-section:not([open]) > .rail-section-body { display: none; }
/* StatusPill: info variant (on-palette, reuses the info blue wash) */ /* StatusPill: info variant (on-palette, reuses the info blue wash) */
.chip-info { color: var(--accent-deep); background: var(--info-bg); border-color: var(--info-border); } .chip-info { color: var(--accent-deep); background: var(--info-bg); border-color: var(--info-border); }
@@ -0,0 +1,95 @@
// ZB.MOM.WW.Theme nav-state.js — persists <details data-nav-key> open/closed
// state in localStorage so NavRailSection expand state survives navigation and
// reloads. Pure client-side; works with static Blazor SSR. Keyed per section.
// localStorage keys are prefixed with "zbnav:" to avoid collisions.
(function () {
var PREFIX = "zbnav:";
var INIT_ATTR = "data-zbnav-initialized";
var TRANSIENT_ATTR = "data-zbnav-transient";
// Mirror a section's native <details open> onto its <summary aria-expanded>
// so tests and assistive tech have a stable, queryable attribute (issue #1).
function syncAria(el) {
var summary = el.querySelector("summary.rail-eyebrow-toggle");
if (summary) summary.setAttribute("aria-expanded", el.open ? "true" : "false");
}
function wire(el) {
el.setAttribute(INIT_ATTR, "");
var key = PREFIX + el.getAttribute("data-nav-key");
var saved = null;
try { saved = window.localStorage.getItem(key); } catch (e) { saved = null; }
if (saved === "1") el.open = true;
else if (saved === "0") el.open = false;
el.addEventListener("toggle", function () {
syncAria(el);
// An active-link reveal (issue #2) is a transient open that must NOT
// overwrite the user's saved preference. The reveal flags the element
// before flipping open; consume the flag here and skip persistence.
if (el.getAttribute(TRANSIENT_ATTR) !== null) {
el.removeAttribute(TRANSIENT_ATTR);
return;
}
try { window.localStorage.setItem(key, el.open ? "1" : "0"); } catch (e) { /* ignore */ }
});
}
function apply() {
document.querySelectorAll("details.rail-section[data-nav-key]").forEach(function (el) {
if (!el.hasAttribute(INIT_ATTR)) wire(el); // wire once — avoid duplicate listeners
syncAria(el); // re-sync aria on every pass
});
// Reveal the section that holds the active link even if the user (or the
// app) left it collapsed, so the nav always shows where the user is
// (issue #2). Transient: flagged so the toggle handler does not persist it.
document.querySelectorAll("details.rail-section a.rail-link.active").forEach(function (link) {
var sec = link.closest("details.rail-section");
if (sec && !sec.open) {
sec.setAttribute(TRANSIENT_ATTR, "");
sec.open = true;
syncAria(sec);
}
});
}
if (document.readyState === "loading")
document.addEventListener("DOMContentLoaded", apply);
else
apply();
// Re-run after Blazor static-SSR enhanced navigation (or any re-render that
// replaces the rail nodes) so freshly inserted sections are wired, restored,
// and active-revealed (issue #3). The per-element INIT_ATTR guard keeps this
// idempotent for nodes that survived the navigation.
if (window.Blazor && typeof window.Blazor.addEventListener === "function") {
window.Blazor.addEventListener("enhancedload", apply);
}
// Re-run whenever rail sections are (re)inserted into the DOM. Under an
// interactive render mode (Blazor InteractiveServer/WebAssembly/Auto) the
// prerendered <details> wired on DOMContentLoaded are replaced when the
// runtime adopts the page, and `enhancedload` does NOT fire — so without this
// the live sections are never wired (no persistence, no aria sync, no
// active-reveal). A MutationObserver is the render-mode-agnostic backstop;
// the per-element INIT_ATTR guard keeps re-applies idempotent, and the
// childList-only filter (plus the active-reveal's `if (!sec.open)` guard)
// avoids any observe→mutate→observe loop (issue #6).
if (typeof MutationObserver === "function") {
var observer = new MutationObserver(function (mutations) {
for (var i = 0; i < mutations.length; i++) {
var added = mutations[i].addedNodes;
for (var j = 0; j < added.length; j++) {
var node = added[j];
if (node.nodeType !== 1) continue;
if ((node.matches && node.matches("details.rail-section")) ||
(node.querySelector && node.querySelector("details.rail-section"))) {
apply();
return;
}
}
}
});
observer.observe(document.documentElement, { childList: true, subtree: true });
}
})();
@@ -41,4 +41,28 @@ public class LoginCardTests : TestContext
var cut = RenderComponent<LoginCard>(p => p.Add(x => x.Product, "OtOpcUa")); var cut = RenderComponent<LoginCard>(p => p.Add(x => x.Product, "OtOpcUa"));
Assert.Empty(cut.FindAll("input[name=returnUrl]")); Assert.Empty(cut.FindAll("input[name=returnUrl]"));
} }
// Theme issue #5: the product token is isolated in a .login-product span so it
// can be styled/asserted apart from the "— sign in" suffix.
[Fact]
public void Product_is_wrapped_in_login_product_span()
{
var cut = RenderComponent<LoginCard>(p => p.Add(x => x.Product, "OtOpcUa"));
var product = cut.Find(".login-title .login-product");
Assert.Equal("OtOpcUa", product.TextContent);
Assert.Contains("sign in", cut.Find(".login-title").TextContent);
}
// Theme issue #5: Heading replaces the whole heading copy when set.
[Fact]
public void Heading_overrides_default_heading_when_set()
{
var cut = RenderComponent<LoginCard>(p => p
.Add(x => x.Product, "OtOpcUa")
.Add(x => x.Heading, "Welcome back"));
var title = cut.Find(".login-title");
Assert.Equal("Welcome back", title.TextContent);
Assert.Empty(cut.FindAll(".login-title .login-product"));
Assert.DoesNotContain("sign in", title.TextContent);
}
} }
@@ -47,6 +47,17 @@ public class NavRailTests : TestContext
Assert.NotNull(cut.Find(".rail-section-body .rail-link")); Assert.NotNull(cut.Find(".rail-section-body .rail-link"));
} }
// Theme issue #1: the <summary> mirrors the <details open> state via
// aria-expanded so tests and assistive tech have a stable, queryable attribute.
[Fact]
public void NavRailSection_summary_aria_expanded_true_when_open()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, "Navigation")
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("true", cut.Find("summary.rail-eyebrow-toggle").GetAttribute("aria-expanded"));
}
[Fact] [Fact]
public void NavRailSection_collapsed_when_not_expanded() public void NavRailSection_collapsed_when_not_expanded()
{ {
@@ -55,4 +66,50 @@ public class NavRailTests : TestContext
.AddChildContent("<a class='rail-link'>X</a>")); .AddChildContent("<a class='rail-link'>X</a>"));
Assert.False(cut.Find("details.rail-section").HasAttribute("open")); Assert.False(cut.Find("details.rail-section").HasAttribute("open"));
} }
// Theme issue #1: aria-expanded reflects the collapsed SSR state too.
[Fact]
public void NavRailSection_summary_aria_expanded_false_when_collapsed()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, "Nav").Add(x => x.Expanded, false)
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("false", cut.Find("summary.rail-eyebrow-toggle").GetAttribute("aria-expanded"));
}
[Fact]
public void NavRailSection_emits_data_nav_key_slug_from_title_by_default()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, "Site Calls")
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("site-calls", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}
[Fact]
public void NavRailSection_emits_explicit_key_when_supplied()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, "Navigation").Add(x => x.Key, "nav")
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("nav", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}
[Fact]
public void NavRailSection_whitespace_only_title_yields_empty_data_nav_key()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, " ")
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}
[Fact]
public void NavRailSection_slug_preserves_unicode_letters()
{
var cut = RenderComponent<NavRailSection>(p => p
.Add(x => x.Title, "Café")
.AddChildContent("<a class='rail-link'>X</a>"));
Assert.Equal("café", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}
} }
@@ -35,6 +35,10 @@ public class StaticAssetsTests
public void Fonts_are_vendored(string file) => public void Fonts_are_vendored(string file) =>
Assert.True(File.Exists(Path.Combine(Wwwroot, "fonts", file))); Assert.True(File.Exists(Path.Combine(Wwwroot, "fonts", file)));
[Fact]
public void NavStateScript_ships() =>
Assert.True(File.Exists(Path.Combine(Wwwroot, "js", "nav-state.js")));
// Theme-002: .chip-idle pairs the idle background with the matching --idle // Theme-002: .chip-idle pairs the idle background with the matching --idle
// foreground token (per DESIGN-TOKENS.md), not --ink-soft. // foreground token (per DESIGN-TOKENS.md), not --ink-soft.
[Fact] [Fact]
@@ -0,0 +1,13 @@
namespace ZB.MOM.WW.Theme.Tests;
public class ThemeScriptsTests : TestContext
{
[Fact]
public void ThemeScripts_emits_nav_state_script_tag()
{
var cut = RenderComponent<ThemeScripts>();
var script = cut.Find("script");
Assert.Equal("_content/ZB.MOM.WW.Theme/js/nav-state.js", script.GetAttribute("src"));
Assert.True(script.HasAttribute("defer"));
}
}
+357
@@ -0,0 +1,357 @@
# ZB.MOM.WW.Theme — Known Issues
Issues found in the `ZB.MOM.WW.Theme` kit that are best fixed **once in the kit and
re-distributed** to every consuming app, rather than worked around per-app. Found while
debugging the ScadaBridge Central UI Playwright suite against kit version **0.2.1** (the
version ScadaBridge consumed at the time).
All file references below point at the kit source under `src/ZB.MOM.WW.Theme/`.
> **RESOLVED in kit 0.3.0 (2026-06-05).** Issues 1, 2, 3, and 5 are fixed in the kit and
> redistributed; Issue 4 is an accepted, documented tradeoff (no code change). See
> [Resolution](#resolution-kit-030) below for what changed and why.
>
> **RESOLVED in kit 0.3.1 (2026-06-05).** Issue 6 (collapsible nav non-functional under
> interactive Blazor render) is fixed — CSS `display:none`-when-closed backstop +
> `MutationObserver` re-wire in `nav-state.js`. See the [Issue 6](#issue-6--collapsible-nav-is-non-functional-under-interactive-blazor-render-mode)
> resolution note. All six issues are now resolved (5 fixed, 1 accepted tradeoff).
## Summary
| # | Severity | Component | Issue | Status (0.3.0) |
|---|----------|-----------|-------|----------------|
| 1 | Medium | `NavRailSection` / `nav-state.js` | No programmatic expanded-state hook (`aria-expanded` / `data-*`) on the section toggle. | ✅ Fixed |
| 2 | Medium | `nav-state.js` | The section containing the active link is not auto-expanded on navigation. | ✅ Fixed |
| 3 | Medium | `nav-state.js` | Persistence wires once on `DOMContentLoaded`; not re-applied after Blazor enhanced navigation / dynamic re-render. | ✅ Fixed |
| 4 | Low | `NavRailSection` | Always-expanded SSR default causes a flash / layout shift of collapsed sections on load. | 📄 Accepted tradeoff (documented) |
| 5 | Low (optional) | `LoginCard` | Heading bakes the localizable `— sign in` suffix into the product title with no separate hook. | ✅ Fixed |
| 6 | High | `NavRailSection` / `nav-state.js` | Under **interactive** Blazor render mode the whole collapsible nav is non-functional: clicking a header doesn't hide items, and `nav-state.js` never wires (no aria sync, no persistence, no active-reveal). | ✅ Fixed (0.3.1) |
---
## Resolution (kit 0.3.0)
Shipped in `ZB.MOM.WW.Theme` **0.3.0** (2026-06-05) and adopted across all three apps.
- **Issue 1 — `aria-expanded` hook.** `NavRailSection.razor` now renders
`<summary class="rail-eyebrow-toggle" aria-expanded="…">`, computed from `Expanded` at SSR
time, and `nav-state.js` keeps it in sync with the native `<details open>` state on restore
and on every `toggle`. Tests/AT can now await a stable attribute instead of inferring from
child-link visibility. (bUnit:
`NavRailSection_summary_aria_expanded_true_when_open` / `…_false_when_collapsed`.)
- **Issue 2 — auto-expand the active section.** After restoring saved state, `nav-state.js`
force-opens any `details.rail-section` that contains an `a.rail-link.active`. The reveal is
**transient** — it is flagged with `data-zbnav-transient` before the open flip so the
`toggle` handler skips persistence and the user's saved collapse preference is preserved.
- **Issue 3 — re-apply after enhanced navigation.** `apply()` is now also bound to Blazor's
`enhancedload` event (`Blazor.addEventListener('enhancedload', apply)`); the per-element
`data-zbnav-initialized` guard keeps re-runs idempotent. Static-SSR consumers keep
persistence + active-reveal after enhanced navigations; interactive Server consumers (e.g.
ScadaBridge Central UI) are unaffected as before.
- **Issue 4 — SSR flash / CLS: accepted tradeoff (no code change).** The kit deliberately
keeps **client-only** persistence to stay render-mode-agnostic, so the server renders every
section `open` and JS collapses the saved-collapsed ones after first paint. The alternative —
an inline pre-paint `<head>` snippet that mutates not-yet-parsed `<details>` from
`localStorage` — adds a FOUC-script that runs against DOM that does not yet exist, for a
Low-severity cosmetic flash. We chose **not** to take on that complexity/risk. Consumers who
care about the flash for a specific layout can add their own pre-paint restore; the kit will
not ship one by default. (This paragraph is the documented decision the issue asks for.)
- **Issue 5 — `LoginCard` heading hook.** The product token is now wrapped in
`<span class="login-product">@Product</span> — sign in`, and an optional `Heading` parameter
fully replaces the heading copy when set (for localization / custom wording). Existing
`"<Product> — sign in"` assertions still pass. (bUnit:
`Product_is_wrapped_in_login_product_span` / `Heading_overrides_default_heading_when_set`.)
---
## Issue 1 — No programmatic expanded-state hook on `NavRailSection`
**Severity:** Medium · **Files:** `Components/NavRailSection.razor`, `wwwroot/js/nav-state.js`
**Symptom.** A section's open/closed state is exposed only through the native
`<details open>` boolean attribute on the `<details class="rail-section">` element. The
`<summary class="rail-eyebrow-toggle">` toggle carries no `aria-expanded` and there is no
`data-*` state attribute. E2E tests (and some older assistive tech) cannot reliably query
or await the expanded state — they must infer it from child-link visibility.
**Root cause.** `NavRailSection.razor` renders:
```razor
<details class="rail-section" open="@Expanded" data-nav-key="@ResolvedKey">
<summary class="rail-eyebrow-toggle">@Title</summary>
<div class="rail-section-body">@ChildContent</div>
</details>
```
There is no attribute that mirrors `open` in a test- or AT-stable way.
**Impact on consumers.** Every consumer's UI tests must assert collapse state indirectly
(e.g. waiting on a child link to become visible/hidden) instead of awaiting a stable
attribute. This was the proximate cause of several stale ScadaBridge nav tests. Native
`<details>`/`<summary>` is keyboard- and screen-reader-accessible by default, so this is
primarily a **testability** gap (with a modest a11y upside).
**Recommended fix.** Mirror `open` onto `aria-expanded` on the `<summary>`, kept in sync by
`nav-state.js` (set it during `apply()` from `el.open`, and update it inside the existing
`toggle` listener). This gives both tests and AT a stable, queryable attribute without
changing the CSS-only collapse mechanism.
**Verify.** After the fix, `summary.rail-eyebrow-toggle` exposes `aria-expanded="true|false"`
that flips when the section is toggled and after a reload restores saved state.
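As a rough illustration of the mirroring recommended above, the sketch below applies the `syncAria` idea to a hand-rolled stand-in for a `<details>` element so that it runs outside a browser. `makeFakeSection` is a hypothetical test double, not kit code; it models only `open`, `querySelector`, and the summary's `setAttribute`/`getAttribute`.

```javascript
// Mirror the native open state onto the summary's aria-expanded attribute.
function syncAria(el) {
  var summary = el.querySelector("summary.rail-eyebrow-toggle");
  if (summary) summary.setAttribute("aria-expanded", el.open ? "true" : "false");
}

// Hypothetical stand-in for <details class="rail-section"> (assumption: not kit code).
function makeFakeSection(open) {
  var attrs = {};
  var summary = {
    setAttribute: function (name, value) { attrs[name] = String(value); },
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
    }
  };
  return {
    open: open,
    summary: summary,
    querySelector: function (selector) {
      return selector === "summary.rail-eyebrow-toggle" ? summary : null;
    }
  };
}

var section = makeFakeSection(true);
syncAria(section);
var whenOpen = section.summary.getAttribute("aria-expanded");

section.open = false; // simulate the user collapsing the section
syncAria(section);
var whenClosed = section.summary.getAttribute("aria-expanded");
```

A UI test can then wait on `aria-expanded` directly rather than inferring state from child-link visibility.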
---
## Issue 2 — Active section is not auto-expanded on navigation
**Severity:** Medium · **File:** `wwwroot/js/nav-state.js`
**Symptom.** When a section is collapsed (either because the user previously collapsed it —
`localStorage` `zbnav:<key>` = `"0"` — or because a consumer sets `Expanded="false"`) and
the user navigates to a route whose link lives inside that section, the section **stays
collapsed**. The active link (`a.rail-link.active`) is present in the DOM but hidden by the
closed `<details>`, so the nav no longer shows the user where they are.
**Root cause.** `nav-state.js` only *restores saved open/closed state*; it has no concept of
the active link. A grep of the kit confirms that the only "active" handling is the
`.rail-link.active` CSS rule in `wwwroot/css/layout.css` — there is no JS that opens the
section containing the active link.
**Impact on consumers.** Loss of the common "navigating into a section reveals it" behavior.
A user who collapses a section and then deep-links (or is redirected) into one of its pages
lands with the relevant nav group collapsed and the current page's link hidden. (ScadaBridge
previously had app-owned auto-expand-on-navigate; the kit cutover dropped it, and the
`NavigatingIntoCollapsedSection_AutoExpandsIt` test now fails because nothing re-expands.)
**Recommended fix.** In `nav-state.js`, after restoring saved state, force-open any section
that contains the active link:
```js
// after the saved-state restore loop, before wiring is "done":
document.querySelectorAll("details.rail-section a.rail-link.active").forEach(function (link) {
var sec = link.closest("details.rail-section");
if (sec && !sec.open) sec.open = true; // reveal the section the user is in
});
```
Run this both on initial load and after Blazor navigation (see Issue 3). Decide whether the
forced-open should also persist to `localStorage` or be a transient reveal (recommended:
transient — don't overwrite the user's saved preference).
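One way to keep the reveal transient is to flag the element before opening it and let the `toggle` listener consume the flag instead of persisting. The sketch below shows that flow under stated assumptions: `makeFakeSection` and its `flip` method are hypothetical stand-ins for a real `<details>` element (whose `toggle` event is dispatched asynchronously), a plain object stands in for `localStorage`, and the `zbnav:alarms` key is made up for the example.

```javascript
var TRANSIENT_ATTR = "data-zbnav-transient";

// Hypothetical stand-in for a <details> element (assumption: not kit code).
function makeFakeSection(open) {
  var attrs = {};
  var listeners = [];
  return {
    open: open,
    setAttribute: function (n, v) { attrs[n] = String(v); },
    getAttribute: function (n) {
      return Object.prototype.hasOwnProperty.call(attrs, n) ? attrs[n] : null;
    },
    removeAttribute: function (n) { delete attrs[n]; },
    addEventListener: function (type, fn) { if (type === "toggle") listeners.push(fn); },
    // Flip `open` and notify listeners synchronously (a simplification).
    flip: function (value) {
      this.open = value;
      listeners.forEach(function (fn) { fn(); });
    }
  };
}

// Persist user toggles, but skip the one toggle caused by a transient reveal.
function wirePersistence(el, storage, key) {
  el.addEventListener("toggle", function () {
    if (el.getAttribute(TRANSIENT_ATTR) !== null) {
      el.removeAttribute(TRANSIENT_ATTR); // consume the flag, do not persist
      return;
    }
    storage[key] = el.open ? "1" : "0";
  });
}

// Reveal a collapsed section without touching the saved preference.
function revealTransiently(el) {
  if (!el.open) {
    el.setAttribute(TRANSIENT_ATTR, "");
    el.flip(true);
  }
}

var storage = { "zbnav:alarms": "0" }; // the user previously collapsed this section
var sec = makeFakeSection(false);
wirePersistence(sec, storage, "zbnav:alarms");

revealTransiently(sec);
var savedAfterReveal = storage["zbnav:alarms"];

sec.flip(false); // genuine user toggles afterwards are persisted again
sec.flip(true);
var savedAfterUserToggle = storage["zbnav:alarms"];
```

The flag is removed on first use, so only the reveal itself bypasses persistence; every later toggle is saved as usual.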
**Verify.** Collapse a section, navigate to one of its pages (or reload directly on it):
the section opens and the active link is visible.
---
## Issue 3 — Persistence wires once and is not re-applied after navigation
**Severity:** Medium · **File:** `wwwroot/js/nav-state.js`
**Symptom.** `apply()` runs only on the initial `DOMContentLoaded` (or first script eval)
and guards each element with `data-zbnav-initialized`. Under Blazor **static SSR enhanced
navigation** — or any dynamic re-render that replaces the `<details>` nodes — newly inserted
sections are never wired: their saved state is not restored and their toggles are not
persisted. The active-section logic from Issue 2 would likewise not re-run.
**Root cause.** The script self-invokes once:
```js
if (document.readyState === "loading")
document.addEventListener("DOMContentLoaded", apply);
else
apply();
```
There is no hook for Blazor's enhanced-navigation lifecycle and no `MutationObserver`.
**Impact on consumers.** Static-SSR consumers (the kit explicitly targets "works in static
SSR") lose nav persistence after the first enhanced navigation. **Interactive Blazor Server
consumers (such as ScadaBridge Central UI) are largely unaffected**, because the rail is
prerendered once and then patched in place over the SignalR circuit, so the original
`<details>` elements and their listeners survive — which is why ScadaBridge's persistence
appears to work today. The kit should still be correct for its static-SSR audience.
**Recommended fix.** Also re-run `apply()` on Blazor's enhanced-load event (and keep the
per-element init guard so it stays idempotent):
```js
if (window.Blazor && Blazor.addEventListener) {
Blazor.addEventListener('enhancedload', apply);
}
```
Optionally add a `MutationObserver` on the rail container as a framework-agnostic backstop.
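The per-element init guard is what makes repeated `apply()` calls safe. A minimal sketch of that idempotence follows, using a hypothetical `makeFakeSection` stand-in that counts attached listeners; a real run would collect the sections with `querySelectorAll` rather than take an array.

```javascript
var INIT_ATTR = "data-zbnav-initialized";

// Hypothetical stand-in for a <details> element that counts attached listeners.
function makeFakeSection() {
  var attrs = {};
  return {
    listenerCount: 0,
    hasAttribute: function (n) { return Object.prototype.hasOwnProperty.call(attrs, n); },
    setAttribute: function (n, v) { attrs[n] = String(v); },
    addEventListener: function () { this.listenerCount += 1; }
  };
}

function wire(el) {
  el.setAttribute(INIT_ATTR, "");
  el.addEventListener("toggle", function () { /* persistence would go here */ });
}

// Safe to call repeatedly: elements that are already wired are skipped.
function apply(sections) {
  sections.forEach(function (el) {
    if (!el.hasAttribute(INIT_ATTR)) wire(el);
  });
}

var survivor = makeFakeSection();
apply([survivor]);                // initial load
var inserted = makeFakeSection(); // node added by a later navigation
apply([survivor, inserted]);      // re-run, e.g. from an enhancedload handler
```

Each section ends up with exactly one listener, whether it survived the navigation or was inserted by it.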
**Verify.** On a static-SSR host, expand/collapse a section, perform an enhanced navigation
to another page and back, and confirm the saved state is still restored and toggles still
persist.
---
## Issue 4 — Always-expanded SSR default flashes / shifts layout on load
**Severity:** Low · **File:** `Components/NavRailSection.razor`
**Symptom.** `NavRailSection.Expanded` defaults to `true`, so every section renders `open`
in the server HTML. `nav-state.js` only collapses the saved-collapsed sections *after* the
script runs, producing a brief flash of fully-expanded nav and a layout shift (CLS) on each
load for users who keep sections collapsed.
**Root cause.** State lives in `localStorage` and is applied by JS post-render, while the
server-rendered default is unconditionally expanded. The server has no knowledge of the
saved state at render time.
**Impact on consumers.** Cosmetic flash / minor CLS on initial load; more noticeable with
many sections collapsed.
**Recommended fix (pick one).**
- Inline a tiny restore snippet in `<head>` (via `ThemeHead`) that sets each `<details>`'s
`open` from `localStorage` before first paint; or
- Accept the tradeoff and document it (the kit deliberately chose client-only persistence to
stay render-mode-agnostic).
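To judge how much the flash matters for a given layout, it may help to work out which sections would visibly collapse after first paint. The sketch below is only an illustration: `resolveOpen` and `sectionsThatFlash` are hypothetical helpers, the storage object stands in for `localStorage`, and the `"1"`/`"0"` values and `zbnav:` prefix follow what `nav-state.js` writes.

```javascript
// Decide a section's open state from its saved value and its server-rendered default.
// "1" and "0" are the values nav-state.js writes; anything else means "no preference".
function resolveOpen(saved, ssrDefault) {
  if (saved === "1") return true;
  if (saved === "0") return false;
  return ssrDefault;
}

// Hypothetical helper: keys of sections rendered open by the server that the
// saved state would collapse once the script runs.
function sectionsThatFlash(sections, storage) {
  return sections.filter(function (s) {
    var storageKey = "zbnav:" + s.key;
    var saved = Object.prototype.hasOwnProperty.call(storage, storageKey)
      ? storage[storageKey] : null;
    return s.ssrOpen && !resolveOpen(saved, s.ssrOpen);
  }).map(function (s) { return s.key; });
}

var flashing = sectionsThatFlash(
  [{ key: "site-calls", ssrOpen: true }, { key: "nav", ssrOpen: true }],
  { "zbnav:site-calls": "0" }
);
```

Here only `site-calls` is reported, because it is the one section with a saved `"0"` that the server still renders open.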
**Verify.** With several sections saved-collapsed, reload and confirm no expanded-then-collapse
flash.
---
## Issue 5 — `LoginCard` heading couples the product name and "— sign in" (optional)
**Severity:** Low (optional) · **File:** `Components/LoginCard.razor`
**Symptom.** The card heading is `<h1 class="login-title">@Product &mdash; sign in</h1>`.
The (localizable) `— sign in` suffix is baked into the product title with no separate hook,
so consumers can't restyle/override the heading copy or assert the product token in isolation
without string-matching the whole heading.
**Impact on consumers.** Minor: per-app heading customization and exact-text test assertions
are awkward (must match `"<Product> — sign in"` rather than the product alone).
**Recommended fix (optional).** Wrap the product in a span and/or expose an override:
```razor
<h1 class="login-title"><span class="login-product">@Product</span> &mdash; sign in</h1>
```
or add an optional `Heading` parameter that, when set, replaces the default heading entirely.
---
## Issue 6 — Collapsible nav is non-functional under interactive Blazor render mode
**Severity:** High · **Files:** `Components/NavRailSection.razor`, `wwwroot/js/nav-state.js`,
`wwwroot/css/layout.css` · **Status:** ✅ Fixed in 0.3.1
> **Resolution (kit 0.3.1, 2026-06-05).** Both recommended parts shipped, so the collapsible
> nav now works under interactive render modes as well as static SSR:
> 1. **CSS robust collapse** — `layout.css` hides the body explicitly when closed instead of
> relying on the native `::details-content` content-hiding (which an interactive framework
> desyncs): `.rail-section:not([open]) > .rail-section-body { display: none; }`.
> 2. **Render-mode-agnostic re-wire** — `nav-state.js` adds a `MutationObserver` on
> `document.documentElement` (childList + subtree) that re-runs `apply()` whenever
> `details.rail-section` nodes are added/replaced, so the interactive runtime's re-render
> gets wired (aria sync, `data-zbnav-initialized`, localStorage persistence, active-reveal).
> The existing `enhancedload` hook (Issue 3) is kept for static-SSR enhanced navigation.
>
> Verified live in ScadaBridge Central UI (global `@rendermode InteractiveServer`): the
> Playwright `NavCollapseTests` (toggle-hides-items, persistence-survives-reload,
> deep-link-auto-reveal) now pass against 0.3.1.
> **This corrects Issue 3's note**, which claimed interactive Blazor Server consumers are
> "largely unaffected because the rail is patched in place." Direct observation of the live
> ScadaBridge Central UI (global `@rendermode InteractiveServer`) shows that is **false** —
> the kit's `<details>`/JS nav does not work under interactive render modes at all.
**Symptom.** In an app that renders the rail under an interactive render mode
(`InteractiveServer`, `InteractiveWebAssembly`, or `InteractiveAuto`), the collapsible nav is
visually and functionally dead:
1. Clicking a section header toggles the chevron (▶/▼) but **does not hide the section's
items** — the links stay fully visible under a "collapsed" chevron.
2. `aria-expanded` never changes, `localStorage` is never written, the saved state is not
restored on reload, and the active-section auto-reveal (Issue 2) does not fire.
**Root cause.** The kit nav is a **static-SSR / CSS-only** design (NavRailSection's own
comment: *"works in static SSR"*). Under an interactive render mode, Blazor's runtime
**owns and re-renders the `<details>`/`<summary>` DOM** after it adopts the prerendered
markup. Two independent consequences, both observed live:
- **Native collapse is defeated.** On the live page a closed section has `details.open === false`
and its `::details-content` computes `content-visibility: hidden`, yet the
`.rail-section-body` and its links remain laid out and visible (measured non-zero height /
non-null `offsetParent`). Blazor's management of the native `<details>` desyncs the browser's
built-in content-hiding. The body's `display` value (flex/block/grid/inline) makes no
difference — only an explicit `display: none` actually hides it.
- **`nav-state.js` never wires the live DOM.** The interactive `<details>` elements have **no
`data-zbnav-initialized` attribute**, i.e. `wire()` never ran on them: `apply()` runs on
`DOMContentLoaded` against the *prerendered* nodes, which Blazor then replaces, and the only
re-run hook (`enhancedload`, added for Issue 3) does not fire under interactive render modes.
So aria sync, localStorage persistence, and active-reveal are all inert.
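The second consequence can be pictured with a minimal sketch of marker-based idempotent wiring. This is not the kit's actual `nav-state.js`: `wireOnce` and `applyAll` are hypothetical names, and only the `data-zbnav-initialized` attribute and the `details.rail-section` selector come from the text above. It shows why nodes that replace the prerendered ones stay unwired until `apply()` runs again.

```javascript
// Hypothetical sketch, not the kit's real nav-state.js.
const MARKER = 'data-zbnav-initialized'; // attribute name taken from the text above

// Wire a single section exactly once; returns true only if wiring happened now.
function wireOnce(el, wire) {
  if (el.hasAttribute(MARKER)) return false; // already wired, skip
  wire(el);
  el.setAttribute(MARKER, 'true');
  return true;
}

// Wire every section under root; safe to call any number of times.
// Returns how many sections were newly wired on this pass.
function applyAll(root, wire) {
  let wired = 0;
  for (const el of root.querySelectorAll('details.rail-section')) {
    if (wireOnce(el, wire)) wired++;
  }
  return wired;
}
```

A freshly rendered `<details>` carries no marker, so it is only picked up if something calls `applyAll` after the framework has replaced the node. A single `DOMContentLoaded` pass cannot do that.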
This matters for the kit's stated goal: per the normalization notes, nav-expand persistence was
promoted into the kit at 0.2.0 *"so all three apps share one persistence mechanism."* One of the
three consumers (ScadaBridge Central UI) is interactive Blazor Server, where that mechanism
silently does nothing.
**Verified evidence (live, global InteractiveServer).** On a logged-in dashboard:
`data-zbnav-initialized` absent on every `details.rail-section`; after clicking a header,
`details.open === false` but the section's link still reports `clientHeight: 33` and a non-null
`offsetParent`; setting `.rail-section-body { display:none }` is the only thing that hides it;
`localStorage` has no `zbnav:*` keys before or after toggling.
**Recommended fix (two parts — both belong in the kit).**
1. **Make the collapse render-mode-robust (CSS).** Don't rely solely on the native
`<details>` content-hiding; hide the body explicitly when closed:
```css
/* layout.css — robust across render modes; native ::details-content hiding
is unreliable once an interactive framework manages the <details>. */
.rail-section:not([open]) > .rail-section-body { display: none; }
```
(Verified live: this is exactly what hides the items.)
2. **Make persistence/aria/reveal work under interactive render.** `enhancedload` is
static-SSR-only; also wire after the interactive runtime has (re)rendered. Options, in
preference order:
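- Re-run `apply()` from Blazor's post-start hooks (the JS-initializer callbacks `afterStarted` /
`afterServerStarted` / `afterWebAssemblyStarted`, which are exported functions rather than
`Blazor.addEventListener` events) at interactive WASM/Server boot, and re-apply on
circuit/render updates; and/or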
- Add a `MutationObserver` on the rail container that calls `apply()` when
`details.rail-section` nodes are added/replaced (framework-agnostic backstop — covers
interactive re-renders, enhanced nav, and dynamic nav alike);
- **Or** ship an explicitly **interactive** `NavRailSection` variant (a small Blazor
component with an `@onclick` toggle and `[Parameter] bool Expanded` two-way state) for
consumers that render interactively — which is what NavRailSection's own comment already
gestures at (*"Apps that want cookie-persisted expand state keep their own interactive
NavSection"*). If the kit's intent is that interactive apps bring their own section
component, say so loudly in the docs and have the CSS-only one degrade gracefully (part 1
still applies so at least the visual collapse works).
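The `MutationObserver` backstop from the second option could look roughly like the sketch below. `touchesRail` and `observeRail` are hypothetical names and `apply` stands for the enhancer's existing entry point. The observer watches only `childList` changes, so the attribute writes made by `apply()` itself do not re-trigger it.

```javascript
// Hypothetical sketch of a framework-agnostic re-wire backstop.

// True when any added node is, or contains, a details.rail-section element.
function touchesRail(records) {
  for (const rec of records) {
    for (const node of rec.addedNodes) {
      if (node.nodeType !== 1) continue; // element nodes only; skip text/comments
      if (node.matches('details.rail-section') ||
          node.querySelector('details.rail-section') !== null) {
        return true;
      }
    }
  }
  return false;
}

// Browser-only: re-run apply() whenever rail sections are added or replaced.
function observeRail(target, apply) {
  const observer = new MutationObserver((records) => {
    if (touchesRail(records)) apply();
  });
  observer.observe(target, { childList: true, subtree: true });
  return observer; // caller may disconnect() it later
}
```

In a page this would be started once, for example `observeRail(document.documentElement, apply)`, after the initial `apply()` call.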
**Verify.** In an interactive-render host: clicking a header hides the section's items; the
summary's `aria-expanded` flips; `localStorage` gets a `zbnav:<key>` entry; the state survives
a reload; and deep-linking into a collapsed section reveals it.
**Consumer note (ScadaBridge).** Resolved on 0.3.1: ScadaBridge's Central UI consumes
`ZB.MOM.WW.Theme` 0.3.1, and the Playwright `NavCollapseTests` (toggling, persistence,
auto-reveal) now pass — the `NavCollapseWiredAsync` gate (which waits for
`data-zbnav-initialized` on every `details.rail-section`) is satisfied under interactive
render, so those tests run unskipped and green.
---
## Not kit bugs — expected consumer adaptations
For the avoidance of doubt, the following are **not** theme issues; they are the normal cost
of adopting the kit and belong in each consumer's own tests/markup:
- Login markup moved from a hand-rolled `<h4>ScadaBridge</h4>` + `Sign In` button to the
kit's `LoginCard` (`h1.login-title` reading `"<Product> — sign in"`, button labelled
`Sign in`). Consumers must update selectors/text accordingly.
- Nav moved from app-owned `button.nav-section-toggle` + `aria-expanded` + a
`scadabridge_nav` cookie to the kit's `<details class="rail-section">` + `<summary>` + `localStorage`
(`zbnav:<key>`). Collapsed sections now **keep their children in the DOM** (hidden), and
sections default to **expanded**, not collapsed — so DOM-presence-based "hidden" assertions
and "collapsed by default" assumptions must be rewritten around visibility and the
`<details open>` state.
These are being handled in the ScadaBridge Playwright suite separately.
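For consumers rewriting those assertions, the `localStorage` scheme named above (`zbnav:<key>` entries, sections expanded by default) can be pictured with a small sketch. The helper names and the stored `'1'`/`'0'` encoding are assumptions made for illustration; the text does not say how the kit encodes the value.

```javascript
const PREFIX = 'zbnav:'; // key prefix from the text above

// ASSUMPTION: state is stored as '1' (open) or '0' (closed); the kit may differ.
function saveExpanded(storage, key, expanded) {
  storage.setItem(PREFIX + key, expanded ? '1' : '0');
}

// Returns the saved state, or the default (expanded) when nothing was saved.
function loadExpanded(storage, key, defaultExpanded = true) {
  const raw = storage.getItem(PREFIX + key); // Storage.getItem yields null if absent
  if (raw === null) return defaultExpanded;
  return raw === '1';
}
```

In a browser the `storage` argument would be `window.localStorage`. Passing it in keeps the logic testable without a DOM, and makes the "expanded unless the user collapsed it" default explicit.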
+19317
File diff suppressed because it is too large
+56
@@ -0,0 +1,56 @@
Northwind Consumer Products — Unified Namespace
(generated from Galaxy DESKTOP-6JL3KKO\DEV; 40 machines, 1036 signals)
northwind
└─ birmingham
├─ filling/ (Filling & Capping; from Galaxy TestArea)
│ ├─ line-1/
│ │ ├─ rinser-01 [krones Hydra Srs3] ← TestMachine_001 (28 signals)
│ │ ├─ filler-02 [sidel SF300 Srs5] ← TestMachine_002 (28 signals)
│ │ ├─ capper-03 [khs Innofill Srs4] ← TestMachine_003 (28 signals)
│ │ ├─ labeler-04 [krones Contiroll Srs3] ← TestMachine_004 (28 signals)
│ │ ├─ inspector-05 [antares-vision Vmax Srs2] ← TestMachine_005 (28 signals)
│ │ ├─ coder-06 [videojet 1580 Srs4] ← TestMachine_006 (28 signals)
│ │ └─ rinser-07 [krones Hydra Srs3] ← TestMachine_007 (28 signals)
│ ├─ line-2/
│ │ ├─ rinser-08 [krones Hydra Srs3] ← TestMachine_008 (28 signals)
│ │ ├─ filler-09 [sidel SF300 Srs2] ← TestMachine_009 (28 signals)
│ │ ├─ capper-10 [khs Innofill Srs3] ← TestMachine_010 (28 signals)
│ │ ├─ labeler-11 [krones Contiroll Srs4] ← TestMachine_011 (28 signals)
│ │ ├─ inspector-12 [antares-vision Vmax Srs5] ← TestMachine_012 (28 signals)
│ │ └─ coder-13 [videojet 1580 Srs4] ← TestMachine_013 (28 signals)
│ └─ line-3/
│ ├─ rinser-14 [krones Hydra Srs4] ← TestMachine_014 (28 signals)
│ ├─ filler-15 [sidel SF300 Srs4] ← TestMachine_015 (28 signals)
│ ├─ capper-16 [khs Innofill Srs4] ← TestMachine_016 (28 signals)
│ ├─ labeler-17 [krones Contiroll Srs4] ← TestMachine_017 (28 signals)
│ ├─ inspector-18 [antares-vision Vmax Srs4] ← TestMachine_018 (28 signals)
│ └─ coder-19 [videojet 1580 Srs5] ← TestMachine_019 (28 signals)
├─ blending/ (Blending & CIP; from Galaxy TestArea2)
│ └─ cip-1/
│ └─ blender-20 [spx-flow APV-R5 Srs4] ← TestMachine_020 (24 signals)
└─ packaging/ (Packaging & Palletizing; from Galaxy TestArea3)
├─ pack-1/
│ ├─ cartoner-21 [marchesini MC820 Srs2] ← TestMachine_021 (24 signals)
│ ├─ case-packer-22 [bosch Elematic Srs4] ← TestMachine_022 (24 signals)
│ ├─ palletizer-23 [fanuc M410 Srs5] ← TestMachine_023 (24 signals)
│ ├─ stretch-wrapper-24 [lantech Q300 Srs4] ← TestMachine_024 (24 signals)
│ └─ checkweigher-25 [mettler-toledo C3570 Srs2] ← TestMachine_025 (24 signals)
├─ pack-2/
│ ├─ cartoner-26 [marchesini MC820 Srs2] ← TestMachine_026 (24 signals)
│ ├─ case-packer-27 [bosch Elematic Srs5] ← TestMachine_027 (24 signals)
│ ├─ palletizer-28 [fanuc M410 Srs5] ← TestMachine_028 (24 signals)
│ ├─ stretch-wrapper-29 [lantech Q300 Srs4] ← TestMachine_029 (24 signals)
│ └─ checkweigher-30 [mettler-toledo C3570 Srs5] ← TestMachine_030 (24 signals)
├─ pack-3/
│ ├─ cartoner-31 [marchesini MC820 Srs5] ← TestMachine_031 (24 signals)
│ ├─ case-packer-32 [bosch Elematic Srs5] ← TestMachine_032 (24 signals)
│ ├─ palletizer-33 [fanuc M410 Srs5] ← TestMachine_033 (24 signals)
│ ├─ stretch-wrapper-34 [lantech Q300 Srs4] ← TestMachine_034 (24 signals)
│ └─ checkweigher-35 [mettler-toledo C3570 Srs2] ← TestMachine_035 (24 signals)
└─ pack-4/
├─ cartoner-36 [marchesini MC820 Srs4] ← TestMachine_036 (24 signals)
├─ case-packer-37 [bosch Elematic Srs3] ← TestMachine_037 (24 signals)
├─ palletizer-38 [fanuc M410 Srs3] ← TestMachine_038 (24 signals)
├─ stretch-wrapper-39 [lantech Q300 Srs2] ← TestMachine_039 (24 signals)
└─ checkweigher-40 [mettler-toledo C3570 Srs5] ← TestMachine_040 (24 signals)
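The header totals (40 machines, 1036 signals) can be cross-checked against the tree with a short sketch. The per-area machine counts are read off the listing above. The `unsPath` helper and its `/` separator are assumptions for illustration, since the file does not state how path segments are joined.

```javascript
// Machine counts and per-machine signal counts, read off the tree above.
const areas = [
  { area: 'filling',   machines: 19, signalsEach: 28 }, // line-1 (7) + line-2 (6) + line-3 (6)
  { area: 'blending',  machines: 1,  signalsEach: 24 }, // cip-1
  { area: 'packaging', machines: 20, signalsEach: 24 }, // pack-1..pack-4, 5 each
];

const totalMachines = areas.reduce((n, a) => n + a.machines, 0);
const totalSignals = areas.reduce((n, a) => n + a.machines * a.signalsEach, 0);

// ASSUMPTION: segments are joined with '/'; the real separator is not stated.
function unsPath(...segments) {
  return segments.join('/');
}

console.log(totalMachines, totalSignals); // 40 1036
console.log(unsPath('northwind', 'birmingham', 'filling', 'line-1', 'rinser-01'));
```

The arithmetic is 19 × 28 + 1 × 24 + 20 × 24 = 532 + 24 + 480 = 1036, which agrees with the generated header.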
+13 -4
@@ -3,10 +3,19 @@
 Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
 reach the shared `ZB.MOM.WW.Audit` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
-> **Adoption is deferred this round.** The library is being designed (shared contract in
-> [`shared-contract/ZB.MOM.WW.Audit.md`](shared-contract/ZB.MOM.WW.Audit.md)) but is not yet
-> wired into any app — exactly where `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` sit today.
-> The items below are the follow-on work; each lands as a separate PR per project.
+> **✅ ADOPTED 2026-06-02 (local-only) — DEEP.** The backlog (#1–#6) was implemented across all three apps on each repo's
+> **`feat/adopt-zb-audit`** branch (stacked on `feat/adopt-zb-auth`) — committed + spec/code-reviewed, then **merged to
+> each repo's local default (main/master) and PUSHED to origin (gitea) on 2026-06-03** (in sync). The user chose **DEEP adopt**:
+> the canonical 9-field `AuditEvent` is the record EVERYWHERE
+> (domain fields ride in `DetailsJson`), so the §1 "keep own record" framing below was superseded. OtOpcUa: canonical
+> record + `AuditWriterActor : IAuditWriter` + `Outcome` col/migration + `ClusterAudit` fix. MxGateway: canonical SQLite
+> `audit_event` store + `IAuditWriter` + `IApiKeyAuditStore`→canonical adapter. **ScadaBridge: a full audit-subsystem
+> re-architecture** (codec + site `audit_event`/`audit_forward_state` sidecar + central partitioned-table collapse to
+> 10 canonical + persisted computed cols, MSSQL-verified). §5 (Actor→Auth principal) wired via per-app
+> `IAuditActorAccessor` (Phase 3). The Task 2.0 gate found this doc's pre-adoption framing was partly stale (MxGateway's
+> store had moved into the lib; OtOpcUa's structured path was dormant; ScadaBridge's filter was typed to its own record).
+> Detail: `docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md` + `…-scadabridge-audit-rearch.md`. The
+> ⛔/🟡 cells below describe the PRE-adoption divergence (kept for history).
 ## Divergence vs spec
+9
@@ -3,6 +3,15 @@
Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach the shared `ZB.MOM.WW.Auth` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
> **✅ ADOPTED 2026-06-02 (local-only).** The full backlog (#1–#8) was implemented across all three apps on each repo's
> **`feat/adopt-zb-auth`** branch — committed + spec/code-reviewed, then **merged to each repo's local default
> (main/master) and PUSHED to origin (gitea) on 2026-06-03** (in sync; `feat/*` kept locally). Shared
> `Auth.Ldap` + `Auth.ApiKeys` (ScadaBridge inbound re-architected to keyId/Bearer), `IGroupRoleMapper<TRole>`,
> `Transport`-enum config, canonical `ZbClaimTypes`/`ZbCookieDefaults`, unified dev base DN `dc=zb,dc=local`, and the
> canonical-six roles (with ScadaBridge's accepted auditor/admin SoD collapse). Consumer pins: OtOpcUa `0.1.1`,
> MxGateway `0.1.2`, ScadaBridge `0.1.3`. Detail: `docs/plans/2026-06-02-auth-audit-normalization*.md`. The ⛔/🟡 cells
> below describe the PRE-adoption divergence (kept for history).
## Divergence vs spec
### §1 LDAP config schema
@@ -99,7 +99,10 @@ public interface IApiKeyStore { // default: SQLite (hash, scope
     Task<ApiKeyRecord?> FindByKeyIdAsync(string keyId, CancellationToken ct);
     Task MarkUsedAsync(string keyId, CancellationToken ct);
 }
-public interface IApiKeyAdminStore { /* create / revoke / rotate / delete + audit */ }
+public interface IApiKeyAdminStore { /* create / revoke / rotate / delete + audit */
+    Task<bool> SetScopesAsync(string keyId, IReadOnlySet<string> scopes, CancellationToken ct); // 0.1.3: replace scope set; secret untouched
+    Task<bool> SetEnabledAsync(string keyId, bool enabled, DateTimeOffset whenUtc, CancellationToken ct); // 0.1.3: reversible enable/disable toggle; secret untouched
+}
 ```
 - Constraints are carried as an **opaque `object`** (project supplies the policy: mxaccessgw
@@ -107,6 +110,22 @@ public interface IApiKeyAdminStore { /* create / revoke / rotate / delete + audi
parse→lookup→peppered-HMAC→constant-time-compare→audit pipeline; it does **not** interpret constraints.
- Ships the `apikey` admin verbs as a reusable command set.
### 0.1.3 admin additions
`0.1.3` adds **editable scopes** and a **reversible enable/disable toggle** with **no schema
change** (still `CurrentVersion = 2`). Both land on `IApiKeyAdminStore` and the
`ApiKeyAdminCommands` facade:
- `IApiKeyAdminStore.SetScopesAsync(keyId, scopes, ct)` — replaces a key's scope set; never
touches the secret. Returns `false` if the key is unknown.
- `IApiKeyAdminStore.SetEnabledAsync(keyId, enabled, whenUtc, ct)` — clears (`enabled: true`) or
sets (`enabled: false`) `revoked_utc` regardless of current state; leaves `secret_hash` and
`last_used_utc` untouched (the distinction from rotate). Returns `false` if the key is unknown.
- `ApiKeyAdminCommands.SetScopesAsync(...)` — audited `set-scopes` verb (records scope **count**,
not contents); returns `KeyActionResult`.
- `ApiKeyAdminCommands.SetEnabledAsync(...)` — audited `enable-key` / `disable-key` verb;
returns `KeyActionResult`.
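The store-level semantics described above can be modelled in a few lines. This is an illustrative in-memory model only, not the library's C# implementation: the class and method names are hypothetical, while the field names mirror the `revoked_utc`, `secret_hash`, and `last_used_utc` columns mentioned in the text.

```javascript
// Illustrative in-memory model of the 0.1.3 admin additions (not the real store).
class InMemoryKeyStore {
  constructor() {
    this.keys = new Map();
  }

  // Test helper: seed a key record.
  add(keyId, secretHash, scopes) {
    this.keys.set(keyId, {
      secretHash,
      scopes: new Set(scopes),
      revokedUtc: null,
      lastUsedUtc: null,
    });
  }

  // Replace the scope set; never touches the secret. False if the key is unknown.
  setScopes(keyId, scopes) {
    const rec = this.keys.get(keyId);
    if (!rec) return false;
    rec.scopes = new Set(scopes);
    return true;
  }

  // enabled=true clears revokedUtc, enabled=false sets it, regardless of the
  // current state; secretHash and lastUsedUtc are left alone. False if unknown.
  setEnabled(keyId, enabled, whenUtc) {
    const rec = this.keys.get(keyId);
    if (!rec) return false;
    rec.revokedUtc = enabled ? null : whenUtc;
    return true;
  }
}
```

The point of the model is the contrast with rotate: disabling and re-enabling a key round-trips `revokedUtc` while the stored hash stays byte-for-byte the same.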
## `ZB.MOM.WW.Auth.AspNetCore`
- Canonical `ClaimTypes` constants (name, display, username, role, scope-id).
+8 -6
@@ -1,12 +1,14 @@
 # Configuration validation — gaps & adoption backlog
 Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to adopt
-the shared `ZB.MOM.WW.Configuration` library. The library is **BUILT @ 0.1.0** (27 tests) at
-[`../../ZB.MOM.WW.Configuration/`](../../ZB.MOM.WW.Configuration/) but **NOT YET ADOPTED** by any
-app — so every item below is an *adoption* item, not a library-build item. This mirrors the Auth /
-UI-Theme / Health pattern: the shared library is built first; adoption is opt-in and tracked here,
-not forced. (Unlike the observability pass, there is **no in-pass sister-repo adoption** in this
-release.)
+the shared `ZB.MOM.WW.Configuration` library. The library is **BUILT @ 0.1.0** (42 tests) at
+[`../../ZB.MOM.WW.Configuration/`](../../ZB.MOM.WW.Configuration/) and was **ADOPTED across all three
+apps on 2026-06-01** — published to the Gitea feed, then consumed on each repo's local default branch
+(merged, **not yet pushed** to remotes). The adoption items below are now largely closed: MxGateway +
+ScadaBridge migrated to `OptionsValidatorBase`/`AddValidatedOptions` behaviour-preservingly (validator
+messages byte-identical), ScadaBridge's `StartupValidator`→`ConfigPreflight`, and OtOpcUa gained
+net-new `Ldap`/`OpcUa` validators (plus a follow-on pass: real `Security:Ldap` binding, `ValidateOnStart`
+wired for ScadaBridge Cluster/HealthMonitoring, and assorted hardening).
 Status legend: ⛔ gap · 🟡 partial · ✅ matches.
+55
@@ -181,3 +181,58 @@ app is opt-in and tracked here, not forced.
unit migration (Gap U1) and the Meter rename (Gap N1) are deferred from the initial MxGateway
adoption (Task #9). They are breaking dashboard/alert changes requiring ops coordination and
are tracked as separate backlog items #6 and #7 in the adoption backlog above.
## Adoption status — 2026-06-01 (DONE)
`ZB.MOM.WW.Telemetry` + `ZB.MOM.WW.Telemetry.Serilog` (`0.1.0`) were adopted across **all three**
sister apps in one pass, behaviour-preserving. Each adoption landed on a per-repo branch
`feat/adopt-zb-telemetry` (one commit per task). Plan + design:
[`docs/plans/2026-06-01-telemetry-library-adoption.md`](../../docs/plans/2026-06-01-telemetry-library-adoption.md).
> **Correction:** the prior claim that *"MxAccessGateway logging was adopted (MEL → Serilog) on its
> own branch"* was **false on `main`** — MxGateway was still MEL-only, and its `MxGateway.Server`
> meter was never exported. The full MEL→Serilog migration **and** the metrics export both landed
> in this 2026-06-01 pass.
| Repo | `AddZbTelemetry` (Resource + std instrumentation + Prometheus) | `/metrics` | Logging | Meter (unchanged) |
|---|---|---|---|---|
| **OtOpcUa** | ✅ replaced hand-rolled `ObservabilityExtensions` | ✅ `/metrics` (path unchanged) | ✅ `AddZbSerilog` (sinks moved to `appsettings`; `LogContextEnricher` kept) | `ZB.MOM.WW.OtOpcUa` |
| **ScadaBridge** | ✅ added in `BindSharedOptions` (both Central + Site roots) | ✅ Central; mapped on Site too (see follow-on) | ⚠️ **kept `LoggerConfigurationFactory`** + added shared `TraceContextEnricher` — did **not** adopt `AddZbSerilog` | (none yet; #9) |
| **MxAccessGateway** | ✅ exports existing `GatewayMetrics` | ✅ new `/metrics` | ✅ MEL→`AddZbSerilog`; `GatewayLogRedactor` exposed via `ILogRedactor` seam (`GatewayLogRedactorSeam`); `GatewayLogScope`/middleware kept as-is | `MxGateway.Server` (name + `ms` units unchanged) |
### Accepted scope decisions (deviations from the original backlog)
- **ScadaBridge keeps `LoggerConfigurationFactory` (backlog #5 revised).** The factory implements a
documented governance contract (REQ-HOST-8 / Host-011/014/020/022): `ScadaBridge:Logging:MinimumLevel`
is the floor and **overrides** `Serilog:MinimumLevel`, with operator warnings. `AddZbSerilog`
hard-codes `MinimumLevel.Is(Information)` before `ReadFrom.Configuration`, which would invert that
precedence and silently drop the knob. So ScadaBridge keeps the factory and only **adds the shared
`TraceContextEnricher`** to it — gaining trace↔log correlation without regressing the contract. Full
`AddZbSerilog` adoption for ScadaBridge would first require teaching the shared bootstrap to accept a
caller-supplied minimum-level governance hook.
- **MxGateway keeps `GatewayLogScope` + request-logging middleware as-is.** The Serilog MEL provider
captures MEL `BeginScope` dictionaries as structured properties, so the scope/correlation code keeps
producing the same properties under Serilog. Only the provider swap + the `ILogRedactor` adapter were
needed.
## Follow-ons — DONE 2026-06-01
All the deferred follow-ons were then executed (branch `feat/telemetry-followons` per repo,
behaviour-preserving except the intentional, no-consumer-yet metric-shape change in #6/#7). Plan:
[`docs/plans/2026-06-01-telemetry-followons.md`](../../docs/plans/2026-06-01-telemetry-followons.md).
| Item | Status | What landed |
|---|---|---|
| **#6** MxGateway histogram `ms`→`s` | ✅ | 3 histograms record `.TotalSeconds`, unit `"s"`. Safe — never Prometheus-exported before, so no dashboards broke. |
| **#7** Meter rename → `ZB.MOM.WW.MxGateway` | ✅ | `GatewayMetrics.MeterName` renamed; `docs/Metrics.md` synced. |
| **#9** ScadaBridge app instruments | ✅ | `ScadaBridgeTelemetry` meter (`ZB.MOM.WW.ScadaBridge`) + first 4: `deployments.applied` (counter), `store_and_forward.queue.depth` (sync-safe cached gauge), `inbound_api.requests` (counter, bounded `method` tag), `site.connection.up` (balanced open/close gauge). |
| **#10/#11** OTLP opt-in | ✅ | All 3 apps read `<App>:Telemetry:Exporter` (`Prometheus`\|`Otlp`) + `:OtlpEndpoint`, default Prometheus. Setting OTLP also exports OtOpcUa's spans (resolves the trace no-op) — once a collector endpoint is configured. |
| **Site-node `/metrics` scrape** | ✅ | ScadaBridge `NodeOptions.MetricsPort` (default **8084**, avoids the site `RemotingPort=8082` collision) + a second `Http1AndHttp2` Kestrel listener on the Site role; `StartupValidator` enforces MetricsPort ≠ Remoting/Grpc. |
| Serilog version drift | ✅ | OtOpcUa `Serilog.AspNetCore`/`.Extensions.Hosting`/`.Settings.Configuration` aligned to `10.0.0` (family-consistent). |
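The exporter opt-in and the metrics-port rule from the table can be sketched as two small functions. This is a hedged illustration, not the apps' C# code: the function names are hypothetical, `<App>:Telemetry:OtlpEndpoint` is one reading of the abbreviated `:OtlpEndpoint` key, treating a missing endpoint as an error is an assumption, and `grpcPort` is a stand-in name for whatever the gRPC port option is called.

```javascript
// Hypothetical helpers; only the config keys, the Prometheus default, and the
// 8084 default port come from the table above.
function resolveExporter(config, app) {
  const kind = config[`${app}:Telemetry:Exporter`] ?? 'Prometheus'; // default per the table
  if (kind === 'Prometheus') return { kind };
  if (kind === 'Otlp') {
    const endpoint = config[`${app}:Telemetry:OtlpEndpoint`];
    // ASSUMPTION: selecting Otlp without an endpoint is a configuration error.
    if (!endpoint) throw new Error(`${app}: Otlp selected but no OtlpEndpoint set`);
    return { kind, endpoint };
  }
  throw new Error(`${app}: unknown exporter "${kind}"`);
}

// Returns a list of validation errors; empty means the ports do not collide.
function validateMetricsPort({ metricsPort = 8084, remotingPort, grpcPort }) {
  const errors = [];
  if (metricsPort === remotingPort) errors.push('MetricsPort must differ from RemotingPort');
  if (metricsPort === grpcPort) errors.push('MetricsPort must differ from GrpcPort');
  return errors;
}
```

With no exporter key set, `resolveExporter({}, 'ScadaBridge')` falls back to Prometheus, which matches the "default Prometheus" behaviour the table describes.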
**Still open (not code — operational/future):**
- **OTLP is opt-in but unexercised** until an OTel collector endpoint is deployed and the
`<App>:Telemetry:Exporter=Otlp` + `:OtlpEndpoint` config is set. The wiring is in place; only a
collector is missing.
- **Further ScadaBridge instruments** beyond the first 4 are additive future work (not blocking).
+14 -7
@@ -40,16 +40,20 @@ Serilog with the same options as enricher properties and adds `TraceContextEnric
 `node.role`) populates both the OTel Resource and the Serilog enrichers, so a metric, a span, and
 a log line from the same node carry identical dimensions and join up in a backend.
-One adoption happens **in this task**: MxAccessGateway migrates off MEL onto `AddZbSerilog`. All
-other app wiring is follow-on, consistent with how Auth and UI-Theme are structured.
+**Adopted across all three apps on 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo,
+behaviour-preserving). Note: MxAccessGateway's MEL→Serilog migration was *not* actually done at
+library-build time despite an earlier claim — it landed in this adoption pass, along with the
+metrics export. See [`GAPS.md` → Adoption status — 2026-06-01](GAPS.md) for the per-repo result,
+the accepted scope decisions (ScadaBridge keeps `LoggerConfigurationFactory`; MxGateway keeps its
+log-scope code), and the deferred follow-ons.
 ## Status by project
 | Project | OTel SDK today | Metrics today | Tracing today | Logging today | Enrichers today | Adoption status |
 |---|---|---|---|---|---|---|
-| **OtOpcUa** | ✅ full SDK (`WithMetrics`+`WithTracing`) | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog (Console+File) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope) | Not started (follow-on) |
-| **MxAccessGateway** | ⛔ none (hand-rolled `Meter`) | 🟡 20 instruments (`mxgateway.*`); **never exported** | ⛔ none | **Serilog (migrated from MEL — adopted)** | `SiteId`/`NodeRole`/`NodeHostname` (via `AddZbSerilog`); session/worker enrichers via `LogContext.PushProperty` | **Logging adopted; OTel metrics/traces follow-on** |
-| **ScadaBridge** | ⛔ (`OpenTelemetry.Api` CVE-patch only) | ⛔ zero instruments | ⛔ none | Serilog (Console+File) | `SiteId`/`NodeRole`/`NodeHostname` (process-level; strongest set) | Not started (follow-on) |
+| **OtOpcUa** | ✅ full SDK via `AddZbTelemetry` | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog via `AddZbSerilog` (sinks in `appsettings`) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope, kept) + shared | ✅ **Adopted 2026-06-01** |
+| **MxAccessGateway** | `AddZbTelemetry` exports `GatewayMetrics` | 20 instruments (`mxgateway.*`) now exported; new `/metrics` | ⛔ none | **Serilog (migrated from MEL in this pass)** | `SiteId`/`NodeRole`/`NodeHostname` via `AddZbSerilog`; `GatewayLogScope` kept; `ILogRedactor` seam | ✅ **Adopted 2026-06-01** |
+| **ScadaBridge** | `AddZbTelemetry` (both roots) | ✅ Resource + std instrumentation; `/metrics` (Central) | ⛔ none | Serilog via `LoggerConfigurationFactory` (kept) + shared `TraceContextEnricher` | `SiteId`/`NodeRole`/`NodeHostname` (process-level) + trace context | ✅ **Adopted 2026-06-01** (logging via factory, not `AddZbSerilog` — see GAPS) |
 See each project's [`current-state/<project>/CURRENT-STATE.md`](current-state/) for the
 code-verified detail and its adoption plan.
@@ -100,8 +104,11 @@ hinge that makes a metric, a span, and a log line from the same node carry ident
 ## Component status
-**Status: Built @ 0.1.0. MxAccessGateway MEL → Serilog logging adopted (on its own branch).
-OtOpcUa and ScadaBridge telemetry adoption is follow-on, tracked in [`GAPS.md`](GAPS.md).**
+**Status: Built @ 0.1.0 and published to the Gitea NuGet feed. Adopted across all three apps on
+2026-06-01** (OtOpcUa, MxAccessGateway, ScadaBridge — branch `feat/adopt-zb-telemetry` per repo).
+The MxAccessGateway MEL→Serilog migration and metrics export both landed in this pass (they were
+not actually done beforehand despite an earlier claim). Per-repo result + deferred follow-ons:
+[`GAPS.md` → Adoption status — 2026-06-01](GAPS.md).
 The shared library lives at
 [`~/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`](../../ZB.MOM.WW.Telemetry/) (.NET 10; 2 packages —
+32
@@ -3,6 +3,38 @@
Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach adoption of the `ZB.MOM.WW.Theme` shared RCL. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
> **✅ ADOPTED 2026-06-03 (local-only).** Backlog #2–#4 implemented across all three apps on each repo's
> **`feat/adopt-zb-theme`** branch — full canonical cutover (SPEC §7): `<ThemeHead/>`/`<ThemeScripts/>`,
> thin `MainLayout`→`<ThemeShell>` + `NavRailItem`/`NavRailSection`, per-app `theme.css`/IBM-Plex fonts/
> `nav-state.js` deleted, `<LoginCard>` sign-in, and `StatusPill` (OtOpcUa's dead `StatusBadge` deleted;
> MxGateway's `StatusBadge` redirected to a thin `StatusPill` adapter; inline domain `.chip-*` kept as page
> content per §6). **Library first enhanced to `0.2.0`** — nav-expand persistence promoted INTO the kit
> (`NavRailSection.Key`→`data-nav-key` + a localStorage `nav-state.js` enhancer emitted by a new
> `<ThemeScripts/>`), so all three apps get uniform persistence from one source (OtOpcUa's bespoke
> cookie/JS-interop nav island retired). 0.2.0 published to the Gitea feed; 44 bUnit tests. **MxGateway
> additionally gained a net-new Blazor `<LoginCard>` `/login` page** reusing its existing hardened
> `POST /login` endpoint (antiforgery + `SanitizeReturnUrl` + `SignInAsync` preserved). Every task spec+code
> reviewed (high-risk via serial spec→code; the MxGateway login via an Opus security review), then
> **fast-forward-merged into each repo's local default and PUSHED to origin (gitea) 2026-06-03** (in sync;
> `feat/*` kept locally): OtOpcUa `master`@`11de14d`, ScadaBridge `main`@`58352a6`, MxGateway `main`@`73e54e2`.
> Plan: `docs/plans/2026-06-03-ui-theme-adoption*.md`. The ⛔/🟡 cells below describe the PRE-adoption
> divergence (kept for history).
>
> **Post-adoption CSS prune (2026-06-03, branch `chore/theme-css-prune` per app).** An audit found each app's
> kept `site.css` still carried the old shell CSS the kit now owns — broader than first logged. Pruned:
> **OtOpcUa** shed a near-verbatim copy of the kit's `layout.css` (`.app-shell`/`.side-rail`/`.rail-link`/
> `.rail-foot`/`.login-*`) plus dead `#sidebar-collapse` (kit emits `#theme-rail`) and `.rail-eyebrow-chevron`
> (167 lines), keeping only app-only `.rail-eyebrow` + `.chip-alert`/`.chip-caution`; **ScadaBridge** shed the
> dead `.sidebar`/`.nav-link`/`.nav-section-toggle` block (95), keeping `#reconnect-modal`/`.script-editor-modal`;
> **MxGateway** shed the dead `.sidebar` block + orphaned `.dashboard-login`/`.login-card` (106), keeping
> `.app-bar` (still used by `/denied`) + the `.chip` override. Each verified unreferenced before removal; all
> three build clean (0 warn/0 err). OtOpcUa's copy was the notable one — it *overrode* the kit, not just dead code.
> **Still deferred:** a kit-side `layout.css` `calc(100vh - 3.3rem)` review; and ScadaBridge's `Host` consumes the
> kit only **transitively via `CentralUI`** (no direct `PackageReference`) — builds green, but an implicit dependency.
>
> _Feed note: the same audit re-confirmed `ZB.MOM.WW.Theme 0.2.0` **is** genuinely on the Gitea feed (registration
> `count:1`, package base `versions:["0.2.0"]`, search `totalHits:1`) — the publish was real, not optimism._
---
## Divergence vs spec
@@ -1,9 +1,16 @@
 # Shared library: `ZB.MOM.WW.Theme`
-**Status: Built (`0.1.0`).** The RCL lives at
-[`scadaproj/ZB.MOM.WW.Theme/`](../../../ZB.MOM.WW.Theme/) — built and tested. Adoption
-by the three apps is follow-on, tracked in [`../GAPS.md`](../GAPS.md). Realizes
-[`../spec/SPEC.md`](../spec/SPEC.md).
+**Status: Built + Published + Adopted (`0.2.0`).** The RCL lives at
+[`scadaproj/ZB.MOM.WW.Theme/`](../../../ZB.MOM.WW.Theme/) — built, tested (44 bUnit tests), and
+**published to the Gitea NuGet feed**. **Adopted across all three apps on 2026-06-03** — merged to each repo's
+local default and **pushed to origin (gitea)**, in sync (see [`../GAPS.md`](../GAPS.md)).
+Realizes [`../spec/SPEC.md`](../spec/SPEC.md).
+`0.2.0` adds **shared nav-expand persistence**: `NavRailSection` gained a `Key` parameter (emitted as
+`data-nav-key`, defaulting to a slug of `Title`), a vendored `wwwroot/js/nav-state.js` localStorage enhancer
+(keyed by `data-nav-key`, prefix `zbnav:`, idempotent), and a new **`ThemeScripts`** component (sibling to
+`ThemeHead`) that emits the enhancer `<script defer>` before `</body>`. This lets every app persist nav
+expand-state from one shared, static-SSR-friendly mechanism (no per-app cookie/JS-interop island).
 ---
@@ -16,12 +23,14 @@ tokens-only or components-only consumers; all three apps consume the full kit.
 |---|---|---|
 | `ZB.MOM.WW.Theme` | `net10.0` Razor Class Library | Tokens + fonts + layout CSS + all components |
-Published to the Gitea NuGet feed; `Version 0.1.0`. SemVer — token changes are
-breaking (major bump). Build from `scadaproj/ZB.MOM.WW.Theme/`:
+Published to the Gitea NuGet feed; `Version 0.2.0`. SemVer — token changes are
+breaking (major bump); the `0.1.0 → 0.2.0` bump added nav persistence (`NavRailSection.Key` +
+`ThemeScripts` + `nav-state.js`) additively. Build from `scadaproj/ZB.MOM.WW.Theme/`:
 ```bash
 dotnet build -c Release # 0 warnings (TreatWarningsAsErrors)
-dotnet test # 32 bUnit tests
-./build/pack.sh # → ./artifacts/ZB.MOM.WW.Theme.0.1.0.nupkg
+dotnet test # 44 bUnit tests
+./build/pack.sh # → ./artifacts/ZB.MOM.WW.Theme.0.2.0.nupkg
+GITEA_NUGET_SOURCE=… GITEA_NUGET_KEY=… ./build/push.sh # publish to the Gitea feed
 ```
 ---
@@ -72,6 +81,18 @@ Place in `App.razor` `<head>` **after** the app's Bootstrap link.
 ---
+### `ThemeScripts`
+Emits the nav-state localStorage enhancer `<script src="_content/ZB.MOM.WW.Theme/js/nav-state.js" defer>`.
+No parameters. Place in `App.razor` **before `</body>`**. Persists each `NavRailSection`'s open/closed
+state (keyed by its `data-nav-key`) across navigation and reloads; pure client-side, works in static SSR.
+```razor
+<ThemeScripts />
+```
+---
 ### `ThemeShell`
 Canonical side-rail chassis. **Not a `LayoutComponentBase`** — delegated to from the app's
@@ -134,14 +155,15 @@ One rail navigation link. Wraps Blazor `<NavLink class="rail-link">`.
 ### `NavRailSection`
-Collapsible nav section group using CSS-only `<details open>` — no JavaScript, works in
-static Blazor SSR. Apps that need interactive cookie-persisted expand state may keep a
-bespoke interactive `NavSection` alongside this.
+Collapsible nav section group using CSS-only `<details open>` — no JavaScript required. Open/closed
+state is persisted in localStorage by `<ThemeScripts/>` (keyed by `Key` → `data-nav-key`); works in
+static Blazor SSR.
 | Parameter | Type | Required | Default | Notes |
 |---|---|---|---|---|
 | `Title` | `string` | Yes | — | Eyebrow label |
-| `Expanded` | `bool` | No | `true` | Initial open state |
+| `Key` | `string?` | No | slug of `Title` | Stable persistence key, emitted as `data-nav-key` |
+| `Expanded` | `bool` | No | `true` | Initial open state (before localStorage restore) |
 | `ChildContent` | `RenderFragment?` | No | `null` | `NavRailItem` children |
 ---
@@ -0,0 +1,341 @@
# Deployment & Environments — SCADA/OT family
> How the sister projects are deployed: environments, hosts, SSH access, Docker/Traefik
> topology, databases, and the full service/port map. Compiled **2026-06-03** by reading the
> actual compose/Traefik/SSH files (not docs alone). For the per-service **environment
> variables** see the companion [`env_vars.md`](env_vars.md).
>
> **Source confidence:** container/port/Traefik/DB facts below are read straight from the
> compose + `traefik/*.yml` files. SSH facts are from `~/.ssh/config` + `~/.ssh/known_hosts`.
> Where a fact is referenced in repo docs but not pinned in a config/script on this machine,
> it's marked _(referenced, not scripted in-repo)_ — don't treat those as automated.
---
## 1. Environment inventory
| Environment | Where it runs | What it is | Entry point |
|---|---|---|---|
| **ScadaBridge `docker`** | This Mac (Docker Desktop/OrbStack) | Full hub-and-spoke: 2 Central + 3 sites ×2 nodes | `http://localhost:9000` |
| **ScadaBridge `docker-env2`** | This Mac | Second isolated cluster: 2 Central + 1 site ×2 nodes | `http://localhost:9100` |
| **ScadaBridge `infra`** | This Mac | Shared backing services (MSSQL, OPC-UA sims, SMTP, REST, Playwright) — **not** LDAP (see shared GLAuth below) | n/a (deps) |
| **OtOpcUa `otopcua-dev`** | This Mac | 3 independent Akka clusters (MAIN + SITE-A + SITE-B) sharing one ConfigDb | `http://localhost:9200` |
| **MxAccessGateway** | `windev` (10.100.0.48), Windows | Windows-native gRPC gateway + per-session x86 worker (no Docker) | `http://10.100.0.48:5120` (gRPC) |
| **Production (VD03)** | `wonder-app-vd03.zmr.zimmer.com` | Single-node ScadaBridge + MxGateway prod host | see `docs/operations/` runbooks |
The three ScadaBridge stacks share one external Docker network **`scadabridge-net`**; the
OtOpcUa `otopcua-dev` stack runs on its **own** default network (`otopcua-dev_default`) and is
network-isolated from ScadaBridge. All local stacks can run simultaneously — host ports do not
collide (see [§7](#7-consolidated-host-port-map)).
> On this Apple-Silicon Mac, MSSQL runs under amd64 emulation (slow first-ready; the "platform
> does not match" warning is expected/benign). See [[scadabridge-local-deploy-gotchas]].
---
## 2. Hosts & SSH connectivity
### 2.1 Host inventory
| Host | Address | OS | Role | SSH port |
|---|---|---|---|---|
| **This Mac** | local | macOS (darwin) | Dev workstation — runs all local Docker stacks | n/a |
| **windev** | `10.100.0.48` | Windows | OtOpcUa Windows-service host **+** MxAccessGateway (gRPC `5120` / dashboard `5130`) | 22 |
| **fixture host** | `10.100.0.35` | Debian/Linux + Docker | OtOpcUa driver **integration-test fixtures** + a test SQL Server | 22 |
| **VD03 (prod)** | `wonder-app-vd03.zmr.zimmer.com` | Windows | Production single-node ScadaBridge + MxGateway | **2222** |
| **gitea** | `gitea.dohertylan.com` (`10.100.0.228`) | Linux | Git remotes + NuGet feed (`/api/packages/dohertj2/nuget`) | 22 |
All are on the private `10.x` lab network — a LAN/VPN connection is required.
### 2.2 How to connect (passwordless SSH)
Auth is **key-based (passwordless)** with `~/.ssh/id_ed25519` (a legacy `~/.ssh/id_rsa` exists
as fallback). Only **one** host alias is defined in `~/.ssh/config`:
```sshconfig
# ~/.ssh/config (verified)
Include ~/.orbstack/ssh/config # OrbStack local Linux VMs — use `ssh orb` / `orb` CLI
Host windev
HostName 10.100.0.48
User dohertj2
IdentityFile ~/.ssh/id_ed25519
# Port 22 (default)
```
| Target | Command | Notes |
|---|---|---|
| **windev** (Win host) | `ssh windev` | Configured alias; user `dohertj2`, key `id_ed25519`, port 22 |
| **fixture host** | `ssh dohertj2@10.100.0.35` | In `known_hosts`; **no** config alias — pass user explicitly; port 22, key-based |
| **VD03 (prod)** | `ssh dohertj2@wonder-app-vd03.zmr.zimmer.com -p 2222` | In `known_hosts` on **port 2222** (the only non-standard SSH port); user/key not pinned in config — confirm before use |
| **local Linux VMs** | `ssh orb` / `orb` | OrbStack-managed |
> ⚠️ `~/bin` is **empty** on this Mac. OtOpcUa's `CLAUDE.md` mentions an `lmxopcua-fix` helper "in
> `~/bin`" for controlling the `10.100.0.35` fixture containers — it is **not present here** (it's a
> Windows-side helper). On this machine, drive the fixture host with direct SSH, e.g.
> `ssh dohertj2@10.100.0.35 'docker compose -f /opt/otopcua-<driver>/docker-compose.yml up -d'`.
> Treat the exact remote paths/commands as _(referenced, not scripted in-repo)_ — verify on the host.
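
The connection table above can be captured in a small helper so the non-default user and port are not retyped each time. This is only a sketch: `ssh_target` and `check_ssh` are hypothetical names, not scripts that exist in any of the repos, and the VD03 user is assumed from the table (it is not pinned in `~/.ssh/config`, so confirm it before use).

```bash
#!/usr/bin/env bash
# Sketch: map the host inventory in section 2 to ssh arguments.
# `ssh_target` and `check_ssh` are hypothetical helpers, not repo scripts.

ssh_target() {
  case "$1" in
    windev)  echo "windev" ;;                # alias defined in ~/.ssh/config
    fixture) echo "dohertj2@10.100.0.35" ;;  # no alias, so the user is explicit
    # VD03 listens on 2222; the user is an assumption, confirm before use.
    vd03)    echo "-p 2222 dohertj2@wonder-app-vd03.zmr.zimmer.com" ;;
    *)       echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# BatchMode=yes makes ssh fail instead of prompting for a password, so a zero
# exit status means key-based login worked.
check_ssh() {
  local target
  target="$(ssh_target "$1")" || return 1
  # Word splitting is intended here: the vd03 target is "-p 2222 user@host".
  # shellcheck disable=SC2086
  ssh -o BatchMode=yes -o ConnectTimeout=5 $target true
}
```

For example, `check_ssh fixture && echo "fixture host reachable"` verifies passwordless access without opening an interactive session.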
---
## 3. ScadaBridge deployment
.NET 10 + Akka.NET. One image `scadabridge:latest` (built by `docker/build.sh`) backs every node;
role is chosen by `SCADABRIDGE_CONFIG` (`Central`|`Site`) → `appsettings.{role}.json`. Central is a
2-node Akka cluster (split-brain resolver = `keep-oldest`); each Site is its **own** 2-node Akka
cluster reached from Central via ClusterClient.
### 3.1 `docker/` — primary 3-site cluster (network `scadabridge-net`)
| Service | Container | Host→container ports | Role | Volumes |
|---|---|---|---|---|
| central-a | `scadabridge-central-a` | `9001:5000` (UI+Inbound API), `9011:8081` (Akka) | Central | `central-node-a/appsettings.Central.json` (ro), `…/logs` |
| central-b | `scadabridge-central-b` | `9002:5000`, `9012:8081` | Central | `central-node-b/…` |
| site-a-a | `scadabridge-site-a-a` | `9021:8082` (Akka), `9023:8083` (gRPC) | Site | `site-a-node-a/{appsettings.Site.json,data,logs}` |
| site-a-b | `scadabridge-site-a-b` | `9022:8082`, `9024:8083` | Site | `site-a-node-b/…` |
| site-b-a | `scadabridge-site-b-a` | `9031:8082`, `9033:8083` | Site | `site-b-node-a/…` |
| site-b-b | `scadabridge-site-b-b` | `9032:8082`, `9034:8083` | Site | `site-b-node-b/…` |
| site-c-a | `scadabridge-site-c-a` | `9041:8082`, `9043:8083` | Site | `site-c-node-a/…` |
| site-c-b | `scadabridge-site-c-b` | `9042:8082`, `9044:8083` | Site | `site-c-node-b/…` |
| traefik | `scadabridge-traefik` | `9000:80` (Central LB), `8180:8080` (dashboard) | LB | `traefik/{traefik,dynamic}.yml` (ro) |
All `restart: unless-stopped`; image `scadabridge:latest` (traefik `traefik:v3.4`).
**Access:** Central UI/API via LB `http://localhost:9000`; direct nodes `:9001`/`:9002`; Traefik
dashboard `http://localhost:8180`; Management API `http://localhost:9000/management`; health
`…/health/ready` + `…/health/active`.
### 3.2 `docker-env2/` — secondary 1-site cluster (same `scadabridge-net`)
| Service | Container | Host→container ports | Role |
|---|---|---|---|
| central-a | `scadabridge-env2-central-a` | `9101:5000`, `9111:8081` | Central |
| central-b | `scadabridge-env2-central-b` | `9102:5000`, `9112:8081` | Central |
| site-x-a | `scadabridge-env2-site-x-a` | `9121:8082`, `9123:8083` | Site |
| site-x-b | `scadabridge-env2-site-x-b` | `9122:8082`, `9124:8083` | Site |
| traefik | `scadabridge-env2-traefik` | `9100:80` (LB), `8181:8080` (dashboard) | LB |
**Access:** LB `http://localhost:9100`; direct `:9101`/`:9102`; dashboard `http://localhost:8181`.
This cluster's DBs and **auth cookie name** are distinct from `docker/` so the two can run on
`localhost` at once — cookie `ZB.MOM.WW.ScadaBridge.Auth.env2` vs the default; see
[[scadabridge-local-deploy-gotchas]].
### 3.3 `infra/` — shared backing services (network `scadabridge-net`)
| Service | Container | Image | Host ports | Purpose |
|---|---|---|---|---|
| mssql | `scadabridge-mssql` | `mcr.microsoft.com/mssql/server:2022-latest` | `1433:1433` | SQL Server — Central DBs for **both** clusters; named vol `scadabridge-mssql-data`; init via `/docker-entrypoint-initdb.d/{setup,machinedata_seed,setup-env2}.sql` |
| opcua | `scadabridge-opcua` | `mcr.microsoft.com/iotedge/opc-plc:latest` | `50000:50000`, `8080:8080` | OPC-UA simulator 1 (`--unsecuretransport --autoaccept`) |
| opcua2 | `scadabridge-opcua2` | `…/opc-plc:latest` | `50010:50010`, `8081:8080` | OPC-UA simulator 2 |
| smtp | `scadabridge-smtp` | `axllent/mailpit:latest` | `1025:1025`, `8025:8025` | SMTP sink + web UI (`http://localhost:8025`) |
| restapi | `scadabridge-restapi` | local build `./restapi` | `5200:5200` | Test REST endpoint |
| playwright | `scadabridge-playwright` | `mcr.microsoft.com/playwright:v1.58.2-noble` | `3000:3000` | Browser-automation server |
> **LDAP is NOT started by `infra/`.** The per-app `scadabridge-ldap` container has been retired
> (commented out in `infra/docker-compose.yml`). All three apps (ScadaBridge, OtOpcUa, MxAccessGateway)
> now share a single **`zb-shared-glauth`** container on the Linux fixture host **`10.100.0.35:3893`**
> (`baseDN dc=zb,dc=local`, Transport=None). Source of truth and deploy/verify runbook:
> **`scadaproj/infra/glauth/`** (`config.toml` + `docker-compose.yml` + `README.md`); deploy by
> scp-ing those two files to `10.100.0.35` and running `docker compose up -d`.
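
The "scp two files, then `docker compose up -d`" deploy described above might be scripted roughly as follows. This is a sketch under stated assumptions: `deploy_glauth`, `run`, and the `DRY_RUN` switch are hypothetical, and the remote directory is not given in the source, so it has to be passed in rather than guessed. The authoritative procedure is the `README.md` in `scadaproj/infra/glauth/`.

```bash
#!/usr/bin/env bash
# Sketch of the shared-GLAuth deploy: copy config.toml + docker-compose.yml to
# the fixture host and start the container there.
# Assumptions: `deploy_glauth`, `run` and DRY_RUN are hypothetical helpers.

GLAUTH_HOST="dohertj2@10.100.0.35"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: deploy_glauth <local-glauth-dir> <remote-dir>
deploy_glauth() {
  local src="${1:-}" remote_dir="${2:-}"
  if [ -z "$src" ] || [ -z "$remote_dir" ]; then
    echo "usage: deploy_glauth <local-glauth-dir> <remote-dir>" >&2
    return 2
  fi
  # Copy exactly the two files named in the runbook.
  run scp "$src/config.toml" "$src/docker-compose.yml" "$GLAUTH_HOST:$remote_dir/" || return 1
  # Start (or update) the container on the remote host.
  run ssh "$GLAUTH_HOST" "cd '$remote_dir' && docker compose up -d"
}
```

Running it once with `DRY_RUN=1` prints the two commands without touching the fixture host, which is a cheap way to check the paths first.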
### 3.4 Traefik (ScadaBridge)
Both clusters use a file provider + insecure API dashboard. `traefik.yml`: entrypoint `web:80`,
`api.dashboard: true / insecure: true`, file provider `dynamic.yml`. `dynamic.yml` router
`central` (`PathPrefix(/)` → service `central`) load-balances the two Central containers with an
**active health check** on `/health/active` (interval 5s, timeout 3s) — so traffic only routes to
the active leader (standby returns 503 and is dropped from rotation):
```yaml
# docker/traefik/dynamic.yml (env2 points at scadabridge-env2-central-a/-b)
http:
routers: { central: { rule: "PathPrefix(`/`)", service: central, entryPoints: [web] } }
services:
central:
loadBalancer:
healthCheck: { path: /health/active, interval: 5s, timeout: 3s }
servers: [ {url: "http://scadabridge-central-a:5000"}, {url: "http://scadabridge-central-b:5000"} ]
```
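
Because the standby answers `/health/active` with 503, the same probe Traefik uses can be run by hand against the direct node ports to see which Central node is currently active. A minimal sketch, assuming the direct ports from sections 3.1 and 3.2; `http_status` and `active_central_ports` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: ask each Central node directly whether it is the active one.
# The active node returns 200 on /health/active, the standby returns 503.
# `http_status` and `active_central_ports` are hypothetical helpers.

# Print the HTTP status code for a URL ("000" if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$1"
}

# Print the direct host port of every Central node that reports active.
# Defaults to the docker/ cluster; pass "9101 9102" for env2.
active_central_ports() {
  local port
  if [ "$#" -eq 0 ]; then
    set -- 9001 9002
  fi
  for port in "$@"; do
    if [ "$(http_status "http://localhost:${port}/health/active")" = "200" ]; then
      echo "$port"
    fi
  done
}
```

In a healthy cluster `active_central_ports` should print exactly one port; two ports would suggest a split-brain, and none means no node has taken the active role yet.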
### 3.5 Databases (ScadaBridge)
- **Central → MSSQL** (`scadabridge-mssql:1433`), app login `scadabridge_app` / `ScadaBridge_Dev1#` 🔒(dev-only):
- `docker/`: `ScadaBridgeConfig` + `ScadaBridgeMachineData`
- `docker-env2/`: `ScadaBridgeConfig2` + `ScadaBridgeMachineData2`
- Created by `infra/mssql/setup.sql` + `setup-env2.sql` at MSSQL init; EF Core migrations run on Central startup; `docker-env2/init-db.sh` ensures the env2 DBs before deploy; `seed-sites.sh` seeds Site rows post-deploy.
- **Site → SQLite**, per node under the mounted `…/data` volume (`SiteDbPath`, plus a store-and-forward DB). Not networked, not replicated across hosts.
### 3.6 Deploy commands (ScadaBridge)
```bash
cd ~/Desktop/ScadaBridge
cd infra && docker compose up -d # 1) backing services (MSSQL, OPC-UA, SMTP, REST) — LDAP is shared glauth on 10.100.0.35 (scadaproj/infra/glauth/)
bash docker/build.sh # 2) create scadabridge-net (if missing) + build scadabridge:latest
bash docker/deploy.sh # 3) up -d --force-recreate; prints access points (9000/9001/9002/8180)
bash docker/seed-sites.sh # 4) seed sites + data-connections (optional)
# env2 cluster:
bash docker-env2/deploy.sh # reuses the image; runs init-db.sh; ports 9100/9101/9102/8181
```
> **Caveat:** `deploy.sh` does `up -d --force-recreate`, starting both Central nodes at once — they
> can split-brain on a simultaneous start. Start Central **sequenced** (central-a → wait `/health/active`
> 200 → central-b). Central also requires `ScadaBridge__InboundApi__ApiKeyPepper` (dev value is inline in
> both composes). Full detail: [[scadabridge-local-deploy-gotchas]].
---
## 4. OtOpcUa deployment (`otopcua-dev`)
.NET 10 OPC-UA server. **Three independent Akka clusters** share the single `OtOpcUa` ConfigDb
(multi-tenancy via the `ServerCluster` table); Akka isolation is by disjoint seed lists (same
system name `otopcua`, internal remoting port `4053`). Built locally from `docker-dev/Dockerfile`
→ image `otopcua-host:dev`. **No per-app LDAP container:** `docker-dev` is un-stubbed
(`Authentication__Ldap__DevStubMode` removed) and binds the **shared GLAuth** at
`10.100.0.35:3893` (`baseDN dc=zb,dc=local`, Transport=None). Start the shared glauth first via
`scadaproj/infra/glauth/` if it is not already running.
| Service | Container | Host→container ports | Cluster / role |
|---|---|---|---|
| sql | (`otopcua-dev-sql-1`) | `14330:1433` | SQL Server 2022 — the shared `OtOpcUa` ConfigDb |
| cluster-seed | one-shot | — | `mssql-tools` running `/seed/entrypoint.sh` (idempotent ServerCluster/ClusterNode seed) |
| admin-a | host | _(none — internal `:9000` UI behind Traefik)_ | MAIN, role `admin` (seed) |
| admin-b | host | _(none)_ | MAIN, role `admin` (joins admin-a) |
| driver-a | host | `4840:4840` (OPC UA) | MAIN, role `driver` |
| driver-b | host | `4841:4840` | MAIN, role `driver` |
| site-a-1 | host | `4842:4840` | SITE-A, `admin,driver` (seed) |
| site-a-2 | host | `4843:4840` | SITE-A, `admin,driver` |
| site-b-1 | host | `4844:4840` | SITE-B, `admin,driver` (seed) |
| site-b-2 | host | `4845:4840` | SITE-B, `admin,driver` |
| traefik | host | `9200:80` (Admin UI LB), `8089:8080` (dashboard) | `traefik:v3.1` |
- **OPC UA endpoints:** `opc.tcp://localhost:4840` (driver-a) … `:4845` (site-b-2). Admin nodes serve no OPC UA.
- **Admin UI (Traefik, sticky cookie `otopcua_lb`, health-checked on `/health/active`):**
- MAIN cluster: `http://localhost:9200`
- SITE-A: `http://site-a.localhost:9200` · SITE-B: `http://site-b.localhost:9200` (Host-header routing; macOS auto-resolves `*.localhost`)
- Traefik dashboard: `http://localhost:8089`
- **DB:** `sql` service, `14330:1433`, SA `OtOpcUa!Dev123` 🔒(dev-only), database `OtOpcUa`; EF auto-migrates on host start, then `cluster-seed` inserts the 3 ServerCluster + 6 ClusterNode rows.
- **Deploy:** `docker compose -f docker-dev/docker-compose.yml up -d --build` ; tear down with `… down -v`.
- **Galaxy link:** driver nodes resolve `GALAXY_MXGW_API_KEY` and connect out to MxAccessGateway (see §5).
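
After `up -d --build`, a quick way to confirm that all six OPC UA listeners came up is to probe the published ports. The sketch below is an illustration only: `opcua_endpoints`, `port_open`, and `probe_opcua` are hypothetical helpers, and the probe uses bash's `/dev/tcp` redirection, so it needs bash rather than plain `sh`. A successful TCP connect shows that something is listening; it does not prove that the OPC UA handshake works.

```bash
#!/usr/bin/env bash
# Sketch: enumerate the six published OPC UA endpoints and probe each TCP port.
# Helper names are hypothetical; /dev/tcp is a bash feature.

# Print opc.tcp://localhost:4840 through :4845 (driver-a through site-b-2).
opcua_endpoints() {
  local port
  for port in $(seq 4840 4845); do
    echo "opc.tcp://localhost:${port}"
  done
}

# Return 0 if a TCP connection to host:port can be opened.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Print "<endpoint> open" or "<endpoint> closed" for every endpoint.
probe_opcua() {
  local url port
  opcua_endpoints | while read -r url; do
    port="${url##*:}"   # strip everything up to the last colon
    if port_open localhost "$port"; then
      echo "$url open"
    else
      echo "$url closed"
    fi
  done
}
```

Calling `probe_opcua` prints one line per endpoint, which makes a single missing driver or site node easy to spot.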
> **Integration-test fixtures (separate from this stack)** run on the Linux **fixture host
> `10.100.0.35`** (Modbus `:5020`, Allen-Bradley `:44818`, S7 `:102`, OPC-UA `:50000`, SQL `:14330`).
> Those are test endpoints, not the deployed app; per-fixture env defaults are in [`env_vars.md`](env_vars.md) §1.3.
---
## 5. MxAccessGateway deployment (Windows-native, no Docker)
Two processes: an **x64 .NET 10 Server** (ASP.NET Core gRPC + Blazor dashboard) and a **per-session
x86 .NET 4.8 Worker** that owns the 32-bit AVEVA MXAccess COM/STA. Windows-only. Deployed on
**`windev` (10.100.0.48)** and **VD03**, run as a **Windows Service via NSSM** (config delivered as
`Kestrel__Endpoints__…` environment variables, not `appsettings.json`).
### 5.1 Endpoint/port map
| Endpoint | Default URL | Protocol | Config key | Purpose |
|---|---|---|---|---|
| **Http (gRPC)** | `http://0.0.0.0:5120` (h2c) | HTTP/2 cleartext | `Kestrel__Endpoints__Http__Url` / `__Protocols=Http2` | Public gRPC: sessions, MxCommand/MxEvent, Galaxy browse |
| **Dashboard** | `http://0.0.0.0:5130` | HTTP/1.1 | `Kestrel__Endpoints__Dashboard__Url` | Blazor dashboard + SignalR hubs + `/login` |
Local dev (`launchSettings.json`): gRPC `http://localhost:5120` (https dev profile adds `7121`).
TLS optional — set `…Http__Url=https://…`; the gateway auto-generates a self-signed cert if none is
supplied (`docs/GatewayConfiguration.md`). Dashboard cookie name is now configurable
(`MxGateway:Dashboard:CookieName`).
### 5.2 Run / host
```powershell
# local dev
dotnet run --project src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
# the x86 worker must be published first; path = MxGateway:Worker:ExecutablePath
dotnet build src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86
```
- **Worker model:** the Server spawns one `ZB.MOM.WW.MxGateway.Worker.exe` (x86) **per gRPC session**;
IPC over a named pipe (`\\.\pipe\mxgateway-<session>` + a per-session `MXGATEWAY_WORKER_NONCE`);
heartbeat 5s / grace 15s; max 64 concurrent sessions. The worker exits when the session closes.
- **Production hosts:** both `10.100.0.48` and `wonder-app-vd03` serve gRPC on `:5120` (per
`docs/GatewayConfiguration.md`).
### 5.3 Who connects to it
| Client | Connects to | Auth |
|---|---|---|
| OtOpcUa `GalaxyDriver` | `http://10.100.0.48:5120` (gRPC) | API key via `GALAXY_MXGW_API_KEY` (`mxgw_…` bearer) 🔒 |
| ScadaBridge MxGateway adapter | same gRPC endpoint `:5120` | API key |
---
## 6. Cross-project runtime data flow (deployed)
```
AVEVA Galaxy (Wonderware) ──MXAccess COM (32-bit)──► MxAccessGateway (windev:5120 gRPC / :5130 dashboard)
▲ ▲
OtOpcUa GalaxyDriver ───gRPC────┘ │ gRPC
(otopcua-dev: opc.tcp :4840–4845) │
│ OPC UA │
▼ │
ScadaBridge DCL ◄──OPC UA──┐ ┌──MxGateway adapter──┘
(docker :9000 / env2 :9100) └───┘
```
ScadaBridge reaches Wonderware data two ways: **(1)** OPC UA → OtOpcUa → gateway, or **(2)** its
MxGateway adapter → gateway directly. The break surface is the wire contracts (the gateway `.proto`s
and OtOpcUa's OPC-UA address space), not compile references.
---
## 7. Consolidated host port map
Every published host port across the local stacks (no collisions — all can run at once):
| Port | → Container:port | Service | Stack |
|---|---|---|---|
| 1025 | `scadabridge-smtp`:1025 | SMTP submission | infra |
| 1433 | `scadabridge-mssql`:1433 | SQL Server (ScadaBridge Central DBs) | infra |
| 3000 | `scadabridge-playwright`:3000 | Playwright server | infra |
| 3893 | `zb-shared-glauth`:3893 on **10.100.0.35** | LDAP (shared GLAuth — remote fixture host, not a local container) | scadaproj/infra/glauth/ |
| 5200 | `scadabridge-restapi`:5200 | Test REST API | infra |
| 8025 | `scadabridge-smtp`:8025 | Mailpit web UI | infra |
| 8080 | `scadabridge-opcua`:8080 | OPC-UA sim 1 web UI | infra |
| 8081 | `scadabridge-opcua2`:8080 | OPC-UA sim 2 web UI | infra |
| 50000 | `scadabridge-opcua`:50000 | OPC-UA sim 1 endpoint | infra |
| 50010 | `scadabridge-opcua2`:50010 | OPC-UA sim 2 endpoint | infra |
| 9000 | `scadabridge-traefik`:80 | **Central UI/API (LB)** | docker |
| 8180 | `scadabridge-traefik`:8080 | Traefik dashboard | docker |
| 9001 / 9002 | central-a / central-b :5000 | Central UI+Inbound API (direct) | docker |
| 9011 / 9012 | central-a / central-b :8081 | Akka remoting | docker |
| 9021–9024 | site-a-a/b :8082 / :8083 | Site A Akka / gRPC | docker |
| 9031–9034 | site-b-a/b :8082 / :8083 | Site B Akka / gRPC | docker |
| 9041–9044 | site-c-a/b :8082 / :8083 | Site C Akka / gRPC | docker |
| 9100 | `scadabridge-env2-traefik`:80 | **Central UI/API (LB)** | docker-env2 |
| 8181 | `scadabridge-env2-traefik`:8080 | Traefik dashboard | docker-env2 |
| 9101 / 9102 | env2 central-a / central-b :5000 | Central (direct) | docker-env2 |
| 9111 / 9112 | env2 central-a / central-b :8081 | Akka remoting | docker-env2 |
| 9121–9124 | env2 site-x-a/b :8082 / :8083 | Site X Akka / gRPC | docker-env2 |
| 14330 | `otopcua-dev` sql :1433 | SQL Server (`OtOpcUa` DB) | otopcua-dev |
| 4840 / 4841 | driver-a / driver-b :4840 | OPC UA (MAIN) | otopcua-dev |
| 4842 / 4843 | site-a-1 / site-a-2 :4840 | OPC UA (SITE-A) | otopcua-dev |
| 4844 / 4845 | site-b-1 / site-b-2 :4840 | OPC UA (SITE-B) | otopcua-dev |
| 9200 | `otopcua-dev` traefik :80 | **Admin UI (LB)** | otopcua-dev |
| 8089 | `otopcua-dev` traefik :8080 | Traefik dashboard | otopcua-dev |
**Remote (non-local) endpoints:** MxAccessGateway gRPC `10.100.0.48:5120` (h2c) / dashboard `:5130`;
production gRPC on `wonder-app-vd03:5120`. SSH: windev/fixture/gitea on `22`, **VD03 on `2222`**.
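
The "no collisions" claim can be re-checked whenever a compose file changes. The following is a minimal sketch, not an existing repo script: `duplicate_host_ports` is a hypothetical helper that expects plain `host:container` mappings, one per line (a mapping with a bind address such as `127.0.0.1:9000:80` would need an extra field shift).

```bash
#!/usr/bin/env bash
# Sketch: detect host-port collisions in a list of "host:container" mappings.
# `duplicate_host_ports` is a hypothetical helper.

# Reads mappings such as "9001:5000" on stdin and prints every host port that
# occurs more than once. No output means no collision.
duplicate_host_ports() {
  cut -d: -f1 | sort -n | uniq -d
}
```

For instance, `printf '%s\n' 9000:80 8180:8080 9100:80 8181:8080 | duplicate_host_ports` prints nothing, because the two Traefik instances publish different host ports even though their container ports match.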
---
## 8. Secrets & dev-only values
Every credential shown above (`OtOpcUa!Dev123`, `ScadaBridge_Dev1#`, the inline API-key peppers,
the `docker-dev` JWT signing key, the `mxgw_…` API key) is a **dev-only placeholder** for the local
stacks — never reuse as a real secret. Production injects real secrets out-of-band (NSSM env / secret
store), per ScadaBridge `docs/operations/inbound-api-key-reissue.md` (the VD03 runbook). The full
🔒 secret inventory and the `__`-env-var override forms are in [`env_vars.md`](env_vars.md) §5.
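
One way to guard against a dev placeholder leaking into a non-dev configuration is a simple text scan before deployment. This is a sketch only: `find_dev_placeholders` is a hypothetical helper, and it checks just the two literal passwords quoted in this document, not the peppers, JWT key, or API key whose values are not shown here.

```bash
#!/usr/bin/env bash
# Sketch: flag files that still contain the dev-only passwords named above.
# `find_dev_placeholders` is a hypothetical helper.

# Prints file:line:text for each hit. Returns 0 if a placeholder was found and
# 1 if none was found (these are grep's own exit codes).
find_dev_placeholders() {
  grep -HnF -e 'OtOpcUa!Dev123' -e 'ScadaBridge_Dev1#' -- "$@"
}
```

A pre-deploy check could then read `if find_dev_placeholders prod.env; then echo "dev secret found" >&2; exit 1; fi`, where `prod.env` stands for whatever file carries the production settings.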
## 9. Production (VD03) — pointer
`wonder-app-vd03.zmr.zimmer.com` (SSH `:2222`) runs the production single-node ScadaBridge and the
MxGateway (gRPC `:5120`). The production install is **not a scripted in-repo flow** here — the
operational procedures live in ScadaBridge `docs/operations/` (`failover-procedures.md`,
`maintenance-procedures.md`, `inbound-api-key-reissue.md`, `troubleshooting-guide.md`). Treat any
prod service/port specifics not in those runbooks as unverified.
@@ -0,0 +1,202 @@
# Design: Deploy `ZB.MOM.WW.Configuration` fleet-wide
**Date:** 2026-06-01
**Status:** Approved — ready for implementation planning (writing-plans).
**Scope:** Adopt the shared `ZB.MOM.WW.Configuration` library into all three sister apps
(OtOpcUa, MxAccessGateway, ScadaBridge).
> Every state claim below was **code-verified on 2026-06-01**, not taken from the
> `components/*/GAPS.md` prose — those docs proved unreliable in both directions (they
> claimed Health was un-adopted when it is fully adopted, and claimed Telemetry was
> adopted before it was). See memory `component-status-claims-are-optimistic`.
---
## 0. Why this module
Verified fleet-wide adoption state (real `PackageReference` + usage scan of the three
sister-app `src/` trees, plus Gitea-feed `curl`):
| Module | OtOpcUa | MxAccessGateway | ScadaBridge | Status |
|---|---|---|---|---|
| Health | ✅ | ✅ | ✅ | already deployed fleet-wide |
| Telemetry (observability) | ✅ | ✅ | ✅ | already deployed fleet-wide |
| **Configuration** | — | — | — | **chosen: not adopted anywhere** |
| Auth | — | — | — | not adopted |
| UI Theme | — | — | — | not adopted |
| Audit | — | — | — | not adopted |
Configuration was chosen as the next fleet-wide adoption because it is the same
cross-cutting-infra flavour as the already-done Health + Telemetry, it is the
lowest-risk (behaviour-preserving for the two heavy consumers), and it still delivers
real new value (OtOpcUa gains fail-fast startup validation it lacks entirely today).
### Decisions locked during brainstorming
- **Module:** Configuration.
- **OtOpcUa depth:** add **real** validators (net-new `Ldap`/`OpcUa` startup validation),
not just a package reference.
- **Rollout:** per-repo **sequential**, increasing risk order: Foundation → MxGateway →
OtOpcUa → ScadaBridge; each repo on its own branch, verified green before the next.
- **ScadaBridge `StartupValidator` → `ConfigPreflight`:** included in this pass.
---
## 1. Goal & scope
Move the config-validation **plumbing** (failure accumulation, the bind+validate+
`ValidateOnStart` triple, the pre-host raw-config aggregator) into the shared library so
it is written once; leave every **domain rule and failure message** per-project.
**Out of scope:**
- OtOpcUa's `DraftValidator` / `sp_ValidateDraft` — domain *content* validation over
database draft rows, dormant in `src/`, not the host-config concern this library owns.
- Any change to rule wording or validation semantics (behaviour-preserving except the
*additive* OtOpcUa validators).
---
## 2. The contract being adopted (verified public API)
From `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/`:
- **`OptionsValidatorBase<TOptions>`** — abstract `IValidateOptions<TOptions>`. Override
`protected abstract void Validate(ValidationBuilder, TOptions)`; the base creates the
builder, runs the override, and returns `Success` only when no failures were recorded
(else `Fail(builder.Failures)`).
- **`ValidationBuilder`** — rule primitives `Required`, `Port`, `HostPort`,
`PositiveTimeSpan`, `OneOf`, `MinCount`, plus `RequireThat(bool, message)` and
`Add(message)` for custom / cross-field rules. `Failures` / `IsValid` expose state.
- **`ServiceCollectionExtensions.AddValidatedOptions<TOptions, TValidator>(config, sectionPath)`**:
registers the validator via `TryAddEnumerable` (singleton) and calls
`AddOptions().Bind(section).ValidateOnStart()` in one call; returns the `OptionsBuilder` for chaining.
- **`ConfigPreflight.For(IConfiguration)`** — fluent pre-host checker for raw config
before the DI container exists: `RequireValue(key)`, `RequirePort(key)`,
`Require(key, predicate, reason)`, `When(condition, block)`, terminating in
`ThrowIfInvalid()` (throws `InvalidOperationException` listing all failures).
Library health: `dotnet test` → **42 passed, 0 failed** (the `CLAUDE.md` "27 tests" line
is stale-low; the suite passes regardless).
---
## 3. Foundation phase (must land before any repo adopts)
This is the part the status docs hide. Verified 2026-06-01:
1. **Pack + push the package.** `ZB.MOM.WW.Configuration` is **404 on the Gitea feed**
(`registration/zb.mom.ww.configuration/index.json`), while the known-adopted Health
package returns 200. `dotnet pack -c Release` then push the `.nupkg` to
`https://gitea.dohertylan.com/api/packages/dohertj2/nuget`.
2. **Per-app feed wiring** (all three `nuget.config` files): the `dohertj2-gitea`
`packageSourceMapping` currently routes only `ZB.MOM.WW.MxGateway.*`,
`ZB.MOM.WW.Health*`, `ZB.MOM.WW.Telemetry*`. Add
`<package pattern="ZB.MOM.WW.Configuration" />`. Without this, restore fails even with
the package on the feed.
3. **Central version pin** in each app's `Directory.Packages.props`:
`<PackageVersion Include="ZB.MOM.WW.Configuration" Version="0.1.0" />`.
4. **Verify gate:** `curl` the registration index → **200** before any repo work begins.
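
The verify gate in step 4 can be expressed as a small check that other scripts call before any repo work starts. A sketch under stated assumptions: the index URL is formed by joining the feed base from step 1 with the registration path quoted there, the feed is assumed to be readable without credentials, and `registration_url` and `package_on_feed` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch of the step-4 gate: the package's registration index must return 200.
# Assumptions: URL layout as quoted in step 1; helper names are hypothetical.

FEED="https://gitea.dohertylan.com/api/packages/dohertj2/nuget"

# Registration paths use the lower-cased package id.
registration_url() {
  local id
  id="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  echo "$FEED/registration/$id/index.json"
}

# Return 0 only when the feed answers 200 for the registration index.
package_on_feed() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$(registration_url "$1")" || true)"
  [ "$code" = "200" ]
}
```

Usage might be `package_on_feed ZB.MOM.WW.Configuration || { echo "not on feed yet" >&2; exit 1; }`; the same call with `ZB.MOM.WW.Health` serves as a control, since that package is known to return 200.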
---
## 4. Per-repo adoption (sequential)
Each repo: branch `feat/adopt-zb-configuration`, `PackageReference` (no version — central
package management), migrate, `dotnet build` + `dotnet test` green, then move on.
### Repo 1 — MxAccessGateway (medium; pure refactor)
- `PackageReference Include="ZB.MOM.WW.Configuration"` in
`src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj`.
- `GatewayOptionsValidator : IValidateOptions<GatewayOptions>` →
`: OptionsValidatorBase<GatewayOptions>`. Drop the private `List<string>` and the
`Count == 0 ? Success : Fail` tail (now the base's job). Map private helpers:
`AddIfBlank` → `Required`; `AddIfNotPositive` / `AddIfNegative` → `RequireThat(... , msg)`.
Keep `AddIfInvalidPath`, the `.exe`-extension rule, the cross-field
`HeartbeatGraceSeconds >= HeartbeatIntervalSeconds`, range checks, and all nine
sub-validators as `RequireThat`/`Add` custom rules. **Every message string unchanged.**
- `AddGatewayConfiguration`'s `AddOptions().BindConfiguration(SectionName).ValidateOnStart()` +
`AddSingleton<IValidateOptions<GatewayOptions>, GatewayOptionsValidator>()` →
`services.AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>(config, GatewayOptions.SectionName)`.
Keep the separate `IGatewayConfigurationProvider` registration.
### Repo 2 — OtOpcUa (lightest base, but net-new validation added)
- `PackageReference` in
`src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj`.
- New `LdapOptionsValidator : OptionsValidatorBase<LdapOptions>`
(`LdapOptions` lives in `ZB.MOM.WW.OtOpcUa.Security/Ldap/`): `Required` on Server /
SearchBase (and other non-optional fields). At `Program.cs:99`,
`AddOptions<LdapOptions>().Bind(GetSection("Ldap"))` →
`AddValidatedOptions<LdapOptions, LdapOptionsValidator>(config, "Ldap")`.
- New validator for the `OpcUa` section; replace the imperative
`GetSection("OpcUa").Bind(options)` at `OtOpcUaServerHostedService.cs:63` with validated
options resolved from DI. Exact rule list finalized in the implementation plan from the
real `OpcUaOptions` fields (ports → `Port`, endpoints → `HostPort`, required strings →
`Required`, durations → `PositiveTimeSpan`).
- New unit tests for both validators (valid config passes; each missing/invalid field
produces its message).
### Repo 3 — ScadaBridge (heaviest; refactor + preflight)
- `PackageReference` in `src/ZB.MOM.WW.ScadaBridge.Host/...csproj` and the module projects
that own validators (ClusterInfrastructure, Security, HealthMonitoring, AuditLog).
- Four `*OptionsValidator` → `OptionsValidatorBase<T>`:
  - `ClusterOptionsValidator`: `SeedNodes` ≥ 2 → `MinCount`; strategy ∈ set → `OneOf`;
    three positive `TimeSpan` → `PositiveTimeSpan`; cross-field heartbeat/threshold and
    `DownIfAlone`/`MinNrOfMembers` → `RequireThat`.
  - `SecurityOptionsValidator`: `Required` LdapServer / LdapSearchBase (JwtSigningKey stays
    validated in the `JwtTokenService` ctor, unchanged).
  - `HealthMonitoringOptionsValidator`: three `PositiveTimeSpan` + cross-field
    `CentralOfflineTimeout >= OfflineTimeout` → `RequireThat`. Preserve the idempotent
    registration called from all three `Add*HealthMonitoring` entry points.
  - `AuditLogOptionsValidator`: positive/`>=`/range checks → `RequireThat`.
- Each module `AddXxx` → `AddValidatedOptions<T, TValidator>` where the section binding
shape allows (preserve `ValidateOnStart` + `TryAddEnumerable` semantics).
- `StartupValidator.Validate(configuration)` at `Program.cs:41` → `ConfigPreflight.For(
configuration).RequireValue(...)/RequirePort(...)/When(...).ThrowIfInvalid()`. **Must
keep `StartupValidatorTests` green** — the thrown message is byte-compatible with
`ConfigPreflight.ThrowIfInvalid()`.
---
## 5. Error handling / behaviour preservation
- Failure surface is unchanged everywhere: `OptionsValidationException` thrown at host
start via `ValidateOnStart`; `ConfigPreflight.ThrowIfInvalid()` throws the same
`InvalidOperationException` text ScadaBridge's `StartupValidator` throws today.
- MxGateway + ScadaBridge: **zero message changes** — the existing validator tests and
`StartupValidatorTests` are the regression guard.
- OtOpcUa: **additive** — a config that was silently accepted (then failed late as an LDAP
error on first login, or an OPC UA bind error) now fails fast at startup. That is the
intended improvement, called out so it is not mistaken for a regression.
---
## 6. Testing & verification (gate per repo, before moving on)
- Library: re-run `dotnet test` (already 42 green).
- Each repo on its branch: `dotnet build` + `dotnet test` green.
- MxGateway: `src/MxGateway.Tests` (fake worker — no MXAccess needed).
- OtOpcUa: full solution test + the new validator unit tests.
- ScadaBridge: four validator tests + `StartupValidatorTests` still green.
- **Restore proof** per repo: a clean restore pulls `ZB.MOM.WW.Configuration 0.1.0` from
Gitea — confirms both the push and the source-mapping edit.
---
## 7. Risks & mitigations
| Risk | Mitigation |
|---|---|
| Package 404 / source-mapping omission breaks restore | Foundation phase + per-repo restore proof gate. |
| A "trivial" message tweak during refactor changes behaviour | Behaviour-preserving rule; existing tests fail loudly if a message drifts. |
| ScadaBridge preflight message drift | `StartupValidatorTests` must pass unchanged. |
| OtOpcUa `OpcUa`/`Ldap` rule set guesses wrong fields | Plan finalizes rules from the actual options classes; additive-only. |
| `AddValidatedOptions` singleton constraint (no scoped deps in validators) | All four ScadaBridge + the gateway validators are already stateless singletons. |
---
## 8. Deliverable & next step
This design doc, then a step-by-step implementation plan produced via the **writing-plans**
skill. No source changes in any repo until the plan is approved and execution begins.
> Note: `~/Desktop/scadaproj` is **not** a git repository, so this design is not committed
> here; it is saved under `docs/plans/`. (Per memory, do not `git init` it without asking.)
@@ -0,0 +1,566 @@
# Deploy `ZB.MOM.WW.Configuration` Fleet-Wide — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Adopt the shared `ZB.MOM.WW.Configuration` library into all three sister apps (MxAccessGateway, OtOpcUa, ScadaBridge) so the config-validation *plumbing* is owned by the library while *domain rules and messages* stay per-project.
**Architecture:** Foundation first (publish the package to the Gitea feed + wire each app's NuGet source-mapping/version pin), then per-repo sequential adoption in increasing-risk order: MxGateway → OtOpcUa → ScadaBridge. Each repo on its own `feat/adopt-zb-configuration` branch, built + tested green before the next.
**Tech Stack:** .NET 10, `Microsoft.Extensions.Options` (`IValidateOptions`, `ValidateOnStart`), xUnit, central package management, Gitea NuGet feed.
**Design doc:** [`2026-06-01-deploy-zb-configuration-design.md`](2026-06-01-deploy-zb-configuration-design.md)
---
## ⚠️ Decisions & corrections baked into this plan (read first)
1. **Behaviour-preserving = use `RequireThat`, NOT the wording-imposing primitives.**
`ValidationBuilder.Required/Port/PositiveTimeSpan/...` emit **standardized** messages
(`"{field} is required"`, `"{field} must be between 1 and 65535 (was …)"`, `"{field} must be a
positive duration (was …)"`). MxGateway and ScadaBridge use **bespoke** messages (often with
trailing rationale, e.g. `"…; it is used directly as a PeriodicTimer period."`). Mapping their
checks onto the primitives would **silently change the messages and break the existing validator
tests.** Therefore, for MxGateway + ScadaBridge migrations: keep every check as
`builder.RequireThat(<condition>, "<exact existing message>")` (or `builder.Add("<message>")` for
unconditional adds). The `components/configuration/GAPS.md` "→ Required / → PositiveTimeSpan"
mappings are **wrong for byte-compatibility** — do not follow them. The wording-imposing
primitives are used **only in OtOpcUa**, where the validators are net-new and we author the
wording fresh.
2. **OtOpcUa gets real, net-new validators** (Ldap + OpcUa) — approved scope. This adds fail-fast
startup validation OtOpcUa lacks today; a previously silently-accepted bad config now throws at
host start. That is the intended improvement, not a regression.
3. **Flagged discrepancy (do not silently "fix"):** `OtOpcUa Program.cs:99` binds
`GetSection("Ldap")` but `LdapOptions.SectionName = "Authentication:Ldap"`. This plan
**preserves** the current `"Ldap"` section path and surfaces the mismatch to the user in Task 3.
Do not switch to the constant without an explicit decision.
4. **Out of scope:** OtOpcUa's `DraftValidator` / `sp_ValidateDraft` (dormant domain-content
validation), and any rule-wording change to existing validators.
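To make decision 1 concrete, here is a minimal sketch of why a wording-imposing primitive cannot stand in for a bespoke message. `SketchBuilder` is a hypothetical stand-in, not the real `ValidationBuilder`; only the standardized wording quoted above and the `ReportInterval` message from Task 5 come from this plan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's ValidationBuilder. Only the two
// behaviours discussed in decision 1 are modelled here.
public sealed class SketchBuilder
{
    private readonly List<string> _failures = new();
    public IReadOnlyList<string> Failures => _failures;

    // Caller-owned wording: the message is recorded verbatim when the check fails.
    public SketchBuilder RequireThat(bool condition, string message)
    {
        if (!condition) _failures.Add(message);
        return this;
    }

    // Builder-owned wording: the caller supplies only the field name.
    public SketchBuilder PositiveTimeSpan(TimeSpan value, string field)
    {
        if (value <= TimeSpan.Zero)
            _failures.Add($"{field} must be a positive duration (was {value})");
        return this;
    }
}

public static class WordingDemo
{
    public const string Field = "ScadaBridge:HealthMonitoring:ReportInterval";

    // The bespoke message, including its trailing rationale.
    public static string Bespoke(TimeSpan value) =>
        $"{Field} must be a positive duration (was {value}); " +
        "it is used directly as a PeriodicTimer period.";

    // Runs the same failing check both ways. Pass a non-positive value,
    // otherwise neither builder records a failure and indexing [0] throws.
    public static (string ViaRequireThat, string ViaPrimitive) Compare(TimeSpan badValue)
    {
        var a = new SketchBuilder().RequireThat(badValue > TimeSpan.Zero, Bespoke(badValue));
        var b = new SketchBuilder().PositiveTimeSpan(badValue, Field);
        return (a.Failures[0], b.Failures[0]);
    }
}
```

For `TimeSpan.Zero` the two strings share a prefix but differ in the tail, which is exactly the kind of drift an exact-match assertion in an existing test would catch.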
---
## Task 1: Foundation — publish package + wire all three consumers
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (everything else depends on this)
**Files:**
- Pack source: `~/Desktop/scadaproj/ZB.MOM.WW.Configuration/ZB.MOM.WW.Configuration.slnx`
- Modify: `~/Desktop/MxAccessGateway/nuget.config`
- Modify: `~/Desktop/OtOpcUa/NuGet.config`
- Modify: `~/Desktop/OtOpcUa/Directory.Packages.props`
- Modify: `~/Desktop/ScadaBridge/nuget.config`
- Modify: `~/Desktop/ScadaBridge/Directory.Packages.props`
> Context: verified 2026-06-01 — `ZB.MOM.WW.Configuration` is **404** on the Gitea feed (Health is
> 200), and **no** app's `packageSourceMapping` routes it to Gitea. Both must be fixed before any
> repo can restore it. The lib builds clean: `dotnet test` = **42 passed**.
**Step 1: Verify the lib is green**
Run: `cd ~/Desktop/scadaproj/ZB.MOM.WW.Configuration && dotnet test ZB.MOM.WW.Configuration.slnx`
Expected: `Passed! - Failed: 0, Passed: 42`.
**Step 2: Pack**
Run: `cd ~/Desktop/scadaproj/ZB.MOM.WW.Configuration && dotnet pack ZB.MOM.WW.Configuration.slnx -c Release -o ./artifacts`
Expected: `ZB.MOM.WW.Configuration.0.1.0.nupkg` in `./artifacts`.
**Step 3: Push to Gitea** (use the same credentials/source already used for Health/Telemetry)
Run: `dotnet nuget push ./artifacts/ZB.MOM.WW.Configuration.0.1.0.nupkg --source dohertj2-gitea` (or the full feed URL `https://gitea.dohertylan.com/api/packages/dohertj2/nuget` with API key).
**Step 4: Verify it's live**
Run: `curl -s -o /dev/null -w "%{http_code}\n" https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/zb.mom.ww.configuration/index.json`
Expected: `200`.
**Step 5: Add source-mapping in each `nuget.config`**
In all three (`MxAccessGateway/nuget.config`, `OtOpcUa/NuGet.config`, `ScadaBridge/nuget.config`),
inside the `dohertj2-gitea` `<packageSource>` block, add alongside the existing Health/Telemetry
patterns:
```xml
<package pattern="ZB.MOM.WW.Configuration" />
```
**Step 6: Pin the version (central package management)**
In `OtOpcUa/Directory.Packages.props` and `ScadaBridge/Directory.Packages.props`, add to the
`<ItemGroup>` of `<PackageVersion>`s:
```xml
<PackageVersion Include="ZB.MOM.WW.Configuration" Version="0.1.0" />
```
> Note: MxAccessGateway pins versions inline on the `PackageReference` (verified: its Health refs
> carry `Version="0.1.0"`), so its pin happens in Task 2 on the `PackageReference` itself. Confirm
> per repo whether `ManagePackageVersionsCentrally` is set and follow the repo's existing convention.
**Step 7: Restore proof**
Run (one app is enough): `cd ~/Desktop/ScadaBridge && dotnet restore` after Task 5 adds the
reference — OR a throwaway probe now: temporarily add the ref to a scratch project. Minimum gate:
Step 4 returns 200 and the mapping/pin edits are saved in all three repos.
**Step 8: Commit each touched repo** (these are separate git repos; `scadaproj` itself is NOT a git repo)
```bash
# in each of MxAccessGateway / OtOpcUa / ScadaBridge:
git checkout -b feat/adopt-zb-configuration
git add nuget.config NuGet.config Directory.Packages.props
git commit -m "build: add ZB.MOM.WW.Configuration feed mapping + version pin"
```
---
## Task 2: MxAccessGateway — migrate `GatewayOptionsValidator` to the shared base
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj`
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs`
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayConfigurationServiceCollectionExtensions.cs`
- Test (regression guard, do not change): `~/Desktop/MxAccessGateway/src/MxGateway.Tests/**` (the existing `GatewayOptionsValidator` tests)
**Step 1: Add the package reference**
In `ZB.MOM.WW.MxGateway.Server.csproj`, beside the existing Health refs:
```xml
<PackageReference Include="ZB.MOM.WW.Configuration" Version="0.1.0" />
```
**Step 2: Re-base the validator (messages byte-identical)**
`GatewayOptionsValidator.cs` — change the class + entry point and retarget the sub-validators and
helpers from `List<string>` to `ValidationBuilder`. The nine `ValidateXxx` methods and the four
helpers stay; only their parameter type and the `.Add` target change.
```csharp
using ZB.MOM.WW.Configuration;   // add
using ZB.MOM.WW.MxGateway.Contracts;

namespace ZB.MOM.WW.MxGateway.Server.Configuration;

public sealed class GatewayOptionsValidator : OptionsValidatorBase<GatewayOptions>   // was : IValidateOptions<GatewayOptions>
{
    private const int MinimumMaxMessageBytes = 1024;
    private const int MaximumMaxMessageBytes = 256 * 1024 * 1024;

    protected override void Validate(ValidationBuilder builder, GatewayOptions options)   // was public ValidateOptionsResult Validate(string? name, GatewayOptions options)
    {
        ValidateAuthentication(options.Authentication, builder);
        ValidateLdap(options.Ldap, builder);
        ValidateWorker(options.Worker, builder);
        ValidateSessions(options.Sessions, builder);
        ValidateEvents(options.Events, builder);
        ValidateDashboard(options.Dashboard, builder);
        ValidateProtocol(options.Protocol, builder);
        ValidateAlarms(options.Alarms, builder);
        ValidateTls(options.Tls, builder);
        // NOTE: no List<string> and no `return Count==0 ? Success : Fail` — the base does that.
    }

    // ... sub-validators unchanged except `List<string> failures` param → `ValidationBuilder builder`
    // and every `failures.Add(msg)` → `builder.Add(msg)`.
```
Helper conversions (keep the four helpers; retarget to the builder — **messages unchanged**):
```csharp
private static void AddIfBlank(string? value, string message, ValidationBuilder builder) =>
    builder.RequireThat(!string.IsNullOrWhiteSpace(value), message);

private static void AddIfNotPositive(int value, string message, ValidationBuilder builder) =>
    builder.RequireThat(value > 0, message);

private static void AddIfNegative(int value, string message, ValidationBuilder builder) =>
    builder.RequireThat(value >= 0, message);

private static void AddIfInvalidPath(string? value, string message, ValidationBuilder builder)
{
    if (string.IsNullOrWhiteSpace(value)) return;
    try { _ = Path.GetFullPath(value); }
    catch (ArgumentException) { builder.Add(message); }
    catch (NotSupportedException) { builder.Add(message); }
    catch (PathTooLongException) { builder.Add(message); }
}
```
> DO NOT replace `AddIfBlank` with `builder.Required(...)` etc. — that changes the message text.
> Mechanical rule for the bodies: `failures.Add(x)` → `builder.Add(x)`; the early-`return` guards
> (e.g. `if (!options.Enabled) return;` in `ValidateLdap`/`ValidateAlarms`, and the
> `Enum.IsDefined` short-circuit `return` in `ValidateAuthentication`) stay exactly as written.
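For orientation, the base-class pattern that Step 2 leans on can be sketched as a template method: the base owns the failure list and the `Success`/`Fail` decision, and the subclass only states rules. The `Sketch*` and `Demo*` types below are hypothetical stand-ins, not the source of `ZB.MOM.WW.Configuration`; only `IValidateOptions<T>` and `ValidateOptionsResult` are real `Microsoft.Extensions.Options` types.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Hypothetical stand-in for the library's builder.
public sealed class SketchValidationBuilder
{
    private readonly List<string> _failures = new();
    public IReadOnlyList<string> Failures => _failures;

    public void Add(string message) => _failures.Add(message);

    public void RequireThat(bool condition, string message)
    {
        if (!condition) _failures.Add(message);
    }
}

// Hypothetical stand-in for the library's base class.
public abstract class SketchOptionsValidatorBase<TOptions> : IValidateOptions<TOptions>
    where TOptions : class
{
    // Framework-facing entry point: creates the builder, runs the rules,
    // and converts the collected failures into a ValidateOptionsResult.
    public ValidateOptionsResult Validate(string? name, TOptions options)
    {
        var builder = new SketchValidationBuilder();
        Validate(builder, options);
        return builder.Failures.Count == 0
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(builder.Failures);
    }

    // Subclasses describe rules only; no list management, no return value.
    protected abstract void Validate(SketchValidationBuilder builder, TOptions options);
}

// Illustrative options type and validator (not from any of the three apps).
public sealed class DemoOptions
{
    public bool Enabled { get; set; }
    public string? Endpoint { get; set; }
}

public sealed class DemoOptionsValidator : SketchOptionsValidatorBase<DemoOptions>
{
    protected override void Validate(SketchValidationBuilder builder, DemoOptions options)
    {
        if (!options.Enabled) return; // early-return guards carry over unchanged
        builder.RequireThat(!string.IsNullOrWhiteSpace(options.Endpoint),
            "Demo:Endpoint is required.");
    }
}
```

Under this shape, existing tests that call `Validate(null, options)` on the validator and inspect `Failures` keep working without edits, which is the property the regression guard depends on.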
**Step 3: Collapse the DI triple → `AddValidatedOptions`**
`GatewayConfigurationServiceCollectionExtensions.cs` — replace the
`AddOptions().BindConfiguration(SectionName).ValidateOnStart()` + `AddSingleton<IValidateOptions…>`
trio with one call (keep the separate `IGatewayConfigurationProvider` registration):
```csharp
using ZB.MOM.WW.Configuration;   // add

// was:
// services.AddOptions<GatewayOptions>().BindConfiguration(GatewayOptions.SectionName).ValidateOnStart();
// services.AddSingleton<IValidateOptions<GatewayOptions>, GatewayOptionsValidator>();
services.AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>(
    configuration, GatewayOptions.SectionName);
```
> `AddValidatedOptions` takes an `IConfiguration`; if `AddGatewayConfiguration` doesn't already
> receive one, thread `builder.Configuration` (or `IConfiguration`) into it. The original used
> `BindConfiguration(SectionName)` (path read off the type); `AddValidatedOptions` takes the path as
> the `sectionPath` argument — pass `GatewayOptions.SectionName`. Net binding is identical.
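As a rough picture of what the single call replaces, the sketch below wires the same three pieces by hand. `AddValidatedOptionsSketch`, `SampleOptions`, and `SampleValidator` are hypothetical names, and the use of `TryAddEnumerable` with a singleton lifetime is an assumption based on the semantics the design doc asks to preserve; the real `AddValidatedOptions` may differ in detail.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Options;

public static class ValidatedOptionsSketch
{
    // Hypothetical equivalent of the shared helper: bind + validate-on-start + validator.
    public static IServiceCollection AddValidatedOptionsSketch<TOptions, TValidator>(
        this IServiceCollection services, IConfiguration configuration, string sectionPath)
        where TOptions : class
        where TValidator : class, IValidateOptions<TOptions>
    {
        services.AddOptions<TOptions>()
            .Bind(configuration.GetSection(sectionPath)) // explicit section path
            .ValidateOnStart();                          // fail at host start, not first use

        // TryAddEnumerable keeps repeated registrations of the same validator idempotent.
        services.TryAddEnumerable(
            ServiceDescriptor.Singleton<IValidateOptions<TOptions>, TValidator>());
        return services;
    }
}

// Illustrative options and validator used only to exercise the sketch.
public sealed class SampleOptions
{
    public string? Name { get; set; }
}

public sealed class SampleValidator : IValidateOptions<SampleOptions>
{
    public ValidateOptionsResult Validate(string? name, SampleOptions options) =>
        string.IsNullOrWhiteSpace(options.Name)
            ? ValidateOptionsResult.Fail("Sample:Name is required.")
            : ValidateOptionsResult.Success;
}
```

Outside a running host, `ValidateOnStart` never fires, but resolving `IOptions<SampleOptions>` and reading `.Value` still runs the validator and throws `OptionsValidationException` on failure, which is a convenient way to test the wiring.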
**Step 4: Build + test (regression guard)**
Run: `cd ~/Desktop/MxAccessGateway && dotnet build src/MxGateway.sln && dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj`
Expected: build succeeds; **all existing `GatewayOptionsValidator` tests pass unchanged** (proves messages are byte-identical). No MXAccess needed (fake worker).
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server
git commit -m "refactor: adopt ZB.MOM.WW.Configuration in MxGateway (behaviour-preserving)"
```
---
## Task 3: OtOpcUa — net-new `LdapOptionsValidator`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 4 (different files — but keep on the same OtOpcUa branch)
**Files:**
- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj`
- Create: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Configuration/LdapOptionsValidator.cs`
- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:99`
- Create: `~/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.Host.Tests/Configuration/LdapOptionsValidatorTests.cs` (match the repo's actual Host test project path — verify before writing)
**Step 1: Package reference**
In `ZB.MOM.WW.OtOpcUa.Host.csproj` (no `Version` — central management, pinned in Task 1):
```xml
<PackageReference Include="ZB.MOM.WW.Configuration" />
```
**Step 2: Write the failing test** (`LdapOptionsValidatorTests.cs`)
```csharp
using Microsoft.Extensions.Options;
using ZB.MOM.WW.OtOpcUa.Security.Ldap;
using ZB.MOM.WW.OtOpcUa.Host.Configuration;
using Xunit;

public class LdapOptionsValidatorTests
{
    private static ValidateOptionsResult Run(LdapOptions o) =>
        new LdapOptionsValidator().Validate(null, o);

    [Fact]
    public void Valid_options_pass() =>
        Assert.True(Run(new LdapOptions { Enabled = true, Server = "ldap", SearchBase = "dc=x", Port = 389 }).Succeeded);

    [Fact]
    public void Disabled_skips_all_checks() =>
        Assert.True(Run(new LdapOptions { Enabled = false, Server = "", SearchBase = "", Port = 0 }).Succeeded);

    [Fact]
    public void Blank_server_fails_when_enabled() =>
        Assert.Contains("Authentication:Ldap:Server is required when LDAP login is enabled.",
            Run(new LdapOptions { Enabled = true, Server = "", SearchBase = "dc=x", Port = 389 }).Failures!);
}
```
**Step 3: Run it — expect FAIL** (`LdapOptionsValidator` not defined).
Run: `cd ~/Desktop/OtOpcUa && dotnet test --filter FullyQualifiedName~LdapOptionsValidatorTests`
**Step 4: Implement** (`LdapOptionsValidator.cs`) — gate on `Enabled` like MxGateway; author wording fresh
```csharp
using ZB.MOM.WW.Configuration;
using ZB.MOM.WW.OtOpcUa.Security.Ldap;

namespace ZB.MOM.WW.OtOpcUa.Host.Configuration;

public sealed class LdapOptionsValidator : OptionsValidatorBase<LdapOptions>
{
    protected override void Validate(ValidationBuilder builder, LdapOptions options)
    {
        if (!options.Enabled) return;
        builder.RequireThat(!string.IsNullOrWhiteSpace(options.Server),
            "Authentication:Ldap:Server is required when LDAP login is enabled.");
        builder.RequireThat(!string.IsNullOrWhiteSpace(options.SearchBase),
            "Authentication:Ldap:SearchBase is required when LDAP login is enabled.");
        builder.Port(options.Port, "Authentication:Ldap:Port");
    }
}
```
**Step 5: Wire the binding** — `Program.cs:99`
```csharp
// was: builder.Services.AddOptions<LdapOptions>().Bind(builder.Configuration.GetSection("Ldap"));
builder.Services.AddValidatedOptions<LdapOptions, LdapOptionsValidator>(builder.Configuration, "Ldap");
```
> **FLAG to the user (do not auto-resolve):** the section path stays `"Ldap"` to preserve current
> behaviour, even though `LdapOptions.SectionName == "Authentication:Ldap"`. The message strings
> above intentionally say `Authentication:Ldap:` (matching the conceptual section name); if the user
> prefers the path to match the constant, change both the `sectionPath` and re-confirm config keys.
**Step 6: Run tests — expect PASS.** `dotnet test --filter FullyQualifiedName~LdapOptionsValidatorTests`
**Step 7: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host tests
git commit -m "feat: add fail-fast LDAP options validation in OtOpcUa via ZB.MOM.WW.Configuration"
```
---
## Task 4: OtOpcUa — net-new `OpcUa` validator + route through DI
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 3
**Files:**
- Create: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Configuration/OpcUaApplicationHostOptionsValidator.cs`
- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (register validated options)
- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs:41-63` (inject `IOptions`, drop imperative bind)
- Create: `OpcUaApplicationHostOptionsValidatorTests.cs` (Host test project)
> Why high-risk: changes the hosted service constructor and makes a bad `OpcUa` section throw at host
> start (`ValidateOnStart`). Today `StartAsync` swallows SDK-start exceptions (`OtOpcUaServerHostedService.cs:75-82`);
> validation now fails fast *before* that path. This is the intended fail-fast improvement, but it is
> a behaviour change — keep it isolated and tested.
**Step 1: Write the failing test** — valid passes; bad port fails with fresh primitive wording
```csharp
using Microsoft.Extensions.Options;
using ZB.MOM.WW.OtOpcUa.OpcUaServer;
using ZB.MOM.WW.OtOpcUa.Host.Configuration;
using Xunit;

public class OpcUaApplicationHostOptionsValidatorTests
{
    private static ValidateOptionsResult Run(OpcUaApplicationHostOptions o) =>
        new OpcUaApplicationHostOptionsValidator().Validate(null, o);

    [Fact]
    public void Defaults_pass() => Assert.True(Run(new OpcUaApplicationHostOptions()).Succeeded);

    [Fact]
    public void Bad_port_fails() =>
        Assert.Contains("OpcUa:OpcUaPort must be between 1 and 65535 (was 0)",
            Run(new OpcUaApplicationHostOptions { OpcUaPort = 0 }).Failures!);
}
```
**Step 2: Run — expect FAIL.**
**Step 3: Implement the validator** — net-new, so use the wording-imposing primitives freely
```csharp
using ZB.MOM.WW.Configuration;
using ZB.MOM.WW.OtOpcUa.OpcUaServer;

namespace ZB.MOM.WW.OtOpcUa.Host.Configuration;

public sealed class OpcUaApplicationHostOptionsValidator : OptionsValidatorBase<OpcUaApplicationHostOptions>
{
    protected override void Validate(ValidationBuilder builder, OpcUaApplicationHostOptions o)
    {
        builder.Required(o.ApplicationName, "OpcUa:ApplicationName");
        builder.Required(o.ApplicationUri, "OpcUa:ApplicationUri");
        builder.Required(o.PublicHostname, "OpcUa:PublicHostname");
        builder.Required(o.PkiStoreRoot, "OpcUa:PkiStoreRoot");
        builder.Port(o.OpcUaPort, "OpcUa:OpcUaPort");
        builder.MinCount(o.EnabledSecurityProfiles, 1, "OpcUa:EnabledSecurityProfiles");
    }
}
```
**Step 4: Register validated options** — `Program.cs` (near the other host registrations)
```csharp
builder.Services.AddValidatedOptions<OpcUaApplicationHostOptions, OpcUaApplicationHostOptionsValidator>(
    builder.Configuration, "OpcUa");
```
**Step 5: Consume via DI in the hosted service** — `OtOpcUaServerHostedService.cs`
Add `IOptions<OpcUaApplicationHostOptions> options` to the constructor (store `_options`), then
replace lines 62-63:
```csharp
// was:
// var options = new OpcUaApplicationHostOptions();
// _configuration.GetSection("OpcUa").Bind(options);
var options = _options.Value;
```
(If `_configuration` becomes unused after this, leave it — other members may use it; verify before removing.)
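The constructor change in Step 5 might look roughly like the sketch below. `SketchHostOptions` and `SketchHostedService` are hypothetical stand-ins for `OpcUaApplicationHostOptions` and `OtOpcUaServerHostedService`, the default port is an illustrative assumption, and the real service has more dependencies and does real work in `StartAsync`.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

// Hypothetical options type standing in for OpcUaApplicationHostOptions.
public sealed class SketchHostOptions
{
    public int Port { get; set; } = 4840; // illustrative default only
}

// Hypothetical hosted service standing in for OtOpcUaServerHostedService.
public sealed class SketchHostedService : IHostedService
{
    private readonly IOptions<SketchHostOptions> _options;

    // The options arrive already bound (and, with ValidateOnStart, already validated).
    public SketchHostedService(IOptions<SketchHostOptions> options) => _options = options;

    public int BoundPort { get; private set; }

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // Replaces the imperative `new Options()` + `GetSection(...).Bind(...)` pair.
        var options = _options.Value;
        BoundPort = options.Port;
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
```

A side benefit of this shape is testability: a unit test can pass `Options.Create(new SketchHostOptions { ... })` directly instead of building an `IConfiguration`.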
**Step 6: Run tests + full build.**
Run: `cd ~/Desktop/OtOpcUa && dotnet build ZB.MOM.WW.OtOpcUa.slnx && dotnet test ZB.MOM.WW.OtOpcUa.slnx`
Expected: green, including the two new tests.
**Step 7: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host tests
git commit -m "feat: validate OpcUa host options at startup (route through IOptions + ValidateOnStart)"
```
---
## Task 5: ScadaBridge — migrate the four `*OptionsValidator` to the shared base
**Classification:** standard
**Estimated implement time:** ~6 min (split per-validator if needed — they are independent files)
**Parallelizable with:** Task 6 (StartupValidator is a different file)
**Files:**
- Modify (add `PackageReference Include="ZB.MOM.WW.Configuration"` to each owning project):
- `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/…csproj`
- `src/ZB.MOM.WW.ScadaBridge.Security/…csproj`
- `src/ZB.MOM.WW.ScadaBridge.HealthMonitoring/…csproj`
- `src/ZB.MOM.WW.ScadaBridge.AuditLog/…csproj`
- Modify:
- `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/ClusterOptionsValidator.cs`
- `src/ZB.MOM.WW.ScadaBridge.Security/SecurityOptionsValidator.cs`
- `src/ZB.MOM.WW.ScadaBridge.HealthMonitoring/HealthMonitoringOptionsValidator.cs`
- `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptionsValidator.cs`
- Test (regression guard, do not change): the existing four validator test classes.
**Transformation (identical shape for all four):**
1. `: IValidateOptions<T>` → `: OptionsValidatorBase<T>` (`using ZB.MOM.WW.Configuration;`).
2. `public ValidateOptionsResult Validate(string? name, T options)` →
`protected override void Validate(ValidationBuilder builder, T options)`.
3. Delete `var failures = new List<string>();` and the
`return failures.Count … ? Fail(failures) : Success;` tail.
4. Each `if (<bad>) failures.Add("<msg>");` → `builder.RequireThat(!(<bad>), "<msg>");`
(i.e. invert the condition to the *valid* predicate), **message unchanged**.
Worked example — `HealthMonitoringOptionsValidator` (the others follow the same recipe):
```csharp
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Configuration;

namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring;

public sealed class HealthMonitoringOptionsValidator : OptionsValidatorBase<HealthMonitoringOptions>
{
    protected override void Validate(ValidationBuilder builder, HealthMonitoringOptions options)
    {
        builder.RequireThat(options.ReportInterval > TimeSpan.Zero,
            $"ScadaBridge:HealthMonitoring:ReportInterval must be a positive duration " +
            $"(was {options.ReportInterval}); it is used directly as a PeriodicTimer period.");

        builder.RequireThat(options.OfflineTimeout > TimeSpan.Zero,
            $"ScadaBridge:HealthMonitoring:OfflineTimeout must be a positive duration " +
            $"(was {options.OfflineTimeout}); it drives the offline-check PeriodicTimer cadence.");

        builder.RequireThat(options.CentralOfflineTimeout > TimeSpan.Zero,
            $"ScadaBridge:HealthMonitoring:CentralOfflineTimeout must be a positive duration " +
            $"(was {options.CentralOfflineTimeout}).");

        builder.RequireThat(
            !(options.OfflineTimeout > TimeSpan.Zero
              && options.CentralOfflineTimeout > TimeSpan.Zero
              && options.CentralOfflineTimeout < options.OfflineTimeout),
            $"ScadaBridge:HealthMonitoring:CentralOfflineTimeout ({options.CentralOfflineTimeout}) " +
            $"must be >= OfflineTimeout ({options.OfflineTimeout}): the synthetic 'central' site has " +
            "no heartbeat source and is fed only by the slower self-report loop, so it needs at " +
            "least as much offline grace as a real site.");
    }
}
```
> Reminder: do **not** swap to `builder.PositiveTimeSpan/MinCount/OneOf` — their wording differs
> from these bespoke messages and would break the existing tests. `ClusterOptionsValidator` has the
> most rules (SeedNodes≥2, strategy one-of, three positive-`TimeSpan`, cross-field heartbeat,
> `DownIfAlone`, `MinNrOfMembers`); apply the same invert-condition-keep-message recipe to each.
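The invert-the-condition step is where a migration can quietly change behaviour, so it may help to see the cross-field rule from the worked example in three equivalent forms. This is a sketch with hypothetical helper names; the predicate itself is copied from the `HealthMonitoringOptionsValidator` example.

```csharp
using System;

public static class CrossFieldPredicate
{
    // Original shape: the `if` condition describes the *bad* state.
    public static bool IsBad(TimeSpan offline, TimeSpan central) =>
        offline > TimeSpan.Zero
        && central > TimeSpan.Zero
        && central < offline;

    // Mechanical migration: RequireThat takes the *valid* predicate, i.e. !(bad).
    public static bool IsValidNegated(TimeSpan offline, TimeSpan central) =>
        !IsBad(offline, central);

    // De Morgan expansion of the same predicate. Easier to read, but every
    // comparison has to flip correctly (> becomes <=, < becomes >=).
    public static bool IsValidExpanded(TimeSpan offline, TimeSpan central) =>
        offline <= TimeSpan.Zero
        || central <= TimeSpan.Zero
        || central >= offline;
}
```

Wrapping the original condition in `!( … )` is the lower-risk choice during a behaviour-preserving refactor, since no operator has to be rewritten. Note that the rule deliberately passes when either duration is non-positive, because the positive-duration checks already report that case.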
**Step — build + test (guard):**
Run: `cd ~/Desktop/ScadaBridge && dotnet build ZB.MOM.WW.ScadaBridge.slnx && dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter FullyQualifiedName~OptionsValidator`
Expected: the four validators' existing tests pass unchanged.
**Step — commit:** `git commit -am "refactor: ScadaBridge validators onto OptionsValidatorBase (messages unchanged)"`
(Optional follow-on, separate task: collapse each module's `AddXxx` `Bind+ValidateOnStart+TryAddEnumerable`
into `AddValidatedOptions<T,TValidator>` where the binding shape matches — preserve HealthMonitoring's
idempotent registration called from three entry points. Verify each test still passes.)
---
## Task 6: ScadaBridge — `StartupValidator` → `ConfigPreflight`
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 5
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/StartupValidator.cs` (re-implement body over `ConfigPreflight`) — or inline into `Program.cs:41` and delete the class.
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:41`
- Test (regression guard, MUST stay green unchanged): `tests/ZB.MOM.WW.ScadaBridge.Host.Tests/StartupValidatorTests.cs`
> The final thrown message is **byte-identical** between `StartupValidator`
> (`"Configuration validation failed:\n - …"`) and `ConfigPreflight.ThrowIfInvalid()` — verified.
> The individual messages are bespoke and several are **cross-field** (GrpcPort≠RemotingPort,
> MetricsPort≠RemotingPort/GrpcPort, seed-node-port≠GrpcPort). `ConfigPreflight` has no
> `Add`/`RequireThat`; reproduce these via the `Require(key, predicate, reason)` escape hatch where
> the predicate **closes over** the other resolved values and ignores its passed argument, and
> `reason` is the exact tail so `$"{key} {reason}"` equals the original message.
**Recipe (preserve every message):**
- `RequireValue(key)` only where the original message is exactly `"{key} is required"`
(e.g. `ScadaBridge:Node:NodeHostname is required`).
- Everything else → `Require(key, pred, reason)`:
- `Require("ScadaBridge:Node:Role", raw => raw is "Central" or "Site", "must be 'Central' or 'Site'")`.
  - `Require("ScadaBridge:Node:RemotingPort", raw => int.TryParse(raw, out var p) && p is >= 1 and <= 65535, "must be 1-65535")`; **do not** use `RequirePort` (its wording differs).
- `Require("ScadaBridge:Cluster:SeedNodes", _ => (seedNodes?.Count ?? 0) >= 2, "must have at least 2 entries")` (read `seedNodes` once via `.Get<List<string>>()`).
- Role-conditional blocks → `.When(role == "Central", p => { … })` / `.When(role == "Site", p => { … })`.
- Cross-field, value-ignoring predicate example:
`p.Require("ScadaBridge:Node:GrpcPort", _ => port != grpcPort, "must differ from RemotingPort")`.
- Seed-node loop: `foreach (var seed in seedNodes ?? []) p.Require("ScadaBridge:Cluster:SeedNodes", _ => SeedNodePort(seed) != grpcPort, $"entry '{seed}' must not target the gRPC port ({grpcPort}); seed nodes must reference Akka remoting ports");` (keep the private `SeedNodePort` helper).
Resolve `role`, `port`, `grpcPort` (default 8083), `metricsPort` (default 8084) with the **exact**
parse-or-default logic from the current `StartupValidator` before building the preflight, then end
with `.ThrowIfInvalid()`.
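The recipe above can be sketched end to end with a small stand-in. `SketchPreflight` is hypothetical and models only `Require`, `When`, and `ThrowIfInvalid`; the `"\n - "` separator follows the message prefix quoted in this task, while the `0` fallback for an unparseable `RemotingPort` and the placement of the `GrpcPort` rule under the `Site` role are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ConfigPreflight.
public sealed class SketchPreflight
{
    private readonly Func<string, string?> _lookup;
    private readonly List<string> _errors = new();

    public SketchPreflight(Func<string, string?> lookup) => _lookup = lookup;

    // The message is always "{key} {reason}", so `reason` must be the exact tail.
    public SketchPreflight Require(string key, Func<string?, bool> predicate, string reason)
    {
        if (!predicate(_lookup(key))) _errors.Add($"{key} {reason}");
        return this;
    }

    public SketchPreflight When(bool condition, Action<SketchPreflight> rules)
    {
        if (condition) rules(this);
        return this;
    }

    public void ThrowIfInvalid()
    {
        if (_errors.Count == 0) return;
        throw new InvalidOperationException(
            "Configuration validation failed:\n - " + string.Join("\n - ", _errors));
    }
}

public static class PreflightDemo
{
    public static void Check(IReadOnlyDictionary<string, string?> settings)
    {
        string? Get(string key) => settings.TryGetValue(key, out var v) ? v : null;

        // Resolve cross-field inputs once, before building the preflight.
        var role = Get("ScadaBridge:Node:Role");
        var port = int.TryParse(Get("ScadaBridge:Node:RemotingPort"), out var rp) ? rp : 0; // fallback assumed
        var grpcPort = int.TryParse(Get("ScadaBridge:Node:GrpcPort"), out var gp) ? gp : 8083;

        new SketchPreflight(Get)
            .Require("ScadaBridge:Node:Role", raw => raw is "Central" or "Site",
                "must be 'Central' or 'Site'")
            // Value-ignoring predicate: it closes over the resolved ports instead.
            .When(role == "Site", pf => pf.Require("ScadaBridge:Node:GrpcPort",
                _ => port != grpcPort, "must differ from RemotingPort"))
            .ThrowIfInvalid();
    }
}
```

The value-ignoring lambda is the notable design choice: the key is passed only so the message prefix is right, while the actual comparison uses values captured from the enclosing scope.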
**Step — run the guard test (unchanged):**
Run: `dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter FullyQualifiedName~StartupValidatorTests`
Expected: PASS with no test edits — this is the byte-compatibility proof.
**Step — full ScadaBridge build + test:**
Run: `cd ~/Desktop/ScadaBridge && dotnet build ZB.MOM.WW.ScadaBridge.slnx && dotnet test ZB.MOM.WW.ScadaBridge.slnx`
Expected: all green (four validators + `StartupValidatorTests`).
**Step — commit:** `git commit -am "refactor: ScadaBridge StartupValidator → ConfigPreflight (byte-compatible)"`
---
## Final verification (all repos)
- `ZB.MOM.WW.Configuration` registration index → 200.
- Each repo: clean `dotnet restore` pulls `ZB.MOM.WW.Configuration 0.1.0` from Gitea.
- Each repo: `dotnet build` + `dotnet test` green on its `feat/adopt-zb-configuration` branch.
- No message-string drift anywhere except OtOpcUa's net-new validators.
- Open the three per-repo PRs (or finish per `superpowers-extended-cc:finishing-a-development-branch`).
- Update `components/configuration/GAPS.md` + the CLAUDE.md matrix to reflect actual adoption.
## Notes
- DRY/YAGNI/TDD honored: net-new OtOpcUa code is test-first; migrations rely on existing tests as the regression guard.
- `scadaproj` itself is NOT a git repo — do not `git init` it. Commits happen inside each sister repo.
- Skills: `@superpowers-extended-cc:executing-plans`, `@superpowers-extended-cc:test-driven-development`, `@superpowers-extended-cc:verification-before-completion`.
@@ -0,0 +1,18 @@
{
"planPath": "docs/plans/2026-06-01-deploy-zb-configuration.md",
"tasks": [
{"id": 11, "subject": "Task 1: Foundation — publish package + wire 3 consumers", "classification": "small", "status": "completed", "result": "Published ZB.MOM.WW.Configuration 0.1.0 to Gitea (was 404; now 200). nuget.config source-mapping + version pins on feat/adopt-zb-configuration in all 3 repos. Commits: MxGw 437ab65, OtOpcUa 0cbb82e, ScadaBridge 9bca6aa."},
{"id": 12, "subject": "Task 2: MxGateway — GatewayOptionsValidator → base", "classification": "standard", "status": "completed", "blockedBy": [11], "commit": "459a88b", "result": "Migrated to OptionsValidatorBase via RequireThat (messages byte-identical); AddGatewayConfiguration → AddValidatedOptions (+4 call sites). Tests 571/574 (3 pre-existing macOS failures). Spec ✅, code Approved-with-minors."},
{"id": 13, "subject": "Task 3: OtOpcUa — net-new LdapOptionsValidator", "classification": "standard", "status": "completed", "blockedBy": [12], "commit": "f35ebd7", "result": "New LdapOptionsValidator; Program.cs:99 → AddValidatedOptions(config,'Ldap') — behaviour-preserving per user decision A. FLAG: OtOpcUa LDAP binds nonexistent sections (real config = Security:Ldap); recorded as memory otopcua-ldap-config-section-mismatch. 4/4 new tests; build 0/0."},
{"id": 14, "subject": "Task 4: OtOpcUa — OpcUa validator + DI routing", "classification": "high-risk", "status": "completed", "blockedBy": [12], "commit": "88e773a", "result": "New OpcUaApplicationHostOptionsValidator; AddValidatedOptions(config,'OpcUa') in hasDriver block; hosted service now consumes IOptions (dead _configuration removed). 4/4 new tests; build 0/0. Spec ✅, code Approved-with-minors."},
{"id": 15, "subject": "Task 5: ScadaBridge — 4 validators → base", "classification": "standard", "status": "completed", "blockedBy": [13, 14], "commit": "aac59c9", "result": "Cluster/Security/HealthMonitoring/AuditLog → OptionsValidatorBase via RequireThat (no primitives; messages verbatim). DI untouched (AddValidatedOptions collapse deferred). 33/33 validator tests unchanged. Spec ✅, code Approved-with-minors (De Morgan readability nits)."},
{"id": 16, "subject": "Task 6: ScadaBridge — StartupValidator → ConfigPreflight", "classification": "high-risk", "status": "completed", "blockedBy": [13, 14], "commit": "6dbbc7a", "result": "StartupValidator body re-implemented over ConfigPreflight (Require escape-hatch for bespoke + cross-field rules; default int.TryParse + IsNullOrEmpty preserved). StartupValidatorTests 46/46 UNCHANGED (byte-compat proof). Spec ✅, code Approved-with-minors."}
],
"deferred": [
"ScadaBridge: collapse module AddXxx → AddValidatedOptions (DI simplification; preserve HealthMonitoring idempotent registration).",
"MxGateway pre-existing (not regressions): Ldap:Port allows >65535; AddIfInvalidPath doesn't catch IOException.",
"OtOpcUa pre-existing bug (flagged + memory): LdapOptions binds Security:Ldap nowhere; DevStubMode never applies — separate behaviour-changing fix.",
"Cosmetic: De Morgan predicate comments (ScadaBridge validators); vestigial `var options = _options` in OtOpcUaServerHostedService."
],
"lastUpdated": "2026-06-01"
}
@@ -0,0 +1,117 @@
# ZB.MOM.WW.Telemetry — Follow-ons Implementation Plan
> Continuation of [`2026-06-01-telemetry-library-adoption.md`](2026-06-01-telemetry-library-adoption.md).
> Executes the deferred follow-ons recorded in `components/observability/GAPS.md`, all four groups
> selected by the user.
**Goal:** Close the recorded telemetry follow-ons across the three apps — additive/hygiene fixes,
MxGateway metric normalization, ScadaBridge first application instruments, and OTLP opt-in.
**Branches:** new `feat/telemetry-followons` per repo (off the now-updated default). Commit per task,
never skip hooks, never force-push. The three repo phases are independent (parallel); within a repo,
sequential.
**Behaviour bar:** additive/opt-in by default (Prometheus stays the default exporter; new instruments
are new series; the MxGateway `ms` → `s` + rename are the *one* intentional metric-shape change, safe
because those series were never Prometheus-exported before the adoption).
---
## OtOpcUa (branch `feat/telemetry-followons` off `master`)
### Task O-A2: align Serilog to the 10.x line
**Classification:** small · **Files:** `Directory.Packages.props`
Bump `Serilog.AspNetCore`, `Serilog.Extensions.Hosting`, `Serilog.Settings.Configuration` from
`9.0.0` → `10.0.0` (ScadaBridge already runs `10.0.0` with `Serilog 4.x`, so 10.x is 4.x-compatible;
no Serilog 5 needed). Keep `Serilog 4.3.0` (or bump to `4.3.1` to match ScadaBridge). Restore + build
`ZB.MOM.WW.OtOpcUa.slnx`; run `--filter LogContextEnricherTests`. Commit.
### Task O-D: OTLP exporter opt-in (config-driven)
**Classification:** standard · **Parallelizable with:** O-A2 (disjoint files)
**Files:** `src/Server/.../Observability/ObservabilityExtensions.cs`, `src/Server/.../Program.cs:138`
Refactor `AddOtOpcUaObservability` to accept `IConfiguration` and read
`OtOpcUa:Telemetry:Exporter` (`Prometheus`|`Otlp`, default Prometheus) + `OtOpcUa:Telemetry:OtlpEndpoint`;
set `o.Exporter`/`o.OtlpEndpoint` accordingly. Update the call site to
`builder.Services.AddOtOpcUaObservability(builder.Configuration)`. Default (no config) stays Prometheus.
This also makes OtOpcUa's recorded spans exportable when OTLP is configured (resolves the trace no-op).
Build; run `OtOpcUaTelemetryHookTests`. Commit.
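
Once the exporter is config-driven, it can be switched per process without editing `appsettings`. The sketch below assumes the host keeps the default .NET environment-variable configuration provider, which maps `__` in a variable name to `:` in a configuration key; the plan does not confirm this. The helper name `to_env_name` and the endpoint value are illustrative (4317 is the conventional OTLP/gRPC port).

```bash
#!/bin/sh
# Convert a .NET configuration key ("A:B:C") into the environment-variable
# form ("A__B__C") understood by the default environment-variable provider.
to_env_name() {
  printf '%s\n' "$1" | sed 's/:/__/g'
}

# Hypothetical opt-in for a single launch; the endpoint is an assumption.
export "$(to_env_name 'OtOpcUa:Telemetry:Exporter')=Otlp"
export "$(to_env_name 'OtOpcUa:Telemetry:OtlpEndpoint')=http://localhost:4317"
```

With neither variable set and no `appsettings` entry, the default stays Prometheus, as the task requires.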
---
## MxAccessGateway (branch `feat/telemetry-followons` off `main`)
### Task M-A3: gitignore stray doc artifacts
**Classification:** trivial · **Files:** `.gitignore`
Append a `# Documentation review artifacts` block ignoring `*-docs-issues.md`, `*-docs-fixed.md`,
`*-docs-final.md` (the 5 untracked `*-docs-*.md` files are CommentChecker "Documentation Analysis
Report" output). Commit. (Do NOT delete the files — just ignore.)
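
One way to make the append safe to re-run is to guard on the block's header line. This is a minimal sketch; the function name `append_docs_ignore` is hypothetical, and it only writes to the file path it is given.

```bash
#!/bin/sh
# Append the ignore block to a .gitignore only if its header is not there yet.
append_docs_ignore() {
  file="$1"
  header='# Documentation review artifacts'
  if [ -f "$file" ] && grep -qxF "$header" "$file"; then
    return 0   # block already present; leave the file unchanged
  fi
  # Leading blank line separates the block from whatever precedes it.
  printf '\n%s\n%s\n%s\n%s\n' "$header" \
    '*-docs-issues.md' '*-docs-fixed.md' '*-docs-final.md' >> "$file"
}
```

Usage from the repo root would be `append_docs_ignore .gitignore`; the untracked `*-docs-*.md` files themselves are left in place.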
### Task M-B: metric normalization (`ms` → `s` + meter rename)
**Classification:** standard · **Files:** `src/.../Metrics/GatewayMetrics.cs`, test if needed
- Rename `MeterName` const `"MxGateway.Server"` → `"ZB.MOM.WW.MxGateway"`. (AddZbTelemetry uses the
const, so it follows automatically; no test asserts the literal; `GatewayMetricsTests` filter by
meter *instance*, not name.)
- Change the 3 histograms' unit `"ms"` → `"s"` (CreateHistogram lines) and their 4 record sites
`.TotalMilliseconds` → `.TotalSeconds`. The snapshot/dashboard do NOT read these histograms, so no
read-path impact. Check `GatewayMetricsTests` for any histogram-value assertion in ms and update.
Build the Server project; run `--filter "GatewayMetricsTests|GatewayApplicationTests"`. Commit.
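
A quick pre-commit check for leftover millisecond record sites could look like the sketch below. The helper `count_ms_sites` is hypothetical, and a non-zero count is only a prompt to look: the file may contain legitimate `TotalMilliseconds` uses outside the 4 histogram record sites.

```bash
#!/bin/sh
# Print how many lines of a source file still mention TotalMilliseconds.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_ms_sites() {
  grep -c 'TotalMilliseconds' "$1" || true
}
```

Call it with the real path to `GatewayMetrics.cs` after the edit and compare the number against the sites you intended to keep.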
### Task M-D: OTLP exporter opt-in
**Classification:** small · **Files:** `src/.../GatewayApplication.cs` (the `AddZbTelemetry` lambda)
In the `AddZbTelemetry` lambda, read `MxGateway:Telemetry:Exporter` + `MxGateway:Telemetry:OtlpEndpoint`
from `builder.Configuration` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build.
Commit. (Sequential after M-B — both touch GatewayApplication.cs / metrics area.)
---
## ScadaBridge (branch `feat/telemetry-followons` off `main`)
### Task S-A1: site-node HTTP/1.1 `/metrics` listener
**Classification:** standard · **Files:** `src/.../NodeOptions.cs`, `src/.../Program.cs` (Site Kestrel)
Add `MetricsPort` (default `8082`) to `NodeOptions`. In the Site block's `ConfigureKestrel`, add a
second `ListenAnyIP(metricsPort, lo => lo.Protocols = Http1AndHttp2)` alongside the existing HTTP/2-only
gRPC-port listener, so the already-mapped `/metrics` becomes scrapable over HTTP/1.1 on site nodes.
Read the port from `ScadaBridge:Node:MetricsPort` (default 8082). Build; existing Host.Tests stay green.
Commit.
### Task S-D: OTLP exporter opt-in
**Classification:** small · **Files:** `src/.../SiteServiceRegistration.cs` (the `AddZbTelemetry` lambda)
In `BindSharedOptions`, read `ScadaBridge:Telemetry:Exporter` + `ScadaBridge:Telemetry:OtlpEndpoint`
from `config` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build. Commit.
(Sequential after S-C0 — both edit the `AddZbTelemetry` call.)
### Task S-C0: `ScadaBridgeTelemetry` meter + registration
**Classification:** standard · **Files:** Create `src/ZB.MOM.WW.ScadaBridge.Commons/Observability/ScadaBridgeTelemetry.cs`; edit `SiteServiceRegistration.cs` (`AddZbTelemetry` Meters)
Create a `ScadaBridgeTelemetry` static class: `Meter "ZB.MOM.WW.ScadaBridge"` + the four instruments
(`scadabridge.deployments.applied` counter; `scadabridge.store_and_forward.queue.depth` observable
gauge; `scadabridge.inbound_api.requests` counter; `scadabridge.site.connection.up` up/down gauge) with
thin static emit helpers. Register `o.Meters = ["ZB.MOM.WW.ScadaBridge"]` in the `AddZbTelemetry` call.
Build. Commit. (Precedes S-C1…S-C4.)
### Tasks S-C1…S-C4: wire the four emit points
**Classification:** standard each · depend on S-C0
- **S-C1 `deployments.applied`** — increment on the DeploymentManager/DeploymentService success path.
- **S-C2 `store_and_forward.queue.depth`** — observable-gauge callback reading the StoreAndForward depth
(SQLite `COUNT`/existing depth accessor).
- **S-C3 `inbound_api.requests`** — increment (tag = method) in the InboundAPI endpoint filter/middleware.
- **S-C4 `site.connection.up`** — +1 on site-stream open, -1 on close in the Communication/SiteStream
gRPC server.
Each implementer finds the cleanest emit point and **STOPs + reports** if no clean point exists rather
than forcing a fragile edit. Add a focused test where practical. Build; commit per instrument.
---
## scadaproj bookkeeping
### Task Z: update GAPS.md
**Classification:** trivial · **Files:** `components/observability/GAPS.md`
Move the handled follow-ons (#6/#7 done; A1 site-listener done; #9 first instruments done; #10/#11 OTLP
opt-in done) from "Deferred" to a "Follow-ons — DONE 2026-06-01" subsection; note what each app now does.
Commit + (on user request) push all branches/merges.
---
## Sequencing
After each repo branch is cut: OtOpcUa {O-A2 ∥ O-D}; MxGateway {M-A3 → M-B → M-D}; ScadaBridge
{S-A1 ∥ (S-C0 → {S-C1, S-C2, S-C3, S-C4} → S-D)}. Repos run in parallel. Z + merge/push last.
@@ -0,0 +1,234 @@
# Adopt `ZB.MOM.WW.Telemetry` across the three sister apps — design
**Date:** 2026-06-01
**Status:** Approved (design); implementation plan to follow via writing-plans.
**Scope:** Integrate the built-but-unadopted `ZB.MOM.WW.Telemetry` (+ `.Serilog`) shared library
into all three sister apps — **OtOpcUa**, **MxAccessGateway**, **ScadaBridge** — wiring the shared
OpenTelemetry Resource, standard instrumentation, Prometheus `/metrics`, and the shared Serilog
bootstrap with identity enrichers and trace↔log correlation.
This is the second full cross-fleet adoption of one of the six shared `ZB.MOM.WW.*` libraries
(after `ZB.MOM.WW.Health`). It follows the adoption backlog in
[`components/observability/GAPS.md`](../../components/observability/GAPS.md), re-verified against
current code on 2026-06-01.
> **Correction recorded during design:** the library CLAUDE.md and
> [`components/observability/README.md`](../../components/observability/README.md) claim
> *"MxAccessGateway logging adopted (MEL → Serilog migration done on its own branch)."* This is
> **false on `main`** — MxGateway is still MEL-only (no Serilog packages, `GatewayLogScope` /
> `GatewayLogRedactor` still bespoke), and its `MxGateway.Server` meter is **not exported at all**
> (no `AddOpenTelemetry`, no `/metrics`). That branch never landed. This design therefore includes
> the full MxGateway MEL→Serilog migration, and the bookkeeping task corrects the false claim.
---
## 1. Goal & scope
Wire the two shared packages into all three apps:
- **`ZB.MOM.WW.Telemetry`** — `AddZbTelemetry(options)`: shared OTel Resource (the identity triple
`service.name` / `site.id` / `node.role` + `service.namespace` / `service.version` / `host.name`),
caller-supplied Meters/ActivitySources, standard instrumentation (ASP.NET Core, HttpClient, gRPC
client, runtime, process), Prometheus always-on exporter (OTLP opt-in), and `app.MapZbMetrics()`
to mount `/metrics`.
- **`ZB.MOM.WW.Telemetry.Serilog`** — `AddZbSerilog(options)`: two-stage Serilog bootstrap,
`ReadFrom.Configuration` sinks, `SiteId`/`NodeRole`/`NodeHostname` enrichers, `TraceContextEnricher`
(writes `trace_id`/`span_id` from `Activity.Current`), and the `ILogRedactor` seam via
`RedactionEnricher`. Uses `preserveStaticLogger: true` so it is test-safe.
**The headline gap (§1 of GAPS):** *no* app sets a single OTel Resource attribute today, so every
metric and span from every node is indistinguishable in a backend — no service identity, no
site/role topology, no version label. `AddZbTelemetry` closes this for all three at once. This is
the single highest-value observability gap across the fleet.
**Behaviour-preserving bar** (same as the Health adoption): same log messages at the same levels,
same metric series with the same names and units, same `/metrics` path. New series produced by
standard instrumentation are *additive*. All genuinely breaking items are **deferred** (see §6).
---
## 2. Distribution
- **Feed:** Gitea NuGet registry `dohertj2-gitea`
(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`). Credentials live
**creds-only at the user level** (`~/.nuget/NuGet/NuGet.Config` `<packageSourceCredentials>`),
matched by source name — **never committed to any repo**. Already configured during the Health
round; no change needed here.
- **Source-mapping — the two-pattern gotcha (carried from Health):** under
`packageSourceMapping`, the glob `ZB.MOM.WW.Telemetry.*` matches `ZB.MOM.WW.Telemetry.Serilog`
but **not** the bare core id `ZB.MOM.WW.Telemetry`. Each repo therefore needs **both**:
```xml
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
```
- **Per-repo wiring:**
| Repo | CPM? | Change |
|---|---|---|
| OtOpcUa | yes (`Directory.Packages.props`) | add 2 `<PackageVersion>` @ `0.1.0`; extend existing `NuGet.config` mapping with both Telemetry patterns; add 2 versionless `<PackageReference>` to the Host csproj |
| ScadaBridge | yes | add 2 `<PackageVersion>` @ `0.1.0`; extend existing `nuget.config` mapping; add 2 versionless `<PackageReference>` to the Host csproj |
| MxAccessGateway | **no CPM** | add 2 direct versioned `<PackageReference>` to the Server csproj; extend its `nuget.config` mapping (the file created during the Health round) |
- **Task 0 (gating, like Health):** the library docs claim these two packages are already on the
feed. **Verify first; pack + push the two `.nupkg`s if missing** — the Health round proved this
claim cannot be trusted.
- **Serilog version floor (Gap V1):** OtOpcUa pins `Serilog.AspNetCore` 9.0.0, ScadaBridge 10.0.0.
Confirm the `.Serilog` package's Serilog dependency floor is satisfied by both (bump if not), and
pick MxGateway's fresh `Serilog.AspNetCore` version to align.
---
## 3. Per-app adoption surface
### OtOpcUa (`master`) — moderate
Already has Serilog (inline `UseSerilog`), full OTel, and Prometheus `/metrics`.
- **Metrics/traces:** replace the hand-rolled
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs`
(`AddOpenTelemetry().WithMetrics(...AddPrometheusExporter()).WithTracing(...)` +
`MapPrometheusScrapingEndpoint("/metrics")`) with
```csharp
builder.AddZbTelemetry(o =>
{
o.ServiceName = "otopcua";
o.ServiceVersion = /* AssemblyInformationalVersion */;
o.Meters = ["ZB.MOM.WW.OtOpcUa"];
o.ActivitySources = ["ZB.MOM.WW.OtOpcUa"];
// Exporter defaults to Prometheus
});
// ...
app.MapZbMetrics();
```
**Same meter/source names and same `/metrics` path** → behaviour-preserving; *gains* the Resource
identity + standard instrumentation. (OtOpcUa records spans but has no trace exporter today;
Prometheus is metrics-only, so traces remain a no-op exporter-wise — unchanged. OTLP trace wiring
is deferred, §6.)
- **Logging:** replace the inline
`builder.Host.UseSerilog((ctx, lc) => lc.ReadFrom.Configuration(...).WriteTo.Console().WriteTo.File(...))`
with `builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; })`, moving the Console/File sinks
into `appsettings` `Serilog:WriteTo` so `ReadFrom.Configuration` reproduces them. Keep the
existing driver-scope `LogContextEnricher` alongside the shared enrichers.
- **Identity:** `ServiceName="otopcua"`; `SiteId`/`NodeRole` omitted (none in config).
### ScadaBridge (`main`) — moderate, two composition roots
Serilog already (via `LoggerConfigurationFactory`); **no OTel at all**; `SiteId` + `NodeRole`
already read from config (`ScadaBridge:Node:*`, `NodeOptions`).
- **Metrics:** add `builder.AddZbTelemetry(o => { o.ServiceName="scadabridge"; o.SiteId=siteId; o.NodeRole=nodeRole; })`
+ `app.MapZbMetrics()` in **both** composition roots — the Central block and the Site block of
`Program.cs` (the same two-root pattern the Health adoption used). `Meters=[]` for now (app
instruments are deferred, §6). Purely additive — no metrics exist today to break.
- **Logging:** replace `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` +
`builder.Host.UseSerilog()` with
`builder.AddZbSerilog(o => { o.ServiceName="scadabridge"; o.SiteId=siteId; o.NodeRole=nodeRole; })`
— its enrichers reproduce the factory's `SiteId`/`NodeRole`/`NodeHostname`. Keep a minimal
`CreateBootstrapLogger()` line for early-startup capture per the library's documented pattern,
then delete `LoggerConfigurationFactory`. Verify the existing sinks are config-driven (`Serilog`
section in `appsettings`) so the swap is byte-equivalent; mirror any code-side sinks into config.
### MxAccessGateway (`main`) — heaviest (the MEL→Serilog migration)
MEL-only; custom `MxGateway.Server` meter **not exported**; no `/metrics`. The x86 net48 worker is
a separate process and **out of scope** — telemetry is for the Server.
- **Logging (MEL → Serilog):**
- Add Serilog packages (`Serilog.AspNetCore` + sinks) to the Server csproj (direct versioned ref).
- Replace the temporary `LoggerFactory.Create(...)` MEL bootstrap in `GatewayApplication.cs`
(and `builder.Logging` config) with `builder.AddZbSerilog(o => { o.ServiceName="mxgateway"; })`
+ a `CreateBootstrapLogger()` line.
- `GatewayLogScope` → `Serilog.Context.LogContext.PushProperty(...)`.
- `GatewayLogRedactor` → implement the `ILogRedactor` seam, register in DI (picked up by
`RedactionEnricher`).
- Request-logging middleware → `UseSerilogRequestLogging()` (or keep the middleware but emit via
a Serilog `ILogger`). Sinks to `appsettings`.
- **Metrics:** `builder.AddZbTelemetry(o => { o.ServiceName="mxgateway"; o.Meters=["MxGateway.Server"]; })`
+ `app.MapZbMetrics()` → the 20 existing instruments (13 counters, 3 histograms, 4 gauges) finally
export. **Keep the `MxGateway.Server` meter name and the `ms` histogram units** (rename and unit
conversion are deferred, §6). `GetSnapshot()` in-memory read path stays untouched.
---
## 4. Shared seam
```
ZbTelemetryOptions (ServiceName / SiteId / NodeRole / Meters / ActivitySources / Exporter)
┌─────────────────┴──────────────────┐
AddZbTelemetry (core) AddZbSerilog (.Serilog)
• ZbResource (identity triple) • ReadFrom.Configuration sinks
• app Meters + ActivitySources • SiteId / NodeRole / NodeHostname enrichers
• standard instrumentation • TraceContextEnricher (trace_id / span_id)
• Prometheus always + OTLP opt-in • ILogRedactor seam (RedactionEnricher)
│ │
app.MapZbMetrics() → /metrics preserveStaticLogger: true (test-safe)
```
Both packages share the single `ZbTelemetryOptions`. The Serilog OTLP log sink derives its Resource
attributes from `ZbResource.BuildAttributes` (single source of truth), so logs can never drift from
metrics and traces in a backend.
---
## 5. Sequencing & execution
Subagent-driven, classification-driven review chain. **Task 0 gates everything** (verify/publish the
feed). Then three **independent** per-repo phases — each its own git repo, branch
**`feat/adopt-zb-telemetry`**, commit per task, **never skip hooks, never force-push**:
1. **Task 0 (gating):** verify the two Telemetry `.nupkg`s are on the Gitea feed; pack + push if
missing (creds-only user config, already set).
2. **OtOpcUa:** source-mapping + package refs → `AddZbTelemetry` swap → `AddZbSerilog` swap → tests.
3. **ScadaBridge:** source-mapping + package refs → `AddZbTelemetry` (both roots) → `AddZbSerilog`
(replace `LoggerConfigurationFactory`) → tests.
4. **MxAccessGateway:** source-mapping + package refs → **MEL→Serilog** (sub-tasked, `high-risk`) →
`AddZbTelemetry` metrics export → tests.
5. **scadaproj bookkeeping:** add an "Adoption status — DONE" section to
`components/observability/GAPS.md` (per-repo table + deferred items), **and correct the false
"MxGateway logging already adopted" claim** in CLAUDE.md, the library CLAUDE.md, and
`components/observability/README.md`.
The MxGateway MEL→Serilog migration is the one `high-risk` change (logging behaviour on the most
operational app) and gets the full spec→code serial review chain. The other per-app swaps are
`standard`.
---
## 6. Deferred (out of scope this round; recorded in GAPS)
| # | Item | Why deferred |
|---|---|---|
| #6 | MxGateway histogram `ms` → `s` | Breaking dashboard/alert change — needs ops coordination |
| #7 | MxGateway meter rename `MxGateway.Server` → `ZB.MOM.WW.MxGateway` | Breaking Prometheus label change — needs ops coordination |
| #9 | ScadaBridge app instruments (`ScadaBridgeTelemetry` + `scadabridge.*`) | Application-specific work, not shared-library adoption |
| #10 | OtOpcUa OTLP exporter alongside Prometheus | Opt-in; no consumer for OTLP yet |
| #11 | OtOpcUa trace-export no-op (spans recorded, no exporter) | Resolved by #10 / OTLP; or document |
None of these block the behaviour-preserving initial adoption.
---
## 7. Testing
All tests run **offline** — Prometheus is in-process, no OTLP collector required, and the library's
own test suites are network-free.
- **OtOpcUa:** assert `/metrics` is still served, the `ZB.MOM.WW.OtOpcUa` meter is present, the
Resource carries `service.name`, and the shared Serilog enrichers are wired.
- **ScadaBridge:** assert `/metrics` is served in **both** roles, the logger carries
`SiteId`/`NodeRole` enrichers, and startup is clean after `LoggerConfigurationFactory` removal.
- **MxAccessGateway** (the careful one): assert log messages are still emitted at the same levels,
redaction still applies, request logging still fires, `/metrics` is now served, and the
`GetSnapshot()` path is unchanged — using the existing fake-worker test harness (no MXAccess
needed).
---
## 8. Acceptance bar
- Each app builds and its test suite is green.
- `/metrics` serves the same existing series (plus additive standard-instrumentation series); meter
names and units unchanged.
- Logs carry the same messages at the same levels, plus the shared identity enrichers and
`trace_id`/`span_id` correlation.
- No secrets committed to any repo (the Gitea token stays creds-only at the user level).
- `components/observability/GAPS.md` updated; the false "MxGateway logging adopted" claim corrected.
@@ -0,0 +1,848 @@
# ZB.MOM.WW.Telemetry Adoption Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
**Goal:** Adopt the shared `ZB.MOM.WW.Telemetry` + `ZB.MOM.WW.Telemetry.Serilog` packages across OtOpcUa, MxAccessGateway, and ScadaBridge — giving all three the OTel Resource identity triple, standard instrumentation, Prometheus `/metrics`, and shared Serilog correlation — behaviour-preserving, with breaking items deferred.
**Architecture:** Gitea-registry distribution (`dohertj2-gitea`, creds-only at user level). Each app references the shared packages and swaps its bespoke wiring for `AddZbTelemetry` / `AddZbSerilog`, keeping existing meter names, units, log messages, and the `/metrics` path. Each sister repo is its own git repo; work happens on branch `feat/adopt-zb-telemetry`, one commit per task, **never skip hooks, never force-push.**
**Tech Stack:** .NET 10, OpenTelemetry SDK, Prometheus exporter, Serilog, NuGet Central Package Management (OtOpcUa + ScadaBridge; MxGateway has none).
**Source design:** [`2026-06-01-telemetry-library-adoption-design.md`](2026-06-01-telemetry-library-adoption-design.md)
---
## Two refinements discovered during planning (deviations from the design doc)
Both serve the approved **behaviour-preserving** acceptance bar:
1. **ScadaBridge logging — KEEP `LoggerConfigurationFactory`.** The design doc said "delete the
factory and swap to `AddZbSerilog`." Code review showed the factory implements a documented
governance contract (REQ-HOST-8 / Host-011/014/020/022): `ScadaBridge:Logging:MinimumLevel` is
the floor and **overrides** `Serilog:MinimumLevel`, with operator warnings when both are set or
a level is mistyped. `AddZbSerilog` hard-codes `MinimumLevel.Is(Information)` *before*
`ReadFrom.Configuration`, which inverts that precedence and silently drops the
`ScadaBridge:Logging:MinimumLevel` knob (and breaks its tests). **Plan: keep the factory, add the
shared `TraceContextEnricher` to it** (gaining trace↔log correlation) and do NOT adopt
`AddZbSerilog` for ScadaBridge. ScadaBridge still fully adopts the metrics/Resource half.
2. **MxGateway logging — keep `GatewayLogScope` + request-logging middleware as-is.** The Serilog
MEL provider captures MEL `BeginScope` dictionaries as structured properties, so the existing
middleware keeps producing the same scope properties once Serilog is the provider. The only
logging code changes are: register Serilog as the provider (`AddZbSerilog`), migrate the
`appsettings` `Logging` section to a `Serilog` section, and wrap the static `GatewayLogRedactor`
behind the `ILogRedactor` seam. No rewrite of working scope code.
---
## Execution order & parallelism
- **Task 0 gates everything** (packages must be on the feed before any repo can restore).
- After Task 0, the **three repo phases are independent** (separate working directories) and may run
concurrently: OtOpcUa (Tasks 1–3), ScadaBridge (Tasks 4–6), MxGateway (Tasks 7–11).
- **Within a repo, tasks are sequential** (same working tree / same branch — do not dispatch two
implementers against one repo concurrently).
- **Task 12** (scadaproj bookkeeping) runs last, after all three phases land.
Branch setup (first task in each repo creates it): `git checkout -b feat/adopt-zb-telemetry` from the
repo's default branch (`master` for OtOpcUa, `main` for the others).
---
## Task 0: Publish/verify Telemetry packages on the Gitea feed
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (gates all)
**Files:**
- Work in: `/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`
- No repo files edited (publish only). Credentials already at `~/.nuget/NuGet/NuGet.Config`.
**Context:** The library CLAUDE.md claims these are "published to the Gitea NuGet feed." The Health
round proved that claim unreliable. Verify; pack + push only if missing. Mirrors Health Task 0.
**Step 1: Check whether `ZB.MOM.WW.Telemetry` 0.1.0 is already on the feed**
```bash
cd /Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry
# Use the user-level creds (source name dohertj2-gitea) already configured.
dotnet nuget list source # confirm dohertj2-gitea is NOT registered globally (creds are user-level only)
curl -s -u "dohertj2:$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')" \
"https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/zb.mom.ww.telemetry/index.json" -o /tmp/tele.json -w "%{http_code}\n"
```
Expected: `200` if already published (then SKIP to Step 4), `404` if missing (continue).
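
The branch on the status code can be sketched as a small helper. The `200` and `404` outcomes come from the step above; the catch-all `investigate` branch (for example a `401` from a bad token) is an assumption, since the plan does not say how to handle other codes. The function name is hypothetical.

```bash
#!/bin/sh
# Map the HTTP status printed by the Step 1 curl call to the next action.
next_step_for_status() {
  case "$1" in
    200) echo "skip-to-step-4" ;;   # already published
    404) echo "pack-and-push" ;;    # missing: continue with Steps 2 and 3
    *)   echo "investigate" ;;      # assumption: anything else needs a human look
  esac
}
```

For example, `next_step_for_status 404` prints `pack-and-push`.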
**Step 2: Pack the two packages (only if missing)**
```bash
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
ls ./artifacts/*.nupkg
```
Expected: `ZB.MOM.WW.Telemetry.0.1.0.nupkg` and `ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg`.
**Step 3: Push both to Gitea (only if missing)**
```bash
TOKEN=$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')
for pkg in ./artifacts/ZB.MOM.WW.Telemetry.0.1.0.nupkg ./artifacts/ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg; do
dotnet nuget push "$pkg" --source "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" --api-key "$TOKEN"
done
```
Expected: `Your package was pushed.` for each (or `409 Conflict` if a version already exists — acceptable).
**Step 4: Verify both ids resolve**
```bash
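# Re-derive TOKEN here: Step 3 (which sets it) is skipped when the packages already exist.
TOKEN=$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')
for id in zb.mom.ww.telemetry zb.mom.ww.telemetry.serilog; do
  curl -s -u "dohertj2:$TOKEN" "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/$id/index.json" -w " -> %{http_code}\n" -o /dev/null
done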
```
Expected: `-> 200` for both.
**Step 5: No commit** (publish-only task). Record completion.
> **SECURITY:** the Gitea token must NEVER be written into any repo file or commit. It lives only in
> `~/.nuget/NuGet/NuGet.Config`. The `curl`/`push` commands read it from there at runtime.
---
## Task 1: OtOpcUa — distribution wiring (source mapping + package refs)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 4, Task 7 (other repos)
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/NuGet.config`
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/Directory.Packages.props`
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj`
**Step 1: Branch**
```bash
cd /Users/dohertj2/Desktop/OtOpcUa && git checkout master && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
```
**Step 2: Add Telemetry patterns to `NuGet.config`** — under `<packageSource key="dohertj2-gitea">`, add BOTH patterns (the `.*` glob does NOT match the bare core id):
```xml
<packageSource key="dohertj2-gitea">
<package pattern="ZB.MOM.WW.Health" />
<package pattern="ZB.MOM.WW.Health.*" />
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
</packageSource>
```
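
The reason both patterns are needed can be illustrated with ordinary shell globbing. This is only an analogy: it assumes NuGet's trailing-`*` prefix matching behaves like a shell glob for these ids, and `matches_pattern` is a hypothetical helper, not part of any NuGet tooling.

```bash
#!/bin/sh
# Return 0 when package id $1 matches pattern $2 (shell-glob semantics).
matches_pattern() {
  case "$1" in
    $2) return 0 ;;   # unquoted on purpose so "*" acts as a wildcard
    *)  return 1 ;;
  esac
}

# The ".*" glob requires the literal dot, so the bare core id falls through.
matches_pattern 'ZB.MOM.WW.Telemetry.Serilog' 'ZB.MOM.WW.Telemetry.*' && echo "Serilog: matched by glob"
matches_pattern 'ZB.MOM.WW.Telemetry' 'ZB.MOM.WW.Telemetry.*' || echo "core id: NOT matched by glob"
matches_pattern 'ZB.MOM.WW.Telemetry' 'ZB.MOM.WW.Telemetry' && echo "core id: matched by exact pattern"
```

In this model the bare id is covered only by the exact pattern and the `.Serilog` id only by the glob, so dropping either line leaves one package unmapped.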
**Step 3: Add versions to `Directory.Packages.props`** (next to the Health `<PackageVersion>` lines):
```xml
<PackageVersion Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
```
**Step 4: Add versionless refs to the Host csproj** (next to the `ZB.MOM.WW.Health` refs):
```xml
<PackageReference Include="ZB.MOM.WW.Telemetry" />
<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" />
```
**Step 5: Restore + build to confirm the Gitea feed resolves and Serilog floor is satisfied**
```bash
dotnet restore ZB.MOM.WW.OtOpcUa.slnx
dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
```
Expected: restore pulls both packages from `dohertj2-gitea`; build succeeds. If restore fails on a
`Serilog.AspNetCore` floor (OtOpcUa pins 9.0.0), bump `Serilog.AspNetCore` (and the related
`Serilog.*` 9.x lines) in `Directory.Packages.props` to the floor the package requires, then rebuild.
**Step 6: Commit**
```bash
git add NuGet.config Directory.Packages.props src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj
git commit -m "build(otopcua): reference ZB.MOM.WW.Telemetry packages from Gitea feed"
```
---
## Task 2: OtOpcUa — swap OTel wiring to AddZbTelemetry
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (within OtOpcUa)
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs` (rewrite body; keep both method names + signatures)
- Test (oracle, do not edit): `/Users/dohertj2/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Observability/OtOpcUaTelemetryHookTests.cs`
**Context:** Today `AddOtOpcUaObservability()` (called at `Program.cs:138`) hand-wires
`AddOpenTelemetry().WithMetrics(...AddMeter("ZB.MOM.WW.OtOpcUa")...AddPrometheusExporter()).WithTracing(...AddSource("ZB.MOM.WW.OtOpcUa"))`,
and `MapOtOpcUaMetrics()` (called at `Program.cs:160`) maps `/metrics`. Keep both call sites
unchanged; rewrite the extension bodies to delegate to the shared library. **Same meter/source
names + same `/metrics` path** ⇒ behaviour-preserving; gains the Resource identity triple +
standard instrumentation.
**Step 1: Rewrite `ObservabilityExtensions.cs`** preserving the two public method signatures:
```csharp
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.OtOpcUa.Commons.Observability; // OtOpcUaTelemetry
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.OtOpcUa.Host.Observability;
/// <summary>
/// OtOpcUa observability wiring, delegated to the shared ZB.MOM.WW.Telemetry library.
/// Keeps the existing meter/ActivitySource names ("ZB.MOM.WW.OtOpcUa") and the "/metrics"
/// scrape path, and adds the shared OTel Resource + standard instrumentation.
/// </summary>
public static class ObservabilityExtensions
{
public static IServiceCollection AddOtOpcUaObservability(this IServiceCollection services)
{
ArgumentNullException.ThrowIfNull(services);
return services.AddZbTelemetry(o =>
{
o.ServiceName = "otopcua";
o.Meters = [OtOpcUaTelemetry.MeterName]; // "ZB.MOM.WW.OtOpcUa"
o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; // "ZB.MOM.WW.OtOpcUa"
// Exporter defaults to Prometheus — preserves the existing /metrics posture.
});
}
// Keep the SAME signature the Program.cs:160 call site uses (app.MapOtOpcUaMetrics()).
// MapZbMetrics() maps MapPrometheusScrapingEndpoint() whose default path is "/metrics".
public static IEndpointRouteBuilder MapOtOpcUaMetrics(this IEndpointRouteBuilder endpoints)
{
ArgumentNullException.ThrowIfNull(endpoints);
endpoints.MapZbMetrics();
return endpoints;
}
}
```
> If the existing `MapOtOpcUaMetrics` extends `WebApplication`/`IApplicationBuilder` rather than
> `IEndpointRouteBuilder`, keep THAT receiver type and call `app.MapZbMetrics();` — match the
> current signature so `Program.cs:160` compiles unchanged.
**Step 2: Build**
```bash
cd /Users/dohertj2/Desktop/OtOpcUa && dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
```
Expected: PASS. (The now-redundant direct `OpenTelemetry.Extensions.Hosting` /
`OpenTelemetry.Exporter.Prometheus.AspNetCore` refs may stay — they resolve the same assemblies the
shared package brings; leaving them is lower-risk than pruning.)
**Step 3: Run the telemetry hook tests (the behaviour oracle)**
```bash
dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~OtOpcUaTelemetryHookTests"
```
Expected: PASS — the meter `ZB.MOM.WW.OtOpcUa` and ActivitySource still emit (the shared
`AddZbTelemetry` registered them via `o.Meters`/`o.ActivitySources`).
**Step 4: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs
git commit -m "feat(otopcua): wire OTel via AddZbTelemetry (shared Resource + std instrumentation)"
```
---
## Task 3: OtOpcUa — swap Serilog to AddZbSerilog + move sinks to config
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (within OtOpcUa)
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:49-52` (the inline `UseSerilog` block)
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json` (currently `{}`)
- Test (oracle): `/Users/dohertj2/Desktop/OtOpcUa/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Observability/LogContextEnricherTests.cs`
**Context:** Today `Program.cs:49-52` configures Serilog in code with `ReadFrom.Configuration` +
`WriteTo.Console()` + `WriteTo.File("logs/otopcua-.log", rollingInterval: Day)`. `AddZbSerilog` uses
`ReadFrom.Configuration` only, so the Console/File sinks must move into config to be reproduced. The
role-specific `appsettings.*.json` already carry `Serilog:MinimumLevel` overrides — those keep
working through `ReadFrom.Configuration`.
**Step 1: Add the sinks to `appsettings.json`** (replace the empty `{}`):
```json
{
  "Serilog": {
    "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
    "WriteTo": [
      { "Name": "Console" },
      { "Name": "File", "Args": { "path": "logs/otopcua-.log", "rollingInterval": "Day" } }
    ]
  }
}
```
> Do NOT add `"Enrich": ["FromLogContext"]` unless it is already enabled today — adding it would
> newly surface driver-scope properties and change output. Preserve the current enrich set.
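
A quick way to confirm the sinks really landed in config is a plain-text check on the file. The sketch below is only an illustration: `has_serilog_sink` is a hypothetical helper (not part of the repo), and it matches text rather than parsing JSON, so treat a pass as a hint, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when FILE contains a Serilog WriteTo entry
# whose "Name" equals SINK. Text match only; it does not parse JSON.
has_serilog_sink() {
  local file="$1" sink="$2"
  grep -Eq "\"Name\"[[:space:]]*:[[:space:]]*\"${sink}\"" "$file"
}
```

Possible usage from the OtOpcUa repo root: `has_serilog_sink src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json Console && has_serilog_sink src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json File`.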
**Step 2: Replace the inline `UseSerilog` block in `Program.cs`.** Remove lines 49-52:
```csharp
builder.Host.UseSerilog((ctx, lc) => lc
    .ReadFrom.Configuration(ctx.Configuration)
    .WriteTo.Console()
    .WriteTo.File("logs/otopcua-.log", rollingInterval: RollingInterval.Day));
```
and replace with:
```csharp
builder.AddZbSerilog(o => o.ServiceName = "otopcua");
```
Add `using ZB.MOM.WW.Telemetry.Serilog;` to the `using` block. Keep `app.UseSerilogRequestLogging();`
(line 141) unchanged. Keep the existing `using Serilog;`: `UseSerilogRequestLogging` is an extension
method in that namespace, and `RollingInterval` lived there too, so there is no separate import to remove.
**Step 3: Build + run the LogContextEnricher tests**
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~LogContextEnricherTests"
```
Expected: build PASS; tests PASS (the static `LogContextEnricher.Push` helper is unaffected — it is
not registered in DI and AddZbSerilog does not change its disposable contract).
**Step 4: Sanity-check that logs still emit** (no automated log-output harness here):
```bash
# Quick smoke: build runs; optionally run the host briefly in a role that doesn't need infra
# and confirm console log lines appear. If no safe role exists, rely on the build + the request-
# logging path remaining wired (UseSerilogRequestLogging at Program.cs:141).
```
**Step 5: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json
git commit -m "feat(otopcua): adopt AddZbSerilog (shared enrichers + trace correlation); sinks to config"
```
---
## Task 4: ScadaBridge — distribution wiring (source mapping + package refs)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 7 (other repos)
**Files:**
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/nuget.config`
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/Directory.Packages.props`
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`
**Step 1: Branch**
```bash
cd /Users/dohertj2/Desktop/ScadaBridge && git checkout main && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
```
**Step 2: Add Telemetry patterns to `nuget.config`** under `<packageSource key="dohertj2-gitea">`:
```xml
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
```
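Both patterns are needed because a trailing `*` makes a pattern a prefix match, so `ZB.MOM.WW.Telemetry.*` covers `ZB.MOM.WW.Telemetry.Serilog` but not the bare `ZB.MOM.WW.Telemetry` ID. The following is a minimal sketch of that single-pattern rule, assuming case-insensitive comparison; `matches_pattern` is a hypothetical name, and NuGet's precedence between several matching patterns is not modelled.

```shell
#!/usr/bin/env bash
# Sketch of package-source-mapping matching for ONE pattern (hypothetical helper).
# IDs compare case-insensitively; a trailing '*' turns the pattern into a prefix.
matches_pattern() {
  local id pattern prefix
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  pattern=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$pattern" in
    *'*')
      prefix="${pattern%\*}"          # drop the trailing '*'
      [[ "$id" == "$prefix"* ]]
      ;;
    *)
      [[ "$id" == "$pattern" ]]       # exact ID match
      ;;
  esac
}
```

For example, `matches_pattern ZB.MOM.WW.Telemetry 'ZB.MOM.WW.Telemetry.*'` fails, which is why the exact-ID pattern is listed alongside the wildcard one.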
**Step 3: Add versions to `Directory.Packages.props`** (next to the Health lines):
```xml
<PackageVersion Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
```
**Step 4: Add versionless refs to the Host csproj** (next to the Health refs):
```xml
<PackageReference Include="ZB.MOM.WW.Telemetry" />
<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" />
```
> `ZB.MOM.WW.Telemetry.Serilog` is referenced here only for the public `TraceContextEnricher` type
> used in Task 6 — ScadaBridge does NOT call `AddZbSerilog`.
**Step 5: Restore + build** (watch for OTel version conflicts with the pinned `OpenTelemetry.Api 1.15.3`)
```bash
dotnet restore ZB.MOM.WW.ScadaBridge.slnx
dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
```
Expected: PASS. If a transitive OTel version conflicts with the CVE-override `OpenTelemetry.Api`,
align the override version to what the shared package requires.
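
To see which OpenTelemetry versions actually resolved before touching the override, the package listing can be filtered. This is a sketch under assumptions: `otel_versions` is a hypothetical helper, and it relies on the usual `dotnet list package` row layout (`> <id> ... <resolved>`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `dotnet list package --include-transitive` output
# on stdin and prints "<package> <resolved-version>" for OpenTelemetry packages.
# Assumes each package row starts with ">" and ends with the resolved version.
otel_versions() {
  awk '$1 == ">" && $2 ~ /^OpenTelemetry/ { print $2, $NF }'
}
```

Possible usage: `dotnet list ZB.MOM.WW.ScadaBridge.slnx package --include-transitive | otel_versions | sort -u` (this assumes the installed SDK accepts a `.slnx` solution for `dotnet list`). More than one version for the same package ID in the output is the signal to align the override.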
**Step 6: Commit**
```bash
git add nuget.config Directory.Packages.props src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj
git commit -m "build(scadabridge): reference ZB.MOM.WW.Telemetry packages from Gitea feed"
```
---
## Task 5: ScadaBridge — AddZbTelemetry in both composition roots + MapZbMetrics
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (within ScadaBridge)
**Files:**
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs` (`BindSharedOptions`, ~lines 100-117 — add the registration; called by BOTH roots)
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs` (Central endpoint section ~206-259; Site endpoint section ~307-320 — add `app.MapZbMetrics()` in each)
- Test: `/Users/dohertj2/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/` (add a `/metrics`-served assertion; HealthCheckTests pattern with `WebApplicationFactory<Program>`)
**Context:** ScadaBridge has NO OTel today (only the `OpenTelemetry.Api` CVE override). `SiteId`,
`NodeRole`, `NodeHostname` are available from config (`ScadaBridge:Node:*`). `BindSharedOptions` is
called by both the Central and Site roots, so registering telemetry there covers both without
duplication. This is purely additive (no metrics exist to break).
**Step 1: Register telemetry in `BindSharedOptions`.** Inside `SiteServiceRegistration.BindSharedOptions(IServiceCollection services, IConfiguration config)`, after the existing `services.Configure<...>` calls, add:
```csharp
// Shared OTel: Resource identity (service.name / site.id / node.role) + standard instrumentation
// + Prometheus exporter. Mounted at /metrics by app.MapZbMetrics() in each composition root.
services.AddZbTelemetry(o =>
{
    o.ServiceName = "scadabridge";
    o.SiteId = config["ScadaBridge:Node:SiteId"] ?? "central";
    o.NodeRole = config["ScadaBridge:Node:Role"];
    // o.Meters left empty: application instruments are a deferred follow-on (GAPS #9).
});
```
Add `using ZB.MOM.WW.Telemetry;`. (Use the SAME default `?? "central"` for SiteId that
`Program.cs:45` uses, so the Resource attribute matches the log enricher value.)
**Step 2: Map `/metrics` in BOTH roots.** In `Program.cs`:
- Central block — after `app.UseRouting()` and alongside the other `Map*` calls (e.g. just after `app.MapZbHealth();`), add:
```csharp
app.MapZbMetrics();
```
- Site block — in its endpoint section (where `app.MapGrpcService<...>()` is mapped, ~307-320), add:
```csharp
app.MapZbMetrics();
```
Add `using ZB.MOM.WW.Telemetry;` to `Program.cs` if not already present. `MapZbMetrics()` requires
routing; the Central block already calls `UseRouting()`, and the Site block's `MapGrpcService`
implies endpoint routing — if the Site app lacks `UseRouting()`, add it before `MapZbMetrics()`.
**Step 3: Add a `/metrics` integration test** in the Host.Tests project (mirror `HealthCheckTests`):
```csharp
[Fact]
public async Task Metrics_Endpoint_IsMapped()
{
    using var factory = /* existing WebApplicationFactory<Program> setup for Central role */;
    using var client = factory.CreateClient();
    var response = await client.GetAsync("/metrics");
    Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    var body = await response.Content.ReadAsStringAsync();
    Assert.Contains("# ", body); // Prometheus exposition format (HELP/TYPE comments)
}
```
> Reuse the exact `WebApplicationFactory<Program>` + in-memory config bootstrapping that
> `HealthCheckTests.cs` already uses for the Central role (it sets the env to "Central" and removes
> the Akka hosted service). Do not invent a new harness.
**Step 4: Build + test**
```bash
cd /Users/dohertj2/Desktop/ScadaBridge
dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter "FullyQualifiedName~HealthCheckTests|FullyQualifiedName~Metrics_Endpoint_IsMapped|FullyQualifiedName~CompositionRoot"
```
Expected: PASS (existing composition-root + health tests stay green; new metrics test passes).
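
For a manual look at a running host, the scrape body can be summarised by counting its `# TYPE` lines, one per exposed metric family. This is a rough sketch; `count_metric_families` is a hypothetical helper, and the URL in the usage line is a placeholder for whatever address the host binds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts "# TYPE" lines in Prometheus text exposition
# read from stdin. Prints 0 (and still succeeds) when there are none.
count_metric_families() {
  grep -c '^# TYPE ' || true
}
```

Possible usage: `curl -s http://localhost:5000/metrics | count_metric_families`, where the port is an assumption.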
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs src/ZB.MOM.WW.ScadaBridge.Host/Program.cs tests/ZB.MOM.WW.ScadaBridge.Host.Tests/
git commit -m "feat(scadabridge): wire AddZbTelemetry + /metrics in both composition roots"
```
---
## Task 6: ScadaBridge — add shared TraceContextEnricher to LoggerConfigurationFactory
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (within ScadaBridge)
**Files:**
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs` (the `Build` return expression)
- Test (oracle): `/Users/dohertj2/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/SerilogTests.cs` (+ any `LoggerConfigurationFactory` tests)
**Context (deviation from design doc — see top of plan):** KEEP `LoggerConfigurationFactory` intact
(it owns the Host-011/014/020/022 minimum-level governance). Only add the shared
`TraceContextEnricher` so logs emitted inside a span carry `trace_id`/`span_id` and can be joined to
traces. This gains the cross-cutting correlation win without regressing ScadaBridge's logging
contract.
**Step 1: Add the enricher to the `Build` return.** In `LoggerConfigurationFactory.Build(...)`, the
final expression currently ends:
```csharp
return new LoggerConfiguration()
    .ReadFrom.Configuration(configuration)
    .MinimumLevel.Is(minimumLevel)
    .Enrich.WithProperty("SiteId", siteId)
    .Enrich.WithProperty("NodeHostname", nodeHostname)
    .Enrich.WithProperty("NodeRole", nodeRole);
```
Add the shared enricher as the last `.Enrich`:
```csharp
    .Enrich.WithProperty("NodeRole", nodeRole)
    .Enrich.With(new ZB.MOM.WW.Telemetry.Serilog.TraceContextEnricher());
```
(Or add `using ZB.MOM.WW.Telemetry.Serilog;` and use `.Enrich.With(new TraceContextEnricher())`.)
**Step 2: Build + run the Serilog tests**
```bash
cd /Users/dohertj2/Desktop/ScadaBridge
dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter "FullyQualifiedName~SerilogTests|FullyQualifiedName~LoggerConfiguration"
```
Expected: PASS. The three node-identity enrichers and the min-level governance are untouched;
`trace_id`/`span_id` only appear when an `Activity.Current` exists (none in these tests → no change
to asserted properties).
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs
git commit -m "feat(scadabridge): add shared TraceContextEnricher to log pipeline (trace correlation)"
```
---
## Task 7: MxAccessGateway — distribution wiring (source mapping + package refs)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 4 (other repos)
**Files:**
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/nuget.config`
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (NO CPM — direct versioned refs)
**Step 1: Branch**
```bash
cd /Users/dohertj2/Desktop/MxAccessGateway && git checkout main && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
```
**Step 2: Add Telemetry patterns to `nuget.config`** under `<packageSource key="dohertj2-gitea">`:
```xml
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
```
**Step 3: Add direct versioned refs to the Server csproj** (in the main `<ItemGroup>` of `<PackageReference>`s). MxGateway has no Serilog/OTel today, so it needs the shared packages AND the concrete sink assemblies referenced by the `appsettings` `Using` block:
```xml
<PackageReference Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
<PackageReference Include="Serilog.AspNetCore" Version="10.0.0" />
<PackageReference Include="Serilog.Sinks.Console" Version="6.1.1" />
<PackageReference Include="Serilog.Sinks.File" Version="7.0.0" />
```
> Versions align with ScadaBridge's pins (Serilog.AspNetCore 10.0.0, Console 6.1.1, File 7.0.0). If
> the `.Serilog` package requires a different `Serilog.AspNetCore` floor, match it.
**Step 4: Restore + build**
```bash
dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
```
Expected: PASS (packages resolve from Gitea + nuget.org).
**Step 5: Commit**
```bash
git add nuget.config src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
git commit -m "build(mxgateway): reference ZB.MOM.WW.Telemetry + Serilog packages"
```
---
## Task 8: MxAccessGateway — migrate appsettings Logging → Serilog section
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (within MxGateway)
**Files:**
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/appsettings.json`
**Context:** Current `Logging` (MEL) section: `Default: Information`, `Microsoft.AspNetCore: Warning`.
`AddZbSerilog` reads sinks/levels via `ReadFrom.Configuration` from a `Serilog` section. Translate
the levels and add Console + File sinks so logging output is preserved after the provider swap.
**Step 1: Replace the `Logging` block with a `Serilog` block.** Remove:
```json
"Logging": {
  "LogLevel": { "Default": "Information", "Microsoft.AspNetCore": "Warning" }
},
```
Add:
```json
"Serilog": {
  "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
  "MinimumLevel": {
    "Default": "Information",
    "Override": { "Microsoft.AspNetCore": "Warning" }
  },
  "WriteTo": [
    { "Name": "Console" },
    { "Name": "File", "Args": { "path": "logs/mxgateway-.log", "rollingInterval": "Day" } }
  ]
},
```
> Keep the rest of `appsettings.json` (gateway config) unchanged. Note: `AddZbSerilog` applies its
> own `MinimumLevel.Is(Information)` before `ReadFrom.Configuration`, so the `Serilog:MinimumLevel`
> section above is read afterwards and takes precedence: the default level stays at Information and
> `Microsoft.AspNetCore` is overridden to Warning, matching today's MEL levels.
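
The two levels in this file happen to share names in both systems, but that is not true for every MEL level. As a reference sketch for translating other entries, the mapping below pairs MEL names with their closest Serilog names; `mel_to_serilog_level` is a hypothetical helper, and MEL's `None` is deliberately left unmapped because Serilog has no direct equivalent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the Serilog level name closest to a MEL LogLevel.
# Returns non-zero for names it does not know (including MEL's "None").
mel_to_serilog_level() {
  case "$1" in
    Trace)       echo "Verbose" ;;
    Debug)       echo "Debug" ;;
    Information) echo "Information" ;;
    Warning)     echo "Warning" ;;
    Error)       echo "Error" ;;
    Critical)    echo "Fatal" ;;
    *)           echo "unmapped: $1" >&2; return 1 ;;
  esac
}
```

For instance, a MEL override of `Trace` would become `Verbose` under `Serilog:MinimumLevel:Override`.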
**Step 2: Commit** (config-only; build happens in Task 9 once the provider is wired)
```bash
git add src/ZB.MOM.WW.MxGateway.Server/appsettings.json
git commit -m "config(mxgateway): translate MEL Logging section to Serilog"
```
---
## Task 9: MxAccessGateway — wire AddZbSerilog (MEL → Serilog provider swap)
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (within MxGateway)
**Files:**
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (`CreateBuilder`, after `ConfigureSelfSignedTls(builder)` ~line 63)
- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs` (add a provider-swap assertion)
**Context (high-risk — logging on the most operational app):** Register Serilog as the host's
logging provider so all existing MEL `ILogger`/`ILoggerFactory` calls (including
`UseGatewayRequestLoggingScope`'s middleware) route through Serilog. The Serilog MEL provider
captures MEL `BeginScope` dictionaries as structured properties, so `GatewayLogScope` and the
request-logging middleware keep working unchanged. The temporary `LoggerFactory.Create(...AddConsole())`
at lines 96-100 (used only by the TLS cert provider) may remain as-is.
**Step 1: Add the failing test** in `GatewayApplicationTests.cs` — assert the logger factory is now Serilog-backed:
```csharp
[Fact]
public void Build_UsesSerilogLoggerProvider()
{
    using var app = GatewayApplication.Build([]);
    var factory = app.Services.GetRequiredService<ILoggerFactory>();
    // Serilog.Extensions.Hosting registers SerilogLoggerFactory when AddSerilog replaces the factory.
    Assert.Equal("SerilogLoggerFactory", factory.GetType().Name);
}
```
**Step 2: Run it — expect FAIL** (`dotnet test ... --filter Build_UsesSerilogLoggerProvider`) → today the factory is the default MEL `LoggerFactory`.
**Step 3: Wire `AddZbSerilog`.** In `GatewayApplication.CreateBuilder`, immediately after
`ConfigureSelfSignedTls(builder);`, add:
```csharp
builder.AddZbSerilog(o => o.ServiceName = "mxgateway");
```
Add `using ZB.MOM.WW.Telemetry.Serilog;`. (`AddZbSerilog` calls `services.AddSerilog(..., preserveStaticLogger: true)`,
which registers `SerilogLoggerFactory` — replacing the MEL factory, so default providers do not
double-log.)
**Step 4: Run the test — expect PASS**, then run the broader logging-adjacent suites:
```bash
cd /Users/dohertj2/Desktop/MxAccessGateway
dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayApplicationTests"
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~FakeWorker"
```
Expected: PASS — `Build_MapsCanonicalHealthEndpoints`, `Build_RegistersGatewayMetrics`, the
config-validation cases, and the fake-worker smoke all stay green; the new provider-swap test passes.
**Step 5: Verify no double console logging** — if `SerilogLoggerFactory` is confirmed in Step 4, the
default providers are bypassed and no extra step is needed. If you observe duplicated console lines
in any manual run, add `builder.Logging.ClearProviders();` immediately before `AddZbSerilog`.
**Step 6: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
git commit -m "feat(mxgateway): adopt AddZbSerilog — MEL→Serilog provider swap (behaviour-preserving)"
```
---
## Task 10: MxAccessGateway — wrap GatewayLogRedactor behind the ILogRedactor seam
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (within MxGateway)
**Files:**
- Create: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorSeam.cs`
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (register the seam in DI in `CreateBuilder`)
- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorSeamTests.cs`
**Context:** The shared `RedactionEnricher` applies any DI-registered `ILogRedactor` to every log
event before it reaches a sink. MxGateway's redaction lives in the static `GatewayLogRedactor`
(API-key Bearer tokens, client identity). Provide a thin `ILogRedactor` that redacts the relevant
log-event properties (`ClientIdentity`, `authorization`) via the existing static helper. Keep
`GatewayLogRedactor` for its current callers (`GatewayLogScope`, `DashboardRedactor`).
**Step 1: Write the failing test** (`GatewayLogRedactorSeamTests.cs`):
```csharp
using System.Collections.Generic;
using ZB.MOM.WW.MxGateway.Server.Diagnostics;
using Xunit;

public class GatewayLogRedactorSeamTests
{
    [Fact]
    public void Redact_MasksApiKeyInClientIdentity()
    {
        var redactor = new GatewayLogRedactorSeam();
        var props = new Dictionary<string, object?>
        {
            ["ClientIdentity"] = "Bearer mxgw_operator01_super-secret"
        };
        redactor.Redact(props);
        Assert.Equal("Bearer mxgw_operator01_[redacted]", props["ClientIdentity"]);
    }
}
```
**Step 2: Run it — expect FAIL** (type doesn't exist).
**Step 3: Implement `GatewayLogRedactorSeam.cs`:**
```csharp
using System;
using System.Collections.Generic;
using ZB.MOM.WW.Telemetry.Serilog;

namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;

/// <summary>
/// Adapts the static <see cref="GatewayLogRedactor"/> to the shared <see cref="ILogRedactor"/> seam
/// so the telemetry RedactionEnricher masks API-key/credential material on every log event.
/// </summary>
public sealed class GatewayLogRedactorSeam : ILogRedactor
{
    // Property names that may carry client identity or bearer-token material.
    private static readonly string[] IdentityKeys = ["ClientIdentity", "authorization", "Authorization"];

    public void Redact(IDictionary<string, object?> properties)
    {
        ArgumentNullException.ThrowIfNull(properties);
        foreach (var key in IdentityKeys)
        {
            // Only string values are redacted; other value types are left untouched.
            if (properties.TryGetValue(key, out var value) && value is string s)
            {
                properties[key] = GatewayLogRedactor.RedactClientIdentity(s);
            }
        }
    }
}
```
**Step 4: Register in DI.** In `GatewayApplication.CreateBuilder`, alongside the other singletons, add:
```csharp
builder.Services.AddSingleton<ZB.MOM.WW.Telemetry.Serilog.ILogRedactor, Diagnostics.GatewayLogRedactorSeam>();
```
**Step 5: Run the test + build**
```bash
cd /Users/dohertj2/Desktop/MxAccessGateway
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayLogRedactorSeamTests"
dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
```
Expected: PASS.
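
As an optional manual check once the seam is registered, captured log output can be scanned for key material that escaped redaction. The sketch below assumes API keys look like `mxgw_<id>_<secret>` with an alphanumeric id, as in the test above; `find_unredacted_keys` is a hypothetical helper, and a real key format may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log text on stdin and prints every token of the
# assumed form mxgw_<id>_<secret> whose secret part is NOT "[redacted]".
# Prints nothing (and still succeeds) when all keys are redacted.
find_unredacted_keys() {
  grep -Eo 'mxgw_[A-Za-z0-9]+_[^[:space:]"]+' | grep -v '_\[redacted\]$' || true
}
```

Possible usage: `cat logs/mxgateway-*.log | find_unredacted_keys`; any output line is a token worth investigating.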
**Step 6: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorSeam.cs src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorSeamTests.cs
git commit -m "feat(mxgateway): expose GatewayLogRedactor via shared ILogRedactor seam"
```
---
## Task 11: MxAccessGateway — wire AddZbTelemetry (export GatewayMetrics) + MapZbMetrics
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (within MxGateway)
**Files:**
- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (`CreateBuilder` after `AddSingleton<GatewayMetrics>()` ~line 72; `MapGatewayEndpoints` after `MapZbHealth()` ~line 177)
- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs` (add `/metrics`-served assertion) + existing `GatewayMetricsTests` as oracle
**Context:** The `MxGateway.Server` meter (13 counters, 3 ms-histograms, 4 gauges) exists but is
never exported (no OTel SDK, no `/metrics`). `AddZbTelemetry` with `Meters = ["MxGateway.Server"]`
registers the meter with the OTel MeterProvider + Prometheus exporter; `MapZbMetrics()` mounts
`/metrics`. **Keep the `MxGateway.Server` name and the `ms` histogram units** (rename #7 + unit #6
are deferred). `GetSnapshot()` is untouched.
**Step 1: Add `AddZbTelemetry` in `CreateBuilder`**, immediately after `builder.Services.AddSingleton<GatewayMetrics>();`:
```csharp
builder.AddZbTelemetry(o =>
{
    o.ServiceName = "mxgateway";
    o.Meters = [GatewayMetrics.MeterName]; // "MxGateway.Server", unchanged (rename deferred)
});
```
Add `using ZB.MOM.WW.Telemetry;`.
**Step 2: Map `/metrics` in `MapGatewayEndpoints`**, after `endpoints.MapZbHealth();`:
```csharp
endpoints.MapZbMetrics();
```
**Step 3: Add the served-endpoint test** in `GatewayApplicationTests.cs`:
```csharp
[Fact]
public async Task Build_MapsMetricsEndpoint()
{
    using var app = GatewayApplication.Build([]);
    await app.StartAsync();
    try
    {
        using var client = new HttpClient { BaseAddress = new Uri(app.Urls.First()) };
        var response = await client.GetAsync("/metrics");
        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
    finally { await app.StopAsync(); }
}
```
> If the existing test class already has a started-host helper (the config-validation tests call
> `StartAsync`), reuse it rather than starting a fresh host. Tests bind ephemeral ports (`:0`).
**Step 4: Build + test**
```bash
cd /Users/dohertj2/Desktop/MxAccessGateway
dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayApplicationTests|FullyQualifiedName~GatewayMetricsTests"
```
Expected: PASS — the `MeterListener`-based `GatewayMetricsTests` (Tests-027 isolation) stay green
because the meter name/instruments are unchanged; the new `/metrics` test passes.
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
git commit -m "feat(mxgateway): export GatewayMetrics via AddZbTelemetry + /metrics (name/units unchanged)"
```
---
## Task 12: scadaproj — bookkeeping (GAPS + correct the false "MxGateway logging adopted" claim)
**Classification:** trivial
**Estimated implement time:** ~4 min
**Parallelizable with:** none (runs after all repo phases)
**Files:**
- Modify: `/Users/dohertj2/Desktop/scadaproj/components/observability/GAPS.md` (add "Adoption status — 2026-06-01 (DONE)" section)
- Modify: `/Users/dohertj2/Desktop/scadaproj/components/observability/README.md` (correct the "MxGateway logging adopted" claim)
- Modify: `/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry/CLAUDE.md` (same correction)
- Modify: `/Users/dohertj2/Desktop/scadaproj/CLAUDE.md` (observability row + "MxAccessGateway logging adopted" note)
**Step 1: Add an adoption-status section to `GAPS.md`** with a per-repo table (what each app now
does), the **accepted scope note** (ScadaBridge keeps `LoggerConfigurationFactory` + adds
`TraceContextEnricher` rather than adopting `AddZbSerilog`; MxGateway keeps `GatewayLogScope`), and a
**Deferred** subsection listing #6 (histogram ms→s), #7 (meter rename), #9 (ScadaBridge app
instruments), #10/#11 (OTLP) as still-open.
**Step 2: Correct the false claim** everywhere it appears — the prior text said MxGateway's MEL→Serilog
migration was "done on its own branch." Replace with: "MxGateway MEL→Serilog migration + metrics
export landed on `main` via the 2026-06-01 telemetry adoption (branch `feat/adopt-zb-telemetry`)."
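
Since the claim may appear in more than the four listed files, a recursive search is a cheap way to find leftovers. This is a sketch only: `find_stale_claim` is a hypothetical helper, and it searches for the quoted phrase on a single line, so a copy that wraps across lines would be missed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists Markdown files under ROOT (default: current dir)
# that still contain the stale phrase. Prints nothing when none remain.
find_stale_claim() {
  local root="${1:-.}"
  grep -rIl --include='*.md' 'done on its own branch' "$root" || true
}
```

Possible usage: `find_stale_claim /Users/dohertj2/Desktop/scadaproj`; empty output suggests the correction is complete.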
**Step 3: Commit**
```bash
cd /Users/dohertj2/Desktop/scadaproj
git add components/observability/GAPS.md components/observability/README.md ZB.MOM.WW.Telemetry/CLAUDE.md CLAUDE.md
git commit -m "docs(observability): record ZB.MOM.WW.Telemetry adoption across 3 apps; correct MxGateway logging-status claim"
```
---
## Acceptance checklist (whole plan)
- [ ] Both Telemetry packages resolve from the Gitea feed (Task 0 verified `200`).
- [ ] OtOpcUa: builds; `OtOpcUaTelemetryHookTests` + `LogContextEnricherTests` green; `/metrics` still served; meter `ZB.MOM.WW.OtOpcUa` unchanged.
- [ ] ScadaBridge: builds; composition-root + health + new metrics tests green; `/metrics` served in both roles; `LoggerConfigurationFactory` governance intact.
- [ ] MxGateway: builds; `GatewayApplicationTests` + `GatewayMetricsTests` + fake-worker smoke green; logger is Serilog-backed; redaction applied via seam; `/metrics` served; `MxGateway.Server` name + `ms` units unchanged.
- [ ] No secrets committed to any repo (token stays in `~/.nuget/NuGet/NuGet.Config`).
- [ ] `components/observability/GAPS.md` updated; the false "MxGateway logging adopted" claim corrected.
- [ ] All three feature branches committed (one commit per task), no hooks skipped, no force-push.
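
The "no secrets committed" item can be spot-checked by looking for credential elements in each repo-level `nuget.config`. A minimal sketch, assuming credentials would appear as a `<packageSourceCredentials>` section or a `Password`/`ClearTextPassword` key; `nuget_config_has_credentials` is a hypothetical helper and does not replace a proper secret scan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when FILE appears to embed feed credentials.
# Text match only, case-insensitive.
nuget_config_has_credentials() {
  grep -Eiq '<packageSourceCredentials>|ClearTextPassword|key="Password"' "$1"
}
```

Possible usage: `nuget_config_has_credentials /Users/dohertj2/Desktop/ScadaBridge/nuget.config && echo "credentials found in repo config"`; the expected result for all three repos is no output.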
@@ -0,0 +1,20 @@
{
"planPath": "docs/plans/2026-06-01-telemetry-library-adoption.md",
"tasks": [
{"id": 0, "taskId": 23, "subject": "Task 0: Publish/verify Telemetry packages on Gitea", "status": "pending", "classification": "small"},
{"id": 1, "taskId": 24, "subject": "Task 1: OtOpcUa — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]},
{"id": 2, "taskId": 25, "subject": "Task 2: OtOpcUa — swap OTel to AddZbTelemetry", "status": "pending", "classification": "standard", "blockedBy": [1]},
{"id": 3, "taskId": 26, "subject": "Task 3: OtOpcUa — swap Serilog to AddZbSerilog", "status": "pending", "classification": "standard", "blockedBy": [2]},
{"id": 4, "taskId": 27, "subject": "Task 4: ScadaBridge — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]},
{"id": 5, "taskId": 28, "subject": "Task 5: ScadaBridge — AddZbTelemetry both roots + MapZbMetrics", "status": "pending", "classification": "standard", "blockedBy": [4]},
{"id": 6, "taskId": 29, "subject": "Task 6: ScadaBridge — TraceContextEnricher in LoggerConfigurationFactory", "status": "pending", "classification": "small", "blockedBy": [5]},
{"id": 7, "taskId": 30, "subject": "Task 7: MxAccessGateway — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]},
{"id": 8, "taskId": 31, "subject": "Task 8: MxAccessGateway — appsettings Logging → Serilog", "status": "pending", "classification": "small", "blockedBy": [7]},
{"id": 9, "taskId": 32, "subject": "Task 9: MxAccessGateway — AddZbSerilog (MEL→Serilog provider swap)", "status": "pending", "classification": "high-risk", "blockedBy": [8]},
{"id": 10, "taskId": 33, "subject": "Task 10: MxAccessGateway — ILogRedactor seam", "status": "pending", "classification": "standard", "blockedBy": [9]},
{"id": 11, "taskId": 34, "subject": "Task 11: MxAccessGateway — AddZbTelemetry metrics export + MapZbMetrics", "status": "pending", "classification": "standard", "blockedBy": [10]},
{"id": 12, "taskId": 35, "subject": "Task 12: scadaproj — bookkeeping + correct false claim", "status": "pending", "classification": "trivial", "blockedBy": [3, 6, 11]}
],
"notes": "Task 0 gates all. After Task 0 the three repo phases (OtOpcUa 1-3, ScadaBridge 4-6, MxGateway 7-11) are independent and may run concurrently across their separate working directories; within a repo tasks are sequential. Task 12 last.",
"lastUpdated": "2026-06-01"
}
@@ -0,0 +1,195 @@
# Design — Auth + Audit normalization across all three sister projects
**Date:** 2026-06-02
**Status:** Approved (brainstorming complete) — handing off to writing-plans.
**Scope owner decision:** full two-library normalization (see [Scope decisions](#scope-decisions)).
## Summary
Bring two shared libraries that already live in `scadaproj` but are **unpublished and
adopted by no app** — `ZB.MOM.WW.Auth` (4 packages) and `ZB.MOM.WW.Audit` (1 package) —
to **full adoption across OtOpcUa, MxAccessGateway, and ScadaBridge**, ending with every
audit emit site carrying the genuine Auth-resolved principal as `AuditEvent.Actor`.
The original request was "implement the audit component in all sister projects." Because
audit GAPS #4 (Actor = the `ZB.MOM.WW.Auth` principal) requires an authenticated principal
at every emit site, and because the owner chose the maximal scope at every fork, the job
expands to a **two-library program**: full Auth adoption (auth GAPS #1–#8) first, then full
Audit adoption (audit GAPS #1–#6) with #4 wiring `Actor` from the now-live principal.
## Verified starting state (source-checked 2026-06-02)
- **Both libraries exist and are pack-ready** in `scadaproj/ZB.MOM.WW.Auth/` (4 csproj +
`build/pack.sh` + `build/push.sh`, 172 tests) and `scadaproj/ZB.MOM.WW.Audit/`
(`build/pack.sh`, 19 tests). Both are at version `0.1.0`, and both use central package management.
- **Neither is on the Gitea feed.** All five package registration endpoints return
**HTTP 404**. No `.nupkg` is built locally.
- **Adopted by zero apps.** No sibling repo references `ZB.MOM.WW.Auth*` or `ZB.MOM.WW.Audit`.
- **Feed source-mapping is missing in all three repos.** Each `NuGet.config`
`packageSourceMapping` lists Health/Telemetry/Configuration but **not** Auth or Audit, so
each repo needs mapping lines added (mirror MxGateway commit `437ab65`, which did this for
Configuration).
- **The MxGateway audit coordination gate (audit GAPS #2) is CLEAR.** `MxGateway.Server`
already references `ZB.MOM.WW.Telemetry.Serilog 0.1.0`; the Serilog/Telemetry/Configuration
work is merged to `main`. MxGateway audit adoption is unblocked.
- Established adoption rhythm (Telemetry, Configuration): publish lib to feed → add feed
mapping + version pin → behaviour-preserving consumer cutover → land on the repo's local
default branch (not pushed to remote).
> Per repo memory, prior "published"/"adopted" claims in this workspace have repeatedly been
> optimistic; every claim above was re-verified against the feed and source on 2026-06-02.
## Scope decisions
| Fork | Decision |
|---|---|
| How deep into the audit GAPS backlog? | **Everything incl. #4 Actor→Auth** (all of #1–#6). |
| How to satisfy #4 given Auth is unadopted? | **Adopt Auth first, then audit** (two-library program). |
| How much of the Auth backlog? | **Full Auth normalization** (auth GAPS #1–#8, all 3 repos). |
| How to walk the work matrix? | **Library-major waterfall** (Phase 1 Auth → Phase 2 Audit → Phase 3 wiring). |
| Remote integration model | **Local-only**; no `git push`, no PRs (safest for production auth paths; flip per repo later if desired). |
## Architecture — four phases
```
Phase 0 Publish & feed-map pack + push both libs to Gitea feed (fix the 404s);
(foundation) add NuGet.config source-mappings + version pins in all 3 repos.
Phase 1 Auth adoption auth GAPS #1–#8 across all 3 repos, in GAPS sequence:
(largest, sec-sensitive) #3 IGroupRoleMapper seam → #1 Ldap + #2 ApiKeys cutover →
#4 config schema (A1/A2) + #5 claims/cookies → #6 dev base DN →
#8 canonical roles. Each lands behind tests.
Phase 2 Audit adoption audit GAPS #1#3 core + #5/#6 cleanups across all 3 repos.
(behaviour-preserving)
Phase 3 Actor→Auth wiring audit GAPS #4: route the now-live Auth principal into Actor
(the payoff) at every emit site. Closes the loop Audit.Actor == Auth principal.
```
The waterfall is enforced by task dependencies (Phase 0 → 1 → 2 → 3). Phase 1 must fully
land before Phase 3 can wire a *stable* principal; Phase 2 sits after Phase 1 so emit sites
aren't touched twice.
### Delivery model
- One **feature branch per repo per library phase** (`feat/adopt-zb-auth`, then
`feat/adopt-zb-audit`), behaviour-preserving except where a GAPS item is explicitly net-new.
- **Publish-first**: both packages on the feed and verified resolvable before any consumer edit.
- **Land on each repo's local default branch**, gated by that repo's tests + new contract tests.
- **Local-only** (no push). Each phase is a revertable branch merge.
- The libraries themselves are plain files in `scadaproj` (not nested git repos) — publishing
is `pack` + `push` only; no commits to the libs unless a parity gap forces a fix.
## Phase 0 — publish & feed-map *(task #7)*
1. `dotnet pack -c Release` both libraries; `push.sh` to the Gitea feed
(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`).
2. Verify all five packages return HTTP 200 from the registration endpoint.
3. In each repo: add `packageSourceMapping` patterns (`ZB.MOM.WW.Auth`, `ZB.MOM.WW.Auth.*`,
`ZB.MOM.WW.Audit`) to the gitea source, and version pins (`Directory.Packages.props` for
OtOpcUa/ScadaBridge; inline `Version="0.1.0"` for MxGateway).
4. `dotnet restore` resolves the new patterns in all three repos.
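
Step 2 can be scripted. The following is a minimal sketch, not the project's tooling. It assumes the feed exposes the standard NuGet v3 registration path under the same base URL, and that the five packages are the four Auth packages plus Audit; the package list and both helper names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable

FEED_BASE = "https://gitea.dohertylan.com/api/packages/dohertj2/nuget"

# Assumed package ids: four Auth packages plus Audit (five in total).
PACKAGES = [
    "ZB.MOM.WW.Auth.Abstractions",
    "ZB.MOM.WW.Auth.Ldap",
    "ZB.MOM.WW.Auth.ApiKeys",
    "ZB.MOM.WW.Auth.AspNetCore",
    "ZB.MOM.WW.Audit",
]


def registration_url(package_id: str, base: str = FEED_BASE) -> str:
    # NuGet v3 registration URLs use the lower-cased package id.
    return f"{base}/registration/{package_id.lower()}/index.json"


def unresolved(packages: Iterable[str],
               get_status: Callable[[str], int]) -> Dict[str, int]:
    """Return {package_id: status} for every package that is not HTTP 200."""
    failures = {}
    for package_id in packages:
        status = get_status(registration_url(package_id))
        if status != 200:
            failures[package_id] = status
    return failures
```

`get_status` is injected so the check can run against the real feed (an authenticated HTTP GET) or against a stub; Phase 0 is done only when `unresolved(...)` returns an empty dict.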
## Phase 1 — Auth adoption *(task #8, blocked by #7)*
Consumer cutover (libs are already extracted). GAPS order: #3 seam → #1 Ldap + #2 ApiKeys →
#4 config schema + #5 claims/cookies → #6 dev base DN → #8 canonical roles.
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Packages | Abstractions + Ldap + AspNetCore (no ApiKeys — OPC UA transport security) | all 4 (**source** for ApiKeys — cuts over first) | all 4 (**source** for Ldap; ApiKeys consumer after gw) |
| Role mapper (#3) | config-backed (`GroupToRole`) | config-backed | **DB-backed** (`LdapGroupMapping`) |
| Config migration (#4) | A1: `UseTls` → `Transport` enum (section already nested) | A1: `UseTls` → `Transport` enum | **A2 (biggest)**: flat `Security:Ldap*`→nested; rename `LdapUserIdAttribute` → `UserNameAttribute`, `LdapGroupAttribute` → `GroupAttribute` |
| Cookies/claims (#5) | Blazor Admin control-plane cookie | keep `MxGatewayDashboard` name, share claims | keep `ZB.MOM.WW.ScadaBridge.Auth` name, share claims |
| Canonical roles (#8) | no first-class `Deployer` (publish ⊂ `FleetAdmin`) | no `Designer`/`Deployer` | **roles collapse**: `AuditReadOnly`→Viewer, `Audit`→Administrator (auditor/admin SoD loss — GAPS-accepted) |
**Two deliberate behaviour changes (accepted):**
1. **ScadaBridge API-key token format** (D2): raw `X-API-Key` → structured
`<prefix>_<id>_<secret>`. A genuine wire change for inbound API clients — acceptable
pre-prod, requires an interop check.
2. **Canonical-roles collapse** in ScadaBridge removes auditor/admin separation-of-duties.
**Known live issue to fix during OtOpcUa cutover:** `LdapAuthService` `Enabled`/double-singleton
wiring is still open even though the `Security:Ldap` section binding was fixed — fold the fix
into the OtOpcUa LDAP cutover.
**Risk gate:** parity tests reproducing each app's current authn decisions (bind-then-search,
fail-closed group lookup, RFC-4514 + filter escaping, constant-time compare, peppered
HMAC-SHA256) must be green before any cutover merges.
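
Two of the invariants named in the risk gate, peppered HMAC-SHA256 and constant-time compare, are small enough to pin in a parity test. A minimal sketch follows; it assumes the pepper is used as the HMAC key and the secret as the message, which the plan does not state, and the function names are hypothetical.

```python
import hashlib
import hmac


def hash_secret(secret: str, pepper: bytes) -> bytes:
    # Peppered HMAC-SHA256: pepper as the HMAC key, secret as the message
    # (assumed arrangement, see the note above).
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify_secret(presented: str, stored_hash: bytes, pepper: bytes) -> bool:
    # compare_digest takes time independent of where the two inputs differ,
    # so a mismatch position cannot be inferred from response timing.
    return hmac.compare_digest(hash_secret(presented, pepper), stored_hash)
```

A parity test would feed the same secret/pepper pairs through the app's current hasher and the library's, and require identical accept/reject decisions.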
## Phase 2 — Audit adoption *(task #9, blocked by #8)*
Behaviour-preserving seam/record/enum adoption.
| Repo | Core work (GAPS #1–#3) | Keep bespoke |
|---|---|---|
| **OtOpcUa** (#1, #5) | Replace `Commons/.../AuditEvent.cs` with canonical record; `AuditWriterActor : IAuditWriter`; derive `Outcome` at emit sites (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→Denied, config verbs→Success); bridge `NodeId`/`CorrelationId` value-types | Akka singleton transport, 500/5s batching, two-layer dedup, `ConfigAuditLog` EF entity + idempotency index |
| **MxGateway** (#2, #6) | Map `IApiKeyAuditStore`/`ApiKeyAuditEntry` → `IAuditWriter`/`AuditEvent`; generate `EventId`; `"system"`/`"cli"` Actor fallback; `Category="ApiKey"`; `constraint-denied`→Denied | SQLite store, 3 producer call sites (only injected type changes), append-only table |
| **ScadaBridge** (#3) | Outright rename `IAuditPayloadFilter``IAuditRedactor`; adopt canonical `AuditOutcome` enum; confirm writer contract (byte-identical) — keep bespoke ~25-field record as storage shape | Entire Site/Central pipeline, 4 domain enums, CLI export/verify, Blazor UI, redaction policy |
**Resolved open GAPS decisions:**
1. **ScadaBridge rename vs. alias** → **outright rename** (compiler-verified across the HIGH blast radius).
2. **MxGateway `Details`→`DetailsJson`** → **wrap as a small JSON object** (keeps the field valid JSON).
3. **OtOpcUa `Outcome` storage** → **new nullable `Outcome` column + EF migration** (first-class, queryable).
4. **OtOpcUa SP path** → **leave bespoke + document**; *do* fix the `ClusterId`-filter/actor
mismatch in `ClusterAudit.razor` so structured rows are visible.
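
Decision 2 can be illustrated as follows. This is a sketch only: the plan says "a small JSON object" without naming the key, so the `details` key and the `wrap_details` helper are assumptions.

```python
import json


def wrap_details(details):
    # Free-text details become the value of a single-key object, so the
    # stored column is always valid JSON even when the text has quotes.
    # The key name "details" is hypothetical.
    return json.dumps({"details": details}, ensure_ascii=False)
```
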
**Cleanups in scope:** #5 (OtOpcUa SP reconcile + `ClusterId` visibility fix), #6 (MxGateway
`CorrelationId` capture + structured `Target`).
**Behaviour fix:** MxGateway's `AppendAsync` currently may propagate; wrap it so the adopted
`IAuditWriter` never throws (honors the best-effort contract).
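
The never-throw wrap is a plain decorator over the existing writer. A minimal sketch, assuming a single `write` method and a standard logger; the class name is hypothetical and the real seam is the C# `IAuditWriter`.

```python
import logging

log = logging.getLogger("audit")


class NeverThrowAuditWriter:
    """Wraps an inner writer; logs and swallows any failure it raises."""

    def __init__(self, inner):
        self._inner = inner

    def write(self, event) -> None:
        try:
            self._inner.write(event)
        except Exception:
            # Best-effort contract: an audit failure must never reach the
            # caller, so it is recorded in the log and the event is dropped.
            log.exception("audit write failed; event dropped")
```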
## Phase 3 — Actor→Auth wiring *(task #10, blocked by #8 + #9)*
With Auth live (Phase 1) and the canonical record adopted (Phase 2), route the resolved
principal into `AuditEvent.Actor` everywhere:
- **Seam:** one small `IAuditActorAccessor` — HTTP paths read `HttpContext.User`; non-HTTP
paths (Akka actors, CLI) thread the operation principal or fall back. The single place that
changes if the principal source ever changes again.
- OtOpcUa → LDAP-resolved user. MxGateway → API-key name (system/cli fallback retained for
keyless CLI events). ScadaBridge → principal at `ManagementActor`/inbound boundary.
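
The accessor's resolution order can be sketched as below. It assumes the HTTP user wins over a threaded operation principal and that blank values count as absent; neither rule is stated in the plan, and the names are hypothetical.

```python
from typing import Optional


class AuditActorAccessor:
    """Resolves the audit actor from whichever principal source is present."""

    def __init__(self, fallback: str = "system"):
        self._fallback = fallback

    def resolve(self,
                http_user: Optional[str] = None,
                operation_principal: Optional[str] = None) -> str:
        # Assumed precedence: HTTP user, then threaded principal, then fallback.
        for candidate in (http_user, operation_principal):
            if candidate and candidate.strip():
                return candidate.strip()
        return self._fallback
```

Keeping the precedence in one place is what makes this the single point of change if the principal source moves again.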
## Contracts, testing & risk gates
**Hard seam contracts:**
- `IAuditWriter` — best-effort, MUST NOT throw, swallow internal failures. OtOpcUa actor ✅,
ScadaBridge ✅; MxGateway needs the never-throw wrap (above).
- `IAuditRedactor` — pure, never throws, over-redacts on failure. ScadaBridge's
`SafeDefaultAuditPayloadFilter` is the reference; rename preserves it.
**Cross-boundary surface:** Auth/Audit adoption is in-process and does **not** touch the
cross-repo wire contracts (gateway `.proto` files, OPC UA address-space shape) — **except** the
ScadaBridge API-key token-format change, the one item needing an interop check rather than just
a green unit build. A green build in one repo does not prove interop.
**Per-phase verification (evidence before "done"):**
- **Phase 0:** all 5 packages HTTP 200; `dotnet restore` green in all 3 repos.
- **Phase 1:** existing auth tests + new parity tests green per repo before merge; SB
token-format integration check.
- **Phase 2:** existing audit tests + new `Outcome`/`EventId`/rename tests; OtOpcUa `Outcome`
migration applies forward.
- **Phase 3:** `Actor == authenticated principal` on authenticated paths; fallback retained on
keyless/system paths.
- **Library suites** (Audit 19, Auth 172) re-run if any lib is touched. If a parity gap forces
a lib fix, bump `0.1.0` → `0.1.1` and re-publish rather than editing a published version.
## Tasks
| Task | Item | Blocked by |
|---|---|---|
| #7 | Phase 0 — publish both libs + feed-map all 3 repos | — |
| #8 | Phase 1 — adopt ZB.MOM.WW.Auth across all 3 repos (auth GAPS #1–#8) | #7 |
| #9 | Phase 2 — adopt ZB.MOM.WW.Audit across all 3 repos (audit GAPS #1–#3, #5, #6) | #8 |
| #10 | Phase 3 — wire Actor from the Auth principal (audit GAPS #4) | #8, #9 |
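
The "Blocked by" column is a small dependency graph, and the waterfall order falls out of a topological sort. A sketch using the standard-library `graphlib` (Python 3.9+); the dictionary simply transcribes the table above.

```python
from graphlib import TopologicalSorter

# task number -> set of task numbers it is blocked by
BLOCKED_BY = {7: set(), 8: {7}, 9: {8}, 10: {8, 9}}


def execution_order(blocked_by=BLOCKED_BY):
    # static_order() yields every task after all of its blockers and
    # raises graphlib.CycleError if the table ever becomes circular.
    return list(TopologicalSorter(blocked_by).static_order())
```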
## References
- `components/auth/GAPS.md`, `components/auth/spec/`, `components/auth/current-state/*`
- `components/audit/GAPS.md`, `components/audit/shared-contract/ZB.MOM.WW.Audit.md`,
`components/audit/current-state/*`
- Libraries: `ZB.MOM.WW.Auth/`, `ZB.MOM.WW.Audit/`
- Prior adoption precedent: `components/configuration/GAPS.md`,
`components/observability/GAPS.md`
@@ -0,0 +1,366 @@
# Phase 1 (Auth adoption) — elaborated steps + Task 1.0 findings
Companion to `2026-06-02-auth-audit-normalization.md`. Produced by the Task 1.0 read-only
exploration gate (4 parallel explorers: library surface + 3 repos). All paths verified
2026-06-02 against source.
## Cutover target — `ZB.MOM.WW.Auth` public surface
| Package | Consumer entry points |
|---|---|
| `.Abstractions` | **NB: `IGroupRoleMapper<TRole>`/`GroupRoleMapping<TRole>`/`CanonicalRole` live in namespace `ZB.MOM.WW.Auth.Abstractions.Roles`** (verified during Task 1.1). `ILdapAuthService`, `LdapOptions` (`Transport: LdapTransport{Ldaps,StartTls,None}`, `AllowInsecure`, `UserNameAttribute`, `GroupAttribute`, `ServiceAccountDn/Password`, `SearchBase`, `ConnectionTimeoutMs`, `ServerCertificateValidationCallback`), `LdapAuthResult(Succeeded,Username,DisplayName,Groups,Failure)`, `LdapAuthFailure`, `CanonicalRole{Viewer,Operator,Engineer,Designer,Deployer,Administrator}`, `IGroupRoleMapper<TRole>` (**no default impl — consumer writes it**) → `GroupRoleMapping<TRole>(Roles, Scope:object?)`, plus API-key abstractions (`IApiKeyVerifier`, `ApiKeyVerification`, `ApiKeyIdentity`, `IApiKeyStore`/`IApiKeyAdminStore`/`IApiKeyAuditStore`, `ApiKeyOptions{TokenPrefix,PepperSecretName,SqlitePath,RunMigrationsOnStartup}`) |
| `.Ldap` | `LdapAuthService(LdapOptions)` : `ILdapAuthService`. Bind-then-search, fail-closed, never throws. `LdapOptionsValidator` (TLS-or-AllowInsecure) auto-registered. |
| `.ApiKeys` | `ApiKeyVerifier(ApiKeyOptions, IApiKeyStore, IApiKeyPepperProvider, TimeProvider?)`, `ApiKeyParser.TryParse` (`<prefix>_<keyId>_<secret>`), `ApiKeySecretGenerator.NewSecret()`, default SQLite stores, `ConfigurationApiKeyPepperProvider`. **Extracted from MxGateway — near-1:1 with its pipeline.** |
| `.AspNetCore` | `ZbClaimTypes{Name,Role,DisplayName,Username,ScopeId}`, `ZbCookieDefaults.Apply(opts, requireHttps, idleTimeout)`, DI: `AddZbLdapAuth(services, config, sectionPath)`, `AddZbApiKeyAuth(services, config, sectionPath)`. |
## Per-app current state (verified) and elaborated cutover
### OtOpcUa — packages: Abstractions + Ldap + AspNetCore (no ApiKeys)
Current LDAP: `src/Server/ZB.MOM.WW.OtOpcUa.Security/Ldap/LdapAuthService.cs` (impl), `ILdapAuthService.cs`,
`LdapOptions.cs` (**section `Security:Ldap`**, `UseTls` bool, `Enabled`, `DevStubMode`, embedded `GroupToRole` dict),
`LdapAuthResult.cs` (already carries `Roles`). Role mapping is **config + DB**: `RoleMapper.Map` (config
`GroupToRole`) + `RoleMapper.Merge` with DB `LdapGroupRoleMappingService`/`LdapGroupRoleMapping` (system-wide rows).
Native roles `AdminRole{ConfigViewer,ConfigEditor,FleetAdmin}` (control-plane only; data-plane is a separate
`NodePermissions` bitmask). DI: two `TryAddSingleton<ILdapAuthService,LdapAuthService>` sites
(`Security/ServiceCollectionExtensions.cs:42` + `Host/Program.cs:106`). Cookie `ZB.MOM.WW.OtOpcUa.Auth`,
single Cookie scheme (JWT inside cookie). **Second LDAP consumer:** OPC UA data-plane
`LdapOpcUaUserAuthenticator` + `OpcUaApplicationHost.HandleImpersonation` call the LDAP service too.
- **1.1 mapper:** implement `IGroupRoleMapper<AdminRole>` (or `<string>`) wrapping `RoleMapper.Map` + DB `Merge`.
- **1.2 Ldap:** replace `LdapAuthService` with `Auth.Ldap`; restructure flow to `ILdapAuthService → Groups → IGroupRoleMapper → roles → claims`; **preserve `DevStubMode` app-side** (library has no stub); wire BOTH consumers (login endpoint + OPC UA impersonation).
- **1.4 config:** `UseTls` → `Transport` enum (section already `Security:Ldap` — see Finding #1).
- **1.5 cookie/claims:** use `ZbClaimTypes` + `ZbCookieDefaults.Apply`; keep cookie name.
- **1.7 roles:** `ConfigViewer→Viewer`, `ConfigEditor→Designer`, `FleetAdmin→Administrator(+Deployer; publish⊂FleetAdmin)`. Data-plane `NodePermissions` unaffected.
### MxAccessGateway — packages: all 4 (ApiKeys **source**, cuts over first)
Current API keys (`src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/`): `ApiKeyParser` (`mxgw_<id>_<secret>`),
`ApiKeySecretHasher` (HMAC-SHA256 + pepper `MxGateway:ApiKeyPepper`), `ApiKeySecretGenerator`, `ApiKeyVerifier`
(`FixedTimeEquals`), SQLite stores, `ConstraintEnforcer` + rich `ApiKeyConstraints`, gRPC
`GatewayGrpcAuthorizationInterceptor` + `GatewayScopes`. DI `AddSqliteAuthStore()`. → **near-1:1 with `Auth.ApiKeys`.**
LDAP: `Dashboard/DashboardAuthenticator.cs` (`MxGateway:Ldap`, `UseTls`), `GroupToRole` under `MxGateway:Dashboard`,
roles `Admin`/`Viewer`, cookie `MxGatewayDashboard`.
- **1.1 mapper:** `IGroupRoleMapper<string>` wrapping `DashboardAuthenticator.MapGroupsToRoles`.
- **1.2 Ldap:** replace `DashboardAuthenticator`'s LDAP internals with `Auth.Ldap` (keep dashboard claims/principal build).
- **1.3 ApiKeys:** delete the local parser/hasher/generator/verifier/stores; re-point to `Auth.ApiKeys`; **keep** `ConstraintEnforcer` + gRPC interceptor + scopes on top (constraints carried as the opaque blob). Lowest-risk ApiKeys cutover (it's the donor).
- **1.4 config:** `UseTls` → `Transport`.
- **1.5/1.7:** `ZbClaimTypes`/cookie defaults; `Viewer→Viewer`, `Admin→Administrator`.
### ScadaBridge — packages: all 4 (Ldap **source**; ApiKeys consumer)
Current LDAP (`src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs`): the hardened reference (RFC-4514 DN escape,
filter escape, per-op timeout, fail-closed group lookup, username trim, service-account-bind distinction). Config is
**flat** `ScadaBridge:Security:Ldap*` in `SecurityOptions.cs` with **`LdapTransport` enum already** (`Ldaps/StartTls/None`),
`AllowInsecureLdap`, `LdapUserIdAttribute`, `LdapGroupAttribute`, validated by `SecurityOptionsValidator : OptionsValidatorBase`.
Role mapping **DB-backed** with **site-scoping**: `RoleMapper.MapGroupsToRolesAsync` → `RoleMappingResult(Roles, PermittedSiteIds, IsSystemWideDeployment)` over `LdapGroupMapping` + `SiteScopeRule` (SQL Server). Roles
`Admin/Design/Deployment/Audit/AuditReadOnly`; SoD via `OperationalAudit{Admin,Audit,AuditReadOnly}` + `AuditExport{Admin,Audit}`.
Cookie `ZB.MOM.WW.ScadaBridge.Auth`; JWT-in-cookie via `JwtTokenService`.
**Inbound API keys** (`InboundAPI/ApiKeyValidator.cs`): **raw `X-API-Key`**, **deterministic** HMAC (`ApiKeyHasher`, no per-row salt, by-value lookup), `ApiKey{Name,KeyHash,IsEnabled}` in **SQL Server**, **per-method approval** via `ApiMethod.ApprovedApiKeyIds` — **architecturally different from the library's keyId/scope/SQLite model.**
- **1.1 mapper:** `IGroupRoleMapper<string>` wrapping `RoleMapper.MapGroupsToRolesAsync`, carrying `PermittedSiteIds`/`IsSystemWideDeployment` in `GroupRoleMapping.Scope`.
- **1.2 Ldap:** ScadaBridge is the donor — confirm `Auth.Ldap` behaviour-matches, then re-point `LdapAuthService` usages to the library type. Lowest-risk Ldap cutover.
- **1.3 ApiKeys:** **see Finding #3 — bigger than a token reformat; needs a scope decision.**
- **1.4 config:** nest flat `Security:Ldap*` under a sub-section + rename `LdapUserIdAttribute→UserNameAttribute`, `LdapGroupAttribute→GroupAttribute`, `LdapTransport→Transport` (+ `SecurityOptionsValidator` + appsettings). Enum already matches.
- **1.7 roles:** `Admin→Administrator`, `Design→Designer`, `Deployment→Deployer`, `Audit→Administrator` (collapse), `AuditReadOnly→Viewer` (collapse) — removes the `OperationalAudit`/`AuditExport` SoD (accepted).
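
The ScadaBridge collapse in 1.7 is easiest to see as a lookup table. A sketch, with the mapping copied from the bullet above; dropping unknown native roles (fail closed) is an assumption, and `to_canonical` is a hypothetical helper.

```python
from enum import Enum


class CanonicalRole(Enum):
    VIEWER = "Viewer"
    OPERATOR = "Operator"
    ENGINEER = "Engineer"
    DESIGNER = "Designer"
    DEPLOYER = "Deployer"
    ADMINISTRATOR = "Administrator"


SCADABRIDGE_ROLE_MAP = {
    "Admin": CanonicalRole.ADMINISTRATOR,
    "Design": CanonicalRole.DESIGNER,
    "Deployment": CanonicalRole.DEPLOYER,
    "Audit": CanonicalRole.ADMINISTRATOR,   # collapse: auditor becomes admin
    "AuditReadOnly": CanonicalRole.VIEWER,  # collapse: read-only auditor becomes viewer
}


def to_canonical(native_roles):
    # Unknown native roles are dropped rather than guessed (assumed behaviour).
    return {SCADABRIDGE_ROLE_MAP[r] for r in native_roles if r in SCADABRIDGE_ROLE_MAP}
```

After the mapping, `Audit` and `Admin` are indistinguishable, which is exactly the accepted separation-of-duties loss.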
## Key findings that change the plan
1. **OtOpcUa LDAP section is `Security:Ldap`, not `Authentication:Ldap`.** Both `components/auth/GAPS.md §1`
and the auth current-state doc are wrong; the code (and the prior fix in memory) use `Security:Ldap`.
→ Task 1.4 for OtOpcUa is only `UseTls` → `Transport`, not a section move.
2. **OtOpcUa "double-singleton bug" is already mitigated.** Both registration sites use `TryAddSingleton`
(dedupes); the `Enabled` flag is an intentional fail-closed master switch. → Not a blocking fix; verify and
keep `Enabled`. Removes a risk the plan flagged.
3. **ScadaBridge inbound API keys are a re-architecture, not a token reformat.** The library's ApiKeys model
(`<prefix>_<keyId>_<secret>` Bearer, keyId lookup + constant-time compare, SQLite store, scopes + opaque
constraints) is fundamentally different from ScadaBridge's (raw `X-API-Key`, deterministic by-value HMAC
lookup, SQL Server `ApiKey{Name,KeyHash}`, per-method approval list). Wholesale adoption means re-architecting
inbound-API auth AND resolving a SQLite-vs-SQL-Server storage mismatch. **Needs a scope decision (Decision A).**
4. **OtOpcUa role mapping is config + DB**, not just config (`RoleMapper.Map` baseline + DB `Merge`). The
`IGroupRoleMapper` impl must combine both. OtOpcUa also has `DevStubMode` (no library equivalent — keep app-side)
and a **second LDAP consumer** (OPC UA data-plane impersonation) that must be re-wired too.
5. **MxGateway ApiKeys cutover is the donor path — lowest risk** (delete locals, re-point to library; keep
`ConstraintEnforcer`/gRPC/scopes on top). Confirms the GAPS sequencing (gateway first).
## Task 1.2 (LDAP cutover) — implemented + reviewed (2026-06-02)
Commits: OtOpcUa `257caa7`, MxGateway `c3b466e`, ScadaBridge `ac34dac`. All targeted tests green.
Security review verdict: **sound, no credential-leak regression** in any repo (insecure-transport
guards fire correctly; DevStubMode cannot leak to prod; claim shapes preserved). All three returned
CHANGES-REQUESTED for fixable issues:
- **OtOpcUa** (no Critical): (I1) insecure-transport guard is login-time only — add startup
validation gated on `Enabled` for defense-in-depth, verify prod overlays still boot; (I2) integration
stub pre-populates `Roles` so the Groups→mapper path isn't actually exercised — fix the stub; (I3)
document/test the zero-role fail-closed fallback.
- **MxGateway** (2 Critical): (C1) library strips group DNs to short RDN names before the
`LdapGroupClaimType` claim → verify prior behaviour, document, drop the now-dead full-DN branch in the
mapper, add a claim-value assertion; (C2) gateway's local `LdapOptions` is now a shadow copy (validated
but unused at runtime) → fold to the shared type or document the drift. (I1) shared `LdapOptionsValidator`
has **no `Enabled=false` guard** → validates even when LDAP is disabled (real for MxGateway, which can
disable dashboard LDAP).
- **ScadaBridge** (2 Critical): (C1) `ConfigSecretsTests` still checks the OLD flat key → passes
vacuously, no longer guards secret-in-config — repoint to nested key; (C2) `production-checklist.md`
still lists deleted flat keys → update; (I) unsafe `(RoleMappingResult)Scope!` cast → null-guard.
**Cross-cutting decision — shared library `LdapOptionsValidator` `Enabled` guard:** the validator runs
regardless of `Enabled`, requiring Server/SearchBase/ServiceAccountDn even when LDAP is off. Correct fix =
add an `if (!Enabled) return Success` guard to the shared validator and republish `0.1.1`, re-pinning all
consumers. (Alternative: each consumer always supplies those fields. The library fix is the principled one.)
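
The proposed guard amounts to an early return ahead of the existing checks. A sketch of the shape, assuming the validator checks the three required fields named above plus the TLS-or-AllowInsecure rule; field names are Python-cased stand-ins for the C# options and the error strings are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LdapOptions:
    enabled: bool = True
    server: str = ""
    search_base: str = ""
    service_account_dn: str = ""
    transport: str = "Ldaps"       # "Ldaps" | "StartTls" | "None"
    allow_insecure: bool = False


def validate(opts: LdapOptions) -> List[str]:
    """Return a list of validation errors; an empty list means success."""
    if not opts.enabled:
        return []  # the guard: LDAP is off, so nothing is required
    errors = []
    for name in ("server", "search_base", "service_account_dn"):
        if not getattr(opts, name).strip():
            errors.append(f"{name} is required when LDAP is enabled")
    if opts.transport == "None" and not opts.allow_insecure:
        errors.append("transport 'None' requires allow_insecure")
    return errors
```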
## Task 1.2/1.4 — DONE (reviewed + fixed, 2026-06-02)
Library hardened to **`0.1.1`** (`LdapOptionsValidator` skips when `Enabled=false`), republished, re-pinned in all 3 repos.
Fix commits: OtOpcUa `c4f315e` (startup insecure-transport guard gated on Enabled/DevStub + `Transport: Ldaps`
declared in the 3 prod overlays + test fidelity), MxGateway `f4dc11b` (group-claim shape documented as
non-breaking — claim read nowhere in prod; shadow `LdapOptions` kept with a drift-warning doc), ScadaBridge
`4db8c37` (secret-test repointed to nested key, prod checklist updated, `Scope` cast guarded). All targeted
suites green. **1.2 (LDAP) + 1.4 (config) complete across all 3 repos.**
Remaining Phase 1: **1.3 ApiKeys** (MxGateway donor cutover — low risk; ScadaBridge full re-architecture —
largest single item: SQLite store + Bearer format + scopes + key re-issuance), **1.5** claims/cookies,
**1.6** dev base DN, **1.7** canonical roles.
## Task 1.3 ApiKeys — MxGateway DONE; ScadaBridge pending (2026-06-02)
**Library bumped to `0.1.2`**: `Auth.ApiKeys` SQLite migrator now stamps schema version **2** (was 1) to
match the donor gateway's deployed `gateway-auth.db` — without it the gateway would fail to boot (migrator
threw on a newer on-disk version). Final schema byte-identical since v1; no key re-issuance. Republished,
re-pinned in MxGateway. (+2 migrator tests.)
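
The boot failure described above follows from a common migrator rule: refuse to run against a database stamped with a version newer than the code knows. A sketch of that rule under stated assumptions; the exception type, the function, and the step numbering are hypothetical, and only the version numbers 1 and 2 come from the text.

```python
CURRENT_VERSION = 2  # the version the 0.1.2 migrator stamps


class SchemaTooNewError(Exception):
    pass


def plan_migration(on_disk_version: int, current: int = CURRENT_VERSION):
    """Return the schema versions still to apply, oldest first."""
    if on_disk_version > current:
        # A library that only knows version 1 hits this branch against a
        # database already stamped 2, which is the failure being fixed.
        raise SchemaTooNewError(
            f"on-disk schema {on_disk_version} is newer than supported {current}")
    return list(range(on_disk_version + 1, current + 1))
```
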
**MxGateway 1.3 — DONE + APPROVED** (commit `05009d7`): deleted 28 local pipeline files, adopted
`Auth.ApiKeys 0.1.2` via `AddZbApiKeyAuth`; kept `ConstraintEnforcer`/gRPC interceptor/scopes/CLI/dashboard
on top via a `GatewayApiKeyIdentityMapper` (library identity → gateway identity-with-EffectiveConstraints).
Review: no Critical; no auth bypass, schema compat + crypto parity + gRPC status mapping verified. Non-blocking
follow-ups: (a) dashboard mutations now write two audit rows (library + `dashboard-*`) — fine, note for Phase 2
audit bridging; (b) nit: `GatewayApiKeyIdentityMapper` uses `Constraints as string` (opaque coupling) — consider
a guard/contract test.
**ScadaBridge 1.3 — PENDING**: the full inbound-API re-architecture (SQL Server → SQLite store, `X-API-Key`
→ Bearer, per-method-approval → scopes/constraints, **all inbound keys re-issued**). Largest/highest-risk
single item in the program; warrants its own focused pass (likely decomposed).
## ScadaBridge ApiKeys re-architecture — spec (FULL ADOPT, 2026-06-02)
Decision: **full adopt** the library SQLite store + scopes model. Single consistent contract all layers build to:
- **Token format**: `Authorization: Bearer sbk_<keyId>_<secret>` (prefix `sbk`). Replaces the raw `X-API-Key` header.
- **Scope model = method name.** A key's `Scopes` set = the API-method names it may call. `ApiMethod.ApprovedApiKeyIds`
(CSV of key int IDs) is **retired**; per-method approval moves to the key's scopes. Auth check at the endpoint:
`identity.Scopes.Contains(methodName)`.
- **Storage**: inbound keys move to the library's SQLite store (new `ScadaBridge:InboundApi:ApiKeyStore` sqlite path
+ pepper via `ApiKeyOptions.PepperSecretName`, `RunMigrationsOnStartup`). The SQL Server `ApiKey` entity is retired;
`ApiMethod` is KEPT minus `ApprovedApiKeyIds` (EF migration drops the column). `InboundApiRepository` loses its ApiKey
methods + `GetApprovedKeysForMethodAsync`.
- **Auth path** (`InboundAPI`): endpoint reads Bearer, calls library `IApiKeyVerifier.VerifyAsync`, then the scope check.
PRESERVE the security invariants: 401 (missing/invalid/disabled), **403 identical message for both "method not found"
and "not in scope"** (enumeration-safety, InboundAPI-011), constant-time compare (library does it), active-node 503 +
body-cap 413 filters unchanged, audit actor = key DisplayName. Delete `ApiKeyValidator` hashing + `ApiKeyHasher`.
- **Management** (`ManagementActor` + CLI `security api-key` + Commons messages): drive the library `IApiKeyAdminStore` +
`ApiKeySecretGenerator`. `create` returns `sbk_<keyId>_<secret>` once (plaintext-once preserved); methods a key may call
= its scopes, set on create/update (e.g. `--methods a,b` or grant/revoke-method commands). `list` returns id/name/enabled
(no secret), `update --enabled`, `delete`/revoke. Audit preserved.
- **CentralUI**: `ApiKeys.razor` (list/create/toggle/delete via admin store; show token once), `ApiKeyForm.razor` (edit the
key's method-scopes), `ApiMethodForm.razor` (method-side "approved keys" now reads/writes key scopes across keys).
- **Breaking change**: all inbound keys re-issued (new format); clients switch `X-API-Key` → `Authorization: Bearer`.
Needs a runbook + CHANGELOG. Re-pin ScadaBridge Auth packages to **0.1.2**.
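
The token format, scope check, and enumeration-safe 403 from the spec above fit in one short sketch. It is illustrative only: `verify` stands in for the library's `IApiKeyVerifier.VerifyAsync`, the response bodies are invented, the `Bearer` scheme is matched case-sensitively for brevity, and the 503/413 filters are left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Set, Tuple

FORBIDDEN_BODY = "forbidden"  # one literal shared by both 403 causes


@dataclass(frozen=True)
class Identity:
    key_id: str
    display_name: str
    scopes: FrozenSet[str]


def parse_bearer(header: Optional[str], prefix: str = "sbk") -> Optional[Tuple[str, str]]:
    """Return (key_id, secret) from 'Bearer <prefix>_<keyId>_<secret>', else None."""
    if not header or not header.startswith("Bearer "):
        return None
    # Split at most twice so any underscore inside the secret is preserved.
    parts = header[len("Bearer "):].split("_", 2)
    if len(parts) != 3 or parts[0] != prefix or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


def authorize(header: Optional[str],
              method_name: str,
              verify: Callable[[str, str], Optional[Identity]],
              known_methods: Set[str]) -> Tuple[int, str]:
    parsed = parse_bearer(header)
    identity = verify(*parsed) if parsed else None
    if identity is None:
        return 401, "unauthorized"  # missing, malformed, invalid, or disabled
    # Scope is checked before the method lookup, and both failures return
    # the same body, so a caller cannot probe which method names exist.
    if method_name not in identity.scopes or method_name not in known_methods:
        return 403, FORBIDDEN_BODY
    return 200, identity.display_name  # audit actor = key display name
```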
Sub-tasks (sequential where files overlap): **(A)** storage retire + EF migration + library wiring/options;
**(B)** auth-path rewrite (Bearer + verifier + scope check); **(C)** management (ManagementActor + CLI + messages);
**(D)** CentralUI pages; **(E)** runbook/CHANGELOG + integration test sweep. A→(B,C)→D→E.
Sequencing note: doing it **additively** (add library path, switch auth, rewire mgmt/UI, retire SQL Server entity LAST)
keeps the build green at each step.
### Re-arch progress
- **A+B foundation — DONE + reviewed+fixed** (commits `a94558c`, `1fcc4f5`; re-pinned to 0.1.2). Library `AddZbApiKeyAuth`
wired additively (`ScadaBridge:InboundApi:ApiKeyStore`, prefix `sbk`, reuses inbound pepper); inbound endpoint now uses
the library verifier + Bearer + `Scopes.Contains(methodName)`. Security invariants preserved: 401 generic / 403 identical
body for not-found AND not-in-scope (enumeration-safe, pinned to a literal in tests), scope-check-before-DB (no timing
oracle), fail-fast pepper preflight (Central), audit actor = DisplayName. Old SQL Server path still compiles (retired in E).
163/163 InboundAPI tests green. **NOTE for E:** the library's `ApiKeySecretGenerator.NewSecret()` is `internal` — seed/create
keys via the public `ApiKeyAdminCommands.CreateKeyAsync` seam (returns the assembled `sbk_…` token).
- **Library 0.1.3 — DONE + reviewed + PUBLISHED** (scadaproj commits `468959c` impl, `290e85c` tests; pushed to Gitea,
ApiKeys 0.1.3 nupkg verified HTTP 200). Added `IApiKeyAdminStore.SetScopesAsync(keyId, scopes, ct)` + `SetEnabledAsync(keyId,
enabled, whenUtc, ct)` (+ audited facade verbs `ApiKeyAdminCommands.SetScopesAsync`/`SetEnabledAsync` → eventTypes
`set-scopes`/`enable-key`/`disable-key`). **No schema change** (`CurrentVersion` stays 2): scopes column already exists;
`revoked_utc` doubles as the enabled flag (null = enabled), so enable/disable is a reversible toggle that preserves the
secret (proven by test asserting `SecretHash.SequenceEqual` + unchanged `last_used_utc`). This is what lets C/D edit a key's
method-scopes and toggle enabled WITHOUT re-issuing the token. **ScadaBridge must re-pin Auth packages 0.1.2 → 0.1.3.**
- **C (management), D (CentralUI), E (retire SQL Server ApiKey + ApiMethod.ApprovedApiKeyIds migration + runbook/CHANGELOG)
— IN PROGRESS.** Mapping: `CreateApiKeyCommand` → `CreateKeyAsync` (keyId = `Guid.NewGuid().ToString("N")`,
DisplayName = name, scopes = `--methods`); `ListApiKeysCommand` → `ListKeysAsync` (enabled = `RevokedUtc is null`);
`UpdateApiKeyCommand(IsEnabled)` → `SetEnabledAsync`; new set-scopes path → `SetScopesAsync`; `DeleteApiKeyCommand` →
revoke-then-`DeleteKeyAsync`. All management message keys switch `int ApiKeyId` → `string KeyId`.
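
The 0.1.3 enable/disable design (`revoked_utc` doubling as the enabled flag) can be modelled in a few lines. A sketch with a hypothetical in-memory row; the real store is the library's SQLite table, and only the column semantics are taken from the notes above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class KeyRow:
    key_id: str
    secret_hash: bytes
    revoked_utc: Optional[datetime] = None

    @property
    def enabled(self) -> bool:
        # null revoked_utc means the key is enabled
        return self.revoked_utc is None


def set_enabled(row: KeyRow, enabled: bool, when_utc: datetime) -> KeyRow:
    # Only revoked_utc changes; secret_hash is carried over untouched,
    # which is why re-enabling does not require re-issuing the token.
    return replace(row, revoked_utc=None if enabled else when_utc)
```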
### Discovered architecture (CentralUI Explore, 2026-06-02) — expands C/D/E
Two facts the original A–E spec missed:
1. **CentralUI bypasses the ManagementActor.** `Components/Pages/Admin/ApiKeys.razor`, `ApiKeyForm.razor`, and
`Components/Pages/Design/ApiMethodForm.razor` call `IInboundApiRepository` (SQL Server EF) **directly** — they do NOT
send the `CreateApiKeyCommand`/etc. management messages. So there are **two** management entry points to rewire
(CLI→ManagementActor uses the messages; CentralUI→repository uses the entities). Decoupling: introduce one app-side
**`IInboundApiKeyAdmin` seam** over the library `ApiKeyAdminCommands`, and route BOTH CLI and CentralUI through it
(DRY + single audit path). The message-contract change (int→string) touches only CLI+ManagementActor; the
entity/repository change (`ApiKey.Id`, `ApiMethod.ApprovedApiKeyIds`) touches CentralUI + TransportExport.
2. **TransportExport couples API keys + methods into config export/import** (`Components/Pages/Design/TransportExport.razor`
+ `.razor.cs`, `HashSet<int>` selections, `ExportSelection`). With keys now in the library SQLite store (per-env pepper,
secret-once), a key can't be exported/re-imported usefully. **Decision (user, 2026-06-02): EXCLUDE inbound API keys from
transport — export API methods only; keys are re-created + method-scopes re-granted per environment.**
CentralUI blast radius (string keyId + scopes replace int Id + ApprovedApiKeyIds CSV): `Admin/ApiKeys.razor`,
`Admin/ApiKeyForm.razor`, `Design/ApiMethodForm.razor` (approved-keys ↔ key-scopes), `Design/TransportExport.razor(.cs)`,
`Design/ExternalSystems.razor` (uses method `int` id — methods STAY int in SQL Server, so unaffected for keys),
`Dashboard.razor` (key count), test `Admin/ApiKeyFormAuditDrillinTests.cs`.
### C/D/E decomposition — 5 reviewed green sub-commits (user: "coordinated multi-commit now", 2026-06-02)
- **C1** — re-pin ScadaBridge Auth 0.1.2→0.1.3; add app-side `IInboundApiKeyAdmin` seam (string-keyId model:
Create(name,methods)→(keyId,token) / List / SetEnabled / SetMethods / Delete[=revoke+delete] / GetMethodsForKey /
GetKeysForMethod) over the library facade; register `ApiKeyAdminCommands` + the seam in Host **and** CentralUI DI; seam
unit tests. **Purely additive — build green.**
- **C2** — Commons `Messages/Management/SecurityCommands.cs` contracts int→string keyId + add `Methods` + new
`SetApiKeyMethodsCommand`; rewire ManagementActor handlers + CLI `security api-key` onto the seam; update ManagementActor
tests. (CentralUI unaffected — it doesn't use these messages.)
- **C3** — CentralUI `ApiKeys.razor`/`ApiKeyForm.razor`/`ApiMethodForm.razor` (+ Dashboard count) off `IInboundApiRepository`-
for-keys onto the seam; string keyId; method-scope editing replaces `ApprovedApiKeyIds`; update bUnit test. (Methods stay
in SQL Server; just stop using the `ApprovedApiKeyIds` column — dropped in C5.)
- **C4** — TransportExport: remove API-key selection/export (methods-only); drop key `HashSet<int>` + `ExportSelection` keys;
tests.
- **C5 (=E)** — retire SQL Server `ApiKey` entity + DbContext reg + `IInboundApiRepository` key methods +
`GetApprovedKeysForMethodAsync`; drop `ApiMethod.ApprovedApiKeyIds`; EF migration (drop ApiKeys table + column); delete
residual `ApiKeyValidator`/`ApiKeyHasher`; runbook + CHANGELOG (breaking: re-issue keys, `X-API-Key` → `Authorization: Bearer`);
full build+test sweep.
#### Re-arch sub-commit progress (2026-06-02)
- **C1 — DONE + reviewed** (ScadaBridge commits `d09def2` seam+re-pin-0.1.3, `7f7ea3f` review polish). `IInboundApiKeyAdmin`
seam (interface in Commons, `LibraryInboundApiKeyAdmin` impl in the Security project over `ApiKeyAdminCommands`), DI in
Host (CentralUI shares that container). Spec PASS + code-review APPROVED (guard `name`, doc throws/O(n) contract).
**Two pre-existing Host.Tests reds from the prior session's Auth work (uncaught because Host.Tests weren't run) fixed as
part of restoring a green baseline:** (a) `7e25efa` — A+B's Central pepper preflight (`1fcc4f5`) needs a ≥16-char test
`ApiKeyPepper`; supplied via env vars in the Central test fixtures (test-only) + 3 guard tests; Host.Tests 86 fail → 1.
(b) `55099b1` — LDAP cutover (`ac34dac`) made component-lib `AddSecurity(IConfiguration)` violate ScadaBridge's
`OptionsTests` arch rule; moved `AddZbLdapAuth` to the Host composition root, dropped the param (behaviour-preserving);
Host.Tests 1 fail → **0**. Green baseline now: build 0/0, Host.Tests 228, Security.Tests 89, InboundAPI 163, CentralUI 584.
**NOTE for Phase 2:** `AuditLog.AddAuditLog(IConfiguration)` also takes IConfiguration but is intentionally NOT in the
`OptionsTests` scanned set — revisit during audit adoption (Task 2.5), don't silently "fix".
- **C2 — DONE + reviewed** (SB commits `6518e93` rewire, `8219b8e` review fixes). Commons messages int→string keyId
+ `Methods` + new `SetApiKeyMethodsCommand`; ManagementActor's 5 API-key handlers + CLI `security api-key` now drive
`IInboundApiKeyAdmin`; ScadaBridge management audit preserved (actor = user.Username; secret/token never audited/logged).
Spec PASS, code-review APPROVED after fixes: not-found now throws `ManagementCommandException` BEFORE audit (no spurious
audit on no-op update/delete/set-methods); empty `Methods` rejected server-side (prevents unusable key on create + stealth-
disable via `set-methods ""`); token advisory→stderr. Green: ManagementService 125, CLI 188, + Security/InboundAPI/Host/
CentralUI unchanged. CentralUI + SQL Server `ApiKey` entity/repo untouched (C3/C5).
- **C3 — DONE + reviewed** (SB commits `107e524` rewire, `d1191fd` review fixes). CentralUI `Admin/ApiKeys.razor`,
`Admin/ApiKeyForm.razor`, `Design/ApiMethodForm.razor`, `Dashboard.razor` onto `IInboundApiKeyAdmin`: string keyId,
method-NAME scopes replace the `ApprovedApiKeyIds` CSV, one-time token display on create, key Name fixed-after-create
(no rename in the lib model). The "approved keys ↔ key scopes" inversion is a pure tested helper
`CentralUI/Services/ApiMethodKeyScopeReconciler.cs` (save method entity first, then reconcile each affected key's full
scope set fresh; empty-last-scope revoke is blocked with a clear message, never pushes an empty set). Spec PASS,
code-review APPROVED after fixes: seam `bool` not-found now surfaced (no silent success), partial-reconcile-failure
guidance ("method saved, key scopes partially applied — review on API Keys page"), create validation order, concurrent-
edit reconciler test. CentralUI.Tests 595 green; all other suites unchanged. TransportExport + SQL Server entities/repo
untouched (C4/C5). (Also removed a stray `Name` artifact file from an accidental redirect — not committed.)
- **C4 — DONE + reviewed** (SB commits `731cfd3` rewire, `b13d7b3` review polish). TransportExport excludes inbound API
keys (methods-only) end-to-end — UI selection, `ExportSelection`, DependencyResolver, EntitySerializer/DTOs, BundleExporter,
manifest/summary, CLI `--api-keys`, ManagementActor `HandleExportBundle`, and the IMPORT path (BundleImporter/ArtifactDiff:
no key creation; method overwrite PRESERVES the destination's existing `ApprovedApiKeyIds`, doesn't clobber). Method export
drops `ApprovedApiKeyIds`. Backward-compat: legacy bundles with an `apiKeys` section still deserialize (tolerant `ApiKeys?`
field via shared `BundleJsonOptions` + `WhenWritingNull`) and are IGNORED on import with an `ImportResult.ApiKeysIgnored`
count + audit stamp; new exports omit the field. UI info note added. Spec PASS, code-review APPROVED (note: review I-1
"added-unrestricted count" intentionally SKIPPED — wrong model: inbound auth is scope-based, the verifier ignores
`ApprovedApiKeyIds`, so a new method is callable by NO key until a scope is granted). Transport.Tests 60, IntegrationTests
34 green. SQL Server `ApiKey`/`ApiMethod` entities + repo untouched (C5).
- **C5 (=E) — DONE + reviewed** (SB commit `afa5598`). Retired SQL Server `ApiKey` entity + 7 `IInboundApiRepository` key
methods + `ApiMethod.ApprovedApiKeyIds` + `DbSet<ApiKey>`/fluent config + residual `ApiKeyHasher`/`IApiKeyHasher`/
`ApiKeyValidator` (+ their tests). EF migration `RetireInboundApiKeyStore` (DropTable `ApiKeys` + DropColumn
`ApprovedApiKeyIds`; `Down` recreates both byte-faithfully; ModelSnapshot consistent). CHANGELOG.md + tracked runbook
`docs/operations/inbound-api-key-reissue.md` (BREAKING: `X-API-Key` → `Authorization: Bearer sbk_…`, all keys re-issued;
per-env SqlitePath + ≥16-char ApiKeyPepper). Spec PASS, code-review APPROVED: migration Down/snapshot verified, inbound
verifier path (A+B) intact, no live consumer broke. Green: ConfigurationDatabase 241, InboundAPI 148 (was 163: removed
validator/hasher tests), Security 89, Host 227 (was 228: removed validator DI test), ManagementService 125, CLI 188,
CentralUI 595, Transport 60+34. (Pre-existing infra-dependent failures — IntegrationTests ×11, AuditLog ×1, needing live
LDAP/SQL/SMTP — proven identical at baseline `b13d7b3` via git-stash; StaleTagMonitor flaky timer tests pass 13/13 isolated.)
**Installer/secret note:** the C5 code-review flagged the (untracked, intentionally `.gitignore`d `/deploy/`) `install.ps1`
not injecting the pepper — fixed ON DISK (the on-disk installer now takes `-ApiKeyPepper`); a subagent had force-committed
the ignored deploy script (which embeds a real default JWT key) — that commit was RESET (`git reset --mixed`), keeping the
edit on disk and the secret OUT of git history (branch was never pushed). The pepper requirement is documented in the
tracked runbook.
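
The C3 "approved keys ↔ key scopes" inversion can be illustrated with a small sketch. This is a Python
illustration of the idea only, not the actual `ApiMethodKeyScopeReconciler`: the function name, the dictionary
shapes, the `not_found` return value and the `EmptyScopeError` type are all assumptions made for the example.

```python
class EmptyScopeError(Exception):
    """Raised when a change would leave a key with no scopes at all."""


def reconcile_method_scopes(method_name, approved_key_ids, key_scopes):
    """Compute the fresh, full scope set for every key affected by one method edit.

    method_name      -- the saved method; in the library model, scope == method name
    approved_key_ids -- key ids that should be able to call the method
    key_scopes       -- current state, {key_id: set of method names} (hypothetical shape)

    Returns (updates, not_found): the new scope set per changed key, and any
    approved key ids that do not exist (surfaced rather than silently ignored).
    """
    approved = set(approved_key_ids)
    not_found = sorted(approved - key_scopes.keys())
    updates = {}
    for key_id, scopes in key_scopes.items():
        has_scope = method_name in scopes
        if key_id in approved and not has_scope:
            updates[key_id] = set(scopes) | {method_name}
        elif key_id not in approved and has_scope:
            remaining = set(scopes) - {method_name}
            if not remaining:
                # Never push an empty scope set: block the revoke with a clear message.
                raise EmptyScopeError(
                    f"revoking '{method_name}' would leave key '{key_id}' with no scopes"
                )
            updates[key_id] = remaining
    return updates, not_found
```

Because the sketch validates every affected key before returning anything, an empty-last-scope revoke aborts the
whole reconcile; the real helper applies keys one by one, which is why it also needs the partial-failure guidance.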
### ✅ Task 1.3 (Adopt ZB.MOM.WW.Auth.ApiKeys) COMPLETE across all repos
MxGateway donor cutover + ScadaBridge full re-architecture (C1 seam → C2 mgmt/CLI → C3 CentralUI → C4 TransportExport →
C5 retire+migration+runbook), all reviewed, lib at **0.1.3**. ScadaBridge inbound API is now 100% on the shared library
(Bearer `sbk_<keyId>_<secret>`, scope = method name, per-key SQLite store + per-env pepper); the SQL Server key model is
fully retired. Remaining Phase 1: **1.5** (AspNetCore claims/cookies, 3 UIs), **1.6** (dev GLAuth base DN), **1.7**
(canonical roles, 3 repos). Then Phase 2 (audit) + Phase 3 (Actor wiring).
## Resolved decisions (2026-06-02)
- **Decision A — ScadaBridge inbound API keys depth → (a) FULL ADOPT.** Re-architect inbound-API auth to the
library's model: `<prefix>_<keyId>_<secret>` Bearer token format, keyId lookup + constant-time compare,
scopes/constraints, and **move inbound API keys into the library's SQLite store** (separate from the SQL Server
config DB). This is the largest, highest-risk item in Phase 1. Implications to handle in Task 1.3:
- New SQLite auth DB for ScadaBridge inbound keys (path via `ApiKeyOptions.SqlitePath`); migrate/retire the
SQL Server `ApiKey{Name,KeyHash}` table + `ApiMethod.ApprovedApiKeyIds` linkage.
- Re-model **per-method approval** as the library's scopes/constraints (or the opaque constraint blob) — the
`ApiMethod.ApprovedApiKeyIds` set becomes per-key scope grants.
- Switch the inbound transport from `X-API-Key` header to `Authorization: Bearer <token>` (a client-visible
contract change — extends the already-accepted token-format change; needs the interop check + a doc/CHANGELOG note).
- Existing raw keys cannot be migrated (deterministic-by-value hash, no keyId/secret split) → **re-issue** all
inbound API keys; call this out in the cutover runbook.
- **Decision B — canonical role mappings → confirmed as tabled above** (OtOpcUa `ConfigViewer→Viewer`,
`ConfigEditor→Designer`, `FleetAdmin→Administrator+Deployer`; MxGateway `Viewer/Admin`; ScadaBridge
`Admin→Administrator`, `Design→Designer`, `Deployment→Deployer`, `Audit→Administrator`, `AuditReadOnly→Viewer`).
- **Decision C — dev escape hatches → keep app-side, unchanged.** OtOpcUa `DevStubMode` and MxGateway
`AllowAnonymousLocalhost`/loopback bypass have no library equivalent; preserve them in each app outside the
shared `Auth.Ldap` path.
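
As an illustration of Decision A, the Bearer token check could look roughly like the sketch below. It is a Python
sketch, not the `ZB.MOM.WW.Auth.ApiKeys` implementation: the HMAC-SHA256 peppered hash, the in-memory `store`
dictionary, the function names, and the assumption that a keyId never contains an underscore are all hypothetical.

```python
import hashlib
import hmac


def parse_token(token, expected_prefix="sbk"):
    """Split '<prefix>_<keyId>_<secret>' into (key_id, secret); None if malformed.

    Assumes the keyId contains no underscore; the secret may.
    """
    parts = token.split("_", 2)
    if len(parts) != 3:
        return None
    prefix, key_id, secret = parts
    if prefix != expected_prefix or not key_id or not secret:
        return None
    return key_id, secret


def hash_secret(secret, pepper):
    """Peppered hash of the secret (HMAC-SHA256 is an assumption for this sketch)."""
    return hmac.new(pepper.encode(), secret.encode(), hashlib.sha256).hexdigest()


def verify(token, required_scope, store, pepper):
    """keyId lookup, constant-time secret compare, then a scope (method name) check."""
    parsed = parse_token(token)
    if parsed is None:
        return False
    key_id, secret = parsed
    record = store.get(key_id)  # lookup by keyId, not by hashing the whole token
    if record is None:
        return False
    if not hmac.compare_digest(hash_secret(secret, pepper), record["secret_hash"]):
        return False
    return required_scope in record["scopes"]
```

The keyId/secret split is what makes lookup-then-compare possible, and it is also why the old deterministic
by-value hashes cannot be migrated: they carry no keyId to look up.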
## Phase 1 tail — decisions + current state (2026-06-02, resumed)
Task 1.0 gate read-only re-exploration confirmed the post-cutover state for 1.5/1.6/1.7 (3 parallel Explore agents):
- **None of the 3 repos reference `ZbClaimTypes`/`ZbCookieDefaults` yet.** `ZbClaimTypes.Name`/`Role` alias the framework
URIs (`ClaimTypes.Name`/`.Role`); `DisplayName`/`Username`/`ScopeId` = new `zb:`-prefixed strings.
- Claim mints today: **OtOpcUa** `AuthEndpoints.cs` uses `ClaimTypes.NameIdentifier` + `JwtTokenService.{Username,DisplayName}ClaimType` ("Username"/"DisplayName") + `ClaimTypes.Role` (JWT-in-cookie). **MxGateway** `DashboardAuthenticator.CreatePrincipal` uses `ClaimTypes.{NameIdentifier,Name,Role}` + custom `mxgateway:ldap_group`. **ScadaBridge** `CentralUI/Auth/AuthEndpoints.cs` + `JwtTokenService` use **plain** `"DisplayName"/"Username"/"Role"/"SiteId"/"LastActivity"` strings — `"Role"`/`"SiteId"` are load-bearing in `TokenValidationParameters` + every `AuthorizationPolicies` `RequireClaim`.
- Cookie names confirmed: `ZB.MOM.WW.OtOpcUa.Auth` / `MxGatewayDashboard` / `ZB.MOM.WW.ScadaBridge.Auth`. All three apps already do HttpOnly+SameSite=Strict+sliding+SecurePolicy via hand-rolled `PostConfigure` (no `ZbCookieDefaults.Apply`).
- Dev base DNs today: OtOpcUa + MxGateway = `dc=lmxopcua,dc=local`; ScadaBridge = `dc=scadabridge,dc=local`.
- `CanonicalRole` is referenced **nowhere** in any repo yet (Task 1.7 is its first use).
**Decision A3 (Task 1.6 dev base DN) → `dc=zb,dc=local`** (product-neutral, matches the ZB.MOM.WW family; all 3 dev
fixtures + dev appsettings move to it — prod directories untouched). ScadaBridge GLAuth user DNs become
`cn=<user>,ou=<group>,ou=users,dc=zb,dc=local`; OtOpcUa/MxGateway leave `dc=lmxopcua`.
**Decision (Task 1.5 ScadaBridge depth) → FULL canonical incl. role/scope.** Migrate ScadaBridge's role claim to the
framework URI (`ZbClaimTypes.Role`) and the site claim to `ZbClaimTypes.ScopeId` across cookie + JWT mint +
`TokenValidationParameters` + every policy `RequireClaim` + tests (cleanest: redefine the `JwtTokenService.*ClaimType`
constants to alias `ZbClaimTypes.*` so all existing references inherit canonical values). **Treated as high-risk** for the
ScadaBridge slice (serial spec→code review, full ScadaBridge suite). OtOpcUa/MxGateway slices stay standard.
### ✅ Task 1.5 (AspNetCore claims/cookies) COMPLETE across all 3 repos (reviewed)
- **OtOpcUa** `83856b7` + review-fix `d0777ee` (spec ✅, code ✅): `.Security` adds the `Auth.AspNetCore` pkg ref; `JwtTokenService.{Username,DisplayName}ClaimType` alias `ZbClaimTypes.{Username,DisplayName}`; cookie principal emits `ZbClaimTypes.Name` (replaced `NameIdentifier` — grep-confirmed no other reader) + `ZbClaimTypes.Role`; cookie via `ZbCookieDefaults.Apply`, name kept. Issued JWT is documented as issue-only (no `AddJwtBearer` in OtOpcUa; role stays short `"Role"`; `BuildValidationParameters` pins `RoleClaimType`/`NameClaimType` for forward-compat). 35/35.
- **MxGateway** `7e1af37` (spec ✅, code ✅): `DashboardAuthenticator` emits `ZbClaimTypes.{Username,DisplayName}` + identity `nameType/roleType=ZbClaimTypes.{Name,Role}`; keeps `mxgateway:ldap_group` + `NameIdentifier` (HubTokenService reads it); cookie via `ZbCookieDefaults.Apply(requireHttps:true, idleTimeout:8h)` (8h preserved), `RequireHttpsCookie=false` dev-HTTP override kept, name kept. Dashboard 85/85; full 575/578 (3 pre-existing FakeWorker reds).
- **ScadaBridge** `a0938f7` + spelling-fix `c185a56` (high-risk; spec ✅, code ✅): `JwtTokenService.*ClaimType` constants aliased to `ZbClaimTypes.*` (`RoleClaimType`=framework URI, `SiteIdClaimType`=`ScopeId`); JWT mint `MapInboundClaims=false`+`OutboundClaimTypeMap.Clear()` (instance-isolated, reviewer-verified) and validate `MapInboundClaims=false`+pinned `RoleClaimType`/`NameClaimType` → byte-symmetric round-trip; cookie identity `roleType=RoleClaimType`; every site-scope read on `SiteIdClaimType`; cookie via `ZbCookieDefaults.Apply` (30-min idle), name kept. No `AddJwtBearer` middleware (sole JWT path = `JwtTokenService.ValidateToken`). Role VALUES unchanged. Security 93/93, CentralUI 595/595, ManagementService 125/125, Host 227/227; infra reds (Integration ×11, AuditLog ×1, flaky StaleTagMonitor) confirmed pre-existing by stash-at-HEAD. **Minor (deferred):** a stale "PostConfigure" comment word; JWT-validated principals have null `Identity.Name` (no regression, no bearer path).
### ✅ Task 1.6 (unify dev LDAP base DN → `dc=zb,dc=local`) COMPLETE across all 3 repos (reviewed, code-review-only per `small` class)
Mechanical, grep-verified substitution of each repo's dev directory base DN to the neutral `dc=zb,dc=local`; prod left untouched (no in-repo prod overlay carries the dev DN; `/deploy` is gitignored and was not touched). OU structure preserved throughout.
- **OtOpcUa** `8ba289f`: `LdapOptions.SearchBase` default, integration `docker-compose.yml` `LDAP_ROOT` + `TwoNodeClusterHarness` SearchBase/ServiceAccountDn, `AclEdit.razor` placeholder, `docs/v2/{dev-environment,phase-7-e2e-smoke}`. `grep dc=lmxopcua`→empty. Security 35, AdminUI 121, ControlPlane 29, Runtime 74 green.
- **MxGateway** `9572045`: `LdapOptions` defaults, `appsettings.json`, dashboard test group-DNs, `glauth.md` (dev DNs only — the `DC=corp,…` prod-example column left intact), `CLAUDE.md` index line. `grep dc=lmxopcua`→empty. 575/578 (3 pre-existing FakeWorker).
- **ScadaBridge** `6ae6051` (14 files): app `appsettings.Central.json`, the 4 docker/docker-env2 central-node configs, `infra/glauth/config.toml` baseDN, `infra/tools/ldap_tool.py`, 4 test fixtures, `docs/test_infra/*`. Cluster nodes use the shared `scadabridge-ldap` container backed by the now-updated `infra/glauth/config.toml` (no separate seed). `grep dc=scadabridge`→only the 2 excluded historical `docs/plans/*` records + synthetic `dc=example` left. Full non-infra suite green (Security 93, CentralUI 595, ManagementService 125, Host 227, ConfigurationDatabase 241).
## Task 1.7 (canonical roles) — inventory + decisions (2026-06-02)
Read-only role inventory (3 parallel Explore agents) found the canonical-role standardization is bigger than the plan's "~5 min/repo": it changes role string VALUES (claims + config-DB + enforcement), needs config-DB DATA migrations, and makes the ScadaBridge SoD collapse real. **EF persistence confirmed:** OtOpcUa `AdminRole` is `HasConversion<string>().HasMaxLength(32)` (stores the enum MEMBER NAME); ScadaBridge `LdapGroupMappings.Role` is free-text `nvarchar(500)` with HasData seed. Both → renaming role values requires a data migration.
**Resolved per-repo mapping (Decision B + filled gaps):**
- **MxGateway:** `Viewer→Viewer` (no-op), `Admin→Administrator`. Clean rename of `DashboardRoles.Admin` VALUE + `GroupToRole` config + `GatewayOptionsValidator` allowed-set. NO DB (dashboard roles not persisted). ⚠️ MUST NOT touch the separate gRPC `GatewayScopes.Admin = "admin"` data-plane scope.
- **OtOpcUa:** `ConfigViewer→Viewer`, `ConfigEditor→Designer`, `FleetAdmin→Administrator`, **`DriverOperator→Operator`** (plan-omitted gap). Rename `AdminRole` members + DevStub/appsettings `GroupToRole` values + every `[Authorize(Roles=)]`/`RequireRole` role string. **Config-DB data migration** on `LdapGroupRoleMappings.Role` (raw SQL UPDATE old→new; column is the same string col so it's a data, not schema, change). Data-plane `NodePermissions` bitmask UNTOUCHED. Enforcement preserved: `Designer`(←ConfigEditor) keeps the deploy access it has today (`Deployments.razor` `Roles="FleetAdmin,ConfigEditor"` → `"Administrator,Designer"`). Policy NAMES (e.g. `"DriverOperator"`/`"FleetAdmin"` policy keys) may stay as internal indirections; only the role STRINGS they check become canonical.
- **ScadaBridge (heaviest):** `Admin→Administrator`, `Design→Designer`, `Deployment→Deployer`, **`Audit→Administrator`** (collapse), **`AuditReadOnly→Viewer`** (collapse). Requires: config-DB data migration (`LdapGroupMappings.Role` UPDATE + HasData seed + ModelSnapshot); ~20 hard-coded role-string sites (ManagementActor site-scope bypass ×6 + `GetRequiredRole`, DebugStreamHub ×2, BrowseService/BindingTester, policy arrays); SoD policy rework `OperationalAuditRoles→{Administrator,Viewer}` + `AuditExportRoles→{Administrator}` so former `AuditReadOnly`(→Viewer) keeps audit-READ but still can't export; all role-asserting tests. **Real security consequence (accepted):** `Audit→Administrator` grants former audit-only users the full admin surface (create sites, manage LDAP mappings/API keys, import bundles). Site-scoping stays orthogonal (computed from `PermittedSiteIds`, Deployment-only).
**Decisions (2026-06-02):** depth = **FULL canonical (values change, incl. config-DB migrations + real SoD escalation)**; cadence = **proceed now**. Execution: MxGateway + OtOpcUa single high-risk commits each (parallel); ScadaBridge as a focused atomic change (1–2 coupled commits — the rename + seed + migration are coupled, so it does not cleanly split into 1.3-style green sub-increments). High-risk serial review (spec→code) per repo + full ScadaBridge suite.
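
The resolved per-repo mapping and the reworked ScadaBridge SoD policies can be summarized in a short sketch. The
role names and policy sets come from the mapping above; the Python function names and the pass-through rule for
already-canonical values are assumptions for illustration, not code from any of the three repos.

```python
# Old role value -> canonical role value, per repo (taken from the resolved mapping above).
CANONICAL_ROLE_MAP = {
    "MxGateway": {"Viewer": "Viewer", "Admin": "Administrator"},
    "OtOpcUa": {
        "ConfigViewer": "Viewer",
        "ConfigEditor": "Designer",
        "FleetAdmin": "Administrator",
        "DriverOperator": "Operator",
    },
    "ScadaBridge": {
        "Admin": "Administrator",
        "Design": "Designer",
        "Deployment": "Deployer",
        "Audit": "Administrator",    # collapse: former auditors gain the full admin surface
        "AuditReadOnly": "Viewer",   # collapse: keeps audit-read, still cannot export
    },
}

# Reworked ScadaBridge SoD policy role sets.
OPERATIONAL_AUDIT_ROLES = {"Administrator", "Viewer"}
AUDIT_EXPORT_ROLES = {"Administrator"}


def canonicalize(repo, role):
    """Map an old role value to its canonical value; already-canonical values pass through."""
    mapping = CANONICAL_ROLE_MAP[repo]
    if role in mapping:
        return mapping[role]
    if role in mapping.values():
        return role  # idempotent: running the data migration twice changes nothing
    raise ValueError(f"unknown role '{role}' for {repo}")


def can_read_audit(role):
    return role in OPERATIONAL_AUDIT_ROLES


def can_export_audit(role):
    return role in AUDIT_EXPORT_ROLES
```

The two collapses are visible directly: `canonicalize("ScadaBridge", "Audit")` lands on `Administrator`, while a
former `AuditReadOnly` user ends up as a `Viewer` who passes `can_read_audit` but not `can_export_audit`.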
### ✅ Task 1.7 (canonical roles) COMPLETE across all 3 repos (high-risk; spec ✅ + code ✅ each)
- **MxGateway** `04bce3ff` (spec ✅, code ✅): `DashboardRoles.Admin` value `"Admin"→"Administrator"` (Viewer unchanged) + `GroupToRole` config; validator/enforcement inherit the constant. NO DB (dashboard roles not persisted). gRPC `GatewayScopes.Admin="admin"` proven untouched. 577/580 (3 pre-existing FakeWorker).
- **OtOpcUa** `c1619d9` (spec ✅, code ✅): `AdminRole` enum members → `Viewer/Designer/Administrator`; `DriverOperator` role string → `Operator` (policy NAMES kept stable); DevStub `["Administrator"]`. **Data migration** `20260602112419_CanonicalizeAdminRoles` (`UPDATE LdapGroupRoleMapping` old→new, reverse Down, snapshot unchanged, no pending model changes). `Deployments.razor` `[Authorize(Roles="Administrator,Designer")]` (deploy access preserved). Data-plane `NodePermissions`/`NodeAcl`/evaluator untouched (proven). Security 45, Configuration 90, AdminUI 121 green. (Minor non-issues: an `ou=FleetAdmin` placeholder DN + a data-plane doc-comment — both LDAP-group/doc text, not role values.)
- **ScadaBridge** `b104760` + doc-fix `4118452` (high-risk; spec ✅, code ✅): `Roles` → canonical `{Administrator,Designer,Deployer,Viewer}` (Audit/AuditReadOnly removed); **SoD reworked** `OperationalAudit={Administrator,Viewer}`, `AuditExport={Administrator}` (Viewer reads-not-exports audit; Administrator does both + full admin). All enforcement literals moved incl. the 6 ManagementActor site-scope bypasses + DebugStreamHub + BrowseService/BindingTester. **Migration** `20260602113822_CanonicalizeRoles` (seed `UpdateData` + idempotent raw catch-all for operator rows; lossy Down documented; snapshot consistent). **Real SoD escalation** (Audit→Administrator gains full admin) documented in CHANGELOG. Full non-infra suite green (Security 93, CentralUI 595, ManagementService 125, Host 227, ConfigurationDatabase 241); infra reds pre-existing (stash-at-HEAD confirmed). `4118452` corrected stale role-name prose in NavMenu comments (comment-only; CentralUI rebuild 0/0).
## ✅ PHASE 1 COMPLETE (2026-06-02)
All of Tasks 1.0–1.7 done across OtOpcUa, MxAccessGateway, ScadaBridge — each on its local-only `feat/adopt-zb-auth` branch, **nothing pushed**. The three apps now consume `ZB.MOM.WW.Auth.*` from the Gitea feed (OtOpcUa 0.1.1 Abstractions+Ldap+AspNetCore; MxGateway 0.1.2 all-four; ScadaBridge 0.1.3 all-four): shared LDAP (`Auth.Ldap`), shared API-key model (`Auth.ApiKeys`, ScadaBridge fully re-architected), `IGroupRoleMapper<TRole>` seam, nested/`Transport`-enum config, canonical `ZbClaimTypes`/`ZbCookieDefaults`, unified dev base DN `dc=zb,dc=local`, and the canonical-six role vocabulary (with ScadaBridge's accepted auditor/admin SoD collapse). Every task spec- and code-reviewed; high-risk ones via the serial chain + full-suite runs. **Phase 1 exit gate met.** Next: Phase 2 (audit component — the original ask) starting at the Task 2.0 gate, then Phase 3 (wire audit Actor from the Auth principal).
@@ -0,0 +1,208 @@
# Phase 2 (Audit adoption) — Task 2.0 gate findings + DEEP re-scope (for review)
Companion to `2026-06-02-auth-audit-normalization.md`. Produced by the **Task 2.0 read-only
verification gate** (3 parallel explorers, all paths verified 2026-06-02 against live code on each
repo's `feat/adopt-zb-auth` HEAD). **Status: PAUSED for user review before any audit code is written.**
**Decisions taken (2026-06-02):**
- **Depth = DEEP adopt (canonical record).** Each app's audit record becomes the library's 9-field
`ZB.MOM.WW.Audit.AuditEvent`; domain-specific fields relocate into `DetailsJson`; each app consumes
the library's `IAuditWriter`/`IAuditRedactor`/`AuditOutcome` types. (User chose this over the
gate-recommended lighter "Align" — consistent with the standing maximal/full-adopt directive.)
- **Cadence = re-scope + PAUSE for review.** This doc is the review artifact; implementation does not
start until the user signs off (especially on the ScadaBridge cost, below).
> **Why a re-scope was needed:** the plan's Phase 2 task specs were written from optimistic
> `components/audit/current-state/*` docs (see [[component-status-claims-are-optimistic]]). The gate
> found all three repos' specs are materially off — file refs moved (MxGateway), the target path is
> dormant (OtOpcUa), and the "outright rename" is structurally impossible (ScadaBridge).
---
## The canonical contract (shared `ZB.MOM.WW.Audit` 0.1.0)
`AuditEvent` (sealed record): REQUIRED `EventId:Guid`, `OccurredAtUtc:DateTimeOffset` (UTC-normalized
on set), `Actor:string`, `Action:string`, `Outcome:AuditOutcome`; OPTIONAL `Category:string?`,
`Target:string?`, `SourceNode:string?`, `CorrelationId:Guid?`, `DetailsJson:string?`. **Nine fields.**
`AuditOutcome { Success, Failure, Denied }`. `IAuditWriter.WriteAsync(AuditEvent, CancellationToken)`
best-effort, never throws. `IAuditRedactor.Apply(AuditEvent) -> AuditEvent` — pure, never throws.
The package is pinned (central PM / explicit) + feed-mapped in all three repos; **referenced by none yet.**
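
For readers who want the contract in executable form, here is a rough Python model of it. It mirrors the field
list above but is not the library's C# code: the snake_case names, the rejection of naive timestamps, and the
`BestEffortWriter` wrapper with its `dropped` counter are assumptions made for this sketch.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuditOutcome(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    DENIED = "Denied"


@dataclass(frozen=True)
class AuditEvent:
    # Required fields
    event_id: uuid.UUID
    occurred_at_utc: datetime
    actor: str
    action: str
    outcome: AuditOutcome
    # Optional fields
    category: Optional[str] = None
    target: Optional[str] = None
    source_node: Optional[str] = None
    correlation_id: Optional[uuid.UUID] = None
    details_json: Optional[str] = None

    def __post_init__(self):
        ts = self.occurred_at_utc
        if ts.tzinfo is None:
            raise ValueError("occurred_at_utc must carry an offset")
        # UTC-normalize on set; frozen dataclasses need object.__setattr__ here.
        object.__setattr__(self, "occurred_at_utc", ts.astimezone(timezone.utc))


class BestEffortWriter:
    """Wraps a sink so that writing an audit event never raises to the caller."""

    def __init__(self, sink):
        self._sink = sink
        self.dropped = 0

    def write(self, event):
        try:
            self._sink(event)
        except Exception:
            self.dropped += 1  # swallow the failure: audit is best-effort
```

The writer's swallow-and-count behaviour is the property the MxGateway section later relies on when it notes that
`AppendAsync` currently propagates exceptions.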
---
## OtOpcUa — DEEP (Tasks 2.1 + 2.2) · risk: LOW–MEDIUM
**Verified current state:** Commons `AuditEvent` is an **8-field positional record**
`(Guid EventId, string Category, string Action, string Actor, DateTime OccurredAtUtc, string? DetailsJson,
NodeId SourceNode, CorrelationId CorrelationId)` — where `NodeId`/`CorrelationId` are `readonly record
struct` newtypes over `string`/`Guid`. It is an **Akka message** delivered via `DistributedPubSub`
(`provider=cluster`) with **default (reflection) serialization** — no custom serializer. **The structured
actor path is DORMANT: zero production emit sites** construct/`Tell` an `AuditEvent` today (only the tests
do); all live audit goes through the bespoke **stored-procedure path** (`sp_NodeApplied`/`sp_PublishGeneration`/
`sp_ValidateDraft`/`sp_RollbackToGeneration` INSERT directly with `ClusterId`/`GenerationId`, NULL `EventId`).
`AuditWriterActor` (`ControlPlane/Audit/AuditWriterActor.cs`): 500/5s batching, two-layer dedup (in-buffer
`Dictionary<Guid,AuditEvent>` + DB filtered-unique `UX_ConfigAuditLog_EventId`), mapping at `:75-84`.
`ConfigAuditLog` (10 cols, no `Outcome`; `ISJSON` CHECK on `DetailsJson`). `ClusterAudit.razor:78` filters
`a.ClusterId == ClusterId`, but the actor sets `NodeId` not `ClusterId`, so structured rows are invisible.
Package pinned `0.1.0` in `Directory.Packages.props`, feed-mapped, unreferenced.
**Deep design — this is the easy one (the record is already ~canonical):**
- **2.1 (high-risk: actor + contract):** Delete Commons `AuditEvent.cs`; reference `ZB.MOM.WW.Audit.AuditEvent`
from `ZB.MOM.WW.OtOpcUa.Commons` + `…ControlPlane`. Field map: `EventId` → `EventId`; `OccurredAtUtc`
`DateTime` → `DateTimeOffset` (widen at construction); `Actor`/`Action`/`Category`/`DetailsJson` direct;
`SourceNode` (unwrap `NodeId.Value` → `string?`); `CorrelationId` (unwrap `.Value` → `Guid?`); `Target` unused
(null) — OtOpcUa has no extra domain fields to push into `DetailsJson`, so **no field relocation**. Add the
NEW required `Outcome` (derive: `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; config verbs→
`Success`; no `Failure` in OtOpcUa's vocabulary). `AuditWriterActor : IAuditWriter` (`WriteAsync` wraps the
fire-and-forget `Tell`, returns `Task.CompletedTask` — trivially best-effort). Keep batching/dedup. Mapping
at `:75-84` becomes `NodeId = evt.SourceNode`, `CorrelationId = evt.CorrelationId`, `Outcome = evt.Outcome`,
`EventType = $"{evt.Category}:{evt.Action}"` (storage keeps the composite). Value-type unwrap happens at the
(test + future) construction sites. **Akka wire note:** the message type changes shape → a rolling-deploy
wire break IN PRINCIPLE, but **moot** (no live emit traffic). Flag in the commit; no dual-accept window needed.
- **2.2 (high-risk: EF migration + UI query):** add nullable `Outcome` to `ConfigAuditLog` (+ DbContext mapping
`:429-463`) + EF migration `AddConfigAuditLogOutcome` (chains after `20260602112419_CanonicalizeAdminRoles`).
Fix `ClusterAudit.razor:78` so `ClusterId == null && NodeId` resolves to the cluster (OR-predicate joining
`ClusterNodes`, or populate `ClusterId` at flush). SP path stays bespoke (documented).
- **Package refs:** `…Commons` (record + `AuditOutcome`), `…ControlPlane` (`IAuditWriter`), `…Configuration`
(only if `Outcome` is stored as the enum type; otherwise store `string?`/`int?` and skip).
- **Effort:** ~record swap 5m + actor seam 5m + Outcome derivation 5m (2.1); column+migration+query 5m (2.2).
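
The 2.1 pieces (Outcome derivation, the composite `EventType`, and in-buffer dedup keyed on `EventId`) fit in a
few lines. The sketch below is Python over plain dictionaries, not the Akka actor: the helper names, the dictionary
keys, and the `add`/`drain` API are assumptions; only the action names, the column names and the 500-event batch
size come from the text above.

```python
DENIED_ACTIONS = {"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"}


def outcome_from_action(action):
    """Derive the new required Outcome; OtOpcUa's vocabulary has no Failure case."""
    return "Denied" if action in DENIED_ACTIONS else "Success"


def to_config_audit_row(evt):
    """Map a canonical event (dict form, hypothetical keys) to ConfigAuditLog columns."""
    return {
        "EventId": evt["event_id"],
        "EventType": f"{evt['category']}:{evt['action']}",  # storage keeps the composite
        "NodeId": evt["source_node"],
        "CorrelationId": evt["correlation_id"],
        "Outcome": evt["outcome"],
        "DetailsJson": evt.get("details_json"),
    }


class AuditBuffer:
    """First dedup layer: an in-memory buffer keyed by EventId."""

    def __init__(self, max_batch=500):
        self._pending = {}
        self._max_batch = max_batch

    def add(self, evt):
        """Buffer an event once; returns True when the batch should be flushed."""
        self._pending.setdefault(evt["event_id"], evt)
        return len(self._pending) >= self._max_batch

    def drain(self):
        rows = [to_config_audit_row(e) for e in self._pending.values()]
        self._pending.clear()
        return rows
```

The second dedup layer (the filtered-unique index on `EventId`) and the 5s timer stay in the database and the
actor respectively, so they are not modelled here.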
---
## MxGateway — DEEP (Task 2.3, re-scoped) · risk: MEDIUM–HIGH (was "standard")
**Verified current state — the plan's file refs are STALE:** Phase 1 (Task 1.3) **moved**
`IApiKeyAuditStore` + `ApiKeyAuditEntry` + `SqliteApiKeyAuditStore` **into the shared library**
(`ZB.MOM.WW.Auth.Abstractions`/`…ApiKeys` 0.1.2) — they no longer exist in MxGateway. `ApiKeyAuditEntry` =
**5 fields** `(string? KeyId, string EventType, string? RemoteAddress, DateTimeOffset CreatedUtc, string? Details)`,
persisted to the SQLite `api_key_audit` table (5 cols). `IApiKeyAuditStore` = `AppendAsync` + `ListRecentAsync`
(the dashboard "recent audit" view reads via `ListRecentAsync`). **Three producers, but one is library-internal:**
- `ApiKeyAdminCommands` (**library-internal**, in `ZB.MOM.WW.Auth.ApiKeys`) — emits CLI/admin verbs
(`init-db`/`create-key`/`revoke-key`/`rotate-key`/`delete-key`/`set-scopes`/`enable-key`/`disable-key`),
keyless for `init-db`, `RemoteAddress` null on the CLI path. **MxGateway cannot edit these call sites.**
- `DashboardApiKeyManagementService` (MxGateway-local) — `dashboard-*` verbs, real `KeyId` + `RemoteAddress`.
- `ConstraintEnforcer.RecordDenialAsync` (MxGateway-local) — single `constraint-denied` EventType, `RemoteAddress`
hardcoded null, `Details = "{commandKind}: {target}: {ConstraintName}: {Message}"`.
`AppendAsync` currently **propagates** exceptions (no best-effort wrap). Serilog migration **landed** (no blocker).
`ZB.MOM.WW.Audit` unreferenced; `nuget.config` already maps the package.
**Deep design — the library-internal CLI producer forces an adapter:**
- Add `<PackageReference Include="ZB.MOM.WW.Audit" />` to `…Server`.
- New **MxGateway-owned canonical store** `audit_event` (SQLite, 9 canonical columns + `details_json`) with its own
migrator — the existing `api_key_audit` lives in the **library-owned** auth DB schema, so we do NOT alter that
schema. Implement `IAuditWriter` over the new store (best-effort try/catch — fixes the no-wrap gap).
- **Adapter for the library-internal CLI events:** register a MxGateway `IApiKeyAuditStore` impl whose
`AppendAsync(ApiKeyAuditEntry)` maps → canonical `AuditEvent` (`EventId=NewGuid`; `KeyId` → `Actor` with
`"cli"`/`"system"` fallback; `EventType` → `Action`; `CreatedUtc` → `OccurredAtUtc`; `RemoteAddress` → `SourceNode`;
`Outcome=Success`; `Category="ApiKey"`; `Target=KeyId`; `Details` → `DetailsJson` wrapped `{"detail":"…"}`) and
forwards to `IAuditWriter`. Its `ListRecentAsync` reads the canonical store and maps back to `ApiKeyAuditEntry`
(so the existing dashboard recent-audit view keeps working) **or** the dashboard view is repointed to canonical.
- **Local producers** (`DashboardApiKeyManagementService`, `ConstraintEnforcer`) rewritten to build canonical
`AuditEvent`s directly via `IAuditWriter` (`constraint-denied` → `Outcome.Denied`; capture `CorrelationId` from
`MxCommandRequest.ClientCorrelationId` (constraint path — needs threading down) / `HttpContext.TraceIdentifier`
(dashboard); structured `Target` from `commandKind`/`target` (GAPS #6)).
- **Open question for review:** retire `api_key_audit` (canonical store becomes the sole audit table) vs keep it
coexisting. Retiring is cleaner-deep but touches the library's store wiring; coexisting is lower-risk.
- **Effort/classification:** re-scoped from "standard ~5m" to **high-risk** (new store + migrator + adapter +
producer rewrites + dashboard read path + DI + tests). Realistically 2–3 sub-commits.
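
The adapter mapping for the library-internal CLI events, including the reverse direction needed by
`ListRecentAsync`, can be sketched as two pure functions. This is a Python illustration over dictionaries; the
function names and dictionary keys are assumptions, while the field-to-field mapping follows the design above.

```python
import json
import uuid


def entry_to_audit_event(entry, fallback_actor="cli"):
    """Map a 5-field ApiKeyAuditEntry (dict form) to the canonical event shape."""
    key_id = entry.get("key_id")
    details = entry.get("details")
    return {
        "event_id": uuid.uuid4(),
        "occurred_at_utc": entry["created_utc"],
        "actor": key_id or fallback_actor,  # keyless paths such as init-db fall back
        "action": entry["event_type"],
        "outcome": "Success",
        "category": "ApiKey",
        "target": key_id,
        "source_node": entry.get("remote_address"),
        "correlation_id": None,
        "details_json": json.dumps({"detail": details}) if details is not None else None,
    }


def audit_event_to_entry(evt):
    """Reverse mapping so an existing recent-audit view can keep reading entries."""
    details_json = evt.get("details_json")
    return {
        "key_id": evt["target"],
        "event_type": evt["action"],
        "remote_address": evt["source_node"],
        "created_utc": evt["occurred_at_utc"],
        "details": json.loads(details_json)["detail"] if details_json else None,
    }
```

Reading `key_id` back from `target` rather than `actor` is deliberate: `actor` may hold the `"cli"` fallback, so
only `target` round-trips a keyless entry faithfully.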
---
## ScadaBridge — DEEP (Task 2.5, re-scoped) · risk: **VERY HIGH — audit-subsystem re-architecture**
**This is the one to scrutinize at review.** The gate definitively answered the plan's central claim is FALSE.
**Verified current state:** ScadaBridge's `AuditEvent` (`…Commons/Entities/Audit/AuditEvent.cs`) is a
**24-field** record — `EventId, OccurredAtUtc(DateTime), IngestedAtUtc, Channel(AuditChannel), Kind(AuditKind),
CorrelationId, ExecutionId, ParentExecutionId, SourceSiteId, SourceNode, SourceInstanceId, SourceScript, Actor,
Target, Status(AuditStatus), HttpStatus, DurationMs, ErrorMessage, ErrorDetail, RequestSummary, ResponseSummary,
PayloadTruncated, Extra, ForwardState(AuditForwardState?)`. It is the **storage shape of a partitioned SQL Server
audit table** with these as **queryable columns**. `IAuditPayloadFilter.Apply(ScadaBridgeAuditEvent) ->
ScadaBridgeAuditEvent` (NOT the library's record — a reflection contract test `PayloadFilterContractTests` pins
the typing). `IAuditWriter`/`ICentralAuditWriter` are likewise typed to the 24-field record. **`AuditStatus`
drives the site→central forwarding STATE MACHINE** (`Pending→Submitted→Forwarded→Reconciled`;
`Delivered`/`Failed`/`Parked`/`Discarded`) and the **filter's error-cap logic** (`IsErrorStatus`). The Central
reporting/UI queries by `Channel`/`Kind`/`Status`/`Site`. **Phase 1 did NOT touch any audit-pipeline file** (zero
drift). Blast radius of just the interface rename: ~10 files / ~20 sites; the contract test pins it.
**What DEEP adoption concretely requires here (full honesty):**
Replacing the 24-field record with the 9-field canonical + pushing ~15 domain fields into `DetailsJson` means
**re-architecting the entire audit subsystem**, because those fields are not decorative — they are load-bearing:
1. **Storage:** migrate the partitioned SQL Server audit table from ~24 typed columns to the 9 canonical columns
+ a JSON `DetailsJson` column. Massive, lossy-on-queryability data migration; partitioning scheme likely must
change; `IngestedAtUtc`/`ForwardState` are operational columns the forwarder UPDATEs.
3. **Forwarding state machine breaks:** `Status`/`ForwardState` move into opaque JSON — you cannot `UPDATE` a
JSON-embedded field as a column, and the reconciliation queries `WHERE Status/ForwardState = …` stop working.
The site→central forwarder would have to be redesigned (e.g., promote Status back out of JSON, defeating the
point).
4. **Redactor breaks:** `DefaultAuditPayloadFilter` reads `Channel`/`Status`/`RequestSummary`/`ResponseSummary`/
`ErrorDetail`/`Extra`/`PayloadTruncated` to choose truncation caps — on a 9-field canonical record those are
gone (opaque in `DetailsJson`), so the filter must be rewritten to parse JSON.
5. **Reporting/UI breaks:** Central audit-log queries/filters by Channel/Kind/Status/Site lose SQL queryability.
6. ~Dozens of call sites + the contract test + the perf hot-path test.
**Honest assessment:** ScadaBridge DEEP ≈ the **largest single undertaking in the whole program** (bigger than the
Phase-1 ApiKeys re-arch). The audit component's own GAPS doc says *"Align, don't replace"* for exactly this reason.
**Bounded alternative to weigh at review (recommended if "deep" is to be kept tractable):** make the canonical
`ZB.MOM.WW.Audit.AuditEvent` the **seam/transport + cross-project reporting** shape (the redactor and an
`IAuditWriter` operate on the canonical record; domain richness rides in `DetailsJson`), while the **SQL storage
keeps its typed queryable columns** populated by a storage-side projection (canonical+DetailsJson → columns) and
the forwarding state machine continues to key on the `Status`/`ForwardState` columns. This delivers "deep" at the
seam/record level (library types consumed; domain fields in `DetailsJson` for the canonical view) **without**
gutting the partitioned store, the state machine, the filter, or the reporting — a far safer "deep."
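
One way to picture the bounded alternative is a storage-side projection that lifts selected `DetailsJson` keys
back into typed columns. The sketch below is Python and purely illustrative: the choice of projected columns, the
dictionary keys, and the `pending_rows` helper are assumptions, not a design that has been agreed.

```python
import json

# Domain fields the store keeps as real, queryable columns (illustrative subset).
PROJECTED_COLUMNS = ("Channel", "Kind", "Status", "ForwardState", "SourceSiteId")


def project_to_row(evt):
    """Project a canonical event plus its DetailsJson into a typed storage row."""
    details = json.loads(evt["details_json"]) if evt.get("details_json") else {}
    row = {
        "EventId": evt["event_id"],
        "OccurredAtUtc": evt["occurred_at_utc"],
        "Actor": evt["actor"],
        "Target": evt.get("target"),
        "SourceNode": evt.get("source_node"),
        "CorrelationId": evt.get("correlation_id"),
    }
    for column in PROJECTED_COLUMNS:
        row[column] = details.get(column)  # missing keys become NULL columns
    return row


def pending_rows(rows):
    """The forwarder keeps keying on the Status column, not on opaque JSON."""
    return [r for r in rows if r["Status"] == "Pending"]
```

The seam sees only the canonical record, while queries such as the forwarder's `WHERE Status = …` continue to hit
columns, which is the property the full re-architecture would lose.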
---
## Cross-cutting
- **Branch model:** `feat/adopt-zb-audit` per app, **stacked on `feat/adopt-zb-auth` HEAD** (Phase 3 wires the
audit `Actor` from the Phase-1 Auth principal, so audit must build on auth). Local-only, never pushed.
- **No library change / republish** needed for the chosen designs (MxGateway adapts in-repo) — so no Gitea token
required unless the user later wants the canonical mapping pushed into a shared lib.
- **Phase 3 (unchanged in intent):** `IAuditActorAccessor` seam + wire `AuditEvent.Actor` from the Auth principal
at every authenticated emit site; keep `"system"`/`"cli"` fallbacks for keyless paths.
## Re-scoped task list (for review)
| # | Repo | Re-scoped scope | Class | Risk |
|---|---|---|---|---|
| 2.1 | OtOpcUa | Commons record → canonical `AuditEvent`; `AuditWriterActor : IAuditWriter`; `Outcome` derivation; Akka-wire note (dormant) | high-risk | Low–Med |
| 2.2 | OtOpcUa | `ConfigAuditLog.Outcome` column + EF migration + `ClusterAudit` visibility fix; SP path bespoke | high-risk | Low–Med |
| 2.3 | MxGateway | new canonical SQLite `audit_event` store + migrator; `IAuditWriter`; `IApiKeyAuditStore`→canonical adapter (for library-internal CLI events) incl. `ListRecentAsync`; rewrite local producers; CorrelationId/Target capture; DI; tests | **high-risk** (↑ from standard) | Med–High |
| 2.5 | ScadaBridge | **DEEP = audit-subsystem re-arch** (24-field→9-field record everywhere; domain fields→`DetailsJson`; SQL partitioned-table migration; forwarding state machine + filter + reporting rewrite; contract/perf tests) — **OR** the bounded "deep-at-the-seam" alternative above | **very-high-risk** | **VERY HIGH** |
## Implementation status (2026-06-02, deep adoption underway)
- **✅ OtOpcUa 2.1 + 2.2 DONE** (`feat/adopt-zb-audit`, spec ✅ + code ✅): `933dd1a` — deleted bespoke Commons
`AuditEvent`, adopted library `ZB.MOM.WW.Audit.AuditEvent`, `AuditWriterActor : IAuditWriter` (best-effort
`WriteAsync` wraps `Self.Tell`), `AuditOutcomeMapper.FromAction` derivation, batching/dedup intact; `b7f5e88`:
nullable `Outcome` column + migration `20260602135350_AddConfigAuditLogOutcome` (additive, chains after
CanonicalizeAdminRoles, no pending model changes) + `ClusterAudit` fix via shared `ClusterAuditQuery` (OR-predicate
joining `ClusterNode` membership). SP path untouched. ControlPlane 45/45, Configuration 80/80 (+3), AdminUI 121/121.
Minor backlog: no `IX_ConfigAuditLog_NodeId` (irrelevant while structured path dormant).
- **✅ MxGateway 2.3 DONE** (`feat/adopt-zb-audit`, spec ✅ + code ✅): `a5944bb` — new MxGateway-owned canonical
SQLite `audit_event` store (same auth DB file via the library's `AuthSqliteConnectionFactory`; library tables
untouched), `CanonicalAuditWriter : IAuditWriter` (best-effort, never throws — closes the library's no-wrap gap),
`CanonicalForwardingApiKeyAuditStore : IApiKeyAuditStore` adapter (maps `ApiKeyAuditEntry`→canonical w/ system/cli
fallback + constraint-denied→Denied + DetailsJson wrap; `ListRecent` round-trips for the dashboard view), DI
overrides the library's `TryAddSingleton`'d store; `7ea8358` — Dashboard + ConstraintEnforcer rewritten to emit
canonical `AuditEvent` directly via `IAuditWriter` with structured `Target` + (dashboard) `CorrelationId`. 587 pass,
3 pre-existing FakeWorker reds, +10 tests. `api_key_audit` left unused (documented). Minor backlog: dup `WrapDetail`,
per-op `EnsureTable`, a test temp-dir leak, unfiltered `ListRecent` category.
- **✅ ScadaBridge 2.5 — DONE (FULL re-arch, user-chosen).** Decomposed into C1–C7 (design in
`2026-06-02-scadabridge-audit-rearch.md`), all spec+code reviewed, MSSQL-verified, local-only on `feat/adopt-zb-audit`.
Canonical record everywhere; site SQLite two-table (canonical + forwarding sidecar); central `dbo.AuditLog` collapsed to
10 canonical cols + persisted computed cols (`CollapseAuditLogToCanonical` migration); redactor/outcome/UI/export/CLI all
canonical. Forwarding state machine preserved (sidecar) + queryability preserved (persisted computed columns) — the design's
key insight that central is append-only made pure-9-col central feasible without gutting forwarding.
## Open items to confirm at review
1. **ScadaBridge:** full audit re-architecture (pure 9-col storage) vs the **bounded "deep-at-the-seam"** variant
(canonical record at the seam/reporting boundary; keep typed storage columns + state machine). Strongly
recommend the bounded variant.
2. **MxGateway:** retire `api_key_audit` (canonical store is sole) vs keep it coexisting.
3. **OtOpcUa:** confirm leaving the SP path bespoke (structured path is dormant; canonicalization is forward-looking
prep) is acceptable, and the `ClusterAudit` fix approach (OR-predicate vs populate `ClusterId`).
4. **Sequencing:** OtOpcUa (2.1→2.2) and MxGateway (2.3) are independent + tractable; ScadaBridge (2.5) is the
gating risk — do it last, and as staged reviewed sub-commits regardless of variant.
@@ -0,0 +1,347 @@
# Auth + Audit Normalization Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Publish `ZB.MOM.WW.Auth` (4 pkgs) + `ZB.MOM.WW.Audit` (1 pkg) to the Gitea feed and adopt both across OtOpcUa, MxAccessGateway, and ScadaBridge, ending with every audit emit site carrying the Auth-resolved principal as `AuditEvent.Actor`.
**Architecture:** Library-major waterfall — Phase 0 publish/feed-map → Phase 1 full Auth adoption (auth GAPS #1–#8) → Phase 2 full Audit adoption (audit GAPS #1–#3, #5, #6) → Phase 3 wire `Actor` from the principal. Behaviour-preserving cutover except two accepted changes (ScadaBridge token format, canonical-roles collapse). One feature branch per repo per library phase; local-only delivery (no `git push`).
**Tech Stack:** .NET 10, NuGet (Gitea feed + central package management), Akka.NET (OtOpcUa/ScadaBridge), EF Core + SQL Server (OtOpcUa) / SQLite (MxGateway, ScadaBridge site), Blazor admin UIs, gRPC (gateway), LDAP/GLAuth, peppered HMAC API keys, xUnit.
**Design doc:** [`2026-06-02-auth-audit-normalization-design.md`](2026-06-02-auth-audit-normalization-design.md)
**Fidelity note:** Phase 0 tasks are command-exact and executable as written. Phase 1–3 cutover tasks name exact files-to-edit and acceptance criteria but their per-step diffs are elaborated **just-in-time** by the per-phase "explore + elaborate" gate task (the implementer reads the named source first) — these repos' auth source has not been opened during planning, only the normalized `components/*/current-state/` docs. Audit (Phase 2) tasks cite the exact paths/lines those docs provide.
**Prerequisite the executor must supply:** Phase 0 push needs `GITEA_NUGET_KEY` (Gitea token with `package:write`). The agent cannot mint this — the user exports it, or runs the push step via `!`.
---
## PHASE 0 — Publish & feed-map (executable now)
Branch: work on `docs/auth-audit-normalization` (current) or a fresh `chore/publish-auth-audit`. The library packs happen in `scadaproj`; the feed-map edits happen in the three sibling repos (each on its own `feat/adopt-zb-auth` branch — created here, reused in Phase 1).
### Task 0.1: Add a push script for ZB.MOM.WW.Audit
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none (blocks 0.3)
**Files:**
- Create: `ZB.MOM.WW.Audit/build/push.sh`
**Step 1: Create the script** (mirror `ZB.MOM.WW.Auth/build/push.sh`)
```bash
#!/usr/bin/env bash
# push.sh — pack and push the ZB.MOM.WW.Audit NuGet package to the Gitea feed.
#
# Required environment variables:
# GITEA_NUGET_SOURCE — full URL of the Gitea NuGet feed
# GITEA_NUGET_KEY — Gitea access token with package:write permission
set -euo pipefail
: "${GITEA_NUGET_SOURCE:?set GITEA_NUGET_SOURCE to your Gitea NuGet feed URL}"
: "${GITEA_NUGET_KEY:?set GITEA_NUGET_KEY to your Gitea access token}"
dotnet pack -c Release -o ./artifacts
dotnet nuget push "./artifacts/*.nupkg" \
  --source "$GITEA_NUGET_SOURCE" \
  --api-key "$GITEA_NUGET_KEY" \
  --skip-duplicate
```
**Step 2:** `chmod +x ZB.MOM.WW.Audit/build/push.sh`
**Step 3: Commit**
```bash
git add ZB.MOM.WW.Audit/build/push.sh && git commit -m "build(audit): add Gitea push.sh"
```
### Task 0.2: Build + test both libraries green before publishing
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** 0.1
**Files:** none (verification only)
**Step 1:** `cd ZB.MOM.WW.Auth && dotnet test` — expect all 172 pass.
**Step 2:** `cd ZB.MOM.WW.Audit && dotnet test` — expect all 19 pass.
**Acceptance:** both suites green. If either fails, STOP — do not publish a red library.
### Task 0.3: Pack + push both libraries to the Gitea feed
**Classification:** standard
**Estimated implement time:** ~4 min (+ network)
**Parallelizable with:** none (blocked by 0.1, 0.2)
**Files:** none (publishes artifacts)
**Step 1: Export credentials** (user-supplied token)
```bash
export GITEA_NUGET_SOURCE="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json"
export GITEA_NUGET_KEY="<gitea token with package:write>"
```
**Step 2:** `cd ZB.MOM.WW.Auth && ./build/push.sh`
**Step 3:** `cd ZB.MOM.WW.Audit && ./build/push.sh`
**Step 4: Verify all 5 resolve (HTTP 200)**
```bash
for p in zb.mom.ww.auth.abstractions zb.mom.ww.auth.ldap zb.mom.ww.auth.apikeys \
         zb.mom.ww.auth.aspnetcore zb.mom.ww.audit; do
  printf '%s -> ' "$p"
  curl -s -o /dev/null -w "%{http_code}\n" \
    "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/$p/index.json"
done
```
**Acceptance:** all five print `200` (currently all `404`).
### Task 0.4: Feed-map + restore OtOpcUa
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** 0.5, 0.6 (different repos)
**Files:**
- Modify: `~/Desktop/OtOpcUa/NuGet.config` (add patterns under `dohertj2-gitea`)
- Modify: `~/Desktop/OtOpcUa/Directory.Packages.props` (add `PackageVersion` entries)
**Step 1:** create branch `feat/adopt-zb-auth` in OtOpcUa.
**Step 2:** under the `dohertj2-gitea` `packageSource`, add:
```xml
<package pattern="ZB.MOM.WW.Auth" />
<package pattern="ZB.MOM.WW.Auth.*" />
<package pattern="ZB.MOM.WW.Audit" />
```
**Step 3:** in `Directory.Packages.props` add (version 0.1.0): `ZB.MOM.WW.Auth.Abstractions`, `ZB.MOM.WW.Auth.Ldap`, `ZB.MOM.WW.Auth.AspNetCore`, `ZB.MOM.WW.Audit`. (No `ZB.MOM.WW.Auth.ApiKeys` — OtOpcUa uses OPC UA transport security.)
**Step 4:** `dotnet restore ZB.MOM.WW.OtOpcUa.slnx` — expect success, the new packages download from gitea.
**Step 5: Commit** `build: add ZB.MOM.WW.Auth/Audit feed mapping + version pins`.
**Acceptance:** restore succeeds; `obj/project.assets.json` lists the new packages from the gitea source.
### Task 0.5: Feed-map + restore MxAccessGateway
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** 0.4, 0.6
**Files:**
- Modify: `~/Desktop/MxAccessGateway/nuget.config`
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (inline `Version=` style — no CPM)
**Step 1:** branch `feat/adopt-zb-auth` in MxAccessGateway.
**Step 2:** add the same three `<package pattern>` lines under `dohertj2-gitea`.
**Step 3:** `dotnet restore src/MxGateway.sln` (PackageReferences added in Phase 1; this step only proves the feed resolves — optionally add a throwaway reference and remove, or defer restore-proof to Phase 1's first add).
**Step 4: Commit** `build: add ZB.MOM.WW.Auth/Audit feed mapping`.
**Acceptance:** `nuget.config` maps the new patterns; restore of an added Auth package succeeds.
### Task 0.6: Feed-map + restore ScadaBridge
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** 0.4, 0.5
**Files:**
- Modify: `~/Desktop/ScadaBridge/nuget.config`
- Modify: `~/Desktop/ScadaBridge/Directory.Packages.props`
**Step 1:** branch `feat/adopt-zb-auth` in ScadaBridge.
**Step 2:** add the three `<package pattern>` lines under `dohertj2-gitea`.
**Step 3:** add `PackageVersion` entries @ 0.1.0 for all 4 Auth packages + `ZB.MOM.WW.Audit`.
**Step 4:** `dotnet restore ZB.MOM.WW.ScadaBridge.slnx`.
**Step 5: Commit** `build: add ZB.MOM.WW.Auth/Audit feed mapping + version pins`.
**Acceptance:** restore succeeds.
> **Phase 0 exit gate:** all 5 packages HTTP 200; all 3 repos restore green with the new feed mappings. Only then start Phase 1.
---
## PHASE 1 — Auth adoption (auth GAPS #1–#8) *[HIGH-RISK PHASE]*
Order within the phase (per `components/auth/GAPS.md` sequencing): **#3 seam → #1 Ldap + #2 ApiKeys → #4 config + #5 claims/cookies → #6 base DN → #8 canonical roles.** Every cutover is gated by parity tests before merge.
### Task 1.0: Explore auth source + elaborate Phase 1 steps *(GATE — do first)*
**Classification:** standard
**Estimated implement time:** ~5 min (read-only)
**Parallelizable with:** none (blocks all 1.x)
**Files (read-only):**
- `components/auth/current-state/{otopcua,mxaccessgw,scadabridge}/CURRENT-STATE.md`
- `components/auth/spec/SPEC.md`, `components/auth/spec/CANONICAL-ROLES.md`, `components/auth/shared-contract/ZB.MOM.WW.Auth.md`
- `ZB.MOM.WW.Auth/src/**` (the public surface being adopted)
- Each repo's LDAP auth service, API-key pipeline, role mapper, and auth DI wiring (paths surfaced by the current-state docs).
**Action:** read the above; for each task below fill in the concrete diff, exact file paths, and the parity-test assertions. Append the elaborated steps to this plan section (or a `…-phase1.md` companion). **No code changes in this task.** This gate exists because the per-repo auth source was not opened during planning.
### Task 1.1: `IGroupRoleMapper<TRole>` seam — config + DB mappers (GAPS #3, all 3 repos)
**Classification:** standard
**Estimated implement time:** ~5 min/repo (split per repo if needed)
**Parallelizable with:** 1.2 within a repo only after the seam type is referenced
**Files:** per-repo role-mapping call sites (config-backed for OtOpcUa + MxGateway; DB-backed `LdapGroupMapping` for ScadaBridge) — exact paths from Task 1.0.
**Steps:** TDD — write a mapper test asserting current group→role outputs are preserved → wire the app to the library's `IGroupRoleMapper<TRole>` (config mapper for OtOpcUa/gw, DB/delegate mapper for SB) → green → commit. **Acceptance:** existing role-resolution behaviour byte-identical; #3 done (cheap, unblocks the rest).
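The config-backed side of this seam amounts to a lookup from LDAP group to role, and the parity test pins its outputs. The sketch below is a Python illustration of that idea only; it is not the library's `IGroupRoleMapper<TRole>` implementation. The function name, the group and role names, and the case-insensitive trimmed matching are all assumptions to be checked against each repo's current behaviour in Task 1.0.

```python
from typing import Iterable, Mapping, Set


def map_groups_to_roles(groups: Iterable[str], mapping: Mapping[str, str]) -> Set[str]:
    """Return the roles granted by a user's LDAP groups.

    Matching is case-insensitive on the trimmed group name (an assumption);
    groups with no mapping entry grant nothing.
    """
    normalized = {key.strip().lower(): role for key, role in mapping.items()}
    roles = set()
    for group in groups:
        role = normalized.get(group.strip().lower())
        if role is not None:
            roles.add(role)
    return roles


# Hypothetical mapping, standing in for a repo's configured group table.
EXAMPLE_MAPPING = {"SCADA-Admins": "Administrator", "SCADA-Viewers": "Viewer"}
```

A parity test in this style would feed the same group lists to the old and new mappers and assert the two role sets are equal.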
### Task 1.2: Adopt `ZB.MOM.WW.Auth.Ldap` — cutover (GAPS #1, all 3 repos)
**Classification:** high-risk (security; LDAP)
**Estimated implement time:** split per repo (~5 min each)
**Parallelizable with:** 1.3 (different repos) — but within a repo, serial after 1.1
**Files:** each repo's LDAP authentication service + DI (ScadaBridge is the donor baseline; OtOpcUa/gw cut over to it). For OtOpcUa also fix the open `LdapAuthService` `Enabled`/double-singleton wiring (repo memory).
**Steps (per repo):** write parity tests reproducing current authn decisions (bind-then-search, fail-closed-on-group-lookup, RFC-4514 + filter escaping, username trim, service-account-bind distinction) → run red against the library path → replace bespoke LDAP with `Auth.Ldap` → green → commit. **Acceptance:** parity tests green; bespoke LDAP code removed/delegated; OtOpcUa singleton bug fixed.
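One of the parity behaviours listed above, filter escaping combined with username trimming, can be illustrated as follows. This is a Python sketch of RFC 4515 value escaping, not the `Auth.Ldap` code; the helper names and the default `uid` attribute are assumptions (the real attribute comes from the configured `UserNameAttribute`).

```python
# RFC 4515: these characters must be escaped inside a search-filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a value character by character so nothing is escaped twice."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_user_filter(username: str, user_attr: str = "uid") -> str:
    """Trim the username, escape it, and wrap it in an equality filter."""
    return f"({user_attr}={escape_filter_value(username.strip())})"
```

A parity test can then assert that an injection attempt such as `*)(uid=*` is rendered as inert escaped text by both the bespoke path and the library path.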
### Task 1.3: Adopt `ZB.MOM.WW.Auth.ApiKeys` — cutover (GAPS #2; MxGateway then ScadaBridge)
**Classification:** high-risk (security; API keys)
**Estimated implement time:** ~5 min/repo
**Parallelizable with:** 1.2 (different files) — MxGateway first (source), then ScadaBridge
**Files:** MxGateway `Security/Authentication/` API-key verifier/store DI; ScadaBridge Inbound API `X-API-Key` path.
**Steps:** parity tests (peppered HMAC-SHA256, constant-time compare, scope/constraint enforcement) → cutover to `Auth.ApiKeys` → green → commit. **ScadaBridge behaviour change (accepted):** raw `X-API-Key` → structured `<prefix>_<id>_<secret>`; add an **interop check** that an inbound client using the new token format authenticates and the old format is rejected. **Acceptance:** parity + interop green; gateway is the proven source before SB cuts over.
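The interop check turns on two properties: a structured `<prefix>_<id>_<secret>` token verifies, and a raw legacy key is rejected. The Python sketch below illustrates both, under stated assumptions: the pepper is used as the HMAC key, the prefix and id contain no underscores, and stored hashes are hex strings keyed by id. None of this is confirmed as the `Auth.ApiKeys` implementation, and every function name is hypothetical.

```python
import hashlib
import hmac
from typing import Mapping, Optional, Tuple


def parse_token(token: str) -> Optional[Tuple[str, str, str]]:
    """Split '<prefix>_<id>_<secret>'; return None for any other shape."""
    parts = token.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return parts[0], parts[1], parts[2]


def hash_secret(secret: str, pepper: bytes) -> str:
    """Peppered HMAC-SHA256 of the secret (pepper as HMAC key: an assumption)."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(token: str, pepper: bytes, stored_hashes: Mapping[str, str]) -> bool:
    """Return True only for a well-formed token whose secret hash matches."""
    parsed = parse_token(token)
    if parsed is None:
        return False  # the old raw-key format lands here and is rejected
    _prefix, key_id, secret = parsed
    expected = stored_hashes.get(key_id)
    if expected is None:
        return False
    # compare_digest keeps the comparison time independent of where the
    # two strings first differ.
    return hmac.compare_digest(hash_secret(secret, pepper), expected)


PEPPER = b"example-pepper"
STORE = {"k1": hash_secret("s3cret", PEPPER)}
```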
### Task 1.4: Config schema migration (GAPS #4 / A1–A2, all 3 repos)
**Classification:** standard
**Estimated implement time:** ~4 min/repo
**Parallelizable with:** bundled with 1.2 per the GAPS note ("mechanical; do with #1")
**Files:** OtOpcUa + MxGateway: `UseTls`→`Transport` enum binding + appsettings. ScadaBridge: flat `Security:Ldap*`→nested section; rename `LdapUserIdAttribute`→`UserNameAttribute`, `LdapGroupAttribute`→`GroupAttribute` (+ appsettings + any validators).
**Steps:** update options class + binding + appsettings + (ScadaBridge) `ConfigPreflight`/validator messages → run config-validation tests → commit. **Acceptance:** apps bind the new schema; no behaviour change beyond key names/enum.
### Task 1.5: `ZB.MOM.WW.Auth.AspNetCore` claims/cookie conventions (GAPS #5, all 3 UIs)
**Classification:** standard
**Estimated implement time:** ~4 min/repo
**Parallelizable with:** 1.4
**Files:** each UI's cookie/claims wiring (OtOpcUa Blazor Admin control-plane; MxGateway `MxGatewayDashboard`; ScadaBridge `ZB.MOM.WW.ScadaBridge.Auth`). Keep each cookie **name**; share canonical claim types + attributes.
**Steps:** adopt the shared claim-type constants + cookie attribute defaults → auth-flow test (login sets canonical claims) → commit. **Acceptance:** each app keeps its cookie name but emits canonical claims.
### Task 1.6: Unify dev GLAuth base DN (GAPS #6, all 3 + fixtures)
**Classification:** small (dev-only)
**Estimated implement time:** ~3 min
**Parallelizable with:** 1.5
**Files:** dev appsettings + LDAP/GLAuth fixtures/infra in each repo. Pick one shared base DN (open decision A3 — resolve in Task 1.0).
**Acceptance:** dev fixtures + all 3 apps share one base DN; dev login still works.
### Task 1.7: Canonical roles — `canonical → native` expansion (GAPS #8, all 3 repos)
**Classification:** high-risk (security policy)
**Estimated implement time:** ~5 min/repo
**Parallelizable with:** none (after 1.1)
**Files:** each repo's role-enforcement mapping. **ScadaBridge accepted collapse:** `AuditReadOnly`→Viewer, `Audit`→Administrator (auditor/admin SoD removed). OtOpcUa: publish ⊂ `FleetAdmin` (no first-class `Deployer`). MxGateway: assign applicable subset (no `Designer`/`Deployer`).
**Steps:** map each canonical role to native enforcement; test that each LDAP group still authorizes its expected actions; document the SoD change → commit. **Acceptance:** canonical six standardized org-wide; per-project native enforcement unchanged except the documented ScadaBridge collapse.
> **Phase 1 exit gate:** all 3 repos consume `ZB.MOM.WW.Auth.*` from the feed; bespoke LDAP/ApiKey/role code removed or delegated; existing auth tests + new parity tests green per repo; SB token-format interop check green. Merge each `feat/adopt-zb-auth` to the repo's local default branch (no push).
---
## PHASE 2 — Audit adoption (audit GAPS #1–#3, #5, #6)
> ⚠️ **RE-SCOPED 2026-06-02 — the task specs below are SUPERSEDED.** The Task 2.0 gate (verified against
> live code) found these specs materially wrong: MxGateway's audit files moved into the shared library
> (Phase 1), OtOpcUa's structured audit path is dormant (zero emit sites), and the ScadaBridge
> "outright rename" is structurally impossible (its filter is typed to its own 24-field record, not the
> library's 9-field one). The user chose **DEEP adopt (canonical record)** + **pause for review**. The
> corrected, gate-grounded deep design is in
> [`2026-06-02-auth-audit-normalization-phase2-deep.md`](2026-06-02-auth-audit-normalization-phase2-deep.md)
> — **implementation is PAUSED pending user review of that doc (esp. the ScadaBridge audit-subsystem
> re-architecture cost).** The original specs below are kept for historical context only.
Branch `feat/adopt-zb-audit` per repo. Behaviour-preserving except the OtOpcUa `Outcome` column + `ClusterId` visibility fix. Concrete paths below come from `components/audit/current-state/*`.
### Task 2.0: Explore audit source + confirm elaboration *(GATE — light, paths already known)*
**Classification:** trivial
**Estimated implement time:** ~3 min (read-only)
**Parallelizable with:** none (blocks 2.x)
**Files (read-only):** the exact files cited in the tasks below (OtOpcUa `AuditWriterActor.cs`, `Commons/Messages/Audit/AuditEvent.cs`, `ConfigAuditLog.cs`, `OtOpcUaConfigDbContext.cs`, `ClusterAudit.razor`; MxGateway `IApiKeyAuditStore.cs`, `SqliteApiKeyAuditStore.cs`, `ApiKeyAuditEntry.cs`, `ConstraintEnforcer.cs`, the 3 producers; ScadaBridge `IAuditPayloadFilter.cs`, `IAuditWriter.cs`, `AuditEvent.cs`, the 4 enums). Confirm line refs still hold; adjust if drifted.
### Task 2.1: OtOpcUa — canonical record + `AuditWriterActor : IAuditWriter` + `Outcome` (GAPS #1)
**Classification:** high-risk (actor model + data contract)
**Estimated implement time:** split (record swap ~5 min; actor seam ~5 min; Outcome derivation ~5 min)
**Parallelizable with:** 2.3, 2.5 (different repos)
**Files:**
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs` (replace with canonical record usage; bridge `NodeId`/`CorrelationId` value-types at construction)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs` (implement `IAuditWriter`; map at `:75-84`)
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AuditWriterActorTests.cs`
**Steps:** TDD — extend actor tests to assert `Outcome` derivation (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt`→Denied, config verbs→Success) and the canonical record mapping → red → swap record + implement seam + derive `Outcome` at emit sites → keep 500/5s batching + two-layer dedup → green → commit. **Acceptance:** existing tests + new `Outcome` tests green; transport/dedup unchanged.
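The `Outcome` derivation the new actor tests assert can be pictured as a small pure function. This Python sketch mirrors only the rule stated above (the two denial actions map to Denied, everything else to Success); the enum and function names are illustrative and are not the OtOpcUa mapper.

```python
from enum import Enum


class AuditOutcome(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    DENIED = "Denied"


# The two action names the task calls out as denials.
DENIED_ACTIONS = frozenset({"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"})


def outcome_from_action(action: str) -> AuditOutcome:
    """Denial actions map to DENIED; config verbs fall through to SUCCESS."""
    return AuditOutcome.DENIED if action in DENIED_ACTIONS else AuditOutcome.SUCCESS
```

Keeping the derivation free of actor state lets the tests cover it directly, without spinning up the batching or dedup path.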
### Task 2.2: OtOpcUa — `Outcome` column migration + `ClusterId` visibility fix (GAPS #1 storage, #5)
**Classification:** high-risk (EF migration + UI query)
**Estimated implement time:** ~5 min
**Parallelizable with:** none (after 2.1)
**Files:**
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs` (add nullable `Outcome`)
- Modify: `.../OtOpcUaConfigDbContext.cs` (mapping ~`:429-463`)
- Create: `Migrations/<ts>_AddConfigAuditLogOutcome.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/ClusterAudit.razor:78` (so structured actor rows — which set `NodeId` not `ClusterId` — are discoverable)
**Steps:** add column + migration → `dotnet ef migrations add` + apply on a test DB → adjust the query so structured rows appear under a cluster → commit. Leave the SP path bespoke (documented). **Acceptance:** migration applies forward; structured `AuditEvent` rows now visible in `ClusterAudit.razor`.
### Task 2.3: MxGateway — `IApiKeyAuditStore`→`IAuditWriter` adapter (GAPS #2, #6)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 2.1, 2.5
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/`: `IApiKeyAuditStore.cs`, `SqliteApiKeyAuditStore.cs`, `ApiKeyAuditEntry.cs`, `AuthStoreServiceCollectionExtensions.cs:23`, and the 3 producers (`ApiKeyAdminCliRunner`, `DashboardApiKeyManagementService`, `ConstraintEnforcer.cs:117`)
- Test: gateway audit tests (`SqliteAuthStoreTests`, `ApiKeyAdminCliRunnerTests`)
**Steps:** map to canonical `AuditEvent` — generate `EventId`; `KeyId→Actor` with `"system"`/`"cli"` fallback; `EventType→Action`; `CreatedUtc→OccurredAtUtc`; `RemoteAddress→SourceNode`; `constraint-denied→Outcome.Denied` else `Success`; `Category="ApiKey"`; `Details→DetailsJson` **wrapped as a JSON object**; add `CorrelationId` capture + structured `Target` (#6). **Wrap `AppendAsync` so it never throws** (best-effort contract). Producers keep call sites; only the injected type changes. → tests green → commit. **Acceptance:** writes produce canonical events; writer never propagates; tests green.
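The field mapping and the never-throws contract described in these steps can be sketched in a few lines. This is a Python illustration using plain dicts, not the gateway adapter: the `detail` wrapper key, the function names, and the callable sink are assumptions, and `CorrelationId`/`Target` capture is left out.

```python
import json
import uuid
from typing import Callable


def to_canonical(entry: dict, fallback_actor: str = "system") -> dict:
    """Map an ApiKeyAuditEntry-shaped dict onto canonical field names."""
    denied = entry.get("EventType") == "constraint-denied"
    return {
        "EventId": str(uuid.uuid4()),                    # generated per event
        "OccurredAtUtc": entry.get("CreatedUtc"),
        "Actor": entry.get("KeyId") or fallback_actor,   # "system" or "cli"
        "Action": entry.get("EventType"),
        "Outcome": "Denied" if denied else "Success",
        "Category": "ApiKey",
        "SourceNode": entry.get("RemoteAddress"),
        # Wrap free-text details in a JSON object so consumers can parse it.
        "DetailsJson": json.dumps({"detail": entry.get("Details")}),
    }


def write_best_effort(sink: Callable[[dict], None], event: dict) -> bool:
    """Call the sink and swallow any failure; audit must not break the caller."""
    try:
        sink(event)
        return True
    except Exception:
        return False
```

Returning a boolean instead of raising keeps the producer call sites unchanged while still giving tests something to assert on.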
### Task 2.5: ScadaBridge — rename `IAuditPayloadFilter`→`IAuditRedactor` + adopt `AuditOutcome` (GAPS #3)
**Classification:** high-risk (HIGH blast radius rename across site/central/wiring)
**Estimated implement time:** ~5 min (compiler-driven)
**Parallelizable with:** 2.1, 2.3
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs` → adopt `ZB.MOM.WW.Audit.IAuditRedactor` (outright rename; `DefaultAuditPayloadFilter`/`SafeDefaultAuditPayloadFilter` implement it unchanged)
- Modify: all references across `AuditLog/Site`, `AuditLog/Central`, wiring, `Commons`
- Adopt canonical `AuditOutcome` enum; confirm `IAuditWriter` signature is byte-identical (keep the bespoke ~25-field record as storage shape — option (a))
**Steps:** outright rename (let the compiler enumerate sites) → adopt `AuditOutcome` and the `Status→Outcome` projection (`Delivered`→Success; `Failed`/`Parked`/`Discarded`→Failure; `InboundAuthFailure`→Denied) for cross-project reporting → build + full audit test suite green → commit. **Acceptance:** compiles clean; no transport/storage/CLI/UI behaviour change; enum + interface names canonical.
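The `Status→Outcome` projection named in these steps is a fixed table, which makes it easy to pin in a test. The Python sketch below encodes exactly the five mappings listed above; raising on any other status is a design choice of this illustration, since the plan does not say how unlisted statuses should project.

```python
# The five mappings stated in the task; nothing else is assumed to map.
STATUS_TO_OUTCOME = {
    "Delivered": "Success",
    "Failed": "Failure",
    "Parked": "Failure",
    "Discarded": "Failure",
    "InboundAuthFailure": "Denied",
}


def project_outcome(status: str) -> str:
    """Project a ScadaBridge status onto the canonical outcome name."""
    try:
        return STATUS_TO_OUTCOME[status]
    except KeyError:
        raise ValueError(f"no Outcome projection defined for status {status!r}") from None
```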
> **Phase 2 exit gate:** all 3 repos consume `ZB.MOM.WW.Audit`; seams/record/enum canonical; existing audit suites green; OtOpcUa `Outcome` migration applies; ScadaBridge rename clean. Merge each `feat/adopt-zb-audit` locally (no push).
---
## PHASE 3 — Wire `Actor` from the Auth principal (audit GAPS #4)
### Task 3.1: Introduce `IAuditActorAccessor` seam
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (blocks 3.2–3.4)
**Files:** a small accessor per app (HTTP impl reads `HttpContext.User`; non-HTTP returns a threaded/fallback principal). Exact location decided in Task 1.0/3.1 from the now-adopted `Auth.AspNetCore` principal plumbing.
**Steps:** define the interface + an HTTP-backed impl + a fallback impl → unit test both → commit. **Acceptance:** accessor returns the Auth principal on authenticated paths, a fallback otherwise.
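The accessor's contract (authenticated principal when present, explicit fallback otherwise) can be shown without any web framework. This Python sketch is illustrative only: the class name and the injected `get_user_name` callable stand in for whatever reads `HttpContext.User` in each app, and treating a blank name as unauthenticated is an assumption.

```python
from typing import Callable, Optional


class RequestActorAccessor:
    """Resolves the audit actor from the current request's authenticated user."""

    def __init__(self, get_user_name: Callable[[], Optional[str]], fallback: str = "system"):
        self._get_user_name = get_user_name
        self._fallback = fallback

    def current_actor(self) -> str:
        # A missing or blank name is treated as "not authenticated".
        name = self._get_user_name()
        if name is not None and name.strip():
            return name.strip()
        return self._fallback
```

Injecting the lookup as a callable is what lets the unit test cover both the authenticated path and the fallback path with no HTTP pipeline.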
### Task 3.2 / 3.3 / 3.4: Wire emit sites — OtOpcUa / MxGateway / ScadaBridge
**Classification:** standard (each)
**Estimated implement time:** ~4 min each
**Parallelizable with:** each other (different repos), after 3.1
**Files:** each repo's audit emit sites (OtOpcUa config-write/authz emitters; MxGateway 3 producers — keep `"system"`/`"cli"` for keyless CLI; ScadaBridge `ManagementActor`/inbound boundary).
**Steps:** inject `IAuditActorAccessor`; set `AuditEvent.Actor = accessor.CurrentPrincipal` at each emit site → test `Actor == authenticated principal` on authenticated paths, fallback retained otherwise → commit. **Acceptance:** every authenticated emit carries the real Auth principal; keyless/system paths retain explicit fallbacks.
> **Program exit gate:** `Audit.Actor == Auth principal` end-to-end across all 3 repos; all suites green; everything on local default branches (no push). Update `components/auth/GAPS.md` and `components/audit/GAPS.md` to mark the adopted items done, and refresh the relevant `CLAUDE.md` status rows.
---
## Risk gates (cross-cutting)
- **Never publish a red library** (Task 0.2 gates 0.3). If a parity gap forces a lib fix, bump `0.1.0`→`0.1.1` and re-publish; don't edit a published version.
- **Phase 1 parity tests** must be green before any auth cutover merges — this is the security gate.
- **A green build in one repo does not prove interop.** The ScadaBridge token-format change (Task 1.3) is the one cross-boundary contract change and needs the explicit interop check.
- **Waterfall enforced by deps:** Phase 1 fully lands before Phase 2; Phase 3 after both.
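The "bump, don't edit" rule in the first gate can be expressed as a tiny helper. This Python sketch assumes plain `MAJOR.MINOR.PATCH` versions with no prerelease suffix; the function names and the idea of passing in the set of already-published versions are illustrative.

```python
from typing import Set


def next_patch(version: str) -> str:
    """Return the version with its patch component incremented by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def next_unpublished(version: str, published: Set[str]) -> str:
    """Advance the patch number until the version is not already on the feed."""
    candidate = version
    while candidate in published:
        candidate = next_patch(candidate)
    return candidate
```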
@@ -0,0 +1,43 @@
{
"planPath": "docs/plans/2026-06-02-auth-audit-normalization.md",
"designPath": "docs/plans/2026-06-02-auth-audit-normalization-design.md",
"tasks": [
{"id": 7, "subject": "Phase 0 umbrella — publish + feed-map", "status": "completed", "blockedBy": [11, 12, 13, 14, 15, 16]},
{"id": 8, "subject": "Phase 1 umbrella — adopt ZB.MOM.WW.Auth — COMPLETE (all of 1.0-1.7 across 3 repos, reviewed, local-only)", "status": "completed", "blockedBy": [7, 17, 18, 19, 20, 21, 22, 23, 24]},
{"id": 9, "subject": "Phase 2 umbrella — adopt ZB.MOM.WW.Audit — COMPLETE (OtOpcUa 2.1/2.2, MxGateway 2.3, ScadaBridge 2.5 full re-arch C1-C7; all reviewed, local-only)", "status": "completed", "blockedBy": [7, 8, 25, 26, 27, 28, 29]},
{"id": 10, "subject": "Phase 3 umbrella — wire Actor from Auth principal — COMPLETE (IAuditActorAccessor per app + emit-site wiring; all reviewed, local-only)", "status": "completed", "blockedBy": [8, 9, 30, 31]},
{"id": 11, "subject": "Task 0.1: Add push.sh for ZB.MOM.WW.Audit", "status": "completed", "blockedBy": []},
{"id": 12, "subject": "Task 0.2: Build+test both libs green", "status": "completed", "blockedBy": []},
{"id": 13, "subject": "Task 0.3: Pack+push both libs; verify HTTP 200", "status": "completed", "blockedBy": [11, 12]},
{"id": 14, "subject": "Task 0.4: Feed-map + restore OtOpcUa", "status": "completed", "blockedBy": [13]},
{"id": 15, "subject": "Task 0.5: Feed-map MxAccessGateway", "status": "completed", "blockedBy": [13]},
{"id": 16, "subject": "Task 0.6: Feed-map + restore ScadaBridge", "status": "completed", "blockedBy": [13]},
{"id": 17, "subject": "Task 1.0: GATE explore auth source + elaborate", "status": "completed", "blockedBy": [14, 15, 16]},
{"id": 18, "subject": "Task 1.1: IGroupRoleMapper seam (#3)", "status": "completed", "blockedBy": [17]},
{"id": 19, "subject": "Task 1.2: Adopt Auth.Ldap cutover (#1) [high-risk]", "status": "completed", "blockedBy": [18]},
{"id": 20, "subject": "Task 1.3: Adopt Auth.ApiKeys (#2) [high-risk] — COMPLETE (MxGw donor + ScadaBridge re-arch C1-C5)", "status": "completed", "blockedBy": [18]},
{"id": 21, "subject": "Task 1.4: Config schema migration A1/A2 (#4)", "status": "completed", "blockedBy": [17]},
{"id": 22, "subject": "Task 1.5: AspNetCore claims/cookies (#5) — DONE all 3 (OtOpcUa 83856b7+d0777ee, MxGw 7e1af37, SB full-canonical a0938f7+c185a56)", "status": "completed", "blockedBy": [17]},
{"id": 23, "subject": "Task 1.6: Unify dev base DN (#6) — DONE all 3 to dc=zb,dc=local (OtOpcUa 8ba289f, MxGw 9572045, SB 6ae6051)", "status": "completed", "blockedBy": [17]},
{"id": 24, "subject": "Task 1.7: Canonical roles native expansion (#8) [high-risk] — DONE all 3, full-value canonical (MxGw 04bce3ff, OtOpcUa c1619d9 +DB-mig, SB b104760+4118452 +DB-mig +SoD collapse)", "status": "completed", "blockedBy": [18]},
{"id": 25, "subject": "Task 2.0: GATE confirm audit source refs — DONE; found plan specs materially off → DEEP re-scope in -phase2-deep.md; PAUSED for user review before 2.1/2.2/2.3/2.5", "status": "completed", "blockedBy": [8]},
{"id": 26, "subject": "Task 2.1: OtOpcUa canonical record + IAuditWriter + Outcome (#1) [high-risk] — DONE 933dd1a (spec+code reviewed)", "status": "completed", "blockedBy": [25]},
{"id": 27, "subject": "Task 2.2: OtOpcUa Outcome migration + ClusterId fix (#1,#5) [high-risk] — DONE b7f5e88 (spec+code reviewed)", "status": "completed", "blockedBy": [26]},
{"id": 28, "subject": "Task 2.3: MxGateway store→IAuditWriter adapter (#2,#6) [re-scoped high-risk] — DONE a5944bb+7ea8358 (canonical SQLite store+adapter; spec+code reviewed)", "status": "completed", "blockedBy": [25]},
{"id": 29, "subject": "Task 2.5: ScadaBridge audit DEEP full-rearch to 9-col canonical (#3) [high-risk] — DONE C1-C7 (3d77dc0,adfb4d3/5aaf9e2,db707bb/c27b2c3,946d3e2/1737d15,68a6bd1,C6-subsumed,635461c/bc0e5bf); all spec+code reviewed, MSSQL-verified", "status": "completed", "blockedBy": [25]},
{"id": 30, "subject": "Task 3.1: IAuditActorAccessor seam (per-app HTTP accessor) — DONE (OtOpcUa 075c0e6, MxGw 0859d47, SB b3de840)", "status": "completed", "blockedBy": [9]},
{"id": 31, "subject": "Task 3.2-3.4: Wire emit sites to Auth principal (#4) — DONE (MxGw dashboard Actor=operator/Target=keyId; SB inbound Actor from principal w/ auth-fail-null; OtOpcUa seam forward-looking) — reviewed", "status": "completed", "blockedBy": [30]},
{"id": 32, "subject": "Task 1.3-L: Extend Auth.ApiKeys admin store (SetScopes/SetEnabled) -> lib 0.1.3 (PUBLISHED)", "status": "completed", "blockedBy": []},
{"id": 33, "subject": "Task 1.3-C1: ScadaBridge re-pin 0.1.3 + IInboundApiKeyAdmin seam (additive) + baseline reds fixed", "status": "completed", "blockedBy": [32]},
{"id": 34, "subject": "Task 1.3-C2: ManagementActor + CLI + Commons messages onto seam", "status": "completed", "blockedBy": [33]},
{"id": 35, "subject": "Task 1.3-C3: CentralUI pages onto seam (string keyId + scopes)", "status": "completed", "blockedBy": [33]},
{"id": 36, "subject": "Task 1.3-C4: TransportExport exclude API keys (methods-only)", "status": "completed", "blockedBy": [33, 35]},
{"id": 37, "subject": "Task 1.3-C5 (=E): retire SQL Server ApiKey entity + EF migration + runbook", "status": "completed", "blockedBy": [34, 35, 36]}
],
"lastUpdated": "2026-06-02 — PROGRAM COMPLETE (Phases 0-3 done across 3 repos: Auth + Audit normalized, Actor wired from principal). All local-only on feat/adopt-zb-auth + feat/adopt-zb-audit; NOTHING pushed/merged. Remaining = exit-gate doc updates + user merge/push decision."
}
@@ -0,0 +1,78 @@
# ScadaBridge audit re-architecture (Task 2.5, DEEP full 9-col) — decomposition
Companion to `2026-06-02-auth-audit-normalization-phase2-deep.md`. User chose **Full re-arch (pure 9-col storage)**
for ScadaBridge audit. Architect design pass (read-only, verified on `feat/adopt-zb-audit`) produced this. The full
audit record becomes the library 9-field `ZB.MOM.WW.Audit.AuditEvent`; ~15 domain fields relocate into `DetailsJson`;
ScadaBridge consumes the library `IAuditWriter`/`IAuditRedactor`/`AuditOutcome`. This is the program's largest task.
## Key resolutions (from the design)
- **Forwarding state machine (the crux) → resolved cleanly.** It lives **only in site SQLite**; the central MS SQL
`AuditLog` table is **append-only** (DENY UPDATE/DELETE; central rows leave `ForwardState` null; reconciliation is
pure idempotent-insert with in-memory cursors), and the gRPC `AuditEventDtoMapper` **already** drops
`ForwardState`/`IngestedAtUtc` on the wire. So **central needs NO forwarding columns** (pure 9-col). On the **site**,
add a **sidecar `audit_forward_state` table** keyed by `EventId` (`ForwardState`, `OccurredAtUtc`, precomputed
`IsCachedKind`, optional `AttemptCount`/`LastAttemptUtc`) — `MarkForwarded`/`MarkReconciled` UPDATE the sidecar;
`ReadPending*` JOIN it; the canonical `audit_event` table is write-once. Precomputing `IsCachedKind` keeps the drain
hot path off JSON parsing (strictly faster than today's `Kind NOT IN(...)`).
- **Central storage migration → new table + copy** (in-place collapse infeasible: partition-aligned indexes +
`SwitchOutPartitionAsync` hard-codes a byte-identical staging column list). New 10-col table on the SAME
`ps_AuditLog_Month(OccurredAtUtc)` scheme; per-partition data copy projecting old typed columns into `DetailsJson`
(`FOR JSON PATH`); rename + role re-grant (append-only preserved). Partitioning preserved (`OccurredAtUtc` stays).
- **Reporting queryability → persisted computed columns for hot filters.** `Category`(=Channel) + canonical
`Outcome`/`Target`/`Actor`/`SourceNode`/`CorrelationId` cover most filters directly. Add **PERSISTED computed columns**
`Kind`/`Status`/`SourceSiteId`/`ExecutionId`/`ParentExecutionId` (`JSON_VALUE(DetailsJson,'$.x')`) + partition-aligned
indexes so the existing index semantics + the `GetExecutionTreeAsync` recursive CTE survive without a JSON perf cliff.
- **Redactor → `ScadaBridgeAuditRedactor : IAuditRedactor`** on the canonical record: parse `DetailsJson` once, redact +
byte-safe-truncate `requestSummary`/`responseSummary`/`errorDetail`/`extra` in the JSON tree, cap on canonical
`Category`/`Outcome` (replacing the typed `Channel`/`Status` reads), set `payloadTruncated`, re-serialize. Add a
fast-path that skips JSON parse when nothing to redact. `SafeDefault`→`SafeDefaultAuditRedactor`. Re-baseline the
perf hot-path budgets (JSON parse/rewrite is ~24× the typed-field path).
- **Canonical field mapping:** `Action = "{Channel}.{Kind}"`; `Category = Channel`; `Target/SourceNode/CorrelationId/
Actor/OccurredAtUtc` direct (DateTime→DateTimeOffset UTC). **`Outcome`:** `Kind==InboundAuthFailure`→`Denied` (checked
first); `Status==Delivered`→`Success`; `Status∈{Failed,Parked,Discarded}`→`Failure`; in-flight/`Skipped`→`Success`.
- **`DetailsJson` schema (camelCase, stable):** channel, kind, status, executionId, parentExecutionId, sourceSiteId,
sourceInstanceId, sourceScript, httpStatus, durationMs, errorMessage, errorDetail, requestSummary, responseSummary,
payloadTruncated, extra, ingestedAtUtc. **One shared `AuditDetailsCodec` (Commons) with deterministic options is
MANDATORY** — the canonical record uses value-equality + consumers dedup on it, so key-order/whitespace drift would
break dedup. (`forwardState` is NOT in DetailsJson — it's site-sidecar only.)
- **Commons takes the `ZB.MOM.WW.Audit` package ref** (the record lives in Commons; the package is a leaf canonical-types
pkg, only dep `Microsoft.Extensions.DependencyInjection.Abstractions`). Acceptable.
- **gRPC proto kept UNCHANGED** — the wire `AuditEventDto` stays 24-field internally; `AuditEventDtoMapper` projects
to/from `DetailsJson`. Avoids a proto/codegen rev + a site/central version-skew handshake. (A proto collapse is a
separate later task.)
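The canonical field mapping above can be sketched roughly as follows. This is a minimal illustration, not the shipped `AuditOutcomeProjector`/`AuditFieldBuilders`: the enum member lists (including `Pending` as an in-flight status) and the class name `CanonicalMappingSketch` are assumptions made for the example, and the channel is modeled as a plain string.

```csharp
using System;

// Hypothetical stand-ins. The real enums live in ScadaBridge Commons and the
// real AuditOutcome in ZB.MOM.WW.Audit; these member lists are illustrative.
public enum Kind { ApiCall, InboundAuthFailure }
public enum Status { Pending, Delivered, Failed, Parked, Discarded, Skipped }
public enum AuditOutcome { Success, Failure, Denied }

public static class CanonicalMappingSketch
{
    // Action = "{Channel}.{Kind}"; Category is the channel on its own.
    public static string BuildAction(string channel, Kind kind) => $"{channel}.{kind}";

    public static AuditOutcome ProjectOutcome(Kind kind, Status status)
    {
        // The auth-failure check runs first so it wins over any Status value.
        if (kind == Kind.InboundAuthFailure) return AuditOutcome.Denied;

        return status switch
        {
            Status.Failed or Status.Parked or Status.Discarded => AuditOutcome.Failure,
            // Delivered, in-flight and Skipped all project to Success.
            _ => AuditOutcome.Success,
        };
    }

    // DateTime -> DateTimeOffset, assuming the stored value is already UTC.
    public static DateTimeOffset ToUtcOffset(DateTime occurredAtUtc) =>
        new DateTimeOffset(DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc));
}
```

Checking `Kind` before `Status` matters because an auth failure would otherwise be classified by whatever delivery status it happens to carry.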
## Staged decomposition (C1–C7)
| Stage | Scope | Green? | Class | Risk |
|---|---|---|---|---|
| **C1** | Commons: add `ZB.MOM.WW.Audit` ref; new pure types `AuditDetails` record + `AuditDetailsCodec` (deterministic) + `Status/Kind→AuditOutcome` projection + `Action`/`Category` builders. No existing type changes. | yes | small | trivial |
| **C2** | `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` (canonical record, parse/rewrite DetailsJson, fast-path) — additive, old `IAuditPayloadFilter` still wired; unit-tested in isolation. | yes | standard | low |
| **C3** | **ATOMIC CUT — swap the record everywhere.** `Commons.Entities.Audit.AuditEvent`→`ZB.MOM.WW.Audit.AuditEvent` across ~40 src files + tests: emitters build canonical (domain→DetailsJson via codec); seams (`IAuditWriter`/`ICentralAuditWriter`/`ISiteAuditQueue`/`IAuditLogRepository`/`AuditLogQueryFilter`) re-type; `AuditEventDtoMapper` DTO↔canonical (proto unchanged); switch redactor wiring `IAuditPayloadFilter`→`IAuditRedactor`. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C4** | Site SQLite two-table forwarding: `SqliteAuditWriter`→`audit_event` + `audit_forward_state`; retarget `MarkForwarded/MarkReconciled/ReadPending*/GetBacklogStats/MapRow` to JOIN+sidecar; precompute `IsCachedKind`. Telemetry/Reconciliation actors unchanged (seam stable). Site SQLite is ephemeral (7-day) → in-place schema reset, no data migration. | yes | high-risk | HIGH |
| **C5** | **ATOMIC CUT — central migration.** EF `CollapseAuditLogToCanonical`: new 10-col table on the partition scheme + per-partition data copy (old cols→DetailsJson) + persisted computed cols/indexes + rename + role re-grant; update `AuditLogRepository.InsertIfNotExistsAsync` + `SwitchOutPartitionAsync` staging list; regen ModelSnapshot. Maintenance-window; verify row-count + JSON spot-check. | **boundaries only** | **high-risk** | **HIGHEST** |
| **C6** | Reporting/UI/export retarget: `QueryAsync`/`GetKpiSnapshotAsync`/`GetExecutionTreeAsync` predicates→canonical/computed cols; `AuditLogExportService`+`AuditEndpoints` CSV + CentralUI Audit components + CLI parse `DetailsJson` for display. | yes | standard | med |
| **C7** | Tests + perf re-baseline + cleanup: rewrite `PayloadFilterContractTests`/redaction/`HotPathLatencyTests` to canonical+JSON + new budget; delete dead `Commons.Entities.Audit.AuditEvent`, 4 audit enums (or relocate behind codec), `IAuditPayloadFilter`/`Default`/`SafeDefault`, obsolete `AddColumnIfMissing`. | yes | standard | low |
**Atomic cuts:** only C3 (shared record type changes for all callers at once) and C5's data-copy half cannot stay green continuously. All other stages are green at completion.
## Top risks (carry into execution)
1. **C5 partition + `SwitchOutPartitionAsync` + persisted computed columns** — staging table must carry identical computed defs for SWITCH; add a SWITCH round-trip integration test before C5 ships. **Documented fallback:** if too brittle, keep `Kind`/`Status` as 2 real non-canonical columns on the central table (pragmatic, not pure-9-col) — decide at C5 implementation if blocked.
2. **DetailsJson determinism** — single `AuditDetailsCodec` (C1) is load-bearing for value-equality/dedup, not cosmetic.
3. **Redactor perf** — budgets move; add the no-op fast-path + empirically re-baseline in C7.
4. **gRPC** — keep the proto unchanged (mapper-internal projection); do NOT couple a wire change to this storage cut.
5. **`Action=Channel.Kind`** lossiness — mitigated by `Category`(=channel) + persisted computed `Kind`; ScadaBridge-internal filtering uses those, not `Action` parsing.
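To make risk 2 concrete, here is a minimal sketch of what "deterministic" could mean for the codec, assuming `System.Text.Json`. The record `AuditDetailsSketch` carries only a hypothetical subset of the `DetailsJson` fields, and the choice to omit nulls is an assumption for the example; the real `AuditDetailsCodec` may pin different options.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical subset of the DetailsJson fields; the real AuditDetails has more.
// Explicit JsonPropertyOrder pins key order instead of relying on reflection order.
public sealed record AuditDetailsSketch(
    [property: JsonPropertyOrder(0)] string Channel,
    [property: JsonPropertyOrder(1)] string Kind,
    [property: JsonPropertyOrder(2)] string Status,
    [property: JsonPropertyOrder(3)] int? HttpStatus);

public static class AuditDetailsCodecSketch
{
    // One shared, immutable options instance: camelCase keys, no whitespace,
    // nulls omitted (an assumption here). Every caller goes through this codec,
    // so equal records always encode to byte-identical strings.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    };

    public static string Encode(AuditDetailsSketch details) =>
        JsonSerializer.Serialize(details, Options);

    public static AuditDetailsSketch? Decode(string json) =>
        JsonSerializer.Deserialize<AuditDetailsSketch>(json, Options);
}
```

Because the canonical record compares `DetailsJson` as a string, two emitters that serialize with different options would produce records that look different to dedup even when every field matches.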
Delivery: `feat/adopt-zb-audit` (stacked on auth), local-only. Each stage = one implementer + classification review chain; full ScadaBridge suite at C3/C4/C5/C7.
## Stage status (live)
- **✅ C1 DONE** `3d77dc0` (code ✅) — `AuditDetails` + deterministic `AuditDetailsCodec` (pinned byte-exact) + `AuditOutcomeProjector` + `AuditFieldBuilders` + Commons→`ZB.MOM.WW.Audit` ref; 56 tests.
- **✅ C2 DONE** `adfb4d3` + fix `5aaf9e2` (spec ✅, code ✅ after fix) — `ScadaBridgeAuditRedactor`/`SafeDefaultAuditRedactor : IAuditRedactor` on the canonical record; redaction primitives extracted into shared `AuditRedactionPrimitives`/`AuditRegexCache` (old filter delegates, behaviour-preserved); cap-selection reads `d.Status` (faithful to legacy `IsErrorStatus`); fast-path + never-throws; review-fix hardened `OverRedact` to scrub ALL free-text fields + marker alignment + outer-catch never-leak test. 61 redaction + 44 payload + 88 commons-audit green.
- **✅ C3 DONE** `db707bb` + fix `c27b2c3` (spec ✅, code ✅; independently re-verified build 0/0 + AuditLog 241/Communication 201). Atomic record swap across all seams/emitters/gRPC DTO/redactor-wiring (127 files); `ScadaBridgeAuditEventFactory` single emit point; `AuditRowProjection` Decompose/Recompose transitional 24-col shim (lossless round-trip verified); proto unchanged; old `IAuditPayloadFilter` classes deleted (C7 pulled forward). Fix: safe enum-parse fallback in `MapRow`+`FromDto`.
- **✅ C4 DONE** `946d3e2` + fix `1737d15` (spec ✅, code ✅; independently re-verified diff scope = writer+tests only, build 0/0, AuditLog 249/1-preexisting). Site SQLite → `audit_event` (canonical) + `audit_forward_state` sidecar; forwarding marks/reads on the sidecar via JOIN; `IsCachedKind`={CachedSubmit,ApiCallCached,DbWriteCached,CachedResolve} precomputed drain split; old `AuditLog` table dropped (ephemeral reset). Fix: `PRAGMA foreign_keys=ON` + `MarkForwarded` no-demote guard.
- **✅ C5 DONE** `68a6bd1` (spec ✅, code ✅; a LIVE SQL Server was available so the migration + SWITCH were fully exercised — independently re-verified build 0/0 + ConfigurationDatabase 248/248). Central `dbo.AuditLog` collapsed to 10 canonical cols + 6 computed cols (5 PERSISTED + `IngestedAtUtc` non-persisted) on the preserved `ps_AuditLog_Month` scheme; `CollapseAuditLogToCanonical` new-table-and-copy migration (`FOR JSON PATH` projection, byte-verified round-trip; Down = documented one-way); repo writes/reads canonical directly; `SwitchOutPartition` staging matches the computed-col defs; append-only roles re-granted. C3 central shim retired. Forced deviations (all sound): IngestedAtUtc non-persisted, execution-id indexes unfiltered, provider-aware `OnModelCreating` strips JSON_VALUE for SQLite. Deferred to C7: a dedicated migration-projection test + the stale `CreatesFiveNamedIndexes` test name.
- **✅ C6 SUBSUMED** (no commit) — reporting/UI/export/CLI retarget was already completed by the C3 record-swap (`AuditEventView`/`AuditExportRow` shims decode every domain field from `DetailsJson`) + the C5 repo-query retarget. Read-only explorer verdict: all consumer surfaces canonical-complete; the only flagged items (ExecutionId/ParentExecutionId not in CSV; SourceNodes not parsed in export `ParseFilter`) are PRE-rearch omissions, not regressions. CentralUI 595/595, ManagementService 125/125 confirm.
- **✅ C7 DONE** `635461c` + doc-fix `bc0e5bf` (review ✅; independently re-verified build 0/0, PerformanceTests 10/10, ConfigurationDatabase 251/251 incl. the 3 new migration-projection tests PASSING on live MSSQL, zero dead crefs). Perf hot-path re-baselined (canonical JSON redactor measured ~14µs/2µs — faster than the old typed walk; budgets 200/30/5µs + fast-path `Assert.Same`); `CollapseAuditLogToCanonicalMigrationTests` (seed→migrate→assert Action/Category/Outcome/Actor-null/DetailsJson-round-trip + 5 persisted computed cols); index test → `CreatesNineNamedIndexes`; 26 dead-`<see cref>` across 13 files cleaned; doc-fix corrected the "six persisted" wording (5 persisted + IngestedAtUtc non-persisted).
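The `IsCachedKind` precompute from C4 can be pictured as a set-membership check evaluated once at write time. This is only a sketch: the four kind names come from the C4 entry above, while the class name and the string-based signature are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CachedKindSketch
{
    // The four kinds listed in the C4 status entry.
    private static readonly HashSet<string> CachedKinds = new(StringComparer.Ordinal)
    {
        "CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve",
    };

    // Computed once when the event is written and stored in the sidecar row,
    // so the drain query filters on a plain column instead of parsing DetailsJson.
    public static bool IsCachedKind(string kind) => CachedKinds.Contains(kind);
}
```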
## ✅ TASK 2.5 COMPLETE — ScadaBridge audit FULL re-architecture to pure 9-col canonical (2026-06-02)
All of C1–C7 done, each spec+code reviewed, on `feat/adopt-zb-audit` (local-only, never pushed). ScadaBridge's audit subsystem now: the canonical `ZB.MOM.WW.Audit.AuditEvent` record everywhere (domain fields in `DetailsJson` via the deterministic `AuditDetailsCodec`); the library `IAuditRedactor`/`AuditOutcome` consumed; site SQLite = `audit_event` (canonical) + `audit_forward_state` sidecar (forwarding decoupled, `IsCachedKind` drain split); central `dbo.AuditLog` collapsed to 10 canonical cols + persisted computed cols on the preserved partition scheme (`CollapseAuditLogToCanonical` migration, MSSQL-verified); UI/export/CLI canonical-complete via `AuditEventView`/`AuditExportRow`. The gRPC proto was intentionally left unchanged (mapper-internal projection). This was the program's single largest task.
@@ -0,0 +1,171 @@
# UI-Theme Adoption — Design
**Date:** 2026-06-03
**Status:** Approved (brainstorming complete) — ready for `writing-plans`.
**Component:** UI Theme (`ZB.MOM.WW.Theme` shared RCL).
**Goal:** Adopt the shared `ZB.MOM.WW.Theme` Razor Class Library across all three sister
apps (OtOpcUa AdminUI, MxAccessGateway Dashboard, ScadaBridge CentralUI + Host) via a
**full canonical cutover** (SPEC §7), after first **promoting nav-expand persistence into
the kit** so every app gets it from one shared mechanism.
> This is the UI-theme analogue of the completed Auth+Audit normalization
> (`docs/plans/2026-06-02-auth-audit-normalization*.md`). It is **UI-only**: no data
> contracts, no DB migrations, no wire protocols. The dominant risk is **visual
> regression**, not data corruption.
---
## 0. Verified starting state (2026-06-03)
Independently verified (the component docs were optimistic — cf. memory
`component-status-claims-are-optimistic`):
- **Library is real but unpublished and unadopted.** `ZB.MOM.WW.Theme/` holds all 10
components + a Release `0.1.0` nupkg, but the Gitea feed returns **HTTP 404** for the
package and **no app references it**. The shared-contract's "Published to the Gitea NuGet
feed" is aspirational. → This is a clean **publish + adopt**.
- **Library is plain files tracked by `scadaproj`** (not a nested git repo) — library
changes commit in `scadaproj` (cf. memory `shared-libs-are-plain-files-not-nested-repos`).
- **Per-app surface** matches `components/ui-theme/GAPS.md`:
- **OtOpcUa AdminUI** — already side-rail (`.app-shell`/`.side-rail`/`.rail-link`);
interactive `NavSidebar` island (`@rendermode InteractiveServer`) holding `_expanded`,
persisted via JS interop (`window.navState.get/.set`) to the `otopcua_nav` cookie
(comma-separated section ids, 1-yr, `SameSite=Lax`); bespoke `StatusBadge`; static-POST
`Login.razor`; own `theme.css` + vendored fonts. *Lowest risk.*
- **ScadaBridge CentralUI**: `.sidebar`/`.nav-link`/`<ul><li>` (`NavMenu` + `NavSection`);
`Login.razor` + `LoginLayout`; own `theme.css`; Host owns `App.razor`. *Medium risk.*
- **MxAccessGateway Dashboard** — combined `MainLayout` (~210 lines); `.sidebar`/`.nav-link`;
`StatusBadge`; **no Blazor login page** (server-redirect); own `theme.css` (font path is
absolute `/fonts/…`, not portable). *Highest risk.*
---
## 1. Decisions (locked during brainstorming)
| # | Decision | Choice |
|---|---|---|
| D1 | Adoption depth | **A — Full canonical cutover** (SPEC §7 acceptance, all three apps) |
| D2 | Nav persistence | **On all apps, via one shared kit mechanism** (not bespoke per app) |
| D3 | Persistence implementation | **CSS `<details>` + localStorage enhancer** (recommended over promoting OtOpcUa's interactive-island+cookie) |
| D4 | MxGateway login | **Add a new `<LoginCard>` Blazor login page** (the higher-risk consistency option) |
| D5 | Delivery model | **Same as Auth/Audit**: `feat/adopt-zb-theme` per app, local-only, then fast-forward merge to each repo's default + push to gitea on explicit go; scadaproj docs on `docs/ui-theme-adoption` |
| D6 | Publish | **Publish the (enhanced) RCL to the Gitea feed first**, then adopt (needs `GITEA_NUGET_KEY`, user-supplied, not persisted) |
| D7 | Library version | **Bump `0.1.0 → 0.2.0`** (new feature: persistent nav + `ThemeScripts`); publish `0.2.0` directly (0.1.0 was never released) |
| D8 | Accent colors | Preserve each app's current `--accent` value (move the *source* to the RCL, don't shift palettes) |
---
## 2. Program shape & sequencing
A **library-minor-then-adopt waterfall** (same shape as Auth/Audit):
- **Phase 0 — Library enhancement + publish.** Add shared nav persistence (§3), bump to
`0.2.0`, run the bUnit suite, `build/push.sh` to the Gitea feed. Commits in `scadaproj`.
- **Phase 1 — OtOpcUa AdminUI** (lowest risk; already side-rail; validates the pattern).
- **Phase 2 — ScadaBridge CentralUI + Host** (medium; class migration + AuthorizeView nav).
- **Phase 3 — MxAccessGateway Dashboard** (highest; split combined layout **and** add the
net-new `LoginCard` page).
- **Phase 4 — scadaproj docs + memory** (GAPS adoption banner; CLAUDE.md ui-theme row →
*Adopted*; shared-contract → *Published 0.2.0*; memory note).
**Execution:** subagent-driven, classification-driven reviews (trivial→none; small→code;
standard→spec∥code parallel; high-risk→serial spec→code + final integration review).
**Delivery:** `feat/adopt-zb-theme` branch per app, local-only; full build+test green per
repo; fast-forward merge to each default + push to gitea on the user's explicit go.
---
## 3. Library enhancement: shared nav persistence (Phase 0)
Promote **one** shared mechanism into the kit — a simpler generalization of OtOpcUa's
proven cookie+interop approach.
**Mechanism — CSS `<details>` + localStorage enhancer:**
- `NavRailSection` stays the static-SSR-friendly `<details class="rail-section" open="@Expanded">`
it already is. It gains a stable **`Key`** parameter (default = a slug of `Title`) emitted
as a `data-nav-key` attribute on the `<details>`.
- New vendored asset `wwwroot/js/nav-state.js` in the RCL: on `DOMContentLoaded`, for each
`[data-nav-key]`, read `localStorage` and set `el.open`; attach a `toggle` listener that
writes `el.open` back to `localStorage` keyed by `data-nav-key`. Pure client-side
progressive enhancement — no circuit, no server round-trip.
- New `<ThemeScripts/>` component (sibling to `ThemeHead`) emits
`<script src="_content/ZB.MOM.WW.Theme/js/nav-state.js" defer></script>`, placed before
`</body>`.
**Why localStorage over promoting OtOpcUa's island+cookie:** keeps the kit
**static-SSR-friendly** (no forced `InteractiveServer` island per app), one shared file,
uniform across all three. It *simplifies* OtOpcUa — retiring its interactive `NavSidebar`
island + `nav-state.js` + `otopcua_nav` cookie in favor of the shared enhancer. localStorage
is per-browser/origin (same effective scope as the old cookie) and is never read
server-side today, so nothing is lost.
**Trade-off:** a brief flash-of-default-state on first paint (localStorage isn't readable
server-side, so sections render at their server default and JS corrects after load).
Negligible for a nav rail. (If zero-flash were required, the alternative is a server-read
cookie — rejected as more kit coupling.)
**Version:** `0.1.0 → 0.2.0` (additive feature). **Tests:** extend the bUnit suite —
`NavRailSection` emits `data-nav-key` (derived slug + explicit `Key`); `ThemeScripts` emits
the script tag. JS runtime behavior is covered by the per-app manual checklist (§5), since
bUnit has no JS engine.
---
## 4. Per-app adoption scope (full canonical cutover)
Each app, per SPEC §7: add `PackageReference ZB.MOM.WW.Theme 0.2.0` + `@using ZB.MOM.WW.Theme`
in `_Imports.razor`; `<ThemeHead/>` in `App.razor` `<head>` after Bootstrap + `<ThemeScripts/>`
before `</body>`; **delete the app's `theme.css` + vendored IBM Plex `.woff2` fonts**; replace
`MainLayout` with the thin delegation to `<ThemeShell Product=… Accent=…>`; rebuild nav with
`NavRailItem`/`NavRailSection`; `StatusBadge` → `<StatusPill>`; login → `<LoginCard>`; **keep**
each app's `site.css` page-layout residual + scoped `.razor.css` unchanged. `--accent`
preserves each app's current value (D8).
| App | Notable specifics | Risk |
|---|---|---|
| **OtOpcUa** AdminUI | Already-correct rail classes (RCL `layout.css` matches). Retire `NavSidebar` island + `nav-state.js` + `otopcua_nav` cookie → kit `NavRailSection`/`NavRailItem` + shared enhancer. `RailFooter` = the existing `AuthorizeView` session block. `StatusBadge`→`StatusPill`. `Login.razor`→`LoginCard` (keep static POST, `<AntiforgeryToken/>`, server-validate `ReturnUrl`). | Low–Med |
| **ScadaBridge** CentralUI + Host | `.sidebar`/`.nav-link`/`<ul><li>` (`NavMenu`+`NavSection`) → kit nav (class migration throughout). Verify `<AuthorizeView>` policy-gated sections render/hide under static SSR (GAPS open Q). `<ThemeHead/>`/`<ThemeScripts/>` go in Host's `App.razor`. `StatusBadge`/inline `.chip-*`→`StatusPill`. `Login.razor`+`LoginLayout`→`LoginCard`. | Med |
| **MxGateway** Dashboard | Split combined ~210-line `MainLayout` → thin `MainLayout` + `<ThemeShell>` (nav extracted into the `Nav` slot). `.sidebar`/`.nav-link`→rail classes; portable font path fixed by RCL. `StatusBadge`→`StatusPill`. **Add a new `/login` Blazor page** using `<LoginCard>` posting to a `/auth/login` endpoint wired to the app's existing `ZB.MOM.WW.Auth` LDAP service + dashboard cookie `SignInAsync` (mirror OtOpcUa/ScadaBridge static-POST login). Verify the server auth-redirect now lands on this page. | **High** |
---
## 5. Delivery, risk & verification
- **Build/test gate per repo:** `dotnet build` + the full suite green before merge. Baseline
the **known pre-existing reds** first and do not chase them (ScadaBridge IntegrationTests
×11 needing live LDAP/SQL/SMTP + flaky `StaleTagMonitor` timer tests; MxGateway 3 FakeWorker
tests) — only regressions introduced by this work count.
- **Visual regression is the real risk** — a green build does not prove the chrome looks
right. Verification per app = a structured manual checklist:
1. Rail renders at `lg`+ and collapses to a hamburger toggle below `lg`.
2. Nav expand-state persists across navigations and a full reload (shared enhancer).
3. `StatusPill` renders correctly in all five states (`Ok`/`Warn`/`Bad`/`Idle`/`Info`).
4. Login posts, round-trips `ReturnUrl` safely (server-validated), shows errors.
5. IBM Plex fonts load from `_content/ZB.MOM.WW.Theme/fonts/…` (no 404; OtOpcUa's latent
font 404 is fixed).
- **Optional browser smoke pass:** run each app locally and drive a Claude-in-Chrome smoke
pass (screenshots of shell + login) before merge — included only if the user opts in;
otherwise the checklist above is run manually.
- **MxGateway `/login`** is auth-facing and net-new → `high-risk` classification (serial
spec→code review + final integration review).
---
## 6. Acceptance (per app)
Mirrors SPEC §7: (1) `ZB.MOM.WW.Theme 0.2.0` referenced + in `_Imports.razor`; (2)
`<ThemeHead/>` after Bootstrap and per-app `theme.css`/fonts deleted; (3) `MainLayout` is the
thin `ThemeShell` delegation; (4) nav rebuilt with `NavRailItem`/`NavRailSection` (+ shared
persistence via `<ThemeScripts/>`); (5) local `StatusBadge`/`.chip-*` removed → `<StatusPill>`;
(6) login is `<LoginCard>` (static POST, `<AntiforgeryToken/>`, server-validated `ReturnUrl`)
— including MxGateway's net-new page; (7) `site.css` residual + scoped `.razor.css` kept.
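The "server-validated `ReturnUrl`" requirement in item (6) amounts to accepting only app-relative paths before redirecting. The helper below is a hand-rolled sketch under that assumption, with hypothetical names (`ReturnUrlSketch`, `IsLocalPath`, `Sanitize`); in an ASP.NET Core app the framework's own `IUrlHelper.IsLocalUrl` check is likely the better choice, and each app's existing login may already use it.

```csharp
public static class ReturnUrlSketch
{
    // Accept only app-relative paths: the value must start with a single '/',
    // and must not start with "//" or "/\" (browsers can read both as a host).
    public static bool IsLocalPath(string? url)
    {
        if (string.IsNullOrEmpty(url) || url[0] != '/') return false;
        if (url.Length > 1 && (url[1] == '/' || url[1] == '\\')) return false;

        // Reject control characters (for example CR/LF) anywhere in the value.
        foreach (var c in url)
        {
            if (char.IsControl(c)) return false;
        }
        return true;
    }

    // Fall back to the app root when the supplied value is not a local path.
    public static string Sanitize(string? returnUrl) =>
        IsLocalPath(returnUrl) ? returnUrl! : "/";
}
```

The login endpoint would call `Sanitize` on the posted `ReturnUrl` after a successful sign-in and redirect to the result, so an attacker-supplied absolute URL degrades to the app root rather than an open redirect.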
---
## 7. Out of scope
Per SPEC §0/§6: each app's `site.css` page-layout residual, route/page content, scoped
`.razor.css`, authorization logic. The kit owns *chrome and tokens*, not domain screens.
No new data grids/modals/toasts (YAGNI). Bootstrap stays per-app (not vendored by the kit).
@@ -0,0 +1,643 @@
# UI-Theme Adoption Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Enhance the shared `ZB.MOM.WW.Theme` RCL with cross-app nav-expand persistence (bump `0.2.0`, publish to the Gitea feed), then adopt it via full canonical cutover across OtOpcUa AdminUI, ScadaBridge CentralUI+Host, and MxAccessGateway Dashboard.
**Architecture:** A library-minor-then-adopt waterfall (same shape as the completed Auth/Audit normalization). Phase 0 enhances + publishes the kit. Phases 1–3 are **independent per-repo cutovers** (each on its own `feat/adopt-zb-theme` branch, local-only) ordered by risk. Phase 4 updates scadaproj docs + memory. UI-only — no data contracts, no DB migrations; the dominant risk is **visual regression**, mitigated by per-app build+test gates and a manual visual checklist.
**Tech Stack:** .NET 10, Blazor SSR, Razor Class Library, bUnit/xUnit, Bootstrap 5, NuGet central package management (OtOpcUa/ScadaBridge) / per-project versions (MxGateway), Gitea NuGet feed.
**Design:** [`2026-06-03-ui-theme-adoption-design.md`](2026-06-03-ui-theme-adoption-design.md). Decisions D1–D8 there are authoritative.
---
## Conventions for the executor
- **Delivery:** scadaproj library + docs changes (Phases 0, 4) commit on the existing `docs/ui-theme-adoption` branch. Each app (Phases 1–3) gets its own `feat/adopt-zb-theme` branch, **committed local-only, never pushed** until the user explicitly authorizes merge+push (same model as Auth/Audit).
- **Per-repo green gate:** before declaring an app's phase done, run `dotnet build` + that repo's full test suite. **Baseline known pre-existing reds first** and do not chase them: ScadaBridge IntegrationTests ×11 (need live LDAP/SQL/SMTP), `PartitionPurgeTests.EndToEnd`, flaky `StaleTagMonitor` timer tests; MxGateway 3 FakeWorker tests. Only regressions introduced by this work count.
- **Cutover invariant (all apps):** the kit's `theme.css`/`layout.css` define `--*` tokens, the side-rail layout, and the `.chip`/`.chip-ok|warn|bad|idle|info` status classes. Before deleting an app's `wwwroot/css/theme.css`, **diff it against the kit's `theme.css`/`layout.css` and migrate any app-only rules** (e.g. OtOpcUa's `.chip-alert`/`.chip-caution`) into that app's `site.css`. The app's `site.css` page-layout residual and scoped `.razor.css` stay.
- **Status policy (per SPEC §6/§7):** inline `.chip-*` spans and Bootstrap `.badge` in *domain pages* are page content — they keep working under kit CSS and are **not** rewritten. Only a bespoke status *component* gets removed/redirected to `<StatusPill>`.
- **Cross-repo parallelism:** Phases 1, 2, 3 touch disjoint repos and are mutually independent — they MAY run concurrently, but are listed in risk order (OtOpcUa → ScadaBridge → MxGateway). All three are blocked by Task 0.4 (published package).
---
## Phase 0 — Library enhancement + publish (scadaproj, branch `docs/ui-theme-adoption`)
### Task 0.1: NavRailSection persistence key
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 0.2
**Files:**
- Modify: `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/NavRailSection.razor`
- Test: `ZB.MOM.WW.Theme/tests/ZB.MOM.WW.Theme.Tests/NavRailTests.cs`
**Context:** `NavRailSection` renders `<details class="rail-section" open="@Expanded"><summary class="rail-eyebrow-toggle">@Title</summary>…`. Add an optional `Key` parameter (default = a stable slug of `Title`) emitted as `data-nav-key` on the `<details>` so the localStorage enhancer (Task 0.2) can persist per-section open state.
**Step 1 — failing tests** in `NavRailTests.cs`:
```csharp
[Fact]
public void NavRailSection_emits_data_nav_key_slug_from_title_by_default()
{
    var cut = RenderComponent<NavRailSection>(p => p
        .Add(x => x.Title, "Site Calls")
        .AddChildContent("<a class='rail-link'>X</a>"));
    Assert.Equal("site-calls", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}

[Fact]
public void NavRailSection_emits_explicit_key_when_supplied()
{
    var cut = RenderComponent<NavRailSection>(p => p
        .Add(x => x.Title, "Navigation").Add(x => x.Key, "nav")
        .AddChildContent("<a class='rail-link'>X</a>"));
    Assert.Equal("nav", cut.Find("details.rail-section").GetAttribute("data-nav-key"));
}
```
**Step 2 — run, expect FAIL** (no `Key`/`data-nav-key`):
`dotnet test ZB.MOM.WW.Theme/ --filter "FullyQualifiedName~NavRailSection_emits"`
**Step 3 — implement.** Edit `NavRailSection.razor`:
```razor
@namespace ZB.MOM.WW.Theme
<details class="rail-section" open="@Expanded" data-nav-key="@ResolvedKey">
    <summary class="rail-eyebrow-toggle">@Title</summary>
    <div class="rail-section-body">@ChildContent</div>
</details>
@code {
    [Parameter, EditorRequired] public string Title { get; set; } = string.Empty;
    [Parameter] public bool Expanded { get; set; } = true;

    /// <summary>Stable identifier used to persist this section's open/closed state in
    /// localStorage (via the kit's nav-state.js). Defaults to a slug of <see cref="Title"/>.</summary>
    [Parameter] public string? Key { get; set; }
    [Parameter] public RenderFragment? ChildContent { get; set; }

    private string ResolvedKey => string.IsNullOrWhiteSpace(Key) ? Slug(Title) : Key!;

    // Lower-case the title, turn every non-alphanumeric char into '-', then
    // collapse runs of '-' and trim them from both ends.
    private static string Slug(string s)
    {
        var chars = s.Trim().ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : '-').ToArray();
        return string.Join('-', new string(chars).Split('-', StringSplitOptions.RemoveEmptyEntries));
    }
}
```
**Step 4 — run, expect PASS** (plus the existing NavRail tests stay green).
**Step 5 — commit:** `git add -A && git commit -m "feat(theme): NavRailSection data-nav-key for persistence"`
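A few edge cases of the slug logic are worth seeing before relying on derived keys. The snippet below copies the `Slug` body from Step 3 into a standalone class (`SlugSketch` is a name invented for this example) so it can be run outside the component; the sample titles are illustrative, not taken from any app.

```csharp
using System;
using System.Linq;

public static class SlugSketch
{
    // Same logic as NavRailSection.Slug in Step 3, copied so it runs standalone.
    public static string Slug(string s)
    {
        var chars = s.Trim().ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : '-').ToArray();
        return string.Join('-', new string(chars).Split('-', StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        Console.WriteLine(Slug("Site Calls"));           // site-calls
        Console.WriteLine(Slug("  Alarms & Events  "));  // alarms-events
        Console.WriteLine(Slug("Site-Calls"));           // site-calls (collides with "Site Calls")
        Console.WriteLine(Slug("***").Length);           // 0 (empty key)
    }
}
```

Two sections whose titles slug to the same value would share one localStorage entry, and a title with no letters or digits yields an empty key; in either case passing an explicit `Key` sidesteps the problem.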
---
### Task 0.2: localStorage nav enhancer + ThemeScripts
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 0.1
**Files:**
- Create: `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/wwwroot/js/nav-state.js`
- Create: `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/ThemeScripts.razor`
- Test: `ZB.MOM.WW.Theme/tests/ZB.MOM.WW.Theme.Tests/ThemeScriptsTests.cs` (new)
- Test: `ZB.MOM.WW.Theme/tests/ZB.MOM.WW.Theme.Tests/StaticAssetsTests.cs` (extend)
**Step 1 — create `wwwroot/js/nav-state.js`** (progressive enhancement; no framework):
```javascript
// ZB.MOM.WW.Theme nav-state.js — persists <details data-nav-key> open/closed
// state in localStorage so NavRailSection expand state survives navigation and
// reloads. Pure client-side; works with static Blazor SSR. Keyed per section.
(function () {
  var PREFIX = "zbnav:";

  function apply() {
    document.querySelectorAll("details.rail-section[data-nav-key]").forEach(function (el) {
      var key = PREFIX + el.getAttribute("data-nav-key");
      var saved = null;
      // If storage is unavailable (e.g. blocked), leave this section at its server default.
      try { saved = window.localStorage.getItem(key); } catch (e) { return; }
      if (saved === "1") el.open = true;
      else if (saved === "0") el.open = false;
      el.addEventListener("toggle", function () {
        try { window.localStorage.setItem(key, el.open ? "1" : "0"); } catch (e) { /* ignore */ }
      });
    });
  }

  if (document.readyState === "loading")
    document.addEventListener("DOMContentLoaded", apply);
  else
    apply();
})();
```
**Step 2 — create `Components/ThemeScripts.razor`:**
```razor
@namespace ZB.MOM.WW.Theme
@* Components/ThemeScripts.razor — drop before </body>. Emits the kit's nav-state
enhancer that persists NavRailSection open/closed state in localStorage. *@
<script src="_content/ZB.MOM.WW.Theme/js/nav-state.js" defer></script>
```
**Step 3 — failing tests.** `ThemeScriptsTests.cs`:
```csharp
namespace ZB.MOM.WW.Theme.Tests;

public class ThemeScriptsTests : TestContext
{
    [Fact]
    public void ThemeScripts_emits_nav_state_script_tag()
    {
        var cut = RenderComponent<ThemeScripts>();
        var script = cut.Find("script");
        Assert.Equal("_content/ZB.MOM.WW.Theme/js/nav-state.js", script.GetAttribute("src"));
        Assert.True(script.HasAttribute("defer"));
    }
}
```
In `StaticAssetsTests.cs`, add an assertion that the JS file ships (mirror its existing CSS/font asset checks — read the file first to match its exact assertion style, e.g. verifying the file exists on disk under `wwwroot/js/nav-state.js`).
**Step 4 — run tests, expect PASS:** `dotnet test ZB.MOM.WW.Theme/`
**Step 5 — commit:** `git commit -am "feat(theme): ThemeScripts + localStorage nav-state enhancer"`
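The persistence rule in Step 1's `nav-state.js` can also be exercised without a browser. The following is a minimal sketch, not part of the kit: `resolveOpen`, `createMemoryStorage`, and `toggleThenReload` are hypothetical names introduced here, and only the `zbnav:` prefix and the `"1"`/`"0"` encoding are taken from the script above.

```javascript
// Hypothetical helpers; only PREFIX and the "1"/"0" encoding come from nav-state.js.
var PREFIX = "zbnav:";

// A saved "1"/"0" wins; anything else (null, unexpected value) keeps the
// state the server rendered.
function resolveOpen(saved, renderedOpen) {
  if (saved === "1") return true;
  if (saved === "0") return false;
  return renderedOpen;
}

// Minimal in-memory stand-in for window.localStorage (getItem/setItem only).
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); }
  };
}

// Simulate a user toggle followed by a page reload for one section key.
function toggleThenReload(storage, navKey, renderedOpen, userOpens) {
  storage.setItem(PREFIX + navKey, userOpens ? "1" : "0");
  return resolveOpen(storage.getItem(PREFIX + navKey), renderedOpen);
}

var storage = createMemoryStorage();
// No saved value yet: the rendered default (expanded) is kept.
console.log(resolveOpen(storage.getItem(PREFIX + "scripting"), true));
// The user collapses the section; the collapsed state survives the reload.
console.log(toggleThenReload(storage, "scripting", true, false));
```

Separating the decision (`resolveOpen`) from the DOM wiring is only a convenience for this illustration; the shipped script keeps both inline.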
---
### Task 0.3: Version bump 0.2.0 + full suite
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** none (depends on 0.1, 0.2)
**Files:**
- Modify: `ZB.MOM.WW.Theme/Directory.Build.props:7`
**Steps:**
1. Change `<Version>0.1.0</Version>` → `<Version>0.2.0</Version>`.
2. Run `cd ZB.MOM.WW.Theme && dotnet build -c Release` — expect **0 warnings** (TreatWarningsAsErrors).
3. Run `dotnet test` — expect all green (38 existing + the new persistence/ThemeScripts tests).
4. Commit: `git commit -am "chore(theme): bump 0.1.0 -> 0.2.0 (nav persistence + ThemeScripts)"`
---
### Task 0.4: Publish 0.2.0 to Gitea feed
**Classification:** small
**Estimated implement time:** ~2 min (blocks on user-supplied token)
**Parallelizable with:** none (depends on 0.3)
**⚠ Requires the user's `GITEA_NUGET_KEY`** (Gitea token with `package:write`). It is not persisted — ask the user to export it (or run the push command themselves via `! …`). Do not invent or store it.
**Steps:**
1. Confirm 404 pre-state: `curl -s -o /dev/null -w "%{http_code}\n" https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/zb.mom.ww.theme/index.json` (expect `404`).
2. Publish:
```bash
cd ZB.MOM.WW.Theme
export GITEA_NUGET_SOURCE="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json"
export GITEA_NUGET_KEY="<user-supplied>"
./build/push.sh
```
3. Verify published: re-run the curl — expect `200`; confirm version `0.2.0` is listed.
4. No commit needed (artifacts are gitignored). Record the publish in the task log.
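The "confirm version `0.2.0` is listed" check in step 3 could be scripted once the registration index has been fetched and parsed. This is a sketch under assumptions: it relies on the NuGet v3 registration layout (pages under `items`, leaves under each page's `items`, version at `catalogEntry.version`) and assumes the feed inlines its leaves; `listVersions` and `isPublished` are hypothetical names, and the sample object is hand-written, not real feed output.

```javascript
// Collect versions from a parsed NuGet v3 registration index.
// Assumption: every page inlines its leaves under "items". A page without
// inlined leaves would need a second fetch of its "@id" and is skipped here.
function listVersions(index) {
  var versions = [];
  (index.items || []).forEach(function (page) {
    (page.items || []).forEach(function (leaf) {
      if (leaf.catalogEntry && leaf.catalogEntry.version) {
        versions.push(leaf.catalogEntry.version);
      }
    });
  });
  return versions;
}

// True when the given version string appears in the index.
function isPublished(index, version) {
  return listVersions(index).indexOf(version) !== -1;
}

// Hand-written sample shaped like a registration index; not real feed output.
var sample = {
  count: 1,
  items: [{
    count: 1,
    items: [{ catalogEntry: { id: "ZB.MOM.WW.Theme", version: "0.2.0" } }]
  }]
};
console.log(isPublished(sample, "0.2.0"));
```

If the feed returns pages without inlined leaves, this check reports the version as missing, so an unexpected "not published" result should be confirmed by hand before re-running the push.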
---
## Phase 1 — OtOpcUa AdminUI cutover (repo `~/Desktop/OtOpcUa`, branch `feat/adopt-zb-theme`)
> Blocked by Task 0.4. Lowest risk: already side-rail with the kit's exact CSS classes.
> **First:** `cd ~/Desktop/OtOpcUa && git checkout -b feat/adopt-zb-theme`.
### Task 1.1: NuGet wiring + usings
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (gates 1.2–1.5)
**Files:**
- Modify: `Directory.Packages.props` (repo root) — add `<PackageVersion Include="ZB.MOM.WW.Theme" Version="0.2.0" />`
- Modify: `NuGet.config` (repo root) — under `<packageSource key="dohertj2-gitea">` add `<package pattern="ZB.MOM.WW.Theme" />`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ZB.MOM.WW.OtOpcUa.AdminUI.csproj` — add `<PackageReference Include="ZB.MOM.WW.Theme" />` (versionless; central PM)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/_Imports.razor` — add `@using ZB.MOM.WW.Theme`
**Verify:** `dotnet restore src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ZB.MOM.WW.OtOpcUa.AdminUI.csproj` resolves `ZB.MOM.WW.Theme 0.2.0` from the Gitea feed. Commit.
### Task 1.2: App.razor — ThemeHead + ThemeScripts
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 1.3, 1.4, 1.5
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/App.razor`
**Edits:** Replace the `…/css/theme.css` `<link>` with `<ThemeHead />` (keep the Bootstrap `<link>` *above* it and the `…/css/site.css` `<link>` *below* it). Replace `<script src="…/js/nav-state.js"></script>` with `<ThemeScripts />`. Keep the bootstrap bundle + `blazor.web.js` scripts. Commit.
### Task 1.3: Migrate app-only CSS, delete theme.css + fonts + nav-state.js
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1.2, 1.4, 1.5
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/wwwroot/css/site.css`
- Delete: `wwwroot/css/theme.css`, `wwwroot/fonts/ibm-plex-*.woff2` (×3), `wwwroot/js/nav-state.js`
**Steps:** Diff `wwwroot/css/theme.css` against the kit's `theme.css`+`layout.css`. Any rule present in the app copy but NOT the kit (notably **`.chip-alert`, `.chip-caution`**, and any app-only tweak) → append to `site.css` under a clearly-commented "App-specific status variants (not in ZB.MOM.WW.Theme)" block. Then delete the four asset files. Keep `wwwroot/js/monaco-loader.js`. Commit.
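A first pass over the diff in this step could be automated by comparing selector sets. The sketch below is deliberately naive and rests on assumptions: `selectors` and `appOnlySelectors` are hypothetical helpers, the parsing is regex-based rather than a real CSS parser, and the sample strings are invented. It only finds selectors the kit lacks; a selector that exists in both files with different declarations (an "app-only tweak") still needs a manual diff.

```javascript
// Naive selector extraction: strip comments, then treat the text before each
// "{" as a comma-separated selector list. At-rules (@media, @font-face, ...)
// are skipped; keyframe steps and commas inside at-rule preludes are not handled.
function selectors(css) {
  var out = new Set();
  var stripped = css.replace(/\/\*[\s\S]*?\*\//g, "");
  var re = /([^{}]+)\{/g;
  var m;
  while ((m = re.exec(stripped)) !== null) {
    m[1].split(",").forEach(function (s) {
      s = s.trim().replace(/\s+/g, " ");
      if (s && s.charAt(0) !== "@") out.add(s);
    });
  }
  return out;
}

// Selectors present in the app stylesheet but absent from the kit stylesheet.
function appOnlySelectors(appCss, kitCss) {
  var kit = selectors(kitCss);
  return Array.from(selectors(appCss)).filter(function (s) { return !kit.has(s); });
}

// Invented sample input; the real inputs would be the two files' contents.
var kitCss = ".chip{padding:2px} .chip-ok{color:green}";
var appCss = ".chip{padding:2px} .chip-ok{color:green} .chip-alert, .chip-caution{color:orange}";
console.log(appOnlySelectors(appCss, kitCss));
```

Anything this reports is a candidate for the "App-specific status variants" block in `site.css`; an empty result does not prove the files are equivalent.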
### Task 1.4: MainLayout → ThemeShell + kit nav
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1.2, 1.3, 1.5
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Layout/MainLayout.razor`
- Delete: `Components/Layout/NavSidebar.razor`, `Components/Layout/NavSection.razor`
**Context:** Replaces the interactive `NavSidebar` island + bespoke `NavSection` with the kit's static `<ThemeShell>` + `NavRailSection`/`NavRailItem` (persistence now comes from `ThemeScripts`). All sections default `Expanded=true`; the URL-based auto-expand behavior is intentionally dropped (D2/D3 — localStorage persistence replaces it). Reproduce the 3 sections / 17 links / footer exactly.
**Target `MainLayout.razor`:**
```razor
@inherits LayoutComponentBase
<ThemeShell Product="OtOpcUa" Accent="#2f5fd0">
    <Nav>
        <NavRailSection Title="Navigation" Key="nav">
            <NavRailItem Href="/" Text="Overview" Match="NavLinkMatch.All" />
            <NavRailItem Href="/fleet" Text="Fleet status" />
            <NavRailItem Href="/hosts" Text="Host status" />
            <NavRailItem Href="/clusters" Text="Clusters" />
            <NavRailItem Href="/reservations" Text="Reservations" />
            <NavRailItem Href="/certificates" Text="Certificates" />
            <NavRailItem Href="/role-grants" Text="Role grants" />
        </NavRailSection>
        <NavRailSection Title="Scripting" Key="scripting">
            <NavRailItem Href="/virtual-tags" Text="Virtual tags" />
            <NavRailItem Href="/scripted-alarms" Text="Scripted alarms" />
            <NavRailItem Href="/scripts" Text="Scripts" />
            <NavRailItem Href="/script-log" Text="Script log" />
        </NavRailSection>
        <NavRailSection Title="Live" Key="live">
            <NavRailItem Href="/deployments" Text="Deployments" />
            <NavRailItem Href="/alerts" Text="Alerts" />
            <NavRailItem Href="/alarms-historian" Text="Alarms historian" />
        </NavRailSection>
    </Nav>
    <RailFooter>
        <AuthorizeView>
            <Authorized>
                <div class="rail-eyebrow">Session</div>
                <a class="rail-user" href="/account">@context.User.Identity?.Name</a>
                <div class="rail-roles">@string.Join(", ", context.User.Claims.Where(c => c.Type.EndsWith("/role")).Select(c => c.Value))</div>
                <form method="post" action="/auth/logout"><AntiforgeryToken /><button class="rail-btn" type="submit">Sign out</button></form>
            </Authorized>
            <NotAuthorized>
                <div class="rail-eyebrow">Session</div>
                <a class="rail-btn" href="/login">Sign in</a>
            </NotAuthorized>
        </AuthorizeView>
    </RailFooter>
    <ChildContent>@Body</ChildContent>
</ThemeShell>
```
**Note:** confirm `ThemeShell` exposes `Nav`/`RailFooter`/`ChildContent` slots and that the hamburger/collapse behavior comes from the kit's `layout.css` (Bootstrap collapse JS already loaded). If the kit shell wraps the rail in its own collapse, drop the app's old hamburger markup (now in the shell). Build the AdminUI project; verify it compiles. Commit.
### Task 1.5: Delete dead StatusBadge + Login → LoginCard
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 1.2, 1.3, 1.4
**Files:**
- Delete: `Components/Shared/StatusBadge.razor` (verified unused — confirm with a repo grep for `<StatusBadge` returning 0 hits before deleting)
- Modify: `Components/Pages/Login.razor`
**Login target** (preserve static POST to `/auth/login`, the `Error`/`ReturnUrl` query params, and `LoginLayout`):
```razor
@page "/login"
@layout LoginLayout
@attribute [Microsoft.AspNetCore.Authorization.AllowAnonymous]
<div class="login-wrap rise" style="animation-delay:.02s">
<LoginCard Product="OtOpcUa Admin" Action="/auth/login" ReturnUrl="@ReturnUrl" Error="@Error">
<AntiforgeryToken />
</LoginCard>
</div>
@code {
[SupplyParameterFromQuery] private string? Error { get; set; }
[SupplyParameterFromQuery] private string? ReturnUrl { get; set; }
}
```
**Note:** the `/auth/login` endpoint already round-trips `returnUrl` and signs in (unchanged). Confirm `<LoginCard>` renders the username/password fields the endpoint reads (`name="username"`, `name="password"`, `name="returnUrl"`); if its field names differ, set them via LoginCard params or keep the existing `<form>` inside a `<TechCard>` instead. Build; commit.
### Task 1.6: Build, test, visual checklist (OtOpcUa)
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (depends on 1.2–1.5)
**Steps:** `dotnet build ZB.MOM.WW.OtOpcUa.slnx`; `dotnet test ZB.MOM.WW.OtOpcUa.slnx` (compare against baseline reds). Run the visual checklist (design §5): rail renders ≥lg + hamburger <lg; nav persistence across reload; status chips intact (incl. alert/caution); login posts + returnUrl; IBM Plex fonts load from `_content/ZB.MOM.WW.Theme/fonts/` (the old latent 404 is gone). Report results; do not merge.
---
## Phase 2 — ScadaBridge CentralUI + Host cutover (repo `~/Desktop/ScadaBridge`, branch `feat/adopt-zb-theme`)
> Blocked by Task 0.4. Independent of Phase 1. **First:** `cd ~/Desktop/ScadaBridge && git checkout -b feat/adopt-zb-theme`.
### Task 2.1: NuGet wiring + usings
**Classification:** small
**Estimated implement time:** ~3 min
**Files:**
- Modify: `Directory.Packages.props` — add `<PackageVersion Include="ZB.MOM.WW.Theme" Version="0.2.0" />`
- Modify: `nuget.config` — under `dohertj2-gitea` add `<package pattern="ZB.MOM.WW.Theme" />`
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/ZB.MOM.WW.ScadaBridge.CentralUI.csproj` — add `<PackageReference Include="ZB.MOM.WW.Theme" />`
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/_Imports.razor` — add `@using ZB.MOM.WW.Theme`
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/_Imports.razor` — add `@using ZB.MOM.WW.Theme` (Host's `App.razor` uses `ThemeHead`/`ThemeScripts`; the RCL flows transitively via the CentralUI project reference)
**Verify** restore resolves 0.2.0; commit.
### Task 2.2: Host App.razor — ThemeHead + ThemeScripts
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 2.3, 2.4, 2.5
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/Components/App.razor`
**Edits:** Replace the `_content/ZB.MOM.WW.ScadaBridge.CentralUI/css/theme.css` `<link>` with `<ThemeHead />` (keep Bootstrap + bootstrap-icons links above; keep `/ZB.MOM.WW.ScadaBridge.Host.styles.css` and the CentralUI `site.css` link). Replace the `…CentralUI/js/nav-state.js` `<script>` with `<ThemeScripts />`; keep `treeview-storage.js`, `monaco-init.js`, `audit-grid.js`, `transport.js`, and the bootstrap bundle. Commit.
### Task 2.3: Migrate app-only CSS, delete theme.css + fonts + nav-state.js
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 2.2, 2.4, 2.5
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/wwwroot/css/site.css` (only if the diff surfaces app-only rules)
- Delete: `CentralUI/wwwroot/css/theme.css`, `CentralUI/wwwroot/fonts/ibm-plex-*.woff2` (×3), `CentralUI/wwwroot/js/nav-state.js`
**Steps:** Diff CentralUI `theme.css` vs kit; migrate any app-only rules into `site.css` (ScadaBridge's chips are the standard ok/warn/bad/idle, covered by the kit — expect little/none). Keep the other JS files. Commit.
### Task 2.4: MainLayout → ThemeShell + kit nav (preserve AuthorizeView gating, DialogHost, SessionExpiry)
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 2.2, 2.3, 2.5
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/Components/Layout/MainLayout.razor`
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/Components/Layout/NavMenu.razor`
- Delete: `Components/Layout/NavSection.razor` (after NavMenu no longer uses it)
**Context:** Two non-obvious must-preserves: `MainLayout` hosts `<DialogHost />` and `<SessionExpiry />` — keep both in the thin layout (outside `<ThemeShell>` or in `ChildContent` alongside `@Body`). `NavMenu` wraps its sections in `<AuthorizeView Policy="…">` (RequireAdmin/RequireDesign/RequireDeployment/OperationalAudit + mixed-role children) — these policy guards must wrap the new `NavRailSection`s unchanged.
**Steps:**
1. `MainLayout.razor` → thin delegation:
```razor
@inherits LayoutComponentBase
<ThemeShell Product="ScadaBridge" Accent="#2f5fd0">
    <Nav><NavMenu /></Nav>
    <RailFooter>
        <AuthorizeView><Authorized>
            <div class="rail-eyebrow">Session</div>
            <span class="rail-user">@context.User.GetDisplayName()</span>
            <form method="post" action="/auth/logout" data-enhance="false"><AntiforgeryToken /><button class="rail-btn" type="submit">Sign Out</button></form>
        </Authorized></AuthorizeView>
    </RailFooter>
    <ChildContent>@Body</ChildContent>
</ThemeShell>
<DialogHost />
<SessionExpiry />
```
(Move the session/sign-out block out of `NavMenu` into `RailFooter`; keep `GetDisplayName()`.)
2. Rewrite `NavMenu.razor` body: replace `<nav class="sidebar"><ul class="nav flex-column">…` + `<li><NavLink class="nav-link">` + `<NavSection>` with kit `NavRailSection`/`NavRailItem`, **preserving each `<AuthorizeView Policy="…">` wrapper** around its section. The always-visible Dashboard link becomes a bare `<NavRailItem Href="/" Text="Dashboard" Match="NavLinkMatch.All" />` (outside any section, or in a default section). Reproduce all sections/links from the inventory (Admin, Design, Deployment, Notifications, Site Calls, Monitoring, Audit) and their child links exactly.
3. Build CentralUI; **verify `<AuthorizeView>`-wrapped `NavRailSection` renders for an authorized principal and hides for an unauthorized one** under static SSR (GAPS open question) — assert via an existing CentralUI bUnit test or add a focused one. Commit.
### Task 2.5: Login → LoginCard
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 2.2, 2.3, 2.4
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.CentralUI/Components/Pages/Login.razor`
**Target** (preserve static POST to `/auth/login`; the endpoint uses `.DisableAntiforgery()` and always redirects to `/`, so no `returnUrl` and no antiforgery token are needed):
```razor
@page "/login"
@layout LoginLayout
@using Microsoft.AspNetCore.Authorization
@attribute [AllowAnonymous]
<LoginCard Product="ScadaBridge" Action="/auth/login" Error="@ErrorMessage" />
@code {
[SupplyParameterFromQuery(Name = "error")] public string? ErrorMessage { get; set; }
}
```
Confirm `<LoginCard>`'s field names match what `/auth/login` reads (`username`/`password`). Build; commit.
### Task 2.6: Build, test, visual checklist (ScadaBridge)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on 2.2–2.5)
**Steps:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx`; run the FULL suite (Host/CentralUI/ManagementService/Transport/ConfigurationDatabase) and compare to baseline reds. Visual checklist incl. policy-gated nav sections show/hide by role, DialogHost + SessionExpiry still function. Report; do not merge.
---
## Phase 3 — MxAccessGateway Dashboard cutover (repo `~/Desktop/MxAccessGateway`, branch `feat/adopt-zb-theme`)
> Blocked by Task 0.4. Independent of Phases 1–2. Highest risk: combined-layout split + login conversion. **First:** `cd ~/Desktop/MxAccessGateway && git checkout -b feat/adopt-zb-theme`.
### Task 3.1: NuGet wiring + usings (no central PM)
**Classification:** small
**Estimated implement time:** ~3 min
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` — add `<PackageReference Include="ZB.MOM.WW.Theme" Version="0.2.0" />` (explicit version; this repo has no `Directory.Packages.props`)
- Modify: `NuGet.config` — under `dohertj2-gitea` add `<package pattern="ZB.MOM.WW.Theme" />`
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/_Imports.razor` — add `@using ZB.MOM.WW.Theme`
**Verify** restore resolves 0.2.0; commit.
### Task 3.2: App.razor — ThemeHead + ThemeScripts
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 3.3, 3.5
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/App.razor`
**Edits:** Replace `<link rel="stylesheet" href="/css/theme.css" />` with `<ThemeHead />` (keep Bootstrap link above + `/css/site.css` below). Replace `<script src="/js/nav-state.js"></script>` with `<ThemeScripts />`. Keep bootstrap bundle + `blazor.web.js`. Commit. *(Note: `<HeadOutlet>`/`<Routes>` keep their `@rendermode="InteractiveServer"`; ThemeHead/ThemeScripts are static markup and unaffected.)*
### Task 3.3: Migrate app-only CSS, delete theme.css + fonts
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 3.2, 3.5
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/wwwroot/css/site.css` (only if diff surfaces app-only rules)
- Delete: `wwwroot/css/theme.css`, `wwwroot/fonts/ibm-plex-*.woff2` (×3), `wwwroot/js/nav-state.js`
**Steps:** Diff `theme.css` vs kit; migrate app-only rules to `site.css`. The kit's `@font-face` uses the correct relative path (the app's absolute `/fonts/` path is retired). Also delete `wwwroot/js/nav-state.js` here: it is replaced by `ThemeScripts`, and Task 3.2 removes its `<script>` tag. Commit.
### Task 3.4: Split combined MainLayout → thin MainLayout + ThemeShell
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 3.5
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Layout/MainLayout.razor`
- Delete: `Dashboard/Components/Layout/NavSection.razor`
**Context:** The ~211-line combined layout (hamburger + `<nav class="sidebar">` + brand + 3 `NavSection`s + AuthorizeView footer + `<main>`) collapses to a thin `<ThemeShell>` delegation. Reproduce the Runtime / Galaxy / Admin sections + links and the footer (Authorized: user + Sign Out POST `/logout`; NotAuthorized: Sign In `/login`).
```razor
@inherits LayoutComponentBase
<ThemeShell Product="MXAccess Gateway" Accent="#2f5fd0">
    <Nav>
        <NavRailItem Href="/" Text="Dashboard" Match="NavLinkMatch.All" />
        <NavRailSection Title="Runtime" Key="runtime"> … sessions/workers/events/alarms … </NavRailSection>
        <NavRailSection Title="Galaxy" Key="galaxy"> … repository/browse … </NavRailSection>
        <NavRailSection Title="Admin" Key="admin"> … API Keys/settings … </NavRailSection>
    </Nav>
    <RailFooter>
        <AuthorizeView>
            <Authorized>
                <div class="rail-eyebrow">Session</div>
                <span class="rail-user">@context.User.Identity?.Name</span>
                <form method="post" action="/logout" data-enhance="false"><AntiforgeryToken /><button class="rail-btn" type="submit">Sign Out</button></form>
            </Authorized>
            <NotAuthorized><a class="rail-btn" href="/login">Sign In</a></NotAuthorized>
        </AuthorizeView>
    </RailFooter>
    <ChildContent>@Body</ChildContent>
</ThemeShell>
```
Fill the section children from the inventory's exact hrefs/labels. Build the Server project; commit.
### Task 3.5: StatusBadge → StatusPill adapter
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 3.2, 3.3, 3.4
**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Shared/StatusBadge.razor`
**Decision (documented deviation from SPEC §7.5):** `StatusBadge` is used at 12 call sites with a domain `Text`→class switch. Rather than scatter the text→state mapping across 12 pages, **redirect `StatusBadge` to render `<StatusPill>`** — the bespoke `.chip` rendering moves to the kit; only the app's domain text→state mapping (per-project vocabulary, SPEC §6) remains. Call sites stay unchanged.
```razor
@* Thin adapter: maps MxGateway runtime state text → kit StatusPill state. *@
<StatusPill State="MapState(Text)">@Text</StatusPill>
@code {
    [Parameter] public string? Text { get; set; }

    // Domain text → kit state; anything unrecognized falls back to Idle.
    private static StatusState MapState(string? t) => t switch
    {
        "Ready" or "Healthy" or "Active" => StatusState.Ok,
        "Creating" or "StartingWorker" or "WaitingForPipe" or "InitializingWorker" or "Closing" or "Stale" or "Degraded" => StatusState.Warn,
        "Faulted" or "Unavailable" => StatusState.Bad,
        _ => StatusState.Idle,
    };
}
```
Confirm `StatusPill` renders its `ChildContent` as the label and emits `chip chip-ok|warn|bad|idle`. Build; commit. *(If the reviewer insists on literal deletion, the fallback is replacing all 12 call sites with `<StatusPill>` + a shared static `MapState` helper — note it but prefer the adapter.)*
### Task 3.6: Net-new Blazor LoginCard page (reuse existing hardened endpoint)
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on 3.4 for layout/usings context)
**Files:**
- Create: `Dashboard/Components/Layout/LoginLayout.razor`
- Create: `Dashboard/Components/Pages/Login.razor`
- Modify: `Dashboard/DashboardEndpointRouteBuilderExtensions.cs`
**Context (discovered reality):** MxGateway is NOT login-less — it has a working, hardened login: `POST /login` (`PostLoginAsync`) validates antiforgery, calls `IDashboardAuthenticator.AuthenticateAsync` (LDAP via shared Auth → roles → `ZbClaimTypes` principal), `SignInAsync`, then `LocalRedirect(SanitizeReturnUrl(returnUrl))`. The login *UI* is a raw HTML string from `GetLoginAsync`/`RenderLoginPage`. We swap **only the UI** to a Blazor `<LoginCard>` page; the `POST /login` endpoint and authenticator are reused unchanged. A `<form method="post">` posts natively, so the page's render mode is irrelevant to the POST.
**Steps:**
1. `LoginLayout.razor`: `@inherits LayoutComponentBase` + `@Body` (no rail).
2. `Login.razor`:
```razor
@page "/login"
@layout LoginLayout
@using Microsoft.AspNetCore.Authorization
@attribute [AllowAnonymous]
<div class="dashboard-login">
<LoginCard Product="MXAccess Gateway" Action="/login" ReturnUrl="@ReturnUrl" Error="@Error">
<AntiforgeryToken />
</LoginCard>
</div>
@code {
[SupplyParameterFromQuery(Name = "returnUrl")] private string? ReturnUrl { get; set; }
[SupplyParameterFromQuery(Name = "error")] private string? Error { get; set; }
}
```
3. In `DashboardEndpointRouteBuilderExtensions.cs`:
- **Remove** the `MapGet("/login", … GetLoginAsync)` registration and the `GetLoginAsync` + `RenderLoginPage` helpers (the Blazor route now serves `GET /login`; the component carries `[AllowAnonymous]` to override the `RequireAuthorization(ViewerPolicy)` on `MapRazorComponents<App>()`).
- **Change** `PostLoginAsync`'s failure branch from re-rendering HTML to a redirect: `return Results.Redirect($"/login?error={Uri.EscapeDataString(result.FailureMessage ?? "…")}&returnUrl={Uri.EscapeDataString(returnUrl)}");`. Keep antiforgery validation, `SignInAsync`, and the success `LocalRedirect(returnUrl)`.
- Keep `MapPost("/login")`, `/logout` (GET+POST), and `/denied` (still uses `RenderPage`).
4. Build the Server project. **Verify the full flow:** unauthenticated request to `/` → cookie challenge → `/login` renders the Blazor `<LoginCard>` anonymously → POST authenticates → cookie set → redirect. Commit.
### Task 3.7: Build, test, login smoke (MxGateway)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on 3.2–3.6)
**Steps:** `dotnet build src/MxGateway.sln` (+ worker x86 if the suite needs it); `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj` (compare to the 3 baseline FakeWorker reds). Visual + auth smoke: layout renders, nav persists, status pills, **login page renders and a valid/invalid credential round-trips correctly** (the highest-risk surface). Report; do not merge.
---
## Phase 4 — scadaproj docs + memory (branch `docs/ui-theme-adoption`)
### Task 4.1: Update component docs
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (depends on 1.6, 2.6, 3.7)
**Files:**
- Modify: `components/ui-theme/GAPS.md` — add an "✅ ADOPTED 2026-06-03 (local-only)" banner mirroring the Auth/Audit GAPS banners; note persistence promoted to the kit + MxGateway's new LoginCard page.
- Modify: `components/ui-theme/shared-contract/ZB.MOM.WW.Theme.md` — status → "Built + Published `0.2.0`"; document `ThemeScripts` + `NavRailSection.Key` + the nav-state.js asset.
- Modify: `CLAUDE.md` — UI-Theme component row status → "Adopted (lib `0.2.0`; all 3 apps, local feature branches)"; bump the version/test-count prose (38→ new total).
Commit.
### Task 4.2: Update memory
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 4.1
**Files:**
- Create: `/Users/dohertj2/.claude/projects/-Users-dohertj2-Desktop-scadaproj/memory/ui-theme-adoption.md` (project memory: scope, 0.2.0, per-app branches local-only, MxGateway login conversion, persistence-in-kit decision; link `[[component-status-claims-are-optimistic]]`, `[[shared-libs-are-plain-files-not-nested-repos]]`).
- Modify: `…/memory/MEMORY.md` — add the index line.
Commit.
### Task 4.3: Final integration review
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none
Dispatch a final reviewer across all three `feat/adopt-zb-theme` diffs + the scadaproj Phase 0/4 diff: confirm SPEC §7 acceptance per app, no app-only CSS lost, no regressions vs baseline, and the cross-app consistency of the shell/nav/login. Produce a go/no-go for the merge+push decision (which remains the user's call).
---
## Dependency summary
- `0.1, 0.2` → `0.3` → `0.4`.
- `0.4` blocks `1.1`, `2.1`, `3.1`.
- Within Phase 1: `1.1` → {`1.2`, `1.3`, `1.4`, `1.5`} (parallel) → `1.6`.
- Within Phase 2: `2.1` → {`2.2`, `2.3`, `2.4`, `2.5`} (parallel) → `2.6`.
- Within Phase 3: `3.1` → {`3.2`, `3.3`, `3.5`} (parallel) and `3.1` → `3.4` → `3.6`; all → `3.7`.
- `{1.6, 2.6, 3.7}` → `4.1`, `4.2` → `4.3`.
- Phases 1/2/3 are independent repos (may run concurrently; listed in risk order).
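The ordering above can be derived mechanically from `blockedBy` edges. The following is a sketch only: `planWaves` is a hypothetical helper, and the sample covers just Phase 0 and the Phase 3 chain, using the ids 44–47 and 60–66 that the plan's task-tracking file assigns to those tasks.

```javascript
// Group tasks into waves: a task joins the first wave in which all of its
// blockedBy ids have already been scheduled. Throws if no task can make
// progress (a cycle, or a blockedBy id that is not in the list).
function planWaves(tasks) {
  var done = new Set();
  var remaining = tasks.slice();
  var waves = [];
  while (remaining.length > 0) {
    var ready = remaining.filter(function (t) {
      return (t.blockedBy || []).every(function (id) { return done.has(id); });
    });
    if (ready.length === 0) throw new Error("cycle or unknown blockedBy id");
    ready.forEach(function (t) { done.add(t.id); });
    remaining = remaining.filter(function (t) { return !done.has(t.id); });
    waves.push(ready.map(function (t) { return t.id; }));
  }
  return waves;
}

// Phase 0 (44-47) plus the Phase 3 chain (60-66).
var tasks = [
  { id: 44 }, { id: 45 },
  { id: 46, blockedBy: [44, 45] },
  { id: 47, blockedBy: [46] },
  { id: 60, blockedBy: [47] },
  { id: 61, blockedBy: [60] }, { id: 62, blockedBy: [60] },
  { id: 63, blockedBy: [60] }, { id: 64, blockedBy: [60] },
  { id: 65, blockedBy: [63] },
  { id: 66, blockedBy: [61, 62, 63, 64, 65] }
];
// Prints [[44,45],[46],[47],[60],[61,62,63,64],[65],[66]]
console.log(JSON.stringify(planWaves(tasks)));
```

Each inner array is a set of tasks that could run concurrently; the wave containing 61–64 corresponds to the parallel group after Task 3.1, with 3.6 (id 65) waiting on 3.4 (id 63).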
@@ -0,0 +1,32 @@
{
"planPath": "docs/plans/2026-06-03-ui-theme-adoption.md",
"tasks": [
{"id": 44, "subject": "Task 0.1: NavRailSection persistence key", "status": "pending"},
{"id": 45, "subject": "Task 0.2: nav-state.js enhancer + ThemeScripts", "status": "pending"},
{"id": 46, "subject": "Task 0.3: Bump 0.2.0 + full suite", "status": "pending", "blockedBy": [44, 45]},
{"id": 47, "subject": "Task 0.4: Publish 0.2.0 to Gitea feed", "status": "pending", "blockedBy": [46]},
{"id": 48, "subject": "Task 1.1: OtOpcUa NuGet wiring + usings", "status": "pending", "blockedBy": [47]},
{"id": 49, "subject": "Task 1.2: OtOpcUa App.razor ThemeHead/ThemeScripts", "status": "pending", "blockedBy": [48]},
{"id": 50, "subject": "Task 1.3: OtOpcUa migrate CSS, delete theme.css/fonts/nav-state.js", "status": "pending", "blockedBy": [48]},
{"id": 51, "subject": "Task 1.4: OtOpcUa MainLayout to ThemeShell + kit nav", "status": "pending", "blockedBy": [48]},
{"id": 52, "subject": "Task 1.5: OtOpcUa delete dead StatusBadge + Login to LoginCard", "status": "pending", "blockedBy": [48]},
{"id": 53, "subject": "Task 1.6: OtOpcUa build/test/visual checklist", "status": "pending", "blockedBy": [49, 50, 51, 52]},
{"id": 54, "subject": "Task 2.1: ScadaBridge NuGet wiring + usings", "status": "pending", "blockedBy": [47]},
{"id": 55, "subject": "Task 2.2: ScadaBridge Host App.razor ThemeHead/ThemeScripts", "status": "pending", "blockedBy": [54]},
{"id": 56, "subject": "Task 2.3: ScadaBridge migrate CSS, delete theme.css/fonts/nav-state.js", "status": "pending", "blockedBy": [54]},
{"id": 57, "subject": "Task 2.4: ScadaBridge MainLayout/NavMenu to ThemeShell (preserve AuthorizeView/DialogHost/SessionExpiry)", "status": "pending", "blockedBy": [54]},
{"id": 58, "subject": "Task 2.5: ScadaBridge Login to LoginCard", "status": "pending", "blockedBy": [54]},
{"id": 59, "subject": "Task 2.6: ScadaBridge build/test/visual checklist", "status": "pending", "blockedBy": [55, 56, 57, 58]},
{"id": 60, "subject": "Task 3.1: MxGateway NuGet wiring + usings (no central PM)", "status": "pending", "blockedBy": [47]},
{"id": 61, "subject": "Task 3.2: MxGateway App.razor ThemeHead/ThemeScripts", "status": "pending", "blockedBy": [60]},
{"id": 62, "subject": "Task 3.3: MxGateway migrate CSS, delete theme.css/fonts/nav-state.js", "status": "pending", "blockedBy": [60]},
{"id": 63, "subject": "Task 3.4: MxGateway split combined MainLayout to ThemeShell", "status": "pending", "blockedBy": [60]},
{"id": 64, "subject": "Task 3.5: MxGateway StatusBadge to StatusPill adapter", "status": "pending", "blockedBy": [60]},
{"id": 65, "subject": "Task 3.6: MxGateway net-new Blazor LoginCard page", "status": "pending", "blockedBy": [63]},
{"id": 66, "subject": "Task 3.7: MxGateway build/test/login smoke", "status": "pending", "blockedBy": [61, 62, 63, 64, 65]},
{"id": 67, "subject": "Task 4.1: Update component docs", "status": "pending", "blockedBy": [53, 59, 66]},
{"id": 68, "subject": "Task 4.2: Update memory", "status": "pending", "blockedBy": [53, 59, 66]},
{"id": 69, "subject": "Task 4.3: Final integration review", "status": "pending", "blockedBy": [67, 68]}
],
"lastUpdated": "2026-06-03"
}
@@ -0,0 +1,141 @@
# Shared GLAuth Standardization — Design
> **Status:** IMPLEMENTED + verified 2026-06-04 (all 18 plan tasks). See `shared-glauth-on-35` memory.
> Plan: [`2026-06-04-shared-glauth-standardization.md`](2026-06-04-shared-glauth-standardization.md).
> **Scope:** dev/test only. Production stays on real corporate AD (out of scope).
## Goal
Consolidate the three sister projects (OtOpcUa, MxAccessGateway, ScadaBridge) onto **one shared
GLAuth dev directory** running on the shared Docker host **`10.100.0.35:3893`**, replacing the
three separate LDAP setups in use today. This is the natural endpoint of the Auth-component
normalization: all three already use the shared `ZB.MOM.WW.Auth.Ldap` library (search-then-bind)
and already default to the same base DN `dc=zb,dc=local`.
## Decisions (locked during brainstorming)
| Decision | Choice |
|---|---|
| Environments | **Dev/test only** (prod → real AD, untouched) |
| Consolidation depth | **Full** — every dev instance points at 35 |
| Transport | **Plaintext** (`Transport=None`, `AllowInsecure=true`) — trusted lab subnet |
| Source of truth | **`scadaproj/infra/glauth/`** (app-neutral, next to the other shared `ZB.MOM.WW.*` components) — Approach A |
## Architecture
```
scadaproj/infra/glauth/ ← single source of truth (git)
├── config.toml (merged dc=zb,dc=local directory)
├── docker-compose.yml (one `glauth` service, :3893)
└── README.md
│ deploy on 10.100.0.35: docker compose up -d
GLAuth @ 10.100.0.35:3893 · datastore=config · baseDN dc=zb,dc=local · ldaps=false
▲ ▲
plaintext bind │ (None + AllowInsecure) │
┌──────────────┴───────────┐ ┌─────────┴─────────────────────┐
Mac / OrbStack │ windev (10.100.0.48)
• ScadaBridge :9000/:9100 │ • MxGateway (MxAccessGw svc)
• OtOpcUa docker-dev │ • OtOpcUa (OtOpcUa svc)
(un-stubbed)
```
- One `glauth` container on `10.100.0.35:3893`, `datastore=config`, `baseDN=dc=zb,dc=local`, ldaps disabled.
- Every dev consumer: `Server=10.100.0.35`, `Port=3893`, `Transport=None`, `AllowInsecure=true`, `SearchBase=dc=zb,dc=local`.
- **Retired:** the `scadabridge-ldap` container (ScadaBridge `infra/docker-compose.yml`) and the windev-local glauth (`C:\publish\glauth`).
- **Consequences:** windev gains a runtime dependency on 35 for *new* logins (existing cookie sessions unaffected); deploying to 35 needs working access (see Prerequisites).
## The merged directory
One `dc=zb,dc=local` directory; group families partitioned into **non-overlapping gid ranges** (today
both existing GLAuth files reuse 5501–5505, which is the collision to fix). **Each app maps only its own family
and ignores the rest**, so the families coexist with zero conflict.
**Groups**
| Family | Used by | Groups (gidnumber) |
|---|---|---|
| `SCADA-*` (55xx) | ScadaBridge roles (DB-mapped) | Admins 5501, Designers 5502, Deploy-All 5503, Deploy-SiteA 5504, Viewers 5505 |
| OPC-perm (560x) | OtOpcUa + MxGateway OPC-UA write model | ReadOnly 5601, WriteOperate 5602, WriteTune 5603, WriteConfigure 5604, AlarmAck 5605 |
| `Gw*` (561x) | MxGateway dashboard (config-mapped) | GwAdmin 5610, GwReader 5611 |
| `OtOpcUa-*` (57xx) | OtOpcUa AdminUI (DB-mapped) | Admins 5701, Designers 5702, Viewers 5703 |
`SCADA-*` keeps its canonical 55xx numbers (already deployed). The OPC/`Gw` groups move off the old
5501–5505/5510 into 56xx to clear the clash.
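The partitioning claim can be checked mechanically. The following is a minimal sketch, assuming the gid numbers are exactly those in the table above; the names `FAMILIES` and `duplicate_gids` are illustrative and not part of any of the apps.

```python
from collections import Counter

# Group families and gid numbers copied from the Groups table.
FAMILIES = {
    "SCADA-*": [5501, 5502, 5503, 5504, 5505],
    "OPC-perm": [5601, 5602, 5603, 5604, 5605],
    "Gw*": [5610, 5611],
    "OtOpcUa-*": [5701, 5702, 5703],
}


def duplicate_gids(families):
    """Return the sorted gid numbers that appear more than once across all families."""
    counts = Counter(gid for gids in families.values() for gid in gids)
    return sorted(gid for gid, n in counts.items() if n > 1)


if __name__ == "__main__":
    # An empty list means no two groups share a gid number.
    print(duplicate_gids(FAMILIES))
```

Running the same function over the two pre-merge GLAuth files' gid lists would surface the 55xx reuse that motivates the move to 56xx.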
**Users** (all password `password`; uid ranges 50xx ScadaBridge / 51xx MxGateway / 52xx OtOpcUa)
- **`serviceaccount`** (5999, `cn=serviceaccount,dc=zb,dc=local`, `search *` capability) — the *single*
bind account every app uses. Password `serviceaccount123`. ScadaBridge moves to it from `cn=admin`/`password`.
- **`multi-role`** (5005) — member of **every** group → all roles in all three apps (canonical cross-app QA login).
- **`admin`** (5001) — `SCADA-Admins` + `GwAdmin` + `OtOpcUa-Admins` → Administrator everywhere.
- Per-role testers: `designer`/`deployer`/`site-deployer` (ScadaBridge); `gwreader` (MxGateway Viewer);
`otdesigner`/`otviewer` (OtOpcUa); `readonly`/`writeop`/`writetune`/`writeconfig`/`alarmack` (OPC perms).
## Per-app config changes
Each consumer changes only its LDAP `Server` (+ a few keys). Shared service account
`cn=serviceaccount,dc=zb,dc=local` / `serviceaccount123`.
- **ScadaBridge** (`docker/` + `docker-env2/`, central-node-a & -b `appsettings.Central.json`):
`Ldap:Server` `scadabridge-ldap` → `10.100.0.35`; `ServiceAccountDn` `cn=admin` → `cn=serviceaccount`,
`ServiceAccountPassword` → `serviceaccount123`. Rest unchanged (`SCADA-*` DB mappings already seeded).
Retire the `ldap` service in `infra/docker-compose.yml`; sequenced-recreate central nodes.
- **OtOpcUa docker-dev** (`docker-dev/docker-compose.yml`, all host containers) — **the un-stub**:
drop `Security__Ldap__DevStubMode=true`; add `Server=10.100.0.35`, `Port=3893`, `Transport=None`,
`AllowInsecure=true`, `SearchBase=dc=zb,dc=local`, `ServiceAccountDn=cn=serviceaccount,…`,
`ServiceAccountPassword=serviceaccount123`. Seed OtOpcUa DB mappings
`OtOpcUa-Admins→Administrator`, `OtOpcUa-Designers→Designer`, `OtOpcUa-Viewers→Viewer` (system-wide).
- **MxGateway** (windev `C:\publish\mxaccessgw\Server\appsettings.json`): `Ldap:Server`
`localhost` → `10.100.0.35`; `SearchBase` `dc=lmxopcua` → `dc=zb,dc=local`; `ServiceAccountDn` → `…dc=zb,dc=local`.
`Transport=None`/`AllowInsecure=true` already migrated; `GroupToRole` (`GwAdmin`/`GwReader`) unchanged.
Restart `MxAccessGw` (+ dependent `OtOpcUa` svc).
- **OtOpcUa (windev service)**: locate its deployed overlay; repoint `Server` → `10.100.0.35`,
`SearchBase` → `dc=zb,dc=local`, service account, and switch dev transport `Ldaps` → `None`+`AllowInsecure`.
- **Then** stop/disable the windev-local `glauth` service.
## Rollout & rollback
Incremental; **the old glauths stay up until the very end**, so every step is reversible by pointing
`Server` back.
1. Stand up the shared glauth on 35 → verify via `ldapsearch` (bind `serviceaccount`; `multi-role`
`memberOf` spans all families). Nothing repointed yet.
2. Prove reachability from an OrbStack container to `10.100.0.35:3893` (the linchpin) before any app edit.
3. ScadaBridge `:9000` → recreate → browser-verify `multi-role` = 4 roles. Then `:9100`.
4. OtOpcUa docker-dev → un-stub + repoint + seed → recreate → verify.
5. windev MxGateway (backup appsettings) → restart → verify. Then windev OtOpcUa overlay.
6. Only once all green: stop/disable `scadabridge-ldap` + the windev-local glauth.
**Rollback** per consumer: revert the one-line `Server` change (git revert on the Mac; `.bak` restore on
windev) and recreate/restart. Remove the shared glauth = `docker compose down` on 35.
## Testing & verification
- **LDAP layer:** `ldapsearch` bind `serviceaccount`; confirm each test user + `multi-role`'s `memberOf`
across all four families; bind each user to confirm `password`.
- **Per-app browser (macbook Chrome):** ScadaBridge `:9000`/`:9100` `multi-role` → 4 roles (via
`/auth/token`); OtOpcUa `:9200` → seeded roles; MxGateway `10.100.0.48:5130` → Administrator; windev OtOpcUa → AdminUI.
- **Role-gating spot-checks:** `gwreader`→MxGateway Viewer-only; `designer`→ScadaBridge design-only;
`otviewer`→OtOpcUa read-only.
- **Negative:** wrong password rejected everywhere; a user in no family of an app → denied there.
## Prerequisites & open items (resolve in the plan)
1. **Access to `10.100.0.35`** — SSH from this Mac is currently refused (`Permission denied`/connection
reset) and the windev→35 jump is administratively prohibited. Either re-authorize this Mac's key on 35,
or the user runs the final `docker compose up -d`. Artifacts are portable either way.
2. **OtOpcUa group key shape** — confirm OtOpcUa maps on the **short RDN** (`OtOpcUa-Admins`) the shared
lib returns vs the full-DN its `LdapGroupRoleMapping` entity comment shows, before seeding.
3. **OrbStack→LAN reachability** — verify ScadaBridge/OtOpcUa containers can reach `10.100.0.35:3893`
early (likely fine; it's the linchpin). If any consumer can't reach 35, stop and report it rather than letting logins fail silently.
4. **windev OtOpcUa config path** — discovery step (less is known about this deployment than MxGateway).
## Notes
- `scadaproj` is a plain-files umbrella that is *also* a local git repo; `infra/glauth/` lives here as the
canonical source. Per-app config edits land on a `feat/*` branch per repo (merge on the user's go).
windev edits are deployment-only with `.bak` backups (like the GroupToRole / LDAP-key migrations done
2026-06-04); repo templates optionally aligned.
- Related memory: `multi-role-cross-app-test-user`, `mxgateway-windev-deploy`,
`scadabridge-local-deploy-gotchas`, `auth-audit-normalization-in-progress`.
@@ -0,0 +1,610 @@
# Shared GLAuth Standardization — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Consolidate OtOpcUa, MxAccessGateway, and ScadaBridge **dev/test** auth onto one shared GLAuth directory at `10.100.0.35:3893` (`dc=zb,dc=local`, plaintext), replacing the three separate LDAP setups.
**Architecture:** A single app-neutral GLAuth `config` directory lives in `scadaproj/infra/glauth/` (source of truth) and runs as one container on the shared Docker host `10.100.0.35`. Group families are partitioned into non-overlapping gid ranges (`SCADA-*` 55xx, OPC-perm/`Gw*` 56xx, `OtOpcUa-*` 57xx); each app maps only its own family. Every dev consumer just repoints its LDAP `Server` at `10.100.0.35`. Rollout is incremental and keeps the old glauths running until each consumer is verified.
**Tech Stack:** GLAuth (`glauth/glauth:latest`, TOML `config` datastore), Docker Compose / OrbStack (Mac) + Docker on `10.100.0.35`, .NET 10 apps using the shared `ZB.MOM.WW.Auth.Ldap` (search-then-bind), MSSQL config DBs, Windows/NSSM services on windev (`10.100.0.48`), `ldapsearch` + Chrome (macbook) for verification.
**Design:** [`2026-06-04-shared-glauth-standardization-design.md`](2026-06-04-shared-glauth-standardization-design.md)
**Reference values**
- `password` → sha256 `5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8`
- `serviceaccount123` → sha256 `af29d0e5c9801ae98a999ed3915e1cf428a64b4b62b3cf221b6336cce0398419`
- Shared service account: `cn=serviceaccount,dc=zb,dc=local` / `serviceaccount123`
- All consumer LDAP keys: `Server=10.100.0.35 Port=3893 Transport=None AllowInsecure=true SearchBase=dc=zb,dc=local`
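The reference hashes can be recomputed before they go into `config.toml`. This is a small sketch, assuming `passsha256` holds the lowercase hex SHA-256 of the UTF-8 password (which is what the values above look like); the helper name `glauth_passsha256` is made up for illustration.

```python
import hashlib


def glauth_passsha256(password: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Compare the printed digests against the reference values by eye or with diff.
    for pw in ("password", "serviceaccount123"):
        print(pw, glauth_passsha256(pw))
```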
**Branching:** scadaproj artifacts on the current `docs/shared-glauth-standardization` branch. Per-app config edits on a `feat/shared-glauth` branch in each app repo (ScadaBridge, OtOpcUa). windev edits are deployment-only (`.bak` backups), repo templates optionally aligned. Merge on the user's go.
**Operational caveats (read first):**
- **`10.100.0.35` access is currently blocked from this Mac** (SSH refused; windev→35 jump prohibited). **Task 4 is a hard gate** — it needs either this Mac's key re-authorized on 35 *or* the user to run the `docker compose up`. The artifact is portable.
- Tasks that recreate running clusters (ScadaBridge, OtOpcUa) and touch the **live windev host** are operational; their "tests" are `ldapsearch`/`curl`/browser checks with exact expected output. Sequence cluster recreates seed-first to avoid Akka split-brain.
---
## Phase 0 — Author + deploy the shared GLAuth
### Task 0: Write the merged GLAuth `config.toml`
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 2
**Files:**
- Create: `/Users/dohertj2/Desktop/scadaproj/infra/glauth/config.toml`
**Step 1: Write the file** with this exact content (merged `dc=zb,dc=local` directory; gid families partitioned; `multi-role` is in every group):
```toml
[ldap]
enabled = true
listen = "0.0.0.0:3893"
[ldaps]
enabled = false
[backend]
datastore = "config"
baseDN = "dc=zb,dc=local"
[behaviors]
# Dev: do not lock out on failed binds (avoids surprises during testing).
LimitFailedBinds = false
# ── Groups ───────────────────────────────────────────────────────────
# ScadaBridge role groups (55xx) — DB-mapped (LdapGroupMappings)
[[groups]]
name = "SCADA-Admins"
gidnumber = 5501
[[groups]]
name = "SCADA-Designers"
gidnumber = 5502
[[groups]]
name = "SCADA-Deploy-All"
gidnumber = 5503
[[groups]]
name = "SCADA-Deploy-SiteA"
gidnumber = 5504
[[groups]]
name = "SCADA-Viewers"
gidnumber = 5505
# OPC-UA permission groups (560x) — OtOpcUa + MxGateway OPC write model
[[groups]]
name = "ReadOnly"
gidnumber = 5601
[[groups]]
name = "WriteOperate"
gidnumber = 5602
[[groups]]
name = "WriteTune"
gidnumber = 5603
[[groups]]
name = "WriteConfigure"
gidnumber = 5604
[[groups]]
name = "AlarmAck"
gidnumber = 5605
# MxGateway dashboard groups (561x) — config-mapped (GroupToRole)
[[groups]]
name = "GwAdmin"
gidnumber = 5610
[[groups]]
name = "GwReader"
gidnumber = 5611
# OtOpcUa AdminUI role groups (57xx) — DB-mapped (LdapGroupRoleMapping)
[[groups]]
name = "OtOpcUa-Admins"
gidnumber = 5701
[[groups]]
name = "OtOpcUa-Designers"
gidnumber = 5702
[[groups]]
name = "OtOpcUa-Viewers"
gidnumber = 5703
# ── Users ────────────────────────────────────────────────────────────
# All passwords are "password" except serviceaccount ("serviceaccount123").
# sha256("password") = 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
# sha256("serviceaccount123") = af29d0e5c9801ae98a999ed3915e1cf428a64b4b62b3cf221b6336cce0398419
# The single bind account every app uses (search-then-bind).
[[users]]
name = "serviceaccount"
uidnumber = 5999
primarygroup = 5601
passsha256 = "af29d0e5c9801ae98a999ed3915e1cf428a64b4b62b3cf221b6336cce0398419"
[[users.capabilities]]
action = "search"
object = "*"
# Cross-app: member of EVERY group → all roles in all three apps.
[[users]]
name = "multi-role"
givenname = "Multi"
sn = "Role"
mail = "multi-role@zb.local"
uidnumber = 5005
primarygroup = 5501
othergroups = [5502, 5503, 5504, 5505, 5601, 5602, 5603, 5604, 5605, 5610, 5611, 5701, 5702, 5703]
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
# Administrator everywhere (admin-equivalent of each app).
[[users]]
name = "admin"
uidnumber = 5001
primarygroup = 5501
othergroups = [5610, 5701]
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
# ScadaBridge single-role testers
[[users]]
name = "designer"
uidnumber = 5002
primarygroup = 5502
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "deployer"
uidnumber = 5003
primarygroup = 5503
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "site-deployer"
uidnumber = 5004
primarygroup = 5504
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
# MxGateway dashboard Viewer tester
[[users]]
name = "gwreader"
uidnumber = 5106
primarygroup = 5611
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
# OPC-UA permission testers
[[users]]
name = "readonly"
uidnumber = 5101
primarygroup = 5601
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "writeop"
uidnumber = 5102
primarygroup = 5602
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "writetune"
uidnumber = 5103
primarygroup = 5603
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "writeconfig"
uidnumber = 5104
primarygroup = 5604
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "alarmack"
uidnumber = 5105
primarygroup = 5605
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
# OtOpcUa single-role testers (admin covers OtOpcUa-Admins)
[[users]]
name = "otdesigner"
uidnumber = 5202
primarygroup = 5702
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
[[users]]
name = "otviewer"
uidnumber = 5203
primarygroup = 5703
passsha256 = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
```
**Step 2: Verify TOML parses** (sanity, no network):
Run: `python3 -c "import tomllib,sys; tomllib.load(open('/Users/dohertj2/Desktop/scadaproj/infra/glauth/config.toml','rb')); print('OK')"`
Expected: `OK`
---
### Task 1: Write the GLAuth `docker-compose.yml`
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 0, Task 2
**Files:**
- Create: `/Users/dohertj2/Desktop/scadaproj/infra/glauth/docker-compose.yml`
**Step 1: Write** (single service, bind-mount the config read-only, publish 3893 on all interfaces so cross-host clients reach it):
```yaml
# Shared dev GLAuth for OtOpcUa + MxAccessGateway + ScadaBridge.
# Deploy on the shared Docker host 10.100.0.35: docker compose up -d
# Verify: ldapsearch -x -H ldap://10.100.0.35:3893 \
# -D cn=serviceaccount,dc=zb,dc=local -w serviceaccount123 \
# -b dc=zb,dc=local "(cn=multi-role)" memberOf
name: zb-shared-glauth
services:
glauth:
image: glauth/glauth:latest
container_name: zb-shared-glauth
restart: unless-stopped
ports:
- "3893:3893"
volumes:
- ./config.toml:/app/config/config.cfg:ro
```
---
### Task 2: Write the `README.md` (deploy + verify runbook)
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 0, Task 1
**Files:**
- Create: `/Users/dohertj2/Desktop/scadaproj/infra/glauth/README.md`
**Step 1: Write** a runbook covering: purpose (shared dev directory for all 3 apps); the merged directory's group families + gid ranges + the canonical users (`multi-role`/`admin`/`serviceaccount` + per-role testers); **deploy on `10.100.0.35`** (`scp -r infra/glauth dohertj2@10.100.0.35:~/zb-glauth && ssh dohertj2@10.100.0.35 'cd ~/zb-glauth && docker compose up -d'`) with the note that this Mac's SSH access to 35 must be working (else the user runs it); and the **verification** `ldapsearch` commands (bind `serviceaccount`, confirm `multi-role`'s `memberOf` spans all four families; bind each tester). Include the "to add a user/group, edit `config.toml` and `docker compose up -d --force-recreate` (the single-file bind-mount needs a recreate, not a restart)" gotcha.
---
### Task 3: Commit Phase 0 artifacts
**Classification:** trivial
**Estimated implement time:** ~1 min
**Parallelizable with:** none
**Files:** (commit only) — `/Users/dohertj2/Desktop/scadaproj/infra/glauth/*`
**Step 1:** From `/Users/dohertj2/Desktop/scadaproj` (already on `docs/shared-glauth-standardization`):
```bash
git add infra/glauth/config.toml infra/glauth/docker-compose.yml infra/glauth/README.md
git commit -m "feat(glauth): merged shared dev GLAuth directory + compose + runbook (10.100.0.35)"
```
---
### Task 4: Deploy to `10.100.0.35` and verify the directory ⟵ HARD GATE / ACCESS-PREREQUISITE
**Classification:** high-risk
**Estimated implement time:** ~5 min (blocked on 35 access)
**Parallelizable with:** none
**Files:** none (operational)
**Step 1: Resolve access.** Confirm `ssh dohertj2@10.100.0.35 'echo ok'` works. If it does NOT (currently the case from this Mac), STOP and either (a) have the user re-authorize this Mac's key on 35, or (b) hand the user `infra/glauth/` + the deploy command to run on 35. Do not proceed past this gate until GLAuth is up on 35.
**Step 2: Deploy** (once access works). Copy the FILES into the dest dir (not the dir itself) so a
re-deploy doesn't nest them at `~/zb-glauth/glauth/` (the `scp -r dir-into-existing-dir` trap):
```bash
ssh dohertj2@10.100.0.35 'mkdir -p ~/zb-glauth'
scp /Users/dohertj2/Desktop/scadaproj/infra/glauth/config.toml \
/Users/dohertj2/Desktop/scadaproj/infra/glauth/docker-compose.yml \
dohertj2@10.100.0.35:~/zb-glauth/
ssh dohertj2@10.100.0.35 'cd ~/zb-glauth && docker compose up -d --force-recreate && docker ps --filter name=zb-shared-glauth'
```
Expected: `zb-shared-glauth` container `Up`.
**Step 3 (test): Verify the directory** from the Mac via a throwaway ldap client:
```bash
docker run --rm alpine:3.20 sh -c 'apk add --no-progress -q openldap-clients >/dev/null 2>&1 && \
ldapsearch -x -H ldap://10.100.0.35:3893 -D "cn=serviceaccount,dc=zb,dc=local" -w serviceaccount123 \
-b "dc=zb,dc=local" "(cn=multi-role)" memberOf'
```
Expected: `result: 0 Success` and `memberOf` listing all four families — `SCADA-*`, `ReadOnly/Write*/AlarmAck`, `GwAdmin/GwReader`, `OtOpcUa-*`.
**Step 4 (test): Confirm a user binds with `password`:**
```bash
docker run --rm alpine:3.20 sh -c 'apk add --no-progress -q openldap-clients >/dev/null 2>&1 && \
ldapsearch -x -H ldap://10.100.0.35:3893 -D "cn=multi-role,dc=zb,dc=local" -w password \
-b "dc=zb,dc=local" "(cn=multi-role)" cn 2>&1 | grep -i "result:"'
```
Expected: `result: 50 Insufficient access` (bind OK — search denied because multi-role lacks the search capability; a *bad* password would give `result: 49`).
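The reasoning in Step 4 (code 50 means the bind worked, code 49 means it did not) can be captured in a few lines. This is a sketch only: it assumes the `result: N ...` line format that `ldapsearch` prints, and the function names are illustrative. The code meanings are the standard LDAP result codes.

```python
import re

# Standard LDAP result codes relevant to the checks above.
MEANING = {0: "success", 49: "invalidCredentials", 50: "insufficientAccessRights"}


def result_code(ldapsearch_output: str):
    """Extract the numeric code from a 'result: N ...' line, or None if absent."""
    match = re.search(r"^result:\s*(\d+)", ldapsearch_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def bind_succeeded(code) -> bool:
    """The bind worked if the search succeeded (0) or was refused only for lack of rights (50)."""
    return code in (0, 50)
```

For example, `bind_succeeded(result_code("result: 50 Insufficient access"))` is true, while a `result: 49` line yields false.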
---
## Phase 1 — ScadaBridge repoint (Mac docker)
### Task 5: Repoint the 4 ScadaBridge central-node configs
**Classification:** standard
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 10, Task 11, Task 14, Task 15 (different repos/hosts)
**Files (4 identical edits):**
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/docker/central-node-a/appsettings.Central.json` (`Ldap` block ~lines 25–32)
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/docker/central-node-b/appsettings.Central.json`
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/docker-env2/central-node-a/appsettings.Central.json`
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/docker-env2/central-node-b/appsettings.Central.json`
**Step 1:** In each file's `Ldap` block, change three keys (leave `Port`, `Transport`, `AllowInsecure`, `SearchBase` as-is — already `3893`/`None`/`true`/`dc=zb,dc=local`):
- `"Server": "scadabridge-ldap"` → `"Server": "10.100.0.35"`
- `"ServiceAccountDn": "cn=admin,dc=zb,dc=local"` → `"ServiceAccountDn": "cn=serviceaccount,dc=zb,dc=local"`
- `"ServiceAccountPassword": "password"` → `"ServiceAccountPassword": "serviceaccount123"`
**Step 2 (test): Confirm all four files updated:**
Run: `grep -l '"Server": "10.100.0.35"' /Users/dohertj2/Desktop/ScadaBridge/docker*/central-node-*/appsettings.Central.json | wc -l`
Expected: `4`
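If the four edits are scripted rather than made by hand, the change amounts to updating three keys and leaving everything else alone. A minimal sketch, assuming the `Ldap` object sits at the top level of the parsed JSON (its actual nesting in `appsettings.Central.json` is not confirmed here) and that the file is plain JSON without comments; `repoint_ldap` is a hypothetical helper.

```python
import copy

# The three values Step 1 changes; all other Ldap keys stay as they are.
NEW_VALUES = {
    "Server": "10.100.0.35",
    "ServiceAccountDn": "cn=serviceaccount,dc=zb,dc=local",
    "ServiceAccountPassword": "serviceaccount123",
}


def repoint_ldap(settings: dict) -> dict:
    """Return a copy of settings with the three Ldap keys updated; other keys untouched."""
    updated = copy.deepcopy(settings)
    updated["Ldap"].update(NEW_VALUES)
    return updated


if __name__ == "__main__":
    before = {"Ldap": {"Server": "scadabridge-ldap", "Port": 3893,
                       "ServiceAccountDn": "cn=admin,dc=zb,dc=local",
                       "ServiceAccountPassword": "password"}}
    print(repoint_ldap(before)["Ldap"])
```

Writing the result back with `json.dump` would reformat the file, so a hand edit may still be preferable when a minimal git diff matters.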
---
### Task 6: Retire the `scadabridge-ldap` service + prove OrbStack→35 reachability
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 5
**Files:**
- Modify: `/Users/dohertj2/Desktop/ScadaBridge/infra/docker-compose.yml` (the `ldap:` service, lines ~44–50)
**Step 1 (test FIRST — the linchpin): Verify a container on `scadabridge-net` can reach `10.100.0.35:3893`** before retiring anything:
```bash
docker run --rm --network scadabridge-net alpine:3.20 sh -c \
'apk add --no-progress -q openldap-clients >/dev/null 2>&1 && \
ldapsearch -x -H ldap://10.100.0.35:3893 -D "cn=serviceaccount,dc=zb,dc=local" -w serviceaccount123 -b "dc=zb,dc=local" "(cn=admin)" cn 2>&1 | grep -i "result:"'
```
Expected: `result: 0 Success`. **If unreachable, STOP** — fix networking (OrbStack→LAN) before repointing; do not retire the local glauth.
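When the `ldapsearch` check fails, it helps to separate "port unreachable" from "LDAP said no". A plain TCP probe does that; the sketch below uses only the standard library, and `tcp_reachable` is an illustrative name. It proves network reachability only, not that GLAuth answers binds.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable hosts.
        return False

# Example (run from inside a container on scadabridge-net):
#   tcp_reachable("10.100.0.35", 3893)
```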
**Step 2:** Comment out (do not delete — keep for rollback) the `ldap:` service block in `infra/docker-compose.yml`. Stop the old container: `docker stop scadabridge-ldap`. (Leave it stopped, not removed, until Phase 4.)
---
### Task 7: Recreate the `:9000` cluster central nodes + browser-verify
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational)
**Parallelizable with:** none
**Files:** none (operational)
**Step 1:** Recreate the two central nodes to pick up the new config (seed-first to avoid split-brain — recreate `central-a`, wait healthy, then `central-b`):
```bash
cd /Users/dohertj2/Desktop/ScadaBridge/docker && docker compose up -d --force-recreate --no-deps central-node-a
# wait until central-a is serving, then:
docker compose up -d --force-recreate --no-deps central-node-b
```
**Step 2 (test): Token endpoint shows all four roles** (re-runs the full LDAP auth against 35):
```bash
curl -s -m10 -X POST http://localhost:9000/auth/token --data-urlencode username=multi-role --data-urlencode password=password
```
Expected JSON contains `"roles":["Administrator","Designer","Deployer","Viewer"]`.
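Comparing roles as a set avoids a false failure if the endpoint returns them in a different order. A minimal sketch, assuming the response body is a JSON object with a top-level `roles` array as the expected output suggests; `missing_roles` is a hypothetical helper.

```python
import json

EXPECTED_ROLES = {"Administrator", "Designer", "Deployer", "Viewer"}


def missing_roles(token_response: str, expected=EXPECTED_ROLES):
    """Return the expected roles absent from the response, sorted; empty means pass."""
    body = json.loads(token_response)
    return sorted(set(expected) - set(body.get("roles", [])))


if __name__ == "__main__":
    sample = '{"roles": ["Viewer", "Administrator", "Designer"]}'
    print(missing_roles(sample))
```

Feed it the output of the `curl` call above; an empty list means `multi-role` resolved to all four roles.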
**Step 3 (test): Browser** (Chrome macbook) — sign out, log in `multi-role`/`password` at `http://localhost:9000/login`; expect the dashboard with ADMIN + DESIGN + DEPLOYMENT nav sections.
---
### Task 8: Recreate the `:9100` cluster central nodes + verify
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational)
**Parallelizable with:** none
**Files:** none (operational)
**Step 1:** As Task 7 but in `/Users/dohertj2/Desktop/ScadaBridge/docker-env2` (recreate `central-node-a` then `-b`).
**Step 2 (test):** `curl -s -m10 -X POST http://localhost:9100/auth/token --data-urlencode username=multi-role --data-urlencode password=password` → `"roles":["Administrator","Designer","Deployer","Viewer"]`.
---
### Task 9: Commit ScadaBridge edits on a branch
**Classification:** trivial
**Estimated implement time:** ~1 min
**Parallelizable with:** none
**Files:** (commit) the 4 central-node json + `infra/docker-compose.yml`
**Step 1:**
```bash
cd /Users/dohertj2/Desktop/ScadaBridge && git checkout -b feat/shared-glauth
git add docker/central-node-*/appsettings.Central.json docker-env2/central-node-*/appsettings.Central.json infra/docker-compose.yml
git commit -m "feat(auth): point dev clusters at shared GLAuth 10.100.0.35; retire local scadabridge-ldap"
```
(Do not merge/push — wait for the user's go.)
---
## Phase 2 — OtOpcUa docker-dev un-stub (Mac docker)
### Task 10: Confirm group-key shape, then add `LdapGroupRoleMapping` seed rows
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 5, Task 14, Task 15
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/docker-dev/seed/seed-clusters.sql`
- Read (gate): `/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.Ldap/LdapAuthService.cs`
**Step 1 (gate): Confirm the runtime group string is the bare RDN** (`OtOpcUa-Admins`), not a full DN. Read `LdapAuthService.cs` and find where it builds the returned `Groups` from `memberOf`; confirm it strips each DN to its first RDN *value*. Cross-check: ScadaBridge's DB mappings use bare `SCADA-Admins` and work today against the same glauth `groupformat=ou` (so memberOf is `ou=SCADA-Admins,...` → returned as `SCADA-Admins`). Conclusion to lock: seed `LdapGroup = 'OtOpcUa-Admins'` (bare). If the code instead returns full DNs, STOP and seed the full DN form — but the evidence says bare.
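For reference while reading the C# code, the transformation being looked for is roughly the following. This is an illustration of "strip each DN to its first RDN value", not the library's actual implementation; it ignores escaped commas, and the example DN tail is assumed.

```python
def first_rdn_value(dn: str) -> str:
    """Return the value of the leftmost RDN, e.g. 'ou=X,dc=zb,dc=local' -> 'X'."""
    first_rdn = dn.split(",", 1)[0]
    _, sep, value = first_rdn.partition("=")
    # If there is no '=', the input is already a bare name; return it unchanged.
    return value.strip() if sep else dn.strip()


if __name__ == "__main__":
    print(first_rdn_value("ou=OtOpcUa-Admins,ou=groups,dc=zb,dc=local"))
```

If `LdapAuthService.cs` does something equivalent, the bare `OtOpcUa-Admins` seed key is the right shape.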
**Step 2:** Append idempotent INSERTs to `seed-clusters.sql` (table `dbo.LdapGroupRoleMapping`; `Role` stored as the enum NAME string; system-wide rows ⇒ `ClusterId = NULL`, `IsSystemWide = 1`):
```sql
-- Shared-GLAuth dev: OtOpcUa AdminUI role mappings (system-wide).
-- Group keys are the BARE RDN names the shared ZB.MOM.WW.Auth.Ldap returns.
IF NOT EXISTS (SELECT 1 FROM dbo.LdapGroupRoleMapping WHERE LdapGroup = 'OtOpcUa-Admins' AND ClusterId IS NULL)
INSERT INTO dbo.LdapGroupRoleMapping (Id, LdapGroup, Role, ClusterId, IsSystemWide, CreatedAtUtc, Notes)
VALUES (NEWID(), 'OtOpcUa-Admins', 'Administrator', NULL, 1, SYSUTCDATETIME(), 'shared-glauth dev seed');
IF NOT EXISTS (SELECT 1 FROM dbo.LdapGroupRoleMapping WHERE LdapGroup = 'OtOpcUa-Designers' AND ClusterId IS NULL)
INSERT INTO dbo.LdapGroupRoleMapping (Id, LdapGroup, Role, ClusterId, IsSystemWide, CreatedAtUtc, Notes)
VALUES (NEWID(), 'OtOpcUa-Designers', 'Designer', NULL, 1, SYSUTCDATETIME(), 'shared-glauth dev seed');
IF NOT EXISTS (SELECT 1 FROM dbo.LdapGroupRoleMapping WHERE LdapGroup = 'OtOpcUa-Viewers' AND ClusterId IS NULL)
INSERT INTO dbo.LdapGroupRoleMapping (Id, LdapGroup, Role, ClusterId, IsSystemWide, CreatedAtUtc, Notes)
VALUES (NEWID(), 'OtOpcUa-Viewers', 'Viewer', NULL, 1, SYSUTCDATETIME(), 'shared-glauth dev seed');
```
---
### Task 11: Un-stub the OtOpcUa docker-dev host containers
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 5, Task 14, Task 15
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/docker-dev/docker-compose.yml` (the 6 admin/site containers: `admin-a` ~L100, `admin-b` ~L117, `site-a-1` ~L170, `site-a-2` ~L193, `site-b-1` ~L215, `site-b-2` ~L238)
**Step 1:** In each of the 6 containers' `environment:`, replace the single `Security__Ldap__DevStubMode: "true"` line with the real-LDAP block:
```yaml
Security__Ldap__Enabled: "true"
Security__Ldap__DevStubMode: "false"
Security__Ldap__Server: "10.100.0.35"
Security__Ldap__Port: "3893"
Security__Ldap__Transport: "None"
Security__Ldap__AllowInsecure: "true"
Security__Ldap__SearchBase: "dc=zb,dc=local"
Security__Ldap__ServiceAccountDn: "cn=serviceaccount,dc=zb,dc=local"
Security__Ldap__ServiceAccountPassword: "serviceaccount123"
```
(Driver-only `driver-a`/`driver-b` have no LDAP block — leave them.)
**Step 2 (test): Confirm 6 containers updated, 0 DevStub left:**
Run: `grep -c 'Security__Ldap__Server: "10.100.0.35"' /Users/dohertj2/Desktop/OtOpcUa/docker-dev/docker-compose.yml` → `6`; and `grep -c 'DevStubMode: "true"' …/docker-compose.yml` → `0`.
---
### Task 12: Apply seed + recreate `otopcua-dev` + browser-verify
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational)
**Parallelizable with:** none
**Files:** none (operational)
**Step 1: Apply the new mapping rows to the running config DB** (host port 14330):
```bash
# Extract all three appended statements: from the first OtOpcUa-Admins line through the OtOpcUa-Viewers VALUES line.
docker exec otopcua-dev-sql-1 /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'OtOpcUa!Dev123' -No -d OtOpcUa -Q "$(sed -n '/OtOpcUa-Admins/,/VALUES.*OtOpcUa-Viewers/p' /Users/dohertj2/Desktop/OtOpcUa/docker-dev/seed/seed-clusters.sql)"
```
(or simpler: re-run the seed container `docker compose -f docker-dev/docker-compose.yml up cluster-seed`). Verify: `… -Q "SELECT LdapGroup,Role FROM dbo.LdapGroupRoleMapping WHERE IsSystemWide=1"` → the 3 OtOpcUa-* rows.
**Step 2: Recreate the 6 admin/site host containers** (seed-first per cluster — recreate `admin-a` then `admin-b`; `site-a-1` then `site-a-2`; `site-b-1` then `site-b-2`):
```bash
cd /Users/dohertj2/Desktop/OtOpcUa/docker-dev
for n in admin-a admin-b site-a-1 site-a-2 site-b-1 site-b-2; do docker compose up -d --force-recreate --no-deps $n; sleep 3; done
```
**Step 3 (test): Browser** — log in `multi-role`/`password` at `http://localhost:9200/login`; expect the AdminUI Overview, SESSION panel showing `multi-role` + **Administrator** (from `OtOpcUa-Admins`→Administrator). Confirms the un-stub + real bind + DB mapping all work.
---
### Task 13: Commit OtOpcUa edits on a branch
**Classification:** trivial
**Estimated implement time:** ~1 min
**Parallelizable with:** none
**Files:** (commit) `docker-dev/docker-compose.yml`, `docker-dev/seed/seed-clusters.sql`
**Step 1:**
```bash
cd /Users/dohertj2/Desktop/OtOpcUa && git checkout -b feat/shared-glauth
git add docker-dev/docker-compose.yml docker-dev/seed/seed-clusters.sql
git commit -m "feat(auth): un-stub docker-dev onto shared GLAuth 10.100.0.35 + seed OtOpcUa-* role mappings"
```
(Do not merge/push.)
---
## Phase 3 — windev repoint + retire windev-local glauth (live host)
### Task 14: Repoint MxGateway (windev) at the shared GLAuth
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational, live host)
**Parallelizable with:** Task 5, Task 10, Task 11
**Files:** (windev, deployment-only) `C:\publish\mxaccessgw\Server\appsettings.json` (`MxGateway:Ldap`)
**Step 1: Back up** `appsettings.json` → `appsettings.json.bak-20260604-glauth35` (skip if exists).
**Step 2: Edit `MxGateway:Ldap`** (literal replacements; preserve the rest, incl. the `Transport=None`/`AllowInsecure=true` migrated 2026-06-04, and `GroupToRole`):
- `"Server": "localhost"` → `"Server": "10.100.0.35"`
- `"SearchBase": "dc=lmxopcua,dc=local"` → `"SearchBase": "dc=zb,dc=local"`
- `"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local"` → `"ServiceAccountDn": "cn=serviceaccount,dc=zb,dc=local"`
(`ServiceAccountPassword` stays `serviceaccount123`.) Use a `-File` PowerShell script (`[IO.File]::WriteAllText` after `.Replace(...)`), validate JSON parses.
**Step 3:** `Restart-Service MxAccessGw -Force; Start-Service OtOpcUa` (cascades to the dependent OtOpcUa svc — start it back).
**Step 4 (test):** From the Mac, `POST http://10.100.0.48:5130/auth/login` (GET `/login` for the antiforgery token+cookie first) with `username=multi-role&password=password` → `302 Location: /` (success). Browser-verify the dashboard logs in as `multi-role` (Administrator).
---
### Task 15: Repoint OtOpcUa (windev service) + switch transport to plaintext
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational, live host)
**Parallelizable with:** Task 5, Task 10, Task 11
**Files:** (windev, deployment-only) `C:\publish\lmxopcua\appsettings.json` (`Security:Ldap`) — **discover any per-role overlay first** (`appsettings.admin.json`/`appsettings.driver.json` in `C:\publish\lmxopcua\` or `C:\publish\lmxopcua-admin\`; the live binary is `C:\publish\lmxopcua\OtOpcUa.Server.exe`).
**Step 1: Discovery** — `Get-ChildItem C:\publish\lmxopcua\appsettings*.json` and inspect which file holds the live `Security:Ldap` (base + any `appsettings.admin.json` overlay that sets `Transport=Ldaps`). Back up whatever you edit.
**Step 2: Edit `Security:Ldap`** in the live config (and the admin overlay if present):
- `Server` → `10.100.0.35`; `SearchBase` → `dc=zb,dc=local`; `Transport` `Ldaps` → `None`; add/set `AllowInsecure` → `true`; `ServiceAccountDn` → `cn=serviceaccount,dc=zb,dc=local`, `ServiceAccountPassword` → `serviceaccount123`; ensure `DevStubMode=false`.
**Step 3:** `Restart-Service OtOpcUa` (note the dependency direction: per Task 14, `OtOpcUa` is the dependent of `MxAccessGw`, so restarting OtOpcUa alone should leave MxAccessGw running; if it does stop, follow up with `Start-Service MxAccessGw`; verify both Running).
**Step 4 (test):** Browser-verify the windev OtOpcUa AdminUI logs in as `multi-role` → Administrator. (Locate its dashboard URL during discovery.)
---
### Task 16: Stop/disable the windev-local glauth
**Classification:** small
**Estimated implement time:** ~2 min (operational)
**Parallelizable with:** none
**Files:** none (windev service)
**Step 1 (only after Tasks 14 + 15 verify green):** `Stop-Service glauth; Set-Service glauth -StartupType Manual` (disable autostart but keep installed for rollback). Keep `C:\publish\glauth\glauth.cfg` + the `glauth.cfg.bak-multirole-20260604` backup in place.
**Step 2 (test):** Re-run Task 14/15 logins once more to confirm windev auth still works with the local glauth down (proves they're truly on 35).
---
## Phase 4 — Final verification + housekeeping
### Task 17: Full cross-app verification matrix
**Classification:** high-risk
**Estimated implement time:** ~5 min (operational)
**Parallelizable with:** none
**Files:** none (operational)
**Step 1 (positive):** `multi-role`/`password` logs in on all five surfaces — ScadaBridge `:9000` + `:9100` (4 roles via `/auth/token`), OtOpcUa `:9200` (Administrator), MxGateway `10.100.0.48:5130` (Administrator), windev OtOpcUa.
**Step 2 (role-gating):** `gwreader`/`password` → MxGateway dashboard **Viewer-only** (no API-Keys/Settings admin pages); `designer`/`password` → ScadaBridge design nav but not ADMIN; `otviewer`/`password` → OtOpcUa read-only.
**Step 3 (negative):** wrong password rejected on every surface; a `SCADA-*`-only user (`designer`) gets **denied** on the MxGateway dashboard (no `Gw*` group). Record each result.
---
### Task 18: Update memory, design status, and finalize branches
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**Files:**
- Update memory: `multi-role-cross-app-test-user.md` (now backed by the shared 35 GLAuth), `mxgateway-windev-deploy.md` + `scadabridge-local-deploy-gotchas.md` (repointed to 35), add a new `shared-glauth-on-35.md` (the directory layout, gid families, deploy/verify runbook, access caveat) + `MEMORY.md` index lines.
- Update: design doc status → "implemented".
- (Optional) align repo template appsettings (MxGateway/ScadaBridge) on the `feat/shared-glauth` branches so a clean redeploy doesn't reintroduce old keys.
**Step 1:** Write the memory updates. **Step 2:** Mark the design doc implemented. **Step 3:** Summarize branch state (scadaproj `docs/shared-glauth-standardization`; app `feat/shared-glauth` branches committed, not merged) and ask the user about merging.
---
## Execution notes
- **Phases 1, 2, 3 are independent** after Task 4 (different repos/hosts) — their first tasks (5, 10/11, 14/15) are mutually `Parallelizable`. Within a phase, recreate/verify tasks are sequential.
- Old glauths stay up until Tasks 6/16; every repoint is reversible by reverting the one-line `Server` change and recreating/restarting.
- Several tasks are **operational** (recreate clusters, live windev, the 35 deploy) — not code-with-unit-tests; their "tests" are the exact `ldapsearch`/`curl`/browser checks given.
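The task ordering can be sanity-checked mechanically. The sketch below assumes the `blockedBy` edges recorded in this plan's task tracker (`"blocker task"` pairs, transcribed by hand) and feeds them to the standard `tsort` utility; the function name `task_order` is illustrative only.

```bash
#!/usr/bin/env bash
# Print one valid execution order for Tasks 0-18.
# Each heredoc line is "blocker task": the first ID must finish before the second.
task_order() {
  tsort <<'EOF'
0 3
1 3
2 3
3 4
4 5
4 6
5 7
6 7
7 8
7 9
8 9
4 10
4 11
10 12
11 12
12 13
4 14
4 15
14 16
15 16
7 17
8 17
12 17
14 17
15 17
16 17
17 18
9 18
13 18
EOF
}

task_order
```

`tsort` prints only one of several valid orders. Tasks with no edge between them (5 and 6, 10 and 11, 14 and 15) can land in either order, which matches the parallelism described in the notes above.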
@@ -0,0 +1,25 @@
{
"planPath": "docs/plans/2026-06-04-shared-glauth-standardization.md",
"tasks": [
{"id": 0, "subject": "Task 0: Write merged GLAuth config.toml", "status": "completed"},
{"id": 1, "subject": "Task 1: Write GLAuth docker-compose.yml", "status": "completed"},
{"id": 2, "subject": "Task 2: Write GLAuth README runbook", "status": "completed"},
{"id": 3, "subject": "Task 3: Commit Phase 0 artifacts", "status": "completed", "blockedBy": [0, 1, 2]},
{"id": 4, "subject": "Task 4: Deploy to 10.100.0.35 + verify directory (GATE)", "status": "completed", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Repoint 4 ScadaBridge central-node configs", "status": "completed", "blockedBy": [4]},
{"id": 6, "subject": "Task 6: Retire scadabridge-ldap + prove OrbStack->35 reachability", "status": "completed", "blockedBy": [4]},
{"id": 7, "subject": "Task 7: Recreate :9000 central nodes + browser-verify", "status": "completed", "blockedBy": [5, 6]},
{"id": 8, "subject": "Task 8: Recreate :9100 central nodes + verify", "status": "completed", "blockedBy": [7]},
{"id": 9, "subject": "Task 9: Commit ScadaBridge edits on feat/shared-glauth", "status": "completed", "blockedBy": [7, 8]},
{"id": 10, "subject": "Task 10: Confirm group-key shape + seed OtOpcUa-* mappings", "status": "completed", "blockedBy": [4]},
{"id": 11, "subject": "Task 11: Un-stub OtOpcUa docker-dev host containers", "status": "completed", "blockedBy": [4]},
{"id": 12, "subject": "Task 12: Apply seed + recreate otopcua-dev + verify", "status": "completed", "blockedBy": [10, 11]},
{"id": 13, "subject": "Task 13: Commit OtOpcUa edits on feat/shared-glauth", "status": "completed", "blockedBy": [12]},
{"id": 14, "subject": "Task 14: Repoint MxGateway (windev) at shared GLAuth", "status": "completed", "blockedBy": [4]},
{"id": 15, "subject": "Task 15: Repoint OtOpcUa (windev) [resolved by discovery: headless OPC server, no LDAP login]", "status": "completed", "blockedBy": [4]},
{"id": 16, "subject": "Task 16: Stop/disable windev-local glauth", "status": "completed", "blockedBy": [14, 15]},
{"id": 17, "subject": "Task 17: Full cross-app verification matrix", "status": "completed", "blockedBy": [7, 8, 12, 14, 15, 16]},
{"id": 18, "subject": "Task 18: Update memory, design status, finalize branches", "status": "completed", "blockedBy": [17, 9, 13]}
],
"lastUpdated": "2026-06-04"
}
@@ -0,0 +1,122 @@
# ZB.MOM.WW.SPHistorianClient — Design
**Date:** 2026-06-19
**Status:** Approved — proceeding to implementation plan.
## Goal
Repackage the proven, pure-managed .NET 10 `AVEVA.Historian.Client` SDK (delivered in
`HistorianSDK_2023R2/histsdk-migration.zip` from `10.100.0.48`) as the family-branded shared
library **`ZB.MOM.WW.SPHistorianClient`** (System Platform Historian Client), following the same
conventions as the other `ZB.MOM.WW.*` shared libraries in this repo.
## Context — what the source bundle contains
`histsdk-migration.zip` → `histsdk-migration/`:
- `histsdk/` — the SDK git repo. `src/AVEVA.Historian.Client/` is a **pure-managed .NET 10** client
for AVEVA Historian (no `aahClientManaged.dll` / `aahClient.dll` / native AVEVA runtime — the wire
protocol is reverse-engineered and re-implemented in C#). ~165–188 unit + gated-live tests pass.
- `analysis-2023r2/` — reverse-engineering analysis (recovered protos, decompiled stock contract,
transport writeup). **Kept separate from the repo on purpose.**
Two transport families exist in the SDK:
| Transport | Protocol | Platform | Verification |
|---|---|---|---|
| `LocalPipe`, `RemoteTcpIntegrated`, `RemoteTcpCertificate` | WCF/MDAS (2020) | **Windows-only** | **live-verified**: raw/aggregate(16 modes)/at-time/event reads, browse, metadata, status, `EnsureTag`/`DeleteTag` |
| `RemoteGrpc` | gRPC (2023 R2) | cross-platform (Grpc.Net.Client/.Web) | unit-tested; **not yet live-verified** against a real 2023 R2 server (`ExchangeKey` auth step unproven) |
## Decisions (locked)
1. **Approach: port + rebrand.** Copy the SDK source into `ZB.MOM.WW.SPHistorianClient`, rename the
root namespace, adopt ZB conventions, bring the unit tests, drop non-shippable artifacts. One
coherent shared library — a published package should not ship a third-party (AVEVA) namespace or
non-redistributable reverse-engineering artifacts.
2. **Transports: both WCF + gRPC.** Ship everything that works. WCF members keep
`[SupportedOSPlatform("windows")]`; the gRPC path runs anywhere. No working code discarded.
3. **Not a "component normalization."** There is no duplicated historian code across the three apps
to converge — this is a net-new shared library that simply follows ZB packaging conventions.
## Repository layout
Plain files committed into this repo (NOT a nested git repo — see the
`shared-libs-are-plain-files-not-nested-repos` convention):
```
ZB.MOM.WW.SPHistorianClient/
Directory.Build.props # net10.0, Nullable, ImplicitUsings, LangVersion latest, Version 0.1.0, central pkg mgmt
Directory.Packages.props # central PackageVersion entries
ZB.MOM.WW.SPHistorianClient.slnx
CLAUDE.md README.md .gitignore
src/ZB.MOM.WW.SPHistorianClient/ # the single package
HistorianClient.cs, HistorianClientOptions.cs, HistorianTransport.cs
Models/ Protocol/ Transport/ Wcf/ Wcf/Contracts/ Grpc/ Grpc/Protos/*.proto
DependencyInjection/AddZbSpHistorianClient (ZB-idiomatic DI extension)
tests/ZB.MOM.WW.SPHistorianClient.Tests/ # offline unit/golden-byte + gated-live integration
artifacts/ # dotnet pack output
```
## Port mechanics
- Copy `src/AVEVA.Historian.Client/` and `tests/AVEVA.Historian.Client.Tests/` from the bundle.
- Rename the C# root namespace `AVEVA.Historian.Client` → `ZB.MOM.WW.SPHistorianClient` across all
files: 74 `namespace` declarations spanning the root + 6 sub-namespaces
(`.Models`, `.Wcf`, `.Wcf.Contracts`, `.Protocol`, `.Transport`, `.Grpc`), all `using` directives,
and the `InternalsVisibleTo` to the test assembly. Drop the `InternalsVisibleTo` to
`AVEVA.Historian.ReverseEngineering` (tool not shipped).
- **Leave the proto wire contracts untouched:** the 6 `Grpc/Protos/*.proto` keep
`option csharp_namespace = "ArchestrA.Grpc.Contract.*"` — that is AVEVA's wire contract, not ours.
`Grpc.Tools` keeps generating the client stubs at build.
- Convert inline `PackageReference` versions to central management in `Directory.Packages.props`,
matching the `ZB.MOM.WW.Telemetry` template.
## Dependencies
- **Library:** `Google.Protobuf`, `Grpc.Net.Client`, `Grpc.Net.Client.Web`, `Grpc.Tools` (build-only,
`PrivateAssets=all`), `System.ServiceModel.NetNamedPipe`, `System.ServiceModel.NetTcp`,
`System.Security.Cryptography.Xml`. Add `Microsoft.Extensions.DependencyInjection.Abstractions` +
`Microsoft.Extensions.Options` for the DI extension.
- **Tests:** `xunit`, `xunit.runner.visualstudio`, `Microsoft.NET.Test.Sdk`, `coverlet.collector`,
`Microsoft.Data.SqlClient` (SQL post-check tests).
## Excluded (safety / non-redistributable / Windows-native)
- `tools/` reverse-engineering harnesses (.NET Framework, reference native AVEVA binaries).
- `analysis-2023r2/decompiled/` — proprietary AVEVA decompilations (not redistributable).
- `scripts/` — Frida / PowerShell / Python capture tooling.
- `docs/reverse-engineering/` — identity-bearing `.ndjson` / capture evidence.
**Kept:** the recovered `.proto` files (needed to build), the offline unit tests, and a sanitized
architecture/surface summary folded into `CLAUDE.md` / `README.md`. `.gitignore` blocks the
identity-bearing patterns (`*.ndjson`, `current/`, `aveva-install-*/`, `artifacts/`-raw, etc.).
## Public surface (preserved 1:1)
`HistorianClient` + `HistorianClientOptions` façade; `Models/*`; `HistorianTransport` enum
(`LocalPipe` / `RemoteTcpIntegrated` / `RemoteTcpCertificate` / `RemoteGrpc`); operations:
`ProbeAsync`, `ReadRawAsync` / `ReadAggregateAsync` / `ReadAtTimeAsync`, `ReadEventsAsync`,
`BrowseTagNamesAsync`, `GetTagMetadataAsync`, status calls, `EnsureTagAsync` / `DeleteTagAsync`.
**One ZB-idiomatic addition:** `AddZbSpHistorianClient(...)` DI extension mirroring `AddZbTelemetry`
— thin: binds `HistorianClientOptions` and registers `HistorianClient`. Optional to consumers.
## Cross-platform & testing posture
- WCF members already carry `[SupportedOSPlatform("windows")]`; the library builds and unit-tests on
macOS/Linux. gRPC path is portable.
- Offline unit/golden-byte tests run anywhere. Live integration tests stay gated by `HISTORIAN_*`
env vars and skip cleanly when unset.
- Verify `dotnet build` + `dotnet test` pass locally (macOS) before finishing.
## Packaging
`dotnet pack -c Release -o ./artifacts` → `ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg`. Gitea URLs in
package metadata. **Not pushed/published** to any feed unless explicitly requested.
## Out of scope (this pass)
- Wiring `ZB.MOM.WW.SPHistorianClient` into any consumer (e.g. OtOpcUa Phase C HistoryRead) — a
separate follow-on.
- Live-verifying the gRPC `RemoteGrpc` path against a real 2023 R2 server.
- Writing samples (`AddS2`) — architecturally blocked in the source SDK; remains out of scope.
@@ -0,0 +1,569 @@
# ZB.MOM.WW.SPHistorianClient Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
**Goal:** Repackage the proven, pure-managed .NET 10 `AVEVA.Historian.Client` SDK from the migration bundle as the family-branded shared library `ZB.MOM.WW.SPHistorianClient`, following the same conventions as the other `ZB.MOM.WW.*` libraries in this repo.
**Architecture:** This is a **port + rebrand**, not a rewrite. Copy the SDK `src/` and `tests/` into a new `ZB.MOM.WW.SPHistorianClient/` directory, rewrite the C# root namespace `AVEVA.Historian.Client` → `ZB.MOM.WW.SPHistorianClient` (leaving the proto-generated `ArchestrA.Grpc.Contract.*` wire contracts untouched), adopt ZB conventions (`Directory.Build.props` / `Directory.Packages.props` central package management, `.slnx`, `CLAUDE.md`/`README.md`), drop the non-shippable reverse-engineering tooling and proprietary decompilations, add one ZB-idiomatic DI extension, then build/test/pack.
**Tech Stack:** .NET 10, C# (net10.0), WCF/MDAS (`System.ServiceModel.*`, Windows-only transports), gRPC (`Grpc.Net.Client` + `Grpc.Tools`, cross-platform 2023 R2 transport), xUnit. Central package management.
**Design doc:** `docs/plans/2026-06-19-sphistorianclient-design.md`
**Branch:** `feat/sphistorianclient` (already created; design doc already committed at `bbb7942`).
---
## Source bundle location (read-only inputs)
The SDK source lives in an extracted bundle under `/tmp`:
- Extracted root: `/tmp/histsdk/extracted/histsdk-migration/histsdk/`
- SDK source: `…/histsdk/src/AVEVA.Historian.Client/` — **74 `.cs` + 6 `.proto`**
- SDK tests: `…/histsdk/tests/AVEVA.Historian.Client.Tests/` — **25 `.cs`**
- Re-extract fallback (if `/tmp` was cleaned): `cd /tmp/histsdk && unzip -o -q histsdk-migration.zip -d extracted`
**Never copy:** `tools/` (RE harnesses, .NET Framework + native AVEVA refs), `analysis-2023r2/decompiled/` (proprietary, non-redistributable), `scripts/`, `docs/reverse-engineering/` (identity-bearing captures), `bin/`/`obj/`, the bundle's `.git/`, and the bundle's original `.csproj` files (we author fresh ZB ones).
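A guard for the never-copy list could look like the sketch below. It is an illustration, not part of the bundle: the helper name `find_forbidden` is hypothetical, and it matches on directory and file names only. Run it against the port destination before the first build, since `bin/` and `obj/` appear after a build.

```bash
#!/usr/bin/env bash
# Hypothetical guard: print every path under "$1" that matches the never-copy list
# (RE tooling, decompilations, capture scripts, capture evidence, a nested .git).
find_forbidden() {
  local root="$1"
  find "$root" \( \
      \( -type d \( -name tools -o -name decompiled -o -name scripts \
                    -o -name reverse-engineering -o -name .git \) \) \
      -o \( -type f -name '*.ndjson' \) \
    \) -print
}

# Usage (empty output means nothing forbidden was copied):
#   find_forbidden /Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.SPHistorianClient
```

Matching by name keeps the check simple but can raise false positives if the SDK ever gains a legitimate `scripts` or `tools` source folder, so treat any hit as a prompt to inspect rather than as an automatic failure.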
**Gotchas baked into this plan (from prior repo experience):**
- Do **not** set `TreatWarningsAsErrors`: the WCF/SSPI code carries `[SupportedOSPlatform("windows")]` and will emit `CA1416` platform-compatibility warnings on macOS that must stay warnings.
- Central package management means **no inline `Version=` on any `PackageReference`** (that is `NU1008`). All versions live in `Directory.Packages.props`.
- `Microsoft.Data.SqlClient` may surface an `NU1903` advisory on restore. Without `TreatWarningsAsErrors` it is a warning. If a restore ever hard-fails on it, add `-p:NuGetAudit=false` to the build/test command.
- macOS `sed -i` requires an explicit empty backup arg: `sed -i '' 's/…/…/g'`.
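If the rewrite ever needs to run on Linux as well as macOS, the `sed -i` difference can be hidden behind a small wrapper. This is a sketch under the assumption that GNU sed answers `--version` and BSD sed does not; the function name `sed_inplace` is illustrative.

```bash
#!/usr/bin/env bash
# Portable in-place sed.
# GNU sed accepts `--version` and a bare `-i`;
# BSD sed (macOS) rejects `--version` and needs an explicit empty suffix: `-i ''`.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"
  else
    sed -i '' "$@"
  fi
}

# Usage: the same substitution as the namespace rewrite, applied to one file.
#   sed_inplace 's/AVEVA\.Historian\.Client/ZB.MOM.WW.SPHistorianClient/g' Some.cs
```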
---
## Task 1: Scaffold the library skeleton
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (every later task depends on this)
**Files:**
- Create: `ZB.MOM.WW.SPHistorianClient/Directory.Build.props`
- Create: `ZB.MOM.WW.SPHistorianClient/Directory.Packages.props`
- Create: `ZB.MOM.WW.SPHistorianClient/.gitignore`
- Create: `ZB.MOM.WW.SPHistorianClient/ZB.MOM.WW.SPHistorianClient.slnx`
- Create: `ZB.MOM.WW.SPHistorianClient/src/ZB.MOM.WW.SPHistorianClient/ZB.MOM.WW.SPHistorianClient.csproj`
- Create: `ZB.MOM.WW.SPHistorianClient/tests/ZB.MOM.WW.SPHistorianClient.Tests/ZB.MOM.WW.SPHistorianClient.Tests.csproj`
**Step 1: `Directory.Build.props`** (mirrors `ZB.MOM.WW.Telemetry/Directory.Build.props`)
```xml
<Project>
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<Version>0.1.0</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
</Project>
```
**Step 2: `Directory.Packages.props`** (versions lifted verbatim from the bundle's two `.csproj` files)
```xml
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<!-- Historian SDK runtime deps (WCF/MDAS transports — Windows-only at runtime) -->
<PackageVersion Include="System.Security.Cryptography.Xml" Version="10.0.7" />
<PackageVersion Include="System.ServiceModel.NetNamedPipe" Version="10.0.652802" />
<PackageVersion Include="System.ServiceModel.NetTcp" Version="10.0.652802" />
<!-- 2023 R2 gRPC transport (cross-platform) -->
<PackageVersion Include="Google.Protobuf" Version="3.24.4" />
<PackageVersion Include="Grpc.Net.Client" Version="2.58.0" />
<PackageVersion Include="Grpc.Net.Client.Web" Version="2.58.0" />
<PackageVersion Include="Grpc.Tools" Version="2.59.0" />
<!-- ZB-idiomatic DI extension (only non-BCL lib dependency) -->
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.7" />
<!-- Test -->
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
<PackageVersion Include="Microsoft.Data.SqlClient" Version="6.0.2" />
</ItemGroup>
</Project>
```
**Step 3: `.gitignore`**
```gitignore
bin/
obj/
# identity-bearing / non-redistributable — never commit
*.ndjson
current/
aveva-install-*/
```
**Step 4: `ZB.MOM.WW.SPHistorianClient.slnx`**
```xml
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.SPHistorianClient/ZB.MOM.WW.SPHistorianClient.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.SPHistorianClient.Tests/ZB.MOM.WW.SPHistorianClient.Tests.csproj" />
</Folder>
</Solution>
```
**Step 5: `src/ZB.MOM.WW.SPHistorianClient/ZB.MOM.WW.SPHistorianClient.csproj`**
(Derived from the bundle's `AVEVA.Historian.Client.csproj`: inline versions removed for central
management; ZB package metadata added; `InternalsVisibleTo` retargeted to the ZB test assembly and
the `…ReverseEngineering` one dropped; proto glob uses forward slashes for cross-platform MSBuild.)
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.SPHistorianClient</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Pure-managed .NET 10 client for AVEVA System Platform Historian (Wonderware) for the ZB.MOM.WW SCADA family. The wire protocol is reverse-engineered and re-implemented in C# — no native AVEVA runtime dependency. Surfaces history reads (raw / aggregate / at-time / event), tag browse + metadata, status, and tag create/delete over the WCF/MDAS transports (Windows) plus a cross-platform gRPC transport for 2023 R2.</Description>
<PackageTags>aveva;wonderware;historian;system-platform;scada;timeseries;grpc;wcf;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-sphistorianclient</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-sphistorianclient</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="System.Security.Cryptography.Xml" />
<PackageReference Include="System.ServiceModel.NetNamedPipe" />
<PackageReference Include="System.ServiceModel.NetTcp" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" />
</ItemGroup>
<!-- 2023 R2 gRPC transport (RemoteGrpc). Pure-managed: Grpc.Net.Client + Google.Protobuf.
Grpc.Tools is build-only (PrivateAssets=all) and generates the client stubs from the
recovered contract under Grpc/Protos at build. -->
<ItemGroup>
<PackageReference Include="Google.Protobuf" />
<PackageReference Include="Grpc.Net.Client" />
<PackageReference Include="Grpc.Net.Client.Web" />
<PackageReference Include="Grpc.Tools">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
</ItemGroup>
<ItemGroup>
<Protobuf Include="Grpc/Protos/*.proto" GrpcServices="Client" ProtoRoot="Grpc/Protos" />
</ItemGroup>
<ItemGroup>
<AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleToAttribute">
<_Parameter1>ZB.MOM.WW.SPHistorianClient.Tests</_Parameter1>
</AssemblyAttribute>
</ItemGroup>
</Project>
```
**Step 6: `tests/ZB.MOM.WW.SPHistorianClient.Tests/ZB.MOM.WW.SPHistorianClient.Tests.csproj`**
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="Microsoft.Data.SqlClient" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.SPHistorianClient\ZB.MOM.WW.SPHistorianClient.csproj" />
</ItemGroup>
</Project>
```
**Step 7: Verify the skeleton is well-formed (build will fail — no sources yet — that is expected)**
Run: `cd ZB.MOM.WW.SPHistorianClient && dotnet restore ZB.MOM.WW.SPHistorianClient.slnx`
Expected: restore **succeeds** (proves the props/csproj XML and central package versions resolve). A
follow-up `dotnet build` would fail only because no `.cs` exist yet — do not build here.
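A quick pre-restore check for the `NU1008` gotcha might look like this. It is a sketch: the helper name `find_inline_versions` is hypothetical, and it only catches the attribute form (`Version="…"` on the `PackageReference` element), not a child `<Version>` element.

```bash
#!/usr/bin/env bash
# Print every .csproj line under "$1" where a PackageReference carries an inline Version=.
# With central package management these lines trigger NU1008 on restore.
find_inline_versions() {
  grep -rn --include='*.csproj' -E '<PackageReference[^>]*[[:space:]]Version=' "$1" || true
}

# Usage (empty output means all versions live in Directory.Packages.props):
#   find_inline_versions ZB.MOM.WW.SPHistorianClient
```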
**Step 8: Commit**
```bash
cd /Users/dohertj2/Desktop/scadaproj
git add ZB.MOM.WW.SPHistorianClient/
git commit -m "feat(sphistorianclient): scaffold shared library skeleton (props, csprojs, slnx)"
```
---
## Task 2: Port source + tests with namespace rewrite
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**Blocked by:** Task 1
**Files:**
- Create (scripted copy): `ZB.MOM.WW.SPHistorianClient/src/ZB.MOM.WW.SPHistorianClient/**/*.{cs,proto}` (74 `.cs` + 6 `.proto`)
- Create (scripted copy): `ZB.MOM.WW.SPHistorianClient/tests/ZB.MOM.WW.SPHistorianClient.Tests/**/*.cs` (25 `.cs`)
This task is a deterministic copy + namespace rewrite — run the script, then verify counts.
**Step 1: Copy + rewrite (single script)**
```bash
set -euo pipefail
BUNDLE=/tmp/histsdk/extracted/histsdk-migration/histsdk
DEST=/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.SPHistorianClient
# Guard: re-extract if /tmp was cleaned
if [ ! -d "$BUNDLE/src/AVEVA.Historian.Client" ]; then
cd /tmp/histsdk && unzip -o -q histsdk-migration.zip -d extracted
fi
# --- src: copy .cs + .proto, preserving subdirs (NOT the old .csproj) ---
SRC="$DEST/src/ZB.MOM.WW.SPHistorianClient"
cd "$BUNDLE/src/AVEVA.Historian.Client"
find . \( -name '*.cs' -o -name '*.proto' \) | while read -r f; do
mkdir -p "$SRC/$(dirname "$f")"
cp "$f" "$SRC/$f"
done
# --- tests: copy .cs only (NOT the old .csproj) ---
TST="$DEST/tests/ZB.MOM.WW.SPHistorianClient.Tests"
cd "$BUNDLE/tests/AVEVA.Historian.Client.Tests"
find . -name '*.cs' | while read -r f; do
mkdir -p "$TST/$(dirname "$f")"
cp "$f" "$TST/$f"
done
# --- namespace rewrite in .cs ONLY (proto wire contracts stay ArchestrA.Grpc.Contract.*) ---
find "$SRC" "$TST" -name '*.cs' -print0 \
| xargs -0 sed -i '' 's/AVEVA\.Historian\.Client/ZB.MOM.WW.SPHistorianClient/g'
```
**Step 2: Verify counts and that the rename is total**
```bash
DEST=/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.SPHistorianClient
echo "src cs: $(find "$DEST/src" -name '*.cs' | wc -l) (expect 74)"
echo "src proto: $(find "$DEST/src" -name '*.proto' | wc -l) (expect 6)"
echo "test cs: $(find "$DEST/tests" -name '*.cs' | wc -l) (expect 25)"
echo "leftover AVEVA.Historian.Client in .cs: $(grep -rl 'AVEVA\.Historian\.Client' "$DEST" --include='*.cs' | wc -l) (expect 0)"
echo "proto namespace preserved: $(grep -l 'ArchestrA.Grpc.Contract' "$DEST"/src/ZB.MOM.WW.SPHistorianClient/Grpc/Protos/*.proto | wc -l) (expect 6)"
```
Expected: `74`, `6`, `25`, `0`, `6`. If "leftover" is non-zero, the rewrite missed those files: inspect
them and re-run the Step 1 sed on each. The grep uses the same pattern as the sed, so a clean
port shows `0`.
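The echo-based check relies on a human reading the numbers. A variant that fails loudly could be sketched as follows; `expect_count` is a hypothetical helper, and the counts in the usage comments are the ones this plan states.

```bash
#!/usr/bin/env bash
# expect_count LABEL EXPECTED ACTUAL
# Prints "ok" and returns 0 when the counts match; prints to stderr and returns 1 otherwise.
expect_count() {
  local label="$1" expected="$2" actual="$3"
  if [ "$actual" -ne "$expected" ]; then
    echo "FAIL $label: expected $expected, got $actual" >&2
    return 1
  fi
  echo "ok   $label: $actual"
}

# Usage (tr strips the leading spaces that macOS `wc -l` emits):
#   expect_count "src cs"    74 "$(find "$DEST/src"   -name '*.cs'    | wc -l | tr -d ' ')"
#   expect_count "src proto"  6 "$(find "$DEST/src"   -name '*.proto' | wc -l | tr -d ' ')"
#   expect_count "test cs"   25 "$(find "$DEST/tests" -name '*.cs'    | wc -l | tr -d ' ')"
```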
**Step 3: Commit**
```bash
cd /Users/dohertj2/Desktop/scadaproj
git add ZB.MOM.WW.SPHistorianClient/src ZB.MOM.WW.SPHistorianClient/tests
git commit -m "feat(sphistorianclient): port SDK source + tests, rebrand namespace to ZB.MOM.WW.SPHistorianClient"
```
---
## Task 3: Build + test green
**Classification:** high-risk
**Estimated implement time:** ~5 min (plus restore/build wall-time)
**Parallelizable with:** none
**Blocked by:** Task 2
This is the integration gate. The port must compile and the offline test suite must pass on this macOS host.
**Files:**
- Modify (only if the build surfaces a defect): any ported file under `ZB.MOM.WW.SPHistorianClient/src` or `…/tests`, or the two `.csproj`.
**Step 1: Build**
Run: `cd ZB.MOM.WW.SPHistorianClient && dotnet build ZB.MOM.WW.SPHistorianClient.slnx`
Expected: **Build succeeded.** Platform-compatibility warnings (`CA1416`, triggered by the `[SupportedOSPlatform("windows")]` members)
are acceptable and must remain warnings. If restore hard-fails on `NU1903`, re-run with
`-p:NuGetAudit=false`.
**Step 2: Test**
Run: `dotnet test ZB.MOM.WW.SPHistorianClient.slnx`
Expected: all tests pass; the live integration tests (`HistorianClientIntegrationTests`,
`HistorianGrpcIntegrationTests`, `RemoteTcpIntegrationTests`) **skip cleanly** because no `HISTORIAN_*`
env vars are set. The bundle's `MIGRATION-README.md` documents ~188 tests passing on macOS with the
live ones skipped — treat a comparable count with **zero failures** as success.
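To confirm the live tests are skipping for the intended reason, it may help to list which `HISTORIAN_*` variables are set before running the suite. This is a sketch; the helper name is hypothetical, and it matches on the prefix only because the individual variable names are not listed in this plan.

```bash
#!/usr/bin/env bash
# Print the names of all exported environment variables starting with HISTORIAN_.
# Empty output means the gated live integration tests should skip.
historian_env_names() {
  env | { grep '^HISTORIAN_' || true; } | cut -d= -f1 | sort
}

# Usage:
#   if [ -z "$(historian_env_names)" ]; then echo "live tests will skip"; fi
```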
**Step 3: Triage rules (if not green)**
- Compile error referencing `AVEVA.Historian.Client` → a file was missed by the rewrite; re-run the
Task 2 sed on that file.
- `NU1008` (version on PackageReference) → an inline `Version=` slipped into a `.csproj`; remove it
(version belongs in `Directory.Packages.props`).
- Missing generated gRPC type (e.g. `ArchestrA.Grpc.Contract.*` not found) → confirm the `<Protobuf>`
glob in the src `.csproj` resolves the 6 `Grpc/Protos/*.proto` and that `Grpc.Tools` restored.
- A genuine test failure (not a skip) → this is a real port defect; fix the ported code, do **not**
delete/weaken the test.
**Step 4: Commit (only if Step 3 required edits)**
```bash
git add -A ZB.MOM.WW.SPHistorianClient/
git commit -m "fix(sphistorianclient): resolve port build/test fallout"
```
---
## Task 4: Add the `AddZbSpHistorianClient` DI extension (TDD)
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 5
**Blocked by:** Task 3
`HistorianClientOptions` uses `required` + `init`-only properties, so the extension takes a fully-built
options instance (not an `Action<T>` configurator). It depends only on
`Microsoft.Extensions.DependencyInjection.Abstractions`.
**Files:**
- Create: `ZB.MOM.WW.SPHistorianClient/src/ZB.MOM.WW.SPHistorianClient/DependencyInjection/ZbSpHistorianClientServiceCollectionExtensions.cs`
- Test: `ZB.MOM.WW.SPHistorianClient/tests/ZB.MOM.WW.SPHistorianClient.Tests/DependencyInjectionTests.cs`
**Step 1: Write the failing test**
```csharp
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.SPHistorianClient;
namespace ZB.MOM.WW.SPHistorianClient.Tests;
public class DependencyInjectionTests
{
[Fact]
public void AddZbSpHistorianClient_resolves_client_and_options()
{
var services = new ServiceCollection();
var options = new HistorianClientOptions { Host = "localhost" };
services.AddZbSpHistorianClient(options);
using var sp = services.BuildServiceProvider();
Assert.Same(options, sp.GetRequiredService<HistorianClientOptions>());
Assert.NotNull(sp.GetRequiredService<HistorianClient>());
}
[Fact]
public void AddZbSpHistorianClient_throws_when_host_missing()
{
var services = new ServiceCollection();
var options = new HistorianClientOptions { Host = "" };
Assert.Throws<ArgumentException>(() => services.AddZbSpHistorianClient(options));
}
[Fact]
public void AddZbSpHistorianClient_throws_on_null_options()
{
var services = new ServiceCollection();
Assert.Throws<ArgumentNullException>(() => services.AddZbSpHistorianClient(null!));
}
}
```
**Step 2: Run — verify it fails to compile** (`AddZbSpHistorianClient` not defined)
Run: `dotnet test ZB.MOM.WW.SPHistorianClient.slnx --filter "FullyQualifiedName~DependencyInjectionTests"`
Expected: FAIL (does not compile / method missing).
**Step 3: Implement**
```csharp
using Microsoft.Extensions.DependencyInjection;
namespace ZB.MOM.WW.SPHistorianClient;
/// <summary>
/// ZB.MOM.WW DI registration for <see cref="HistorianClient"/>. Mirrors the family's
/// <c>AddZb*</c> convention. Because <see cref="HistorianClientOptions"/> is <c>required</c>/
/// <c>init</c>-only, callers pass a fully-built options instance (bind it from configuration in the
/// consuming app, e.g. <c>config.GetSection("Historian").Get&lt;HistorianClientOptions&gt;()</c>).
/// </summary>
public static class ZbSpHistorianClientServiceCollectionExtensions
{
public static IServiceCollection AddZbSpHistorianClient(
this IServiceCollection services,
HistorianClientOptions options)
{
ArgumentNullException.ThrowIfNull(services);
ArgumentNullException.ThrowIfNull(options);
if (string.IsNullOrWhiteSpace(options.Host))
{
throw new ArgumentException(
"HistorianClientOptions.Host must be set.", nameof(options));
}
services.AddSingleton(options);
// HistorianClient opens a fresh channel per operation and has a no-op DisposeAsync,
// so transient is safe and avoids assuming the shared dialect is concurrency-safe.
services.AddTransient<HistorianClient>();
return services;
}
}
```
**Step 4: Run — verify pass**
Run: `dotnet test ZB.MOM.WW.SPHistorianClient.slnx --filter "FullyQualifiedName~DependencyInjectionTests"`
Expected: PASS (3/3).
**Step 5: Commit**
```bash
git add ZB.MOM.WW.SPHistorianClient/src/ZB.MOM.WW.SPHistorianClient/DependencyInjection \
ZB.MOM.WW.SPHistorianClient/tests/ZB.MOM.WW.SPHistorianClient.Tests/DependencyInjectionTests.cs
git commit -m "feat(sphistorianclient): add AddZbSpHistorianClient DI extension"
```
---
## Task 5: Author `CLAUDE.md` + `README.md`
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 4
**Blocked by:** Task 3
Sanitized docs only — **no hostnames, credentials, customer tag names, or capture data.** Model the
structure on `ZB.MOM.WW.Telemetry/CLAUDE.md` (overview, package table, build/test/pack commands,
status) but adapt to a single-package library.
**Files:**
- Create: `ZB.MOM.WW.SPHistorianClient/CLAUDE.md`
- Create: `ZB.MOM.WW.SPHistorianClient/README.md`
**`CLAUDE.md` must cover:**
- One-paragraph overview: pure-managed .NET 10 AVEVA System Platform Historian client, no native AVEVA
dependency, reverse-engineered wire protocol. Ported from the `histsdk` migration bundle.
- The supported operation surface table (copy the README table from the bundle:
`ProbeAsync`, `ReadRawAsync`, `ReadAggregateAsync` (16 modes), `ReadAtTimeAsync`, `ReadEventsAsync`,
`BrowseTagNamesAsync`, `GetTagMetadataAsync`, `GetConnectionStatusAsync`,
`GetStoreForwardStatusAsync`, `GetSystemParameterAsync`, `EnsureTagAsync`, `DeleteTagAsync`).
- Transport matrix: `LocalPipe` / `RemoteTcpIntegrated` / `RemoteTcpCertificate` (WCF, Windows-only,
live-verified) vs `RemoteGrpc` (2023 R2, cross-platform, **not yet live-verified**).
- Out of scope: writing samples (`AddS2` architecturally blocked), discrete/string tag creation.
- DI: the `AddZbSpHistorianClient(options)` extension + the bind-from-config note.
- Build/test/pack commands (from this dir):
`dotnet build ZB.MOM.WW.SPHistorianClient.slnx` / `dotnet test …` /
`dotnet pack ZB.MOM.WW.SPHistorianClient.slnx -c Release -o ./artifacts`.
- Live integration tests gated by `HISTORIAN_*` env vars (skip cleanly when unset). List the env vars.
**`README.md`:** a trimmed public-facing version — overview, quick-start snippet (the bundle's
`HistorianClient` usage example, namespace updated to `ZB.MOM.WW.SPHistorianClient`), supported surface
table, build/test commands.
**Commit:**
```bash
git add ZB.MOM.WW.SPHistorianClient/CLAUDE.md ZB.MOM.WW.SPHistorianClient/README.md
git commit -m "docs(sphistorianclient): add CLAUDE.md + README.md"
```
---
## Task 6: Pack verification
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none
**Blocked by:** Task 4, Task 5
**Files:**
- Create (build output): `ZB.MOM.WW.SPHistorianClient/artifacts/ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg`
**Step 1: Full green build + test once more, then pack**
```bash
cd /Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.SPHistorianClient
dotnet test ZB.MOM.WW.SPHistorianClient.slnx
dotnet pack ZB.MOM.WW.SPHistorianClient.slnx -c Release -o ./artifacts
```
Expected: tests pass (live ones skip); pack produces `artifacts/ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg`.
**Step 2: Sanity-check the package contents**
```bash
unzip -l artifacts/ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg | grep -E 'ZB\.MOM\.WW\.SPHistorianClient\.dll|\.nuspec'
```
Expected: the lib DLL and nuspec are present.
**Step 3: Commit the nupkg** (matches the family convention — `ZB.MOM.WW.Telemetry` commits its `artifacts/*.nupkg`)
```bash
cd /Users/dohertj2/Desktop/scadaproj
git add -f ZB.MOM.WW.SPHistorianClient/artifacts/ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg
git commit -m "build(sphistorianclient): pack 0.1.0 nupkg"
```
> **Do NOT push or publish** to the Gitea feed. Per repo experience, "published/adopted" claims must
> not be made without explicit user direction + feed verification.
---
## Task 7: Index the new library in the umbrella `CLAUDE.md` (optional)
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none
**Blocked by:** Task 6
**Files:**
- Modify: `CLAUDE.md` (repo root umbrella index)
Add a short reference so the umbrella index reflects the newly-hosted library (the intro paragraph
that enumerates the hosted `ZB.MOM.WW.*` sources, and/or a one-line pointer near the component table
noting `ZB.MOM.WW.SPHistorianClient` is a net-new shared library — **not** a component normalization).
> **Caveat:** repo-root `CLAUDE.md` already has **pre-existing uncommitted edits** (unrelated to this
> work). Before editing, run `git diff CLAUDE.md` and make sure your commit message reflects that it
> may bundle those edits — or stage only the hunks you add. If this risks entangling unrelated changes,
> skip this task and leave it for the user.
**Commit:**
```bash
git add CLAUDE.md
git commit -m "docs: index ZB.MOM.WW.SPHistorianClient in umbrella CLAUDE.md"
```
---
## Done criteria
- `ZB.MOM.WW.SPHistorianClient/` exists with `src/`, `tests/`, props, `.slnx`, `CLAUDE.md`, `README.md`.
- `dotnet build` + `dotnet test` are green on macOS (live integration tests skip cleanly).
- `AddZbSpHistorianClient` DI extension present + tested.
- `artifacts/ZB.MOM.WW.SPHistorianClient.0.1.0.nupkg` produced.
- All work committed on `feat/sphistorianclient`. Not pushed/published.
@@ -0,0 +1,13 @@
{
"planPath": "docs/plans/2026-06-19-sphistorianclient.md",
"tasks": [
{"id": 1, "subject": "Task 1: Scaffold library skeleton", "status": "completed"},
{"id": 2, "subject": "Task 2: Port source + tests with namespace rewrite", "status": "completed", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: Build + test green", "status": "completed", "blockedBy": [2]},
{"id": 4, "subject": "Task 4: Add AddZbSpHistorianClient DI extension (TDD)", "status": "completed", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Author CLAUDE.md + README.md", "status": "completed", "blockedBy": [3]},
{"id": 6, "subject": "Task 6: Pack verification", "status": "completed", "blockedBy": [4, 5]},
{"id": 7, "subject": "Task 7: Index new lib in umbrella CLAUDE.md (optional)", "status": "completed", "blockedBy": [6]}
],
"lastUpdated": "2026-06-19"
}
@@ -0,0 +1,216 @@
# ZB.MOM.WW.HistorianGateway — Design
**Date:** 2026-06-23
**Status:** Design approved (brainstorming complete) — ready for implementation planning
**Author:** brainstorming session (Joseph Doherty + Claude)
## 1. Summary
A new **full-feature sidecar** in the SCADA/OT sister-project family, modelled on
`MxAccessGateway` (`mxaccessgw`). It does two things:
1. **Read-only Galaxy metadata server** — exposes the AVEVA System Platform
("Wonderware") **Galaxy object hierarchy** (areas / objects / templates /
instances / attributes), sourced from the **Galaxy Repository SQL DB**, the same
data `mxaccessgw`'s `galaxy_repository` feature serves today.
2. **Full read/write gRPC API to the AVEVA (Wonderware) Historian** — reads (raw,
aggregate with all 15 retrieval modes, at-time, blocks, events, browse, metadata,
status) **and writes** (historical/backfill values, event send, tag-config
create/delete/rename/extended-properties, plus resilience helpers).
It reuses the family's **common shared packages and styles**: `ZB.MOM.WW.Auth`,
`ZB.MOM.WW.Theme`, `ZB.MOM.WW.Telemetry`(+`.Serilog`), `ZB.MOM.WW.Health`,
`ZB.MOM.WW.Configuration`, `ZB.MOM.WW.Audit`.
### Key reframing discovered during brainstorming
- The historian write surface lives in the **`histsdk`** repo
(`gitea.dohertylan.com/dohertj2/histsdk`, namespace `AVEVA.Historian.Client`),
which is **far ahead** of the stale `scadaproj/ZB.MOM.WW.SPHistorianClient` port
(2026-06-19 snapshot, reads + tag create/delete only, value-writes marked
"blocked"). `histsdk` has since added a **live-validated gRPC write surface**:
`AddHistoricalValuesAsync` (historical/backfill, gRPC-only), `SendEventAsync`
(events, both transports), `EnsureTags`/`DeleteTags`/`RenameTags`/
`AddTagExtendedProperties` (config writes, gRPC), plus higher-level
`HistorianStoreForwardWriter` (durable outbox) and a redundant-write cluster.
- **Hard server-side limit no client can lift:** `AddS2` *streaming live
process-sample* writes are GATED — the historian runtime cache only ingests from
configured IOServer / Application Server pipelines. "Live current value" writes are
therefore done via the **SQL path** (`aaAnalogTagInsert` / `INSERT INTO History`),
not gRPC.
- **No COM anywhere.** The historian SDK is pure-managed and Galaxy browse is plain
SQL, so — unlike `mxaccessgw` — this sidecar needs **no x86 worker and no
two-process split**. It is a single .NET 10 x64 process.
## 2. Decisions (locked during brainstorming)
| Decision | Choice |
|---|---|
| Purpose | **General-purpose gateway** — reusable modern façade over Historian + Galaxy metadata for any gRPC client (the way `mxaccessgw` is the façade for MXAccess). |
| Historian client code | **Vendor `histsdk`** (`AVEVA.Historian.Client`) into the sidecar repo as self-contained vendored source; namespace kept as-is to ease future re-sync. |
| Galaxy browse code | **New shared lib `ZB.MOM.WW.GalaxyRepository`** in scadaproj, extracted from `mxaccessgw`'s `galaxy_repository` browse; consumed by both `mxaccessgw` and this sidecar. |
| Dashboard | **Full Blazor dashboard** on `ZB.MOM.WW.Theme` (login + Galaxy browser + Historian console + API-key admin + status/health). |
| Historian write scope | **All:** historical/backfill writes, event send, tag-config writes, **and** resilience extras (store-and-forward outbox, redundant write fan-out, SQL live-value path). |
| Connection model | **Approach A — stateless gateway over pooled service-identity connections.** Clients authenticate to the gateway (ZB API key); the gateway owns historian credentials and reuses pooled, pre-authenticated connections. |
| Repo location | **New standalone sibling repo** `~/Desktop/HistorianGateway`, gitea remote `historiangw`. |
| Name / namespace | **`ZB.MOM.WW.HistorianGateway`**. |
## 3. Architecture & solution structure
Single .NET 10 x64 ASP.NET Core process.
```
~/Desktop/HistorianGateway/
src/
ZB.MOM.WW.HistorianGateway.Server/ ASP.NET Core host: gRPC services + Blazor dashboard + /healthz + /metrics
ZB.MOM.WW.HistorianGateway.Contracts/ the gateway's own .proto + generated types (for client codegen distribution)
vendor/AVEVA.Historian.Client/ VENDORED from histsdk; ArchestrA.Grpc.Contract.* protos + reads/writes/store-forward/redundancy
tests/
ZB.MOM.WW.HistorianGateway.Tests/ unit + env-gated live integration + bUnit dashboard
docs/plans/, CLAUDE.md, README.md
ZB.MOM.WW.HistorianGateway.slnx
```
**Cross-repo pieces:**
- **`scadaproj/ZB.MOM.WW.GalaxyRepository`** (new shared lib, plain files — NOT a
nested git repo): carries the **canonical `galaxy_repository.proto`** (adopted from
`mxaccessgw`'s existing contract so OtOpcUa's wire shape is not broken), the SQL
browse provider (connect to Galaxy Repository SQL → hierarchy model), and a
reusable gRPC service implementation both hosts can `MapGrpcService<>()`.
`mxaccessgw` adopting it is a **tracked follow-on** (same "built → adopted" pattern
as the other normalized components); this sidecar consumes it from the start.
- **Shared ZB packages consumed:** `ZB.MOM.WW.Auth`
(Abstractions+Ldap+ApiKeys+AspNetCore), `ZB.MOM.WW.Theme`, `ZB.MOM.WW.Telemetry`
(+`.Serilog`), `ZB.MOM.WW.Health`, `ZB.MOM.WW.Configuration`, `ZB.MOM.WW.Audit`.
## 4. gRPC API surface
Gateway's own curated contract (`ZB.MOM.WW.HistorianGateway.Grpc.V1`), grouped by
concern — not a 1:1 SDK dump. The vendored `ArchestrA.Grpc.Contract.*` protos stay
internal; clients see only the gateway contract.
| Service | RPCs | Notes |
|---|---|---|
| `HistorianRead` | `ReadRaw`, `ReadAggregate`, `ReadBlocks`, `ReadEvents` *(server-streaming)*; `ReadAtTime` | `ReadAggregate` exposes all 15 retrieval modes |
| `HistorianWrite` | `AddHistoricalValues`, `SendEvent`, `WriteLiveValues` | `WriteLiveValues` = SQL path (gRPC streaming is gated) |
| `HistorianTags` | `BrowseTagNames` *(streaming)*, `GetTagMetadata`, `EnsureTags`, `DeleteTags`, `RenameTags`, `AddTagExtendedProperties` | |
| `HistorianStatus` | `Probe`, `GetConnectionStatus`, `GetStoreForwardStatus`, `GetSystemParameter` | |
| `GalaxyRepository` | Browse areas / objects / templates / instances / attributes *(read-only)* | canonical proto from the shared lib |
**Authorization** is via **API-key scopes at the gateway** (Approach A trust
boundary): `historian:read`, `historian:write`, `historian:tags:write`,
`galaxy:read`.
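The scope check can be sketched as a pure function, independent of the real `ZB.MOM.WW.Auth` interceptor. Everything below is hypothetical: the `ScopePolicy` type, the choice of `historian:read` for `HistorianStatus`, and the read/write split inside `HistorianTags` are assumptions for illustration, since the design names the four scopes but not the exact RPC-to-scope table.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical sketch: maps a gateway service + RPC to the scope it requires.
// The mapping itself is an assumption; only the four scope names come from the design.
public static class ScopePolicy
{
    private static readonly Dictionary<string, string> ScopeByService = new()
    {
        ["HistorianRead"] = "historian:read",
        ["HistorianStatus"] = "historian:read",   // assumption: status is read-level
        ["HistorianWrite"] = "historian:write",
        ["GalaxyRepository"] = "galaxy:read",
    };

    // HistorianTags mixes read RPCs (browse/metadata) with config writes.
    private static readonly HashSet<string> TagWriteRpcs = new()
    {
        "EnsureTags", "DeleteTags", "RenameTags", "AddTagExtendedProperties",
    };

    // Returns the required scope, or null for an unknown service (deny by default).
    public static string? RequiredScope(string service, string rpc)
    {
        if (service == "HistorianTags")
        {
            return TagWriteRpcs.Contains(rpc) ? "historian:tags:write" : "historian:read";
        }

        return ScopeByService.TryGetValue(service, out var scope) ? scope : null;
    }

    public static bool IsAllowed(IReadOnlySet<string> grantedScopes, string service, string rpc)
    {
        var required = RequiredScope(service, rpc);
        return required is not null && grantedScopes.Contains(required);
    }
}
```

In this sketch a key holding only `historian:read` can call `HistorianRead/ReadRaw` and `HistorianTags/BrowseTagNames` but is refused `HistorianTags/RenameTags`; unknown services are denied rather than allowed.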
## 5. Connection & data flow
```
gRPC client ──(ZB API key)──► HistorianGateway ──┬─ pooled, pre-authed gRPC conn ──► AVEVA Historian (RemoteGrpc 2023R2)
├─ store-forward outbox (SQLite) ─ replays writes on reconnect
├─ redundant-write fan-out ──────► historian members (All/Any ack)
├─ SqlConnection ──► Runtime DB (live-value writes via aaAnalogTagInsert/History)
└─ ZB.MOM.WW.GalaxyRepository ──► Galaxy Repository SQL (read-only browse)
```
- **Pooled connections:** the expensive auth handshake (`ValidateClientCredential` /
ECDH `ExchangeKey`) runs **once per connection on open**, then is reused across
requests; connections are health-checked with auto-reconnect. Write operations use
the write-enabled session mode (`0x401`).
- **Store-forward:** writes flow through the SDK's `HistorianStoreForwardWriter` — on
an unreachable historian, enqueue to durable SQLite; a background drain replays on
reconnect.
- **Redundancy:** `HistorianRedundantWriteResult` fan-out to configured members under
an All/Any ack policy; per-member result surfaced to the caller.
- **SQL live-write** and **Galaxy browse** are independent SQL paths, each with its
own validated connection config.
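The store-forward behaviour described above can be sketched as follows. This is not the SDK's `HistorianStoreForwardWriter`: `StoreForwardSketch` is a hypothetical stand-in that uses an in-memory queue where the design calls for durable SQLite, treats `TimeoutException` as "historian unreachable", and is not thread-safe. Queueing behind an existing backlog to preserve ordering is also an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum WriteOutcome { Delivered, Queued }

// Hypothetical sketch of enqueue-on-failure plus drain-on-reconnect.
public sealed class StoreForwardSketch
{
    private readonly Func<string, Task> _send;       // stand-in for the historian write call
    private readonly Queue<string> _outbox = new();  // the real design persists this in SQLite

    public StoreForwardSketch(Func<string, Task> send) => _send = send;

    public int QueueDepth => _outbox.Count;

    public async Task<WriteOutcome> WriteAsync(string payload)
    {
        // Assumption: keep ordering by queueing behind any existing backlog.
        if (_outbox.Count > 0)
        {
            _outbox.Enqueue(payload);
            return WriteOutcome.Queued;
        }

        try
        {
            await _send(payload);
            return WriteOutcome.Delivered;
        }
        catch (TimeoutException)
        {
            _outbox.Enqueue(payload);
            return WriteOutcome.Queued;
        }
    }

    // Replays queued writes in order; stops at the first failure and keeps the rest.
    public async Task<int> DrainAsync()
    {
        var replayed = 0;
        while (_outbox.Count > 0)
        {
            try
            {
                await _send(_outbox.Peek());
            }
            catch (TimeoutException)
            {
                break;
            }

            _outbox.Dequeue();
            replayed++;
        }

        return replayed;
    }
}
```

A background service would call `DrainAsync` after a successful reconnect; `QueueDepth` is the kind of value the store-forward queue-depth meter would report.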
**Configuration** (all `ZB.MOM.WW.Configuration`-validated, aggregated by
`ConfigPreflight` at startup): historian (host, gRPC port 32565, transport=RemoteGrpc,
TLS, service identity/credentials), redundant members, store-forward path, Galaxy
Repository SQL connection string, Runtime DB connection string (SQL live-write),
Auth (LDAP + API-key pepper). Secrets live in the operator environment, never in repo.
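As a rough illustration of aggregated startup validation, the sketch below collects every failure instead of stopping at the first one. It deliberately does not use the real `ZB.MOM.WW.Configuration` API (`OptionsValidatorBase`, `ValidationBuilder`, `ConfigPreflight`), whose signatures are not reproduced here; `HistorianSettings` and `HistorianSettingsValidator` are hypothetical names, and the transport list is assumed to match the transport matrix of the earlier client port.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical options shape; the 32565 default port and RemoteGrpc come from the design.
public sealed record HistorianSettings(string? Host, int Port = 32565, string Transport = "RemoteGrpc");

public static class HistorianSettingsValidator
{
    // Assumption: the same four transports as the earlier client's transport matrix.
    private static readonly string[] KnownTransports =
    {
        "LocalPipe", "RemoteTcpIntegrated", "RemoteTcpCertificate", "RemoteGrpc",
    };

    // Returns every failure so an operator can fix the whole config in one pass.
    public static IReadOnlyList<string> Validate(HistorianSettings settings)
    {
        var failures = new List<string>();

        if (string.IsNullOrWhiteSpace(settings.Host))
        {
            failures.Add("Historian:Host is required.");
        }

        if (settings.Port is < 1 or > 65535)
        {
            failures.Add("Historian:Port must be between 1 and 65535.");
        }

        if (!KnownTransports.Contains(settings.Transport))
        {
            failures.Add("Historian:Transport must be one of: " + string.Join(", ", KnownTransports) + ".");
        }

        return failures;
    }
}
```

A host would refuse to start when the returned list is non-empty and log the failures, without echoing secrets such as credentials or connection strings.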
## 6. Cross-cutting infrastructure + dashboard
- **Auth (`ZB.MOM.WW.Auth`):** gRPC clients use peppered-HMAC API keys (keyId/Bearer),
validated by a gRPC interceptor enforcing per-service scopes. Dashboard uses LDAP
login (`.Ldap`+`.AspNetCore`), cookie auth, `IGroupRoleMapper<TRole>`, canonical
`ZbClaimTypes`/`ZbCookieDefaults`, canonical-six roles, dev against the shared
GLAuth (`10.100.0.35:3893`, `dc=zb,dc=local`). `DisableLogin` dev/deploy switch.
- **Telemetry (`.Telemetry`+`.Serilog`):** `AddZbTelemetry` (Resource
`service.name=historian-gateway` + standard instrumentation + always-on Prometheus
`/metrics`, OTLP opt-in) + `AddZbSerilog`. App Meters: read/write counts + latency,
store-forward queue depth, pool connection state, redundancy ack outcomes.
- **Health (`.Health`):** three-tier ready/active/healthz + canonical JSON writer.
Probes: historian gRPC (`GrpcDependencyHealthCheck`), Galaxy Repository SQL +
Runtime DB (`DatabaseHealthCheck`), store-forward drain status.
- **Configuration (`.Configuration`):** `OptionsValidatorBase` / `ValidationBuilder` /
`AddValidatedOptions` / `ConfigPreflight` (§5).
- **Audit (`.Audit`, DEEP-adopt):** canonical `AuditEvent` + SQLite `IAuditWriter`
(MxGateway-style). Audited: tag-config writes, historical/event writes, API-key
admin, login/logout. `Actor` wired from the Auth principal via `IAuditActorAccessor`.
- **Dashboard (Blazor, `.Theme`):** Technical-Light side-rail shell + `LoginCard`
`/login`. Pages: **Status** (pool / store-forward / redundancy / version),
**Galaxy browser** (read-only hierarchy tree), **Historian console** (query with
raw/aggregate + mode picker + time range; role-gated write test for value insert /
event send), **API-key admin** (list/create/revoke keys + scopes), **Health**.
## 7. Error handling
- **gRPC status mapping:** `ProtocolEvidenceMissingException` (unsupported op/type —
e.g. non-analog tag, non-string event property) → `Unimplemented`/`FailedPrecondition`
with a clear "not in reverse-engineered surface" message; auth →
`Unauthenticated`/`PermissionDenied`; historian down → `Unavailable`; bad range /
unknown tag → `InvalidArgument`/`NotFound`.
- **Gated ops:** live streaming-sample writes (`AddS2`) are **not exposed** (no RPC);
live-value writes route through SQL `WriteLiveValues`.
- **Write resilience:** with store-forward enabled, an unreachable historian returns
*accepted + queued* (not an error); otherwise `Unavailable`. Redundancy surfaces a
per-member result; All-policy fails if any member fails, Any-policy succeeds on ≥1 ack.
- **Pool:** transient failures → reconnect + bounded retry; auth-handshake failure →
fail fast with diagnostic. No secrets/real hostnames in errors or logs (histsdk
safety rule).
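Two of the rules above, the exception-to-status mapping and the All/Any ack policy, can be sketched as pure functions. All names here are hypothetical: `SketchStatus` is a local enum that mirrors gRPC status code names rather than `Grpc.Core.StatusCode`, the `ProtocolEvidenceMissingException` class is a stand-in for the SDK type, and the choice of .NET exception types for "historian down", "unknown tag", and "bad range" is an assumption. Treating an empty member list as a failure under the All policy is also an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the SDK exception named in the design.
public sealed class ProtocolEvidenceMissingException : Exception
{
    public ProtocolEvidenceMissingException(string message) : base(message) { }
}

// Local mirror of the gRPC status names used in the mapping.
public enum SketchStatus { Unimplemented, PermissionDenied, Unavailable, NotFound, InvalidArgument, Internal }

public enum AckPolicy { All, Any }

public static class ErrorRules
{
    public static SketchStatus ToStatus(Exception ex) => ex switch
    {
        ProtocolEvidenceMissingException => SketchStatus.Unimplemented,
        UnauthorizedAccessException => SketchStatus.PermissionDenied,
        TimeoutException => SketchStatus.Unavailable,      // assumption: historian down
        KeyNotFoundException => SketchStatus.NotFound,     // assumption: unknown tag
        ArgumentException => SketchStatus.InvalidArgument, // assumption: bad range
        _ => SketchStatus.Internal,                        // assumption: catch-all
    };

    // All: every member must ack. Any: at least one ack is enough.
    public static bool WriteSucceeded(AckPolicy policy, IReadOnlyCollection<bool> memberAcks) => policy switch
    {
        AckPolicy.All => memberAcks.Count > 0 && memberAcks.All(ack => ack),
        AckPolicy.Any => memberAcks.Any(ack => ack),
        _ => throw new ArgumentOutOfRangeException(nameof(policy)),
    };
}
```

With acks `{ true, false }` the Any policy reports success and the All policy reports failure; the per-member results would still be returned to the caller in both cases.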
## 8. Testing
- **Unit:** gRPC services against a faked historian-client seam + faked Galaxy
provider; scope/auth interceptor; config validators; SDK-model ↔ proto mapping.
- **Golden/protocol:** carry over `histsdk`'s golden byte tests for the vendored
client (historical "ON" buffer, event "OS" buffer, registration buffers) so the
vendored copy stays faithful.
- **Integration (env-gated, live, CI/macOS-safe):** real 2023 R2 historian + Galaxy
Repository SQL — read/write round-trips and browse via the self-cleaning
sandbox-tag lifecycle (`HISTORIAN_GRPC_WRITE_SANDBOX_TAG`); skipped when env vars
absent.
- **Dashboard:** bUnit component tests. **Smoke:** `/healthz`, `/metrics`, gRPC
`Probe`.
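The env-gating for live tests can be sketched as a small helper. `LiveTestGate` is a hypothetical name; only the `HISTORIAN_GRPC_WRITE_SANDBOX_TAG` variable comes from the plan. The optional lookup delegate is an assumption added so the gate itself can be unit-tested without touching the process environment.

```csharp
#nullable enable
using System;

// Hypothetical sketch: decides whether a live write test has what it needs to run.
public static class LiveTestGate
{
    public const string SandboxTagVariable = "HISTORIAN_GRPC_WRITE_SANDBOX_TAG";

    // Returns true and the trimmed tag name when the variable is set and non-blank.
    public static bool TryGetSandboxTag(out string tag, Func<string, string?>? getEnv = null)
    {
        getEnv ??= name => Environment.GetEnvironmentVariable(name);
        tag = getEnv(SandboxTagVariable)?.Trim() ?? string.Empty;
        return tag.Length > 0;
    }
}
```

A live test would call `TryGetSandboxTag` first and skip when it returns false, which is what keeps the suite green on macOS and in CI with no historian available.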
## 9. Out of scope / non-goals
- `AddS2` live streaming process-sample writes (GATED server-side; SQL path covers
live values instead).
- Non-analog tag creation, revision/edit writes, bit-faithful store-forward framing
(per `histsdk` capability matrix — `BOUNDED`/`HARD`/`GATED` items not selected).
- A two-process / x86 worker split (not needed — no COM).
- Re-syncing or replacing the existing stale `scadaproj/ZB.MOM.WW.SPHistorianClient`
port (we vendor `histsdk` instead; the stale port is left as-is).
## 10. Implementation components (high level)
1. **`ZB.MOM.WW.GalaxyRepository` shared lib** (scadaproj) — extract from
`mxaccessgw`, canonical proto + SQL browse provider + reusable gRPC service.
2. **Vendor `histsdk`** `AVEVA.Historian.Client` into the new repo + carry its golden
tests.
3. **Repo scaffold + host + shared-package wiring** (Auth/Telemetry/Health/
Configuration/Audit) + validated options + `ConfigPreflight`.
4. **gRPC contract + services** (Read / Write / Tags / Status / GalaxyRepository).
5. **Connection layer** — pooled pre-authed connections, store-forward, redundancy,
SQL live-write path.
6. **Auth** — API-key scope interceptor + LDAP dashboard auth + Audit wiring.
7. **Blazor dashboard** pages (Theme).
8. **Telemetry + Health** probes/meters.
9. **Tests** — unit / golden / env-gated integration / bUnit.
10. **Docs + repo/gitea setup**: `CLAUDE.md`, `README.md`, gitea remote.
> `mxaccessgw` adoption of `ZB.MOM.WW.GalaxyRepository` is a separate tracked
> follow-on, not part of the initial sidecar delivery.
@@ -0,0 +1,523 @@
# ZB.MOM.WW.HistorianGateway Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Build a single .NET 10 x64 sidecar that exposes (1) a read-only Galaxy object-hierarchy metadata gRPC server and (2) a full read/write gRPC API to the AVEVA Historian, with a Blazor dashboard, reusing the family's shared `ZB.MOM.WW.*` packages.
**Architecture:** One ASP.NET Core process hosting gRPC services + Blazor (no COM, no x86 worker). The historian write/read surface comes from the **vendored `histsdk` client** (`AVEVA.Historian.Client`). The Galaxy browse comes from a **new shared lib `ZB.MOM.WW.GalaxyRepository`** in scadaproj (extracted from mxaccessgw, wire-compatible `galaxy_repository.v1`). Connection model: stateless gateway over a **pooled, pre-authenticated service-identity connection**; clients authenticate to the gateway via peppered-HMAC API keys with per-service scopes.
**Tech Stack:** .NET 10, ASP.NET Core, Grpc.AspNetCore 2.76, Grpc.Net.Client 2.58 (vendored), Google.Protobuf, Microsoft.Data.SqlClient, Microsoft.Data.Sqlite, Blazor InteractiveServer, `ZB.MOM.WW.Theme` 0.3.1, `ZB.MOM.WW.Auth` 0.1.2, `ZB.MOM.WW.Telemetry`/`.Serilog` 0.1.0, `ZB.MOM.WW.Health` 0.1.0, `ZB.MOM.WW.Audit` 0.1.0, `ZB.MOM.WW.Configuration` 0.1.0, xUnit, bUnit.
**Reference sources (read these for exact patterns — do NOT re-discover):**
- Design doc: `docs/plans/2026-06-23-historian-gateway-design.md`
- mxaccessgw (the model), under `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/`: `GatewayApplication.cs` (host wiring), `Security/Authorization/*` (gRPC API-key interceptor + scope resolver), `Galaxy/GalaxyRepository.cs` (the SQL to extract), `Galaxy/GalaxyRepositoryOptions.cs`, `Galaxy/GalaxyHierarchyCache.cs`, `Galaxy/GalaxyRepositoryServiceCollectionExtensions.cs`, `Contracts/Protos/galaxy_repository.proto`, `Dashboard/Components/*` (Blazor + Theme).
- histsdk clone (to vendor): `/tmp/histsdk-explore/src/AVEVA.Historian.Client/` + `/tmp/histsdk-explore/tests/AVEVA.Historian.Client.Tests/`.
- Shared package signatures: captured in the design session; key paths under `~/Desktop/scadaproj/ZB.MOM.WW.{Telemetry,Health,Configuration,Audit,Auth,Theme}/`.
**Conventions for every task:** TDD where a seam exists (write the failing test first). Exact file paths in the `Files:` block ARE the implementer's contract. Commit after each task. Tests must stay green on macOS with no live historian/SQL (live tests are env-gated and skip when env vars are absent).
---
## Phase 0 — Shared `ZB.MOM.WW.GalaxyRepository` lib (in scadaproj)
> Built in `~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/` as plain files (NOT a nested git repo — see memory `shared-libs-are-plain-files-not-nested-repos`). Wire-compatible: keep proto `package galaxy_repository.v1` and all field numbers identical to mxaccessgw's so OtOpcUa is unaffected; only the C# `csharp_namespace` becomes neutral. mxaccessgw adoption of this lib is a separate follow-on, NOT in this plan.
### Task 1: Scaffold the GalaxyRepository lib project
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 7 (vendoring), Task 9 (Contracts project)
**Files:**
- Create: `~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/ZB.MOM.WW.GalaxyRepository.slnx`
- Create: `~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/src/ZB.MOM.WW.GalaxyRepository/ZB.MOM.WW.GalaxyRepository.csproj`
- Create: `~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/tests/ZB.MOM.WW.GalaxyRepository.Tests/ZB.MOM.WW.GalaxyRepository.Tests.csproj`
**Steps:**
1. Create the `.csproj` (net10.0, `Nullable`/`ImplicitUsings` enabled, packable, `PackageId=ZB.MOM.WW.GalaxyRepository`, `Version=0.1.0`). PackageReferences: `Microsoft.Data.SqlClient` 6.0.2, `Grpc.AspNetCore` 2.76.0, `Google.Protobuf`, `Microsoft.Extensions.Hosting.Abstractions`, `Microsoft.Extensions.Options.ConfigurationExtensions`. Add `<Protobuf Include="Protos\*.proto" GrpcServices="Server" />`.
2. Create the test `.csproj` (net10.0, `IsPackable=false`, xUnit 2.9.3 + `Microsoft.NET.Test.Sdk` 17.14.1 + `Microsoft.Data.SqlClient`), ProjectReference to the lib.
3. Create the `.slnx` listing both projects.
4. Run: `dotnet build ~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/ZB.MOM.WW.GalaxyRepository.slnx` — Expected: builds (no sources yet, 0 warnings).
5. Commit: `git -C ~/Desktop/scadaproj add ZB.MOM.WW.GalaxyRepository && git -C ~/Desktop/scadaproj commit -m "feat(galaxyrepo): scaffold ZB.MOM.WW.GalaxyRepository shared lib"`
### Task 2: Port the canonical galaxy_repository.proto (neutral namespace)
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (Task 3+ depend on generated types)
**Files:**
- Create: `~/Desktop/scadaproj/ZB.MOM.WW.GalaxyRepository/src/ZB.MOM.WW.GalaxyRepository/Protos/galaxy_repository.proto`
**Steps:**
1. Copy mxaccessgw's `Contracts/Protos/galaxy_repository.proto` verbatim, changing ONLY `option csharp_namespace` to `"ZB.MOM.WW.GalaxyRepository.Grpc"`. Keep `package galaxy_repository.v1`, all RPCs (`TestConnection`, `GetLastDeployTime`, `DiscoverHierarchy`, `WatchDeployEvents`, `BrowseChildren`), and every message/field number identical (wire compatibility).
2. Run: `dotnet build .../ZB.MOM.WW.GalaxyRepository.slnx` — Expected: PASS; generated `GalaxyRepository.GalaxyRepositoryBase`, `GalaxyObject`, `GalaxyAttribute`, etc. appear under namespace `ZB.MOM.WW.GalaxyRepository.Grpc`.
3. Commit: `feat(galaxyrepo): canonical galaxy_repository.v1 proto (neutral namespace)`
### Task 3: Port the SQL browse provider (`GalaxyRepository` + rows + options)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../src/ZB.MOM.WW.GalaxyRepository/GalaxyRepositoryOptions.cs`
- Create: `.../src/ZB.MOM.WW.GalaxyRepository/GalaxyHierarchyRow.cs`
- Create: `.../src/ZB.MOM.WW.GalaxyRepository/GalaxyAttributeRow.cs`
- Create: `.../src/ZB.MOM.WW.GalaxyRepository/IGalaxyRepository.cs`
- Create: `.../src/ZB.MOM.WW.GalaxyRepository/GalaxyRepository.cs`
**Steps:**
1. Port `GalaxyRepositoryOptions` from mxaccessgw `Galaxy/GalaxyRepositoryOptions.cs` — rename section const to `ZB.MOM.WW.GalaxyRepository` (the consuming app picks its own section path at registration), drop MxGateway-specific defaults. Keep `ConnectionString`, `CommandTimeoutSeconds`, `DashboardRefreshIntervalSeconds`, `PersistSnapshot`, `SnapshotCachePath`.
2. Port `GalaxyHierarchyRow` / `GalaxyAttributeRow` DTOs and the `IGalaxyRepository` interface (`TestConnectionAsync`, `GetLastDeployTimeAsync`, `GetHierarchyAsync`, `GetAttributesAsync`).
3. Port `GalaxyRepository.cs` **verbatim** including the two SQL blocks (`HierarchySql`, `AttributesSql`) and the `SqlConnection`/`SqlDataReader` mapping loops — these are validated reverse-engineered queries; do NOT modify the SQL.
4. Run: `dotnet build` — Expected: PASS.
5. Commit: `feat(galaxyrepo): SQL browse provider (hierarchy + attributes)`
### Task 4: Port the in-memory hierarchy cache + snapshot + deploy notifier + refresh service
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../GalaxyHierarchyCacheEntry.cs`, `.../IGalaxyHierarchyCache.cs`, `.../GalaxyHierarchyCache.cs`
- Create: `.../IGalaxyDeployNotifier.cs`, `.../GalaxyDeployNotifier.cs`
- Create: `.../IGalaxyHierarchySnapshotStore.cs`, `.../GalaxyHierarchySnapshotStore.cs`
- Create: `.../GalaxyHierarchyRefreshService.cs` (`BackgroundService`)
- Create: `.../GalaxyHierarchyProjector.cs` (paging/filter projection used by the gRPC service)
**Steps:**
1. Port these from mxaccessgw's `Galaxy/` folder, adjusting namespaces to `ZB.MOM.WW.GalaxyRepository`. Keep the cache's first-load gate, refresh semaphore, snapshot restore, and deploy-poll refresh trigger.
2. Port `GalaxyHierarchyProjector` (the `Project(...)` + `ComputeFilterSignature(...)` used by `DiscoverHierarchy`/`BrowseChildren` paging).
3. Run: `dotnet build` — Expected: PASS.
4. Commit: `feat(galaxyrepo): hierarchy cache + snapshot + refresh service`
### Task 5: Port the reusable gRPC service + DI extension
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Grpc/GalaxyRepositoryGrpcService.cs`
- Create: `.../DependencyInjection/GalaxyRepositoryServiceCollectionExtensions.cs`
**Steps:**
1. Port `GalaxyRepositoryGrpcService` from mxaccessgw's `Grpc/GalaxyRepositoryGrpcService.cs`, but REMOVE the mxaccessgw-specific `IGatewayRequestIdentityAccessor`/`ApiKeyConstraints` browse-subtree filtering (the gateway will apply its own auth at the interceptor layer). Keep `DiscoverHierarchy`, `BrowseChildren`, `TestConnection`, `GetLastDeployTime`, `WatchDeployEvents`. Base class: `ZB.MOM.WW.GalaxyRepository.Grpc.GalaxyRepository.GalaxyRepositoryBase`.
2. Write `AddZbGalaxyRepository(this IServiceCollection, IConfiguration, string sectionPath)` modeled on mxaccessgw's `AddGalaxyRepository` — bind options from `sectionPath`, register `GalaxyRepository`/`IGalaxyRepository`, notifier, snapshot store, cache, and the refresh `HostedService`. Add a companion `MapZbGalaxyRepository(this IEndpointRouteBuilder)` that `MapGrpcService<GalaxyRepositoryGrpcService>()`.
3. Run: `dotnet build` — Expected: PASS.
4. Commit: `feat(galaxyrepo): reusable gRPC service + AddZbGalaxyRepository DI`
### Task 6: Unit tests for the projector + DI smoke; pack
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../tests/ZB.MOM.WW.GalaxyRepository.Tests/GalaxyHierarchyProjectorTests.cs`
- Create: `.../tests/ZB.MOM.WW.GalaxyRepository.Tests/GalaxyHierarchyCacheTests.cs`
**Steps:**
1. **Write failing tests first:** projector paging (page_token round-trip, max_depth, `historized_only`/`alarm_bearing_only` filters, attribute include toggle) against a hand-built `GalaxyHierarchyCacheEntry` fixture; cache first-load gate + snapshot restore using a fake `IGalaxyRepository`. (SQL provider itself is exercised by env-gated integration later — no live DB in unit tests.)
2. Run: `dotnet test .../ZB.MOM.WW.GalaxyRepository.slnx` — Expected: FAIL (types/asserts).
3. Implement any small helper gaps surfaced; re-run — Expected: PASS.
4. Run: `dotnet pack .../src/ZB.MOM.WW.GalaxyRepository/ZB.MOM.WW.GalaxyRepository.csproj -c Release` — Expected: `ZB.MOM.WW.GalaxyRepository.0.1.0.nupkg` produced.
5. Commit: `test(galaxyrepo): projector + cache tests; pack 0.1.0`
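To make the "page_token round-trip" test case concrete, here is one way an opaque page token could carry an offset plus a filter signature. This is an illustrative assumption, not the ported `GalaxyHierarchyProjector` logic: the `PageToken` type, the `offset|signature` layout, and the base64 encoding are all hypothetical. The point it shows is that a token minted under one filter should be rejected when replayed with a different filter.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text;

// Hypothetical sketch of an opaque page token: base64("<offset>|<filterSignature>").
public static class PageToken
{
    public static string Encode(int offset, string filterSignature)
    {
        var text = offset.ToString(CultureInfo.InvariantCulture) + "|" + filterSignature;
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // An empty token means "first page". A token whose signature does not match
    // the current request's filters is rejected rather than silently reused.
    public static bool TryDecode(string? token, string expectedSignature, out int offset)
    {
        offset = 0;
        if (string.IsNullOrEmpty(token))
        {
            return true;
        }

        byte[] bytes;
        try
        {
            bytes = Convert.FromBase64String(token);
        }
        catch (FormatException)
        {
            return false;
        }

        var text = Encoding.UTF8.GetString(bytes);
        var separator = text.IndexOf('|');
        if (separator <= 0)
        {
            return false;
        }

        var parsed = int.TryParse(
            text.Substring(0, separator), NumberStyles.None, CultureInfo.InvariantCulture, out var value);
        if (!parsed || text.Substring(separator + 1) != expectedSignature)
        {
            return false;
        }

        offset = value;
        return true;
    }
}
```

A round-trip test under this sketch encodes an offset, decodes it with the same signature, and separately asserts that decoding with a different signature or a malformed token fails.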
---
## Phase 1 — Sidecar repo scaffold + vendor histsdk
### Task 7: Vendor the histsdk client + its golden tests
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1
**Files:**
- Create: `~/Desktop/HistorianGateway/src/vendor/AVEVA.Historian.Client/**` (copied)
- Create: `~/Desktop/HistorianGateway/tests/AVEVA.Historian.Client.Tests/**` (copied)
- Create: `~/Desktop/HistorianGateway/src/vendor/AVEVA.Historian.Client/VENDORING.md`
**Steps:**
1. `mkdir -p ~/Desktop/HistorianGateway/src/vendor ~/Desktop/HistorianGateway/tests`. Copy `/tmp/histsdk-explore/src/AVEVA.Historian.Client/` and `/tmp/histsdk-explore/tests/AVEVA.Historian.Client.Tests/` into those locations.
2. In the vendored test `.csproj`, REMOVE the `ProjectReference` to `tools/AVEVA.Historian.ReverseEngineering` (not vendored) and delete any test classes that depend on that tooling namespace (the RE-sanitizer tests). KEEP the protocol/golden tests: `HistorianTagWriteProtocolTests`, `HistorianEventRowProtocolTests`, `GrpcEventSendProtocolTests`, `WcfDataQueryProtocolTests`, `StoreForwardOutboxTests`, `RedundancyTests`, version-gate tests. Fix the surviving test `.csproj` ProjectReference path to the new vendored client location.
3. Keep namespace `AVEVA.Historian.Client` as-is (eases re-sync). Write `VENDORING.md` recording: source repo `gitea.dohertylan.com/dohertj2/histsdk`, the commit/date of the snapshot, and "do not hand-edit; re-vendor from upstream."
4. Run: `dotnet build ~/Desktop/HistorianGateway/src/vendor/AVEVA.Historian.Client/AVEVA.Historian.Client.csproj` then `dotnet test ~/Desktop/HistorianGateway/tests/AVEVA.Historian.Client.Tests/` — Expected: build PASS; golden/offline tests PASS (live env-gated tests skip).
5. Commit (in the new repo, after Task 8 inits it — if running before Task 8, defer the commit): `chore(vendor): vendor histsdk AVEVA.Historian.Client + golden tests`
### Task 8: Initialize the sidecar repo + solution + Directory.Build.props
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (Task 7 output is added here)
**Files:**
- Create: `~/Desktop/HistorianGateway/.gitignore`
- Create: `~/Desktop/HistorianGateway/Directory.Build.props`
- Create: `~/Desktop/HistorianGateway/ZB.MOM.WW.HistorianGateway.slnx`
**Steps:**
1. `git -C ~/Desktop/HistorianGateway init` (this IS its own app repo — unlike shared libs). Add a .NET `.gitignore`.
2. `Directory.Build.props`: `net10.0`, `Nullable`/`ImplicitUsings` enable, `<Platforms>x64</Platforms>`, `<PlatformTarget>x64</PlatformTarget>`, common `LangVersion`.
3. Create `.slnx` referencing: `src/vendor/AVEVA.Historian.Client`, `tests/AVEVA.Historian.Client.Tests` (and the projects added in later phases — add them as created).
4. Run: `dotnet build ~/Desktop/HistorianGateway/ZB.MOM.WW.HistorianGateway.slnx` — Expected: PASS.
5. Commit: `chore: init repo + solution + Directory.Build.props` (then commit Task 7's vendored tree if that commit was deferred).
---
## Phase 2 — Host + configuration + shared-package wiring
### Task 9: Create the Contracts project + historian_gateway.proto skeleton
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1
**Files:**
- Create: `~/Desktop/HistorianGateway/src/ZB.MOM.WW.HistorianGateway.Contracts/ZB.MOM.WW.HistorianGateway.Contracts.csproj`
- Create: `.../Contracts/Protos/historian_gateway.proto`
**Steps:**
1. `.csproj` net10.0, `Grpc.AspNetCore` 2.76.0, `<Protobuf Include="Protos\*.proto" GrpcServices="Both" />`.
2. Author `historian_gateway.proto` (`package historian_gateway.v1; option csharp_namespace = "ZB.MOM.WW.HistorianGateway.Contracts.Grpc";`) with the **service stubs and message shells** for the 4 historian services: `HistorianRead` (ReadRaw/ReadAggregate/ReadBlocks/ReadEvents server-streaming, ReadAtTime unary), `HistorianWrite` (AddHistoricalValues, SendEvent, WriteLiveValues), `HistorianTags` (BrowseTagNames streaming, GetTagMetadata, EnsureTags, DeleteTags, RenameTags, AddTagExtendedProperties), `HistorianStatus` (Probe, GetConnectionStatus, GetStoreForwardStatus, GetSystemParameter). Map the message fields to the vendored `HistorianSample`/`HistorianAggregateSample`/`HistorianEvent`/`HistorianTagMetadata`/`HistorianHistoricalValue` shapes (timestamps as `google.protobuf.Timestamp`, `RetrievalMode` as an enum mirroring the SDK's 15 modes).
3. Run: `dotnet build` — Expected: PASS; gateway gRPC base classes generated. Add project to `.slnx`.
4. Commit: `feat(contracts): historian_gateway.v1 proto + generated types`
### Task 10: Create the Server project + minimal boot
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../src/ZB.MOM.WW.HistorianGateway.Server/ZB.MOM.WW.HistorianGateway.Server.csproj`
- Create: `.../Server/Program.cs`
- Create: `.../Server/appsettings.json`, `.../Server/appsettings.Development.json`
**Steps:**
1. `.csproj` (Sdk `Microsoft.NET.Sdk.Web`): PackageReferences exactly mirroring mxaccessgw's Server csproj versions — `Grpc.AspNetCore` 2.76.0, `ZB.MOM.WW.Auth.{Abstractions,Ldap,ApiKeys,AspNetCore}` 0.1.2, `ZB.MOM.WW.Audit` 0.1.0, `ZB.MOM.WW.Theme` 0.3.1, `ZB.MOM.WW.Configuration` 0.1.0, `ZB.MOM.WW.Health` 0.1.0, `ZB.MOM.WW.Telemetry`+`.Serilog` 0.1.0, `Serilog.AspNetCore`/`.Sinks.Console`/`.Sinks.File`, `Microsoft.Data.Sqlite` 10.0.7, `Microsoft.Data.SqlClient` 6.0.2, `Polly.Core` 8.6.6. ProjectReferences: Contracts + vendored `AVEVA.Historian.Client` + `ZB.MOM.WW.GalaxyRepository` (project ref to the scadaproj lib, or pkg ref to its 0.1.0 nupkg).
2. `Program.cs`: minimal `WebApplication` that calls `AddZbSerilog`/`AddZbTelemetry` (ServiceName `historian-gateway`), `builder.Services.AddGrpc()`, maps `/healthz` + `/metrics` via `MapZbHealth`/`MapZbMetrics`, boots. (Subsystems wired in later tasks.)
3. Run: `dotnet build` then `dotnet run --project .../Server` and `curl -s localhost:<port>/healthz` — Expected: 200; `curl /metrics` returns Prometheus text. Add project to `.slnx`.
4. Commit: `feat(server): host scaffold + telemetry/serilog/health boot`
### Task 11: Configuration options + validators + ConfigPreflight
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Configuration/HistorianOptions.cs` + `HistorianOptionsValidator.cs`
- Create: `.../Server/Configuration/GalaxyOptions.cs` (thin wrapper / reuse `GalaxyRepositoryOptions`)
- Create: `.../Server/Configuration/RuntimeDbOptions.cs` + validator (SQL live-write)
- Create: `.../Server/Configuration/RedundancyOptions.cs` + validator
- Create: `.../Server/Configuration/StoreForwardOptions.cs` + validator
- Modify: `.../Server/Program.cs` (register `AddValidatedOptions<,>` + run `ConfigPreflight`)
- Test: `.../tests/ZB.MOM.WW.HistorianGateway.Tests/Configuration/ValidatorTests.cs`
**Steps:**
1. **Write failing validator tests first** using `OptionsValidatorBase`/`ValidationBuilder` semantics (e.g., missing `Historian:Host` → failure; bad port → failure; `Transport` one-of; redundancy `MinCount(members,1)` when enabled). Run — Expected: FAIL.
2. Implement options records + validators (subclass `OptionsValidatorBase<T>`, use `ValidationBuilder.Required/Port/HostPort/OneOf/PositiveTimeSpan/MinCount`). Map `HistorianOptions` → vendored `HistorianClientOptions` (Host, Port default 32565, `Transport=RemoteGrpc`, `GrpcUseTls`, credentials, `AllowUntrustedServerCertificate`).
3. In `Program.cs`, `AddValidatedOptions<,>` each, and run a `ConfigPreflight` (RequireValue host, RequirePort) before host build.
4. Run: `dotnet test` — Expected: PASS.
5. Commit: `feat(server): validated options + ConfigPreflight`
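The validator semantics in steps 1 and 2 can be sketched in a few lines. This is a language-neutral illustration in Python, not the shared `ValidationBuilder` API: the class and method names below mirror the ones the plan mentions, but their signatures, the error strings, and the allowed `Transport` set are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationBuilder:
    """Collects failures instead of raising, so one pass reports every problem."""
    errors: list = field(default_factory=list)

    def required(self, name, value):
        if value is None or not str(value).strip():
            self.errors.append(f"{name} is required")
        return self

    def port(self, name, value):
        if not isinstance(value, int) or not 1 <= value <= 65535:
            self.errors.append(f"{name} must be a port in 1..65535")
        return self

    def one_of(self, name, value, allowed):
        if value not in allowed:
            self.errors.append(f"{name} must be one of {sorted(allowed)}")
        return self

    def min_count(self, name, items, minimum):
        if len(items) < minimum:
            self.errors.append(f"{name} needs at least {minimum} item(s)")
        return self


def validate_historian(opts):
    b = ValidationBuilder()
    b.required("Historian:Host", opts.get("Host"))
    b.port("Historian:Port", opts.get("Port", 32565))  # default port from the plan
    # Only RemoteGrpc is named in the plan; the real allowed set may be wider.
    b.one_of("Historian:Transport", opts.get("Transport", "RemoteGrpc"), {"RemoteGrpc"})
    return b.errors


def validate_redundancy(opts):
    b = ValidationBuilder()
    if opts.get("Enabled"):  # members are only required when redundancy is on
        b.min_count("Redundancy:Members", opts.get("Members", []), 1)
    return b.errors
```

Collecting errors rather than failing on the first one matches the "missing host and bad port both fail" style of test in step 1: a single run surfaces every misconfiguration.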
---
## Phase 3 — Connection layer (vendored client → gateway)
### Task 12: `IHistorianClient` seam over the vendored client
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Historian/IHistorianClient.cs` (interface mirroring the read/write methods the services need)
- Create: `.../Server/Historian/VendoredHistorianClient.cs` (adapts `AVEVA.Historian.Client.HistorianClient`)
- Test: `.../tests/.../Historian/HistorianClientSeamTests.cs`
**Steps:**
1. **Write failing test** that a `FakeHistorianClient : IHistorianClient` can be substituted and returns canned samples (this seam is what makes the gRPC services unit-testable without a live historian). Run — Expected: FAIL.
2. Define `IHistorianClient` with the methods the services call (ReadRaw/ReadAggregate/ReadAtTime/ReadBlocks/ReadEvents/BrowseTagNames/GetTagMetadata/Probe/GetConnectionStatus/GetStoreForwardStatus/GetSystemParameter/AddHistoricalValues/SendEvent/EnsureTag/DeleteTag/RenameTags/AddTagExtendedProperties). Implement `VendoredHistorianClient` delegating to the real `HistorianClient`.
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(historian): IHistorianClient seam + vendored adapter`
### Task 13: Connection pool (pre-authenticated, reused, health-checked)
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Historian/HistorianConnectionPool.cs` (+ `IHistorianConnectionPool`)
- Modify: `.../Server/Program.cs` (DI singleton)
- Test: `.../tests/.../Historian/HistorianConnectionPoolTests.cs`
**Steps:**
1. **Write failing test** asserting the pool opens/authenticates a connection once and reuses it across N borrow calls (count handshakes via a fake transport/lease factory), and that a faulted connection is evicted + re-created. Run — Expected: FAIL.
2. Implement a lease-based pool keyed by target; lazy-open with the auth handshake once; reuse; `SemaphoreSlim`-guarded reconnect on fault; expose `Lease()` returning a pooled `IHistorianClient`. (The vendored client is `IAsyncDisposable`; the pool owns lifecycle.)
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(historian): pooled pre-authenticated connection pool`
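The behavior the failing test in step 1 pins down (one handshake, reuse across borrows, evict and re-create on fault) can be sketched as follows. This is a minimal Python illustration of the idea, not the planned C# `HistorianConnectionPool`: `FakeClient`, `open_and_authenticate`, and the `faulted` flag are hypothetical, and an `asyncio.Lock` stands in for the `SemaphoreSlim` guard.

```python
import asyncio


class FakeClient:
    """Stand-in for a pooled client; `faulted` simulates a broken connection."""
    def __init__(self, target):
        self.target = target
        self.faulted = False


class ConnectionPool:
    def __init__(self, open_and_authenticate):
        self._open = open_and_authenticate   # async factory: connect + auth handshake
        self._clients = {}                   # one pooled client per target
        self._gate = asyncio.Lock()          # plays the role of the SemaphoreSlim guard

    async def lease(self, target):
        async with self._gate:
            client = self._clients.get(target)
            if client is None or client.faulted:   # lazy open, or evict + re-create
                client = await self._open(target)
                self._clients[target] = client
            return client


async def demo():
    handshakes = 0

    async def open_and_authenticate(target):
        nonlocal handshakes
        handshakes += 1          # the test counts handshakes through the fake factory
        return FakeClient(target)

    pool = ConnectionPool(open_and_authenticate)
    first = await pool.lease("historian-a")
    for _ in range(4):
        assert await pool.lease("historian-a") is first
    after_reuse = handshakes
    first.faulted = True         # simulate a fault on the pooled connection
    second = await pool.lease("historian-a")
    return after_reuse, handshakes, second is not first
```

Holding the gate across the open call keeps concurrent borrowers from racing into duplicate handshakes, at the cost of serializing leases while a reconnect is in flight.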
### Task 14: Store-forward + redundancy + SQL live-write wiring
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Historian/HistorianWriteCoordinator.cs` (routes writes → pool, store-forward, or redundancy per config)
- Create: `.../Server/Historian/SqlLiveValueWriter.cs` (`WriteLiveValues` via `aaAnalogTagInsert` + `INSERT INTO History`)
- Modify: `.../Server/Program.cs`
- Test: `.../tests/.../Historian/HistorianWriteCoordinatorTests.cs`, `.../SqlLiveValueWriterTests.cs`
**Steps:**
1. **Write failing tests:** (a) when store-forward enabled + historian unreachable, the coordinator enqueues (uses vendored `HistorianStoreForwardWriter` over a fake sink) and reports `Queued`; (b) when redundancy configured, it fans out via `HistorianRedundantClient` and returns per-member results under All/Any; (c) `SqlLiveValueWriter` builds the correct parameterized command sequence (assert against a fake `IDbCommand` recorder — no live SQL). Run — Expected: FAIL.
2. Implement the coordinator (compose vendored `HistorianStoreForwardWriter` + `HistorianRedundantClient` from config) and `SqlLiveValueWriter` (omit the server-managed `Quality` column; honor the storage-activation note from the SQL reference memory).
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(historian): write coordinator (store-forward + redundancy) + SQL live-write`
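The two routing rules in test cases (a) and (b) reduce to a small amount of control flow. The sketch below is a Python illustration under stated assumptions: `Unreachable`, `WriteStatus`, and the callable-based members are hypothetical stand-ins, and the vendored `HistorianStoreForwardWriter` and `HistorianRedundantClient` may behave differently.

```python
from enum import Enum


class WriteStatus(Enum):
    WRITTEN = "Written"
    QUEUED = "Queued"


class Unreachable(Exception):
    """Hypothetical signal that the historian could not be reached."""


def write(value, send, queue, store_forward_enabled):
    """Try the live path first; fall back to the store-forward queue if allowed."""
    try:
        send(value)
        return WriteStatus.WRITTEN
    except Unreachable:
        if not store_forward_enabled:
            raise                      # no fallback configured: surface the failure
        queue.append(value)            # deferred until the historian is back
        return WriteStatus.QUEUED


def fan_out(value, members, policy):
    """Send to every member; policy 'All' needs every ack, 'Any' needs at least one."""
    results = {}
    for name, send in members.items():
        try:
            send(value)
            results[name] = True
        except Unreachable:
            results[name] = False
    acks = results.values()
    succeeded = all(acks) if policy == "All" else any(acks)
    return succeeded, results
```

Returning the per-member map alongside the aggregate verdict lets the gRPC reply report exactly which member missed the write.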
---
## Phase 4 — gRPC services + auth interceptor
### Task 15: `HistorianRead` service (representative TDD task; sets the pattern)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 17 after the mapper exists
**Files:**
- Create: `.../Server/Grpc/HistorianReadService.cs`
- Create: `.../Server/Grpc/HistorianProtoMapper.cs` (SDK model ↔ proto)
- Modify: `.../Server/Program.cs` (`MapGrpcService`)
- Test: `.../tests/.../Grpc/HistorianReadServiceTests.cs`
**Steps:**
1. **Write failing test:** with a `FakeHistorianClient` yielding 3 `HistorianSample`s, calling `ReadRaw` streams 3 mapped proto rows; `ReadAggregate` passes the right `RetrievalMode`+interval; an unknown tag → `RpcException(NotFound)`; bad time range → `InvalidArgument`. Use an in-memory `IServerStreamWriter<T>` capture. Run — Expected: FAIL.
2. Implement `HistorianReadService : HistorianRead.HistorianReadBase` consuming `IHistorianConnectionPool.Lease()`; implement `HistorianProtoMapper` (Timestamp conversions, RetrievalMode enum map). Map exceptions per design §7.
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(grpc): HistorianRead service + proto mapper`
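As a rough illustration of what the read path and mapper have to do, the Python sketch below validates the time range, rejects unknown tags, and converts timestamps to the `(seconds, nanos)` pair that `google.protobuf.Timestamp` carries. Everything here is an assumption for illustration: the dict-based fake client, the `RpcError` class, and the row shape are not the planned C# types.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class RpcError(Exception):
    """Hypothetical stand-in for an RpcException carrying a status code."""
    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code


def to_timestamp(dt):
    """Return (seconds, nanos) since the Unix epoch, as protobuf Timestamp stores them."""
    if dt.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    delta = dt - EPOCH
    return delta.days * 86400 + delta.seconds, delta.microseconds * 1000


def read_raw(client, tag, start, end):
    """Stream mapped rows for one tag; `client` maps tag name -> [(time, value), ...]."""
    if start >= end:
        raise RpcError("InvalidArgument", "start must be before end")
    samples = client.get(tag)
    if samples is None:
        raise RpcError("NotFound", f"unknown tag {tag!r}")
    for when, value in samples:
        yield {"timestamp": to_timestamp(when), "value": value}
```

Because `read_raw` is a generator, the validation errors surface when the caller starts iterating, which is roughly when a server-streaming handler would begin writing to the response stream.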
### Task 16: `HistorianWrite` service
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 17, Task 18 (no file overlap)
**Files:** Create `.../Server/Grpc/HistorianWriteService.cs`; Modify `Program.cs`; Test `.../Grpc/HistorianWriteServiceTests.cs`
**Steps:** TDD per the Task 15 pattern. `AddHistoricalValues`/`SendEvent` route through `HistorianWriteCoordinator`; `WriteLiveValues` through `SqlLiveValueWriter`. Map `ProtocolEvidenceMissingException` → `Unimplemented`, unreachable+store-forward → `OK` with `Queued` status, and surface redundancy per-member results in the reply. Commit: `feat(grpc): HistorianWrite service`.
### Task 17: `HistorianTags` service
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 16, Task 18
**Files:** Create `.../Server/Grpc/HistorianTagsService.cs`; Modify `Program.cs`; Test `.../Grpc/HistorianTagsServiceTests.cs`
**Steps:** TDD. `BrowseTagNames` (streaming), `GetTagMetadata`, `EnsureTags`/`DeleteTags`/`RenameTags`/`AddTagExtendedProperties` via the seam/pool. Map unsupported tag types (`ProtocolEvidenceMissingException`) → `FailedPrecondition`. Commit: `feat(grpc): HistorianTags service`.
### Task 18: `HistorianStatus` service
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 16, Task 17
**Files:** Create `.../Server/Grpc/HistorianStatusService.cs`; Modify `Program.cs`; Test `.../Grpc/HistorianStatusServiceTests.cs`
**Steps:** TDD. `Probe`/`GetConnectionStatus`/`GetStoreForwardStatus`/`GetSystemParameter`. Commit: `feat(grpc): HistorianStatus service`.
### Task 19: Galaxy gRPC wiring (consume the shared lib)
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 16–18
**Files:** Modify `.../Server/Program.cs` (`AddZbGalaxyRepository(config, "Galaxy")` + `MapZbGalaxyRepository()`); Modify `appsettings.json`
**Steps:** Register the shared lib's service + refresh hosted service; add `Galaxy:ConnectionString` config. Run: `dotnet run`, then call `DiscoverHierarchy` with grpcurl against a fake/empty config. Expected: `Unavailable` until the cache loads (no live DB needed to prove wiring). Commit: `feat(server): wire shared GalaxyRepository gRPC service`.
### Task 20: API-key auth interceptor + scope resolver
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Security/GatewayGrpcScopeResolver.cs` (maps request type → scope)
- Create: `.../Server/Security/GatewayGrpcAuthorizationInterceptor.cs`
- Create: `.../Server/Security/GatewayScopes.cs` (`historian:read|write`, `historian:tags:write`, `galaxy:read`)
- Modify: `.../Server/Program.cs` (`AddZbApiKeyAuth` + `AddGrpc(o => o.Interceptors.Add<...>())`)
- Test: `.../tests/.../Security/GrpcAuthorizationTests.cs`
**Steps:**
1. **Write failing tests:** missing/invalid key → `Unauthenticated`; valid key without the required scope → `PermissionDenied`; valid key with scope → continuation runs. Fake `IApiKeyVerifier`. Run — Expected: FAIL.
2. Implement modeled on mxaccessgw's `GatewayGrpcAuthorizationInterceptor` + `GatewayGrpcScopeResolver` (switch on request type → scope), using shared `IApiKeyVerifier.VerifyAsync`. Respect a `Disabled` auth mode for dev.
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(security): gRPC API-key interceptor + scope enforcement`
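The three outcomes in step 1 follow from a simple decision order: authenticate first, then check scope. The Python sketch below illustrates that order only. The scope strings come from `GatewayScopes` in the file list above; the request type names, the `verify` callable (returning a key's scopes, or `None` for an unknown key), and the fail-closed handling of unmapped request types are assumptions, not mxaccessgw's actual interceptor.

```python
UNAUTHENTICATED = "Unauthenticated"
PERMISSION_DENIED = "PermissionDenied"
OK = "OK"

# Request type -> required scope. The request names are hypothetical examples.
SCOPE_BY_REQUEST = {
    "ReadRawRequest": "historian:read",
    "AddHistoricalValuesRequest": "historian:write",
    "EnsureTagsRequest": "historian:tags:write",
    "DiscoverHierarchyRequest": "galaxy:read",
}


def authorize(request_type, api_key, verify, disabled=False):
    """Return the status the interceptor would produce before calling the handler."""
    if disabled:                       # dev-only auth mode: skip all checks
        return OK
    scopes = verify(api_key) if api_key else None
    if scopes is None:                 # missing or invalid key
        return UNAUTHENTICATED
    required = SCOPE_BY_REQUEST.get(request_type)
    if required is None or required not in scopes:
        return PERMISSION_DENIED       # unmapped request types fail closed
    return OK
```

Denying request types that have no scope mapping means a newly added RPC is blocked until someone deliberately assigns it a scope, rather than being exposed by omission.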
---
## Phase 5 — Audit
### Task 21: Canonical SQLite audit writer + actor accessor + wiring
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 22 (dashboard auth) after interfaces exist
**Files:**
- Create: `.../Server/Audit/SqliteAuditWriter.cs` (`IAuditWriter`), `.../Server/Audit/HttpAuditActorAccessor.cs` (`IAuditActorAccessor`)
- Modify: write services (Tasks 16,17) + interceptor (Task 20) to emit `AuditEvent`s
- Modify: `.../Server/Program.cs` (`AddZbAudit` + register writer/actor)
- Test: `.../tests/.../Audit/SqliteAuditWriterTests.cs`
**Steps:**
1. **Write failing test:** writing an `AuditEvent` persists a row with the canonical 9 fields (`EventId`/`OccurredAtUtc`/`Actor`/`Action`/`Outcome`/`Category`/`Target`/`SourceNode`/`DetailsJson`), domain fields in `DetailsJson`; writer swallows internal errors. Use an in-memory SQLite. Run — Expected: FAIL.
2. Implement the SQLite writer (table create-if-missing) modeled on MxGateway's audit store; `HttpAuditActorAccessor` reads the Auth principal. Emit audit at tag/value/event writes, API-key admin, login/logout, with `Actor` from the accessor.
3. Run: `dotnet test` — Expected: PASS.
4. Commit: `feat(audit): canonical SQLite audit writer + actor wiring`
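A compact way to picture the writer contract from step 1 is the Python sketch below, using the standard-library `sqlite3` module with an in-memory database. The nine column names are the canonical fields listed above; the table name `AuditEvents`, the all-`TEXT` column types, and the boolean return value are assumptions made for illustration.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

FIELDS = ("EventId", "OccurredAtUtc", "Actor", "Action", "Outcome",
          "Category", "Target", "SourceNode", "DetailsJson")


class SqliteAuditWriter:
    def __init__(self, conn):
        self._conn = conn
        columns = ", ".join(f"{name} TEXT" for name in FIELDS)
        conn.execute(f"CREATE TABLE IF NOT EXISTS AuditEvents ({columns})")

    def write(self, actor, action, outcome, category, target, source_node, details):
        """Persist one event; never raise, because auditing must not break the caller."""
        try:
            row = (str(uuid.uuid4()),
                   datetime.now(timezone.utc).isoformat(),
                   actor, action, outcome, category, target, source_node,
                   json.dumps(details))      # domain-specific fields live in DetailsJson
            placeholders = ", ".join("?" * len(FIELDS))
            self._conn.execute(f"INSERT INTO AuditEvents VALUES ({placeholders})", row)
            self._conn.commit()
            return True
        except Exception:
            return False                     # swallowed: the write path carries on
```

Keeping domain data inside `DetailsJson` leaves the table schema stable when new event kinds are added.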
---
## Phase 6 — Blazor dashboard
### Task 22: Dashboard shell, LDAP cookie auth, login/logout
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `.../Server/Dashboard/Components/{App,Routes,_Imports}.razor`, `Layout/{MainLayout,LoginLayout}.razor`, `Pages/Login.razor`
- Create: `.../Server/Dashboard/DashboardServiceCollectionExtensions.cs`, `.../Dashboard/DashboardEndpointRouteBuilderExtensions.cs`, `.../Dashboard/DashboardAuthenticator.cs`, `.../Dashboard/DashboardGroupRoleMapper.cs`
- Modify: `Program.cs` (`AddGatewayDashboard` + `MapRazorComponents<App>` + auth/antiforgery middleware)
- Test: `.../tests/ZB.MOM.WW.HistorianGateway.Tests/bUnit/LayoutRenderTests.cs`
**Steps:**
1. **Write failing bUnit test** that `MainLayout` renders `<ThemeShell>` with the nav rail and `LoginCard` renders on the login page. Run — Expected: FAIL.
2. Port the dashboard shell from mxaccessgw (`App.razor` with `ThemeHead`/`ThemeScripts`, `MainLayout` with `ThemeShell`+`NavRailSection`/`NavRailItem`, `Login.razor` using `LoginCard` posting to `/auth/login`). Wire `AddZbLdapAuth(config,"Ldap")`, cookie auth via `ZbCookieDefaults.Apply`, `IGroupRoleMapper<CanonicalRole>`, `DisableLogin` switch, `IAuditActorAccessor`.
3. Run: `dotnet test` (bUnit) then `dotnet run` and load `/login` in a browser/curl — Expected: tests PASS; login page renders themed.
4. Commit: `feat(dashboard): Theme shell + LDAP cookie auth + login`
### Task 23: Status + Health pages
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 24, Task 25
**Files:** Create `.../Dashboard/Components/Pages/{StatusPage,HealthPage}.razor` (+ a `DashboardStatusService`); Test bUnit render.
**Steps:** TDD bUnit render. Status shows pool state, store-forward queue depth, redundancy members, version (from a status service reading the pool/coordinator). Commit: `feat(dashboard): status + health pages`.
### Task 24: Galaxy browser page
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 23, Task 25
**Files:** Create `.../Dashboard/Components/Pages/GalaxyBrowserPage.razor` + tree node view (port mxaccessgw `BrowsePage`/`BrowseTreeNodeView`, read-only, no add-tag); Test bUnit.
**Steps:** TDD bUnit render against the shared lib's cache. Commit: `feat(dashboard): read-only Galaxy browser`.
### Task 25: Historian console page (query + role-gated write test)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 23, Task 24
**Files:** Create `.../Dashboard/Components/Pages/HistorianConsolePage.razor` (+ `DashboardHistorianService` calling the seam/pool); Test bUnit.
**Steps:** TDD bUnit. Query form (tag, time range, raw/aggregate + mode picker) renders results; write-test panel (historical value insert / event send) visible only to Engineer+ roles via `AuthorizeView`. Commit: `feat(dashboard): historian query + role-gated write console`.
### Task 26: API-key admin page
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 23–25
**Files:** Create `.../Dashboard/Components/Pages/ApiKeysPage.razor` (+ `DashboardApiKeyManagementService` over the shared ApiKeys store); Test bUnit.
**Steps:** TDD bUnit. List/create (show secret once)/revoke keys with scope selection. Commit: `feat(dashboard): API-key admin`.
---
## Phase 7 — Telemetry meters + Health probes
### Task 27: App meters
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 28
**Files:** Create `.../Server/Observability/GatewayMetrics.cs`; Modify services/coordinator/pool to record; Modify `Program.cs` (`o.Meters=[GatewayMetrics.MeterName]`); Test `.../Observability/GatewayMetricsTests.cs`.
**Steps:** TDD with `MeterListener`. Counters/histograms: read/write counts + latency, store-forward queue depth (observable gauge), pool connection state, redundancy ack outcomes. Commit: `feat(obs): gateway meters`.
### Task 28: Health probes
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 27
**Files:** Create `.../Server/Health/{HistorianConnectionHealthCheck,StoreForwardDrainHealthCheck}.cs`; Modify `Program.cs` (`AddHealthChecks` with `GrpcDependencyHealthCheck` for historian, SQL checks for Galaxy + Runtime DB, custom checks, tagged `ZbHealthTags.Ready`); Test health-check unit tests.
**Steps:** TDD. Probes flip Unhealthy when a dependency is down (fake deps). Commit: `feat(health): historian/galaxy/runtime-db/store-forward probes`.
---
## Phase 8 — Integration, docs, repo
### Task 29: Env-gated live integration tests
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:** Create `.../tests/.../Integration/{HistorianRoundTripTests,GalaxyBrowseTests}.cs`
**Steps:** Gated on `HISTORIAN_GRPC_HOST`/`HISTORIAN_GRPC_WRITE_SANDBOX_TAG` and a Galaxy SQL connection env var; `Skip` when absent. Cover read→write→read-back via the self-cleaning sandbox-tag lifecycle and a Galaxy `DiscoverHierarchy`. Run `dotnet test` (skips locally). Commit: `test: env-gated live integration`.
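The gating rule is simply "skip unless every required variable is set". A minimal Python `unittest` sketch of that pattern follows, using the two historian variable names from the step above. The Galaxy connection variable is left out because the plan does not name it, and the test body is a placeholder rather than a real round trip.

```python
import os
import unittest

REQUIRED_ENV = ("HISTORIAN_GRPC_HOST", "HISTORIAN_GRPC_WRITE_SANDBOX_TAG")


def live_env_missing(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


class HistorianRoundTripTests(unittest.TestCase):
    @unittest.skipIf(live_env_missing(), "live historian environment not configured")
    def test_read_write_read_back(self):
        # Placeholder: a real test would write to the sandbox tag, read it back,
        # and clean up afterwards.
        host = os.environ["HISTORIAN_GRPC_HOST"]
        self.assertTrue(host)
```

Reporting the test as skipped, rather than silently passing, keeps the "live tests skipped" expectation in the green gate visible in the test output.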
### Task 30: Full-suite green gate + smoke
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none
**Steps:** Run `dotnet build ZB.MOM.WW.HistorianGateway.slnx` + `dotnet test` (whole solution) on macOS with no live env — Expected: ALL green, live tests skipped. `dotnet run` + curl `/healthz` (200), `/metrics` (text), grpcurl `HistorianStatus/Probe`. Fix any gaps. Commit: `chore: green gate + smoke`.
### Task 31: CLAUDE.md + README + gitea remote + scadaproj index
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:** Create `~/Desktop/HistorianGateway/{CLAUDE.md,README.md}`; copy the two design/plan docs into its `docs/plans/`; Modify `~/Desktop/scadaproj/CLAUDE.md` (index the new sidecar + note the GalaxyRepository follow-on for mxaccessgw).
**Steps:**
1. Write `CLAUDE.md` (overview, build/run/test commands, the no-COM single-process note, the vendored-histsdk + shared-GalaxyRepository dependencies, config sections, env vars) and `README.md`.
2. Create the gitea repo `historiangw` and push: `git -C ~/Desktop/HistorianGateway remote add origin https://gitea.dohertylan.com/dohertj2/historiangw.git && git push -u origin main` (confirm remote name/visibility with the user first).
3. Update scadaproj's umbrella `CLAUDE.md` runtime/implementation table with the new project row; commit scadaproj separately.
4. Commit: `docs: CLAUDE.md + README; index in scadaproj`.
---
## Dependency summary (for parallel dispatch)
- **Foundational:** Task 1 (galaxy lib scaffold) and Task 7 (vendor histsdk) have no blockers; Task 8 (repo init) follows Task 7 because it consumes Task 7's tree.
- **Galaxy lib chain:** 2→3→4→5→6 (sequential; share files).
- **Sidecar chain:** 8→9→10→11→12→13→14, then gRPC services 15→(16,17,18 parallel),19, then 20, then 21.
- **Dashboard:** 22→(23,24,25,26 parallel) after Task 20 (auth) + Task 13/14 (data) + Task 5/19 (galaxy).
- **Obs:** 27,28 parallel after Task 14 (Task 28 also waits on Task 19 for the Galaxy check).
- **Close-out:** 29→30→31 after everything.
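One way to sanity-check this summary is to derive dispatch waves mechanically from the `blockedBy` lists in the task-tracking JSON. The Python sketch below does that with a simple layer-by-layer pass. The dependency map is transcribed from that JSON; the function name and the wave notion are illustrative, not part of the plan's tooling.

```python
# blockedBy lists transcribed from the task-tracking JSON (task id -> prerequisites).
BLOCKED_BY = {
    1: [], 2: [1], 3: [2], 4: [3], 5: [4], 6: [5], 7: [], 8: [7], 9: [8],
    10: [9, 7, 6], 11: [10], 12: [10], 13: [12], 14: [13], 15: [13],
    16: [14, 15], 17: [13, 15], 18: [13, 15], 19: [10, 5], 20: [15],
    21: [16, 17, 20], 22: [20], 23: [22, 14], 24: [22, 19], 25: [22, 15],
    26: [22], 27: [14], 28: [14, 19], 29: [16, 17, 18, 19],
    30: [21, 26, 27, 28, 29], 31: [30],
}


def dispatch_waves(blocked_by):
    """Group tasks into waves; every task in a wave has all prerequisites done."""
    done, waves = set(), []
    remaining = dict(blocked_by)
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if set(deps) <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown prerequisite")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(dispatch_waves(BLOCKED_BY), start=1):
        print(f"wave {number}: {wave}")
```

Waves give the earliest possible start for each task if only the `blockedBy` edges are considered. File overlap (for example, several tasks modifying `Program.cs`) may still force tasks within a wave to run sequentially.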
## Notes / non-goals (from design §9)
- No `AddS2` live streaming-sample writes (GATED) — live values only via SQL `WriteLiveValues`.
- No two-process/x86 worker (no COM).
- mxaccessgw adopting `ZB.MOM.WW.GalaxyRepository` is a tracked follow-on, NOT in this plan.
@@ -0,0 +1,37 @@
{
"planPath": "docs/plans/2026-06-23-historian-gateway-implementation.md",
"tasks": [
{"id": 1, "subject": "Task 1: Scaffold the GalaxyRepository lib project", "status": "pending"},
{"id": 2, "subject": "Task 2: Port the canonical galaxy_repository.proto (neutral namespace)", "status": "pending", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: Port the SQL browse provider", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "Task 4: Port the in-memory hierarchy cache + snapshot + refresh service", "status": "pending", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Port the reusable gRPC service + DI extension", "status": "pending", "blockedBy": [4]},
{"id": 6, "subject": "Task 6: Unit tests for projector/cache; pack 0.1.0", "status": "pending", "blockedBy": [5]},
{"id": 7, "subject": "Task 7: Vendor histsdk AVEVA.Historian.Client + golden tests", "status": "pending"},
{"id": 8, "subject": "Task 8: Init sidecar repo + solution + Directory.Build.props", "status": "pending", "blockedBy": [7]},
{"id": 9, "subject": "Task 9: Contracts project + historian_gateway.proto skeleton", "status": "pending", "blockedBy": [8]},
{"id": 10, "subject": "Task 10: Server project + minimal boot (telemetry/serilog/health)", "status": "pending", "blockedBy": [9, 7, 6]},
{"id": 11, "subject": "Task 11: Configuration options + validators + ConfigPreflight", "status": "pending", "blockedBy": [10]},
{"id": 12, "subject": "Task 12: IHistorianClient seam over vendored client", "status": "pending", "blockedBy": [10]},
{"id": 13, "subject": "Task 13: Connection pool (pre-authenticated, reused)", "status": "pending", "blockedBy": [12]},
{"id": 14, "subject": "Task 14: Write coordinator (store-forward + redundancy) + SQL live-write", "status": "pending", "blockedBy": [13]},
{"id": 15, "subject": "Task 15: HistorianRead service + proto mapper", "status": "pending", "blockedBy": [13]},
{"id": 16, "subject": "Task 16: HistorianWrite service", "status": "pending", "blockedBy": [14, 15]},
{"id": 17, "subject": "Task 17: HistorianTags service", "status": "pending", "blockedBy": [13, 15]},
{"id": 18, "subject": "Task 18: HistorianStatus service", "status": "pending", "blockedBy": [13, 15]},
{"id": 19, "subject": "Task 19: Galaxy gRPC wiring (consume shared lib)", "status": "pending", "blockedBy": [10, 5]},
{"id": 20, "subject": "Task 20: API-key auth interceptor + scope resolver", "status": "pending", "blockedBy": [15]},
{"id": 21, "subject": "Task 21: SQLite audit writer + actor accessor + wiring", "status": "pending", "blockedBy": [16, 17, 20]},
{"id": 22, "subject": "Task 22: Dashboard shell + LDAP cookie auth + login", "status": "pending", "blockedBy": [20]},
{"id": 23, "subject": "Task 23: Status + Health pages", "status": "pending", "blockedBy": [22, 14]},
{"id": 24, "subject": "Task 24: Galaxy browser page", "status": "pending", "blockedBy": [22, 19]},
{"id": 25, "subject": "Task 25: Historian console page (role-gated write)", "status": "pending", "blockedBy": [22, 15]},
{"id": 26, "subject": "Task 26: API-key admin page", "status": "pending", "blockedBy": [22]},
{"id": 27, "subject": "Task 27: App meters", "status": "pending", "blockedBy": [14]},
{"id": 28, "subject": "Task 28: Health probes", "status": "pending", "blockedBy": [14, 19]},
{"id": 29, "subject": "Task 29: Env-gated live integration tests", "status": "pending", "blockedBy": [16, 17, 18, 19]},
{"id": 30, "subject": "Task 30: Full-suite green gate + smoke", "status": "pending", "blockedBy": [21, 26, 27, 28, 29]},
{"id": 31, "subject": "Task 31: CLAUDE.md + README + gitea remote + scadaproj index", "status": "pending", "blockedBy": [30]}
],
"lastUpdated": "2026-06-23"
}

Some files were not shown because too many files have changed in this diff