Phase 3 PR 36 — AVEVA prerequisites test-support library #35

Merged
dohertj2 merged 1 commit from phase-3-pr36-aveva-prerequisites into v2 2026-04-18 16:44:42 -04:00

Closes the gap where live-Galaxy smoke tests silently returned 'unreachable' without telling operators which specific piece failed. Adds a new multi-targeted test-support library (net10.0 + net48 so both modern and MXAccess-COM x86 consumers can reference it) exposing AvevaPrerequisites.CheckAllAsync / CheckRepositoryOnlyAsync / CheckGalaxyHostPipeOnlyAsync.

Eight probe categories

| Category | What's checked |
| --- | --- |
| Environment | Process bitness (MXAccess COM is 32-bit-only — warn on 64-bit hosts). |
| AvevaInstall | HKLM\SOFTWARE\WOW6432Node\ArchestrA\Framework registry keys, deployed-Platform marker, pending-reboot flag. |
| AvevaCoreService | aaBootstrap + aaGR + NmxSvc + MSSQLSERVER — hard fail if missing. |
| AvevaSoftService | aaLogger + aaUserValidator + aaGlobalDataCacheMonitorSvr — warn only. |
| MxAccessCom | LMXProxy.LMXProxyServer ProgID → CLSID → 32-bit InprocServer32 file-on-disk. |
| GalaxyRepository | SQL Server reachable (distinguishes 'unreachable' from 'ZB db missing'), deployed-object count. |
| AvevaHistorian | aahClientAccessPoint + aahGateway — opt-in, only matters for HistoryRead. |
| OtOpcUaService | OtOpcUaGalaxyHost + OtOpcUa + GLAuth + optional named-pipe probe. |

Each check returns a PrerequisiteCheck record: Name (service:aaBootstrap, com:LMXProxy, …), Category, Status (Pass/Warn/Fail/Skip), and operator-facing Detail.
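
The row shape described above could look roughly like the following. This is a sketch under assumptions: only the record name PrerequisiteCheck, its four property names, the eight category names, and the four status values come from this PR; the enum type names and the member order are guesses.

```csharp
// Sketch only. PrerequisiteStatus and PrerequisiteCategory are assumed type names.
public enum PrerequisiteStatus { Pass, Warn, Fail, Skip }

public enum PrerequisiteCategory
{
    Environment,
    AvevaInstall,
    AvevaCoreService,
    AvevaSoftService,
    MxAccessCom,
    GalaxyRepository,
    AvevaHistorian,
    OtOpcUaService
}

// Positional record: one row per probed item.
public sealed record PrerequisiteCheck(
    string Name,                     // e.g. "service:aaBootstrap" or "com:LMXProxy"
    PrerequisiteCategory Category,   // which of the eight probe categories produced the row
    PrerequisiteStatus Status,       // Pass / Warn / Fail / Skip
    string Detail);                  // operator-facing explanation
```

A positional record keeps rows immutable and gives value equality for free, which is convenient when tests compare expected and actual rows.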

Report surface

  • IsLivetestReady — no Fail anywhere.
  • IsAvevaSideReady — AVEVA categories clean, lets v2 services be absent while still considering the environment AVEVA-ready (important while the v2 stack is mid-development).
  • SkipReason — multi-line message for Assert.Skip when any hard dependency failed; lists every Fail with its operator-actionable detail.
  • Warnings — separate text for Warn rows (degraded but not blocking).
  • RequireCategories(...) — throws when specific categories fail; lets a test be strict about what it needs.
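
To make the aggregation rules concrete, here is a minimal sketch of how such a report might derive its members from the rows. The member names mirror the list above; everything else is assumed for illustration, including the simplified Check row, the string-typed category, the empty-string SkipReason when nothing failed, and the choice to treat every category except OtOpcUaService as "AVEVA side".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Status { Pass, Warn, Fail, Skip }

// Simplified stand-in for a check row (hypothetical; categories are plain strings here).
public sealed record Check(string Name, string Category, Status Status, string Detail);

// Hypothetical report type: member names follow the PR text, bodies are assumptions.
public sealed class PrerequisiteReport
{
    private readonly List<Check> _checks;

    public PrerequisiteReport(IEnumerable<Check> checks) => _checks = checks.ToList();

    // Ready for live tests only when no row failed.
    public bool IsLivetestReady => _checks.All(c => c.Status != Status.Fail);

    // Same rule, but rows for our own v2 services are ignored (assumed category split).
    public bool IsAvevaSideReady =>
        _checks.Where(c => c.Category != "OtOpcUaService").All(c => c.Status != Status.Fail);

    // One line per failed row; empty when nothing failed.
    public string SkipReason =>
        string.Join(Environment.NewLine,
            _checks.Where(c => c.Status == Status.Fail).Select(c => $"{c.Name}: {c.Detail}"));

    // Throws when any row in the named categories failed.
    public void RequireCategories(params string[] categories)
    {
        var failed = _checks
            .Where(c => c.Status == Status.Fail && Array.IndexOf(categories, c.Category) >= 0)
            .Select(c => c.Name)
            .ToList();
        if (failed.Count > 0)
            throw new InvalidOperationException(
                "Required prerequisites failed: " + string.Join(", ", failed));
    }
}
```

With a report built from a passing aaBootstrap row and a failing OtOpcUaGalaxyHost row, this sketch yields IsLivetestReady == false and IsAvevaSideReady == true, which matches the dev-box state the tests below describe.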

Probe details worth flagging

  • ServiceProbe — treats DemandStart+Stopped as Warn (NmxSvc is DemandStart by design). AutoStart+Stopped = Fail. Not-installed = Fail for hard-required, Warn for soft.
  • RegistryProbe — PfeConfigOptions with PlatformId=0 means never deployed → surfaces the 'MXAccess will connect but every subscription is Bad quality' case that would otherwise be opaque to debug. RebootRequired post-patch surfaces as a loud warn.
  • MxAccessComProbe — resolves ProgID → CLSID → InprocServer32 file, catches orphan-registry installs where the DLL was removed but the registry entry remains. Also warns when the test process is 64-bit (explains REGDB_E_CLASSNOTREG in advance).
  • SqlProbe — distinguishes SQL unreachable (can't connect) from ZB missing (SELECT DB_ID('ZB') returns null) because remediation differs. Counts gobject WHERE deployed_version > 0 as a secondary warn-on-zero signal.
  • NamedPipeProbe — 2s NamedPipeClientStream.ConnectAsync against OtOpcUaGalaxy; disconnects immediately so we don't consume a session slot.
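
The ServiceProbe rules can be read as a small decision table. The sketch below expresses them as a pure function over plain inputs rather than System.ServiceProcess.ServiceController, so it can run anywhere; the type and member names are hypothetical. It applies the rules literally as written above, with transitional states mapped to Warn as the commit message describes. Whether a stopped soft service is capped at Warn is not stated in this PR, so the sketch does not model that.

```csharp
public enum ProbeStatus { Pass, Warn, Fail, Skip }
public enum StartKind { Auto, Demand }              // hypothetical stand-in for the service start mode
public enum RunState { Running, Stopped, Pending }  // Pending covers StartPending and similar

public static class ServiceRules
{
    // installed == false models a service that is not registered on the machine.
    public static ProbeStatus Classify(bool installed, StartKind start, RunState state, bool hardRequired)
    {
        // Not installed: Fail for hard-required services, Warn for soft ones.
        if (!installed)
            return hardRequired ? ProbeStatus.Fail : ProbeStatus.Warn;

        if (state == RunState.Running)
            return ProbeStatus.Pass;

        // Transitional states are worth a retry, so they only warn.
        if (state == RunState.Pending)
            return ProbeStatus.Warn;

        // Stopped: a demand-start service (such as NmxSvc) is expected to be idle.
        return start == StartKind.Demand ? ProbeStatus.Warn : ProbeStatus.Fail;
    }
}
```

Keeping the classification separate from the Service Control Manager query makes the rules unit-testable without a live AVEVA install.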

Multi-targeting

net48 gets System.ServiceProcess + Microsoft.Win32 in-box via BCL references; net10 gets the NuGet packages. Microsoft.Data.SqlClient v6 supports both. Net48Polyfills.cs provides IsExternalInit (records) + SupportedOSPlatformAttribute stub so the same sources compile on both frameworks with no per-callsite preprocessor guards.
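
For readers unfamiliar with this pattern, the two shims typically look something like the sketch below. This is not the actual content of Net48Polyfills.cs. The file-level `#if` guard is an assumption (the project could equally include the file conditionally from the csproj), and the attribute targets listed are a reasonable guess at what a stub would allow.

```csharp
#if !NET5_0_OR_GREATER
// Sketch of a polyfill file for older target frameworks; compiled out on net5.0 and later.

namespace System.Runtime.CompilerServices
{
    // The C# compiler looks this type up by name to enable init-only setters and records.
    internal static class IsExternalInit { }
}

namespace System.Runtime.Versioning
{
    // Inert stand-in so [SupportedOSPlatform("windows")] compiles on frameworks that lack it.
    [AttributeUsage(
        AttributeTargets.Assembly | AttributeTargets.Class | AttributeTargets.Constructor |
        AttributeTargets.Enum | AttributeTargets.Event | AttributeTargets.Field |
        AttributeTargets.Interface | AttributeTargets.Method | AttributeTargets.Module |
        AttributeTargets.Property | AttributeTargets.Struct,
        AllowMultiple = true, Inherited = false)]
    internal sealed class SupportedOSPlatformAttribute : Attribute
    {
        public SupportedOSPlatformAttribute(string platformName) => PlatformName = platformName;

        public string PlatformName { get; }
    }
}
#endif
```

Because the guard lives in one file, probe sources can use records and `[SupportedOSPlatform("windows")]` unconditionally on both targets.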

Tests

AvevaPrerequisitesLiveTests (8 new Category=LiveGalaxy cases in Galaxy.Host.Tests) exercises the helper against this live dev box:

  • Helper reports Framework install as Pass.
  • aaBootstrap / aaGR running as Pass.
  • MxAccess COM registered → Pass.
  • ZB reachable + > 0 deployed objects → Pass.
  • IsAvevaSideReady == true even when our v2 services aren't installed (dev-box in-progress state).
  • Helper emits rows for OtOpcUaGalaxyHost + OtOpcUa + GLAuth even when not installed — regression guard so nobody can silently drop our own services from the check.

GalaxyRepositoryLiveSmokeTests updated to delegate its skip decision to AvevaPrerequisites.CheckRepositoryOnlyAsync (legacy ZbReachableAsync kept as a compat adapter while surrounding fixtures migrate to Assert.Skip-with-reason).

Test posture

  • Galaxy.Host.Tests Category=LiveGalaxy: 13 pass / 0 fail (5 prior smoke + 8 new prerequisites).
  • Full solution build clean — 0 errors.

What's next

The end-to-end live-Galaxy stack smoke (Proxy → Host pipe → MXAccess → real Galaxy tag) is the next PR. It'll call AvevaPrerequisites.CheckAllAsync at fixture construction and Assert.Skip(report.SkipReason) when the environment isn't ready — replacing 'silent return' with actionable diagnostics.

dohertj2 added 1 commit 2026-04-18 16:44:39 -04:00
AvevaPrerequisites.CheckAllAsync walks eight probe categories producing PrerequisiteCheck rows each with Name (e.g. 'service:aaBootstrap', 'sql:ZB', 'com:LMXProxy', 'registry:ArchestrA.Framework'), Category (AvevaCoreService / AvevaSoftService / AvevaInstall / MxAccessCom / GalaxyRepository / AvevaHistorian / OtOpcUaService / Environment), Status (Pass / Warn / Fail / Skip), and operator-facing Detail message. Report aggregates them: IsLivetestReady (no Fails anywhere) and IsAvevaSideReady (AVEVA-side categories pass, our v2 services can be absent while still considering the environment AVEVA-ready) so different test tiers can use the right threshold.
Individual probes:
ServiceProbe.Check queries the Windows Service Control Manager via System.ServiceProcess.ServiceController — treats DemandStart+Stopped as Warn (NmxSvc is DemandStart by design; master pulls it up) but AutoStart+Stopped as Fail; not-installed is Fail for hard-required services, Warn for soft ones; non-Windows hosts get Skip; transitional states like StartPending get Warn with a 'try again' hint.
RegistryProbe reads HKLM\SOFTWARE\WOW6432Node\ArchestrA\{Framework,Framework\Platform,MSIInstall} — Framework key presence + populated InstallPath/RootPath values mean System Platform installed; PfeConfigOptions in the Platform subkey (format 'PlatformId=N,EngineId=N,...') indicates a Platform has been deployed from the IDE (PlatformId=0 means never deployed — MXAccess will connect but every subscription will be Bad quality); RebootRequired='True' under MSIInstall surfaces as a loud warn since post-patch behavior is undefined.
MxAccessComProbe resolves the LMXProxy.LMXProxyServer ProgID → CLSID → HKLM\SOFTWARE\Classes\WOW6432Node\CLSID\{guid}\InprocServer32, verifying the registered file exists on disk (catches the orphan-registry case where a previous uninstall left the ProgID registered but the DLL is gone — distinguishes it from the 'totally not installed' case by message); also emits a Warn when the test process is 64-bit (MXAccess COM activation fails with REGDB_E_CLASSNOTREG 0x80040154 regardless of registration, so seeing this warning tells operators why the activation would fail even on a fully-installed machine).
SqlProbe tests Galaxy Repository via Microsoft.Data.SqlClient using the Windows-auth localhost connection string the repo code defaults to — distinguishes 'SQL Server unreachable' (connection fails) from 'ZB database does not exist' (SELECT DB_ID('ZB') returns null) because they have different remediation paths (sc.exe start MSSQLSERVER vs. restore from .cab backup); a secondary CheckDeployedObjectCountAsync query on 'gobject WHERE deployed_version > 0' warns when the count is zero because discovery smoke tests will return empty hierarchies.
NamedPipeProbe opens a 2s NamedPipeClientStream against OtOpcUaGalaxyHost's pipe ('OtOpcUaGalaxy' per the installer default) — pipe accepting a connection proves the Host service is listening; disconnects immediately so we don't consume a session slot.
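
As an illustration of the PfeConfigOptions check described for RegistryProbe, the value format 'PlatformId=N,EngineId=N,...' could be parsed along these lines. The class and method names are hypothetical, and the handling of malformed or missing pairs (ignored, treated as not deployed) is an assumption rather than the PR's documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not a type from this PR.
public static class PfeConfigOptionsParser
{
    // Splits "Key=Value,Key=Value" into a case-insensitive dictionary.
    // Pairs without '=' or with an empty key are skipped.
    public static Dictionary<string, string> Parse(string raw)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in (raw ?? string.Empty).Split(','))
        {
            var idx = pair.IndexOf('=');
            if (idx <= 0) continue;
            result[pair.Substring(0, idx).Trim()] = pair.Substring(idx + 1).Trim();
        }
        return result;
    }

    // True only when PlatformId is present and parses to a non-zero integer.
    public static bool IsPlatformDeployed(string raw)
    {
        return Parse(raw).TryGetValue("PlatformId", out var value)
            && int.TryParse(value, out var id)
            && id != 0;
    }
}
```

Under these assumptions, "PlatformId=0,EngineId=0" reports not deployed, while any non-zero PlatformId reports deployed.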
Service lists kept as internal static data so tests can inspect + override: CoreServices (aaBootstrap + aaGR + NmxSvc + MSSQLSERVER — hard fail if missing), SoftServices (aaLogger + aaUserValidator + aaGlobalDataCacheMonitorSvr — warn only; stack runs without them but diagnostics/auth are degraded), HistorianServices (aahClientAccessPoint + aahGateway — opt-in via Options.CheckHistorian, only matters for HistoryRead IPC paths), OtOpcUaServices (our OtOpcUaGalaxyHost hard-required for end-to-end live tests + OtOpcUa warn + GLAuth warn). Narrower entry points CheckRepositoryOnlyAsync and CheckGalaxyHostPipeOnlyAsync for tests that only care about specific subsystems — avoid paying the full probe cost on every GalaxyRepositoryLiveSmokeTests fact.
Multi-targeting mechanics: System.ServiceProcess.ServiceController + Microsoft.Win32.Registry are NuGet packages on net10 but in-box BCL references on net48; csproj conditions Package vs Reference by TargetFramework. Microsoft.Data.SqlClient v6 supports both frameworks so single PackageReference. Net48Polyfills.cs provides IsExternalInit shim (records/init-only setters) and SupportedOSPlatformAttribute stub so the same Probe sources compile on both frameworks without per-callsite preprocessor guards — lets Roslyn's platform-compatibility analyzer stay useful on net10 without breaking net48 builds.
Existing GalaxyRepositoryLiveSmokeTests updated to delegate its skip decision to AvevaPrerequisites.CheckRepositoryOnlyAsync (legacy ZbReachableAsync kept as a compatibility adapter so the in-test 'if (!await ZbReachableAsync()) return;' pattern keeps working while the surrounding fixtures gradually migrate to Assert.Skip-with-reason). Slnx file registers the new project.
Tests — AvevaPrerequisitesLiveTests (8 new Integration cases, Category=LiveGalaxy): the helper correctly reports Framework install (registry pass), aaBootstrap Running (service pass), aaGR Running (service pass), MxAccess COM registered (com pass), ZB database reachable (sql pass), deployed-object count > 0 (warn-upgraded-to-pass because this box has 49 objects deployed), the AVEVA side is ready even when our own services (OtOpcUaGalaxyHost) aren't installed yet (IsAvevaSideReady=true), and the helper emits rows for OtOpcUaGalaxyHost + OtOpcUa + GLAuth even when not installed (regression guard — nobody can accidentally ship a check that omits our own services). Full Galaxy.Host.Tests Category=LiveGalaxy suite: 13 pass (5 prior smoke + 8 new prerequisites). Full solution build clean, 0 errors.
What's NOT in this PR: end-to-end Galaxy stack smoke (Proxy → Host pipe → MXAccess → real Galaxy tag). That's the next PR — this one is the gate the end-to-end smoke will call first to produce actionable skip messages instead of silent returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 merged commit 261869d84e into v2 2026-04-18 16:44:42 -04:00
dohertj2 referenced this issue from a commit 2026-04-19 03:16:40 -04:00
Phase 6 — Draft 4 implementation plans covering v2 unimplemented features + adversarial review + adjustments. After drivers were paused per user direction, audited the v2 plan for features documented-but-unshipped and identified four coherent tracks that had no implementation plan at all. Each plan follows the docs/v2/implementation/phase-*.md template (DRAFT status, branch name, Stream A-E task breakdown, Compliance Checks, Risks, Completion Checklist). docs/v2/implementation/phase-6-1-resilience-and-observability.md (243 lines) covers Polly resilience pipelines wired to every capability interface, Tier A/B/C runtime enforcement (memory watchdog generalized beyond Galaxy, scheduled recycle per decision #67, wedge detection), health endpoints on :4841, structured Serilog with correlation IDs, LiteDB local-cache fallback per decision #36. phase-6-2-authorization-runtime.md (145 lines) wires ACL enforcement on every OPC UA Read/Write/Subscribe/Call path + LDAP-group-to-admin-role grants per decisions #105 and #129 -- runtime permission-trie evaluator over the 6-level Cluster/Namespace/UnsArea/UnsLine/Equipment/Tag hierarchy, per-session cache invalidated on generation-apply + LDAP-cache expiry. phase-6-3-redundancy-runtime.md (165 lines) lands the non-transparent warm/hot redundancy runtime per decisions #79-85: dynamic ServiceLevel node, ServerUriArray peer broadcast, mid-apply dip via sp_PublishGeneration hook, operator-driven role transition (no auto-election -- plan remains explicit about what's out of scope). phase-6-4-admin-ui-completion.md (178 lines) closes Phase 1 Stream E completion-checklist items that never landed: UNS drag-reorder + impact preview, Equipment CSV import, 5-identifier search, draft-diff viewer enhancements, OPC 40010 _base Identification field exposure per decisions #138-139. Each plan then got a Codex adversarial-review pass (codex mcp tool, read-only sandbox, synchronous). 
Reviews explicitly targeted decision-log conflicts, API-shape assumptions, unbounded blast radius, under-specified state transitions, and testing holes. Appended 'Adversarial Review — 2026-04-19' section to each plan with numbered findings (severity / finding / why-it-matters / adjustment accepted). Review surfaced real substantive issues that the initial drafts glossed over: Phase 6.1 auto-retry conflicting with decisions #44-45 no-auto-write-retry rule; Phase 6.1 per-driver-instance pipeline breaking decision #35's per-device isolation; Phase 6.1 recycle/watchdog at Tier A/B breaching decisions #73-74 Tier-C-only constraint; Phase 6.2 conflating control-plane LdapGroupRoleMapping with data-plane ACL grants; Phase 6.2 missing Browse enforcement entirely; Phase 6.2 subscription re-authorization policy unresolved between create-time-only and per-publish; Phase 6.3 ServiceLevel=0 colliding with OPC UA Part 5 Maintenance semantics; Phase 6.3 ServerUriArray excluding self (spec-bug); Phase 6.3 apply-window counter race on cancellation; Phase 6.3 client cutover for Kepware/Aveva OI Gateway is unverified hearsay; Phase 6.4 stale UNS impact preview overwriting concurrent draft edits; Phase 6.4 identifier contract drifting from admin-ui.md canonical set (ZTag/MachineCode/SAPID/EquipmentId/EquipmentUuid, not ZTag/SAPID/UniqueId/Alias1/Alias2); Phase 6.4 CSV import atomicity internally contradictory (single txn vs chunked inserts); Phase 6.4 OPC 40010 field list not matching decision #139. Every finding has an adjustment in the plan doc -- plans are meant to be executable from the next session with the critique already baked in rather than a clean draft that would run into the same issues at implementation time. Codex thread IDs cited in each plan's review section for reproducibility. Pure documentation PR -- no code changes. Plans are DRAFT status; each becomes its own implementation phase with its own entry-gate + exit-gate when business prioritizes.
Reference: dohertj2/lmxopcua#35