diff --git a/docs/plans/alarms-over-gateway.md b/docs/plans/alarms-over-gateway.md
index ce16f2b..85ed848 100644
--- a/docs/plans/alarms-over-gateway.md
+++ b/docs/plans/alarms-over-gateway.md
@@ -442,7 +442,8 @@ storage) plug into the same path.

 ### PR B.5 — docs + memory housekeeping

-**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig.
+**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig, plus D.1
+(deployment refresh) verified on the dev rig.

 **Files:**

@@ -533,6 +534,143 @@ completes that slot. Two PRs in the sidecar + one consumer-side PR
 C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends on
 C.2 being deployed.

+## Track D — deployment refresh
+
+The dev box at `DESKTOP-6JL3KKO` runs three live services from
+`C:\publish\`, installed with the scripts from commit `ea04547`. Once
+Tracks A / B / C are merged, the deployed binaries must be refreshed so
+the running services pick up the new alarm path. Track D is a single
+ops-only PR (refresh script + docs; no product-code change).
+
+### PR D.1 — refresh C:\publish + restart services
+
+**Depends on:** A.4 + B.4 + C.2 merged (every code-change PR landed).
+
+**Order matters** — services must stop in reverse-dependency order
+(`OtOpcUa` → `OtOpcUaWonderwareHistorian` → `MxAccessGw`) and start in
+forward-dependency order (`MxAccessGw` → `OtOpcUaWonderwareHistorian`
+→ `OtOpcUa`). Touching binaries while a dependent service holds them
+locked produces the publish-time `MSB3027` file-lock error caught
+during the original install (see commit `80104ca`).
+
+**Steps (run as a single PowerShell session on the deploy host):**
+
+1. **Stop in reverse order**:
+   ```powershell
+   nssm stop OtOpcUa
+   nssm stop OtOpcUaWonderwareHistorian
+   nssm stop MxAccessGw
+   Start-Sleep -Seconds 3
+   Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
+       OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
+       Stop-Process -Force
+   ```
+
+2. **Refresh mxaccessgw binaries** (Track A output):
+   ```powershell
+   $gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
+   dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
+   dotnet build "$gwSrc\src\MxGateway.Server" -c Release
+
+   Copy-Item -Recurse -Force `
+       "$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
+       "C:\publish\mxaccessgw\Server\"
+   Copy-Item -Recurse -Force `
+       "$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
+       "C:\publish\mxaccessgw\Worker\"
+   ```
+
+3. **Refresh OtOpcUa + historian sidecar binaries** (Tracks B + C
+   output):
+   ```powershell
+   $repo = "C:\Users\dohertj2\Desktop\lmxopcua"
+   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
+       -c Release -o "C:\publish\lmxopcua"
+   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
+       -c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
+   ```
+
+4. **Update the sidecar's env block (only if Track C shipped the toggle)**:
+   ```powershell
+   # Re-set OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true without duplicating it
+   # (default-on per C.2 design; an explicit value lets read-only deployments
+   # flip it to false later without re-installing the service).
+   $extra = @(nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra |
+       Where-Object { $_ -and $_ -notmatch '^OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=' })
+   nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra @extra 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true'
+   ```
+
+5. **Start in forward order**:
+   ```powershell
+   nssm start MxAccessGw
+   Start-Sleep -Seconds 4
+   nssm start OtOpcUaWonderwareHistorian
+   Start-Sleep -Seconds 4
+   nssm start OtOpcUa
+   Start-Sleep -Seconds 8
+   ```
+
+6. **Smoke verification:**
+   ```powershell
+   foreach ($s in 'MxAccessGw','OtOpcUaWonderwareHistorian','OtOpcUa') {
+       (Get-Service $s).Status
+   }
+   foreach ($p in 5120, 4840, 4841) {
+       Get-NetTCPConnection -LocalPort $p -State Listen `
+           -ErrorAction SilentlyContinue
+   }
+   Get-Content "C:\publish\lmxopcua\logs\otopcua-*.log" -Tail 20
+   Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
+   Get-Content "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" -Tail 10
+   ```
+
+   Pass criterion: all three services `Running`; ports 5120 + 4840
+   listening; sidecar log shows `Wonderware historian sidecar
+   serving — pipe=OtOpcUaWonderwareHistorian`; OtOpcUa log shows
+   `OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa`
+   and a new line `IAlarmHistorianWriter resolved: Sidecar` (added
+   in B.4).
+
+7. **Functional verification — fire one alarm of each kind and assert
+   it propagates:**
+   - **Galaxy-native** — raise the `OtOpcUaParityTest_001.Counter`
+     `$Alarm*` extension via Galaxy's alarm-fire mechanism; assert an
+     OPC UA Part 9 transition reaches a connected `otopcua-cli alarms`
+     subscriber with rich payload (operator-comment field non-null,
+     original-raise-timestamp present). This validates Track A + B.1
+     + B.2 + B.3.
+   - **Scripted** — author a one-line scripted alarm in the Admin UI
+     against any always-true predicate; assert the transition lands in
+     AVEVA Historian via `aaHistClientTrend` query (or
+     `Driver.Historian.Wonderware.IntegrationTests` with a query for
+     the alarm event). Validates Track C + B.4.
+   - **Sub-attribute fallback** — disable `IAlarmSource` on the
+     GalaxyDriver via the test seam (B.3 will introduce one); fire an
+     alarm; assert Part 9 transition still raised by the value-driven
+     path. Validates the fallback wasn't broken.
+
+**Files:**
+
+- `scripts\install\Refresh-Services.ps1` *(new — automates the above)*
+- `docs\v2\dev-environment.md` — add the refresh script to the dev
+  workflow section.
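+
+A minimal sketch of the ordering and env-merge logic that
+`Refresh-Services.ps1` might share. The helper names (`Get-StopOrder`,
+`Merge-EnvEntry`) are hypothetical; no existing script defines them.
+The stop order comes from reversing the documented start order, and the
+alarm-write toggle is re-set idempotently, so a re-run never duplicates
+the `AppEnvironmentExtra` entry.
+
+```powershell
+# Hypothetical helpers for scripts\install\Refresh-Services.ps1 (sketch only).
+# Start order per PR D.1: gateway first, historian sidecar next, OtOpcUa last.
+$script:StartOrder = @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')
+
+function Get-StopOrder {
+    # Stop order is the exact reverse of start order, so no service is
+    # stopped while a dependent service is still running.
+    $last = $script:StartOrder.Count - 1
+    return $script:StartOrder[$last..0]
+}
+
+function Merge-EnvEntry {
+    param(
+        [string[]] $Existing,  # e.g. lines returned by nssm get <svc> AppEnvironmentExtra
+        [string]   $Name,      # e.g. OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED
+        [string]   $Value
+    )
+    # Drop blank lines and any earlier assignment of $Name (-notmatch is
+    # case-insensitive, like Windows env-var names), then append KEY=VALUE once.
+    $pattern = '^' + [regex]::Escape($Name) + '='
+    $kept = @($Existing | Where-Object { $_ -and $_ -notmatch $pattern })
+    # Leading comma keeps a one-element result an array, so @splatting works.
+    return , ($kept + "$Name=$Value")
+}
+```
+
+Under the same assumptions, step 1 becomes
+`Get-StopOrder | ForEach-Object { nssm stop $_ }`. Step 4 becomes
+`$merged = Merge-EnvEntry -Existing (nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra) -Name OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED -Value true`
+followed by `nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra @merged`.
+Splatting the array passes each `KEY=VALUE` as its own argument.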
+
+**Tests:** smoke run on the dev rig (`DESKTOP-6JL3KKO`) producing
+`docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` with the captured log
+tails + smoke-test assertions. The captured artifact lands as part of
+the PR.
+
+**Rollback:** the refresh script keeps a timestamped backup of the
+existing `C:\publish\mxaccessgw\` and `C:\publish\lmxopcua\` trees
+before overwriting (mirrored to `C:\publish\.backup-YYYY-MM-DD\`).
+Rollback is a stop / restore-from-backup / start sequence; no service
+re-install is needed because the NSSM service definitions don't change.
+
+**Production deploy:** out of scope for D.1 — the dev rig is the only
+deployment in scope at this point. A separate PR or runbook covers the
+production refresh once the dev rig has soaked for the documented
+duration (parity-rig validation gate; see "Test gates" below).
+
 ## Sequencing matrix

 ```
@@ -552,13 +690,22 @@ A.4 ConditionRefresh
                               │          │
                               │          │
                               B.4 SidecarAlarmHistorianWriter (depends on C.2 deployed)
-                               ──►B.5 docs + memory
+
+                               ▼
+                       Track D (deployment)
+                       ─────────────────────────
+                       D.1 Refresh C:\publish + restart services
+                           (depends on A.4 + B.4 + C.2 merged)
+                               ▼
+                               ──►B.5 docs + memory + completion banner
 ```

 A.1 + B.1 + C.1 can all land in parallel — none have cross-repo runtime
 dependencies. B.1's tests use proto types without needing a running
 gateway. C.1 is purely sidecar-internal. The gateway-side dispatch (A.3)
-gates B.2; the sidecar-side wiring (C.2) gates B.4.
+gates B.2; the sidecar-side wiring (C.2) gates B.4. D.1 (deployment
+refresh) gates B.5 (docs) — the docs sweep records the as-deployed
+state, so the deploy must be live first.

 ## Test gates

@@ -677,7 +824,14 @@ needed); land B.4 last and only after end-of-epic gate is green.
 - `scripts\install\Install-Services.ps1` (C.2 — env-var toggle for write-enable)
 - `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (C.1 — outcome mapping + batch + cluster failover)

+**lmxopcua — deployment refresh (Track D):**
+
+- `scripts\install\Refresh-Services.ps1` *(new — D.1)*
+- `docs\v2\dev-environment.md` (D.1 — document the refresh workflow)
+- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new — D.1 captured smoke run)*
+
 Total: ~10 source files added/modified in mxaccessgw; ~14 in lmxopcua
-proper; ~3 in the historian sidecar; ~12 test files across all repos.
-Should land in 4-6 weeks of focused work given the parity-rig dependency
-for end-to-end validation.
+proper; ~3 in the historian sidecar; ~2 deployment scripts; ~12 test
+files across all repos. Should land in 4-6 weeks of focused work given
+the parity-rig dependency for end-to-end validation, plus a short
+final-week ops slot for D.1.