docs: alarms-over-gateway plan — add Track D deployment refresh

After A/B/C all merge, the running services on C:\publish need to be
refreshed before the Galaxy alarm-event family flows end-to-end. Add
PR D.1: a Refresh-Services.ps1 script + runbook for stopping in
reverse-dependency order, restaging binaries from the build outputs,
restarting in forward-dependency order, and capturing a smoke-run
artifact.

D.1 gates B.5 (docs sweep) — the documentation records the
as-deployed shape, so the deployment has to be live first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-30 15:11:23 -04:00
parent 7367b3e23f
commit 8d0e13e69e


@@ -442,7 +442,8 @@ storage) plug into the same path.
### PR B.5 — docs + memory housekeeping
**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig + D.1
(deployment refresh) verified on the dev rig.
**Files:**
@@ -533,6 +534,143 @@ completes that slot. Two PRs in the sidecar + one consumer-side PR
C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends
on C.2 being deployed.
## Track D — deployment refresh
The dev box at `DESKTOP-6JL3KKO` runs three live services from
`C:\publish\` (installed during the same session that produced the
install scripts in commit `ea04547`). Once Tracks A / B / C are merged,
the deployed binaries need to be refreshed so the running services pick
up the new alarm path. Track D is one PR: pure ops, no product-code change.
### PR D.1 — refresh C:\publish + restart services
**Depends on:** A.4 + B.4 + C.2 merged (every code-change PR landed).
**Order matters** — services must stop in reverse-dependency order
(`OtOpcUa` → `OtOpcUaWonderwareHistorian` → `MxAccessGw`) and start in
forward-dependency order (`MxAccessGw` → `OtOpcUaWonderwareHistorian` →
`OtOpcUa`). Touching binaries while a dependent service holds them
locked produces the publish-time `MSB3027` file-lock error caught
during the original install (see commit `80104ca`).
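The ordering rule can be derived from a dependency map instead of being hard-coded in two places. A minimal sketch, assuming a hypothetical `Get-ServiceStartOrder` helper (not part of the plan or of `Refresh-Services.ps1`) and modelling the three services as the linear chain the start order above implies:

```powershell
# Hypothetical helper (illustrative, not from the plan): derives a start
# order from a service -> dependencies map via a depth-first topological
# sort. The stop order is simply the reverse of the start order.
function Get-ServiceStartOrder {
    param([Parameter(Mandatory)][hashtable]$DependsOn)

    $ordered = [System.Collections.Generic.List[string]]::new()
    $state = @{}   # service name -> 'visiting' | 'done'

    function Visit([string]$Name) {
        if ($state[$Name] -eq 'done') { return }
        if ($state[$Name] -eq 'visiting') { throw "Dependency cycle at $Name" }
        $state[$Name] = 'visiting'
        foreach ($dep in @($DependsOn[$Name])) {
            if ($dep) { Visit $dep }   # dependencies are emitted first
        }
        $state[$Name] = 'done'
        $ordered.Add($Name)
    }

    # Sorted keys keep the output deterministic for independent services.
    foreach ($svc in ($DependsOn.Keys | Sort-Object)) { Visit $svc }
    return ,$ordered.ToArray()
}

# Assumed dependency chain, matching the start order described above.
$dependsOn = @{
    MxAccessGw                 = @()
    OtOpcUaWonderwareHistorian = @('MxAccessGw')
    OtOpcUa                    = @('OtOpcUaWonderwareHistorian')
}
$startOrder = Get-ServiceStartOrder -DependsOn $dependsOn
$stopOrder = [string[]]($startOrder.Clone())
[array]::Reverse($stopOrder)
```

A refresh script could then loop `foreach ($s in $stopOrder) { nssm stop $s }` and the mirror loop over `$startOrder`, so adding a fourth service only means adding one map entry.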
**Steps (run as a single PowerShell session on the deploy host):**
1. **Stop in reverse order**:
```powershell
nssm stop OtOpcUa
nssm stop OtOpcUaWonderwareHistorian
nssm stop MxAccessGw
Start-Sleep -Seconds 3
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
Stop-Process -Force
```
2. **Refresh mxaccessgw binaries** (Track A output):
```powershell
$gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
dotnet build "$gwSrc\src\MxGateway.Server" -c Release
Copy-Item -Recurse -Force `
"$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
"C:\publish\mxaccessgw\Server\"
Copy-Item -Recurse -Force `
"$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
"C:\publish\mxaccessgw\Worker\"
```
3. **Refresh OtOpcUa + historian sidecar binaries** (Tracks B + C
output):
```powershell
$repo = "C:\Users\dohertj2\Desktop\lmxopcua"
dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
-c Release -o "C:\publish\lmxopcua"
dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
-c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
```
4. **Update service env block if Track C added the new toggle**:
```powershell
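# Pull the existing env block (one KEY=VALUE per line) and drop any earlier
# assignment of the toggle so re-runs stay idempotent, then append it
# explicitly (default-on per C.2 design, but explicit assignment lets us
# flip it to false for read-only deployments without re-installing).
$envLines = @(
    nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra |
        Where-Object { $_ -and $_ -notmatch '^OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=' }
)
$envLines += 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true'
# nssm takes each KEY=VALUE as its own argument; PowerShell passes each
# array element to a native command as a separate argument.
nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra $envLines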
```
5. **Start in forward order**:
```powershell
nssm start MxAccessGw
Start-Sleep -Seconds 4
nssm start OtOpcUaWonderwareHistorian
Start-Sleep -Seconds 4
nssm start OtOpcUa
Start-Sleep -Seconds 8
```
6. **Smoke verification:**
```powershell
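# Print names alongside states so the output is self-describing
Get-Service MxAccessGw, OtOpcUaWonderwareHistorian, OtOpcUa |
    Format-Table Name, Status
foreach ($p in 5120, 4840, 4841) {
    Get-NetTCPConnection -LocalPort $p -State Listen `
        -ErrorAction SilentlyContinue
}
# Tail only the newest file matching each wildcard pattern
Get-ChildItem "C:\publish\lmxopcua\logs\otopcua-*.log" |
    Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Tail 20
Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
Get-ChildItem "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" |
    Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Tail 10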
```
Pass criterion: all three services `Running`; ports 5120 + 4840
listening; sidecar log shows `Wonderware historian sidecar
serving — pipe=OtOpcUaWonderwareHistorian`; OtOpcUa log shows
`OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa`
and a new line `IAlarmHistorianWriter resolved: Sidecar` (added
in B.4).
7. **Functional verification — fire one alarm of each kind and assert
it propagates:**
- **Galaxy-native** — raise the `OtOpcUaParityTest_001.Counter`
`$Alarm*` extension via Galaxy's alarm-fire mechanism; assert an
OPC UA Part 9 transition reaches a connected `otopcua-cli alarms`
subscriber with rich payload (operator-comment field non-null,
original-raise-timestamp present). This validates Track A + B.1
+ B.2 + B.3.
- **Scripted** — author a one-line scripted alarm in the Admin UI
against any always-true predicate; assert the transition lands in
AVEVA Historian via an `aaHistClientTrend` query (or via
`Driver.Historian.Wonderware.IntegrationTests` with a query for
the alarm event). Validates Track C + B.4.
- **Sub-attribute fallback** — disable `IAlarmSource` on the
GalaxyDriver via the test seam (B.3 will introduce one); fire an
alarm; assert the Part 9 transition is still raised by the
value-driven path. Validates that the fallback wasn't broken.
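The step-6 pass criterion can be checked mechanically, so the refresh script fails loudly instead of relying on an eyeball read of the log tails. A minimal sketch, assuming a hypothetical `Test-D1SmokeResult` function that takes already-collected inputs (so it can be exercised without live services). The log patterns are the lines quoted in the pass criterion, with the separator between message and fields matched loosely by `.+`:

```powershell
# Hypothetical helper (illustrative name): returns the failed pass-criterion
# checks as strings; an empty result means the smoke run passed.
function Test-D1SmokeResult {
    param(
        [hashtable]$ServiceStatus,   # service name -> status text, e.g. 'Running'
        [int[]]$ListeningPorts,      # local ports observed in the Listen state
        [string]$SidecarLog,         # captured historian-sidecar log tail
        [string]$OtOpcUaLog          # captured OtOpcUa log tail
    )
    $failures = [System.Collections.Generic.List[string]]::new()

    foreach ($svc in 'MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa') {
        if ("$($ServiceStatus[$svc])" -ne 'Running') {
            $failures.Add("service $svc is not Running")
        }
    }
    foreach ($port in 5120, 4840) {
        if ($ListeningPorts -notcontains $port) {
            $failures.Add("port $port is not listening")
        }
    }
    if ($SidecarLog -notmatch 'Wonderware historian sidecar serving .+ pipe=OtOpcUaWonderwareHistorian') {
        $failures.Add('sidecar log lacks the serving line')
    }
    if ($OtOpcUaLog -notmatch 'OPC UA server started .+ endpoint=opc\.tcp://0\.0\.0\.0:4840/OtOpcUa') {
        $failures.Add('OtOpcUa log lacks the server-started line')
    }
    if ($OtOpcUaLog -notmatch 'IAlarmHistorianWriter resolved: Sidecar') {
        $failures.Add('OtOpcUa log lacks the IAlarmHistorianWriter line')
    }
    return ,$failures.ToArray()
}
```

On the deploy host the inputs would come from the step-6 commands, for example `(Get-Service $s).Status.ToString()` per service and `@(Get-NetTCPConnection -State Listen).LocalPort` for the ports, with the log tails joined into one string each; a non-empty result would make the script exit non-zero.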
**Files:**
- `scripts\install\Refresh-Services.ps1` *(new — automates the above)*
- `docs\v2\dev-environment.md` — add the refresh script to the dev
workflow section.
**Tests:** a smoke run on the dev rig (`DESKTOP-6JL3KKO`) producing
`docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` with the captured log
tails + smoke-test assertions. The captured artifact is committed as
part of the PR.
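Writing the artifact can be part of the refresh script itself. A minimal sketch, assuming a hypothetical `New-D1RolloutArtifact` helper and an artifact layout (title, host, PASS/FAIL result, one section per log tail) that is illustrative rather than specified by the plan:

```powershell
# Hypothetical helper: writes the D.1 smoke-run artifact to
# <RepoRoot>\docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md and returns its path.
function New-D1RolloutArtifact {
    param(
        [Parameter(Mandatory)][string]$RepoRoot,
        # Ordered map: section title -> captured log-tail lines
        [Parameter(Mandatory)]
        [System.Collections.Specialized.OrderedDictionary]$LogTails,
        # Failed smoke assertions; empty means the run passed
        [string[]]$Failures = @(),
        [datetime]$Date = (Get-Date)
    )

    $dir = [System.IO.Path]::Combine($RepoRoot, 'docs', 'plans', 'artifacts')
    New-Item -ItemType Directory -Force -Path $dir | Out-Null
    $stamp = $Date.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
    $path = Join-Path $dir "d1-rollout-$stamp.md"

    $lines = [System.Collections.Generic.List[string]]::new()
    $lines.Add("# D.1 rollout smoke run, $stamp")
    $lines.Add('')
    $lines.Add("Host: $env:COMPUTERNAME")
    $result = if ($Failures.Count -eq 0) { 'PASS' } else { 'FAIL' }
    $lines.Add("Result: $result")
    foreach ($f in $Failures) { $lines.Add("- $f") }

    foreach ($title in $LogTails.Keys) {
        $lines.Add('')
        $lines.Add("## $title")
        $lines.Add('~~~')   # tilde fence, so log lines containing backticks can't close it
        foreach ($line in @($LogTails[$title])) { $lines.Add([string]$line) }
        $lines.Add('~~~')
    }

    Set-Content -LiteralPath $path -Value $lines.ToArray() -Encoding UTF8
    return $path
}
```

Passing the failure list from a pass-criterion check straight into `-Failures` keeps the committed artifact and the script's exit status in agreement.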
**Rollback:** the refresh script keeps a timestamped backup of the
existing `C:\publish\mxaccessgw\` and `C:\publish\lmxopcua\` trees
before overwriting (mirrored to `C:\publish\.backup-YYYY-MM-DD\`).
Rollback is a stop / restore-from-backup / start sequence; no service
re-install needed since the NSSM service definitions don't change.
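The backup and restore halves of that rollback can be sketched as two small functions. This assumes hypothetical `Backup-PublishTree` / `Restore-PublishTree` names (not from the plan), uses the `C:\publish\.backup-YYYY-MM-DD\` layout described above, and refuses to overwrite an existing same-day backup so a second refresh on the same day can't replace the last known-good copy:

```powershell
# Hypothetical helpers (illustrative names, not from the plan).
# Backup: copy each publish tree into <PublishRoot>\.backup-YYYY-MM-DD\<tree>.
function Backup-PublishTree {
    param(
        [string]$PublishRoot = 'C:\publish',
        [string[]]$Trees = @('mxaccessgw', 'lmxopcua'),
        [datetime]$Date = (Get-Date)
    )
    $stamp = $Date.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
    $backupRoot = Join-Path $PublishRoot ".backup-$stamp"
    New-Item -ItemType Directory -Force -Path $backupRoot | Out-Null
    foreach ($tree in $Trees) {
        $src = Join-Path $PublishRoot $tree
        if (-not (Test-Path -LiteralPath $src)) { continue }
        $dst = Join-Path $backupRoot $tree
        # Refuse to clobber an existing same-day backup.
        if (Test-Path -LiteralPath $dst) { throw "Backup already exists: $dst" }
        # The destination directory exists, so the tree lands at $backupRoot\$tree.
        Copy-Item -LiteralPath $src -Destination $backupRoot -Recurse
    }
    return $backupRoot
}

# Restore: run only while all three services are stopped (step 1 above).
function Restore-PublishTree {
    param(
        [Parameter(Mandatory)][string]$BackupRoot,
        [string]$PublishRoot = 'C:\publish',
        [string[]]$Trees = @('mxaccessgw', 'lmxopcua')
    )
    foreach ($tree in $Trees) {
        $src = Join-Path $BackupRoot $tree
        if (-not (Test-Path -LiteralPath $src)) { continue }
        $dst = Join-Path $PublishRoot $tree
        # Remove the refreshed tree so no new-only files survive the restore.
        if (Test-Path -LiteralPath $dst) { Remove-Item -LiteralPath $dst -Recurse -Force }
        Copy-Item -LiteralPath $src -Destination $PublishRoot -Recurse
    }
}
```

Deleting the refreshed tree before copying back matters: copying over it would leave behind any assemblies that only the new build shipped.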
**Production deploy:** out of scope for D.1 — the dev rig is the only
deployment in scope at this point. A separate PR (or runbook) covers the
production refresh once the dev rig has soaked for the documented
duration (parity-rig validation gate; see "Test gates" below).
## Sequencing matrix
```
@@ -552,13 +690,22 @@ A.4 ConditionRefresh │ │
│ │
B.4 SidecarAlarmHistorianWriter
(depends on C.2 deployed)
Track D (deployment)
─────────────────────────
D.1 Refresh C:\publish + restart services
(depends on A.4 + B.4 + C.2 merged)
──►B.5 docs + memory + completion banner
```
A.1 + B.1 + C.1 can all land in parallel — none have cross-repo runtime
dependencies. B.1's tests use proto types without needing a running
gateway. C.1 is purely sidecar-internal. The gateway-side dispatch (A.3)
gates B.2; the sidecar-side wiring (C.2) gates B.4. D.1 (deployment
refresh) gates B.5 (docs) — the docs sweep records the as-deployed
state, so the deploy must be live first.
## Test gates
@@ -677,7 +824,14 @@ needed); land B.4 last and only after end-of-epic gate is green.
- `scripts\install\Install-Services.ps1` (C.2 — env-var toggle for write-enable)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (C.1 — outcome mapping + batch + cluster failover)
**lmxopcua — deployment refresh (Track D):**
- `scripts\install\Refresh-Services.ps1` *(new — D.1)*
- `docs\v2\dev-environment.md` (D.1 — document the refresh workflow)
- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new — D.1 captured smoke run)*
Total: ~10 source files added/modified in mxaccessgw; ~14 in lmxopcua
proper; ~3 in the historian sidecar; ~2 deployment scripts; ~12 test
files across all repos. Should land in 4-6 weeks of focused work given
the parity-rig dependency for end-to-end validation, plus a short
final-week ops slot for D.1.