docs: alarms-over-gateway plan — add Track D deployment refresh
After A/B/C all merge, the running services on C:\publish need to be refreshed before the Galaxy alarm-event family flows end-to-end. Add PR D.1: a Refresh-Services.ps1 script + runbook for stopping in reverse-dependency order, restaging binaries from the build outputs, restarting in forward-dependency order, and capturing a smoke-run artifact. D.1 gates B.5 (docs sweep): the documentation records the as-deployed shape, so the deployment has to be live first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -442,7 +442,8 @@ storage) plug into the same path.

### PR B.5 — docs + memory housekeeping

**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig.
**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig + D.1
(deployment refresh) verified on the dev rig.

**Files:**

@@ -533,6 +534,143 @@ completes that slot. Two PRs in the sidecar + one consumer-side PR
C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends
on C.2 being deployed.

## Track D — deployment refresh

The dev box at `DESKTOP-6JL3KKO` runs three live services from
`C:\publish\` (installed in the session that produced commit
`ea04547`'s install scripts). Once Tracks A / B / C are merged, the
deployed binaries need to be refreshed so the running services pick
up the new alarm path. Track D is one PR: pure ops, no code change.

### PR D.1 — refresh C:\publish + restart services

**Depends on:** A.4 + B.4 + C.2 merged (every code-change PR landed).

**Order matters:** services must stop in reverse-dependency order
(`OtOpcUa` → `OtOpcUaWonderwareHistorian` → `MxAccessGw`) and start in
forward-dependency order (`MxAccessGw` → `OtOpcUaWonderwareHistorian`
→ `OtOpcUa`). Touching binaries while a dependent service holds them
locked produces the publish-time `MSB3027` file-lock error caught
during the original install (see commit `80104ca`).

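The ordering rule above can be kept as data rather than as two hand-maintained lists. This is a minimal sketch, assuming the start order lives in one array and the stop order is always derived from it by reversal; `Get-StopOrder` is a hypothetical helper name, not something the plan defines.

```powershell
# Forward-dependency (start) order, as stated in the paragraph above.
$startOrder = @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')

# Hypothetical helper: derive the stop order by reversing the start order,
# so the two sequences cannot drift apart.
function Get-StopOrder {
    param([Parameter(Mandatory)][string[]]$StartOrder)
    [string[]]$copy = $StartOrder.Clone()   # work on a copy, leave the input untouched
    [array]::Reverse($copy)
    return $copy
}

$stopOrder = Get-StopOrder -StartOrder $startOrder
$stopOrder -join ' -> '
```

Deriving one list from the other means that adding a fourth service later only requires editing `$startOrder`.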
**Steps (run as a single PowerShell session on the deploy host):**

1. **Stop in reverse order**:
   ```powershell
   # Stop dependents first so nothing holds the binaries locked.
   nssm stop OtOpcUa
   nssm stop OtOpcUaWonderwareHistorian
   nssm stop MxAccessGw
   Start-Sleep -Seconds 3
   # Force-kill any process that outlived its service stop.
   Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
       OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
       Stop-Process -Force
   ```

2. **Refresh mxaccessgw binaries** (Track A output):
   ```powershell
   $gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
   # Build both gateway projects in Release.
   dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
   dotnet build "$gwSrc\src\MxGateway.Server" -c Release

   # Restage the build outputs over the deployed trees.
   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
       "C:\publish\mxaccessgw\Server\"
   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
       "C:\publish\mxaccessgw\Worker\"
   ```

3. **Refresh OtOpcUa + historian sidecar binaries** (Tracks B + C
   output):
   ```powershell
   $repo = "C:\Users\dohertj2\Desktop\lmxopcua"
   # Publish the server and the sidecar straight into the deployed trees.
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
       -c Release -o "C:\publish\lmxopcua"
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
       -c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
   ```

4. **Update service env block if Track C added the new toggle**:
   ```powershell
   # Pull existing env, append OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
   # (default-on per C.2 design, but explicit assignment lets us flip false
   # for read-only deployments without re-installing)
   nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra `
       (((nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra) `
       + "`r`nOTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"))
   ```

5. **Start in forward order**:
   ```powershell
   # Start dependencies first; each pause gives the service time to come up.
   nssm start MxAccessGw
   Start-Sleep -Seconds 4
   nssm start OtOpcUaWonderwareHistorian
   Start-Sleep -Seconds 4
   nssm start OtOpcUa
   Start-Sleep -Seconds 8
   ```

6. **Smoke verification:**
   ```powershell
   # Service status, in start order.
   foreach ($s in 'MxAccessGw','OtOpcUaWonderwareHistorian','OtOpcUa') {
       (Get-Service $s).Status
   }
   # Listening sockets on the expected ports.
   foreach ($p in 5120, 4840, 4841) {
       Get-NetTCPConnection -LocalPort $p -State Listen `
           -ErrorAction SilentlyContinue
   }
   # Recent log tails for each service.
   Get-Content "C:\publish\lmxopcua\logs\otopcua-*.log" -Tail 20
   Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
   Get-Content "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" -Tail 10
   ```

   Pass criterion: all three services `Running`; ports 5120 + 4840
   listening; sidecar log shows `Wonderware historian sidecar
   serving — pipe=OtOpcUaWonderwareHistorian`; OtOpcUa log shows
   `OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa`
   and a new line `IAlarmHistorianWriter resolved: Sidecar` (added
   in B.4).

7. **Functional verification (fire one alarm of each kind and assert
   it propagates):**
   - **Galaxy-native:** raise the `OtOpcUaParityTest_001.Counter`
     `$Alarm*` extension via Galaxy's alarm-fire mechanism; assert an
     OPC UA Part 9 transition reaches a connected `otopcua-cli alarms`
     subscriber with rich payload (operator-comment field non-null,
     original-raise-timestamp present). This validates Track A + B.1
     + B.2 + B.3.
   - **Scripted:** author a one-line scripted alarm in the Admin UI
     against any always-true predicate; assert the transition lands in
     AVEVA Historian via `aaHistClientTrend` query (or
     `Driver.Historian.Wonderware.IntegrationTests` with a query for
     the alarm event). Validates Track C + B.4.
   - **Sub-attribute fallback:** disable `IAlarmSource` on the
     GalaxyDriver via the test seam (B.3 will introduce one); fire an
     alarm; assert Part 9 transition still raised by the value-driven
     path. Validates the fallback wasn't broken.

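The fixed `Start-Sleep` delays in step 5 assume each service is ready within a few seconds. One hedged alternative is to poll until a condition holds or a timeout expires. `Wait-Until` below is a hypothetical helper that the plan does not define; the commented usage assumes only the standard `Get-Service` cmdlet and a service name from step 5.

```powershell
# Hypothetical helper: poll a condition until it returns $true or the timeout expires.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 30,
        [int]$PollMilliseconds = 500
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        # Evaluate first, so an already-satisfied condition returns without sleeping.
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Usage sketch on the deploy host:
# nssm start MxAccessGw
# if (-not (Wait-Until { (Get-Service MxAccessGw).Status -eq 'Running' })) {
#     throw 'MxAccessGw did not reach Running within 30 seconds'
# }
```

Polling fails loudly when a service never comes up, whereas a fixed sleep lets the later steps continue against a dead dependency.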
**Files:**

- `scripts\install\Refresh-Services.ps1` *(new: automates the above)*
- `docs\v2\dev-environment.md`: add the refresh script to the dev
  workflow section.

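Because `Refresh-Services.ps1` is meant to be re-run, the env-block update from step 4 benefits from being idempotent: appending the toggle on every run would accumulate duplicate entries. The following is a minimal sketch, assuming the existing block is already available as an array of `KEY=VALUE` strings. `Set-EnvEntry` is a hypothetical helper, `EXAMPLE_VAR` is a placeholder entry, and the sketch deliberately makes no NSSM calls.

```powershell
# Hypothetical helper: return the env block with exactly one entry for $Key.
# Entries for other keys are preserved in their original order.
function Set-EnvEntry {
    param(
        [string[]]$Lines = @(),
        [Parameter(Mandatory)][string]$Key,
        [Parameter(Mandatory)][string]$Value
    )
    $prefix = "$Key="
    # Drop blank lines and any previous assignment of the same key
    # (Windows environment variable names are case-insensitive).
    $kept = @($Lines | Where-Object {
        $_ -and $_.Trim() -and
        -not $_.Trim().StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)
    })
    return @($kept + "$prefix$Value")
}

# Placeholder block: EXAMPLE_VAR is made up; the toggle name comes from step 4.
$block = @('EXAMPLE_VAR=1', 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=false')
$block = @(Set-EnvEntry -Lines $block -Key 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED' -Value 'true')
# A second run leaves the block unchanged.
$block = @(Set-EnvEntry -Lines $block -Key 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED' -Value 'true')
```

How the resulting lines are handed back to the service manager is left open here, since that depends on how the real script reads and writes the NSSM setting.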
**Tests:** smoke run on the dev rig (`DESKTOP-6JL3KKO`) producing
`docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` with the captured log
tails + smoke-test assertions. Captured artifact lands as part of the
PR.

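The smoke-test assertions that go into the artifact can be made mechanical. This is a minimal sketch, assuming the log tails have already been read into string arrays. `Test-LogContains` is a hypothetical helper; the expected substrings are taken from the step 6 pass criterion, and the sample lines only stand in for a real log tail.

```powershell
# Hypothetical helper: $true when any line contains the expected literal text.
function Test-LogContains {
    param(
        [string[]]$Lines,
        [Parameter(Mandatory)][string]$Expected
    )
    foreach ($line in $Lines) {
        # String.Contains is an ordinal, case-sensitive literal match.
        if ($line -and $line.Contains($Expected)) { return $true }
    }
    return $false
}

# Stand-in for Get-Content ... -Tail 20; only the quoted substrings come from the plan.
$otLog = @(
    '... endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa',
    '... IAlarmHistorianWriter resolved: Sidecar'
)

$checks = [ordered]@{
    'endpoint started' = Test-LogContains -Lines $otLog -Expected 'endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa'
    'writer resolved'  = Test-LogContains -Lines $otLog -Expected 'IAlarmHistorianWriter resolved: Sidecar'
}
$allPassed = -not (@($checks.Values) -contains $false)
```

Keeping the results in an ordered table makes it straightforward to write one line per assertion into the rollout artifact.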
**Rollback:** the refresh script keeps a timestamped backup of the
existing `C:\publish\mxaccessgw\` and `C:\publish\lmxopcua\` trees
before overwriting (mirrored to `C:\publish\.backup-YYYY-MM-DD\`).
Rollback is a stop / restore-from-backup / start sequence; no service
re-install needed since the NSSM service definitions don't change.

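The backup described above could look roughly like the following. It is a sketch under assumptions: `Backup-Tree` is a hypothetical helper, the date format mirrors the `.backup-YYYY-MM-DD` naming in the text, and the real script may lay the backup out differently.

```powershell
# Hypothetical helper: mirror each source tree into <BackupRoot>\.backup-YYYY-MM-DD\<leaf>.
function Backup-Tree {
    param(
        [Parameter(Mandatory)][string[]]$Source,
        [Parameter(Mandatory)][string]$BackupRoot,
        [datetime]$Date = (Get-Date)
    )
    $target = Join-Path $BackupRoot ('.backup-{0:yyyy-MM-dd}' -f $Date)
    New-Item -ItemType Directory -Path $target -Force | Out-Null
    foreach ($dir in $Source) {
        # $target already exists, so the directory itself is copied beneath it.
        Copy-Item -Path $dir -Destination $target -Recurse -Force
    }
    return $target
}

# Usage sketch with the trees named above (run while the services are stopped):
# Backup-Tree -Source 'C:\publish\mxaccessgw', 'C:\publish\lmxopcua' -BackupRoot 'C:\publish'
```

Restoring is then the reverse copy from the returned backup path, performed between the stop and start sequences.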
**Production deploy:** out of scope for D.1; the dev rig is the only
deployment in scope at this point. A separate PR-or-runbook lands the
production refresh once the dev rig has soaked for the documented
duration (parity-rig validation gate; see "Test gates" above).

## Sequencing matrix

```
@@ -552,13 +690,22 @@ A.4 ConditionRefresh │ │
│ │
B.4 SidecarAlarmHistorianWriter
(depends on C.2 deployed)
──►B.5 docs + memory

▼
Track D (deployment)
─────────────────────────
D.1 Refresh C:\publish + restart services
(depends on A.4 + B.4 + C.2 merged)
▼
──►B.5 docs + memory + completion banner
```

A.1 + B.1 + C.1 can all land in parallel: none have cross-repo runtime
dependencies. B.1's tests use proto types without needing a running
gateway. C.1 is purely sidecar-internal. The gateway-side dispatch (A.3)
gates B.2; the sidecar-side wiring (C.2) gates B.4.
gates B.2; the sidecar-side wiring (C.2) gates B.4. D.1 (deployment
refresh) gates B.5 (docs): the docs sweep records the as-deployed
state, so the deploy must be live first.

## Test gates

@@ -677,7 +824,14 @@ needed); land B.4 last and only after end-of-epic gate is green.
- `scripts\install\Install-Services.ps1` (C.2: env-var toggle for write-enable)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (C.1: outcome mapping + batch + cluster failover)

**lmxopcua — deployment refresh (Track D):**

- `scripts\install\Refresh-Services.ps1` *(new: D.1)*
- `docs\v2\dev-environment.md` (D.1: document the refresh workflow)
- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new: D.1 captured smoke run)*

Total: ~10 source files added/modified in mxaccessgw; ~14 in lmxopcua
proper; ~3 in the historian sidecar; ~12 test files across all repos.
Should land in 4-6 weeks of focused work given the parity-rig dependency
for end-to-end validation.
proper; ~3 in the historian sidecar; ~2 deployment scripts; ~12 test
files across all repos. Should land in 4-6 weeks of focused work given
the parity-rig dependency for end-to-end validation, plus a short
final-week ops slot for D.1.
