Compare commits


5 Commits

Author SHA1 Message Date
Joseph Doherty 64d8838e18 docs: reconcile alarms-over-gateway banner with audited source
The 'All 19 PRs merged' banner contradicted the warning paragraph in
the same block and overstated reality against the source tree. Audit
of the lmxopcua + mxaccessgw repos on 2026-05-01 found:

- 17 of 19 PRs merged. Four merged PRs ship inert scaffolds:
  - A.2: MxAccessAlarmEventSink.Attach is a no-op.
  - A.3 / A.4: NotWiredAlarmRpcDispatcher returns OK-with-diagnostic
    for AcknowledgeAlarm and an empty stream for QueryActiveAlarms.
  - C.1: SdkAlarmHistorianWriteBackend.WriteBatchAsync returns
    RetryPlease for every event with a placeholder log.
- The architectural decision the warning paragraph asks the operator
  to make was already resolved 2026-04-30. MxAccessAlarmEventSink.cs
  in mxaccessgw records that aaAlarmManagedClient.AlarmClient is x86
  net48 (same bitness as the worker), and pins the discovered API
  surface (RegisterConsumer / Subscribe / GetStatistics /
  GetAlarmExtendedRec / AlarmAckByGUID). What remains is wiring PRs
  in the worker, not architectural choice.
- D.1 smoke artifact (docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md)
  not yet captured; directory does not exist.

Banner rewritten to split functional-end-to-end vs merged-but-inert
PRs explicitly so future readers don't have to reconcile the doc
against the source tree themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:31:22 -04:00
dohertj2 69f02fed7f Merge pull request 'docs: alarms-over-gateway plan banner — record A.2 dev-rig finding' (#418) from track-d1-followup-plan-banner into master 2026-04-30 21:31:40 -04:00
Joseph Doherty 5ed26d2ec6 docs: alarms-over-gateway plan banner — record A.2 dev-rig finding
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions over a
   named-pipe IPC. Recovers full v1 fidelity but adds operational
   complexity.

The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:29:16 -04:00
dohertj2 439b39463b Merge pull request 'scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)' (#417) from track-d1-refresh-services into master 2026-04-30 21:13:58 -04:00
Joseph Doherty 32b872d5c7 scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)
Seventeenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the script that the
plan calls for in Track D — the actual smoke-run validation
on the dev rig (publish, restart, fire alarms, capture artifacts)
remains operator work; this PR ships the automation that the
operator drives.

scripts/install/Refresh-Services.ps1 — single-shot refresh
script. Designed to run elevated on the deploy host
(DESKTOP-6JL3KKO today; production uses a separate runbook).
The script:

- Stops services in reverse-dependency order (OtOpcUa →
  OtOpcUaWonderwareHistorian → MxAccessGw) and force-kills any
  residual processes (avoids the publish-time MSB3027 file-lock
  the original install script hit).
- Snapshots existing C:\publish trees to
  C:\publish\.backup-YYYY-MM-DD-HHMMSS\ for rollback (skip with
  -SkipBackup).
- Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
  binaries from the sibling repo.
- Publishes OtOpcUa Server + Wonderware historian sidecar from
  this repo.
- Ensures OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true is set on
  the historian service env block (PR C.2 toggle).
- Starts services in forward-dependency order with the
  inter-service waits the original install used.
- Smoke-verifies (service status, listening ports 5120 / 4840
  / 4841, recent log tails).

Supports -WhatIf for dry-run inspection without touching the
running services.

docs/v2/dev-environment.md — new "Service Refresh —
Refresh-Services.ps1" section between Credential Management
and Test Data Seed. Cross-references the plan's Track D
functional verification scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:11:27 -04:00
3 changed files with 310 additions and 11 deletions
+57 -11
docs/plans/alarms-over-gateway.md
@@ -1,16 +1,62 @@
# Plan — alarms over the mxaccessgw gateway
> ✅ **Completed 2026-04-30 — historical record.**
> The 14-PR sequence (A.1 / A.3, B.1 / B.2 / B.3 / B.4, C.1 / C.2,
> E.1 / E.2 / E.3 / E.4 / E.5 / E.6 / E.7) shipped. The gateway-side
> public RPC surface, the driver-native ack path, the sidecar alarm
> historian writer, and the five client SDKs are all live. **A.2**
> (worker MxAccess alarm subscription) and **A.4** (worker
> ConditionRefresh command) require the AVEVA worker host's MxAccess
> Toolkit C++ SDK and ship as a follow-up gated on dev-rig
> validation. **D.1** (refresh `C:\publish` + smoke-run on the dev
> rig) ships once A.2 is hardware-verified. The remainder of this
> document is preserved as the design record.
> **17 of 19 PRs merged. Public contract surface and the lmxopcua /
> sidecar consumers are live; four merged PRs ship as scaffolds
> pending worker-side wiring.** Status reconciled against the source
> tree on 2026-05-01.
>
> **Functional end-to-end today:** B.1 / B.2 / B.3 / B.4 / B.5
> (EventPump branch, GalaxyDriver `IAlarmSource`, DriverNodeManager
> ack routing, `WonderwareHistorianClient : IAlarmHistorianWriter`,
> docs sweep), C.2 (sidecar wires the alarm-write slot), D.1 script
> (`scripts/install/Refresh-Services.ps1`), E.1 through E.7 (proto regen +
> .NET / Python / Go / Java / Rust SDK alarm methods + lmxopcua client
> surface). The value-driven sub-attribute fallback path keeps Galaxy
> alarms functional today.
>
> **Merged-but-inert scaffolds (gated on worker AlarmClient wiring):**
>
> - **A.2** — `MxAccessAlarmEventSink.Attach` is a no-op; the COM-side
> `aaAlarmManagedClient.AlarmClient` registration / subscription has
> not landed yet, so the gateway's
> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` is reserved on the wire but
> never emitted.
> - **A.3** AcknowledgeAlarm + **A.4** QueryActiveAlarms — public RPC
> handlers in `MxAccessGatewayService.cs` route through
> `NotWiredAlarmRpcDispatcher` (Ack returns OK with a `worker dispatch
> pending dev-rig wiring` diagnostic; Query yields an empty stream).
> - **C.1** sidecar — `AahClientManagedAlarmEventWriter` exists and the
> IPC slot is wired, but the production backend
> `SdkAlarmHistorianWriteBackend.WriteBatchAsync` returns
> `RetryPlease` for every event with a placeholder log; the live
> `aahClientManaged` SDK call site will be pinned during the D.1 dev-rig
> smoke. Effect: scripted-alarm transitions queue locally in
> `SqliteStoreAndForwardSink` and the drain worker repeatedly retries.
>
> **Architectural decision RESOLVED 2026-04-30** (recorded in the
> mxaccessgw repo at `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs`
> xmldoc): the worker hosts `aaAlarmManagedClient.AlarmClient` (x86
> .NET Framework 4.8 — same bitness as the existing MxAccess COM
> consumer) alongside the COM consumer, sharing the worker's STA +
> WM_APP message pump. The discovered API surface
> (`RegisterConsumer`, `Subscribe`, `GetStatistics`,
> `GetAlarmExtendedRec`, `AlarmAckByGUID`) is documented in that
> file's xmldoc. The earlier concern that AVEVA's alarm SDK was
> x64-only proved wrong against the deployed assemblies. What remains
> is wiring PRs in the worker — session-startup `RegisterConsumer` +
> `Subscribe`, an STA WM_APP handler that routes
> alarm-changed messages into `EnqueueTransition`, and the worker
> command path that calls `AlarmAckByGUID` from a gateway
> `AcknowledgeAlarm` RPC.
>
> **D.1 smoke artifact**
> (`docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md`, called for in the
> Track D test plan below) not yet captured — gated on the worker
> AlarmClient wiring being live on the dev rig so the smoke can
> exercise the alarm scenarios end-to-end and pin the
> `SdkAlarmHistorianWriteBackend` SDK entry point.
>
> The remainder of this document is preserved as the design record.
Coordinated epic across two repos:
+43
docs/v2/dev-environment.md
@@ -408,6 +408,49 @@ For production:
- Per-NodeId credentials in `ClusterNodeCredential` table (per decision #83)
- Admin app uses LDAP (no SQL credential at all on the user-facing side)
## Service Refresh — `Refresh-Services.ps1`
The deploy host runs three NSSM-wrapped services (`MxAccessGw`,
`OtOpcUaWonderwareHistorian`, `OtOpcUa`) that load their binaries from
`C:\publish\`. After landing changes in either repo, refresh the
deployed bits from an elevated PowerShell session with
`scripts\install\Refresh-Services.ps1`:
```powershell
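# Default invocation (dev rig).
& C:\Users\dohertj2\Desktop\lmxopcua\scripts\install\Refresh-Services.ps1
# The remaining examples run from the repo root; PowerShell only runs a
# script from the current directory when given an explicit path.
# Skip the timestamped backup (faster on iterative dev cycles).
& .\scripts\install\Refresh-Services.ps1 -SkipBackup
# Dry run: print the actions without doing them.
& .\scripts\install\Refresh-Services.ps1 -WhatIf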
```
The script:
1. Stops services in reverse-dependency order (`OtOpcUa` →
`OtOpcUaWonderwareHistorian` → `MxAccessGw`) and force-kills
any residual processes.
2. Snapshots the existing `C:\publish\mxaccessgw\` and
`C:\publish\lmxopcua\` trees to `C:\publish\.backup-<timestamp>\`
for rollback (skip with `-SkipBackup`).
3. Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
binaries from the sibling repo.
4. Runs `dotnet publish` for the OtOpcUa server and the Wonderware
historian sidecar from this repo.
5. Ensures `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true` is set on
the historian service env block (PR C.2 toggle).
6. Starts services in forward-dependency order (`MxAccessGw` →
`OtOpcUaWonderwareHistorian` → `OtOpcUa`), pausing 4 s, 4 s and 8 s
respectively after each start.
7. Smoke-verifies — service status, listening ports (5120 / 4840 /
4841), recent log tails.
Functional verification (alarm raise / scripted alarm historian
round-trip / sub-attribute fallback) is the operator's next step
after the refresh; see
[docs/plans/alarms-over-gateway.md](../plans/alarms-over-gateway.md)
§Track D for the scenarios.
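Step 2 describes the snapshot as a rollback target, but the script has no restore switch. The following is a minimal sketch of restoring the newest snapshot by hand. `Restore-PublishBackup` is a hypothetical helper, not part of the repo; it assumes the `.backup-<timestamp>\{mxaccessgw,lmxopcua}` layout that Step 2 writes and that the three services are already stopped.

```powershell
# Hypothetical helper (not in the repo): restore the newest
# <PublishRoot>\.backup-<timestamp>\ snapshot written by Refresh-Services.ps1.
# Assumes the services are already stopped.
function Restore-PublishBackup {
    [CmdletBinding()]
    param(
        [string]$PublishRoot = "C:\publish"
    )
    # Backup folders are named .backup-yyyy-MM-dd-HHmmss, so sorting by name
    # is also a chronological sort. -Force is needed on non-Windows hosts,
    # where dot-prefixed entries are treated as hidden.
    $latest = Get-ChildItem -Path $PublishRoot -Directory -Force -Filter '.backup-*' |
        Sort-Object Name -Descending |
        Select-Object -First 1
    if ($null -eq $latest) { throw "No .backup-* folder under $PublishRoot" }

    foreach ($subdir in @('mxaccessgw', 'lmxopcua')) {
        $src = Join-Path $latest.FullName $subdir
        if (-not (Test-Path $src)) { continue }
        $dest = Join-Path $PublishRoot $subdir
        # Replace the current tree wholesale so no newer files linger.
        if (Test-Path $dest) { Remove-Item -Recurse -Force $dest }
        Copy-Item -Recurse -Path $src -Destination $dest
    }
    return $latest.FullName
}
```

Stop the services in reverse-dependency order first, run `Restore-PublishBackup -PublishRoot C:\publish`, then start them again in forward order. Replacing each tree instead of overlaying it keeps files added by the failed refresh from surviving the rollback.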
## Test Data Seed
Each environment needs a baseline data set so cross-developer tests are reproducible. Lives in `tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/SeedData/`:
+210
scripts/install/Refresh-Services.ps1
@@ -0,0 +1,210 @@
[CmdletBinding()]
param(
[string]$RepoRoot = "C:\Users\dohertj2\Desktop\lmxopcua",
[string]$GatewayRoot = "C:\Users\dohertj2\Desktop\mxaccessgw",
[string]$PublishRoot = "C:\publish",
[switch]$SkipBackup,
[switch]$WhatIf
)
# PR D.1 — refresh C:\publish + restart services for the alarms-over-gateway
# epic. Stops services in reverse-dependency order (OtOpcUa →
# OtOpcUaWonderwareHistorian → MxAccessGw), refreshes binaries from the
# repos, then starts in forward order. A timestamped backup of the existing
# C:\publish trees lands under C:\publish\.backup-YYYY-MM-DD-HHMMSS\ unless
# -SkipBackup is supplied.
#
# Designed to run as a single elevated PowerShell session on the deploy host
# (the dev rig today; production refresh is a separate runbook).
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
function Step([string]$Message) {
Write-Host ""
Write-Host "==> $Message" -ForegroundColor Cyan
}
function Run([scriptblock]$Block, [string]$Description) {
if ($WhatIf) {
Write-Host " (skip) $Description" -ForegroundColor DarkYellow
return
}
Write-Host " $Description"
& $Block
}
function Test-NssmService([string]$Name) {
$svc = Get-Service -Name $Name -ErrorAction SilentlyContinue
return $null -ne $svc
}
# ------------------------------------------------------------------------
# Step 1: Stop in reverse dependency order
# ------------------------------------------------------------------------
Step "Stopping services (OtOpcUa → OtOpcUaWonderwareHistorian → MxAccessGw)"
foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (Test-NssmService $name) {
Run { nssm stop $name } "stop $name"
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
}
}
if (-not $WhatIf) {
Start-Sleep -Seconds 3
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
ForEach-Object {
Write-Host " killing residual process $($_.ProcessName) (PID=$($_.Id))" -ForegroundColor DarkYellow
Stop-Process -Id $_.Id -Force -ErrorAction SilentlyContinue
}
}
# ------------------------------------------------------------------------
# Step 2: Backup existing C:\publish trees
# ------------------------------------------------------------------------
if (-not $SkipBackup -and (Test-Path $PublishRoot)) {
$backupRoot = Join-Path $PublishRoot ".backup-$((Get-Date).ToString('yyyy-MM-dd-HHmmss'))"
Step "Backing up $PublishRoot → $backupRoot"
Run {
New-Item -ItemType Directory -Path $backupRoot | Out-Null
foreach ($subdir in @('mxaccessgw', 'lmxopcua')) {
$src = Join-Path $PublishRoot $subdir
if (Test-Path $src) {
Copy-Item -Recurse -Path $src -Destination (Join-Path $backupRoot $subdir)
}
}
} "snapshot publish dirs (rollback target)"
}
else {
Write-Host " (backup skipped)" -ForegroundColor DarkGray
}
# ------------------------------------------------------------------------
# Step 3: Refresh mxaccessgw binaries (Track A output)
# ------------------------------------------------------------------------
Step "Building + copying mxaccessgw binaries from $GatewayRoot"
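Run {
# $ErrorActionPreference = "Stop" does not cover native exit codes, so check
# $LASTEXITCODE explicitly; otherwise a failed build copies stale outputs.
& dotnet build "$GatewayRoot\src\MxGateway.Worker" -c Release | Out-Null
if ($LASTEXITCODE -ne 0) { throw "dotnet build MxGateway.Worker failed (exit $LASTEXITCODE)" }
& dotnet build "$GatewayRoot\src\MxGateway.Server" -c Release | Out-Null
if ($LASTEXITCODE -ne 0) { throw "dotnet build MxGateway.Server failed (exit $LASTEXITCODE)" }
} "dotnet build (Worker x86 net48 + Server net10.0)"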
Run {
$serverDest = Join-Path $PublishRoot "mxaccessgw\Server"
$workerDest = Join-Path $PublishRoot "mxaccessgw\Worker"
if (-not (Test-Path $serverDest)) { New-Item -ItemType Directory -Path $serverDest -Force | Out-Null }
if (-not (Test-Path $workerDest)) { New-Item -ItemType Directory -Path $workerDest -Force | Out-Null }
Copy-Item -Recurse -Force "$GatewayRoot\src\MxGateway.Server\bin\Release\net10.0\*" $serverDest
Copy-Item -Recurse -Force "$GatewayRoot\src\MxGateway.Worker\bin\x86\Release\net48\*" $workerDest
} "copy gateway server + worker outputs"
# ------------------------------------------------------------------------
# Step 4: Refresh OtOpcUa + Wonderware historian sidecar
# ------------------------------------------------------------------------
Step "Publishing OtOpcUa server + Wonderware historian sidecar from $RepoRoot"
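Run {
# Fail fast on a broken publish (native exit codes bypass ErrorActionPreference).
& dotnet publish "$RepoRoot\src\ZB.MOM.WW.OtOpcUa.Server" `
-c Release -o (Join-Path $PublishRoot "lmxopcua") | Out-Null
if ($LASTEXITCODE -ne 0) { throw "dotnet publish OtOpcUa.Server failed (exit $LASTEXITCODE)" }
& dotnet publish "$RepoRoot\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
-c Release -o (Join-Path $PublishRoot "lmxopcua\WonderwareHistorian") | Out-Null
if ($LASTEXITCODE -ne 0) { throw "dotnet publish Wonderware historian sidecar failed (exit $LASTEXITCODE)" }
} "dotnet publish (Server + sidecar)"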
# ------------------------------------------------------------------------
# Step 5: Service env block — ensure OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED
# is set on the Wonderware historian service (PR C.2 toggle).
# ------------------------------------------------------------------------
if (Test-NssmService 'OtOpcUaWonderwareHistorian') {
Step "Ensuring OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED is set on the historian service"
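Run {
# nssm prints AppEnvironmentExtra as one KEY=VALUE per line. Strip NUL
# characters (nssm's console output can be UTF-16) and blank lines so the
# comparison below sees clean entries.
$existing = @(nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra |
ForEach-Object { $_ -replace "`0", '' } |
Where-Object { $_ -ne '' })
if ($existing -notcontains 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true') {
# Drop any other value for the key (e.g. =false), then append =true.
# nssm takes each KEY=VALUE as a separate argument; PowerShell expands
# the array into one argument per element.
$entries = @($existing | Where-Object { $_ -notmatch '^OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=' })
$entries += 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true'
nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra $entries | Out-Null
if ($LASTEXITCODE -ne 0) { throw "nssm set AppEnvironmentExtra failed (exit $LASTEXITCODE)" }
Write-Host " set OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true" -ForegroundColor DarkGreen
}
else {
Write-Host " already present; leaving service env block untouched"
}
} "patch service env block"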
}
# ------------------------------------------------------------------------
# Step 6: Start in forward dependency order
# ------------------------------------------------------------------------
Step "Starting services (MxAccessGw → OtOpcUaWonderwareHistorian → OtOpcUa)"
foreach ($pair in @(
@{ Name = 'MxAccessGw'; Wait = 4 },
@{ Name = 'OtOpcUaWonderwareHistorian'; Wait = 4 },
@{ Name = 'OtOpcUa'; Wait = 8 }
)) {
$name = $pair.Name
if (Test-NssmService $name) {
Run { nssm start $name } "start $name"
if (-not $WhatIf) { Start-Sleep -Seconds $pair.Wait }
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
}
}
# ------------------------------------------------------------------------
# Step 7: Smoke verification
# ------------------------------------------------------------------------
Step "Smoke verification"
if (-not $WhatIf) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')) {
if (Test-NssmService $name) {
$status = (Get-Service $name).Status
$color = if ($status -eq 'Running') { 'Green' } else { 'Red' }
Write-Host " $name = $status" -ForegroundColor $color
}
}
foreach ($port in @(5120, 4840, 4841)) {
$listening = Get-NetTCPConnection -LocalPort $port -State Listen -ErrorAction SilentlyContinue
$color = if ($listening) { 'Green' } else { 'DarkYellow' }
Write-Host " TCP $port listening = $($null -ne $listening)" -ForegroundColor $color
}
Write-Host ""
Write-Host " Recent log tails:" -ForegroundColor DarkCyan
$tails = @(
"$PublishRoot\lmxopcua\logs\otopcua-*.log",
"$PublishRoot\mxaccessgw\stdout.log",
"$env:ProgramData\OtOpcUa\historian-wonderware-*.log"
)
foreach ($pattern in $tails) {
$latest = Get-ChildItem -Path $pattern -ErrorAction SilentlyContinue |
Sort-Object LastWriteTime -Descending |
Select-Object -First 1
if ($null -ne $latest) {
Write-Host ""
Write-Host " --- $($latest.FullName) (last 10 lines) ---" -ForegroundColor DarkGray
Get-Content $latest.FullName -Tail 10 | ForEach-Object { Write-Host " $_" }
}
}
}
Write-Host ""
Write-Host "Refresh complete." -ForegroundColor Green
Write-Host ""
Write-Host "Next: run the functional verification scenarios from"
Write-Host " docs\plans\alarms-over-gateway.md §Track D §6 'Functional verification'"
Write-Host " - Galaxy-native alarm raise"
Write-Host " - Scripted alarm → AVEVA Historian round-trip"
Write-Host " - Sub-attribute fallback path with IAlarmSource disabled"
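The script paces start-up with fixed `Start-Sleep` delays and then checks ports once with `Get-NetTCPConnection`. If those delays turn out too short or too long on a given host, one option is to poll for a listener with a timeout. The sketch below is an assumption, not part of `Refresh-Services.ps1`: `Wait-TcpListener` is a hypothetical name, and it probes by opening a TCP connection, which the target service sees as a brief client connect.

```powershell
# Hypothetical helper (not in the repo): poll until something accepts TCP
# connections on a port, instead of relying only on fixed Start-Sleep delays.
function Wait-TcpListener {
    param(
        [Parameter(Mandatory)][int]$Port,
        [string]$HostName = '127.0.0.1',
        [int]$TimeoutSeconds = 30,
        [int]$PollMilliseconds = 500
    )
    $deadline = [DateTime]::UtcNow.AddSeconds($TimeoutSeconds)
    while ([DateTime]::UtcNow -lt $deadline) {
        $client = [System.Net.Sockets.TcpClient]::new()
        try {
            # Bound each connect attempt by the poll interval.
            $task = $client.ConnectAsync($HostName, $Port)
            if ($task.Wait($PollMilliseconds) -and $client.Connected) { return $true }
        }
        catch {
            # A refused connection surfaces as an AggregateException; keep polling.
        }
        finally {
            $client.Dispose()
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $false
}
```

For example, `if (-not (Wait-TcpListener -Port 5120 -TimeoutSeconds 60)) { Write-Warning "nothing listening on 5120" }`. Because the probe is a real connect rather than a `Get-NetTCPConnection` query, it also works on hosts without the NetTCPIP module.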