alarms-over-gateway: full pipeline (wnwrap consumer + dispatcher + IPC + auto-subscribe + ack-by-name + live smoke) #118

Merged
dohertj2 merged 16 commits from docs/alarm-client-wm-app-finding into main 2026-05-01 12:31:29 -04:00
2 changed files with 225 additions and 19 deletions
Showing only changes of commit bb7be14d1d
+76 -10
@@ -258,16 +258,82 @@ the script's writes. So the alarm extension is **evaluating**
its condition, just not visibly producing transitions on the
`aaAlarmManagedClient` consumer stream.
This isolates the unknown to the producer-side path — whether
the BoolAlarm extension's "publish to alarm manager" knob is on,
whether the platform is in an alarm area that matches the
consumer's subscription scope, or whether AVEVA has a separate
"events" path the BoolAlarm uses by default that this consumer
doesn't subscribe to. Resolving requires checking the BoolAlarm
extension's config in System Platform IDE (alarm priority,
category, "Active"/"Enabled" flags, alarm-vs-event mode) and
checking whether `aaObjectViewer`'s Active Alarms panel sees the
alarm fire.
## Multi-channel + multi-subscription probe — sixth run, 2026-05-01
Extended the probe to try every consumer-side approach in
a single run:
- **Subscription expressions** (sequential): `\Galaxy!`,
`\Galaxy!*`, `\\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`.
All Subscribe calls returned rc=0; the last one
(`\\.\Galaxy!`) is reflected in `GetProviders` (count=1).
- **Read channels** polled at 500ms cadence: `GetStatistics`,
`GetHighPriAlarm`, `SFCreateSnapshot` + `SFGetStatistics`.
- **Filter+sort**: priority 0..32767, `qtSummary`,
state=`asAlarmActiveNow`, sort=`sfReturnNewestFirst`.
- **AlarmRecord init** (works around a `Not a valid Win32
FileTime` exception): all DateTime fields are pre-set to the
FILETIME epoch (1601-01-01 UTC) before the call, because
`default(DateTime)` is outside the FILETIME range and trips the
interop marshaler.
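The range problem behind the AlarmRecord workaround can be reproduced without the AVEVA interop. The sketch below is only an illustration: it uses `DateTime.ToFileTimeUtc`, which may not be the exact conversion the marshaler performs, and the `FileTimeRangeDemo` class and `IsFileTimeRepresentable` helper are names made up for this example.

```csharp
using System;

// Hypothetical demo class; not part of the probe or of any AVEVA library.
public static class FileTimeRangeDemo
{
    // FILETIME counts 100 ns ticks since 1601-01-01 UTC, so this is its zero value.
    public static readonly DateTime FileTimeEpoch =
        new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Returns true when the value can be converted to a FILETIME.
    public static bool IsFileTimeRepresentable(DateTime value)
    {
        try
        {
            value.ToFileTimeUtc();
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // Dates before 1601-01-01 have no FILETIME representation.
            return false;
        }
    }

    public static void Main()
    {
        // default(DateTime) is 0001-01-01, which is before the FILETIME epoch.
        Console.WriteLine(IsFileTimeRepresentable(default(DateTime))); // False
        Console.WriteLine(IsFileTimeRepresentable(FileTimeEpoch));     // True
        Console.WriteLine(FileTimeEpoch.ToFileTimeUtc());              // 0
    }
}
```

Any value at or after the epoch would satisfy the range check; the epoch itself is a convenient choice because it is easy to recognize as "unset" when a record comes back.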
Result of the 60s run with `TestMachine_001.TestAlarm001` being
flipped every 10s:
```
Subscribe('\Galaxy!') -> 0
Subscribe('\Galaxy!*') -> 0
Subscribe('\\Galaxy!') -> 0
Subscribe('\Galaxy!TestArea') -> 0
Subscribe('\\.\Galaxy!') -> 0
GetProviders [after Subscribe-multi] -> count=1 list=[ 0 \\.\Galaxy!]
GetStatistics #1: total=0 active=0 changes=1 codes=[7] positions=[] handles=[]
GetHighPriAlarm #1: rc=0 { }
SF channel #1: SFCreate=0 numAlarms=0 SFStats=0 unackRet=0 unackAlm=0 ackAlm=0 others=0 events=0 idxNewest=-1
```
**No further "(changed)" entries for the entire 60s window.**
Every read API returned the same empty result on every poll.
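For reading the log: the probe writes a line for a channel only when that channel's summary string differs from the previous poll, so silence means "unchanged", not "not polled". A minimal standalone sketch of that pattern follows; `ChangeLogger` and `Report` are hypothetical names for illustration, and the third summary value is invented to show a change.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: logs a channel's summary only when it differs
// from the summary seen on the previous poll of the same channel.
public sealed class ChangeLogger
{
    private readonly Dictionary<string, string> lastByChannel =
        new Dictionary<string, string>();
    private readonly Action<string> log;

    public ChangeLogger(Action<string> log)
    {
        this.log = log;
    }

    // Returns true when a line was logged, false when the summary was unchanged.
    public bool Report(string channel, int seq, string summary)
    {
        if (lastByChannel.TryGetValue(channel, out string previous) && previous == summary)
        {
            return false;
        }
        lastByChannel[channel] = summary;
        log($"{channel} #{seq}: {summary} (changed)");
        return true;
    }
}

public static class ChangeLoggerDemo
{
    public static void Main()
    {
        var logger = new ChangeLogger(Console.WriteLine);
        logger.Report("GetHighPriAlarm", 1, "rc=0 { }");              // logged: first observation
        logger.Report("GetHighPriAlarm", 2, "rc=0 { }");              // suppressed: same summary
        logger.Report("GetHighPriAlarm", 3, "rc=0 { hypothetical }"); // logged: summary differs
    }
}
```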
User confirms the alarm IS firing — `aaObjectViewer` sees
`$Alarm.InAlarm` flip in lockstep with the script. Historian
records exist (per user — needs verification by querying the
historian directly).
## Conclusion of consumer-side probing
`aaAlarmManagedClient.AlarmClient` is **not** the receive
surface AVEVA's alarm pipeline routes to in this Galaxy
configuration. Every consumer-side step that can be checked
succeeds, yet no alarm data arrives:
- `InitializeConsumer` + `RegisterConsumer` + `Subscribe` all
succeed (rc=0).
- `GetProviders` finds `\Galaxy!` once Initialize is called.
- All read APIs (`GetStatistics`, `GetHighPriAlarm`,
`SFCreateSnapshot`/`SFGetStatistics`) return empty even with
every documented filter combination.
- The consumer's hWnd receives zero AVEVA messages between
`WM_CREATE` and `WM_DESTROY`; AVEVA's traffic goes to its own
internal hwnd.
The next investigation directions are not consumer-side:
1. **Inspect `aaObjectViewer`'s alarm SDK** to see what library
it uses to read alarms. If different from
`aaAlarmManagedClient`, switch the worker over.
2. **Query the historian directly** (`aahEventStorage` /
`aahEventSvc`) to confirm alarms are recorded — and use the
same path for v2 alarm capture.
3. **Inspect AVEVA's alarm-routing config** for this Galaxy in
System Platform IDE — area assignments, alarm provider
bindings, "publish alarm events to" settings on the platform.
For A.2 implementation: the `aaAlarmManagedClient` path the
gateway-worker is currently architected around may be a
dead end on customer Galaxies configured this way. If the
alarms truly flow only through the historian event-storage path,
A.2 needs to consume from `aahEventStorage` instead, which is a
fundamental architecture pivot.
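One way to keep such a pivot contained is to put a small seam between the worker and whichever receive surface turns out to work. The sketch below is an assumption-heavy illustration, not the gateway-worker's actual design: `IAlarmSource`, `AlarmTransition`, and `InMemoryAlarmSource` are hypothetical names, and no AVEVA or historian API is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical transition shape; the fields are illustrative only.
public sealed class AlarmTransition
{
    public AlarmTransition(string tagName, bool inAlarm, DateTime timestampUtc)
    {
        TagName = tagName;
        InAlarm = inAlarm;
        TimestampUtc = timestampUtc;
    }

    public string TagName { get; }
    public bool InAlarm { get; }
    public DateTime TimestampUtc { get; }
}

// Hypothetical seam: the worker would depend on this interface rather than
// on aaAlarmManagedClient or a historian client directly.
public interface IAlarmSource
{
    IReadOnlyList<AlarmTransition> ReadSince(DateTime sinceUtc);
}

// Stand-in implementation backed by a list, useful for tests. A real
// implementation would wrap whichever receive path is confirmed to work.
public sealed class InMemoryAlarmSource : IAlarmSource
{
    private readonly List<AlarmTransition> transitions = new List<AlarmTransition>();

    public void Add(AlarmTransition transition)
    {
        transitions.Add(transition);
    }

    public IReadOnlyList<AlarmTransition> ReadSince(DateTime sinceUtc)
    {
        // Strictly newer than the cursor, so a caller can pass the last timestamp it saw.
        return transitions.FindAll(t => t.TimestampUtc > sinceUtc);
    }
}

public static class AlarmSourceDemo
{
    public static void Main()
    {
        var source = new InMemoryAlarmSource();
        var t0 = new DateTime(2026, 5, 1, 12, 0, 0, DateTimeKind.Utc);
        source.Add(new AlarmTransition("TestMachine_001.TestAlarm001", true, t0));
        source.Add(new AlarmTransition("TestMachine_001.TestAlarm001", false, t0.AddSeconds(10)));

        foreach (AlarmTransition t in source.ReadSince(t0))
        {
            Console.WriteLine($"{t.TagName} inAlarm={t.InAlarm}");
        }
    }
}
```

With a seam like this, switching from the managed-client path to a historian-backed path would mean adding one implementation rather than reworking the dispatcher and IPC layers.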
### Implications for A.2 implementation
@@ -32,6 +32,17 @@ namespace MxGateway.Worker.Tests;
public sealed class AlarmClientWmProbeTests : IDisposable
{
// Probe configuration. Override in the constructor below if needed.
// Try multiple subscription expressions sequentially. The working
// assumption is that each Subscribe call adds to the consumer's scope.
// The "everything" form varies by AVEVA version, so we shotgun common forms.
private static readonly string[] SubscriptionExpressions =
{
@"\Galaxy!", // documented "all groups under Galaxy provider"
@"\Galaxy!*", // wildcard variant
@"\\Galaxy!", // double-backslash UNC-style
@"\Galaxy!TestArea", // explicit area where TestMachine_001 lives
@"\\.\Galaxy!", // local-host prefix
};
private const string SubscriptionExpression = @"\Galaxy!";
private static readonly TimeSpan PumpDuration = TimeSpan.FromSeconds(60);
private static readonly TimeSpan PollInterval = TimeSpan.FromMilliseconds(500);
@@ -281,16 +292,28 @@ public sealed class AlarmClientWmProbeTests : IDisposable
// literally mean "match alarms in state 'none'" (i.e., nothing),
// since the eAlarmFilterState enum is 0/1/2/3 single-states not
// flag bits. Try ActiveNow explicitly.
-int subscribe = client.Subscribe(
-szSubscription: SubscriptionExpression,
-wFromPri: 0, wToPri: short.MaxValue,
-QueryType: eQueryType.qtHistory,
-SortFlags: eSortFlags.sfReturnNewestFirst,
-FilterMask: eAlarmFilterState.asAlarmActiveNow,
-FilterSpecification: eAlarmFilterState.asAlarmActiveNow);
-Log($"Subscribe('{SubscriptionExpression}', qtHistory, state=ActiveNow, pri=[0..32767]) -> {subscribe}");
// Subscribe to every candidate expression in turn. The hope is that AVEVA
// accepts overlapping subscriptions and whichever matches the producer wins;
// LogProviders below shows which expressions actually stuck.
foreach (string expr in SubscriptionExpressions)
{
try
{
int subscribe = client.Subscribe(
szSubscription: expr,
wFromPri: 0, wToPri: short.MaxValue,
QueryType: eQueryType.qtSummary,
SortFlags: eSortFlags.sfReturnNewestFirst,
FilterMask: eAlarmFilterState.asAlarmActiveNow,
FilterSpecification: eAlarmFilterState.asAlarmActiveNow);
Log($"Subscribe('{expr}') -> {subscribe}");
}
catch (Exception ex)
{
Log($"Subscribe('{expr}') threw: {ex.GetType().Name}: {ex.Message}");
}
}
-LogProviders(client, "after Subscribe");
LogProviders(client, "after Subscribe-multi");
// 3c. Pump for the configured duration. Log every message we see
// (filtered light to avoid noise from WM_PAINT / WM_TIMER /
@@ -322,6 +345,7 @@ public sealed class AlarmClientWmProbeTests : IDisposable
{
PollGetStatistics(client, ++pollCount);
LogProviders(client, $"poll #{pollCount}");
PollAllChannels(client, pollCount);
nextPoll = DateTime.UtcNow + PollInterval;
}
Thread.Sleep(10);
@@ -349,6 +373,122 @@ public sealed class AlarmClientWmProbeTests : IDisposable
private string lastStatsSummary = string.Empty;
private string lastProvidersSummary = string.Empty;
private string lastHighPriSummary = string.Empty;
private string lastSfStatsSummary = string.Empty;
/// <summary>
/// Build an AlarmRecord whose DateTime fields are all set to the
/// FILETIME epoch, so the interop marshaler accepts it.
/// </summary>
private static AlarmRecord NewAlarmRecord()
{
// The interop's auto-marshal flips DateTime fields to FILETIME on
// the way IN as well as OUT. default(DateTime) (year 1) is outside
// FILETIME's representable range, so initialize all DateTime fields
// to the FILETIME epoch (1601-01-01 UTC) to satisfy the marshaler.
DateTime epoch = new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);
// Box once so FieldInfo.SetValue mutates this copy even if
// AlarmRecord is a struct.
object boxed = new AlarmRecord();
foreach (var f in typeof(AlarmRecord).GetFields(
BindingFlags.Public | BindingFlags.Instance | BindingFlags.NonPublic))
{
if (f.FieldType == typeof(DateTime))
{
f.SetValue(boxed, epoch);
}
}
return (AlarmRecord)boxed;
}
/// <summary>
/// Try every read API the AlarmClient exposes and log when its
/// output changes. AlarmClient has at least three distinct read
/// surfaces: GetStatistics (current-change array), GetHighPriAlarm
/// (single-record peek), and the SF (stored filter) family. Any
/// of them might be the populated one.
/// </summary>
private void PollAllChannels(AlarmClient client, int seq)
{
{
// Channel A: GetHighPriAlarm — direct peek of highest-priority alarm.
try
{
AlarmRecord rec = NewAlarmRecord();
int rc = client.GetHighPriAlarm(ref rec);
string desc = rc == 0 ? DescribeAlarmRecord(rec) : "<no record>";
string summary = $"rc={rc} {desc}";
if (summary != lastHighPriSummary)
{
Log($"GetHighPriAlarm #{seq}: {summary} (changed)");
lastHighPriSummary = summary;
}
}
catch (Exception ex)
{
string es = $"{ex.GetType().Name}: {ex.Message}";
if (es != lastHighPriSummary)
{
Log($"GetHighPriAlarm #{seq}: threw {es}");
lastHighPriSummary = es;
}
}
// Channel C: GetAlarmExtendedRec by index. Try indices 0..2 directly;
// populated alarms (if any) appear at low indices.
for (int idx = 0; idx <= 2; idx++)
{
try
{
AlarmRecord rec = NewAlarmRecord();
int rc = client.GetAlarmExtendedRec(idx, ref rec);
if (rc == 0)
{
string desc = DescribeAlarmRecord(rec);
Log($"GetAlarmExtendedRec(idx={idx}) #{seq}: rc=0 -> {desc}");
break; // log first present record only
}
}
catch (Exception ex)
{
if (idx == 0)
{
Log($"GetAlarmExtendedRec(idx=0) #{seq}: threw {ex.GetType().Name}: {ex.Message}");
}
break;
}
}
// Channel B: SF. Take a snapshot, read SFGetStatistics, then fetch the
// first record if the snapshot is non-empty.
try
{
uint numAlarms = 0;
int sfCreate = client.SFCreateSnapshot(0, ref numAlarms);
int unackRet = 0, unackAlm = 0, ackAlm = 0, others = 0, events = 0, idxNewest = 0;
int sfStats = client.SFGetStatistics(
ref unackRet, ref unackAlm, ref ackAlm,
ref others, ref events, ref idxNewest);
string summary = $"SFCreate={sfCreate} numAlarms={numAlarms} " +
$"SFStats={sfStats} unackRet={unackRet} unackAlm={unackAlm} " +
$"ackAlm={ackAlm} others={others} events={events} idxNewest={idxNewest}";
if (summary != lastSfStatsSummary)
{
Log($"SF channel #{seq}: {summary} (changed)");
lastSfStatsSummary = summary;
// If non-zero, fetch the first record by index via the
// standard GetAlarmExtendedRec — after SFCreateSnapshot the
// indices reference the snapshot.
if (numAlarms > 0)
{
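// Use the pre-initialized record: a bare new AlarmRecord() carries
// default(DateTime) fields, which trip the FILETIME marshaler.
AlarmRecord rec = NewAlarmRecord();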
int recRc = client.GetAlarmExtendedRec(0, ref rec);
Log($" GetAlarmExtendedRec(0) [post-snapshot] rc={recRc} -> {DescribeAlarmRecord(rec)}");
}
}
client.SFDeleteSnapshot();
}
catch (Exception ex)
{
Log($"SF channel #{seq}: threw {ex.GetType().Name}: {ex.Message}");
}
}
private void LogProviders(AlarmClient client, string when)
{