probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified

Sixth probe iteration with every consumer-side knob exhausted:

- Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!,
  \Galaxy!TestArea, \.\Galaxy!.
- Read channels polled at 500ms: GetStatistics, GetHighPriAlarm,
  SFCreateSnapshot + SFGetStatistics.
- Filters: priority 0..32767, qtSummary + qtHistory both tried,
  asAlarmActiveNow.
- AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug
  on default(DateTime).

Result: every read API returns empty for the entire 60s window
even with TestMachine_001.TestAlarm001 firing every 10s and
aaObjectViewer confirming InAlarm transitions. The
aaAlarmManagedClient.AlarmClient is not the receive surface
AVEVA's alarm pipeline routes to in this Galaxy configuration.

The consumer chain is verified working end-to-end: Initialize +
Register + Subscribe all succeed, GetProviders finds the
provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's
internal hwnd. There is simply no alarm data flowing through
this consumer surface.

Next investigation is not consumer-side: either find the SDK
aaObjectViewer's alarm panel uses, or query the historian
event storage directly. If alarms only flow via the historian
path on this customer's Galaxy, the worker's PR A.5 architecture
is a dead-end and A.2 needs a different transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-01 08:26:29 -04:00
parent 8ac6642bf8
commit bb7be14d1d
2 changed files with 225 additions and 19 deletions
+76 -10
View File
@@ -258,16 +258,82 @@ the script's writes. So the alarm extension is **evaluating**
its condition, just not visibly producing transitions on the
`aaAlarmManagedClient` consumer stream.
This isolates the unknown to the producer-side path — whether
the BoolAlarm extension's "publish to alarm manager" knob is on,
whether the platform is in an alarm area that matches the
consumer's subscription scope, or whether AVEVA has a separate
"events" path the BoolAlarm uses by default that this consumer
doesn't subscribe to. Resolving requires checking the BoolAlarm
extension's config in System Platform IDE (alarm priority,
category, "Active"/"Enabled" flags, alarm-vs-event mode) and
checking whether `aaObjectViewer`'s Active Alarms panel sees the
alarm fire.
## Multi-channel + multi-subscription probe — sixth run, 2026-05-01
Extended the probe to try every consumer-side approach in
parallel:
- **Subscription expressions** (sequential): `\Galaxy!`,
`\Galaxy!*`, `\\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`.
All Subscribe calls returned rc=0; the last one
(`\\.\Galaxy!`) is reflected in `GetProviders` (count=1).
- **Read channels** polled at 500ms cadence: `GetStatistics`,
`GetHighPriAlarm`, `SFCreateSnapshot` + `SFGetStatistics`.
- **Filter+sort**: priority 0..32767, `qtSummary`,
state=`asAlarmActiveNow`, sort=`sfReturnNewestFirst`.
- **AlarmRecord init** (worked around `Not a valid Win32
FileTime` exception): all DateTime fields pre-set to FILETIME
epoch (1601-01-01 UTC) before the call, since
`default(DateTime)` is outside FILETIME range and trips the
interop marshaler.
Result of the 60s run with `TestMachine_001.TestAlarm001` being
flipped every 10s:
```
Subscribe('\Galaxy!') -> 0
Subscribe('\Galaxy!*') -> 0
Subscribe('\\Galaxy!') -> 0
Subscribe('\Galaxy!TestArea') -> 0
Subscribe('\\.\Galaxy!') -> 0
GetProviders [after Subscribe-multi] -> count=1 list=[ 0 \\.\Galaxy!]
GetStatistics #1: total=0 active=0 changes=1 codes=[7] positions=[] handles=[]
GetHighPriAlarm #1: rc=0 { }
SF channel #1: SFCreate=0 numAlarms=0 SFStats=0 unackRet=0 unackAlm=0 ackAlm=0 others=0 events=0 idxNewest=-1
```
**No further "(changed)" entries for the entire 60s window.**
Every read API returned the same empty result on every poll.
User confirms the alarm IS firing — `aaObjectViewer` sees
`$Alarm.InAlarm` flip in lockstep with the script. Historian
records exist (per user — needs verification by querying the
historian directly).
## Conclusion of consumer-side probing
`aaAlarmManagedClient.AlarmClient` is **not** the receive
surface AVEVA's alarm pipeline routes to in this Galaxy
configuration. The consumer chain is verified end-to-end:
- `InitializeConsumer` + `RegisterConsumer` + `Subscribe` all
succeed (rc=0).
- `GetProviders` finds `\Galaxy!` once Initialize is called.
- All read APIs (`GetStatistics`, `GetHighPriAlarm`,
`SFCreateSnapshot`/`SFGetStatistics`) return empty even with
every documented filter combination.
- The consumer's hWnd receives zero AVEVA messages between
`WM_CREATE` and `WM_DESTROY`; AVEVA's traffic goes to its own
internal hwnd.
The next investigation directions are not consumer-side:
1. **Inspect `aaObjectViewer`'s alarm SDK** to see what library
it uses to read alarms. If different from
`aaAlarmManagedClient`, switch the worker over.
2. **Query the historian directly** (`aahEventStorage` /
`aahEventSvc`) to confirm alarms are recorded — and use the
same path for v2 alarm capture.
3. **Inspect AVEVA's alarm-routing config** for this Galaxy in
System Platform IDE — area assignments, alarm provider
bindings, "publish alarm events to" settings on the platform.
For A.2 implementation: the `aaAlarmManagedClient` path the
gateway-worker is currently architected around may be a
dead-end on customer Galaxies configured this way. If the
alarms truly only flow through the historian event-storage path,
A.2 needs to consume from `aahEventStorage` instead — a
fundamental architecture pivot.
### Implications for A.2 implementation
@@ -32,6 +32,17 @@ namespace MxGateway.Worker.Tests;
public sealed class AlarmClientWmProbeTests : IDisposable
{
// Probe configuration. Override in the constructor below if needed.
// Try multiple subscription expressions sequentially (each Subscribe call
// adds to the consumer's scope). The "everything" form varies by AVEVA
// version — we shotgun common forms.
private static readonly string[] SubscriptionExpressions =
{
@"\Galaxy!", // documented "all groups under Galaxy provider"
@"\Galaxy!*", // wildcard variant
@"\\Galaxy!", // double-backslash UNC-style
@"\Galaxy!TestArea", // explicit area where TestMachine_001 lives
@"\\.\Galaxy!", // local-host prefix
};
private const string SubscriptionExpression = @"\Galaxy!";
private static readonly TimeSpan PumpDuration = TimeSpan.FromSeconds(60);
private static readonly TimeSpan PollInterval = TimeSpan.FromMilliseconds(500);
@@ -281,16 +292,28 @@ public sealed class AlarmClientWmProbeTests : IDisposable
// literally mean "match alarms in state 'none'" (i.e., nothing),
// since the eAlarmFilterState enum is 0/1/2/3 single-states not
// flag bits. Try ActiveNow explicitly.
int subscribe = client.Subscribe(
szSubscription: SubscriptionExpression,
wFromPri: 0, wToPri: short.MaxValue,
QueryType: eQueryType.qtHistory,
SortFlags: eSortFlags.sfReturnNewestFirst,
FilterMask: eAlarmFilterState.asAlarmActiveNow,
FilterSpecification: eAlarmFilterState.asAlarmActiveNow);
Log($"Subscribe('{SubscriptionExpression}', qtHistory, state=ActiveNow, pri=[0..32767]) -> {subscribe}");
// Subscribe to every candidate expression — AVEVA accepts multiple
// overlapping subscriptions; whichever matches the producer wins.
foreach (string expr in SubscriptionExpressions)
{
try
{
int subscribe = client.Subscribe(
szSubscription: expr,
wFromPri: 0, wToPri: short.MaxValue,
QueryType: eQueryType.qtSummary,
SortFlags: eSortFlags.sfReturnNewestFirst,
FilterMask: eAlarmFilterState.asAlarmActiveNow,
FilterSpecification: eAlarmFilterState.asAlarmActiveNow);
Log($"Subscribe('{expr}') -> {subscribe}");
}
catch (Exception ex)
{
Log($"Subscribe('{expr}') threw: {ex.GetType().Name}: {ex.Message}");
}
}
LogProviders(client, "after Subscribe");
LogProviders(client, "after Subscribe-multi");
// 3c. Pump for the configured duration. Log every message we see
// (filtered light to avoid noise from WM_PAINT / WM_TIMER /
@@ -322,6 +345,7 @@ public sealed class AlarmClientWmProbeTests : IDisposable
{
PollGetStatistics(client, ++pollCount);
LogProviders(client, $"poll #{pollCount}");
PollAllChannels(client, pollCount);
nextPoll = DateTime.UtcNow + PollInterval;
}
Thread.Sleep(10);
@@ -349,6 +373,122 @@ public sealed class AlarmClientWmProbeTests : IDisposable
private string lastStatsSummary = string.Empty;
private string lastProvidersSummary = string.Empty;
private string lastHighPriSummary = string.Empty;
private string lastSfStatsSummary = string.Empty;
/// <summary>
/// Try every read API the AlarmClient exposes and log when its
/// output changes. AlarmClient has at least three distinct read
/// surfaces — GetStatistics (current-change array), GetHighPriAlarm
/// (single-record peek), and the SF (stored filter) family — and any
/// of them might be the populated one.
/// </summary>
private static AlarmRecord NewAlarmRecord()
{
// The interop's auto-marshal flips DateTime fields to FILETIME on
// the way IN as well as OUT. default(DateTime) (year 1) is outside
// FILETIME's representable range, so initialize all DateTime fields
// to the FILETIME epoch (1601-01-01 UTC) to satisfy the marshaler.
AlarmRecord rec = new AlarmRecord();
DateTime epoch = new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);
foreach (var f in typeof(AlarmRecord).GetFields(
BindingFlags.Public | BindingFlags.Instance | BindingFlags.NonPublic))
{
if (f.FieldType == typeof(DateTime))
{
object boxed = rec;
f.SetValue(boxed, epoch);
rec = (AlarmRecord)boxed;
}
}
return rec;
}
private void PollAllChannels(AlarmClient client, int seq)
{
// Channel A: GetHighPriAlarm — direct peek of highest-priority alarm.
try
{
AlarmRecord rec = NewAlarmRecord();
int rc = client.GetHighPriAlarm(ref rec);
string desc = rc == 0 ? DescribeAlarmRecord(rec) : "<no record>";
string summary = $"rc={rc} {desc}";
if (summary != lastHighPriSummary)
{
Log($"GetHighPriAlarm #{seq}: {summary} (changed)");
lastHighPriSummary = summary;
}
}
catch (Exception ex)
{
string es = $"{ex.GetType().Name}: {ex.Message}";
if (es != lastHighPriSummary)
{
Log($"GetHighPriAlarm #{seq}: threw {es}");
lastHighPriSummary = es;
}
}
// Channel C: GetAlarmExtendedRec by index. Try indices 0..3 directly;
// populated alarms (if any) appear at low indices.
for (int idx = 0; idx <= 2; idx++)
{
try
{
AlarmRecord rec = NewAlarmRecord();
int rc = client.GetAlarmExtendedRec(idx, ref rec);
if (rc == 0)
{
string desc = DescribeAlarmRecord(rec);
Log($"GetAlarmExtendedRec(idx={idx}) #{seq}: rc=0 -> {desc}");
break; // log first present record only
}
}
catch (Exception ex)
{
if (idx == 0)
{
Log($"GetAlarmExtendedRec(idx=0) #{seq}: threw {ex.GetType().Name}: {ex.Message}");
}
break;
}
}
// Channel B: SF — snapshot + GetStatistics + iterate.
try
{
uint numAlarms = 0;
int sfCreate = client.SFCreateSnapshot(0, ref numAlarms);
int unackRet = 0, unackAlm = 0, ackAlm = 0, others = 0, events = 0, idxNewest = 0;
int sfStats = client.SFGetStatistics(
ref unackRet, ref unackAlm, ref ackAlm,
ref others, ref events, ref idxNewest);
string summary = $"SFCreate={sfCreate} numAlarms={numAlarms} " +
$"SFStats={sfStats} unackRet={unackRet} unackAlm={unackAlm} " +
$"ackAlm={ackAlm} others={others} events={events} idxNewest={idxNewest}";
if (summary != lastSfStatsSummary)
{
Log($"SF channel #{seq}: {summary} (changed)");
lastSfStatsSummary = summary;
// If non-zero, fetch the first record by index via the
// standard GetAlarmExtendedRec — after SFCreateSnapshot the
// indices reference the snapshot.
if (numAlarms > 0)
{
AlarmRecord rec = new AlarmRecord();
int recRc = client.GetAlarmExtendedRec(0, ref rec);
Log($" GetAlarmExtendedRec(0) [post-snapshot] rc={recRc} -> {DescribeAlarmRecord(rec)}");
}
}
client.SFDeleteSnapshot();
}
catch (Exception ex)
{
Log($"SF channel #{seq}: threw {ex.GetType().Name}: {ex.Message}");
}
}
private void LogProviders(AlarmClient client, string when)
{