feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped

OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Instruments (counters plus one duration histogram) cover
the hot paths:

  otopcua.deploy.applied               (outcome=ack|reject)
  otopcua.deploy.apply.duration        (s, histogram)
  otopcua.driver.lifecycle             (event=spawn|spawn_stub|stop|fault)
  otopcua.virtualtag.eval              (outcome=ok|fail|skip)
  otopcua.scriptedalarm.transition     (state=activated|acknowledged|cleared)
  otopcua.opcua.sink.write             (kind=value|alarm|rebuild)
  otopcua.redundancy.service_level_change (level=0..255, ServiceLevel byte)
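A rough sketch of how these instruments could be declared with System.Diagnostics.Metrics. It assumes every entry except the duration histogram is a plain monotonic counter, and it uses top-level locals in place of the OtOpcUaTelemetry static class. The meter and source name strings are placeholders; the real MeterName / ActivitySourceName values are not part of this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Placeholder names: the real OtOpcUaTelemetry.MeterName / ActivitySourceName
// values are not shown in this commit.
const string MeterName = "ZB.MOM.WW.OtOpcUa";
const string ActivitySourceName = "ZB.MOM.WW.OtOpcUa";

// One Meter + one ActivitySource shared by every instrumentation point.
using var meter = new Meter(MeterName);
using var activitySource = new ActivitySource(ActivitySourceName);

// Instrument names match the list above; treating everything but the
// histogram as a counter is an assumption.
Counter<long> deployApplied = meter.CreateCounter<long>("otopcua.deploy.applied");
Histogram<double> deployApplyDuration =
    meter.CreateHistogram<double>("otopcua.deploy.apply.duration", "s", "Deploy apply duration");
Counter<long> driverLifecycle = meter.CreateCounter<long>("otopcua.driver.lifecycle");
Counter<long> virtualTagEval = meter.CreateCounter<long>("otopcua.virtualtag.eval");
Counter<long> scriptedAlarmTransition = meter.CreateCounter<long>("otopcua.scriptedalarm.transition");
Counter<long> sinkWrite = meter.CreateCounter<long>("otopcua.opcua.sink.write");
Counter<long> serviceLevelChange = meter.CreateCounter<long>("otopcua.redundancy.service_level_change");

// Call sites record through the shared instruments with a low-cardinality tag.
deployApplied.Add(1, new KeyValuePair<string, object?>("outcome", "ack"));
driverLifecycle.Add(1, new KeyValuePair<string, object?>("event", "spawn"));
serviceLevelChange.Add(1, new KeyValuePair<string, object?>("level", (byte)200));
```

Keeping one Meter per process means the host only has to register a single name with OpenTelemetry to pick up every instrument.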

Plus two ActivitySource spans:

  otopcua.deploy.apply                 wraps DriverHostActor.ApplyAndAck
  otopcua.opcua.address_space_rebuild  wraps OpcUaPublishActor.HandleRebuild
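One way the deploy span and duration histogram could wrap an apply step, assuming ApplyAndAck can be modeled as a synchronous call that returns ack or reject. The ApplyAndAck below is a hypothetical stand-in (the real DriverHostActor signature is not in this diff), and the source/meter names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Placeholder names, not the real OtOpcUaTelemetry constants.
using var source = new ActivitySource("ZB.MOM.WW.OtOpcUa");
using var meter = new Meter("ZB.MOM.WW.OtOpcUa");
Counter<long> applied = meter.CreateCounter<long>("otopcua.deploy.applied");
Histogram<double> duration =
    meter.CreateHistogram<double>("otopcua.deploy.apply.duration", "s", "Deploy apply duration");

// Hypothetical stand-in for DriverHostActor.ApplyAndAck: runs the apply step
// inside an "otopcua.deploy.apply" span and records duration + outcome.
bool ApplyAndAck(Func<bool> apply)
{
    // Null when no listener samples this ActivitySource.
    using Activity? activity = source.StartActivity("otopcua.deploy.apply");
    var sw = Stopwatch.StartNew();
    bool acked = false;
    try
    {
        acked = apply();
        return acked;
    }
    finally
    {
        // Assumption: an exception thrown by apply() is counted as a reject.
        string outcome = acked ? "ack" : "reject";
        duration.Record(sw.Elapsed.TotalSeconds);
        applied.Add(1, new KeyValuePair<string, object?>("outcome", outcome));
        activity?.SetTag("outcome", outcome);
    }
}

Console.WriteLine(ApplyAndAck(() => true));
```

Recording in a `finally` block keeps the histogram and counter consistent with the span even when the apply step throws.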

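Instruments are effectively free until a listener attaches: with nothing
subscribed, ActivitySource.StartActivity returns null and counter Add()
calls find no subscribers, so tests + dev hosts pay next to nothing for
unread telemetry.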
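A small illustration of that no-listener behavior using only the BCL. The source and meter names are placeholders that nothing in the process subscribes to.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Placeholder names; no listener in this process subscribes to them.
using var source = new ActivitySource("ZB.MOM.WW.OtOpcUa.Demo");
using var meter = new Meter("ZB.MOM.WW.OtOpcUa.Demo");
Counter<long> evalCounter = meter.CreateCounter<long>("otopcua.virtualtag.eval");

// No ActivityListener samples this source, so StartActivity returns null
// and no Activity object is allocated.
Activity? span = source.StartActivity("otopcua.opcua.address_space_rebuild");

// No MeterListener has enabled the instrument, so Enabled is false and
// Add() has no subscribers to forward the measurement to.
evalCounter.Add(1);

Console.WriteLine($"listeners={source.HasListeners()} span={(span is null ? "null" : "created")} enabled={evalCounter.Enabled}");
```

Call sites can therefore use `activity?.SetTag(...)` unconditionally; the null-conditional handles the unobserved case.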

Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry and attaches a Prometheus exporter to
the metrics pipeline; no trace exporter is configured yet) and
MapOtOpcUaMetrics() (mounts the /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off: the scrape
payload is scoped to OtOpcUa signals only.

Tests use MeterListener + ActivityListener to verify that
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite: 72/72 passing (+3 new tests).
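A minimal sketch of the MeterListener side of such a test. A direct Add call stands in for driving VirtualTagActor, since neither the actor harness nor the real meter name appears in this commit; both are placeholders. The ActivityListener half follows the same subscribe, act, assert shape.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("ZB.MOM.WW.OtOpcUa"); // placeholder meter name
Counter<long> evalCounter = meter.CreateCounter<long>("otopcua.virtualtag.eval");

// Every measurement of the instrument under test, with its outcome tag.
var seen = new List<(long Value, string? Outcome)>();

using var listener = new MeterListener
{
    // Enable only the instrument under test; other meters are ignored.
    InstrumentPublished = (instrument, l) =>
    {
        if (instrument.Meter.Name == "ZB.MOM.WW.OtOpcUa" && instrument.Name == "otopcua.virtualtag.eval")
            l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    string? outcome = null;
    foreach (var tag in tags)
        if (tag.Key == "outcome") outcome = tag.Value as string;
    seen.Add((value, outcome));
});
listener.Start();

// Hypothetical stand-in for one successful VirtualTagActor evaluation.
evalCounter.Add(1, new KeyValuePair<string, object?>("outcome", "ok"));

Console.WriteLine($"{seen.Count} measurement(s), outcome={seen[0].Outcome}");
```

MeterListener callbacks run synchronously on the recording thread, so the assertion can follow the action directly without waiting.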

Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
Joseph Doherty
2026-05-26 10:29:40 -04:00
parent 21eac21409
commit 52997ee164
10 changed files with 352 additions and 3 deletions
@@ -0,0 +1,38 @@
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;
using ZB.MOM.WW.OtOpcUa.Commons.Observability;

namespace ZB.MOM.WW.OtOpcUa.Host.Observability;

/// <summary>
/// Wires the OtOpcUa Meter + ActivitySource into OpenTelemetry and exposes a Prometheus
/// scrape endpoint at <c>/metrics</c> on the host pipeline. F13d slice: only the meter +
/// activity source declared in <see cref="OtOpcUaTelemetry"/> are surfaced; per-Akka
/// internals + ASP.NET request metrics stay off by default to keep the scrape payload
/// scoped to OtOpcUa-owned signals.
/// </summary>
public static class ObservabilityExtensions
{
    public static IServiceCollection AddOtOpcUaObservability(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithMetrics(b => b
                .AddMeter(OtOpcUaTelemetry.MeterName)
                .AddPrometheusExporter())
            .WithTracing(b => b
                .AddSource(OtOpcUaTelemetry.ActivitySourceName));

        return services;
    }

    /// <summary>
    /// Mounts the Prometheus scrape endpoint on the existing ASP.NET pipeline. No authorization
    /// policy is attached here, so the endpoint is unauthenticated (fine for local Prometheus
    /// scrapes) unless the app configures a fallback policy with UseAuthentication/UseAuthorization.
    /// </summary>
    public static IEndpointRouteBuilder MapOtOpcUaMetrics(this IEndpointRouteBuilder app)
    {
        app.MapPrometheusScrapingEndpoint("/metrics");
        return app;
    }
}