Telemetry-002 was first resolved by documenting the scalar-only limitation; it is now
implemented (recursive nested redaction). Updated the two resolution notes to record
05cc62a and the replaced limitation test, preserving the audit trail. README unchanged
(still 0 pending / 35 total).
Code Review — Telemetry
| Field | Value |
|---|---|
| Library | ZB.MOM.WW.Telemetry/ |
| Packages | ZB.MOM.WW.Telemetry, ZB.MOM.WW.Telemetry.Serilog |
| Component spec | components/observability/spec/SPEC.md |
| Shared contract | components/observability/shared-contract/ZB.MOM.WW.Telemetry.md |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | 5f75cd4 |
| Open findings | 0 |
Summary
The library is small, focused, and well-structured: two packages with a clean Serilog/OTel
boundary (the Serilog.* stack appears only in the .Serilog package; the core package is
pure OTel + ASP.NET Core framework reference), correct argument validation, sealed types
by design, thorough XML docs, and a deliberate no-process-global-state design for
AddZbSerilog that is well covered by MultiHostTests. The identity triple, Resource
omission rules, exporter wiring (Prometheus always-on, OTLP additive), and trace/log
correlation all match the spec's intent and are exercised by the 19 tests.
The most material problems are in the redaction seam — the one component the review brief
flags as security-critical. RedactionEnricher honours only replacement of scalar
properties: it silently ignores the redactor removing a key (a documented capability of
ILogRedactor), and it cannot see inside destructured/structured property values, so a
secret logged as a field of {@Object} is never scrubbed. Both let secrets reach sinks
despite a conforming redactor. Secondary themes: a spec drift around an undocumented
service.instance.id Resource attribute, two hand-maintained Resource-attribute builders
that can drift apart, and a stale doc-comment on MapZbMetrics. Tests are solid for the
happy paths but have no coverage for redactor removal or structured-value redaction.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ☑ | Redactor "remove" path is a no-op (Telemetry-001); structured values opaque to redactor (Telemetry-002). |
| 2 | Public API surface & compatibility | ☑ | Surface minimal, sealed, nullable-correct. ZbResource.InstanceId is an added public member not in the contract (Telemetry-004). |
| 3 | Concurrency & thread safety | ☑ | No issues found. Enrichers stateless; Lazy uses ExecutionAndPublication; Activity.Current is async-local. |
| 4 | Error handling & resilience | ☑ | Guard clauses present. new Uri(OtlpEndpoint) can throw late on malformed input (Telemetry-006). |
| 5 | Security & secret handling | ☑ | Redaction gaps (Telemetry-001/002) are security-relevant — secrets can survive a conforming redactor. |
| 6 | Performance & resource management | ☑ | Per-event dictionary snapshot when a redactor is registered (Telemetry-007); acceptable but noted. |
| 7 | Spec & shared-contract adherence | ☑ | Undocumented service.instance.id attribute (Telemetry-004); two Resource builders that can drift (Telemetry-005). |
| 8 | Packaging, dependencies & project layout | ☑ | No issues found. Serilog stack confined to .Serilog; central versions; correct net10.0; framework ref justified. |
| 9 | Testing coverage | ☑ | No tests for redactor removal or structured-value redaction (Telemetry-003). |
| 10 | Documentation & XML docs | ☑ | MapZbMetrics doc-comment is stale: claims "only valid when Exporter = Prometheus" (Telemetry-008). |
Findings
Telemetry-001 — RedactionEnricher ignores property removal, leaving secrets in the event
| Severity | High |
| Category | Security & secret handling |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67 |
Description
ILogRedactor.Redact is documented to let a project "remove or replace any sensitive
values" (ILogRedactor.cs:13 and the XML doc on the interface method: "remove or replace";
the shared contract repeats "Remove or replace any sensitive values"). RedactionEnricher
builds a snapshot dictionary, hands it to the redactor, then writes back via
AddOrUpdateProperty only those entries that remain in the snapshot and for which HasChanged
returns true:
```csharp
foreach (var entry in snapshot)
{
    if (HasChanged(logEvent, entry.Key, entry.Value))
        logEvent.AddOrUpdateProperty(propertyFactory.CreateProperty(entry.Key, entry.Value));
}
```
If a redactor removes a key from the dictionary (properties.Remove("apiKey")) — the most
natural way to implement "must not leave the process" — that key simply no longer appears in
the write-back loop, so the original property is never removed from logEvent. The
secret reaches every sink unredacted, even though the redactor did exactly what its contract
permits. This defeats the seam's stated operational guarantee ("secrets never leave the
process in log events") for any removal-style redactor.
Recommendation
After calling the redactor, reconcile deletions: for each property key present on the
original logEvent but absent from the returned snapshot, call
logEvent.RemovePropertyIfPresent(key). (Capture the original key set before mutation, then
diff.) Add a test asserting a removing redactor scrubs the property (see Telemetry-003).
Resolution
Resolved in 544a6dd, 2026-06-01 — RedactionEnricher now captures the original property
key set and calls RemovePropertyIfPresent for any key the redactor dropped from the snapshot,
so a removing redactor scrubs the property; covered by a new removing-redactor test.
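The reconciliation described in this finding could look roughly like the sketch below. It is an illustration only, not the code from 544a6dd: the `ILogRedactor` declaration is a hypothetical stand-in (the real interface signature is not shown in this review), the class name `RemovalAwareRedactionEnricher` is invented, and `HasChanged` is a simplified guess at the helper referenced above. It assumes the Serilog package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Serilog.Core;
using Serilog.Events;

// Hypothetical stand-in for the library's ILogRedactor; the real signature may differ.
public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

// Hypothetical name; illustrates key-set capture plus removal reconciliation.
public sealed class RemovalAwareRedactionEnricher : ILogEventEnricher
{
    private readonly ILogRedactor _redactor;

    public RemovalAwareRedactionEnricher(ILogRedactor redactor) =>
        _redactor = redactor ?? throw new ArgumentNullException(nameof(redactor));

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Capture the original key set before the redactor can mutate the snapshot.
        var originalKeys = logEvent.Properties.Keys.ToArray();

        var snapshot = new Dictionary<string, object?>(logEvent.Properties.Count);
        foreach (var property in logEvent.Properties)
            snapshot[property.Key] = property.Value is ScalarValue s ? s.Value : property.Value;

        _redactor.Redact(snapshot);

        // Deletions: a key the redactor dropped is removed from the event itself.
        foreach (var key in originalKeys)
        {
            if (!snapshot.ContainsKey(key))
                logEvent.RemovePropertyIfPresent(key);
        }

        // Replacements and additions: write back only entries that differ.
        foreach (var entry in snapshot)
        {
            if (!HasChanged(logEvent, entry.Key, entry.Value))
                continue;

            var property = entry.Value is LogEventPropertyValue raw
                ? new LogEventProperty(entry.Key, raw)
                : propertyFactory.CreateProperty(entry.Key, entry.Value);
            logEvent.AddOrUpdateProperty(property);
        }
    }

    // Simplified comparison: scalars by value, everything else by reference.
    private static bool HasChanged(LogEvent logEvent, string key, object? value)
    {
        if (!logEvent.Properties.TryGetValue(key, out var current))
            return true; // Key was added by the redactor.

        return current is ScalarValue scalar
            ? !Equals(scalar.Value, value)
            : !ReferenceEquals(current, value);
    }
}
```

With this shape, a redactor that calls `properties.Remove("apiKey")` leaves no `apiKey` property on the emitted event, while untouched properties are not rewritten.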
Telemetry-002 — Redactor cannot inspect or scrub destructured/structured property values
| Severity | Medium |
| Category | Security & secret handling |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-55 |
Description
The snapshot only unwraps ScalarValue; every other LogEventPropertyValue
(StructureValue from {@Object}, SequenceValue, DictionaryValue) is passed to the
redactor as the raw Serilog wrapper object:
```csharp
snapshot[property.Key] = property.Value is ScalarValue scalar ? scalar.Value : property.Value;
```
A project redactor written against the seam (IDictionary<string, object?> of "values")
therefore sees an opaque StructureValue for a destructured payload — it cannot read or
mask a secret field inside a logged object (e.g. logger.Information("{@Command}", cmd)
where cmd.ApiKey is sensitive). MxGateway's reference redactor specifically guards
"which command payloads must not leave the process" (per ILogRedactor XML doc and the
contract), which is precisely the destructured-object case. The seam silently cannot meet
that requirement; the redactor only works for top-level scalar properties.
Recommendation
Document the seam's actual reach (scalar top-level properties only) on ILogRedactor and in
the shared contract, and/or recursively project StructureValue/SequenceValue/
DictionaryValue into the snapshot and rebuild them on write-back so nested fields are
redactable. At minimum, make the limitation explicit so consumers do not assume nested
payloads are scrubbed when they are not.
Resolution
Resolved in 544a6dd, 2026-06-01 (documented the scalar-only limitation), then superseded by
05cc62a, 2026-06-01 — nested redaction implemented. RedactionEnricher now projects each
structured value into a mutable nested view the redactor descends into recursively
(StructureValue → IDictionary<string,object?>, SequenceValue → IList<object?>,
DictionaryValue → IDictionary<string,object?>), so a field nested inside a {@Object} can be
masked or removed. The Project/Rebuild round-trip preserves StructureValue.TypeTag and original
dictionary keys, and a structural ValueEquals skips write-back for properties the redactor left
untouched (no reallocation; scalar fast path retained). The earlier documented-limitation wording on
the ILogRedactor XML doc, shared contract, and README was replaced to document the recursive reach.
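To make the projection half of that round-trip concrete, here is a minimal sketch of how Serilog's structured values might be turned into plain mutable containers. It is not the implementation from 05cc62a: the class name `ProjectionSketch` is hypothetical, the rebuild step is omitted, and this version drops `StructureValue.TypeTag` and stringifies dictionary keys, both of which the resolution says the real code preserves. It assumes the Serilog package is referenced.

```csharp
using System.Collections.Generic;
using Serilog.Events;

// Hypothetical helper; shows only the Project direction of a Project/Rebuild pair.
public static class ProjectionSketch
{
    // Converts a Serilog property value into containers a redactor can walk and mutate.
    public static object? Project(LogEventPropertyValue value)
    {
        switch (value)
        {
            case ScalarValue scalar:
                return scalar.Value;

            case StructureValue structure:
            {
                // A destructured {@Object}: one entry per field, projected recursively.
                var fields = new Dictionary<string, object?>(structure.Properties.Count);
                foreach (var property in structure.Properties)
                    fields[property.Name] = Project(property.Value);
                return fields;
            }

            case SequenceValue sequence:
            {
                var items = new List<object?>(sequence.Elements.Count);
                foreach (var element in sequence.Elements)
                    items.Add(Project(element));
                return items;
            }

            case DictionaryValue dictionary:
            {
                // Simplification: keys are stringified here, so original key types are lost.
                var entries = new Dictionary<string, object?>(dictionary.Elements.Count);
                foreach (var pair in dictionary.Elements)
                    entries[pair.Key.Value?.ToString() ?? string.Empty] = Project(pair.Value);
                return entries;
            }

            default:
                return value; // Unknown subclass: hand it over untouched.
        }
    }
}
```

A redactor handed the result for `{@Command}` would see a `Dictionary<string, object?>` whose `ApiKey` entry it can overwrite or remove. A matching rebuild step would then have to turn the mutated view back into Serilog values before write-back.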
Telemetry-003 — No tests for redactor removal or structured-value redaction
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs:33-69 |
Description
RedactionTests covers exactly two redaction behaviours: a registered redactor replacing a
scalar value, and a no-op when none is registered. The FakeRedactor only ever reassigns
properties["apiKey"]. There is no test that a redactor which removes a key actually
scrubs it (the Telemetry-001 defect would have been caught), and no test that a redactor can
mask a field of a destructured/structured property (Telemetry-002). For a seam whose entire
purpose is secret containment, the most security-relevant behaviours are untested.
Recommendation
Add tests: (a) a redactor calling properties.Remove(key) results in the property being
absent from the emitted LogEvent; (b) a redactor attempting to mask a nested field of a
{@Object} payload, asserting the documented behaviour (whichever resolution Telemetry-002
takes). These should fail today and pin the fixes.
Resolution
Resolved in 544a6dd, 2026-06-01, then extended in 05cc62a — added
Removing_redactor_scrubs_the_property_from_the_event (red→green for Telemetry-001), a
Resource-attribute parity test, and (for the Telemetry-002 implementation) a nested-reach suite:
mask and remove a field inside a destructured {@Object}, mask a sequence element, mask a
dictionary value, mask a field two levels deep, and an untouched-structure-survives check. The
earlier Redactor_cannot_reach_a_field_inside_a_destructured_object limitation test was replaced.
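The nested-reach cases listed above imply a fixture redactor that descends into the projected view. One possible shape for that walk is sketched below in plain C#. The names `NestedMasking`, `MaskKeys`, and the `"***"` mask are hypothetical and not taken from the test suite; the sketch assumes nested values arrive as `IDictionary<string, object?>` and `IList<object?>`, as the Telemetry-002 resolution describes.

```csharp
using System.Collections.Generic;

// Hypothetical fixture helper: masks sensitive keys at any depth of a projected view.
public static class NestedMasking
{
    public const string Mask = "***";

    // Replaces the value of every key found in sensitiveKeys, recursing into children.
    public static void MaskKeys(IDictionary<string, object?> node, ISet<string> sensitiveKeys)
    {
        // Copy the keys so the dictionary is not modified while it is being enumerated.
        foreach (var key in new List<string>(node.Keys))
        {
            if (sensitiveKeys.Contains(key))
                node[key] = Mask;
            else
                Descend(node[key], sensitiveKeys);
        }
    }

    private static void Descend(object? value, ISet<string> sensitiveKeys)
    {
        switch (value)
        {
            case IDictionary<string, object?> child:
                MaskKeys(child, sensitiveKeys);
                break;
            case IList<object?> items:
                foreach (var item in items)
                    Descend(item, sensitiveKeys);
                break;
        }
    }
}
```

A test could build a two-level dictionary, call `MaskKeys` with `{ "ApiKey" }`, and assert that the inner `ApiKey` is masked while sibling fields are unchanged.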
Telemetry-004 — service.instance.id Resource attribute is undocumented in spec and contract
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:19-45 |
Description
ZbResource adds a service.instance.id attribute (deterministic MachineName:ProcessId)
to the Resource, and exposes it as a new public member ZbResource.InstanceId. The
normalized Resource attribute set is enumerated exhaustively in two authoritative docs —
SPEC.md §2 and METRIC-CONVENTIONS.md §4 — and neither lists service.instance.id;
the shared contract (ZB.MOM.WW.Telemetry.md) likewise documents ZbResource.Build as
populating only service.name/namespace/version/site.id/node.role/host.name and does not
mention an InstanceId member. The attribute itself is a reasonable, standards-aligned
improvement (and disabling the OTel SDK's random-GUID default is sensible for cross-signal
correlation), but it is a silent divergence: the spec/contract are now stale relative to the
code. Per REVIEW-PROCESS §2.7, both directions of drift must be flagged.
Recommendation
Add service.instance.id (with the MachineName:ProcessId rationale) to the Resource table
in SPEC.md §2 and METRIC-CONVENTIONS.md §4, and document the public ZbResource.InstanceId
member in the shared contract, so the normalized spec and the code agree.
Resolution
Resolved in 544a6dd, 2026-06-01 — kept the attribute (documented the
MachineName:ProcessId rationale) and added service.instance.id to the Resource tables in
SPEC.md §2 and METRIC-CONVENTIONS.md §4, plus the ZbResource.InstanceId member to the shared
contract; spec and code now agree.
Telemetry-005 — Two hand-maintained Resource-attribute builders can silently drift
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:38-64, ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogConfig.cs:125-151 |
Description
The Resource attached to metrics/traces is built by ZbResource.Configure (via the OTel
AddService + AddAttributes API), while the Resource attached to the OTLP log sink is
built independently by ZbSerilogConfig.BuildResourceAttributes (a hand-rolled
Dictionary<string, object>). The two currently agree, but they enumerate the same six/seven
attributes in two places with two different mechanisms, so a future change to one (a new
attribute, a renamed key, a changed omission rule) will silently desynchronize logs from
metrics/traces and break the cross-signal correlation on which the library's "unifying
hinge" depends. There is no test asserting parity between the two attribute sets.
Recommendation
Derive both from a single source of truth — e.g. have ZbResource expose the canonical
attribute map (already mostly the shape BuildResourceAttributes returns) and have the
Serilog sink consume it — or add a parity test that asserts the two attribute sets are
key-for-key identical for a representative options object.
Resolution
Resolved in 544a6dd, 2026-06-01 — introduced ZbResource.BuildAttributes as the single
source of truth; ZbResource.Configure (OTel SDK) and ZbSerilogConfig.BuildResourceAttributes
(OTLP log sink) now both derive from it, and a parity test asserts the two sets are identical.
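As an illustration of the single-source-of-truth idea, the sketch below builds one attribute map that both consumers could read. It is not the actual `ZbResource.BuildAttributes`: the options record, the class name, and the rule that optional attributes are omitted when null or blank are all assumptions. The attribute keys are the ones this review names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical options shape; the real ZbTelemetryOptions may differ.
public sealed record ResourceOptionsSketch(
    string ServiceName,
    string? ServiceNamespace = null,
    string? ServiceVersion = null,
    string? SiteId = null,
    string? NodeRole = null);

public static class ResourceAttributesSketch
{
    // One canonical map; the OTel Resource and the OTLP log sink would both derive from it.
    public static IReadOnlyDictionary<string, object> BuildAttributes(ResourceOptionsSketch options)
    {
        var attributes = new Dictionary<string, object>(StringComparer.Ordinal)
        {
            ["service.name"] = options.ServiceName,
            ["host.name"] = Environment.MachineName,
            // Deterministic MachineName:ProcessId, as described under Telemetry-004.
            ["service.instance.id"] = $"{Environment.MachineName}:{Environment.ProcessId}",
        };

        // Assumed omission rule: optional attributes are left out when null or blank.
        AddIfPresent(attributes, "service.namespace", options.ServiceNamespace);
        AddIfPresent(attributes, "service.version", options.ServiceVersion);
        AddIfPresent(attributes, "site.id", options.SiteId);
        AddIfPresent(attributes, "node.role", options.NodeRole);
        return attributes;
    }

    private static void AddIfPresent(Dictionary<string, object> target, string key, string? value)
    {
        if (!string.IsNullOrWhiteSpace(value))
            target[key] = value;
    }
}
```

With a single builder, a parity test reduces to checking that each consumer's attribute set equals this map for a representative options object.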
Telemetry-006 — Malformed OtlpEndpoint throws UriFormatException late, with no context
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs:127-135 |
Description
ConfigureOtlp does otlp.Endpoint = new Uri(options.OtlpEndpoint) with no validation. A
malformed endpoint string (typo, missing scheme) throws a raw UriFormatException deep
inside OTel exporter construction at host-build time, with no mention of which option was at
fault. BuildOptions already fails fast and clearly for a missing ServiceName, but does
not validate that OtlpEndpoint is a well-formed absolute URI when Exporter == Otlp (nor
that it is non-empty — an Otlp exporter with a null endpoint is silently registered and
points nowhere). The Serilog path (ZbSerilogConfig) has the same untyped string→endpoint
handoff.
Recommendation
In BuildOptions, when Exporter == ZbExporter.Otlp, validate OtlpEndpoint with
Uri.TryCreate(..., UriKind.Absolute, out _) and throw an ArgumentException naming the
option (consistent with the existing ServiceName guard) rather than letting a bare
UriFormatException escape later.
Resolution
Resolved in 544a6dd, 2026-06-01 — added ZbTelemetryOptionsValidator.Validate, called from
both BuildOptions and AddZbSerilog: when Exporter == Otlp it requires a non-empty,
well-formed absolute OtlpEndpoint and throws a named ArgumentException (no-op for Prometheus);
covered by three new tests.
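The fail-fast check described here can be sketched as follows. This is a simplified illustration rather than the real `ZbTelemetryOptionsValidator`: the enum and class names are hypothetical, and the exception messages are invented.

```csharp
using System;

// Hypothetical stand-in for ZbExporter.
public enum ExporterSketch { Prometheus, Otlp }

public static class OtlpEndpointValidationSketch
{
    // Throws a named ArgumentException early instead of a bare UriFormatException later.
    public static void Validate(ExporterSketch exporter, string? otlpEndpoint)
    {
        if (exporter != ExporterSketch.Otlp)
            return; // No endpoint is needed for Prometheus.

        if (string.IsNullOrWhiteSpace(otlpEndpoint))
            throw new ArgumentException(
                "OtlpEndpoint is required when Exporter is Otlp.", nameof(otlpEndpoint));

        if (!Uri.TryCreate(otlpEndpoint, UriKind.Absolute, out _))
            throw new ArgumentException(
                $"OtlpEndpoint '{otlpEndpoint}' is not a well-formed absolute URI.",
                nameof(otlpEndpoint));
    }
}
```

`Uri.TryCreate` with `UriKind.Absolute` is fairly permissive: a value such as `localhost:4317` may parse with `localhost` treated as the scheme. If that matters, a further check on `Uri.Scheme` could be layered on top.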
Telemetry-007 — Redaction snapshot allocates a dictionary on every log event
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67 |
Description
When an ILogRedactor is registered, Enrich allocates a new
Dictionary<string, object?>(logEvent.Properties.Count), copies every property into it, and
then iterates again to diff/write-back — on every single log event, across every logging
thread. Enrichers are on the hottest path in the library (they run for each event the level
filter admits). The early-return when no redactor is registered keeps the common case free,
so the cost is borne only by redaction-enabled consumers (MxGateway), but for a high-volume
gateway this is non-trivial steady-state allocation/GC pressure.
Recommendation
Consider redacting in place against logEvent.Properties without a full snapshot copy (e.g.
only materialize replacements for keys the redactor touches), or short-circuit when the event
has no properties. At minimum, document the per-event cost so consumers can weigh enabling
redaction on very hot loggers. Acceptable as-is given redaction is opt-in and security-first.
Resolution
Resolved in 544a6dd, 2026-06-01 — Enrich now short-circuits before any snapshot allocation
when the event has no properties (and still early-returns when no redactor is registered), so the
per-event dictionary copy is only paid when there is actually something to redact.
Telemetry-008 — MapZbMetrics XML doc claims it is "only valid when Exporter = Prometheus" — stale
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Resolved |
| Location | ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs:11-14 |
Description
The XML doc on MapZbMetrics states it is "Only valid when
ZbTelemetryOptions.Exporter = ZbExporter.Prometheus." That contradicts the actual wiring:
ApplyMetricsExporter (ZbTelemetryExtensions.cs:107-116) always calls
AddPrometheusExporter() regardless of the Exporter setting — OTLP is purely additive.
The library's own README ("Prometheus is always wired for metrics regardless of the
Exporter setting") and the test AddZbTelemetry_OtlpExporter_StillServesPrometheusEndpoint
both confirm /metrics works under Exporter = Otlp. The doc-comment therefore tells
consumers the opposite of the real (and intended) behaviour and could lead them to wrongly
believe MapZbMetrics is a no-op under OTLP. The same stale wording is mirrored in the
shared contract (ZB.MOM.WW.Telemetry.md, MapZbMetrics summary).
Recommendation
Update the doc-comment to state that the Prometheus exporter is always registered and
MapZbMetrics is valid under any Exporter value (Prometheus is always-on; OTLP is an
overlay). Align the shared-contract summary for MapZbMetrics to match.
Resolution
Resolved in 544a6dd, 2026-06-01 — rewrote the MapZbMetrics XML doc to state it is valid
under any Exporter value (Prometheus always-on; OTLP additive overlay) and aligned the matching
shared-contract summary.