2697af31d19f1f27c825f6c8bb983b2046341422
12 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2697af31d1 |
feat(opcua,host): #81 ServiceLevel SDK publisher
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's ServerObjectState — the standard OPC UA non-transparent-redundancy signal clients use to pick a primary. Writes are guarded by DiagnosticsLock so concurrent SDK diagnostics scans don't fight with our updates. DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late- binding pattern: Akka actors resolve IServiceLevelPublisher at construction, hosted service swaps the SDK publisher in after StandardServer.Start. Host Program.cs registers DeferredServiceLevelPublisher as the singleton bound to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and fills it once IServerInternal is available. Tests boot a real StandardServer on a free port (cross-platform), call Publish, then verify ServerObject.ServiceLevel.Value reflects the write. 5 new tests; OpcUaServer suite now 45/45 green (was 40, +5). Closes #81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel tests). |
||
|
|
52997ee164 |
feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter + ActivitySource so all instrumentation points emit through a single named surface. Counters cover the hot paths: otopcua.deploy.applied (outcome=ack|reject) otopcua.deploy.apply.duration (s, histogram) otopcua.driver.lifecycle (event=spawn|spawn_stub|stop|fault) otopcua.virtualtag.eval (outcome=ok|fail|skip) otopcua.scriptedalarm.transition (state=activated|acknowledged|cleared) otopcua.opcua.sink.write (kind=value|alarm|rebuild) otopcua.redundancy.service_level_change (level=byte) Plus two ActivitySource spans: otopcua.deploy.apply wraps DriverHostActor.ApplyAndAck otopcua.opcua.address_space_rebuild wraps OpcUaPublishActor.HandleRebuild Instruments are no-op until a listener attaches, so tests + dev hosts pay nothing for unread telemetry. Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter + ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side internals + ASP.NET request metrics deliberately stay off — the scrape payload is scoped to OtOpcUa signals only. Tests use MeterListener + ActivityListener to verify VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and RebuildAddressSpace actually emit on the central instruments. Runtime suite is 72 / 72 green (+3). Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS folder hierarchy in SDK, or F8b/F9b production engine bindings. |
||
|
|
50787823d3 |
feat(host,runtime): #108 Host DI bindings — OPC UA server + deferred sink
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.
DeferredAddressSpaceSink (Commons.OpcUa):
- Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
inner sink swapped in at runtime. Needed because Akka actors
resolve the sink at construction time, but the production sink
(SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
after the SDK StandardServer has started.
- Defaults to NullOpcUaAddressSpaceSink so calls before swap are
safe; SetSink(null) reverts (for graceful shutdown).
OtOpcUaServerHostedService (Host.OpcUa):
- IHostedService that owns the OPC UA SDK lifecycle. Reads
OpcUaApplicationHostOptions from the 'OpcUa' config section,
creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
singleton.
- SDK boot failure is logged + non-fatal — the rest of the host
(admin UI, driver actors) keeps running. Stop reverts to null sink.
WithOtOpcUaRuntimeActors (Runtime):
- Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
into DriverHostActor's Props so successful applies trigger the
address-space rebuild pipeline.
- Phase7Applier is constructed here from the resolved sink + a
logger; OpcUaPublishActor takes both.
- Prepends the opcua-synchronized-dispatcher HOCON so the extension
is self-contained — consumers (Host, tests) don't need to redeclare
the dispatcher block.
- New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
resolution.
- AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
+ NullServiceLevelPublisher so admin-only nodes (or tests that
don't bind the Deferred sink) stay safe.
Host.Program.cs (driver-role only):
- Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
- AddHostedService<OtOpcUaServerHostedService>()
Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).
All 6 v2 test suites green: 177 tests passing.
Closes #108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
|
||
|
|
f427dc4f26 |
feat(runtime): #112 ScriptedAlarmActor state persistence via IAlarmActorStateStore
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
ScriptedAlarmActor now survives actor restart: PreStart loads from the configured store + restores in-memory state; every Transition() fires a fire-and-forget save. ActiveState still re-derives from the evaluator on first tick (Phase 7 decision #14), but Acked state + lastAckUser persist verbatim so operators don't re-ack across an outage. Three pieces: - IAlarmActorStateStore seam in Commons.Engines, with the AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc / lastAckUser) and NullAlarmActorStateStore default. - EfAlarmActorStateStore in Runtime.ScriptedAlarms — production adapter over the existing ScriptedAlarmState table in ConfigDb. Maps the actor's 3-state enum to the table's AckedState column (Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒ Acknowledged). Concurrency conflicts are logged + dropped — the next transition writes again. - ScriptedAlarmActor PreStart load (async, piped back as StateRestored) + Transition save. New Props overload takes the store; default is NullAlarmActorStateStore so tests stay quiet. Tests: Runtime 52 -> 57 (+5): - Transition writes Active then Acknowledged snapshots with lastAckUser populated - PreStart with persisted Active state restores so a subsequent AcknowledgeAlarm fires (not ignored as it would be from Inactive) - Empty store boots Inactive (AcknowledgeAlarm correctly ignored) - EfAlarmActorStateStore Save + Load round-trips via in-memory EF - Load for unknown alarmId returns null All 6 v2 test suites green: 157 tests passing. Closes #112. F9 (#80) remaining residual is predicate binding to Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON. |
||
|
|
a1325299ce |
feat(runtime): F10 OpcUaPublishActor sink seams + redundancy-driven ServiceLevel
OpcUaPublishActor now routes through pluggable seams instead of just incrementing a counter: - IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState / RebuildAddressSpace. OpcUaQuality enum moved here from the actor's nested type so producers don't have to reference the actor itself. - IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher retains the last level for inspection. - The actor subscribes to the redundancy-state DPS topic in PreStart and maps the local node's NodeRedundancyState to a coarse ServiceLevel (Primary+leader=240, Primary=200, Secondary=100, Detached=0). This keeps the local SDK's ServiceLevel node honest without round-tripping back through the admin-singleton calculator. - ServiceLevelChanged dedupes identical levels so the SDK doesn't see redundant writes. - Sink + publisher exceptions are caught and logged; the actor never crashes its own dispatcher. - PropsForTests gets optional sink/publisher/localNode params and skips the DPS subscribe so unit tests stay on a vanilla TestKit cluster. Production binding to a real SDK NodeManager + Variable nodes is the remaining residual — split as F10b. Task 60 still blocked on F10b. Tests: Runtime 40 -> 46 (+6): - AttributeValueUpdate routes to sink - AlarmStateUpdate routes to sink - RebuildAddressSpace calls sink.Rebuild - ServiceLevelChanged dedupes - RedundancyStateChanged for primary-leader publishes 240 - RedundancyStateChanged for secondary publishes 100 All 6 v2 test suites green: 132 tests passing. |
||
|
|
14fb2b05ed |
feat(runtime): F8/F9 engine evaluator seams + DPS fan-out
VirtualTagActor and ScriptedAlarmActor now route through pluggable evaluator interfaces and fan out to the cluster's live-tail topics shipped in F15.3: - IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines. VirtualTagActor calls evaluator on every DependencyValueChanged, dedupes unchanged values, forwards EvaluationResult to its parent, and publishes ScriptLogEntry Warning to the script-logs DPS topic whenever the evaluator fails. - IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor takes an AlarmConfig (id/name/equipment-path/severity/predicate) and publishes both an AlarmTransitionEvent (alerts topic) and a ScriptLogEntry (script-logs topic) at every transition. Manual ConditionMet/Acknowledge/Cleared still flow through the same Transition() so callers without engine bindings still drive the state machine; the legacy single-string Props() overload routes through a default AlarmConfig. The Null* defaults keep the actors safe when no engine is bound — unconfigured nodes never spuriously alarm. Production binding to Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the remaining residual (F8b/F9b — split in tasks JSON). Tests: Runtime 34 -> 40 (+6): - VirtualTagActorTests x3 (evaluator drives EvaluationResult, unchanged-value dedup, failure publishes Warning ScriptLogEntry) - ScriptedAlarmActorTests x3 (engine threshold drives Activated + Cleared on alerts topic, manual Acknowledge attribution). All 6 v2 test suites green: 126 tests passing. |
||
|
|
59858129cb |
feat(adminui): F15.3 closes F15 — live alerts/script-log, CSV import, Monaco editor
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Final F15 batch wires up the SignalR-backed live pages, ports the bulk
equipment importer, and progressively enhances the Script source editor
with Monaco.
Message contracts:
- Commons.Messages.Alerts.AlarmTransitionEvent — fires on every alarm
state transition; published on the `alerts` DPS topic by future
ScriptedAlarmActor (F9) emits.
- Commons.Messages.Logging.ScriptLogEntry — one log line emitted by a
hosted script; published on the `script-logs` DPS topic by future
VirtualTagActor (F8) + ScriptedAlarmActor (F9) emits.
(Folder named "Logging" to dodge .gitignore's "logs/" rule.)
SignalR plumbing:
- AlertHub gains MethodName + bridge actor (AlertSignalRBridge)
- ScriptLogHub introduced; ScriptLogSignalRBridge follows the same
DPS-subscribe → IHubContext fan-out pattern as FleetStatusSignalRBridge
- WithOtOpcUaSignalRBridges now spawns all three bridges
- MapOtOpcUaHubs maps /hubs/script-log alongside the existing hubs
Pages:
- /alerts live alarm tail, 200-row capacity
- /script-log live script-log tail with level + script
filter, 500-row capacity
- /clusters/{id}/equipment/import — CSV bulk Equipment add with preview
(Name/MachineCode/UnsLineId/Driver +
optional ZTag/SAPID/Manufacturer/Model;
skips rows whose MachineCode already
exists in the fleet)
- ScriptEdit progressively enhanced with Monaco editor via JSInterop —
the textarea remains Blazor's source of truth and Monaco syncs into it
on every keystroke so @bind keeps working; falls back gracefully if
the CDN is unreachable.
MainLayout nav gains a "Live" section (Deployments, Alerts, Alarms
historian) and a "Scripts" link under Scripting. ClusterEquipment
surfaces the new Import CSV button.
Tally: F15 ships ~42 razor pages + 3 SignalR hubs + 3 bridge actors.
Microsoft.AspNetCore.SignalR.Client added (was already in central PM).
All 104 v2 tests remain green.
|
||
|
|
8f32b89fb9 |
feat(adminui): FleetDiagnosticsClient real Akka ActorSelection round-trip (F17)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
+ the local NodeId. Drivers list is empty until F7 wires the per-instance
children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
On timeout/peer-down it returns an empty snapshot so the UI degrades
gracefully rather than throwing.
Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
end-to-end path: AdminOps starts a deployment, both DriverHostActors
apply, then diagnostics reports the new revision on both nodes.
All 98 v2 tests pass (was 96 + 2 new).
|
||
|
|
136234e7f2 | feat(commons): add cluster/admin/diagnostics client interfaces | ||
|
|
5d3a5a40d7 | feat(commons): add deploy/admin/audit/redundancy/fleet message contracts | ||
|
|
fee4a8c008 | feat(commons): add correlation/execution/node/deployment/revisionhash types | ||
|
|
30a2104fa5 |
feat(scaffold): introduce 8 v2 component projects
Adds the empty project skeletons that subsequent v2 tasks fill in: src/Core/ZB.MOM.WW.OtOpcUa.Commons (types, interfaces, message contracts) src/Core/ZB.MOM.WW.OtOpcUa.Cluster (Akka.Hosting + cluster wiring) src/Server/ZB.MOM.WW.OtOpcUa.Security (cookie+JWT auth, LDAP) src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons) src/Server/ZB.MOM.WW.OtOpcUa.Runtime (per-node driver actors) src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host) src/Server/ZB.MOM.WW.OtOpcUa.AdminUI (Razor class library) src/Server/ZB.MOM.WW.OtOpcUa.Host (single fused web binary) Each project sets TreatWarningsAsErrors=true in its own csproj (per the Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress entries cover transitive vulnerability advisories the new strictness surfaces: - GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools) - GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer) - GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted) OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126 via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same constraint as the legacy Server project). Runtime does NOT project-reference any concrete Driver.* assemblies; drivers load reflectively at runtime (Phase 6). Runtime gets the IDriver contract through Core.Abstractions instead. Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the Windows OS so the project builds on macOS dev machines. Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051 in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green smoke check, no separate commit). |