From 3d982d9a6597cb5dc8bea850c049a5189167dac1 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sat, 23 May 2026 18:57:04 -0400
Subject: [PATCH] docs: sync against recent code changes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating Windows
   services (Server, Admin)". The legacy x86 TopShelf Galaxy.Host
   process was retired in PR 7.2 (2026-04-30); Galaxy access now flows
   through the in-process Tier-A GalaxyDriver talking gRPC to the
   sibling mxaccessgw gateway. Also called out decision #30
   (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a regular
     C# Func<> delegate, so the leak is now "synchronous CPU-bound work
     on a pool thread" (same operator-visible effect, different
     mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"): Annotated both the decision body and the decision-log
   table row with "Reversed PR 7.2, 2026-04-30" + a one-line summary
   of the replacement architecture. The original reasoning is
   preserved as audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the Core.Scripting-008 /
   -016 supersession of the original CSharpScript pipeline. The
   historical record stays; the note points future readers at
   docs/VirtualTags.md "Compile cache" for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration: Updated the .NET regeneration instructions to point
   at the new ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out so
   a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner", the
wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/VirtualTags.md                              |  4 ++--
 docs/plans/alarms-over-gateway.md                | 16 +++++++++++++---
 docs/reqs/HighLevelReqs.md                       |  7 +++----
 .../phase-7-scripting-and-alarming.md            |  2 +-
 docs/v2/plan.md                                  |  4 ++--
 5 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/docs/VirtualTags.md b/docs/VirtualTags.md
index 9ab1827..5c68043 100644
--- a/docs/VirtualTags.md
+++ b/docs/VirtualTags.md
@@ -6,7 +6,7 @@ The runtime is split across two projects: `Core.Scripting` holds the Roslyn sand

 ## Roslyn script sandbox (`Core.Scripting`)

-User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against a `ScriptContext` subclass. `ScriptGlobals` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
+User scripts are compiled via `Microsoft.CodeAnalysis.CSharp` (regular compiler, not the scripting variant — the original `CSharpScript` pipeline was retired by the Core.Scripting-008 / -016 rewrite, see "Compile cache" below). Each script's source is pasted as the body of a synthesized `CompiledScript.Run(ScriptGlobals)` method against a `ScriptContext` subclass. `ScriptGlobals` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.

 ### Compile pipeline (`ScriptEvaluator`)

@@ -36,7 +36,7 @@ Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), whic

 ### Per-evaluation timeout (`TimedScriptEvaluator`)

-Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying `ScriptRunner` keeps running on its thread-pool thread until the Roslyn runtime returns (documented trade-off; out-of-process evaluation is a v3 concern).
+Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying compiled-script delegate keeps running on its `Task.Run` thread-pool thread until it returns of its own accord (the CT is checked only at evaluator entry; once the script body is running, only the script returning or throwing will release the thread). The post-rewrite delegate is a regular C# `Func<>` bound to the synthesized `CompiledScript.Run` method, so this is a vanilla "synchronous CPU-bound work on a pool thread" leak rather than anything Roslyn-specific. Documented trade-off; out-of-process evaluation is a v3 concern.

 ### Script logger plumbing

diff --git a/docs/plans/alarms-over-gateway.md b/docs/plans/alarms-over-gateway.md
index 3f96dae..fd274c3 100644
--- a/docs/plans/alarms-over-gateway.md
+++ b/docs/plans/alarms-over-gateway.md
@@ -583,10 +583,20 @@ language binding.

 **Depends on:** A.1 merged (proto change live).

-**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
+**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\src\` for .NET — note the sibling
+repo restructured after this plan was written; `clients/dotnet/MxGateway.Client.csproj`
+no longer exists, the proto contracts now live in
+`src/ZB.MOM.WW.MxGateway.Contracts/` under the new namespace
+`ZB.MOM.WW.MxGateway.Contracts.Proto[.Galaxy]`; the OtOpcUa driver currently
+consumes vendored binaries from the pre-restructure build — see
+`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/README.md`):

-1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
-   rebuild `MxGateway.Client.csproj` after pulling A.1.
+1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; rebuild
+   `src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
+   after pulling A.1. (If unwinding the driver's vendored binaries onto the
+   new contracts namespace as part of the alarm work, namespace-rename + a
+   reimplementation of the missing `MxGatewayClient` / `MxGatewaySession`
+   wrappers is also in scope.)
 2. **Python** — run `clients\python\generate-proto.ps1`; commit the
    regenerated `_pb2.py` + `_pb2_grpc.py` files under
    `clients\python\src\`.
diff --git a/docs/reqs/HighLevelReqs.md b/docs/reqs/HighLevelReqs.md
index 6771ed8..d7b9b02 100644
--- a/docs/reqs/HighLevelReqs.md
+++ b/docs/reqs/HighLevelReqs.md
@@ -1,6 +1,6 @@
 # High-Level Requirements

-> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as three cooperating processes (Server, Admin, Galaxy.Host). The Galaxy integration is one of seven shipped drivers. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 are new and cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
+> **Revision** — Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of seven shipped drivers and is now an in-process Tier-A driver that talks gRPC to a separately installed `mxaccessgw` gateway (sibling repo) — PR 7.2 (2026-04-30) retired the legacy out-of-process `Galaxy.Host` Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.

 ## HLR-001: OPC UA Server

@@ -28,11 +28,10 @@ Drivers whose backend has a native change signal (e.g. Galaxy's `time_of_last_de

 ## HLR-007: Service Hosting

-The system shall be deployed as three cooperating Windows services:
+The system shall be deployed as two cooperating Windows services (the legacy `OtOpcUa.Galaxy.Host` x86 host was retired in PR 7.2 — Galaxy access now flows through the separately installed `mxaccessgw` gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment):

-- **OtOpcUa.Server** — .NET 10 x64, `Microsoft.Extensions.Hosting` + `AddWindowsService`, hosts all non-Galaxy drivers in-process and the OPC UA endpoint.
+- **OtOpcUa.Server** — .NET 10 AnyCPU, `Microsoft.Extensions.Hosting` + `AddWindowsService` (decision #30 replaced the original TopShelf choice), hosts every driver in-process — including the new Tier-A `GalaxyDriver` that speaks gRPC to `mxaccessgw` — and the OPC UA endpoint.
 - **OtOpcUa.Admin** — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, `/metrics` Prometheus endpoint, and audit log writers.
-- **OtOpcUa.Galaxy.Host** — .NET Framework 4.8 x86 (TopShelf), hosts MXAccess COM + Galaxy Repository SQL + Historian plugin. Talks to `Driver.Galaxy.Proxy` inside `OtOpcUa.Server` via a named pipe (MessagePack over length-prefixed frames, per-process shared secret, SID-restricted ACL).

 ## HLR-008: Logging

diff --git a/docs/v2/implementation/phase-7-scripting-and-alarming.md b/docs/v2/implementation/phase-7-scripting-and-alarming.md
index c4e6603..ae821c7 100644
--- a/docs/v2/implementation/phase-7-scripting-and-alarming.md
+++ b/docs/v2/implementation/phase-7-scripting-and-alarming.md
@@ -89,7 +89,7 @@ Tie-in capability — **historian alarm sink**:

 ### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**

-1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
+1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers. _(Implementation note 2026-05-23 — superseded by Core.Scripting-008 / -016: the `CSharpScript`/`ScriptRunner` path was replaced with a hand-rolled `CSharpCompilation.Create` → `Emit(MemoryStream)` → collectible `ScriptAssemblyLoadContext.LoadFromStream` pipeline so per-publish ALC accretion is reclaimable, and engines route compiles through `CompiledScriptCache` rather than calling `ScriptEvaluator.Compile` directly. The reference list was correspondingly widened from the narrow allow-list above to the full BCL `TRUSTED_PLATFORM_ASSEMBLIES` set (filtered to `System.*` + `netstandard` + `Microsoft.Win32.Registry`) because the new pipeline can't compile against the old narrow set; `ForbiddenTypeAnalyzer` is now the sole security gate, consistent with how Core.Scripting-001 / -002 established the analyzer must be the real boundary because type forwarding makes any references-list-only restriction porous. See `docs/VirtualTags.md` "Compile cache" for the current implementation contract.)_
 2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet Inputs` + `IReadOnlySet Outputs`.
 3. **A.3** Compile cache. `(source_hash) → compiled Script`. Recompile only when source changes. Warm on `SealedBootstrap`.
 4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
diff --git a/docs/v2/plan.md b/docs/v2/plan.md
index d279d5a..e0e35f0 100644
--- a/docs/v2/plan.md
+++ b/docs/v2/plan.md
@@ -193,7 +193,7 @@ ConfigurationService
 - Compact binary format, faster than JSON, good fit for high-frequency data change callbacks
 - Simpler than gRPC on .NET 4.8 (which needs legacy `Grpc.Core` native library)

-**Decided: Galaxy Host is a separate Windows service.**
+**Decided: Galaxy Host is a separate Windows service.** _(Reversed by PR 7.2, 2026-04-30 — see PR 7.2's commit `ae7106d` and the project_galaxy_via_mxgateway memory entry. The legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the `OtOpcUaGalaxyHost` Windows service were retired; Galaxy access now flows through the in-process Tier-A `GalaxyDriver` talking gRPC to a separately installed `mxaccessgw` gateway sibling repo. The reasoning below was correct for the original LMX/x86-COM architecture; the gateway sibling repo now owns those constraints externally.)_
 - Independent lifecycle from the OtOpcUa Server
 - Can be restarted without affecting the main server or other drivers
 - Galaxy.Proxy detects connection loss, sets Bad quality on Galaxy nodes, reconnects when Host comes back
@@ -801,7 +801,7 @@ aggregate runner (#253); server-side factory + seed SQL per driver (#210–#213)
 | 26 | Admin deploys on same server (co-hosted) | Simplifies deployment; can also run on separate management host | 2026-04-16 |
 | 27 | Admin scaffold early, driver-specific screens deferred | Core CRUD for instances/drivers first; per-driver config UI added with each driver | 2026-04-16 |
 | 28 | Named pipes for Galaxy IPC | Fast, no port conflicts, native to both .NET 4.8 and .NET 10 | 2026-04-16 |
-| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 |
+| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 (**reversed PR 7.2, 2026-04-30** — Galaxy is now an in-process Tier-A driver talking gRPC to the sibling `mxaccessgw` gateway; see the decision body above) |
 | 30 | Drop TopShelf, use Microsoft.Extensions.Hosting | Built-in Windows Service support in .NET 10, no third-party dependency | 2026-04-16 |
 | 31 | Mono-repo for all drivers | Simpler dependency management, single CI pipeline, shared abstractions | 2026-04-16 |
 | 32 | MessagePack serialization for Galaxy IPC | Binary, fast, works on .NET 4.8+ and .NET 10 via MessagePack-CSharp NuGet | 2026-04-16 |