fdb4ac70518b500dce61d457f3951b0a984da928
914 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
fdb4ac7051 | docs(plans): mark Tasks 15-18 complete in tasks.json | ||
|
|
136234e7f2 | feat(commons): add cluster/admin/diagnostics client interfaces | ||
|
|
5d3a5a40d7 | feat(commons): add deploy/admin/audit/redundancy/fleet message contracts | ||
|
|
fee4a8c008 | feat(commons): add correlation/execution/node/deployment/revisionhash types | ||
|
|
c168c1c9c6 | feat(migration): add Migrate-To-V2.ps1 idempotent migration runner | ||
|
|
605dbf3dcc |
feat(configdb): V2HostingAlignment migration consolidating Phase 1a-1e
Phase 1f — the consolidator migration. Closes out the v2 entity-model
rewrite by emitting a single EF migration that captures the cumulative
schema delta from 14a (RowVersion) through 14e (drop generation entities).
Generated: src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/
20260526081556_V2HostingAlignment.cs (1562 lines)
20260526081556_V2HostingAlignment.Designer.cs
Migration shape (per `grep -nE migrationBuilder.\(...)`):
Drop 12 ForeignKey constraints (one per live-edit entity's GenerationId FK)
Drop 2 Tables (ConfigGeneration, ClusterNodeGenerationState)
Drop 45 Indexes (every UX_*_Generation_* and IX_*_Generation_* across the
13 live-edit tables — 1 also dropped the unique-Primary
filtered index UX_ClusterNode_Primary_Per_Cluster)
Drop 13 Columns (12 GenerationId + 1 RedundancyRole)
Add 12 RowVersion columns (one per live-edit entity)
Create 4 Tables (Deployment, NodeDeploymentState, ConfigEdit,
DataProtectionKeys)
Create ~45 Indexes (recreated under the new naming pattern
UX_<Table>_LogicalId / UX_<Table>_<X> with the
GenerationId column stripped from composite keys)
Notable EF quirks accepted:
Unique-on-required-column indexes (UX_VirtualTag_LogicalId etc.) ship a
`filter: "[VirtualTagId] IS NOT NULL"` clause that EF auto-inserts for
SQL Server. Harmless — the column is C#-side `required` so NULL never
appears.
Verification:
dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
dotnet ef migrations script --idempotent (against placeholder DSN)
-> 3259-line
.sql produced
OK
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
Live `dotnet ef database update` against a scratch SQL Server deferred to
Task 15 (Migrate-To-V2.ps1) — SSH to the docker host needs a key/password I
don't have, and the always-on SQL at 10.100.0.35,14330 uses Integrated
Security (Windows auth, unreachable from this macOS dev). The migration
itself is structurally correct by construction (EF tooling generated it
against the live DbContext model); the live-DB confidence step is the
PowerShell wrapper's job.
SchemaComplianceTests updates:
- All_expected_tables_exist: removed ConfigGeneration +
ClusterNodeGenerationState; added Deployment, NodeDeploymentState,
ConfigEdit, DataProtectionKeys.
- Filtered_unique_indexes_match_schema_spec: removed entries for
UX_ClusterNode_Primary_Per_Cluster (Task 14d) and
UX_ConfigGeneration_Draft_Per_Cluster (Task 14e). Two filtered uniques
remain (UX_ClusterNodeCredential_Value, UX_ExternalIdReservation_KindValue_Active).
- Check_constraints_match_schema_spec: added CK_ConfigEdit_FieldsJson_IsJson.
StoredProceduresTests update:
- Removed RedundancyRole + 'Primary' from the raw INSERT into ClusterNode
so the DB-backed test runs against the new schema.
|
||
|
|
e00f46d723 |
refactor(configdb): delete ConfigGeneration + ClusterNodeGenerationState
Phase 1e of the v2 entity-model rewrite. With the FKs gone (Task 14b) and
the apply pipeline replaced (Task 14c), the v1 draft/publish entities have
no remaining v2 consumers.
Deleted entity classes:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigGeneration.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNodeGenerationState.cs
Deleted enum classes (no v2 consumers):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/GenerationStatus.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodeApplyStatus.cs
OtOpcUaConfigDbContext changes:
- Removed DbSet<ConfigGeneration> ConfigGenerations
- Removed DbSet<ClusterNodeGenerationState> ClusterNodeGenerationStates
- Removed ConfigureConfigGeneration(modelBuilder) call + method body
- Removed ConfigureClusterNodeGenerationState(modelBuilder) call + body
- Tidied the "v2 deploy-model tables" header comment
Navigation property cleanup:
- ServerCluster.Generations collection -> removed
- ClusterNode.GenerationState navigation -> removed
doc-comment cref cleanup (replaced <see cref="X"/> with <c>X</c> for the
deleted types so the C# XML comment compiler doesn't fail with CS1574):
- Deployment.cs (cref to ConfigGeneration)
- NodeDeploymentState.cs (cref to ClusterNodeGenerationState)
- Core/OpcUa/EquipmentNodeWalker.cs (cref to ConfigGeneration in the
EquipmentNamespaceContent record's doc-comment; while there, removed
"All four collections are scoped to the same ConfigGeneration" since
that's no longer true in v2)
Verification:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
src/Core/ZB.MOM.WW.OtOpcUa.Core -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests -> 0 errors
whole solution -> 15 errors
(all in Server/Admin; transitive Server.Tests/Admin.Tests skip per the
parent's failure, so the per-project count dropped vs Task 14d's 71)
|
||
|
|
3c915e652e |
refactor(configdb): drop ClusterNode.RedundancyRole (replaced by Akka leader)
Phase 1d of the v2 entity-model rewrite. The static RedundancyRole column
is replaced by Akka cluster's role-leader-of-"driver" election at runtime
(see RedundancyStateActor + ServiceLevelCalculator in Task 35).
Changes:
- Removed `public required RedundancyRole RedundancyRole` from
ClusterNode entity.
- Removed `e.Property(x => x.RedundancyRole).HasConversion<string>()...`
mapping from OtOpcUaConfigDbContext.ConfigureClusterNode.
- Removed the `UX_ClusterNode_Primary_Per_Cluster` filtered unique index
(filter referenced [RedundancyRole]='Primary').
- Dropped `using ZB.MOM.WW.OtOpcUa.Configuration.Enums` from ClusterNode.cs
(no longer needed).
- Deleted `Enums/RedundancyRole.cs` — the enum is unused in v2-kept code.
- DraftValidator: dropped the "exactly one Primary per cluster"
validation block. Comment in place explaining v2 picks primary at
runtime via Akka.
- DraftValidatorTests: dropped ValidateClusterTopology_flags_multiple_Primary
test; reworked BuildNode helper to no longer take a `role` argument.
Untouched (Server + Admin still reference RedundancyRole; accepted broken
per Task 56 policy):
src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{ClusterTopologyLoader,
RedundancyStatePublisher, RedundancyTopology, ServiceLevelCalculator}.cs
src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs
DB-runtime tests will fail against the new schema (Task 14f's migration
drops the column) — to be updated in Task 14f's SchemaComplianceTests
update:
- SchemaComplianceTests.cs:55 (expected filtered index list)
- StoredProceduresTests.cs:263 (raw INSERT names the column)
Verification:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
whole solution -> 71 errors
(70 from Task 14b in Server/Admin, +1 new Server/Redundancy reference)
|
||
|
|
1ddf8bb50e |
refactor(configdb): delete v1 Apply pipeline (replaced by AdminOperationsActor)
Phase 1c of the v2 entity-model rewrite. Deletes the draft/publish lifecycle
machinery that v2 replaces with AdminOperationsActor + ConfigComposer +
DriverInstanceActor.ApplyDelta.
Deleted (6 files):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/
IGenerationApplier.cs — interface for the apply pipeline
GenerationApplier.cs — the v1 applier coordinating per-driver hook-back
GenerationDiff.cs — typed wrapper over the sp_ComputeGenerationDiff
SQL output
ApplyCallbacks.cs — per-driver hook surface invoked by the applier
ChangeKind.cs — enum {Added, Modified, Removed, Unchanged}
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationApplierTests.cs
The empty Apply/ directory is removed.
Kept (repurposed in Task 39 for stale-config fallback):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationSealedCacheTests.cs
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs
Naming rename (GenerationSealedCache -> DeploymentArtifactCache) deferred
to Task 39 (DriverHostActor stale-config fallback) where the consumer is
written. The type stays available under its v1 name until then.
IDriver.cs doc-comment: replaced the "Used by IGenerationApplier..." sentence
with "Invoked by the v2 DriverInstanceActor when ApplyDelta reports that only
this driver's config changed in the new deployment."
Server/Admin breakage from Task 14b unchanged (70 errors). Configuration +
Core.Tests + Configuration.Tests stay green.
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
whole solution -> 70 errors (all in Server/Admin)
|
||
|
|
13d3aeab09 |
refactor(configdb): drop GenerationId FK from live-edit entities
Phase 1b of the v2 entity-model rewrite. The design's live-edit model means
the 12 v2 live-edit entities no longer carry a generation scope — they're
edited directly via AdminOperationsActor, with RowVersion (added in Task 14a)
providing last-write-wins detection.
Entity changes (12 files):
Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm
- Removed: public long GenerationId
- Removed: public ConfigGeneration? Generation (navigation)
DbContext changes (OtOpcUaConfigDbContext.cs):
- Removed 12 HasOne(x => x.Generation).WithMany().HasForeignKey... mappings
- Rewrote ~36 indexes: dropped the GenerationId column from each composite
key, renamed UX_<Table>_Generation_<X> -> UX_<Table>_<X> and
IX_<Table>_Generation_<X> -> IX_<Table>_<X>. Logical IDs become globally
unique (UX_<Table>_LogicalId on the LogicalId column alone).
- Removed Namespace's redundant UX_Namespace_Generation_LogicalId_Cluster
index (subsumed by the new UX_Namespace_LogicalId).
Core.Tests fixtures (4 files):
Removed "GenerationId = 1," lines from:
- PermissionTrieBuilderTests.cs (NodeAcl Row factory)
- PermissionTrieTests.cs (NodeAcl Row factory)
- TriePermissionEvaluatorTests.cs (NodeAcl Row factory + 2 gen{1,5}Row
mutations that test stale-generation evaluation; the trie itself still
carries a generation tag via PermissionTrie.GenerationId, fed in via
PermissionTrieBuilder.Build's generationId parameter, so the tests
still exercise the production code path)
- EquipmentNodeWalkerTests.cs (Area/Line/Eq/Tag/VirtualTag/ScriptedAlarm
builders)
Expected breakage (accepted per Task 56 policy):
src/Server/ZB.MOM.WW.OtOpcUa.Server ~25 errors (DriverInstanceBootstrapper,
AuthorizationBootstrap,
EquipmentNamespaceContentLoader,
Phase7Composer, ...)
src/Server/ZB.MOM.WW.OtOpcUa.Admin ~45 errors (VirtualTags.razor,
ScriptedAlarms.razor,
DriverInstanceService,
EquipmentService,
EquipmentImportBatchService,
UnsService,
FocasDriverDetailService,
...)
Server.Tests, Admin.Tests, Admin.E2ETests also break transitively (they
project-reference Server/Admin). All deleted in Task 56.
Verification:
dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests -> 0 errors
dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
dotnet build (whole solution) -> 70 errors, all in Server/Admin
|
||
|
|
4bb4ad8acb |
feat(configdb): add RowVersion to live-edit entities
Phase 1a of the v2 entity-model rewrite. Adds:
public byte[] RowVersion { get; set; } = Array.Empty<byte>();
and the EF Core mapping
e.Property(x => x.RowVersion).IsRowVersion();
to 12 live-edit entities:
Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm
These are the entities that v2 admins will edit directly via
AdminOperationsActor (no draft staging). RowVersion enables
last-write-wins detection when two operators race on the same row.
GenerationId FKs are still in place on these entities (removed in Task 14b);
this commit only adds the rowversion column so the migration in Task 14f can
emit ADD COLUMN before DROP FK as a single atomic step.
|
||
|
|
990ce343fe |
docs(plans): split Task 14 into 14a-14f (entity-model rewrite)
The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 13 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.
New decomposition (~85 min total, replacing the original 5-min estimate):
14a standard 10m Add RowVersion to live-edit entities
14b high-risk 30m Drop GenerationId FK from those entities
14c high-risk 20m Obsolete GenerationApplier/Diff/SealedCache
14d standard 5m Drop ClusterNode.RedundancyRole
14e small 5m Delete ConfigGeneration + ClusterNodeGenerationState
14f high-risk 15m Consolidator: generate V2HostingAlignment migration
Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.
Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.
Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".
Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).
|
||
|
|
8e2c4f2835 |
feat(configdb): add Deployment, NodeDeploymentState, ConfigEdit, DataProtectionKey entities
Phase 1 entities for the v2 live-edit + snapshot-deploy model:
Deployment — immutable artifact snapshot (replaces v1 ConfigGeneration row)
Status enum {Dispatching, AwaitingApplyAcks, Sealed,
PartiallyFailed, TimedOut}; carries the SHA256 RevisionHash and
the SnapshotAndFlatten() ArtifactBlob; RowVersion for optimistic
concurrency.
NodeDeploymentState — per-(node, deployment) apply progress row owned by
DriverHostActor (replaces single-row ClusterNodeGenerationState).
Composite key (NodeId, DeploymentId) gives the
ConfigPublishCoordinator the full history it needs to
reconstruct in-flight state after a failover.
ConfigEdit — append-only audit row written by AdminOperationsActor on every
mutating op; optional ExecutionId correlates edits inside one
admin transaction (e.g. an import batch).
DataProtectionKey — ASP.NET DataProtection key ring storage via
IDataProtectionKeyContext so every admin-role node decrypts
the same cookies without sharing a filesystem.
OtOpcUaConfigDbContext now implements IDataProtectionKeyContext and registers four new
DbSets + four new ConfigureXxx mappings.
Central package bumps (forced by Microsoft.AspNetCore.DataProtection.EntityFrameworkCore
10.0.7's transitive dep):
Microsoft.EntityFrameworkCore.{,Design,InMemory,SqlServer} 10.0.0 -> 10.0.7
Microsoft.Extensions.{Configuration.Abstractions,Configuration.Json,Hosting,Hosting.WindowsServices,Http} 10.0.0 -> 10.0.7
EF migration generation + the ConfigGeneration drop + RedundancyRole column removal are
deferred to Task 14 (high-risk, non-parallelizable).
|
||
|
|
30a2104fa5 |
feat(scaffold): introduce 8 v2 component projects
Adds the empty project skeletons that subsequent v2 tasks fill in: src/Core/ZB.MOM.WW.OtOpcUa.Commons (types, interfaces, message contracts) src/Core/ZB.MOM.WW.OtOpcUa.Cluster (Akka.Hosting + cluster wiring) src/Server/ZB.MOM.WW.OtOpcUa.Security (cookie+JWT auth, LDAP) src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons) src/Server/ZB.MOM.WW.OtOpcUa.Runtime (per-node driver actors) src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host) src/Server/ZB.MOM.WW.OtOpcUa.AdminUI (Razor class library) src/Server/ZB.MOM.WW.OtOpcUa.Host (single fused web binary) Each project sets TreatWarningsAsErrors=true in its own csproj (per the Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress entries cover transitive vulnerability advisories the new strictness surfaces: - GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools) - GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer) - GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted) OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126 via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same constraint as the legacy Server project). Runtime does NOT project-reference any concrete Driver.* assemblies; drivers load reflectively at runtime (Phase 6). Runtime gets the IDriver contract through Core.Abstractions instead. Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the Windows OS so the project builds on macOS dev machines. Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051 in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green smoke check, no separate commit). |
||
|
|
2b811477d1 |
chore(build): introduce central package management for v2
Adds Directory.Packages.props (ManagePackageVersionsCentrally) and
Directory.Build.props (net10.0/nullable/implicit usings/LangVersion latest).
Strips Version attributes from every csproj PackageReference and consolidates
versions into the central file.
Side fixes (necessary to keep the build green on .NET SDK 10.0.105 on macOS):
- Microsoft.CodeAnalysis.CSharp{,.Workspaces}: 5.3.0 -> 5.0.0. The 5.3.0
analyzer DLL references compiler 5.3.0.0 and the local SDK ships compiler
5.0.0.0, producing CS9057 on every project that loaded the Analyzers
output. Master itself was broken on this machine pre-change.
- Server + Server.Tests pin OPCFoundation.NetStandard.Opc.Ua.{Configuration,
Client} to 1.5.374.126 via VersionOverride, matching Opc.Ua.Server's
pin. Mixing 1.5.378.106 Opc.Ua.Core transitively with 1.5.374.126
Opc.Ua.Server breaks CustomNodeManager2 override signatures
(CS0115 on LoadPredefinedNodes/Browse/HistoryRead*) and CS7069 in
the tests. The pin disappears when the legacy Server project is
deleted in Task 56.
- Client.UI + Client.UI.Tests: NuGetAuditSuppress for
GHSA-xrw6-gwf8-vvr9 (Tmds.DBus.Protocol 0.20.0 reaches both projects
transitively from Avalonia.Desktop on Linux/macOS only).
Deviation from the plan: TreatWarningsAsErrors=true is NOT set in
Directory.Build.props because the pre-v2 Admin/Server test projects carry
~240 xUnit1051 analyzer warnings that would fail the build. New v2 projects
opt in via their own csproj; the global flag can return once the legacy
projects are deleted in Task 56.
|
||
|
|
fac32ad69b |
docs(plans): add v2 implementation plan with 66 bite-sized tasks
Converts the akka-hosting-alignment design into an executable plan: 12 phases covering branch/scaffold, ConfigDb schema, Commons, Cluster, Security, ControlPlane singletons, Runtime per-node actors, OpcUaServer extraction, AdminUI migration, Host entry point, cleanup, integration tests, deploy scripts, and docs. Each task has files, TDD steps, exact commands, classification, time estimate, and parallelizable list. Co-located .tasks.json drives executing-plans resume from any session. |
||
|
|
ef4a70751c |
docs(plans): add v2 Akka + fused hosting alignment design
Captures the brainstormed design to align OtOpcUa with ScadaLink: single role-gated binary, Akka.NET cluster with admin/driver roles, cluster singletons for control plane, per-node actor hierarchy for OPC UA runtime, dual-endpoint warm redundancy preserved with ServiceLevel driven by Akka leader, cookie+JWT auth, Traefik routing, and ScadaLink-style live-edit + deploy model replacing the draft/publish ConfigGeneration lifecycle. |
||
|
|
866dc03fac |
style(ui): align admin styling with ScadaLink master conventions
- Move CSS into wwwroot/css/ (theme.css, site.css); sidebar 218 -> 220px - Add hamburger + Bootstrap collapse for <lg viewports - Add Components/Shared/ with LoadingSpinner, ToastNotification, StatusBadge - Replace .page-title with flex + <h4 class="mb-0"> across 20 pages - Convert NewCluster + IdentificationFields forms to card + h6 subsection pattern |
||
|
|
c6082aa0b9 |
fix(admin-e2e): register missing DI services so ClusterDetail interactive circuit boots
UnsTabDragDropE2ETests were timing out at the 'UNS Structure' nav-link locator because AdminWebAppFactory never registered AdminHubConnectionFactory / HubTokenService / DataProtection — ClusterDetail.razor's @inject threw at circuit boot, so the page never advanced past the Loading placeholder. 2 → 3 pass after the registrations land. Also documents the Modbus standard-vs- exception_injection coverage matrix in the fixture README + cross-references docs/drivers/AbServer-Test-Fixture.md from each Emulate test so a developer landing on a skipped test has a direct doc pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b1f3e09661 |
test(modbus, abcip): align failing integration tests with fixtures
Two real bugs uncovered by re-running with the new fixture defaults
pointing at the shared docker host. Both are test-side, not driver-side.
AbCip — Driver_reads_seeded_DInt_from_ab_server (4 parametrized rows):
Hardcoded 'ab://127.0.0.1:{port}/1,0' in the deviceUri instead of
the resolved fixture.Host. The new 10.100.0.35 default (and any
AB_SERVER_ENDPOINT override) silently couldn't reach this test —
the driver tried to connect to a non-existent localhost:44818 and
returned BadCommunicationError on all 4 profile rows. The sibling
Emulate tests already use the fixture's resolved endpoint; this
smoke test was missed in the original migration.
Fix: deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0".
Modbus — Float32_With_CDAB_Roundtrips_Through_Wire:
Test wrote a Float32 to HR 100 (2 consecutive registers: 100+101).
standard.json's writable HR range declares [100,100] only — a
single-cell auto-incrementing register, not a 2-register pair. The
write to register 101 was rejected with Illegal Data Address
(BadOutOfRange).
Fix: moved the tag from HR 100 to HR 200 (in standard.json's
'[200, 209]' scratch range — 10 consecutive writable HRs). The
Float32+CDAB semantic the test exercises is unchanged.
Modbus — Block_Read_Coalescing_Reduces_PDU_Count_End_To_End:
Test read HR 300, 302, 304 — outside both the writable ranges and
the uint16 seed list. pymodbus rejects reads to unseeded HRs even
though 'hr size' is 2048. BadOutOfRange on every read.
Fix: moved the tags from 300/302/304 to 200/202/204 (within the
scratch range). The non-contiguous coalescing semantic (3 tags
inside a 5-register window with MaxReadGap=5) is preserved.
After this commit:
- Modbus.IntegrationTests: 6/38 pass / 32 skip / 0 fail
(was 4 pass / 32 skip / 2 fail; 32 skips are profile-gated
ExceptionInjectionTests — they need MODBUS_SIM_PROFILE=
exception_injection and a different container, intentional gating)
- AbCip.IntegrationTests: 10/12 pass / 2 skip / 0 fail
(was 6 pass / 2 skip / 4 fail; 2 skips are Emulate tests that
need the fixture for separate scenarios)
No driver code changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
49644fc7fd |
test(fixtures): migrate integration-test fixture defaults to 10.100.0.35
CLAUDE.md "Docker Workflow" claims (per the 2026-04-28 migration note)
that all fixture-class default endpoints were rewritten to target the
shared Docker host at 10.100.0.35. Audit during today's e2e run showed
the claim was incomplete — five fixture classes still defaulted to
localhost / 127.0.0.1, causing every fixture-touching integration test
to skip with "endpoint unreachable" on a fresh box that hadn't set
the override env vars.
Files corrected:
- tests/.../Modbus.IntegrationTests/ModbusSimulatorFixture.cs
DefaultEndpoint: localhost:5020 → 10.100.0.35:5020
- tests/.../S7.IntegrationTests/Snap7ServerFixture.cs
DefaultEndpoint: localhost:1102 → 10.100.0.35:1102
- tests/.../OpcUaClient.IntegrationTests/OpcPlcFixture.cs
DefaultEndpoint: opc.tcp://localhost:50000 → opc.tcp://10.100.0.35:50000
- tests/.../AbCip.IntegrationTests/AbServerFixture.cs
Host default + ResolveHost fallback: 127.0.0.1 → 10.100.0.35
- tests/.../AbLegacy.IntegrationTests/AbLegacyServerFixture.cs
Host default + ResolveEndpoint fallback: 127.0.0.1 → 10.100.0.35
XML doc comments referencing the old localhost defaults were updated in
the same pass so the class-summary documentation matches the actual
default. The override-via-env-var mechanism (MODBUS_SIM_ENDPOINT,
AB_SERVER_ENDPOINT, AB_LEGACY_ENDPOINT, S7_SIM_ENDPOINT,
OPCUA_SIM_ENDPOINT) is unchanged — pointing at a real PLC or a
locally-running container still works exactly as before.
Verification:
- Solution-wide dotnet build: 0 errors.
- S7.IntegrationTests: 3/3 pass without env-var override.
- OpcUaClient.IntegrationTests: 3/3 pass without env-var override.
- Modbus.IntegrationTests: 4/38 (same as the env-var-override run —
the 2 failures + 32 skips are pre-existing fixture-profile
mismatches unrelated to this fix).
- AbCip.IntegrationTests / AbLegacy.IntegrationTests: same results
as the env-var-override run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3d982d9a65 |
docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution sweep. No code touched; pure documentation drift correction. 1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting): Refreshed the deployment description from "three cooperating processes (Server, Admin, Galaxy.Host)" to "two cooperating Windows services (Server, Admin)". The legacy x86 TopShelf Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy access now flows through the in-process Tier-A GalaxyDriver talking gRPC to the sibling mxaccessgw gateway. Also called out decision #30 (AddWindowsService replacing TopShelf) inline. 2. docs/VirtualTags.md: - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting" replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp regular compiler — Core.Scripting-008 / -016 retired the CSharpScript/ScriptRunner path). - Line 39: orphan-thread leak description rewritten. The CSharp.Scripting-era "underlying ScriptRunner keeps running on its thread-pool thread until the Roslyn runtime returns" is no longer accurate — the new pipeline binds the script as a regular C# Func<> delegate, so the leak is now "synchronous CPU-bound work on a pool thread" (same operator-visible effect, different mechanism). 3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows service"): Annotated both the decision body and the decision-log table row with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the replacement architecture. The original reasoning is preserved as audit trail per the decision-log convention. 4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1: Added an Implementation note describing the Core.Scripting-008 / -016 supersession of the original CSharpScript pipeline. The historical record stays; the note points future readers at docs/VirtualTags.md "Compile cache" for the current contract. 5. docs/plans/alarms-over-gateway.md "Files" section under client regeneration: Updated the .NET regeneration instructions to point at the new ZB.MOM.WW.MxGateway.Contracts.csproj path. The old clients/dotnet/MxGateway.Client.csproj no longer exists in the sibling repo (restructure after this plan was written) and the vendored-binaries situation in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out so a reader following the plan won't chase a deleted path. Verification: grep against docs/ for the pre-fix wordings ("three cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner", the wrong BadDeviceFailure hex code 0x80550000) returns no hits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
23d59d73f2 |
fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated code review. Core.Scripting-014 (Medium, Concurrency): CompiledScriptCache.Clear() used the key-only TryRemove(key, out var lazy) overload — same race shape Core.Scripting-006 closed in GetOrCompile's catch block. A concurrent re-add between snapshot and TryRemove was evicted + disposed while the new caller still held it. Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload. Regression test Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives added. Core.Scripting-013 (Medium, Security): Hand-rolled BuildWrapperSource pastes user source between literal braces; brace-balanced source could inject sibling methods/classes alongside CompiledScript.Run. Analyzer still walked the injected members so it wasn't a direct escape, but it relaxed the documented 'method body' authoring contract. Added EnforceSingleRunMember: after ParseText, the compilation unit must hold exactly one type (CompiledScript) and that type must hold exactly one member (the Run method). Any deviation throws CompilationErrorException with LMX001/ LMX002 diagnostic IDs and a Core.Scripting-013 reference in the message. Two regression tests added covering the sibling-method and sibling-class injection vectors. Core.Scripting-015 (Low, Correctness, latent): ToCSharpTypeName's generic branch truncated at the first backtick via IndexOf, silently dropping closed args of nested-generic shapes (Outer<T>.Inner<U>). No production caller exercises this shape today (all TContext/TResult are top-level non-nested), so the bug was latent. Rewrote the generic branch to walk the FullName segment-by- segment, consuming generic args per segment so nested shapes emit valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken Outer<T,U>). Core.ScriptedAlarms-013 (Low, Documentation): The internal test accessors TryGetScratchReadCacheForTest / TryGetScratchContextForTest return live mutable scratch refilled in place under _evalGate. XML docs didn't warn future test authors about the synchronization contract. Added a <remarks> block to each documenting the only-safe-on-quiesced-engine + identity-or-single-key contract. Verification (suites green): Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests) Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix) Core.VirtualTags.Tests: 57/57 (unchanged) After this commit, all 12 findings from the updated re-review are closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c2abbf45bd |
fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).
Driver.Galaxy-016 (Medium, Perf/Resource):
Reconciled the csproj PackageReferences with what the vendored
MxGateway.Client.dll was actually built against, verified by
reflecting Assembly.GetReferencedAssemblies() on the DLL:
- Polly 8.5.2 → Polly.Core 8.6.6
(most consequential — Polly v7 fluent API vs Polly.Core v8
resilience-pipeline API are DIFFERENT packages; the DLL was
built against Polly.Core so the prior Polly reference would
have failed at runtime with MissingMethodException the first
time the gateway client's retry pipeline ran)
- Grpc.Net.Client 2.71.0 → 2.76.0 (matches sibling Server/Worker)
- Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7
Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
left unchanged.
Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
Original framing was a security concern about unknown-provenance
binaries. User clarified the DLLs are their own code, built from
their own mxaccessgw project, not third-party. Re-triaged to a
documentation / audit-trail concern. Fix:
- Added a Provenance section to libs/README.md recording the
source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into
both DLLs by the original build) and SHA-256 checksums.
- Documented the re-verification recipe (sha256sum + ilspycmd
| grep AssemblyInformationalVersion).
Recommendations about .gitattributes and CI hash-check deferred —
the DLLs are frozen until an unwinding path is taken, so adding
LFS or CI infrastructure now would need removal at unwinding.
Driver.Galaxy-018 (Low, Documentation):
Most of the recommendation folded into the libs/README.md rewrite
(pointed at sibling Server/Worker csproj as the live version source
rather than the deleted MxGateway.Client.csproj; recorded source
commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
<Reference> items intentionally not added — MSBuild's default for
HintPath references with bare-name Include attributes is already
SpecificVersion=false, so explicitly setting it would be cosmetic
without changing behaviour.
Driver.Galaxy-017 (Low, Design) — Deferred:
Recommendation part (b) (record mxaccessgw source-commit SHA in
libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
Parts (a) and (c) — a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial
new RPC + plumbing work not in scope for this code-review sweep.
The risk surface is bounded because either of the libs/README.md
unwinding paths closes the vendoring + this concern naturally.
Re-open if neither path is taken within the next quarter and the
live gateway evolves its proto under the driver.
Verification:
- Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
- Driver.Galaxy.Tests: 245/245 pass against the corrected
package set.
- Solution-wide build remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3a53d03d23 |
fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.
The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:
- System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
background-fanout threat Core.Scripting-003 closed against
System.Threading.Tasks.
- System.Threading.Timer — schedules unbounded callback work that
outlives the per-evaluation timeout.
- System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
Defense-in-depth gap; invocation needs reflection (already denied)
but the load itself was reachable.
Fix:
- Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
(preferred over type-granular per the recommendation so future BCL
additions to that namespace are denied by default).
- Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
to ForbiddenFullTypeNames — both live in System.Threading shared
with allowed primitives so they must be type-granular.
Regression tests added to ScriptSandboxTests:
Rejects_ThreadPool_QueueUserWorkItem_at_compile
Rejects_Timer_new_at_compile
Rejects_AssemblyLoadContext_at_compile
Docs:
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
and the Sandbox-escape compliance-check row both updated to enumerate
the new entries per the Core.Scripting-009 doc-sync convention.
Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.
Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
fb7c6c7046 |
fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.
Resolution:
- VirtualTagEngine + ScriptedAlarmEngine each gained a private
CompiledScriptCache<TContext, TResult> instance.
- Both Load methods now call _compileCache.GetOrCompile(source).
- Publish-replace path: _compileCache.Clear() runs alongside the existing
_tags / _alarms clears so the prior generation's ALCs are disposed
before recompile.
- Engine Dispose now calls _compileCache.Dispose() so shutdown actually
releases the emitted assemblies.
Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).
Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.
Regression tests:
- VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
- ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
Both use WeakReference + bounded GC.Collect() to prove the emitted
assembly is reclaimable after engine.Dispose(). The alarms test had to
be synchronous (not 'async Task<WeakReference>') because async state
machines capture locals as state-struct fields, keeping them alive past
the method's apparent end and defeating GC.
Verification:
- Core.Scripting.Tests: 104/104 (unchanged).
- VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
- ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
- All other consumer suites still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a6ae4e22d1 |
fix(status-codes): correct BadDeviceFailure from 0x80550000 to 0x808B0000
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.
Driver.Cli.Common-007 (High, Correctness):
0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
not BadDeviceFailure. The correct spec value for BadDeviceFailure is
0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
Wonderware historian quality mappers which all hand-pin the correct
value).
The bug was duplicated across six driver modules:
- FocasStatusMapper.BadDeviceFailure
- AbCipStatusMapper.BadDeviceFailure
- AbLegacyStatusMapper.BadDeviceFailure
- TwinCATStatusMapper.BadDeviceFailure
- ModbusDriver.StatusBadDeviceFailure
- S7Driver.StatusBadDeviceFailure
Plus the SnapshotFormatter shortlist that named 0x80550000 as
BadDeviceFailure, and three downstream Modbus tests that asserted
against the wrong value (so CI was blind).
This commit fixes all six native-mapper constants, the formatter
shortlist, and the three Modbus tests in one pass. Added a regression
guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
0x80550000 never renders as BadDeviceFailure (mirroring the existing
-001 wrong-name guards).
Behavior change: OPC UA clients consuming the native drivers now see
the canonical BadDeviceFailure (0x808B0000) on device-fault paths
instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
level status semantics now match operator-facing CLI labels.
Driver.Cli.Common-008 (Low, Testing):
Deleted the redundant FormatStatus_names_native_driver_emitted_codes
Theory — its five InlineData rows were already covered by the
well-known Theory in the same commit (
|
||
|
|
41e62b2663 |
docs(code-reviews): updated re-review at commit a9be809 — 12 new findings
Re-reviewed the four modules with source changes since the previous review commit |
||
|
|
a9be80923c |
docs(v2-release-readiness): drop stale 'this branch task-galaxy-e2e' ref
The task-galaxy-e2e branch was merged + deleted; the durable reference is PR #205 alone. Tidies a dangling pointer that future readers might chase looking for a branch that no longer exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
994997ba7b |
fix(driver-galaxy): vendor MxGateway.Client + MxGateway.Contracts as binary refs
The sibling mxaccessgw repo restructured: clients/dotnet/MxGateway.Client no longer exists, and the proto contracts moved to a new namespace (ZB.MOM.WW.MxGateway.Contracts.Proto, was MxGateway.Contracts.Proto). The driver's source still expects the pre-restructure namespace, so the broken ProjectReference produced 86 build errors in src/ + 1 in tests/ on master. Resolution: vendor the last known-good build of MxGateway.Client.dll (99 KB, May 22) and MxGateway.Contracts.dll (490 KB, May 23) under src/Drivers/.../Driver.Galaxy/libs/, reference them via <Reference HintPath=...> in both the driver and its test csproj, and declare the NuGet packages the dropped ProjectReference was supplying transitively (Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client, Microsoft.Extensions.Logging.Abstractions, Polly) at versions matching the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj so binary compatibility is preserved. Why this over a source migration: Source migration would require namespace renames across ~19 driver files PLUS reimplementing MxGatewayClient / MxGatewaySession / GalaxyRepositoryClient (~2,200 LoC) — the sibling repo dropped the client library entirely, keeping only the proto contracts. Vendoring the last known-good binaries unblocks the build in minutes, freezes the gateway contract surface at a known-good version, and preserves the option to migrate properly once the sibling repo decides whether to restore a client library or hand the work back to us. libs/README.md documents the unwinding plan (either path closes the debt: sibling restores a client library, or driver migrates to the new contracts namespace + reimplements the client wrapper). Verification: - dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors (was 87). - Driver.Galaxy unit tests: 245/245 pass. - Integration tests not run here (require a live mxaccessgw gateway). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0001cdd579 |
fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.
Why this matters:
On a busy line where many tags feeding many alarms change frequently,
the old BuildReadCache allocated a fresh dictionary + context on every
predicate evaluation — a steady stream of short-lived allocations the
GC eventually has to reclaim. With the reuse, the dictionary and
context are allocated once per alarm (on first evaluation) and refilled
in place across every subsequent re-eval.
Implementation:
- New private AlarmScratch class holds the reusable
Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
reference. The context observes refilled values without being
re-created.
- ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
engine, cleared in LoadAsync alongside _alarms so a config-publish
drops the prior generation's scratch (Inputs / Logger may change).
- EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
repopulate the dictionary in place, then runs the predicate against
the reused context.
- BuildReadCache removed.
Safety:
Reuse is serialised under _evalGate which guarantees no two threads
ever observe the same scratch in a half-refilled state. The
AlarmPredicateContext is bound to the scratch dictionary by reference,
so the predicate's ctx.GetTag(path) sees the freshly-refilled values
rather than a stale snapshot.
Verification:
- All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
locking the reuse contract).
- All 56 VirtualTags tests still pass (unchanged).
- All 104 Core.Scripting tests still pass (unchanged).
New tests in ScriptedAlarmEngineTests:
- Reevaluation_reuses_the_same_read_cache_dictionary — asserts
ReferenceEquals(scratch_before, scratch_after) across two
evaluations of the same alarm.
- Reevaluation_reuses_the_same_predicate_context — same, for the
context.
- LoadAsync_drops_the_prior_generations_scratch — asserts a config
publish wipes the prior scratch (so a stale Logger / Inputs can't
leak into the new generation).
Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.
Docs:
- docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
rewritten as "hot-path allocation reuse" documenting the new
contract + reuse safety reasoning + the three regression tests.
- code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
Won't Fix → Resolved.
- code-reviews/README.md regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7b6ab2ec6f |
fix(scripting): unload compiled-script assemblies via collectible ALC
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.
Why this was needed:
Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
the default AssemblyLoadContext, which is non-collectible. Across config-
publish generations each Clear() drops dictionary entries but the emitted
assemblies stay loaded for process lifetime, so memory grows steadily on
long-running servers with frequent publishes. The accepted-limitation note
in docs/VirtualTags.md recommended scheduled restarts as the workaround;
operator feedback was that restarts are difficult, so the underlying
limitation was the right thing to fix.
Implementation:
- New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
script assembly per evaluator.
- ScriptEvaluator.Compile synthesises a wrapper class around the user source
(CompiledScript.Run(globals) — explicit return required per ordinary C#
semantics, which every existing script already uses), builds a
CSharpCompilation against the sandbox references, runs the
ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
via reflection.
- ScriptEvaluator now implements IDisposable — Dispose calls
AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
for GC at the next collection cycle.
- CompiledScriptCache.Clear() disposes every materialised evaluator before
dropping its dictionary entry; CompiledScriptCache itself is now
IDisposable for graceful server shutdown.
- ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
instead of a Roslyn ScriptOptions; references now span BCL via the
TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
ForbiddenTypeAnalyzer is the sole security gate (consistent with the
Core.Scripting-001 / -002 model — references-list-only restriction is
porous against type forwarding, so the analyzer must be the real gate).
Verification:
- All 104 Core.Scripting tests pass (was 101 — three new regression tests
locking the unload contract).
- All 56 VirtualTags tests pass (unchanged).
- All 63 ScriptedAlarms tests pass (unchanged).
- New CompiledScriptCacheTests:
- Dispose_unloads_compiled_script_assembly_load_context — proves single-
evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
- Clear_disposes_every_materialised_evaluator — proves publish-replace
releases every prior generation's ALC.
- GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
post-dispose contract.
Docs:
- docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
limitation note replaced with the unload contract + the new authoring
convention (explicit return).
- docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
restart guidance.
- code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
Won't Fix → Resolved with the implementation summary.
- code-reviews/README.md regenerated.
Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5a9c4591b9 |
fix(cli-common): name native-driver-emitted status codes in SnapshotFormatter
Driver.FOCAS.Cli-005 follow-up: extend the SnapshotFormatter.FormatStatus shortlist with the five Bad* codes the native-protocol mappers (FOCAS, AbCip, AbLegacy) emit but which the shortlist previously left unnamed, so they rendered only as severity-class 'Bad' instead of the documented 'BadDeviceFailure' / 'BadNotWritable' / ... names operators are told to read off probe/write output. Added entries: 0x80020000 BadInternalError 0x803B0000 BadNotWritable 0x803C0000 BadOutOfRange 0x803D0000 BadNotSupported 0x80550000 BadDeviceFailure (BadTimeout 0x800A0000 was already added under Driver.Cli.Common-001.) Tests: SnapshotFormatterTests gains a new [Theory] FormatStatus_names_native_driver_emitted_codes covering the five names, and the existing well-known [Theory] is extended with the same entries to enforce exact '0x... (Name)' rendering. Suite now 47 green (was 42). Flips Driver.FOCAS.Cli-005 from Deferred to Resolved; README regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0f8ce1cb80 |
docs(code-reviews): regenerate index — final batch — 6 Low findings resolved
Batch 7 closed the last Open findings in Client.UI. The review backlog is now empty: 0 Open findings across all 31 modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1b10194634 |
fix(client-ui): resolve Low code-review findings (Client.UI-003,004,006,009,010,011)
- Client.UI-003: wire Serilog properly per CLAUDE.md — console sink + rolling daily file sink in Program.Main, Log.CloseAndFlush in finally, per-VM Log.ForContext<> loggers. - Client.UI-004: migrate the cert-store folder picker from the obsolete OpenFolderDialog to StorageProvider.OpenFolderPickerAsync (with TryGetFolderFromPathAsync seed + TryGetLocalPath extraction). - Client.UI-006: surface formerly silent catch blocks via an observable StatusMessage on the Subscriptions / Alarms VMs that bubbles up into the shell's status bar; soft fallbacks log at Information level so hard failures stay distinguishable. - Client.UI-009: docs/Client.UI.md now lists Standard Deviation in the Aggregate row of the Query Options table. - Client.UI-010: removed the unused MinDateTimeProperty / MaxDateTimeProperty styled properties from DateTimeRangePicker. - Client.UI-011: updated the cert-store TextBox watermark from the legacy AppData/LmxOpcUaClient/pki to the canonical AppData/OtOpcUaClient/pki. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
59ecd18169 |
docs(code-reviews): regenerate index — 25 Low findings resolved
Batch 6 cleared Open findings in Driver.FOCAS.Cli (1 deferred to Driver.Cli.Common), Driver.Cli.Common, Driver.Historian.Wonderware.Client, Client.CLI, and Client.Shared. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2a6ac07111 |
fix(client-shared): resolve Low code-review findings (Client.Shared-003,004,009,010,011)
- Client.Shared-003: DefaultSessionAdapter.WriteValueAsync / CallMethodAsync guard against null/empty Results and throw ServiceResultException with the response's ServiceResult code instead of indexing into a missing list. - Client.Shared-004: DefaultSessionAdapter.CloseAsync / HistoryReadRawAsync / HistoryReadAggregateAsync use the Session.*Async overloads and honour the caller's CancellationToken. - Client.Shared-009: AcknowledgeAlarmAsync returns the underlying ServiceResultException.StatusCode on failure instead of always Good; IOpcUaClientService doc updated to describe the new contract. - Client.Shared-010: ConnectionSettings.CertificateStorePath defaults to empty; DefaultApplicationConfigurationFactory resolves the canonical PKI path lazily, so per-failover ConnectionSettings copies don't hit the filesystem. - Client.Shared-011: added the alarm-fallback regression test, extracted EndpointSelector as a pure static, and added EndpointSelectorTests covering security-mode match, Basic256Sha256 preference, fallback, diagnostics, hostname rewrite, and null/empty guards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7fe9f16cf8 |
fix(client-cli): resolve Low code-review findings (Client.CLI-002,003,004,006,007,008,009,010)
- Client.CLI-002: SubscribeCommand's neverWentBad list now requires the node to be present in lastStatus (i.e. received at least one update) so the 'suspect' bucket only contains observed nodes. - Client.CLI-003: every long-running command validates numeric option ranges (Interval / Depth / MaxDepth / Duration / Max) and throws CliFx CommandException on out-of-range values. - Client.CLI-004: SubscribeCommand carries XML summary docs on the type, ctor, every [CommandOption] property, and ExecuteAsync — matching the sibling commands' style. - Client.CLI-006: HistoryReadCommand parses --start / --end with InvariantCulture+UTC and surfaces FormatException as CommandException; every NodeIdParser.ParseRequired call wraps FormatException / ArgumentException as CommandException. - Client.CLI-007: CommandBase.ConfigureLogging calls Log.CloseAndFlush() before assigning a new Log.Logger so prior sinks are disposed. - Client.CLI-008: rewrote the subscribe and historyread sections of docs/Client.CLI.md (every flag documented, summary-bucket vocabulary, StandardDeviation aggregate, UTC --start/--end convention). - Client.CLI-009: SubscribeCommand / AlarmsCommand use named local handlers and detach them via -= after UnsubscribeAsync so no notification reaches the console after the command's output phase ends. - Client.CLI-010: added CommandRangeValidationTests, EventHandlerLifecycleTests, InputValidationErrorsTests, LoggerLifecycleTests, and SubscribeCommandSummaryTests pinning every Low fix; FakeOpcUaClientService gained AddDiscoveredVariable + RaiseDataChanged + BrowseResultsByParent helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
879925180b |
fix(driver-historian-wonderware-client): resolve Low code-review findings (Driver.Historian.Wonderware.Client-003,004,006,008,010)
- Driver.Historian.Wonderware.Client-003: replaced the mixed Interlocked + healthLock counters with RecordOutcome that touches _totalQueries and exactly one of _totalSuccesses / _totalFailures under one acquisition. - Driver.Historian.Wonderware.Client-004: InvokeAndClassifyAsync routes transport + sidecar classification through a single RecordOutcome call; the legacy ReclassifySuccessAsFailure two-step is gone. - Driver.Historian.Wonderware.Client-006: removed the dead ReconnectInitialBackoff / ReconnectMaxBackoff options and added a doc <remarks> stating the channel performs a single in-place reconnect; retry/backoff stays with the caller. - Driver.Historian.Wonderware.Client-008: the audit-suppression comment block now records advisory titles, why neither applies, and the revisit trigger. - Driver.Historian.Wonderware.Client-010: reworded Dispose() to claim deadlock-safety and added a GetHealthSnapshot summary documenting the single-channel collapse + counter invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3ca569f621 |
fix(driver-cli-common): resolve Low code-review findings (Driver.Cli.Common-004,006)
- Driver.Cli.Common-004: confirm the FormatTable empty-input guard
landed earlier (commit
|
||
|
|
6923be3aa2 |
fix(driver-focas-cli): resolve Low code-review findings (Driver.FOCAS.Cli-001,002,003,004; -005 deferred)
- Driver.FOCAS.Cli-001: WriteCommand.ParseValue now wraps numeric FormatException / OverflowException as CliFx CommandException with the offending value. - Driver.FOCAS.Cli-002: SubscribeCommand's OnDataChange handler and the banner both take a writeLock so notification-callback and main-thread writes can't interleave; handler exceptions are warn-and-swallow. - Driver.FOCAS.Cli-003: FocasCommandBase.ValidateOptions rejects --cnc-port outside 1..65535, non-positive --timeout-ms, and non-positive --interval-ms; ExecuteAsync calls it first. - Driver.FOCAS.Cli-004: 'await using var driver' is the sole driver disposal path; dropped the redundant explicit await ShutdownAsync. - Driver.FOCAS.Cli-005 (Deferred): the fix lives in Driver.Cli.Common.SnapshotFormatter — explicitly naming the status-code shortlist there benefits every driver CLI. Left as a Driver.Cli.Common follow-up. - Registered the new tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests project in ZB.MOM.WW.OtOpcUa.slnx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2a941b255f |
docs(code-reviews): regenerate index — 29 Low findings resolved
Batch 5 cleared Open findings in Driver.AbCip.Cli, Driver.AbLegacy.Cli, Driver.S7.Cli, Driver.TwinCAT.Cli, and Driver.Modbus.Cli. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
80ef8806e0 |
fix(driver-modbus-cli): resolve Low code-review findings (Driver.Modbus.Cli-003,004,005,006,007,008)
- Driver.Modbus.Cli-003: ModbusCommandBase.ValidateEndpoint rejects --port outside 1..65535, non-positive --timeout-ms, and --unit-id outside 1..247. - Driver.Modbus.Cli-004: wrapped SubscribeCommand's OnDataChange handler body in a try/catch (warn-and-swallow) and serialised the console write through a lock. - Driver.Modbus.Cli-005: Probe / Read / Write now catch the cancellation-during-init OperationCanceledException and print 'Cancelled.' instead of dumping a stack trace. - Driver.Modbus.Cli-006: ProbeCommand.ComputeVerdict derives the headline from BOTH the driver state and the probe snapshot's OPC UA quality class so the headline can't disagree with the wire result. - Driver.Modbus.Cli-007: docs/Driver.Modbus.Cli.md carries an explicit 'CLI scope' callout — the address-string grammar is a DriverConfig JSON feature; the CLI takes the structured triple only. - Driver.Modbus.Cli-008: pinned BuildOptions, ValidateEndpoint, the region-validation guards, ComputeVerdict, and the cancellation-during- initialize paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f2ee027145 |
fix(driver-twincat-cli): resolve Low code-review findings (Driver.TwinCAT.Cli-001,002,003,004,005,006,007)
- Driver.TwinCAT.Cli-001: TwinCATCommandBase.Validate rejects non-positive TimeoutMs / IntervalMs and AmsPort outside 1..65535; ExecuteAsync calls it first. - Driver.TwinCAT.Cli-002: SubscribeCommand serialises every WriteLine through a writeLock to remove the notification-callback vs banner interleave risk. - Driver.TwinCAT.Cli-003: SubscribeCommand.DescribeMechanism derives the banner label from the returned ISubscriptionHandle.DiagnosticId so it can't disagree with what the driver actually did. - Driver.TwinCAT.Cli-004: introduced TwinCATTagCommandBase carrying --poll-only + BuildOptions; BrowseCommand stays on the slimmer TwinCATCommandBase so --poll-only no longer surfaces in browse --help. - Driver.TwinCAT.Cli-005: ProbeCommand --type now carries the 't' short alias to match the other commands. - Driver.TwinCAT.Cli-006: 35 new tests covering Gateway / AmsAddress parse / BuildOptions / PollOnly / browse-helpers / probe-alias / mechanism derivation. - Driver.TwinCAT.Cli-007: replaced the empty-init <inheritdoc/> with an explicit summary warning future maintainers about the no-op init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
67ef6c4ebc |
fix(driver-s7-cli): resolve Low code-review findings (Driver.S7.Cli-004,005,006,007)
- Driver.S7.Cli-004: 'await using var driver' is the sole driver disposal path; dropped the redundant explicit await ShutdownAsync from each command's finally. - Driver.S7.Cli-005: deleted the stale empty tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/ directory (the real test project lives under tests/Drivers/Cli/). - Driver.S7.Cli-006: S7CommandBaseBuildOptionsTests cover the probe toggle, timeout mapping, host/port/CPU/rack/slot wiring, and tag list passthrough. - Driver.S7.Cli-007: re-added the SubscribeCommand handler comment explaining the CliFx IConsole.Output usage and that the poll-thread raises events. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f46e126208 |
fix(driver-ablegacy-cli): resolve Low code-review findings (Driver.AbLegacy.Cli-002,003,004,005,006,007)
- Driver.AbLegacy.Cli-002: WriteCommand.Value description lists the full true/false, 1/0, on/off, yes/no alias set. - Driver.AbLegacy.Cli-003: SubscribeCommand serialises every WriteLine via a per-execution consoleGate lock so the poll-thread OnDataChange handler can't interleave with the banner. - Driver.AbLegacy.Cli-004: dropped 'await using var driver' in favour of a plain 'var driver' + explicit await ShutdownAsync in finally; the driver is no longer shut down twice. - Driver.AbLegacy.Cli-005: SubscribeCommand.IntervalMs description carries the PollGroupEngine 250ms-floor caveat; docs/Driver.AbLegacy.Cli.md spells out the same. - Driver.AbLegacy.Cli-006: ProbeCommand --type now carries the short alias 't' to match the other commands. - Driver.AbLegacy.Cli-007: BuildOptionsTests cover the probe-disabled, device-shape, tag-passthrough, timeout-propagation, and empty-tag-list paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
759af8c1bb |
fix(driver-abcip-cli): resolve Low code-review findings (Driver.AbCip.Cli-003,004,005,006,007,008)
- Driver.AbCip.Cli-003: SubscribeCommand prints the 'Subscribed' banner BEFORE wiring OnDataChange so the main thread can't interleave its write with the poll-thread handler. - Driver.AbCip.Cli-004: AbCipCommandBase.Timeout and SubscribeCommand validate TimeoutMs / IntervalMs and throw CommandException on non-positive values. - Driver.AbCip.Cli-005: every command now calls FlushLogging() in its finally block. - Driver.AbCip.Cli-006: Timeout init throws NotSupportedException with a pointer at TimeoutMs instead of silently swallowing assignments. - Driver.AbCip.Cli-007: added AbCipCommandBaseTests covering BuildOptions shape, probe / controller-browse / alarm toggles, host address, family selection, tag list passthrough. - Driver.AbCip.Cli-008: rewrote the opening paragraph in docs/Driver.AbCip.Cli.md to credit the six-CLI roster with a pointer at docs/DriverClis.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
61c0311938 |
docs(code-reviews): regenerate index — 24 Low findings resolved
Batch 4 cleared Open findings in Driver.TwinCAT, Driver.Modbus, Driver.OpcUaClient, Driver.Historian.Wonderware, and Driver.Modbus.Addressing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9263519852 |
fix(driver-modbus-addressing): resolve Low code-review findings (Driver.Modbus.Addressing-006,007,009)
- Driver.Modbus.Addressing-006: broaden the catch in TryParseFamilyNative so a future helper throwing a non-Argument/Overflow type still satisfies the try-parse contract. - Driver.Modbus.Addressing-007: document that the address grammar does not carry ModbusStringByteOrder (the structured-tag path does); add a 'Grammar scope' bullet to docs/v2/dl205.md. - Driver.Modbus.Addressing-009: reword the ModbusModiconAddress comments so they don't imply a leading-digit invariant the parser doesn't enforce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1f29b215c8 |
fix(driver-historian-wonderware): resolve Low code-review findings (Driver.Historian.Wonderware-004,005,007,008,010,011,012)
- Driver.Historian.Wonderware-004: ToHistorianEvent synthesises a fresh
Guid when the upstream EventId is unparseable and logs the substitution
instead of writing the historian with Guid.Empty.
- Driver.Historian.Wonderware-005: GetHealthSnapshot derives the
connection-open booleans from the active-node fields so the snapshot
is self-consistent without depending on the secondary lock.
- Driver.Historian.Wonderware-007: SID-mismatch branch in PipeServer now
sends a HelloAck { Accepted=false, RejectReason } so the client sees a
symmetric rejection.
- Driver.Historian.Wonderware-008: classify StartQuery failures —
connection-class codes drop the connection, query-class codes throw
QueryClassStartQueryException so the IPC layer surfaces Success=false.
- Driver.Historian.Wonderware-010: RequestTimeoutSeconds now enforced
via BuildRequestCts linked to the caller's CancellationToken.
- Driver.Historian.Wonderware-011: refreshed XML docs to describe the
current sidecar / named-pipe architecture (Galaxy.Host / Proxy
references reframed as historical context).
- Driver.Historian.Wonderware-012: pinned the previously-uncovered
HistorianDataSource behaviours with five new test files; also removed
the stale empty tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests
directory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|