Rewrite docker-dev/README.md and update docker-dev/Dockerfile comment
to reflect the new topology: one Akka mesh seeded by central-1/central-2
(fused admin+driver, MAIN cluster, single UI at http://localhost:9200),
with site-a-*/site-b-* as driver-only members scoped by ClusterId.
Removes all references to the old three-mesh layout (admin-a, admin-b,
driver-a, driver-b, site-a.localhost, site-b.localhost).
Set Security__DeployApiKey on the 6 admin/fused nodes so POST /api/deployments works in the
dev fleet (reachable via traefik :9200). Smoke-tested: no/wrong key -> 401, correct key -> 202
with a real deployment id.
The docker-dev sql service had no volume, so its data lived in the container
writable layer. A recreate silently dropped the OtOpcUa database and every host
node failed its configdb health check (AdminUI 503) until an operator re-ran
'dotnet ef database update' + the cluster-seed. Add a named volume
(otopcua-mssql-data -> /var/opt/mssql) so the migrated schema + seeded clusters
survive 'docker compose up' cycles.
docker-dev un-stubbed → binds zb-shared-glauth on 10.100.0.35:3893 (dc=zb,dc=local)
via cn=serviceaccount; sign in multi-role/password (group→role seeded by
seed-clusters.sql). Per-VM C:\publish\glauth + base DNs dc=lmxopcua/dc=otopcua
obsolete. Source of truth: scadaproj/infra/glauth/.
The 6 admin/site host containers drop DevStubMode and bind the shared dev GLAuth
(scadaproj/infra/glauth/, dc=zb,dc=local) via cn=serviceaccount. seed-clusters.sql
seeds system-wide LdapGroupRoleMapping rows OtOpcUa-Admins->Administrator,
OtOpcUa-Designers->Designer, OtOpcUa-Viewers->Viewer (bare-RDN group keys, matching
the shared lib's ToGroupShortName). Verified: multi-role -> Viewer+Designer+
Administrator at :9200 via real LDAP.
The per-role overlay (appsettings.{role}.json) was appended after WebApplicationBuilder's
default sources, so it outranked environment variables — a baked role file could not be
overridden by a deployment env var. In otopcua-dev this meant appsettings.admin.json's
Security:Ldap:DevStubMode=false beat the compose's DevStub override, so every AdminUI login
attempted a real LDAP bind against a non-existent server and failed with 'Unexpected
authentication error'.
- Program.cs: re-append AddEnvironmentVariables() + AddCommandLine(args) after the role
overlay so deployment overrides keep top precedence (overlay still beats base appsettings).
- docker-dev/docker-compose.yml: the DevStub env var targeted the stale 'Authentication:Ldap'
section; the code reads 'Security:Ldap'. Corrected the prefix on every host node (+ comment).
Dev AdminUI login now signs in as Administrator via the DevStub bypass.
Host :80 collides with the sister scadabridge-traefik dev stack; bind the
OtOpcUa Traefik :80 entrypoint to host 9200 instead (admin UI now at
http://localhost:9200). Dashboard already on 8089 to avoid the same clash.
Replaces the old fallback (mxgw_otopcua_…UY_NKlBl3) with the freshly issued
mxgw_otopcua2_GI7-… on all 8 host services. Gateway endpoint stays at
http://10.100.0.48:5120 (seed-clusters.sql already points there). Operators
who set GALAXY_MXGW_API_KEY in their shell continue to override the default
unchanged.
Closes the gap where Tag rows with EquipmentId=NULL + Namespace.Kind=SystemPlatform
(Galaxy hierarchy) existed in ConfigDb but were never surfaced in the OPC UA
address space. Now they materialise as Variable nodes under a folder named for
their FolderPath, browseable through any OPC UA client.
Layers touched:
- IOpcUaAddressSpaceSink: new EnsureVariable(nodeId, parentFolderId, displayName,
dataType) signature on the sink interface, NullSink, DeferredSink, SdkSink.
- OtOpcUaNodeManager.EnsureVariable: creates a BaseDataVariableState parented
under the named folder (or root), initial Value=null +
StatusCode=BadWaitingForInitialData; resolves Tag.DataType strings to the
matching OPC UA built-in NodeId. Idempotent.
- Phase7CompositionResult: new GalaxyTags collection of GalaxyTagPlan records
carrying (TagId, DriverInstanceId, FolderPath, DisplayName, DataType,
MxAccessRef). Constructor overloads keep existing call sites compiling.
- Phase7Composer.Compose: now takes Tag + Namespace inputs, filters for
SystemPlatform-namespace tags with EquipmentId=NULL, emits GalaxyTagPlan
rows with MXAccess ref "FolderPath.Name".
- Phase7Plan: new AddedGalaxyTags / RemovedGalaxyTags / ChangedGalaxyTags
collections + GalaxyTagDelta record; IsEmpty + needsRebuild updated.
- Phase7Planner.Compute: diffs GalaxyTags by TagId via existing DiffById helper.
- DeploymentArtifact.ParseComposition: reads the Tags + Namespaces +
DriverInstances arrays the ConfigComposer already emits, applies the same
SystemPlatform filter, returns the same GalaxyTagPlan list as the composer
so artifact-side and compose-side plans agree.
- Phase7Applier: new MaterialiseGalaxyTags pass that ensures one folder per
distinct FolderPath then one Variable per tag. NodeId for the variable is
"<FolderPath>.<Name>" matching the MXAccess ref so the future Galaxy
SubscribeBulk wiring can address them directly.
- OpcUaPublishActor.RebuildAddressSpace: invokes MaterialiseGalaxyTags after
MaterialiseHierarchy. _lastApplied initialiser updated for the new ctor.
- seed-clusters.sql: pre-existing TestMachine_001.TestAlarm001..003 rows
needed no change — the composer/applier now picks them up automatically.
Verified end-to-end via docker-dev: deploy click → driver-a logs
"Phase7Applier: Galaxy tags materialised (tags=3, folders=1)" → OPC UA Client
CLI browses the three Variable nodes under TestMachine_001 folder. Reads
return BadWaitingForInitialData status (expected — Galaxy driver's
SubscribeBulk wiring to push values into the nodes is the remaining
follow-up).
Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:
- ConfigPublishCoordinator computes expected-ack NodeIds from
Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
match ClusterRoleInfo's NodeId derivation. The seed had been using the
bare service name ("driver-a"), so NodeDeploymentState INSERT hit FK
violation 547 on NodeDeploymentState.NodeId → ClusterNode.NodeId. Seed
now writes the full host:port form for every ClusterNode row.
- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
Without sticky sessions, Traefik round-robins admin-a/admin-b and the
WebSocket upgrade lands on the wrong backend, returning "No Connection
with that ID: Status code '404'" so @onclick handlers never fire on the
client. Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three
Traefik service loadBalancers so each session pins to one node.
Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows
— only the gateway worker has that constraint. This wires the Galaxy
driver into the docker-dev fleet:
- docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service
(admin nodes harmlessly ignore it; driver-role nodes pick it up when
the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY).
Default value matches the key the operator provided; override via shell
env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without
editing compose.
- seed-clusters.sql: now creates a SystemPlatform Namespace
(MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway
DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at
http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS.
- DriverInstanceActor.ShouldStub: clarified the doc comment — only the
legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only;
the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an
external gateway) and is NOT stubbed.
- README: documents the final operator step — sign in, click "Deploy
current configuration" on /deployments to materialise the seeded
Galaxy driver into a running gRPC connection. Raw DriverInstance rows
don't spawn drivers on their own; the v2 lifecycle requires a sealed
Deployment first.
Two fixes surfaced while bringing up the docker-dev stack end-to-end:
- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
/health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
every probe and Traefik marks every backend unhealthy → all three cluster
routes return 503.
- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
the schema and applies row inserts. Migrations remain the operator's job:
dotnet ef database update --project src/Core/.../Configuration \
--startup-project src/Server/.../Host
Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).
Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
Adds a one-shot cluster-seed service to docker-dev/docker-compose.yml
that pre-populates the three Akka clusters' scope rows in the shared
OtOpcUa ConfigDb so operators don't have to click through /clusters +
/hosts on every fresh bring-up.
Seed contents:
ServerCluster MAIN (Warm/2), SITE-A (Warm/2), SITE-B (Warm/2)
ClusterNode driver-a + driver-b → MAIN
site-a-1 + site-a-2 → SITE-A
site-b-1 + site-b-2 → SITE-B
NodeCount + RedundancyMode honour the CK_ServerCluster check constraint.
ApplicationUri follows the urn:OtOpcUa:<NodeId> convention; uniqueness
across the fleet satisfies UX_ClusterNode_ApplicationUri.
Mechanism:
- docker-dev/seed/seed-clusters.sql — idempotent INSERTs (IF NOT EXISTS
guards on every row).
- docker-dev/seed/entrypoint.sh — bash wrapper that waits for SQL to
accept connections, then polls until dbo.ServerCluster exists (the
host containers' EF auto-migration creates it on first boot), then
applies the SQL script.
- cluster-seed service uses mcr.microsoft.com/mssql-tools as the base
image (bash + sqlcmd available), restart: "no" so it runs once.
Re-running `docker compose up` is safe: the seed exits cleanly on the
second run because every INSERT is guarded.
Manual re-seed: `docker compose run --rm cluster-seed`.
The previous commit (961e094) gave each site cluster its own database
(OtOpcUa_SiteA / OtOpcUa_SiteB). That fights the architecture — ConfigDb
is multi-tenant by design: one schema with a ServerCluster table whose
rows scope the rest of the configuration via ClusterId. Per-cluster
databases would split the schema and force every singleton/coordinator
to point at a different connection string.
Correct model: one ConfigDb, three ServerCluster rows (MAIN / SITE-A /
SITE-B), each Akka cluster's ClusterNode rows pointing back at the
matching ClusterId. Akka mesh isolation is still enforced by the
disjoint seed-node lists (unchanged from the previous commit).
Compose: all eight host nodes now point at Server=sql,1433;Database=OtOpcUa
and the README documents the post-boot ServerCluster + ClusterNode rows
operators need to create via /clusters and /hosts before the runtime can
resolve its scope.
Extends the docker-dev compose with two additional, fully-isolated Akka
clusters representing distinct sites. Each site is a 2-node fused
admin+driver cluster (OTOPCUA_ROLES=admin,driver on both nodes), backed
by its own ConfigDb database so configuration state stays separate from
the main cluster and from the other site.
Cluster isolation: the three meshes share the same Akka system name
"otopcua" and remoting port 4053 (inside each container's own network
namespace), but their seed-node lists are disjoint — main seeds at
admin-a, site-a seeds at site-a-1, site-b seeds at site-b-1 — so gossip
doesn't cross between them.
Layout:
Main cluster ConfigDb=OtOpcUa admin-a, admin-b, driver-a, driver-b
Site A ConfigDb=OtOpcUa_SiteA site-a-1, site-a-2 (fused admin+driver)
Site B ConfigDb=OtOpcUa_SiteB site-b-1, site-b-2 (fused admin+driver)
OPC UA endpoints exposed on host ports 4840-4845. Admin UIs reachable
through Traefik via Host-header routing:
http://localhost → main cluster (PathPrefix default)
http://site-a.localhost → site A
http://site-b.localhost → site B
`*.localhost` auto-resolves on macOS; Linux users add the two hosts to
/etc/hosts (or rely on the resolver's RFC 6761 behaviour).
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
config. One :80 entry point, one router on HostRegexp(otopcua.*), one
service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
check (interval 5s, timeout 2s, expected 200). Followers return 503 from
/health/active so Traefik drops them within the next interval after a
leadership change.
- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
restart-on-failure. Companion to Install-Services.ps1.
- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
four hosts. README walks through bring-up + failover smoke + the dev LDAP
users.
Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).