Commit Graph

22 Commits

Author SHA1 Message Date
Joseph Doherty f817fc8a8f fix(docker-dev): pin EF/AspNetCore logs to Warning + per-service mem limits to stop OOM/starvation 2026-06-07 10:53:08 -04:00
Joseph Doherty b0a62a9f3b fix(docker-dev): self-bootstrap schema via one-shot migrator (fixes fresh-volume quirks)
Adds a 'migrator' Dockerfile stage + Compose service that runs 'dotnet ef
database update' once on bring-up, so a fresh SQL volume gets the schema with no
operator step (quirk 1). cluster-seed + every host node depend on it via
service_completed_successfully, so the seed never races an in-progress migration
(quirk 2). Host build pinned to target: runtime (the migrator is now the last
stage). entrypoint + README updated; the manual 'dotnet ef' first-time step is
gone. Verified: down -v + up --build self-bootstraps (migrator+seed exit 0,
6 nodes up), deploy Sealed 6/6.
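
The start-up ordering described above can be modelled as a small dependency graph. This is an illustrative Python sketch, not the Compose file itself: the service names `sql` and `central-1` and the exact edges are assumptions drawn from the message, and `central-1` stands in for every host node.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (service -> services that must come first).
# Edges are inferred from the commit message, not copied from docker-compose.yml.
depends_on = {
    "sql": set(),
    "migrator": {"sql"},           # one-shot: runs 'dotnet ef database update'
    "cluster-seed": {"migrator"},  # gated on service_completed_successfully
    "central-1": {"migrator"},     # stand-in for every host node
}


def startup_order(graph):
    """Return one valid start order in which dependencies precede dependents."""
    return list(TopologicalSorter(graph).static_order())


order = startup_order(depends_on)
print(order)
```

Because both the seed and the hosts sit behind the migrator in this model, neither can observe a half-migrated schema, which is the race the commit sets out to remove.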
2026-06-07 08:20:56 -04:00
Joseph Doherty b45e0be427 docs(docker-dev): document first-time DB migrate + reseed (fresh-volume bootstrap) 2026-06-07 03:56:33 -04:00
Joseph Doherty b88ae5db10 docs(docker-dev): document single-mesh hub-and-spoke topology
Rewrite docker-dev/README.md and update docker-dev/Dockerfile comment
to reflect the new topology: one Akka mesh seeded by central-1/central-2
(fused admin+driver, MAIN cluster, single UI at http://localhost:9200),
with site-a-*/site-b-* as driver-only members scoped by ClusterId.
Removes all references to the old three-mesh layout (admin-a, admin-b,
driver-a, driver-b, site-a.localhost, site-b.localhost).
2026-06-07 03:34:49 -04:00
Joseph Doherty 05471dc36c feat(docker-dev): seed MAIN ClusterNodes as central-1/central-2 2026-06-07 03:09:16 -04:00
Joseph Doherty 7bba86b2af feat(docker-dev): Traefik routes only the central cluster UI 2026-06-07 03:08:26 -04:00
Joseph Doherty 5f48f81d5a feat(docker-dev): single-mesh hub-and-spoke (central-1/2 + driver-only sites) 2026-06-07 03:08:17 -04:00
Joseph Doherty 9a67ebc8a8 chore(docker-dev): enable the deploy REST API key on admin-capable nodes
v2-ci / build (push) Failing after 50s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Set Security__DeployApiKey on the 6 admin/fused nodes so POST /api/deployments works in the
dev fleet (reachable via traefik :9200). Smoke-tested: no/wrong key -> 401, correct key -> 202
with a real deployment id.
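
The smoke-test outcomes above (missing or wrong key gives 401, correct key gives 202) can be sketched as a small check. This is a hypothetical Python model, not the project's C# middleware; the function name and the way the key reaches it are assumptions.

```python
import hmac
from typing import Optional


def deploy_status(provided_key: Optional[str], expected_key: str) -> int:
    """Return the HTTP status a deploy request would receive for the given key."""
    if provided_key is None:
        return 401  # no key supplied
    # Constant-time comparison avoids leaking how many characters matched.
    matches = hmac.compare_digest(provided_key.encode(), expected_key.encode())
    return 202 if matches else 401


print(deploy_status(None, "dev-key"))
print(deploy_status("wrong", "dev-key"))
print(deploy_status("dev-key", "dev-key"))
```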
2026-06-06 16:03:25 -04:00
Joseph Doherty 3e9793eff7 fix(docker-dev): persist dev SQL ConfigDb on a named volume
The docker-dev sql service had no volume, so its data lived in the container
writable layer. A recreate silently dropped the OtOpcUa database and every host
node failed its configdb health check (AdminUI 503) until an operator re-ran
'dotnet ef database update' + the cluster-seed. Add a named volume
(otopcua-mssql-data -> /var/opt/mssql) so the migrated schema + seeded clusters
survive 'docker compose up' cycles.
2026-06-05 05:26:34 -04:00
Joseph Doherty 3be4e97b89 docs(glauth): dev/test LDAP is now the shared GLAuth on 10.100.0.35
v2-ci / build (push) Failing after 42s
docker-dev un-stubbed → binds zb-shared-glauth on 10.100.0.35:3893 (dc=zb,dc=local)
via cn=serviceaccount; sign in as multi-role/password (group→role seeded by
seed-clusters.sql). Per-VM C:\publish\glauth + base DNs dc=lmxopcua/dc=otopcua
obsolete. Source of truth: scadaproj/infra/glauth/.
2026-06-04 16:38:22 -04:00
Joseph Doherty 1d7028c2f9 feat(auth): un-stub docker-dev onto shared GLAuth 10.100.0.35 + seed OtOpcUa-* role mappings
v2-ci / build (push) Failing after 43s
The 6 admin/site host containers drop DevStubMode and bind the shared dev GLAuth
(scadaproj/infra/glauth/, dc=zb,dc=local) via cn=serviceaccount. seed-clusters.sql
seeds system-wide LdapGroupRoleMapping rows OtOpcUa-Admins->Administrator,
OtOpcUa-Designers->Designer, OtOpcUa-Viewers->Viewer (bare-RDN group keys, matching
the shared lib's ToGroupShortName). Verified: multi-role -> Viewer+Designer+
Administrator at :9200 via real LDAP.
2026-06-04 16:06:43 -04:00
Joseph Doherty c3ae458a95 fix(config): let env vars override per-role appsettings overlay + correct dev-compose Ldap section
v2-ci / build (push) Failing after 46s
The per-role overlay (appsettings.{role}.json) was appended after WebApplicationBuilder's
default sources, so it outranked environment variables — a baked role file could not be
overridden by a deployment env var. In otopcua-dev this meant appsettings.admin.json's
Security:Ldap:DevStubMode=false beat the compose's DevStub override, so every AdminUI login
attempted a real LDAP bind against a non-existent server and failed with 'Unexpected
authentication error'.

- Program.cs: re-append AddEnvironmentVariables() + AddCommandLine(args) after the role
  overlay so deployment overrides keep top precedence (overlay still beats base appsettings).
- docker-dev/docker-compose.yml: the DevStub env var targeted the stale 'Authentication:Ldap'
  section; the code reads 'Security:Ldap'. Corrected the prefix on every host node (+ comment).

Dev AdminUI login now signs in as Administrator via the DevStub bypass.
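
The precedence problem above comes down to "the last source added wins". The following Python sketch models that rule with plain dictionaries; it is not the .NET configuration API, and the key values shown are assumed for illustration.

```python
def build_config(sources):
    """Merge (name, values) sources in order; later sources override earlier ones."""
    merged = {}
    for _name, values in sources:
        merged.update(values)
    return merged


KEY = "Security:Ldap:DevStubMode"
base = {KEY: "false"}          # appsettings.json (assumed value)
role_overlay = {KEY: "false"}  # appsettings.admin.json
env_vars = {KEY: "true"}       # deployment override from compose

# Before the fix: the overlay is appended after the default sources, so it wins.
before = build_config([("base", base), ("env", env_vars), ("overlay", role_overlay)])

# After the fix: environment variables are re-appended after the overlay.
after = build_config(
    [("base", base), ("env", env_vars), ("overlay", role_overlay), ("env again", env_vars)]
)

print(before[KEY], after[KEY])
```

Re-appending the environment source keeps the overlay above the base file while restoring the deployment override to the top of the stack.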
2026-06-04 11:37:21 -04:00
Joseph Doherty a4fb97aef8 chore(docker-dev): remap Traefik to host port 9200
v2-ci / build (push) Failing after 2m6s
Host :80 collides with the sister scadabridge-traefik dev stack; bind the
OtOpcUa Traefik :80 entrypoint to host 9200 instead (admin UI now at
http://localhost:9200). Dashboard already on 8089 to avoid the same clash.
2026-05-29 12:09:21 -04:00
Joseph Doherty 7a0b8525a9 chore(docker-dev): rotate GALAXY_MXGW_API_KEY default to new credential
Replaces the old fallback (mxgw_otopcua_…UY_NKlBl3) with the freshly issued
mxgw_otopcua2_GI7-… on all 8 host services. Gateway endpoint stays at
http://10.100.0.48:5120 (seed-clusters.sql already points there). Operators
who set GALAXY_MXGW_API_KEY in their shell continue to override the default
unchanged.
2026-05-29 07:18:23 -04:00
Joseph Doherty 7dfbca6469 feat(opcua): materialise SystemPlatform tags (Galaxy) as OPC UA variables
v2-ci / build (push) Failing after 47s
Closes the gap where Tag rows with EquipmentId=NULL + Namespace.Kind=SystemPlatform
(Galaxy hierarchy) existed in ConfigDb but were never surfaced in the OPC UA
address space. Now they materialise as Variable nodes under a folder named for
their FolderPath, browseable through any OPC UA client.

Layers touched:

- IOpcUaAddressSpaceSink: new EnsureVariable(nodeId, parentFolderId, displayName,
  dataType) signature on the sink interface, NullSink, DeferredSink, SdkSink.
- OtOpcUaNodeManager.EnsureVariable: creates a BaseDataVariableState parented
  under the named folder (or root), initial Value=null +
  StatusCode=BadWaitingForInitialData; resolves Tag.DataType strings to the
  matching OPC UA built-in NodeId. Idempotent.
- Phase7CompositionResult: new GalaxyTags collection of GalaxyTagPlan records
  carrying (TagId, DriverInstanceId, FolderPath, DisplayName, DataType,
  MxAccessRef). Constructor overloads keep existing call sites compiling.
- Phase7Composer.Compose: now takes Tag + Namespace inputs, filters for
  SystemPlatform-namespace tags with EquipmentId=NULL, emits GalaxyTagPlan
  rows with MXAccess ref "FolderPath.Name".
- Phase7Plan: new AddedGalaxyTags / RemovedGalaxyTags / ChangedGalaxyTags
  collections + GalaxyTagDelta record; IsEmpty + needsRebuild updated.
- Phase7Planner.Compute: diffs GalaxyTags by TagId via existing DiffById helper.
- DeploymentArtifact.ParseComposition: reads the Tags + Namespaces +
  DriverInstances arrays the ConfigComposer already emits, applies the same
  SystemPlatform filter, returns the same GalaxyTagPlan list as the composer
  so artifact-side and compose-side plans agree.
- Phase7Applier: new MaterialiseGalaxyTags pass that ensures one folder per
  distinct FolderPath then one Variable per tag. NodeId for the variable is
  "<FolderPath>.<Name>" matching the MXAccess ref so the future Galaxy
  SubscribeBulk wiring can address them directly.
- OpcUaPublishActor.RebuildAddressSpace: invokes MaterialiseGalaxyTags after
  MaterialiseHierarchy. _lastApplied initialiser updated for the new ctor.
- seed-clusters.sql: pre-existing TestMachine_001.TestAlarm001..003 rows
  needed no change — the composer/applier now picks them up automatically.

Verified end-to-end via docker-dev: deploy click → driver-a logs
"Phase7Applier: Galaxy tags materialised (tags=3, folders=1)" → OPC UA Client
CLI browses the three Variable nodes under TestMachine_001 folder. Reads
return BadWaitingForInitialData status (expected — Galaxy driver's
SubscribeBulk wiring to push values into the nodes is the remaining
follow-up).
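
The filter and naming rule described in the list above can be sketched as follows. This is a hypothetical Python model rather than the C# composer: the `Tag` shape, the non-Galaxy namespace kind "Equipment", the equipment id, and the sample rows are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    name: str
    folder_path: str
    namespace_kind: str          # "SystemPlatform" marks a Galaxy namespace
    equipment_id: Optional[str]  # None models EquipmentId=NULL


def is_galaxy_tag(tag):
    """Apply the SystemPlatform + EquipmentId=NULL filter from the commit."""
    return tag.namespace_kind == "SystemPlatform" and tag.equipment_id is None


def galaxy_refs(tags):
    """Build the 'FolderPath.Name' reference for each Galaxy tag."""
    return [f"{t.folder_path}.{t.name}" for t in tags if is_galaxy_tag(t)]


def galaxy_folders(tags):
    """Return the distinct folders that would be ensured before the variables."""
    return sorted({t.folder_path for t in tags if is_galaxy_tag(t)})


tags = [
    Tag("TestAlarm001", "TestMachine_001", "SystemPlatform", None),
    Tag("TestAlarm002", "TestMachine_001", "SystemPlatform", None),
    Tag("TestAlarm003", "TestMachine_001", "SystemPlatform", None),
    Tag("Speed", "Line1", "Equipment", "eq-1"),  # excluded: has an equipment id
]

refs = galaxy_refs(tags)
folders = galaxy_folders(tags)
print(refs, folders)
```

With these sample rows the model yields three references and one folder, matching the "tags=3, folders=1" log line quoted above.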
2026-05-26 15:43:22 -04:00
Joseph Doherty 44b8a9c7ff fix(deploy): ClusterNode NodeId uses host:port + Traefik sticky cookie
v2-ci / build (push) Failing after 41s
Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:

- ConfigPublishCoordinator computes expected-ack NodeIds from
  Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
  match ClusterRoleInfo's NodeId derivation. The seed had been using the
  bare service name ("driver-a"), so NodeDeploymentState INSERT hit FK
  violation 547 on NodeDeploymentState.NodeId → ClusterNode.NodeId. Seed
  now writes the full host:port form for every ClusterNode row.

- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
  Without sticky sessions, Traefik round-robins admin-a/admin-b and the
  WebSocket upgrade lands on the wrong backend, returning "No Connection
  with that ID: Status code '404'" so @onclick handlers never fire on the
  client. Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three
  Traefik service loadBalancers so each session pins to one node.
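
The NodeId mismatch in the first item above can be reproduced with a few lines. This Python sketch only models the string comparison; the helper names are hypothetical, and the member list is an assumed example based on the "driver-a:4053" form quoted in the message.

```python
def expected_node_ids(members):
    """Build '{host}:{port}' ids from (host, port) pairs."""
    return {f"{host}:{port}" for host, port in members}


def missing_from_seed(expected, seeded):
    """Ids the coordinator expects that have no matching ClusterNode row."""
    return sorted(expected - seeded)


members = [("driver-a", 4053), ("driver-b", 4053)]
expected = expected_node_ids(members)

old_seed = {"driver-a", "driver-b"}            # bare service names
new_seed = {"driver-a:4053", "driver-b:4053"}  # full host:port form

print(missing_from_seed(expected, old_seed))
print(missing_from_seed(expected, new_seed))
```

Every id reported as missing corresponds to an insert that would reference a non-existent parent row, which is how the foreign-key violation surfaces.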

Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
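
The sticky-session issue above can be illustrated with a toy load balancer. This Python sketch is a simplified model, not Traefik or SignalR: the `Backend` class, the two-request session, and the use of 101 for a successful upgrade are assumptions made for illustration.

```python
import itertools


class Backend:
    """A node that only recognises connection ids it issued itself."""

    def __init__(self, name):
        self.name = name
        self.connections = set()

    def negotiate(self, conn_id):
        self.connections.add(conn_id)

    def upgrade(self, conn_id):
        # 101 = Switching Protocols on success; 404 when the id is unknown here.
        return 101 if conn_id in self.connections else 404


def run_session(pick_backend, conn_id):
    """Send the first request to one pick and the upgrade to the next pick."""
    pick_backend().negotiate(conn_id)
    return pick_backend().upgrade(conn_id)


backends = [Backend("admin-a"), Backend("admin-b")]

# Round-robin: consecutive requests alternate between backends.
rotation = itertools.cycle(backends)
round_robin_status = run_session(lambda: next(rotation), "conn-1")

# Sticky cookie: every request in the session goes to the same backend.
pinned = backends[0]
sticky_status = run_session(lambda: pinned, "conn-2")

print(round_robin_status, sticky_status)
```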
2026-05-26 15:10:11 -04:00
Joseph Doherty 60beb9128e feat(deploy,runtime): wire mxaccessgw connection — endpoint, key, seed row
v2-ci / build (push) Failing after 37s
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows
— only the gateway worker has that constraint. This wires the Galaxy
driver into the docker-dev fleet:

- docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service
  (admin nodes harmlessly ignore it; driver-role nodes pick it up when
  the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY).
  Default value matches the key the operator provided; override via shell
  env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without
  editing compose.
- seed-clusters.sql: now creates a SystemPlatform Namespace
  (MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway
  DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at
  http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS.
- DriverInstanceActor.ShouldStub: clarified the doc comment — only the
  legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only;
  the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an
  external gateway) and is NOT stubbed.
- README: documents the final operator step — sign in, click "Deploy
  current configuration" on /deployments to materialise the seeded
  Galaxy driver into a running gRPC connection. Raw DriverInstance rows
  don't spawn drivers on their own; the v2 lifecycle requires a sealed
  Deployment first.
2026-05-26 14:58:02 -04:00
Joseph Doherty ed1c17bc7b fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
v2-ci / build (push) Failing after 32s
Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (CREATE PROCEDURE must be the first
  statement in its batch), so sqlcmd fails with "Must declare the scalar
  variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
2026-05-26 14:37:01 -04:00
Joseph Doherty f02071c9a2 feat(deploy): bake the ServerCluster/ClusterNode seed into docker-compose
Adds a one-shot cluster-seed service to docker-dev/docker-compose.yml
that pre-populates the three Akka clusters' scope rows in the shared
OtOpcUa ConfigDb so operators don't have to click through /clusters +
/hosts on every fresh bring-up.

Seed contents:
  ServerCluster   MAIN (Warm/2), SITE-A (Warm/2), SITE-B (Warm/2)
  ClusterNode     driver-a + driver-b  → MAIN
                  site-a-1 + site-a-2  → SITE-A
                  site-b-1 + site-b-2  → SITE-B

NodeCount + RedundancyMode honour the CK_ServerCluster check constraint.
ApplicationUri follows the urn:OtOpcUa:<NodeId> convention; uniqueness
across the fleet satisfies UX_ClusterNode_ApplicationUri.

Mechanism:
  - docker-dev/seed/seed-clusters.sql — idempotent INSERTs (IF NOT EXISTS
    guards on every row).
  - docker-dev/seed/entrypoint.sh — bash wrapper that waits for SQL to
    accept connections, then polls until dbo.ServerCluster exists (the
    host containers' EF auto-migration creates it on first boot), then
    applies the SQL script.
  - cluster-seed service uses mcr.microsoft.com/mssql-tools as the base
    image (bash + sqlcmd available), restart: "no" so it runs once.

Re-running `docker compose up` is safe: the seed exits cleanly on the
second run because every INSERT is guarded.

Manual re-seed: `docker compose run --rm cluster-seed`.
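
The wait-then-apply flow of the entrypoint described above can be sketched as a generic polling loop. This is a Python illustration, not the bash wrapper itself: the probe callables, timeout values, and return strings are hypothetical stand-ins for the real sqlcmd checks.

```python
import time


def wait_until(probe, timeout_s=60.0, interval_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def run_seed(sql_accepts_connections, table_exists, apply_script, **wait_kwargs):
    """Wait for the server, then for the schema, then apply the seed script."""
    if not wait_until(sql_accepts_connections, **wait_kwargs):
        return "sql-timeout"
    if not wait_until(table_exists, **wait_kwargs):
        return "schema-timeout"
    apply_script()
    return "seeded"


# Example with fake probes: the table appears on the third poll.
polls = {"count": 0}


def fake_table_exists():
    polls["count"] += 1
    return polls["count"] >= 3


result = run_seed(lambda: True, fake_table_exists, lambda: None,
                  sleep=lambda _seconds: None)
print(result, polls["count"])
```

Because the script itself is guarded row by row, running this flow a second time would apply the same statements without changing any data.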
2026-05-26 14:06:47 -04:00
Joseph Doherty 993e012e55 fix(deploy): site clusters share the single OtOpcUa ConfigDb
The previous commit (961e094) gave each site cluster its own database
(OtOpcUa_SiteA / OtOpcUa_SiteB). That fights the architecture — ConfigDb
is multi-tenant by design: one schema with a ServerCluster table whose
rows scope the rest of the configuration via ClusterId. Per-cluster
databases would split the schema and force every singleton/coordinator
to point at a different connection string.

Correct model: one ConfigDb, three ServerCluster rows (MAIN / SITE-A /
SITE-B), each Akka cluster's ClusterNode rows pointing back at the
matching ClusterId. Akka mesh isolation is still enforced by the
disjoint seed-node lists (unchanged from the previous commit).

Compose: all eight host nodes now point at Server=sql,1433;Database=OtOpcUa
and the README documents the post-boot ServerCluster + ClusterNode rows
operators need to create via /clusters and /hosts before the runtime can
resolve its scope.
2026-05-26 14:02:24 -04:00
Joseph Doherty 961e09430a feat(deploy): add site-a + site-b 2-node clusters to docker-dev
Extends the docker-dev compose with two additional, fully-isolated Akka
clusters representing distinct sites. Each site is a 2-node fused
admin+driver cluster (OTOPCUA_ROLES=admin,driver on both nodes), backed
by its own ConfigDb database so configuration state stays separate from
the main cluster and from the other site.

Cluster isolation: the three meshes share the same Akka system name
"otopcua" and remoting port 4053 (inside each container's own network
namespace), but their seed-node lists are disjoint — main seeds at
admin-a, site-a seeds at site-a-1, site-b seeds at site-b-1 — so gossip
doesn't cross between them.

Layout:
  Main cluster   ConfigDb=OtOpcUa        admin-a, admin-b, driver-a, driver-b
  Site A         ConfigDb=OtOpcUa_SiteA  site-a-1, site-a-2 (fused admin+driver)
  Site B         ConfigDb=OtOpcUa_SiteB  site-b-1, site-b-2 (fused admin+driver)

OPC UA endpoints exposed on host ports 4840-4845. Admin UIs reachable
through Traefik via Host-header routing:
  http://localhost               → main cluster (PathPrefix default)
  http://site-a.localhost        → site A
  http://site-b.localhost        → site B

`*.localhost` auto-resolves on macOS; Linux users add the two hosts to
/etc/hosts (or rely on the resolver's RFC 6761 behaviour).
2026-05-26 13:59:23 -04:00
Joseph Doherty 7e3b56c27d feat(deploy): Traefik active-leader routing + docker-dev compose (Task 63)
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
  config. One :80 entry point, one router on HostRegexp(otopcua.*), one
  service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
  check (interval 5s, timeout 2s, expected 200). Followers return 503 from
  /health/active so Traefik drops them within the next interval after a
  leadership change.

- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
  yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
  restart-on-failure. Companion to Install-Services.ps1.

- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
  Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
  SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
  Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
  four hosts. README walks through bring-up + failover smoke + the dev LDAP
  users.

Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).
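
The active-leader routing in the first item above relies on the health check alone: only a backend answering 200 on /health/active stays in rotation. The Python sketch below models that selection; it is not Traefik configuration, and the status snapshots are assumed examples.

```python
def routable_backends(health):
    """Return the backends whose /health/active probe returned 200."""
    return sorted(name for name, status in health.items() if status == 200)


# admin-a is the leader; the follower answers 503.
before_failover = {"admin-a:9000": 200, "admin-b:9000": 503}

# Leadership moves to admin-b; the next probe interval reflects the change.
after_failover = {"admin-a:9000": 503, "admin-b:9000": 200}

print(routable_backends(before_failover))
print(routable_backends(after_failover))
```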
2026-05-26 06:46:40 -04:00