44b8a9c7ff746909b2f5cc72b0bd8f6ccbec370a
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
44b8a9c7ff |
fix(deploy): ClusterNode NodeId uses host:port + Traefik sticky cookie
Some checks failed
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:
- ConfigPublishCoordinator computes expected-ack NodeIds from
Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
match ClusterRoleInfo's NodeId derivation. The seed had been using the
bare service name ("driver-a"), so NodeDeploymentState INSERT hit FK
violation 547 on NodeDeploymentState.NodeId → ClusterNode.NodeId. Seed
now writes the full host:port form for every ClusterNode row.
- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
Without sticky sessions, Traefik round-robins admin-a/admin-b and the
WebSocket upgrade lands on the wrong backend, returning "No Connection
with that ID: Status code '404'" so @onclick handlers never fire on the
client. Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three
Traefik service loadBalancers so each session pins to one node.
Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
|
||
|
|
60beb9128e |
feat(deploy,runtime): wire mxaccessgw connection — endpoint, key, seed row
Some checks failed
v2-ci / build (push) Failing after 37s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows — only the gateway worker has that constraint. This wires the Galaxy driver into the docker-dev fleet: - docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service (admin nodes harmlessly ignore it; driver-role nodes pick it up when the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY). Default value matches the key the operator provided; override via shell env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without editing compose. - seed-clusters.sql: now creates a SystemPlatform Namespace (MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS. - DriverInstanceActor.ShouldStub: clarified the doc comment — only the legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only; the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an external gateway) and is NOT stubbed. - README: documents the final operator step — sign in, click "Deploy current configuration" on /deployments to materialise the seeded Galaxy driver into a running gRPC connection. Raw DriverInstance rows don't spawn drivers on their own; the v2 lifecycle requires a sealed Deployment first. |
||
|
|
ed1c17bc7b |
fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
Some checks failed
v2-ci / build (push) Failing after 32s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two fixes surfaced while bringing up the docker-dev stack end-to-end:
- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
/health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
every probe and Traefik marks every backend unhealthy → all three cluster
routes return 503.
- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
the schema and applies row inserts. Migrations remain the operator's job:
dotnet ef database update --project src/Core/.../Configuration \
--startup-project src/Server/.../Host
Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).
Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
|
||
|
|
f02071c9a2 |
feat(deploy): bake the ServerCluster/ClusterNode seed into docker-compose
Adds a one-shot cluster-seed service to docker-dev/docker-compose.yml
that pre-populates the three Akka clusters' scope rows in the shared
OtOpcUa ConfigDb so operators don't have to click through /clusters +
/hosts on every fresh bring-up.
Seed contents:
ServerCluster MAIN (Warm/2), SITE-A (Warm/2), SITE-B (Warm/2)
ClusterNode driver-a + driver-b → MAIN
site-a-1 + site-a-2 → SITE-A
site-b-1 + site-b-2 → SITE-B
NodeCount + RedundancyMode honour the CK_ServerCluster check constraint.
ApplicationUri follows the urn:OtOpcUa:<NodeId> convention; uniqueness
across the fleet satisfies UX_ClusterNode_ApplicationUri.
Mechanism:
- docker-dev/seed/seed-clusters.sql — idempotent INSERTs (IF NOT EXISTS
guards on every row).
- docker-dev/seed/entrypoint.sh — bash wrapper that waits for SQL to
accept connections, then polls until dbo.ServerCluster exists (the
host containers' EF auto-migration creates it on first boot), then
applies the SQL script.
- cluster-seed service uses mcr.microsoft.com/mssql-tools as the base
image (bash + sqlcmd available), restart: "no" so it runs once.
Re-running `docker compose up` is safe: the seed exits cleanly on the
second run because every INSERT is guarded.
Manual re-seed: `docker compose run --rm cluster-seed`.
|