Commit Graph

8 Commits

Author SHA1 Message Date
Joseph Doherty
9c6e3c2e56 feat: add CLI debug snapshot command for one-shot instance state inspection
Adds `debug snapshot --id <int>` to query a running instance's current
attribute values and alarm states without the subscribe/stream overhead
of the debug view. Routes through ManagementActor → CommunicationService
→ site DeploymentManager → InstanceActor using the existing remote query
pattern.
2026-03-18 07:16:22 -04:00
Joseph Doherty
4f22ca2b1f feat: replace ActorSelection with ClusterClient for inter-cluster communication
Central and site clusters now communicate via ClusterClient/
ClusterClientReceptionist instead of direct ActorSelection. Both
CentralCommunicationActor and SiteCommunicationActor are registered
with their cluster's receptionist. Central creates one ClusterClient
per site using NodeA/NodeB contact points from the DB. Sites configure
multiple CentralContactPoints for automatic failover between central
nodes. ISiteClientFactory enables test injection.
2026-03-18 00:08:47 -04:00
Joseph Doherty
9e97c1acd2 feat: replace site registration with database-driven site addressing
Central now resolves site Akka remoting addresses from the Sites DB table
(NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite
messages. Eliminates the race condition where sites starting before central
had their registration dead-lettered. Addresses are cached in
CentralCommunicationActor with 60s periodic refresh and on-demand refresh
when sites are added/edited/deleted via UI or CLI.
2026-03-17 23:13:10 -04:00
Joseph Doherty
50dad61e72 Add Management Service and CLI components (design docs)
New components 18-19: ManagementService (Akka.NET actor on Central exposing
all admin operations via ClusterClientReceptionist) and CLI (console app using
ClusterClient for scripting). Updated HighLevelReqs, CLAUDE.md, README,
Component-Host, Component-Communication, Component-Security.
2026-03-17 14:28:02 -04:00
Joseph Doherty
2f3e0ceecb feat: include data connections and SMTP in artifact deployment 2026-03-17 13:48:52 -04:00
Joseph Doherty
34694adba2 Apply Codex review findings across all 17 components
Template Engine: add composed member addressing (path-qualified canonical names),
override granularity per entity type, semantic validation (call targets, arg types),
graph acyclicity enforcement, revision hashes for flattened configs.

Deployment Manager: add deployment ID + idempotency, per-instance operation lock
covering all mutating commands, state transition matrix, site-side apply atomicity
(all-or-nothing), artifact version compatibility policy.

Site Runtime: add script trust model (forbidden APIs, execution timeout, constrained
compilation), concurrency/serialization rules (Instance Actor serializes mutations),
site-wide stream backpressure (per-subscriber buffering, fire-and-forget publish).

Communication: add application-level correlation IDs for protocol safety beyond
Akka.NET transport guarantees.

External System Gateway: add 408/429 as transient errors, CachedCall idempotency
note, dedicated dispatcher for blocking I/O isolation.

Health Monitoring: add monotonic sequence numbers to prevent stale report overwrites.

Security: require LDAPS/StartTLS for LDAP connections.

Central UI: add failover behavior (SignalR reconnect, JWT survives, shared Data
Protection keys, load balancer readiness).

Cluster Infrastructure: add down-if-alone=on for safe singleton ownership.

Site Event Logging: clarify active-node-only logging (no replication), add 1GB
storage cap with oldest-first purge.

Host: add readiness gating (health check endpoint, no traffic until operational).

Commons: add message contract versioning policy (additive-only evolution).

Configuration Database: add optimistic concurrency on deployment status records.
2026-03-16 09:06:12 -04:00
Joseph Doherty
bd735de8c4 Refine Communication Layer: timeouts, transport config, ordering, failure behavior
Add per-pattern message timeouts with sensible defaults (120s for deployments, 30s
for queries/commands). Configure Akka.NET transport heartbeat explicitly rather than
relying on framework defaults. Document per-site message ordering guarantee. Specify
that in-flight messages on disconnect result in timeout error (no central buffering)
and debug streams die on any disconnect.
2026-03-16 08:04:06 -04:00
Joseph Doherty
1944f94fed Initial design docs from claude.ai refinement sessions 2026-03-16 07:39:26 -04:00