Commit Graph

10 Commits

Author SHA1 Message Date
Joseph Doherty
7a0bd0f701 Create implementation plan generation framework
generate_plans.md: Master plan defining 10 phases (0, 1, 2, 3A, 3B, 3C, 4-8)
with component assignments, sub-tasks, testable outcomes, and HighLevelReqs
coverage. Phase 3 split into 3A (runtime foundation + failover), 3B (site I/O
+ observability), 3C (deployment pipeline + S&F) per Codex review. Failover
testing embedded in runtime phases, not deferred to hardening.

requirements-traceability.md: Full matrix mapping all 54 HighLevelReqs sections
and 22 REQ-* identifiers to implementation phases. Zero unmapped requirements.

questions.md: 15 open questions requiring follow-up before/during implementation
(tooling, environments, team, integration targets).
2026-03-16 09:59:23 -04:00
Joseph Doherty
a540912782 Refine Notification Service: SMTP config, OAuth2, delivery behavior, error handling
Expand SMTP configuration with OAuth2 Client Credentials support for Microsoft 365,
connection timeout, and max concurrent connections. Single email per send with all
recipients in BCC. Plain text only. Classify SMTP errors: transient (4xx/connection)
to S&F, permanent (5xx) returned to script. No app-level rate limiting.
2026-03-16 08:38:38 -04:00
Joseph Doherty
cd03b77913 Refine Inbound API: HTTP contract, extended types, logging, rate limiting
Define POST /api/{methodName} URL structure with X-API-Key header. Flat JSON
request/response with no envelope wrapper. Add extended type system (Object, List)
for complex API parameters and return values, applied to both Inbound API and
External System Gateway method definitions. Only failures logged; no rate limiting
in this controlled industrial environment.
2026-03-16 08:26:04 -04:00
Joseph Doherty
cbc78465e0 Refine Security & Auth: LDAP bind, JWT sessions, idle timeout, failure handling
Replace Windows Integrated Auth with direct LDAP bind (username/password login form).
Add JWT-based sessions with HMAC-SHA256 shared key for load balancer compatibility.
15-minute token refresh re-queries LDAP for current group memberships. 30-minute
configurable idle timeout. LDAP failure: new logins fail, active sessions continue
with current roles until LDAP recovers.
2026-03-16 08:16:29 -04:00
Joseph Doherty
57eae0c1db Refine Health Monitoring: timing defaults, offline detection, error rate calculation
Set 30-second report interval with 60-second absolute timeout for offline detection.
Define error rates as raw counts per interval (reset after each report). Script errors
include all failure types. Automatic online recovery on first received report. Flat
snapshot report structure.
2026-03-16 08:10:16 -04:00
Joseph Doherty
3dd62adf42 Refine Cluster Infrastructure: split-brain, seed nodes, failure detection, dual recovery
Add keep-oldest split-brain resolver with 15s stable-after duration. Configure both
nodes as seed nodes for symmetric startup. Set moderate failure detection defaults
(2s heartbeat, 10s threshold, ~25s total failover). Document automatic dual-node
recovery from persistent storage with no manual intervention.
2026-03-16 08:07:28 -04:00
Joseph Doherty
bd735de8c4 Refine Communication Layer: timeouts, transport config, ordering, failure behavior
Add per-pattern message timeouts with sensible defaults (120s for deployments, 30s
for queries/commands). Configure Akka.NET transport heartbeat explicitly rather than
relying on framework defaults. Document per-site message ordering guarantee. Specify
that in-flight messages on disconnect result in timeout error (no central buffering)
and debug streams die on any disconnect.
2026-03-16 08:04:06 -04:00
Joseph Doherty
1ef316f32c Add dual call modes for external systems: synchronous Call() and cached CachedCall()
Scripts now choose per invocation whether an external system call is synchronous
(all failures return to script) or cached (transient failures go to store-and-forward).
Mirrors the existing Database.Connection/CachedWrite pattern. Updated ESG, Site
Runtime script API, high-level requirements, and design doc.
2026-03-16 08:00:20 -04:00
Joseph Doherty
5fff1712a8 Refine External System Gateway: protocol, auth, timeouts, error classification
Specify HTTP/REST with JSON as the invocation protocol. Add API key and Basic Auth
as outbound authentication modes. Add per-system call timeouts. Classify errors by
HTTP status for store-and-forward decisions (5xx/transient → retry, 4xx → permanent
error to script). Document ADO.NET connection pooling for database connections.
Update Store-and-Forward to clarify transient-only buffering.
2026-03-16 07:57:00 -04:00
Joseph Doherty
19c7e6880f Refine Data Connection Layer: error handling, reconnection, write failures, health reporting
Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on
disconnect, transparent re-subscribe), synchronous write failure errors to scripts,
periodic tag path resolution retry, and enhanced health reporting with tag resolution
counts. Update cross-references in Health Monitoring and Site Runtime.
2026-03-16 07:51:37 -04:00