scadalink-design/docs/requirements/Component-SiteEventLogging.md

Component: Site Event Logging

Purpose

The Site Event Logging component records operational events at each site cluster, providing a local audit trail of runtime activity. Events are queryable from the central UI for remote troubleshooting.

Location

Site clusters (event recording and storage). Central cluster (remote query access via UI).

Responsibilities

  • Record operational events from all site subsystems.
  • Persist events to local SQLite.
  • Enforce a 30-day retention policy and a configurable storage cap, with automatic purging.
  • Respond to remote queries from central for event log data.

Events Logged

| Category | Events |
|---|---|
| Script Executions | Script started, completed, failed (with error details), recursion limit exceeded |
| Alarm Events | Alarm activated, alarm cleared (which alarm, which instance), alarm evaluation error |
| Deployment Events | Configuration received from central, scripts compiled, applied successfully, apply failed |
| Data Connection Status | Connected, disconnected, reconnected (per connection) |
| Store-and-Forward | Message queued, delivered, retried, parked |
| Instance Lifecycle | Instance enabled, disabled, deleted |

Event Entry Schema

Each event entry contains:

  • Timestamp: When the event occurred.
  • Event Type: Category of the event (script, alarm, deployment, connection, store-and-forward, instance-lifecycle).
  • Severity: Info, Warning, or Error.
  • Instance ID (optional): The instance associated with the event (if applicable).
  • Source: The subsystem that generated the event (e.g., "ScriptActor:MonitorSpeed", "AlarmActor:OverTemp", "DataConnection:PLC1").
  • Message: Human-readable description of the event.
  • Details (optional): Additional structured data (e.g., exception stack trace, alarm name, message ID, compilation errors).
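
The entry fields above could map onto a SQLite table roughly as follows. This is a minimal sketch, not the component's actual schema: the table name `site_events`, the column names, the `SiteEvent` class, the `record_event` helper, the ISO 8601 text timestamps, and the JSON encoding of Details are all assumptions made for illustration.

```python
import json
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical physical schema; the requirements only list the logical fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS site_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,  -- ISO 8601 in UTC (assumed encoding)
    event_type  TEXT NOT NULL CHECK (event_type IN (
                    'script', 'alarm', 'deployment', 'connection',
                    'store-and-forward', 'instance-lifecycle')),
    severity    TEXT NOT NULL CHECK (severity IN ('Info', 'Warning', 'Error')),
    instance_id TEXT,           -- optional
    source      TEXT NOT NULL,
    message     TEXT NOT NULL,
    details     TEXT            -- optional, JSON-encoded (assumed encoding)
);
CREATE INDEX IF NOT EXISTS ix_site_events_timestamp ON site_events (timestamp);
"""


@dataclass(frozen=True)
class SiteEvent:
    timestamp: datetime
    event_type: str
    severity: str
    source: str
    message: str
    instance_id: Optional[str] = None
    details: Optional[dict] = None


def record_event(conn: sqlite3.Connection, event: SiteEvent) -> int:
    """Insert one event and return its row id."""
    cur = conn.execute(
        "INSERT INTO site_events "
        "(timestamp, event_type, severity, instance_id, source, message, details) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            event.timestamp.astimezone(timezone.utc).isoformat(),
            event.event_type,
            event.severity,
            event.instance_id,
            event.source,
            event.message,
            json.dumps(event.details) if event.details is not None else None,
        ),
    )
    conn.commit()
    return cur.lastrowid


# Usage: an in-memory database stands in for the site node's local file.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
row_id = record_event(conn, SiteEvent(
    timestamp=datetime(2026, 3, 21, 5, 11, 35, tzinfo=timezone.utc),
    event_type="alarm",
    severity="Warning",
    source="AlarmActor:OverTemp",
    message="Alarm activated",
    instance_id="instance-1",
    details={"alarm": "OverTemp"},
))
```

The CHECK constraints reject event types and severities outside the documented sets at write time, which is one way to keep the stored values filterable from central.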

Storage

  • Events are stored in local SQLite on each site node.
  • Each node maintains its own event log. Only the active node generates and stores events. Event logs are not replicated to the standby node. On failover, the new active node starts logging to its own SQLite database; historical events from the previous active node are no longer queryable via central until that node comes back online. This is acceptable because event logs are diagnostic, not transactional.
  • Retention: 30 days. A daily background job runs on the active node and deletes all events older than 30 days. Deletion is permanent; purged events are not archived.
  • Storage cap: A configurable maximum database size (default: 1 GB) is enforced. If the database reaches the cap before events age out under the 30-day retention policy, the oldest events are purged first. This prevents disk exhaustion from alarm storms, script failure loops, or connection flapping.
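
The two purge rules above might be implemented along these lines. This is a sketch under stated assumptions: the `site_events` table, its `id` and `timestamp` columns (ISO 8601 text in UTC), the function names, and the batch size are hypothetical, and measuring size through SQLite's page-count pragmas is one possible approach rather than a documented design decision.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)
DEFAULT_MAX_BYTES = 1024 ** 3  # 1 GB default cap


def used_bytes(conn: sqlite3.Connection) -> int:
    """Bytes held by in-use pages. Pages freed by DELETE stay on the
    freelist until a VACUUM, so they are subtracted here."""
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return (page_count - free_pages) * page_size


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Hard-delete events older than the retention window."""
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM site_events WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def enforce_storage_cap(conn: sqlite3.Connection,
                        max_bytes: int = DEFAULT_MAX_BYTES,
                        batch: int = 1000) -> int:
    """Delete the oldest events in batches until the database fits the cap."""
    deleted = 0
    while used_bytes(conn) > max_bytes:
        cur = conn.execute(
            "DELETE FROM site_events WHERE id IN "
            "(SELECT id FROM site_events ORDER BY timestamp, id LIMIT ?)",
            (batch,),
        )
        conn.commit()
        if cur.rowcount == 0:  # nothing left to delete
            break
        deleted += cur.rowcount
    return deleted


# Usage with a reduced, hypothetical table and four events of varying age.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, timestamp TEXT NOT NULL, message TEXT NOT NULL)"
)
now = datetime(2026, 3, 21, tzinfo=timezone.utc)
for age_days in (45, 31, 29, 1):
    conn.execute(
        "INSERT INTO site_events (timestamp, message) VALUES (?, ?)",
        ((now - timedelta(days=age_days)).isoformat(), f"{age_days} days old"),
    )
conn.commit()
expired = purge_expired(conn, now)
```

Comparing timestamps as strings only works if every row uses the same ISO 8601 format and time zone, which is why the sketch normalizes to UTC.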

Central Access

  • The central UI can query site event logs remotely via the Communication Layer.
  • Queries support filtering by:
    • Event type / category
    • Time range
    • Instance ID
    • Severity
    • Keyword search: Free-text search on message and source fields (SQLite LIKE query). Useful for finding events by script name, alarm name, or error message across all instances.
  • Results are paginated with a configurable page size (default: 500 events). Each response includes a continuation token for fetching additional pages. This prevents broad queries from overwhelming the communication channel.
  • The site processes the query locally against SQLite and returns matching results to central.
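
A site-side query handler covering the filters and pagination described above could look like the following sketch. The `site_events` table and its columns, the `query_events` function, and the token format (the last returned row id as a string) are assumptions for illustration; the requirements specify only that a continuation token exists, not how it is encoded.

```python
import sqlite3

DEFAULT_PAGE_SIZE = 500


def _escape_like(text):
    """Escape LIKE wildcards so the keyword is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def query_events(conn, *, event_type=None, severity=None, instance_id=None,
                 start=None, end=None, keyword=None,
                 page_size=DEFAULT_PAGE_SIZE, continuation=None):
    """Return (rows, continuation_token); the token is None on the last page."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from caller input.
    for column, value in (("event_type", event_type),
                          ("severity", severity),
                          ("instance_id", instance_id)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    if start is not None:
        clauses.append("timestamp >= ?")
        params.append(start)
    if end is not None:
        clauses.append("timestamp < ?")
        params.append(end)
    if keyword:
        pattern = f"%{_escape_like(keyword)}%"
        clauses.append("(message LIKE ? ESCAPE '\\' OR source LIKE ? ESCAPE '\\')")
        params += [pattern, pattern]
    if continuation is not None:
        clauses.append("id > ?")  # keyset pagination on the row id
        params.append(int(continuation))
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    # Fetch one extra row to learn whether another page exists.
    rows = conn.execute(
        "SELECT id, timestamp, event_type, severity, instance_id, source, message "
        f"FROM site_events {where} ORDER BY id LIMIT ?",
        (*params, page_size + 1),
    ).fetchall()
    page = rows[:page_size]
    token = str(page[-1][0]) if len(rows) > page_size else None
    return page, token


# Usage: five script failures, fetched two per page.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_events (id INTEGER PRIMARY KEY AUTOINCREMENT, timestamp TEXT, "
    "event_type TEXT, severity TEXT, instance_id TEXT, source TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO site_events (timestamp, event_type, severity, instance_id, source, message) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(f"2026-03-21T05:0{i}:00+00:00", "script", "Error", "instance-1",
      "ScriptActor:MonitorSpeed", f"Script failed (attempt {i})") for i in range(5)],
)
pages, token = [], None
while True:
    page, token = query_events(conn, keyword="MonitorSpeed", page_size=2, continuation=token)
    pages.append(page)
    if token is None:
        break
```

Keyset pagination on the row id keeps each page query cheap and stable even if new events are written between requests, which is one reason to prefer it over OFFSET in this sketch.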

Dependencies

  • SQLite: Local storage on each site node.
  • Communication Layer: Handles remote query requests from central.
  • Site Runtime: Generates script execution events, alarm events, deployment application events, and instance lifecycle events.
  • Data Connection Layer: Generates connection status events.
  • Store-and-Forward Engine: Generates buffer activity events.

Interactions

  • All site subsystems: Event logging is a cross-cutting concern — any subsystem that produces notable events calls the Event Logging service.
  • Communication Layer: Receives remote queries from central and returns results.
  • Central UI: Site Event Log Viewer displays queried events.
  • Health Monitoring: Script error rates and alarm evaluation error rates can be derived from event log data.
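
As an illustration of the Health Monitoring interaction, an error rate could be derived by counting Error-severity events of one type over a time window. This is a sketch only: the `site_events` table, the `error_rate_per_minute` function, and the choice to identify errors by severity are assumptions, not something the requirements define.

```python
import sqlite3
from datetime import datetime, timezone


def error_rate_per_minute(conn, event_type, start, end):
    """Error-severity events of one type per minute in [start, end)."""
    minutes = (end - start).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("end must be later than start")
    count = conn.execute(
        "SELECT COUNT(*) FROM site_events "
        "WHERE event_type = ? AND severity = 'Error' "
        "AND timestamp >= ? AND timestamp < ?",
        (event_type, start.isoformat(), end.isoformat()),
    ).fetchone()[0]
    return count / minutes


# Usage: two script errors and one informational event in a 4-minute window.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp TEXT, event_type TEXT, severity TEXT)"
)
window_start = datetime(2026, 3, 21, 5, 0, tzinfo=timezone.utc)
window_end = datetime(2026, 3, 21, 5, 4, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO site_events (timestamp, event_type, severity) VALUES (?, ?, ?)",
    [
        (datetime(2026, 3, 21, 5, 1, tzinfo=timezone.utc).isoformat(), "script", "Error"),
        (datetime(2026, 3, 21, 5, 2, tzinfo=timezone.utc).isoformat(), "script", "Info"),
        (datetime(2026, 3, 21, 5, 3, tzinfo=timezone.utc).isoformat(), "script", "Error"),
    ],
)
rate = error_rate_per_minute(conn, "script", window_start, window_end)
```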