# Component: Site Event Logging

## Purpose

The Site Event Logging component records operational events at each site cluster, providing a local audit trail of runtime activity. Events are queryable from the central UI for remote troubleshooting.

## Location

Site clusters (event recording and storage). Central cluster (remote query access via UI).

## Responsibilities

- Record operational events from all site subsystems.
- Persist events to local SQLite.
- Enforce 30-day retention policy with automatic purging.
- Respond to remote queries from central for event log data.

## Events Logged

| Category | Events |
|----------|--------|
| Script Executions | Script started, completed, failed (with error details), recursion limit exceeded |
| Alarm Events | Alarm activated, alarm cleared (which alarm, which instance), alarm evaluation error |
| Deployment Events | Configuration received from central, scripts compiled, applied successfully, apply failed |
| Data Connection Status | Connected, disconnected, reconnected (per connection) |
| Store-and-Forward | Message queued, delivered, retried, parked |
| Instance Lifecycle | Instance enabled, disabled, deleted |

## Event Entry Schema

Each event entry contains:

- **Timestamp**: When the event occurred.
- **Event Type**: Category of the event (script, alarm, deployment, connection, store-and-forward, instance-lifecycle).
- **Severity**: Info, Warning, or Error.
- **Instance ID** *(optional)*: The instance associated with the event (if applicable).
- **Source**: The subsystem that generated the event (e.g., "ScriptActor:MonitorSpeed", "AlarmActor:OverTemp", "DataConnection:PLC1").
- **Message**: Human-readable description of the event.
- **Details** *(optional)*: Additional structured data (e.g., exception stack trace, alarm name, message ID, compilation errors).

## Storage

- Events are stored in **local SQLite** on each site node.
- Each node maintains its own event log. Only the **active node** generates and stores events.
  Event logs are **not replicated** to the standby node. On failover, the new active node starts logging to its own SQLite database; historical events from the previous active node are no longer queryable via central until that node comes back online. This is acceptable because event logs are diagnostic, not transactional.
- **Retention**: 30 days. A **daily background job** runs on the active node and deletes all events older than 30 days. This is a hard delete with no archival.
- **Storage cap**: A configurable maximum database size (default: 1 GB) is enforced. If the storage cap is reached before the 30-day retention window, the oldest events are purged first. This prevents disk exhaustion from alarm storms, script failure loops, or connection flapping.

## Central Access

- The central UI can query site event logs remotely via the Communication Layer.
- Queries support filtering by:
  - Event type / category
  - Time range
  - Instance ID
  - Severity
- **Keyword search**: Free-text search on message and source fields (SQLite LIKE query). Useful for finding events by script name, alarm name, or error message across all instances.
- Results are **paginated** with a configurable page size (default: 500 events). Each response includes a continuation token for fetching additional pages. This prevents broad queries from overwhelming the communication channel.
- The site processes the query locally against SQLite and returns matching results to central.

## Dependencies

- **SQLite**: Local storage on each site node.
- **Communication Layer**: Handles remote query requests from central.
- **Site Runtime**: Generates script execution events, alarm events, deployment application events, and instance lifecycle events.
- **Data Connection Layer**: Generates connection status events.
- **Store-and-Forward Engine**: Generates buffer activity events.
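
The entry schema, 30-day retention job, and storage cap described under Storage could be sketched roughly as follows, using Python's standard `sqlite3` module. This is an illustrative sketch, not the component's actual implementation: the table name `site_events`, the column names, and the helpers `open_event_log`, `record_event`, `purge_expired`, `used_bytes`, and `enforce_storage_cap` are all assumptions. It also assumes timestamps are stored as UTC ISO 8601 strings of fixed width so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30              # retention window stated in the text
DEFAULT_CAP_BYTES = 1024 ** 3    # default storage cap: 1 GB

# Hypothetical table layout mirroring the Event Entry Schema fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS site_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,   -- UTC, ISO 8601, fixed width
    event_type  TEXT NOT NULL,
    severity    TEXT NOT NULL CHECK (severity IN ('Info', 'Warning', 'Error')),
    instance_id TEXT,            -- optional
    source      TEXT NOT NULL,
    message     TEXT NOT NULL,
    details     TEXT             -- optional structured data
);
CREATE INDEX IF NOT EXISTS ix_site_events_timestamp ON site_events (timestamp);
"""


def open_event_log(path=":memory:"):
    """Open (or create) the node-local event database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, when, event_type, severity, source, message,
                 instance_id=None, details=None):
    """Insert one event; `when` must be a timezone-aware UTC datetime."""
    conn.execute(
        "INSERT INTO site_events (timestamp, event_type, severity, "
        "instance_id, source, message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (when.isoformat(timespec="microseconds"), event_type, severity,
         instance_id, source, message, details),
    )
    conn.commit()


def purge_expired(conn, now):
    """Daily job: hard-delete events older than the retention window."""
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat(timespec="microseconds")
    deleted = conn.execute(
        "DELETE FROM site_events WHERE timestamp < ?", (cutoff,)
    ).rowcount
    conn.commit()
    return deleted


def used_bytes(conn):
    """Bytes held by in-use pages (free-list pages are excluded)."""
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return (page_count - free_pages) * page_size


def enforce_storage_cap(conn, cap_bytes=DEFAULT_CAP_BYTES, batch=100):
    """Delete the oldest events in batches until usage is under the cap."""
    deleted = 0
    while used_bytes(conn) > cap_bytes:
        cur = conn.execute(
            "DELETE FROM site_events WHERE id IN "
            "(SELECT id FROM site_events ORDER BY timestamp, id LIMIT ?)",
            (batch,),
        )
        conn.commit()
        if cur.rowcount == 0:    # table is empty; nothing left to purge
            break
        deleted += cur.rowcount
    return deleted
```

The sketch measures in-use pages rather than file size because deleting rows in SQLite returns pages to the free list for reuse without shrinking the file; whether the real component measures size this way is not stated in this document.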
## Interactions

- **All site subsystems**: Event logging is a cross-cutting concern; any subsystem that produces notable events calls the Event Logging service.
- **Communication Layer**: Receives remote queries from central and returns results.
- **Central UI**: Site Event Log Viewer displays queried events.
- **Health Monitoring**: Script error rates and alarm evaluation error rates can be derived from event log data.
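
The remote query path described under Central Access (filters, LIKE-based keyword search, and pagination with a continuation token) could be sketched as below. This is a minimal illustration under stated assumptions: the `site_events` table layout, the `query_events` function, and the token format (a base64-encoded `[timestamp, id]` pair of the last row returned) are hypothetical and not taken from the component itself. The sketch returns newest events first and escapes LIKE wildcards so keywords match literally; both are design choices of the example, not documented behavior.

```python
import base64
import json
import sqlite3

DEFAULT_PAGE_SIZE = 500  # default page size stated in the text

# Hypothetical table layout; names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS site_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    severity    TEXT NOT NULL,
    instance_id TEXT,
    source      TEXT NOT NULL,
    message     TEXT NOT NULL,
    details     TEXT
);
"""


def _escape_like(term):
    """Escape LIKE wildcards so the keyword is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def query_events(conn, event_type=None, severity=None, instance_id=None,
                 start=None, end=None, keyword=None,
                 page_size=DEFAULT_PAGE_SIZE, token=None):
    """Return (events, next_token); next_token is None on the last page."""
    clauses, params = [], []
    for column, value in (("event_type", event_type),
                          ("severity", severity),
                          ("instance_id", instance_id)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    if start is not None:                 # inclusive lower bound
        clauses.append("timestamp >= ?")
        params.append(start)
    if end is not None:                   # exclusive upper bound
        clauses.append("timestamp < ?")
        params.append(end)
    if keyword:
        pattern = f"%{_escape_like(keyword)}%"
        clauses.append(
            "(message LIKE ? ESCAPE '\\' OR source LIKE ? ESCAPE '\\')")
        params += [pattern, pattern]
    if token:
        # Resume strictly after the last row of the previous page.
        last_ts, last_id = json.loads(base64.urlsafe_b64decode(token))
        clauses.append("(timestamp < ? OR (timestamp = ? AND id < ?))")
        params += [last_ts, last_ts, last_id]

    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = ("SELECT id, timestamp, event_type, severity, instance_id, "
           f"source, message, details FROM site_events {where} "
           "ORDER BY timestamp DESC, id DESC LIMIT ?")
    # Fetch one extra row to learn whether another page exists.
    cur = conn.execute(sql, params + [page_size + 1])
    columns = [c[0] for c in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]

    next_token = None
    if len(rows) > page_size:
        rows = rows[:page_size]
        last = rows[-1]
        raw = json.dumps([last["timestamp"], last["id"]]).encode()
        next_token = base64.urlsafe_b64encode(raw).decode()
    return rows, next_token
```

A caller would pass the returned token back unchanged, together with the same filters, until it comes back as `None`. Keying the token on `(timestamp, id)` instead of a row offset keeps pages stable when new events are written between requests.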