Initial design docs from claude.ai refinement sessions
Component-SiteEventLogging.md
# Component: Site Event Logging

## Purpose

The Site Event Logging component records operational events at each site cluster, providing a local audit trail of runtime activity. Events are queryable from the central UI for remote troubleshooting.

## Location

Site clusters (event recording and storage). Central cluster (remote query access via UI).

## Responsibilities

- Record operational events from all site subsystems.
- Persist events to local SQLite.
- Enforce a 30-day retention policy with automatic purging.
- Respond to remote event log queries from central.

## Events Logged

| Category | Events |
|----------|--------|
| Script Executions | Script started, completed, failed (with error details), recursion limit exceeded |
| Alarm Events | Alarm activated, alarm cleared (which alarm, which instance), alarm evaluation error |
| Deployment Events | Configuration received from central, scripts compiled, applied successfully, apply failed |
| Data Connection Status | Connected, disconnected, reconnected (per connection) |
| Store-and-Forward | Message queued, delivered, retried, parked |
| Instance Lifecycle | Instance enabled, disabled, deleted |

## Event Entry Schema

Each event entry contains:

- **Timestamp**: When the event occurred.
- **Event Type**: Category of the event (script, alarm, deployment, connection, store-and-forward, instance-lifecycle).
- **Severity**: Info, Warning, or Error.
- **Instance ID** *(optional)*: The instance associated with the event (if applicable).
- **Source**: The subsystem that generated the event (e.g., "ScriptActor:MonitorSpeed", "AlarmActor:OverTemp", "DataConnection:PLC1").
- **Message**: Human-readable description of the event.
- **Details** *(optional)*: Additional structured data (e.g., exception stack trace, alarm name, message ID, compilation errors).
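
As an illustration only, the entry schema above could be modeled as a small immutable record. This is a minimal Python sketch, not the component's actual type: the names `SiteEvent`, `EventType`, and `Severity`, the field names, and the use of a dictionary for **Details** are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Optional


class EventType(Enum):
    # Values follow the categories listed under "Event Type".
    SCRIPT = "script"
    ALARM = "alarm"
    DEPLOYMENT = "deployment"
    CONNECTION = "connection"
    STORE_AND_FORWARD = "store-and-forward"
    INSTANCE_LIFECYCLE = "instance-lifecycle"


class Severity(Enum):
    INFO = "Info"
    WARNING = "Warning"
    ERROR = "Error"


@dataclass(frozen=True)
class SiteEvent:
    """Hypothetical in-memory shape of one event entry."""
    timestamp: datetime
    event_type: EventType
    severity: Severity
    source: str
    message: str
    instance_id: Optional[str] = None          # optional in the schema
    details: Optional[Dict[str, Any]] = None   # optional structured data


# Usage: a failed script execution with structured details attached.
event = SiteEvent(
    timestamp=datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc),
    event_type=EventType.SCRIPT,
    severity=Severity.ERROR,
    source="ScriptActor:MonitorSpeed",
    message="Script failed",
    instance_id="instance-1",  # illustrative identifier
    details={"error": "recursion limit exceeded"},
)
```

Keeping the two optional fields last with `None` defaults means subsystems that have no instance context, such as a data connection, can omit them.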

## Storage

- Events are stored in **local SQLite** on each site node.
- Each node maintains its own event log (the active node generates events; the standby node generates minimal events related to replication).
- **Retention**: 30 days. A background job automatically purges events older than 30 days.
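
The storage and retention rules above might look roughly like the following. This is a sketch under stated assumptions, not the actual implementation: the table name `site_events`, its column names, the timestamp index, and the choice to store timestamps as fixed-width UTC ISO 8601 text are all hypothetical. Only the 30-day window comes from this document.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # retention period stated in this document

# Hypothetical table layout; column names mirror the event entry schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS site_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    severity    TEXT NOT NULL,
    instance_id TEXT,
    source      TEXT NOT NULL,
    message     TEXT NOT NULL,
    details     TEXT
);
CREATE INDEX IF NOT EXISTS idx_site_events_timestamp
    ON site_events (timestamp);
"""


def to_db_timestamp(moment: datetime) -> str:
    """Format as fixed-width UTC ISO 8601 so string order matches time order."""
    return moment.astimezone(timezone.utc).isoformat(timespec="microseconds")


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete events older than the retention window; return rows removed."""
    cutoff = to_db_timestamp(now - timedelta(days=RETENTION_DAYS))
    cursor = conn.execute(
        "DELETE FROM site_events WHERE timestamp < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount


# Usage: one expired event and one recent event in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
for age_days, message in [(31, "old event"), (1, "recent event")]:
    conn.execute(
        "INSERT INTO site_events "
        "(timestamp, event_type, severity, source, message) "
        "VALUES (?, ?, ?, ?, ?)",
        (to_db_timestamp(now - timedelta(days=age_days)), "script", "Info",
         "ScriptActor:MonitorSpeed", message),
    )
removed = purge_expired(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM site_events").fetchone()[0]
```

In this sketch the background job would call `purge_expired` on a schedule. Passing `now` in explicitly keeps the purge logic testable without touching the system clock.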

## Central Access

- The central UI can query site event logs remotely via the Communication Layer.
- Queries support filtering by:
  - Event type / category
  - Time range
  - Instance ID
  - Severity
- The site processes the query locally and returns matching results to central.
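
One way the site could turn the supported filters into a local SQLite query is sketched below. The function `build_event_query`, the `site_events` table, and its column names are assumptions for illustration; the message format used by the Communication Layer is not shown here. Every filter is optional, and values are always passed as bound parameters rather than concatenated into the SQL text.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def build_event_query(
    event_type: Optional[str] = None,
    severity: Optional[str] = None,
    instance_id: Optional[str] = None,
    start: Optional[str] = None,   # inclusive lower bound of the time range
    end: Optional[str] = None,     # exclusive upper bound of the time range
) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT from whichever filters were supplied."""
    clauses: List[str] = []
    params: List[Any] = []
    equality_filters = (
        ("event_type", event_type),
        ("severity", severity),
        ("instance_id", instance_id),
    )
    for column, value in equality_filters:
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    if start is not None:
        clauses.append("timestamp >= ?")
        params.append(start)
    if end is not None:
        clauses.append("timestamp < ?")
        params.append(end)
    sql = ("SELECT timestamp, event_type, severity, instance_id, source, message "
           "FROM site_events")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY timestamp"
    return sql, params


# Usage: a throwaway in-memory table with three sample events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_events (timestamp TEXT, event_type TEXT, severity TEXT, "
    "instance_id TEXT, source TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO site_events VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-05-01T10:00:00+00:00", "script", "Error", "inst-1",
         "ScriptActor:MonitorSpeed", "Script failed"),
        ("2024-05-01T11:00:00+00:00", "alarm", "Info", "inst-1",
         "AlarmActor:OverTemp", "Alarm activated"),
        ("2024-05-02T09:00:00+00:00", "script", "Info", "inst-2",
         "ScriptActor:MonitorSpeed", "Script completed"),
    ],
)
sql, params = build_event_query(
    event_type="script",
    start="2024-05-01T00:00:00+00:00",
    end="2024-05-02T00:00:00+00:00",
)
rows = conn.execute(sql, params).fetchall()
```

Only column names fixed in the code are interpolated into the SQL string, so caller-supplied filter values never reach the statement text.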

## Dependencies

- **SQLite**: Local storage on each site node.
- **Communication Layer**: Handles remote query requests from central.
- **Site Runtime**: Generates script execution events, alarm events, deployment application events, and instance lifecycle events.
- **Data Connection Layer**: Generates connection status events.
- **Store-and-Forward Engine**: Generates buffer activity events.

## Interactions

- **All site subsystems**: Event logging is a cross-cutting concern; any subsystem that produces notable events calls the Event Logging service.
- **Communication Layer**: Receives remote queries from central and returns results.
- **Central UI**: Site Event Log Viewer displays queried events.
- **Health Monitoring**: Script error rates and alarm evaluation error rates can be derived from event log data.
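
To make the Health Monitoring interaction concrete, here is one possible way to derive a script error rate from stored events. The definition used (Error-severity script events per hour over a time window) is an assumption; this document does not specify how the rate is computed. The `site_events` table, its columns, and the function name `script_errors_per_hour` are likewise hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def to_db_timestamp(moment: datetime) -> str:
    """Fixed-width UTC ISO 8601 text, so string comparison follows time order."""
    return moment.astimezone(timezone.utc).isoformat(timespec="microseconds")


def script_errors_per_hour(
    conn: sqlite3.Connection, window_start: datetime, window_end: datetime
) -> float:
    """Count Error-severity script events in [start, end) and divide by hours."""
    hours = (window_end - window_start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("window_end must be after window_start")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM site_events "
        "WHERE event_type = 'script' AND severity = 'Error' "
        "AND timestamp >= ? AND timestamp < ?",
        (to_db_timestamp(window_start), to_db_timestamp(window_end)),
    ).fetchone()
    return count / hours


# Usage: a four-hour window containing two script errors.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_events (timestamp TEXT, event_type TEXT, severity TEXT)"
)


def at(hour: int, minute: int = 0) -> str:
    """Sample timestamp on an arbitrary day, used only for this demo."""
    return to_db_timestamp(datetime(2024, 6, 30, hour, minute, tzinfo=timezone.utc))


conn.executemany(
    "INSERT INTO site_events VALUES (?, ?, ?)",
    [
        (at(9), "script", "Error"),       # counted
        (at(10), "script", "Info"),       # wrong severity
        (at(10, 30), "alarm", "Error"),   # wrong event type
        (at(11, 30), "script", "Error"),  # counted
        (at(13), "script", "Error"),      # outside the window
    ],
)
rate = script_errors_per_hour(
    conn,
    datetime(2024, 6, 30, 8, tzinfo=timezone.utc),
    datetime(2024, 6, 30, 12, tzinfo=timezone.utc),
)
```

An alarm evaluation error rate could be derived the same way by filtering on the alarm event type instead, assuming evaluation errors are recorded with Error severity.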