# Log Analyzer

Automatically analyze application logs to detect errors, anomalies, and performance issues.

This workflow parses logs, identifies patterns, correlates errors, and generates actionable insights.

## Implementation

```typescript
import { relay } from "@relayplane/workflows";

// `logs` holds the raw log text to analyze, collected elsewhere
// (e.g. read from a file or fetched from a log aggregator).
const result = await relay
  .workflow("log-analyzer")

  // Step 1: Parse and categorize log entries
  .step("parse-logs")
  .with("openai:gpt-4o")
  .prompt(`Parse and categorize these application logs:

{{logEntries}}

For each unique error or pattern:
- Error type/category
- Frequency count
- First occurrence timestamp
- Severity (critical/error/warning/info)
- Affected service/component
- Common error messages

Return a structured summary grouped by error type.`)

  // Step 2: Identify root causes
  .step("root-cause")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs")
  .prompt(`Analyze the error patterns for root causes:

Parsed Logs: {{parse-logs.output}}

For the top 5 most frequent errors:
- Likely root cause
- Evidence from log patterns
- Related errors (correlation)
- Time patterns (spike times, recurring intervals)
- Impacted user flows

Use stack traces and error messages to infer causation.`)

  // Step 3: Detect anomalies
  .step("detect-anomalies")
  .with("openai:gpt-4o")
  .depends("parse-logs")
  .prompt(`Detect anomalies in log patterns:

Log Summary: {{parse-logs.output}}
Baseline Stats: {{baselineStats}}

Identify:
- Error rate spikes (>2σ from baseline)
- New error types (not seen in baseline)
- Performance degradation (slow queries, timeouts)
- Unusual traffic patterns
- Failed authentication spikes

For each anomaly:
- What changed
- When it started
- Potential impact
- Urgency level`)

  // Step 4: Extract performance insights
  .step("performance-analysis")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs")
  .prompt(`Analyze performance from logs:

{{logEntries}}

Extract metrics:
- Average response times by endpoint
- Slow queries (>1s)
- Database connection pool exhaustion
- Memory usage warnings
- Cache hit/miss rates

Identify bottlenecks and optimization opportunities.`)

  // Step 5: Generate incident report
  .step("incident-report")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs", "root-cause", "detect-anomalies", "performance-analysis")
  .prompt(`Create an incident analysis report:

Errors: {{parse-logs.output}}
Root Causes: {{root-cause.output}}
Anomalies: {{detect-anomalies.output}}
Performance: {{performance-analysis.output}}

Structure:
# Log Analysis Report - {{timeRange}}

## 🚨 Critical Issues
- Errors requiring immediate attention

## 📊 Error Summary
- Top 10 errors by frequency
- Error rate trends

## 🔍 Root Cause Analysis
- Primary issues identified
- Correlation patterns

## ⚡ Performance Insights
- Slow operations
- Resource constraints

## 💡 Recommended Actions
- Prioritized fixes
- Monitoring improvements

Use specific log excerpts as evidence.`)

  .run({
    logEntries: logs,
    baselineStats: {
      avgErrorRate: 0.05,
      avgResponseTime: 250,
      typicalErrorTypes: ["validation_error", "not_found"],
    },
    timeRange: "Last 24 hours",
  });

// Send alerts for critical issues
const report = result.steps["incident-report"].output;
if (report.includes("CRITICAL")) {
  await sendPagerDuty({
    severity: "critical",
    title: "Critical errors detected in logs",
    details: report,
  });
}
```

## Real-time Log Monitoring

```typescript
import { relay } from "@relayplane/workflows";
import { createReadStream } from "fs";
import { createInterface } from "readline";

// Stream logs and analyze in batches.
// getBaseline() and alertOnCall() are app-specific helpers.
async function monitorLogs(logFile: string) {
  const BATCH_SIZE = 1000;
  const logBuffer: string[] = [];

  const analyzeBatch = async (lines: string[]) => {
    const analysis = await relay
      .workflow("log-analyzer")
      .run({
        logEntries: lines.join("\n"),
        baselineStats: await getBaseline(),
        timeRange: "Last hour",
      });

    // Alert if the anomaly step flagged anything critical
    const anomalies = analysis.steps["detect-anomalies"].output;
    if (anomalies.includes("CRITICAL")) {
      await alertOnCall(anomalies);
    }
  };

  const rl = createInterface({
    input: createReadStream(logFile),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    logBuffer.push(line);

    if (logBuffer.length >= BATCH_SIZE) {
      await analyzeBatch(logBuffer);
      logBuffer.length = 0; // clear buffer
    }
  }

  // Flush any remaining lines that didn't fill a full batch
  if (logBuffer.length > 0) {
    await analyzeBatch(logBuffer);
  }
}

// Re-scan every minute. Note: this re-reads the file from the start on
// each run — in production, track a byte offset or use a tail-style
// reader so previously analyzed lines aren't re-processed.
setInterval(() => monitorLogs("/var/log/app.log"), 60_000);
```
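The `getBaseline()` helper above is left undefined. One hedged sketch derives the same fields the workflow's `baselineStats` input expects from a window of historical log lines — the `type=` and `rt=` token formats are assumptions about the log layout, not a real API:

```typescript
interface BaselineStats {
  avgErrorRate: number;
  avgResponseTime: number;
  typicalErrorTypes: string[];
}

// Derive baseline stats from historical log lines. Assumes lines may
// contain "ERROR", an optional "type=<name>" token, and an optional
// "rt=<ms>" response-time token — adjust to your actual format.
function computeBaseline(historicalLines: string[]): BaselineStats {
  let errors = 0;
  let rtSum = 0;
  let rtCount = 0;
  const typeCounts = new Map<string, number>();

  for (const line of historicalLines) {
    if (/\bERROR\b/.test(line)) {
      errors++;
      const type = line.match(/type=(\w+)/)?.[1];
      if (type) typeCounts.set(type, (typeCounts.get(type) ?? 0) + 1);
    }
    const rt = line.match(/rt=(\d+)/)?.[1];
    if (rt) {
      rtSum += Number(rt);
      rtCount++;
    }
  }

  return {
    avgErrorRate: historicalLines.length ? errors / historicalLines.length : 0,
    avgResponseTime: rtCount ? rtSum / rtCount : 0,
    typicalErrorTypes: [...typeCounts.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([t]) => t),
  };
}
```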

## Integration with Log Aggregators

```typescript
// DataDog integration (assumes a configured `datadogClient` instance,
// plus app-specific getHourlyBaseline() and postToSlack() helpers)
import { relay } from "@relayplane/workflows";

async function analyzeDDLogs() {
  const logs = await datadogClient.logs.list({
    filter: {
      query: "status:error",
      from: new Date(Date.now() - 3600000), // Last hour
      to: new Date(),
    },
    page: { limit: 5000 },
  });

  const analysis = await relay
    .workflow("log-analyzer")
    .run({
      logEntries: logs.data.map(l => JSON.stringify(l.attributes)).join("\n"),
      baselineStats: await getHourlyBaseline(),
    });

  // Post summary to Slack
  await postToSlack({
    channel: "#engineering-alerts",
    text: analysis.steps["incident-report"].output,
  });
}

// CloudWatch Logs
import { CloudWatchLogsClient, FilterLogEventsCommand } from "@aws-sdk/client-cloudwatch-logs";

async function analyzeCloudWatchLogs(logGroup: string) {
  const client = new CloudWatchLogsClient({});
  const command = new FilterLogEventsCommand({
    logGroupName: logGroup,
    startTime: Date.now() - 3600000,
    filterPattern: "ERROR",
  });

  const response = await client.send(command);
  const logs = response.events?.map(e => e.message).join("\n") || "";

  return await relay.workflow("log-analyzer").run({ logEntries: logs });
}
```
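Aggregator queries can return more log text than fits in a single prompt (the DataDog query above pulls up to 5,000 entries). A hedged sketch of splitting the payload into prompt-sized chunks — the ~4-characters-per-token heuristic and default budget are assumptions, and each chunk would be run through the workflow separately:

```typescript
// Split a large log payload into chunks that each fit an approximate
// token budget, breaking only on line boundaries so no entry is cut.
// Uses the rough heuristic of ~4 characters per token.
function chunkLogs(logText: string, maxTokens = 8000): string[] {
  const maxChars = maxTokens * 4;
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0;

  for (const line of logText.split("\n")) {
    if (currentLen + line.length + 1 > maxChars && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
      currentLen = 0;
    }
    current.push(line);
    currentLen += line.length + 1; // +1 for the newline
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```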

## Sample Output

```markdown
# Log Analysis Report - Last 24 hours

## 🚨 Critical Issues

1. **Database Connection Pool Exhausted** (247 occurrences)
   - Started: 14:32 UTC
   - Cause: Connection leak in user service
   - Impact: 15% of requests failing
   - Action: Restart service, patch connection handling

2. **Authentication Service Down** (89 occurrences)
   - Started: 18:15 UTC (ongoing)
   - Cause: Downstream Auth0 timeout
   - Impact: All logins failing
   - Action: Enable fallback auth, contact Auth0

## 📊 Error Summary

- Total errors: 1,247 (+340% vs baseline)
- Error rate: 2.1% (baseline: 0.05%)
- New error types: 3
- Top errors:
  1. ConnectionPoolExhausted (247)
  2. AuthenticationTimeout (89)
  3. RateLimitExceeded (63)

## 🔍 Root Cause Analysis

- **Connection leaks**: User service not releasing DB connections after errors
- **Cascading failures**: Auth timeout causing retry storms
- **Missing circuit breaker**: No fallback for auth service failures

## ⚡ Performance Insights

- P95 response time: 3.2s (up from 450ms baseline)
- 12 endpoints with response time over 5s
- Database query time increased 4x
- 89% of requests waiting on connections
```

## Customization

- Add regex patterns for custom error formats
- Machine learning for anomaly detection thresholds
- Correlation with deployment events
- User impact estimation (affected user IDs)
- Auto-create Jira tickets for recurring issues
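The first customization — regex patterns for custom error formats — might look like this. The `timestamp [LEVEL] service: message` layout and all field names are illustrative assumptions for a hypothetical log format:

```typescript
// Parse a custom "timestamp [LEVEL] service: message" log format into
// structured entries before handing them to the workflow.
interface LogEntry {
  timestamp: string;
  level: string;
  service: string;
  message: string;
}

const CUSTOM_FORMAT = /^(\S+)\s+\[(\w+)\]\s+([\w-]+):\s+(.*)$/;

function parseCustomFormat(line: string): LogEntry | null {
  const m = line.match(CUSTOM_FORMAT);
  if (!m) return null; // line doesn't match the expected format
  const [, timestamp, level, service, message] = m;
  return { timestamp, level, service, message };
}
```

Pre-structuring entries this way lets the `parse-logs` step spend its attention on categorization rather than raw parsing.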