# Log Analyzer
Automatically analyze application logs to detect errors, anomalies, and performance issues.
This workflow parses logs, identifies patterns, correlates errors, and generates actionable insights.
## Implementation
```typescript
import { relay } from "@relayplane/workflows";

const result = await relay
  .workflow("log-analyzer")

  // Step 1: Parse and categorize log entries
  .step("parse-logs")
  .with("openai:gpt-4o")
  .prompt(`Parse and categorize these application logs:

{{logEntries}}

For each unique error or pattern:
- Error type/category
- Frequency count
- First occurrence timestamp
- Severity (critical/error/warning/info)
- Affected service/component
- Common error messages

Return structured summary grouped by error type.`)

  // Step 2: Identify root causes
  .step("root-cause")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs")
  .prompt(`Analyze error patterns for root causes:

Parsed Logs: {{parse-logs.output}}

For top 5 most frequent errors:
- Likely root cause
- Evidence from log patterns
- Related errors (correlation)
- Time patterns (spike times, recurring intervals)
- Impacted user flows

Use stack traces and error messages to infer causation.`)

  // Step 3: Detect anomalies
  .step("detect-anomalies")
  .with("openai:gpt-4o")
  .depends("parse-logs")
  .prompt(`Detect anomalies in log patterns:

Log Summary: {{parse-logs.output}}
Baseline Stats: {{baselineStats}}

Identify:
- Error rate spikes (>2σ from baseline)
- New error types (not seen in baseline)
- Performance degradation (slow queries, timeouts)
- Unusual traffic patterns
- Failed authentication spikes

For each anomaly:
- What changed
- When it started
- Potential impact
- Urgency level`)

  // Step 4: Extract performance insights
  .step("performance-analysis")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs")
  .prompt(`Analyze performance from logs:

{{logEntries}}

Extract metrics:
- Average response times by endpoint
- Slow queries (>1s)
- Database connection pool exhaustion
- Memory usage warnings
- Cache hit/miss rates

Identify bottlenecks and optimization opportunities.`)

  // Step 5: Generate incident report
  .step("incident-report")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-logs", "root-cause", "detect-anomalies", "performance-analysis")
  .prompt(`Create incident analysis report:

Errors: {{parse-logs.output}}
Root Causes: {{root-cause.output}}
Anomalies: {{detect-anomalies.output}}
Performance: {{performance-analysis.output}}

Structure:
# Log Analysis Report - {{timeRange}}

## 🚨 Critical Issues
- Errors requiring immediate attention

## 📊 Error Summary
- Top 10 errors by frequency
- Error rate trends

## 🔍 Root Cause Analysis
- Primary issues identified
- Correlation patterns

## ⚡ Performance Insights
- Slow operations
- Resource constraints

## 💡 Recommended Actions
- Prioritized fixes
- Monitoring improvements

Use specific log excerpts as evidence.`)

  .run({
    logEntries: logs,
    baselineStats: {
      avgErrorRate: 0.05,
      avgResponseTime: 250,
      typicalErrorTypes: ["validation_error", "not_found"],
    },
    timeRange: "Last 24 hours",
  });

// Send alerts for critical issues
const report = result.steps["incident-report"].output;
if (report.includes("CRITICAL")) {
  await sendPagerDuty({
    severity: "critical",
    title: "Critical errors detected in logs",
    details: report,
  });
}
```
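The alerting snippet above keys off a bare `includes("CRITICAL")`, which also fires when the word appears in a non-critical context. A slightly more robust sketch, assuming the report follows the section template from the `incident-report` prompt (`classifySeverity` is a hypothetical helper, not part of the workflow API):

```typescript
// Derive an alert severity from the generated report text.
// Hypothetical helper: the emoji section marker mirrors the report
// template used in the "incident-report" prompt.
type Severity = "critical" | "error" | "info";

function classifySeverity(report: string): Severity {
  // The template puts urgent items under the "🚨 Critical Issues" heading;
  // treat a non-empty section (a list item after the heading) as critical.
  const criticalSection =
    report.split("## 🚨 Critical Issues")[1]?.split("##")[0] ?? "";
  if (/^\s*[-\d]/m.test(criticalSection)) return "critical";
  if (/CRITICAL|ERROR/i.test(report)) return "error";
  return "info";
}
```

With this in place, the PagerDuty call can use `severity: classifySeverity(report)` and page only on `"critical"`.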
## Real-time Log Monitoring

```typescript
import { relay } from "@relayplane/workflows";
import { createReadStream } from "fs";
import { createInterface } from "readline";

// Stream logs and analyze in batches
async function monitorLogs(logFile: string) {
  const logBuffer: string[] = [];
  const BATCH_SIZE = 1000;

  const fileStream = createReadStream(logFile);
  const rl = createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    logBuffer.push(line);

    if (logBuffer.length >= BATCH_SIZE) {
      // Analyze batch
      const analysis = await relay
        .workflow("log-analyzer")
        .run({
          logEntries: logBuffer.join("\n"),
          baselineStats: await getBaseline(),
          timeRange: "Last hour",
        });

      // Check for anomalies
      const anomalies = analysis.steps["detect-anomalies"].output;
      if (anomalies.includes("CRITICAL")) {
        await alertOnCall(anomalies);
      }

      // Clear buffer
      logBuffer.length = 0;
    }
  }
}

// Run every minute. Note: this re-reads the file from the start on each
// run; a production setup would track a byte offset or tail the file.
setInterval(() => monitorLogs("/var/log/app.log"), 60000);
```
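`getBaseline()` in the monitoring loop above is left undefined. One minimal way to build it, assuming you keep a window of recent hourly error rates, is a mean/σ summary that matches the ">2σ from baseline" rule in the `detect-anomalies` prompt (all names here are illustrative):

```typescript
// Hypothetical baseline builder: summarizes recent hourly error rates so
// the "detect-anomalies" step can flag spikes more than 2σ above baseline.
interface Baseline {
  avgErrorRate: number;
  stdDev: number;
}

function computeBaseline(hourlyErrorRates: number[]): Baseline {
  const n = hourlyErrorRates.length;
  const mean = hourlyErrorRates.reduce((a, b) => a + b, 0) / n;
  const variance =
    hourlyErrorRates.reduce((sum, r) => sum + (r - mean) ** 2, 0) / n;
  return { avgErrorRate: mean, stdDev: Math.sqrt(variance) };
}

// A rate is anomalous when it exceeds the baseline mean by more than 2σ.
function isSpike(rate: number, baseline: Baseline): boolean {
  return rate > baseline.avgErrorRate + 2 * baseline.stdDev;
}
```

Computing the same statistic locally also lets you skip the LLM pass entirely for batches that show no spike.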
## Integration with Log Aggregators

```typescript
// Datadog integration (assumes a configured Datadog API client)
import { relay } from "@relayplane/workflows";

async function analyzeDDLogs() {
  const logs = await datadogClient.logs.list({
    filter: {
      query: "status:error",
      from: new Date(Date.now() - 3600000), // Last hour
      to: new Date(),
    },
    page: { limit: 5000 },
  });

  const analysis = await relay
    .workflow("log-analyzer")
    .run({
      logEntries: logs.data.map(l => JSON.stringify(l.attributes)).join("\n"),
      baselineStats: await getHourlyBaseline(),
    });

  // Post summary to Slack
  await postToSlack({
    channel: "#engineering-alerts",
    text: analysis.steps["incident-report"].output,
  });
}

// CloudWatch Logs
import { CloudWatchLogsClient, FilterLogEventsCommand } from "@aws-sdk/client-cloudwatch-logs";

async function analyzeCloudWatchLogs(logGroup: string) {
  const client = new CloudWatchLogsClient({});
  const command = new FilterLogEventsCommand({
    logGroupName: logGroup,
    startTime: Date.now() - 3600000,
    filterPattern: "ERROR",
  });

  const response = await client.send(command);
  const logs = response.events?.map(e => e.message).join("\n") || "";

  return await relay.workflow("log-analyzer").run({ logEntries: logs });
}
```
## Sample Output

```markdown
# Log Analysis Report - Last 24 hours

## 🚨 Critical Issues

1. **Database Connection Pool Exhausted** (247 occurrences)
   - Started: 14:32 UTC
   - Cause: Connection leak in user service
   - Impact: 15% of requests failing
   - Action: Restart service, patch connection handling

2. **Authentication Service Down** (89 occurrences)
   - Started: 18:15 UTC (ongoing)
   - Cause: Downstream Auth0 timeout
   - Impact: All logins failing
   - Action: Enable fallback auth, contact Auth0

## 📊 Error Summary

- Total errors: 1,247 (+340% vs baseline)
- Error rate: 2.1% (baseline: 0.05%)
- New error types: 3
- Top errors:
  1. ConnectionPoolExhausted (247)
  2. AuthenticationTimeout (89)
  3. RateLimitExceeded (63)

## 🔍 Root Cause Analysis

- **Connection leaks**: User service not releasing DB connections after errors
- **Cascading failures**: Auth timeout causing retry storms
- **Missing circuit breaker**: No fallback for auth service failures

## ⚡ Performance Insights

- P95 response time: 3.2s (up from 450ms baseline)
- 12 endpoints with response time over 5s
- Database query time increased 4x
- 89% of requests waiting on connections
```
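A report in this shape can also be post-processed programmatically, e.g. to feed dashboards or ticketing. A sketch that pulls the "Top errors" counts out of the markdown (the list format follows the report template above; the regex and helper name are assumptions):

```typescript
// Hypothetical post-processor: extract "Name (count)" entries from
// numbered list items in the generated markdown report.
function topErrors(report: string): Array<{ name: string; count: number }> {
  const results: Array<{ name: string; count: number }> = [];
  // Matches lines like "  1. ConnectionPoolExhausted (247)"
  const pattern = /^\s*\d+\.\s+(\w+)\s+\((\d+)\)/gm;
  for (const match of report.matchAll(pattern)) {
    results.push({ name: match[1], count: Number(match[2]) });
  }
  return results;
}
```

Because model output is free-form text, treat extraction like this as best-effort and validate the result before acting on it.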
## Customization
- Add regex patterns for custom error formats
- Tune anomaly-detection thresholds with machine learning
- Correlate errors with deployment events
- Estimate user impact (affected user IDs)
- Auto-create Jira tickets for recurring issues
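The first customization can be sketched as a local regex pre-pass that counts known error types before (or instead of) sending raw logs to a model. The pattern names and formats below are illustrative assumptions, not part of the workflow:

```typescript
// Hypothetical regex pre-categorizer for custom error formats.
const ERROR_PATTERNS: Record<string, RegExp> = {
  connection_pool: /ConnectionPoolExhausted|pool .*exhausted/i,
  auth_timeout: /AuthenticationTimeout|auth .*timed? ?out/i,
  rate_limit: /RateLimitExceeded|429/,
};

// Count occurrences of each known error type, line by line.
function countErrorTypes(logText: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logText.split("\n")) {
    for (const [name, pattern] of Object.entries(ERROR_PATTERNS)) {
      if (pattern.test(line)) counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  return counts;
}
```

The resulting counts can be merged into `baselineStats` or prepended to `logEntries` to give the parsing step a head start on known formats.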