Incident Report Generator
Automatically generate post-mortem reports from incident timelines and Slack conversations.
This workflow analyzes incident data, creates timelines, identifies root causes, and generates structured post-mortems.
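The timeline at the center of this workflow can be pictured as a sorted array of typed entries. The sketch below is illustrative only: the `TimelineEntry` interface, its field names, and the `mergeTimelines` helper are assumptions derived from the four attributes the timeline prompt asks for, not part of `@relayplane/workflows`, and the two sample entries are made up.

```typescript
// Hypothetical shape for one timeline entry; field names are assumptions.
interface TimelineEntry {
  timestamp: string;   // ISO 8601, e.g. "2024-01-15T14:32:00Z"
  description: string; // what happened
  actor: string;       // who took action
  component: string;   // system component affected
}

/** Merge entries from several sources into one chronologically sorted array. */
function mergeTimelines(...sources: TimelineEntry[][]): TimelineEntry[] {
  // concat flattens one level, producing a new array we can sort safely.
  return ([] as TimelineEntry[])
    .concat(...sources)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Made-up sample entries, one per source, deliberately out of order.
const merged = mergeTimelines(
  [{
    timestamp: "2024-01-15T14:40:00Z",
    description: "On-call acknowledges page",
    actor: "on-call engineer",
    component: "api-gateway",
  }],
  [{
    timestamp: "2024-01-15T14:32:00Z",
    description: "Latency alert fires",
    actor: "monitoring",
    component: "api-gateway",
  }],
);

console.log(merged.map((entry) => entry.description));
```

A typed shape like this also gives you something to validate the model's JSON output against before later steps consume it.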
Implementation
import { relay } from "@relayplane/workflows";

// pdEvents, slackThread, ddAlerts, incidentTitle, and saveToConfluence
// are assumed to be defined elsewhere in your codebase.

const result = await relay
  .workflow("incident-report")

  // Step 1: Parse incident timeline from multiple sources
  .step("parse-timeline")
  .with("openai:gpt-4o")
  .prompt(`Parse this incident data into a structured timeline:

PagerDuty Events:
{{pagerdutyEvents}}

Slack #incidents Channel:
{{slackMessages}}

Monitoring Alerts:
{{datadogAlerts}}

Create chronological timeline with:
- Timestamp (ISO format)
- Event description
- Who took action
- System component affected

Return as JSON array sorted by time.`)

  // Step 2: Identify root cause
  .step("root-cause")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-timeline")
  .prompt(`Analyze this incident timeline to identify root cause:

{{parse-timeline.output}}

Determine:
- Primary root cause (be specific)
- Contributing factors
- Which system/service failed
- Whether this was preventable
- Similar past incidents

Use the "5 Whys" technique.`)

  // Step 3: Calculate impact metrics
  .step("calculate-impact")
  .with("openai:gpt-4o")
  .depends("parse-timeline")
  .prompt(`Calculate incident impact from this timeline:

{{parse-timeline.output}}

Additional context:
- Detection time: {{detectionTime}}
- Resolution time: {{resolutionTime}}
- Affected users: {{affectedUsers}}
- Revenue impact: {{revenueImpact}}

Calculate:
- Total downtime (minutes)
- MTTR (Mean Time To Recovery)
- MTTD (Mean Time To Detect)
- Severity level (SEV0-SEV3)
- Customer impact score

Return as structured JSON.`)

  // Step 4: Generate action items
  .step("action-items")
  .with("anthropic:claude-3.5-sonnet")
  .depends("root-cause", "calculate-impact")
  .prompt(`Generate concrete action items to prevent recurrence:

Root Cause: {{root-cause.output}}
Impact: {{calculate-impact.output}}

Create action items with:
- Immediate fixes (0-7 days)
- Short-term improvements (1-4 weeks)
- Long-term prevention (1-3 months)

Each item needs:
- Description
- Owner (role, not person)
- Estimated effort
- Priority (P0-P2)

Be specific and actionable.`)

  // Step 5: Write formal post-mortem
  .step("write-postmortem")
  .with("anthropic:claude-3.5-sonnet")
  .depends("parse-timeline", "root-cause", "calculate-impact", "action-items")
  .prompt(`Write a blameless post-mortem report:

Timeline: {{parse-timeline.output}}
Root Cause: {{root-cause.output}}
Impact: {{calculate-impact.output}}
Action Items: {{action-items.output}}

Structure:
# Incident Summary
- Date and duration
- Severity
- Services affected
- Customer impact

# Timeline
- Detection
- Key events
- Resolution

# Root Cause Analysis
- What happened
- Why it happened
- Contributing factors

# Impact Assessment
- Metrics
- Customer effect
- Business impact

# What Went Well
- Positive aspects of response

# What Could Be Improved
- Areas for improvement

# Action Items
- Categorized by timeframe
- Owners assigned

# Lessons Learned

Use professional, blameless language.
Target audience: Engineering and leadership.`)

  .run({
    pagerdutyEvents: pdEvents,
    slackMessages: slackThread,
    datadogAlerts: ddAlerts,
    detectionTime: "2024-01-15T14:32:00Z",
    resolutionTime: "2024-01-15T16:18:00Z",
    affectedUsers: "~12,000",
    revenueImpact: "$8,500 estimated",
  });

// Save to wiki/docs
await saveToConfluence({
  space: "Engineering",
  title: `Post-Mortem: ${incidentTitle}`,
  content: result.steps["write-postmortem"].output,
});

Webhook Trigger
Auto-generate post-mortems when incidents are resolved:
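// PagerDuty webhook handler
// app (an Express-style server), getSlackThread, and getDatadogAlerts are
// assumed to be defined elsewhere. The payload fields read below depend on
// your PagerDuty webhook version; adjust them to match what you receive.
app.post("/webhooks/pagerduty", async (req, res) => {
  const event = req.body;

  if (event.event === "incident.resolved") {
    const incident = event.incident;

    // Fetch related data
    const slackThread = await getSlackThread(incident.id);
    const alerts = await getDatadogAlerts(incident.created_at, incident.resolved_at);

    // Generate post-mortem
    await relay
      .workflow("incident-report")
      .run({
        pagerdutyEvents: incident.log_entries,
        slackMessages: slackThread,
        datadogAlerts: alerts,
        detectionTime: incident.created_at,
        resolutionTime: incident.resolved_at,
        affectedUsers: incident.impacted_services.total_users,
        // The calculate-impact prompt also references {{revenueImpact}}, which
        // the webhook does not provide; replace this placeholder if you have an estimate.
        revenueImpact: "unknown",
      });
  }

  res.sendStatus(200);
});

Benefits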
- Time Savings: Post-mortem creation drops from 2-4 hours to 10 minutes
- Consistency: All reports follow the same structured format
- Completeness: Combining PagerDuty, Slack, and monitoring data in one timeline makes it harder to miss critical events
- Blameless Culture: The prompts keep the report's tone professional and learning-focused
Production Tip: Run this within 24 hours of incident resolution, while details are still fresh.
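The 24-hour window can be enforced with a small guard before the workflow is triggered. This is a minimal sketch: `minutesBetween` and `isWithinPostMortemWindow` are hypothetical helpers, not part of `@relayplane/workflows`. The same arithmetic gives a deterministic figure to cross-check against model output, here the detection-to-resolution interval from the example run (14:32 to 16:18 UTC, 106 minutes).

```typescript
// Hypothetical helpers; names and defaults are assumptions for illustration.

/** Minutes elapsed between two ISO 8601 timestamps, rounded to the nearest minute. */
function minutesBetween(startIso: string, endIso: string): number {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error("Invalid ISO timestamp");
  }
  return Math.round((end - start) / 60000);
}

/** True if `now` is at or after resolution and no more than `windowHours` later. */
function isWithinPostMortemWindow(
  resolvedIso: string,
  now: Date,
  windowHours: number = 24,
): boolean {
  const elapsed = minutesBetween(resolvedIso, now.toISOString());
  return elapsed >= 0 && elapsed <= windowHours * 60;
}

// Detection and resolution times taken from the example run.
const detectionToResolution = minutesBetween(
  "2024-01-15T14:32:00Z",
  "2024-01-15T16:18:00Z",
);
console.log(detectionToResolution); // 106

// Example check roughly 18 hours after resolution.
console.log(
  isWithinPostMortemWindow("2024-01-15T16:18:00Z", new Date("2024-01-16T10:00:00Z")),
); // true
```

If the guard returns false, one option is to still generate the report but flag it as late so reviewers know recollections may be less reliable.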