PII Detector & Redactor
Automatically detect and redact personally identifiable information (PII) from documents and logs.
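This workflow identifies sensitive data such as names, emails, Social Security numbers (SSNs), and credit card numbers, assesses the compliance risk, and produces a redacted copy along with anonymization recommendations.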
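Detection and redaction can also be split apart. Once a model has returned findings with a type and the exact matched text (the shape Step 1 of the workflow asks for), the substitution can be done in ordinary code, so nothing outside the matched values changes. The following is a minimal sketch under that assumption. The `Finding` shape, the `LABELS` map, and `redact` are hypothetical names, not part of `@relayplane/workflows`.

```typescript
// Hypothetical shape of one detected PII instance (e.g. parsed from a detection step).
interface Finding {
  type: string;  // e.g. "email", "ssn"
  value: string; // exact text found in the content
}

// Assumed mapping from PII type to redaction label; other types become "[PII]".
const LABELS: Record<string, string> = {
  name: "[NAME]",
  email: "[EMAIL]",
  phone: "[PHONE]",
  ssn: "[SSN]",
  credit_card: "[CREDIT_CARD]",
  address: "[ADDRESS]",
};

// Escape regex metacharacters so each value is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function redact(text: string, findings: Finding[]): string {
  // Map each exact value to its label.
  const labelFor = new Map<string, string>();
  for (const f of findings) {
    if (f.value) labelFor.set(f.value, LABELS[f.type.toLowerCase()] ?? "[PII]");
  }
  if (labelFor.size === 0) return text;

  // Longest values first, so "John Doe" is preferred over a separately found "John".
  const values = Array.from(labelFor.keys()).sort((a, b) => b.length - a.length);
  // A single pass means inserted labels are never matched again.
  const pattern = new RegExp(values.map(escapeRegExp).join("|"), "g");
  return text.replace(pattern, (match) => labelFor.get(match) ?? match);
}

console.log(
  redact("Contact John Doe at john.doe@email.com, SSN 123-45-6789.", [
    { type: "name", value: "John Doe" },
    { type: "email", value: "john.doe@email.com" },
    { type: "ssn", value: "123-45-6789" },
  ]),
);
// Contact [NAME] at [EMAIL], SSN [SSN].
```

Exact-string replacement only catches values the detection step reported verbatim, so its recall is bounded by that step; the benefit is that the surrounding text is guaranteed to be left untouched.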
Implementation
```typescript
import { relay } from "@relayplane/workflows";

// `documentText`, `saveFile`, and `notifySecurityTeam` are application
// helpers assumed to be defined elsewhere; they are not part of the SDK.
const result = await relay
  .workflow("pii-detector")

  // Step 1: Detect PII entities
  .step("detect-pii")
  .with("openai:gpt-4o")
  .prompt(`Scan this content for PII (Personally Identifiable Information):

{{content}}

Identify all instances of:
- Full names
- Email addresses
- Phone numbers (all formats)
- Social Security Numbers (SSN)
- Credit card numbers
- Home addresses
- IP addresses
- Dates of birth
- Government ID numbers (passport, driver's license)
- Medical record numbers
- Biometric data
- Account numbers

For each finding, include:
- PII type
- Exact text/value found
- Location (line number or context)
- Sensitivity level (high/medium/low)

Return as a JSON array.`)

  // Step 2: Assess risk level
  .step("assess-risk")
  .with("anthropic:claude-3.5-sonnet")
  .depends("detect-pii")
  .prompt(`Assess data privacy risk:

PII Found: {{detect-pii.output}}

Content Type: {{contentType}}
Intended Use: {{intendedUse}}
Audience: {{audience}}

Evaluate:
- GDPR implications (if EU data)
- CCPA requirements (if California residents)
- HIPAA violations (if health data)
- Financial data regulations
- Overall privacy risk score (0-100)

Recommend classification: Public / Internal / Confidential / Restricted

Return only JSON with the fields "score" (0-100), "classification", and "rationale".`)

  // Step 3: Generate redacted version
  .step("redact-content")
  .with("anthropic:claude-3.5-sonnet")
  .depends("detect-pii")
  .prompt(`Create a redacted version of this content:

Original: {{content}}
PII Detected: {{detect-pii.output}}

Redaction strategy:
- Replace names with "[NAME]"
- Replace emails with "[EMAIL]"
- Replace phone numbers with "[PHONE]"
- Replace SSNs with "[SSN]"
- Replace credit card numbers with "[CREDIT_CARD]"
- Replace addresses with "[ADDRESS]"
- Preserve overall meaning, context, and line breaks

Return only the fully redacted content, with no commentary.`)

  // Step 4: Anonymization suggestions
  .step("anonymize-suggestions")
  .with("openai:gpt-4o")
  .depends("detect-pii", "assess-risk")
  .prompt(`Suggest anonymization strategies:

PII: {{detect-pii.output}}
Risk Assessment: {{assess-risk.output}}

For each PII type, recommend:
- Redaction (remove entirely)
- Masking (partial: j***@example.com)
- Tokenization (replace with unique ID)
- Generalization (specific → generic: "John" → "User A")
- Synthetic data (fake but realistic)

Consider the use case: {{intendedUse}}`)

  // Step 5: Generate report
  .step("pii-report")
  .with("anthropic:claude-3.5-sonnet")
  .depends("detect-pii", "assess-risk", "anonymize-suggestions")
  .prompt(`Create PII detection report:

Findings: {{detect-pii.output}}
Risk: {{assess-risk.output}}
Recommendations: {{anonymize-suggestions.output}}

Format:
# PII Detection Report

## 📊 Summary
- Total PII instances found
- PII types detected
- Risk level

## 🔍 Detailed Findings
- List each PII with context

## ⚠️ Compliance Risks
- GDPR/CCPA/HIPAA implications
- Required actions

## 💡 Recommendations
- How to handle each PII type
- Anonymization approach

## ✅ Next Steps
- Actionable items

Professional, clear, compliance-focused.`)

  .run({
    content: documentText,
    contentType: "Customer Support Transcript",
    intendedUse: "Training machine learning model",
    audience: "Internal data science team",
  });

// Save redacted version
await saveFile({
  path: "data/redacted/transcript-001.txt",
  content: result.steps["redact-content"].output,
});

// Alert if high-risk PII found (relies on assess-risk returning JSON)
const risk = JSON.parse(result.steps["assess-risk"].output);
if (risk.score > 70) {
  await notifySecurityTeam({
    alert: "High-risk PII detected",
    report: result.steps["pii-report"].output,
  });
}
```

Real-time Log Sanitization
```typescript
import { relay } from "@relayplane/workflows";

// Sanitize logs before storing.
// Runs the "pii-detector" workflow from the Implementation section; `unsafeLogs`
// and `sendToDatadog` are application helpers assumed to exist elsewhere.
async function sanitizeLogs(logEntries: string[]): Promise<string[]> {
  const batch = logEntries.join("\n");

  const result = await relay
    .workflow("pii-detector")
    .run({
      content: batch,
      contentType: "Application Logs",
      intendedUse: "Debugging and monitoring",
      audience: "Engineering team",
    });

  // Assumes the model kept one log entry per line.
  const redacted = result.steps["redact-content"].output;
  return redacted.split("\n");
}

// Use in logging pipeline
const sanitizedLogs = await sanitizeLogs(unsafeLogs);
await sendToDatadog(sanitizedLogs);
```

Database Anonymization
```typescript
// Anonymize production database for staging
import { relay } from "@relayplane/workflows";

// `updateStagingDB` is an application helper assumed to exist elsewhere.
async function anonymizeUserData(users: any[]) {
  for (const user of users) {
    const userData = JSON.stringify(user);

    const result = await relay
      .workflow("pii-detector")
      .step("detect-pii")
      .with("openai:gpt-4o")
      .prompt(`Identify every PII field in this record:

{{userData}}

Return as a JSON array of field names and PII types.`)
      .step("generate-synthetic")
      .with("anthropic:claude-3.5-sonnet")
      .depends("detect-pii")
      .prompt(`Generate synthetic replacement data:

Original: {{userData}}
PII: {{detect-pii.output}}

Generate realistic but fake:
- Names (maintain gender/ethnicity distribution)
- Emails (same domain patterns)
- Addresses (real cities, fake streets)
- Phone numbers (valid format, fake numbers)

Preserve relationships and patterns.
Return only the complete record as JSON, with the same fields as the original.`)
      .run({ userData });

    const syntheticUser = JSON.parse(result.steps["generate-synthetic"].output);
    await updateStagingDB(user.id, syntheticUser);
  }
}
```

Sample Output
```markdown
# PII Detection Report

## 📊 Summary
- **Total PII Found:** 37 instances
- **PII Types:** Email (12), Phone (8), SSN (2), Name (15)
- **Risk Level:** HIGH (Score: 85/100)
- **Classification:** RESTRICTED

## 🔍 Detailed Findings
**High Sensitivity:**
1. **SSN:** "123-45-6789" (Line 47)
   - Detected in customer intake form
   - Full 9-digit format
2. **SSN:** "987-65-4321" (Line 103)

**Medium Sensitivity:**
1. **Email:** "john.doe@email.com" (15 occurrences)
2. **Phone:** "(555) 123-4567" (8 occurrences)
3. **Name:** "John Doe" (15 occurrences)

## ⚠️ Compliance Risks
- **GDPR:** Presence of EU citizen data requires consent and data processing agreement
- **CCPA:** California residents identified - right to deletion applies
- **HIPAA:** Medical record numbers detected - requires encryption at rest

## 💡 Recommendations
1. **Immediate Actions:**
   - Remove all SSNs before using for ML training
   - Encrypt document at rest
   - Implement access logging
2. **Anonymization Strategy:**
   - SSN → Full redaction (not needed for ML model)
   - Email → Hash or tokenize (preserve uniqueness)
   - Phone → Partial mask: (555) ***-**67
   - Name → Replace with synthetic names

## ✅ Next Steps
1. Legal team approval before using data
2. Implement recommended redactions
3. Update data processing agreement
4. Enable audit logging for access
```
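The anonymization strategy in this sample report (tokenize emails while preserving uniqueness, partially mask phone numbers as `(555) ***-**67`) can be applied deterministically once the values are identified. The following is a minimal sketch using Node's built-in `crypto` module; `tokenizeEmail`, `maskPhone`, and the `user_…@redacted.invalid` token format are illustrative assumptions. It uses a keyed HMAC rather than a plain hash, because an unkeyed SHA-256 of an email address can be reversed by hashing a list of candidate addresses.

```typescript
import { createHmac } from "node:crypto";

// Deterministic token: the same email always maps to the same token, so
// "same customer" relationships survive, but the address itself does not.
// The key is an assumption; in practice load it from a secret store.
function tokenizeEmail(email: string, key: string): string {
  const digest = createHmac("sha256", key)
    .update(email.trim().toLowerCase())
    .digest("hex");
  // ".invalid" is a reserved TLD, so the token can never be a real address.
  return `user_${digest.slice(0, 12)}@redacted.invalid`;
}

// Partial mask: keep the area code and the last two digits, mask the rest.
function maskPhone(phone: string): string {
  const digits = phone.replace(/\D/g, "");
  // Only 10-digit North American numbers are handled in this sketch.
  if (digits.length !== 10) return "[PHONE]";
  return `(${digits.slice(0, 3)}) ***-**${digits.slice(8)}`;
}

console.log(maskPhone("(555) 123-4567")); // (555) ***-**67
console.log(
  tokenizeEmail("John.Doe@email.com", "demo-key") ===
    tokenizeEmail("john.doe@email.com", "demo-key"),
); // true
```

Because the token is keyed, anyone holding the key can still link tokens back to known addresses, so the key needs the same protection as the original data.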
Regex Patterns
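```typescript
import { relay } from "@relayplane/workflows";

// Supplement AI with regex for common patterns
const PII_PATTERNS: Record<string, RegExp> = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  // No leading \b: a word boundary never occurs between a space and "(".
  phone: /(?:\b\d{3}-\d{3}-\d{4}|\(\d{3}\)\s*\d{3}-\d{4})\b/g,
  creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  ipAddress: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
};

function quickPIICheck(text: string): boolean {
  return Object.values(PII_PATTERNS).some((pattern) => {
    // Global regexes keep state in lastIndex between test() calls; reset it.
    pattern.lastIndex = 0;
    return pattern.test(text);
  });
}

// Pre-filter before AI analysis (`documentText` is assumed to be defined elsewhere).
// These patterns do not cover names or street addresses, so text that fails
// the check can still contain PII.
if (quickPIICheck(documentText)) {
  await relay.workflow("pii-detector").run({ content: documentText });
}
```

Benefits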
- Compliance: Support GDPR, CCPA, and HIPAA compliance by finding regulated data before it is stored or shared
- Risk Reduction: Limit what is exposed if logs or datasets leak
- Safer ML Training: Use production-derived data with PII redacted or anonymized
- Audit Trail: Document PII handling for compliance