Deployment Guide

Learn how to deploy RelayPlane workflows to production across different environments and platforms.

Local Development

Before deploying to production, you'll want to develop and test workflows locally. RelayPlane runs entirely on your machine with no external dependencies required.

Environment Setup

Create a .env file in your project root to store API keys and configuration:

.env
# AI Provider API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...

# Optional: RelayPlane Pro for telemetry
RELAYPLANE_API_KEY=rp-...

# Environment
NODE_ENV=development

# Logging
LOG_LEVEL=debug

Never commit your .env file to version control. Add it to your .gitignore file immediately after creating it.
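
A minimal .gitignore for a workflow project might look like this:

.gitignore
# Secrets
.env

# Dependencies and build output
node_modules/
dist/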

Running Workflows Locally

Use tsx or ts-node to run your workflows during development:

# Install dependencies
npm install

# Run a workflow file directly
npx tsx src/workflows/my-workflow.ts

# Or use ts-node
npx ts-node --esm src/workflows/my-workflow.ts

# With environment variables
npx tsx --env-file=.env src/workflows/my-workflow.ts

Debugging with Console and Metadata

Use console.log and workflow metadata to debug issues:

import { relay } from "@relayplane/workflows";

const result = await relay
  .workflow("debug-example")
  .step("analyze", {
    systemPrompt: "Analyze the input data",
  })
  .with("openai:gpt-4o")
  .run({
    data: "test input",
    // Enable debug metadata
    __debug: true,
  });

// Log full result structure
console.log("Result:", JSON.stringify(result, null, 2));

// Access step-specific outputs
console.log("Analyze output:", result.steps.analyze);

// Check execution metadata
console.log("Duration:", result.metadata?.duration);
console.log("Token usage:", result.metadata?.tokenUsage);

Pro tip: Set LOG_LEVEL=debug in your environment to see detailed execution traces including prompts, model responses, and timing information.
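
For a one-off run, the level can also be set inline when invoking a workflow:

LOG_LEVEL=debug npx tsx --env-file=.env src/workflows/my-workflow.ts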

Production Deployment

When deploying to production, security and reliability become critical. Follow these guidelines to ensure safe operation.

Environment Variables

Store all sensitive configuration in environment variables, never in code:

src/config/env.ts
// Validate required environment variables at startup
function getRequiredEnv(key: string): string {
  const value = process.env[key];
  if (!value) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

export const config = {
  // AI Providers
  openaiApiKey: getRequiredEnv("OPENAI_API_KEY"),
  anthropicApiKey: process.env.ANTHROPIC_API_KEY, // Optional

  // RelayPlane
  relayplaneApiKey: process.env.RELAYPLANE_API_KEY,

  // App config
  nodeEnv: process.env.NODE_ENV || "production",
  logLevel: process.env.LOG_LEVEL || "info",
};

Security Considerations

  • Never commit API keys - Use .gitignore and secret management services
  • Rotate keys regularly - Set calendar reminders to rotate API keys quarterly
  • Use least privilege - Create API keys with minimal required permissions
  • Audit access logs - Monitor who accesses your AI provider accounts
  • Set spending limits - Configure billing alerts and hard caps with your AI providers

API key exposure is a critical security incident. If you accidentally commit API keys, rotate them immediately and check your provider's usage logs for unauthorized access.

Logging Setup

Configure structured logging for production monitoring:

src/utils/logger.ts
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  // Add request context
  mixin() {
    return {
      service: "relayplane-workflows",
      environment: process.env.NODE_ENV,
    };
  },
});

// Usage in workflows
logger.info({ workflowName: "invoice-processor", duration: 1234 }, "Workflow completed");

// In a catch block, where `err` is the caught error
logger.error({ error: err.message, stack: err.stack }, "Workflow failed");

Docker Deployment

Docker provides consistent environments across development and production. Here's a production-ready setup.

Example Dockerfile

Dockerfile
# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./
COPY tsconfig.json ./

# Install all dependencies (dev dependencies are needed for the TypeScript build)
RUN npm ci

# Copy source code
COPY src ./src

# Build TypeScript
RUN npm run build

# Drop dev dependencies before copying node_modules into the runtime image
RUN npm prune --omit=dev

# Production stage
FROM node:20-alpine AS runner

WORKDIR /app

# Create non-root user for security
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 relayplane
USER relayplane

# Copy built application
COPY --from=builder --chown=relayplane:nodejs /app/dist ./dist
COPY --from=builder --chown=relayplane:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=relayplane:nodejs /app/package.json ./

# Set environment
ENV NODE_ENV=production

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "console.log('healthy')" || exit 1

CMD ["node", "dist/index.js"]
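
To build and test the image locally, the standard Docker CLI is enough; the image name here is arbitrary, and secrets are passed at runtime rather than baked into the image:

# Build the image
docker build -t relayplane-workflows .

# Run it, supplying secrets at runtime
docker run --rm --env-file .env relayplane-workflows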

Docker Compose for Multi-Service

Use Docker Compose to orchestrate workflows with databases and queues:

docker-compose.yml
version: '3.8'

services:
  workflows:
    build: .
    environment:
      - NODE_ENV=production
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@db:5432/relayplane
    depends_on:
      - redis
      - db
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1'

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: relayplane
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  redis_data:
  postgres_data:

Container Best Practices

  • Use multi-stage builds - Reduce image size by separating build and runtime
  • Run as non-root - Create dedicated users for better security
  • Pin versions - Use specific image tags, not latest
  • Add health checks - Enable orchestrators to detect unhealthy containers
  • Set resource limits - Prevent runaway containers from affecting other services
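
A .dockerignore file supports several of the points above: it keeps .env and local artifacts out of the build context, which shrinks the image and avoids copying secrets into it. A minimal example:

.dockerignore
node_modules
dist
.env
.git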

Serverless (Lambda/Vercel)

Serverless platforms are great for sporadic workflow execution but require special considerations for AI workloads.

Cold Start Considerations

Cold starts can add 500ms-2s to workflow execution. Minimize impact with these strategies:

src/api/workflow.ts
import { relay } from "@relayplane/workflows";
import type { APIGatewayEvent } from "aws-lambda";

// Initialize outside the handler so configuration persists across invocations
const workflowConfig = {
  // Pre-configure providers to reduce cold start time
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  },
};

// Apply the configuration once per container, not on every invocation
relay.configure(workflowConfig);

// Lambda handler
export async function handler(event: APIGatewayEvent) {
  const input = JSON.parse(event.body || "{}");

  const result = await relay
    .workflow("serverless-workflow")
    .step("process")
    .with("openai:gpt-4o-mini") // Use faster models for serverless
    .run(input);

  return {
    statusCode: 200,
    body: JSON.stringify(result),
  };
}

Timeout Configuration

AI API calls are slow! LLM responses typically take 2-30 seconds depending on the model and prompt complexity. Configure generous timeouts to avoid premature termination.

vercel.json
{
  "functions": {
    "api/workflow.ts": {
      "maxDuration": 60,
      "memory": 1024
    }
  }
}
serverless.yml
functions:
  processWorkflow:
    handler: src/api/workflow.handler
    timeout: 60 # 60 seconds for AI calls
    memorySize: 1024
    environment:
      OPENAI_API_KEY: ${env:OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${env:ANTHROPIC_API_KEY}

Edge Functions Limitations

Edge functions (Vercel Edge, Cloudflare Workers) have restrictions that affect AI workflows:

  • CPU time limits - Edge runtimes typically cap CPU time somewhere between 50ms and 30s, depending on the platform and plan
  • No Node.js APIs - Edge runtime lacks fs, child_process, and other Node modules
  • Memory constraints - Limited to 128MB on some platforms
  • Use serverless instead - For AI workflows, prefer traditional serverless functions

Recommendation: Use edge functions only for lightweight tasks like request routing. Run actual AI workflows in serverless functions or containers.
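
If you do place an edge function in front, keep it to a thin proxy. Here is a minimal sketch of a Vercel edge route that forwards requests to the serverless workflow endpoint; the /api/workflow path and the route filename are assumptions based on the example above:

// api/route-workflow.ts (hypothetical filename) - Vercel Edge runtime
export const config = { runtime: "edge" };

export default async function handler(req: Request): Promise<Response> {
  // Forward the body to the serverless function; no AI calls run in the edge runtime
  const upstream = await fetch(new URL("/api/workflow", req.url), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: await req.text(),
  });

  return new Response(upstream.body, { status: upstream.status });
}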

Monitoring & Observability

Production workflows need comprehensive monitoring to track performance, costs, and errors.

Telemetry Integration with RelayPlane Pro

RelayPlane Pro provides built-in telemetry for all your workflows:

import { relay } from "@relayplane/workflows";

// Enable telemetry with your API key
relay.configure({
  apiKey: process.env.RELAYPLANE_API_KEY,
  telemetry: {
    enabled: true,
    // Send traces for debugging
    traces: true,
    // Track token usage for cost monitoring
    tokenUsage: true,
    // Include custom metadata
    metadata: {
      environment: process.env.NODE_ENV,
      version: process.env.APP_VERSION,
    },
  },
});

// Telemetry is automatically collected for all workflows
const result = await relay
  .workflow("monitored-workflow")
  .step("process")
  .with("openai:gpt-4o")
  .run(input);

// View traces and metrics at dashboard.relayplane.com

Key Metrics to Track

Monitor these metrics for healthy production workflows:

  • Execution duration - Track p50, p95, p99 latencies per workflow
  • Token usage - Monitor input/output tokens per model for cost control
  • Error rate - Track failures by workflow, step, and error type
  • Throughput - Workflows executed per minute/hour
  • Cost per workflow - Calculate from token usage and provider pricing
  • Rate limit hits - Track when you're hitting provider limits

src/monitoring/metrics.ts
import { metrics } from "@opentelemetry/api";
// Assumes the workflows package exports a WorkflowResult type; adjust the import to your setup
import type { WorkflowResult } from "@relayplane/workflows";

const meter = metrics.getMeter("relayplane-workflows");

// Create metrics
const workflowDuration = meter.createHistogram("workflow.duration", {
  description: "Workflow execution duration in milliseconds",
  unit: "ms",
});

const tokenUsage = meter.createCounter("workflow.tokens", {
  description: "Total tokens used",
});

const workflowErrors = meter.createCounter("workflow.errors", {
  description: "Total workflow errors",
});

// Record metrics after workflow execution
export function recordWorkflowMetrics(result: WorkflowResult) {
  workflowDuration.record(result.metadata.duration, {
    workflow: result.workflowName,
  });

  tokenUsage.add(result.metadata.tokenUsage.total, {
    workflow: result.workflowName,
    model: result.metadata.model,
  });

  if (result.error) {
    workflowErrors.add(1, {
      workflow: result.workflowName,
      errorType: result.error.name,
    });
  }
}

Alerting Recommendations

Set up alerts for these critical conditions:

  • Error rate > 5% - Investigate immediately, may indicate model issues
  • p95 latency > 30s - Workflows taking too long, consider optimization
  • Daily cost > budget - Prevent unexpected billing surprises
  • Rate limits hit - Need to implement queuing or request higher limits
  • Token usage spike - May indicate prompt injection or infinite loops
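
To make the first threshold concrete, an error-rate check might look like the sketch below; the window counts would come from your own metrics store, and notifyOnCall is a placeholder, not a RelayPlane API:

// Hypothetical sliding-window error-rate check
interface WindowCounts {
  total: number;  // workflows executed in the window
  errors: number; // failed workflows in the window
}

export function errorRateExceeded(counts: WindowCounts, threshold = 0.05): boolean {
  if (counts.total === 0) return false;
  return counts.errors / counts.total > threshold;
}

// Example: 80 failures out of 1,200 runs is ~6.7%, which should trigger an alert
// if (errorRateExceeded({ total: 1200, errors: 80 })) notifyOnCall("Workflow error rate above 5%");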

Scaling Considerations

As workflow volume grows, you'll need strategies to handle load while respecting AI provider limits.

Rate Limiting Strategies

AI providers enforce rate limits on requests per minute (RPM) and tokens per minute (TPM). Implement client-side limiting:

src/utils/rate-limiter.ts
import Bottleneck from "bottleneck";
import { relay } from "@relayplane/workflows";

// Create limiters for each provider
const openaiLimiter = new Bottleneck({
  maxConcurrent: 10, // Max parallel requests
  minTime: 100, // Min 100ms between requests (600 RPM)
});

const anthropicLimiter = new Bottleneck({
  maxConcurrent: 5,
  minTime: 200, // 300 RPM
});

// Wrap workflow execution with rate limiting
export async function executeWithRateLimit<T>(
  provider: string,
  fn: () => Promise<T>
): Promise<T> {
  const limiter = provider === "openai" ? openaiLimiter : anthropicLimiter;
  return limiter.schedule(fn);
}

// Usage
const result = await executeWithRateLimit("openai", () =>
  relay.workflow("my-workflow").step("process").with("openai:gpt-4o").run(input)
);

Queue-Based Processing

For high-volume workloads, use a queue to decouple ingestion from processing:

src/queue/workflow-queue.ts
import { Queue, Worker } from "bullmq";
import { relay } from "@relayplane/workflows";

// Create queue
const workflowQueue = new Queue("workflows", {
  connection: { host: "localhost", port: 6379 },
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 5000,
    },
  },
});

// Add jobs to queue
export async function enqueueWorkflow(
  workflowName: string,
  input: Record<string, unknown>
) {
  await workflowQueue.add(workflowName, { workflowName, input });
}

// Process jobs with concurrency control
const worker = new Worker(
  "workflows",
  async (job) => {
    const { workflowName, input } = job.data;

    const result = await relay
      .workflow(workflowName)
      .step("process")
      .with("openai:gpt-4o")
      .run(input);

    return result;
  },
  {
    connection: { host: "localhost", port: 6379 },
    concurrency: 5, // Process 5 workflows in parallel
  }
);

worker.on("completed", (job) => {
  console.log(`Workflow ${job.id} completed`);
});

worker.on("failed", (job, err) => {
  console.error(`Workflow ${job?.id} failed:`, err);
});

Batch Workflows

Process multiple items efficiently with batch workflows:

import { relay } from "@relayplane/workflows";

// Batch process items with controlled concurrency
async function processBatch(items: string[], batchSize = 10) {
  const results = [];

  // Process in batches
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);

    // Process batch items in parallel
    const batchResults = await Promise.all(
      batch.map((item) =>
        relay
          .workflow("batch-processor")
          .step("process")
          .with("openai:gpt-4o-mini") // Use faster model for batches
          .run({ item })
      )
    );

    results.push(...batchResults);

    // Optional: Add delay between batches to stay under rate limits
    if (i + batchSize < items.length) {
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }

  return results;
}

// Usage
const items = ["item1", "item2", "item3", /* ... hundreds more */];
const results = await processBatch(items, 10);

Cost optimization: For batch processing, use gpt-4o-mini or claude-3-haiku, which are 10-20x cheaper than larger models. Reserve powerful models for complex tasks.
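
One way to apply this is a small routing helper that sends simple items to the cheaper model. This is a minimal sketch using the same relay API as the examples above; the length-based heuristic and model choices are illustrative:

import { relay } from "@relayplane/workflows";

// Illustrative heuristic: short, routine items go to the cheaper model
function pickModel(item: string): string {
  return item.length < 2000 ? "openai:gpt-4o-mini" : "openai:gpt-4o";
}

export async function processCostAware(item: string) {
  return relay
    .workflow("cost-aware-processor")
    .step("process")
    .with(pickModel(item))
    .run({ item });
}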

Next Steps