ai-agents · agent-ops · security · relayplane · build-in-public

Let Your Agents Cook

Matt Turley · 4 min read

Most people building with AI agents skip the boring part.

They wire up a swarm, let it produce output, and ship whatever comes out. No review. No verification. Just vibes.

I did that too. Then my security agent caught live API credentials committed to git. By an agent I built.

So now every agent output goes through a mandatory pipeline before it touches production. Three pipelines, no exceptions.

The Pipelines

Code: @coder builds, @sentinel reviews security, @verifier validates tests. Only then does it ship.

Content: @writer drafts, verification catches hallucinations and wrong URLs, then approval.

Research: @hunter/@scout gather signals, findings get distilled into the backlog. No raw research gets acted on.
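In data terms, each pipeline is just an ordered list of steps. Here's a minimal sketch; the dict layout and the `next_step` helper are illustrative, not the actual implementation, and the step names for content and research are paraphrased from the descriptions above:

```python
from typing import Optional

# Ordered step lists per pipeline. Agent names match the post;
# the structure itself is a hypothetical sketch.
PIPELINES = {
    "code":     ["coder", "sentinel", "verifier"],
    "content":  ["writer", "verification", "approval"],
    "research": ["hunter", "distill"],
}

def next_step(pipeline: str, completed: str) -> Optional[str]:
    """Return the step after `completed`, or None when the pipeline is done."""
    steps = PIPELINES[pipeline]
    i = steps.index(completed)
    return steps[i + 1] if i + 1 < len(steps) else None
```

Keeping the pipelines as data means the orchestrator never hard-codes "what comes after @coder"; it just looks it up.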

An orchestrator runs every 30 minutes. It checks completions, triggers the next pipeline step, and dispatches new work. When @sentinel fails a review, the fix gets auto-dispatched, re-reviewed, and ships only when clean. No human in the loop.

Why This Matters

Agents make the same classes of mistakes humans do. They commit secrets. They fail open on error paths. They default to permissive when they should default to restrictive.

Except agents do it faster and more confidently.

Today @sentinel flagged fail-open auth and a null-check bug that would've given users perpetual free access to RelayPlane. Caught. Fixed. Re-reviewed. Shipped clean.

The pipeline overhead is minimal compared to the cost of shipping a security hole. The security agent adds maybe 3 minutes per code task. It's caught fail-open auth logic, null checks that bypass tier enforcement, and a credential leak.

What We Learned

Specialization is key. The coding agent should not review its own code. The writing agent should not fact-check its own claims. Separate agents with separate system prompts for each role.

The self-correction loop matters more than the initial review. When @sentinel fails a review, the system automatically dispatches a fix, then re-reviews. Some tasks go through two @sentinel reviews before shipping clean. No human intervention needed.
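That loop can be sketched in a few lines. `review` and `dispatch_fix` here are hypothetical callables standing in for the real @sentinel review and the auto-dispatched fix; the three-round cap is my assumption, not something stated above:

```python
MAX_ROUNDS = 3  # assumed cap; the real system's limit isn't specified

def review_loop(task, review, dispatch_fix):
    """Re-review until clean: fail -> auto-dispatch a fix -> review again."""
    for round_num in range(1, MAX_ROUNDS + 1):
        result = review(task)
        if result["passed"]:
            return {"shipped": True, "rounds": round_num}
        # Review failed: dispatch a fix attempt and loop back for re-review.
        task = dispatch_fix(task, result["findings"])
    # Never ship dirty output; escalate after exhausting rounds.
    return {"shipped": False, "rounds": MAX_ROUNDS}
```

The key property: the only exit that ships is a passing review, so a failed round can never leak into production.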

Pipeline overhead is not the bottleneck. Shipping a security hole is the bottleneck. Review adds minutes, not hours.

The Credential Incident

Early on we let agents ship output directly. That ended when our security review agent caught live API credentials committed to git by the coding agent. Credentials for Reddit and Twitter APIs, sitting in git history with no .gitignore to prevent it.

We immediately added mandatory pipelines for everything. Not optional. Not “for important stuff.” Everything.

The agents do the work. The pipeline makes sure it's safe to ship. That's the deal.

Setup

If you're running agent infrastructure, the orchestrator pattern is straightforward:

  1. Agents complete tasks and report results to a shared log
  2. Orchestrator cron picks up completions and triggers the next pipeline step
  3. Each pipeline step runs in isolation with a fresh agent instance
  4. Only when all steps pass does output reach production
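The four steps above can be sketched as a single cron tick. The `Completion` record and the in-memory lists are stand-ins for the shared log and the real dispatch mechanism, and the pipeline definition is illustrative:

```python
from dataclasses import dataclass

# Illustrative pipeline; in practice this would cover code, content, and research.
PIPELINES = {"code": ["coder", "sentinel", "verifier"]}

@dataclass
class Completion:
    """One 'step finished' entry in the shared log."""
    pipeline: str
    step: str
    task: str
    processed: bool = False

def orchestrator_tick(log, dispatched, production):
    """One cron pass: pick up completions, trigger next steps, promote finished work."""
    for entry in log:
        if entry.processed:
            continue
        steps = PIPELINES[entry.pipeline]
        i = steps.index(entry.step)
        if i + 1 < len(steps):
            # Next step runs in isolation with a fresh agent instance.
            dispatched.append((steps[i + 1], entry.task))
        else:
            # All steps passed: only now does output reach production.
            production.append(entry.task)
        entry.processed = True
```

Because each tick is idempotent over unprocessed entries, the cron can run as often as you like without double-dispatching work.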

RelayPlane attaches cost visibility and budget limits to every agent call, so the orchestrator can also track spending per pipeline run. When a task would blow the budget, it blocks before any tokens are burned.
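A budget gate like that reduces to a pre-dispatch check. This is an illustrative sketch, not RelayPlane's actual API; `estimate_cost` and `run_agent` are hypothetical callables:

```python
def dispatch_with_budget(task, estimate_cost, spent, budget, run_agent):
    """Block a task before any tokens are burned if it would exceed the budget."""
    projected = spent + estimate_cost(task)
    if projected > budget:
        # Blocked pre-dispatch: nothing was sent to the provider.
        return {"status": "blocked", "spent": spent}
    result = run_agent(task)
    return {"status": "done", "spent": spent + result["cost"]}
```

The order matters: the estimate runs before the agent call, so an over-budget task costs zero rather than "whatever it burned before we noticed."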

If you're shipping raw agent output to production, the question isn't whether something will go wrong. It's when, and how bad.


Running agents at scale? RelayPlane handles budget enforcement and multi-provider routing so your agents can cook without burning the kitchen down.