AI Model Latency Benchmarks

Real-time latency benchmarks for AI model providers, measured from US-East over the same BYOK (bring-your-own-key) request path your workflows use.

Last updated: February 2026

Benchmarks run every 6 hours. Data shown is from standardized prompts to ensure fair comparison across providers.
| Model | Provider | P50 Latency | P95 Latency | Tokens/sec | Success Rate |
| --- | --- | --- | --- | --- | --- |
| gemini-2.0-flash (fastest) | google | 178ms | 312ms | 156.3 | 99.8% |
| gpt-4o-mini | openai | 245ms | 412ms | 125.5 | 99.9% |
| claude-3.5-haiku | anthropic | 289ms | 478ms | 118.2 | 99.8% |
| claude-4-sonnet | anthropic | 478ms | 845ms | 82.1 | 99.8% |
| gpt-4.1 | openai | 485ms | 856ms | 82.7 | 99.9% |
| gpt-4o | openai | 512ms | 892ms | 78.4 | 99.9% |
| gemini-1.5-pro | google | 623ms | 1124ms | 65.3 | 99.5% |
| claude-4-opus | anthropic | 892ms | 1456ms | 48.2 | 99.7% |

Methodology

  • Standardized prompts: Short prompts (~50 tokens) eliciting ~100-token responses
  • Sample size: 10 requests per model per run; P50 and P95 are computed over these samples (see the sketch after this list)
  • Frequency: Benchmarks run every 6 hours
  • Location: US-East (Virginia)
  • Keys: Benchmarks use RelayPlane's own API keys and traverse the same request path as your BYOK workflows
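
For concreteness, here is a minimal sketch of how P50 and P95 could be derived from one run's samples. The nearest-rank percentile method and the latency values below are assumptions for illustration, not RelayPlane's documented aggregation.

```typescript
// Nearest-rank percentile over one benchmark run's latency samples.
// NOTE: illustrative only; RelayPlane's exact aggregation may differ.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical latencies (ms) from one run of 10 requests.
const latenciesMs = [172, 175, 176, 178, 178, 181, 185, 240, 290, 315];

console.log(`P50: ${percentile(latenciesMs, 50)}ms`); // P50: 178ms
console.log(`P95: ${percentile(latenciesMs, 95)}ms`); // P95: 315ms
```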

API Access

Access benchmark data programmatically:

```
GET https://api.relayplane.com/api/v1/benchmarks
```

Returns JSON with latest benchmark results. No authentication required.
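
A minimal sketch of fetching and printing the results. The response schema isn't documented here, so the field names below (`model`, `provider`, `p50_ms`) are assumptions; check the live response for the real shape.

```typescript
// Fetch the latest benchmark results (no authentication required).
// NOTE: field names below are assumed for illustration; the actual
// response schema may differ.
interface BenchmarkEntry {
  model: string;
  provider: string;
  p50_ms: number;
}

async function fetchBenchmarks(): Promise<BenchmarkEntry[]> {
  const res = await fetch("https://api.relayplane.com/api/v1/benchmarks");
  if (!res.ok) throw new Error(`Benchmark API returned ${res.status}`);
  return (await res.json()) as BenchmarkEntry[];
}

fetchBenchmarks().then((entries) => {
  for (const e of entries) {
    console.log(`${e.model} (${e.provider}): P50 ${e.p50_ms}ms`);
  }
});
```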

Route to the fastest model automatically

RelayPlane can automatically select the fastest provider for your workflows based on real-time benchmarks.
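
As an illustration of what that selection amounts to, here is a sketch that picks the lowest-P50 model from benchmark entries. It reuses the assumed `p50_ms` field name from the previous example and is not RelayPlane's actual routing logic, which may also weigh success rate, cost, or other signals.

```typescript
// Pick the lowest-P50 model from a set of benchmark entries.
interface Candidate {
  model: string;
  p50_ms: number; // assumed field name, as in the previous sketch
}

function fastestModel(entries: Candidate[]): Candidate {
  return entries.reduce((best, e) => (e.p50_ms < best.p50_ms ? e : best));
}

// Values taken from the benchmark table above.
const candidates: Candidate[] = [
  { model: "gemini-2.0-flash", p50_ms: 178 },
  { model: "gpt-4o-mini", p50_ms: 245 },
  { model: "claude-3.5-haiku", p50_ms: 289 },
];

console.log(fastestModel(candidates).model); // -> gemini-2.0-flash
```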