AI Model Latency Benchmarks

Real-time latency benchmarks for AI model providers, measured from US-East over the same BYOK (bring-your-own-key) request path your workflows use.

Last updated: February 2026

Benchmarks run every 6 hours. Data shown is from standardized prompts to ensure fair comparison across providers.
| Model | Provider | P50 Latency | P95 Latency | Tokens/sec | Success Rate |
| --- | --- | --- | --- | --- | --- |
| gemini-2.0-flash (fastest) | google | 178ms | 312ms | 156.3 | 99.8% |
| gpt-4o-mini | openai | 245ms | 412ms | 125.5 | 99.9% |
| claude-3.5-haiku | anthropic | 289ms | 478ms | 118.2 | 99.8% |
| claude-4-sonnet | anthropic | 478ms | 845ms | 82.1 | 99.8% |
| gpt-4.1 | openai | 485ms | 856ms | 82.7 | 99.9% |
| gpt-4o | openai | 512ms | 892ms | 78.4 | 99.9% |
| gemini-1.5-pro | google | 623ms | 1124ms | 65.3 | 99.5% |
| claude-4-opus | anthropic | 892ms | 1456ms | 48.2 | 99.7% |

Methodology

  • Standardized prompts: Short prompts (~50 tokens) eliciting ~100-token responses
  • Sample size: 10 requests per model per run; P50 and P95 are computed over these samples (see the sketch after this list)
  • Frequency: Benchmarks run every 6 hours
  • Location: US-East (Virginia)
  • Keys: Benchmarks use RelayPlane's own API keys and traverse the same request path as your BYOK workflows
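
For concreteness, here is a minimal sketch of how P50 and P95 could be derived from one run's samples. The nearest-rank percentile method and the latency values below are assumptions for illustration, not RelayPlane's documented aggregation.

```typescript
// Nearest-rank percentile over one benchmark run's latency samples.
// NOTE: illustrative only; RelayPlane's exact aggregation may differ.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical latencies (ms) from one run of 10 requests.
const latenciesMs = [172, 175, 176, 178, 178, 181, 185, 240, 290, 315];

console.log(`P50: ${percentile(latenciesMs, 50)}ms`); // P50: 178ms
console.log(`P95: ${percentile(latenciesMs, 95)}ms`); // P95: 315ms
```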

API Access

Access benchmark data programmatically:

```
GET https://api.relayplane.com/api/v1/benchmarks
```

Returns JSON with latest benchmark results. No authentication required.
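
A minimal sketch of fetching and printing the results. The response schema isn't documented here, so the field names below (`model`, `provider`, `p50_ms`) are assumptions; check the live response for the real shape.

```typescript
// Fetch the latest benchmark results (no authentication required).
// NOTE: field names below are assumed for illustration; the actual
// response schema may differ.
interface BenchmarkEntry {
  model: string;
  provider: string;
  p50_ms: number;
}

async function fetchBenchmarks(): Promise<BenchmarkEntry[]> {
  const res = await fetch("https://api.relayplane.com/api/v1/benchmarks");
  if (!res.ok) throw new Error(`Benchmark API returned ${res.status}`);
  return (await res.json()) as BenchmarkEntry[];
}

fetchBenchmarks().then((entries) => {
  for (const e of entries) {
    console.log(`${e.model} (${e.provider}): P50 ${e.p50_ms}ms`);
  }
});
```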

Route to the fastest model automatically

RelayPlane can automatically select the fastest provider for your workflows based on real-time benchmarks.
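
As an illustration of what that selection amounts to, here is a sketch that picks the lowest-P50 model from benchmark entries. It reuses the assumed `p50_ms` field name from the previous example and is not RelayPlane's actual routing logic, which may also weigh success rate, cost, or other signals.

```typescript
// Pick the lowest-P50 model from a set of benchmark entries.
interface Candidate {
  model: string;
  p50_ms: number; // assumed field name, as in the previous sketch
}

function fastestModel(entries: Candidate[]): Candidate {
  return entries.reduce((best, e) => (e.p50_ms < best.p50_ms ? e : best));
}

// Values taken from the benchmark table above.
const candidates: Candidate[] = [
  { model: "gemini-2.0-flash", p50_ms: 178 },
  { model: "gpt-4o-mini", p50_ms: 245 },
  { model: "claude-3.5-haiku", p50_ms: 289 },
];

console.log(fastestModel(candidates).model); // -> gemini-2.0-flash
```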