Real-time latency benchmarks for AI model providers. Measured from US-East with your own API keys (BYOK).
Last updated: February 2026
| Model | Provider | P50 Latency | P95 Latency | Tokens/sec | Success Rate |
|---|---|---|---|---|---|
| gemini-2.0-flash | google | 178ms | 312ms | 156.3 | 99.8% |
| gpt-4o-mini | openai | 245ms | 412ms | 125.5 | 99.9% |
| claude-3.5-haiku | anthropic | 289ms | 478ms | 118.2 | 99.8% |
| claude-4-sonnet | anthropic | 478ms | 845ms | 82.1 | 99.8% |
| gpt-4.1 | openai | 485ms | 856ms | 82.7 | 99.9% |
| gpt-4o | openai | 512ms | 892ms | 78.4 | 99.9% |
| gemini-1.5-pro | google | 623ms | 1124ms | 65.3 | 99.5% |
| claude-4-opus | anthropic | 892ms | 1456ms | 48.2 | 99.7% |
Access benchmark data programmatically:
`GET https://api.relayplane.com/api/v1/benchmarks`

Returns JSON with the latest benchmark results. No authentication required.
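A minimal sketch of fetching the benchmark feed with the standard `fetch` API. The endpoint URL is the one documented above; the `BenchmarkEntry` type is an assumed response shape for illustration, not a documented schema.

```typescript
// Assumed shape of one benchmark entry; field names are illustrative only.
type BenchmarkEntry = {
  model: string;
  provider: string;
  p50_ms: number;
  p95_ms: number;
  tokens_per_sec: number;
  success_rate: number;
};

// Fetch the public benchmark feed (no authentication required).
async function fetchBenchmarks(): Promise<BenchmarkEntry[]> {
  const res = await fetch("https://api.relayplane.com/api/v1/benchmarks");
  if (!res.ok) {
    throw new Error(`Benchmark request failed: ${res.status}`);
  }
  return (await res.json()) as BenchmarkEntry[];
}
```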
RelayPlane can automatically select the fastest provider for your workflows based on real-time benchmarks.
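For comparison, a hypothetical client-side sketch of the same idea: picking the lowest-P50 model from the benchmark feed, reusing the assumed `BenchmarkEntry` shape and `fetchBenchmarks` helper from the previous example. This is not RelayPlane's routing logic, just an illustration of how the data could drive model selection.

```typescript
// Pick the model with the lowest P50 latency among entries that meet a
// minimum success-rate threshold (99.5% by default).
function fastestModel(
  entries: BenchmarkEntry[],
  minSuccessRate = 99.5,
): BenchmarkEntry | undefined {
  return entries
    .filter((e) => e.success_rate >= minSuccessRate)
    .sort((a, b) => a.p50_ms - b.p50_ms)[0];
}

fetchBenchmarks().then((entries) => {
  const best = fastestModel(entries);
  console.log(`Fastest eligible model: ${best?.model} (${best?.p50_ms}ms P50)`);
});
```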