
How to Track LLM API Costs Per Request in Node.js

Matt Turley · 7 min read

The typical LLM cost story goes like this: you ship a feature, usage grows, and then at the end of the month you get an invoice and spend an afternoon trying to figure out which customers, features, or code paths drove the number. By then, you can't do anything about it.

The fix is per-request cost tracking, wired up from day one. This post walks through what that looks like in practice, and how to do it without building an accounting system alongside your product.


The Problem Is Timing

LLM billing is aggregated. OpenAI, Anthropic, Google — they all send you one number per model per month. If you're running a multi-tenant app, that number tells you nothing about which tenant is expensive.

You need cost data at the request level, attributed to something meaningful (user ID, tenant ID, feature name), stored somewhere you can query.

Here's what most teams build first:

async function callLLM(prompt, userId) {
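  // Assumes an initialized `openai` client and a `db` handle with query() in scope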
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });

  const inputTokens = response.usage.prompt_tokens;
  const outputTokens = response.usage.completion_tokens;

  // Hardcoded prices that will be wrong after the next OpenAI repricing
  const costUsd = (inputTokens * 0.000005) + (outputTokens * 0.000015);

  await db.query(
    'INSERT INTO llm_usage (user_id, model, input_tokens, output_tokens, cost_usd, created_at) VALUES (?, ?, ?, ?, ?, NOW())',
    [userId, 'gpt-4o', inputTokens, outputTokens, costUsd]
  );

  return response;
}

This works. Until you add a second model and forget to update the price constants. Or until OpenAI reprices and every record you write afterward uses stale numbers. Or until a teammate adds a callLLMDirect() function that bypasses the wrapper and leaves a gap in your data.
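
The usual mitigation is to centralize prices in a single lookup table instead of scattering constants. A sketch of that pattern (the numbers are illustrative, not current list prices):

// One price entry per model, in USD per token. Still manually maintained:
// every repricing and every new model means editing this object.
const PRICES = {
  'gpt-4o': { input: 0.0000025, output: 0.00001 },
  'gpt-4o-mini': { input: 0.00000015, output: 0.0000006 },
};

function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`No price entry for model: ${model}`);
  return inputTokens * price.input + outputTokens * price.output;
}

This centralizes the problem without removing it: someone still has to notice every repricing and ship the update.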


The Multi-Tenant Shape

In a multi-tenant app, cost tracking needs to answer questions like:

  • How much did tenant acme-corp spend this month?
  • Which feature is driving the most token usage?
  • Is user X's usage outside the norm for their plan?
  • Which model switch would save the most money at current usage patterns?

That means your cost records need attributes beyond tokens. A minimal schema looks like:

{
  requestId: 'req_abc123',
  tenantId: 'acme-corp',
  userId: 'user_456',
  feature: 'document-summarizer',
  model: 'gpt-4o',
  inputTokens: 1240,
  outputTokens: 380,
  costUsd: 0.007900,
  latencyMs: 1840,
  timestamp: '2026-03-12T10:22:14Z',
}

Keeping this accurate across multiple models, providers, and price changes is the part that becomes a maintenance burden.
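
As a sketch, assuming those records land in the llm_usage table from earlier (extended with tenant_id and feature columns), the first question on the list becomes one query:

// Monthly spend for one tenant, broken down by feature.
// Uses the same MySQL-style db helper as the wrapper above.
const rows = await db.query(
  `SELECT feature, SUM(cost_usd) AS spend_usd, COUNT(*) AS requests
   FROM llm_usage
   WHERE tenant_id = ? AND created_at >= DATE_FORMAT(NOW(), '%Y-%m-01')
   GROUP BY feature
   ORDER BY spend_usd DESC`,
  ['acme-corp']
);

The other questions fall out of the same table with different filters and GROUP BY columns.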


Using @relayplane/proxy for Per-Request Cost Tracking

@relayplane/proxy runs as a local proxy that sits between your app and the LLM APIs. Every request flows through it, gets cost-estimated at current prices using maintained pricing tables, and gets logged to a local SQLite database.

Setup takes about 2 minutes:

npm install -g @relayplane/proxy
relayplane init
relayplane start

Then point your OpenAI client at the proxy:

import OpenAI from 'openai';

const openai = new OpenAI({
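  // No apiKey override needed here: the SDK falls back to the OPENAI_API_KEY env var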
  baseURL: 'http://localhost:4100/v1',
});

That's all you change. Your existing API calls work identically.
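
If you're calling Anthropic instead, it's the same one-line change; as noted in Get Started below, the Anthropic client points at the proxy root rather than the /v1 path:

import Anthropic from '@anthropic-ai/sdk';

// Only the base URL changes; requests and responses are otherwise untouched
const anthropic = new Anthropic({
  baseURL: 'http://localhost:4100',
});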

Every request the proxy handles gets written to a local SQLite database with the following schema: id, prompt, system_prompt, task_type, model, success, output, error, duration_ms, tokens_in, tokens_out, cost_usd, metadata, created_at. The proxy automatically classifies task type from the prompt content and tracks cost against maintained pricing tables. No extra code in your application layer.


Querying Cost Data

The proxy's built-in dashboard (open http://localhost:4100 in your browser) shows spend by model, latency distributions, and error rates. For CLI stats, run relayplane stats. For programmatic access, query the SQLite database directly:

import Database from 'better-sqlite3';
import path from 'path';
import os from 'os';

const db = new Database(path.join(os.homedir(), '.relayplane', 'data.db'));

// Spend by model this month
function getMonthlySpendByModel() {
  return db.prepare(`
    SELECT
      model,
      SUM(cost_usd) as total_cost,
      SUM(tokens_in + tokens_out) as total_tokens,
      COUNT(*) as request_count
    FROM runs
    WHERE created_at >= date('now', 'start of month')
    GROUP BY model
    ORDER BY total_cost DESC
  `).all();
}

// Recent requests with cost breakdown
function getRecentRequests(limit = 50) {
  return db.prepare(`
    SELECT
      model,
      task_type,
      tokens_in,
      tokens_out,
      cost_usd,
      duration_ms,
      created_at
    FROM runs
    ORDER BY created_at DESC
    LIMIT ?
  `).all(limit);
}

This gives you real cost data without building a separate data pipeline. For per-tenant or per-feature attribution in a multi-tenant app, you'll want to add your own logging layer on top that records the tenant/user context alongside the request ID.
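
A minimal sketch of that layer, assuming a tenant_usage table in your own database (the proxy stays the source of truth for cost; this table carries the attribution):

// Hypothetical wrapper: records who asked, for which feature, with token
// counts from the response, so spend can be joined back to tenants later.
async function callWithAttribution({ tenantId, userId, feature, prompt }) {
  const started = Date.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });

  await db.query(
    `INSERT INTO tenant_usage
       (tenant_id, user_id, feature, model, input_tokens, output_tokens, latency_ms, created_at)
     VALUES (?, ?, ?, ?, ?, ?, ?, NOW())`,
    [tenantId, userId, feature, response.model,
     response.usage.prompt_tokens, response.usage.completion_tokens,
     Date.now() - started]
  );

  return response;
}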


Pricing Table Accuracy

The hardcoded-constants approach breaks when providers reprice. GPT-4o has been repriced multiple times. Claude prices differ between input and output tokens, and across Haiku, Sonnet, and Opus they span more than an order of magnitude.

@relayplane/proxy ships maintained pricing tables for all 11 supported providers as part of the package. When a model reprices, a package update picks up the new numbers. You update the package, not your application code.

npm update -g @relayplane/proxy

That's the entire update path for keeping cost math accurate.


What This Replaces

Without a proxy, per-request cost tracking requires:

  • A wrapper function around every LLM call
  • Hardcoded pricing constants per model per provider
  • A logging sink (database, log service, analytics tool)
  • Maintenance every time pricing changes or you add a model

With @relayplane/proxy, the proxy layer handles all of that. Your application code stays focused on business logic. You stop writing and maintaining LLM infrastructure.

For teams that are just now building cost attribution into their product, starting with the proxy approach means never having the hardcoded-constants problem in the first place.


Get Started

npm install -g @relayplane/proxy
relayplane init
relayplane start

Point your OpenAI client at http://localhost:4100/v1 (or Anthropic client at http://localhost:4100). Costs start appearing in the dashboard immediately. The package is MIT licensed and the data stays local in SQLite.

Full docs at relayplane.com. If you're building a multi-tenant app and want to know what your LLM spend looks like per customer before the month-end invoice, this is the fastest path to get there.


Matt Turley is a solo developer building RelayPlane. He writes about AI infrastructure, agent workflows, and building in public.