How to Build a Usage-Based Billing System for AI Agents (Step-by-Step)

Flat-rate SaaS billing is wrong for AI agents. A customer paying $99/month for an agent that fires 50,000 times is getting a bargain that eventually kills your margins. A customer paying the same for 20 runs is being overcharged into churn. Usage-based billing, where customers pay in proportion to what they consume and the value they receive, solves both problems. This is the step-by-step guide to building it.

Why usage-based billing matters for AI agents

Traditional SaaS has predictable, roughly uniform resource consumption per user. An AI agent doesn't. The same agent might use 400 tokens on a simple lookup and 40,000 on a complex research task. An outreach agent might fire 10,000 times and convert at 1%: that is 100 runs that produced an outcome and 9,900 that produced nothing, and the two groups are worth very different amounts to the customer.

Flat-rate billing obscures this completely. You're charging the same for all of it, which means:

Light users subsidize power users: your heaviest customers cost more to serve but pay the same.
You can't price for value — an agent that booked a $50,000 meeting should cost more than one that found nothing.
Growth destroys margins — as usage scales, your LLM API costs grow linearly while revenue stays flat.
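To put rough numbers on the points above, here is a small sketch of the flat-rate arithmetic. The $99 price and the 50,000 and 20 run counts come from the intro; the 4,000 tokens per run and the $5 per million tokens blended cost are made-up figures for illustration, not real provider pricing.

```javascript
// Hypothetical cost inputs: only the $99 price and the run counts come from the article.
function flatRateMargin({ monthlyPrice, runs, avgTokensPerRun, costPerMillionTokens }) {
  const llmCost = ((runs * avgTokensPerRun) / 1_000_000) * costPerMillionTokens;
  return {
    llmCost,                          // what the customer's usage costs you
    margin: monthlyPrice - llmCost,   // what is left of the flat fee
    pricePerRun: monthlyPrice / runs, // what the customer effectively pays per run
  };
}

const assumptions = { monthlyPrice: 99, avgTokensPerRun: 4000, costPerMillionTokens: 5 };

const heavy = flatRateMargin({ ...assumptions, runs: 50000 });
const light = flatRateMargin({ ...assumptions, runs: 20 });

console.log(heavy); // llmCost 1000, margin -901: the heavy user loses you money
console.log(light); // llmCost about 0.4, pricePerRun 4.95: the light user overpays per run
```

Under these assumed costs the same $99 plan loses about $900 on one customer and charges the other nearly $5 per run, which is the mismatch usage-based pricing is meant to remove.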

Usage-based billing aligns cost and revenue. Here's how to build it, step by step.

Step 1: Instrument your agent — capturing usage events

Before you can bill for usage, you need to measure it. Every AI agent execution should emit a metering event with four data points: which agent ran, what action it took, how many tokens it consumed, and what the outcome was.

The instrumentation goes at the end of each agent execution — after the LLM response returns, before you return to the caller. Here's the pattern:

Instrument your agent execution javascript

// After your agent runs, emit a meter event
async function runAgent(input) {
  const start = Date.now();

  // Your existing LLM call
  const llmResponse = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: buildPrompt(input),
  });

  const tokensIn  = llmResponse.usage.prompt_tokens;
  const tokensOut = llmResponse.usage.completion_tokens;
  const result    = parseAgentOutput(llmResponse);

  // Meter the execution — this is the billing record
  await fetch('https://rev.polsia.app/v1/meter', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      agent_id:      'my-agent-v1',
      action:        'task.complete',
      tokens_input:  tokensIn,
      tokens_output: tokensOut,
      outcome:       result.success ? 'success' : 'failure',
      metadata: {
        duration_ms: Date.now() - start,
        // input is an object (it carries customerId), so measure its serialized size
        input_length: JSON.stringify(input).length,
        customer_id:  input.customerId,
      },
    }),
  });

  return result;
}

Three instrumentation rules to follow:

Always meter, even on failure. Failed runs still consume compute. If you only meter successes, your billing data is wrong and you're absorbing failure costs silently.
Capture token counts from the LLM response, not estimates. Estimation drifts. Every major LLM provider returns actual usage in the response — read it directly.
Store a customer_id in metadata. You'll need to aggregate by customer for invoicing. Do it at instrumentation time, not retroactively.

Step 2: Choose a metering strategy — real-time vs batch, per-token vs per-call

Not all agents bill the same way. Before wiring up a pricing engine, choose the right metering strategy for your workload.

Strategy	How it works	Best for
Real-time per-call	Meter API called synchronously on every execution. Price computed instantly.	Low-to-medium volume agents where latency is acceptable. Simplest to implement.
Real-time per-token	Token counts sent with each call. Price = base fee + (tokens × rate).	Developer tools, internal agents, or workloads with highly variable token consumption.
Deferred outcome	Call registered as pending. Billing finalized when outcome is confirmed (e.g., reply received, meeting booked).	Async workflows where success isn't known immediately — outreach, approvals, fulfillment.
Batch (async queue)	Meter events buffered and flushed periodically. Useful for very high-throughput agents.	>1,000 calls/minute where synchronous metering adds unacceptable latency.

For most early-stage AI agents, real-time per-call is correct. Add per-token components once you have usage data showing your token consumption is actually variable enough to matter. Add deferred outcomes when you've built reliable outcome detection. Don't over-engineer from day one.
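The deferred outcome strategy in the table can be modeled as a small pending registry: a call is registered as pending, then finalized either when the outcome is confirmed or when it expires. The sketch below is a local, in-memory illustration only. The class name, method names, and the 'expired' outcome label are assumptions made for this example and do not describe Rev's API.

```javascript
// Local, in-memory model of deferred outcomes. All names here are hypothetical.
class PendingOutcomes {
  constructor() {
    this.pending = new Map(); // outcomeId -> { event, expiresAt }
  }

  // Register a call whose outcome is not known yet.
  register(outcomeId, event, ttlMs, now = Date.now()) {
    this.pending.set(outcomeId, { event, expiresAt: now + ttlMs });
  }

  // Finalize a pending call. Returns the final billing record, or null if the
  // id is unknown or was already finalized. A confirmation that arrives after
  // the expiry time finalizes as 'expired' instead of the reported outcome.
  resolve(outcomeId, outcome, now = Date.now()) {
    const entry = this.pending.get(outcomeId);
    if (!entry) return null;
    this.pending.delete(outcomeId);
    const finalOutcome = now > entry.expiresAt ? 'expired' : outcome;
    return { ...entry.event, outcome: finalOutcome };
  }

  // Sweep entries that were never confirmed; they finalize as 'expired'.
  expire(now = Date.now()) {
    const expired = [];
    for (const [id, entry] of this.pending) {
      if (now > entry.expiresAt) {
        this.pending.delete(id);
        expired.push({ ...entry.event, outcome: 'expired' });
      }
    }
    return expired;
  }
}

// Usage: an outreach email is sent now; a reply confirms success later.
const outcomes = new PendingOutcomes();
outcomes.register('out_1', { agent_id: 'outreach-v1', action: 'email.sent' }, 7 * 86400000);
console.log(outcomes.resolve('out_1', 'success'));
```

Passing `now` explicitly keeps the sketch deterministic in tests; a production version would also need durable storage so pending entries survive restarts.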

The hybrid pricing stack: Most production agents end up with three components: a small base fee per call (covers fixed overhead), a token rate (covers variable compute), and an outcome bonus (captures value on success). Configure all three — set any to zero that don't apply yet — and adjust rates as your data matures. See AI Agent Pricing Models Compared for the full breakdown.
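As a rough illustration of the three-part stack, the price of a single call is the base fee, plus tokens times the token rate, plus the outcome bonus when the call succeeded. The function name, the rate field names, and the rate values below are assumptions chosen for the example; real rates should come from your own cost and usage data.

```javascript
// Hypothetical rates and field names, for illustration only.
function priceCall(event, rates) {
  const tokens = event.tokens_input + event.tokens_output;
  const tokenCharge = (tokens / 1000) * rates.per1kTokens;
  const bonus = event.outcome === 'success' ? rates.outcomeBonus : 0;
  // Round to 4 decimal places so tiny floating-point noise does not leak into charges
  return Math.round((rates.baseFee + tokenCharge + bonus) * 10000) / 10000;
}

const rates = { baseFee: 0.01, per1kTokens: 0.002, outcomeBonus: 0.5 };
const call = { tokens_input: 600, tokens_output: 80 };

console.log(priceCall({ ...call, outcome: 'success' }, rates)); // 0.5114
console.log(priceCall({ ...call, outcome: 'failure' }, rates)); // 0.0114

// Components that don't apply yet are simply set to zero
console.log(priceCall({ ...call, outcome: 'success' }, { baseFee: 0.01, per1kTokens: 0, outcomeBonus: 0 })); // 0.01
```

A production billing engine would more likely keep amounts in integer minor units (cents or micro-dollars) rather than floats; the rounding here is a simplification.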

Step 3: Build the billing pipeline — aggregation, invoicing, payment collection

The billing pipeline is what sits between raw meter events and money in your account. It has three stages:

Aggregation

Raw meter events need to be aggregated per customer per billing period. This is where you compute: how many calls did customer X make this month? How many tokens? How many successful outcomes?

Query aggregated usage per customer javascript

// Using Rev's analytics API to get per-customer usage
const response = await fetch(
  'https://rev.polsia.app/v1/analytics?period=monthly&customer_id=cust_123',
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);

const usage = await response.json();
// usage.total_calls:   4,821
// usage.total_tokens:  9,304,000
// usage.success_count: 142
// usage.total_charged: 284.30  ← computed by pricing engine on each meter call
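If you aggregate raw events yourself rather than querying an analytics endpoint, the core of it is a group-by on customer and billing period. The sketch below assumes each event carries `metadata.customer_id` (as in the Step 1 snippet) and an ISO 8601 `recorded_at` timestamp; the function name and the output shape are hypothetical.

```javascript
// Groups raw meter events by customer and calendar month (UTC).
// Assumes recorded_at is an ISO 8601 string such as '2025-01-05T10:00:00.000Z'.
function aggregateUsage(events) {
  const totals = new Map();
  for (const e of events) {
    const customerId = e.metadata.customer_id;
    const period = e.recorded_at.slice(0, 7); // 'YYYY-MM'
    const key = `${customerId}|${period}`;
    const t = totals.get(key) ?? {
      customer_id: customerId,
      period,
      total_calls: 0,
      total_tokens: 0,
      success_count: 0,
    };
    t.total_calls += 1;
    t.total_tokens += e.tokens_input + e.tokens_output;
    if (e.outcome === 'success') t.success_count += 1;
    totals.set(key, t);
  }
  return [...totals.values()];
}

const sampleEvents = [
  { metadata: { customer_id: 'cust_1' }, recorded_at: '2025-01-05T10:00:00.000Z', tokens_input: 100, tokens_output: 50, outcome: 'success' },
  { metadata: { customer_id: 'cust_1' }, recorded_at: '2025-01-20T10:00:00.000Z', tokens_input: 200, tokens_output: 20, outcome: 'failure' },
  { metadata: { customer_id: 'cust_2' }, recorded_at: '2025-02-01T10:00:00.000Z', tokens_input: 10, tokens_output: 5, outcome: 'success' },
];
console.log(aggregateUsage(sampleEvents));
```

At real volume this would normally be a database query or a streaming job rather than an in-memory loop, but the grouping logic is the same.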

Invoicing

Once you have aggregated usage, you need to turn it into an invoice. The simplest approach for early-stage products is to use Stripe's metered billing: you report usage to Stripe during the billing period, and at period end Stripe handles the invoice math, PDF generation, and payment collection. Rev handles the Stripe integration end-to-end if you don't want to build this yourself.

Payment collection

Two patterns work here:

Postpaid (monthly invoice): Accumulate usage, invoice at period end, collect payment. Lower friction for customers, higher default risk for you. Works for enterprise or high-trust customer segments.
Prepaid (credit balance): Customers buy credits upfront. Usage draws down the balance. Auto-recharge when balance falls below a threshold. Lower default risk, slightly higher onboarding friction. Better for developer-focused products where customers want predictable spend.
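The prepaid pattern can be sketched as a balance that is drawn down per charge and topped up when it drops below a threshold. Everything here is an assumption for illustration: the class and field names are hypothetical, amounts are integer cents, and the recharge simply adds credit, where a real system would first charge the customer's payment method.

```javascript
// Hypothetical prepaid credit balance, tracked in integer cents.
class CreditBalance {
  constructor({ balanceCents, rechargeThresholdCents, rechargeAmountCents }) {
    this.balanceCents = balanceCents;
    this.rechargeThresholdCents = rechargeThresholdCents;
    this.rechargeAmountCents = rechargeAmountCents;
  }

  // Draws down the balance. Rejects the charge if credit is insufficient,
  // and auto-recharges when the remaining balance falls below the threshold.
  charge(amountCents) {
    if (amountCents > this.balanceCents) {
      return { ok: false, recharged: false };
    }
    this.balanceCents -= amountCents;
    let recharged = false;
    if (this.balanceCents < this.rechargeThresholdCents) {
      // A real implementation would collect payment here before adding credit
      this.balanceCents += this.rechargeAmountCents;
      recharged = true;
    }
    return { ok: true, recharged };
  }
}

const credits = new CreditBalance({
  balanceCents: 1000,
  rechargeThresholdCents: 200,
  rechargeAmountCents: 1000,
});
console.log(credits.charge(500), credits.balanceCents); // ok, no recharge, 500 left
console.log(credits.charge(400), credits.balanceCents); // ok, recharged, 1100 left
```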

For AI agent billing, prepaid credits often outperform postpaid because developers prefer controlling their spend. Rev's pricing plans use a subscription + overage model — a predictable base that customers can budget, with usage-based overages that scale with their actual usage.

Step 4: Handle edge cases — failed calls, retries, timeout billing, disputes

The happy path is straightforward. The edge cases are where billing systems break trust with customers.

Failed calls

If your agent throws an exception or the LLM returns an error, did compute actually run? Usually yes — tokens were consumed before the failure. Bill for the tokens but don't charge outcome bonuses on failures.

Always meter, even on error javascript

async function runAgentWithBilling(input) {
  let tokensIn = 0, tokensOut = 0;
  let outcome = 'failure';
  let result = null;

  try {
    const llmResponse = await openai.chat.completions.create({ ... });
    tokensIn  = llmResponse.usage.prompt_tokens;
    tokensOut = llmResponse.usage.completion_tokens;
    result    = parseAgentOutput(llmResponse);
    outcome   = result.success ? 'success' : 'failure';
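  } catch (err) {
    // If the LLM call itself failed, no usage was returned, so fall back to an
    // estimate of the input tokens. If the failure came later (for example in
    // parseAgentOutput), keep the real counts already read from the response.
    // estimateInputTokens and meterCall are helpers you supply.
    if (tokensIn === 0) tokensIn = estimateInputTokens(input);
    outcome = 'failure';
  } finally {
    // Always meter: failures still consumed compute
    if (tokensIn > 0) {
      await meterCall({ tokensIn, tokensOut, outcome });
    }
  }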

  return result;
}

Retries

If your agent automatically retries failed calls, each retry is a separate billing event because it consumed compute. Retries of the metering request are different: if the metering call itself is retried after a network error, send the same Idempotency-Key header every time so the event is billed only once.

Idempotent metering with retry protection javascript

const { v4: uuidv4 } = require('uuid');

// Generate once per logical operation, not per meter attempt
const idempotencyKey = uuidv4();

await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
    'Idempotency-Key': idempotencyKey,  // Safe to retry this call
  },
  body: JSON.stringify({
    agent_id: 'my-agent-v1',
    action:   'task.complete',
    tokens_input: tokensIn,
    tokens_output: tokensOut,
    outcome: outcome,
  }),
});
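One way to combine the two ideas is a small retry wrapper that generates the key once and reuses it on every attempt. This is a sketch under assumptions: `meterWithRetry` and its options are hypothetical names, the sender is injected so the logic can be tried without a network, and in production `sendFn` would wrap the `fetch` call shown above and pass the key as the `Idempotency-Key` header.

```javascript
const { randomUUID } = require('node:crypto');

// Retries a metering send with exponential backoff, reusing one idempotency key.
// sendFn(event, idempotencyKey) is supplied by the caller and should reject on failure.
async function meterWithRetry(event, sendFn, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  const idempotencyKey = randomUUID(); // one key for the whole logical operation
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await sendFn(event, idempotencyKey);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait 200ms, 400ms, 800ms, ... before the next attempt
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed; let the caller decide what to do
}
```

Because the key is created outside the loop, a metering service that deduplicates on it sees all attempts as the same event, while a separate agent retry calls `meterWithRetry` again and gets a fresh key.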

Timeout billing

When an LLM call times out, you typically don't know the token count. Your options:

Estimate from input length (input tokens are knowable pre-call)
Charge a flat timeout fee (predictable, easier to explain)
Don't charge timeouts (absorb the cost, better customer experience)

The third option, not charging for timeouts, is the right call at an early stage: you need customer trust more than you need perfect accounting. Implement proper timeout token estimation once billing disputes become a pattern.
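The three timeout options can be kept behind a single policy switch so you can change your mind later without touching the agent code. The policy names, parameters, and rates below are hypothetical, and the 4-characters-per-token figure is only a rough rule of thumb for English text, not an exact tokenizer count.

```javascript
// Hypothetical timeout billing policies: 'estimate', 'flat', or 'free'.
function timeoutCharge(policy, { inputChars, ratePer1kTokens, flatFee }) {
  switch (policy) {
    case 'estimate': {
      // Input tokens are knowable before the call; approximate them from length
      const estimatedTokens = Math.ceil(inputChars / 4);
      return (estimatedTokens / 1000) * ratePer1kTokens;
    }
    case 'flat':
      return flatFee; // predictable and easy to explain
    case 'free':
      return 0; // absorb the cost
    default:
      throw new Error(`Unknown timeout policy: ${policy}`);
  }
}

const timeoutCtx = { inputChars: 4000, ratePer1kTokens: 0.002, flatFee: 0.01 };
console.log(timeoutCharge('estimate', timeoutCtx)); // charge for roughly 1,000 input tokens
console.log(timeoutCharge('flat', timeoutCtx));     // 0.01
console.log(timeoutCharge('free', timeoutCtx));     // 0
```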

Disputes

Customers will dispute charges they don't recognize. Win the dispute by having an audit trail for every meter event: timestamp, token counts, outcome, agent ID, and enough metadata to reconstruct what happened. Rev stores every meter event permanently — the customer can see their own usage data in the dashboard, which reduces disputes by letting them self-audit before they reach you.

Step 5: Scale considerations — high-throughput metering, idempotency, audit trails

When you're running thousands of agent calls per hour, the billing infrastructure needs to keep up without adding meaningful latency to each call.

Async metering for high-throughput agents

At high volume, making a synchronous HTTP call to a metering API on every agent execution adds latency and creates a dependency: if the metering service has a blip, your agent fails. The fix is async metering — buffer events locally and flush in batches.

Async buffered metering for high-throughput javascript

class MeterBuffer {
  constructor(apiKey, { flushInterval = 5000, maxBatch = 100 } = {}) {
    this.apiKey = apiKey;
    this.buffer = [];
    this.flushInterval = flushInterval;
    this.maxBatch = maxBatch;
    this._timer = setInterval(() => this.flush(), flushInterval);
  }

  // Non-blocking — returns immediately
  record(event) {
    this.buffer.push({ ...event, recorded_at: new Date().toISOString() });
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.maxBatch);
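    try {
      const res = await fetch('https://rev.polsia.app/v1/meter/batch', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ events: batch }),
      });
      // fetch only rejects on network errors, so treat HTTP error statuses as failures too
      if (!res.ok) throw new Error(`Meter batch failed: HTTP ${res.status}`);
    } catch (err) {
      // Put the failed batch back at the front so the next flush retries it
      this.buffer.unshift(...batch);
    }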
  }
}

const meter = new MeterBuffer(process.env.REV_API_KEY);

// In your agent: non-blocking, so the call does not wait on the metering request
meter.record({
  agent_id: 'high-volume-agent',
  action: 'classify.complete',
  tokens_input: 600,
  tokens_output: 80,
  outcome: 'success',
});

Idempotency at scale

In distributed systems, meter events can be emitted multiple times due to retries, network failures, or at-least-once delivery semantics. Your metering layer must be idempotent — receiving the same event twice should produce the same billing result as receiving it once. Rev deduplicates by Idempotency-Key — always include one.
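On the receiving side, idempotency usually comes down to remembering the result for each key and replaying it on duplicates. The sketch below is a generic in-memory illustration with hypothetical names; it is not a description of how Rev implements deduplication, and a real ledger would use a database with a unique constraint on the key.

```javascript
// Minimal idempotent ledger: each key is billed at most once.
class IdempotentLedger {
  constructor() {
    this.results = new Map(); // idempotencyKey -> { price_charged }
  }

  // Applies the event once per key. A repeat of the same key returns the
  // stored result and does not charge again.
  record(idempotencyKey, event, priceFn) {
    const existing = this.results.get(idempotencyKey);
    if (existing) return { ...existing, replayed: true };
    const result = { price_charged: priceFn(event) };
    this.results.set(idempotencyKey, result);
    return { ...result, replayed: false };
  }

  // Total billed across all unique events.
  total() {
    let sum = 0;
    for (const r of this.results.values()) sum += r.price_charged;
    return sum;
  }
}

const ledger = new IdempotentLedger();
const flatPrice = () => 5; // hypothetical pricing function: 5 cents per call
ledger.record('key-1', { action: 'task.complete' }, flatPrice);
ledger.record('key-1', { action: 'task.complete' }, flatPrice); // duplicate delivery
console.log(ledger.total()); // 5, not 10
```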

Audit trail requirements

Every billing event should be immutable and queryable. Minimum fields per record:

Event ID — globally unique, maps to idempotency key
Timestamp — when the execution happened, not when you metered it
Customer ID — who gets billed
Agent ID + action — what ran and what it did
Token counts (in + out) — raw LLM usage
Outcome + price charged — what the billing engine decided
Metadata blob — anything customer-visible for dispute resolution

Never delete billing records. Archive them, compress them, move them to cold storage — but never delete. Revenue audits and customer disputes can go back years. For the same reason, record the price at billing time, not a reference to your current pricing tier. Pricing tiers change; historical charges must be immutable.
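A minimal sketch of such a record, assuming hypothetical field and function names: the price and a copy of the pricing tier are written into the record at billing time and the whole object is frozen, so later tier changes cannot alter historical charges.

```javascript
// Builds an immutable audit record. Field names are illustrative, not a fixed schema.
function makeAuditRecord(event, priceCharged, pricingTier) {
  return Object.freeze({
    event_id: event.idempotencyKey,  // globally unique, maps to the idempotency key
    executed_at: event.executedAt,   // when the execution happened, not when it was metered
    customer_id: event.customerId,
    agent_id: event.agentId,
    action: event.action,
    tokens_input: event.tokensIn,
    tokens_output: event.tokensOut,
    outcome: event.outcome,
    price_charged: priceCharged,
    // Copy the tier instead of referencing it, so later edits do not leak in
    pricing_snapshot: Object.freeze({ ...pricingTier }),
    metadata: Object.freeze({ ...(event.metadata ?? {}) }),
  });
}

const tier = { name: 'starter', baseFee: 0.01 };
const record = makeAuditRecord(
  {
    idempotencyKey: 'evt_123',
    executedAt: '2025-01-05T10:00:00.000Z',
    customerId: 'cust_123',
    agentId: 'my-agent-v1',
    action: 'task.complete',
    tokensIn: 600,
    tokensOut: 80,
    outcome: 'success',
  },
  0.51,
  tier
);

tier.baseFee = 0.02; // the live tier changes later
console.log(record.pricing_snapshot.baseFee); // 0.01
```

`Object.freeze` only protects the in-memory object; in storage the same guarantee comes from append-only tables or write-once buckets.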

The audit trail is also your analytics foundation. Every meter event is a product usage signal: which agents fire most, which succeed, what the average token consumption per action is, which customers have the highest failure rates. Build the audit trail right and your analytics come for free. See The Complete Guide to Billing AI Agents for more on how Rev's analytics layer works.

Skip building it yourself — Rev handles all of this out of the box

The five steps above represent 2–4 weeks of engineering for a team building it from scratch. You need: a metering API with idempotency and batch support, a pricing engine that handles base + token + outcome configurations, a deferred outcome resolution system, Stripe integration for invoicing and payment collection, a customer-visible usage dashboard, and an immutable audit trail.

Rev is all of this in one API call.

Everything above, in one API call javascript

// Add this after your LLM call. Rev handles pricing, invoicing, audit trail.
const { price_charged, outcome_id } = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id:      'my-agent-v1',
    action:        'task.complete',
    tokens_input:  llmResponse.usage.prompt_tokens,
    tokens_output: llmResponse.usage.completion_tokens,
    outcome:       result.success ? 'success' : 'pending',
    expires_at:    new Date(Date.now() + 7 * 86400000).toISOString(),
    metadata:      { customer_id: customerId },
  }),
}).then(r => r.json());

You configure pricing tiers in the dashboard (base fee, token rate, outcome bonus — set any to zero). Rev computes the price on every call, stores the event, handles Stripe invoicing, and surfaces usage analytics per customer. The revenue calculator lets you model expected revenue at different call volumes and success rates before you pick your pricing.

Stop building billing infrastructure. Start billing.
Get your API key and make your first metered call in under 5 minutes, with no setup and no sales call.