Flat-rate SaaS billing is wrong for AI agents. A customer paying $99/month for an agent that fires 50,000 times is getting a bargain that eventually kills your margins. A customer paying the same for 20 runs is being overcharged into churn. Usage-based billing — where customers pay proportional to what they consume and the value they receive — solves both problems. This is the step-by-step guide to building it.
Why usage-based billing matters for AI agents
Traditional SaaS has predictable, uniform resource consumption per user. An AI agent doesn't. The same agent might use 400 tokens on a simple lookup and 40,000 on a complex research task. An outreach agent might fire 10,000 times and convert at 1%: that is 100 successful outcomes whose value is very different from the 9,900 runs that produced nothing.
Flat-rate billing obscures this completely. You're charging the same for all of it, which means:
- Power users subsidize light users — your best customers cost more to serve but pay the same.
- You can't price for value — an agent that booked a $50,000 meeting should cost more than one that found nothing.
- Growth destroys margins — as usage scales, your LLM API costs grow linearly while revenue stays flat.
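To make the margin problem concrete, here is a rough sketch of the arithmetic. The $99 price and the run counts come from the introduction; the cost of $0.01 per run is an assumed, illustrative figure, not a measured one.

```javascript
// Illustrative only: COST_PER_RUN_CENTS is an assumption, not real provider pricing.
const FLAT_PRICE_CENTS = 9900; // the $99/month plan from the introduction
const COST_PER_RUN_CENTS = 1;  // assumption: each agent run costs you about $0.01

// Gross margin (in cents) for one customer on the flat-rate plan.
function flatRateMargin(runsPerMonth) {
  return FLAT_PRICE_CENTS - runsPerMonth * COST_PER_RUN_CENTS;
}

// What the customer effectively pays per run, in cents.
function effectivePricePerRun(runsPerMonth) {
  return FLAT_PRICE_CENTS / runsPerMonth;
}

console.log(flatRateMargin(50000));    // -40100: a $401 loss on the power user
console.log(flatRateMargin(20));       // 9880: almost pure margin on the light user
console.log(effectivePricePerRun(20)); // 495: the light user pays $4.95 per run
```

Under these assumed numbers, the heaviest customer is the least profitable one, and the lightest customer pays a per-run price that is hard to justify.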
Usage-based billing aligns cost and revenue. Here's how to build it, step by step.
Step 1: Instrument your agent — capturing usage events
Before you can bill for usage, you need to measure it. Every AI agent execution should emit a metering event with four data points: which agent ran, what action it took, how many tokens it consumed, and what the outcome was.
The instrumentation goes at the end of each agent execution — after the LLM response returns, before you return to the caller. Here's the pattern:
const OpenAI = require('openai');
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// buildPrompt and parseAgentOutput are your own application helpers.
// After your agent runs, emit a meter event
async function runAgent(input) {
  const start = Date.now();

  // Your existing LLM call
  const llmResponse = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: buildPrompt(input),
  });

  // Read actual token usage from the provider response
  const tokensIn = llmResponse.usage.prompt_tokens;
  const tokensOut = llmResponse.usage.completion_tokens;
  const result = parseAgentOutput(llmResponse);

  // Meter the execution: this is the billing record
  await fetch('https://rev.polsia.app/v1/meter', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      agent_id: 'my-agent-v1',
      action: 'task.complete',
      tokens_input: tokensIn,
      tokens_output: tokensOut,
      outcome: result.success ? 'success' : 'failure',
      metadata: {
        duration_ms: Date.now() - start,
        input_length: input.length,
        customer_id: input.customerId,
      },
    }),
  });

  return result;
}
Three instrumentation rules to follow:
- Always meter, even on failure. Failed runs still consume compute. If you only meter successes, your billing data is wrong and you're absorbing failure costs silently.
- Capture token counts from the LLM response, not estimates. Estimation drifts. Every major LLM provider returns actual usage in the response — read it directly.
- Store a customer_id in metadata. You'll need to aggregate by customer for invoicing. Do it at instrumentation time, not retroactively.
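The three rules above can be enforced in one place by building the event payload through a small helper. This is a minimal sketch: `buildMeterEvent` and its parameter names are hypothetical, and the output field names simply mirror the payload used in the earlier example.

```javascript
// Hypothetical helper: builds a meter event payload from an LLM response.
function buildMeterEvent({ agentId, action, customerId, llmResponse, success, startedAt }) {
  // Rule 3: refuse to build an event that cannot be attributed to a customer.
  if (!customerId) {
    throw new Error('customer_id is required so usage can be aggregated per customer');
  }
  // Rule 2: read actual usage from the provider response rather than estimating.
  const usage = (llmResponse && llmResponse.usage) || {};
  return {
    agent_id: agentId,
    action,
    tokens_input: usage.prompt_tokens ?? 0,
    tokens_output: usage.completion_tokens ?? 0,
    // Rule 1: failures are metered too, just with a different outcome.
    outcome: success ? 'success' : 'failure',
    metadata: {
      customer_id: customerId,
      duration_ms: Date.now() - startedAt,
    },
  };
}

// Usage with a stubbed response shaped like an OpenAI chat completion.
const exampleEvent = buildMeterEvent({
  agentId: 'my-agent-v1',
  action: 'task.complete',
  customerId: 'cust_123',
  llmResponse: { usage: { prompt_tokens: 400, completion_tokens: 50 } },
  success: false,
  startedAt: Date.now(),
});
console.log(exampleEvent.outcome, exampleEvent.tokens_input); // failure 400
```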
Step 2: Choose a metering strategy — real-time vs batch, per-token vs per-call
Not all agents bill the same way. Before wiring up a pricing engine, choose the right metering strategy for your workload.
| Strategy | How it works | Best for |
|---|---|---|
| Real-time per-call | Meter API called synchronously on every execution. Price computed instantly. | Low-to-medium volume agents where latency is acceptable. Simplest to implement. |
| Real-time per-token | Token counts sent with each call. Price = base fee + (tokens × rate). | Developer tools, internal agents, or workloads with highly variable token consumption. |
| Deferred outcome | Call registered as pending. Billing finalized when outcome is confirmed (e.g., reply received, meeting booked). | Async workflows where success isn't known immediately — outreach, approvals, fulfillment. |
| Batch (async queue) | Meter events buffered and flushed periodically. Useful for very high-throughput agents. | >1,000 calls/minute where synchronous metering adds unacceptable latency. |
For most early-stage AI agents, real-time per-call is correct. Add per-token components once you have usage data showing your token consumption is actually variable enough to matter. Add deferred outcomes when you've built reliable outcome detection. Don't over-engineer from day one.
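As a rough illustration of how the per-call, per-token, and outcome components combine, the sketch below prices a single call as base fee + (tokens × rate) + outcome bonus. The config shape and every amount in it are hypothetical; setting a component to zero turns that part of the model off.

```javascript
// Hypothetical pricing config; all amounts are in cents and purely illustrative.
const pricing = {
  baseFeeCents: 2,        // charged on every call
  centsPer1kTokens: 0.5,  // per-token component (0 gives pure per-call pricing)
  successBonusCents: 10,  // outcome component, charged only on success
};

// Price one metered call: base fee + token charge + outcome bonus.
function priceCall({ tokensInput, tokensOutput, outcome }, config = pricing) {
  const tokens = tokensInput + tokensOutput;
  const tokenCharge = (tokens / 1000) * config.centsPer1kTokens;
  const bonus = outcome === 'success' ? config.successBonusCents : 0;
  return config.baseFeeCents + tokenCharge + bonus;
}

// A 40,000-token research task that succeeded: 2 + 20 + 10
console.log(priceCall({ tokensInput: 30000, tokensOutput: 10000, outcome: 'success' })); // 32

// The same config with the token and outcome components switched off
const perCallOnly = { baseFeeCents: 2, centsPer1kTokens: 0, successBonusCents: 0 };
console.log(priceCall({ tokensInput: 30000, tokensOutput: 10000, outcome: 'success' }, perCallOnly)); // 2
```

Starting with only `baseFeeCents` set matches the advice above: begin with per-call pricing and enable the other components once the usage data justifies them.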
Step 3: Build the billing pipeline — aggregation, invoicing, payment collection
The billing pipeline is what sits between raw meter events and money in your account. It has three stages:
Aggregation
Raw meter events need to be aggregated per customer per billing period. This is where you compute: how many calls did customer X make this month? How many tokens? How many successful outcomes?
// Using Rev's analytics API to get per-customer usage
const response = await fetch(
  'https://rev.polsia.app/v1/analytics?period=monthly&customer_id=cust_123',
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  }
);
const usage = await response.json();
// usage.total_calls: 4,821
// usage.total_tokens: 9,304,000
// usage.success_count: 142
// usage.total_charged: 284.30 ← computed by pricing engine on each meter call
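If you aggregate raw events yourself instead of calling an analytics endpoint, the core of the job is a group-by on customer and billing period. The sketch below assumes each stored event carries a `recorded_at` ISO 8601 UTC timestamp and `metadata.customer_id`; the function name and the output shape are illustrative, not part of any API.

```javascript
// Aggregates raw meter events into per-customer, per-month totals.
function aggregateUsage(events) {
  const totals = new Map();
  for (const e of events) {
    const period = e.recorded_at.slice(0, 7); // "YYYY-MM" taken from the UTC timestamp
    const key = `${e.metadata.customer_id}:${period}`;
    const row = totals.get(key) ?? {
      customer_id: e.metadata.customer_id,
      period,
      total_calls: 0,
      total_tokens: 0,
      success_count: 0,
    };
    row.total_calls += 1;
    row.total_tokens += e.tokens_input + e.tokens_output;
    if (e.outcome === 'success') row.success_count += 1;
    totals.set(key, row);
  }
  return [...totals.values()];
}

const sampleEvents = [
  { recorded_at: '2025-01-03T10:00:00.000Z', tokens_input: 100, tokens_output: 20, outcome: 'success', metadata: { customer_id: 'cust_a' } },
  { recorded_at: '2025-01-20T10:00:00.000Z', tokens_input: 200, tokens_output: 30, outcome: 'failure', metadata: { customer_id: 'cust_a' } },
  { recorded_at: '2025-02-01T10:00:00.000Z', tokens_input: 10, tokens_output: 1, outcome: 'success', metadata: { customer_id: 'cust_a' } },
];
console.log(aggregateUsage(sampleEvents));
// Two rows for cust_a: 2025-01 (2 calls, 350 tokens, 1 success) and 2025-02 (1 call, 11 tokens, 1 success)
```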
Invoicing
Once you have aggregated usage, you need to turn it into an invoice. The simplest approach for early-stage products is to use Stripe's metered billing — you report usage to Stripe each billing period and Stripe handles the invoice math, PDF generation, and payment collection. Rev handles the Stripe integration end-to-end if you don't want to build this yourself.
Payment collection
Two patterns work here:
- Postpaid (monthly invoice): Accumulate usage, invoice at period end, collect payment. Lower friction for customers, higher default risk for you. Works for enterprise or high-trust customer segments.
- Prepaid (credit balance): Customers buy credits upfront. Usage draws down the balance. Auto-recharge when balance falls below a threshold. Lower default risk, slightly higher onboarding friction. Better for developer-focused products where customers want predictable spend.
For AI agent billing, prepaid credits often outperform postpaid because developers prefer controlling their spend. Rev's pricing plans use a subscription + overage model — a predictable base that customers can budget, with usage-based overages that scale with their actual usage.
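The prepaid pattern reduces to a small ledger: debit on each metered call and top up when the balance drops below a threshold. The following is a minimal in-memory sketch; the class, its option names, and the `chargeCard` callback are all hypothetical stand-ins for a real payment integration and a durable store.

```javascript
// Prepaid credit ledger sketch. Amounts are integer cents to avoid float drift.
class CreditBalance {
  constructor({ balanceCents = 0, thresholdCents, rechargeCents, chargeCard }) {
    this.balanceCents = balanceCents;
    this.thresholdCents = thresholdCents;
    this.rechargeCents = rechargeCents;
    this.chargeCard = chargeCard; // hypothetical async callback: resolves true when payment succeeds
  }

  // Draw down the balance for one metered call.
  // Returns false when funds are insufficient; the caller decides whether to block the agent.
  async debit(amountCents) {
    if (amountCents > this.balanceCents) return false;
    this.balanceCents -= amountCents;
    if (this.balanceCents < this.thresholdCents) await this.recharge();
    return true;
  }

  // Top up the balance only if the payment actually went through.
  async recharge() {
    const paid = await this.chargeCard(this.rechargeCents);
    if (paid) this.balanceCents += this.rechargeCents;
  }
}

// Usage: $10 starting balance, recharge $20 whenever the balance falls below $5.
const wallet = new CreditBalance({
  balanceCents: 1000,
  thresholdCents: 500,
  rechargeCents: 2000,
  chargeCard: async () => true, // stubbed payment that always succeeds
});
wallet.debit(600).then(() => console.log(wallet.balanceCents)); // 2400
```

A production version would also need to make debits atomic across concurrent agent runs, which this single-process sketch does not attempt.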
Step 4: Handle edge cases — failed calls, retries, timeout billing, disputes
The happy path is straightforward. The edge cases are where billing systems break trust with customers.
Failed calls
If your agent throws an exception or the LLM returns an error, did compute actually run? Usually yes — tokens were consumed before the failure. Bill for the tokens but don't charge outcome bonuses on failures.
// estimateInputTokens and meterCall are your own helpers (not shown here)
async function runAgentWithBilling(input) {
  let tokensIn = 0, tokensOut = 0;
  let outcome = 'failure';
  let result = null;
  try {
    const llmResponse = await openai.chat.completions.create({ ... });
    tokensIn = llmResponse.usage.prompt_tokens;
    tokensOut = llmResponse.usage.completion_tokens;
    result = parseAgentOutput(llmResponse);
    outcome = result.success ? 'success' : 'failure';
  } catch (err) {
    // Keep real usage if the response arrived before the error;
    // otherwise estimate input tokens from the input
    if (tokensIn === 0) tokensIn = estimateInputTokens(input);
    outcome = 'failure';
  } finally {
    // Always meter: failures still consumed compute
    try {
      await meterCall({ tokensIn, tokensOut, outcome });
    } catch (meterErr) {
      // A metering failure should not replace the agent's result
      console.error('Metering failed:', meterErr);
    }
  }
  return result;
}
Retries
If your agent retries failed calls automatically, each retry is a separate billing event because it consumed compute. Send a unique Idempotency-Key header with each meter request to prevent double-billing if the metering call itself is retried due to network issues.
const { v4: uuidv4 } = require('uuid');

// Generate once per logical operation, not per meter attempt
const idempotencyKey = uuidv4();

await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
    'Idempotency-Key': idempotencyKey, // Safe to retry this call
  },
  body: JSON.stringify({
    agent_id: 'my-agent-v1',
    action: 'task.complete',
    tokens_input: tokensIn,
    tokens_output: tokensOut,
    outcome,
  }),
});
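The key only protects you if every retry of the same meter request reuses it. Below is a sketch of a retry wrapper that does this, under the assumption that the request is passed in as a `send(key)` function returning a fetch-style response with `ok` and `status`. The wrapper name, its options, and the backoff values are illustrative.

```javascript
const { randomUUID } = require('node:crypto');

// Retries a metering request with exponential backoff, reusing one idempotency key.
// `send` is a placeholder for the fetch call above: async (key) => response with `ok` and `status`.
async function meterWithRetry(send, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  const idempotencyKey = randomUUID(); // generated once, reused on every attempt
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await send(idempotencyKey);
      if (res.ok) return { idempotencyKey, attempts: attempt };
      lastError = new Error(`meter request failed with status ${res.status}`);
    } catch (err) {
      lastError = err; // network error: safe to retry because the key is unchanged
    }
    if (attempt < maxAttempts) {
      // Wait 200 ms, 400 ms, ... between attempts (with the default baseDelayMs)
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  throw lastError;
}
```

In use, `send` would wrap the `fetch` call and set the `Idempotency-Key` header from its argument. Note the distinction from agent-level retries: a new agent attempt is a new billable event and gets a new key, while a re-sent meter request keeps the old one.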
Timeout billing
When an LLM call times out, you typically don't know the token count. Your options:
- Estimate from input length (input tokens are knowable pre-call)
- Charge a flat timeout fee (predictable, easier to explain)
- Don't charge timeouts (absorb the cost, better customer experience)
Not charging for timeouts (the third option) is the right call at an early stage, because you need customer trust more than you need perfect accounting. Implement proper timeout token estimation once billing disputes become a pattern.
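Keeping the timeout policy behind a single function makes it cheap to switch later. The sketch below expresses the three options as one pure function. The policy names, `flatFeeCents`, and `centsPer1kTokens` are hypothetical, and the 4-characters-per-token ratio is only a rough rule of thumb for English text, not an exact count.

```javascript
// Computes the charge for a timed-out call under one of three policies.
function timeoutCharge(policy, { inputText = '', flatFeeCents = 1, centsPer1kTokens = 0.5 } = {}) {
  switch (policy) {
    case 'estimate': {
      // Rough estimate: about 4 characters per token (an approximation, not a tokenizer)
      const estimatedTokens = Math.ceil(inputText.length / 4);
      return { estimatedTokens, chargeCents: (estimatedTokens / 1000) * centsPer1kTokens };
    }
    case 'flat_fee':
      return { estimatedTokens: null, chargeCents: flatFeeCents };
    case 'free':
      // Absorb the cost; still record the event so timeouts show up in the audit trail
      return { estimatedTokens: null, chargeCents: 0 };
    default:
      throw new Error(`unknown timeout policy: ${policy}`);
  }
}

console.log(timeoutCharge('free', { inputText: 'any prompt' }).chargeCents); // 0
console.log(timeoutCharge('estimate', { inputText: 'x'.repeat(8000) }));
// { estimatedTokens: 2000, chargeCents: 1 }
```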
Disputes
Customers will dispute charges they don't recognize. Win the dispute by having an audit trail for every meter event: timestamp, token counts, outcome, agent ID, and enough metadata to reconstruct what happened. Rev stores every meter event permanently — the customer can see their own usage data in the dashboard, which reduces disputes by letting them self-audit before they reach you.
Step 5: Scale considerations — high-throughput metering, idempotency, audit trails
When you're running thousands of agent calls per hour, the billing infrastructure needs to keep up without adding meaningful latency to each call.
Async metering for high-throughput agents
At high volume, making a synchronous HTTP call to a metering API on every agent execution adds latency and creates a dependency: if the metering service has a blip, your agent fails. The fix is async metering — buffer events locally and flush in batches.
class MeterBuffer {
  constructor(apiKey, { flushInterval = 5000, maxBatch = 100 } = {}) {
    this.apiKey = apiKey;
    this.buffer = [];
    this.flushInterval = flushInterval;
    this.maxBatch = maxBatch;
    this._flushing = false;
    this._timer = setInterval(() => this.flush(), flushInterval);
  }

  // Non-blocking: queues the event and returns immediately
  record(event) {
    this.buffer.push({ ...event, recorded_at: new Date().toISOString() });
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  async flush() {
    // Skip if a flush is already in flight so batches are not sent concurrently
    if (this._flushing || this.buffer.length === 0) return;
    this._flushing = true;
    const batch = this.buffer.splice(0, this.maxBatch);
    try {
      const res = await fetch('https://rev.polsia.app/v1/meter/batch', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ events: batch }),
      });
      // fetch only rejects on network errors, so treat non-2xx responses as failures too
      if (!res.ok) throw new Error(`Meter batch failed: ${res.status}`);
    } catch (err) {
      // Push failed events back to the front of the buffer for the next flush
      this.buffer.unshift(...batch);
    } finally {
      this._flushing = false;
    }
  }
}

const meter = new MeterBuffer(process.env.REV_API_KEY);

// In your agent: non-blocking, the agent never waits on the metering request
meter.record({
  agent_id: 'high-volume-agent',
  action: 'classify.complete',
  tokens_input: 600,
  tokens_output: 80,
  outcome: 'success',
});
Idempotency at scale
In distributed systems, meter events can be emitted multiple times due to retries, network failures, or at-least-once delivery semantics. Your metering layer must be idempotent — receiving the same event twice should produce the same billing result as receiving it once. Rev deduplicates by Idempotency-Key — always include one.
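To show what idempotent metering means on the receiving side, here is a minimal in-memory sketch. It is not a description of how Rev implements deduplication; the class and method names are hypothetical, and a real system would back the lookup with a durable store and a unique constraint on the key.

```javascript
// In-memory idempotent receiver: the same key always yields the same billing result.
class IdempotentMeter {
  constructor(priceFn) {
    this.priceFn = priceFn; // computes the charge for one event
    this.seen = new Map();  // idempotency key -> stored result
  }

  record(idempotencyKey, event) {
    const existing = this.seen.get(idempotencyKey);
    if (existing) {
      // Duplicate delivery: return the original result without charging again
      return { ...existing, duplicate: true };
    }
    const result = { price_charged: this.priceFn(event), duplicate: false };
    this.seen.set(idempotencyKey, result);
    return result;
  }

  totalCharged() {
    let total = 0;
    for (const r of this.seen.values()) total += r.price_charged;
    return total;
  }
}

const receiver = new IdempotentMeter(() => 5); // flat 5 cents per event, for illustration
receiver.record('key-1', { action: 'task.complete' });
receiver.record('key-1', { action: 'task.complete' }); // redelivered event
console.log(receiver.totalCharged()); // 5, not 10
```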
Audit trail requirements
Every billing event should be immutable and queryable. Minimum fields per record:
- Event ID — globally unique, maps to idempotency key
- Timestamp — when the execution happened, not when you metered it
- Customer ID — who gets billed
- Agent ID + action — what ran and what it did
- Token counts (in + out) — raw LLM usage
- Outcome + price charged — what the billing engine decided
- Metadata blob — anything customer-visible for dispute resolution
Never delete billing records. Archive them, compress them, move them to cold storage — but never delete. Revenue audits and customer disputes can go back years. For the same reason, record the price at billing time, not a reference to your current pricing tier. Pricing tiers change; historical charges must be immutable.
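One way to keep records immutable in application code is to freeze them at creation and store the charged price as a value rather than a reference to a tier. The sketch below does that; the function and field names are assumptions that follow the minimum field list above, not a fixed schema.

```javascript
// Builds a frozen audit record that snapshots the price at billing time.
function createAuditRecord({
  eventId, executedAt, customerId, agentId, action,
  tokensInput, tokensOutput, outcome, priceChargedCents, metadata = {},
}) {
  return Object.freeze({
    event_id: eventId,          // globally unique, maps to the idempotency key
    executed_at: executedAt,    // when the agent ran, not when it was metered
    customer_id: customerId,
    agent_id: agentId,
    action,
    tokens_input: tokensInput,
    tokens_output: tokensOutput,
    outcome,
    price_charged_cents: priceChargedCents, // the amount itself, not a pricing tier ID
    metadata: Object.freeze({ ...metadata }),
  });
}

const auditRecord = createAuditRecord({
  eventId: 'evt_001',
  executedAt: '2025-01-03T10:00:00.000Z',
  customerId: 'cust_123',
  agentId: 'my-agent-v1',
  action: 'task.complete',
  tokensInput: 400,
  tokensOutput: 50,
  outcome: 'success',
  priceChargedCents: 25,
});
console.log(Object.isFrozen(auditRecord)); // true
```

Freezing only guards against accidental mutation inside one process. Immutability in storage still has to come from the database layer, for example an append-only table.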
Skip building it yourself — Rev handles all of this out of the box
The five steps above represent 2–4 weeks of engineering for a team building it from scratch. You need: a metering API with idempotency and batch support, a pricing engine that handles base + token + outcome configurations, a deferred outcome resolution system, Stripe integration for invoicing and payment collection, a customer-visible usage dashboard, and an immutable audit trail.
Rev is all of this in one API call.
// Add this after your LLM call. Rev handles pricing, invoicing, audit trail.
const { price_charged, outcome_id } = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'my-agent-v1',
    action: 'task.complete',
    tokens_input: llmResponse.usage.prompt_tokens,
    tokens_output: llmResponse.usage.completion_tokens,
    outcome: result.success ? 'success' : 'pending',
    expires_at: new Date(Date.now() + 7 * 86400000).toISOString(),
    metadata: { customer_id: customerId },
  }),
}).then(r => r.json());
You configure pricing tiers in the dashboard (base fee, token rate, outcome bonus — set any to zero). Rev computes the price on every call, stores the event, handles Stripe invoicing, and surfaces usage analytics per customer. The revenue calculator lets you model expected revenue at different call volumes and success rates before you pick your pricing.
Get your API key and make your first metered call in under 5 minutes: no setup, no sales call.
What to read next
Now that you understand the implementation, these resources make the decisions easier:
- AI Agent Pricing Models Compared — per-token vs per-call vs outcome-based, with a decision framework for choosing the right model for your specific agent type.
- The Complete Guide to Billing AI Agents — build vs buy analysis, deferred outcome patterns, and the full Rev API surface area.
- API reference — every endpoint, parameter, and response field.
- Revenue calculator — model your expected monthly revenue before committing to a pricing tier.