dart-studio
Alex Kargin · 1 min read

When to stop paying OpenAI — the break-even math

Cloud APIs win at volume zero. Local models win past a certain point. Here's the specific number where the lines cross for typical workloads.

Tags: pricing, economics, local-llm

The economic question behind "should I use a cloud LLM API or run my own?" has a surprisingly specific answer. It depends on one number: calls per month.

Baseline assumption

Typical chatbot reply: ~5,000 tokens in (system prompt + context + user message), ~300 tokens out. The numbers below assume that shape; longer context or longer responses raise the cost per reply and pull the break-even lower.

Cloud pricing (current generation)

  • OpenAI gpt-4o-mini: $0.15 / 1M input + $0.60 / 1M output. Per reply: ~$0.0009. 1,000 replies ≈ $0.90.
  • Anthropic Haiku: the same order of magnitude, though list prices vary by model version. Check the current price sheet before comparing.
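The per-reply figure is just the baseline token counts multiplied by the quoted gpt-4o-mini prices. A minimal sketch of that arithmetic, assuming the prices and reply shape stated above (the function name is only for illustration):

```python
# gpt-4o-mini prices quoted above, in dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def cost_per_reply(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single reply at the prices above."""
    micro_dollars = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return micro_dollars / 1_000_000


# Baseline shape from this article: ~5,000 tokens in, ~300 tokens out.
per_reply = cost_per_reply(5_000, 300)
print(f"per reply:         ${per_reply:.5f}")
print(f"per 1,000 replies: ${per_reply * 1_000:.2f}")
```

Input tokens account for most of the bill at this shape, which is why trimming the system prompt or context moves the cost more than shortening the replies does.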

At low volume the cloud bill is rounding-error small. Don't overthink the architecture.

Local model cost

  • Hardware: a $20/month VPS handles a small model
  • Electricity: negligible for a 1B-parameter model on a modern CPU
  • Human time: ~1 hour/month babysitting

All-in: roughly $30/month flat, regardless of traffic.

Where the lines cross

At current token pricing, you'd need roughly 30,000 chatbot replies a month before gpt-4o-mini becomes more expensive than a modest self-hosted alternative: about $30 of flat cost divided by ~$0.0009 per reply. That's ~1,000 replies per day, every day. Few small deployments get there.
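The crossover is the flat local cost divided by the cloud cost per reply. Here is a rough sketch, assuming the $30/month local figure and the gpt-4o-mini prices used in this article. The two alternative reply shapes are hypothetical, included only to show how the break-even moves.

```python
def cost_per_reply(input_tokens, output_tokens, in_price=0.15, out_price=0.60):
    """Dollar cost of one reply; prices are dollars per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def break_even_replies(local_monthly_cost, input_tokens, output_tokens):
    """Monthly replies at which the cloud bill equals the flat local cost."""
    return local_monthly_cost / cost_per_reply(input_tokens, output_tokens)


LOCAL_MONTHLY_COST = 30.0  # all-in local estimate from this article

shapes = [
    ("baseline (5,000 in / 300 out)", 5_000, 300),
    ("lean prompt (2,000 in / 300 out)", 2_000, 300),   # hypothetical shape
    ("heavy context (10,000 in / 300 out)", 10_000, 300),  # hypothetical shape
]

for label, tokens_in, tokens_out in shapes:
    replies = break_even_replies(LOCAL_MONTHLY_COST, tokens_in, tokens_out)
    print(f"{label}: ~{replies:,.0f} replies/month, ~{replies / 30:,.0f} per day")
```

For the baseline shape this lands a little above 32,000 replies a month, which is where the "roughly 30,000" figure comes from.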

So on pure dollars, cloud wins for the vast majority of small deployments, at least on paper.

Why the paper math isn't the whole math

Three reasons the decision doesn't always follow the cheaper-per-request math:

  1. Privacy. One leak of protected health information (PHI) through a third-party API triggers a HIPAA conversation nobody wants. Local inference removes the conversation.
  2. Reliability. When the cloud provider has an outage at 2am on the busiest day of your year, the person on call is you, and there is nothing you can fix. Inference on your own box doesn't depend on someone else's uptime.
  3. Pricing risk. Cloud APIs can raise prices on short notice. Your local model costs what it cost a year ago.

Rule of thumb

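  • <500 calls/month: cloud. Save yourself the ops.
  • 500–5,000/month, privacy matters: go local. If privacy doesn't matter, stay on cloud.
  • >5,000/month: local starts to earn its keep on reliability and price stability. On pure dollars it only wins past roughly 30,000 replies a month.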

As with most infrastructure questions, the right answer is "measure first, decide second."
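One way to do the measuring: log token counts for a sample of real calls, then project the monthly cloud bill and compare it with the flat local cost. A minimal sketch, assuming the gpt-4o-mini prices and the $30/month local estimate from this article. The sample values and the function name are made up for illustration, not real traffic.

```python
from statistics import mean

LOCAL_MONTHLY_COST = 30.0  # flat local estimate from this article


def projected_cloud_bill(samples, calls_per_month, in_price=0.15, out_price=0.60):
    """Project a monthly cloud bill from measured (input_tokens, output_tokens) pairs.

    Prices are dollars per 1M tokens.
    """
    avg_in = mean(tokens_in for tokens_in, _ in samples)
    avg_out = mean(tokens_out for _, tokens_out in samples)
    per_call = (avg_in * in_price + avg_out * out_price) / 1_000_000
    return per_call * calls_per_month


# Hypothetical measurements; replace with token counts from your own logs.
samples = [(4_800, 250), (5_300, 340), (4_900, 310)]

bill = projected_cloud_bill(samples, calls_per_month=2_000)
cheaper = "cloud" if bill < LOCAL_MONTHLY_COST else "local"
print(f"projected cloud bill: ${bill:.2f}/month, cheaper on pure dollars: {cheaper}")
```

This only answers the dollar question. Privacy, reliability and pricing risk still have to be weighed separately.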

— Alex Kargin. More engineering writing at kargin-utkin.com.
