dart-studio
Alex Kargin · 2 min read

Running AI on a $20/month VPS — what we actually measured

Can a cheap rented server really host a working AI chatbot? We benchmarked three small models on real small-business questions. Here's what held up.

Tags: benchmarks, local-llm, infrastructure

Every week someone asks us the same thing: "can I just run this on a cheap server, or do I need a GPU?"

Short answer: for a focused small-business chatbot, the cheap server is enough and you can skip the GPU. For anything else, it depends. Here are the numbers we actually saw.

The setup

  • Server: standard 2-vCPU / 4GB RAM VPS (~$20/month on any of the usual hosts)
  • Software: Ollama, Apache reverse proxy, a small Next.js frontend
  • Models tested: llama3.2:1b, qwen2.5:0.5b, phi3:mini
  • Workload: a 7-question plumbing FAQ fed through a chat widget — the kind of thing a Broward plumber's website might need.

The results

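| Model | Size on disk | First-token latency | Full reply | Answer quality (1-5) |
| --- | --- | --- | --- | --- |
| llama3.2:1b | 1.3 GB | 2.1 s | 9 s | 4 |
| qwen2.5:0.5b | 400 MB | 1.1 s | 4 s | 3 |
| phi3:mini | 2.2 GB | 3.0 s | 14 s | 4 |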

Numbers move by 20–40% depending on how loaded the server is at the moment. If your VPS is sharing CPU with a database under load, expect the high end.
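
If you want to collect the two latency columns yourself, a small timing harness is enough. The Python sketch below is not the script behind the table; `Timing`, `time_stream`, and the `clock` parameter are hypothetical names chosen for illustration. It assumes the reply arrives as an iterable of text chunks, which is what a streaming HTTP response gives you.

```python
import time
from typing import Callable, Iterable, NamedTuple


class Timing(NamedTuple):
    first_token_s: float  # seconds until the first non-empty chunk arrived
    full_reply_s: float   # seconds until the stream was exhausted
    text: str             # the concatenated reply


def time_stream(chunks: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> Timing:
    """Consume a stream of text chunks and record both latencies."""
    start = clock()
    first = None
    parts = []
    for chunk in chunks:
        # Stamp the moment the first visible text shows up.
        if chunk and first is None:
            first = clock() - start
        parts.append(chunk)
    total = clock() - start
    # Assumption: if nothing arrived, report first-token latency as the total.
    if first is None:
        first = total
    return Timing(first, total, "".join(parts))
```

Passing any generator of reply chunks to `time_stream` returns both timings at once. Running each FAQ question several times, at different hours, is the simplest way to see the load-dependent swing described above.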

What actually matters

For a FAQ bot on a plumbing website, first-token latency is the number that makes or breaks the demo. Nobody stares at an empty chat bubble for 3 seconds and thinks "I bet this is working."

Our fix: stream the response. The model still takes 9 seconds to finish generating, but the first word shows up in about 2, and users read as it types. Perceived latency drops from the full 9 seconds to roughly 2.
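
Here is a rough Python sketch of what streaming from Ollama can look like. It is an illustration, not the code behind our widget (which is a Next.js frontend). It assumes Ollama is listening on its default local port, 11434, and uses the `/api/generate` endpoint, which returns one JSON object per line with a `response` text fragment and a `done` flag. The names `iter_reply_text` and `stream_answer` are made up for this example.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def iter_reply_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        fragment = event.get("response", "")
        if fragment:
            yield fragment
        if event.get("done"):
            break  # the final object marks the end of the reply


def stream_answer(question: str, model: str = "llama3.2:1b") -> Iterator[str]:
    """POST a prompt with streaming on and yield fragments as they arrive."""
    body = json.dumps({"model": model, "prompt": question, "stream": True}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The response object iterates line by line, so each fragment
        # can be forwarded to the browser as soon as it is read.
        yield from iter_reply_text(response)
```

A caller would loop over `stream_answer("Why is my water heater leaking?")` and push each fragment to the chat bubble as it comes in, rather than waiting for the whole reply. Keeping the parsing in `iter_reply_text` separate from the HTTP call means it can be checked against canned lines without a running server.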

When we'd still reach for a bigger stack

Three situations force us to go past the $20 VPS:

  1. Multilingual. 1B models wobble on Spanish nuance that matters in South Florida.
  2. Real-time voice. That's a different game (hello, Twilio + streaming ASR).
  3. Pulling from client data. If the bot is reading your CRM, the infrastructure has to be handled more carefully.

How we use this

Our live chatbot demo runs on exactly this setup. Every visitor who types in it is hitting a 1B-parameter model on a shared server. It's good enough to sound like a real business. We built it so prospects could answer the "is it actually fast?" question themselves.

Your turn: 30-min scoping call, free, no pitch.

Next step

Thirty-minute scoping call. No pitch, no retainer.

Tell us what eats your week. We'll tell you, honestly, whether AI can help — and if it can't, we'll say so.

Book a scoping call

Serving Broward & Palm Beach County, FL.