April 2, 2025 · 2 min read · Alex Kargin

Running AI on a $20/month VPS — what we actually measured

Can a cheap rented server really host a working AI chatbot? We benchmarked three small models on real small-business questions. Here's what held up.

benchmarks · local-llm · infrastructure

Every week someone asks us the same thing: "can I just run this on a cheap server, or do I need a GPU?"

Short answer: for a focused small-business chatbot, yes. For anything else, it depends. Here are the numbers we actually saw.

The setup

Server: standard 2-vCPU / 4GB RAM VPS (~$20/month on any of the usual hosts)
Software: Ollama, Apache reverse proxy, a small Next.js frontend
Models tested: llama3.2:1b, qwen2.5:0.5b, phi3:mini
Workload: a 7-question plumbing FAQ fed through a chat widget — the kind of thing a Broward plumber's website might need.
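If you want to collect timings like these on your own box, here is a minimal sketch of a harness. It is not the exact script behind our table. It assumes Ollama's default local port (11434) and its streaming /api/generate endpoint, which sends one JSON object per line with a `response` text fragment and a `done` flag. The names `time_stream`, `stream_ollama` and `Timing` are hypothetical.

```python
# A sketch of a timing harness for Ollama; not the exact script behind the table.
# Assumes Ollama's default local port (11434) and its streaming /api/generate
# endpoint, which sends one JSON object per line until "done" is true.
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local install


@dataclass
class Timing:
    first_token_s: float  # seconds until the first non-empty text fragment
    full_reply_s: float   # seconds until the stream reports done (or ends)
    text: str             # the assembled reply


def time_stream(lines: Iterable[bytes],
                clock: Callable[[], float] = time.monotonic) -> Timing:
    """Consume a line-delimited JSON stream and record when text starts and stops."""
    start = clock()  # taken before the first line is pulled, so a lazy request is timed too
    first: Optional[float] = None
    parts: List[str] = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines defensively
        chunk = json.loads(raw)
        piece = chunk.get("response", "")
        if piece and first is None:
            first = clock() - start
        parts.append(piece)
        if chunk.get("done"):
            break
    total = clock() - start
    return Timing(first if first is not None else total, total, "".join(parts))


def stream_ollama(model: str, prompt: str) -> Iterator[bytes]:
    """Send one prompt and yield raw response lines as they arrive (a lazy generator)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        yield from resp  # the HTTP response iterates line by line
```

Calling `time_stream(stream_ollama(model, question))` for each of the three model tags and each FAQ question gives one first-token and one full-reply number per pair. Because `stream_ollama` is a generator, the request only fires when `time_stream` starts reading, after the start time is taken. Passing a fake `clock` lets you test the parsing without a running server.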

The results

Model	Size on disk	First-token latency	Full-reply time	Answer quality (1-5)
llama3.2:1b	1.3 GB	2.1s	9s	4
qwen2.5:0.5b	400 MB	1.1s	4s	3
phi3:mini	2.2 GB	3.0s	14s	4

Timings swing by 20–40% depending on how loaded the server is at the moment. If your VPS shares its CPU with a busy database, expect the slow end.
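To make "expect the slow end" concrete, here is the arithmetic applied to the table. This assumes the 20–40% swing is a slowdown from the measured values. The post doesn't say whether the table was taken on a quiet or a loaded server, so read it as a rough bound. `under_load` is a hypothetical helper.

```python
# Worst-case arithmetic for the 20-40% swing, reading it as a slowdown under load.
# Times are copied from the results table; treating the swing as a slowdown is an assumption.
MEASURED = {  # model -> (first-token seconds, full-reply seconds)
    "llama3.2:1b": (2.1, 9.0),
    "qwen2.5:0.5b": (1.1, 4.0),
    "phi3:mini": (3.0, 14.0),
}


def under_load(seconds: float, swing: float = 0.40) -> float:
    """Scale a measured time by a fractional slowdown (default: the 40% high end)."""
    return seconds * (1 + swing)


for model, (first, full) in MEASURED.items():
    print(f"{model}: first token up to ~{under_load(first):.1f}s, "
          f"full reply up to ~{under_load(full):.1f}s")
```

At the 40% end, phi3:mini's first token lands around 4.2s and its full reply around 19.6s, while llama3.2:1b's first token stays just under 3s.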

What actually matters

For a FAQ bot on a plumbing website, first-token latency is the number that makes or breaks the demo. Nobody stares at an empty chat bubble for 3 seconds and thinks "I bet this is working."

Our fix: stream the response. With llama3.2:1b, the model still takes about 9 seconds to finish generating, but the first word shows up after about 2, and users read along as it types. Perceived latency drops to roughly that 2-second first-token time.
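Streaming doesn't take much code. Here's a minimal sketch, assuming Ollama's streaming /api/generate output (one JSON object per line with a `response` field) and Server-Sent Events (SSE) between the server and the chat widget. The function names and the `done` event are hypothetical, and this is not the code behind our demo.

```python
# A sketch of the streaming idea: relay each fragment to the browser as soon as
# Ollama emits it, instead of waiting for the whole reply. Assumes Ollama's
# /api/generate line-delimited JSON and SSE toward the chat widget.
import json
from typing import Iterable, Iterator


def fragments(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield each non-empty "response" fragment the moment its line is parsed."""
    for raw in lines:
        if not raw.strip():
            continue
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            return  # stop at the final chunk; anything after it is ignored


def as_sse(texts: Iterable[str]) -> Iterator[str]:
    """Frame fragments as SSE messages; the client appends each one to the bubble."""
    for text in texts:
        # JSON-encode so a newline inside a fragment cannot break SSE framing.
        yield f"data: {json.dumps(text)}\n\n"
    # Explicit end marker so the client can close instead of auto-reconnecting.
    yield "event: done\ndata: {}\n\n"
```

Because both functions are generators, nothing is buffered: each fragment is framed and can be flushed to the browser as soon as its line arrives from the model.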

When we'd still reach for a bigger stack

Three situations force us to go past the $20 VPS:

Multilingual chat. 1B models wobble on the Spanish nuance that matters in South Florida.
Real-time voice. That's a different game (hello, Twilio + streaming ASR).
Pulling from client data. If the bot is reading your CRM, the infrastructure needs more care.

How we use this

Our live chatbot demo runs on exactly this setup. Every visitor who types into it is hitting a 1B-parameter model on a shared server. It's good enough to sound like a real business. We built it so prospects could answer the "is it actually fast?" question themselves.

Your turn: 30-min scoping call, free, no pitch.
