On-prem AI for small businesses — hype vs. what works
Running AI "on your own server" sounds great until you price the GPU. Here's what's actually realistic at the small-operation end of the spectrum.
"On-prem AI" is having a moment. Half of what's being sold under that label is real engineering; the other half is consultants moving GPU servers to businesses that don't need them. Separating them is useful.
"On-prem" really means one of three things
- A desktop under the counter. Mac mini, refurbished SFF PC, NUC. Surprisingly capable. Quiet, cheap, handles a small chatbot or review-reply workload.
- A VPS you rent. Technically not on-prem, but you control the machine and choose the region, which still satisfies most data-residency arguments.
- A real server you own. Dell / HP tower with proper power and cooling. Overkill for most, necessary for some.
For a small technical operation with a chatbot and a booking form, the first or second option is right. The third is almost always someone else's upsell.
When a GPU actually matters
- Large-model inference in real time. 70B-parameter quality in under 2 seconds requires a GPU. 1B–7B on CPU is fine for less-demanding workloads.
- Many concurrent users. A storefront chatbot handling 10 simultaneous streaming replies needs more throughput than CPU-only inference provides.
- Real-time voice pipelines. ASR + LLM + TTS with sub-second targets demand a GPU.
Everything else — text chatbot, review responder, email triage, lead capture — runs on CPU.
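The three triggers above can be written down as a quick checklist. This is a rough sketch, not a benchmark: the `Workload` fields and both cut-off constants are assumptions made for illustration. The text only says that 1B–7B is fine on CPU and that 70B in real time is not, so anything in between needs measuring on your own hardware.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    model_params_b: float     # model size, in billions of parameters
    concurrent_streams: int   # simultaneous streaming replies
    realtime_voice: bool      # ASR + LLM + TTS with sub-second targets
    realtime: bool = True     # is a user waiting on the reply?


# Assumed cut-offs for illustration only; measure before buying anything.
CPU_OK_MAX_PARAMS_B = 7       # upper end of the "fine on CPU" range in the text
GPU_STREAMS_THRESHOLD = 10    # the text's storefront example


def gpu_reasons(w: Workload) -> list[str]:
    """Return the reasons (possibly none) this workload points toward a GPU."""
    reasons = []
    if w.realtime and w.model_params_b > CPU_OK_MAX_PARAMS_B:
        reasons.append("large-model inference in real time")
    if w.concurrent_streams >= GPU_STREAMS_THRESHOLD:
        reasons.append("many concurrent users")
    if w.realtime_voice:
        reasons.append("real-time voice pipeline")
    return reasons


if __name__ == "__main__":
    chatbot = Workload(model_params_b=3, concurrent_streams=2, realtime_voice=False)
    print(gpu_reasons(chatbot) or "CPU is fine")
```

An empty list means none of the three triggers applies and the workload belongs in the "runs on CPU" bucket.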
Typical cost for the "cheap path"
| Item | Cost |
|---|---|
| Mac mini M-series or refurb SFF PC | $600–1,200 one-time |
| External SSD for models (2TB) | $150 |
| Small UPS | $100 |
| Network config | $0–200 |
| Setup labor | $500–1,500 |
Total: roughly $1,350–3,150 one-time, depending mostly on hardware and setup labor, for a system that runs for ~5 years.
Typical cost for the "all cloud" path
$49–99/month, forever. Cancel the subscription, lose the accumulated data. Vendor raises prices, you pay or migrate. Over 5 years: $3K–6K with none of the ownership.
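To make the comparison concrete, here is the arithmetic as a small script. The figures are copied from the table and the subscription range above. It is only a sketch: it ignores electricity, maintenance time, and hardware resale value, and the function names are illustrative.

```python
# (low, high) one-time costs in USD, copied from the "cheap path" table
LOCAL_ITEMS = {
    "Mac mini M-series or refurb SFF PC": (600, 1200),
    "External SSD for models (2TB)": (150, 150),
    "Small UPS": (100, 100),
    "Network config": (0, 200),
    "Setup labor": (500, 1500),
}
CLOUD_MONTHLY = (49, 99)  # USD per month
YEARS = 5


def local_total(items):
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


def cloud_total(monthly, years):
    """Total subscription spend over the given number of years."""
    months = years * 12
    return monthly[0] * months, monthly[1] * months


def break_even_months(one_time, monthly):
    """Months of subscription needed to match a one-time cost, rounded up."""
    return -(-one_time // monthly)


if __name__ == "__main__":
    print("local one-time:", local_total(LOCAL_ITEMS))
    print("cloud, 5 years:", cloud_total(CLOUD_MONTHLY, YEARS))
    print("break-even, best case:", break_even_months(1350, 99), "months")
    print("break-even, worst case:", break_even_months(3150, 49), "months")
```

Summing the table rows gives $1,350–3,150 one-time against $2,940–5,940 in subscription fees over five years. Break-even ranges from about 14 months (cheap hardware, expensive subscription) to about 65 months (expensive hardware, cheap subscription), so the local path pays for itself soonest when you keep the hardware and setup costs at the low end.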
The hybrid that works in practice
For most small deployments:
- Heavy lifting (inference, DB) on a local or rented box you control
- Backups to cheap cloud storage (B2 / Wasabi / S3 cold tier)
- Analytics to ClickHouse on the same box or a shared analytics host
The local box runs the critical path. The cloud handles "what if the box catches fire."
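The "box catches fire" half usually starts as a nightly archive job. Below is a minimal sketch using only the Python standard library, assuming everything worth keeping lives under one directory. The function name and paths are hypothetical. The upload to B2 / Wasabi / S3 is left out on purpose, since it depends on the provider's own tooling.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def make_backup(data_dir: Path, out_dir: Path) -> tuple[Path, str]:
    """Pack data_dir into a timestamped .tar.gz and return (path, sha256 hex)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = out_dir / f"backup-{stamp}.tar.gz"

    # Store paths relative to the directory name, not the absolute path.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=data_dir.name)

    # Hash in 1 MiB chunks so large archives do not need to fit in memory.
    digest = hashlib.sha256()
    with archive.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the database dumps and uploads live.
    path, checksum = make_backup(Path("./data"), Path("./backups"))
    print(path, checksum)
```

Recording the checksum next to the uploaded archive lets you confirm a restore is intact before you actually need it.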
Where on-prem is the wrong call
If the site is a pure static marketing page with no AI interaction, no data, no state — just host it on a CDN and move on. On-prem is a means to an end (cost + control + privacy), not a virtue.
— Alex Kargin. More engineering writing at kargin-utkin.com.