dart-studio
Alex Kargin · 2 min read

On-prem AI for small businesses — hype vs. what works

Running AI "on your own server" sounds great until you price the GPU. Here's what's actually realistic at the small-operation end of the spectrum.

Tags: on-prem, reality-check, hardware

"On-prem AI" is having a moment. Half of what's being sold under that label is real engineering; the other half is consultants selling GPU servers to businesses that don't need them. It pays to tell the two apart.

"On-prem" really means one of three things

  1. A desktop under the counter. Mac mini, refurbished SFF PC, NUC. Surprisingly capable. Quiet, cheap, handles a small chatbot or review-reply workload.
  2. A VPS you rent. Technically not on-prem, but you control the software and the data on it, and it still satisfies most data-residency arguments.
  3. A real server you own. Dell / HP tower with proper power and cooling. Overkill for most, necessary for some.

For a small technical operation with a chatbot and a booking form, options 1 or 2 are right. Option 3 is almost always someone else's upsell.

When a GPU actually matters

  • Large-model inference in real time. 70B-parameter quality in under 2 seconds requires a GPU. 1B–7B models on CPU are fine for less-demanding workloads.
  • Many concurrent users. A storefront chatbot handling 10 simultaneous streaming replies needs more throughput than CPU-only inference provides.
  • Real-time voice pipelines. ASR + LLM + TTS with sub-second targets demands a GPU.

Everything else — text chatbot, review responder, email triage, lead capture — runs on CPU.
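
The three cases above can be turned into a rough checklist. The sketch below is one hypothetical way to write it down: the `Workload` fields and the threshold constants are assumptions read loosely off the bullets, not benchmark results, so treat the output as a conversation starter rather than a sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    model_params_b: float     # model size, in billions of parameters
    max_latency_s: float      # acceptable time to a full reply, in seconds
    concurrent_streams: int   # simultaneous streaming replies
    realtime_voice: bool      # ASR + LLM + TTS with sub-second targets


# Assumed thresholds, taken loosely from the bullets above (not measured).
CPU_FRIENDLY_PARAMS_B = 7     # the article treats 1B-7B as fine on CPU
TIGHT_LATENCY_S = 2.0         # "under 2 seconds"
MANY_STREAMS = 10             # the storefront chatbot example


def gpu_reasons(w: Workload) -> list[str]:
    """Return the reasons a workload probably needs a GPU (empty list = CPU is likely fine)."""
    reasons = []
    if w.model_params_b > CPU_FRIENDLY_PARAMS_B and w.max_latency_s <= TIGHT_LATENCY_S:
        reasons.append("large model with a real-time latency target")
    if w.concurrent_streams >= MANY_STREAMS:
        reasons.append("many concurrent streaming replies")
    if w.realtime_voice:
        reasons.append("real-time voice pipeline")
    return reasons


if __name__ == "__main__":
    review_responder = Workload(3, 30.0, 1, False)
    voice_agent = Workload(7, 1.0, 2, True)
    print(gpu_reasons(review_responder))
    print(gpu_reasons(voice_agent))
```

A review responder running a 3B model with a relaxed latency budget returns no reasons at all, which matches the "runs on CPU" bucket. A voice agent trips the third check even with a small model.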

Typical cost for the "cheap path"

Item                                  Cost
Mac mini M-series or refurb SFF PC    $600–1,200 one-time
External SSD for models (2TB)         $150
Small UPS                             $100
Network config                        $0–200
Setup labor                           $500–1,500

Total: roughly $1,350–3,150 one-time, depending mostly on setup labor, for a system that runs for ~5 years.

Typical cost for the "all cloud" path

$49–99/month, forever. Cancel the subscription, lose the accumulated data. Vendor raises prices, you pay or migrate. Over 5 years: $3K–6K with none of the ownership.
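
To make the comparison concrete, here is a small back-of-envelope sketch that sums the line items from the cheap-path table and sets them against the subscription over the same 5 years. It uses only the figures quoted in this article; the function and variable names are made up for illustration, and anything the article does not price is left out.

```python
# (low, high) in USD, copied from the "cheap path" table.
ONE_TIME_COSTS = {
    "computer": (600, 1200),
    "external_ssd": (150, 150),
    "ups": (100, 100),
    "network_config": (0, 200),
    "setup_labor": (500, 1500),
}
CLOUD_MONTHLY = (49, 99)  # USD per month, from the "all cloud" path
YEARS = 5                 # lifetime assumed in the article


def one_time_total(costs):
    """Sum the low and the high end of every line item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def cloud_total(monthly, years):
    """Total subscription spend over the given number of years."""
    months = years * 12
    return monthly[0] * months, monthly[1] * months


def break_even_months(one_time, monthly):
    """Months of subscription that add up to a one-time spend, rounded up."""
    return -(-one_time // monthly)  # integer ceiling division


if __name__ == "__main__":
    low, high = one_time_total(ONE_TIME_COSTS)
    c_low, c_high = cloud_total(CLOUD_MONTHLY, YEARS)
    print(f"Local box, one-time: ${low:,} to ${high:,}")
    print(f"Cloud over {YEARS} years: ${c_low:,} to ${c_high:,}")
    print(f"Best case break-even: {break_even_months(low, CLOUD_MONTHLY[1])} months")
    print(f"Worst case break-even: {break_even_months(high, CLOUD_MONTHLY[0])} months")
```

With these inputs the local box costs $1,350 to $3,150 and the subscription $2,940 to $5,940. The cheapest build pays for itself against the $99 plan in 14 months, while the most expensive build against the $49 plan takes 65 months, slightly past the 5-year mark.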

The hybrid that works in practice

For most small deployments:

  • Heavy lifting (inference, DB) on a local or rented box you control
  • Backups to cheap cloud storage (B2 / Wasabi / S3 cold tier)
  • Analytics to ClickHouse on the same box or a shared analytics host

The local box runs the critical path. The cloud handles "what if the box catches fire."
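
The backup half of that split can be as simple as a nightly archive. The following is a minimal sketch using only the Python standard library: it packs a data directory into a timestamped `.tar.gz` and returns a SHA-256 checksum for verifying the copy after upload. The function name and directory layout are hypothetical, and the upload step is left out on purpose, since it depends on the storage provider's own tooling.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def make_backup(data_dir: str, out_dir: str) -> tuple[Path, str]:
    """Pack data_dir into a timestamped .tar.gz; return (archive path, sha256 hex)."""
    src = Path(data_dir).resolve()
    dest = Path(out_dir)  # keep this outside data_dir so the archive never includes itself
    dest.mkdir(parents=True, exist_ok=True)

    # UTC timestamp keeps archive names sortable and unambiguous.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest / f"{src.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)

    # Hash the archive in 1 MiB chunks so large backups do not load into memory.
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()
```

Storing the checksum next to the uploaded archive gives a cheap way to confirm, before the box catches fire, that the cloud copy is the file you think it is.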

Where on-prem is the wrong call

If the site is a pure static marketing page with no AI interaction, no data, no state — just host it on a CDN and move on. On-prem is a means to an end (cost + control + privacy), not a virtue.

— Alex Kargin. More engineering writing at kargin-utkin.com.
