Why we default to Ollama for small-business AI
The same toolchain that data engineers are using to run models locally turns out to be the right default for small, long-lived deployments. Here's why.
When picking a stack for a long-lived small deployment, the useful constraint to apply is: will this survive the original engineer leaving, for five years, without external intervention?
That rules out a surprising amount of 2024-era tooling. API wrappers break when providers change pricing or sunset models. Custom inference servers need ops knowledge the eventual operator won't have. Managed platforms lock in and re-price.
What survives? Boring, open, locally runnable. Ollama fits.
What Ollama actually is
A small daemon that runs open-source models on any Linux/macOS/Windows box. It exposes an OpenAI-compatible HTTP API. Any code written against OpenAI switches with a one-line change:
OPENAI_BASE_URL=http://127.0.0.1:11434/v1
That's the entire migration. Same SDK, same request shape, same response shape.
Why this matters for small deployments
- Vendor pricing risk goes to zero. The model is a file on disk. It runs until electricity runs out.
- Data stays local. For anything touching PHI, legal records, or payment data, this stops objections before they start.
- Swapping models is trivial. Today it's
llama3.2:1b. In six months when something better ships,ollama pull <new-model>and change one env var. No contract renegotiation, no approval cycles.
Where Ollama is the wrong answer
Three cases to reach for a cloud LLM API instead:
- Frontier-quality work in a niche language. Small open models still wobble on nuanced Spanish or Russian; GPT-4o and Claude Sonnet don't.
- Real-time voice pipelines. The whole round-trip (ASR + LLM + TTS) gets easier on managed infrastructure.
- Client already has credits and doesn't care about the data-sovereignty question.
The operational argument
The best part of the stack is what it doesn't have. No Kubernetes. No auto-scaler. No observability mesh. One systemd unit, one daemon, one ollama pull. A junior developer can pick up the whole thing in an afternoon, which is the real sustainability test.
— Alex Kargin. More engineering writing at kargin-utkin.com.