August 5, 2025·2 min read·Alex Kargin

Why we default to Ollama for small-business AI

The same toolchain that data engineers are using to run models locally turns out to be the right fit for a plumber's chatbot. Here's why we keep reaching for it.

ollamalocal-llmstack

When we started scoping small-business AI projects, we made one bet: whatever we pick for the LLM layer, it has to survive the client for five years without us.

That rules out a lot of stacks. API wrappers break when the provider changes pricing. Custom inference servers require ops expertise the client doesn't have. Managed platforms lock you in and re-price you.

What survives five years? Boring, open, locally-runnable. That's Ollama.

What it actually is

Ollama is a tiny daemon that runs open-source models on any box — your laptop, a $20 VPS, or a beefy on-prem server. It exposes an OpenAI-compatible API, which means any code you wrote against OpenAI switches to Ollama with a one-line change.

Yes, really:

OPENAI_BASE_URL=http://127.0.0.1:11434/v1

That's the whole migration.

Why this matters for a small business

No vendor can raise your prices. The model is a file on your disk. It runs until electricity runs out.
Data never leaves the building. For any business that touches PHI, legal records, or payment data, this is the argument that stops objections.
Swap models trivially. Today it's llama3.2:1b. In six months when something better drops, you ollama pull it and change one env var. No contract renegotiation.

Where it doesn't fit

We still send customers to cloud APIs when:

Voice-realtime is the use case (Ollama + realtime voice pipeline is non-trivial)
Multilingual nuance matters more than cost (cloud frontier models still win on Spanish/Russian/Mandarin fine work)
The client already has credits and doesn't care

Proof it's not just theory

Our own chatbot demo runs Ollama on shared infrastructure. Try it here — ask it anything a plumbing customer would. It's running a 1B-parameter model on a server shared with four other projects, and it holds up.

The same infrastructure pattern is used on our engineering-lab site: labs.kargin-utkin.com hosts a production-style gym-member-churn predictor with interpretable ML — also self-hosted.

Bottom line

For small businesses, the right AI stack is the one that doesn't outrun you in cost, can't be taken away, and doesn't require an engineer on speed dial. Ollama clears all three bars.

See all our services · Try the chatbot · Book a call

What it actually is

Why this matters for a small business

Where it doesn't fit

Proof it's not just theory

Bottom line

Thirty-minute scoping call. No pitch, no retainer.