The DeepSeek moment, and what it meant for your coffee shop
When a Chinese lab shipped a reasoning model you could run on your own hardware, the ground moved for everyone — especially anyone paying per token.
In late January 2025, DeepSeek dropped a reasoning model that ran on commodity hardware and matched the commercial giants on several benchmarks. The "only Big-AI can do this" framing got noticeably quieter.
Three shifts followed that matter for anyone running a small technical operation.
1. The phone-home tax became optional
Before: every inference routes through an API provider, metered in tokens, priced per thousand. After: a 2GB model on a modest VPS answers the same questions for a rounding error once you own the hardware.
For an FAQ-style chatbot doing ~500 conversations a month at ~5k tokens each, cloud-API cost lands in the $40–80 range. The same workload on a self-hosted model: roughly $0 marginal, once the box is paid for. The gap widens as volume climbs.
2. "Data leaves the building" stopped being inevitable
The thing that actually blocked AI adoption in regulated or trust-sensitive contexts — patient names, payment data, legal records running through a third-party API — became newly optional. On-prem inference means the conversation stays local.
That unlocks verticals (dental, medical, legal, accounting) where the compliance conversation used to end the deal.
3. The interesting question shifted
From can I afford AI? to who owns the model, who owns the data, and who sets the price next year? Different question with different answers.
What the 2025 drumbeat looked like after
DeepSeek wasn't one release; it kicked off a cadence. Llama 3.2, Qwen 2.5, Phi-3, then Llama 3.3 — each one a little better, a little smaller, a little friendlier to CPU-only deployment. By mid-2025 the floor for "useful local LLM" had dropped to models you could run on a laptop without a GPU.
None of that is a story about any particular vendor. It's a story about the ground under the vendors shifting.
— Alex Kargin. More engineering writing at kargin-utkin.com.