Hugging Face Blog · June 1, 2026
JetBrains Unveils Mellum2: A Lean, Mean MoE Model for Production AI Workflows
JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model designed for high-speed text and code tasks. Unlike a dense model, which runs every parameter for every token, Mellum2 activates only 2.5 billion parameters per token. JetBrains reports that this delivers more than twice the inference speed of comparable open models without sacrificing benchmark performance.
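To make the "active parameters" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. Only the 12B and 2.5B figures come from the announcement; the expert count, the value of k, and every function name below are illustrative assumptions, not Mellum2's actual configuration or code.

```python
import math

TOTAL_PARAMS_B = 12.0   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 2.5   # parameters active per token in billions (from the article)


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that run for a single token."""
    return active / total


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1.

    Hypothetical helper: real MoE routers differ in details such as
    load balancing and where normalization happens.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = top_k_route(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy usage: four stand-in "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
output = moe_layer(1.0, experts, router_scores=[0.0, 2.0, 1.0, -1.0], k=2)
print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
print(f"mixed output from 2 of 4 experts: {output:.3f}")
```

With the published figures, roughly a fifth of the model's weights take part in any single forward step, which is where the speed advantage over a dense model of the same total size comes from.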
Trained from scratch on natural language and code, Mellum2 is purpose-built for the kind of latency-sensitive operations that power modern AI stacks: routing queries, compressing context for retrieval-augmented generation (RAG), summarizing documents, orchestrating sub-agents, and handling intermediate steps in multi-model systems. The model’s compact footprint makes it a strong candidate for self-hosted deployments where proprietary code or data privacy is a concern.
JetBrains positions Mellum2 as a “focal” model: a specialized component that slots into larger AI architectures rather than trying to do everything. Think of it as the fast, reliable worker that handles high-frequency tasks so that larger, more expensive models are reserved for the heavy lifting. The company reports competitive results on code generation, reasoning, and math benchmarks while maintaining production-grade throughput.
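One way to picture the "focal" pattern is a small dispatcher that sends high-frequency tasks to the compact model and everything else to a larger one. This is a rough sketch under stated assumptions: the task labels, the `Request` type, and the two model callables are hypothetical placeholders, not part of any JetBrains API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical task labels for the high-frequency work the article lists.
LIGHT_TASKS = {"route", "summarize", "compress_context"}


@dataclass
class Request:
    task: str     # what kind of work this is
    prompt: str   # the text to process


def dispatch(
    req: Request,
    small_model: Callable[[str], str],
    large_model: Callable[[str], str],
) -> str:
    """Send light tasks to the small model; escalate everything else."""
    if req.task in LIGHT_TASKS:
        return small_model(req.prompt)
    return large_model(req.prompt)


# Stub models standing in for real inference endpoints (assumptions).
def small_stub(prompt: str) -> str:
    return f"[small] {prompt}"


def large_stub(prompt: str) -> str:
    return f"[large] {prompt}"


print(dispatch(Request("summarize", "Condense this changelog."), small_stub, large_stub))
print(dispatch(Request("design_review", "Audit this architecture."), small_stub, large_stub))
```

In a real system the routing decision would likely depend on more than a task label, for example input length or a confidence score, but the division of labor is the same.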
Mellum2 is released under the Apache 2.0 license and is available for download on Hugging Face. JetBrains has also published a full report covering architecture, training setup, and evaluation methodology. For engineering teams building IDE tooling, RAG pipelines, or agent workflows, this is a model worth trying.
Source: Hugging Face Blog