Moonshot AI’s latest release, Kimi K2 Thinking, has stunned the community by outranking OpenAI’s GPT‑5 and Anthropic’s Claude 4.5 on a battery of third‑party benchmarks, all while remaining fully open‑source. The 1‑trillion‑parameter mixture‑of‑experts model activates roughly 32 billion parameters per token and exposes a built‑in reasoning trace that can orchestrate up to 300 sequential tool calls without human supervision. With scores of 44.9 % on Humanity’s Last Exam, 71.3 % on SWE‑Bench Verified and 60.2 % on BrowseComp, Kimi K2 eclipses both proprietary leaders and the previous open‑weight champion, MiniMax‑M2.
Beyond raw numbers, Kimi K2’s architecture pairs INT4 quantization‑aware training (QAT) with sparse expert activation, roughly doubling inference speed while preserving accuracy across 256K‑token contexts. The model’s “thinking‑content” field exposes its intermediate reasoning, letting developers audit decisions and fine‑tune tool‑use pipelines. Benchmarks reinforce the picture: it scores 85.7 % on GPQA Diamond versus GPT‑5’s 84.5 % and reaches parity on mathematical challenges like AIME 2025. Open‑weight competitors are no longer far behind either: MiniMax‑M2’s BrowseComp score of 44.0 % is now outpaced by Kimi K2’s 60.2 %, illustrating how quickly the performance gap is closing.
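To make the developer workflow concrete, the sketch below shows how a client might drive that kind of autonomous tool‑use loop against an OpenAI‑compatible endpoint while reading the exposed reasoning trace to audit each step. It is a minimal illustration under stated assumptions: the base URL, the model id "kimi-k2-thinking", the reasoning_content field name, and the web_search tool are placeholders for the sake of the example, not details confirmed above.

```python
# Minimal sketch of an agentic tool-use loop with a readable reasoning trace.
# Assumptions (not confirmed by the article): an OpenAI-compatible endpoint at the
# base_url below, the model id "kimi-k2-thinking", a "reasoning_content" field on
# the response message, and a hypothetical web_search tool.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# A hypothetical tool the model may call while it reasons.
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a short text summary.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def web_search(query: str) -> str:
    # Stub implementation; swap in a real search backend.
    return f"(stub) top results for: {query}"

messages = [{"role": "user", "content": "Summarize this week's open-weight model releases."}]

MAX_TOOL_CALLS = 300  # cap mirroring the "up to 300 tool calls" claim
calls_used = 0

while True:
    resp = client.chat.completions.create(
        model="kimi-k2-thinking",  # assumed model id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message

    # Audit the intermediate logic if the server exposes a reasoning field.
    trace = getattr(msg, "reasoning_content", None)
    if trace:
        print("[reasoning]", trace[:200], "...")

    # Stop when the model answers directly or the tool-call budget is spent.
    if not msg.tool_calls or calls_used >= MAX_TOOL_CALLS:
        print(msg.content or "[stopped at tool-call budget]")
        break

    # Execute each requested tool call and feed the result back to the model.
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = web_search(**args)
        calls_used += 1
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
```

The same pattern should carry over to other OpenAI‑compatible providers; only the base URL, model id, and tool definitions would change.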
The licensing choice, a Modified MIT whose sole added condition is an attribution requirement for very large deployments (prominent “Kimi K2” attribution applies only above roughly 100 million monthly active users or US$20 million in monthly revenue), makes Kimi K2 one of the most permissively licensed frontier models, enabling researchers and enterprises to deploy it without the overhead of proprietary agreements. For businesses, the implication is stark: a free, open‑source model can match or exceed the capabilities of paid APIs while offering full control over data and compliance. As U.S. giants grapple with soaring compute costs and regulatory scrutiny, the emergence of Kimi K2 signals a structural shift toward efficiency‑driven, collaborative AI research, redefining what it means to possess cutting‑edge intelligence.