Moonshot AI’s Kimi K2 Thinking, released today, marks a watershed moment for open‑source language models. Built on a sparse mixture‑of‑experts architecture with one trillion total parameters and 32 billion parameters active per inference, K2 delivers a 256K‑token context window and sustains 200–300 sequential tool calls without human intervention. In benchmark tests, the model achieved 44.9% on Humanity’s Last Exam, 60.2% on BrowseComp, 71.3% on SWE‑Bench Verified, and 83.1% on LiveCodeBench v6—outpacing GPT‑5’s corresponding scores and eclipsing the previous open‑weight leader MiniMax‑M2 by wide margins.
Beyond raw performance, K2’s architecture is engineered for efficiency. The model incorporates INT4 quantization‑aware training, roughly doubling inference speed relative to full‑precision models while maintaining accuracy. Pricing tiers—$0.15 per 1M cache‑hit tokens, $0.60 per 1M cache‑miss tokens, and $2.50 per 1M output tokens—undercut GPT‑5 by an order of magnitude, making high‑end reasoning accessible at a fraction of the cost. Licensing under a modified MIT agreement grants full commercial use but requires attribution for deployments exceeding 100 million monthly active users or $20 million in monthly revenue, a light‑touch condition that preserves openness.
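To see how those three pricing tiers combine in practice, here is a back‑of‑the‑envelope cost sketch. The per‑million‑token rates come from the article; the function name and example token counts are illustrative, not an official calculator.

```python
# Published K2 per-token pricing tiers, in dollars per 1M tokens.
PRICE_PER_M = {
    "cache_hit": 0.15,   # input tokens served from the prompt cache
    "cache_miss": 0.60,  # fresh (uncached) input tokens
    "output": 2.50,      # generated output tokens
}

def request_cost(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the tiered pricing."""
    return (
        cache_hit_tokens / 1_000_000 * PRICE_PER_M["cache_hit"]
        + cache_miss_tokens / 1_000_000 * PRICE_PER_M["cache_miss"]
        + output_tokens / 1_000_000 * PRICE_PER_M["output"]
    )

# Hypothetical agentic request: 200K cached context, 50K fresh input, 10K output.
print(round(request_cost(200_000, 50_000, 10_000), 4))  # → 0.085
```

At under nine cents for a quarter‑million‑token request, the example shows why cache‑hit pricing matters for long‑running agent loops that repeatedly resend the same context.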
The implications are far‑reaching. Enterprises can now deploy a fully open‑weight system that matches or surpasses proprietary flagship models while retaining control over data, compliance, and customization. The collapse of the performance gap between closed and public models signals a shift from capital‑intensive data centers to architecture‑driven efficiency, challenging the business model of U.S. incumbents and raising questions about sustainable AI investment. For developers and researchers, K2 offers transparent reasoning traces and tool‑use capabilities that enable complex, multi‑step workflows—an essential feature for the next generation of autonomous agents.
Want the full story?
Read on VentureBeat →