Introducing Kimi K2 Thinking, China's ‘Most Capable' Open-Source Model

Investors call Beijing’s Moonshot AI one of China’s “AI Tiger” companies — a firm at the forefront of China’s race toward global AI dominance. According to co-founder Yang Zhilin, his goal is to build foundational models to achieve artificial general intelligence (AGI).

This week, Moonshot AI released what it calls its “most capable” open-source thinking model: Kimi K2 Thinking.

All About Kimi K2 Thinking
Kimi K2 Thinking Specs
How Does Kimi Compare? K2 Thinking vs GPT-5 vs Claude Sonnet 4.5
Kimi K2 Thinking's Agentic Capabilities
Kimi K2 Thinking's General Capabilities
How to Use Kimi K2 Thinking

All About Kimi K2 Thinking

According to company officials, Kimi K2 Thinking (sometimes referred to as just K2 Thinking) is built as an open-source “thinking agent that reasons step-by-step” while using tools.

K2 Thinking’s key features include:

Deep Thinking & Tool Orchestration: Trained to weave chain-of-thought reasoning with tool calls, enabling autonomous tasks like research, coding and writing workflows.
Native INT4 Quantization: Quantization-Aware Training (QAT) is used in the post-training stage to deliver a lossless 2× speed-up.
Stable Long-Horizon Agency: K2 Thinking can execute 200-300 consecutive tool calls without human interference (prior models degraded after 30-50 steps).

Kimi K2 Thinking Specs

Architecture	Mixture-of-Experts (MoE)
Total Parameters	1 Trillion
Activated Parameters	32 Billion
Number of Layers (Dense layer included)	61
Number of Dense Layers	1
Attention Hidden Dimension	7168
MoE Hidden Dimension (per Expert)	2048
Number of Attention Heads	64
Number of Experts	384
Selected Experts per Token	8
Number of Shared Experts	1
Vocabulary Size	160K
Context Length	256K
Attention Mechanism	MLA
Activation Function	SwiGLU

How Does Kimi Compare? K2 Thinking vs GPT-5 vs Claude Sonnet 4.5

According to Moonshot AI, K2 Thinking sets new records across benchmarks for reasoning, coding and agent capabilities. It was compared against OpenAI's GPT-5 (High) and Anthropic's Claude Sonnet 4.5 (Thinking).

K2 Learning achieved:

44.9% on Humanity’s Last Exam (with tools)
60.2% on BrowseComp
71.3% on SWE-Bench Verified

Kimi K2 Learning's Evaluation Scores — Kimi K2 Thinking’s Evaluation Scores

A Note About AI Evaluation Accuracy

Unlike traditional software with deterministic outputs, LLMs generate probabilistic results, meaning the same input can produce different outputs. Evaluations typically rely on standardized benchmarks (like MMLU or GPQA) but these tools have notable limitations.

Some benchmarks lack practical usability or fail to represent real-world scenarios adequately, and results can be misleading if not interpreted with proper context.

“Benchmarks are deeply political, performative and generative in the sense that they do not passively describe and measure how things are in the world, but actively take part in shaping it,” researchers noted. These benchmarks influence how AI models are trained, fine-tuned and applied, they added — practices with broad political, economic and cultural impacts.

Kimi K2 Thinking's Agentic Capabilities

According to Moonshot AI, Kimi K2 Thinking excels at:

Agentic Reasoning

K2 Thinking is set up with a diverse toolkit that allows it to plan, reason, execute and adapt across hundreds of steps. In one example, according to the model's creators, it solved a PhD-level math problem through 23 interwoven reasoning and tool calls.

Agentic Coding

K2 Thinking reasons while using tools, allowing it to integrate into software agents to complete complex, multi-step development workflows.

Some examples that Kimi K2 Thinking built from a single prompt, according to the company, include:

A component-heavy website
A short math explainer visualization
A simulation of virus-attacking cells in bloodstream
A vinyl record simulation
Live coding music with Strudel.cc

Agentic Search and Browsing

K2 Thinking performs cycles of think → search → browser use → think → code. During those cycles, it generates and refines hypotheses, verifies information, reasons and constructs answers.

The AI model's intertwined reasoning allows it to turn vague, open-ended problems into clear, actionable subtasks.

Related Article: Reimagining Traditional Workflows With AI Agents

Learning Opportunities

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

Apr

AI for Your DXP: Connect What You Have, Transform How You Work

Most AI strategies stop at the platform—but work happens elsewhere. Bring intelligence into Teams, email, tickets and CRM.

Webinar

On demand

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Watch Now

Webinar

On demand

Content Strategy Leaders Live: Scaling for Speed, Complexity and AI in High Tech

A candid roundtable on how high-tech leaders are rethinking content at scale.

Watch Now

Webinar

On demand

Do More with Less: Modernizing the Cloud Contact Center for 2026

Learn how to leverage cloud platforms without adding a single hire to personalize every customer interaction.

Watch Now

Webinar

Complex, internal combustion engine or fine clockwork.

On demand

Cut the Noise: Deploying AI That Actually Moves the Needle

Learn how to turn AI experimentation into concrete revenue operations.

Watch Now

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

Apr

AI for Your DXP: Connect What You Have, Transform How You Work

Most AI strategies stop at the platform—but work happens elsewhere. Bring intelligence into Teams, email, tickets and CRM.

Webinar

On demand

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Watch Now

Kimi K2 Thinking's General Capabilities

According to Moonshot AI, K2 Learning's other general capabilities include:

Creative Writing: K2 Thinking has strong command of style and instruction, able to handle diverse tones and formats with natural fluency.
Practical Writing: K2 Thinking reportedly follows prompts with high precision, often expanding on every mentioned point to ensure complete coverage.
Personal & Emotion: The AI model responds with empathy when addressing personal or emotional questions, offering nuanced perspectives and actionable next steps.

How to Use Kimi K2 Thinking

You can use Kimi K2 Thinking now on Kimi.com under the chat mode.

This mode only uses a subset of tools and reduces the number of tool calls, meaning using K2 Thinking this way may not reproduce benchmark scores, according to company officials.

The full agentic mode will be available soon, which the model makers say will “reflect the full capabilities of K2 Thinking.”

The model is also accessible through the Kimi K2 Thinking API.

Table of Contents

All About Kimi K2 Thinking

Kimi K2 Thinking Specs

How Does Kimi Compare? K2 Thinking vs GPT-5 vs Claude Sonnet 4.5

A Note About AI Evaluation Accuracy

Kimi K2 Thinking's Agentic Capabilities

Agentic Reasoning

Agentic Coding

Agentic Search and Browsing

Kimi K2 Thinking's General Capabilities

How to Use Kimi K2 Thinking