Claude Sonnet 4.5 Benchmark Report

The world's best coding model. Claude Sonnet 4.5 sets new standards in software engineering, computer use, and agentic tasks while showing substantial gains in reasoning and math.

Model Profile

Claude Sonnet 4.5

Release: 9/29/2025
Developer: Anthropic
Category: Proprietary Model
Claude Sonnet 4.5 is Anthropic's flagship model. Performance data on this page comes from Artificial Analysis benchmarks and from results reported in Anthropic's official announcement; each section names its source.

Claude Sonnet 4.5 Benchmarks

With each new model, Claude becomes more powerful and accurate. The advantages are often obvious but difficult to understand and categorize. We've compiled a list of different fields in order to get deeper insights into Claude Sonnet 4.5's performance.

Below is a concise, quotable roundup of how Claude Sonnet 4.5 performs, where it shines, how it compares to GPT-5 and earlier Claude models, plus caveats on what the numbers really mean.

Claude Sonnet 4.5 Benchmark Results

Evaluation results across key AI capability domains, as reported by Anthropic and Artificial Analysis.

Average Score: 79.5%
Excellent Results: 3/5
Highest Score: 88%

Claude Intelligence Comparison

Model Generations: Sonnet 4.5 leads (rated Excellent, 85% on a 0-100% scale)

Intelligence comparison across Claude model generations using Artificial Analysis Intelligence Index.

Coding Performance - 82.0%

SWE-bench Verified

Score: 82.0% (Excellent)

Best coding model in the world. Data from SWE-bench Verified with parallel test-time compute.

Intelligence Benchmark - 63

Reasoning & Problem-Solving

Score: 63 (Good)

Artificial Analysis Intelligence Index composite score measuring reasoning and problem-solving.

Math Competition - 88.0%

AIME 2025

Score: 88.0% (Excellent)

Data from AIME 2025: Advanced mathematics competition performance from Artificial Analysis.

Claude Models Comparison

Performance analysis comparing Claude Sonnet 4.5 with other Claude models using real data from Artificial Analysis.

Claude Intelligence: Model Generation Comparison

Intelligence comparison across Claude model generations using the Artificial Analysis Intelligence Index.

Evolution of intelligence between models:

  • Claude 3.5 established strong reasoning and coding capabilities with balanced performance.
  • Claude 4 delivered major gains in complex reasoning and multi-step problem solving.
  • Claude Sonnet 4.5 represents Anthropic's latest advancement in AI capabilities.

Data Source: Intelligence Index figures in this section are sourced from independent evaluations by Artificial Analysis.

Coding Performance

Data from SWE-bench Verified: Real-world software engineering on authentic GitHub issues with comprehensive test coverage

Claude Sonnet 4.5 achieves 82.0% on SWE-bench Verified with parallel test-time compute, making it the best coding model in the world. This benchmark tests real GitHub issues with comprehensive test coverage.

Parallel test-time compute means running multiple attempts in parallel and selecting the best result that passes tests.
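
As an illustration of that idea, the following minimal Python sketch runs several attempts concurrently and keeps one that passes a test check. The names `best_of_n`, `toy_generate`, and `toy_passes_tests` are hypothetical stand-ins, and picking the first passing attempt is a simplification; Anthropic's actual selection procedure is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[int], str],
    passes_tests: Callable[[str], bool],
    n: int = 4,
) -> Optional[str]:
    """Run n attempts in parallel; return the first (by attempt index) that passes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves the order of the attempt indices.
        candidates = list(pool.map(generate, range(n)))
    for candidate in candidates:
        if passes_tests(candidate):
            return candidate
    return None  # no attempt passed the tests


# Toy stand-ins (assumptions): attempt i proposes "patch-i"; only patch-2 is correct.
def toy_generate(attempt: int) -> str:
    return f"patch-{attempt}"


def toy_passes_tests(patch: str) -> bool:
    return patch == "patch-2"


print(best_of_n(toy_generate, toy_passes_tests, n=4))  # patch-2
```

In a real evaluation, `generate` would call the model and `passes_tests` would run the repository's test suite; a score obtained this way is not directly comparable to a single-attempt score.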

Source: Anthropic's official announcement (September 29, 2025)

Claude vs Leading AI Models

Performance comparisons between Claude Sonnet 4.5 and the world's most advanced AI models using verified data from Artificial Analysis.

Intelligence Benchmark (Claude vs Leading AI Models)

Composite score measuring reasoning, problem-solving, and knowledge across multiple domains from Artificial Analysis

Claude Sonnet 4.5 achieves a score of 63 on the Artificial Analysis Intelligence Index, demonstrating strong reasoning and problem-solving capabilities across multiple domains.

Source: Artificial Analysis Intelligence Index Leaderboard

Math Competition

Data from AIME 2025: Advanced mathematics competition performance from Artificial Analysis

Claude Sonnet 4.5 achieves 88.0% on AIME 2025, demonstrating strong mathematical reasoning capabilities.

Source: Artificial Analysis AIME 2025 Benchmark Leaderboard

Claude Sonnet 4.5 Extended Thinking Explained

Claude Sonnet 4.5 supports Extended Thinking, a mode (first available in earlier Claude models) that lets the model spend more computational effort on harder tasks. For straightforward prompts, it responds quickly using a lighter reasoning path.

For complex tasks, it engages deeper reasoning, breaking down problems into logical steps. This enables Claude Sonnet 4.5 to maintain focus for more than 30 hours on complex, multi-step tasks, making it ideal for long-running agentic workflows.

Where Claude Sonnet 4.5 Excels

Claude Sonnet 4.5 sets new standards in several key areas:

  • Coding: 77.2% on SWE-bench Verified in the standard configuration (82.0% with parallel test-time compute) - the best coding model in the world
  • Computer Use: 61.4% on OSWorld - a 45% relative improvement over Claude Sonnet 4
  • Agentic Tasks: Maintains focus for 30+ hours on complex multi-step tasks
  • Alignment: Most aligned frontier model with reduced harmful behaviors

These improvements make Claude Sonnet 4.5 particularly valuable for software development, complex problem-solving, and building sophisticated AI agents.
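
To make the OSWorld figure concrete, here is a small arithmetic sketch. It assumes the 45% is a relative gain (not percentage points) and derives the baseline that assumption implies; `implied_baseline` is a hypothetical helper, and the derived values are back-calculated, not separately reported results.

```python
def implied_baseline(new_score: float, relative_gain: float) -> float:
    """Baseline score implied by a new score and a relative improvement."""
    return new_score / (1.0 + relative_gain)


new_score = 61.4       # OSWorld score quoted above, in percent
relative_gain = 0.45   # assumption: "45% improvement" is relative, not absolute

baseline = implied_baseline(new_score, relative_gain)
points_gained = new_score - baseline

print(round(baseline, 1))       # 42.3
print(round(points_gained, 1))  # 19.1
```

Read this way, the gain is about 19 percentage points on a baseline of roughly 42%.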

Where Claude Sonnet 4.5 Has Room to Grow

Despite its strengths, Claude Sonnet 4.5 has areas for improvement. Performance can vary based on configuration: results may differ depending on whether Extended Thinking is enabled and the complexity of the prompt.

The model is subject to AI Safety Level 3 (ASL-3) protections, which include classifiers that may occasionally flag normal content, though false positives have been reduced by a factor of ten since initial release.

Like all frontier models, Claude Sonnet 4.5 should be used with careful evaluation and appropriate safeguards in production environments.

Conclusion

Claude Sonnet 4.5 represents a major advancement in AI capabilities, achieving world-leading performance in coding, computer use, and agentic tasks. Extended Thinking provides flexibility for both quick responses and deep reasoning.

As Anthropic's most aligned model yet, it sets new standards in AI safety and reliability. At the same pricing as Claude Sonnet 4 ($3/$15 per million tokens), it offers exceptional value for developers and businesses building with AI.
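
For budgeting, the quoted pricing can be turned into a rough per-request estimate. This sketch reads "$3/$15" as $3 per million input tokens and $15 per million output tokens; `estimate_cost_usd` is a hypothetical helper, and it ignores any discounts or surcharges that may apply in practice.

```python
INPUT_USD_PER_MTOK = 3.00    # assumption: $3 per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumption: $15 per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request at the list prices above."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 200,000-token prompt that yields a 20,000-token response.
print(estimate_cost_usd(200_000, 20_000))  # 0.9
```

Because output tokens cost five times as much as input tokens at these rates, long generations tend to dominate the bill.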

Claude Sonnet 4.5 demonstrates what's possible when cutting-edge capabilities meet thoughtful alignment, setting a new benchmark for frontier AI models.
