Cursor Composer 2.5 Hits 79.8% on SWE-bench Multilingual

Cursor has upgraded its proprietary AI coding model with the release of Composer 2.5, focusing on sustained, multi-step work across large codebases. The model uses targeted reinforcement learning to match the performance of frontier models like Claude Opus 4.7 and GPT-5.5 on long-horizon agentic tasks. Cursor directed 85% of its compute budget for this release toward extended training and reinforcement learning environments.

Architecture and Reinforcement Learning

Composer 2.5 continues to use the open-source Moonshot Kimi K2.5 checkpoint as its foundation. The training stack relies on Sharded Muon and Dual Mesh HSDP for more efficient distributed training.

Cursor scaled the model’s intelligence by training it on 25 times more synthetic tasks than the previous generation. These tasks are heavily grounded in real-world codebases. The training process uses techniques like “feature deletion,” where the agent receives a codebase and its test suite, then must reimplement a deliberately removed feature to satisfy the existing tests. This provides a verifiable reward signal for the reinforcement learning pipeline.
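The reward signal described above can be illustrated with a minimal Python sketch. It is not Cursor's pipeline: the names `verifiable_reward`, `SLUGIFY_TESTS`, and `agent_attempt` are hypothetical, the "test suite" is reduced to input/output pairs, and the all-or-nothing scoring is an assumption made for illustration.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for a repository's existing test suite: each case is
# (arguments, expected result) for the feature that was deliberately removed.
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: Iterable[TestCase]) -> float:
    """Return 1.0 only if the reimplemented feature passes every test, else 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test, so the reward is zero.
            return 0.0
    return 1.0


# Toy example: the "deleted feature" is a slugify helper (an assumption).
SLUGIFY_TESTS = [
    (("Hello World",), "hello-world"),
    (("  Trim Me ",), "trim-me"),
]


def agent_attempt(text: str) -> str:
    # What a successful reimplementation by the agent might look like.
    return "-".join(text.lower().split())
```

Calling `verifiable_reward(agent_attempt, SLUGIFY_TESTS)` scores the attempt against the existing tests. Because the score depends only on test outcomes, it can be checked automatically without a human grader.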

The model also incorporates new learning methods that use textual feedback to tune behavior. This targeted RL improves the agent’s communication style and effort calibration, prioritizing usability traits that standard benchmarks often fail to capture.

To handle context scaling, Composer 2.5 implements a “compaction-in-the-loop” training process. The model learns to summarize its own context, allowing it to follow code trajectories that extend far beyond its context window.
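The general idea of compaction can be sketched as follows. This is an illustration under stated assumptions, not Cursor's implementation: `count_tokens` is a crude word-count stand-in for a tokenizer, `toy_summarize` is a hypothetical placeholder for the summary the model itself would write, and the `keep_recent` policy is invented for the example.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(
    history: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """If history exceeds the budget, replace all but the newest entries with one summary."""
    total = sum(count_tokens(message) for message in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


def toy_summarize(messages: List[str]) -> str:
    # Hypothetical summarizer; in the trained system the model produces this text.
    return f"[summary of {len(messages)} earlier steps]"
```

In an agent loop, `compact` would run before each model call, so the working context stays within budget while a condensed record of earlier steps is carried forward.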

Cursor also disclosed a strategic collaboration with SpaceXAI to train a future, significantly larger model from scratch using Colossus 2’s million H100-equivalents and 10 times more total compute.

Benchmark Performance

Composer 2.5 reaches parity with top-tier models on complex software engineering evaluations, demonstrating its capacity for multi-file code modifications.

Benchmark	Composer 2.5 Score
SWE-bench Multilingual	79.8%
CursorBench v3.1	63.2%

Pricing and Platform Updates

Composer 2.5 is available immediately within the Cursor IDE. Usage is divided into two distinct pricing tiers depending on latency requirements:

Standard Tier: $0.50 per 1 million input tokens and $2.50 per 1 million output tokens.
Fast Tier (Default): $3.00 per 1 million input tokens and $15.00 per 1 million output tokens.
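To compare the two tiers for a given workload, the listed per-million-token prices can be plugged into a small calculator. The function name `estimate_cost` and the example token counts are illustrative assumptions; only the prices come from the list above.

```python
# Prices in USD per 1 million tokens, taken from the tiers listed above.
PRICES_PER_MILLION = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 3.00, "output": 15.00},
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a session from its token counts (hypothetical helper)."""
    price = PRICES_PER_MILLION[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For a hypothetical session with 2 million input tokens and 400,000 output tokens, this gives $2.00 on the Standard tier and $12.00 on the Fast tier. The Fast tier costs six times as much as Standard for both input and output tokens.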

The release ships alongside new platform capabilities designed for multi-step agent operations. A new Build in Parallel feature uses async subagents to multitask across independent segments of a coding plan, accelerating execution speed during complex refactors.
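The pattern of fanning out independent plan steps to asynchronous workers can be sketched with Python's standard `asyncio` module. This is a conceptual illustration only: `run_subagent`, `build_in_parallel`, and `run_plan` are hypothetical names, and nothing here reflects how Cursor implements the feature.

```python
import asyncio
from typing import List


async def run_subagent(step: str) -> str:
    # Hypothetical subagent: a real one would edit files or call tools.
    await asyncio.sleep(0)  # yield control, simulating asynchronous work
    return f"done: {step}"


async def build_in_parallel(independent_steps: List[str]) -> List[str]:
    # gather schedules the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(run_subagent(step) for step in independent_steps))


def run_plan(steps: List[str]) -> List[str]:
    # Synchronous entry point that drives the event loop.
    return asyncio.run(build_in_parallel(steps))
```

The approach only helps when the steps are truly independent, for example edits to separate modules. Steps that depend on each other's output still need to run in sequence.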

For development teams requiring controlled execution, Cursor added Development Environments for Cloud Agents. Users can configure Dockerfile-based environments with build secrets and layer caching, allowing agents to run end-to-end tasks in an isolated cloud setup. A new Microsoft Teams integration also enables developers to delegate tasks directly to cloud agents by mentioning @Cursor in chat threads.

For teams managing complex applications, these updates shift the IDE agent’s role from localized code generation to persistent, repository-wide execution.
