
Google Rolls Out Beta Gemini-in-Sheets Creation Tools as It Tops SpreadsheetBench

Google launched beta Gemini-in-Sheets creation and editing features as the product posted a state-of-the-art 70.48% on SpreadsheetBench.

Google rolled out new beta Gemini-in-Sheets creation and editing tools on March 10, 2026, and tied the launch to a public benchmark result: Gemini in Google Sheets scored 70.48% on SpreadsheetBench, placing it first on the leaderboard. Google’s official announcement pairs a product launch with a measurable claim about spreadsheet manipulation. For developers building spreadsheet agents, office automation, or tool-using LLM workflows, the shift is from formula assistance to agentic spreadsheet operations.

Product Scope

The March 10 rollout adds two concrete Sheets workflows. First, Gemini can create, organize, and edit entire spreadsheets from natural language prompts. Second, a new Fill with Gemini flow can populate rows or columns with generated text, summaries, categorizations, or information pulled from Google Search. The rollout is beta, starting March 10, for Google AI Pro and Google AI Ultra subscribers. Docs, Sheets, and Slides creation features are English-only initially and available globally for supported users.
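The Fill with Gemini flow maps naturally to a fill-a-column-by-function pattern. The sketch below is a minimal, hypothetical illustration of that pattern, not Google's implementation: the `categorize` stub stands in for a real model call, and all names are assumptions.

```python
# Hypothetical "fill with model" pattern: populate a target column by applying
# a model-backed function to a source column, row by row.

def categorize(description: str) -> str:
    """Stand-in for a model call that labels a row of text (illustrative only)."""
    rules = {"refund": "Billing", "login": "Auth", "crash": "Stability"}
    for keyword, label in rules.items():
        if keyword in description.lower():
            return label
    return "Other"

def fill_column(rows: list[dict], source: str, target: str) -> list[dict]:
    """Return new rows with `target` filled from `source`, leaving input untouched."""
    return [{**row, target: categorize(row[source])} for row in rows]

tickets = [
    {"summary": "Refund not processed"},
    {"summary": "Login loop on mobile"},
]
filled = fill_column(tickets, source="summary", target="category")
```

Keeping the fill as a pure function over row data makes the operation easy to preview and re-run before anything is written back to the sheet.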

Google’s Sheets support documentation shows Gemini can perform direct spreadsheet actions such as conditional formatting, pivot tables, dropdowns, sorting, filtering, find-and-replace, range filling, formatting, row and column operations, chart creation, and optimization tasks. Gemini in Sheets operates as a bounded office agent with direct spreadsheet manipulation capabilities, aligning with the broader move from chat interfaces toward tool-using systems covered in AI Agents vs Chatbots: What’s the Difference?.
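One way to picture "direct spreadsheet actions" is as typed operations applied to grid state rather than freeform text edits. The sketch below assumes its own small operation set (`Sort`, `FindReplace`); these names and fields are illustrative, not Google's API.

```python
# Minimal sketch of spreadsheet actions as typed operations on a grid.
from dataclasses import dataclass

@dataclass
class Sort:
    column: int
    descending: bool = False

@dataclass
class FindReplace:
    find: str
    replace: str

def apply(grid: list[list[str]], op) -> list[list[str]]:
    """Dispatch one typed operation against the grid, returning a new grid."""
    if isinstance(op, Sort):
        return sorted(grid, key=lambda row: row[op.column], reverse=op.descending)
    if isinstance(op, FindReplace):
        return [[cell.replace(op.find, op.replace) for cell in row] for row in grid]
    raise ValueError(f"unsupported operation: {op!r}")

grid = [["banana", "3"], ["apple", "5"]]
grid = apply(grid, Sort(column=0))
grid = apply(grid, FindReplace("apple", "apples"))
```

Because each operation is a plain value, a sequence of them can be logged, replayed, or reversed, which is exactly what a bounded office agent needs.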

Benchmark Results

SpreadsheetBench is a public spreadsheet manipulation benchmark with 912 tasks across 2,729 spreadsheets, built from real-world problems sourced from online Excel forums. Its evaluation checks whether a system can successfully produce the required spreadsheet edits or outputs in an online-judge-style environment.
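An online-judge-style check can be sketched as a cell-level comparison between a system's output and the expected answer. The benchmark's exact checking logic is not reproduced here; this is an illustrative assumption about the shape of such a judge.

```python
# Sketch of an online-judge-style spreadsheet check: pass only if every
# expected cell address holds exactly the expected value.

def judge(produced: dict[str, str], expected: dict[str, str]) -> bool:
    """Compare produced cells against the expected answer cells for one task."""
    return all(produced.get(addr) == value for addr, value in expected.items())

expected = {"B2": "42", "B3": "17"}
result_full = judge({"A1": "x", "B2": "42", "B3": "17"}, expected)   # True
result_partial = judge({"B2": "42"}, expected)                       # False
```

A judge of this shape scores outcomes, not transcripts, which is why it rewards systems that actually execute edits correctly.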

As of March 10, 2026, the visible leaderboard entries were:

System                          Score    Verification
Gemini in Google Sheets         70.48%   Verified
Qingqiu Agent                   69.96%   Verified
Univer                          68.86%   Verified
Lingxi                          66.89%   Verified
Copilot in Excel (Agent Mode)   57.20%   Unverified
ChatGPT Agent w/ .xlsx          45.50%   Unverified
Claude Files Opus 4.1           42.90%   Unverified

Gemini’s 70.48% is 0.52 percentage points ahead of Qingqiu Agent at 69.96%. The benchmark is public and Google’s submission is Verified, but the lead is narrow. The top four entries are verified, while several well-known competing systems are not, so scores should not be compared directly across verification status.

SpreadsheetBench and Agent Evaluation

SpreadsheetBench evaluates spreadsheet manipulation, not just spreadsheet question answering. Production spreadsheet work requires the system to update cells, create derived structures, preserve formatting constraints, and complete multi-step operations correctly. The benchmark is more comparable to agent evaluations than to static reasoning tests. When the task environment has state, tools, and procedural constraints, benchmark wins increasingly come from execution frameworks and action policies, not just larger base models.

Technical Signals

Google’s support materials indicate Gemini in Sheets can expose the code used to generate output for some tasks. At least part of the system relies on code-backed transformations or structured execution paths rather than pure freeform generation. Deterministic operations, explicit action plans, and reversible edits reduce the failure modes that matter in production. This overlaps with ideas from Structured Output from LLMs: JSON Mode Explained. Spreadsheets are a high-friction environment for loose natural language generation; systems improve when they convert user intent into typed operations.
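Converting intent into typed operations usually means validating the model's structured output before anything executes. The sketch below assumes a small JSON edit schema of its own invention (`op`, `target`, `value`); it illustrates the validation gate, not Gemini's actual format.

```python
# Sketch of a structured-output gate: the model must emit a JSON edit
# operation, which is parsed and validated before execution.
import json

ALLOWED_OPS = {"set_cell", "sort_range", "add_filter"}

def parse_edit(raw: str) -> dict:
    """Reject anything that is not a well-formed, recognized edit operation."""
    edit = json.loads(raw)
    if edit.get("op") not in ALLOWED_OPS:
        raise ValueError(f"unknown op: {edit.get('op')}")
    if "target" not in edit:
        raise ValueError("edit missing target range")
    return edit

edit = parse_edit('{"op": "set_cell", "target": "C2", "value": "Paid"}')
```

Validation failures surface as exceptions before any cell changes, which keeps malformed generations from silently corrupting the sheet.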

Reliability and Availability

A 70.48% success rate implies nearly 3 in 10 benchmark tasks remain unsolved. Spreadsheet tasks have asymmetric failure costs. A formatting error is annoying; a mistaken formula, wrong filter, or bad categorization can silently corrupt downstream analysis. The safe path for spreadsheet automation is a staged workflow: generate proposed actions, surface the edits, and require approval for destructive or analytical operations.
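The staged workflow described above can be sketched as a small approval gate. Which operations count as destructive is an assumption here; a real deployment would define that set per organization.

```python
# Minimal sketch of a staged edit workflow: auto-accept safe edits, gate
# destructive ones behind an approve() callback. DESTRUCTIVE is an assumption.

DESTRUCTIVE = {"delete_rows", "overwrite_range", "apply_formula"}

def stage_edits(proposed: list[dict], approve) -> list[dict]:
    """Return only the edits that are safe or explicitly approved."""
    accepted = []
    for edit in proposed:
        if edit["op"] not in DESTRUCTIVE or approve(edit):
            accepted.append(edit)
    return accepted

plan = [
    {"op": "format_cells", "target": "A1:A10"},
    {"op": "delete_rows", "target": "5:7"},
]
applied = stage_edits(plan, approve=lambda e: False)  # reviewer rejects everything
```

With the reviewer rejecting all destructive edits, only the formatting change survives, which is the asymmetric-failure-cost behavior the workflow is designed for.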

The rollout is beta, initially available to Google AI Pro and Google AI Ultra subscribers, and English-only at launch for Docs, Sheets, and Slides. Deployment readiness depends on plan eligibility, language coverage, and admin controls.

If your product or internal tooling depends on spreadsheets, test an agentic edit path against your current prompt-to-formula or export-to-Python workflow. Start with bounded operations such as column filling, categorization, and formatting, require review for analytical transformations, and measure task completion on real sheets rather than demo prompts.
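Measuring task completion on real sheets can be as simple as a pass/fail harness over your own tasks. The task shape and the `agent` callable below are assumptions for illustration.

```python
# Sketch of a completion-rate harness: run each task through an agent
# callable and compare its output cells to the expected cells.

def success_rate(tasks: list[dict], agent) -> float:
    """Fraction of tasks where the agent's output matches the expected cells."""
    passed = sum(1 for t in tasks if agent(t["prompt"]) == t["expected"])
    return passed / len(tasks)

tasks = [
    {"prompt": "fill category column", "expected": {"B2": "Billing"}},
    {"prompt": "sort by date", "expected": {"A2": "2026-01-01"}},
]
# A toy agent that only handles the first task, giving a 50% completion rate.
rate = success_rate(tasks, agent=lambda p: {"B2": "Billing"})
```

Running this over real sheets rather than demo prompts gives a number you can compare directly against a benchmark-style figure like 70.48%.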
