Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Gains 80.04% on BIRD Single-Model Leaderboard

The Google Research team announced the launch of the Gemini-SQL2 on X. They described the program as a text-to-SQL breakthrough powered by Gemini 3.1 Pro. Gemini-SQL2 posted an implementation accuracy of 80.04% on the BIRD Text-to-SQL leaderboard (Model One). Google’s chart puts it above its Gemini-SQL, the previous top entry. The metric measures whether the generated SQL works and returns the correct results, not whether it looks valid.

Gemini-SQL2
Gemini-SQL2 is a text-to-SQL capability, not a model-independent release. It translates natural language queries into what Google calls ‘actionable SQL queries.’ Power is built into the Gemini 3.1 Pro.
According to the announcement at X, “data manipulation and complex business scenarios make generating accurate SQL in natural language very difficult.” IX Post also said that “improved understanding of SQL can increase natural language capabilities across Google’s data services.” That points to integration targets like BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already ships SQL generation based on Gemini. Google has not yet confirmed which products will receive Gemini-SQL2.
Measurements
BIRD (Large Scale Grounded Text-to-SQL Database Test Benchmark) is the industry standard for this task. It contains 12,751 query-SQL pairs across 95 databases spanning 37 professional domains, totaling 33.4GB. Databases include dirty values and need the support of external information, unlike older benchmarks like Spider.
BIRD execution precision (EX): the generated SQL must execute and return results similar to the golden query. Google says this specifically. “With the BIRD rating, which measures guaranteed execution accuracy, GeminiSQL-2’s SQL not only looks good, it also performs well.”
The Trained Single Model Track limits the processing, retrieval, and agent frameworks that aggregate use to improve scores. It measures the ability of the text-to-SQL model. Google Cloud’s previous record on this track, reported on November 15, 2025, was 76.13. Google shows human performance at 92.96, leaving a gap of 12.92 points from 80.04.
How the Leaderboard stacks up
Google’s chart, in the X post, shows Gemini-SQL2 ahead of eight named competitors, as well as several unnamed competitors. Only 80.04% were mentioned as text. The values below are read from the chart area and are approximate; dates indicate the horizontal placement of each point.
| The program | Organization | BIRD Modeling Accuracy (One Model) | Chart Date |
|---|---|---|---|
| Gemini-SQL2 | 80.04% (mean) | June 2026 | |
| Gemini-SQL | ~77.2% | March 2026 | |
| IQ-SQL | AWS | ~76.5% | December 2025 |
| Databricks RLVR 32B | Databricks | ~75.7% | July 2025 |
| SiriusAI-Text2SQL-32B-v2 | Tencent | ~75.0% | December 2025 |
| Arctic-Text2SQL-R1-32B | Snowflake | ~73.9% | June 2025 |
| GPT-5.5-xhigh | OpenAI | ~72.5% | April 2026 |
| SQLWeaver-32B | Alibaba | ~71.7% | May 2026 |
| Claude Opus 4.6 | Anthropic | ~70.1% | February 2026 |
Two patterns are evident. Google now holds two top-ranked databases, Gemini-SQL2 and Gemini-SQL. Several special models of 32B SQL also sit above some of the standard models on the border in this chart.
Use Cases with examples
- Self-help statistics: The revenue manager requests monthly revenue per region, for accounts received within 90 days of development. This requires summation, window logic, and date arithmetic. Signature-validated generation captures valid SQL but returns incorrect rows.
- Data engineering draft: Devs can write BigQuery changes from English, then update instead of writing from scratch. Google’s November 2025 task identified schema understanding as a critical component. Higher BIRD scores indicate better handling of ambiguous columns and dirty values.
- Embedded features to “query your data”.: SaaS teams adding links to natural language queries still need human review with 80% accuracy. One out of five questions may be incorrect. The result sets the expectation, not the removal of the update.
Gemini-SQL2 Introducing: Community Welcome Dashboard
Verified social engagement on Google Research announcements posts • first ~3 hours • June 12, 2026
Bird single-Model leaderboard • Accuracy of Execution
Division of Field of Engagement
X / Twitter (main post)
Watching144.4K
Being loved2,800
He repeats267
Bookmarks1,300
Answers64
Marriage standard3.1%
LinkedIn (main post)
Reaction349+
Comments12
He repeats27
Reception signal
Bookmark-plus-like to reply ratio in X. A high retention rate with few responses usually indicates approval over conflict. The level of the comment level has not been measured; responses are still loading during capture.
Usage Pattern
Google has not published the Gemini-SQL2 model string or API yet. The pattern based on the schema below works with current Gemini models with the google-genai SDK. Change the model string when Gemini-SQL2 runs.
from google import genai
client = genai.Client() # reads GEMINI_API_KEY from environment
schema = """
CREATE TABLE orders (
order_id INTEGER, customer TEXT, region TEXT,
amount REAL, status TEXT, created_at DATE
);
"""
question = "Total paid order amount by region in 2026, highest first."
prompt = f"""You are a text-to-SQL system.
Schema:{schema}
Question: {question}
Return only one executable SQLite query. No explanation."""
resp = client.models.generate_content(
model="gemini-3.1-pro-preview", # the base model named in the announcement; swap when a Gemini-SQL2 ID ships
contents=prompt,
)
print(resp.text)
Production systems should add passive validation. Run the returned SQL, catch errors, and try again with an additional error message. That loop shows what is awarded for the accuracy of BIRD’s execution.
Key Takeaways
- Google reports a Gemini-SQL2 implementation accuracy of 80.04% on the BIRD single-model leaderboard.
- It’s powered by Gemini 3.1 Pro and targets “ready-to-execute SQL,” not just plain SQL.
- In the Google chart, Gemini-SQL2 and Gemini-SQL hold the top two named positions; human performance is 92.96.
- No API, model card, technical report, or product integration information has been published yet.
MARKTECHPOST Virtual Dictionary
Text-to-SQL Playground
Work Gemini-SQL2 I just got married 80.04% in (BIRD benchmark, one model). Select a query, inspect the generated SQL, and run it on the browser’s live dataset.
1 • Ask in natural language
2 • Executed SQL
Select a question above to generate SQL.
CREATE TABLE orders ( order_id INTEGER, customer TEXT, region TEXT, amount REAL, status TEXT, created_at DATE ); -- 12 sample rows loaded in this browser
Execution precision means that SQL must initialize AND return the correct rows.
Check it out Details here. Also, feel free to follow us Twitter and don’t forget to join our 150k+ML SubReddit and Subscribe to Our newspaper. Wait! are you on telegram? now you can join us on telegram too.
Need to work with us on developing your GitHub Repo OR Hug Face Page OR Product Release OR Webinar etc.? contact us




