Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Gains 80.04% on BIRD Single-Model Leaderboard

admin June 12, 2026

0 0 4 minutes read

Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Gains 80.04% on BIRD Single-Model Leaderboard

The Google Research team announced the launch of the Gemini-SQL2 on X. They described the program as a text-to-SQL breakthrough powered by Gemini 3.1 Pro. Gemini-SQL2 posted an implementation accuracy of 80.04% on the BIRD Text-to-SQL leaderboard (Model One). Google’s chart puts it above its Gemini-SQL, the previous top entry. The metric measures whether the generated SQL works and returns the correct results, not whether it looks valid.

Gemini-SQL2

Gemini-SQL2 is a text-to-SQL capability, not a model-independent release. It translates natural language queries into what Google calls ‘actionable SQL queries.’ Power is built into the Gemini 3.1 Pro.

According to the announcement at X, “data manipulation and complex business scenarios make generating accurate SQL in natural language very difficult.” IX Post also said that “improved understanding of SQL can increase natural language capabilities across Google’s data services.” That points to integration targets like BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already ships SQL generation based on Gemini. Google has not yet confirmed which products will receive Gemini-SQL2.

Measurements

BIRD (Large Scale Grounded Text-to-SQL Database Test Benchmark) is the industry standard for this task. It contains 12,751 query-SQL pairs across 95 databases spanning 37 professional domains, totaling 33.4GB. Databases include dirty values and need the support of external information, unlike older benchmarks like Spider.

BIRD execution precision (EX): the generated SQL must execute and return results similar to the golden query. Google says this specifically. “With the BIRD rating, which measures guaranteed execution accuracy, GeminiSQL-2’s SQL not only looks good, it also performs well.”

The Trained Single Model Track limits the processing, retrieval, and agent frameworks that aggregate use to improve scores. It measures the ability of the text-to-SQL model. Google Cloud’s previous record on this track, reported on November 15, 2025, was 76.13. Google shows human performance at 92.96, leaving a gap of 12.92 points from 80.04.

How the Leaderboard stacks up

Google’s chart, in the X post, shows Gemini-SQL2 ahead of eight named competitors, as well as several unnamed competitors. Only 80.04% were mentioned as text. The values below are read from the chart area and are approximate; dates indicate the horizontal placement of each point.

The program	Organization	BIRD Modeling Accuracy (One Model)	Chart Date
Gemini-SQL2	Google	80.04% (mean)	June 2026
Gemini-SQL	Google	~77.2%	March 2026
IQ-SQL	AWS	~76.5%	December 2025
Databricks RLVR 32B	Databricks	~75.7%	July 2025
SiriusAI-Text2SQL-32B-v2	Tencent	~75.0%	December 2025
Arctic-Text2SQL-R1-32B	Snowflake	~73.9%	June 2025
GPT-5.5-xhigh	OpenAI	~72.5%	April 2026
SQLWeaver-32B	Alibaba	~71.7%	May 2026
Claude Opus 4.6	Anthropic	~70.1%	February 2026

Two patterns are evident. Google now holds two top-ranked databases, Gemini-SQL2 and Gemini-SQL. Several special models of 32B SQL also sit above some of the standard models on the border in this chart.

Use Cases with examples

Self-help statistics: The revenue manager requests monthly revenue per region, for accounts received within 90 days of development. This requires summation, window logic, and date arithmetic. Signature-validated generation captures valid SQL but returns incorrect rows.
Data engineering draft: Devs can write BigQuery changes from English, then update instead of writing from scratch. Google’s November 2025 task identified schema understanding as a critical component. Higher BIRD scores indicate better handling of ambiguous columns and dirty values.
Embedded features to “query your data”.: SaaS teams adding links to natural language queries still need human review with 80% accuracy. One out of five questions may be incorrect. The result sets the expectation, not the removal of the update.

Gemini-SQL2 Introducing: Community Welcome Dashboard

Verified social engagement on Google Research announcements posts • first ~3 hours • June 12, 2026

Bird single-Model leaderboard • Accuracy of Execution

Division of Field of Engagement

X / Twitter (main post)

Watching144.4K

Being loved2,800

He repeats267

Bookmarks1,300

Answers64

Marriage standard3.1%

LinkedIn (main post)

Reaction349+

Comments12

He repeats27

Reception signal

Bookmark-plus-like to reply ratio in X. A high retention rate with few responses usually indicates approval over conflict. The level of the comment level has not been measured; responses are still loading during capture.

Usage Pattern

Google has not published the Gemini-SQL2 model string or API yet. The pattern based on the schema below works with current Gemini models with the google-genai SDK. Change the model string when Gemini-SQL2 runs.

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from environment

schema = """
CREATE TABLE orders (
  order_id INTEGER, customer TEXT, region TEXT,
  amount REAL, status TEXT, created_at DATE
);
"""

question = "Total paid order amount by region in 2026, highest first."

prompt = f"""You are a text-to-SQL system.
Schema:{schema}
Question: {question}
Return only one executable SQLite query. No explanation."""

resp = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # the base model named in the announcement; swap when a Gemini-SQL2 ID ships
    contents=prompt,
)
print(resp.text)

Production systems should add passive validation. Run the returned SQL, catch errors, and try again with an additional error message. That loop shows what is awarded for the accuracy of BIRD’s execution.

Key Takeaways

Google reports a Gemini-SQL2 implementation accuracy of 80.04% on the BIRD single-model leaderboard.
It’s powered by Gemini 3.1 Pro and targets “ready-to-execute SQL,” not just plain SQL.
In the Google chart, Gemini-SQL2 and Gemini-SQL hold the top two named positions; human performance is 92.96.
No API, model card, technical report, or product integration information has been published yet.

MARKTECHPOST Virtual Dictionary

Text-to-SQL Playground

Work Gemini-SQL2 I just got married 80.04% in (BIRD benchmark, one model). Select a query, inspect the generated SQL, and run it on the browser’s live dataset.

1 • Ask in natural language

2 • Executed SQL

Select a question above to generate SQL.

CREATE TABLE orders (
  order_id INTEGER, customer TEXT, region TEXT,
  amount REAL, status TEXT, created_at DATE
);  -- 12 sample rows loaded in this browser

Execution precision means that SQL must initialize AND return the correct rows.

Check it out Details here. Also, feel free to follow us Twitter and don’t forget to join our 150k+ML SubReddit and Subscribe to Our newspaper. Wait! are you on telegram? now you can join us on telegram too.

Need to work with us on developing your GitHub Repo OR Hug Face Page OR Product Release OR Webinar etc.? contact us