Gemma 4 Tool Calling Explained: Step by Step Guide

Imagine asking your AI model, “What’s the weather like in Tokyo right now?” and instead of hallucinating a response, it calls your actual Python function, fetches the live data, and answers with real numbers. That is what tool calling (function calling) enables in Google’s Gemma 4. It is a genuinely exciting addition to open-source AI: the capability is reliable and built directly into the model.
Combined with Ollama as a local runtime, it lets you build cloud-independent AI agents. The best part: these agents can reach real-world APIs and services from your own machine, with no subscription required. In this guide, we’ll cover the concept, the architecture, and three hands-on activities you can try right away.
Also read: Using Claude Code Free with Gemma 4 and Ollama
Conversational language models only know what was in their training data, which has a cutoff date. Ask one for current market prices or today’s weather and it can only guess. Tool calling (also known as function calling) addresses this shortcoming: the model is given a set of described functions and can ask your code to run them whenever a question needs live, external data.
With tool calling enabled, the model can:
- Recognize when it needs to retrieve external information
- Identify the correct function from the schemas it was given
- Compose a properly formatted function call, with arguments
It then waits for your code to execute that call and return the output, and composes a grounded response based on that output.
To clarify: the model never executes code itself. It only decides which function to call and how to structure the arguments. Your own code performs the actual API call. The model is the brain; your functions are the hands.
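To make that division of labor concrete, here is a rough sketch of the kind of tool-call message the model produces and a dispatcher that runs it. The field names follow Ollama’s /api/chat tool-calling format used later in this guide; the weather lambda is a toy stand-in, not a real lookup:

```python
# Hypothetical assistant message: the model has *decided* to call a tool,
# but has executed nothing itself.
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_current_weather",
                      "arguments": {"city": "Tokyo", "unit": "celsius"}}}
    ],
}

def dispatch(msg: dict, registry: dict) -> list:
    """Run each tool the model requested, looked up in a name -> callable registry."""
    results = []
    for tc in msg.get("tool_calls", []):
        fn = registry[tc["function"]["name"]]  # your real Python function
        results.append(fn(**tc["function"]["arguments"]))
    return results

# Toy implementation standing in for a real weather lookup:
print(dispatch(assistant_msg,
               {"get_current_weather": lambda city, unit="celsius": f"{city}: 21 {unit}"}))
# ['Tokyo: 21 celsius']
```

The registry is the “hands”: only functions you explicitly put in it can ever run.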
Before you start writing code, it helps to understand the flow. Here is the loop every Gemma 4 tool-calling agent follows:
- Define Python functions that perform real operations (e.g., fetch weather data from an external source, query a database, convert between currencies).
- Write a JSON schema for each function, describing its name, what it does, and its parameters (including their types).
- When the user sends a message, post both the message and the tool schemas to the Ollama API.
- If the model decides a tool is needed, Ollama returns a tool_calls block instead of plain text.
- You execute the named function locally with the arguments the model supplied.
- You append the result to the conversation as a "role": "tool" message and call the API again.
- The model reads the tool result and replies in natural language.
This two-pass pattern is the basis of every tool-calling agent, including the three examples below.
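The steps above can be sketched as one function. This is a minimal offline illustration of the control flow, not the production code of the tasks below; `chat` stands in for the real Ollama /api/chat call so it runs without a server, and all helper names here are illustrative:

```python
# A minimal offline sketch of the two-pass loop described above.
def two_pass(user_query, tool_schemas, tool_impls, chat):
    messages = [{"role": "user", "content": user_query}]
    # Pass 1: the model either answers directly or emits tool_calls.
    msg = chat({"messages": messages, "tools": tool_schemas})["message"]
    messages.append(msg)
    if "tool_calls" not in msg:
        return msg["content"]
    # Run each requested tool locally and append its result as a "tool" message.
    for tc in msg["tool_calls"]:
        name = tc["function"]["name"]
        result = tool_impls[name](**tc["function"]["arguments"])
        messages.append({"role": "tool", "content": result, "name": name})
    # Pass 2: the model turns the tool output into a natural-language answer.
    return chat({"messages": messages, "tools": tool_schemas})["message"]["content"]

def fake_ollama(payload):
    """Toy model: requests the 'echo' tool once, then summarizes its output."""
    last = payload["messages"][-1]
    if last["role"] == "user":
        return {"message": {"role": "assistant", "tool_calls": [
            {"function": {"name": "echo", "arguments": {"text": last["content"]}}}]}}
    return {"message": {"role": "assistant", "content": f"Tool said: {last['content']}"}}

print(two_pass("hi", [], {"echo": lambda text: text.upper()}, fake_ollama))  # Tool said: HI
```

Swap `fake_ollama` for a real HTTP call and this becomes the skeleton of every agent in this guide.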
To follow along, you need two things: Ollama installed locally on your device, and the Gemma 4 E2B (Edge, 2B-parameter) model downloaded. Everything else uses only the Python standard library, so there are no pip packages to install.
1. Install Ollama (macOS/Linux):

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

2. Download the model (about 2.5 GB):

# Download the Gemma 4 E2B model
ollama pull gemma4:e2b

After the download finishes, run `ollama list` to verify the model appears. Ollama now serves an API on http://localhost:11434, and we can send requests to it with a small helper function:
import json, urllib.request, urllib.parse

def call_ollama(payload: dict) -> dict:
    # POST a chat payload to the local Ollama server and return the parsed JSON reply.
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

No third-party libraries are required, so the agent stays self-contained and completely lightweight.
Also read: How to Use Gemma 4 on Your Phone: A Hands-On Guide
Hands-on Task 01: Live Weather
Our first tool uses Open-Meteo, a free, keyless weather API, to pull live conditions for any location. It first geocodes the city name into latitude/longitude coordinates, then fetches the current weather for them. Three steps:
1. Write your function in Python
def get_current_weather(city: str, unit: str = "celsius") -> str:
    # Geocode the city name to coordinates (Open-Meteo geocoding API).
    geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={urllib.parse.quote(city)}&count=1"
    with urllib.request.urlopen(geo_url) as r:
        geo = json.loads(r.read())
    loc = geo["results"][0]
    lat, lon = loc["latitude"], loc["longitude"]
    # Fetch current conditions for those coordinates.
    url = (f"https://api.open-meteo.com/v1/forecast"
           f"?latitude={lat}&longitude={lon}"
           f"&current=temperature_2m,wind_speed_10m"
           f"&temperature_unit={unit}")
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    c = data["current"]
    return f"{city}: {c['temperature_2m']}°, wind {c['wind_speed_10m']} km/h"

2. Define your JSON schema
This tells Gemma 4 exactly what the function does and what arguments it expects when called.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get live temperature and wind speed for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Mumbai"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}

3. Send a query, execute the tool call, and process the response
messages = [{"role": "user", "content": "What's the weather in Mumbai right now?"}]
response = call_ollama({"model": "gemma4:e2b", "messages": messages,
                        "tools": [weather_tool], "stream": False})
msg = response["message"]

if "tool_calls" in msg:
    tc = msg["tool_calls"][0]
    fn = tc["function"]["name"]
    args = tc["function"]["arguments"]
    result = get_current_weather(**args)  # executed locally
    messages.append(msg)
    messages.append({"role": "tool", "content": result, "name": fn})
    final = call_ollama({"model": "gemma4:e2b", "messages": messages,
                         "tools": [weather_tool], "stream": False})
    print(final["message"]["content"])

Output

Hands-on Task 02: Live Currency Converter
A classic LLM failure is hallucinating exchange rates, which makes accurate, up-to-date currency conversion impossible. With the help of ExchangeRate-API, this tool fetches the latest foreign-exchange rates and converts accurately between any two currencies.
Complete Steps 1-3 below and you’ll have a fully functional Gemma 4 currency converter:
1. Write your Python function
def convert_currency(amount: float, from_curr: str, to_curr: str) -> str:
    # ExchangeRate-API: latest rates with the source currency as the base.
    url = f"https://api.exchangerate-api.com/v4/latest/{from_curr.upper()}"
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    rate = data["rates"].get(to_curr.upper())
    if not rate:
        return f"Currency {to_curr} not found."
    converted = round(amount * rate, 2)
    return f"{amount} {from_curr.upper()} = {converted} {to_curr.upper()} (rate: {rate})"

2. Define your JSON schema
currency_tool = {
    "type": "function",
    "function": {
        "name": "convert_currency",
        "description": "Convert an amount between two currencies at live rates.",
        "parameters": {
            "type": "object",
            "properties": {
                "amount": {"type": "number", "description": "Amount to convert"},
                "from_curr": {"type": "string", "description": "Source currency, e.g. USD"},
                "to_curr": {"type": "string", "description": "Target currency, e.g. EUR"}
            },
            "required": ["amount", "from_curr", "to_curr"]
        }
    }
}

3. Check your solution with a natural-language question
response = call_ollama({
    "model": "gemma4:e2b",
    "messages": [{"role": "user", "content": "How much is 5000 INR in USD today?"}],
    "tools": [currency_tool],
    "stream": False
})

Gemma 4 parses the natural-language query and emits a tool call with amount=5000, from_curr="INR", to_curr="USD". You then execute it and send back the result using the same response-handling code shown in Task 01.
Output

Hands-on Task 03: A Multi-Tool Agent
Gemma 4 also handles multiple tools at once. You can register several tools with the model and send a compound query; the model issues every call it needs, with no hand-wiring on your side.
1. Add a time zone tool
def get_current_time(city: str) -> str:
    # Assumed endpoint: timeapi.io, which returns 'time', 'date' and 'dayOfWeek'
    # fields. It expects an IANA timezone name (e.g. "Asia/Tokyo"), so map the
    # city to a timezone string before calling if needed.
    url = f"https://timeapi.io/api/time/current/zone?timeZone={urllib.parse.quote(city)}"
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    return f"Current time in {city}: {data['time']}, {data['dayOfWeek']} {data['date']}"

time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current local time in a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name for timezone, e.g. Tokyo"}
            },
            "required": ["city"]
        }
    }
}

2. Build a multi-tool agent loop
TOOL_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "convert_currency": convert_currency,
    "get_current_time": get_current_time,
}

def run_agent(user_query: str):
    all_tools = [weather_tool, currency_tool, time_tool]
    messages = [{"role": "user", "content": user_query}]
    response = call_ollama({"model": "gemma4:e2b", "messages": messages,
                            "tools": all_tools, "stream": False})
    msg = response["message"]
    messages.append(msg)
    if "tool_calls" in msg:
        for tc in msg["tool_calls"]:
            fn = tc["function"]["name"]
            args = tc["function"]["arguments"]
            result = TOOL_FUNCTIONS[fn](**args)
            messages.append({"role": "tool", "content": result, "name": fn})
        final = call_ollama({"model": "gemma4:e2b", "messages": messages,
                             "tools": all_tools, "stream": False})
        return final["message"]["content"]
    return msg.get("content", "")

3. Ask a combined, multi-objective question
print(run_agent(
    "I'm flying to Tokyo tomorrow. What's the current time there, "
    "the weather, and how much is 10000 INR in JPY?"
))

Output

We have now solved three different tasks against three different real-time APIs using one common pattern, and everything runs on-premise: none of these components touches a remote service or the cloud.
What makes Gemma 4 different for agentic AI?
Some open-weight models can call tools, but not reliably, and this is what sets Gemma 4 apart. The model emits valid JSON arguments, handles optional parameters correctly, and knows when to answer directly rather than call a tool. As you build on it, keep the following in mind:
- Schema quality matters most. If your field descriptions are vague, the model will struggle to fill in your tool’s arguments correctly. Be specific about units, formats, and examples.
- Gemma 4 honors the required list, respecting the distinction between required and optional parameters.
- The tool result you send back in the "role": "tool" message becomes the model’s context for the final pass. The richer the tool result, the richer the answer.
- A common mistake is returning the tool result as "role": "user" instead of "role": "tool"; the model then fails to recognize it as tool output and may try to invoke the tool again.
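That last pitfall is easy to guard against in code. Here is a small sketch (message shapes follow the examples earlier in this guide; the contents are illustrative):

```python
# Correct: the model recognizes this as tool output on the final pass.
correct = {"role": "tool", "name": "convert_currency",
           "content": "5000 INR = 60.2 USD (rate: 0.01204)"}

# Wrong: sent as a user message, the model may simply re-invoke the tool.
wrong = {"role": "user", "content": "5000 INR = 60.2 USD"}

def is_tool_result(msg: dict) -> bool:
    """Return True only for a well-formed 'role': 'tool' message."""
    return msg.get("role") == "tool" and "name" in msg and "content" in msg

print(is_tool_result(correct), is_tool_result(wrong))  # True False
```

An assertion like this before the second `call_ollama` pass catches the bug immediately instead of producing a confusing model reply.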
Also read: Top 10 Gemma 4 projects that will blow your mind
Conclusion
You have now built a real AI agent using Gemma 4’s function-calling feature, and it runs entirely on your own machine. The same building blocks scale to production agent systems. Possible next steps include:
- adding a file system tool that will allow reading and writing local files on demand;
- using an SQL database as a way to query natural language data;
- creating a memory tool that will create session snapshots and write them to disk, thus giving the agent the ability to remember past conversations.
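As a starting point for the first of those next steps, here is a sketch of a read-only file tool in the same style as the tools above. The names `read_file` and `file_tool` are hypothetical, not part of the article’s code:

```python
import pathlib

def read_file(path: str) -> str:
    """Return up to 4000 characters of a local text file (read-only for safety)."""
    p = pathlib.Path(path)
    if not p.is_file():
        return f"File not found: {path}"
    return p.read_text(encoding="utf-8")[:4000]  # cap what goes into the context

file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a local text file and return up to 4000 characters.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string",
                         "description": "Path to the file, e.g. notes.txt"}
            },
            "required": ["path"]
        }
    }
}
```

To wire it in, add `read_file` to `TOOL_FUNCTIONS` and `file_tool` to `all_tools` in the agent loop; for writing files, you would add a second, separate tool so the model must be explicit about mutations.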
Open-source AI agents are evolving rapidly. Gemma 4’s reliable, structured tool calling delivers real autonomy without relying on the cloud. Start small, get a working system running, and the building blocks for your next projects will fall into place.