Building a Multi-Tool Gemma 4 Agent with Error Recovery
Machine Learning Mastery Grade 10 3d ago



In this article, you will learn how to transform a basic tool-calling script into a resilient agent that gracefully handles failures from misbehaving tools, malformed model outputs, and unavailable services.

Topics we will cover include:

- How to structure an iterative agent loop with a safety cap on iteration count.
- The four distinct categories of failure an agent encounters when calling tools, and how to handle each one.
- How to design tool error messages that teach the model how to recover, reducing wasted iterations.

Introduction

In a previous article, we wired up Gemma 4 to a handful of Python functions using Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the model picks a tool, our code runs it, the model answers. It’s a useful starting point, but it’s a long way from an agent.

One of the things that turns a tool-calling demo into an actual agent is how it handles things going wrong. Tools fail. The model hallucinates a function name, or passes a string where you wanted a number, or asks about a city your lookup table has never heard of. An upstream API times out. A required argument is missing. In the previous tutorial, any of these would either crash the script or get swallowed by a try/except that prints a message and gives up. That’s fine for a single-path demo. It’s not fine for anything you’d want to leave running.

This article rebuilds the agent around the assumption that things will go wrong, and shows how to recover gracefully when they do. The pattern is simple: catch errors at the boundary, convert them into messages the model can read, send them back to the model, and let the model decide whether to retry, route around the problem, or explain the failure to the user. We’ll also wrap everything in a proper iterative agent loop with a safety cap on iteration count.

The full script can be found here. This article walks through the parts that matter.
Rethinking the Tool Loop

The original dispatcher ran a single round: send the user query, collect tool calls, run them, send the results back, print the model’s reply. That’s a one-shot interaction. It works fine when the model’s first response correctly answers the user’s question, but it has nowhere to go when something goes wrong. If a tool fails, the model gets one chance to react and then we’re done. If the model wants to call another tool after seeing the first result, too bad; we already exited.

A proper agent loop is iterative. The structure is straightforward:

- Send the current message history to the model.
- If the model produces tool calls, execute each one, append every result to the history, and loop again.
- If the model produces a plain text response, that’s the final answer. Return.
- Cap the loop at MAX_ITERATIONS so a confused model can’t burn through your CPU forever.

That last point is non-negotiable. Small models occasionally get stuck calling the same tool repeatedly, or oscillating between two tools, and there’s nothing more demoralizing than walking back to your terminal to find your laptop’s fans screaming because Gemma decided to look up the weather in London thirty times in a row.
Here’s the loop:

```python
def run_agent(user_query):
    messages = [{"role": "user", "content": user_query}]

    for iteration in range(1, MAX_ITERATIONS + 1):
        payload = {
            "model": MODEL_NAME,
            "messages": messages,
            "tools": available_tools,
            "stream": False,
        }

        print(f"[EXECUTION — iteration {iteration}]")
        print(" ● Querying model...\n")

        try:
            response_data = call_ollama(payload)
        except Exception as e:
            print(f" └─ [ERROR] Error calling Ollama API: {e}")
            print(f" └─ Make sure Ollama is running and {MODEL_NAME} is pulled.")
            return

        message = response_data.get("message", {})
        tool_calls = message.get("tool_calls") or []

        # Branch A: the model wants to use tools
        if tool_calls:
            print(f"[TOOL EXECUTION — {len(tool_calls)} call(s)]")
            messages.append(message)
            tool_messages = print_tool_calls(tool_calls)
            messages.extend(tool_messages)
            print()
            continue

        # Branch B: the model produced a final answer
        print("[RESPONSE]")
        print(message.get("content", "") + "\n")
        return

    # Safety rail: we exhausted MAX_ITERATIONS without a final answer
    print("[RESPONSE]")
    print(
        f"Hit the {MAX_ITERATIONS}-iteration cap without a final answer. "
        "This usually means the model is stuck in a tool-calling loop. "
        "Try simplifying the query.\n"
    )
```

The pattern is worth committing to memory because it shows up in every agent framework you’ll ever read: the message history is the state. For each iteration we send the entire conversation (the original user query, the model’s tool-call request, our tool results, any follow-up model messages) back to the model. The model is stateless; the list is the agent’s memory.

This iterative structure is also what makes error recovery possible. When a tool fails and we send the error back as a tool message, the model gets to see that error and react to it on the next iteration. Without the loop, there’s nothing to react into.
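To make the idea that the history is the state more concrete, here is a minimal sketch of how the list might grow across one failed tool call. The tool_message helper is hypothetical (it is not taken from the full script), and the role and tool_name fields are an assumption about the message shape Ollama accepts for tool results; the script’s own print_tool_calls may build its messages differently.

```python
def tool_message(tool_name, status, content):
    """Build a tool-result message (hypothetical helper, assumed message shape)."""
    prefix = "ERROR: " if status == "error" else ""
    return {"role": "tool", "tool_name": tool_name, "content": prefix + content}


# Iteration 1 starts with only the user query in the history.
messages = [{"role": "user", "content": "What's the weather in Atlantis?"}]

# The model answers with a tool call; its message is appended unchanged.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Atlantis"}}}
    ],
}
messages.append(assistant_message)

# The tool fails; the failure is appended as an ordinary tool message.
messages.append(tool_message("get_weather", "error", "Unknown city: 'Atlantis'."))

# Iteration 2 sends all three messages, so the model can see the error.
roles = [m["role"] for m in messages]
print(roles)
```

Nothing is stored anywhere else: if the error message were dropped from the list, the model would have no way of knowing the call had failed.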
Building the Tool Registry

Here we build our four tools, all deterministic, all offline. No API keys, no network calls, no flaky external services to debug. The point of this article is the error-handling architecture, not the tools themselves. Predictable tools let us focus on the framework around them and deliberately trigger every failure mode at will.

The tools are:

- get_weather(city): looks up a city in a small dict of canned weather data
- get_local_time(city): computes the real current time in that city’s timezone using zoneinfo
- convert_currency(amount, from_currency, to_currency): does the math against a hardcoded USD-anchored rate table
- get_city_population(city): another lookup against a small dict

The static data lives at the top of the file:

```python
CITY_DATA = {
    "london": {"timezone": "Europe/London", "population": 8_982_000},
    "tokyo": {"timezone": "Asia/Tokyo", "population": 13_960_000},
    "sao paulo": {"timezone": "America/Sao_Paulo", "population": 12_330_000},
    "paris": {"timezone": "Europe/Paris", "population": 2_161_000},
    "new york": {"timezone": "America/New_York", "population": 8_336_000},
    "sydney": {"timezone": "Australia/Sydney", "population": 5_312_000},
    "mumbai": {"timezone": "Asia/Kolkata", "population": 20_410_000},
}

EXCHANGE_RATES = {
    "USD": 1.00, "EUR": 0.92, "GBP": 0.79, "JPY": 156.40,
    "BRL": 5.12, "CAD": 1.37, "AUD": 1.51, "INR": 83.20,
}
```

The functions are deliberately simple, but they raise on bad input rather than returning error strings. Here’s get_weather:

```python
def get_weather(city: str) -> str:
    """Returns current weather conditions for a known city."""
    key = city.lower().strip()
    if key not in WEATHER_DATA:
        raise ValueError(
            f"Unknown city: '{city}'. Known cities: {', '.join(sorted(WEATHER_DATA.keys()))}."
        )
    data = WEATHER_DATA[key]
    return f"The weather in {city.title()} is {data['conditions']} with a temperature of {data['temp_c']}°C."
```
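To see what that error looks like from the caller’s side, here is a small self-contained sketch. The two-city WEATHER_DATA table is a stand-in with made-up values, since the real table is not shown here; get_weather itself is copied from the listing above.

```python
# Stand-in data with made-up values; the article's real WEATHER_DATA is not shown.
WEATHER_DATA = {
    "london": {"conditions": "overcast", "temp_c": 14},
    "tokyo": {"conditions": "clear", "temp_c": 22},
}


def get_weather(city: str) -> str:
    """Returns current weather conditions for a known city."""
    key = city.lower().strip()
    if key not in WEATHER_DATA:
        raise ValueError(
            f"Unknown city: '{city}'. Known cities: {', '.join(sorted(WEATHER_DATA.keys()))}."
        )
    data = WEATHER_DATA[key]
    return f"The weather in {city.title()} is {data['conditions']} with a temperature of {data['temp_c']}°C."


# Trigger the failure path on purpose and capture the message text.
try:
    get_weather("Atlantis")
except ValueError as e:
    error_text = str(e)

print(error_text)
# Unknown city: 'Atlantis'. Known cities: london, tokyo.
```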
Two things to call out about that error message. First, it’s specific: it tells the caller what went wrong and what the valid options are. Second, the tool raises a ValueError rather than returning the error as a string. Don’t catch and string-format errors inside the tool; instead, let them propagate. We want the dispatcher to handle every kind of failure in one place, and we want the message the model sees on a bad input to be informative enough that the model can correct itself.

get_local_time does the only real work (actual timezone-aware datetime arithmetic), and it is also the tool we’ll later use to demonstrate graceful degradation against a simulated upstream failure:

```python
def get_local_time(city: str) -> str:
    """Returns the current local time for a city, with a cached fallback."""
    key = city.lower().strip()

    # Simulate an upstream geocoding service that may fail unpredictably
    if SIMULATE_GEOCODING_OUTAGE and random.random() […]
```

[…] The SIMULATE_GEOCODING_OUTAGE flag lets us reproduce a real-world failure mode without needing real infrastructure to fail. We’ll come back to it.

The tool schemas are unchanged from the previous tutorial’s style: standard Ollama function-calling format, with clear descriptions of what each tool does and what arguments it expects.

The Four Error Recovery Patterns

Time to get serious. There are four distinct failure modes you’ll encounter when an agent talks to tools, and each one needs its own strategy. They’re handled in a single dispatcher function, but it’s worth understanding them as separate concepts.

Pattern 1: Tool Execution Errors

The first defense is the dispatcher itself.
It wraps every tool call in a structured try/except block and converts every kind of failure into a (status, content) pair the agent loop can pass back to the model:

```python
def dispatch_tool_call(tool_call):
    function_name = tool_call["function"]["name"]
    arguments = tool_call["function"]["arguments"] or {}

    # Defense 1: validate the tool name against the registry
    if function_name not in TOOL_FUNCTIONS:
        return "error", (
            f"Unknown tool '{function_name}'. "
            f"Valid tools are: {', '.join(TOOL_FUNCTIONS.keys())}."
        )

    func = TOOL_FUNCTIONS[function_name]

    # Defense 2: catch argument errors (wrong types, missing or extra args)
    try:
        result = func(**arguments)
        return "ok", str(result)
    except TypeError as e:
        return "error", f"Bad arguments for {function_name}: {e}"
    except ValueError as e:
        return "error", str(e)
    except ToolUnavailableError as e:
        return "error", f"Tool temporarily unavailable: {e}"
    except Exception as e:
        return "error", f"Unexpected error in {function_name}: {type(e).__name__}: {e}"
```

The key insight: return the error to the model as a tool result instead of raising it back t
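One way to check that every branch of the dispatcher produces a readable message is to run it against a stub registry. The sketch below repeats dispatch_tool_call from the listing above so it runs on its own; the ToolUnavailableError class, the two stub tools, and the sample calls are stand-ins invented for illustration, not the full script’s definitions.

```python
class ToolUnavailableError(Exception):
    """Stand-in for the exception of the same name used by the dispatcher."""


def get_city_population(city: str) -> str:
    # Stub tool with a one-entry table (illustrative only).
    table = {"paris": 2_161_000}
    key = city.lower().strip()
    if key not in table:
        raise ValueError(
            f"Unknown city: '{city}'. Known cities: {', '.join(sorted(table))}."
        )
    return f"{city.title()} has a population of {table[key]:,}."


def get_local_time(city: str) -> str:
    # Stub that always behaves as if the upstream service were down.
    raise ToolUnavailableError("geocoding service did not respond")


TOOL_FUNCTIONS = {
    "get_city_population": get_city_population,
    "get_local_time": get_local_time,
}


def dispatch_tool_call(tool_call):
    function_name = tool_call["function"]["name"]
    arguments = tool_call["function"]["arguments"] or {}
    if function_name not in TOOL_FUNCTIONS:
        return "error", (
            f"Unknown tool '{function_name}'. "
            f"Valid tools are: {', '.join(TOOL_FUNCTIONS.keys())}."
        )
    func = TOOL_FUNCTIONS[function_name]
    try:
        result = func(**arguments)
        return "ok", str(result)
    except TypeError as e:
        return "error", f"Bad arguments for {function_name}: {e}"
    except ValueError as e:
        return "error", str(e)
    except ToolUnavailableError as e:
        return "error", f"Tool temporarily unavailable: {e}"
    except Exception as e:
        return "error", f"Unexpected error in {function_name}: {type(e).__name__}: {e}"


def call(name, arguments):
    """Build a tool call in the shape the dispatcher expects."""
    return {"function": {"name": name, "arguments": arguments}}


calls = [
    call("get_stock_price", {"ticker": "XYZ"}),      # hallucinated tool name
    call("get_city_population", {}),                 # missing required argument
    call("get_city_population", {"city": "Atlantis"}),  # unknown value
    call("get_local_time", {"city": "Paris"}),       # simulated outage
    call("get_city_population", {"city": "paris"}),  # happy path
]
results = [dispatch_tool_call(c) for c in calls]
for status, content in results:
    print(f"{status}: {content}")
```

None of the five calls raises out of the dispatcher; each one comes back as a (status, content) pair that can be appended to the message history.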
