My agent re-ran a tool because it didn't like the output: multi-tool agents and the thing the docs don't tell you about tool descriptions

In the last post I built a minimal Claude agent with one tool, no framework, just the Anthropic SDK. This is the follow-up: three tools, and the discovery that most of the agent's planning behaviour was coming from somewhere I hadn't been paying attention to.

The setup: three tools, no specified order
Module 1's agent had one tool (save_game_design). Module 2 adds two more:
```python
def search_similar_games(genre: str, mechanics: str) -> str:
    # returns reference games for a genre
    ...

def estimate_dev_time(features: list, team_size: int) -> str:
    base_weeks = len(features) * 2
    adjusted = base_weeks / max(team_size, 1)
    solo_note = " Note: as a solo dev, budget 3x this estimate." if team_size == 1 else ""
    return f"Estimated development time: {adjusted:.0f}-{adjusted*1.5:.0f} weeks.{solo_note}"
```
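To make the arithmetic concrete, here is the same formula run on a few inputs. The function body is copied from above; the feature lists are assumptions made up for illustration, since only their length matters to the formula.

```python
def estimate_dev_time(features: list, team_size: int) -> str:
    # Two weeks per feature, split evenly across the team.
    base_weeks = len(features) * 2
    adjusted = base_weeks / max(team_size, 1)  # max() guards against team_size=0
    solo_note = " Note: as a solo dev, budget 3x this estimate." if team_size == 1 else ""
    return f"Estimated development time: {adjusted:.0f}-{adjusted*1.5:.0f} weeks.{solo_note}"


# Hypothetical feature lists; the names are placeholders.
twelve_features = [f"feature_{i}" for i in range(12)]
ten_features = twelve_features[:10]

solo_full = estimate_dev_time(twelve_features, team_size=1)
solo_trimmed = estimate_dev_time(ten_features, team_size=1)
pair_full = estimate_dev_time(twelve_features, team_size=2)

print(solo_full)     # Estimated development time: 24-36 weeks. Note: as a solo dev, budget 3x this estimate.
print(solo_trimmed)  # Estimated development time: 20-30 weeks. Note: as a solo dev, budget 3x this estimate.
print(pair_full)     # Estimated development time: 12-18 weeks.
```

Each feature is worth two weeks at the low end for a solo dev, so cutting two features moves the range from 24-36 to 20-30 weeks.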
The tool definitions are where it gets interesting. Notice the description fields:
```python
tools = [
    {
        "name": "search_similar_games",
        "description": "Search for similar existing games to use as reference "
                       "for scope and mechanics. Call this early to ground the "
                       "design in reality.",
        ...
    },
    {
        "name": "estimate_dev_time",
        "description": "Estimate development time based on planned features and "
                       "team size. Call this before finalizing the design to "
                       "ensure scope is realistic.",
        ...
    },
    {
        "name": "save_game_design",
        "description": "Save the completed game design document to a file. "
                       "Only call this when the full design is ready.",
        ...
    }
]
```
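For reference, one complete definition might look like the sketch below. The name and description are the ones from the list above. The input_schema is an assumption inferred from the estimate_dev_time signature, not the schema from the repo, and the per-field descriptions are made up.

```python
# A hedged sketch of one full tool definition: name, description, and a
# JSON Schema describing the arguments the model should pass.
estimate_dev_time_tool = {
    "name": "estimate_dev_time",
    "description": (
        "Estimate development time based on planned features and "
        "team size. Call this before finalizing the design to "
        "ensure scope is realistic."
    ),
    # Assumed schema: mirrors estimate_dev_time(features: list, team_size: int).
    "input_schema": {
        "type": "object",
        "properties": {
            "features": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Planned features, one short phrase each.",
            },
            "team_size": {
                "type": "integer",
                "description": "Number of developers on the project.",
            },
        },
        "required": ["features", "team_size"],
    },
}

# The schema says what to pass; the description says when to call.
print(estimate_dev_time_tool["description"])
```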
"Call this early." "Call this before finalizing." "Only call this when the full design is ready." Those phrases are the entire sequencing logic. There is no orchestration code. The system prompt lists the steps as a suggestion but doesn't enforce them. The loop is identical to Module 1 — branch on block.name to dispatch the right function.
What it did with one prompt
Same Krenholm prompt as Module 1. The agent's actual sequence:
[search_similar_games] → Prison Architect, RimWorld, Dwarf Fortress
[estimate_dev_time] → 24-36 weeks (3x for solo dev)
[estimate_dev_time] → 20-30 weeks ← it ran this one again
[save_game_design]
Between the two estimates, the model's text output:

"24–36 weeks for the core is workable, but 3x polish is a serious warning. Let me trim scope smartly — dropping procedural maps and day/night cycle — and re-estimate the leaner version."

It observed a result, evaluated it, cut two features from its own feature list, and re-ran the estimate to verify. The observe → evaluate → adjust loop, with no code telling it to do that. The only thing that makes re-running possible is that the loop keeps going as long as stop_reason == "tool_use" — the agent can call the same tool as many times as it decides to.
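Here is a minimal sketch of why the loop allows that. To keep it runnable without an API key, the model is replaced by a scripted list of fake responses (an assumption for illustration; the real loop gets them from client.messages.create), and the feature lists are placeholders.

```python
from types import SimpleNamespace


def estimate_dev_time(features: list, team_size: int) -> str:
    adjusted = len(features) * 2 / max(team_size, 1)
    return f"Estimated development time: {adjusted:.0f}-{adjusted * 1.5:.0f} weeks."


def tool_use(block_id: str, name: str, tool_input: dict) -> SimpleNamespace:
    # Stand-in for a tool_use content block (type, id, name, input).
    return SimpleNamespace(type="tool_use", id=block_id, name=name, input=tool_input)


# Scripted stand-in for the model: two estimate calls, then a final answer.
features = [f"feature_{i}" for i in range(12)]  # hypothetical feature list
scripted_responses = iter([
    SimpleNamespace(stop_reason="tool_use", content=[
        tool_use("call_1", "estimate_dev_time", {"features": features, "team_size": 1})]),
    SimpleNamespace(stop_reason="tool_use", content=[
        tool_use("call_2", "estimate_dev_time", {"features": features[:10], "team_size": 1})]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="Design finalized.")]),
])

conversation_history = [{"role": "user", "content": "Design a game."}]
calls_made = []

while True:
    response = next(scripted_responses)  # real code would call the API here
    conversation_history.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # the model stopped asking for tools, so the loop ends
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = estimate_dev_time(**block.input)
            calls_made.append((block.name, result))
            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": result})
    conversation_history.append({"role": "user", "content": tool_results})

for name, result in calls_made:
    print(f"[{name}] {result}")
```

Nothing in the loop counts calls or tracks which tools have already run. The only exit is a stop_reason other than "tool_use", so a repeated call is handled the same way as a first call.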
The thing the docs underplay: tool descriptions are planning instructions
I came into Module 2 expecting to write orchestration logic — some state machine deciding "research, then design, then estimate, then save." I wrote none. The sequencing came entirely from the natural-language description fields.
This reframes what a tool definition is. It's not just an API contract telling the model what arguments to pass. The description is a planning hint the model reads when deciding whether and when to call the tool relative to the others. "Call this early" and "call this before finalizing" are, functionally, a plan written in prose.
Concretely: in an earlier run with a vaguer save-tool description, the agent saved the document before finishing the design. Tightening the description to "Only call this when the full design is ready" fixed the ordering without touching the system prompt or the loop. If your multi-tool agent calls things in the wrong order, look at the descriptions before you reach for orchestration code.
One implementation gotcha: don't assume one tool_use block per response
At one point the model said two steps were "independent" and it could run them together. In my run the calls still arrived sequentially — but the API permits multiple tool_use blocks in a single response, so the defensive way to write the loop is to iterate over all of them and return all the corresponding tool_result blocks in the next message:
```python
# Inside the agent's while loop, after each API response:
if response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = dispatch(block.name, block.input)  # branch on name
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # ties the result back to its call
                "content": result
            })
    # All results go back together in a single user message.
    conversation_history.append({"role": "user", "content": tool_results})
    continue
```
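Assume one tool per turn and you'll drop calls the model asked for. As far as I can tell, the API also expects a tool_result for every tool_use block in the previous turn, so the next request may be rejected rather than failing silently.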
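A sketch of what dispatch could look like, plus a check that two tool_use blocks in one response produce two results. Everything here is hypothetical scaffolding: the tool bodies are stubs, TOOL_REGISTRY and collect_tool_results are names made up for this example, and a dict lookup stands in for an if/elif branch on block.name. The fake response uses SimpleNamespace in place of real SDK objects.

```python
from types import SimpleNamespace


# Stub tools standing in for the real ones; the bodies are placeholders.
def search_similar_games(genre: str, mechanics: str) -> str:
    return f"Reference games for {genre} ({mechanics})"


def estimate_dev_time(features: list, team_size: int) -> str:
    adjusted = len(features) * 2 / max(team_size, 1)
    return f"Estimated development time: {adjusted:.0f}-{adjusted * 1.5:.0f} weeks."


# Hypothetical registry: tool name -> Python function.
TOOL_REGISTRY = {
    "search_similar_games": search_similar_games,
    "estimate_dev_time": estimate_dev_time,
}


def dispatch(name: str, tool_input: dict) -> str:
    # Look the tool up by name; report unknown names back to the model as text.
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"
    return tool(**tool_input)


def collect_tool_results(content: list) -> list:
    # One tool_result per tool_use block, matched by id; text blocks are skipped.
    return [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": dispatch(block.name, block.input)}
        for block in content
        if block.type == "tool_use"
    ]


# A fake response carrying text plus two tool_use blocks in a single turn.
content = [
    SimpleNamespace(type="text", text="These two steps are independent."),
    SimpleNamespace(type="tool_use", id="call_a", name="search_similar_games",
                    input={"genre": "colony sim", "mechanics": "base building"}),
    SimpleNamespace(type="tool_use", id="call_b", name="estimate_dev_time",
                    input={"features": ["a", "b", "c"], "team_size": 1}),
]

tool_results = collect_tool_results(content)
print([r["tool_use_id"] for r in tool_results])  # ['call_a', 'call_b']
```

Returning an error string for an unknown tool name, instead of raising, is a design choice: it lets the model see the problem and try again on the next turn.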
The Windows tax, still being paid
When I went to commit Module 2, PowerShell rejected &&:
The token '&&' is not a valid statement separator in this version.
Every git tutorial uses git add . && git commit && git push. Windows PowerShell 5.1 doesn't support && (PowerShell 7 and later do), so there you run them as three separate commands. Minor, but it's the third Windows-specific paper cut in two modules. If you're following along on Windows, expect these.
What it produced
The design document is meaningfully better than Module 1's — not because the model improved, but because it had tools feeding it reality and it used them on its own output. The doc cites the reference games directly in the art direction, includes a realistic dev-time roadmap, and has an "Intentionally Cut Features" section listing what the agent removed to keep scope shippable.
The mechanic that produced all of it is still just the loop: reason, call a tool, read the result, decide whether to keep going. The leverage turned out to be in the prose around the loop, not the loop itself.
Code and full terminal log: github.com/quietaidev-collab/zero-to-agent
Module 3 next: two of these agents talking to each other.