Tools & Memory

An agent’s loop is only as good as two things: the tools that let it affect the world, and the memory that lets it carry information across steps and sessions.

A tool is a function the agent can call — search, a database query, an API request, a calculator, code execution. Function calling is the mechanism: you describe your tools to the model; when it wants one, it returns a structured request naming the tool and its arguments; your code runs it and returns the result.

tools = [{
    "name": "get_order_status",
    "description": "Get the current status of a customer's order by order ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "e.g. 'o_8842'"},
        },
        "required": ["order_id"],
    },
}]
# The model returns: {"tool": "get_order_status", "args": {"order_id": "o_8842"}}
# Your code executes it; the model never runs anything itself.
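The "your code executes it" half of that exchange could look roughly like the sketch below. It assumes the request arrives as a JSON string in the shape shown in the comment above; the `ORDERS` data, the `TOOL_REGISTRY` name, and the `dispatch` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical in-memory data standing in for a real order system.
ORDERS = {"o_8842": "shipped"}


def get_order_status(order_id: str) -> str:
    """Look up an order and return text the model can act on."""
    if order_id not in ORDERS:
        return "Error: order_id not found. Check the ID format (o_XXXX)."
    return f"Order {order_id} is {ORDERS[order_id]}."


# Maps tool names (as described to the model) to Python callables.
TOOL_REGISTRY = {"get_order_status": get_order_status}


def dispatch(raw_request: str) -> str:
    """Parse the model's structured request, run the tool, return the result text."""
    try:
        request = json.loads(raw_request)
    except json.JSONDecodeError:
        return "Error: request was not valid JSON."
    if not isinstance(request, dict):
        return "Error: request must be a JSON object with 'tool' and 'args'."

    name = request.get("tool")
    if not isinstance(name, str) or name not in TOOL_REGISTRY:
        return f"Error: unknown tool {name!r}. Available: {sorted(TOOL_REGISTRY)}."

    try:
        # Only registered functions can ever run; the model just names one.
        return TOOL_REGISTRY[name](**request.get("args", {}))
    except TypeError as exc:
        # Raised for missing or unexpected arguments (or non-mapping args).
        return f"Error: bad arguments for {name}: {exc}"


print(dispatch('{"tool": "get_order_status", "args": {"order_id": "o_8842"}}'))
```

Every failure path returns a readable message instead of raising, so the result can be handed back to the model as the tool's output.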

Tool quality drives agent reliability more than any prompt tweak. The model picks tools using only their descriptions, so treat tool design as API design for a non-human caller:

  • Crisp descriptions. State exactly what the tool does and when to use it. Vague descriptions cause wrong-tool choices.
  • Few, well-scoped tools. A handful of clear tools beats thirty overlapping ones — too many and the model gets lost choosing.
  • Constrained inputs. Use enums and typed schemas so the model can’t pass nonsense.
  • Helpful results — including errors. A result of "Error: order_id not found. Check the ID format (o_XXXX)." lets the agent self-correct; "Error 400" leaves it stuck.
  • Right-sized output. Don’t return a 50KB blob — it floods the context window. Return what the agent needs.
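Several of these guidelines can be combined in one small tool. The sketch below is an illustration only: the `list_orders` tool, its status values, the sample data, and the 500-character cap are all assumptions, not part of any real API.

```python
ALLOWED_STATUSES = ("pending", "shipped", "delivered")  # hypothetical enum values
MAX_RESULT_CHARS = 500  # arbitrary cap chosen for illustration

# Schema: a crisp description plus an enum so the model cannot invent a status.
list_orders_tool = {
    "name": "list_orders",
    "description": (
        "List order IDs that have the given status. "
        "Use when the user asks about several orders, not a single one."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": list(ALLOWED_STATUSES)},
        },
        "required": ["status"],
    },
}

# Hypothetical data standing in for a real order database.
ORDERS = [
    {"id": "o_8842", "status": "shipped"},
    {"id": "o_8850", "status": "shipped"},
    {"id": "o_8901", "status": "pending"},
]


def list_orders(status: str) -> str:
    """Return matching order IDs as short text, or an error the agent can act on."""
    # Validate again in code: the schema guides the model but does not bind it.
    if status not in ALLOWED_STATUSES:
        return (
            f"Error: status {status!r} is not valid. "
            f"Use one of: {', '.join(ALLOWED_STATUSES)}."
        )
    matches = [order["id"] for order in ORDERS if order["status"] == status]
    text = ", ".join(matches) or "No orders with that status."
    # Right-size the output so a large result cannot flood the context window.
    if len(text) > MAX_RESULT_CHARS:
        text = text[:MAX_RESULT_CHARS] + f"... (truncated; {len(matches)} orders total)"
    return text


print(list_orders("shipped"))
print(list_orders("lost"))
```

The error message names the valid options, which gives the agent what it needs to retry with a corrected argument.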

An LLM is stateless — each call only knows what’s in its context window. Memory is the engineering around that limit. Agents need several distinct kinds.

Working memory is the current task’s context: the goal, recent steps, and observations, that is, the running transcript inside the loop. Its constraint is the context window. Long tasks overflow it, so you manage working memory actively:

  • Trim — drop the oldest, least relevant turns.
  • Summarize — periodically compress older history into a short recap.
  • Offload — write bulky results to external storage and keep only a reference in context.
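The three strategies above can be sketched as small helper functions. This is a simplified illustration: history is assumed to be a list of strings, `trim` drops by age only (it does not judge relevance), the `summarizer` argument stands in for what would normally be a model call, and a plain dict stands in for external storage.

```python
from typing import Callable


def trim(history: list[str], keep_last: int) -> list[str]:
    """Keep only the most recent turns (simplification: ignores relevance)."""
    return history[-keep_last:] if keep_last > 0 else []


def summarize_older(
    history: list[str],
    keep_last: int,
    summarizer: Callable[[list[str]], str],
) -> list[str]:
    """Replace everything except the last `keep_last` turns with one recap line."""
    split = max(len(history) - keep_last, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return list(recent)
    return [f"Summary of earlier steps: {summarizer(older)}"] + recent


def offload(result: str, store: dict[str, str], max_chars: int = 200) -> str:
    """Store a bulky result externally and return a short reference to it."""
    if len(result) <= max_chars:
        return result
    key = f"blob_{len(store)}"
    store[key] = result
    return f"[stored as {key}; {len(result)} chars; preview: {result[:40]}...]"


history = ["step 1", "step 2", "step 3", "step 4"]
print(trim(history, 2))
# Placeholder summarizer; in practice this would call a model.
print(summarize_older(history, 2, lambda older: f"{len(older)} earlier steps"))

external_store: dict[str, str] = {}
print(offload("x" * 1000, external_store))
```

In practice the strategies are usually combined: bulky tool results are offloaded as they arrive, and older turns are summarized or trimmed once the transcript approaches the window limit.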

Long-term memory is information that must survive across sessions: user preferences, facts learned, past outcomes. It lives outside the model, usually in a vector database, and is retrieved when relevant. In effect, it is RAG (retrieval-augmented generation) pointed at the agent’s own history.

Write path: end of session → extract facts → embed → store in memory
Read path: new situation → embed it → retrieve memories → inject into context
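A toy version of the write and read paths might look like the following. To stay self-contained it swaps in a bag-of-words "embedding" and an in-memory list; a real system would call an embedding model and a vector database. The `MemoryStore` class and the sample facts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def write(self, facts: list[str]) -> None:
        """Write path: embed each extracted fact and store it."""
        for fact in facts:
            self._items.append((embed(fact), fact))

    def read(self, situation: str, k: int = 2) -> list[str]:
        """Read path: embed the new situation and return the k closest facts."""
        query = embed(situation)
        scored = [(cosine(query, vector), fact) for vector, fact in self._items]
        scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated facts
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:k]]


store = MemoryStore()
# End of session: facts assumed to have been extracted already.
store.write([
    "User prefers email over phone calls",
    "Order o_8842 arrived damaged",
    "User is located in Berlin",
])
# New session: retrieve what is relevant and inject it into the context.
print(store.read("Which contact method does the user prefer, email or phone?", k=1))
```

The fact-extraction step is left out here; it is typically a model call that turns a finished transcript into short, standalone statements worth remembering.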
| Type | Holds | Lives in | Lifespan |
|---|---|---|---|
| Working / short-term | Current task state | Context window | This task |
| Episodic | Past events and outcomes | Vector DB / store | Across sessions |
| Semantic | Facts, preferences, knowledge | Vector DB / store | Long-lived |
| Procedural | How to do recurring tasks | Prompts / tools / code | Persistent |

Deciding what occupies the limited context window each step — instructions, which memories, which tool results, how much history — is context engineering, and it’s the core craft of building agents. Too little context and the agent flounders; too much and it gets distracted, slows down, and costs more. Curate deliberately, every step.
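One way to make that curation concrete is to assemble the context under an explicit budget. The sketch below is only one possible policy: the priority order (instructions, then memories, then tool results, then newest history) and the word-count token estimate are assumptions chosen for illustration, and `build_context` is a hypothetical helper.

```python
from typing import Iterable


def estimate_tokens(text: str) -> int:
    """Crude estimate by word count; a real system would use the model's tokenizer."""
    return len(text.split())


def build_context(
    instructions: str,
    memories: list[str],
    tool_results: list[str],
    history: list[str],
    budget: int,
) -> str:
    """Fill the context in priority order, skipping pieces that do not fit."""
    remaining = budget - estimate_tokens(instructions)  # instructions always go in

    def take(items: Iterable[str]) -> list[str]:
        nonlocal remaining
        kept = []
        for item in items:
            cost = estimate_tokens(item)
            if cost <= remaining:
                kept.append(item)
                remaining -= cost
        return kept

    kept_memories = take(memories)  # assumed already sorted by relevance
    kept_results = take(tool_results)
    # Choose history newest-first, then restore chronological order for output.
    kept_history = take(reversed(history))[::-1]
    return "\n".join([instructions, *kept_memories, *kept_results, *kept_history])


context = build_context(
    instructions="Help the customer.",
    memories=["User prefers email."],
    tool_results=["Order o_8842 is shipped."],
    history=["turn one is old", "turn two is newer"],
    budget=14,
)
print(context)
```

With a budget of 14 estimated tokens, the oldest history turn is the piece that gets dropped; changing the priority order changes what is sacrificed first, which is exactly the decision context engineering is about.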

Tools let agents act. Function calling is the protocol: the model only requests, and your code executes and authorizes. Design tools like APIs for a non-human caller, with clear descriptions, narrow scope, typed inputs, and useful errors, and treat every tool as a security boundary. Memory compensates for the stateless model: working memory (managed against the context limit) and long-term memory (RAG over past sessions). Choosing what fills the context each step, known as context engineering, is the central skill.