What makes an AI an "agent"? How do LLMs plan, use tools, maintain memory, and reason their way to goals? This week digs into the cognitive architecture of intelligent agents.
A hammer is a tool: it does exactly what you make it do, nothing more. A calculator is a tool too. But imagine an intern you give a task to: "Research our competitors and write a report." They break it down: search Google, take notes, organize findings, write a draft, review it, fix errors, deliver the final product. They make decisions, use tools, handle unexpected obstacles, and persist toward a goal across many steps. That's an agent. The key insight: agents pursue goals autonomously over extended horizons, using tools and environment feedback to adapt.
| Dimension | Traditional LLM (chatbot) | AI Agent |
|---|---|---|
| Task duration | Single turn: query → response | Multi-step: goal → plan → execute → reflect → deliver |
| Tool use | None (just text generation) | Actively calls APIs, runs code, searches web, reads files |
| Memory | Only current context window | Short-term context + long-term external memory |
| Decision making | One-shot: generate best response | Iterative: act → observe → reason → act again |
| Error handling | None; output is final | Detects failures, self-corrects via reflection |
| Goal persistence | Answers one question at a time | Maintains goal state across many steps/sessions |
The context window of an LLM is finite (even 128K tokens is limited for long tasks). Agents need different types of memory to operate over extended horizons. Cognitive science inspired the design: human memory has episodic, semantic, procedural, and working memory, and agent memory systems mirror this.
Working memory: what currently sits in the LLM's context window: the prompt, conversation history, recent tool outputs, and partial results.
Analogy: RAM, fast and immediately accessible, but limited and lost when the session ends.
Episodic memory: records of past agent runs, interactions, and experiences, retrieved when relevant to a new task. Enables the agent to learn from past successes and failures.
Implementation: a vector database of past task logs, retrieved via semantic similarity.
Semantic memory: general world knowledge, domain facts, product documentation, not tied to a specific episode. Queried via RAG to provide factual grounding.
Implementation: a vector DB indexed with domain documents (the same as a RAG knowledge base).
Procedural memory: learned skills, workflows, and action sequences that have worked before, encoded in the agent's system prompt, fine-tuned weights, or tool definitions.
Implementation: a system prompt with successful workflow patterns, or fine-tuned model weights.
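A minimal sketch of how episodic and semantic retrieval could be wired together. The `MemoryStore` class and `embed` function are illustrative stand-ins, not from any specific framework; a real system would use a sentence-embedding model and a proper vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real agent would call a sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(384)

class MemoryStore:
    """Toy vector store: cosine-similarity retrieval over stored entries."""
    def __init__(self):
        self.entries: list[tuple[str, np.ndarray]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = [(float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)), t)
                  for t, v in self.entries]
        return [t for _, t in sorted(scored, reverse=True)[:k]]

episodic = MemoryStore()   # records of past runs
semantic = MemoryStore()   # domain documents (the RAG knowledge base)
episodic.add("Competitor-report task: searching the web before drafting worked well.")
relevant = episodic.retrieve("write a competitor report") \
         + semantic.retrieve("write a competitor report")
```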
Without tools, an LLM can only manipulate text: it's trapped in language. Tools extend the agent into the real world: reading files, writing code, calling databases, fetching URLs, executing calculations. The agent selects tools via function calling, where the LLM outputs structured JSON specifying the tool and its arguments.
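A sketch of that function-calling handshake. The JSON-schema tool definition follows the common OpenAI-style convention, but the `dispatch` helper and the weather stub are illustrative:

```python
import json

# Tool definition in the JSON-schema style most function-calling APIs expect
TOOLS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"18°C and cloudy in {city}"        # stub for a real weather API

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Validate and execute a structured tool call emitted by the LLM."""
    call = json.loads(tool_call_json)
    fn = REGISTRY.get(call["name"])
    if fn is None:                              # guard against hallucinated tool names
        return f"Error: unknown tool {call['name']!r}"
    return fn(**call["arguments"])

# Instead of free text, the LLM emits structured JSON like this:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
```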
A capable agent doesn't tackle a complex goal in one step; it breaks it down into a plan: a sequence of sub-tasks, each with defined inputs, outputs, and tool requirements. This mirrors how humans solve complex problems by decomposing them into manageable steps.
Plan-then-execute: generate the full plan upfront, then execute each step. Efficient but brittle; if early steps fail, the rest of the plan may be invalid.
Interleaved planning: generate only the next action at each step, using current observations. More adaptive, since the plan evolves based on what the agent discovers.
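The two strategies differ mainly in where the LLM call sits relative to the execution loop. A minimal sketch, assuming hypothetical `llm_plan` and `llm_next_action` callables that wrap model calls:

```python
def plan_then_execute(goal, llm_plan, execute):
    """Strategy 1: commit to a full plan upfront, then run it step by step."""
    plan = llm_plan(goal)                      # one LLM call -> ordered list of steps
    return [execute(step) for step in plan]    # brittle: no replanning on failure

def interleaved(goal, llm_next_action, execute, max_steps=10):
    """Strategy 2: decide one action at a time, conditioned on observations."""
    history = []
    for _ in range(max_steps):
        action = llm_next_action(goal, history)     # LLM sees everything so far
        if action == "DONE":
            break
        history.append((action, execute(action)))   # observation feeds next decision
    return history
```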
Perception = what information the agent has access to at each step. This includes: the original goal, conversation history, tool outputs, retrieved memories, and environmental state. Managing this context is crucial: context windows are finite, and irrelevant information degrades reasoning quality.
LLMs make errors. Agents that can reflect on their outputs, compare them against success criteria, and generate corrective actions dramatically outperform those that don't. Reflection is a meta-cognitive skill: reasoning about one's own reasoning.
ReAct (Yao et al., 2022) is the foundational agent reasoning pattern. It interleaves Thought (an LLM reasoning trace), Action (a tool call), and Observation (the tool output). This tight coupling between reasoning and acting reduces hallucination because every factual claim is immediately grounded by a tool result.
ReAct: interleaves thought traces with tool actions in a linear chain: T→A→O→T→A→O... Best for: factual Q&A, web research, API tasks where you need grounded, verifiable answers at each step.
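The loop itself is mostly plumbing around the T→A→O cycle. A bare-bones sketch, assuming an `llm` callable that emits Thought/Action/Final Answer lines and a `dispatch` tool executor like the one sketched earlier:

```python
def react_loop(question, llm, dispatch, max_steps=8):
    """Run Thought -> Action -> Observation until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                     # emits Thought/Action or Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            action_json = step.split("Action:")[-1].strip()
            observation = dispatch(action_json)    # ground the next thought in a tool result
            transcript += f"Observation: {observation}\n"
    return "Stopped: step budget exhausted"
```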
Reflexion: adds a reflection loop. After each failed attempt, the agent writes a verbal self-critique that is stored in episodic memory; the next attempt uses this reflection. Best for: coding challenges, multi-step reasoning where initial attempts often fail.
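A Reflexion-style retry loop is a thin wrapper around any attempt function. A sketch, with `attempt`, `check`, and `llm_critique` as assumed callables (e.g. a coding agent, its unit tests, and a self-critique prompt):

```python
def reflexion(task, attempt, check, llm_critique, max_trials=3):
    """Retry loop with verbal self-critique carried across attempts."""
    reflections = []                                # episodic memory of failures
    result = None
    for _ in range(max_trials):
        result = attempt(task, reflections)         # attempt conditions on past critiques
        if check(result):                           # success criterion, e.g. unit tests
            return result
        reflections.append(llm_critique(task, result))  # "what went wrong and why"
    return result                                   # best effort after max_trials
```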
Tree of Thoughts (ToT): generates multiple candidate reasoning steps at each point, evaluates them, and explores the best via BFS/DFS/beam search. Best for: creative problem-solving, mathematical proofs, strategic planning, and other tasks with a combinatorial search space.
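A compact beam-search rendition of ToT, with `propose` and `evaluate` as assumed LLM-backed callables (generate candidate next thoughts; score a partial solution in [0, 1]):

```python
def tree_of_thoughts(problem, propose, evaluate, beam_width=3, depth=4):
    """Beam search over partial reasoning paths instead of one linear chain."""
    frontier = [("", 0.0)]                           # (partial solution, score)
    for _ in range(depth):
        candidates = []
        for state, _ in frontier:
            for step in propose(problem, state):     # branch: candidate next thoughts
                new_state = f"{state}\n{step}"
                candidates.append((new_state, evaluate(problem, new_state)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        frontier = candidates[:beam_width]           # prune: keep the best paths
    return frontier[0][0]
```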
Monte Carlo Tree Search (MCTS): expand a tree of possible action sequences, simulate outcomes, backpropagate scores, repeat. MCTS famously enabled superhuman performance in games like Go, and search-based test-time reasoning in this spirit is often cited in connection with OpenAI's "thinking" models (o1, o3, o4-mini), though their internals are undisclosed.
LLM agents didn't emerge in a vacuum; they were preceded by decades of AI research on cognitive architectures. Understanding the classical foundations helps you reason about why LLM agents work the way they do, and where they still fall short.
Soar: a symbolic cognitive architecture. Uses production rules, working memory, and "chunking" to learn new rules from problem-solving. A foundation of rule-based AI agents.
ACT-R: a hybrid symbolic/neural architecture with modules for declarative memory, procedural memory, and perception/motor control. The most empirically grounded cognitive architecture in psychology.
BDI (Belief-Desire-Intention): agents have beliefs (world state), desires (goals), and intentions (committed plans). Still used in multi-agent systems and robotics.
LLM-as-cognitive-architecture: the LLM implicitly serves as all cognitive modules. Context window = working memory, tool calls = perception/motor, system prompt = procedural knowledge. Emergent rather than designed.
Neuro-symbolic hybrids: combine LLM "intuition" with formal planners, theorem provers, or logic engines. Best for safety-critical domains where the neural component must be verifiably constrained.
Embodied agents: agents grounded in physical or virtual environments (robotics, game agents), with perception as visual/sensor input. The agent must ground language in physical reality, one of the hardest open problems in agentic AI.
| Failure Mode | Description | Mitigation |
|---|---|---|
| Hallucinated tool calls | LLM invents tool arguments or tool names that don't exist | Strict function schemas, input validation, constrained generation |
| Infinite loops | Agent gets stuck in a loop (same action repeatedly) | Max step limits, loop detection, action deduplication |
| Goal drift | Agent pursues sub-goal instead of original goal | Persistent goal reminder in every prompt, goal-checking step |
| Cascading errors | Wrong result in step 2 propagates and corrupts all later steps | Error detection at each step, checkpointing, rollback |
| Prompt injection | Malicious content in environment overrides agent instructions | Input sanitization, privilege separation, human-in-the-loop |
| Over-trust of tools | Agent blindly trusts incorrect tool output | Cross-verification with multiple sources, confidence thresholds |
| Context overflow | Conversation grows beyond context window, losing key info | Context compression, sliding window, memory summarization |
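Several of the mitigations above are only a few lines of code. A sketch of a step budget plus loop detection via action deduplication (function names are illustrative):

```python
from collections import Counter

def guarded_run(next_action, execute, goal, max_steps=25, max_repeats=3):
    """Wrap an agent loop in two cheap rails from the table: step budget + loop detection."""
    seen, history = Counter(), []
    for _ in range(max_steps):                  # rail 1: hard step budget
        action = next_action(goal, history)
        if action is None:                      # agent signals completion
            return history
        seen[str(action)] += 1
        if seen[str(action)] > max_repeats:     # rail 2: same action repeated
            raise RuntimeError(f"Loop detected: {action!r} repeated")
        history.append((action, execute(action)))
    raise RuntimeError("Step budget exhausted before the goal was reached")
```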
1. What is the fundamental property that distinguishes an AI "agent" from a standard LLM chatbot?
2. In the ReAct framework, why does interleaving Thought-Action-Observation reduce hallucination compared to standard Chain-of-Thought?
3. An agent working on a 30-step task starts "forgetting" its original goal by step 20 and begins pursuing a tangential sub-goal. This is called:
4. Which type of agent memory is analogous to a human writing notes to themselves after completing a task so they do it better next time?
Deploying an LLM without understanding alignment is like shipping a self-driving car without safety testing. This section covers the technical reality of AI safety, from WEAT bias measurement and DPO alignment math to EU AI Act compliance and LLM watermarking. Every AI/ML professional in 2026 needs this.
WEAT (Word Embedding Association Test, Caliskan et al. 2017) measures implicit bias in word embeddings by testing whether target concepts associate more strongly with one attribute set than another, mirroring the human IAT (Implicit Association Test).
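Concretely, for each word w the association s(w, A, B) is its mean cosine similarity to attribute set A minus that to B, and the effect size is the standardized difference of mean association between the two target sets X and Y. A numpy sketch (word vectors assumed given):

```python
import numpy as np

def cos(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """s(w, A, B): association of word vector w with attributes A vs. B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Cohen's-d-style effect size from Caliskan et al. (2017)."""
    sX = [assoc(x, A, B) for x in X]      # X, Y: target sets (e.g. flowers, insects)
    sY = [assoc(y, A, B) for y in Y]      # A, B: attribute sets (pleasant, unpleasant)
    return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)
```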
RLHF with PPO requires four models simultaneously: the reference policy π_ref, the current policy π_θ, a reward model r_φ, and a value function V_ψ. It's unstable, slow, and expensive. DPO (Rafailov et al., NeurIPS 2023) shows that the optimal policy can be derived analytically, so no RL is needed.
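The resulting objective is a simple classification-style loss over preference pairs. A PyTorch sketch of the DPO loss, with per-sequence log-probabilities assumed precomputed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """pi_*/ref_*: summed log-probs of the chosen/rejected responses under
    the trainable policy and the frozen reference model (1-D tensors)."""
    # implicit rewards are beta-scaled log-ratios against the reference
    chosen_reward = beta * (pi_chosen - ref_chosen)
    rejected_reward = beta * (pi_rejected - ref_rejected)
    # maximize the probability that the chosen response out-scores the rejected one
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```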
| Aspect | PPO-RLHF | DPO |
|---|---|---|
| Models needed | 4 (policy, ref, reward, value) | 2 (policy, ref) |
| Stability | Often unstable | Stable (SFT-like) |
| Compute cost | Very high | Moderate |
| Reward hacking | Yes (explicit RM) | Less (implicit) |
| Used by | GPT-4, Claude, LLaMA-2 | LLaMA-3, Mistral, Zephyr |
KTO (Kahneman-Tversky Optimization, 2024) extends DPO using prospect theory: it doesn't require paired preferences and works with individual (x, y, label) examples.
Unacceptable risk: social scoring systems, real-time biometric surveillance in public spaces, manipulation of vulnerable groups, exploitation of unconscious behaviors. Banned outright.
High risk: LLMs used in hiring, credit scoring, education assessment, critical infrastructure, law enforcement, and medical devices. Requires conformity assessment, transparency, human oversight, and logging.
General-purpose AI with systemic risk: models like GPT-4, LLaMA, and Claude (systemic risk presumed above 10²⁵ FLOPs of training compute) face additional obligations: red-teaming, adversarial testing, cybersecurity measures, and incident reporting to the EU AI Office.
Transparency obligations: AI-generated content must be labeled. Technical approaches include token-level watermarking (Kirchenbauer et al. 2023: green/red token lists), metadata embedding, and provenance certificates. The EU Code of Practice requires a multi-layered approach, since no single method is sufficient.
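A minimal sketch of the green-list scheme: hash the previous token to seed a pseudo-random partition of the vocabulary, bias sampling toward the "green" half, and detect by counting green tokens. Illustrative constants, not the paper's exact implementation:

```python
import numpy as np

VOCAB_SIZE = 50_000
GAMMA, DELTA = 0.5, 2.0          # green-list fraction, logit boost

def green_list(prev_token: int) -> np.ndarray:
    """Pseudo-random vocab partition, deterministically seeded by the previous token."""
    rng = np.random.default_rng(prev_token)
    return rng.permutation(VOCAB_SIZE)[: int(GAMMA * VOCAB_SIZE)]

def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    biased = logits.copy()
    biased[green_list(prev_token)] += DELTA      # softly favor green tokens
    p = np.exp(biased - biased.max())
    return int(np.random.choice(VOCAB_SIZE, p=p / p.sum()))

def detect(tokens: list[int]) -> float:
    """z-score of the green-token count; a large z suggests watermarked text."""
    hits = sum(t in set(green_list(prev)) for prev, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / np.sqrt(n * GAMMA * (1 - GAMMA))
```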
Three widely used fairness criteria are mathematically incompatible (Chouldechova 2017; Kleinberg et al. 2017): satisfying one often violates the others. Understanding this is critical for any responsible AI deployment.
| Criterion | Definition | Formula | Tradeoff |
|---|---|---|---|
| Demographic Parity | Equal positive prediction rates across groups | P(Ŷ=1 \| A=0) = P(Ŷ=1 \| A=1) | Ignores base rate differences |
| Equalized Odds | Equal TPR and FPR across groups | P(Ŷ=1 \| Y=y, A=a) equal for all a, y | May force accuracy loss to equalize error rates |
| Calibration | Predicted probabilities match true frequencies in each group | P(Y=1 \| score=s, A=a) = s for all a | Conflicts with equalized odds when base rates differ |
Chouldechova (2017) and Kleinberg et al. (2017) proved that when base rates differ across groups, no imperfect classifier can simultaneously achieve demographic parity, equalized odds, AND calibration. This means choosing a fairness criterion is a policy decision, not a technical one. NLP researchers must make this explicit when building and deploying models.
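Auditing the first two criteria on real predictions takes a few lines. A numpy sketch that reports per-group positive rates (demographic parity) and TPR/FPR (equalized odds), with `y`, `y_hat`, and `group` as assumed 0/1 arrays:

```python
import numpy as np

def fairness_report(y, y_hat, group):
    """y: true labels, y_hat: binary predictions, group: protected attribute (0/1)."""
    for g in (0, 1):
        m = group == g
        pos_rate = y_hat[m].mean()                   # demographic parity check
        pos, neg = m & (y == 1), m & (y == 0)
        tpr = y_hat[pos].mean() if pos.any() else float("nan")  # equalized odds checks
        fpr = y_hat[neg].mean() if neg.any() else float("nan")
        print(f"group {g}: P(Yhat=1)={pos_rate:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```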