🤖

What is an AI Agent? Definitions & The Big Shift

M13T1L1 · From chatbots to autonomous goal-pursuing systems

A hammer is a tool: it does exactly what you make it do, nothing more. A calculator is a tool too. But imagine an intern you give a task to: "Research our competitors and write a report." They break it down: search Google, take notes, organize findings, write a draft, review it, fix errors, deliver the final product. They make decisions, use tools, handle unexpected obstacles, and persist toward a goal across many steps. That's an agent. The key insight: agents pursue goals autonomously over extended horizons, using tools and environment feedback to adapt.

Agents vs. Traditional LLMs

| Dimension | Traditional LLM (chatbot) | AI Agent |
|---|---|---|
| Task duration | Single turn: query → response | Multi-step: goal → plan → execute → reflect → deliver |
| Tool use | None (just text generation) | Actively calls APIs, runs code, searches web, reads files |
| Memory | Only current context window | Short-term context + long-term external memory |
| Decision making | One-shot: generate best response | Iterative: act → observe → reason → act again |
| Error handling | None: output is final | Detects failures, self-corrects via reflection |
| Goal persistence | Answers one question at a time | Maintains goal state across many steps/sessions |
🔑 Core Definition An AI Agent is a system that: (1) perceives its environment, (2) reasons about its state and goal, (3) plans and takes actions, (4) observes the results, and (5) adapts until the goal is achieved. The LLM is the "brain" (the cognitive engine), but the agent is the whole system, including memory, tools, and the feedback loop.

The Agent Loop: Perception → Brain → Action → Observation CORE

๐Ÿ‘๏ธ PERCEPTION
User goal + environment state + memory retrieval + tool outputs
โ†“
๐Ÿง  BRAIN (LLM)
Reason, plan, decide: "What should I do next?"
โ†“
๐Ÿ“‹ PLANNING
Decompose goal โ†’ sub-tasks โ†’ select next action
โ†“
โšก ACTION
Execute: call tool / write code / call API / respond
โ†“
๐ŸŒ ENVIRONMENT (Tool outputs, APIs, files)
Result: success / error / new information
โ†“
๐Ÿ” Observation โ†’ back to Perception โ†’ loop until goal achieved or max steps
✅ Why the Loop Matters The loop is what gives agents their power. A single-turn LLM gets one shot. An agent can try, fail, observe the error, reason about why it failed, adjust its approach, and try again, exactly like a human solving a complex problem. This enables handling of tasks with unpredictable intermediate states.
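To make the loop concrete, here is a minimal sketch in Python. The llm callable, the tools registry, and the JSON action format are illustrative assumptions, not any specific framework's API:

# Minimal agent loop sketch. llm() and tools are placeholders (assumptions),
# not a real framework API.
import json

MAX_STEPS = 10

def run_agent(goal: str, llm, tools: dict) -> str:
    history = [f"GOAL: {goal}"]
    for _ in range(MAX_STEPS):
        # PERCEPTION: the model sees the goal plus everything observed so far.
        prompt = "\n".join(history) + "\nDecide the next action as JSON."
        # BRAIN: the model returns either a tool call or a final answer.
        decision = json.loads(llm(prompt))  # e.g. {"tool": ..., "args": {...}}
        if "final_answer" in decision:
            return decision["final_answer"]
        # ACTION: execute the chosen tool against the environment.
        result = tools[decision["tool"]](**decision["args"])
        # OBSERVATION: feed the result back in for the next iteration.
        history.append(f"ACTION: {json.dumps(decision)}\nOBSERVATION: {result}")
    return "Stopped: max steps reached before the goal was achieved."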
🧩

Agent Architecture: The Five Core Components

M13T1L1 · Building blocks of every AI agent

1. Memory Systems DEEP DIVE

Why Agents Need Multiple Memory Types

An LLM's context window is finite (even 128K tokens is limiting for long-horizon tasks), so agents need several types of memory to operate over extended horizons. The design is inspired by cognitive science: human memory comprises episodic, semantic, procedural, and working memory, and agent memory systems mirror this structure.

๐Ÿ“ In-Context (Working Memory)

What's currently in the LLM's context window: the prompt, conversation history, recent tool outputs, and partial results.

Analogy: RAM. Fast and immediately accessible, but limited, and lost when the session ends.

📖 Episodic Memory

Records of past agent runs, interactions, and experiences. Retrieved when relevant to a new task. Enables the agent to learn from past successes and failures.

Implementation: vector database of past task logs. Retrieve via semantic similarity.

🧠 Semantic Memory

General world knowledge, domain facts, and product documentation, not tied to any specific episode. Queried via RAG to provide factual grounding.

Implementation: vector DB indexed with domain documents (same as RAG knowledge base).

โš™๏ธ Procedural Memory

Learned skills, workflows, and action sequences that have worked before. Encoded in the agent's system prompt, fine-tuned weights, or tool definitions.

Implementation: system prompt with successful workflow patterns, or fine-tuned model weights.
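As a concrete illustration of the episodic/semantic implementations above, here is a minimal vector-similarity memory in Python; the embed function is a placeholder for any sentence-embedding model, not a specific vector database:

# Episodic memory sketch: store past run logs, retrieve the most similar ones.
# embed() (text -> vector) is an assumed placeholder.
import numpy as np

class EpisodicMemory:
    def __init__(self, embed):
        self.embed = embed
        self.episodes = []                    # list of (vector, log_text)

    def store(self, log_text: str) -> None:
        self.episodes.append((self.embed(log_text), log_text))

    def recall(self, task: str, k: int = 3) -> list:
        q = self.embed(task)
        scored = sorted(
            ((float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q))), log)
             for v, log in self.episodes),
            reverse=True,
        )
        return [log for _, log in scored[:k]]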

2. Tool Use & Action Space

What Tools Enable

Without tools, an LLM can only manipulate text; it's trapped in language. Tools extend the agent into the real world: reading files, writing code, calling databases, fetching URLs, executing calculations. The agent selects tools via function calling: the LLM outputs structured JSON specifying the tool and its arguments.

Common Tool Categories:
๐Ÿ” Web Search ๐Ÿ Code Executor ๐Ÿ“‚ File Reader/Writer ๐Ÿ—„๏ธ Database Query (SQL) ๐ŸŒ HTTP/API calls ๐Ÿ“Š Data Visualization ๐Ÿงฎ Calculator ๐Ÿ“ง Email/Calendar ๐Ÿ”ข Vector DB Retrieval ๐Ÿค– Sub-agent spawning
# How function/tool calling works (OpenAI-style):
1. Agent receives task: "What's the current stock price of NVDA?"
2. LLM outputs a tool call:
   {"tool": "web_search", "arguments": {"query": "NVDA stock price today"}}
3. Framework executes the tool and returns the result:
   {"result": "NVDA: $875.40 (+2.3%) as of 10:30 AM EST"}
4. The tool result is added to context, and the LLM generates the final answer:
   "NVIDIA (NVDA) is currently trading at $875.40, up 2.3% today."
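Below is a sketch of the framework side of step 3: validating the model's tool call against a schema before executing it. The registry format and the web_search stub are illustrative assumptions, not a particular library's API:

# Tool dispatch sketch with validation against hallucinated calls.
import json

TOOLS = {
    "web_search": {
        "fn": lambda query: f"(search results for {query!r})",  # stub tool
        "required_args": {"query"},
    },
}

def execute_tool_call(raw: str) -> str:
    call = json.loads(raw)                     # the model's JSON output
    spec = TOOLS.get(call.get("tool"))
    if spec is None:                           # hallucinated tool name
        return f"ERROR: unknown tool {call.get('tool')!r}"
    args = call.get("arguments", {})
    if set(args) != spec["required_args"]:     # hallucinated/missing arguments
        return f"ERROR: expected arguments {sorted(spec['required_args'])}"
    return spec["fn"](**args)

print(execute_tool_call(
    '{"tool": "web_search", "arguments": {"query": "NVDA stock price today"}}'))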

3. Planning & Task Decomposition

How Agents Decompose Complex Goals

A capable agent doesn't tackle a complex goal in one step; it breaks it down into a plan: a sequence of sub-tasks, each with defined inputs, outputs, and tool requirements. This mirrors how humans solve complex problems by breaking them into manageable steps.

# Example: goal decomposition for a research task
GOAL: "Write a research brief on the current state of AI regulation in the EU"
PLAN (agent-generated):
  Step 1: search_web("EU AI Act latest updates 2025")          → article summaries
  Step 2: search_web("EU AI regulation enforcement 2025")      → enforcement news
  Step 3: retrieve_docs("EU AI Act text", vector_db)           → policy details
  Step 4: synthesize_findings(step1, step2, step3)             → structured notes
  Step 5: write_draft(notes, format="executive brief")         → draft text
  Step 6: review_and_edit(draft, criteria="accuracy, clarity") → final
# The agent executes each step, handles errors, and adapts if
# a search returns insufficient information.

📋 Plan-and-Execute Pattern

Generate the full plan upfront, then execute each step. Efficient but brittle: if early steps fail, the rest of the plan may be invalid.

🔄 ReAct Pattern (Reactive Planning)

Generate only the next action at each step, using current observations. More adaptive: the plan evolves based on what the agent discovers.
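The difference between the two patterns is when planning happens, as the sketch below shows; plan, next_action, and act stand in for LLM and tool calls (assumptions for illustration):

# Plan-and-execute vs. ReAct-style step selection (schematic).

def plan_and_execute(goal, plan, act):
    steps = plan(goal)                      # full plan generated upfront
    return [act(step) for step in steps]    # brittle: no reaction to surprises

def react_style(goal, next_action, act, max_steps=10):
    observations = []
    for _ in range(max_steps):              # adaptive: re-plan at every step
        step = next_action(goal, observations)
        if step is None:                    # the model signals completion
            break
        observations.append(act(step))
    return observations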

4. Perception & Context Management

What Perception Means for LLM Agents

Perception = what information the agent has access to at each step: the original goal, conversation history, tool outputs, retrieved memories, and environmental state. Managing this context is crucial, because context windows are finite and irrelevant information degrades reasoning quality.

  • Context compression: Summarize long histories to free up context-window space while preserving key information (see the sketch after this list)
  • Memory gating: Only retrieve memories relevant to the current step; don't flood the context
  • Structured prompting: Use clear sections (Goal / Memory / Tools / Last observation / Instruction) to organize what the LLM "sees"
  • Multimodal perception: Modern agents can perceive images, audio, PDFs, and code, not just text
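Here is a sketch of the compression idea flagged in the first bullet; llm and count_tokens are placeholder helpers (assumptions), not a specific tokenizer or client:

# Context-compression sketch: summarize the oldest half of the history
# whenever the total token count exceeds the budget.

def compress_context(history: list, llm, count_tokens, budget: int = 8000):
    while sum(count_tokens(m) for m in history) > budget and len(history) > 2:
        half = len(history) // 2
        summary = llm(
            "Summarize these agent steps, preserving goals, decisions, "
            "and key facts:\n" + "\n".join(history[:half])
        )
        history = [f"SUMMARY OF EARLIER STEPS: {summary}"] + history[half:]
    return history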

5. Reflection & Self-Correction

Why Self-Correction Matters

LLMs make errors. Agents that can reflect on their outputs, compare them against success criteria, and generate corrective actions dramatically outperform those that don't. Reflection is a meta-cognitive skill: reasoning about one's own reasoning.

# Reflexion agent pattern (Shinn et al., 2023):
Step 1: GENERATE → the agent attempts the task and produces output
Step 2: EVALUATE → an evaluator scores the output (LLM judge or ground truth):
        "Your code throws IndexError on line 7. Test case 3 fails."
Step 3: REFLECT  → the agent writes a self-reflection:
        "I forgot to check array bounds before accessing the index.
         I should add a bounds check before line 7."
Step 4: REFINE   → store the reflection in episodic memory and use it
        to guide the next attempt, then regenerate
Step 5: REPEAT   → until tests pass or max attempts reached
# Result: Reflexion agents solve significantly more
# HumanEval coding tasks than non-reflective agents.
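The same pattern in schematic Python; generate, evaluate, and reflect are placeholders for the LLM, the test harness or judge, and the self-critique prompt:

# Reflexion loop sketch following the five steps above.

def reflexion_loop(task, generate, evaluate, reflect, max_attempts=4):
    reflections = []                                  # episodic memory
    output = None
    for _ in range(max_attempts):
        output = generate(task, reflections)          # GENERATE
        passed, feedback = evaluate(output)           # EVALUATE
        if passed:
            break                                     # success: stop early
        reflections.append(reflect(output, feedback)) # REFLECT, then REFINE
    return output                                     # REPEAT until pass/limit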
💡

Reasoning Frameworks for Agents M13T1L2

ReAct, Reflexion, Tree-of-Thoughts, and MCTS

ReAct: Reasoning + Acting KEY FRAMEWORK

What

ReAct (Yao et al., 2022) is the foundational agent reasoning pattern. It interleaves Thought (a reasoning trace) with Action (a tool call) and Observation (the tool output). This tight coupling between reasoning and acting reduces hallucination, because every factual claim is immediately grounded by a tool result.

💭 Thought
Reason about state
→
⚡ Action
Call tool/API
→
👁️ Observation
Tool result
→
💭 Thought
Update reasoning
→ ...
✅ Final Answer
# ReAct trace example for a multi-hop factual question:
# "Who founded the company that makes the A100 GPU?"
Thought 1: I need to find which company makes the A100 GPU.
Action 1: search("A100 GPU manufacturer")
Observation 1: "NVIDIA A100 GPU is manufactured by NVIDIA Corporation"
Thought 2: Now I need to find who founded NVIDIA Corporation.
Action 2: search("NVIDIA Corporation founders")
Observation 2: "NVIDIA was co-founded by Jensen Huang, Chris Malachowsky, and Curtis Priem in 1993"
Thought 3: I have the complete answer.
Answer: NVIDIA, which makes the A100 GPU, was founded by Jensen Huang, Chris Malachowsky, and Curtis Priem.
✅ ReAct vs. Pure CoT CoT reasons entirely in the model's "head," with no external verification. ReAct grounds each reasoning step with a real tool call. For factual tasks, ReAct reduces hallucination significantly because the model can't make up an observation; it comes from an actual tool.
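A driver for this loop is only a few lines; the sketch below parses single-argument actions like search("...") out of the model's text, with llm and tools as placeholders (assumptions):

# ReAct driver sketch: alternate model text with real tool observations.
import re

def react(question, llm, tools, max_turns=6):
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        text = llm(transcript)            # emits Thought/Action or Answer lines
        transcript += text + "\n"
        if "Answer:" in text:
            return text.split("Answer:")[-1].strip()
        match = re.search(r'Action \d*:?\s*(\w+)\((.*)\)', text)
        if match is None:
            transcript += "Observation: no valid action found.\n"
            continue
        tool, arg = match.group(1), match.group(2).strip('"')
        transcript += f"Observation: {tools[tool](arg)}\n"   # ground the model
    return "Max turns reached without an answer."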


Advanced Reasoning Frameworks Comparison BEYOND CLASS


ReAct (Reasoning + Acting)

Yao et al., 2022 · Princeton / Google

Interleaves thought traces with tool actions in a linear chain: T→A→O→T→A→O... Best for: factual Q&A, web research, and API tasks where you need grounded, verifiable answers at each step.

Reflexion

Shinn et al., 2023 · Northeastern University

Adds a reflection loop: after each failed attempt, the agent writes a verbal self-critique stored in episodic memory. The next attempt uses this reflection. Best for: coding challenges and multi-step reasoning where initial attempts often fail.

Tree-of-Thoughts (ToT)

Yao et al., 2023 · Princeton / Google

Generates multiple candidate reasoning steps at each point, evaluates them, and explores the best via BFS/DFS/beam search. Best for: creative problem-solving, mathematical proofs, and strategic planning, where the search space is combinatorial.
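A beam-search variant of ToT fits in a short sketch; propose (expand candidate next thoughts) and score (rate partial solutions) are assumed LLM calls:

# Tree-of-Thoughts beam-search sketch.

def tot_beam_search(problem, propose, score, beam_width=3, depth=4):
    beam = [[]]                                    # each path = list of thoughts
    for _ in range(depth):
        candidates = [
            path + [thought]
            for path in beam
            for thought in propose(problem, path)  # branch each partial path
        ]
        if not candidates:                         # nothing left to expand
            break
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        beam = candidates[:beam_width]             # keep the best partial paths
    return beam[0]                                 # highest-scoring path found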

MCTS for Agents (AlphaCode-style)

Reported in systems like DeepMind's AlphaCode 2; speculated to underlie o1/o3-style reasoning

Monte Carlo Tree Search: expand a tree of possible action sequences, simulate outcomes, backpropagate scores, repeat. Enables strong performance on hard reasoning tasks, and is widely speculated (though not confirmed) to underpin OpenAI's "thinking" models (o1, o3, o4-mini).
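For comparison, here is a minimal UCT-style MCTS sketch over action sequences; expand and rollout_value stand in for an LLM proposer and evaluator (assumptions), and real systems add many refinements:

# Minimal MCTS sketch: select by UCT, expand, simulate, backpropagate.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    if node.visits == 0:
        return float("inf")                       # explore unvisited first
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, expand, rollout_value, iters=100):
    for _ in range(iters):
        node = root
        while node.children:                      # SELECT: descend by UCT
            node = max(node.children, key=uct)
        node.children = [Node(s, parent=node) for s in expand(node.state)]
        leaf = random.choice(node.children) if node.children else node
        reward = rollout_value(leaf.state)        # SIMULATE: score the leaf
        while leaf is not None:                   # BACKPROPAGATE the score
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    best = max(root.children, key=lambda n: n.visits, default=root)
    return best.state                             # most-visited first action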

🔬 Why "Reasoning Models" (o1/o3/DeepSeek-R1) Are Different Traditional LLMs generate a single response. Reasoning models (trained with large-scale reinforcement learning on reasoning traces) generate an extended internal "thinking" trace, exploring multiple reasoning paths, backtracking, and verifying, before producing an answer. This is why they excel at math-olympiad problems, PhD-level science, and complex code. The thinking tokens are a form of test-time compute scaling.

Cognitive Architectures: Classical vs. Neural Agents DEEP DIVE

Why Study Classical Architecture

LLM agents didn't emerge in a vacuum; they were preceded by decades of AI research on cognitive architectures. Understanding the classical foundations helps you reason about why LLM agents work the way they do, and where they still fall short.

๐Ÿ›๏ธ SOAR (1983)

Symbolic cognitive architecture. Uses production rules, working memory, and "chunking" to learn new rules from problem-solving. Foundation of rule-based AI agents.

🧠 ACT-R (1993)

Hybrid symbolic/neural. Modules for declarative memory, procedural memory, and perceptual/motor. Most empirically grounded cognitive architecture in psychology.

🤖 BDI Agents

Belief-Desire-Intention. Agents have beliefs (world state), desires (goals), and intentions (committed plans). Still used in multi-agent systems and robotics.

💬 LLM Agents (2022+)

The LLM implicitly serves as all cognitive modules. Context window = working memory. Tool calls = perception/motor. System prompt = procedural knowledge. Emergent rather than designed.

🔗 Hybrid Symbolic-Neural

Combines LLM "intuition" with formal planners, theorem provers, or logic engines. Best for safety-critical domains where the neural component must be verifiably constrained.

๐ŸŒ Embodied Agents

Agents grounded in physical or virtual environments (robotics, game agents). Perception = visual/sensor input. The agent must ground language in physical reality, which remains the hardest open problem in agentic AI.

⚠️

Agent Failure Modes & Safety

Why agents fail, and how to make them safer

Common Agent Failure Modes

| Failure Mode | Description | Mitigation |
|---|---|---|
| Hallucinated tool calls | LLM invents tool arguments or tool names that don't exist | Strict function schemas, input validation, constrained generation |
| Infinite loops | Agent gets stuck in a loop (same action repeatedly) | Max step limits, loop detection, action deduplication |
| Goal drift | Agent pursues a sub-goal instead of the original goal | Persistent goal reminder in every prompt, goal-checking step |
| Cascading errors | Wrong result in step 2 propagates and corrupts all later steps | Error detection at each step, checkpointing, rollback |
| Prompt injection | Malicious content in the environment overrides agent instructions | Input sanitization, privilege separation, human-in-the-loop |
| Over-trust of tools | Agent blindly trusts incorrect tool output | Cross-verification with multiple sources, confidence thresholds |
| Context overflow | Conversation grows beyond the context window, losing key info | Context compression, sliding window, memory summarization |
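Two of these mitigations (max step limits, and loop detection via action deduplication) take only a few lines; this guard is a hypothetical sketch, not a framework API:

# Guardrail sketch: refuse to execute when the step budget is spent or the
# same action has been repeated too many times in a row.

def check_before_step(action, history, max_steps=25, repeat_limit=3):
    """Return an abort reason, or None if the action is safe to execute."""
    if len(history) >= max_steps:
        return "ABORT: max step budget exhausted."
    recent = history[-repeat_limit:]
    if len(recent) == repeat_limit and all(a == action for a in recent):
        return "ABORT: identical action repeated; likely an infinite loop."
    return None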
โš ๏ธ The Alignment Challenge in Agentic Systems Alignment problems are amplified in agents: a slightly misspecified goal in a single-turn chatbot causes a bad response. The same misspecification in an agent running 100 steps and calling external APIs can cause significant real-world harm before anyone notices. Human-in-the-loop checkpoints and reversibility are critical engineering requirements for production agents.

🧠 Quiz 11 Prep: Foundations of Agentic AI

1. What is the fundamental property that distinguishes an AI "agent" from a standard LLM chatbot?

✅ Correct! The agent loop is the defining characteristic. Unlike a chatbot that handles one query at a time, an agent perceives its environment, reasons about its state, takes actions (including tool calls), observes results, and loops, persisting toward a goal across many steps.
❌ The distinction is about the architecture and operational paradigm, not model size or training. The key is the perceive-reason-act loop with tool use and goal persistence.

2. In the ReAct framework, why does interleaving Thought-Action-Observation reduce hallucination compared to standard Chain-of-Thought?

✅ Correct! In pure CoT, the model reasons entirely in its own head; it can hallucinate both the reasoning steps and the final answer. In ReAct, the Observation comes from an actual tool call, providing a verified anchor point. The model's subsequent thoughts are conditioned on real information, not confabulated context.
❌ The key insight is grounding. Observations in ReAct are tool outputs: real, verified data from the world. This prevents the model from "making up" facts during its reasoning chain.

3. An agent working on a 30-step task starts "forgetting" its original goal by step 20 and begins pursuing a tangential sub-goal. This is called:

✅ Correct! Goal drift is a well-documented agent failure mode. Over many steps, the context window fills up with tool outputs and observations, and the original high-level goal gets "washed out." The mitigation is to include an explicit goal reminder in the prompt at every step.
❌ While context overflow and hallucination can contribute, the specific phenomenon of abandoning the original goal in favor of a sub-goal across many steps is called goal drift.

4. Which type of agent memory is analogous to a human writing notes to themselves after completing a task so they do it better next time?

✅ Correct! Episodic memory records past experiences: what happened, what worked, what failed. The Reflexion framework writes self-critiques (reflections on failed episodes) into episodic memory and retrieves them when the agent faces a similar task again. This is exactly analogous to a human writing a "lessons learned" document.
❌ Semantic memory is general world knowledge. Procedural memory covers learned skills and workflows. Episodic memory is the record of specific events and experiences: the "diary" memory type.
🔬 Beyond the Slides · Graduate Depth

Ethics, Alignment & Responsible AI: The Science Behind Safe LLMs

Deploying an LLM without understanding alignment is like shipping a self-driving car without safety testing. This section covers the technical reality of AI safety, from WEAT bias measurement and DPO alignment math to EU AI Act compliance and LLM watermarking. Every AI/ML professional in 2026 needs this.

โš–๏ธ Bias in Embeddings: WEAT & Measurement Research

WEAT (Word Embedding Association Test; Caliskan et al., 2017) measures implicit bias in word embeddings by testing whether target concepts associate more strongly with one attribute set than another, mirroring the human IAT (Implicit Association Test).

# WEAT test statistic:
s(w, A, B) = mean_{a∈A} cos(w, a) - mean_{b∈B} cos(w, b)
#   How much more similar is word w to
#   attribute set A vs. attribute set B?

S(X, Y, A, B) = Σ_{x∈X} s(x, A, B) - Σ_{y∈Y} s(y, A, B)
#   Positive = X associates more with A
#   (e.g., "male" names + "career" words)

# Effect size (normalized mean difference):
d = [mean_{x∈X} s(x, A, B) - mean_{y∈Y} s(y, A, B)] / std_{w∈X∪Y} s(w, A, B)
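The formulas translate directly to numpy; in this sketch, vec is a placeholder for an embedding lookup (e.g., a dict of GloVe vectors):

# WEAT sketch: association statistic s and effect size d, per the formulas above.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def s(w, A, B, vec):
    return (np.mean([cos(vec(w), vec(a)) for a in A])
            - np.mean([cos(vec(w), vec(b)) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    sx = [s(x, A, B, vec) for x in X]     # target set X vs. attribute sets
    sy = [s(y, A, B, vec) for y in Y]     # target set Y vs. attribute sets
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)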

Classic WEAT Findings

  • GloVe/word2vec: flowers → pleasant, insects → unpleasant (replicating the human IAT)
  • Male names → career; female names → family (strong effect)
  • European-American names → pleasant; African-American names → unpleasant
  • Science terms → male; arts terms → female

Debiasing Methods

  • Hard debiasing (Bolukbasi 2016): project out gender subspace
  • Counterfactual augmentation: swap gendered words in training data
  • Limitation: debiasing embeddings often reduces downstream accuracy; bias can reappear in fine-tuning

🎯 DPO: Direct Preference Optimization Deep Dive PhD

📖 Why DPO Over PPO?

RLHF with PPO requires four models simultaneously: the reference policy π_ref, the current policy π_θ, a reward model r_φ, and a value function V_φ. It's unstable, slow, and expensive. DPO (Rafailov et al., NeurIPS 2023) shows that the optimal policy can be derived analytically, so no RL is needed.

# RLHF objective (requires a reward model):
max_π  E[r(x, y)] - β·KL[π || π_ref]

# Optimal solution, derived analytically:
π*(y|x) = π_ref(y|x)·exp(r(x, y)/β) / Z(x)

# Rearranging, the reward can be expressed as:
r(x, y) = β·log(π*(y|x) / π_ref(y|x)) + β·log Z(x)

# DPO loss (no reward model needed!):
L_DPO = -log σ( β·log(π_θ(y_w|x) / π_ref(y_w|x))
              - β·log(π_θ(y_l|x) / π_ref(y_l|x)) )
# y_w = preferred output, y_l = rejected output
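The DPO loss above is a few lines of PyTorch; this sketch assumes the per-token log-probabilities of each response have already been summed into one scalar per example:

# DPO loss sketch. Inputs: summed log-probs of the preferred (y_w) and
# rejected (y_l) responses under the trained policy and the frozen reference.
import torch.nn.functional as F

def dpo_loss(logp_w_policy, logp_l_policy,   # log π_θ(y_w|x), log π_θ(y_l|x)
             logp_w_ref, logp_l_ref,         # same quantities under π_ref
             beta: float = 0.1):
    margin_w = logp_w_policy - logp_w_ref    # implicit reward of y_w (up to β)
    margin_l = logp_l_policy - logp_l_ref    # implicit reward of y_l (up to β)
    return -F.logsigmoid(beta * (margin_w - margin_l)).mean()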

DPO vs PPO-RLHF

| Aspect | PPO-RLHF | DPO |
|---|---|---|
| Models needed | 4 (policy, ref, reward, value) | 2 (policy, ref) |
| Stability | Often unstable | Stable (SFT-like) |
| Compute cost | Very high | Moderate |
| Reward hacking | Yes (explicit RM) | Less (implicit) |
| Used by | GPT-4, Claude, LLaMA-2 | Zephyr, Llama 3, Tülu 2 |

KTO (Kahneman-Tversky Optimization, 2024) extends DPO using prospect theory: it doesn't require paired preferences and works with individual (x, y, label) examples.

๐Ÿ›๏ธ EU AI Act: What LLM Developers Must Know 2026 Compliance

1
Unacceptable Risk (Prohibited)

Social scoring systems, real-time biometric surveillance in public, manipulation of vulnerable groups, exploitation of unconscious behaviors. Banned outright.

2
High Risk (Strict Requirements)

LLMs used in hiring, credit scoring, education assessment, critical infrastructure, law enforcement, medical devices. Requires conformity assessment, transparency, human oversight, logging.

3
General Purpose AI (GPAI) Models

Models like GPT-4, LLaMA, and Claude with systemic risk (>10²⁵ FLOPs of training compute) face additional obligations: red-teaming, adversarial testing, cybersecurity measures, and incident reporting to the EU AI Office.

4
Watermarking Requirements (2026)

AI-generated content must be labeled. Technical approaches: token-level watermarking (Kirchenbauer et al., 2023: green/red token lists), metadata embedding, and provenance certificates. The EU Code of Practice requires a multi-layered approach, since no single method is sufficient.
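A simplified sketch of the Kirchenbauer-style green-list idea: seed an RNG from the previous token, mark a fraction gamma of the vocabulary "green," and add a bias delta to green logits. The single-token seeding here is a simplification of the paper's scheme:

# Green-list watermark sketch (simplified from Kirchenbauer et al., 2023).
import numpy as np

def watermarked_logits(logits, prev_token, vocab_size, gamma=0.5, delta=2.0):
    rng = np.random.default_rng(seed=prev_token)   # context-keyed green list
    green = rng.choice(vocab_size, size=int(gamma * vocab_size), replace=False)
    biased = logits.copy()
    biased[green] += delta      # nudge sampling toward green tokens; a detector
    return biased               # re-derives the lists and counts green hits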

🧮 Fairness Metrics: The Irreconcilable Trio Research

Three widely used fairness criteria are mathematically incompatible (Chouldechova 2017; Kleinberg et al. 2017): when base rates differ, satisfying one forces violating the others. Understanding this is critical for any responsible AI deployment.

| Criterion | Definition | Formula | Tradeoff |
|---|---|---|---|
| Demographic Parity | Equal positive prediction rates across groups | P(Ŷ=1 ∣ A=0) = P(Ŷ=1 ∣ A=1) | Ignores base-rate differences |
| Equalized Odds | Equal TPR and FPR across groups | P(Ŷ=1 ∣ Y=y, A=a) equal for all a, y | Requires equal error rates per group |
| Calibration | Predicted probabilities match true frequencies in each group | P(Y=1 ∣ score=s, A=a) = s for all a | Conflicts with equalized odds when base rates differ |
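These criteria are straightforward to measure; a numpy sketch, where y_hat, y, and a are binary arrays of predictions, labels, and group membership:

# Fairness-gap sketch: demographic parity and equalized odds.
import numpy as np

def demographic_parity_gap(y_hat, a):
    # |P(Ŷ=1|A=0) - P(Ŷ=1|A=1)|
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

def equalized_odds_gap(y_hat, y, a):
    def rate(group, label):               # P(Ŷ=1 | Y=label, A=group)
        mask = (a == group) & (y == label)
        return y_hat[mask].mean()
    tpr_gap = abs(rate(0, 1) - rate(1, 1))
    fpr_gap = abs(rate(0, 0) - rate(1, 0))
    return max(tpr_gap, fpr_gap)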
💡 The Impossibility Theorem

Chouldechova (2017) proved that when base rates differ across groups, no imperfect classifier can simultaneously achieve demographic parity, equalized odds, AND calibration. Choosing a fairness criterion is therefore a policy decision, not a technical one. NLP researchers must make this choice explicit when building and deploying models.