AIP-C01 Deep Study Guide

AWS Certified Generative AI Developer — Professional
Professional-Level Beta Exam · 5 Domains · 75 Questions (65 scored)

AIP-C01 Exam Blueprint & Study Map

AWS Certified Generative AI Developer — Professional · 75 Questions (65 scored) · 130 minutes · ~$300 USD

Target: Developers with 2+ yrs AWS + 1+ yr GenAI hands-on
At a glance: 5 content domains · 65 scored questions · 130 minutes · ~100 Bedrock models available · 750 passing score (scaled)

📋 Domain Weight Distribution & Key Focus Areas

Domain | Approx Weight | Core Tasks | High-Priority Services
1. FM Integration, Data & Compliance | ~30% | FM selection, data pipelines, vector stores, RAG, prompt engineering | Bedrock, OpenSearch, SageMaker, S3, Glue
2. Implementation & Integration | ~25% | Agentic AI, deployment strategies, enterprise integration, API patterns | Bedrock Agents, Step Functions, Lambda, API GW, ECS
3. AI Safety, Security & Governance | ~20% | Guardrails, IAM, compliance, responsible AI, audit | Bedrock Guardrails, IAM, CloudTrail, Security Hub, KMS
4. Operational Efficiency & Optimization | ~15% | Cost optimization, performance tuning, monitoring, caching | CloudWatch, Cost Explorer, Bedrock Prompt Routing, SageMaker
5. Testing, Validation & Troubleshooting | ~10% | Model evaluation, QA frameworks, debugging, regression testing | Bedrock Model Eval, SageMaker Experiments, CloudWatch Logs
🎯 Exam Strategy (given your AIF-C01 background): Focus deep on (1) agentic AI patterns — Bedrock Agents, Strands, MCP, multi-agent orchestration; (2) advanced RAG — chunking strategies, hybrid search, re-ranking; (3) the Nova model family — Pro/Lite/Sonic/Forge use cases; (4) deployment decision trees — when to use Lambda vs. Provisioned Throughput vs. SageMaker endpoints; (5) Guardrails deep configuration.
🗺️ Services You Must Know Cold

🟠 Amazon Bedrock Core
  • InvokeModel / InvokeModelWithResponseStream
  • Converse API (unified multi-model)
  • Knowledge Bases (RAG)
  • Agents + Action Groups
  • Guardrails (content/PII/topic)
  • Prompt Management + Flows
  • Model Evaluation
  • Custom Model Import
  • Provisioned Throughput
  • Cross-Region Inference
  • Intelligent Prompt Routing
🔵 Amazon Nova Family
  • Nova 2 Pro — complex multi-step reasoning
  • Nova 2 Lite — cost-effective, high-volume
  • Nova 2 Sonic — real-time voice/conversation
  • Nova Canvas — image generation
  • Nova Reel — video generation
  • Nova Forge — custom model training from checkpoints
  • Supports system messages, multimodal, streaming
🟢 Agentic Stack
  • Bedrock Agents (managed)
  • Bedrock AgentCore (composable services)
  • Strands Agents SDK (open-source)
  • AWS Agent Squad (multi-agent)
  • MCP (Model Context Protocol)
  • Step Functions (ReAct/CoT workflows)
  • Lambda (stateless MCP servers)
  • ECS (complex MCP servers)
🟣 Vector & RAG Stack
  • Amazon OpenSearch (neural plugin, HNSW)
  • Aurora PostgreSQL + pgvector
  • Amazon S3 Vectors (new — billions of vectors)
  • Bedrock Knowledge Bases
  • Amazon Titan Embeddings (V1/V2/Multimodal)
  • Amazon Kendra (keyword + semantic hybrid)
  • Bedrock Data Automation
🔴 Security & Governance
  • Bedrock Guardrails (content filter, PII, topic)
  • IAM + resource-based policies
  • AWS KMS (data encryption)
  • AWS CloudTrail (audit)
  • AWS Security Hub (near-real-time risk)
  • Amazon Macie (PII in S3)
  • VPC Endpoints (PrivateLink)
  • SageMaker Model Cards
🩵 Deployment & Ops
  • SageMaker Real-time, Serverless, Async endpoints
  • SageMaker Multi-model / Multi-container endpoints
  • SageMaker Inference Components
  • EC2 UltraServers (large model inference)
  • DeepSpeed / Triton model parallelism
  • Nova Forge (custom training via SageMaker)
  • CloudWatch (metrics, alarms, dashboards)
  • AWS X-Ray (distributed tracing)
⚡ Critical "Know the Difference" Decision Points

Decision | Option A | Option B | Choose When
Bedrock vs SageMaker | Bedrock — managed, pay-per-token | SageMaker — custom containers, full control | Bedrock for standard FMs; SageMaker for custom/open-source models or complex inference pipelines
On-demand vs Provisioned Throughput | On-demand — variable traffic, pay-per-use | Provisioned — predictable, dedicated capacity | Provisioned for steady high-volume; on-demand for spiky/low traffic
RAG vs Fine-tuning | RAG — dynamic, updatable knowledge | Fine-tuning — baked-in domain knowledge | RAG when data changes frequently; fine-tuning for style/tone/format adaptation
Lambda vs ECS for MCP | Lambda — stateless, lightweight tools | ECS — stateful, complex compute tools | Lambda for simple tool calls; ECS for code execution, image processing
OpenSearch vs pgvector vs S3 Vectors | OpenSearch — full-text + vector hybrid | pgvector — relational + vector in RDS | S3 Vectors for billions of vectors, cost-optimized; pgvector when you need SQL joins; OpenSearch for hybrid keyword + semantic
Fixed vs Semantic Chunking | Fixed-size — simple, predictable | Semantic — content-aware boundaries | Fixed for uniform content; semantic for varied documents; hierarchical for structured docs
Nova Pro vs Lite vs Sonic | Pro — complex reasoning tasks | Lite — high-volume, cost-efficient | Sonic for real-time voice/conversation; Lite for batch/simple; Pro for analysis/reasoning
Bedrock Agents vs Strands vs Step Functions | Bedrock Agents — fully managed, conversational | Strands — open-source, custom control | Step Functions for deterministic workflows with branching; Bedrock Agents/Strands for autonomous LLM-driven action selection

Domain 1: Foundation Model Integration, Data Management & Compliance

Tasks 1.1–1.6 · Covers FM selection, data pipelines, vector stores, RAG, and prompt engineering

~30% of Exam Weight
1.1 Analyze Requirements & Design GenAI Solutions

πŸ—οΈ Architectural Patterns for GenAI Solutions

Three primary integration approaches based on control/expertise trade-offs:

Amazon Bedrock (Unified API)
  • Fully managed, no infrastructure
  • Pay-per-use (tokens)
  • Quick time-to-market
  • ~100 models via single API
  • Best for: standard FMs, rapid prototyping
SageMaker (Custom/Control)
  • Bring your own model/container
  • Fine-grained instance control
  • Supports open-source models
  • GPU selection (g5, p4d families)
  • Best for: custom models, complex inference
AWS AI Factories (On-Prem)
  • AWS-managed infra in your DC
  • Cloud-like AI in own environment
  • For data sovereignty requirements
  • Also: AWS Outposts for hybrid
🎯 Exam Tip: Questions about "which service to use" almost always hinge on: (1) level of control needed, (2) data residency requirements, (3) traffic pattern (steady vs. spiky), and (4) team ML expertise. Memorize these trade-offs.
AWS Well-Architected for GenAI (6 Pillars applied)
Pillar | GenAI-Specific Consideration
Operational Excellence | Automated model retraining, baseline behavior metrics, self-healing capabilities
Security | IAM for model access, VPC endpoints, KMS encryption, Guardrails for output safety
Reliability | Multi-AZ by default, cross-Region inference for HA, circuit breakers, fallback models
Performance Efficiency | Right-sizing (Lambda vs. Provisioned), caching embeddings/responses, batch inference
Cost Optimization | On-demand vs. provisioned throughput, model cascading (cheap → expensive), prompt caching
Sustainability | Model distillation, parameter-efficient fine-tuning (PEFT), smaller specialized models
PoC → Production Transition Framework
  1. Define use case scope with success criteria and ROI metrics
  2. Select FM: benchmark on custom eval set, not just public leaderboards
  3. Build PoC using Bedrock + Lambda + simple front-end
  4. Validate with stakeholders: accuracy, latency, cost per inference
  5. Harden for production: add Guardrails, monitoring, error handling
  6. Phased rollout: pilot → limited release → full production
🧩 Enterprise Adoption Strategy
AI Center of Excellence (CoE)
  • Central governance and best practices
  • Pattern library and code templates
  • Model governance committee
  • Standardized onboarding process
  • Cross-functional team structure
Production Monitoring Framework
  • Technical: inference latency (p50/p95/p99), throughput, error rates
  • Business: cost per inference, user satisfaction, task completion
  • Quality: accuracy, consistency, hallucination rate
  • CloudWatch dashboards + automated alerts
💡 Nova Forge — Cost Optimization for Custom Models: Nova Forge allows continued pre-training from checkpoints (pre/mid/post-training phases), blending proprietary data with Nova-curated data. This significantly reduces cost vs. full retraining and preserves foundational skills. Uses RL with your own reward functions and an orchestrator for multi-turn rollouts. Accessed via Amazon SageMaker AI.
1.2 Select & Configure Foundation Models

📊 FM Evaluation Frameworks & Benchmarks
General Benchmarks
  • MMLU — 57 subjects, knowledge breadth
  • HELM — 42+ models, multidimensional (fairness, bias, toxicity)
  • BIG-bench — 204+ diverse tasks, capability boundaries
  • BIG-Bench Hard — complex multi-step reasoning
  • GLUE/SuperGLUE — language understanding
Task-Specific
  • HumanEval+ / MBPP+ — code generation
  • GSM8K / MATH — mathematical reasoning
  • MT-Bench — multi-turn conversation (GPT-4 as judge)
  • MedPaLM — medical domain
  • FinanceBench — financial analysis + compliance
Multimodal
  • MMMLU — text/image/audio/video
  • MME — fine-grained perception vs. reasoning
  • MMMU — professional multimodal tasks
  • LMSYS Chatbot Arena — human preference (Elo ratings)
🎯 Key Insight: Benchmark scores don't always translate to real-world performance. Always supplement with custom benchmarks built from your actual use case data. The Bedrock Model Evaluation feature lets you run your own evals (automatic + human review).
🔀 Model Routing Strategies (Critical Topic)
Strategy | How It Works | Best For | AWS Implementation
Static Routing | Predetermined rules (department, content type, user role) | Simple, predictable workloads | Lambda + JSON routing config, AppConfig feature flags
Dynamic / Intelligent Routing | Runtime analysis of prompt complexity, content type, cost/quality | Mixed workloads needing optimization | Bedrock Intelligent Prompt Routing
Content-Based Routing | Step Functions Choice states evaluate input characteristics | Specialized models per domain | Step Functions + Lambda classifier
Model Cascading | Start cheap (Nova Lite) → escalate to Pro only if quality < threshold | Cost optimization with quality floor | Lambda confidence scoring + escalation logic
Cross-Region Inference | Distribute requests across AWS Regions | Throughput scaling, HA, latency optimization | Bedrock Cross-Region Inference Profiles
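The cascading row above can be sketched in a few lines. This is a minimal illustration, not a production router: the model IDs, the `invoke` callable, and the `score` heuristic are placeholders you would supply (e.g., a Lambda that scores confidence).

```python
# Minimal model-cascading sketch: try the cheap model first, escalate only
# when a confidence score falls below the quality floor.
def cascade(prompt, invoke, score, threshold=0.8):
    """invoke(model_id, prompt) -> response text; score(response) -> 0..1."""
    cheap = invoke("nova-lite", prompt)           # placeholder model ID
    if score(cheap) >= threshold:
        return cheap, "nova-lite"
    # Quality floor not met: escalate to the stronger, costlier model.
    return invoke("nova-pro", prompt), "nova-pro"
```

In practice the scorer might check response length, a self-reported confidence field, or a lightweight classifier; log every escalation so you can tune the threshold against cost.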
Nova Model Routing Tiers
Nova 2 Pro

High-complexity: multi-step reasoning, detailed analysis, document understanding. Highest cost/quality.

Nova 2 Lite

Medium complexity: standard generation, high-volume processing. Best cost/performance ratio for most workloads.

Nova 2 Sonic

Real-time conversational AI with lowest latency. Optimized for voice applications and streaming dialogue.

💡 Bedrock Intelligent Prompt Routing — Key Facts
  • Analyzes: prompt length, complexity, content type, performance requirements
  • Evaluates: latency requirements, cost limits, quality thresholds
  • Routes between: Claude, Nova, Titan, and other Bedrock models
  • Learns from performance data over time (improves routing accuracy)
  • Maintains consistent response formats across models
  • Configure via the Bedrock API — minimal code changes required
πŸ›‘οΈ Resilient AI System Design
Circuit Breaker Pattern
  • Monitor error rate over N requests
  • Threshold: ~50% failures → open circuit
  • Recovery timeout: 30-60 seconds
  • Half-open: test 10-20% traffic
  • Implement with: Step Functions + CloudWatch alarms
Fallback Hierarchy
  • Primary model → smaller model
  • Smaller model → cached response
  • Cached → static response
  • Each level has a quality threshold check
  • Log all fallback events for analysis
Cross-Region HA
  • Active-active or active-passive
  • Route 53 health checks (30s interval, 3 failures)
  • Cross-region inference profiles in Bedrock
  • DynamoDB Global Tables for state sync
  • S3 Cross-Region Replication for assets
⚠️ Watch Out: "Graceful degradation" is a key exam theme. Know how to design systems that degrade gracefully: return a cached or simplified response rather than fail completely. Use AWS AppConfig for dynamic configuration updates without redeployment.
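The circuit-breaker numbers above (≈50% failure rate over a window, 30-60 s recovery, half-open trial) can be sketched as a small class. This is an illustrative in-process sketch, assuming a rolling window of outcomes; a real deployment would use CloudWatch alarms or Step Functions state instead.

```python
import time

# Minimal circuit-breaker sketch matching the thresholds above.
class CircuitBreaker:
    def __init__(self, window=10, failure_ratio=0.5, recovery_s=30):
        self.window = window
        self.failure_ratio = failure_ratio
        self.recovery_s = recovery_s
        self.results = []          # rolling window of True/False outcomes
        self.opened_at = None      # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through after the recovery timeout.
        return time.time() - self.opened_at >= self.recovery_s

    def record(self, success):
        self.results = (self.results + [success])[-self.window:]
        failures = self.results.count(False)
        if success:
            self.opened_at = None  # a success (e.g., half-open trial) closes the circuit
        elif len(self.results) >= self.window and failures / len(self.results) >= self.failure_ratio:
            self.opened_at = time.time()  # open the circuit
```

Callers check `allow()` before invoking the primary model and fall through the hierarchy (smaller model, cache, static response) when it returns False.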
1.3 Implement Data Validation & Processing Pipelines

🔄 Data Quality & Validation Architecture

Data quality impacts FMs through three channels: prompts, retrieved information (RAG), and fine-tuning datasets.

Tool | Role in Pipeline | Key Capability
AWS Glue | ETL, Data Catalog, crawlers | Schema detection, data cataloging, validation workflows, PySpark transforms
SageMaker Data Wrangler | Data exploration & transformation UI | 300+ built-in transforms, data quality reports, bias detection
SageMaker Processing Jobs | Large-scale data processing | Pre-built scikit-learn/Spark containers, feature engineering, evaluation
AWS Lambda | Custom validation logic, real-time checks | Schema validation, type checks, range validation, normalization
Step Functions | Pipeline orchestration with quality gates | Error handling, retries, parallel processing, feedback loops
Amazon Comprehend | NLP enrichment | Entity extraction, sentiment, PII detection for data enhancement
Bedrock Data Automation | Unstructured data processing | Auto-cleansing, tokenization, formatting for training/RAG data
CloudWatch | Data quality monitoring | Custom metrics for data drift, quality scores, anomaly detection
📦 JSON Formatting for Bedrock APIs (Must Know)

Each FM has a specific JSON schema. The Converse API provides a unified interface.

Claude / Nova (messages format):

// Claude format
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 1000,
  "system": "You are an assistant...",
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "temperature": 0.7,
  "top_p": 0.9
}

// Nova format (similar, uses inferenceConfig)
{
  "system": [{"text": "..."}],
  "messages": [{...}],
  "inferenceConfig": { "maxTokens": 1000, "temperature": 0.7 }
}

Amazon Titan format:

{
  "inputText": "Your prompt",
  "textGenerationConfig": {
    "maxTokenCount": 500,
    "temperature": 0.8,
    "topP": 0.9,
    "stopSequences": ["User:"]
  }
}

HTTP Error Codes:

  • 400 — Bad Request (invalid JSON, missing fields)
  • 401/403 — Auth/permission issues (non-retriable)
  • 429 — Throttling (retriable with backoff)
  • 500/503 — Service errors (retriable)
🎯 Retry Strategy: Exponential backoff starting at 100ms, factor of 2, max 3-5 attempts, add ±100ms jitter. The SDK does this automatically if configured.
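The retry schedule above is easy to compute explicitly. A minimal sketch (the boto3 retry config handles this for you in practice; `rng` is injectable so the jitter can be controlled in tests):

```python
import random

# Exponential backoff with jitter: base 100 ms, factor 2, plus up to
# ±jitter seconds of uniform random noise, for a capped number of attempts.
def backoff_delays(attempts=4, base=0.1, factor=2.0, jitter=0.1, rng=random.random):
    delays = []
    for attempt in range(attempts):
        delay = base * (factor ** attempt)
        delay += (2 * rng() - 1) * jitter   # uniform in [-jitter, +jitter]
        delays.append(max(delay, 0.0))      # never sleep a negative duration
    return delays
```

Only retry the retriable codes (429, 500, 503); a 400 or 403 will fail identically on every attempt.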
Multimodal Input (image in messages):
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this diagram"},
      {"type": "image", "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": "<base64-encoded-image>"
      }}
    ]
  }]
}
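Since the Converse API abstracts over these per-model schemas, it is worth seeing its request shape once. A sketch (the model ID is an example; building the payload in a helper keeps it unit-testable without AWS credentials):

```python
# Sketch of the unified Converse API call shape (boto3).
def build_converse_request(model_id, user_text, system_text=None,
                           max_tokens=1000, temperature=0.7):
    req = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    if system_text:
        req["system"] = [{"text": system_text}]
    return req

# Invocation (requires AWS credentials and Bedrock model access):
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.converse(**build_converse_request(
#     "amazon.nova-lite-v1:0", "Summarize this ticket", "You are concise."))
# print(resp["output"]["message"]["content"][0]["text"])
```

Note the uniform shape: the same `messages`/`system`/`inferenceConfig` structure works across Claude, Nova, and Titan chat models, which is the whole point of the Converse API.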
🎭 Multimodal Data Processing
Text Processing
  • Amazon Comprehend: entities, sentiment, PII
  • AWS Glue: ETL, normalization
  • Lambda: custom cleaning, tokenization
  • Bedrock Data Automation: AI-powered prep
Image Processing
  • Amazon Rekognition: object detection, labels
  • Bedrock Nova Canvas/Titan Image
  • Base64 encoding for Bedrock API
  • S3 + Lambda trigger pipeline
Audio/Video
  • Amazon Transcribe: speech-to-text
  • Cross-modal alignment (sync audio/video)
  • Nova Reel: video generation
  • Nova Sonic: real-time audio conversation
💡 S3 Vectors (New Feature): Amazon S3 Vectors is a new capability for storing and querying vector embeddings natively in S3. It supports billions of vectors with sub-second query latency. Key advantages: 40-60% cost reduction with Intelligent-Tiering, metadata pre-filtering that reduces the search space 50-70%, multi-region replication with <15 min sync, and ABAC for fine-grained access.
1.4 Design & Implement Vector Store Solutions

πŸ“ Vector Database Deep Dive
Distance Metrics β€” Know All Three:
Metric | Formula Concept | Best For | Notes
Cosine Similarity | Angle between vectors (direction only) | Text embeddings, docs of different lengths | Range: -1 to 1; ignores magnitude; most common for NLP
Euclidean Distance | Straight-line distance in vector space | When magnitude matters, dense embeddings | Sensitive to dimensionality; lower = more similar
Dot Product | Magnitude + direction combined | When content volume is relevant | Can favor longer documents; efficient to compute
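The three metrics side by side, in pure Python for clarity (real vector stores compute these inside the index engine):

```python
import math

def dot(a, b):
    # Magnitude + direction combined; can favor longer documents.
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance; lower = more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Angle only: dividing by the norms removes magnitude from the comparison.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb)
```

On L2-normalized vectors, cosine ordering and dot-product ordering agree, which is why Titan Embeddings V2's normalization option pairs naturally with cosine similarity.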
AWS Vector Store Options:
Service | Index Type | Hybrid Search | Scale | Best Use Case
OpenSearch Neural | HNSW or IVF | ✅ Keyword + vector | Large to very large | Full-text + semantic search, enterprise search
Aurora pgvector | IVFFlat, HNSW | ✅ SQL + vector | Medium | Need relational queries + similarity (e.g., filter by user_id, then similarity)
S3 Vectors | Native S3 distributed | ❌ Vector only | Billions of vectors | Cost-optimized large-scale vector storage
Bedrock Knowledge Bases | Managed (OSS backend) | ✅ Managed hybrid | Enterprise | Managed RAG — no infra management
Amazon MemoryDB | Redis-compatible | ❌ | Medium | Ultra-low-latency vector + key-value
πŸ” OpenSearch HNSW Configuration (Deep Detail)

Hierarchical Navigable Small World (HNSW) is the primary index type for vector search in OpenSearch:

Index Construction Parameters
  • M: max connections per node — higher M = better recall but more memory (typical: 16-64)
  • ef_construction: search width during build — higher = better quality, slower indexing (typical: 100-512)
  • max_connections: upper limit on node connections
Search Parameters
  • ef_search: search width during query — higher = better recall, slower (typical: 100-512)
  • num_candidates: candidates to evaluate
  • rescore: enable for improved accuracy
  • Performance: p50/p95/p99 latency + recall@k
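These parameters map onto fields of an OpenSearch k-NN index mapping. A sketch, assuming a 1024-dimension embedding, cosine space, and the nmslib engine (dimension, engine, and space_type depend on your embedding model and cluster):

```json
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 256
    }
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": { "m": 16, "ef_construction": 256 }
        }
      }
    }
  }
}
```

Note that for the nmslib engine, ef_search is an index-level setting as shown; other engines (faiss, lucene) expose it differently, so check the k-NN plugin docs for your engine.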
4-Stage Hierarchical Search Pipeline:
  1. Coarse filtering: Apply metadata filters, document clustering, semantic routing to relevant partitions
  2. Approximate ANN search: Fast approximate nearest neighbor, retrieve larger candidate set
  3. Fine-grained ranking: Precise cosine scores, business logic weighting, diversity algorithms
  4. Result assembly: Retrieve full content + metadata, final formatting, relevance explanations
🎯 S3 Vectors Performance: Pre-filter metadata BEFORE vector calculations to reduce search space 50-70%. Use prefix-based hierarchical organization for efficient filtering. Configure Intelligent-Tiering for 40-60% cost reduction on infrequently accessed vectors.
🔄 Vector Store Data Maintenance Systems
Event-Driven Updates
  • S3 event → Lambda → re-embed → upsert
  • DynamoDB Streams → update pipeline
  • Near real-time freshness
  • Best for: frequently changing docs
Batch Sync
  • Scheduled Glue jobs or Step Functions
  • Delta detection (last-modified timestamps)
  • Cost-efficient for bulk updates
  • Best for: large corpora, nightly updates
Hybrid Approach
  • Real-time for high-priority content
  • Batch for bulk/archival content
  • Drift monitoring with CloudWatch
  • Version control for knowledge bases
S3 Metadata Framework for RAG Enhancement:
System-Defined Metadata
  • Content-Type, Content-Length
  • Last-Modified timestamp
  • ETag (content fingerprint)
  • x-amz-version-id
User-Defined Metadata (x-amz-meta-*)
  • document-author, department, category
  • expiry-date, version, language
  • security-classification, jurisdiction
  • Enables pre-filtering before vector search
1.5 Design Retrieval Mechanisms for FM Augmentation (RAG)

βœ‚οΈ Chunking Strategies β€” Critical Deep Dive
Strategy | How It Works | Pros | Cons | Use When
Fixed-Size | Split every N tokens (e.g., 512) with optional overlap (e.g., 50 tokens) | Simple, predictable, consistent embeddings | May break semantic units | Uniform content (FAQs, reports)
Recursive Character | Try splitting on paragraphs → sentences → words → chars | Preserves natural boundaries better | Variable chunk sizes | General-purpose documents
Semantic | Split where embedding similarity drops below a threshold | Content-aware, preserves meaning | Slower; requires embedding during chunking | Varied documents, conversational content
Hierarchical | Parent chunks (large context) + child chunks (precise retrieval) | Best of both worlds: precision + context | More complex, higher storage cost | Long documents needing both broad and specific retrieval
Document-Structure | Use headers, sections, paragraphs as boundaries | Preserves logical document structure | Requires structured input | PDFs, Word docs, HTML with clear structure
💡 Chunking Best Practices
  • Overlap: 10-20% of chunk size to preserve cross-boundary context
  • Include metadata in chunk (source, page, section) for better retrieval context
  • Measure: chunk cohesion (intra-chunk cosine similarity), retrievability metrics
  • Bedrock Knowledge Bases offers: fixed-size, semantic, and hierarchical chunking built-in
  • Custom chunking: Lambda function for complex logic (hierarchical workflows)
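Fixed-size chunking with overlap, the simplest strategy above, fits in a few lines. A sketch where "tokens" are pre-tokenized items (in real pipelines you would use the embedding model's tokenizer, not whitespace words):

```python
# Fixed-size chunking with overlap. The 10-20% overlap rule above means
# e.g. size=512, overlap=50-100.
def chunk_fixed(tokens, size=512, overlap=50):
    assert 0 <= overlap < size
    step = size - overlap               # each chunk starts `step` tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break                       # last chunk reached; avoid trailing slivers
    return chunks
```

The overlap means the tail of each chunk is repeated at the head of the next, so sentences that straddle a boundary are fully contained in at least one chunk.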
🧲 Embedding Models — Amazon Titan Embeddings
Titan Text Embeddings V2
  • Dimensions: 256, 512, or 1024 (configurable)
  • Supports normalization (for cosine)
  • English + multilingual support
  • Best for: text-only semantic search
Titan Multimodal Embeddings G1
  • Embeds both text AND images in same space
  • Cross-modal similarity search
  • Dimension: 1024
  • Best for: product search, media retrieval
Embedding Selection Criteria
  • Match dimensionality to quality/cost need
  • Use SAME model for indexing AND querying
  • Consider: throughput, cost per 1K tokens
  • Cohere Embed for multilingual enterprise
🎯 Critical Rule: Always use the exact same embedding model for both creating the vector index AND for embedding queries at search time. Mixing models produces meaningless similarity scores.
🔎 Advanced Query Engineering
Query Enhancement Techniques:
Query Expansion
  • Use LLM to generate synonyms/related terms
  • HyDE: generate hypothetical answer, embed it, search for similar docs
  • Multi-query: generate N variations → union results
  • Domain-specific expansion (medical/legal terms)
Query Decomposition
  • Break complex queries into sub-queries
  • Identify: temporal, entity, constraint components
  • Run sub-queries in parallel (Lambda)
  • Aggregate + deduplicate results
  • Use Step Functions for orchestration
Re-ranking
  • First-pass: fast ANN retrieval (top-k)
  • Re-rank with cross-encoder model
  • Apply business logic weighting
  • Diversity algorithms (avoid result clustering)
  • Amazon Kendra: hybrid keyword + semantic
# Query pipeline classification example
def select_processing_pipeline(query, classification):
    if classification == 'simple':
        return ['expansion']
    elif classification == 'complex':
        return ['decomposition', 'expansion', 'transformation']
    elif classification == 'domain_specific':
        return ['domain_expansion', 'specialized_transformation']
    return ['expansion']  # default: fall back to basic expansion
1.6 Implement Prompt Engineering Strategies & Governance

✍️ Advanced Prompt Engineering Techniques
Technique | Description | AWS Implementation | Best For
Chain-of-Thought (CoT) | "Think step by step" — forces intermediate reasoning before the answer | System message + prompt structure; Step Functions for multi-step | Math, logic, complex analysis
ReAct (Reason + Act) | Interleaved Reasoning-Action-Observation loop | Step Functions state machine (Reason state → Action state → Observe state) | Agentic tasks needing tool use
Few-Shot | Provide 3-5 examples in the prompt | Bedrock Prompt Management templates with examples | Classification, format adherence
Tree of Thoughts | Explore multiple reasoning branches in parallel | Step Functions Parallel states + aggregation Lambda | Complex multi-path problems
Self-Consistency | Sample N responses, majority vote | Lambda to invoke the model N times + aggregation | Factual accuracy, reducing hallucination
Prompt Chaining | Output of prompt A feeds prompt B | Bedrock Flows (visual) or Step Functions | Multi-stage document processing
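The self-consistency row above amounts to "sample N times, take the majority." A sketch, where `sample` stands in for a model invocation (temperature > 0) plus final-answer extraction:

```python
from collections import Counter

# Self-consistency: sample the model N times and majority-vote the answers.
def self_consistent_answer(prompt, sample, n=5):
    answers = [sample(prompt) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    # Agreement ratio doubles as a rough confidence proxy.
    return winner, votes / n
```

In the Lambda implementation the exam describes, each sample would be an InvokeModel call and the aggregation would run after all N responses return (e.g., fanned out via Step Functions Parallel or Map states).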
💡 Bedrock Flows: Visual, node-based builder for prompt chains. Nodes include: FM inference nodes, Lambda nodes, Condition nodes, Iterator nodes, Collector nodes, Knowledge Base retrieval nodes. Use for: RAG + generation pipelines, multi-step reasoning, conditional branching without custom code.
πŸ›‘οΈ Bedrock Guardrails β€” Deep Configuration
Content Filters
  • Categories: Hate, Insults, Sexual, Violence, Misconduct, Prompt Attack
  • Severity levels: LOW, MEDIUM, HIGH
  • Applies to: INPUT and/or OUTPUT
  • Custom threshold per category
Topic Denial
  • Define forbidden topics with plain language
  • Examples: competitor products, legal advice, medical diagnoses
  • LLM-based classification (no regex)
  • Returns custom denial message
PII Redaction
  • 50+ PII types: SSN, credit card, email, phone, name, address
  • Modes: REDACT (replace with type) or BLOCK
  • Applies to both input and output
  • Audit-ready with CloudTrail logging
Grounding Check
  • Detects hallucinations vs. source documents
  • Checks if output is grounded in retrieved context
  • Relevance scoring threshold configurable
  • Essential for RAG pipelines
Word Filters
  • Custom blocked word lists
  • Managed lists (profanity)
  • Applied post-generation
Prompt Injection Defense
  • PROMPT_ATTACK filter category in content filter
  • Detects jailbreak attempts, role-play attacks
  • System prompt separation (protected)
  • Input validation in Lambda pre-Bedrock call
⚠️ Guardrails Gotcha: Guardrails must be explicitly associated with a model invocation (via guardrailIdentifier + guardrailVersion in the API call). They do NOT auto-apply to all Bedrock calls. Also: Guardrails can be applied at both REQUEST and RESPONSE level independently.
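Attaching a guardrail to a Converse call looks roughly like this. A sketch: the guardrail ID is a placeholder, and the helper just decorates a request dict so it can be tested without calling AWS.

```python
# Explicitly associate a guardrail with a Converse request, per the gotcha
# above. guardrailVersion is a version number string or "DRAFT".
def with_guardrail(request, guardrail_id, version="1"):
    request = dict(request)  # copy; leave the caller's dict untouched
    request["guardrailConfig"] = {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": version,
        # "trace": "enabled",  # optional: see which policy intervened and why
    }
    return request

# Usage (requires AWS credentials):
# client.converse(**with_guardrail(base_request, "my-guardrail-id"))
```

Without the `guardrailConfig` block the invocation runs completely unguarded, which is exactly the failure mode the gotcha warns about.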
📋 Prompt Management & Governance (Enterprise)
Bedrock Prompt Management Features:
  • Centralized repository: Store prompt templates with versions
  • Parameterization: Variables in templates ({{input}}, {{context}})
  • Version control: Draft → Review → Approved → Production
  • Approval workflows: Governance gates before deployment
  • A/B testing: Route % traffic to different prompt versions
  • Analytics: Track performance per prompt version
Governance Architecture:
  • CloudTrail: All prompt management API calls logged
  • IAM policies: Role-based access to prompt versions
  • Security Hub: Near-real-time risk analytics for FM deployments
  • Centralized vs. Federated: Central policy + distributed implementation
  • Async monitoring: Don't impact latency with sync governance checks
🎯 QA for Probabilistic Outputs: FM outputs are probabilistic — the same input can produce different outputs. Design validation around semantic similarity scoring (not exact match), threshold-based acceptance, and statistical validation over N runs. Use SageMaker AI serverless RL-based customization (new feature) to reduce fine-tuning time from months to days.

Domain 2: Implementation & Integration

Tasks 2.1–2.5 · Agentic AI, Deployment Strategies, Enterprise Integration, API Patterns, Dev Tools

~25% of Exam Weight
2.1 Implement Agentic AI Solutions & Tool Integrations

🤖 Agentic AI Architecture Overview
Technology | Type | Key Characteristics | When to Use
Amazon Bedrock Agents | Fully managed | Built-in orchestration, action groups, knowledge bases, memory, Guardrails integration | Standard agentic workflows, minimal infra management, conversational agents
Bedrock AgentCore | Composable services | Framework-agnostic (works with any SDK/model), AgentCore Policy (governance), AgentCore Evaluations, episodic memory for enhanced context | Complex agents needing fine-grained composability, multi-framework environments
Strands Agents SDK | Open-source | Full code visibility, modular (swap components), built-in eval, MCP integration, @tool decorator | Custom agent logic, need for transparency/control, contributing to open source
AWS Agent Squad | Multi-agent orchestration | Coordinates multiple specialized agents, shared context/state, task delegation | Complex tasks requiring collaboration between specialized agents
Step Functions (ReAct) | Workflow engine | Deterministic state machines, guaranteed execution, built-in error handling, human approval steps | Predictable workflows needing an audit trail, human-in-the-loop, compliance
🔗 Model Context Protocol (MCP) — Deep Dive

MCP is a standardized protocol for agent-tool interactions. Agents discover tools, invoke them, and get results via MCP servers.

MCP Transport Protocols
  • stdio: Local process communication (dev/local)
  • SSE: Server-Sent Events (streaming, HTTP)
  • streamable-http: For AWS deployments (Mcp-Session-Id header for isolation)
MCP Server Hosting Options
  • Lambda: Stateless, lightweight tools (web search, calculations, data retrieval)
  • ECS: Stateful, complex tools (code execution, image processing, large compute)
  • API Gateway: Expose MCP-compatible endpoints for existing services
6-Step MCP Workflow:
  1. MCP Client Initialization: Agent app connects to MCP server via transport protocol
  2. Tool Discovery: Agent calls list_tools() — gets name, description, and input schema for each tool
  3. Agent Creation: Agent created with discovered tools; LLM can now see tools in system prompt
  4. Reasoning & Tool Selection: LLM analyzes user query, decides which tool to call and with what arguments
  5. MCP Server Execution: Server executes tool function, returns result to agent (server is stateless)
  6. Final Response: Agent synthesizes tool results into coherent response to user
# Strands Agent with MCP integration pattern
from mcp import stdio_client, StdioServerParameters
from strands import Agent
from strands.tools.mcp import MCPClient

mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="uvx",
                          args=["awslabs.aws-documentation-mcp-server@latest"])
))

with mcp_client:
    tools = mcp_client.list_tools_sync()
    agent = Agent(tools=tools, model="anthropic.claude-3-5-sonnet-20241022-v2:0")
    response = agent("What is the Bedrock Converse API?")  # Agent auto-selects tools
🔒 Safeguarded AI Workflows
Stopping Conditions
  • Step Functions: max iteration count in Choice state
  • Lambda: timeout settings (predictable execution)
  • CloudWatch alarms: auto-halt on error rate threshold
  • Circuit breaker: 50% failure → open circuit 30-60s
IAM Boundaries for Agents
  • Least-privilege resource policies
  • Restrict agent to only necessary actions/resources
  • Deny any unneeded service calls
  • Session policies for temporary credentials
Human-in-the-Loop
  • Step Functions Human Task state (wait for token)
  • API Gateway: collect human feedback
  • DynamoDB: store review decisions with TTL
  • Escalation criteria based on confidence scores
Input Validation
  • Schema validation before agent processing
  • Lambda pre-processing for malformed inputs
  • Bedrock Guardrails: prompt injection detection
  • Rate limiting via API Gateway usage plans
🎯 ReAct Pattern in Step Functions: The state machine alternates: Reason state (invoke LLM) → Parse Action state (Lambda extracts the tool call) → Execute Action state (call the tool) → Observe state (feed the result back to the LLM) → repeat until a final answer or max steps reached.
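The ReAct state machine above can be mirrored as a plain loop. A sketch with stand-ins: `llm` returns either `("final", answer)` or `("action", tool_name, args)`, and `tools` maps tool names to callables (in the Step Functions version these are the Reason/Parse/Execute Lambda states).

```python
# ReAct loop sketch mirroring the state machine described above.
def react_loop(question, llm, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):              # stopping condition: max iterations
        step = llm(question, observations)  # Reason state
        if step[0] == "final":
            return step[1]
        _, tool_name, args = step           # Parse Action state
        result = tools[tool_name](**args)   # Execute Action state
        observations.append(result)         # Observe state: result feeds next turn
    return None                             # budget exhausted: no final answer
```

The `max_steps` cap is the same safeguard as the Choice-state iteration limit in the previous section: without it, a confused agent can loop on tool calls indefinitely.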
🤝 Multi-Agent Coordination Patterns
Ensemble / Aggregation
  • Multiple agents/models on same task
  • Majority voting for classification
  • Weighted averaging for numeric outputs
  • Ranked fusion for retrieval
  • Lambda aggregation logic
Specialized Routing
  • Agent Squad: route to specialized agent
  • Claude → complex reasoning tasks
  • Nova Pro → document analysis
  • Nova Lite → simple/high-volume tasks
  • Domain-specific agents (medical, legal)
Hierarchical Agents
  • Orchestrator agent decomposes task
  • Sub-agents handle specific components
  • Results aggregated by orchestrator
  • Step Functions manages coordination
  • DynamoDB shares state between agents
2.2 Implement Model Deployment Strategies

🚀 Deployment Strategy Decision Framework
Strategy | Service | Traffic Pattern | Latency | Cost Model | Key Config
On-Demand Serverless | Lambda + Bedrock | Spiky, unpredictable | Variable (cold-start risk) | Pay per invocation | Memory, timeout, concurrency limits
Bedrock On-Demand | Bedrock InvokeModel | Any | Low-medium | Pay per token | Model ID, throttling limits
Bedrock Provisioned Throughput | Bedrock PT | Steady, high-volume | Consistent, low | Per-hour commitment (1 mo/6 mo) | Model Units (MUs), CloudWatch monitoring
SageMaker Real-time | SageMaker Endpoints | Consistent, latency-sensitive | Low (<1s) | Instance hours + data | Instance type, auto-scaling policy
SageMaker Serverless | SageMaker Serverless | Intermittent | Medium (cold start) | Pay per request | Memory size, max concurrency
SageMaker Async | SageMaker Async Endpoints | Batch, non-latency-sensitive | Minutes | Instance hours (scale-to-zero) | S3 input/output, max concurrency
Multi-Model Endpoint | SageMaker MME | Many models, low per-model traffic | Variable (model loading) | Shared instance across models | Container + model artifacts, routing
πŸ–₯️ Large Language Model Deployment Challenges
Memory Management
  • LLMs can be 10s-100s of GB
  • SageMaker: up to 500GB model size
  • GPU instances: ml.g5, ml.p4d.24xlarge (for large models)
  • CPU for small NER/classification: ml.c5.9xlarge
  • Container health check timeout: up to 60 min
Model Parallelism
  • DeepSpeed: tensor/pipeline parallelism
  • Triton + FasterTransformer: optimized inference
  • SageMaker Distributed Inference
  • UltraServers: multi-EC2 instances with low-latency interconnect
  • For models larger than single GPU memory
Token Processing Optimization
  • Batching: group requests to maximize GPU utilization
  • Continuous batching: admit and retire requests at token granularity to keep the GPU saturated
  • KV-cache: reuse attention computations
  • Quantization (INT8/INT4): reduce model size
  • Knowledge distillation: train smaller model from large
SageMaker Endpoint Types Comparison:
Inference Components (New)
  • Host multiple models on single endpoint
  • Define separate scaling policies per model
  • Control memory/CPU allocation per component
  • Scale each model independently based on traffic
  • Best for: multi-model serving with different traffic patterns
Serial Inference Pipelines
  • Chain multiple models in sequence
  • Output of model N β†’ input of model N+1
  • E.g.: preprocessing model β†’ LLM β†’ postprocessing
  • Single endpoint for the pipeline
  • Best for: fixed multi-step inference workflows
πŸ’‘ Nova Forge via SageMaker: Custom model training starting from Nova checkpoints. Mix proprietary data with Nova-curated data across all training phases (pre/mid/post-training). Supports RL with custom reward functions and custom orchestrator for multi-turn rollouts. Prevents catastrophic forgetting better than pure custom training.
πŸ’‘ Bedrock Custom Model Import: Import models trained/fine-tuned in SageMaker into Bedrock. Get on-demand API access without managing endpoints. More cost-effective than provisioned throughput for variable traffic.
βš–οΈ Optimized Deployment Approaches
Model Cascading Architecture:
  1. Route all requests to smallest/cheapest model first (Nova Lite)
  2. Evaluate response quality with confidence scoring Lambda
  3. If quality < threshold (e.g., 0.7-0.9), escalate to Nova Pro
  4. Cache high-quality responses for similar future queries
  5. Monitor cascade metrics: escalation rate, cost savings, quality distribution
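The routing logic in steps 1-3 can be sketched as below. The `invoke_*` callables and the scorer are injected so the cascade stays model-agnostic; in practice they would wrap bedrock-runtime Converse calls to Nova Lite/Pro and a confidence-scoring Lambda (an assumption of this sketch, not a prescribed API).

```python
def cascade(prompt, invoke_small, invoke_large, score, threshold=0.8):
    """Try the cheap model first; escalate only when confidence
    falls below the quality threshold from step 3."""
    answer = invoke_small(prompt)
    confidence = score(prompt, answer)
    if confidence >= threshold:
        return answer, "small", confidence   # cheap model was good enough
    answer = invoke_large(prompt)            # escalate to the larger model
    return answer, "large", score(prompt, answer)
```

Logging which tier served each request gives you the escalation-rate metric from step 5 for free.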
Caching Strategies
  • Response caching: ElastiCache/DynamoDB for identical/near-identical queries
  • Embedding caching: Avoid re-embedding same content
  • Semantic caching: Return cached if query vector is close enough (similarity threshold)
  • API Gateway cache: 300s default TTL for GET requests
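Semantic caching can be sketched as below, assuming an injected `embed` callable (e.g. a Titan Embeddings call) and an in-memory list standing in for ElastiCache/DynamoDB; the 0.95 threshold matches the guidance above.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Serve a cached response when the query embedding is close
    enough to a previously seen query."""
    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold, self.entries = embed, threshold, []

    def get(self, query):
        qv = self.embed(query)
        for vec, response in self.entries:
            if cosine(qv, vec) > self.threshold:
                return response        # near-identical query: cache hit
        return None                    # miss: caller invokes the model

    def put(self, query, response):
        self.entries.append((self.embed(query), response))
```

A production version would do the nearest-neighbor lookup in a vector store rather than a linear scan.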
Asynchronous Inference Pattern
  • SQS queue β†’ Lambda β†’ SageMaker Async Endpoint
  • Results stored in S3, notification via SNS
  • Scale to zero when no traffic
  • SQS visibility timeout matches processing duration (5-15 min for LLMs)
  • DLQ after 3-5 failed attempts
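The SageMaker half of the pattern uses `InvokeEndpointAsync`, whose payload is referenced by S3 URI rather than sent inline. A sketch of the request shape (endpoint and bucket names are placeholders):

```python
def async_invoke_request(endpoint_name, input_s3_uri, timeout_s=900):
    """Request dict for sagemaker-runtime InvokeEndpointAsync: input is
    read from S3, results land back in S3, and SNS notifies on completion.
    Call as: boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)
    """
    assert input_s3_uri.startswith("s3://"), "payload must already be in S3"
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",
        "InvocationTimeoutSeconds": timeout_s,  # LLM jobs: allow minutes, not seconds
    }
```

The SQS visibility timeout should comfortably exceed `InvocationTimeoutSeconds` so in-flight messages are not redelivered mid-inference.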
2.3 Design & Implement Enterprise Integration Architectures

🏒 Enterprise Connectivity Patterns
API-Based Integration
  • API Gateway: REST/HTTP/WebSocket APIs
  • Custom domain mappings for branding
  • Regional (low-latency) vs Edge-optimized (global)
  • Lambda integration for custom logic
  • Usage plans + throttling per API key
Event-Driven Integration
  • EventBridge: route business events to FM processing
  • Pattern matching: select which events need GenAI
  • SQS DLQ: handle failed event processing
  • EventBridge Pipes: source β†’ filter β†’ enrich β†’ target
  • Loose coupling between systems
Hybrid/On-Premises
  • AWS Outposts: run FM inference in your DC
  • AWS Wavelength: edge deployments for ultra-low latency
  • Local Zones: geographic compliance
  • Direct Connect: dedicated network to AWS
  • Site-to-Site VPN: encrypted connectivity
πŸ” Secure Access Framework for GenAI
Security Layer | Service/Pattern | Implementation Detail
Identity Federation | IAM Identity Center / Cognito | Attribute mapping from IdP, role assignment per user group
Fine-grained Access | Amazon Verified Permissions | Cedar policy language, attribute-based (ABAC) policies on resources
Network Isolation | VPC Endpoints (PrivateLink) | Private connectivity to Bedrock without internet; security groups + NACLs
Encryption in Transit | ACM + TLS 1.2+ | All API calls to Bedrock are TLS encrypted by default
Encryption at Rest | AWS KMS | Customer-managed keys (CMK) for model artifacts, prompt logs, knowledge bases
Audit Logging | CloudTrail + CloudWatch Logs | Log all FM API calls with request/response for compliance
πŸ”§ CI/CD for GenAI + GenAI Gateway Architecture
CI/CD Pipeline (CodePipeline + CodeBuild)
  1. Source: CodeCommit/GitHub trigger
  2. Build: CodeBuild β€” package Lambda, validate prompts, dependency scan
  3. Test: Automated FM behavior tests (deterministic + probabilistic)
  4. Security scan: SAST/DAST, dependency vulnerabilities
  5. Staging deploy: limited traffic rollout
  6. Approval gate: human review or automated quality check
  7. Production deploy: blue/green or canary
  8. Post-deploy: CloudWatch alarms, rollback trigger
GenAI Gateway Pattern
  • Centralized entry point for all FM access
  • API Gateway β†’ Lambda Gateway β†’ Bedrock/SageMaker
  • Enforces: auth, rate limiting, logging, cost tracking
  • Model routing logic centralized here
  • X-Ray tracing across all hops
  • Cost allocation by team/use-case via tags
  • Supports: A/B testing, gradual rollout
2.4 Implement FM API Integrations

🌊 Streaming & Real-Time AI
Bedrock Streaming API
  • InvokeModelWithResponseStream
  • Returns chunks as they're generated
  • Buffer management: 5-20 chunks, flush on sentence completion
  • Client-side progressive rendering
  • Error recovery: fallback to full-response API if streaming fails persistently
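The buffer-management bullet above can be sketched as a pure flush policy: accumulate streamed text deltas and emit on sentence completion or when the buffer cap is hit. In a real client, `chunks` would come from iterating a ConverseStream/InvokeModelWithResponseStream event stream; here it is any iterable of strings.

```python
def stream_with_buffer(chunks, flush, max_buffer=20):
    """Buffer streamed text chunks; flush on sentence-ending punctuation
    or when the buffer reaches max_buffer chunks (5-20 per the guidance)."""
    buf = []
    for chunk in chunks:
        buf.append(chunk)
        if chunk.rstrip().endswith((".", "!", "?")) or len(buf) >= max_buffer:
            flush("".join(buf))
            buf = []
    if buf:
        flush("".join(buf))  # emit any trailing partial sentence
```

Flushing on sentence boundaries keeps progressive rendering readable instead of word-by-word jitter.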
WebSocket / SSE Patterns
  • WebSocket: bidirectional, keep-alive ping every 30-60s
  • Idle timeout: ~10 min for interactive sessions
  • SSE: reconnection backoff from 1s to max 30-60s
  • Event IDs: resume streams after disconnection
  • API Gateway: chunked transfer encoding
πŸ”„ Resilience Patterns β€” Key Numbers to Know
Retry Configuration
  • Initial backoff: 100ms
  • Backoff factor: 2x (exponential)
  • Max backoff: 20 seconds
  • Max attempts: 3-5
  • Jitter: Β±100ms or factor 0.1-0.3
  • Retriable: 429, 500, 503
  • Non-retriable: 400, 401, 403
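The retry numbers above translate into a backoff schedule like the following. This is a sketch of the math only; the AWS SDKs ship adaptive retry modes that implement this for you.

```python
import random

RETRIABLE = {429, 500, 503}       # throttling and transient server errors
NON_RETRIABLE = {400, 401, 403}   # client/auth errors: retrying cannot help

def backoff_delays(max_attempts=5, base=0.1, factor=2.0, cap=20.0, jitter=0.1):
    """Exponential backoff with jitter: 100 ms initial delay, 2x growth,
    20 s cap, +/-10% jitter by default (matching the figures above)."""
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap, base * factor ** attempt)
        delays.append(delay * (1 + random.uniform(-jitter, jitter)))
    return delays
```

Before sleeping, check the status code against `RETRIABLE`; a 400/401/403 should fail fast.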
Circuit Breaker
  • Failure threshold: 50% over 10 requests
  • Recovery timeout: 30-60 seconds
  • Half-open test traffic: 10-20%
  • Implement: Step Functions + CloudWatch
  • Alert: CloudWatch alarm β†’ SNS β†’ Lambda
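A minimal circuit-breaker sketch using the thresholds above (open at 50% failures over a 10-request window, half-open after the recovery timeout). The injectable clock is an assumption to make the behavior testable; a Step Functions implementation would keep this state in DynamoDB and alarm via CloudWatch.

```python
import time

class CircuitBreaker:
    """Open at >= failure_rate over a sliding window; allow a trial
    request after recovery_s; close again on a successful trial."""
    def __init__(self, window=10, failure_rate=0.5, recovery_s=30,
                 clock=time.monotonic):
        self.window, self.failure_rate = window, failure_rate
        self.recovery_s, self.clock = recovery_s, clock
        self.results = []      # sliding window of recent outcomes
        self.opened_at = None  # None = closed

    def allow(self):
        if self.opened_at is None:
            return True
        # half-open: let a trial request through after the recovery timeout
        return self.clock() - self.opened_at >= self.recovery_s

    def record(self, success):
        if self.opened_at is not None:
            if success:
                self.results, self.opened_at = [], None  # trial ok: close
            else:
                self.opened_at = self.clock()            # trial failed: stay open
            return
        self.results = (self.results + [success])[-self.window:]
        failures = self.results.count(False)
        if len(self.results) >= self.window and \
                failures / len(self.results) >= self.failure_rate:
            self.opened_at = self.clock()                # trip the breaker
```

While open, callers should serve a fallback (cached answer, smaller model) instead of hammering the failing dependency.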
Throttling Config
  • Account-level: 10,000 RPS
  • Stage-level: 1,000-5,000 RPS
  • Route-level: 50-500 RPS (complex models)
  • SQS for request buffering under throttle
  • SQS visibility timeout: 5-15 min for LLMs
Connection Pooling
  • Pool size: 10-20 connections per instance
  • Connection TTL: 60-300 seconds
  • Reduce SDK client instantiation (reuse across Lambda invocations)
  • Use global variables for SDK clients in Lambda
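The "global SDK client" bullet is about Lambda execution-environment reuse: anything created at module scope (or memoized) survives across warm invocations, so the client and its connection pool are built once. In a real Lambda the factory body would be `boto3.client(service)`; a stand-in object keeps this sketch runnable without AWS installed.

```python
import functools
import types

@functools.lru_cache(maxsize=None)
def get_client(service):
    """Create each SDK client once per execution environment and reuse it.
    Real Lambda: return boto3.client(service)  (placeholder used here)."""
    return types.SimpleNamespace(service=service)

def handler(event, context):
    client = get_client("bedrock-runtime")  # reused across warm invocations
    return client.service
```

The common idiom `client = boto3.client("bedrock-runtime")` at the top of the module achieves the same thing; the memoized factory just makes the reuse explicit.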
🎯 X-Ray Tracing Pattern: Add custom subsegments for: (1) input preprocessing, (2) model invocation, (3) response postprocessing. Annotate with: model_name, input_complexity_score, output_quality_score, cost_estimate. This enables performance analysis by model and query type.

Domain 3: AI Safety, Security & Governance

Guardrails, IAM, Responsible AI, Compliance, Audit, Data Privacy

~20% of Exam Weight
3.1 Implement AI Safety Controls & Responsible AI

πŸ›‘οΈ Bedrock Guardrails β€” Complete Configuration
Guardrail Feature | Configuration Detail | Applied At | Use Case
Content Filters | Categories: HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT, PROMPT_ATTACK. Severity: LOW/MEDIUM/HIGH per category | INPUT and/or OUTPUT independently | Block harmful content generation
Denied Topics | Plain language topic description (LLM-based classification, not regex). Custom denial message. | INPUT (topic detection) | Block competitor questions, legal/medical advice
Word Filters | Custom word lists + AWS managed profanity list | OUTPUT | Enforce brand/compliance word policies
PII Detection & Redaction | 50+ entity types: SSN, email, phone, credit card, name, address, IP. Mode: REDACT or BLOCK | INPUT and/or OUTPUT | HIPAA, PCI-DSS, GDPR compliance
Grounding Check | Verifies output is grounded in source context. Configurable relevance threshold. | OUTPUT (requires context) | Reduce hallucinations in RAG pipelines
Sensitive Info Filters | Regex patterns for custom sensitive data (e.g., employee IDs, internal codes) | INPUT and OUTPUT | Organization-specific PII beyond standard types
⚠️ Guardrails Must Be Explicitly Invoked: Pass guardrailIdentifier + guardrailVersion in InvokeModel/Converse API call. They don't auto-apply. Can test guardrails independently with ApplyGuardrail API before deploying.
🎯 Prompt Injection Defense: Use PROMPT_ATTACK content filter + keep system prompt in system parameter (separate from user messages, protected by Bedrock). Also: validate/sanitize user input in Lambda before sending to Bedrock. Input validation + Guardrails = defense in depth.
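Both points above show up in the Converse request shape: the guardrail rides along explicitly in `guardrailConfig`, and the system prompt travels in `system`, separate from user messages. A sketch of the keyword arguments (model and guardrail identifiers are placeholders):

```python
def converse_with_guardrail_kwargs(model_id, user_text, guardrail_id, version):
    """Kwargs for bedrock-runtime Converse with an explicitly attached
    guardrail -- guardrails never auto-apply.
    Usage: boto3.client("bedrock-runtime").converse(**kwargs)"""
    return {
        "modelId": model_id,
        # system prompt kept out of the user-controlled message list
        "system": [{"text": "You are a support assistant."}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": version,
            "trace": "enabled",   # surfaces which guardrail policy intervened
        },
    }

kwargs = converse_with_guardrail_kwargs(
    "amazon.nova-lite-v1:0", "Hello", "gr-example-id", "1")
```

The same identifier/version pair can be exercised standalone via the ApplyGuardrail API before wiring it into production calls.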
βš–οΈ Responsible AI Principles on AWS
Bias Detection & Mitigation
  • SageMaker Clarify: bias metrics (class imbalance, DPPL, KL divergence)
  • SageMaker Data Wrangler: data quality reports
  • Model Cards (SageMaker): document model limitations, bias findings
  • HELM benchmark: includes fairness + toxicity metrics
Explainability
  • SageMaker Clarify: SHAP values for feature importance
  • Chain-of-Thought prompting: expose reasoning
  • Model Cards: document intended use, out-of-scope uses
  • Attribution in RAG: cite source documents
Privacy & Data Protection
  • Amazon Macie: detect PII in S3 automatically
  • Bedrock: no model training on customer data (by default)
  • VPC Endpoints: data doesn't leave AWS network
  • KMS CMK: customer controls encryption keys
Auditability
  • CloudTrail: all Bedrock API calls logged
  • CloudWatch Logs: model inputs/outputs (optional logging)
  • Bedrock Model Invocation Logging: S3 + CloudWatch
  • CloudTrail Lake: query audit events with SQL
πŸ”‘ IAM & Security for GenAI
Key IAM Actions for Bedrock (Know These)
  • bedrock:InvokeModel
  • bedrock:InvokeModelWithResponseStream
  • bedrock:Retrieve
  • bedrock:RetrieveAndGenerate
  • bedrock:ApplyGuardrail
  • bedrock:CreateKnowledgeBase
  • bedrock:GetFoundationModel
  • bedrock:ListFoundationModels
Service Control Policies (SCPs)
  • Org-level deny: prevent use of non-approved models
  • Region restrictions: only us-east-1, us-west-2
  • Require conditions: VPC source, MFA, time-of-day
  • Block: CreateProvisionedModelThroughput without approval
Resource-Based Policies for Bedrock
  • Knowledge Base policies: control who can Retrieve/RetrieveAndGenerate
  • Cross-account access for shared models
  • Agent resource policies: restrict which roles can invoke
  • Condition keys: bedrock:RequestedModelId (restrict to approved models)
πŸ’‘ AWS Security Hub for GenAI: New enhanced capabilities β€” near-real-time risk analytics, improved prioritization for FM-related findings. Integrates with CloudTrail for prompt management audit events. Correlates findings across sources. Configure custom security standards for FM-specific risks (prompt injection attempts, unusual API usage patterns).
3.2 Compliance, Governance & Data Privacy

πŸ“œ Compliance Frameworks & Controls
Regulation | Key Requirement | AWS Controls
GDPR | Data minimization, right to erasure, consent | Macie (PII detection), Guardrails (PII redaction), KMS (encryption), VPC endpoints (data residency)
HIPAA | PHI protection, audit trails, BAA | Bedrock HIPAA eligibility (with BAA), Macie, CloudTrail, dedicated endpoints, encryption
PCI-DSS | Cardholder data protection | Guardrails PII filter (credit card), KMS, VPC, CloudTrail, WAF on API Gateway
SOC 2 | Security, availability, confidentiality | CloudTrail audit, Security Hub, GuardDuty, Access Analyzer
🎯 Data Residency: For regulatory requirements, use: (1) VPC Endpoints to keep data within AWS network, (2) AWS Outposts for on-premises data that can't leave DC, (3) Specific region selection (e.g., eu-west-1 for EU data), (4) S3 Object Lock for retention compliance, (5) Local Zones for specific geographic requirements.

Domain 4: Operational Efficiency & Optimization

Cost optimization, Performance tuning, Caching, Monitoring, Auto-scaling

~15% of Exam Weight
4.1 Cost Optimization for GenAI

πŸ’° Cost Optimization Strategies
Model Selection
  • Use smaller models for simple tasks (cascading)
  • Nova Lite for high-volume β†’ Pro only when needed
  • Measure: cost per task completion (not per token)
  • A/B test model quality vs. cost
Prompt Optimization
  • Shorter prompts = fewer input tokens = lower cost
  • Remove unnecessary context
  • Structured prompts produce shorter outputs
  • Max tokens limit prevents runaway costs
Caching
  • Semantic cache: return cached if cosine similarity > 0.95
  • Response cache: ElastiCache for exact-match queries
  • Embedding cache: avoid re-embedding same documents
  • Up to 90% cost reduction for repetitive queries
Provisioned Throughput
  • 1-month or 6-month commitment
  • Break-even: typically ~70% utilization
  • Use CloudWatch to track PT utilization
  • Only for truly steady, predictable workloads
πŸ’‘ Nova Forge Cost Optimization: Continued pre-training from checkpoints is dramatically cheaper than full retraining. Blending approach reduces catastrophic forgetting, meaning you don't need to retrain as often when adding new domain knowledge. RL with custom reward functions enables efficient post-training.
πŸ’‘ S3 Vectors + Intelligent-Tiering: Automatically moves infrequently accessed vectors to lower-cost tiers. 40-60% storage cost reduction for large vector collections. No performance impact for frequently queried vectors (cached in high-performance tier).
4.2 Performance Optimization & Monitoring

πŸ“Š Key Metrics & Monitoring Architecture
Technical Metrics
  • Inference latency: p50, p90, p95, p99
  • Throughput (tokens/s, requests/s)
  • Error rate by error type
  • Token utilization (input vs. output)
  • Cache hit rate
  • Model invocation count
Business Metrics
  • Cost per inference / per task completion
  • User satisfaction (CSAT, thumbs up/down)
  • Task completion rate
  • Time-to-first-token (UX)
  • Business value per dollar spent
Quality Metrics
  • Hallucination rate (grounding check score)
  • Response relevance (semantic similarity)
  • Guardrail trigger rate (by category)
  • Human review escalation rate
  • Model drift (quality degradation over time)
CloudWatch Dashboard Components:
  • Bedrock invocation metrics (built-in namespace: AWS/Bedrock)
  • Custom metrics: quality scores, cache hit rates, business KPIs (via PutMetricData)
  • Log Insights queries: identify patterns in prompt confusion, slow responses
  • Composite alarms: trigger only when multiple conditions met simultaneously
  • Anomaly detection: ML-based baseline for adaptive alerting
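Custom metrics reach the dashboard via `PutMetricData`. A sketch of the request body for a per-model quality score (metric and dimension names are illustrative, not a prescribed schema):

```python
def quality_metric_payload(score, model_id, namespace="GenAI/Quality"):
    """CloudWatch PutMetricData request for a custom quality score.
    Publish with: boto3.client("cloudwatch").put_metric_data(**payload)"""
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": "ResponseQualityScore",
            "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            "Value": float(score),
            "Unit": "None",   # dimensionless score in [0, 1]
        }],
    }

payload = quality_metric_payload(0.87, "nova-lite")
```

Dimensioning by model ID lets a single alarm or anomaly detector compare quality across models on one dashboard.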
🎯 Bedrock Model Invocation Logging: Enable it to capture full request/response payloads in S3 or CloudWatch Logs. Use for: quality auditing, debugging, fine-tuning data collection, compliance. Configured per Region at the account level. Important: logging adds slight latency β€” consider async delivery to S3 via Firehose for high-volume production.
πŸ“ˆ Auto-Scaling Strategies
Service | Scaling Trigger | Scaling Type | Notes
SageMaker Endpoints | InvocationsPerInstance, CPU utilization, custom metrics | Target tracking or Step scaling | Cooldown periods prevent thrashing; Inference Components allow per-model scaling
Lambda | Concurrent executions (auto) | Automatic, up to account limit | Reserved concurrency for predictability; Provisioned concurrency for cold start elimination
Bedrock Provisioned Throughput | Manual or CloudWatch-triggered scaling | Model Units (MUs) | No auto-scale; plan capacity from usage metrics
OpenSearch | CPU, memory, storage utilization | Horizontal (add data nodes) or vertical | UltraWarm for cost-efficient historical vectors; Auto-Tune for JVM optimization
API Gateway | Throttling limits per stage/route | Usage plans (no auto-scale) | SQS buffer behind API GW for burst handling

Domain 5: Testing, Validation & Troubleshooting

Model evaluation, QA frameworks, Regression testing, Debugging GenAI applications

~10% of Exam Weight
5.1 Model Evaluation & Validation Frameworks

πŸ§ͺ Bedrock Model Evaluation
Automatic Evaluation
  • Metrics: accuracy, robustness, toxicity
  • Uses built-in or custom datasets
  • Comparisons across multiple models
  • Results in S3 and viewable in console
  • ROUGE, METEOR, BERTScore for text quality
Human Evaluation (A/B)
  • Side-by-side model comparison
  • Human raters rank responses
  • Criteria: accuracy, coherence, helpfulness
  • Works with AWS Mechanical Turk or internal teams
  • Statistical significance testing
πŸ’‘ Probabilistic Validation Approach: For deterministic outputs (JSON schemas, specific formats) β†’ use exact match / schema validation. For generative outputs (summaries, answers) β†’ use semantic similarity scoring (cosine similarity > threshold), not exact match. Run N samples, validate distribution of quality scores.
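The "run N samples, validate the distribution" idea can be sketched as a pure gating function. `embed` and `similarity` are injected assumptions (in practice: an embedding model plus cosine similarity); the thresholds are illustrative.

```python
def distribution_pass(samples, reference_vec, embed, similarity,
                      threshold=0.8, min_pass=0.9):
    """Score N sampled generations against a reference embedding and
    pass only if enough of them clear the similarity threshold --
    never exact-match a single stochastic sample."""
    scores = [similarity(embed(s), reference_vec) for s in samples]
    pass_rate = sum(sc >= threshold for sc in scores) / len(scores)
    return pass_rate >= min_pass, pass_rate, scores
```

For deterministic outputs (JSON, fixed formats), skip this entirely and use schema validation or exact match instead.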
πŸ” Testing Frameworks & Strategies
Test Type | What It Tests | Implementation
Functional Testing | Correct outputs for expected inputs | Lambda test harness, expected output comparison
Edge Case Testing | Boundary inputs, empty strings, very long prompts, special characters | Parameterized test suite, automated via Step Functions
Prompt Injection Testing | Resistance to jailbreak/injection attacks | Red-teaming prompts, Guardrail PROMPT_ATTACK filter testing
Regression Testing | New model/prompt version doesn't degrade previous quality | Golden dataset + automated quality comparison, CloudWatch quality metrics
Load Testing | Performance under expected traffic | Lambda concurrent invocations, API GW throttle testing
Hallucination Testing | Factual accuracy, grounding in source docs | Bedrock Grounding Check, RAGAs framework, human spot checks
Bias Testing | Consistent quality across demographic groups | SageMaker Clarify, HELM fairness metrics
🎯 SageMaker AI RL-based Customization: Serverless RL-based fine-tuning (new feature) reduces fine-tuning time from months to days. Enables rapid testing of new prompt architectures and domain-specific model behaviors. Used in QA workflows to quickly validate if a model variant improves target metrics.
πŸ› Troubleshooting GenAI Applications
Poor RAG Quality
  • Check: chunk size vs. query complexity
  • Verify: same embedding model for index + query
  • Inspect: similarity scores (too low = bad embeddings)
  • Review: metadata filters (over-filtering?)
  • Embedding drift: re-embed if model updated
High Latency
  • X-Ray trace: find slow subsegment
  • Cold starts: enable Lambda Provisioned Concurrency
  • Vector search: reduce ef_search, add metadata pre-filter
  • Model: try smaller model or Cross-Region inference
  • Cache hit rate too low: review semantic threshold
Hallucinations
  • Enable Bedrock Grounding Check guardrail
  • Increase retrieved context (more chunks)
  • Add citation requirement to prompt
  • Reduce temperature (0.1-0.3 for factual tasks)
  • Use CoT to expose reasoning
Throttling (429 errors)
  • Check: Bedrock quota limits in Service Quotas
  • Request quota increase via support ticket
  • Implement exponential backoff + jitter
  • Add SQS buffer for burst absorption
  • Use Cross-Region inference for capacity
⚠️ CloudWatch Logs Insights for GenAI Debugging: Query patterns: filter @message like "throttle" | stats count() by @logStream β€” find throttled Lambda functions. Also query for guardrail trigger patterns, slow invocations (filter @duration > 5000), and error message patterns to identify systemic issues.

Deep Dive by Service

Reference architecture, configuration details, and exam tips per AWS service

🟠 Amazon Bedrock β€” Complete Service Reference

πŸ“‘ Bedrock APIs β€” Know Every One
API | Purpose | Key Parameters | Response Type
InvokeModel | Synchronous single inference | modelId, body (model-specific JSON) | Complete response
InvokeModelWithResponseStream | Streaming inference | modelId, body | Event stream (chunk by chunk)
Converse | Unified multi-model API (recommended) | modelId, messages[], system[], inferenceConfig | Complete, unified format
ConverseStream | Unified streaming API | Same as Converse | Event stream, unified format
Retrieve | Knowledge Base vector search only | knowledgeBaseId, retrievalQuery | Retrieved chunks + metadata
RetrieveAndGenerate | RAG: retrieve + generate in one call | knowledgeBaseId, input, retrievalConfig, generationConfig | Generated response + citations
ApplyGuardrail | Test guardrails without model call | guardrailIdentifier, guardrailVersion, source, content | Action (NONE/GUARDRAIL_INTERVENED) + assessments
CreateModelEvaluationJob | Automated model evaluation | evaluationConfig, inferenceConfig, outputDataConfig | Job ARN
🎯 Converse API Advantage: Unified format works across all Bedrock models (Claude, Nova, Titan, Llama, Mistral). One code path for all models. Handles system prompts, multi-turn conversation, tool use, document understanding. Preferred over InvokeModel for new development.
⚑ Provisioned Throughput β€” Detailed Config
What is a Model Unit (MU)?

A Model Unit represents a specific throughput capacity (tokens per minute). Different models have different MU sizes. Purchase 1+ MUs based on peak throughput requirement.

  • 1-month term: lower commitment, higher per-MU cost
  • 6-month term: better rate, more risk
  • No-commitment: available for some models (most expensive)
  • CloudWatch: TokensPerMinute metric for utilization
When to Use Provisioned Throughput
  • Steady traffic (70%+ utilization to break even)
  • Need guaranteed capacity (SLA requirements)
  • Consistent low latency requirements
  • Avoid: spiky/unpredictable traffic (use on-demand)
  • Cross-Region inference: use inference profiles instead

⚑ Quick Reference Cheat Sheet

Critical numbers, decision trees, and patterns for exam day

πŸ”’ Critical Numbers & Thresholds to Memorize

⚑ Retry & Resilience
  • Initial backoff: 100ms
  • Backoff factor: 2x (exponential)
  • Max backoff: 20 seconds
  • Max attempts: 3-5
  • Jitter: Β±100ms or 0.1-0.3 factor
  • Circuit open at: 50% fail over 10 requests
  • Recovery timeout: 30-60 seconds
  • Half-open traffic: 10-20%
πŸ”„ SQS for LLMs
  • Visibility timeout: 5-15 minutes (LLM tasks)
  • DLQ after: 3-5 failed attempts
  • Max message size: 256KB (use S3 pointer for large)
  • Retention: up to 14 days
  • FIFO vs Standard: FIFO for ordered processing
πŸ”Œ API Gateway
  • Request timeout max: 29 seconds (REST API)
  • Account-level throttle: 10,000 RPS default
  • Stage-level: 1,000-5,000 RPS
  • Route-level: 50-500 RPS (complex models)
  • Burst limit: 2-3x steady-state rate
  • Cache TTL: 300s default
πŸ”΄ Bedrock Throttling
  • 429 error: throttled (retriable)
  • 400 error: bad request (NOT retriable)
  • 401/403: auth (NOT retriable)
  • 500/503: service error (retriable)
  • Bedrock timeout: up to 120s for complex models
  • Simple models: 15-30s timeout OK
πŸ“Š Vector Search
  • HNSW M param: 16-64 (connections/node)
  • ef_construction: 100-512 (build quality)
  • ef_search: 100-512 (query quality)
  • Cosine range: -1 to 1 (1 = identical)
  • Semantic cache threshold: cosine > 0.95
  • S3 Vectors pre-filter saves: 50-70% of search space
πŸ”— Lambda + GenAI
  • Connection pool size: 10-20 connections/instance
  • Connection TTL: 60-300 seconds
  • Provisioned concurrency: eliminates cold starts
  • Global SDK client: initialize OUTSIDE the handler
  • LLM task timeout: 5-15 minutes (async)
  • Max Lambda timeout: 15 minutes
πŸ’° Cost Thresholds
  • Provisioned break-even: ~70% utilization
  • S3 Vectors Intelligent-Tiering: 40-60% cost reduction
  • Semantic cache savings: up to 90% for repetitive queries
  • Model cascade target: start with Nova Lite, escalate only if quality < threshold
πŸ“‘ Streaming
  • Buffer size: 5-20 chunks
  • Flush trigger: sentence completion or 100-500ms
  • WebSocket keep-alive: ping every 30-60s
  • WebSocket idle timeout: ~10 minutes
  • SSE reconnect: 1s β†’ backoff β†’ max 30-60s
🧠 Last-Mile Exam Traps

🎯 Why this matters: The official-style practice questions are less about broad GenAI knowledge and more about picking the precise AWS-native answer when multiple options feel plausible. Use this section as your final decision matrix.
Common Trap | Correct Lean | What AWS Is Testing
Need managed failover and performance-aware regional routing | Inference profile | Not just "multi-Region"; choose the Bedrock-native routing construct.
Need general async batch inference for text/image workloads | CreateModelInvocationJob | StartAsyncInvoke is the distractor for Nova Reel video generation.
Need to inspect which knowledge base files failed ingestion | Knowledge base logging to CloudWatch Logs | CloudTrail audits API calls, not document-level ingestion outcomes.
Need to reorder already relevant retrieval results | Reranker models | Hybrid search improves retrieval; rerank improves final ordering.
Need to guarantee every inference call includes a guardrail | IAM condition key bedrock:GuardrailIdentifier | Central enforcement beats custom proxy code.
Need to know which specific guardrail layer intervened | trace: "enabled" + GuardrailPolicyType metrics | Not just whether input/output was blocked, but which policy fired.
Need generation to halt on a phrase | Stop sequences | Prompt instructions are weaker than inference parameters.
Unpredictable traffic with long idle periods | On-demand Bedrock | Provisioned Throughput is usually only right for high steady utilization.
Deterministic workflow with audit and mandatory sequence | Step Functions | Agents/Flows are often distractors when compliance is explicit.
Persistent MCP tool servers | ECS/Fargate | Lambda is attractive, but poor for persistent SSE-style connections.
πŸ’‘ Final Heuristic: If two answers both work, the exam usually wants the option with the least custom infrastructure, the clearest AWS-native fit, and the most direct match to the exact constraint in the question stem.
🌳 Decision Trees β€” What to Use When

πŸš€ Deployment Strategy Decision Tree
Q: What type of traffic pattern?
β”œβ”€β”€ Spiky / unpredictable, low volume
β”‚   └── β†’ Lambda + Bedrock On-Demand (pay per token)
β”œβ”€β”€ Steady, high volume (>70% utilization)
β”‚   └── β†’ Bedrock Provisioned Throughput (hourly commitment)
β”œβ”€β”€ Need custom/open-source model
β”‚   β”œβ”€β”€ Low latency needed β†’ SageMaker Real-time Endpoint
β”‚   β”œβ”€β”€ Intermittent traffic β†’ SageMaker Serverless Inference
β”‚   β”œβ”€β”€ Batch, non-urgent β†’ SageMaker Async Endpoint
β”‚   └── Many models, low per-model traffic β†’ SageMaker MME
β”œβ”€β”€ Model too large for single GPU (>80GB)
β”‚   └── β†’ SageMaker + DeepSpeed/Triton OR EC2 UltraServer
└── On-premises data requirement
    └── β†’ AWS Outposts OR AWS Wavelength (edge)
πŸ” Vector Store Selection Decision Tree
Q: What are your requirements?
β”œβ”€β”€ Billions of vectors, lowest cost
β”‚   └── β†’ Amazon S3 Vectors (new) with Intelligent-Tiering
β”œβ”€β”€ Need keyword + semantic (hybrid) search
β”‚   └── β†’ Amazon OpenSearch Service with Neural Plugin
β”œβ”€β”€ Need vector + relational SQL queries (JOINs, filters)
β”‚   └── β†’ Aurora PostgreSQL + pgvector extension
β”œβ”€β”€ Fully managed RAG (no infra)
β”‚   └── β†’ Amazon Bedrock Knowledge Bases (managed KNN with OpenSearch Serverless)
β”œβ”€β”€ Ultra-low latency + key-value
β”‚   └── β†’ Amazon MemoryDB (Redis-compatible)
└── Enterprise search + AI (hybrid)
    └── β†’ Amazon Kendra (semantic + keyword, pre-built connectors)
πŸ€– Agentic Architecture Decision Tree
Q: What type of agentic workflow?
β”œβ”€β”€ Simple autonomous agent (managed)
β”‚   └── β†’ Amazon Bedrock Agents
β”‚       (+ Knowledge Bases for RAG, + Guardrails for safety)
β”œβ”€β”€ Need full code control + transparency
β”‚   └── β†’ Strands Agents SDK (open-source)
β”‚       + MCP servers (Lambda for simple, ECS for complex tools)
β”œβ”€β”€ Multiple specialized agents collaborating
β”‚   └── β†’ AWS Agent Squad + Strands OR Bedrock Agents multi-agent
β”œβ”€β”€ Deterministic workflow (compliance, audit required)
β”‚   └── β†’ AWS Step Functions state machine
β”‚       (ReAct pattern or sequential with LLM at each step)
β”œβ”€β”€ Complex governance + composability
β”‚   └── β†’ Amazon Bedrock AgentCore (framework-agnostic composable services)
└── Human approval required in workflow
    └── β†’ Step Functions waitForTaskToken (human-in-the-loop)
πŸ”’ Guardrails Decision Tree
Q: What safety requirement do you have?
β”œβ”€β”€ Block harmful content (hate, violence, sexual)
β”‚   └── β†’ ContentPolicyConfig with appropriate inputStrength/outputStrength
β”œβ”€β”€ Block specific topics (competitors, legal advice)
β”‚   └── β†’ TopicPolicyConfig (DENY + plain language description)
β”œβ”€β”€ Detect/redact PII (HIPAA, GDPR)
β”‚   └── β†’ SensitiveInformationPolicyConfig (REDACT or BLOCK per PII type)
β”œβ”€β”€ Block custom words/phrases
β”‚   └── β†’ WordPolicyConfig (custom list + managed profanity)
β”œβ”€β”€ Prevent hallucinations in RAG
β”‚   └── β†’ GroundingPolicyConfig (GROUNDING filter with threshold 0.7-0.9)
β”œβ”€β”€ Prevent prompt injection attacks
β”‚   └── β†’ ContentPolicyConfig with PROMPT_ATTACK filter (HIGH on INPUT)
└── Need to test guardrail before deploying
    └── β†’ Use ApplyGuardrail API independently
βœ‚οΈ Chunking Strategy Decision Tree
Q: What type of document and use case?
β”œβ”€β”€ Uniform, structured content (FAQs, product descriptions)
β”‚   └── β†’ Fixed-size chunking (256-512 tokens, 10-20% overlap)
β”œβ”€β”€ Long documents needing both precision + context
β”‚   └── β†’ Hierarchical chunking (parent=large context, child=small precision)
β”œβ”€β”€ Varied documents, conversational data
β”‚   └── β†’ Semantic chunking (split where similarity drops)
β”œβ”€β”€ Documents with clear headers/sections (PDFs, docs)
β”‚   └── β†’ Document-structure chunking (split at headings)
β”œβ”€β”€ Custom logic required (domain-specific, preprocessing)
β”‚   └── β†’ Lambda custom chunking workflow
└── Using Bedrock Knowledge Bases (managed)
    └── β†’ Choose: Default (300 tokens), Fixed, Hierarchical, or Semantic in KB config
🎯 Top 20 Exam-Day Tips (High-Yield)

  1. Converse API = unified multi-model: Single code path for Claude, Nova, Titan, Llama. Preferred for new development over InvokeModel.
  2. Bedrock Guardrails must be explicitly invoked: Add guardrailConfig to every API call. Not auto-applied. Test with ApplyGuardrail API.
  3. Same embedding model for index AND query: Never mix models. This is a common trap question.
  4. Nova Forge = SageMaker AI: Accessed through SageMaker, NOT directly in Bedrock. Training from checkpoints to prevent catastrophic forgetting.
  5. S3 Vectors = billions of vectors: New service for massive-scale vector storage with Intelligent-Tiering. Pre-filter metadata before vector calculation (50-70% savings).
  6. Provisioned Throughput break-even ~70% utilization: Below that, on-demand is cheaper. Use CloudWatch to track PT utilization before committing.
  7. Step Functions + Bedrock = native integration: No Lambda needed for InvokeModel or RetrieveAndGenerate in Step Functions.
  8. MCP: Lambda = lightweight, ECS = complex: Lambda for stateless tool access (search, calc), ECS for code execution or image processing.
  9. HNSW ef_search tradeoff: Higher ef_search = better recall but slower queries. Tune based on acceptable latency at p99.
  10. pgvector advantage = SQL + vector: When you need relational queries combined with similarity search. Not for billion-scale.
  11. Cross-Region Inference Profiles: Use for distributing load across regions. Automatic failover, no additional cost vs. on-demand tokens.
  12. AgentCore vs. Agents: AgentCore = composable services (Policy, Evaluations, Memory) that work with ANY framework/model. Bedrock Agents = specific managed agent runtime.
  13. Probabilistic validation: FM outputs vary. Use semantic similarity scoring + thresholds (not exact match) for QA. Run N samples, validate distribution.
  14. ReAct in Step Functions: Reason (LLM) β†’ Parse Action (Lambda) β†’ Execute Tool (Lambda/API) β†’ Observe (Pass state) β†’ loop. Max iterations in Choice state.
  15. Hierarchical chunking = parent+child: Child chunks for precise matching, parent chunks returned as context. Built into Bedrock Knowledge Bases.
  16. Grounding Check = hallucination prevention: Bedrock Guardrail feature. Only works when you pass source context. Set threshold 0.7-0.9.
  17. Lambda cold starts: Use Provisioned Concurrency for latency-sensitive paths. Initialize SDK clients OUTSIDE handler function (global scope).
  18. SageMaker Inference Components: Host multiple models on one endpoint with INDEPENDENT scaling policies per model. Different from Multi-Model Endpoints (which share compute).
  19. Model cascading pattern: Route ALL traffic to Nova Lite first. Escalate to Nova Pro only when quality score < threshold. Can save 60-80% of inference costs.
  20. Security Hub + Bedrock: Near-real-time risk analytics for FM deployments. Correlates CloudTrail events with security findings. Configure custom standards for FM-specific risks.
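Tips 1 and 2 combine into one request shape: the Converse API keeps a single code path across model families, and the guardrail is only active if you attach it per call. A minimal sketch (the guardrail ID/version are placeholders, not real resources):

```python
# Sketch of tips 1-2: one Converse request works across Claude, Nova, Titan,
# and Llama; Guardrails must be attached explicitly on every call.

def build_converse_request(model_id: str, user_text: str,
                           guardrail_id: str, guardrail_version: str) -> dict:
    return {
        "modelId": model_id,  # swap model IDs, same code path
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
        # Guardrails are NOT auto-applied: omit this and nothing is filtered.
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    }

# Usage with placeholder IDs:
req = build_converse_request(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",  # or an amazon.nova-* ID
    "Summarize our refund policy.",
    guardrail_id="gr-EXAMPLE", guardrail_version="1",
)
# bedrock_runtime = boto3.client("bedrock-runtime")
# response = bedrock_runtime.converse(**req)
```

Test guardrail behavior in isolation with the `ApplyGuardrail` API before wiring it into the request path.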
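The cascading pattern in tip 19 is just a threshold gate in front of the expensive model. A sketch with hypothetical `invoke` and `score` stand-ins for a Bedrock call and a quality evaluator:

```python
# Sketch of tip 19 (model cascading): send everything to the cheap model,
# escalate only when a quality score falls below threshold.
# invoke() and score() are hypothetical stubs, not Bedrock APIs.

CHEAP, STRONG = "amazon.nova-lite-v1:0", "amazon.nova-pro-v1:0"

def cascade(prompt: str, invoke, score, threshold: float = 0.7):
    """Return (model_id, answer); escalate to the strong model if needed."""
    answer = invoke(CHEAP, prompt)
    if score(prompt, answer) >= threshold:
        return CHEAP, answer  # most traffic stops here -> cost savings
    return STRONG, invoke(STRONG, prompt)

# Toy usage with stubbed functions:
def fake_invoke(model, prompt):
    return f"{model}:{prompt}"

def fake_score(prompt, answer):
    return 0.9 if "easy" in prompt else 0.3

model_used, _ = cascade("easy question", fake_invoke, fake_score)   # stays on Lite
escalated, _ = cascade("hard question", fake_invoke, fake_score)    # goes to Pro
```

The savings claim depends on what fraction of traffic clears the threshold, so log `score` values to CloudWatch and tune the threshold against real traffic.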
πŸ“š

Services Not to Forget (Often Overlooked)

β–Ά
Amazon Kendra

Enterprise search combining keyword (BM25) + semantic. Pre-built connectors (S3, SharePoint, Confluence, Salesforce). FAQ extraction. Relevance tuning. Use when: existing enterprise docs, need zero-config search quality.

Amazon Macie

ML-powered PII detection in S3. Auto-discovers sensitive data. Integrates with Security Hub. Use for: data governance, GDPR compliance, before feeding data to FMs.

AWS AppConfig

Dynamic configuration without redeployment. Use for: routing rules, model selection logic, feature flags, A/B test percentages. Supports gradual rollout with automatic rollback.
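A sketch of how AppConfig-driven routing might look: the JSON schema below is an assumption (AppConfig stores whatever document you define), and in Lambda you would fetch it via the `appconfigdata` session APIs rather than hardcoding it:

```python
# Sketch: model-routing percentages held in AWS AppConfig as JSON, so A/B
# splits change without redeployment. The config schema is an assumption.
import json

CONFIG_JSON = json.dumps({               # what AppConfig would return
    "ab_test": {"nova-pro": 10, "nova-lite": 90}   # percent of traffic
})

def choose_model(config_json: str, roll: float) -> str:
    """roll in [0, 100); pick a model by cumulative traffic percentage."""
    weights = json.loads(config_json)["ab_test"]
    cumulative = 0
    for model, pct in weights.items():
        cumulative += pct
        if roll < cumulative:
            return model
    return model  # fall through to the last entry

# In Lambda, fetch CONFIG_JSON with the appconfigdata client
# (StartConfigurationSession / GetLatestConfiguration) instead of hardcoding.
```

Because the percentages live in AppConfig, a bad rollout can be reverted by the service's automatic rollback rather than a code deploy.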

Amazon Verified Permissions

Fine-grained authorization using Cedar policy language. ABAC (attribute-based) policies. Use for: controlling which users can query which knowledge bases, role-based FM access.

Amazon Bedrock Data Automation

AI-powered pipeline for processing unstructured documents (PDFs, images, audio, video). Extracts structured data automatically. Reduces manual preprocessing for RAG pipelines.

AWS X-Ray

Distributed tracing across Lambda β†’ Bedrock β†’ OpenSearch. Custom segments + annotations (model_name, cost, quality_score). Service map visualization. Filter traces by annotation. Use for latency debugging.

Amazon Comprehend

NLP enrichment for data pipelines. Entity extraction, sentiment, key phrases, PII detection, topic modeling. Use to enrich documents with metadata BEFORE indexing into vector store.
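A sketch of the enrichment step: attach high-confidence Comprehend output as filterable metadata before indexing. The response shapes below mimic `detect_entities`/`detect_sentiment`; in production they come from boto3's `comprehend` client:

```python
# Sketch: enrich a document with Comprehend-style output before indexing
# into the vector store. Responses are stubbed here for illustration.

def enrich_metadata(doc: dict, entities: list, sentiment: str,
                    min_score: float = 0.8) -> dict:
    """Attach high-confidence entities and sentiment as filterable metadata."""
    doc = dict(doc)
    doc["metadata"] = {
        "entities": sorted({e["Text"] for e in entities if e["Score"] >= min_score}),
        "sentiment": sentiment,  # POSITIVE / NEGATIVE / NEUTRAL / MIXED
    }
    return doc

# Stubbed Comprehend-shaped response (real calls: comprehend.detect_entities,
# comprehend.detect_sentiment):
entities = [
    {"Text": "Amazon Bedrock", "Type": "ORGANIZATION", "Score": 0.97},
    {"Text": "maybe-a-date", "Type": "DATE", "Score": 0.41},  # below threshold
]
enriched = enrich_metadata({"text": "source document body"}, entities, "NEUTRAL")
```

The resulting metadata fields then support pre-filtering at query time (e.g. restrict similarity search to documents mentioning a given entity).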

CloudTrail Lake

Query audit events with SQL (Athena-like). Use for: compliance reporting on FM usage, query who invoked which model when, detect unusual access patterns in prompt management.
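A sketch of a typical compliance query: "who invoked which Bedrock model, how often, in the last week." The event data store ID is a placeholder; field names (`eventSource`, `eventName`, `userIdentity`) follow the standard CloudTrail event schema, and the SQL dialect should be checked against CloudTrail Lake's documented constraints:

```python
# Sketch: CloudTrail Lake SQL for auditing Bedrock usage.
# EVENT_DATA_STORE is a placeholder, not a real resource ID.

EVENT_DATA_STORE = "EXAMPLE-EDS-ID"

def bedrock_usage_query(store_id: str) -> str:
    return f"""
    SELECT userIdentity.arn AS caller, eventName, COUNT(*) AS calls
    FROM {store_id}
    WHERE eventSource = 'bedrock.amazonaws.com'
      AND eventTime > DATE_ADD('day', -7, NOW())
    GROUP BY userIdentity.arn, eventName
    ORDER BY calls DESC
    """

sql = bedrock_usage_query(EVENT_DATA_STORE)
# Submit with cloudtrail.start_query(QueryStatement=sql), then poll
# get_query_results with the returned QueryId.
```

The same pattern (filter on `eventSource`, group by identity) extends to detecting unusual access in prompt management: look for callers whose invocation counts spike against their baseline.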