AIP-C01 Exam Blueprint & Study Map
AWS Certified Generative AI Developer – Professional · 65 Questions · 130 minutes · ~$300 USD
Target: Developers with 2+ yrs AWS and 1+ yr hands-on GenAI experience
Domain Weight Distribution & Key Focus Areas
| Domain | Approx Weight | Core Tasks | High-Priority Services |
|---|---|---|---|
| 1. FM Integration, Data & Compliance | ~30% | FM selection, Data pipelines, Vector stores, RAG, Prompt engineering | Bedrock, OpenSearch, SageMaker, S3, Glue |
| 2. Implementation & Integration | ~25% | Agentic AI, Deployment strategies, Enterprise integration, API patterns | Bedrock Agents, Step Functions, Lambda, API GW, ECS |
| 3. AI Safety, Security & Governance | ~20% | Guardrails, IAM, Compliance, Responsible AI, Audit | Bedrock Guardrails, IAM, CloudTrail, Security Hub, KMS |
| 4. Operational Efficiency & Optimization | ~15% | Cost optimization, Performance tuning, Monitoring, Caching | CloudWatch, Cost Explorer, Bedrock Prompt Routing, SageMaker |
| 5. Testing, Validation & Troubleshooting | ~10% | Model evaluation, QA frameworks, Debugging, Regression testing | Bedrock Model Eval, SageMaker Experiments, CloudWatch Logs |
Services You Must Know Cold
Amazon Bedrock Core
- InvokeModel / InvokeModelWithResponseStream
- Converse API (unified multi-model)
- Knowledge Bases (RAG)
- Agents + Action Groups
- Guardrails (content/PII/topic)
- Prompt Management + Flows
- Model Evaluation
- Custom Model Import
- Provisioned Throughput
- Cross-Region Inference
- Intelligent Prompt Routing
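A minimal sketch of a Converse API call, with the request assembled as a plain dict (the model ID, prompt, and parameter values are illustrative; the boto3 call is shown commented so the sketch runs without AWS credentials):

```python
def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble a Converse-shaped request; works for any Converse-capable model."""
    return {
        "modelId": "amazon.nova-lite-v1:0",   # illustrative model ID
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

req = build_converse_request("Summarize our Q3 report in three bullets.")

# With credentials configured, the call would be:
# import boto3
# bedrock = boto3.client("bedrock-runtime")
# resp = bedrock.converse(**req)
# text = resp["output"]["message"]["content"][0]["text"]
```

Because the message shape is model-agnostic, switching models is a one-line change to `modelId`, which is the point of the unified API.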
Amazon Nova Family
- Nova 2 Pro – complex multi-step reasoning
- Nova 2 Lite – cost-effective, high-volume
- Nova 2 Sonic – real-time voice/conversation
- Nova Canvas – image generation
- Nova Reel – video generation
- Nova Forge – custom model training from checkpoints
- Supports system messages, multimodal, streaming
Agentic Stack
- Bedrock Agents (managed)
- Bedrock AgentCore (composable services)
- Strands Agents SDK (open-source)
- AWS Agent Squad (multi-agent)
- MCP (Model Context Protocol)
- Step Functions (ReAct/CoT workflows)
- Lambda (stateless MCP servers)
- ECS (complex MCP servers)
Vector & RAG Stack
- Amazon OpenSearch (neural plugin, HNSW)
- Aurora PostgreSQL + pgvector
- Amazon S3 Vectors (new – billions of vectors)
- Bedrock Knowledge Bases
- Amazon Titan Embeddings (V1/V2/Multimodal)
- Amazon Kendra (keyword + semantic hybrid)
- Bedrock Data Automation
Security & Governance
- Bedrock Guardrails (content filter, PII, topic)
- IAM + resource-based policies
- AWS KMS (data encryption)
- AWS CloudTrail (audit)
- AWS Security Hub (near-real-time risk)
- Amazon Macie (PII in S3)
- VPC Endpoints (PrivateLink)
- SageMaker Model Cards
Deployment & Ops
- SageMaker Real-time, Serverless, Async endpoints
- SageMaker Multi-model / Multi-container endpoints
- SageMaker Inference Components
- EC2 UltraServers (large model inference)
- DeepSpeed / Triton model parallelism
- Nova Forge (custom training via SageMaker)
- CloudWatch (metrics, alarms, dashboards)
- AWS X-Ray (distributed tracing)
Critical "Know the Difference" Decision Points
| Decision | Option A | Option B | Choose When |
|---|---|---|---|
| Bedrock vs SageMaker | Bedrock – managed, pay-per-token | SageMaker – custom containers, full control | Bedrock for standard FMs; SageMaker for custom/open-source models or complex inference pipelines |
| On-demand vs Provisioned Throughput | On-demand – variable traffic, pay-per-use | Provisioned – predictable, dedicated capacity | Provisioned for steady high-volume; On-demand for spiky/low traffic |
| RAG vs Fine-tuning | RAG – dynamic, updatable knowledge | Fine-tuning – baked-in domain knowledge | RAG when data changes frequently; Fine-tuning for style/tone/format adaptation |
| Lambda vs ECS for MCP | Lambda – stateless, lightweight tools | ECS – stateful, complex compute tools | Lambda for simple tool calls; ECS for code execution, image processing |
| OpenSearch vs pgvector vs S3 Vectors | OpenSearch – full-text + vector hybrid | pgvector – relational + vector in RDS | S3 Vectors for billions of vectors, cost-optimized; pgvector when you need SQL joins; OpenSearch for hybrid keyword+semantic |
| Fixed vs Semantic Chunking | Fixed-size – simple, predictable | Semantic – content-aware boundaries | Fixed for uniform content; Semantic for varied documents; Hierarchical for structured docs |
| Nova Pro vs Lite vs Sonic | Pro – complex reasoning tasks | Lite – high-volume, cost-efficient | Sonic for real-time voice/conversation; Lite for batch/simple; Pro for analysis/reasoning |
| Bedrock Agents vs Strands vs Step Functions | Bedrock Agents – fully managed, conversational | Strands – open-source, custom control | Step Functions for deterministic workflows with branching; Bedrock Agents/Strands for autonomous LLM-driven action selection |
Domain 1: Foundation Model Integration, Data Management & Compliance
Tasks 1.1–1.6 · Covers FM selection, data pipelines, vector stores, RAG, and prompt engineering
~30% of Exam Weight
Analyze Requirements & Design GenAI Solutions
Three primary integration approaches based on control/expertise trade-offs:
Amazon Bedrock (Unified API)
- Fully managed, no infrastructure
- Pay-per-use (tokens)
- Quick time-to-market
- ~100 models via single API
- Best for: standard FMs, rapid prototyping
SageMaker (Custom/Control)
- Bring your own model/container
- Fine-grained instance control
- Supports open-source models
- GPU selection (g5, p4d families)
- Best for: custom models, complex inference
AWS AI Factories (On-Prem)
- AWS-managed infra in your DC
- Cloud-like AI in own environment
- For data sovereignty requirements
- Also: AWS Outposts for hybrid
AWS Well-Architected for GenAI (6 Pillars applied)
| Pillar | GenAI-Specific Consideration |
|---|---|
| Operational Excellence | Automated model retraining, baseline behavior metrics, self-healing capabilities |
| Security | IAM for model access, VPC endpoints, KMS encryption, Guardrails for output safety |
| Reliability | Multi-AZ by default, cross-Region inference for HA, circuit breakers, fallback models |
| Performance Efficiency | Right-sizing (Lambda vs. Provisioned), caching embeddings/responses, batch inference |
| Cost Optimization | On-demand vs. provisioned throughput, model cascading (cheap→expensive), prompt caching |
| Sustainability | Model distillation, parameter-efficient fine-tuning (PEFT), smaller specialized models |
PoC → Production Transition Framework
- Define use case scope with success criteria and ROI metrics
- Select FM: benchmark on custom eval set, not just public leaderboards
- Build PoC using Bedrock + Lambda + simple front-end
- Validate with stakeholders: accuracy, latency, cost per inference
- Harden for production: add Guardrails, monitoring, error handling
- Phased rollout: pilot → limited release → full production
AI Center of Excellence (CoE)
- Central governance and best practices
- Pattern library and code templates
- Model governance committee
- Standardized onboarding process
- Cross-functional team structure
Production Monitoring Framework
- Technical: inference latency (p50/p95/p99), throughput, error rates
- Business: cost per inference, user satisfaction, task completion
- Quality: accuracy, consistency, hallucination rate
- CloudWatch dashboards + automated alerts
Select & Configure Foundation Models
General Benchmarks
- MMLU – 57 subjects, knowledge breadth
- HELM – 42+ models, multidimensional (fairness, bias, toxicity)
- BigBench – 204+ diverse tasks, capability boundaries
- BIG-Bench Hard – complex multi-step reasoning
- GLUE/SuperGLUE – language understanding
Task-Specific
- HumanEval+ / MBPP+ – code generation
- GSM8K / MATH – mathematical reasoning
- MT-Bench – multi-turn conversation (GPT-4 as judge)
- MedPaLM – medical domain
- FinanceBench – financial analysis + compliance
Multimodal
- MMMLU – text/image/audio/video
- MME – fine-grained perception vs. reasoning
- MMMU – professional multimodal tasks
- LMSys Chatbot Arena – human preference (Elo ratings)
| Strategy | How It Works | Best For | AWS Implementation |
|---|---|---|---|
| Static Routing | Predetermined rules (department, content type, user role) | Simple, predictable workloads | Lambda + JSON routing config, AppConfig feature flags |
| Dynamic / Intelligent Routing | Runtime analysis of prompt complexity, content type, cost/quality | Mixed workloads needing optimization | Bedrock Intelligent Prompt Routing |
| Content-Based Routing | Step Functions Choice states evaluate input characteristics | Specialized models per domain | Step Functions + Lambda classifier |
| Model Cascading | Start cheap (Nova Lite) → escalate to Pro only if quality < threshold | Cost optimization with quality floor | Lambda confidence scoring + escalation logic |
| Cross-Region Inference | Distribute requests across AWS regions | Throughput scaling, HA, latency optimization | Bedrock Cross-Region Inference Profiles |
Nova Model Routing Tiers
Nova 2 Pro
High-complexity: multi-step reasoning, detailed analysis, document understanding. Highest cost/quality.
Nova 2 Lite
Medium complexity: standard generation, high-volume processing. Best cost/performance ratio for most workloads.
Nova 2 Sonic
Real-time conversational AI with lowest latency. Optimized for voice applications and streaming dialogue.
- Analyzes: prompt length, complexity, content type, performance requirements
- Evaluates: latency requirements, cost limits, quality thresholds
- Routes between: Claude, Nova, Titan, and other Bedrock models
- Learns from performance data over time (improves routing accuracy)
- Maintains consistent response formats across models
- Configure via Bedrock API – minimal code changes required
Circuit Breaker Pattern
- Monitor error rate over N requests
- Threshold: ~50% failures → open circuit
- Recovery timeout: 30-60 seconds
- Half-open: test 10-20% traffic
- Implement with: Step Functions + CloudWatch alarms
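The thresholds above can be sketched as a small in-process breaker (class name and window handling are illustrative, not a library API):

```python
import time

class CircuitBreaker:
    """Opens after >=50% failures over a rolling window; half-opens after a timeout."""

    def __init__(self, window=10, failure_ratio=0.5, recovery_timeout=30.0):
        self.window, self.failure_ratio = window, failure_ratio
        self.recovery_timeout = recovery_timeout
        self.results = []          # rolling window of booleans (True = success)
        self.opened_at = None

    def record(self, success: bool):
        self.results = (self.results + [success])[-self.window:]
        failures = self.results.count(False)
        if len(self.results) == self.window and failures / self.window >= self.failure_ratio:
            self.opened_at = time.monotonic()   # open the circuit

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the recovery timeout, half-open: let test traffic through
        return time.monotonic() - self.opened_at >= self.recovery_timeout

cb = CircuitBreaker()
for ok in [True, False, False, True, False, False, False, True, False, False]:
    cb.record(ok)
# 7 failures out of 10 opens the circuit; requests are blocked until the timeout
```

In production the same logic is typically externalized: CloudWatch alarms track the error rate and a Step Functions Choice state routes around the failing model.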
Fallback Hierarchy
- Primary model → Smaller model
- Smaller model → Cached response
- Cached → Static response
- Each level has quality threshold check
- Log all fallback events for analysis
Cross-Region HA
- Active-active or active-passive
- Route 53 health checks (30s interval, 3 failures)
- Cross-region inference profiles in Bedrock
- DynamoDB Global Tables for state sync
- S3 Cross-Region Replication for assets
Implement Data Validation & Processing Pipelines
Data quality impacts FMs through three channels: prompts, retrieved information (RAG), and fine-tuning datasets.
| Tool | Role in Pipeline | Key Capability |
|---|---|---|
| AWS Glue | ETL, Data Catalog, Crawlers | Schema detection, data cataloging, validation workflows, PySpark transforms |
| SageMaker Data Wrangler | Data exploration & transformation UI | 300+ built-in transforms, data quality reports, bias detection |
| SageMaker Processing Jobs | Large-scale data processing | Pre-built Scikit-learn/Spark containers, feature engineering, evaluation |
| AWS Lambda | Custom validation logic, real-time checks | Schema validation, type checks, range validation, normalization |
| Step Functions | Pipeline orchestration with quality gates | Error handling, retries, parallel processing, feedback loops |
| Amazon Comprehend | NLP enrichment | Entity extraction, sentiment, PII detection for data enhancement |
| Bedrock Data Automation | Unstructured data processing | Auto-cleansing, tokenization, formatting for training/RAG data |
| CloudWatch | Data quality monitoring | Custom metrics for data drift, quality scores, anomaly detection |
Each FM has a specific JSON schema. The Converse API provides a unified interface.
Claude / Nova (messages format):
Amazon Titan format:
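The two native request shapes named above can be sketched as Python dicts (field values are illustrative; always check the model's current request schema):

```python
import json

# Claude-style messages body (Anthropic models on Bedrock). Nova models
# also take a messages list, with their own field names for config.
claude_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Classify this support ticket: ..."}],
}

# Amazon Titan text body uses a flat inputText + config object instead:
titan_body = {
    "inputText": "Classify this support ticket: ...",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5, "topP": 0.9},
}

# Either body is JSON-serialized into InvokeModel's `body` parameter:
payload = json.dumps(claude_body)
```

The Converse API hides exactly this divergence, which is why it is preferred for multi-model applications.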
HTTP Error Codes:
- 400 – Bad Request (invalid JSON, missing fields)
- 401/403 – Auth/permission issues (non-retriable)
- 429 – Throttling (retriable with backoff)
- 500/503 – Service errors (retriable)
Multimodal Input (image in messages):
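A sketch of the image-in-messages shape using Anthropic-style content blocks (the image bytes here are a placeholder, not a valid PNG):

```python
import base64

image_bytes = b"\x89PNG..."  # placeholder; in practice read from S3 or disk
b64 = base64.b64encode(image_bytes).decode("utf-8")

# An image block and a text block in one user message:
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "image",
         "source": {"type": "base64", "media_type": "image/png", "data": b64}},
        {"type": "text", "text": "Describe the chart in this image."},
    ],
}
```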
Text Processing
- Amazon Comprehend: entities, sentiment, PII
- AWS Glue: ETL, normalization
- Lambda: custom cleaning, tokenization
- Bedrock Data Automation: AI-powered prep
Image Processing
- Amazon Rekognition: object detection, labels
- Bedrock Nova Canvas/Titan Image
- Base64 encoding for Bedrock API
- S3 + Lambda trigger pipeline
Audio/Video
- Amazon Transcribe: speech-to-text
- Cross-modal alignment (sync audio/video)
- Nova Reel: video generation
- Nova Sonic: real-time audio conversation
Design & Implement Vector Store Solutions
Distance Metrics – Know All Three:
| Metric | Formula Concept | Best For | Notes |
|---|---|---|---|
| Cosine Similarity | Angle between vectors (direction only) | Text embeddings, docs of different lengths | Range: -1 to 1; ignores magnitude; most common for NLP |
| Euclidean Distance | Straight-line distance in vector space | When magnitude matters, dense embeddings | Sensitive to dimensionality; lower = more similar |
| Dot Product | Magnitude + direction combined | When content volume is relevant | Can favor longer documents; efficient compute |
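The three metrics can be written out in a few lines of plain Python:

```python
import math

def dot(a, b):
    """Dot product: magnitude and direction combined."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle only: normalize the dot product by both magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance; lower = more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
# b points in the same direction as a, so cosine similarity is 1.0
# even though the magnitudes differ; this is why cosine suits documents
# of different lengths.
```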
AWS Vector Store Options:
| Service | Index Type | Hybrid Search | Scale | Best Use Case |
|---|---|---|---|---|
| OpenSearch Neural | HNSW or IVF | ✓ Keyword + Vector | Large to very large | Full-text + semantic search, enterprise search |
| Aurora pgvector | IVFFlat, HNSW | ✓ SQL + Vector | Medium | Need relational queries + similarity (e.g., filter by user_id then similarity) |
| S3 Vectors | Native S3 distributed | ✗ Vector only | Billions of vectors | Cost-optimized large-scale vector storage |
| Bedrock Knowledge Bases | Managed (OSS backend) | ✓ Managed hybrid | Enterprise | Managed RAG – no infra management |
| Amazon MemoryDB | Redis-compatible | ✗ | Medium | Ultra-low latency vector + key-value |
Hierarchical Navigable Small World (HNSW) is the primary index type for vector search in OpenSearch:
Index Construction Parameters
- M: Max connections per node – higher M = better recall but more memory (typical: 16-64)
- ef_construction: Search width during build – higher = better quality, slower indexing (typical: 100-512)
- max_connections: Upper limit on node connections
Search Parameters
- ef_search: Search width during query – higher = better recall, slower (typical: 100-512)
- num_candidates: Candidates to evaluate
- rescore: Enable for improved accuracy
- Performance: p50/p95/p99 latency + recall@k
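As a concrete sketch, an OpenSearch k-NN index that wires these parameters together might look like this (field names follow the OpenSearch k-NN plugin; the dimension, m, and ef values are illustrative):

```python
# Index body for an HNSW-backed vector field; would be passed to the
# OpenSearch client's indices.create(index=..., body=index_body).
index_body = {
    "settings": {
        "index": {
            "knn": True,
            "knn.algo_param.ef_search": 256,   # query-time search width
        }
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,             # must match the embedding model
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"m": 32, "ef_construction": 256},
                },
            }
        }
    },
}
```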
4-Stage Hierarchical Search Pipeline:
- Coarse filtering: Apply metadata filters, document clustering, semantic routing to relevant partitions
- Approximate ANN search: Fast approximate nearest neighbor, retrieve larger candidate set
- Fine-grained ranking: Precise cosine scores, business logic weighting, diversity algorithms
- Result assembly: Retrieve full content + metadata, final formatting, relevance explanations
Event-Driven Updates
- S3 event → Lambda → re-embed → upsert
- DynamoDB Streams → update pipeline
- Near real-time freshness
- Best for: frequently changing docs
Batch Sync
- Scheduled Glue jobs or Step Functions
- Delta detection (last-modified timestamps)
- Cost-efficient for bulk updates
- Best for: large corpora, nightly updates
Hybrid Approach
- Real-time for high-priority content
- Batch for bulk/archival content
- Drift monitoring with CloudWatch
- Version control for knowledge bases
S3 Metadata Framework for RAG Enhancement:
System-Defined Metadata
- Content-Type, Content-Length
- Last-Modified timestamp
- ETag (content fingerprint)
- x-amz-version-id
User-Defined Metadata (x-amz-meta-*)
- document-author, department, category
- expiry-date, version, language
- security-classification, jurisdiction
- Enables pre-filtering before vector search
Design Retrieval Mechanisms for FM Augmentation (RAG)
| Strategy | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| Fixed-Size | Split every N tokens (e.g., 512) with optional overlap (e.g., 50 tokens) | Simple, predictable, consistent embeddings | May break semantic units | Uniform content (FAQs, reports) |
| Recursive Character | Try splitting on paragraphs → sentences → words → chars | Preserves natural boundaries better | Variable chunk sizes | General-purpose documents |
| Semantic | Split where embedding similarity drops below threshold | Content-aware, preserves meaning | Slower, requires embedding during chunking | Varied documents, conversational content |
| Hierarchical | Parent chunks (large context) + child chunks (precise retrieval) | Best of both worlds: precision + context | More complex, higher storage cost | Long documents needing both broad and specific retrieval |
| Document-Structure | Use headers, sections, paragraphs as boundaries | Preserves logical document structure | Requires structured input | PDFs, Word docs, HTML with clear structure |
- Overlap: 10-20% of chunk size to preserve cross-boundary context
- Include metadata in chunk (source, page, section) for better retrieval context
- Measure: chunk cohesion (intra-chunk cosine similarity), retrievability metrics
- Bedrock Knowledge Bases offers: fixed-size, semantic, and hierarchical chunking built-in
- Custom chunking: Lambda function for complex logic (hierarchical workflows)
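A minimal fixed-size chunker with overlap, treating whitespace-separated words as tokens (a simplification; production code would count tokens with the embedding model's tokenizer):

```python
def chunk_fixed(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into chunks of `chunk_size` words, each sharing `overlap`
    words with the previous chunk to preserve cross-boundary context."""
    words = text.split()
    step = max(chunk_size - overlap, 1)   # guard against overlap >= chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                         # last chunk reached the end
    return chunks

text = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_fixed(text, chunk_size=512, overlap=64)
# 1000 words with step 448 -> chunks starting at words 0, 448, 896
```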
Titan Text Embeddings V2
- Dimensions: 256, 512, or 1024 (configurable)
- Supports normalization (for cosine)
- English + multilingual support
- Best for: text-only semantic search
Titan Multimodal Embeddings G1
- Embeds both text AND images in same space
- Cross-modal similarity search
- Dimension: 1024
- Best for: product search, media retrieval
Embedding Selection Criteria
- Match dimensionality to quality/cost need
- Use SAME model for indexing AND querying
- Consider: throughput, cost per 1K tokens
- Cohere Embed for multilingual enterprise
Query Enhancement Techniques:
Query Expansion
- Use LLM to generate synonyms/related terms
- HyDE: generate hypothetical answer, embed it, search for similar docs
- Multi-query: generate N variations → union results
- Domain-specific expansion (medical/legal terms)
Query Decomposition
- Break complex queries into sub-queries
- Identify: temporal, entity, constraint components
- Run sub-queries in parallel (Lambda)
- Aggregate + deduplicate results
- Use Step Functions for orchestration
Re-ranking
- First-pass: fast ANN retrieval (top-k)
- Re-rank with cross-encoder model
- Apply business logic weighting
- Diversity algorithms (avoid result clustering)
- Amazon Kendra: hybrid keyword + semantic
Implement Prompt Engineering Strategies & Governance
| Technique | Description | AWS Implementation | Best For |
|---|---|---|---|
| Chain-of-Thought (CoT) | "Think step by step" – forces intermediate reasoning before answer | System message + prompt structure; Step Functions for multi-step | Math, logic, complex analysis |
| ReAct (Reason+Act) | Interleaved Reasoning-Action-Observation loop | Step Functions state machine (Reason state → Action state → Observe state) | Agentic tasks needing tool use |
| Few-Shot | Provide 3-5 examples in prompt | Bedrock Prompt Management templates with examples | Classification, format adherence |
| Tree of Thought | Explore multiple reasoning branches in parallel | Step Functions Parallel states + aggregation Lambda | Complex multi-path problems |
| Self-Consistency | Sample N responses, majority vote | Lambda to invoke model N times + aggregation | Factual accuracy, reducing hallucination |
| Prompt Chaining | Output of prompt A feeds prompt B | Bedrock Flows (visual) or Step Functions | Multi-stage document processing |
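Self-consistency from the table can be sketched in a few lines; `sample_model` stands in for N Bedrock invocations at temperature > 0:

```python
from collections import Counter

def self_consistency(sample_model, prompt: str, n: int = 5) -> str:
    """Sample the model n times and return the majority answer."""
    answers = [sample_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stubbed samples simulating five model calls with slight disagreement:
fake_samples = iter(["42", "42", "41", "42", "40"])
answer = self_consistency(lambda p: next(fake_samples), "What is 6*7?")
# majority answer is "42"
```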
Content Filters
- Categories: Hate, Insults, Sexual, Violence, Misconduct, Prompt Attack
- Severity levels: LOW, MEDIUM, HIGH
- Applies to: INPUT and/or OUTPUT
- Custom threshold per category
Topic Denial
- Define forbidden topics with plain language
- Examples: competitor products, legal advice, medical diagnoses
- LLM-based classification (no regex)
- Returns custom denial message
PII Redaction
- 50+ PII types: SSN, credit card, email, phone, name, address
- Modes: REDACT (replace with type) or BLOCK
- Applies to both input and output
- Audit-ready with CloudTrail logging
Grounding Check
- Detects hallucinations vs. source documents
- Checks if output is grounded in retrieved context
- Relevance scoring threshold configurable
- Essential for RAG pipelines
Word Filters
- Custom blocked word lists
- Managed lists (profanity)
- Applied post-generation
Prompt Injection Defense
- PROMPT_ATTACK filter category in content filter
- Detects jailbreak attempts, role-play attacks
- System prompt separation (protected)
- Input validation in Lambda pre-Bedrock call
Bedrock Prompt Management Features:
- Centralized repository: Store prompt templates with versions
- Parameterization: Variables in templates ({{input}}, {{context}})
- Version control: Draft → Review → Approved → Production
- Approval workflows: Governance gates before deployment
- A/B testing: Route % traffic to different prompt versions
- Analytics: Track performance per prompt version
Governance Architecture:
- CloudTrail: All prompt management API calls logged
- IAM policies: Role-based access to prompt versions
- Security Hub: Near-real-time risk analytics for FM deployments
- Centralized vs. Federated: Central policy + distributed implementation
- Async monitoring: Don't impact latency with sync governance checks
Domain 2: Implementation & Integration
Tasks 2.1–2.5 · Agentic AI, Deployment Strategies, Enterprise Integration, API Patterns, Dev Tools
~25% of Exam Weight
Implement Agentic AI Solutions & Tool Integrations
| Technology | Type | Key Characteristics | When to Use |
|---|---|---|---|
| Amazon Bedrock Agents | Fully Managed | Built-in orchestration, action groups, knowledge bases, memory, Guardrails integration | Standard agentic workflows, minimal infra management, conversational agents |
| Bedrock AgentCore | Composable Services | Framework-agnostic (works with any SDK/model), AgentCore Policy (governance), AgentCore Evaluations, episodic memory for enhanced context | Complex agents needing fine-grained composability, multi-framework environments |
| Strands Agents SDK | Open-Source | Full code visibility, modular (swap components), built-in eval, MCP integration, @tool decorator | Custom agent logic, need transparency/control, contributing to open-source |
| AWS Agent Squad | Multi-Agent Orchestration | Coordinates multiple specialized agents, shared context/state, task delegation | Complex tasks requiring collaboration between specialized agents |
| Step Functions (ReAct) | Workflow Engine | Deterministic state machines, guaranteed execution, built-in error handling, human approval steps | Predictable workflows needing audit trail, human-in-the-loop, compliance |
MCP is a standardized protocol for agent-tool interactions. Agents discover tools, invoke them, and get results via MCP servers.
MCP Transport Protocols
- stdio: Local process communication (dev/local)
- SSE: Server-Sent Events (streaming, HTTP)
- streamable-http: For AWS deployments (Mcp-Session-Id header for isolation)
MCP Server Hosting Options
- Lambda: Stateless, lightweight tools (web search, calculations, data retrieval)
- ECS: Stateful, complex tools (code execution, image processing, large compute)
- API Gateway: Expose MCP-compatible endpoints for existing services
6-Step MCP Workflow:
- MCP Client Initialization: Agent app connects to MCP server via transport protocol
- Tool Discovery: Agent calls list_tools() → gets name, description, input schema for each tool
- Agent Creation: Agent created with discovered tools; LLM can now see tools in system prompt
- Reasoning & Tool Selection: LLM analyzes user query, decides which tool to call and with what arguments
- MCP Server Execution: Server executes tool function, returns result to agent (server is stateless)
- Final Response: Agent synthesizes tool results into coherent response to user
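The discovery and execution steps can be illustrated with a toy in-process server (this is not the MCP wire protocol; tool names and schemas are invented for the example):

```python
# Registry standing in for what an MCP server would expose.
TOOLS = {
    "web_search": {"description": "Search the web", "input": {"query": "string"}},
    "calculator": {"description": "Evaluate arithmetic", "input": {"expression": "string"}},
}

def list_tools():
    """Step 2: what the server returns so the LLM can see available tools."""
    return [{"name": name, **meta} for name, meta in TOOLS.items()]

def call_tool(name: str, args: dict):
    """Step 5: stateless execution on the server side."""
    if name == "calculator":
        # Demo only: never eval untrusted input in production.
        return eval(args["expression"], {"__builtins__": {}})
    raise ValueError(f"unknown tool: {name}")

tools = list_tools()                                   # discovery
result = call_tool("calculator", {"expression": "6 * 7"})  # execution
```

In a real deployment `list_tools` and `call_tool` would be MCP requests over stdio or streamable-http to a Lambda- or ECS-hosted server.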
Stopping Conditions
- Step Functions: max iteration count in Choice state
- Lambda: timeout settings (predictable execution)
- CloudWatch alarms: auto-halt on error rate threshold
- Circuit breaker: 50% failure → open circuit 30-60s
IAM Boundaries for Agents
- Least-privilege resource policies
- Restrict agent to only necessary actions/resources
- Deny any unneeded service calls
- Session policies for temporary credentials
Human-in-the-Loop
- Step Functions Human Task state (wait for token)
- API Gateway: collect human feedback
- DynamoDB: store review decisions with TTL
- Escalation criteria based on confidence scores
Input Validation
- Schema validation before agent processing
- Lambda pre-processing for malformed inputs
- Bedrock Guardrails: prompt injection detection
- Rate limiting via API Gateway usage plans
Reason state (invoke LLM) → Parse Action state (Lambda extracts tool call) → Execute Action state (call tool) → Observe state (feed result back to LLM) → repeat until final answer or max steps reached.
Ensemble / Aggregation
- Multiple agents/models on same task
- Majority voting for classification
- Weighted averaging for numeric outputs
- Ranked fusion for retrieval
- Lambda aggregation logic
Specialized Routing
- Agent Squad: route to specialized agent
- Claude – complex reasoning tasks
- Nova Pro – document analysis
- Nova Lite – simple/high-volume tasks
- Domain-specific agents (medical, legal)
Hierarchical Agents
- Orchestrator agent decomposes task
- Sub-agents handle specific components
- Results aggregated by orchestrator
- Step Functions manages coordination
- DynamoDB shares state between agents
Implement Model Deployment Strategies
| Strategy | Service | Traffic Pattern | Latency | Cost Model | Key Config |
|---|---|---|---|---|---|
| On-Demand Serverless | Lambda + Bedrock | Spiky, unpredictable | Variable (cold start risk) | Pay per invocation | Memory, timeout, concurrency limits |
| Bedrock On-Demand | Bedrock InvokeModel | Any | Low-medium | Pay per token | Model ID, throttling limits |
| Bedrock Provisioned Throughput | Bedrock PT | Steady, high-volume | Consistent, low | Per-hour commitment (1mo/6mo) | Model Units (MUs), CloudWatch monitoring |
| SageMaker Real-time | SageMaker Endpoints | Consistent, latency-sensitive | Low (<1s) | Instance hours + data | Instance type, auto-scaling policy |
| SageMaker Serverless | SageMaker Serverless | Intermittent | Medium (cold start) | Pay per request | Memory size, max concurrency |
| SageMaker Async | SageMaker Async Endpoints | Batch, non-latency-sensitive | Minutes | Instance hours (scale-to-zero) | S3 input/output, max concurrency |
| Multi-Model Endpoint | SageMaker MME | Many models, low per-model traffic | Variable (model loading) | Shared instance across models | Container + model artifacts, routing |
Memory Management
- LLMs can be 10s-100s of GB
- SageMaker: up to 500GB model size
- GPU instances: ml.g5, ml.p4d.24xlarge (for large models)
- CPU for small NER/classification: ml.c5.9xlarge
- Container health check timeout: up to 60 min
Model Parallelism
- DeepSpeed: tensor/pipeline parallelism
- Triton + FasterTransformer: optimized inference
- SageMaker Distributed Inference
- UltraServers: multi-EC2 instances with low-latency interconnect
- For models larger than single GPU memory
Token Processing Optimization
- Batching: group requests to maximize GPU utilization
- Continuous batching: process tokens as they arrive
- KV-cache: reuse attention computations
- Quantization (INT8/INT4): reduce model size
- Knowledge distillation: train smaller model from large
SageMaker Endpoint Types Comparison:
Inference Components (New)
- Host multiple models on single endpoint
- Define separate scaling policies per model
- Control memory/CPU allocation per component
- Scale each model independently based on traffic
- Best for: multi-model serving with different traffic patterns
Serial Inference Pipelines
- Chain multiple models in sequence
- Output of model N → input of model N+1
- E.g.: preprocessing model → LLM → postprocessing
- Single endpoint for the pipeline
- Best for: fixed multi-step inference workflows
Model Cascading Architecture:
- Route all requests to smallest/cheapest model first (Nova Lite)
- Evaluate response quality with confidence scoring Lambda
- If quality < threshold (e.g., 0.7-0.9), escalate to Nova Pro
- Cache high-quality responses for similar future queries
- Monitor cascade metrics: escalation rate, cost savings, quality distribution
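A sketch of the cascade logic, with the two models and the confidence scorer stubbed out (in practice these would be Bedrock calls and a scoring Lambda):

```python
def cascade(prompt, cheap_model, strong_model, score, threshold=0.8):
    """Try the cheap model first; escalate when confidence is below threshold."""
    draft = cheap_model(prompt)
    if score(draft) >= threshold:
        return draft, "lite"          # quality floor met, stay cheap
    return strong_model(prompt), "pro"  # escalate to the stronger model

answer, tier = cascade(
    "Summarize the contract clause.",
    cheap_model=lambda p: "short summary",
    strong_model=lambda p: "detailed summary",
    score=lambda resp: 0.6,           # below threshold, so this escalates
)
```

The escalation rate (share of requests reaching "pro") is the key metric: too high means the cheap model is mismatched, too low means the threshold may be wasting quality headroom.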
Caching Strategies
- Response caching: ElastiCache/DynamoDB for identical/near-identical queries
- Embedding caching: Avoid re-embedding same content
- Semantic caching: Return cached if query vector is close enough (similarity threshold)
- API Gateway cache: 300s default TTL for GET requests
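A semantic cache can be sketched as a cosine-similarity lookup over stored query embeddings (the embeddings here are precomputed stand-ins for Titan Embeddings calls):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class SemanticCache:
    """Return a cached response when a query vector is close enough to a stored one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []                      # list of (embedding, response)

    def lookup(self, query_vec):
        for vec, response in self.entries:
            if cosine(query_vec, vec) >= self.threshold:
                return response                # semantic hit
        return None                            # miss: invoke the model

    def store(self, query_vec, response):
        self.entries.append((query_vec, response))

cache = SemanticCache()
cache.store([1.0, 0.0], "cached answer")
hit = cache.lookup([0.99, 0.05])    # nearly parallel vector: cache hit
miss = cache.lookup([0.0, 1.0])     # orthogonal vector: miss
```

A production version would back `entries` with ElastiCache or a vector index rather than a linear scan.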
Asynchronous Inference Pattern
- SQS queue → Lambda → SageMaker Async Endpoint
- Results stored in S3, notification via SNS
- Scale to zero when no traffic
- SQS visibility timeout matches processing duration (5-15 min for LLMs)
- DLQ after 3-5 failed attempts
Design & Implement Enterprise Integration Architectures
API-Based Integration
- API Gateway: REST/HTTP/WebSocket APIs
- Custom domain mappings for branding
- Regional (low-latency) vs Edge-optimized (global)
- Lambda integration for custom logic
- Usage plans + throttling per API key
Event-Driven Integration
- EventBridge: route business events to FM processing
- Pattern matching: select which events need GenAI
- SQS DLQ: handle failed event processing
- EventBridge Pipes: source → filter → enrich → target
- Loose coupling between systems
Hybrid/On-Premises
- AWS Outposts: run FM inference in your DC
- AWS Wavelength: edge deployments for ultra-low latency
- Local Zones: geographic compliance
- Direct Connect: dedicated network to AWS
- Site-to-Site VPN: encrypted connectivity
| Security Layer | Service/Pattern | Implementation Detail |
|---|---|---|
| Identity Federation | IAM Identity Center / Cognito | Attribute mapping from IdP, role assignment per user group |
| Fine-grained Access | Amazon Verified Permissions | Cedar policy language, attribute-based (ABAC) policies on resources |
| Network Isolation | VPC Endpoints (PrivateLink) | Private connectivity to Bedrock without internet; security groups + NACLs |
| Encryption in Transit | ACM + TLS 1.2+ | All API calls to Bedrock are TLS encrypted by default |
| Encryption at Rest | AWS KMS | Customer-managed keys (CMK) for model artifacts, prompt logs, knowledge bases |
| Audit Logging | CloudTrail + CloudWatch Logs | Log all FM API calls with request/response for compliance |
CI/CD Pipeline (CodePipeline + CodeBuild)
- Source: CodeCommit/GitHub trigger
- Build: CodeBuild – package Lambda, validate prompts, dependency scan
- Test: Automated FM behavior tests (deterministic + probabilistic)
- Security scan: SAST/DAST, dependency vulnerabilities
- Staging deploy: limited traffic rollout
- Approval gate: human review or automated quality check
- Production deploy: blue/green or canary
- Post-deploy: CloudWatch alarms, rollback trigger
GenAI Gateway Pattern
- Centralized entry point for all FM access
- API Gateway → Lambda Gateway → Bedrock/SageMaker
- Enforces: auth, rate limiting, logging, cost tracking
- Model routing logic centralized here
- X-Ray tracing across all hops
- Cost allocation by team/use-case via tags
- Supports: A/B testing, gradual rollout
Implement FM API Integrations
Bedrock Streaming API
- InvokeModelWithResponseStream
- Returns chunks as they're generated
- Buffer management: 5-20 chunks, flush on sentence completion
- Client-side progressive rendering
- Error recovery: fallback to full-response API if streaming fails persistently
WebSocket / SSE Patterns
- WebSocket: bidirectional, keep-alive ping every 30-60s
- Idle timeout: ~10 min for interactive sessions
- SSE: reconnection backoff from 1s to max 30-60s
- Event IDs: resume streams after disconnection
- API Gateway: chunked transfer encoding
Retry Configuration
- Initial backoff: 100ms
- Backoff factor: 2x (exponential)
- Max backoff: 20 seconds
- Max attempts: 3-5
- Jitter: ±100ms or factor 0.1-0.3
- Retriable: 429, 500, 503
- Non-retriable: 400, 401, 403
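The schedule above maps directly to a couple of helper functions; this sketch uses the exact numbers from the list (100ms initial, 2x factor, 20s cap, 429/500/503 retriable):

```python
# Sketch of the retry schedule above: exponential backoff with a cap,
# jitter in the 0.1-0.3 factor range, and a retriable-status check.

import random

INITIAL_MS = 100
FACTOR = 2
MAX_MS = 20_000
MAX_ATTEMPTS = 5
RETRIABLE = {429, 500, 503}   # 400/401/403 are never retried

def backoff_ms(attempt: int, jitter_factor: float = 0.2) -> float:
    """Delay before retry `attempt` (0-indexed), with +/- jitter applied."""
    base = min(INITIAL_MS * FACTOR ** attempt, MAX_MS)
    jitter = base * jitter_factor * (2 * random.random() - 1)
    return base + jitter

def should_retry(status: int, attempt: int) -> bool:
    return status in RETRIABLE and attempt < MAX_ATTEMPTS
```

Note that the jitter term makes concurrent clients desynchronize their retries, which is the whole point of adding it.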
Circuit Breaker
- Failure threshold: 50% over 10 requests
- Recovery timeout: 30-60 seconds
- Half-open test traffic: 10-20%
- Implement: Step Functions + CloudWatch
- Alert: CloudWatch alarm → SNS → Lambda
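As a toy illustration of the thresholds above (trip at 50% failures over a 10-request window, 30s recovery, then half-open), a circuit breaker can be reduced to a small state machine; time is passed in explicitly so the logic stays testable:

```python
# Toy circuit breaker matching the numbers above. In production you would
# typically back this with Step Functions + CloudWatch rather than
# in-process state; this sketch only shows the decision logic.

from collections import deque

class CircuitBreaker:
    def __init__(self, window=10, threshold=0.5, recovery_s=30):
        self.results = deque(maxlen=window)   # rolling success/failure window
        self.threshold = threshold
        self.recovery_s = recovery_s
        self.opened_at = None                 # None means circuit is closed

    def record(self, success: bool, now: float):
        self.results.append(success)
        failures = self.results.count(False)
        if (len(self.results) == self.results.maxlen
                and failures / len(self.results) >= self.threshold):
            self.opened_at = now              # trip the circuit

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True                       # closed: all traffic passes
        if now - self.opened_at >= self.recovery_s:
            return True                       # half-open: admit test traffic
        return False                          # open: shed load, fail fast
```

The half-open branch is where the 10-20% test-traffic sampling would hook in.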
Throttling Config
- Account-level: 10,000 RPS
- Stage-level: 1,000-5,000 RPS
- Route-level: 50-500 RPS (complex models)
- SQS for request buffering under throttle
- SQS visibility timeout: 5-15 min for LLMs
Connection Pooling
- Pool size: 10-20 connections per instance
- Connection TTL: 60-300 seconds
- Reduce SDK client instantiation (reuse across Lambda invocations)
- Use global variables for SDK clients in Lambda
Domain 3: AI Safety, Security & Governance
Guardrails, IAM, Responsible AI, Compliance, Audit, Data Privacy
~20% of Exam Weight
Implement AI Safety Controls & Responsible AI
| Guardrail Feature | Configuration Detail | Applied At | Use Case |
|---|---|---|---|
| Content Filters | Categories: HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT, PROMPT_ATTACK. Severity: LOW/MEDIUM/HIGH per category | INPUT and/or OUTPUT independently | Block harmful content generation |
| Denied Topics | Plain language topic description (LLM-based classification, not regex). Custom denial message. | INPUT (topic detection) | Block competitor questions, legal/medical advice |
| Word Filters | Custom word lists + AWS managed profanity list | OUTPUT | Enforce brand/compliance word policies |
| PII Detection & Redaction | 50+ entity types: SSN, email, phone, credit card, name, address, IP. Mode: REDACT or BLOCK | INPUT and/or OUTPUT | HIPAA, PCI-DSS, GDPR compliance |
| Grounding Check | Verifies output is grounded in source context. Configurable relevance threshold. | OUTPUT (requires context) | Reduce hallucinations in RAG pipelines |
| Sensitive Info Filters | Regex patterns for custom sensitive data (e.g., employee IDs, internal codes) | INPUT and OUTPUT | Organization-specific PII beyond standard types |
Bias Detection & Mitigation
- SageMaker Clarify: bias metrics (class imbalance, DPPL, KL divergence)
- SageMaker Data Wrangler: data quality reports
- Model Cards (SageMaker): document model limitations, bias findings
- HELM benchmark: includes fairness + toxicity metrics
Explainability
- SageMaker Clarify: SHAP values for feature importance
- Chain-of-Thought prompting: expose reasoning
- Model Cards: document intended use, out-of-scope uses
- Attribution in RAG: cite source documents
Privacy & Data Protection
- Amazon Macie: detect PII in S3 automatically
- Bedrock: no model training on customer data (by default)
- VPC Endpoints: data doesn't leave AWS network
- KMS CMK: customer controls encryption keys
Auditability
- CloudTrail: all Bedrock API calls logged
- CloudWatch Logs: model inputs/outputs (optional logging)
- Bedrock Model Invocation Logging: S3 + CloudWatch
- CloudTrail Lake: query audit events with SQL
Key IAM Actions for Bedrock (Know These)
Service Control Policies (SCPs)
- Org-level deny: prevent use of non-approved models
- Region restrictions: only us-east-1, us-west-2
- Require conditions: VPC source, MFA, time-of-day
- Block: CreateProvisionedModelThroughput without approval
Resource-Based Policies for Bedrock
- Knowledge Base policies: control who can Retrieve/RetrieveAndGenerate
- Cross-account access for shared models
- Agent resource policies: restrict which roles can invoke
- Condition keys: bedrock:RequestedModelId (restrict to approved models)
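As one illustrative shape for "restrict to approved models", an identity-based policy can scope the invoke actions to specific foundation-model ARNs; the region and model ARN below are placeholders, and a real policy would list every approved model:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowApprovedModelsOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"
    }
  ]
}
```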
Compliance, Governance & Data Privacy
| Regulation | Key Requirement | AWS Controls |
|---|---|---|
| GDPR | Data minimization, right to erasure, consent | Macie (PII detection), Guardrails (PII redaction), KMS (encryption), VPC endpoints (data residency) |
| HIPAA | PHI protection, audit trails, BAA | Bedrock HIPAA eligibility (with BAA), Macie, CloudTrail, dedicated endpoints, encryption |
| PCI-DSS | Cardholder data protection | Guardrails PII filter (credit card), KMS, VPC, CloudTrail, WAF on API Gateway |
| SOC 2 | Security, availability, confidentiality | CloudTrail audit, Security Hub, GuardDuty, Access Analyzer |
Domain 4: Operational Efficiency & Optimization
Cost optimization, Performance tuning, Caching, Monitoring, Auto-scaling
~15% of Exam Weight
Cost Optimization for GenAI
Model Selection
- Use smaller models for simple tasks (cascading)
- Nova Lite for high-volume → Pro only when needed
- Measure: cost per task completion (not per token)
- A/B test model quality vs. cost
Prompt Optimization
- Shorter prompts = fewer input tokens = lower cost
- Remove unnecessary context
- Structured prompts produce shorter outputs
- Max tokens limit prevents runaway costs
Caching
- Semantic cache: return cached if cosine similarity > 0.95
- Response cache: ElastiCache for exact-match queries
- Embedding cache: avoid re-embedding same documents
- Up to 90% cost reduction for repetitive queries
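The semantic-cache rule above (serve cached responses when cosine similarity exceeds 0.95) can be sketched with plain lists as embeddings; in practice the vectors would come from an embedding model and the store would be ElastiCache or a vector database:

```python
# Sketch of a semantic cache keyed on embedding similarity, using the
# 0.95 cosine threshold from the list above. Embeddings are plain lists
# here purely for illustration.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []                 # list of (embedding, response)

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) > self.threshold:
                return response           # cache hit: skip the model call
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05, 0.0])       # near-identical query: hit
miss = cache.get([0.0, 1.0, 0.0])        # orthogonal query: miss
```

A linear scan is fine for a sketch; at scale the lookup itself would be a vector search.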
Provisioned Throughput
- 1-month or 6-month commitment
- Break-even: typically ~70% utilization
- Use CloudWatch to track PT utilization
- Only for truly steady, predictable workloads
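The ~70% break-even figure falls out of simple arithmetic: commit only when the flat commitment undercuts what the same traffic would cost on-demand. The prices below are made-up placeholders chosen so break-even lands at 70%:

```python
# Back-of-envelope break-even check for Provisioned Throughput.
# All dollar figures are illustrative placeholders, not real pricing.

def provisioned_is_cheaper(utilization: float,
                           pt_monthly_cost: float,
                           on_demand_cost_at_full_util: float) -> bool:
    """utilization: expected fraction (0-1) of the PT capacity actually used."""
    on_demand_cost = utilization * on_demand_cost_at_full_util
    return pt_monthly_cost < on_demand_cost

# With a $7k commitment vs $10k at full on-demand utilization,
# break-even is exactly 70% utilization:
steady = provisioned_is_cheaper(0.8, 7_000, 10_000)   # commit
spiky = provisioned_is_cheaper(0.5, 7_000, 10_000)    # stay on-demand
```

This is the calculation to run against CloudWatch utilization data before signing a 1- or 6-month term.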
Performance Optimization & Monitoring
Technical Metrics
- Inference latency: p50, p90, p95, p99
- Throughput (tokens/s, requests/s)
- Error rate by error type
- Token utilization (input vs. output)
- Cache hit rate
- Model invocation count
Business Metrics
- Cost per inference / per task completion
- User satisfaction (CSAT, thumbs up/down)
- Task completion rate
- Time-to-first-token (UX)
- Business value per dollar spent
Quality Metrics
- Hallucination rate (grounding check score)
- Response relevance (semantic similarity)
- Guardrail trigger rate (by category)
- Human review escalation rate
- Model drift (quality degradation over time)
CloudWatch Dashboard Components:
- Bedrock invocation metrics (built-in namespace: AWS/Bedrock)
- Custom metrics: quality scores, cache hit rates, business KPIs (via PutMetricData)
- Log Insights queries: identify patterns in prompt confusion, slow responses
- Composite alarms: trigger only when multiple conditions met simultaneously
- Anomaly detection: ML-based baseline for adaptive alerting
| Service | Scaling Trigger | Scaling Type | Notes |
|---|---|---|---|
| SageMaker Endpoints | InvocationsPerInstance, CPU utilization, custom metrics | Target tracking or Step scaling | Cooldown periods prevent thrashing; Inference Components allow per-model scaling |
| Lambda | Concurrent executions (auto) | Automatic, up to account limit | Reserved concurrency for predictability; Provisioned concurrency for cold start elimination |
| Bedrock Provisioned Throughput | Manual or CloudWatch-triggered scaling | Model Units (MUs) | No auto-scale; plan capacity from usage metrics |
| OpenSearch | CPU, memory, storage utilization | Horizontal (add data nodes) or vertical | UltraWarm for cost-efficient historical vectors; Auto-Tune for JVM optimization |
| API Gateway | Throttling limits per stage/route | Usage plans (no auto-scale) | SQS buffer behind API GW for burst handling |
Domain 5: Testing, Validation & Troubleshooting
Model evaluation, QA frameworks, Regression testing, Debugging GenAI applications
~10% of Exam Weight
Model Evaluation & Validation Frameworks
Automatic Evaluation
- Metrics: accuracy, robustness, toxicity
- Uses built-in or custom datasets
- Comparisons across multiple models
- Results in S3 and viewable in console
- ROUGE, METEOR, BERTScore for text quality
Human Evaluation (A/B)
- Side-by-side model comparison
- Human raters rank responses
- Criteria: accuracy, coherence, helpfulness
- Works with Amazon Mechanical Turk or internal teams
- Statistical significance testing
| Test Type | What It Tests | Implementation |
|---|---|---|
| Functional Testing | Correct outputs for expected inputs | Lambda test harness, expected output comparison |
| Edge Case Testing | Boundary inputs, empty strings, very long prompts, special characters | Parameterized test suite, automated via Step Functions |
| Prompt Injection Testing | Resistance to jailbreak/injection attacks | Red-teaming prompts, Guardrail PROMPT_ATTACK filter testing |
| Regression Testing | New model/prompt version doesn't degrade previous quality | Golden dataset + automated quality comparison, CloudWatch quality metrics |
| Load Testing | Performance under expected traffic | Lambda concurrent invocations, API GW throttle testing |
| Hallucination Testing | Factual accuracy, grounding in source docs | Bedrock Grounding Check, RAGAs framework, human spot checks |
| Bias Testing | Consistent quality across demographic groups | SageMaker Clarify, HELM fairness metrics |
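The regression-testing row above (golden dataset plus automated quality comparison) reduces to a quality gate over reference answers. This sketch uses `difflib` as a stand-in scorer; a real pipeline would substitute embedding-based semantic similarity, and the threshold is an assumption:

```python
# Sketch of a golden-dataset regression gate: score each new response
# against its golden reference and fail the deploy if the average score
# drops below a threshold. difflib stands in for a semantic scorer.

from difflib import SequenceMatcher

def score(reference: str, candidate: str) -> float:
    return SequenceMatcher(None, reference, candidate).ratio()

def regression_gate(golden: dict, responses: dict, threshold: float = 0.8) -> bool:
    """True when the new model/prompt version passes the quality gate."""
    scores = [score(golden[q], responses[q]) for q in golden]
    return sum(scores) / len(scores) >= threshold

golden = {"q1": "Paris is the capital of France."}
passing = regression_gate(golden, {"q1": "Paris is the capital of France."})
failing = regression_gate(golden, {"q1": "I cannot answer that."})
```

Wired into the CI/CD approval gate, a failing result blocks promotion of the new model or prompt version.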
Poor RAG Quality
- Check: chunk size vs. query complexity
- Verify: same embedding model for index + query
- Inspect: similarity scores (too low = bad embeddings)
- Review: metadata filters (over-filtering?)
- Embedding drift: re-embed if model updated
High Latency
- X-Ray trace: find slow subsegment
- Cold starts: enable Lambda Provisioned Concurrency
- Vector search: reduce ef_search, add metadata pre-filter
- Model: try smaller model or Cross-Region inference
- Cache hit rate too low: review semantic threshold
Hallucinations
- Enable Bedrock Grounding Check guardrail
- Increase retrieved context (more chunks)
- Add citation requirement to prompt
- Reduce temperature (0.1-0.3 for factual tasks)
- Use CoT to expose reasoning
Throttling (429 errors)
- Check: Bedrock quota limits in Service Quotas
- Request quota increase via support ticket
- Implement exponential backoff + jitter
- Add SQS buffer for burst absorption
- Use Cross-Region inference for capacity
Deep Dive by Service
Reference architecture, configuration details, and exam tips per AWS service
Amazon Bedrock: Complete Service Reference
| API | Purpose | Key Parameters | Response Type |
|---|---|---|---|
| InvokeModel | Synchronous single inference | modelId, body (model-specific JSON) | Complete response |
| InvokeModelWithResponseStream | Streaming inference | modelId, body | Event stream (chunk by chunk) |
| Converse | Unified multi-model API (recommended) | modelId, messages[], system[], inferenceConfig | Complete, unified format |
| ConverseStream | Unified streaming API | Same as Converse | Event stream, unified format |
| Retrieve | Knowledge Base vector search only | knowledgeBaseId, retrievalQuery | Retrieved chunks + metadata |
| RetrieveAndGenerate | RAG: retrieve + generate in one call | knowledgeBaseId, input, retrievalConfig, generationConfig | Generated response + citations |
| ApplyGuardrail | Test guardrails without model call | guardrailIdentifier, guardrailVersion, source, content | Action (NONE/GUARDRAIL_INTERVENED) + assessments |
| CreateModelEvaluationJob | Automated model evaluation | evaluationConfig, inferenceConfig, outputDataConfig | Job ARN |
What is a Model Unit (MU)?
A Model Unit represents a specific throughput capacity (tokens per minute). Different models have different MU sizes. Purchase 1+ MUs based on peak throughput requirement.
- 1-month term: lower commitment, higher per-MU cost
- 6-month term: better rate, more risk
- No-commitment: available for some models (most expensive)
- CloudWatch: TokensPerMinute metric for utilization
When to Use Provisioned Throughput
- Steady traffic (70%+ utilization to break even)
- Need guaranteed capacity (SLA requirements)
- Consistent low latency requirements
- Avoid: spiky/unpredictable traffic (use on-demand)
- Cross-Region inference: use inference profiles instead
Quick Reference Cheat Sheet
Critical numbers, decision trees, and patterns for exam day
Critical Numbers & Thresholds to Memorize
- Initial backoff: 100ms
- Backoff factor: 2x (exponential)
- Max backoff: 20 seconds
- Max attempts: 3-5
- Jitter: ±100ms or 0.1-0.3 factor
- Circuit open at: 50% fail over 10 requests
- Recovery timeout: 30-60 seconds
- Half-open traffic: 10-20%
- Visibility timeout: 5-15 minutes (LLM tasks)
- DLQ after: 3-5 failed attempts
- Max message size: 256KB (use S3 pointer for large)
- Retention: up to 14 days
- FIFO vs Standard: FIFO for ordered processing
- Request timeout max: 29 seconds (REST API)
- Account-level throttle: 10,000 RPS default
- Stage-level: 1,000-5,000 RPS
- Route-level: 50-500 RPS (complex models)
- Burst limit: 2-3x steady-state rate
- Cache TTL: 300s default
- 429 error: throttled (retriable)
- 400 error: bad request (NOT retriable)
- 401/403: auth (NOT retriable)
- 500/503: service error (retriable)
- Bedrock timeout: up to 120s for complex models
- Simple models: 15-30s timeout OK
- HNSW M param: 16-64 (connections/node)
- ef_construction: 100-512 (build quality)
- ef_search: 100-512 (query quality)
- Cosine range: -1 to 1 (1 = identical)
- Semantic cache threshold: cosine > 0.95
- S3 Vectors pre-filter saves: 50-70% of search space
- Connection pool size: 10-20 connections/instance
- Connection TTL: 60-300 seconds
- Provisioned concurrency: eliminates cold starts
- Global SDK client: initialize OUTSIDE handler
- LLM task timeout: 5-15 minutes (async)
- Max Lambda timeout: 15 minutes
- Provisioned break-even: ~70% utilization
- S3 Vectors Intelligent-Tiering: 40-60% cost reduction
- Semantic cache savings: up to 90% for repetitive queries
- Model cascade target: start with Nova Lite, escalate only if quality < threshold
- Buffer size: 5-20 chunks
- Flush trigger: sentence completion or 100-500ms
- WebSocket keep-alive: ping every 30-60s
- WebSocket idle timeout: ~10 minutes
- SSE reconnect: 1s → backoff → max 30-60s
Last-Mile Exam Traps
| Common Trap | Correct Lean | What AWS Is Testing |
|---|---|---|
| Need managed failover and performance-aware regional routing | Inference profile | Not just "multi-Region"; choose the Bedrock-native routing construct. |
| Need general async batch inference for text/image workloads | CreateModelInvocationJob | StartAsyncInvoke is the distractor for Nova Reel video generation. |
| Need to inspect which knowledge base files failed ingestion | Knowledge base logging to CloudWatch Logs | CloudTrail audits API calls, not document-level ingestion outcomes. |
| Need to reorder already relevant retrieval results | Reranker models | Hybrid search improves retrieval; rerank improves final ordering. |
| Need to guarantee every inference call includes a guardrail | IAM condition key bedrock:GuardrailIdentifier | Central enforcement beats custom proxy code. |
| Need to know which specific guardrail layer intervened | trace: "enabled" + GuardrailPolicyType metrics | Not just whether input/output was blocked, but which policy fired. |
| Need generation to halt on a phrase | Stop sequences | Prompt instructions are weaker than inference parameters. |
| Unpredictable traffic with long idle periods | On-demand Bedrock | Provisioned Throughput is usually only right for high steady utilization. |
| Deterministic workflow with audit and mandatory sequence | Step Functions | Agents/Flows are often distractors when compliance is explicit. |
| Persistent MCP tool servers | ECS/Fargate | Lambda is attractive, but poor for persistent SSE-style connections. |
Decision Trees: What to Use When
Top 20 Exam-Day Tips (High-Yield)
- Converse API = unified multi-model: Single code path for Claude, Nova, Titan, Llama. Preferred for new development over InvokeModel.
- Bedrock Guardrails must be explicitly invoked: Add guardrailConfig to every API call. Not auto-applied. Test with ApplyGuardrail API.
- Same embedding model for index AND query: Never mix models. This is a common trap question.
- Nova Forge = SageMaker AI: Accessed through SageMaker, NOT directly in Bedrock. Training from checkpoints to prevent catastrophic forgetting.
- S3 Vectors = billions of vectors: New service for massive-scale vector storage with Intelligent-Tiering. Pre-filter metadata before vector calculation (50-70% savings).
- Provisioned Throughput break-even ~70% utilization: Below that, on-demand is cheaper. Use CloudWatch to track PT utilization before committing.
- Step Functions + Bedrock = native integration: No Lambda needed for InvokeModel or RetrieveAndGenerate in Step Functions.
- MCP: Lambda = lightweight, ECS = complex: Lambda for stateless tool access (search, calc), ECS for code execution or image processing.
- HNSW ef_search tradeoff: Higher ef_search = better recall but slower queries. Tune based on acceptable latency at p99.
- pgvector advantage = SQL + vector: When you need relational queries combined with similarity search. Not for billion-scale.
- Cross-Region Inference Profiles: Use for distributing load across regions. Automatic failover, no additional cost vs. on-demand tokens.
- AgentCore vs. Agents: AgentCore = composable services (Policy, Evaluations, Memory) that work with ANY framework/model. Bedrock Agents = specific managed agent runtime.
- Probabilistic validation: FM outputs vary. Use semantic similarity scoring + thresholds (not exact match) for QA. Run N samples, validate distribution.
- ReAct in Step Functions: Reason (LLM) → Parse Action (Lambda) → Execute Tool (Lambda/API) → Observe (Pass state) → loop. Max iterations in Choice state.
- Hierarchical chunking = parent+child: Child chunks for precise matching, parent chunks returned as context. Built into Bedrock Knowledge Bases.
- Grounding Check = hallucination prevention: Bedrock Guardrail feature. Only works when you pass source context. Set threshold 0.7-0.9.
- Lambda cold starts: Use Provisioned Concurrency for latency-sensitive paths. Initialize SDK clients OUTSIDE handler function (global scope).
- SageMaker Inference Components: Host multiple models on one endpoint with INDEPENDENT scaling policies per model. Different from Multi-Model Endpoints (which share compute).
- Model cascading pattern: Route ALL traffic to Nova Lite first. Escalate to Nova Pro only when quality score < threshold. Can save 60-80% of inference costs.
- Security Hub + Bedrock: Near-real-time risk analytics for FM deployments. Correlates CloudTrail events with security findings. Configure custom standards for FM-specific risks.
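The cascading pattern from tip 19 can be sketched as a two-stage dispatcher; the model functions below are stubs standing in for Bedrock calls, and the quality scores and threshold are illustrative:

```python
# Sketch of model cascading: route everything to the cheap model first and
# escalate only when its quality score misses the threshold. Stubs stand
# in for Nova Lite / Nova Pro invocations.

QUALITY_THRESHOLD = 0.7

def lite_model(prompt):        # stand-in for the cheap, high-volume model
    return {"text": "short answer", "quality": 0.6}

def pro_model(prompt):         # stand-in for the expensive, capable model
    return {"text": "detailed answer", "quality": 0.95}

def cascade(prompt, score_fn=lambda r: r["quality"]):
    """Try the cheap model; escalate only if its quality score is too low."""
    result = lite_model(prompt)
    if score_fn(result) >= QUALITY_THRESHOLD:
        return result, "lite"
    return pro_model(prompt), "pro"       # escalation path

result, tier = cascade("Explain VPC endpoints.")
```

The savings claim in tip 19 depends entirely on what fraction of traffic the cheap tier can absorb, which is why the quality scorer is the part worth investing in.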
Services Not to Forget (Often Overlooked)
Amazon Kendra
Enterprise search combining keyword (BM25) + semantic. Pre-built connectors (S3, SharePoint, Confluence, Salesforce). FAQ extraction. Relevance tuning. Use when: existing enterprise docs, need zero-config search quality.
Amazon Macie
ML-powered PII detection in S3. Auto-discovers sensitive data. Integrates with Security Hub. Use for: data governance, GDPR compliance, before feeding data to FMs.
AWS AppConfig
Dynamic configuration without redeployment. Use for: routing rules, model selection logic, feature flags, A/B test percentages. Supports gradual rollout with automatic rollback.
Amazon Verified Permissions
Fine-grained authorization using Cedar policy language. ABAC (attribute-based) policies. Use for: controlling which users can query which knowledge bases, role-based FM access.
Amazon Bedrock Data Automation
AI-powered pipeline for processing unstructured documents (PDFs, images, audio, video). Extracts structured data automatically. Reduces manual preprocessing for RAG pipelines.
AWS X-Ray
Distributed tracing across Lambda → Bedrock → OpenSearch. Custom segments + annotations (model_name, cost, quality_score). Service map visualization. Filter traces by annotation. Use for latency debugging.
Amazon Comprehend
NLP enrichment for data pipelines. Entity extraction, sentiment, key phrases, PII detection, topic modeling. Use to enrich documents with metadata BEFORE indexing into vector store.
CloudTrail Lake
Query audit events with SQL (Athena-like). Use for: compliance reporting on FM usage, query who invoked which model when, detect unusual access patterns in prompt management.