Scenario-Based Practice Questions
40+ exam-style questions weighted by domain importance. Select a domain or attempt all.
Flashcard System
60+ flashcards organized by domain. Click to flip, then rate your knowledge.
"Which AWS Service?" Rapid-Fire Drill
30+ timed scenarios that test your service knowledge under pressure. Pick the right service in 15 seconds per question!
Architecture Pattern Matcher
Match services to use cases, order pipeline steps, and connect architecture patterns.
Exam Readiness Tracker
Self-assess your knowledge across all 5 domains. Progress saves automatically.
Final 48-Hour Focus Board
Your fastest route to exam readiness: study the traps, then grind the final simulator.
Tonight: Highest-Value Topics
- Bedrock APIs: `Converse`, `InvokeModel`, `InvokeModelWithResponseStream`, `CreateModelInvocationJob`, `RetrieveAndGenerate`
- Inference profiles vs. cross-Region inference vs. Provisioned Throughput
- Knowledge Bases vs. OpenSearch hybrid search vs. custom RAG
- Bedrock Agents vs. Strands vs. Step Functions vs. Flows
- Guardrails: tracing, `GuardrailPolicyType`, `bedrock:GuardrailIdentifier`
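A minimal boto3 sketch of the guardrail-tracing topic above, using the `Converse` request shape. The guardrail ID, version, and model ID are placeholders, not real resources.

```python
# Sketch: attaching a guardrail with tracing to a Converse call.
# Guardrail ID/version and model ID below are placeholders.

def build_converse_request(model_id, user_text, guardrail_id, guardrail_version):
    """Build kwargs for bedrock-runtime Converse with guardrail tracing enabled."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled",  # surfaces which policy intervened in the response trace
        },
    }

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",
    "Summarize our refund policy.",
    "gr-example123", "1",
)

# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# The guardrail trace in the response reports which policy type intervened.
```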
Tomorrow: Stabilize the Gaps
- Reranker models, hybrid retrieval, embedding drift, KB ingestion logging
- Identity patterns: Cognito OIDC, IAM Identity Center, temporary credentials
- Optimization patterns: prompt caching, batch inference, model cascading
- Evaluation patterns: LLM-as-judge, RAGAs, A/B testing, canary rollout
- Application patterns: Amplify AI Kit, Amazon Q Developer, MCP deployment
Final Practice Order
- Run the full deck once in Final Practice.
- Review only missed questions and write the decision rule in one line.
- Use flashcards here for weak domains.
- Do one last retry-misses pass, then finish with a shuffled fresh run.
| If the question says... | Usually think... | Why it matters |
|---|---|---|
| Managed failover and best Region for inference | Cross-Region inference profile | More precise than generic Route 53 or manual cross-Region logic. |
| Large async batch jobs for text/image workloads | `CreateModelInvocationJob` | `StartAsyncInvoke` is a trap for Nova Reel video generation, not general batch inference. |
| Need to improve ordering of already relevant retrieved docs | Reranker models | Retrieval found good docs, but ranking is weak. |
| Need to inspect KB ingestion failures | Knowledge base logging to CloudWatch Logs | CloudTrail logs API calls, not document-level ingestion statuses. |
| Need to force guardrail use on every model call | IAM with `bedrock:GuardrailIdentifier` | Central enforcement beats custom proxy Lambda. |
| Need to know exactly which guardrail policy intervened | `trace: enabled` + `GuardrailPolicyType` metrics | Better than only knowing input vs output was blocked. |
| Need a model to stop after a phrase | Stop sequences | Prompt instructions are weaker and unreliable. |
| Unpredictable traffic with long idle periods | On-demand Bedrock via Lambda/API | Provisioned Throughput is usually wrong unless utilization is steady. |
| Deterministic compliance workflow with audit | Step Functions | Agents and Flows are too dynamic for strict execution guarantees. |
| Persistent MCP server connections | ECS Fargate | Lambda is a common distractor but is a poor fit for persistent SSE connections. |
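The `bedrock:GuardrailIdentifier` row above can be sketched as a deny-unless IAM policy built in Python. The account ID and guardrail ARN are placeholders; verify the exact condition shape against current IAM documentation before relying on it.

```python
# Sketch: IAM policy that denies model invocation unless the request carries
# the required guardrail, via the bedrock:GuardrailIdentifier condition key.
# The guardrail ARN is a placeholder.

import json

def guardrail_enforcement_policy(guardrail_arn):
    """Deny InvokeModel unless the request references the required guardrail."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "RequireGuardrail",
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"bedrock:GuardrailIdentifier": guardrail_arn}
            },
        }],
    }

policy = guardrail_enforcement_policy(
    "arn:aws:bedrock:us-east-1:111122223333:guardrail/EXAMPLEID")
print(json.dumps(policy, indent=2))
```

Central enforcement like this applies to every caller the policy is attached to, which is why it beats wrapping Bedrock behind a custom proxy Lambda.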
Retrieval Stack
- Knowledge Bases = managed RAG, least ops
- OpenSearch hybrid = best search tuning + high QPS
- pgvector = vector + SQL joins
- S3 Vectors = billion-scale, lowest cost
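The "least ops" option above (Knowledge Bases) comes down to a single `RetrieveAndGenerate` call. A sketch with placeholder IDs:

```python
# Sketch: minimal managed-RAG request against a Bedrock Knowledge Base.
# Knowledge base ID and model ARN are placeholders.

def build_rag_request(query, kb_id, model_arn):
    """Build kwargs for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    "What is our PTO policy?",
    "KBEXAMPLE01",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
)

# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# The grounded answer arrives in the response output text, with source citations.
```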
Agent Stack
- Bedrock Agents = managed agent orchestration
- Strands = open-source, more control
- AgentCore = runtime/policy/memory/evals building blocks
- Step Functions = deterministic orchestration, not autonomous reasoning
Security Stack
- Guardrails = content safety and grounded responses
- Cognito / Identity Center = federation and temporary access
- PrivateLink = network isolation
- CloudTrail = API audit, not KB ingestion detail
Additional Study Material
Deep-dive content on topics not fully covered in the reference guide.
Amazon Q Developer
AI-powered development assistant (replaced CodeWhisperer):
- Code Generation: Inline suggestions, function completion, boilerplate
- Code Transformation: Refactoring, language translation, modernization
- Security Scanning: Vulnerability detection, remediation suggestions
- Unit Test Generation: Auto-generate tests for existing code
- Code Explanation: Explain complex code blocks in plain English
- CLI Assistance: Natural language to CLI commands
IDE Support: VS Code, JetBrains, Visual Studio, AWS Console, CLI
Amazon Q Business
Fully managed GenAI assistant for enterprise knowledge:
- 40+ Data Source Connectors: Confluence, SharePoint, Salesforce, S3, databases, Slack, Jira
- ACL Respect: Honors existing access permissions from source systems
- Natural Language Q&A: Ask questions across all connected sources
- Admin Controls: Topic blocking, response guardrails
- Plugins: Create custom actions (ticket creation, approvals)
AWS Amplify AI Kit
- Declarative React components: `<AIConversation>` with built-in streaming response handling
- Conversation history management
- Authentication integration (Cognito)
- Backend Bedrock connection (no direct client API calls)
- Rapid prototyping for mobile/web GenAI features
Bedrock Flows vs Step Functions
| Aspect | Bedrock Flows | Step Functions |
|---|---|---|
| Type | No-code prompt chain builder | General workflow orchestration |
| Users | Business analysts, non-devs | Developers |
| Best For | LLM prompt chains | Complex business logic, audit trails |
| Execution | Sequential prompt flow | Deterministic state machine |
| Integrations | Bedrock models, KBs, Guardrails | 200+ AWS services |
Bedrock Data Automation
- Automated extraction from complex documents (PDFs, images)
- Handles tables, forms, charts, key-value pairs
- Pre-processes documents before Knowledge Base ingestion
- Better retrieval accuracy for structured content
- Replaces need for custom Textract + parsing pipelines
CRM Enhancement Pattern
- CRM events → EventBridge → Lambda → Bedrock
- Use cases: summarize customer interactions, generate next-best-action, auto-draft responses
- Write results back to CRM via API
- Works with Salesforce, HubSpot, ServiceNow
- Lambda handles authentication and data transformation
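A sketch of the Lambda step in this pattern: take the EventBridge event, build a summarization prompt, and return a payload to write back to the CRM. The event fields (`recordId`, `interactionNotes`) are an illustrative schema, not a real CRM payload, and the model call is injected so the sketch runs without AWS access.

```python
# Sketch of the CRM pattern's Lambda step. Event field names are illustrative.

def build_summary_prompt(event):
    """Turn a CRM interaction event into a summarization prompt."""
    notes = event["detail"]["interactionNotes"]
    return (
        "Summarize this customer interaction in two sentences and suggest "
        f"one next-best-action:\n\n{notes}"
    )

def handler(event, context, invoke=None):
    # invoke is injected for testing; in production it wraps a
    # bedrock-runtime Converse call.
    prompt = build_summary_prompt(event)
    summary = invoke(prompt) if invoke else "(no model client configured)"
    # The caller writes this payload back to the CRM via its API.
    return {"crmRecordId": event["detail"]["recordId"], "summary": summary}

# EventBridge wraps custom payloads under "detail":
event = {"detail": {"recordId": "0012345",
                    "interactionNotes": "Customer asked about renewal pricing."}}
result = handler(event, None, invoke=lambda p: "stub summary")
```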
Prompt Caching
TTL Options
- Default TTL: 5 minutes — the cache entry expires if no request reuses that prefix
- Extended TTL: 1 hour — For longer sessions (document QA, agentic workflows)
- Cache refreshes on each access within TTL window
- TTL is per-model, per-region
Supported APIs
- Converse API
- InvokeModel API
- ConverseStream API
- InvokeModelWithResponseStream API
- Compatible with Cross-Region Inference
Key Requirements
- Minimum tokens per checkpoint: Varies by model (e.g., 1024 tokens for Claude)
- Cache checkpoints are placed at the end of the cacheable prefix
- Prefix must be an exact byte-for-byte match
- Multiple cache checkpoints possible in a single prompt
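The checkpoint rules above can be sketched as a Converse request that places a `cachePoint` content block after a long, stable system prompt. The model ID is a placeholder, and minimum-token and support rules vary by model.

```python
# Sketch: cache checkpoint at the end of a reused system prompt.
# Everything before the cachePoint must repeat byte-for-byte to get a hit.

LONG_SYSTEM_PROMPT = "You are a support agent. " * 200  # stand-in for a long, reused prefix

def build_cached_request(model_id, user_text):
    return {
        "modelId": model_id,
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # checkpoint closes the cacheable prefix
        ],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }

request = build_cached_request(
    "anthropic.claude-3-7-sonnet-20250219-v1:0", "Where is my order?")

# import boto3
# response = boto3.client("bedrock-runtime").converse(**request)
# Usage metadata in the response distinguishes cache reads from cache writes.
```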
Cost Model
- Cache Write: Slightly higher cost than normal processing (one-time)
- Cache Read: Up to 90% cheaper than normal processing
- Break-even: After ~2-3 cache reads, you save money
- Response metadata shows cache hit/miss status
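The break-even claim can be sanity-checked with simple arithmetic. The multipliers below (writes at 1.25x the base input-token price, reads at 0.1x) are illustrative assumptions; actual multipliers vary by model, which is what moves break-even between one and a few reads.

```python
# Back-of-envelope break-even check for prompt caching.
# write_mult/read_mult are illustrative price multipliers, not published rates.

def breakeven_reads(write_mult=1.25, read_mult=0.10):
    """Cache reads needed before cumulative cost drops below the no-cache cost."""
    extra_write_cost = write_mult - 1.0   # premium paid once to populate the cache
    saving_per_read = 1.0 - read_mult     # discount earned on each cache hit
    reads = 0
    while reads * saving_per_read <= extra_write_cost:
        reads += 1
    return reads

print(breakeven_reads())          # cheap reads recoup the write premium quickly
print(breakeven_reads(2.0, 0.5))  # pricier writes and reads push break-even out
```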
Prompt Caching Use Cases
1. Document QA (same document, many questions)
2. Agentic workflows with repeated tool definitions
3. Few-shot learning with extensive example sets
4. Any pattern where the prompt prefix is reused across requests
AgentCore Runtime
- Session Isolation: Each user session runs in its own sandboxed environment
- Duration: Up to 8 hours of continuous execution per session
- Bidirectional Streaming: Real-time interaction between user and agent
- Code Interpreter: Execute Python/JavaScript in sandbox
- Browser Runtime: Navigate web pages, fill forms, extract data
- Auto-scaling based on concurrent sessions
AgentCore Gateway
- API → Tool: Convert any REST API or Lambda into agent-compatible tools
- MCP Server Connection: Connect to external MCP servers
- Unified Interface: Single management layer for all tool integrations
- Authentication: Handles tool-level auth (API keys, OAuth, IAM)
- Schema Management: Auto-generates tool schemas for the agent
AgentCore Policy
- Natural Language Rules: "Only admin users can access the delete API"
- Auto-compilation: Converts to Cedar policy language
- Real-time Enforcement: Evaluated on every tool call during execution
- Identity-aware: Uses user role, tenant, custom claims for decisions
- Audit Logging: Every policy decision is logged
AgentCore Memory
- Session Memory: Short-term context within a single conversation
- Episodic Memory: Long-term memory that persists across sessions
- Agents learn from past interactions (user preferences, past resolutions)
- Semantic search over memory store
- Memory scoped per user, per tenant
AgentCore Evaluations
- 13 Built-in Evaluators: Tool selection, relevance, safety, consistency, etc.
- Custom Evaluators: Define domain-specific quality checks
- Continuous Monitoring: Evaluate every interaction in real-time
- Batch Evaluation: Run evaluations on historical conversations
- Integrates with CloudWatch for alerting
AgentCore Identity & Observability
Identity:
- Integrates with identity providers (Okta, Azure AD, Cognito)
- Custom claims for tenant-level isolation
- Maps external identities to agent authorization context
Observability:
- CloudWatch dashboards (latency, throughput, errors, tokens)
- OpenTelemetry integration for distributed tracing
- Track agent behavior across tools and sessions
AI Governance Mechanisms
- Model Inventory: Centralized catalog of all AI models (SageMaker Model Registry)
- Approval Workflows: Pending → Approved → Rejected status for each model version
- Version Control: Track model versions, parameters, training data
- Data Lineage: Source data → preprocessing → training → deployment tracking
- Risk Assessment: Categorize models by risk level (low/medium/high)
SageMaker Model Cards
- Standardized documentation for ML models
- Content: Purpose, training data, performance metrics, ethical considerations, limitations
- Designed for auditor review
- Integrates with Model Registry
- Exportable as PDF for compliance records
- Supports custom fields for domain-specific requirements
Responsible AI Principles on AWS
| Principle | AWS Implementation |
|---|---|
| Fairness | SageMaker Clarify bias detection (pre/post-training) |
| Transparency | Model Cards, SHAP explanations, feature importance |
| Privacy | Data never used for training, encryption, VPC isolation |
| Security | Guardrails, IAM, KMS encryption, PrivateLink |
| Robustness | Model evaluation, A/B testing, canary deployments |
| Governance | Model Registry, approval workflows, audit trails |
SageMaker Clarify Bias Metrics
- CI (Class Imbalance): pre-training metric; detects over/under-representation of a group
- DPL (Difference in Positive Proportions in Labels): pre-training metric; measures labeling bias between groups
- DPPL (Difference in Positive Proportions in Predicted Labels): post-training metric; compares predicted outcomes across groups
- KL Divergence: statistical measure of distribution difference between groups
RAGAs Framework
Purpose-built metrics for RAG pipeline evaluation:
- Context Relevance: Are retrieved documents relevant to the query?
- Faithfulness: Is the answer grounded in retrieved context? (detects hallucination)
- Answer Relevance: Does the answer address the actual question?
- Answer Correctness: Is the answer factually correct?
Automated evaluation using LLM-as-judge. Open-source Python library.
LLM-as-Judge Pattern
- Use a powerful evaluator model to score outputs of another model
- Dimensions: coherence, relevance, fluency, harmlessness, helpfulness
- Scalable alternative to human evaluation
- Supported by Bedrock Model Evaluation (automatic mode)
- Limitations: May have style bias, position bias, verbosity bias
- Best practice: use a different model family as judge than the one being evaluated
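A sketch of the judge side of this pattern: a rubric prompt plus a tolerant parser for the score. The rubric wording is illustrative, and the actual judge call (a Converse request to a different model family) is stubbed out.

```python
# Sketch: LLM-as-judge rubric prompt and score parsing.
# Rubric text is illustrative; the judge model call is commented out.

import re

RUBRIC = (
    "Rate the RESPONSE to the QUESTION on a 1-5 scale for relevance and "
    "coherence. Answer with 'SCORE: <n>' and one short justification.\n\n"
    "QUESTION: {question}\nRESPONSE: {response}"
)

def build_judge_prompt(question, response):
    return RUBRIC.format(question=question, response=response)

def parse_score(judge_output):
    """Extract the 1-5 score; return None if the judge ignored the format."""
    match = re.search(r"SCORE:\s*([1-5])", judge_output)
    return int(match.group(1)) if match else None

# judge_output = boto3.client("bedrock-runtime").converse(...)  # judge from a
# different model family than the one being evaluated, per the best practice above
print(parse_score("SCORE: 4 - relevant and mostly coherent"))  # -> 4
```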
Bedrock Model Evaluation
| Type | Automatic | Human |
|---|---|---|
| Method | Built-in metrics + LLM-as-judge | Human reviewers via Ground Truth |
| Speed | Fast, scalable | Slow, expensive |
| Best For | Objective metrics, model comparison | Subjective quality, edge cases |
| Metrics | Accuracy, toxicity, robustness, quality dimensions | Custom rubrics, preferences |
SageMaker Deployment Strategies
- Canary: Send small % to new model first, gradually increase. Auto-rollback on CloudWatch alarm triggers.
- Blue-Green: Deploy new model in parallel, switch all traffic at once. Old model stays for instant rollback.
- Linear: Shift traffic in equal increments (e.g., 10% every 10 min).
- All-at-once: Switch immediately. Fastest but riskiest — no gradual rollout.
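The canary strategy above maps to a `DeploymentConfig` passed to `update_endpoint`. A sketch with a placeholder alarm name; check the field names against the current SageMaker API reference.

```python
# Sketch: SageMaker DeploymentConfig for a canary rollout with
# alarm-driven auto-rollback. Alarm name is a placeholder.

def canary_deployment_config(alarm_name, canary_percent=10, wait_seconds=600):
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": wait_seconds,  # bake time before the full shift
            },
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": alarm_name}],  # alarm trip -> automatic rollback
        },
    }

config = canary_deployment_config("prod-model-error-rate")

# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-new-config",
#     DeploymentConfig=config,
# )
```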
A/B Testing with SageMaker Experiments
- Organize model variants as experiment trials
- Track parameters, metrics, and artifacts per variant
- Compare results across trials with built-in visualization
- Integrates with Model Registry for version tracking
- Use for: prompt variants, model comparisons, hyperparameter tuning
Common Troubleshooting Patterns
- Throttling (429): Retry with exponential backoff + Cross-Region Inference
- RAG quality degradation: Re-evaluate chunking, re-index, check embedding consistency
- Agent wrong tool selection: Improve tool descriptions, use AgentCore Evaluations
- High latency: Enable streaming, optimize prompt length, use prompt caching
- Cost overruns: Implement intelligent routing, prompt caching, batch processing
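The throttling fix above can be sketched as a generic retry wrapper with exponential backoff and jitter. The flaky callable stands in for a Bedrock call; in production you would also lean on the SDK's built-in retry configuration.

```python
# Sketch: exponential backoff with jitter around a throttled call.

import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry fn on throttling-style errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            if "Throttling" not in str(err) or attempt == max_attempts - 1:
                raise  # non-throttling error, or out of attempts
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)  # jitter
            sleep(delay)

# Example: a call that throttles twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("ThrottlingException")
    return "ok"

print(call_with_backoff(flaky, sleep=lambda s: None))  # -> ok
```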