🟠 Domain 2 – Implementation & Integration

Subdomains 2.1–2.5 · Your gap areas: 2.1 Agents · 2.2 Deployment · 2.3 Architecture · 2.5 App Integration

2.1 – Agentic AI Solutions & Tool Integration

Strands SDK – When Single Bedrock Agents Aren't Enough

Strands Agents SDK is an open-source framework for building multi-agent systems where different specialist agents handle different domains. Think of it as a routing layer that directs queries to the right expert agent.
MCP (Model Context Protocol) servers = the connectors between your Strands agent and external tools/APIs. Each company system or tool gets its own MCP server. The agent dynamically selects which MCP server to call at runtime.

Use Strands + MCP When

  • Multiple specialist domains needed
  • Dynamic tool selection at runtime
  • Complex multi-step, multi-domain workflows
  • LLM needs to decide which tool to call
  • Each tool = its own MCP server deployed as Lambda
Use Bedrock Flows When

  • Fixed, predictable execution path
  • Sequential steps you define in advance
  • Deterministic pipelines (no dynamic routing)
  • Simple LLM → Lambda → LLM chains
  • NOT for dynamic tool selection

Bedrock Flows implements predefined sequential steps with fixed logic. If the scenario involves an LLM deciding at runtime which tool to call, or routing between specialist domains, Flows can't do that – use Strands.

    Strands Specialist Multi-Agent Pattern

    Pattern: Create one specialist agent per domain (BillingAgent, TechSupportAgent, AccountAgent). Deploy each as a Lambda function. Register them all in Strands. An orchestrator agent routes user requests to the right specialist.
    A single Bedrock Agent with many OpenAPI action groups becomes unwieldy for complex multi-domain scenarios. The LLM has to process all the tool schemas at once, increasing latency and confusion. Specialist agents keep the context focused.
    Scenario: "AI assistant handles subscriptions, billing, AND tech support" → Strands with specialist agents (one per domain) routed by an orchestrator. Not a single Bedrock Agent trying to do everything.
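The routing idea behind the specialist pattern can be sketched in plain Python. This is a toy stand-in, not the real Strands API: the agent functions and keyword matching are illustrative, and in the actual SDK an orchestrator LLM (not keywords) picks the specialist.

```python
# Toy sketch of specialist routing. Each "agent" below stands in for a
# Lambda-deployed specialist; the keyword matcher stands in for the
# orchestrator LLM's routing decision.

def billing_agent(query: str) -> str:
    return f"[BillingAgent] handling: {query}"

def tech_support_agent(query: str) -> str:
    return f"[TechSupportAgent] handling: {query}"

def account_agent(query: str) -> str:
    return f"[AccountAgent] handling: {query}"

# Registry of specialist agents, one per domain.
SPECIALISTS = {
    "billing": billing_agent,
    "support": tech_support_agent,
    "account": account_agent,
}

def orchestrate(query: str) -> str:
    """Route the query to the matching specialist; default to account."""
    lowered = query.lower()
    for domain, agent in SPECIALISTS.items():
        if domain in lowered:
            return agent(query)
    return account_agent(query)
```

The point of the pattern survives even in this sketch: each specialist sees only its own domain's context, instead of one agent juggling every tool schema at once.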

    ReAct Pattern vs Chain-of-Thought in Step Functions

    ReAct (Reasoning + Acting)

  • Interleaves reasoning and action
  • Observe → Reason → Act → Observe again…
  • Dynamic – loops until task complete
  • Good for: tool use, multi-step data retrieval
  • In Step Functions: states for Observe/Reason/Act
    Chain-of-Thought (CoT)

  • Explicit reasoning steps in the prompt
  • Linear – no action loop, no tool calls
  • Good for: complex math, logical reasoning
  • Not designed for tool use or dynamic workflows
  • Pairing CoT with Choice states is not how ReAct works

    When the exam asks about Step Functions + FM for a dynamic data retrieval/workflow system → ReAct pattern with states for Observation, Reasoning, and Action. Use built-in Step Functions error handling + retry logic for model invocations.
    Chain-of-thought with "Choice states to dynamically adjust the reasoning path" is CoT terminology mixed with Step Functions, not the right architecture for agentic workflows.
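The Observe → Reason → Act loop can be shown with stubs. Everything here is a stand-in: `fake_reason` plays the role of a Bedrock model call deciding the next action, and `fake_lookup_tool` plays a Lambda-backed data source. In Step Functions, each phase would be its own state with Retry/Catch attached.

```python
# Sketch of a ReAct loop with stubbed model and tool calls.

def fake_reason(observation: str) -> str:
    # Stand-in for an FM invocation: choose the next action
    # based on what has been observed so far.
    return "finish" if "42" in observation else "lookup"

def fake_lookup_tool(_query: str) -> str:
    # Stand-in for a retrieval tool (e.g., a Lambda-backed API).
    return "the answer is 42"

def react_loop(task: str, max_steps: int = 5) -> str:
    observation = task
    for _ in range(max_steps):
        action = fake_reason(observation)            # Reason
        if action == "finish":
            return observation                       # task complete
        observation = fake_lookup_tool(observation)  # Act, then re-Observe
    return observation
```

Note the defining trait: the loop runs until the reasoning step decides the task is done, which is exactly what a linear CoT prompt cannot do.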

    IAM for WebSocket + Bedrock Streaming

    When Lambda streams Bedrock responses over a WebSocket API, it needs TWO permissions in its IAM role:
    bedrock:InvokeModelWithResponseStream – permission to stream from Bedrock (not just InvokeModel)
    execute-api:ManageConnections – permission to push to WebSocket clients via API Gateway
    The resource ARN for ManageConnections must include your specific API Gateway WebSocket API ID. Generic * ARNs are not best practice and may not work.
    Buffering streaming tokens in DynamoDB + DynamoDB Streams adds unnecessary complexity and latency. The solution is just correct IAM permissions – streaming is designed to work directly without buffering.
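A sketch of what the Lambda execution-role policy looks like with both permissions. The account ID, region, and API ID are placeholders; the key point is that ManageConnections is scoped to the specific WebSocket API rather than `*`.

```python
import json

# Placeholder identifiers -- substitute your own.
API_ID = "a1b2c3d4e5"        # your WebSocket API Gateway ID
REGION = "us-east-1"
ACCOUNT = "123456789012"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Streaming permission is separate from bedrock:InvokeModel.
            "Effect": "Allow",
            "Action": "bedrock:InvokeModelWithResponseStream",
            "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/*",
        },
        {
            # Lets Lambda push messages to connected WebSocket clients,
            # scoped to this one API rather than a generic "*".
            "Effect": "Allow",
            "Action": "execute-api:ManageConnections",
            "Resource": f"arn:aws:execute-api:{REGION}:{ACCOUNT}:{API_ID}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```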
2.2 – Model Deployment Strategies

    SageMaker Inference Type Decision Tree

| Type | Choose When | Instance | Key Trait |
|---|---|---|---|
| Real-time | Low latency (<1s), synchronous, constant traffic | Any | Always-on endpoint, billed per hour |
| Asynchronous | Long-running (image/video gen), large payloads, spiky traffic | Accelerated (GPU) | Queue-based, result to S3 |
| Serverless | Infrequent/bursty traffic, simple models, cost savings at idle | General purpose (auto) | Pay per invocation, cold starts possible |
| Batch | Offline, large dataset, no latency requirement | Any | Process millions at once, lowest cost |
    Image generation models are GPU-intensive and have variable processing time. They need Asynchronous inference + accelerated (GPU) instance types. Serverless endpoints use general-purpose instances which lack GPU acceleration for image gen workloads.
    Serverless = general compute, auto-scaled to zero. Async = queue-based, GPU-capable, designed for long-running ML inference.
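The decision tree above can be condensed into a toy chooser. This is a study aid only; the flags and their priority order are my simplification of the table, not an official AWS rubric.

```python
# Toy decision helper mirroring the inference-type table.

def choose_inference_type(latency_sensitive: bool, long_running: bool,
                          bursty: bool, offline: bool) -> str:
    if offline:
        return "batch"          # whole dataset at once, lowest cost
    if long_running:
        return "asynchronous"   # queue-based, GPU-capable, results to S3
    if latency_sensitive:
        return "real-time"      # always-on endpoint, sub-second responses
    if bursty:
        return "serverless"     # scales to zero, pay per invocation
    return "real-time"          # default for steady synchronous traffic
```

Running the exam's image-generation scenario through it (long-running, spiky) lands on asynchronous, matching the table's GPU-backed async row.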

    Shadow Testing vs A/B Testing – Validation Before Release

    Shadow Test ✓ – Safe Validation

  • New model receives a copy of prod traffic
  • Users only see old model responses
  • Zero production impact on users
  • Compare metrics: latency, accuracy, errors
  • Use to validate BEFORE releasing to users
    A/B Test (Production Variants)

  • Live traffic split between old + new model
  • Real users see different model responses
  • Good for incremental rollout / comparison
  • Risk: users see potentially worse responses
  • Use when you're ready to expose both models

    Scenario: "validate a new model version without impacting production users" → shadow test. If the question says "gradually shift traffic" → that's A/B or canary deployment.
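In SageMaker, shadow testing is configured on the endpoint config itself. The sketch below is shaped like a CreateEndpointConfig request with a `ShadowProductionVariants` section; the model names and instance types are placeholders.

```python
# Endpoint config sketch: the shadow variant receives a copy of traffic,
# but only the production variant's responses are returned to users.

endpoint_config = {
    "EndpointConfigName": "my-shadow-test-config",    # placeholder name
    "ProductionVariants": [
        {
            "VariantName": "prod-model-v1",
            "ModelName": "model-v1",                  # current model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    "ShadowProductionVariants": [
        {
            "VariantName": "shadow-model-v2",
            "ModelName": "model-v2",                  # candidate model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}
```

The separation between the two lists is the whole trick: promoting the candidate later just means moving it from the shadow list into ProductionVariants.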

    Bedrock On-Demand vs SageMaker – Which to Deploy To

    If the model is natively available in Amazon Bedrock (Amazon Nova, Claude, Titan, Llama via Bedrock) → use Bedrock on-demand inference. Zero infrastructure. Pay per token. Automatic scaling.
    If the model is custom-trained, fine-tuned by you, or from Hugging Face via JumpStart → deploy to a SageMaker endpoint (real-time or async).
    The exam tries to get you to deploy Nova or Claude to SageMaker. Don't take the bait. Bedrock-native models should always stay in Bedrock. SageMaker is for your own models.
2.3 – Enterprise Integration Architectures

    Async Document Processing Pattern

    Pattern for document upload + AI processing: S3 presigned URL (client uploads directly to S3, no API Gateway needed) → S3 Event Notification → SQS → Lambda → Bedrock (async processing) → results back to S3 or DynamoDB.
    Using WebSocket + InvokeModelWithResponseStream for document processing is the wrong pattern. Streaming is for interactive real-time output where users are waiting. Document processing is async – the user uploads and comes back for results.
    Presigned URLs keep large file uploads out of Lambda memory limits. S3 → SQS → Lambda = decoupled, resilient, retryable. This is the AWS-recommended pattern for file-based AI workflows.
    Scenario: "users upload documents for AI analysis" → presigned URL upload to S3 → event-driven pipeline. NOT API Gateway + streaming.
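The Lambda at the end of this pipeline mostly unwraps envelopes: SQS wraps the S3 event notification in each record body. A minimal sketch, with the Bedrock call stubbed out:

```python
import json

def handler(event, _context=None):
    """Unwrap S3 event notifications delivered via SQS and collect
    the (bucket, key) pairs to process."""
    documents = []
    for record in event.get("Records", []):
        # SQS wraps the S3 event JSON inside the record body.
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            documents.append((bucket, key))
            # Here: invoke Bedrock on the document, then write the
            # result to S3 or DynamoDB for the user to fetch later.
    return documents
```

Because SQS sits in the middle, a failed Lambda run simply returns the message to the queue for retry, which is what makes the pipeline resilient.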

    EventBridge + Step Functions for Video/Multi-Service Orchestration

    When you need to chain multiple AWS services (e.g., Rekognition + Transcribe + Bedrock), use an EventBridge rule → Step Functions state machine as the orchestrator. Step Functions calls each service API directly – no Lambda glue code needed for simple integrations.
    S3 upload → EventBridge rule (matches PutObject) → Step Functions workflow → state 1: Rekognition (object detection) → state 2: Bedrock FM (generate summary) → state 3: store results.
    BDA blueprints define document field extraction schemas – they are not workflow orchestrators. You cannot use a BDA blueprint to "orchestrate Rekognition and Bedrock."
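The three-state workflow above can be sketched as an Amazon States Language definition built as a Python dict. The ARNs, parameter shapes, and model ID are abbreviated placeholders, not a deployable definition; the point is the direct service integrations with no Lambda glue.

```python
# ASL sketch: each state calls a service API directly.

state_machine = {
    "StartAt": "DetectLabels",
    "States": {
        "DetectLabels": {
            "Type": "Task",
            # Direct SDK integration -- no Lambda wrapper needed.
            "Resource": "arn:aws:states:::aws-sdk:rekognition:detectLabels",
            "Parameters": {"Image": {"S3Object": {"Bucket.$": "$.bucket",
                                                  "Name.$": "$.key"}}},
            # Built-in retry for transient failures.
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 3}],
            "Next": "Summarize",
        },
        "Summarize": {
            "Type": "Task",
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Parameters": {"ModelId": "amazon.nova-lite-v1:0"},  # placeholder
            "Next": "StoreResults",
        },
        "StoreResults": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {"TableName": "results"},   # placeholder table
            "End": True,
        },
    },
}
```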

    API Gateway for Multi-Provider Model Routing

    Pattern: Single API Gateway REST API + non-proxy integrations + mapping templates (VTL) to transform request/response per provider + stage variables for endpoint URLs + AWS Secrets Manager for API keys.
    Never store API keys in client-side code. Never create one API Gateway per model provider – that's unmanageable and insecure. One API GW handles all providers through routing logic in mapping templates.
    Avoid: separate API GWs per model provider + client-side routing + API keys in the client app. This is a security disaster and a scaling nightmare.
    Header-based routing: API GW reads a custom header (e.g., X-Model-Provider) and uses a stage variable to route to the right backend model endpoint.
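The header-to-backend resolution, done in API Gateway via mapping templates and stage variables, can be modeled in a few lines. The header name, stage variable names, and URLs below are made up for illustration.

```python
# Model of the routing logic API Gateway performs: header value ->
# stage variable -> backend URL. All names/URLs are illustrative.

STAGE_VARIABLES = {
    "providerAUrl": "https://api.provider-a.example/v1/invoke",
    "providerBUrl": "https://api.provider-b.example/v1/invoke",
}

HEADER_TO_STAGE_VAR = {
    "provider-a": "providerAUrl",
    "provider-b": "providerBUrl",
}

def resolve_backend(headers: dict) -> str:
    """Pick the backend endpoint from the X-Model-Provider header,
    defaulting to provider-a when the header is absent or unknown."""
    provider = headers.get("X-Model-Provider", "provider-a").lower()
    stage_var = HEADER_TO_STAGE_VAR.get(provider, "providerAUrl")
    return STAGE_VARIABLES[stage_var]
```

API keys never appear in this path at all: the backend credentials live in Secrets Manager, resolved server-side.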

    RAG vs Streaming – Same Answer, Different Context

    This pattern appears multiple times. Memorize it:
    Accuracy problem (wrong answers, stale data, hallucination about your products) → RAG + Knowledge Bases with an up-to-date product catalog.

    Latency/UX problem (users waiting too long to see any response) → response streaming (show tokens as they arrive).

    These solve completely different things. Streaming the wrong answer is still the wrong answer.
2.4 – FM API Integrations

    Converse API vs InvokeModel API

    Converse API = unified multi-turn conversation interface. Works across all Bedrock models with a consistent message format. Automatically handles system prompts, conversation history, tool use. Best for chat applications.
    InvokeModel API = model-specific, lower-level. Each model has its own request/response format. More control, but you have to handle the format differences yourself.
    Fine-tuning data format = JSONL in Converse API format (not InvokeModel format). The exam specifies this when asking about fine-tuning data prep with Glue ETL.
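The Converse message shape is worth memorizing: each message is a role plus a list of content blocks. The sketch below mirrors that request structure; the model ID and prompt text are placeholders.

```python
# Converse-style request sketch: role + content-block messages,
# a system prompt, and inference settings in one uniform shape.

conversation = [
    {"role": "user", "content": [{"text": "Summarize our return policy."}]},
    {"role": "assistant", "content": [{"text": "Returns are accepted..."}]},
    {"role": "user", "content": [{"text": "What about opened items?"}]},
]

request = {
    "modelId": "amazon.nova-lite-v1:0",        # placeholder model ID
    "messages": conversation,                  # full multi-turn history
    "system": [{"text": "You are a retail support assistant."}],
    "inferenceConfig": {"maxTokens": 256, "temperature": 0.2},
}
```

This same message shape is what each JSONL record in a Converse-format fine-tuning dataset carries, which is why the exam pairs it with Glue ETL data prep.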

    InvokeModelWithResponseStream – Streaming Specifics

    For streaming responses: use InvokeModelWithResponseStream (not InvokeModel). The response comes back as a stream of chunks (SSE events) that you send progressively to the client.
    IAM permission needed: bedrock:InvokeModelWithResponseStream – this is a separate permission from bedrock:InvokeModel. Both must be granted if your app uses both.
    For WebSocket delivery: Lambda needs bedrock:InvokeModelWithResponseStream AND execute-api:ManageConnections. The ManageConnections permission is what lets Lambda push to WebSocket clients.
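The consumption pattern can be simulated without AWS at all. Below, the event shape loosely mirrors a stream of text chunks, and `send` stands in for the post-to-connection push that ManageConnections authorizes; none of this is the literal Bedrock chunk schema.

```python
# Simulated streaming consumer: forward each chunk to the client as it
# arrives, while also accumulating the full response.

sent = []

def send(fragment: str):
    # Stand-in for pushing a fragment to a WebSocket client
    # (the real call requires execute-api:ManageConnections).
    sent.append(fragment)

def consume_stream(stream):
    full = []
    for event in stream:
        fragment = event["chunk"]["text"]
        send(fragment)        # user sees tokens immediately
        full.append(fragment)
    return "".join(full)

fake_stream = [{"chunk": {"text": "Hel"}}, {"chunk": {"text": "lo!"}}]
result = consume_stream(fake_stream)   # -> "Hello!"
```

The key property: the client starts seeing output after the first chunk, not after the whole generation finishes.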
2.5 – Application Integration Patterns & Development Tools

    Amazon Q Developer – What It Actually Does

    Q Developer is a full AI coding assistant – not just autocomplete. Key capabilities:

  • Contextual code suggestions – understands YOUR project's codebase, not just generic patterns. It reads your existing code to give relevant suggestions.
  • Code generation + refactoring – generate new functions, refactor existing ones, rename variables across files.
  • API guidance – explains AWS service APIs and how to use them correctly in context.
  • Performance optimization – identifies bottlenecks and suggests improvements.
  • Security analysis – flags security issues in code (one of many capabilities, not the only one).
    The exam tries to limit Q Developer to "security analysis only" or "just code completions and documentation lookups." Q Developer does all of the above plus code transformation, test generation, and modernization.
    Key word: contextual – it understands your project's specific code, dependencies, and patterns. Not just generic AWS documentation lookups.

    Inference Profiles for Cost Attribution

    Inference profiles are a routing and tagging mechanism. When you invoke Bedrock through an inference profile, AWS automatically tags the API calls with the profile's metadata for cost reporting.
    Create one inference profile per business unit / cost center / clinic. Invoke models via the profile ARN instead of the model ARN directly. AWS Cost Explorer shows costs broken down by profile.
    Scenario: "medical company, multiple clinics, need to track Bedrock costs per clinic" → create one inference profile per clinic ID → each Lambda invocation uses the clinic's profile → cost reports per clinic.
    Routing by S3 key prefix or tagging Lambda functions doesn't give you clean Bedrock cost attribution. Only inference profiles give you direct cost breakdown at the Bedrock API level.
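The per-clinic wiring reduces to passing a profile ARN instead of a model ARN as the model ID. A minimal sketch, with fabricated placeholder ARNs:

```python
# Map each clinic to its own application inference profile. Calls made
# through a profile ARN are attributed to that profile in cost reports.

CLINIC_PROFILES = {
    "clinic-001": "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/p1",
    "clinic-002": "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/p2",
}

def model_id_for(clinic_id: str) -> str:
    """Return the profile ARN to pass as the model ID for this clinic."""
    try:
        return CLINIC_PROFILES[clinic_id]
    except KeyError:
        raise ValueError(f"no inference profile registered for {clinic_id}")
```

The invoking Lambda never references the model ARN directly; swapping the profile per request is what makes Cost Explorer's per-clinic breakdown possible.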

    Bedrock Flows vs Agents vs Strands – When to Use Each

| Service | Use For | Dynamic? |
|---|---|---|
| Bedrock Flows | Fixed multi-step LLM pipelines with predictable paths | No – steps predefined |
| Bedrock Agents | Single-domain agent with defined action groups (tools) | Yes – LLM picks which tool |
| Strands SDK | Multi-specialist agents, MCP tool integration, complex orchestration | Yes – dynamic at runtime |
    Fixed sequential document summarization pipeline → Flows.
    Customer service bot that can look up orders OR check inventory → Bedrock Agent.
    Platform with billing, tech support, AND account agents, each needing their own context → Strands.