🟠 Domain 2 – Implementation & Integration

Subdomains 2.1–2.5 · Your gap areas: 2.1 Agents · 2.2 Deployment · 2.3 Architecture · 2.5 App Integration

2.1 – Agentic AI Solutions & Tool Integration

Strands SDK – When Single Bedrock Agents Aren't Enough

Strands Agents SDK is an open-source framework for building multi-agent systems where different specialist agents handle different domains. Think of it as a routing layer that directs queries to the right expert agent.
MCP (Model Context Protocol) servers = the connectors between your Strands agent and external tools/APIs. Each company system or tool gets its own MCP server. The agent dynamically selects which MCP server to call at runtime.

Use Strands + MCP When

  • Multiple specialist domains needed
  • Dynamic tool selection at runtime
  • Complex multi-step, multi-domain workflows
  • LLM needs to decide which tool to call
  • Each tool = its own MCP server deployed as Lambda
Use Bedrock Flows When

  • Fixed, predictable execution path
  • Sequential steps you define in advance
  • Deterministic pipelines (no dynamic routing)
  • Simple LLM → Lambda → LLM chains
  • NOT for dynamic tool selection

Bedrock Flows implements predefined sequential steps with fixed logic. If the scenario involves an LLM deciding at runtime which tool to call, or routing between specialist domains, Flows can't do that – use Strands.

    Strands Specialist Multi-Agent Pattern

    Pattern: Create one specialist agent per domain (BillingAgent, TechSupportAgent, AccountAgent). Deploy each as a Lambda function. Register them all in Strands. An orchestrator agent routes user requests to the right specialist.
    A single Bedrock Agent with many OpenAPI action groups becomes unwieldy for complex multi-domain scenarios. The LLM has to process all the tool schemas at once, increasing latency and confusion. Specialist agents keep the context focused.
    Scenario: "AI assistant handles subscriptions, billing, AND tech support" → Strands with specialist agents (one per domain) routed by an orchestrator. Not a single Bedrock Agent trying to do everything.
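The routing idea behind the specialist pattern can be sketched in plain Python. This is a toy stand-in, not the real Strands API: the agent functions and keyword matching are illustrative, and in the actual SDK an orchestrator LLM (not keywords) picks the specialist.

```python
# Toy sketch of specialist routing. Each "agent" below stands in for a
# Lambda-deployed specialist; the keyword matcher stands in for the
# orchestrator LLM's routing decision.

def billing_agent(query: str) -> str:
    return f"[BillingAgent] handling: {query}"

def tech_support_agent(query: str) -> str:
    return f"[TechSupportAgent] handling: {query}"

def account_agent(query: str) -> str:
    return f"[AccountAgent] handling: {query}"

# Registry of specialist agents, one per domain.
SPECIALISTS = {
    "billing": billing_agent,
    "support": tech_support_agent,
    "account": account_agent,
}

def orchestrate(query: str) -> str:
    """Route the query to the matching specialist; default to account."""
    lowered = query.lower()
    for domain, agent in SPECIALISTS.items():
        if domain in lowered:
            return agent(query)
    return account_agent(query)
```

The point of the pattern survives even in this sketch: each specialist sees only its own domain's context, instead of one agent juggling every tool schema at once.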

    ReAct Pattern vs Chain-of-Thought in Step Functions

    ReAct (Reasoning + Acting)

  • Interleaves reasoning and action
  • Observe → Reason → Act → Observe again…
  • Dynamic – loops until task complete
  • Good for: tool use, multi-step data retrieval
  • In Step Functions: states for Observe/Reason/Act
    Chain-of-Thought (CoT)

  • Explicit reasoning steps in the prompt
  • Linear – no action loop, no tool calls
  • Good for: complex math, logical reasoning
  • Not designed for tool use or dynamic workflows
  • Pairing CoT with Choice states is not how ReAct works

    When the exam asks about Step Functions + FM for a dynamic data retrieval/workflow system → ReAct pattern with states for Observation, Reasoning, and Action. Use built-in Step Functions error handling + retry logic for model invocations.
    Chain-of-thought with "Choice states to dynamically adjust the reasoning path" is CoT terminology mixed with Step Functions, not the right architecture for agentic workflows.
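The Observe → Reason → Act loop can be shown with stubs. Everything here is a stand-in: `fake_reason` plays the role of a Bedrock model call deciding the next action, and `fake_lookup_tool` plays a Lambda-backed data source. In Step Functions, each phase would be its own state with Retry/Catch attached.

```python
# Sketch of a ReAct loop with stubbed model and tool calls.

def fake_reason(observation: str) -> str:
    # Stand-in for an FM invocation: choose the next action
    # based on what has been observed so far.
    return "finish" if "42" in observation else "lookup"

def fake_lookup_tool(_query: str) -> str:
    # Stand-in for a retrieval tool (e.g., a Lambda-backed API).
    return "the answer is 42"

def react_loop(task: str, max_steps: int = 5) -> str:
    observation = task
    for _ in range(max_steps):
        action = fake_reason(observation)            # Reason
        if action == "finish":
            return observation                       # task complete
        observation = fake_lookup_tool(observation)  # Act, then re-Observe
    return observation
```

Note the defining trait: the loop runs until the reasoning step decides the task is done, which is exactly what a linear CoT prompt cannot do.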

    IAM for WebSocket + Bedrock Streaming

    When Lambda streams Bedrock responses over a WebSocket API, it needs TWO permissions in its IAM role:
    bedrock:InvokeModelWithResponseStream – permission to stream from Bedrock (not just InvokeModel)
    execute-api:ManageConnections – permission to push to WebSocket clients via API Gateway
    The resource ARN for ManageConnections must include your specific API Gateway WebSocket API ID. Generic * ARNs are not best practice and may not work.
    Buffering streaming tokens in DynamoDB + DynamoDB Streams adds unnecessary complexity and latency. The solution is just correct IAM permissions – streaming is designed to work directly without buffering.
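A sketch of what the Lambda execution-role policy looks like with both permissions. The account ID, region, and API ID are placeholders; the key point is that ManageConnections is scoped to the specific WebSocket API rather than `*`.

```python
import json

# Placeholder identifiers -- substitute your own.
API_ID = "a1b2c3d4e5"        # your WebSocket API Gateway ID
REGION = "us-east-1"
ACCOUNT = "123456789012"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Streaming permission is separate from bedrock:InvokeModel.
            "Effect": "Allow",
            "Action": "bedrock:InvokeModelWithResponseStream",
            "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/*",
        },
        {
            # Lets Lambda push messages to connected WebSocket clients,
            # scoped to this one API rather than a generic "*".
            "Effect": "Allow",
            "Action": "execute-api:ManageConnections",
            "Resource": f"arn:aws:execute-api:{REGION}:{ACCOUNT}:{API_ID}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```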
2.2 – Model Deployment Strategies

    SageMaker Inference Type Decision Tree

| Type | Choose When | Instance | Key Trait |
|---|---|---|---|
| Real-time | Low latency (<1s), synchronous, constant traffic | Any | Always-on endpoint, billed per hour |
| Asynchronous | Long-running (image/video gen), large payloads, spiky traffic | Accelerated (GPU) | Queue-based, result to S3 |
| Serverless | Infrequent/bursty traffic, simple models, cost savings at idle | General purpose (auto) | Pay per invocation, cold starts possible |
| Batch | Offline, large dataset, no latency requirement | Any | Process millions at once, lowest cost |
    Image generation models are GPU-intensive and have variable processing time. They need Asynchronous inference + accelerated (GPU) instance types. Serverless endpoints use general-purpose instances which lack GPU acceleration for image gen workloads.
    Serverless = general compute, auto-scaled to zero. Async = queue-based, GPU-capable, designed for long-running ML inference.
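The decision tree above can be condensed into a toy chooser. This is a study aid only; the flags and their priority order are my simplification of the table, not an official AWS rubric.

```python
# Toy decision helper mirroring the inference-type table.

def choose_inference_type(latency_sensitive: bool, long_running: bool,
                          bursty: bool, offline: bool) -> str:
    if offline:
        return "batch"          # whole dataset at once, lowest cost
    if long_running:
        return "asynchronous"   # queue-based, GPU-capable, results to S3
    if latency_sensitive:
        return "real-time"      # always-on endpoint, sub-second responses
    if bursty:
        return "serverless"     # scales to zero, pay per invocation
    return "real-time"          # default for steady synchronous traffic
```

Running the exam's image-generation scenario through it (long-running, spiky) lands on asynchronous, matching the table's GPU-backed async row.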

    Shadow Testing vs A/B Testing – Validation Before Release

    Shadow Test ✓ – Safe Validation

  • New model receives a copy of prod traffic
  • Users only see old model responses
  • Zero production impact on users
  • Compare metrics: latency, accuracy, errors
  • Use to validate BEFORE releasing to users
    A/B Test (Production Variants)

  • Live traffic split between old + new model
  • Real users see different model responses
  • Good for incremental rollout / comparison
  • Risk: users see potentially worse responses
  • Use when you're ready to expose both models

    Scenario: "validate a new model version without impacting production users" → shadow test. If the question says "gradually shift traffic" → that's A/B or canary deployment.
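In SageMaker, shadow testing is configured on the endpoint config itself. The sketch below is shaped like a CreateEndpointConfig request with a `ShadowProductionVariants` section; the model names and instance types are placeholders.

```python
# Endpoint config sketch: the shadow variant receives a copy of traffic,
# but only the production variant's responses are returned to users.

endpoint_config = {
    "EndpointConfigName": "my-shadow-test-config",    # placeholder name
    "ProductionVariants": [
        {
            "VariantName": "prod-model-v1",
            "ModelName": "model-v1",                  # current model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    "ShadowProductionVariants": [
        {
            "VariantName": "shadow-model-v2",
            "ModelName": "model-v2",                  # candidate model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}
```

The separation between the two lists is the whole trick: promoting the candidate later just means moving it from the shadow list into ProductionVariants.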

    Bedrock On-Demand vs SageMaker – Which to Deploy To

    If the model is natively available in Amazon Bedrock (Amazon Nova, Claude, Titan, Llama via Bedrock) → use Bedrock on-demand inference. Zero infrastructure. Pay per token. Automatic scaling.
    If the model is custom-trained, fine-tuned by you, or from Hugging Face via JumpStart → deploy to a SageMaker endpoint (real-time or async).
    The exam tries to get you to deploy Nova or Claude to SageMaker. Don't take the bait. Bedrock-native models should always stay in Bedrock. SageMaker is for your own models.
2.3 – Enterprise Integration Architectures

    Async Document Processing Pattern

    Pattern for document upload + AI processing: S3 presigned URL (client uploads directly to S3, no API Gateway needed) → S3 Event Notification → SQS → Lambda → Bedrock (async processing) → results back to S3 or DynamoDB.
    Using WebSocket + InvokeModelWithResponseStream for document processing is the wrong pattern. Streaming is for interactive real-time output where users are waiting. Document processing is async – the user uploads and comes back for results.
    Presigned URLs keep large file uploads out of Lambda memory limits. S3 → SQS → Lambda = decoupled, resilient, retryable. This is the AWS-recommended pattern for file-based AI workflows.
    Scenario: "users upload documents for AI analysis" → presigned URL upload to S3 → event-driven pipeline. NOT API Gateway + streaming.
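The Lambda at the end of this pipeline mostly unwraps envelopes: SQS wraps the S3 event notification in each record body. A minimal sketch, with the Bedrock call stubbed out:

```python
import json

def handler(event, _context=None):
    """Unwrap S3 event notifications delivered via SQS and collect
    the (bucket, key) pairs to process."""
    documents = []
    for record in event.get("Records", []):
        # SQS wraps the S3 event JSON inside the record body.
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            documents.append((bucket, key))
            # Here: invoke Bedrock on the document, then write the
            # result to S3 or DynamoDB for the user to fetch later.
    return documents
```

Because SQS sits in the middle, a failed Lambda run simply returns the message to the queue for retry, which is what makes the pipeline resilient.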

    EventBridge + Step Functions for Video/Multi-Service Orchestration

    When you need to chain multiple AWS services (e.g., Rekognition + Transcribe + Bedrock), use an EventBridge rule → Step Functions state machine as the orchestrator. Step Functions calls each service API directly – no Lambda glue code needed for simple integrations.
    S3 upload → EventBridge rule (matches PutObject) → Step Functions workflow → state 1: Rekognition (object detection) → state 2: Bedrock FM (generate summary) → state 3: store results.
    BDA blueprints define document field extraction schemas – they are not workflow orchestrators. You cannot use a BDA blueprint to "orchestrate Rekognition and Bedrock."
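The three-state workflow above can be sketched as an Amazon States Language definition built as a Python dict. The ARNs, parameter shapes, and model ID are abbreviated placeholders, not a deployable definition; the point is the direct service integrations with no Lambda glue.

```python
# ASL sketch: each state calls a service API directly.

state_machine = {
    "StartAt": "DetectLabels",
    "States": {
        "DetectLabels": {
            "Type": "Task",
            # Direct SDK integration -- no Lambda wrapper needed.
            "Resource": "arn:aws:states:::aws-sdk:rekognition:detectLabels",
            "Parameters": {"Image": {"S3Object": {"Bucket.$": "$.bucket",
                                                  "Name.$": "$.key"}}},
            # Built-in retry for transient failures.
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 3}],
            "Next": "Summarize",
        },
        "Summarize": {
            "Type": "Task",
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Parameters": {"ModelId": "amazon.nova-lite-v1:0"},  # placeholder
            "Next": "StoreResults",
        },
        "StoreResults": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {"TableName": "results"},   # placeholder table
            "End": True,
        },
    },
}
```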

    API Gateway for Multi-Provider Model Routing

    Pattern: Single API Gateway REST API + non-proxy integrations + mapping templates (VTL) to transform request/response per provider + stage variables for endpoint URLs + AWS Secrets Manager for API keys.
    Never store API keys in client-side code. Never create one API Gateway per model provider – that's unmanageable and insecure. One API GW handles all providers through routing logic in mapping templates.
    Avoid: separate API GWs per model provider + client-side routing + API keys in the client app. This is a security disaster and a scaling nightmare.
    Header-based routing: API GW reads a custom header (e.g., X-Model-Provider) and uses a stage variable to route to the right backend model endpoint.
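The header-to-backend resolution, done in API Gateway via mapping templates and stage variables, can be modeled in a few lines. The header name, stage variable names, and URLs below are made up for illustration.

```python
# Model of the routing logic API Gateway performs: header value ->
# stage variable -> backend URL. All names/URLs are illustrative.

STAGE_VARIABLES = {
    "providerAUrl": "https://api.provider-a.example/v1/invoke",
    "providerBUrl": "https://api.provider-b.example/v1/invoke",
}

HEADER_TO_STAGE_VAR = {
    "provider-a": "providerAUrl",
    "provider-b": "providerBUrl",
}

def resolve_backend(headers: dict) -> str:
    """Pick the backend endpoint from the X-Model-Provider header,
    defaulting to provider-a when the header is absent or unknown."""
    provider = headers.get("X-Model-Provider", "provider-a").lower()
    stage_var = HEADER_TO_STAGE_VAR.get(provider, "providerAUrl")
    return STAGE_VARIABLES[stage_var]
```

API keys never appear in this path at all: the backend credentials live in Secrets Manager, resolved server-side.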

    RAG vs Streaming – Same Answer, Different Context

    This pattern appears multiple times. Memorize it:
    Accuracy problem (wrong answers, stale data, hallucination about your products) → RAG + Knowledge Bases with an up-to-date product catalog.

    Latency/UX problem (users waiting too long to see any response) → response streaming (show tokens as they arrive).

    These solve completely different things. Streaming the wrong answer is still the wrong answer.
2.4 – FM API Integrations

    Converse API vs InvokeModel API

    Converse API = unified multi-turn conversation interface. Works across all Bedrock models with a consistent message format. Automatically handles system prompts, conversation history, tool use. Best for chat applications.
    InvokeModel API = model-specific, lower-level. Each model has its own request/response format. More control, but you have to handle the format differences yourself.
    Fine-tuning data format = JSONL in Converse API format (not InvokeModel format). The exam specifies this when asking about fine-tuning data prep with Glue ETL.
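The Converse message shape is worth memorizing: each message is a role plus a list of content blocks. The sketch below mirrors that request structure; the model ID and prompt text are placeholders.

```python
# Converse-style request sketch: role + content-block messages,
# a system prompt, and inference settings in one uniform shape.

conversation = [
    {"role": "user", "content": [{"text": "Summarize our return policy."}]},
    {"role": "assistant", "content": [{"text": "Returns are accepted..."}]},
    {"role": "user", "content": [{"text": "What about opened items?"}]},
]

request = {
    "modelId": "amazon.nova-lite-v1:0",        # placeholder model ID
    "messages": conversation,                  # full multi-turn history
    "system": [{"text": "You are a retail support assistant."}],
    "inferenceConfig": {"maxTokens": 256, "temperature": 0.2},
}
```

This same message shape is what each JSONL record in a Converse-format fine-tuning dataset carries, which is why the exam pairs it with Glue ETL data prep.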

    InvokeModelWithResponseStream – Streaming Specifics

    For streaming responses: use InvokeModelWithResponseStream (not InvokeModel). The response comes back as a stream of chunks (SSE events) that you send progressively to the client.
    IAM permission needed: bedrock:InvokeModelWithResponseStream – this is a separate permission from bedrock:InvokeModel. Both must be granted if your app uses both.
    For WebSocket delivery: Lambda needs bedrock:InvokeModelWithResponseStream AND execute-api:ManageConnections. The ManageConnections permission is what lets Lambda push to WebSocket clients.
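The consumption pattern can be simulated without AWS at all. Below, the event shape loosely mirrors a stream of text chunks, and `send` stands in for the post-to-connection push that ManageConnections authorizes; none of this is the literal Bedrock chunk schema.

```python
# Simulated streaming consumer: forward each chunk to the client as it
# arrives, while also accumulating the full response.

sent = []

def send(fragment: str):
    # Stand-in for pushing a fragment to a WebSocket client
    # (the real call requires execute-api:ManageConnections).
    sent.append(fragment)

def consume_stream(stream):
    full = []
    for event in stream:
        fragment = event["chunk"]["text"]
        send(fragment)        # user sees tokens immediately
        full.append(fragment)
    return "".join(full)

fake_stream = [{"chunk": {"text": "Hel"}}, {"chunk": {"text": "lo!"}}]
result = consume_stream(fake_stream)   # -> "Hello!"
```

The key property: the client starts seeing output after the first chunk, not after the whole generation finishes.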
2.5 – Application Integration Patterns & Development Tools

    Amazon Q Developer – What It Actually Does

    Q Developer is a full AI coding assistant – not just autocomplete. Key capabilities:

  • Contextual code suggestions – understands YOUR project's codebase, not just generic patterns. It reads your existing code to give relevant suggestions.
  • Code generation + refactoring – generate new functions, refactor existing ones, rename variables across files.
  • API guidance – explains AWS service APIs and how to use them correctly in context.
  • Performance optimization – identifies bottlenecks and suggests improvements.
  • Security analysis – flags security issues in code (one of many capabilities, not the only one).
    The exam tries to limit Q Developer to "security analysis only" or "just code completions and documentation lookups." Q Developer does all of the above plus code transformation, test generation, and modernization.
    Key word: contextual – it understands your project's specific code, dependencies, and patterns. Not just generic AWS documentation lookups.

    Inference Profiles for Cost Attribution

    Inference profiles are a routing and tagging mechanism. When you invoke Bedrock through an inference profile, AWS automatically tags the API calls with the profile's metadata for cost reporting.
    Create one inference profile per business unit / cost center / clinic. Invoke models via the profile ARN instead of the model ARN directly. AWS Cost Explorer shows costs broken down by profile.
    Scenario: "medical company, multiple clinics, need to track Bedrock costs per clinic" → create one inference profile per clinic ID → each Lambda invocation uses the clinic's profile → cost reports per clinic.
    Routing by S3 key prefix or tagging Lambda functions doesn't give you clean Bedrock cost attribution. Only inference profiles give you direct cost breakdown at the Bedrock API level.
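The per-clinic wiring reduces to passing a profile ARN instead of a model ARN as the model ID. A minimal sketch, with fabricated placeholder ARNs:

```python
# Map each clinic to its own application inference profile. Calls made
# through a profile ARN are attributed to that profile in cost reports.

CLINIC_PROFILES = {
    "clinic-001": "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/p1",
    "clinic-002": "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/p2",
}

def model_id_for(clinic_id: str) -> str:
    """Return the profile ARN to pass as the model ID for this clinic."""
    try:
        return CLINIC_PROFILES[clinic_id]
    except KeyError:
        raise ValueError(f"no inference profile registered for {clinic_id}")
```

The invoking Lambda never references the model ARN directly; swapping the profile per request is what makes Cost Explorer's per-clinic breakdown possible.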

    Bedrock Flows vs Agents vs Strands – When to Use Each

| Service | Use For | Dynamic? |
|---|---|---|
| Bedrock Flows | Fixed multi-step LLM pipelines with predictable paths | No – steps predefined |
| Bedrock Agents | Single-domain agent with defined action groups (tools) | Yes – LLM picks which tool |
| Strands SDK | Multi-specialist agents, MCP tool integration, complex orchestration | Yes – dynamic at runtime |
    Fixed sequential document summarization pipeline → Flows.
    Customer service bot that can look up orders OR check inventory → Bedrock Agent.
    Platform with billing, tech support, AND account agents, each needing their own context → Strands.