AWS DVA-C02 — Complete Study Guide

Domain 1 Overview

Develop, test, and maintain applications on AWS. Covers architectural patterns, AWS SDKs and APIs, messaging and event-driven services, Lambda development, and data store integration. The highest-weighted domain at 32%.

⚡ 32% of scored content — Highest weighted domain

Task 1.1

Develop Code for Applications Hosted on AWS

Architectural patterns, SDK usage, API design, messaging services, streaming, and event-driven patterns.

Skills in:

Describe architectural patterns (event-driven, microservices, monolithic, choreography, orchestration, fanout)
Describe differences between stateful/stateless and tightly/loosely coupled components
Describe differences between synchronous and asynchronous patterns
Write and run unit tests; write code that interacts with AWS services via APIs and SDKs
Use messaging services, handle streaming data, implement resilient application code
Use Amazon EventBridge for event-driven patterns; use Amazon Q Developer

🏗️ Architectural Patterns — Event-Driven, Microservices, Choreography vs. Orchestration

Architecture · Exam Fave

The DVA-C02 exam tests your ability to choose the right architectural pattern for a given scenario. Understand the tradeoffs deeply.

Pattern Comparison

Pattern	Description	AWS Services	Best For
Monolithic	Single deployable unit; tightly coupled	EC2, Elastic Beanstalk	Simple apps, small teams, early stage
Microservices	Independent services with own data stores	ECS, EKS, Lambda, API GW	Scale independently, team autonomy
Event-Driven	Components react to events asynchronously	EventBridge, SNS, SQS, Lambda	Decoupled, scalable workflows
Fanout	One publisher → multiple subscribers	SNS → SQS queues	Parallel processing of same event

Choreography vs. Orchestration

	Choreography	Orchestration
Control	Decentralized — each service reacts to events	Centralized — one service directs the flow
Coupling	Loose — services don't know each other	Tighter — orchestrator knows all services
AWS Service	EventBridge, SNS, SQS	AWS Step Functions
Visibility	Harder to trace the full workflow	Full execution history & audit trail
Best for	Simple, independent workflows	Complex multi-step processes with error handling

Fanout Pattern

An e-commerce order event is published to an SNS topic. Three SQS queues subscribe: one for inventory, one for shipping, one for email notifications. All three process the order in parallel. This is the classic SNS fanout pattern — one message, multiple independent consumers.
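
A minimal sketch of the publisher side of this fanout, assuming the three SQS queues are already subscribed to the topic. The function name, the `eventType` attribute, and the topic ARN are illustrative; `sns_client` is expected to be something like `boto3.client("sns")`, passed in so it can be stubbed in unit tests.

```python
import json


def publish_order_event(sns_client, topic_arn, order):
    """Publish one order event; SNS copies it to every subscribed queue."""
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps(order),
        # Message attributes let subscriptions apply filter policies.
        MessageAttributes={
            "eventType": {"DataType": "String", "StringValue": "OrderPlaced"}
        },
    )
    return response["MessageId"]
```

The publisher makes a single call and never learns how many consumers exist, which is what keeps inventory, shipping, and email independent of each other.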

Stateful vs. Stateless

Stateless: No session data stored in the application instance — any request can go to any instance. Store state externally (DynamoDB, ElastiCache, S3). Lambda functions are stateless by design.
Stateful: Session data in memory — the same client must hit the same instance. Requires sticky sessions (ALB) or session replication. Harder to scale horizontally.

Tight vs. Loose Coupling

Tightly coupled: Service A calls Service B synchronously and waits. If B fails, A fails. Direct function calls or synchronous HTTP are examples.
Loosely coupled: Service A sends a message to a queue or topic and continues. B processes when ready. SQS, SNS, and EventBridge enable loose coupling.

Sync vs. Async Patterns

Synchronous	Asynchronous
Caller waits for response	Caller fires and continues
REST API calls, Lambda invoked by API GW	SQS, SNS, EventBridge, Lambda async invoke
Immediate feedback	Better scalability and resilience
Timeout risk under load	Backpressure handled by queue depth
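
The difference between the two columns can be sketched with the Lambda `invoke` API, where only the `InvocationType` changes. This is an illustration under assumptions: the helper names are made up, and `lambda_client` stands for a client such as `boto3.client("lambda")` that is passed in by the caller.

```python
import json


def invoke_sync(lambda_client, function_name, payload):
    """Synchronous: block until the function returns, then read its result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


def invoke_async(lambda_client, function_name, payload):
    """Asynchronous: Lambda queues the event and answers with HTTP 202."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]
```

With the synchronous call the caller owns error handling and retries; with the asynchronous call the caller only learns that the event was accepted.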

💡

Mnemonic — CHESS: Choreography = each service reacts. Hard to trace. EventBridge/SNS. Step Functions = State machine for orchestration.

🎯

"Long-running multi-step workflow" → Step Functions Standard. "Multiple services react to same event" → SNS fanout to SQS. "Decouple producer from consumer" → SQS. "Route events based on rules" → EventBridge.

⚠️

Common traps:

"Microservices are always better than monolithic" — FALSE; microservices add operational complexity. For simple apps, a monolith is appropriate.
"Lambda functions are stateful within their execution environment" — PARTIALLY TRUE; the execution context can be reused between warm invocations, but you cannot rely on it. Store durable state externally.
"Choreography is better than orchestration because it's loosely coupled" — CONTEXT-DEPENDENT; orchestration (Step Functions) is preferred when you need visibility, error handling, and complex branching logic.

🔧 AWS SDK, APIs & Credential Chain

SDK · High Frequency

Every interaction with AWS from code goes through the SDK. Understanding the credential resolution chain and retry behavior is critical for DVA-C02.

SDK Credential Resolution Order (checked in order)

1. Explicit in code — hardcoded (never do this in production)
2. Environment variables — AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
3. AWS credentials file — ~/.aws/credentials (default profile)
4. AWS config file — ~/.aws/config
5. Container credentials — ECS task role via metadata endpoint
6. Instance profile — EC2 IAM Role via IMDS (169.254.169.254)

Retry Logic & Exponential Backoff

AWS SDKs automatically retry retryable errors (e.g., throttling errors such as ThrottlingException and 5xx server errors)
Exponential backoff: Wait 2ⁿ seconds × random jitter between retries — prevents thundering herd
Non-retryable client errors (4xx such as AccessDeniedException or validation failures) are NOT retried by the SDK
If DynamoDB's ProvisionedThroughputExceededException or SQS throttling persists after the SDK's built-in retries are exhausted, implement additional backoff in your code
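
The backoff rule above can be sketched in a few lines. This is not the SDK's internal implementation: it is a hand-rolled "full jitter" variant, and `backoff_delay`, `call_with_retries`, the 1-second base, and the 20-second cap are all assumptions chosen for illustration.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=20.0, rng=random.random):
    """Full-jitter delay in seconds: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(operation, is_retryable, max_attempts=5, sleep=time.sleep):
    """Call operation(); retry only when is_retryable(error) says so."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(error):
                raise
            sleep(backoff_delay(attempt))
```

The random factor spreads retries from many clients over time, and the `is_retryable` check mirrors the SDK rule that errors like access denials fail immediately.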

Common API Patterns

Pattern	Description	Example
Paginator	Auto-handles NextToken pagination	`list_objects_v2` on large S3 buckets
Waiter	Poll until resource reaches target state	Wait for EC2 instance to be running
Presigned URL	Temporary, signed URL for S3 access	Upload/download without credentials
Batch operations	Reduce API calls by batching	DynamoDB BatchWriteItem, SQS SendMessageBatch
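
To show what a paginator automates, here is a sketch of the manual loop for `list_objects_v2`. The function name is hypothetical, and `s3_client` is assumed to be a client such as `boto3.client("s3")` supplied by the caller.

```python
def list_all_keys(s3_client, bucket, prefix=""):
    """Collect every object key under a prefix, following continuation tokens."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

In real code, `s3_client.get_paginator("list_objects_v2").paginate(Bucket=bucket)` yields the same pages without the token bookkeeping.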

# Example: S3 presigned URL generation (Python boto3)
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'file.pdf'},
    ExpiresIn=3600  # URL valid for 1 hour
)

🎯

Never hardcode credentials. On EC2 → use IAM Role. On Lambda → use execution role. On ECS → use task role. The SDK resolves credentials in the chain order above automatically.

⚠️

Common traps:

"SDK retries all errors automatically" — FALSE; only retryable errors (5xx, throttling). AccessDenied (403) is not retried.
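"Presigned URLs can be used to grant permanent access to S3 objects" — FALSE; they are time-limited. With SigV4 the maximum is 7 days when signed with long-term IAM user credentials; a URL signed with temporary credentials (an IAM role or STS session) stops working when those credentials expire, even if ExpiresIn is longer.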
"Environment variable credentials override instance profile credentials" — TRUE; they are checked earlier in the chain (position 2 vs 6).

🌐 Amazon API Gateway — REST, HTTP & WebSocket APIs

API Gateway · High Frequency

API Gateway is the front door for serverless applications. The DVA-C02 exam tests all three API types, authorization methods, stages, and caching.

API Types Compared

Feature	REST API	HTTP API	WebSocket API
Protocol	HTTP/HTTPS	HTTP/HTTPS	WebSocket
Cost	Higher	~70% cheaper	Per message/connection
Features	Full-featured (caching, WAF, usage plans)	Lightweight, low-latency	Persistent connections
Auth	Lambda, Cognito, IAM, API Keys	Lambda, Cognito, IAM	IAM, Lambda authorizer
Best for	APIs needing advanced features	Simple proxies, microservices	Real-time (chat, notifications)

Authorization Methods

Authorizer Type	How It Works	Use Case
IAM Authorization	Caller signs request with Sig V4	Internal AWS service-to-service
Cognito User Pool	Validates JWT from Cognito	Web/mobile apps with Cognito login
Lambda Authorizer (Token)	Lambda receives Bearer token, returns IAM policy	Custom JWT, OAuth, 3rd-party IdP
Lambda Authorizer (Request)	Lambda receives full request (headers, query)	Complex auth logic, IP allow-listing
API Key	Key sent in header; not authentication — use for rate limiting	Usage plans, partner throttling
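
A token-based Lambda authorizer can be sketched as below. The response shape (`principalId` plus an IAM `policyDocument` for `execute-api:Invoke`) is what API Gateway expects; `validate_token` and the hard-coded token are hypothetical stand-ins for real JWT or OAuth validation.

```python
def build_policy(principal_id, effect, method_arn):
    """Build the authorizer response API Gateway expects."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }


def validate_token(token):
    """Hypothetical check; a real authorizer would verify a JWT signature."""
    return "user-123" if token == "Bearer allow-me" else None


def lambda_handler(event, context):
    principal = validate_token(event.get("authorizationToken", ""))
    if principal is None:
        return build_policy("anonymous", "Deny", event["methodArn"])
    return build_policy(principal, "Allow", event["methodArn"])
```

A request-based authorizer would follow the same pattern but read headers and query string parameters from the event instead of a single token.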

Stages & Stage Variables

A stage is a named reference to a deployment, which is a snapshot of your API (e.g., dev, staging, prod)
Stage variables are key-value pairs like environment variables — referenced as ${stageVariables.lambdaAlias} in integrations
Use stage variables to point different stages to different Lambda aliases or backends without changing the API definition
Canary deployments: Route a % of traffic to new deployment for testing before full release

Caching & Throttling

API Gateway cache: Caches responses per stage; configurable TTL (0–3600s); reduces Lambda invocations for repeated identical requests
Throttling: Default 10,000 RPS per account per Region (steady state); per-method throttling possible via usage plans
Returns 429 Too Many Requests when throttled

Stage Variable Pattern

Your API has three stages. The Lambda integration URL uses ${stageVariables.lambdaAlias}. In dev stage, the variable = dev. In prod, = live. Deploying a new Lambda version to the dev alias doesn't affect production — the same API definition routes to the correct Lambda alias per stage.

💡

Auth choice: Internal AWS service → IAM. Mobile/web app users → Cognito User Pool. Custom/3rd-party token → Lambda Authorizer. Rate-limiting partners → API Keys + Usage Plans.

⚠️

Common traps:

"API Keys authenticate and authorize users" — FALSE; API Keys identify the calling application, not the user. They are used for throttling/metering, not security authorization.
"HTTP API supports caching and WAF integration" — FALSE; HTTP API is lightweight and does NOT support response caching or AWS WAF integration. Use REST API for these features.
"Caching is free" — FALSE; API Gateway caching is an additional charge based on cache size.

📨 Messaging & Eventing — SQS, SNS, EventBridge & Fanout

Messaging · Exam Fave

Understanding when to use SQS vs SNS vs EventBridge is a core DVA-C02 skill. Know the delivery model, ordering guarantees, and failure handling for each.

Service Comparison

Feature	SQS	SNS	EventBridge
Model	Pull (consumer polls)	Push (fan-out)	Push (rule-based routing)
Consumers	One consumer per message	Multiple subscribers	Multiple targets per rule
Persistence	Up to 14 days	No storage (fire & forget)	No storage
Ordering	FIFO queue only	FIFO topic only	None
Exactly-once	FIFO only	No	No
Message size	256 KB	256 KB	256 KB

SQS Key Concepts

Visibility Timeout: How long a message is hidden after being received. If processing doesn't finish & delete within this window, the message becomes visible again → potential duplicate processing
Dead-Letter Queue (DLQ): After maxReceiveCount failed processing attempts, message is moved to DLQ for investigation
Long Polling: WaitTimeSeconds up to 20s — waits for messages instead of returning empty responses. Reduces API costs and false empties
FIFO Queues: Exactly-once processing, strict ordering, max 3,000 TPS (with batching) — use Message Group ID for ordered processing per group
Message Retention: Default 4 days, max 14 days
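
The interplay of long polling, visibility timeout, and deletion can be sketched as a single consumer pass. The function name and the `handle` callback are assumptions; `sqs_client` stands for a client such as `boto3.client("sqs")` passed in by the caller.

```python
def drain_once(sqs_client, queue_url, handle):
    """Receive one batch with long polling; delete only messages that succeed."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message reappears after the visibility timeout
            # and counts toward maxReceiveCount for the DLQ.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed
```

Because a message is removed only by an explicit delete, a crash mid-processing leads to redelivery rather than loss, which is why handlers should be idempotent.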

SNS Fanout Pattern

graph LR
    P["Publisher"] --> T["SNS Topic"]
    T --> Q1["SQS Queue A\n(Inventory)"]
    T --> Q2["SQS Queue B\n(Shipping)"]
    T --> Q3["SQS Queue C\n(Email)"]
    Q1 --> L1["Lambda\nProcessor"]
    Q2 --> L2["Lambda\nProcessor"]
    Q3 --> L3["Lambda\nProcessor"]

Amazon EventBridge

Serverless event bus — routes events based on content-based filtering rules
Event sources: AWS services, SaaS partners (Zendesk, Shopify), custom apps
Targets: Lambda, SQS, SNS, Step Functions, API GW, Kinesis, etc.
Event Patterns: Filter events by source, detail-type, and any field in the event JSON
Scheduled Rules (cron): Replace CloudWatch Events scheduled rules; run tasks on a schedule
Event Buses: Default (AWS services), custom (your app events), partner (SaaS events)
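
As an illustration of custom events and content-based filtering, the sketch below pairs an event pattern with a `put_events` call. The source name `com.example.orders`, the `OrderPlaced` detail type, and the `total` field are invented for the example; `events_client` is assumed to be a client such as `boto3.client("events")`.

```python
import json

# Rule pattern (assumed names): match OrderPlaced events from
# "com.example.orders" whose detail.total is greater than 100.
ORDER_PATTERN = {
    "source": ["com.example.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {"total": [{"numeric": [">", 100]}]},
}


def put_order_event(events_client, order, bus_name="default"):
    """Send one custom event; Detail must be a JSON string."""
    response = events_client.put_events(
        Entries=[{
            "Source": "com.example.orders",
            "DetailType": "OrderPlaced",
            "Detail": json.dumps(order),
            "EventBusName": bus_name,
        }]
    )
    # put_events can partially fail, so check the count instead of assuming success.
    return response["FailedEntryCount"] == 0
```

The pattern would be attached to a rule as its `EventPattern` (serialized with `json.dumps`), and each rule can then forward matching events to its own set of targets.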

🎯

Route events to multiple targets based on rules → EventBridge. One message, multiple parallel consumers → SNS → SQS fanout. Ensure exactly-once ordered processing → SQS FIFO. Decouple and buffer with retry → SQS Standard + DLQ.

⚠️

Common traps:

"SQS Standard guarantees at-most-once delivery" — FALSE; Standard provides at-least-once (duplicates can occur). Only FIFO provides exactly-once.
"If a Lambda function fails to process an SQS message, the message is automatically sent to DLQ" — FALSE; it goes to DLQ only after maxReceiveCount failures. The message is retried first.
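"SNS stores messages if a subscriber is unavailable" — FALSE; SNS does not hold messages for later pickup. It retries delivery according to the subscription's delivery policy, and once retries are exhausted the message is discarded unless a DLQ is attached to the subscription. Use an SQS subscriber to persist.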

🌊 Amazon Kinesis — Streaming Data

Kinesis · Medium Frequency

Kinesis enables real-time data streaming. Know the difference between the Kinesis products and when to use each vs SQS.

Kinesis Products

Service	Purpose	Consumers	Key Feature
Kinesis Data Streams (KDS)	Custom real-time processing	Multiple (Lambda, KCL, Analytics)	Configurable retention 1–365 days; ordered per shard
Kinesis Data Firehose	Load streaming data to destinations	S3, Redshift, OpenSearch, Splunk	Fully managed, no custom consumer code needed
Kinesis Data Analytics	Real-time SQL/Flink on streams	KDS or Firehose as source	Detect anomalies, aggregate, filter in real-time

KDS Core Concepts

Shard: Unit of capacity — 1 MB/s write, 2 MB/s read per shard. Add shards to scale.
Partition Key: Determines which shard receives the record. Use high-cardinality keys for even distribution
Sequence Number: Unique per record per shard; ordered within a shard
Enhanced Fan-Out: Dedicated 2 MB/s per consumer per shard using HTTP/2 push — for multiple high-throughput consumers
Lambda + KDS: Lambda polls via event source mapping; batch size configurable; bisect on error for failed batches
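
Why partition key cardinality matters can be seen in a small model of shard selection. Kinesis hashes the partition key with MD5 into a 128-bit hash key; the sketch below additionally assumes the shards split that hash space into equal-width ranges, which is a simplification, and `shard_for` is a hypothetical helper.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal-width hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    return hash_key * shard_count // HASH_SPACE
```

Every record with the same partition key lands on the same shard, which gives per-key ordering but also means a handful of distinct keys can leave most shards idle.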

Kinesis vs. SQS

	Kinesis Data Streams	SQS
Multiple consumers	Yes — same data read by multiple consumers	No — message consumed by one consumer
Ordering	Ordered per shard	FIFO only (SQS FIFO)
Replay	Yes — replay up to retention period	No — messages deleted after processing
Use case	Real-time analytics, event replay, audit logs	Task queue, job distribution, decoupling

🎯

"Multiple consumers process same data stream" → Kinesis. "Real-time clickstream analytics" → Kinesis. "Deliver streaming data to S3 without code" → Kinesis Firehose. "Decouple service X from service Y" → SQS.

⚠️

Common traps:

"Kinesis Firehose delivers data in real-time with sub-second latency" — FALSE; Firehose buffers data (minimum 60-second buffer interval or 1 MB) before delivering. It is near-real-time, not sub-second.
"Adding more shards to KDS increases read throughput for all consumers equally" — FALSE without Enhanced Fan-Out. With standard consumers, total read throughput is shared across consumers on the same shard (2 MB/s shared).

Task 1.2

Develop Code for AWS Lambda

Lambda fundamentals, configuration, event sources, error handling, VPC access, and performance tuning.

Skills in:

Describe access to private resources in VPCs from Lambda code
Configure Lambda functions: environment variables, memory, concurrency, timeout, runtime, handler, layers, extensions, triggers, destinations
Handle the event lifecycle and errors using code (Lambda Destinations, dead-letter queues)
Write and run test code; integrate Lambda with AWS services
Tune Lambda functions for optimal performance; process and transform data in near real time

λ Lambda Fundamentals — Execution Model & Cold Starts

Lambda · Exam Fave

Lambda is the core of serverless development on AWS. Understanding its execution lifecycle, concurrency model, and cold start behavior is essential.

Execution Lifecycle

Init phase: Download code, start runtime, run initialization code outside handler. Happens on cold start. Keep this fast.
Invoke phase: Run the handler function. Called for every invocation.
Shutdown phase: Runtime receives shutdown signal; run cleanup extensions.

Cold Start vs. Warm Start

	Cold Start	Warm Start
When	First invocation or after scale-out/idle	Reuse of existing execution environment
Latency	Higher (100ms–several seconds, depending on runtime)	Near-zero overhead
Init code runs?	Yes	No
Global variables cached?	No (fresh environment)	Yes (reused from previous invocation)

Handler Structure

# Python handler — event is the trigger payload, context has metadata
def lambda_handler(event, context):
    print(context.function_name)
    print(context.get_remaining_time_in_millis())
    return {
        "statusCode": 200,
        "body": "Hello from Lambda"
    }

Key Configuration Parameters

Parameter	Range / Default	Impact
Memory	128 MB – 10,240 MB	CPU and network bandwidth scale proportionally with memory
Timeout	3s default, max 15 min	Function killed if exceeded; set higher than p99 duration
Ephemeral Storage (/tmp)	512 MB – 10,240 MB	Temporary file storage within execution environment
Concurrency	Default account limit: 1,000	Max simultaneous executions across all functions

Performance Optimization Tips

Initialize outside handler: DB connections, SDK clients, config loading — cached across warm invocations
Right-size memory: More memory = more CPU = faster execution. AWS Lambda Power Tuning tool finds optimal setting
Minimize deployment package size: Faster cold starts; use Layers for shared dependencies
Avoid recursive invocations: Lambda calling Lambda in a loop can rapidly exhaust concurrency and cause runaway costs
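
The "initialize outside handler" tip can be demonstrated without any AWS calls. In this sketch `create_client` is a stand-in for expensive setup such as building an SDK client or opening a database connection, and `INIT_COUNT` exists only to make the reuse visible.

```python
import time

INIT_COUNT = 0


def create_client():
    """Stand-in for expensive setup such as an SDK client or DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


# Runs once per execution environment (the init phase), not once per invocation.
CLIENT = create_client()


def lambda_handler(event, context):
    # Warm invocations reuse CLIENT instead of rebuilding it.
    return {"statusCode": 200, "initCount": INIT_COUNT, "clientId": id(CLIENT)}
```

Calling the handler repeatedly in the same process returns the same `clientId` and an `initCount` of 1, which mirrors what happens across warm invocations.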

💡

Cold start optimization rule: Move everything you can outside the handler. Connections, clients, and config load once on cold start, then are reused across warm invocations.

🎯

Eliminate cold starts entirely → Provisioned Concurrency (pre-warms N execution environments). Java cold starts are long → use Lambda SnapStart (snapshots initialized environment). "Function needs more CPU" → increase memory allocation.

⚠️

Common traps:

"Lambda timeout is 15 minutes by default" — FALSE; default is 3 seconds. 15 minutes is the maximum.
"Increasing Lambda memory only helps memory-bound functions" — FALSE; memory also increases allocated vCPU, so CPU-bound functions also benefit.
"Global variables persist indefinitely between invocations" — FALSE; they persist only within the same execution environment instance. New instances start fresh.

🔗 Lambda Event Sources, Triggers & Concurrency

Lambda · High Frequency

Invocation Types

Type	Behavior	Retry on failure	Triggered by
Synchronous	Caller waits for response	No (caller handles errors)	API GW, ALB, SDK invoke, Cognito triggers
Asynchronous	Lambda queues event, returns 202 immediately	Up to 2 retries (auto)	S3 events, SNS, EventBridge, SES
Event Source Mapping (poll-based)	Lambda polls the source on your behalf	Retries until success or expiry	SQS, Kinesis, DynamoDB Streams, MSK

Concurrency Types

Type	Purpose	Behavior
Unreserved Concurrency	Default — shared pool across all functions	Throttles when account limit reached
Reserved Concurrency	Reserve N concurrent executions for one function	Function cannot exceed N; other functions cannot use the reserved N
Provisioned Concurrency	Pre-initialize N environments to eliminate cold starts	Costs money even when idle; enables smooth scaling

Reserved Concurrency as a Throttle

You have a Lambda that processes SQS messages and writes to RDS. RDS max connections = 100. Setting Lambda's reserved concurrency to 50 allows at most 50 concurrent execution environments, and therefore at most 50 concurrent DB connections if each environment opens one connection. This protects RDS from connection storms during traffic spikes. RDS Proxy is also a solution, but reserved concurrency is simpler to configure.

Event Source Mapping (SQS, Kinesis, DDB Streams)

Lambda polls the source and invokes your function with a batch of records
Batch size: SQS standard (1–10,000), SQS FIFO (1–10), Kinesis (1–10,000), DDB Streams (1–10,000)
Batch window: Lambda waits up to N seconds to fill a batch before invoking
On failure: For SQS, the messages in a failed batch become visible in the queue again after the visibility timeout. For Kinesis and DDB Streams, use bisect on error to split batches and isolate poison pill messages
Partial Batch Response: Return batchItemFailures to tell Lambda which items failed — only those are retried
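
A partial batch response for an SQS event source might look like the sketch below. It assumes `ReportBatchItemFailures` is enabled on the event source mapping; `process` and the `orderId` check are hypothetical business logic.

```python
import json


def process(body):
    """Hypothetical business logic; raises on a bad record."""
    order = json.loads(body)
    if "orderId" not in order:
        raise ValueError("missing orderId")


def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Report only this message; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list signals that the whole batch succeeded, so only the listed message IDs return to the queue.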

🎯

"Lambda hammering RDS with connections" → Set reserved concurrency OR use RDS Proxy. "Cold start latency for critical path" → Provisioned Concurrency. "Lambda should not exceed X invocations" → Reserved Concurrency.

⚠️

Common traps:

"Reserved Concurrency eliminates cold starts" — FALSE; Provisioned Concurrency eliminates cold starts. Reserved only caps concurrency.
"S3 event notifications invoke Lambda synchronously" — FALSE; S3 invokes Lambda asynchronously. The call returns immediately and Lambda retries on failure.
"SQS FIFO queues support exactly-once Lambda processing automatically" — PARTIALLY TRUE; FIFO provides exactly-once delivery, but your Lambda handler must still be idempotent for robustness.

🚨 Lambda Error Handling — DLQ, Destinations & Retry Behavior

Lambda · High Frequency

Async Invocation Retry & DLQ

On failure, Lambda retries async invocations up to 2 times (3 total attempts)
Events can wait in the async event queue for up to 6 hours
DLQ: Configure an SQS queue or SNS topic as the DLQ. Failed events (after all retries) are sent here for investigation
DLQ only captures events that failed all attempts; Lambda Destinations can record successful invocations as well as events that failed all attempts

Lambda Destinations (preferred over DLQ)

Destination Type	On Success	On Failure
SQS Queue	✅	✅
SNS Topic	✅	✅
EventBridge Bus	✅	✅
Another Lambda Function	✅	✅

DLQ vs. Destinations

DLQ: only captures the event payload when all retries are exhausted. Lambda Destinations: sends a richer record (original event + function response/error + execution context) for both success and failure conditions. Destinations are more flexible and are the modern recommendation.
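
Destinations and retry limits for asynchronous invocations are set on the function's event invoke config. A minimal sketch, assuming `lambda_client` is a client such as `boto3.client("lambda")` and that both destination ARNs already exist; the helper name is made up.

```python
def configure_async_handling(lambda_client, function_name, success_arn, failure_arn):
    """Set retry limits and destinations for asynchronous invocations."""
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=2,          # 0-2 retries after the first attempt
        MaximumEventAgeInSeconds=21600,  # 6 hours, the maximum
        DestinationConfig={
            "OnSuccess": {"Destination": success_arn},
            "OnFailure": {"Destination": failure_arn},
        },
    )
```

The function's execution role also needs permission to write to each destination, for example `sqs:SendMessage` for an SQS queue.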

Error Handling in Event Source Mappings

Kinesis/DDB Streams: If batch fails, Lambda retries the entire batch until success OR the data expires from the stream. Risk of blocking the shard. Configure bisectBatchOnFunctionError to split the batch and isolate poison pills.
SQS: Failed messages return to queue and become visible after visibility timeout. After maxReceiveCount, moved to DLQ.
Partial Batch Response (SQS/Kinesis): Return {"batchItemFailures": [{"itemIdentifier": "..."}]} to retry only specific messages, not the whole batch

🎯

Modern pattern: Use Lambda Destinations over DLQ for async functions — you get richer metadata and can route success and failure independently. Use DLQ for simplicity or SQS event source mapping failures.

⚠️

Common traps:

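"DLQ works for all Lambda invocation types" — FALSE; the function-level DLQ only applies to async invocations. For an SQS event source, configure a DLQ (redrive policy) on the source queue; for Kinesis and DynamoDB Streams, configure an on-failure destination on the event source mapping.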
"Lambda automatically retries synchronous invocations" — FALSE; with synchronous invocations, the caller receives the error and must implement its own retry logic.

🔒 Lambda in VPC, Layers & Environment Variables

Lambda · VPC

Lambda in VPC

By default, Lambda runs in an AWS-managed VPC with internet access but no access to your private VPC resources (RDS, ElastiCache, internal ALBs)
Configure Lambda with VPC subnets + security groups → Lambda creates an ENI (Elastic Network Interface) in your subnet
Internet access in VPC: Lambda in a private subnet needs a NAT Gateway to reach the internet. Lambda in a VPC does NOT get internet access via a public subnet without NAT
VPC Endpoints: Use to privately access AWS services (S3, DynamoDB) from Lambda-in-VPC without NAT Gateway

Lambda Layers

Layers are ZIP archives containing libraries, a custom runtime, or data — shared across multiple functions
Up to 5 layers per function; combined unzipped size ≤ 250 MB
Common use: Share boto3, numpy, pandas, or common utility code across Lambda functions
Layers are versioned — reference a specific layer version ARN in your function config

Environment Variables

Key-value pairs available to function code via os.environ (Python) or process.env (Node.js)
Encrypted at rest using KMS (Lambda service key by default, or your CMK)
Do NOT store plaintext secrets in environment variables — use Secrets Manager or SSM Parameter Store and fetch at init time
Max 4 KB total for all environment variables combined
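
Fetching a secret once and reusing it across warm invocations could look like this sketch. It assumes the secret is stored as a JSON string and that `secrets_client` is a client such as `boto3.client("secretsmanager")`; `get_secret` and the module-level cache are illustrative, and the cache has no expiry, so a rotated secret would only be picked up by a new execution environment.

```python
import json

_SECRET_CACHE = {}


def get_secret(secrets_client, secret_id):
    """Fetch a JSON secret once per execution environment, then reuse it."""
    if secret_id not in _SECRET_CACHE:
        response = secrets_client.get_secret_value(SecretId=secret_id)
        _SECRET_CACHE[secret_id] = json.loads(response["SecretString"])
    return _SECRET_CACHE[secret_id]
```

An environment variable can then hold just the secret's name or ARN, which is safe to expose, while the secret value itself never appears in the function configuration.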

Lambda Extensions

Run alongside the Lambda function in the same execution environment
Use cases: monitoring agents, security agents, secret rotation, log forwarding (e.g., Datadog, New Relic agents)
Internal extensions: run in the same process. External extensions: run as separate processes

🎯

"Lambda needs to access RDS in private subnet" → Configure Lambda with same VPC + security group allowing DB port. "Lambda needs to call external API but also access RDS in VPC" → Lambda in private subnet + NAT Gateway for external internet. "Share common library across 10 Lambda functions" → Lambda Layer.

⚠️

Common traps:

"Putting Lambda in a public subnet gives it internet access" — FALSE; Lambda in a VPC (even public subnet) does NOT get internet access automatically. You need a NAT Gateway in a public subnet, with Lambda in a private subnet routing to it.
"Lambda can have unlimited layers" — FALSE; maximum 5 layers per function.
"Lambda environment variables are always secure" — PARTIALLY TRUE; they are encrypted at rest with KMS, but plaintext secrets are visible to anyone with IAM GetFunctionConfiguration permission. Use Secrets Manager for true secrets.

Task 1.3

Use Data Stores in Application Development

DynamoDB, S3, RDS/Aurora, ElastiCache, and specialized data stores — selection, access patterns, and caching.

Skills in:

Describe high-cardinality partition keys, database consistency models, and differences between query and scan operations
Define Amazon DynamoDB keys and indexing (LSI, GSI)
Serialize and deserialize data; manage data lifecycles
Use data caching services; use specialized data stores based on access patterns (e.g., Amazon OpenSearch Service)

🗃️ Amazon DynamoDB — Keys, Indexes & Data Modeling

DynamoDB · Exam Fave

DynamoDB is a fully managed NoSQL database. The DVA-C02 exam heavily tests data modeling, index selection, and read/write patterns.

Key Structure

Key Type	Components	Uniqueness
Simple Primary Key	Partition Key (PK) only	PK must be unique across all items
Composite Primary Key	Partition Key (PK) + Sort Key (SK)	PK+SK combo must be unique

Indexes

	LSI (Local Secondary Index)	GSI (Global Secondary Index)
Partition Key	Same as base table	Can be any attribute
Sort Key	Different from base table	Can be any attribute
Creation time	At table creation only	Anytime
Consistency	Strongly consistent reads supported	Eventually consistent only
Storage	Within same partition as base table	Separate partition space
Max per table	5	20

Single-Table Design Example

A social media app stores users and posts. PK = USER#userId, SK = PROFILE for user items. PK = USER#userId, SK = POST#timestamp for post items. One table, multiple entity types — query all posts for a user with a single Query operation.
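
The "all posts for a user" access pattern maps to a single Query with a `begins_with` condition on the sort key. In this sketch the function name is hypothetical, `dynamodb_client` stands for a low-level client such as `boto3.client("dynamodb")`, and newest-first ordering assumes the timestamp in the sort key sorts lexicographically (for example ISO 8601).

```python
def query_user_posts(dynamodb_client, table_name, user_id, limit=20):
    """Return a user's post items using the PK/SK layout described above."""
    response = dynamodb_client.query(
        TableName=table_name,
        KeyConditionExpression="PK = :pk AND begins_with(SK, :prefix)",
        ExpressionAttributeValues={
            ":pk": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "POST#"},
        },
        ScanIndexForward=False,  # descending sort-key order
        Limit=limit,
    )
    return response["Items"]
```

The `POST#` prefix excludes the `PROFILE` item that shares the same partition key, so the profile and the posts can live side by side without a filter expression.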

Partition Key Selection

High-cardinality keys = even data distribution across partitions = avoid hot partitions
Good: User ID, Order ID, Session ID (high cardinality, random distribution)
Bad: Country, Status, Boolean flag (low cardinality = hot partition)
Write sharding: If you must use a low-cardinality key, append a random suffix (1-N) to distribute writes, then use a GSI or Query across shards to read
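
Write sharding can be sketched with two small helpers: one picks a suffix at write time, the other lists every suffixed key a reader has to cover. The names and the `#` separator are assumptions for illustration.

```python
import random


def sharded_key(base_key, shard_count, rng=random):
    """Pick a random shard suffix for a write, e.g. 'STATUS#OPEN#3'."""
    return f"{base_key}#{rng.randint(1, shard_count)}"


def all_shard_keys(base_key, shard_count):
    """Every key a reader must query to see all items for base_key."""
    return [f"{base_key}#{n}" for n in range(1, shard_count + 1)]
```

The trade-off is that writes spread across N partitions while reads fan out into N queries whose results the application must merge.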

Capacity Modes

	Provisioned Mode	On-Demand Mode
Billing	Per RCU/WCU provisioned	Per request (read/write)
Predictability	Predictable traffic patterns	Unknown or variable traffic
Auto Scaling	Supported (but lags behind spikes)	Instant — no pre-planning

💡

LSI vs. GSI: Local = same partition key, Limited to creation time. Global = any key, Great flexibility. If you might need different access patterns later, plan for GSIs at design time.

⚠️

Common traps:

"GSIs support strongly consistent reads" — FALSE; GSIs are always eventually consistent. LSIs support both.
"You can add an LSI to an existing table" — FALSE; LSIs must be created with the table. GSIs can be added anytime.
"DynamoDB automatically distributes data evenly across partitions" — TRUE only if your partition key has high cardinality. A poorly chosen PK leads to hot partitions and throttling.

📖 DynamoDB — Query vs. Scan, Consistency & Advanced Features

DynamoDB · High Frequency

Query vs. Scan

	Query	Scan
Requires	Partition key value (mandatory)	Nothing (reads entire table)
Optional filter	Sort key condition + filter expression	Filter expression
Efficiency	Efficient — reads only matching items	Expensive — reads all items, filters after read
Cost	Low (reads proportional to results)	High (reads entire table)
Use	Production access pattern	Avoid in production; admin/migration use

Consistency Models

	Eventually Consistent	Strongly Consistent
Latency	Lower	Slightly higher
Cost	0.5 RCU per 4 KB	1 RCU per 4 KB
Reads latest data?	Not guaranteed (may see stale data)	Always latest committed data
Default	Yes	Opt-in via ConsistentRead=true
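
The cost row above turns into simple arithmetic: round the item size up to the next 4 KB block, then charge 1 RCU per block for a strongly consistent read or 0.5 for an eventually consistent one. The helper name is hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, strongly_consistent=False):
    """RCUs for one read: item size rounds up to the next 4 KB block."""
    blocks = math.ceil(item_size_bytes / 4096)
    return blocks * (1.0 if strongly_consistent else 0.5)
```

For example, a 6 KB item spans two 4 KB blocks, so it costs 2 RCUs with `ConsistentRead=true` and 1 RCU with the default eventually consistent read.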

DynamoDB Streams

Ordered log of item-level changes (INSERT, MODIFY, REMOVE) — retained 24 hours
View types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES
Trigger Lambda functions for change-data-capture patterns (e.g., update search index, send notification)

DynamoDB Accelerator (DAX)

In-memory cache specifically for DynamoDB — microsecond read latency (vs milliseconds)
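API-compatible with DynamoDB: swap the DynamoDB client for the DAX client and point it at the cluster endpoint; call signatures stay the same, so no rewrite of data access code is needed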
Cache hit: response from DAX. Cache miss: DAX reads from DynamoDB and caches result
Ideal for: read-heavy workloads, hot items, gaming leaderboards, social feeds
Does NOT help for write-heavy or strongly consistent read workloads

TTL (Time to Live)

Automatically deletes expired items based on a timestamp attribute — no charge for TTL deletes
Use for session data, temporary tokens, cart items with expiry
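TTL deletion is a background process, not an immediate one: expired items can stay readable for some time after expiry (commonly cited as up to 48 hours), so filter on the TTL attribute when reads must exclude them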
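
A TTL attribute is just a Number holding Unix epoch time in seconds. The sketch below builds a session item and adds a read-side check for items that have expired but are not yet deleted; the attribute name `expiresAt`, the `SESSION#` key prefix, and both helper names are assumptions.

```python
import time


def session_item(session_id, lifetime_seconds, now=None):
    """Build a DynamoDB item whose 'expiresAt' attribute drives TTL."""
    now = int(time.time()) if now is None else int(now)
    return {
        "PK": {"S": f"SESSION#{session_id}"},
        # TTL needs a Number attribute holding Unix epoch time in seconds.
        "expiresAt": {"N": str(now + lifetime_seconds)},
    }


def is_expired(item, now=None):
    """Treat items past their TTL as gone, even if not yet deleted."""
    now = int(time.time()) if now is None else int(now)
    return int(item["expiresAt"]["N"]) <= now
```

TTL must also be enabled on the table with the same attribute name before DynamoDB starts removing items.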

🎯

"DynamoDB reads are slow" → Add DAX. "Need to trigger Lambda on table changes" → DynamoDB Streams + Lambda event source mapping. "Auto-expire session data" → TTL attribute. "Read only latest price for a product" → Query with ConsistentRead=true (pay double RCU).

⚠️

Common traps:

"Scan with FilterExpression only reads filtered items" — FALSE; FilterExpression applies AFTER reading the data. You still pay for all read items before filtering — use Query instead.
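"DAX works for all DynamoDB operations including Scan and transactional" — FALSE as a caching claim; DAX serves eventually consistent GetItem, BatchGetItem, Query, and Scan results from its item and query caches, but strongly consistent reads and transactional reads pass through to DynamoDB without being cached, and table management operations (CreateTable, UpdateTable) are not supported through DAX.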

🪣 Amazon S3 — Developer Access Patterns

S3 · High Frequency

Key Developer APIs

Feature	Description	Use Case
Presigned URL (GET)	Temporary URL to download object; no AWS credentials needed by caller	Share private files with users
Presigned URL (PUT)	Temporary URL to upload directly to S3 without going through your server	Client-side file uploads (browser, mobile)
Multipart Upload	Upload large files in parts (required >5 GB, recommended >100 MB)	Large file uploads with resume capability
S3 Select	SQL query to retrieve subset of data from CSV/JSON/Parquet object	Reduce data transfer for log analysis
Event Notifications	Trigger SNS, SQS, or Lambda on object events	Thumbnail generation, virus scanning

Consistency Model

S3 provides strong read-after-write consistency for new objects and overwrites (as of December 2020)
After a successful PUT, any GET of that key returns the latest version
Applies to both objects and object listings

S3 Event Notifications

Trigger on: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:Replication:*, etc.
Destinations: SQS, SNS, Lambda, EventBridge
For complex routing (multiple targets, filtering), use EventBridge as the destination — then route to multiple targets via EventBridge rules
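
On the receiving side, a Lambda handler for S3 event notifications mostly has to unpack the record structure. A minimal sketch, assuming a direct S3 to Lambda notification (the EventBridge event shape differs):

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Return (bucket, key) pairs for each object in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, with spaces as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Forgetting to decode the key is a common source of "NoSuchKey" errors when the handler later calls `get_object` on names containing spaces or special characters.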

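// Presigned URL generation (SDK v3, JavaScript)
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");
const s3Client = new S3Client({});
// Returns a GET URL for the object that is valid for 1 hour
async function presignGet(Bucket, Key) {
  return getSignedUrl(s3Client,
    new GetObjectCommand({ Bucket, Key }),
    { expiresIn: 3600 }
  );
}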

🎯

"User uploads large file directly to S3" → Presigned PUT URL (server generates URL, client uploads directly — bypasses your server). "Trigger image resizing Lambda on upload" → S3 Event Notification → Lambda. "Multiple services need to react to the same S3 event" → S3 Event Notification → EventBridge → multiple targets.

⚠️

Common traps:

"S3 is eventually consistent for new PUTs" — FALSE (post-2020); S3 now provides strong consistency for all operations including new object creation.
"Multipart upload can be used for objects less than 5 MB" — technically possible but not meaningful. Parts must be at least 5 MB (except last part).
"S3 event notifications can trigger multiple Lambda functions directly" — FALSE; each event configuration has one destination. Use SNS fanout or EventBridge for multiple targets.

⚡ Amazon ElastiCache — Caching Patterns & RDS Connection Pooling

ElastiCache · Medium Frequency

Redis vs. Memcached

Feature	Redis	Memcached
Data structures	Rich (lists, sets, sorted sets, hashes, bitmaps)	Simple key-value strings only
Persistence	Yes (RDB snapshots, AOF)	No (in-memory only)
Replication/HA	Yes (primary-replica, Multi-AZ)	No
Pub/Sub	Yes	No
Cluster mode	Yes (sharding)	Yes (multi-node, no replication)
Best for	Sessions, leaderboards, queues, complex caching	Simple caching, multi-threaded scale-out

Caching Patterns

Pattern	Read Flow	Write Flow	Drawback
Lazy Loading (Cache-Aside)	Check cache → miss → read DB → store in cache	Write to DB only (cache updated on next miss)	Cache miss = 3 trips; potential stale data
Write-Through	Always read from cache	Write to DB AND cache simultaneously	Cache bloat (items written but never read)
Write-Behind (Write-Back)	Read from cache	Write to cache, async write to DB	Data loss risk if cache fails before DB write
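
The lazy-loading row above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a plain dict stands in for Redis or Memcached, `load_from_db` is a hypothetical database read function, and the 300-second TTL is an arbitrary choice to bound staleness.

```python
import time


class LazyCache:
    """Cache-aside: read through the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=300):
        self._load = load_from_db      # hypothetical DB read function
        self._ttl = ttl_seconds
        self._store = {}               # stands in for Redis/Memcached
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                        # cache hit: one trip
        self.misses += 1
        value = self._load(key)                    # miss: read the database
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a DB write so the next read reloads fresh data."""
        self._store.pop(key, None)


db = {"user:1": "Ada"}           # toy "database"
cache = LazyCache(db.get)
first = cache.get("user:1")      # miss, loads from db
second = cache.get("user:1")     # hit, served from the cache
```

Because writes go to the database only, a cached entry can be stale until its TTL expires or the writer calls `invalidate`. That is the drawback listed in the table.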

RDS Proxy

Sits between your application and RDS — pools and multiplexes database connections
Critical for Lambda → RDS: Lambda can open thousands of concurrent connections during spikes; RDS Proxy limits actual DB connections
Reduces failover time for Aurora Multi-AZ by maintaining connections through DB failover
Supports MySQL, PostgreSQL, MariaDB, Aurora (not Oracle)

🎯

"Leaderboard with real-time rank" → Redis sorted set. "Lambda connecting to RDS causes too many connections" → RDS Proxy. "Cache frequently read DB results" → ElastiCache lazy loading (cache-aside). "Need HA cache with failover" → Redis with Multi-AZ enabled.

⚠️

Common traps:

"Memcached supports HA with primary-replica failover" — FALSE; Memcached is a distributed cache with no replication. If a node fails, that data is lost. Use Redis for HA.
"DAX and ElastiCache serve the same purpose" — FALSE; DAX is DynamoDB-specific and API-compatible. ElastiCache is a general-purpose cache for any backend. Use DAX for DynamoDB, ElastiCache for RDS/MySQL/etc.

Domain 2 Overview

Secure applications and data on AWS. Covers authentication and authorization (Cognito, IAM, STS), encryption at rest and in transit (KMS, ACM), and management of sensitive data (Secrets Manager, Parameter Store).

⚡ 26% of scored content

Task 2.1

Implement Authentication and/or Authorization

IAM roles, Cognito User & Identity Pools, bearer tokens, STS, programmatic access, and cross-service auth.

Skills in:

Use an identity provider to implement federated access (Amazon Cognito, IAM)
Secure applications by using bearer tokens; configure programmatic access to AWS
Assume an IAM role; define permissions for IAM principals
Implement application-level authorization for fine-grained access control
Handle cross-service authentication in microservice architectures

🔑 IAM for Developers — Roles, STS & Programmatic Access

IAM · High Frequency

Developers interact with IAM primarily through roles and the STS credential chain. Understanding how services assume roles and how temporary credentials flow is fundamental to DVA-C02.

SDK Credential Resolution Chain (in order)

1. Code-level credentials (never use in production)
2. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
3. AWS credentials file: ~/.aws/credentials
4. AWS config file: ~/.aws/config
5. ECS container credentials (task role via metadata endpoint)
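6. EC2 instance profile (IMDS endpoint). Lambda execution role credentials are not read from IMDS; the Lambda runtime injects them as the environment variables in step 2.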

Key STS API Calls

API	Use Case
`AssumeRole`	Cross-account access or service-to-service calls (e.g., a Lambda function assuming a role in another account)
`AssumeRoleWithWebIdentity`	Federated via OIDC (Google, Amazon, GitHub Actions)
`AssumeRoleWithSAML`	Federated via enterprise IdP (ADFS, Okta) using SAML 2.0
`GetSessionToken`	Add MFA requirement to existing IAM user session

IAM Policy Evaluation

Default = implicit Deny
Explicit Deny always wins — overrides any Allow
Action allowed only when: explicit Allow exists AND no Deny blocks it at any level (SCP → Permission Boundary → Identity Policy → Resource Policy)
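
The evaluation rules above can be sketched as a toy evaluator. This is a deliberate simplification, not how IAM is implemented: real evaluation treats SCPs, permission boundaries, identity policies, and resource policies as separate layers, while this sketch pools every statement into one list. The `evaluate` function and the statement shape are illustrative.

```python
from fnmatch import fnmatchcase


def evaluate(statements, action, resource):
    """Return 'Allow' or 'Deny' for one request against a flat list of statements."""
    allowed = False
    for s in statements:
        matches = (any(fnmatchcase(action, a) for a in s["Action"])
                   and any(fnmatchcase(resource, r) for r in s["Resource"]))
        if not matches:
            continue
        if s["Effect"] == "Deny":
            return "Deny"        # explicit Deny always wins
        allowed = True           # remember the Allow, keep scanning for Denies
    return "Allow" if allowed else "Deny"   # nothing matched: implicit Deny


statements = [
    {"Effect": "Allow", "Action": ["s3:*"], "Resource": ["arn:aws:s3:::reports/*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"], "Resource": ["*"]},
]
```

With these statements, a read of `arn:aws:s3:::reports/q1.csv` is allowed, a delete of the same object is denied by the explicit Deny, and any DynamoDB call is denied implicitly because no statement matches it.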

Cross-Service Auth in Microservices

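Service A (ECS task) calls Service B (Lambda via API Gateway). Pattern: the ECS task has an IAM task role, and Service A signs each request with Signature Version 4 using the task role's temporary credentials (calling STS AssumeRole first if a different role is required, for example across accounts). With IAM authorization on the API, the task role needs execute-api:Invoke on the method ARN, and the Lambda function's resource policy must allow API Gateway to invoke it. If Service A invokes the function directly instead, its role needs lambda:InvokeFunction.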

🎯

Lambda execution role — the role the Lambda function assumes. Defined in Role in function config. Lambda resource-based policy — who can invoke the function (e.g., API Gateway, S3). Both must be correct for cross-service invocations.

⚠️

Common traps:

"A Lambda function with an execution role can call any AWS API" — FALSE; the execution role must explicitly allow the specific API actions needed.
"IAM permissions boundaries grant additional permissions" — FALSE; boundaries only restrict the maximum permissions — they never grant by themselves.
"ECS task role and EC2 instance role are the same concept" — conceptually similar but different configuration. ECS uses taskRoleArn in the task definition.

👤 Amazon Cognito — User Pools vs. Identity Pools

Cognito · Exam Fave

Cognito is the primary AWS service for application-level auth. The exam frequently tests the distinction between User Pools (authentication) and Identity Pools (authorization for AWS resources).

User Pools vs. Identity Pools

	User Pools	Identity Pools (Federated Identities)
Purpose	User directory + authentication	Exchange any token for AWS credentials
Returns	JWT tokens (id, access, refresh)	Temporary STS credentials (IAM role)
Supported IdPs	Username/password, Google, Facebook, SAML, OIDC	Cognito User Pool, Google, Facebook, SAML, unauthenticated
AWS API access	No (tokens for app-level auth only)	Yes (STS creds for S3, DynamoDB, etc.)
Use case	Login for web/mobile app	Let users directly call AWS APIs (e.g., upload to S3)

Cognito User Pool — JWT Tokens

ID Token: Contains user identity claims (sub, email, custom attributes). Verify with your API.
Access Token: Authorize against Cognito APIs (e.g., update user attributes). Use as Bearer token in API Gateway Cognito authorizer.
Refresh Token: Long-lived (default 30 days). Exchange for new id/access tokens without re-login.

Common Integration Pattern

sequenceDiagram
    participant User
    participant UP as Cognito User Pool
    participant IP as Cognito Identity Pool
    participant S3
    User->>UP: Login (username/password)
    UP-->>User: JWT Tokens (id, access, refresh)
    User->>IP: Exchange JWT for AWS credentials
    IP-->>User: Temp STS credentials (IAM Role)
    User->>S3: Upload file using STS credentials

Lambda Triggers

Pre-signup: Validate or auto-confirm users during registration
Pre-token generation: Add custom claims to tokens
Post-authentication: Log or audit successful logins
Custom message: Customize verification/MFA messages
User migration: Migrate users from legacy system on first login
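
As an illustration of the pre-token generation trigger, the sketch below copies a user attribute into the token claims. It assumes the V1 trigger event shape (`request.userAttributes` in, `response.claimsOverrideDetails` out) and a hypothetical custom attribute named `custom:department`; adjust both to match your user pool.

```python
def lambda_handler(event, context):
    """Pre-token generation trigger: add a 'department' claim to the tokens."""
    attrs = event["request"]["userAttributes"]
    # 'custom:department' is a hypothetical custom attribute for this example
    department = attrs.get("custom:department", "unknown")
    event["response"]["claimsOverrideDetails"] = {
        "claimsToAddOrOverride": {"department": department}
    }
    # Cognito expects the (modified) event object back
    return event
```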

💡

User Pool = Who are you? (AuthN) → returns JWT. Identity Pool = What can you do on AWS? (AuthZ) → returns STS credentials. Need both for "login with Google then upload to S3."

🎯

"Mobile app users log in and upload to S3 directly" → User Pool (authenticate) + Identity Pool (get STS creds for S3). "API Gateway validates JWT from Cognito" → Cognito User Pool authorizer on API GW. "Add user's department to JWT claims" → Pre-token generation Lambda trigger.

⚠️

Common traps:

"Cognito User Pool tokens grant direct access to AWS services like S3" — FALSE; User Pool tokens are JWTs for app-level auth. You need Identity Pools to exchange them for STS credentials.
"Identity Pools require Cognito User Pools as the IdP" — FALSE; Identity Pools support any OIDC-compatible IdP (Google, Facebook, Apple) as well as unauthenticated (guest) access.

🛡️ API Gateway Authorization — Lambda Authorizers & Bearer Tokens

API Gateway · IAM

Authorization Method Decision Tree

Scenario	Recommended Authorizer
Internal AWS service calling API	IAM Authorization (Sig V4)
Users authenticated via Cognito User Pool	Cognito User Pool Authorizer
Custom JWT from 3rd-party IdP (Auth0, Okta)	Lambda Authorizer (Token type)
Complex rules: IP allowlist, multi-header logic	Lambda Authorizer (Request type)
Throttle / meter partner integrations	API Keys + Usage Plans

Lambda Authorizer Flow

API GW invokes your Lambda with the Bearer token (or full request)
Lambda validates the token (e.g., verifies JWT signature, checks expiry)
Lambda returns an IAM policy document: {"principalId": "user123", "policyDocument": {...}}
Policy caching: API GW caches the returned policy by token (TTL configurable). Reduces Lambda invocations per request.

# Lambda authorizer response structure
{
  "principalId": "user-123",
  "policyDocument": {
    "Version": "2012-10-17",
    "Statement": [{
      "Action": "execute-api:Invoke",
      "Effect": "Allow",
      "Resource": "arn:aws:execute-api:*:*:*"
    }]
  },
  "context": { "userId": "user-123" }
}

🎯

Pass context from authorizer to Lambda backend: Add context field in authorizer response. The backend Lambda receives it via event.requestContext.authorizer.userId. Avoids redundant token parsing in every backend function.
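
A minimal sketch of both halves of this flow follows: a Token-type authorizer that returns a policy plus context, and a backend handler that reads the context. The `VALID_TOKENS` lookup is a hypothetical stand-in for real JWT verification (signature, expiry, audience), and the handler names are illustrative.

```python
# Hypothetical stand-in for JWT verification: maps a token to a user ID
VALID_TOKENS = {"demo-token": "user-123"}


def authorizer_handler(event, context):
    """Token-type Lambda authorizer: return an IAM policy for the caller."""
    raw = event.get("authorizationToken", "")
    token = raw[7:] if raw.startswith("Bearer ") else raw
    user_id = VALID_TOKENS.get(token)
    response = {
        "principalId": user_id or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if user_id else "Deny",
                "Resource": event["methodArn"],
            }],
        },
    }
    if user_id:
        # Values placed here surface in the backend's requestContext.authorizer
        response["context"] = {"userId": user_id}
    return response


def backend_handler(event, context):
    """Backend Lambda: read the identity the authorizer already resolved."""
    user_id = event["requestContext"]["authorizer"]["userId"]
    return {"statusCode": 200, "body": f"hello {user_id}"}
```

One design note: when policy caching is enabled, a policy scoped to a single `methodArn` is reused for other methods called with the same token, so cached deployments often return a broader resource pattern.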

⚠️

Common traps:

"Lambda authorizer is called for every API request" — FALSE when caching is enabled. The policy is cached by token value for the configured TTL.
"API Keys provide strong security authentication" — FALSE; API Keys are for usage tracking and throttling, not for authentication. Never use them as a security mechanism.

Task 2.2

Implement Encryption by Using AWS Services

KMS, envelope encryption, S3 encryption options, in-transit TLS, ACM, and key rotation.

Skills in:

Define encryption at rest and in transit; describe differences between client-side and server-side encryption
Describe certificate management (AWS Private CA); use encryption keys to encrypt or decrypt data
Generate certificates and SSH keys for development; use encryption across account boundaries
Enable and disable key rotation

🔐 AWS KMS — Key Types, Key Policy & Envelope Encryption

KMS · Exam Fave

AWS KMS is the foundation of encryption on AWS. Every developer must understand the three key types, envelope encryption, and how to call KMS APIs from code.

KMS Key Types

Key Type	Who Manages	Key Policy	Cost	Use Case
AWS Managed Key	AWS (auto-rotated every year)	Managed by AWS	Free	Default for S3, RDS, EBS (e.g., `aws/s3`)
Customer Managed Key (CMK)	You (optional auto-rotation)	You control it	$1/month + API calls	Custom encryption, cross-account, key policy control
Imported Key Material	You (bring your own key)	You control it	$1/month	Compliance: must own key material

Envelope Encryption

KMS can only encrypt data up to 4 KB directly. For larger data, use envelope encryption:
Step 1: Call GenerateDataKey → KMS returns a plaintext data key (DEK) AND an encrypted DEK
Step 2: Encrypt your data locally with the plaintext DEK (AES-256)
Step 3: Store the encrypted DEK alongside the encrypted data. Discard the plaintext DEK.
Decryption: Call Decrypt with the encrypted DEK → get plaintext DEK → decrypt data locally
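
The steps above can be traced end to end with an in-memory simulation. Everything here is a stand-in: `FakeKms` is a hypothetical class that mimics the `Plaintext` / `CiphertextBlob` response keys of the real GenerateDataKey and Decrypt calls, and `_xor_stream` is a toy cipher used in place of AES-256. It is not secure and is only there so the sketch runs with the standard library.

```python
import hashlib
import secrets


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (SHA-256 keystream XOR). Stand-in for AES-256; NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class FakeKms:
    """In-memory stand-in for KMS: the master key never leaves this object."""

    def __init__(self):
        self._master = secrets.token_bytes(32)

    def generate_data_key(self):
        plaintext = secrets.token_bytes(32)
        return {"Plaintext": plaintext,
                "CiphertextBlob": _xor_stream(self._master, plaintext)}

    def decrypt(self, ciphertext_blob):
        return {"Plaintext": _xor_stream(self._master, ciphertext_blob)}


def encrypt_envelope(kms, data: bytes) -> dict:
    key = kms.generate_data_key()                        # Step 1: get both DEK forms
    ciphertext = _xor_stream(key["Plaintext"], data)     # Step 2: encrypt locally
    # Step 3: store only the encrypted DEK; the plaintext DEK is not kept
    return {"encrypted_dek": key["CiphertextBlob"], "ciphertext": ciphertext}


def decrypt_envelope(kms, envelope: dict) -> bytes:
    dek = kms.decrypt(envelope["encrypted_dek"])["Plaintext"]   # unwrap the DEK
    return _xor_stream(dek, envelope["ciphertext"])             # decrypt locally
```

Note that only the 32-byte data key ever crosses the KMS boundary. The payload itself, however large, is encrypted and decrypted on the caller's side.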

Encryption Context

Optional additional authenticated data (AAD) — key-value pairs sent with encrypt/decrypt calls
If you encrypt with context {"purpose":"payments"}, you must provide the same context to decrypt
Appears in CloudTrail logs — useful for auditing which application/service used a key

Key Policy Essentials

Every KMS key has a key policy (resource-based policy). Unlike IAM, the key policy must explicitly allow the AWS account or it has no access — even root.
Cross-account: Key policy must allow the external account → then IAM policy in that account must also allow kms:Decrypt

Key Rotation

	Automatic Rotation	Manual Rotation
Frequency	Every 365 days by default (configurable from 90 to 2,560 days since 2024)	Any schedule you choose
Old key material	Retained for decryption of old ciphertext	Old KMS key must stay enabled to decrypt old ciphertext
Application change?	No (same Key ID)	Yes (update alias to point to new key)
Available for	CMKs only (not imported key material)	All key types

💡

Envelope Encryption = DEK inside KMS envelope: Your data is encrypted with a local DEK. The DEK is "wrapped" (encrypted) by KMS. Only KMS can unwrap it. The data never goes to KMS.

🎯

"Encrypt a 500 MB file with KMS" → Use GenerateDataKey + encrypt locally (envelope encryption). "Compliance requires 90-day rotation" → Manual key rotation (automatic is fixed at 365 days). "Audit which app used KMS" → Add Encryption Context + query CloudTrail.

⚠️

Common traps:

"AWS Managed Keys can be disabled or deleted by the customer" — FALSE; you have no control over AWS Managed Keys. Use CMKs for control.
"KMS automatic rotation changes the Key ID" — FALSE; rotation only changes the key material. The Key ID and alias stay the same — no application changes needed.
"Imported key material supports automatic rotation" — FALSE; automatic rotation is only for CMKs with KMS-generated key material.

🔒 Encryption Options — S3, At Rest vs. In Transit, Client vs. Server-Side

S3 · KMS · High Frequency

S3 Server-Side Encryption Options

Method	Key Management	Key Access	Use Case
SSE-S3	S3 manages keys entirely	Transparent to user	Default encryption; no key management needed
SSE-KMS	AWS KMS CMK	You control key policy; CloudTrail logs access	Audit who accessed, cross-account encryption
SSE-C	Customer provides key in request header	AWS never stores the key	Compliance: must own key material; key not on AWS
Client-Side	Encrypted before sending to S3	AWS never sees plaintext	Maximum control; AWS cannot access data even with account compromise

Encryption in Transit

All AWS SDK and console calls use HTTPS (TLS) by default
Enforce HTTPS on S3: Bucket policy with Deny when aws:SecureTransport = false
ACM (AWS Certificate Manager): Free public TLS certificates for ALBs, CloudFront, API GW. Auto-renews.
CloudFront certificates must be in us-east-1 region — hard exam fact

Client-Side vs. Server-Side

	Server-Side (SSE)	Client-Side
Who encrypts	AWS service (S3, RDS, etc.)	Your application code
AWS sees plaintext?	Briefly (SSE-S3, SSE-KMS)	Never
Complexity	Low — checkbox / config	Higher — you manage encryption library
Performance overhead	None to you	CPU on client

🎯

"AWS must NEVER see the key" → SSE-C or Client-Side encryption. "Audit every access to S3 objects" → SSE-KMS (every decrypt call appears in CloudTrail). "Simplest encryption with no management" → SSE-S3 (enabled by default on all new buckets since Jan 2023).

⚠️

Common traps:

"SSE-KMS means AWS cannot read your data" — FALSE; with SSE-KMS, AWS KMS holds the key. With SSE-C or client-side encryption, AWS cannot read plaintext. Compliance may require one over the other.
"ACM certificates can be used in any region for CloudFront" — FALSE; CloudFront requires ACM certificates in us-east-1.

Task 2.3

Manage Sensitive Data in Application Code

Secrets Manager, Parameter Store, environment variable encryption, data classification, and masking.

Skills in:

Describe data classification (PII, PHI); encrypt environment variables containing sensitive data
Use secret management services to secure sensitive data
Sanitize sensitive data; implement application-level data masking and sanitization
Implement data access patterns for multi-tenant applications

🗝️ AWS Secrets Manager vs. SSM Parameter Store

Secrets · Exam Fave

Choosing between Secrets Manager and Parameter Store is a common DVA-C02 scenario. Secrets Manager is purpose-built for secrets with rotation; Parameter Store is for configuration and simple secrets.

Comparison

Feature	Secrets Manager	SSM Parameter Store
Purpose	Secrets (DB passwords, API keys)	Config + secrets
Auto rotation	✅ Native (uses Lambda)	❌ No native rotation
RDS integration	✅ Native DB credential rotation	❌ Manual only
Encryption	Always KMS-encrypted	SecureString uses KMS; String is plaintext
Cost	$0.40/secret/month + $0.05/10K API calls	Free (Standard) / $0.05 per parameter per month (Advanced)
Max value size	64 KB	4 KB (Standard), 8 KB (Advanced)
Hierarchy / paths	Limited naming	✅ Path-based: `/app/prod/db-url`
Best for	Database passwords, API keys needing rotation	App config, feature flags, non-rotating secrets

Secrets Manager Auto Rotation

Rotation uses a Lambda function (AWS provides templates for RDS, Redshift, DocumentDB)
Rotation schedule: interval in days or cron expression
On rotation: new secret version is created → application fetches latest automatically using the AWSCURRENT label
Versions: AWSCURRENT (active), AWSPREVIOUS (just rotated), AWSPENDING (being created)

Best Practice: Fetch at Init, Cache in Memory

# Fetch secret once at Lambda cold start — cache for warm invocations
import boto3, json
client = boto3.client('secretsmanager')
secret = json.loads(client.get_secret_value(
    SecretId='prod/myapp/db'
)['SecretString'])
DB_PASSWORD = secret['password']  # cached globally
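
One caveat of caching globally is that a warm execution environment can keep serving a password that has since been rotated. A possible refinement, sketched below under stated assumptions, is a small time-bounded cache. `SecretCache` is a hypothetical helper, not an AWS API, and the 300-second TTL is arbitrary. The `fetch` argument is any zero-argument callable that returns the secret's JSON string.

```python
import json
import time


class SecretCache:
    """Cache a JSON secret in memory and refetch it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # zero-arg callable returning the SecretString
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = json.loads(self._fetch())
            self._expires_at = now + self._ttl
        return self._value


# Wiring with boto3 would look roughly like this (not executed here):
#   client = boto3.client("secretsmanager")
#   cache = SecretCache(
#       lambda: client.get_secret_value(SecretId="prod/myapp/db")["SecretString"])
#   password = cache.get()["password"]
```

Creating the cache at module scope keeps the cold-start behavior described above while bounding how long a rotated secret can go unnoticed.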

🎯

"DB password needs auto-rotation every 30 days" → Secrets Manager. "App config values like feature flags" → Parameter Store (free). "Multiple apps share same config path" → Parameter Store with path hierarchy. "Secrets Manager or Env Vars?" → Always Secrets Manager for production secrets.

⚠️

Common traps:

"SSM Parameter Store SecureString auto-rotates" — FALSE; you must write your own rotation logic. Secrets Manager has this built-in.
"Storing secrets in Lambda environment variables is secure" — PARTIALLY; env vars are KMS-encrypted at rest but are visible in the Lambda console and to any principal with GetFunctionConfiguration access. Use Secrets Manager for sensitive production credentials.

🏷️ Data Classification, Masking & Multi-Tenant Patterns

Security · Medium Frequency

Data Classification

Type	Examples	Regulation
PII (Personally Identifiable Information)	Name, email, SSN, phone, IP address	GDPR, CCPA
PHI (Protected Health Information)	Medical records, diagnoses, insurance IDs	HIPAA
Cardholder data	Credit card numbers, CVVs	PCI-DSS

Data Masking & Sanitization

Masking: Replace sensitive data with a representative value (e.g., 4111-****-****-1234 for credit card)
Tokenization: Replace sensitive data with a non-sensitive token; original data stored in a secure vault
Sanitization: Remove or replace dangerous characters to prevent injection (SQL injection, XSS)
Never log PII/PHI — use structured logging with field-level exclusion
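
A short sketch of masking and field-level log exclusion follows. The function names and the `SENSITIVE_FIELDS` set are illustrative assumptions; a real application would derive the field list from its own data classification.

```python
import re


def mask_card(number: str) -> str:
    """Keep the first and last four digits, mask the middle eight."""
    digits = re.sub(r"\D", "", number)          # drop spaces, dashes, etc.
    if len(digits) != 16:
        raise ValueError("expected a 16-digit card number")
    return f"{digits[:4]}-****-****-{digits[-4:]}"


# Hypothetical field names treated as PII/PHI in this example
SENSITIVE_FIELDS = {"ssn", "email", "card_number"}


def redact_for_logging(record: dict) -> dict:
    """Return a copy of record that is safe to pass to a structured logger."""
    return {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}


masked = mask_card("4111 1111 1111 1234")
safe = redact_for_logging({"order_id": 42, "email": "a@example.com"})
```

Redacting on the way into the logger, rather than filtering log output later, means the sensitive value never reaches CloudWatch Logs in the first place.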

Multi-Tenant Data Isolation Patterns

Pattern	Isolation Level	AWS Implementation
Silo (per-tenant DB)	High (physical isolation)	Separate RDS instances; separate DynamoDB tables
Pool (shared DB, tenant ID column)	Medium (logical isolation)	DynamoDB partition key = tenantId; RDS with row-level security
Bridge (hybrid)	Medium-High	Shared infrastructure, per-tenant encryption keys (KMS per tenant)

🎯

"Scan S3 for PII/PHI" → Amazon Macie. "Multi-tenant app — isolate data per tenant in DynamoDB" → Partition key = tenantId. Use IAM condition dynamodb:LeadingKeys to enforce each user can only access their own partitions via Cognito Identity Pool role.

Domain 3 Overview

Package, deploy, and automate the release of AWS applications. Covers AWS SAM, CloudFormation, CDK, CI/CD pipeline services (CodeBuild, CodeDeploy, CodePipeline), deployment strategies, and testing approaches.

⚡ 24% of scored content

Task 3.1

Prepare Application Artifacts to be Deployed to AWS

Lambda packaging, AWS SAM, CloudFormation, CDK, AppConfig, and IaC templates.

Skills in:

Manage dependencies within the package (env vars, config files, container images)
Organize files and directory structure for deployment
Use code repositories in deployment environments
Apply application requirements for resources (memory, cores)
Prepare application configurations for specific environments (using AWS AppConfig)

📦 AWS SAM — Serverless Application Model

SAM · Exam Fave

AWS SAM is a shorthand extension of CloudFormation for serverless apps. The exam tests SAM template syntax, local testing, and the build/deploy workflow.

SAM Template Structure

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31   # Declares this is a SAM template
Globals:
  Function:
    Timeout: 30
    Runtime: python3.12
    Environment:
      Variables:
        TABLE_NAME: !Ref MyTable
Resources:
  MyTable:                          # Referenced by TABLE_NAME in Globals
    Type: AWS::Serverless::SimpleTable
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      CodeUri: ./src/
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /hello
            Method: get

SAM Resource Types

SAM Type	Expands To
`AWS::Serverless::Function`	Lambda Function + IAM Role + optional Event Source Mapping
`AWS::Serverless::Api`	API Gateway REST API + Stage + Deployment
`AWS::Serverless::HttpApi`	API Gateway HTTP API
`AWS::Serverless::SimpleTable`	DynamoDB table (single primary key)
`AWS::Serverless::StateMachine`	Step Functions state machine

SAM CLI Workflow

sam init — scaffold a new SAM app from a template
sam build — package code and dependencies into .aws-sam/build
sam local invoke — run a Lambda function locally with a test event JSON
sam local start-api — start a local API Gateway emulator
sam deploy --guided — deploy to AWS; creates/updates CloudFormation stack
sam logs — tail Lambda logs from CloudWatch

🎯

SAM Transform: The line Transform: AWS::Serverless-2016-10-31 is what makes a template a SAM template. During deployment, CloudFormation calls the SAM transform to expand SAM-specific resource types into standard CloudFormation resources.

⚠️

Common traps:

"SAM is a completely separate service from CloudFormation" — FALSE; SAM is a superset/extension of CloudFormation. sam deploy creates a CloudFormation stack.
"sam local invoke tests the same environment as AWS Lambda" — FALSE; local invocation runs in a Docker container that simulates Lambda but is not identical (no VPC, different IAM, different resource limits).

🏗️ AWS CloudFormation — Stacks, Templates & Change Sets

CloudFormation · High Frequency

Core Concepts

Stack: A collection of AWS resources managed as a single unit. Create/update/delete together.
Template: JSON or YAML file defining the desired state. Sections: Parameters, Mappings, Conditions, Resources (required), Outputs
Change Set: Preview the impact of a template update before executing. Shows what will be Added/Modified/Removed.
Drift Detection: Identify resources that have been manually modified outside CloudFormation
Stack Policy: Protects stack resources from unintended updates during stack update operations

Key Template Functions

Function	Purpose	Example
`!Ref`	Reference parameter or resource	`!Ref MyBucket`
`!GetAtt`	Get attribute of a resource	`!GetAtt MyTable.Arn`
`!Sub`	String substitution	`!Sub "arn:aws:s3:::${BucketName}/*"`
`!ImportValue`	Import an exported Output from another stack	Cross-stack references
`!If`	Return one of two values based on a condition	`!If [IsProd, ProdConfig, DevConfig]`

Nested Stacks & StackSets

Nested Stacks: Reusable template components — a root stack references child stack templates via S3 URLs. Used to break up large templates.
StackSets: Deploy a single template to multiple accounts and regions simultaneously — essential for multi-account governance

AWS AppConfig

Managed feature flags and configuration — separate configuration from code deployments
Supports validation (JSON schema) before config goes live
Supports gradual deployment of config changes (canary, linear, all-at-once)
Applications fetch config at runtime without code redeployment

🎯

"Preview what changes before updating stack" → Change Set. "Deploy same template to 50 accounts" → StackSets. "Share VPC ID between stacks" → Outputs + !ImportValue. "Separate feature flags from Lambda code" → AppConfig.

⚠️

Common traps:

"CloudFormation can import existing resources into a stack" — TRUE via Resource Import, but only for supported resource types.
"!ImportValue creates a hard dependency — if the exporting stack tries to delete the exported Output, it will fail until all importing stacks are deleted first."

Task 3.2

Test Applications in Development Environments

API Gateway stages and stage variables, Lambda testing strategies, integration tests, mock APIs.

Skills in:

Test deployed code using AWS services and tools
Write integration tests and mock APIs for external dependencies
Test applications by using development endpoints (configuring stages in Amazon API Gateway)
Deploy application stack updates to existing environments (e.g., deploying SAM template to a different staging environment)
Test event-driven applications

🧪 API Gateway Stages, Stage Variables & Testing

API Gateway · Medium Frequency

Stage-Based Testing Pattern

Deploy same API to dev, test, and prod stages — each stage is an independent deployment
Stage Variables: Use ${stageVariables.lambdaAlias} in Lambda integration ARN to route each stage to a different Lambda alias
Canary deployments on stages: Route X% of traffic to a new deployment version while keeping the rest on stable — built into API GW stage settings

Testing Approaches

Test Type	Tool	What It Tests
Unit test	pytest, Jest, JUnit	Individual functions in isolation
Local integration	sam local invoke / start-api	Lambda handler with real event payloads
Integration test	Deploy to `dev` stage	Full stack — API GW → Lambda → DynamoDB
Mock API	API GW mock integration	Frontend dev without backend
Event test	Lambda console test, SAM CLI	Event-driven functions (S3, SQS, DDB Streams)

Testing Event-Driven Apps

Use SAM CLI event templates: sam local generate-event s3 put — generates sample S3 event JSON
For SQS: sam local generate-event sqs receive-message
Test Lambda → SQS → Lambda pipelines locally with sam local start-lambda + real SQS queue in dev account
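
A handler for an S3-triggered function can be unit-tested with a hand-written event before reaching for the SAM CLI. The sketch below uses a trimmed-down event containing only the fields the handler reads; the bucket and key are made-up values, and a generated event from `sam local generate-event s3 put` carries many more fields.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Return (bucket, key) pairs for each record in an S3 notification event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+')
        results.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return results


# Minimal sample event with made-up bucket and key names
sample_event = {
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "uploads/my+photo.jpg"},
        },
    }]
}

result = lambda_handler(sample_event, None)
```

The same handler can then be run against the full generated event file with `sam local invoke` to confirm it tolerates the extra fields.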

🎯

"Test new Lambda version without affecting prod traffic" → Deploy to Lambda alias pointing to new version. Use API GW stage variable to route dev stage to dev alias. "Frontend team needs API without backend" → API GW Mock Integration returns hardcoded response.

Task 3.3

Automate Deployment Testing

Lambda versions and aliases, container image tags, Amplify branches, and IaC deployment automation.

Skills in:

Create application test events (JSON payloads for Lambda, API GW, SAM)
Deploy API resources to various environments
Create application environments using approved versions (Lambda aliases, container image tags, Amplify branches)
Implement and deploy IaC templates (SAM, CloudFormation)
Manage environments in individual AWS services (dev/test/prod in API GW)
Use Amazon Q Developer to generate automated tests

🏷️ Lambda Versions & Aliases — Immutable Releases & Traffic Shifting

Lambda · High Frequency

Versions

Publishing a version creates an immutable snapshot of the function code + config
Versions are numbered: :1, :2, :3 …
$LATEST is the mutable, unpublished version — always points to the latest code
You cannot update a published version — it is read-only

Aliases

Named pointers to specific versions (e.g., live → v5, beta → v6)
Weighted routing: An alias can split traffic between two versions — e.g., live = 90% v5 + 10% v6. Used for blue/green and canary testing.
API Gateway stage variable or event source mapping references the alias — swap the version behind the alias without changing integrations

graph LR
    D["dev stage\n(API GW)"] --> DA["Lambda alias: dev\n→ $LATEST"]
    P["prod stage\n(API GW)"] --> PA["Lambda alias: live\n90% v5 / 10% v6"]
    PA -->|90%| V5["Version 5\n(stable)"]
    PA -->|10%| V6["Version 6\n(new)"]
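
The 90/10 split on the live alias can be expressed as the arguments to the Lambda `UpdateAlias` API. The helper below only builds the keyword arguments; `weighted_alias_update` is an illustrative name and the function and version values are made up. The commented line shows roughly how they would be passed to boto3.

```python
def weighted_alias_update(function_name, alias, stable_version,
                          canary_version, canary_weight):
    """Build UpdateAlias arguments that send canary_weight of traffic to canary_version."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": stable_version,     # receives the remaining traffic
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight}
        },
    }


# Made-up function name; versions match the diagram (v5 stable, v6 new)
kwargs = weighted_alias_update("my-function", "live", "5", "6", 0.10)
# boto3.client("lambda").update_alias(**kwargs)   # not executed here
```

Rolling back is the same call with an empty `AdditionalVersionWeights` map, which returns all traffic to the stable version.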

Container Image Tags & ECR

Lambda supports container images up to 10 GB (vs 250 MB unzipped for ZIP packages)
Images stored in Amazon ECR — tag with :latest, :v1.2.0, or commit SHA for traceability
CI/CD pipeline: build → push to ECR with version tag → update Lambda to use new image URI → run tests → promote tag to :stable

🎯

"Test 10% of prod traffic on new Lambda version" → Lambda alias with weighted routing (90/10). "Roll back to previous version instantly" → Update alias to point back to previous version number. "Prevent accidental deploy of untested code" → Require published version (not $LATEST) in prod alias.

⚠️

Common traps:

"Updating an alias to point to a new version causes downtime" — FALSE; alias update is near-instant and atomic.
"$LATEST is a publishable version" — FALSE; $LATEST is always the unpublished current state. Publishing creates numbered versions from $LATEST.
"Lambda aliases can point to another alias" — FALSE; aliases can only point to specific version numbers or $LATEST, not to other aliases.

Task 3.4

Deploy Code by Using AWS CI/CD Services

CodeCommit, CodeBuild, CodeDeploy, CodePipeline, deployment strategies, rollback.

Skills in:

Describe Lambda deployment packaging options and API Gateway stages with custom domains
Update existing IaC templates; manage application environments using AWS services
Deploy an application version using deployment strategies; commit code to invoke build/test/deploy actions
Use orchestrated workflows to deploy code to different environments
Perform application rollbacks; use labels and branches for version and release management
Configure deployment strategies (blue/green, canary, rolling) for application releases

🚀 AWS CI/CD Pipeline — CodeCommit, CodeBuild, CodeDeploy & CodePipeline

CI/CD · Exam Fave

The four Code* services form the AWS-native CI/CD toolchain. Know each service's role and how they integrate in a pipeline.

Service Roles

Service	Purpose	Key Concept
AWS CodeCommit	Git repository (source control)	Private Git repos; triggers CodePipeline on push
AWS CodeBuild	Fully managed build & test service	Runs `buildspec.yml`; Docker-based; pay-per-minute
AWS CodeDeploy	Automated deployment to compute targets	EC2, Lambda, ECS; uses `appspec.yml`
AWS CodePipeline	Orchestrates the full CI/CD workflow	Stages: Source → Build → Test → Deploy; integrates with GitHub/Bitbucket too
AWS CodeArtifact	Artifact repository (npm, Maven, PyPI)	Cache and share packages; pull-through cache from public repos

buildspec.yml — CodeBuild

version: 0.2
phases:
  install:
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - python -m pytest tests/
      - sam build
  post_build:
    commands:
      - sam package --s3-bucket $ARTIFACT_BUCKET --output-template-file packaged-template.yaml
artifacts:
  files:
    - packaged-template.yaml

appspec.yml — CodeDeploy for Lambda

version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: MyFunction
        Alias: live
        CurrentVersion: "1"   # literal version numbers; appspec files do not evaluate !Ref
        TargetVersion: "2"
Hooks:
  - BeforeAllowTraffic: PreTrafficHookFunction
  - AfterAllowTraffic: PostTrafficHookFunction

🎯

CodeBuild vs. CodeDeploy: CodeBuild compiles, tests, packages. CodeDeploy deploys the artifact to a target. CodePipeline orchestrates both. Common exam trap: mixing up which service does what.

⚠️

Common traps:

"CodePipeline can only use CodeCommit as source" — FALSE; CodePipeline supports GitHub, GitHub Enterprise, GitLab, Bitbucket, S3, and ECR as sources.
"CodeBuild replaces CodeDeploy" — FALSE; they serve different purposes. CodeBuild builds and tests; CodeDeploy deploys the built artifact.

🔄 Deployment Strategies — Blue/Green, Canary, Rolling, All-at-Once

CI/CD · Exam Fave

Strategy Comparison

Strategy	How It Works	Downtime?	Rollback Speed	Cost
All-at-once	Deploy to all instances simultaneously	Yes (brief)	Redeploy (slow)	Cheapest
Rolling	Deploy to one batch at a time	Reduced capacity briefly	Redeploy	Low
Rolling with additional batch	Add new batch before removing old	No	Redeploy	Medium
Immutable	Launch new ASG with new version; swap on success	No	Terminate new instances (fast)	High (double capacity)
Blue/Green	Full duplicate environment; switch DNS/ALB	No	Instant (swap back)	Highest (2× environment)
Canary	X% traffic to new, rest to old	No	Shift traffic back	Medium

Lambda-Specific Deployment via CodeDeploy

LambdaAllAtOnce: Instantly shifts all traffic to new version
LambdaCanary10Percent5Minutes: 10% to new version → wait 5 min → 100% if hooks pass
LambdaLinear10PercentEvery1Minute: Increase by 10% every minute until 100%
Pre/Post traffic hooks: Lambda functions that run before/after traffic shift to validate the deployment (e.g., run smoke tests)
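
The difference between the canary and linear configurations above is easiest to see as a timeline. The following is a minimal sketch, not CodeDeploy's implementation: `traffic_schedule` and its parameters are hypothetical names, and it assumes the first increment is applied at minute 0 and that all hooks pass.

```python
def traffic_schedule(kind, step_percent, interval_minutes):
    """Return [(minute, percent_on_new_version), ...] for a simplified shift."""
    if kind == "all_at_once":
        return [(0, 100)]
    if kind == "canary":
        # One small step, one wait, then all remaining traffic.
        return [(0, step_percent), (interval_minutes, 100)]
    if kind == "linear":
        schedule = []
        percent, minute = step_percent, 0
        while percent < 100:
            schedule.append((minute, percent))
            percent += step_percent
            minute += interval_minutes
        schedule.append((minute, 100))
        return schedule
    raise ValueError(f"unknown kind: {kind}")


canary = traffic_schedule("canary", 10, 5)   # models LambdaCanary10Percent5Minutes
linear = traffic_schedule("linear", 10, 1)   # models LambdaLinear10PercentEvery1Minute
```

Under these assumptions the canary schedule has two steps while the linear schedule has ten, which is why canary reaches 100% sooner but exposes the new version to a single validation window.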

Elastic Beanstalk Deployment Policies

Policy	Downtime	Best For
All at once	Yes	Dev/test rapid deployments
Rolling	No (reduced capacity)	Production with some risk tolerance
Rolling with additional batch	No (full capacity maintained)	Production zero-downtime
Immutable	No	Production — fastest rollback
Blue/Green (swap URLs)	No	Production — full environment isolation

💡

Speed of rollback (fastest to slowest): Blue/Green (instant DNS swap) → Canary (shift traffic back) → Immutable (terminate ASG) → Rolling (redeploy previous version).

🎯

"Zero downtime with fastest rollback for Lambda" → CodeDeploy LambdaCanary + pre/post traffic hooks. "Zero downtime for EC2 with no capacity impact" → Rolling with additional batch. "Instant full rollback for production" → Blue/Green. "Cheapest, accepts downtime" → All-at-once.

⚠️

Common traps:

"Rolling deployment maintains 100% capacity" — FALSE; standard rolling takes instances offline during update, reducing capacity. Use Rolling with additional batch to maintain full capacity.
"Blue/Green deployment is always cheaper than rolling" — FALSE; Blue/Green requires double the infrastructure temporarily — it is the most expensive strategy.

Domain 4 Overview

Debug, monitor, and optimize AWS applications. Covers CloudWatch Logs and Insights, AWS X-Ray distributed tracing, custom metrics (including EMF), structured logging, and performance optimization with caching, concurrency tuning, and messaging efficiency.

⚡ 18% of scored content

Task 4.1

Assist in a Root Cause Analysis

CloudWatch Logs, Logs Insights queries, X-Ray traces, custom metrics, dashboards, and deployment failure debugging.

Skills in:

Debug code to identify defects; interpret application metrics, logs, and traces
Query logs to find relevant data; implement custom metrics (CloudWatch EMF)
Review application health using dashboards and insights
Troubleshoot deployment failures using service output logs
Debug service integration issues in applications

📋 Amazon CloudWatch Logs — Log Groups, Insights & Metric Filters

CloudWatch · Exam Fave

CloudWatch Logs is the primary log aggregation service on AWS. Knowing how to query, filter, and act on logs is a key DVA-C02 skill.

Hierarchy

Log Group: Container for related log streams (e.g., one per Lambda function or ECS service). Retention configured here (1 day – 10 years, or never expire).
Log Stream: Sequence of log events from a single source (e.g., one Lambda execution environment, one EC2 instance).
Log Event: A single timestamped record.

CloudWatch Logs Insights — Query Syntax

# Find Lambda errors in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# Calculate p99 Lambda duration
filter @type = "REPORT"
| stats pct(@duration, 99) as p99Duration by bin(5m)

# Count errors by error type
filter @message like /Exception/
| parse @message "* Exception: *" as exType, exMsg
| stats count(*) by exType

Metric Filters

Extract numeric values from log events and publish them as CloudWatch custom metrics
Example: Count occurrences of ERROR in logs → create metric ErrorCount → set alarm on it
Pattern: { $.level = "ERROR" } (JSON log format) or simple text pattern ERROR
Each metric filter is attached to one log group and publishes to one CloudWatch metric; a log group can have several metric filters
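
To make the JSON pattern concrete, here is a rough local emulation of what a filter like { $.level = "ERROR" } counts. It is only a sketch for intuition: `count_matches` and `sample_logs` are hypothetical, it handles a single top-level equality test, and it does not call CloudWatch.

```python
import json


def count_matches(log_lines, field, expected):
    """Count JSON log events whose top-level `field` equals `expected`."""
    count = 0
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # a JSON pattern does not match plain-text events
        if isinstance(event, dict) and event.get(field) == expected:
            count += 1
    return count


sample_logs = [
    '{"level": "ERROR", "message": "payment declined"}',
    '{"level": "INFO", "message": "order placed"}',
    'plain text ERROR line',
    '{"level": "ERROR", "message": "timeout"}',
]
error_count = count_matches(sample_logs, "level", "ERROR")
```

The plain-text line contains the word ERROR but is skipped here, which mirrors why JSON patterns and simple text patterns are separate choices.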

Lambda Logging

Lambda automatically sends stdout/stderr to CloudWatch Logs (/aws/lambda/functionName)
Lambda Logs format: START, END, REPORT lines contain requestId, duration, billed duration, memory used, init duration (cold start)
CloudWatch Lambda Insights: Enhanced monitoring — CPU, memory, disk, network utilization via a Lambda layer

Common Root Cause Patterns

Symptom	Likely Cause	Where to Look
Lambda 502 errors from API GW	Lambda function exception or timeout	Lambda log group for error detail
Lambda throttling (429)	Concurrency limit reached	CloudWatch metric: `Throttles`
Lambda timeout	Function exceeds timeout config	REPORT line: Duration > configured timeout
DynamoDB ProvisionedThroughputExceededException	Hot partition or insufficient capacity	CW metric: `ConsumedWriteCapacityUnits`, `ThrottledRequests`
API GW 5xx errors	Backend Lambda or integration error	API GW execution logs; enable in stage settings

🎯

Enable API Gateway execution logging in the stage settings to see the full request/response cycle. Access logs alone record who called the API and what was returned, which is insufficient for debugging integration errors. Set the log level to INFO or ERROR.

⚠️

Common traps:

"CloudWatch Logs Insights queries run in real-time as logs arrive" — FALSE; Logs Insights queries historical log data. For real-time filtering, use CloudWatch Live Tail or Subscription Filters.
"Lambda always writes logs to CloudWatch automatically" — TRUE only if the execution role has logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents permissions.

🔍 AWS X-Ray — Distributed Tracing, Segments & Service Map

X-Ray · High Frequency

X-Ray provides end-to-end distributed tracing across microservices. It is the go-to tool for identifying latency bottlenecks and errors across a request chain on the DVA-C02 exam.

Core Concepts

Concept	Description
Trace	Complete end-to-end request across all services — identified by unique Trace ID
Segment	Data from a single service (e.g., Lambda function, EC2 app) for one request
Subsegment	Granular unit within a segment (e.g., DynamoDB call, HTTP downstream call, custom code block)
Annotation	Indexed key-value pair — filterable in X-Ray console (e.g., `userId`, `tenantId`)
Metadata	Non-indexed key-value pair — visible in trace but not searchable
Service Map	Visual graph of all services in the request chain with latency and error rates

Enabling X-Ray

Lambda: Enable Active Tracing in function config. Lambda sends trace data automatically. Add X-Ray SDK for subsegments.
API Gateway: Enable X-Ray Tracing on the stage settings. Adds trace header to requests.
ECS/EC2: Run X-Ray Daemon as a sidecar or daemon process. SDK sends trace data to daemon → daemon batches and sends to X-Ray API.
SDK instrumentation: Wrap AWS SDK calls and HTTP clients with X-Ray SDK to automatically create subsegments

X-Ray SDK — Annotations vs. Metadata

from aws_xray_sdk.core import xray_recorder

# Python: add annotation (indexed; use for filtering)
xray_recorder.put_annotation("userId", user_id)
xray_recorder.put_annotation("plan", "premium")

# Metadata (not indexed — for debugging context)
xray_recorder.put_metadata("requestPayload", request_body)

# Custom subsegment
with xray_recorder.in_subsegment("database-query") as sub:
    result = table.query(...)
    sub.put_annotation("recordCount", len(result['Items']))

Sampling Rules

X-Ray does not record every request — it samples to reduce cost and noise
Default rule: Record first request per second + 5% of additional requests per service host
Custom sampling rules: Target specific services, URLs, or HTTP methods with different rates
Sampling rules are centrally configured — no code change needed to adjust sampling
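
The default rule above (first request each second plus 5% of the rest) can be approximated with a small simulation. This is a simplified sketch under stated assumptions, not the X-Ray SDK's sampler: `sample_requests`, `reservoir`, and `fixed_rate` are illustrative names, and requests are identified only by the second in which they arrive.

```python
import random


def sample_requests(arrival_seconds, fixed_rate=0.05, reservoir=1, rng=None):
    """Return a list of booleans: True where the request would be traced."""
    rng = rng or random.Random(0)
    taken_per_second = {}
    decisions = []
    for second in arrival_seconds:
        used = taken_per_second.get(second, 0)
        if used < reservoir:
            # The reservoir guarantees the first request(s) in each second.
            taken_per_second[second] = used + 1
            decisions.append(True)
        else:
            # Everything beyond the reservoir is sampled at the fixed rate.
            decisions.append(rng.random() < fixed_rate)
    return decisions


# 1,000 requests arriving within the same second
traced = sum(sample_requests([0] * 1000))
```

With a single busy second, only the reservoir request plus roughly 5% of the remainder are traced, so trace volume grows much more slowly than request volume.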

💡

Annotation vs. Metadata: Annotation = Actionable / searchable (like a database index). Metadata = More context (like a comment — descriptive but not queryable).

🎯

"Find all requests from tenantId=ABC that took >2s" → Add tenantId as X-Ray annotation → filter in X-Ray console. "Which downstream service is causing latency" → X-Ray Service Map — look for high response time subsegments. "Lambda function not sending traces" → Check Active Tracing is on AND execution role has xray:PutTraceSegments + xray:PutTelemetryRecords.

⚠️

Common traps:

"X-Ray traces every single request by default" — FALSE; X-Ray uses sampling. To trace every request, set a custom sampling rule with 100% fixed rate (not recommended at high volume).
"X-Ray metadata is searchable in the console like annotations" — FALSE; only annotations are indexed and filterable. Metadata is visible within a specific trace detail but cannot be used to search/filter across traces.
"X-Ray is enabled automatically on Lambda functions" — FALSE; you must explicitly enable Active Tracing in the Lambda function configuration.

Task 4.2

Instrument Code for Observability

Logging vs. monitoring vs. observability, custom metrics, EMF, CloudWatch Alarms, X-Ray SDK, structured logging, health checks.

Skills in:

Describe differences between logging, monitoring, and observability
Implement an effective logging strategy to record application behavior and state
Implement code that emits custom metrics; add annotations for tracing services
Implement notification alerts for specific actions (quota limits, deployment completions)
Implement tracing using AWS services and tools; implement structured logging
Configure application health checks and readiness probes

📊 CloudWatch Custom Metrics, EMF & Alarms

CloudWatch · High Frequency

Publishing custom metrics lets you monitor business-level KPIs alongside infrastructure metrics. EMF is the modern, cost-effective approach for Lambda functions.

Logging vs. Monitoring vs. Observability

Pillar	What It Provides	AWS Service
Logging	Discrete event records — what happened	CloudWatch Logs, CloudTrail
Monitoring	Time-series metrics — how things are performing	CloudWatch Metrics & Alarms
Tracing	Request flow across services — why it happened	AWS X-Ray
Observability	Combine all three to understand system state	CloudWatch Application Signals, X-Ray, OTel

Custom Metrics — PutMetricData

import boto3

# Emit custom metric: order processing time
cw = boto3.client('cloudwatch')
cw.put_metric_data(
    Namespace='MyApp/Orders',
    MetricData=[{
        'MetricName': 'ProcessingDuration',
        'Value': processing_ms,
        'Unit': 'Milliseconds',
        'Dimensions': [{
            'Name': 'Environment',
            'Value': 'prod'
        }]
    }]
)

Embedded Metric Format (EMF) — Preferred for Lambda

Embed metric values in structured log JSON — CloudWatch automatically extracts them as metrics
Zero additional API calls — metrics are extracted from your existing logs
No PutMetricData API calls or request charges; you pay for log ingestion, and the extracted metrics are billed as custom metrics
Use the aws-embedded-metrics library (Python/Node.js/Java)

# Python EMF — metrics extracted automatically from logs
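import time
from aws_embedded_metrics import metric_scope

@metric_scope
def lambda_handler(event, context, metrics):
    start = time.monotonic()
    # ... business logic being timed goes here ...
    elapsed_ms = (time.monotonic() - start) * 1000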
    metrics.set_namespace("MyApp")
    metrics.put_dimensions({"Service": "OrderProcessor"})
    metrics.put_metric("OrderCount", 1, "Count")
    metrics.put_metric("ProcessingTime", elapsed_ms, "Milliseconds")
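
For reference, the library ultimately writes a JSON log line with an `_aws` metadata block. The sketch below builds such a record by hand using only the standard library; `build_emf_record` is a hypothetical helper, and the namespace, dimension, and values are the same illustrative ones used above.

```python
import json
import time


def build_emf_record(namespace, dimensions, metrics):
    """Build one EMF log record.

    dimensions: dict of dimension name -> value
    metrics: dict of metric name -> (value, unit)
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],
                "Metrics": [
                    {"Name": name, "Unit": unit}
                    for name, (_, unit) in metrics.items()
                ],
            }],
        },
    }
    # Dimension and metric values live at the top level of the record.
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    return record


emf_line = json.dumps(build_emf_record(
    "MyApp",
    {"Service": "OrderProcessor"},
    {"OrderCount": (1, "Count"), "ProcessingTime": (45.2, "Milliseconds")},
))
print(emf_line)  # in Lambda, stdout is delivered to CloudWatch Logs
```

Printing the line is enough inside Lambda because stdout goes to the function's log group; outside Lambda you would typically rely on the library or the CloudWatch agent instead.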

CloudWatch Alarms

Alarm Type	Triggers When	Use Case
Threshold Alarm	Metric crosses a static threshold	CPU > 80%, Error count > 5
Anomaly Detection	Metric deviates from ML-predicted band	Traffic drops/spikes without fixed threshold
Composite Alarm	Combination of multiple alarms (AND/OR)	Alert only when both high CPU AND high errors
Math Expression Alarm	Metric math result crosses threshold	Error rate = errors / requests > 1%

Alarm States

OK: Metric is within the threshold
ALARM: Metric has breached the threshold for the configured datapoints-to-alarm
INSUFFICIENT_DATA: Not enough data points yet (common with new metrics or long evaluation periods)
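
The "datapoints-to-alarm" idea (M breaching datapoints out of the last N periods) can be sketched as follows. This is a simplification for intuition only: `evaluate_alarm` is a hypothetical function, it assumes a greater-than comparison, and it ignores the configurable treatment of missing data.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return the alarm state for the most recent evaluation window.

    datapoints: oldest-to-newest values; None marks a missing datapoint.
    """
    window = datapoints[-evaluation_periods:]
    present = [value for value in window if value is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in present if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# 3 of the last 5 datapoints exceed the threshold of 5
state = evaluate_alarm([1, 9, 9, 2, 9], threshold=5,
                       evaluation_periods=5, datapoints_to_alarm=3)
```

Requiring 3 out of 5 rather than 1 out of 1 is a common way to avoid alarming on a single noisy datapoint.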

🎯

"Notify on Lambda throttling" → CloudWatch Alarm on Throttles metric → SNS → email/PagerDuty. "Track business KPI in Lambda" → Use EMF (no extra API calls, extracted from logs automatically). "Alert only when multiple conditions are true" → Composite Alarm.

⚠️

Common traps:

"An alarm in INSUFFICIENT_DATA state means something is wrong" — FALSE; insufficient data is normal for new alarms or when a metric has gaps (e.g., Lambda not invoked recently).
"CloudWatch custom metrics are retained indefinitely" — FALSE; CloudWatch retains data at different resolutions: 1-second for 3 hours, 1-minute for 15 days, 5-minute for 63 days, 1-hour for 15 months.

📝 Structured Logging, Health Checks & Observability Best Practices

Logging · Medium Frequency

Structured Logging (JSON)

Log as JSON instead of plain text — machine-parseable, queryable with Logs Insights
Always include: requestId, userId, timestamp, level, message, duration
Never include PII, credentials, or sensitive data in logs
Use a correlation ID (requestId or traceId) to link logs across multiple services for the same request

# Structured log entry — queryable with Logs Insights
{
  "level": "INFO",
  "requestId": "abc-123",
  "userId": "u-456",
  "action": "processOrder",
  "orderId": "ord-789",
  "durationMs": 45,
  "status": "success"
}
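
One way to produce entries like the one above is a small formatter for Python's standard `logging` module. This is a minimal sketch: the `JsonFormatter` class and the list of extra field names are assumptions chosen to match the sample entry, not a prescribed schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("requestId", "action", "orderId", "durationMs", "status"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("order processed",
            extra={"requestId": "abc-123", "action": "processOrder", "durationMs": 45})
```

Because every entry is one JSON object per line, Logs Insights can address fields such as requestId or durationMs directly instead of parsing free text.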

Application Health Checks

Type	What It Checks	Used By
ALB Target Health Check	HTTP endpoint returns 2xx	ALB removes unhealthy targets from rotation
ECS Container Health Check	Command/HTTP runs successfully inside container	ECS replaces unhealthy tasks
Route 53 Health Check	DNS-level endpoint availability	DNS failover routing policies
CloudWatch Synthetics Canary	Scripted browser/API test runs on a schedule	Proactive endpoint monitoring

Subscription Filters — Real-Time Log Routing

Stream filtered log data from a CloudWatch log group to Lambda, Kinesis, or Firehose in real time
Use case: Forward ERROR logs from all Lambda functions to a centralized Kinesis stream → Firehose → OpenSearch for analysis
Each log group supports up to 2 subscription filters, regardless of destination type

🎯

"Real-time log analysis pipeline" → CloudWatch Logs Subscription Filter → Kinesis Data Firehose → S3 / OpenSearch. "Proactive endpoint test every 5 min" → CloudWatch Synthetics Canary. "Correlate logs across Lambda, API GW, DynamoDB for one request" → Use X-Ray trace ID as correlation ID in all log statements.

Task 4.3

Optimize Applications by Using AWS Services and Features

Lambda concurrency, caching strategies (ElastiCache, DAX, CloudFront, API GW), messaging optimization, and application profiling.

Skills in:

Define concurrency; profile application performance
Determine minimum memory and compute power for an application
Use subscription filter policies to optimize messaging
Cache content based on request headers; implement application-level caching
Optimize application resource usage; analyze application performance issues
Use application logs to identify performance bottlenecks

⚡ Lambda Concurrency Deep Dive — Limits, Throttling & Burst

Lambda · High Frequency

Concurrency management is one of the most tested Lambda optimization topics. Understanding reserved vs. provisioned concurrency and throttling behavior is essential.

Concurrency Model

Each in-flight request uses one concurrent execution
Account default limit: 1,000 concurrent executions (soft limit — can be increased via Service Quotas)
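Burst scaling: Historically an initial burst of 500–3,000 concurrent executions (depending on Region), then +500 per minute. Lambda now scales each function independently by up to 1,000 concurrent executions every 10 seconds, up to the account limit.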
When concurrency limit is hit → throttling → TooManyRequestsException (HTTP 429)

Concurrency Types Recap

Type	Purpose	Cost when idle	Eliminates cold start?
Unreserved	Default — shared pool	No	No
Reserved	Guarantee max N; protect downstream	No (just a limit)	No
Provisioned	Pre-warm N environments	Yes (charged continuously)	Yes

Throttling Behavior by Invocation Type

Invocation Type	Throttle Behavior
Synchronous (API GW → Lambda)	Returns 429 immediately to caller; no retry
Asynchronous (S3, SNS, EventBridge)	Lambda retries for up to 6 hours; backs off automatically
Event Source Mapping (SQS, Kinesis)	Messages stay in queue/stream; Lambda retries when concurrency available
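
Because a throttled synchronous call is not retried by Lambda, the caller has to decide how to retry. The sketch below shows exponential backoff with full jitter around a generic callable. `ThrottledError` and `call_with_backoff` are hypothetical names standing in for a real 429 response; the AWS SDKs ship their own retry logic, so treat this as an illustration of the pattern rather than something you must write yourself.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a 429 / TooManyRequestsException from a synchronous call."""


def call_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call func(); on ThrottledError, wait with full jitter and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Jitter matters because many clients throttled at the same moment would otherwise all retry at the same moment and be throttled again.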

Lambda Power Tuning

Open-source tool (AWS Lambda Power Tuning) runs your function at multiple memory sizes
Finds the optimal memory setting for cost, speed, or both
More memory ≠ always more expensive; a shorter duration often pays for the memory increase
Rule: Cost ≈ duration × memory. A higher memory setting may run fast enough to be cheaper overall.
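
A quick worked example of the rule above. The per-GB-second price and the two duration figures are assumptions for illustration only (check current pricing for your Region and architecture); `invocation_cost` is a hypothetical helper that ignores the per-request fee.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed illustrative rate, not authoritative


def invocation_cost(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Compute-only cost of one invocation (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price


small = invocation_cost(128, 1000)  # 0.125 GB-seconds at 128 MB
large = invocation_cost(512, 200)   # 0.100 GB-seconds at 512 MB
```

In this hypothetical case the 512 MB setting is both faster and cheaper, because the function finished in one fifth of the time while memory only quadrupled. If the duration had barely changed, the larger setting would simply cost more.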

💡

Reserved Concurrency = Ceiling (caps usage). Provisioned Concurrency = Floor (pre-warms environments). Set both: Reserved prevents runaway scale; Provisioned ensures no cold starts up to that level.

🎯

"Prevent Lambda from overwhelming RDS" → Reserved Concurrency (cap Lambda). "Eliminate cold starts for latency-sensitive function" → Provisioned Concurrency. "Lambda throttled during traffic spike" → Request concurrency limit increase via Service Quotas. "Find cheapest memory setting" → Lambda Power Tuning tool.

⚠️

Common traps:

"Setting reserved concurrency to 0 disables the function" — TRUE — this is actually a useful technique to temporarily disable a function by throttling all executions.
"Provisioned concurrency guarantees no throttling" — FALSE; Provisioned Concurrency pre-warms execution environments but the function can still be throttled if it exceeds the reserved or account concurrency limit.
"Lambda always scales to handle any load instantly" — FALSE; there is a burst limit per region. Above the burst, Lambda scales by 500 concurrent executions per minute.

🚀 Caching Strategies — CloudFront, API GW, ElastiCache & DAX

Performance · High Frequency

Caching at the right layer dramatically reduces latency and cost. The DVA-C02 exam tests which cache layer to choose for a given scenario.

Cache Layer Comparison

Cache Layer	Service	What It Caches	Best For
CDN / Edge	CloudFront	Static assets, API responses	Global users; reduce origin load; low latency worldwide
API Layer	API Gateway cache	API endpoint responses	Repeated identical API requests; reduce Lambda invocations
Application Layer	ElastiCache (Redis/Memcached)	DB query results, session data	RDS read offload; session store; complex data structures
Database Layer	DynamoDB DAX	DynamoDB GetItem/Query results	DynamoDB hot reads; microsecond latency
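
For the application layer, the usual pattern is lazy loading (cache-aside): read the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses an in-memory dictionary as a stand-in for Redis or Memcached; `LazyCache`, `loader`, and `clock` are hypothetical names, and a real deployment would call a cache client instead.

```python
import time


class LazyCache:
    """In-memory stand-in for a cache such as Redis, with per-entry TTL."""

    def __init__(self, loader, ttl_seconds=60, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}           # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired: read from the source, then populate the cache.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (value, self._clock() + self._ttl)
        return value
```

The trade-off of lazy loading is that the first read of each key always pays the database cost, and cached values can be stale until the TTL expires.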

CloudFront Cache Behaviors

Cache based on request headers: Add headers to the Cache Key to vary cached responses (e.g., Accept-Language for localized content). More headers = fewer cache hits.
TTL: Set via Cache-Control: max-age=N response header from origin, or override in CloudFront policy (min/max/default TTL)
Cache invalidation: CreateInvalidation API or console — invalidate specific paths. Cost: first 1,000 paths/month free, then $0.005/path.
Origin Shield: Additional caching layer between CloudFront edge and origin — reduces origin load further
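
The effect of cache-key headers on hit rate can be illustrated with a toy model. This is not how CloudFront is implemented; `cache_key`, `hit_ratio`, and the sample requests are assumptions made up for the example, and the model assumes nothing ever expires.

```python
def cache_key(path, headers, key_headers):
    """Build a cache key from the path plus the selected request headers."""
    selected = tuple(
        (name.lower(), headers.get(name, "")) for name in sorted(key_headers)
    )
    return (path, selected)


def hit_ratio(requests, key_headers):
    """Fraction of requests served from cache, assuming nothing expires."""
    seen, hits = set(), 0
    for path, headers in requests:
        key = cache_key(path, headers, key_headers)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(requests)


requests = [
    ("/home", {"Accept-Language": "en", "User-Agent": "browser-a"}),
    ("/home", {"Accept-Language": "en", "User-Agent": "browser-b"}),
    ("/home", {"Accept-Language": "de", "User-Agent": "browser-c"}),
    ("/home", {"Accept-Language": "en", "User-Agent": "browser-d"}),
]
language_only = hit_ratio(requests, ["Accept-Language"])
language_and_agent = hit_ratio(requests, ["Accept-Language", "User-Agent"])
```

Keying on Accept-Language alone still lets the repeated English requests hit the cache, while adding User-Agent makes every request unique and drops the hit ratio to zero in this sample.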

API Gateway Caching

Cache enabled per stage; capacity from 0.5 GB to 237 GB
TTL: 0–3600 seconds (default 300s)
Cache invalidation: client sends Cache-Control: max-age=0 header (if you grant them permission)
Cache hit metric: CacheHitCount / CacheMissCount in CloudWatch

SNS Subscription Filter Policies

Filter policies on SNS subscriptions prevent unwanted messages from being delivered to a subscriber
Filtering happens at SNS — subscribers only receive messages matching their policy
Reduces downstream processing costs (fewer Lambda invocations, fewer SQS messages)
Filter on message attributes: {"eventType": ["ORDER_PLACED", "ORDER_SHIPPED"]}
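
The matching rule for a policy like the one above can be sketched locally. This covers only exact string matching on message attributes; real filter policies also support operators such as prefix and numeric matching, which are deliberately left out. `matches_policy` is a hypothetical helper, not an SNS API.

```python
def matches_policy(filter_policy, message_attributes):
    """True if every policy key is present with one of the allowed values."""
    for attribute, allowed_values in filter_policy.items():
        if message_attributes.get(attribute) not in allowed_values:
            return False
    return True


policy = {"eventType": ["ORDER_PLACED", "ORDER_SHIPPED"]}
delivered = matches_policy(policy, {"eventType": "ORDER_PLACED"})
dropped = matches_policy(policy, {"eventType": "ORDER_CANCELLED"})
```

Values within one key are ORed, and multiple keys are ANDed, so a message missing any attribute named in the policy is not delivered.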

🎯

"Reduce Lambda invocations for repeated API calls" → API GW cache. "Reduce RDS load from read-heavy app" → ElastiCache lazy loading. "Serve static content globally with low latency" → CloudFront. "Only process ORDER events from SNS topic" → SNS subscription filter policy on your SQS subscriber.

⚠️

Common traps:

"CloudFront caches POST requests" — FALSE; CloudFront only caches GET and HEAD requests by default. POST, PUT, DELETE pass through to origin (not cached).
"Adding more headers to the CloudFront cache key improves cache hit rate" — FALSE; more headers in the cache key means more unique cache entries → lower cache hit rate. Use only headers that genuinely vary the response.
"SNS filter policies are applied at the publisher" — FALSE; filter policies are configured on the subscription (subscriber side). The publisher sends the message to all subscribers; SNS filters before delivering.

🔧 SQS Optimization & Application Performance Profiling

Performance · Medium Frequency

SQS Optimization Techniques

Technique	How	Benefit
Long Polling	Set `WaitTimeSeconds` = 1–20	Reduces empty API calls; reduces cost
Batch Processing	Receive up to 10 messages per call (`MaxNumberOfMessages`)	Fewer API calls; higher throughput
Message Batching (send)	Use `SendMessageBatch` (up to 10 messages)	10× throughput per API call
Visibility Timeout Tuning	Set to > max processing time	Prevent duplicate processing
Dead-Letter Queue	Set `maxReceiveCount`	Isolate poison-pill messages; prevent infinite retry loop
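
Send-side batching mostly comes down to grouping messages into chunks of at most 10. The helpers below are a sketch of that step only: `to_batches` and `to_batch_entries` are hypothetical names, and the actual call to SendMessageBatch through an SDK is intentionally omitted.

```python
def to_batches(messages, batch_size=10):
    """Split messages into lists of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


def to_batch_entries(batch):
    """Shape one batch like the Entries parameter of SendMessageBatch."""
    # Each entry needs an Id that is unique within its batch.
    return [{"Id": str(index), "MessageBody": body} for index, body in enumerate(batch)]


bodies = [f"order-{n}" for n in range(25)]
batches = to_batches(bodies)  # 25 messages become 3 API calls instead of 25
```

Each list returned by `to_batch_entries` could then be passed as the entries of one batch request, with per-entry failures checked in the response.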

Application Performance Analysis

CloudWatch Metrics: Identify CPU/memory saturation, DB connection counts, queue depth trends over time
X-Ray Traces: Find which subsegment (DB call, downstream HTTP, internal code) contributes most to latency
CloudWatch Logs Insights: Query p99 latency: stats pct(@duration, 99) by bin(5m)
Lambda REPORT logs: Init Duration reveals cold start cost; Max Memory Used reveals memory waste

Performance Bottleneck Decision Tree

Symptom	Root Cause	Fix
High Lambda duration	Slow downstream DB/API call	Add DAX/ElastiCache; use async pattern
Lambda cold starts	No pre-warmed environment	Provisioned Concurrency; reduce package size
DynamoDB throttling	Hot partition or under-provisioned	Better partition key; On-Demand mode; DAX
RDS connection errors under load	Connection pool exhausted	RDS Proxy; reduce Lambda reserved concurrency
High API GW latency	Repeated identical requests hitting Lambda	Enable API GW caching
SQS high message age	Consumers too slow	Increase Lambda concurrency; increase batch size

🎯

Full observability stack for Lambda: X-Ray (tracing) + CloudWatch Logs (structured JSON) + EMF (custom metrics) + CloudWatch Alarms (notification) + Lambda Insights layer (system metrics). This combination gives you complete visibility without external tools.

⚠️

Common traps:

"Long polling always returns messages faster" — FALSE; long polling waits up to WaitTimeSeconds for a message. If no message arrives, it returns empty after the wait. It is more efficient, not necessarily faster.
"Increasing Lambda memory always reduces cost" — FALSE; if the function is not compute-bound, increasing memory just costs more with no duration benefit. Use Power Tuning to find the optimum.
