AWS Certified Developer — Associate · DVA-C02

Complete Study Guide
All Four Domains

📋 Domains: 4 ⚡ Pass Score: 720 / 1000 ❓ Scored Questions: 50

Domain 1 Overview

Develop, test, and maintain applications on AWS. Covers architectural patterns, AWS SDKs and APIs, messaging and event-driven services, Lambda development, and data store integration. The highest-weighted domain at 32%.

⚡ 32% of scored content — Highest weighted domain
Task 1.1

Develop Code for Applications Hosted on AWS

Architectural patterns, SDK usage, API design, messaging services, streaming, and event-driven patterns.

Skills in:
  • Describe architectural patterns (event-driven, microservices, monolithic, choreography, orchestration, fanout)
  • Describe differences between stateful/stateless and tightly/loosely coupled components
  • Describe differences between synchronous and asynchronous patterns
  • Write and run unit tests; write code that interacts with AWS services via APIs and SDKs
  • Use messaging services, handle streaming data, implement resilient application code
  • Use Amazon EventBridge for event-driven patterns; use Amazon Q Developer
🏗️ Architectural Patterns — Event-Driven, Microservices, Choreography vs. Orchestration
Architecture · Exam Fave

The DVA-C02 exam tests your ability to choose the right architectural pattern for a given scenario. Understand the tradeoffs deeply.

Pattern Comparison
Pattern | Description | AWS Services | Best For
Monolithic | Single deployable unit; tightly coupled | EC2, Elastic Beanstalk | Simple apps, small teams, early stage
Microservices | Independent services with own data stores | ECS, EKS, Lambda, API GW | Scale independently, team autonomy
Event-Driven | Components react to events asynchronously | EventBridge, SNS, SQS, Lambda | Decoupled, scalable workflows
Fanout | One publisher → multiple subscribers | SNS → SQS queues | Parallel processing of same event
Choreography vs. Orchestration
Aspect | Choreography | Orchestration
Control | Decentralized: each service reacts to events | Centralized: one service directs the flow
Coupling | Loose: services don't know each other | Tighter: orchestrator knows all services
AWS Service | EventBridge, SNS, SQS | AWS Step Functions
Visibility | Harder to trace the full workflow | Full execution history & audit trail
Best for | Simple, independent workflows | Complex multi-step processes with error handling
Fanout Pattern

An e-commerce order event is published to an SNS topic. Three SQS queues subscribe: one for inventory, one for shipping, one for email notifications. All three process the order in parallel. This is the classic SNS fanout pattern — one message, multiple independent consumers.

Stateful vs. Stateless
  • Stateless: No session data stored in the application instance — any request can go to any instance. Store state externally (DynamoDB, ElastiCache, S3). Lambda functions are stateless by design.
  • Stateful: Session data in memory — the same client must hit the same instance. Requires sticky sessions (ALB) or session replication. Harder to scale horizontally.
Tight vs. Loose Coupling
  • Tightly coupled: Service A calls Service B synchronously and waits. If B fails, A fails. Direct function calls or synchronous HTTP are examples.
  • Loosely coupled: Service A sends a message to a queue or topic and continues. B processes when ready. SQS, SNS, and EventBridge enable loose coupling.
Sync vs. Async Patterns
Synchronous | Asynchronous
Caller waits for response | Caller fires and continues
REST API calls, Lambda invoked by API GW | SQS, SNS, EventBridge, Lambda async invoke
Immediate feedback | Better scalability and resilience
Timeout risk under load | Backpressure handled by queue depth
💡

Mnemonic — CHESS: Choreography = each service reacts. Hard to trace. EventBridge/SNS. Step Functions = State machine for orchestration.

🎯

"Long-running multi-step workflow" → Step Functions Standard. "Multiple services react to same event" → SNS fanout to SQS. "Decouple producer from consumer" → SQS. "Route events based on rules" → EventBridge.

⚠️

Common traps:

  • "Microservices are always better than monolithic" — FALSE; microservices add operational complexity. For simple apps, a monolith is appropriate.
  • "Lambda functions are stateful within their execution environment" — PARTIALLY TRUE; the execution context can be reused between warm invocations, but you cannot rely on it. Store durable state externally.
  • "Choreography is better than orchestration because it's loosely coupled" — CONTEXT-DEPENDENT; orchestration (Step Functions) is preferred when you need visibility, error handling, and complex branching logic.
🔧 AWS SDK, APIs & Credential Chain
SDK · High Frequency

Every interaction with AWS from code goes through the SDK. Understanding the credential resolution chain and retry behavior is critical for DVA-C02.

SDK Credential Resolution Order (checked in order)
  • 1. Explicit in code — hardcoded (never do this in production)
  • 2. Environment variables - AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • 3. AWS credentials file - ~/.aws/credentials (default profile)
  • 4. AWS config file - ~/.aws/config
  • 5. Container credentials — ECS task role via metadata endpoint
  • 6. Instance profile — EC2 IAM Role via IMDS (169.254.169.254)
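The "first match wins" behavior of the chain can be modeled in a few lines. This is an illustrative sketch only, not SDK code: `resolve_credentials`, `env_provider`, and the fake key values are hypothetical names invented for the example, and the real SDKs consult more providers than the three shown.

```python
def resolve_credentials(providers):
    """Return (source_name, credentials) from the first provider that yields any."""
    for name, provider in providers:
        creds = provider()
        if creds is not None:
            return name, creds
    raise LookupError("no credentials found in any provider")


def env_provider(environ):
    """Build a provider that reads the two standard environment variables."""
    def provider():
        key = environ.get("AWS_ACCESS_KEY_ID")
        secret = environ.get("AWS_SECRET_ACCESS_KEY")
        if key and secret:
            return {"access_key": key, "secret_key": secret}
        return None
    return provider


# Fake values for illustration; never put real keys in code.
fake_environ = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
chain = [
    ("environment", env_provider(fake_environ)),
    ("shared-credentials-file", lambda: None),  # pretend the file is missing
    ("instance-profile", lambda: {"access_key": "ASIAEXAMPLE", "secret_key": "role-secret"}),
]
source, creds = resolve_credentials(chain)
```

Here `source` is `"environment"` even though an instance profile is also available, which is why stray environment variables on an EC2 host can mask the intended IAM role.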
Retry Logic & Exponential Backoff
  • AWS SDKs automatically retry on retryable errors (e.g., throttling ThrottlingException, server errors 5xx)
  • Exponential backoff: Before retry n, wait roughly 2^n seconds (up to a cap) scaled by random jitter, which prevents a thundering herd
  • Non-retryable errors (most 4xx, like AccessDeniedException) are NOT retried by the SDK
  • When you receive ProvisionedThroughputExceededException from DynamoDB or SQS throttling, implement backoff in your code
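When you do need backoff in your own code, the idea above can be sketched as follows. This is a minimal example using the "full jitter" variant; the function names, the 1-second base, and the 20-second cap are assumptions chosen for illustration, not SDK defaults.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=20.0, rng=random.random):
    """Delay in seconds for a zero-based retry attempt (exponential, capped, jittered)."""
    ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... up to the cap
    return ceiling * rng()                      # full jitter: anywhere in [0, ceiling)


def call_with_retries(operation, is_retryable, max_attempts=4, sleep=time.sleep):
    """Call operation(); retry only errors that is_retryable(exc) accepts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors and on the final attempt
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
```

The `is_retryable` callback is where you would distinguish a throttling error (retry) from an access-denied error (fail fast).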
Common API Patterns
Pattern | Description | Example
Paginator | Auto-handles NextToken pagination | list_objects_v2 on large S3 buckets
Waiter | Poll until resource reaches target state | Wait for EC2 instance to be running
Presigned URL | Temporary, signed URL for S3 access | Upload/download without credentials
Batch operations | Reduce API calls by batching | DynamoDB BatchWriteItem, SQS SendMessageBatch
# Example: S3 presigned URL generation (Python boto3)
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'file.pdf'},
    ExpiresIn=3600  # URL valid for 1 hour
)
🎯

Never hardcode credentials. On EC2 → use IAM Role. On Lambda → use execution role. On ECS → use task role. The SDK resolves credentials in the chain order above automatically.

⚠️

Common traps:

  • "SDK retries all errors automatically" — FALSE; only retryable errors (5xx, throttling). AccessDenied (403) is not retried.
  • "Presigned URLs can be used to grant permanent access to S3 objects" - FALSE; they are time-limited (max 7 days when signed with long-term IAM user credentials; a URL signed with temporary credentials stops working when those credentials expire).
  • "Environment variable credentials override instance profile credentials" — TRUE; they are checked earlier in the chain (position 2 vs 6).
🌐 Amazon API Gateway — REST, HTTP & WebSocket APIs
API Gateway · High Frequency

API Gateway is the front door for serverless applications. The DVA-C02 exam tests all three API types, authorization methods, stages, and caching.

API Types Compared
Feature | REST API | HTTP API | WebSocket API
Protocol | HTTP/HTTPS | HTTP/HTTPS | WebSocket
Cost | Higher | ~70% cheaper | Per message/connection
Features | Full-featured (caching, WAF, usage plans) | Lightweight, low-latency | Persistent connections
Auth | Lambda, Cognito, IAM, API Keys | Lambda, JWT (e.g., Cognito), IAM | IAM, Lambda authorizer
Best for | APIs needing advanced features | Simple proxies, microservices | Real-time (chat, notifications)
Authorization Methods
Authorizer Type | How It Works | Use Case
IAM Authorization | Caller signs request with Sig V4 | Internal AWS service-to-service
Cognito User Pool | Validates JWT from Cognito | Web/mobile apps with Cognito login
Lambda Authorizer (Token) | Lambda receives Bearer token, returns IAM policy | Custom JWT, OAuth, 3rd-party IdP
Lambda Authorizer (Request) | Lambda receives full request (headers, query) | Complex auth logic, IP allow-listing
API Key | Key sent in header; not authentication, use for rate limiting | Usage plans, partner throttling
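To make the token authorizer row concrete, here is a minimal sketch of a handler that turns a bearer token into an IAM policy. The `KNOWN_TOKENS` lookup is a hypothetical stand-in for real validation (verifying a JWT signature or calling an identity provider); the event fields `authorizationToken` and `methodArn` and the response shape follow the REST API token authorizer contract.

```python
def build_policy(principal_id, effect, resource):
    """Shape of the response API Gateway expects from a Lambda authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": resource,
            }],
        },
    }


# Hypothetical token table; real code would verify a JWT or query an IdP.
KNOWN_TOKENS = {"Bearer valid-demo-token": "user-123"}


def lambda_handler(event, context):
    token = event.get("authorizationToken", "")
    principal = KNOWN_TOKENS.get(token)
    effect = "Allow" if principal else "Deny"
    return build_policy(principal or "anonymous", effect, event["methodArn"])
```

Returning an explicit Deny policy yields a 403 to the caller; API Gateway can cache the returned policy per token for the authorizer's configured TTL.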
Stages & Stage Variables
  • Stages are named snapshots of your API (e.g., dev, staging, prod)
  • Stage variables are key-value pairs like environment variables — referenced as ${stageVariables.lambdaAlias} in integrations
  • Use stage variables to point different stages to different Lambda aliases or backends without changing the API definition
  • Canary deployments: Route a % of traffic to new deployment for testing before full release
Caching & Throttling
  • API Gateway cache: Caches responses per stage; configurable TTL (0–3600s); reduces Lambda invocations for repeated identical requests
  • Throttling: Default 10,000 RPS per account per Region (burst of 5,000); per-method throttling possible via usage plans
  • Returns 429 Too Many Requests when throttled
Stage Variable Pattern

Your API has three stages. The Lambda integration URL uses ${stageVariables.lambdaAlias}. In dev stage, the variable = dev. In prod, = live. Deploying a new Lambda version to the dev alias doesn't affect production — the same API definition routes to the correct Lambda alias per stage.

💡

Auth choice: Internal AWS service → IAM. Mobile/web app users → Cognito User Pool. Custom/3rd-party token → Lambda Authorizer. Rate-limiting partners → API Keys + Usage Plans.

⚠️

Common traps:

  • "API Keys authenticate and authorize users" — FALSE; API Keys identify the calling application, not the user. They are used for throttling/metering, not security authorization.
  • "HTTP API supports caching and WAF integration" — FALSE; HTTP API is lightweight and does NOT support response caching or AWS WAF integration. Use REST API for these features.
  • "Caching is free" — FALSE; API Gateway caching is an additional charge based on cache size.
📨 Messaging & Eventing — SQS, SNS, EventBridge & Fanout
Messaging · Exam Fave

Understanding when to use SQS vs SNS vs EventBridge is a core DVA-C02 skill. Know the delivery model, ordering guarantees, and failure handling for each.

Service Comparison
Feature | SQS | SNS | EventBridge
Model | Pull (consumer polls) | Push (fan-out) | Push (rule-based routing)
Consumers | One consumer per message | Multiple subscribers | Multiple targets per rule
Persistence | Up to 14 days | No storage (fire & forget) | No storage
Ordering | FIFO queue only | None | None
Exactly-once | FIFO only | No | No
Message size | 256 KB | 256 KB | 256 KB
SQS Key Concepts
  • Visibility Timeout: How long a message is hidden after being received. If processing doesn't finish & delete within this window, the message becomes visible again → potential duplicate processing
  • Dead-Letter Queue (DLQ): After maxReceiveCount failed processing attempts, message is moved to DLQ for investigation
  • Long Polling: WaitTimeSeconds up to 20s — waits for messages instead of returning empty responses. Reduces API costs and false empties
  • FIFO Queues: Exactly-once processing and strict ordering; up to 3,000 messages per second with batching (300 API calls per second without). Use Message Group ID for ordered processing per group
  • Message Retention: Default 4 days, max 14 days
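How visibility timeout and maxReceiveCount interact is easier to see in a toy model. The sketch below is an in-memory simulation, not the SQS API: `ToyQueue` and its methods are hypothetical names, and time is passed in explicitly as `now` so the behavior is deterministic.

```python
class ToyQueue:
    """In-memory model of a queue with a redrive policy (not the real service)."""

    def __init__(self, visibility_timeout, max_receive_count):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages = {}     # message_id -> body, receive_count, visible_at
        self.dead_letter = []  # bodies moved to the dead-letter queue

    def send(self, message_id, body):
        self.messages[message_id] = {"body": body, "receive_count": 0, "visible_at": 0}

    def receive(self, now):
        for message_id, msg in list(self.messages.items()):
            if msg["visible_at"] > now:
                continue  # still hidden by an earlier receive
            if msg["receive_count"] >= self.max_receive_count:
                # Too many failed attempts: move to the DLQ instead of delivering
                self.dead_letter.append(msg["body"])
                del self.messages[message_id]
                continue
            msg["receive_count"] += 1
            msg["visible_at"] = now + self.visibility_timeout
            return message_id, msg["body"]
        return None

    def delete(self, message_id):
        """A consumer calls this after successful processing."""
        self.messages.pop(message_id, None)
```

A consumer that receives a message but never calls `delete` sees it again once the timeout lapses; after `max_receive_count` such deliveries the next receive moves it to `dead_letter` instead.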
SNS Fanout Pattern
graph LR
  P["Publisher"] --> T["SNS Topic"]
  T --> Q1["SQS Queue A\n(Inventory)"]
  T --> Q2["SQS Queue B\n(Shipping)"]
  T --> Q3["SQS Queue C\n(Email)"]
  Q1 --> L1["Lambda\nProcessor"]
  Q2 --> L2["Lambda\nProcessor"]
  Q3 --> L3["Lambda\nProcessor"]
Amazon EventBridge
  • Serverless event bus — routes events based on content-based filtering rules
  • Event sources: AWS services, SaaS partners (Zendesk, Shopify), custom apps
  • Targets: Lambda, SQS, SNS, Step Functions, API GW, Kinesis, etc.
  • Event Patterns: Filter events by source, detail-type, and any field in the event JSON
  • Scheduled Rules (cron): Replace CloudWatch Events scheduled rules; run tasks on a schedule
  • Event Buses: Default (AWS services), custom (your app events), partner (SaaS events)
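Content-based filtering can be illustrated with a deliberately simplified matcher. This sketch only handles exact-value lists and nested objects; real EventBridge patterns also support operators such as prefix, numeric ranges, and anything-but, which are omitted here. The `matches` function and the sample event are assumptions for illustration.

```python
def matches(pattern, event):
    """Simplified EventBridge-style match: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the pattern holds a list of allowed values
            return False
    return True


order_pattern = {
    "source": ["com.example.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {"status": ["PAID", "SHIPPED"]},
}
paid_event = {
    "source": "com.example.orders",
    "detail-type": "OrderPlaced",
    "detail": {"status": "PAID", "total": 42},
}
is_match = matches(order_pattern, paid_event)
```

Fields the pattern does not mention (such as `total`) are ignored, so a rule only needs to name the fields it cares about.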
🎯

Route events to multiple targets based on rules → EventBridge. One message, multiple parallel consumers → SNS → SQS fanout. Ensure exactly-once ordered processing → SQS FIFO. Decouple and buffer with retry → SQS Standard + DLQ.

⚠️

Common traps:

  • "SQS Standard guarantees at-most-once delivery" — FALSE; Standard provides at-least-once (duplicates can occur). Only FIFO provides exactly-once.
  • "If a Lambda function fails to process an SQS message, the message is automatically sent to DLQ" — FALSE; it goes to DLQ only after maxReceiveCount failures. The message is retried first.
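  • "SNS stores messages if a subscriber is unavailable" - FALSE; SNS does not persist messages. It retries delivery according to the subscription's retry policy and then drops the message unless the subscription has a dead-letter queue. Subscribe an SQS queue to persist messages for slow or offline consumers.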
🌊 Amazon Kinesis — Streaming Data
Kinesis · Medium Frequency

Kinesis enables real-time data streaming. Know the difference between the Kinesis products and when to use each vs SQS.

Kinesis Products
Service | Purpose | Consumers | Key Feature
Kinesis Data Streams (KDS) | Custom real-time processing | Multiple (Lambda, KCL, Analytics) | Configurable retention 1–365 days; ordered per shard
Kinesis Data Firehose | Load streaming data to destinations | S3, Redshift, OpenSearch, Splunk | Fully managed, no custom consumer code needed
Kinesis Data Analytics | Real-time SQL/Flink on streams | KDS or Firehose as source | Detect anomalies, aggregate, filter in real-time
KDS Core Concepts
  • Shard: Unit of capacity — 1 MB/s write, 2 MB/s read per shard. Add shards to scale.
  • Partition Key: Determines which shard receives the record. Use high-cardinality keys for even distribution
  • Sequence Number: Unique per record per shard; ordered within a shard
  • Enhanced Fan-Out: Dedicated 2 MB/s per consumer per shard using HTTP/2 push — for multiple high-throughput consumers
  • Lambda + KDS: Lambda polls via event source mapping; batch size configurable; bisect on error for failed batches
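The link between partition keys and shards can be sketched with the standard library. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it; the example below assumes the shards split that range into equal-width slices, which holds for a freshly created stream but not necessarily after resharding. `shard_for_key` is a hypothetical helper.

```python
import hashlib


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal-width hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")     # 128-bit integer
    return hash_key * shard_count // (1 << 128)  # index in [0, shard_count)


# High-cardinality keys spread across shards; one repeated key hits one shard.
spread = {shard_for_key(f"user-{n}", 4) for n in range(1000)}
hot = {shard_for_key("US", 4) for _ in range(1000)}
```

`spread` covers several shards while `hot` contains a single shard index, which is the hot-shard problem that a low-cardinality partition key creates.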
Kinesis vs. SQS
Aspect | Kinesis Data Streams | SQS
Multiple consumers | Yes: same data read by multiple consumers | No: message consumed by one consumer
Ordering | Ordered per shard | FIFO only (SQS FIFO)
Replay | Yes: replay up to retention period | No: messages deleted after processing
Use case | Real-time analytics, event replay, audit logs | Task queue, job distribution, decoupling
🎯

"Multiple consumers process same data stream" → Kinesis. "Real-time clickstream analytics" → Kinesis. "Deliver streaming data to S3 without code" → Kinesis Firehose. "Decouple service X from service Y" → SQS.

⚠️

Common traps:

  • "Kinesis Firehose delivers data in real-time with sub-second latency" — FALSE; Firehose buffers data (minimum 60-second buffer interval or 1 MB) before delivering. It is near-real-time, not sub-second.
  • "Adding more shards to KDS increases read throughput for all consumers equally" — FALSE without Enhanced Fan-Out. With standard consumers, total read throughput is shared across consumers on the same shard (2 MB/s shared).
Task 1.2

Develop Code for AWS Lambda

Lambda fundamentals, configuration, event sources, error handling, VPC access, and performance tuning.

Skills in:
  • Describe access to private resources in VPCs from Lambda code
  • Configure Lambda functions: environment variables, memory, concurrency, timeout, runtime, handler, layers, extensions, triggers, destinations
  • Handle the event lifecycle and errors using code (Lambda Destinations, dead-letter queues)
  • Write and run test code; integrate Lambda with AWS services
  • Tune Lambda functions for optimal performance; process and transform data in near real time
λ Lambda Fundamentals — Execution Model & Cold Starts
Lambda · Exam Fave

Lambda is the core of serverless development on AWS. Understanding its execution lifecycle, concurrency model, and cold start behavior is essential.

Execution Lifecycle
  • Init phase: Download code, start runtime, run initialization code outside handler. Happens on cold start. Keep this fast.
  • Invoke phase: Run the handler function. Called for every invocation.
  • Shutdown phase: Runtime receives shutdown signal; run cleanup extensions.
Cold Start vs. Warm Start
Aspect | Cold Start | Warm Start
When | First invocation or after scale-out/idle | Reuse of existing execution environment
Latency | Higher (100ms–several seconds, depending on runtime) | Near-zero overhead
Init code runs? | Yes | No
Global variables cached? | No (fresh environment) | Yes (reused from previous invocation)
Handler Structure
# Python handler — event is the trigger payload, context has metadata
def lambda_handler(event, context):
    print(context.function_name)
    print(context.get_remaining_time_in_millis())
    return {
        "statusCode": 200,
        "body": "Hello from Lambda"
    }
Key Configuration Parameters
Parameter | Range / Default | Impact
Memory | 128 MB – 10,240 MB | CPU and network bandwidth scale proportionally with memory
Timeout | 3s default, max 15 min | Function killed if exceeded; set higher than p99 duration
Ephemeral Storage (/tmp) | 512 MB – 10,240 MB | Temporary file storage within execution environment
Concurrency | Default account limit: 1,000 | Max simultaneous executions across all functions
Performance Optimization Tips
  • Initialize outside handler: DB connections, SDK clients, config loading — cached across warm invocations
  • Right-size memory: More memory = more CPU = faster execution. AWS Lambda Power Tuning tool finds optimal setting
  • Minimize deployment package size: Faster cold starts; use Layers for shared dependencies
  • Avoid recursive invocations: Lambda calling Lambda in a loop can rapidly exhaust concurrency and cause runaway costs
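The "initialize outside handler" tip looks like this in practice. In this sketch `create_client` is a hypothetical stand-in for expensive setup such as `boto3.client(...)` or opening a database connection, and the `STATS` counter exists only to make the reuse visible.

```python
import time

STATS = {"inits": 0}


def create_client():
    """Hypothetical stand-in for expensive setup (SDK client, DB connection)."""
    STATS["inits"] += 1
    return {"created_at": time.time()}


# Module scope: runs once per execution environment, during the Init phase
CLIENT = create_client()


def lambda_handler(event, context):
    # Handler scope: runs on every invocation and reuses the cached client
    return {"statusCode": 200, "inits": STATS["inits"], "clientId": id(CLIENT)}
```

Two invocations in the same environment report the same `clientId` and an `inits` count of 1; a new environment (cold start) would run the module scope again.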
💡

Cold start optimization rule: Move everything you can outside the handler. Connections, clients, and config load once on cold start, then are reused across warm invocations.

🎯

Eliminate cold starts entirely → Provisioned Concurrency (pre-warms N execution environments). Java cold starts are long → use Lambda SnapStart (snapshots initialized environment). "Function needs more CPU" → increase memory allocation.

⚠️

Common traps:

  • "Lambda timeout is 15 minutes by default" — FALSE; default is 3 seconds. 15 minutes is the maximum.
  • "Increasing Lambda memory only helps memory-bound functions" — FALSE; memory also increases allocated vCPU, so CPU-bound functions also benefit.
  • "Global variables persist indefinitely between invocations" — FALSE; they persist only within the same execution environment instance. New instances start fresh.
🔗 Lambda Event Sources, Triggers & Concurrency
Lambda · High Frequency
Invocation Types
Type | Behavior | Retry on failure | Triggered by
Synchronous | Caller waits for response | No (caller handles errors) | API GW, ALB, SDK invoke, Cognito triggers
Asynchronous | Lambda queues event, returns 202 immediately | Up to 2 retries (auto) | S3 events, SNS, EventBridge, SES
Event Source Mapping (poll-based) | Lambda polls the source on your behalf | Retries until success or expiry | SQS, Kinesis, DynamoDB Streams, MSK
Concurrency Types
Type | Purpose | Behavior
Unreserved Concurrency | Default: shared pool across all functions | Throttles when account limit reached
Reserved Concurrency | Reserve N concurrent executions for one function, which also caps it at N | Function cannot exceed N; other functions cannot use the reserved N
Provisioned Concurrency | Pre-initialize N environments to eliminate cold starts | Costs money even when idle; enables smooth scaling
Reserved Concurrency as a Throttle

You have a Lambda that processes SQS messages and writes to RDS. RDS max connections = 100. Setting Lambda's reserved concurrency to 50 allows at most 50 concurrent execution environments and so, with one connection each, at most 50 DB connections, protecting RDS from connection storms during traffic spikes. RDS Proxy is also a solution, but reserved concurrency is simpler to configure.

Event Source Mapping (SQS, Kinesis, DDB Streams)
  • Lambda polls the source and invokes your function with a batch of records
  • Batch size: SQS standard (1–10,000; FIFO queues max 10), Kinesis (1–10,000), DDB Streams (1–10,000)
  • Batch window: Lambda waits up to N seconds to fill a batch before invoking
  • On failure: For SQS, unprocessed messages become visible in the queue again after the visibility timeout. For Kinesis and DDB Streams, use bisect on error to split batches and isolate poison pill messages
  • Partial Batch Response: Return batchItemFailures to tell Lambda which items failed — only those are retried
🎯

"Lambda hammering RDS with connections" → Set reserved concurrency OR use RDS Proxy. "Cold start latency for critical path" → Provisioned Concurrency. "Lambda should not exceed X invocations" → Reserved Concurrency.

⚠️

Common traps:

  • "Reserved Concurrency eliminates cold starts" — FALSE; Provisioned Concurrency eliminates cold starts. Reserved only caps concurrency.
  • "S3 event notifications invoke Lambda synchronously" — FALSE; S3 invokes Lambda asynchronously. The call returns immediately and Lambda retries on failure.
  • "SQS FIFO queues support exactly-once Lambda processing automatically" — PARTIALLY TRUE; FIFO provides exactly-once delivery, but your Lambda handler must still be idempotent for robustness.
🚨 Lambda Error Handling — DLQ, Destinations & Retry Behavior
Lambda · High Frequency
Async Invocation Retry & DLQ
  • On failure, Lambda retries async invocations up to 2 times (3 total attempts)
  • Events can wait in the async event queue for up to 6 hours
  • DLQ: Configure an SQS queue or SNS topic as the DLQ. Failed events (after all retries) are sent here for investigation
  • DLQ captures only events that failed after all retries; Lambda Destinations can route a record of the invocation on success as well as on final failure
Lambda Destinations (preferred over DLQ)
Destination Type | On Success | On Failure
SQS Queue | Yes | Yes
SNS Topic | Yes | Yes
EventBridge Bus | Yes | Yes
Another Lambda Function | Yes | Yes
DLQ vs. Destinations

DLQ: only captures the event payload when all retries are exhausted. Lambda Destinations: sends a richer record (original event + function response/error + execution context) for both success and failure conditions. Destinations are more flexible and are the modern recommendation.

Error Handling in Event Source Mappings
  • Kinesis/DDB Streams: If batch fails, Lambda retries the entire batch until success OR the data expires from the stream. Risk of blocking the shard. Configure bisectBatchOnFunctionError to split the batch and isolate poison pills.
  • SQS: Failed messages return to queue and become visible after visibility timeout. After maxReceiveCount, moved to DLQ.
  • Partial Batch Response (SQS/Kinesis): Return {"batchItemFailures": [{"itemIdentifier": "..."}]} to retry only specific messages, not the whole batch
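A partial batch response for an SQS event source might look like the sketch below. The `process` function and its `orderId` rule are hypothetical business logic; the `Records`, `messageId`, and `body` fields and the `batchItemFailures` response shape follow the SQS event format. Lambda only honors this response when the event source mapping has `ReportBatchItemFailures` enabled.

```python
import json


def process(body):
    """Hypothetical business logic: reject orders without an id."""
    if "orderId" not in body:
        raise ValueError("missing orderId")


def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message; the rest of the batch counts as done
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list marks the whole batch as successful, so only the listed message IDs return to the queue for another attempt.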
🎯

Modern pattern: Use Lambda Destinations over DLQ for async functions — you get richer metadata and can route success and failure independently. Use DLQ for simplicity or SQS event source mapping failures.

⚠️

Common traps:

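  • "DLQ works for all Lambda invocation types" - FALSE; the function-level DLQ applies only to async invocations. For SQS event source mappings, configure a DLQ (redrive policy) on the source queue; for Kinesis and DynamoDB Streams, configure an on-failure destination on the event source mapping.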
  • "Lambda automatically retries synchronous invocations" — FALSE; with synchronous invocations, the caller receives the error and must implement its own retry logic.
🔒 Lambda in VPC, Layers & Environment Variables
Lambda · VPC
Lambda in VPC
  • By default, Lambda runs in an AWS-managed VPC with internet access but no access to your private VPC resources (RDS, ElastiCache, internal ALBs)
  • Configure Lambda with VPC subnets + security groups → Lambda creates an ENI (Elastic Network Interface) in your subnet
  • Internet access in VPC: Lambda in a private subnet needs a NAT Gateway to reach the internet. Lambda in a VPC does NOT get internet access via a public subnet without NAT
  • VPC Endpoints: Use to privately access AWS services (S3, DynamoDB) from Lambda-in-VPC without NAT Gateway
Lambda Layers
  • Layers are ZIP archives containing libraries, a custom runtime, or data — shared across multiple functions
  • Up to 5 layers per function; combined unzipped size ≤ 250 MB
  • Common use: Share boto3, numpy, pandas, or common utility code across Lambda functions
  • Layers are versioned — reference a specific layer version ARN in your function config
Environment Variables
  • Key-value pairs available to function code via os.environ (Python) or process.env (Node.js)
  • Encrypted at rest using KMS (Lambda service key by default, or your CMK)
  • Do NOT store plaintext secrets in environment variables — use Secrets Manager or SSM Parameter Store and fetch at init time
  • Max 4 KB total for all environment variables combined
Lambda Extensions
  • Run alongside the Lambda function in the same execution environment
  • Use cases: monitoring agents, security agents, secret rotation, log forwarding (e.g., Datadog, New Relic agents)
  • Internal extensions: run in the same process. External extensions: run as separate processes
🎯

"Lambda needs to access RDS in private subnet" → Configure Lambda with same VPC + security group allowing DB port. "Lambda needs to call external API but also access RDS in VPC" → Lambda in private subnet + NAT Gateway for external internet. "Share common library across 10 Lambda functions" → Lambda Layer.

⚠️

Common traps:

  • "Putting Lambda in a public subnet gives it internet access" — FALSE; Lambda in a VPC (even public subnet) does NOT get internet access automatically. You need a NAT Gateway in a public subnet, with Lambda in a private subnet routing to it.
  • "Lambda can have unlimited layers" — FALSE; maximum 5 layers per function.
  • "Lambda environment variables are always secure" — PARTIALLY TRUE; they are encrypted at rest with KMS, but plaintext secrets are visible to anyone with IAM GetFunctionConfiguration permission. Use Secrets Manager for true secrets.
Task 1.3

Use Data Stores in Application Development

DynamoDB, S3, RDS/Aurora, ElastiCache, and specialized data stores — selection, access patterns, and caching.

Skills in:
  • Describe high-cardinality partition keys, database consistency models, and differences between query and scan operations
  • Define Amazon DynamoDB keys and indexing (LSI, GSI)
  • Serialize and deserialize data; manage data lifecycles
  • Use data caching services; use specialized data stores based on access patterns (e.g., Amazon OpenSearch Service)
🗃️ Amazon DynamoDB — Keys, Indexes & Data Modeling
DynamoDB · Exam Fave

DynamoDB is a fully managed NoSQL database. The DVA-C02 exam heavily tests data modeling, index selection, and read/write patterns.

Key Structure
Key Type | Components | Uniqueness
Simple Primary Key | Partition Key (PK) only | PK must be unique across all items
Composite Primary Key | Partition Key (PK) + Sort Key (SK) | PK+SK combo must be unique
Indexes
Aspect | LSI (Local Secondary Index) | GSI (Global Secondary Index)
Partition Key | Same as base table | Can be any attribute
Sort Key | Different from base table | Can be any attribute
Creation time | At table creation only | Anytime
Consistency | Strongly consistent reads supported | Eventually consistent only
Storage | Within same partition as base table | Separate partition space
Max per table | 5 | 20
Single-Table Design Example

A social media app stores users and posts. PK = USER#userId, SK = PROFILE for user items. PK = USER#userId, SK = POST#timestamp for post items. One table, multiple entity types — query all posts for a user with a single Query operation.
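The key scheme in this example can be tried out against an in-memory list. The sketch below is not DynamoDB code: `query` is a hypothetical stand-in for a Query with a `begins_with` condition on the sort key, and the sample items are invented.

```python
def user_profile_key(user_id):
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}


def user_post_key(user_id, timestamp):
    return {"PK": f"USER#{user_id}", "SK": f"POST#{timestamp}"}


def query(table, pk, sk_prefix=""):
    """In-memory stand-in for Query with begins_with(SK, prefix), sorted by SK."""
    hits = [item for item in table if item["PK"] == pk and item["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda item: item["SK"])


table = [
    {**user_profile_key("42"), "name": "Ada"},
    {**user_post_key("42", "2024-05-02T10:00:00Z"), "text": "second post"},
    {**user_post_key("42", "2024-05-01T09:00:00Z"), "text": "first post"},
    {**user_post_key("7", "2024-05-01T12:00:00Z"), "text": "someone else"},
]
posts = query(table, "USER#42", "POST#")
```

Because ISO-8601 timestamps sort lexicographically, the posts come back in chronological order, and dropping the prefix returns the profile and posts together in one call.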

Partition Key Selection
  • High-cardinality keys = even data distribution across partitions = avoid hot partitions
  • Good: User ID, Order ID, Session ID (high cardinality, random distribution)
  • Bad: Country, Status, Boolean flag (low cardinality = hot partition)
  • Write sharding: If you must use a low-cardinality key, append a random suffix (1-N) to distribute writes, then use a GSI or Query across shards to read
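Write sharding amounts to a pair of small helpers, sketched here under the assumption of a fixed shard count and a `#` separator (both hypothetical choices): one picks a random suffix at write time, the other lists every suffixed key a reader has to query.

```python
import random


def sharded_key(base_key, shard_count, rng=random):
    """Append a random suffix 1..shard_count to spread writes across partitions."""
    return f"{base_key}#{rng.randint(1, shard_count)}"


def all_shard_keys(base_key, shard_count):
    """Every key a reader must query to reassemble the logical partition."""
    return [f"{base_key}#{n}" for n in range(1, shard_count + 1)]


write_key = sharded_key("STATUS#OPEN", 10)
read_keys = all_shard_keys("STATUS#OPEN", 10)
```

The trade-off is on the read side: one logical read becomes `shard_count` Query calls whose results the application merges.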
Capacity Modes
Aspect | Provisioned Mode | On-Demand Mode
Billing | Per RCU/WCU provisioned | Per request (read/write)
Predictability | Predictable traffic patterns | Unknown or variable traffic
Auto Scaling | Supported (but lags behind spikes) | Instant, no pre-planning
💡

LSI vs. GSI: Local = same partition key, Limited to creation time. Global = any key, Great flexibility. If you might need different access patterns later, plan for GSIs at design time.

⚠️

Common traps:

  • "GSIs support strongly consistent reads" — FALSE; GSIs are always eventually consistent. LSIs support both.
  • "You can add an LSI to an existing table" — FALSE; LSIs must be created with the table. GSIs can be added anytime.
  • "DynamoDB automatically distributes data evenly across partitions" — TRUE only if your partition key has high cardinality. A poorly chosen PK leads to hot partitions and throttling.
📖 DynamoDB — Query vs. Scan, Consistency & Advanced Features
DynamoDB · High Frequency
Query vs. Scan
Aspect | Query | Scan
Requires | Partition key value (mandatory) | Nothing (reads entire table)
Optional filter | Sort key condition + filter expression | Filter expression
Efficiency | Efficient: reads only matching items | Expensive: reads all items, filters after read
Cost | Low (reads proportional to results) | High (reads entire table)
Use | Production access pattern | Avoid in production; admin/migration use
Consistency Models
Aspect | Eventually Consistent | Strongly Consistent
Latency | Lower | Slightly higher
Cost | 0.5 RCU per 4 KB | 1 RCU per 4 KB
Reads latest data? | Not guaranteed (may see stale data) | Always latest committed data
Default | Yes | Opt-in via ConsistentRead=true
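The cost row turns into simple arithmetic: round the item size up to the next 4 KB block, then halve the result for an eventually consistent read. The helper below is a sketch of that calculation for a single-item read; `read_capacity_units` is a hypothetical name.

```python
import math


def read_capacity_units(item_size_bytes, consistent_read=False):
    """RCUs for one read: 4 KB blocks, halved for eventually consistent reads."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    return blocks if consistent_read else blocks / 2


strong_9kb = read_capacity_units(9000, consistent_read=True)  # 3 blocks of 4 KB
eventual_9kb = read_capacity_units(9000)                      # half of that
```

A 9,000-byte item spans three 4 KB blocks, so it costs 3 RCUs strongly consistent and 1.5 RCUs eventually consistent.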
DynamoDB Streams
  • Ordered log of item-level changes (INSERT, MODIFY, REMOVE) — retained 24 hours
  • View types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES
  • Trigger Lambda functions for change-data-capture patterns (e.g., update search index, send notification)
DynamoDB Accelerator (DAX)
  • In-memory cache specifically for DynamoDB — microsecond read latency (vs milliseconds)
  • API-compatible with DynamoDB: swap the DynamoDB client for the DAX client pointed at the cluster endpoint; application logic needs little or no rewrite
  • Cache hit: response from DAX. Cache miss: DAX reads from DynamoDB and caches result
  • Ideal for: read-heavy workloads, hot items, gaming leaderboards, social feeds
  • Does NOT help for write-heavy or strongly consistent read workloads
TTL (Time to Live)
  • Automatically deletes expired items based on a timestamp attribute — no charge for TTL deletes
  • Use for session data, temporary tokens, cart items with expiry
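  • TTL deletion is a background process, not an instant one: expired items can stay readable for a while after expiry (typically up to 48 hours, sometimes longer), so filter them out in reads if that matters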
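Working with TTL usually involves two small pieces of code: computing the epoch-seconds value to store, and filtering out items that have expired but not yet been removed. The sketch below assumes a TTL attribute named `expiresAt`, which is a hypothetical name; DynamoDB lets you pick any Number attribute.

```python
import time


def expires_at(ttl_seconds, now=None):
    """Epoch-seconds value to store in the table's TTL attribute."""
    now = time.time() if now is None else now
    return int(now) + ttl_seconds


def live_items(items, now=None, ttl_attribute="expiresAt"):
    """Drop items whose TTL has passed but that have not been deleted yet."""
    now = time.time() if now is None else now
    return [item for item in items if item.get(ttl_attribute, float("inf")) > now]


sessions = [
    {"id": "s1", "expiresAt": expires_at(3600, now=1_000_000)},  # expires at 1,003,600
    {"id": "s2", "expiresAt": expires_at(60, now=1_000_000)},    # expires at 1,000,060
    {"id": "s3"},                                                # no TTL: never expires
]
still_valid = live_items(sessions, now=1_000_100)
```

At `now=1_000_100` only `s1` and `s3` remain, even though `s2` may still physically exist in the table.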
🎯

"DynamoDB reads are slow" → Add DAX. "Need to trigger Lambda on table changes" → DynamoDB Streams + Lambda event source mapping. "Auto-expire session data" → TTL attribute. "Read only latest price for a product" → Query with ConsistentRead=true (pay double RCU).

⚠️

Common traps:

  • "Scan with FilterExpression only reads filtered items" — FALSE; FilterExpression applies AFTER reading the data. You still pay for all read items before filtering — use Query instead.
  • "DAX works for all DynamoDB operations including Scan and transactional" - FALSE; DAX serves eventually consistent GetItem/BatchGetItem from its item cache and Query/Scan from its query cache. Strongly consistent reads and transactional reads pass through to DynamoDB and are not cached.
🪣 Amazon S3 — Developer Access Patterns
S3 · High Frequency
Key Developer APIs
Feature | Description | Use Case
Presigned URL (GET) | Temporary URL to download object; no AWS credentials needed by caller | Share private files with users
Presigned URL (PUT) | Temporary URL to upload directly to S3 without going through your server | Client-side file uploads (browser, mobile)
Multipart Upload | Upload large files in parts (required >5 GB, recommended >100 MB) | Large file uploads with resume capability
S3 Select | SQL query to retrieve subset of data from CSV/JSON/Parquet object | Reduce data transfer for log analysis
Event Notifications | Trigger SNS, SQS, or Lambda on object events | Thumbnail generation, virus scanning
Consistency Model
  • S3 provides strong read-after-write consistency for new objects and overwrites (as of December 2020)
  • After a successful PUT, any GET of that key returns the latest version
  • Applies to both objects and object listings
S3 Event Notifications
  • Trigger on: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:Replication:*, etc.
  • Destinations: SQS, SNS, Lambda, EventBridge
  • For complex routing (multiple targets, filtering), use EventBridge as the destination — then route to multiple targets via EventBridge rules
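// Presigned URL generation (SDK v3, JavaScript)
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const s3Client = new S3Client({});

// Bucket and Key are supplied by the caller; the URL is valid for 1 hour
async function presignGet(Bucket, Key) {
  return getSignedUrl(s3Client,
    new GetObjectCommand({ Bucket, Key }),
    { expiresIn: 3600 }
  );
}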
🎯

"User uploads large file directly to S3" → Presigned PUT URL (server generates URL, client uploads directly — bypasses your server). "Trigger image resizing Lambda on upload" → S3 Event Notification → Lambda. "Multiple services need to react to the same S3 event" → S3 Event Notification → EventBridge → multiple targets.

⚠️

Common traps:

  • "S3 is eventually consistent for new PUTs" — FALSE (post-2020); S3 now provides strong consistency for all operations including new object creation.
  • "Multipart upload can be used for objects less than 5 MB" — technically possible but not meaningful. Parts must be at least 5 MB (except last part).
  • "S3 event notifications can trigger multiple Lambda functions directly" — FALSE; each event configuration has one destination. Use SNS fanout or EventBridge for multiple targets.
Amazon ElastiCache — Caching Patterns & RDS Connection Pooling
ElastiCache · Medium Frequency
Redis vs. Memcached
Feature | Redis | Memcached
Data structures | Rich (lists, sets, sorted sets, hashes, bitmaps) | Simple key-value strings only
Persistence | Yes (RDB snapshots, AOF) | No (in-memory only)
Replication/HA | Yes (primary-replica, Multi-AZ) | No
Pub/Sub | Yes | No
Cluster mode | Yes (sharding) | Yes (multi-node, no replication)
Best for | Sessions, leaderboards, queues, complex caching | Simple caching, multi-threaded scale-out
Caching Patterns
Pattern | Read Flow | Write Flow | Drawback
Lazy Loading (Cache-Aside) | Check cache → miss → read DB → store in cache | Write to DB only (cache updated on next miss) | Cache miss = 3 trips; potential stale data
Write-Through | Always read from cache | Write to DB AND cache simultaneously | Cache bloat (items written but never read)
Write-Behind (Write-Back) | Read from cache | Write to cache, async write to DB | Data loss risk if cache fails before DB write
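The lazy-loading row can be sketched in a few lines. This is an illustration only: a plain dict stands in for ElastiCache, load_from_db is a hypothetical callable standing in for the database read, and the class name is made up for the example.

```python
class CacheAside:
    """Lazy loading: read the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical DB read function
        self._cache = {}                   # stand-in for an ElastiCache client
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: one trip
        self.misses += 1
        value = self._load_from_db(key)    # miss: read the database...
        self._cache[key] = value           # ...then populate the cache
        return value

    def invalidate(self, key):
        # Call after a DB write so the next read reloads fresh data.
        self._cache.pop(key, None)
```

Without the invalidate call (or a TTL), a database write leaves the old value in the cache, which is the stale-data drawback listed in the table.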
RDS Proxy
  • Sits between your application and RDS — pools and multiplexes database connections
  • Critical for Lambda → RDS: Lambda can open thousands of concurrent connections during spikes; RDS Proxy limits actual DB connections
  • Reduces failover time for Aurora Multi-AZ by maintaining connections through DB failover
  • Supports RDS for MySQL, PostgreSQL, MariaDB, and SQL Server, plus Aurora MySQL and Aurora PostgreSQL (not Oracle)
🎯

"Leaderboard with real-time rank" → Redis sorted set. "Lambda connecting to RDS causes too many connections" → RDS Proxy. "Cache frequently read DB results" → ElastiCache lazy loading (cache-aside). "Need HA cache with failover" → Redis with Multi-AZ enabled.

⚠️

Common traps:

  • "Memcached supports HA with primary-replica failover" — FALSE; Memcached is a distributed cache with no replication. If a node fails, that data is lost. Use Redis for HA.
  • "DAX and ElastiCache serve the same purpose" — FALSE; DAX is DynamoDB-specific and API-compatible. ElastiCache is a general-purpose cache for any backend. Use DAX for DynamoDB, ElastiCache for RDS/MySQL/etc.

Domain 2 Overview

Secure applications and data on AWS. Covers authentication and authorization (Cognito, IAM, STS), encryption at rest and in transit (KMS, ACM), and management of sensitive data (Secrets Manager, Parameter Store).

⚡ 26% of scored content
Task 2.1

Implement Authentication and/or Authorization

IAM roles, Cognito User & Identity Pools, bearer tokens, STS, programmatic access, and cross-service auth.

Skills in:
  • Use an identity provider to implement federated access (Amazon Cognito, IAM)
  • Secure applications by using bearer tokens; configure programmatic access to AWS
  • Assume an IAM role; define permissions for IAM principals
  • Implement application-level authorization for fine-grained access control
  • Handle cross-service authentication in microservice architectures
🔑 IAM for Developers — Roles, STS & Programmatic Access
IAM · High Frequency

Developers interact with IAM primarily through roles and the STS credential chain. Understanding how services assume roles and how temporary credentials flow is fundamental to DVA-C02.

SDK Credential Resolution Chain (in order)
  • 1. Code-level credentials (never use in production)
  • 2. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
  • 3. AWS credentials file: ~/.aws/credentials
  • 4. AWS config file: ~/.aws/config
  • 5. ECS container credentials (task role via metadata endpoint)
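  • 6. EC2 instance profile (IMDS endpoint). Lambda execution role credentials are not read from IMDS; the Lambda runtime injects them as environment variables, so they resolve at step 2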
Key STS API Calls
API | Use Case
AssumeRole | Cross-account or service-to-service access; also how a Lambda function takes on a different role at run time
AssumeRoleWithWebIdentity | Federated via OIDC (Google, Amazon, GitHub Actions)
AssumeRoleWithSAML | Federated via enterprise IdP (ADFS, Okta) using SAML 2.0
GetSessionToken | Add MFA requirement to existing IAM user session
IAM Policy Evaluation
  • Default = implicit Deny
  • Explicit Deny always wins — overrides any Allow
  • Action allowed only when: explicit Allow exists AND no Deny blocks it at any level (SCP → Permission Boundary → Identity Policy → Resource Policy)
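The three rules above can be expressed as a small evaluator. This is a deliberately simplified sketch: it handles a single flat list of statements with one Action and one Resource string each, and it ignores conditions, principals, SCPs, and permission boundaries. The function name and statement shape are assumptions made for the example, not an AWS API.

```python
from fnmatch import fnmatchcase


def is_allowed(action, resource, statements):
    """Return True only if some Allow matches and no Deny matches."""
    allowed = False  # default: implicit deny
    for stmt in statements:
        # IAM action names are case-insensitive; resources are matched as-is.
        action_matches = fnmatchcase(action.lower(), stmt["Action"].lower())
        resource_matches = fnmatchcase(resource, stmt["Resource"])
        if not (action_matches and resource_matches):
            continue
        if stmt["Effect"] == "Deny":
            return False  # explicit deny always wins
        allowed = True
    return allowed
```

The order of statements does not matter: a matching Deny ends the evaluation regardless of how many Allow statements matched before it.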
Cross-Service Auth in Microservices

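Service A (ECS task) calls Service B (Lambda via API Gateway). Pattern: the ECS task has an IAM task role, and the SDK signs each request with Signature Version 4 using that role's temporary credentials. With IAM authorization on the API, the task role needs execute-api:Invoke on the method, and the Lambda function's resource policy must allow API Gateway to invoke the function. If Service B sits in another account, the task can first call STS AssumeRole to obtain credentials for a role in that account. If Service A invokes the Lambda function directly instead, its role needs lambda:InvokeFunction.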

🎯

Lambda execution role: the IAM role the function assumes at run time, set in the Role property of the function configuration; it controls what the function itself can call. Lambda resource-based policy: controls who can invoke the function (e.g., API Gateway, S3). Both must be correct for cross-service invocations.

⚠️

Common traps:

  • "A Lambda function with an execution role can call any AWS API" — FALSE; the execution role must explicitly allow the specific API actions needed.
  • "IAM permissions boundaries grant additional permissions" — FALSE; boundaries only restrict the maximum permissions — they never grant by themselves.
  • "ECS task role and EC2 instance role are the same concept" — conceptually similar but different configuration. ECS uses taskRoleArn in the task definition.
👤 Amazon Cognito — User Pools vs. Identity Pools
Cognito · Exam Fave

Cognito is the primary AWS service for application-level auth. The exam frequently tests the distinction between User Pools (authentication) and Identity Pools (authorization for AWS resources).

User Pools vs. Identity Pools
Aspect | User Pools | Identity Pools (Federated Identities)
Purpose | User directory + authentication | Exchange any token for AWS credentials
Returns | JWT tokens (id, access, refresh) | Temporary STS credentials (IAM role)
Supported IdPs | Username/password, Google, Facebook, SAML, OIDC | Cognito User Pool, Google, Facebook, SAML, unauthenticated
AWS API access | No (tokens for app-level auth only) | Yes (STS creds for S3, DynamoDB, etc.)
Use case | Login for web/mobile app | Let users directly call AWS APIs (e.g., upload to S3)
Cognito User Pool — JWT Tokens
  • ID Token: Contains user identity claims (sub, email, custom attributes). Verify with your API.
  • Access Token: Authorize against Cognito APIs (e.g., update user attributes). Use as Bearer token in API Gateway Cognito authorizer.
  • Refresh Token: Long-lived (default 30 days). Exchange for new id/access tokens without re-login.
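To see what the ID and access tokens carry, the payload segment of a JWT can be decoded with the standard library. This sketch only inspects claims and does NOT verify the signature, so it must never be used to make an authorization decision; real verification checks the signature against the user pool's published keys. The function name is illustrative.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Cognito tokens include a token_use claim ("id" or "access"), which is a quick way to confirm which token an API actually received.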
Common Integration Pattern
sequenceDiagram
  participant User
  participant UP as Cognito User Pool
  participant IP as Cognito Identity Pool
  participant S3
  User->>UP: Login (username/password)
  UP-->>User: JWT Tokens (id, access, refresh)
  User->>IP: Exchange JWT for AWS credentials
  IP-->>User: Temp STS credentials (IAM Role)
  User->>S3: Upload file using STS credentials
Lambda Triggers
  • Pre-signup: Validate or auto-confirm users during registration
  • Pre-token generation: Add custom claims to tokens
  • Post-authentication: Log or audit successful logins
  • Custom message: Customize verification/MFA messages
  • User migration: Migrate users from legacy system on first login
💡

User Pool = Who are you? (AuthN) → returns JWT. Identity Pool = What can you do on AWS? (AuthZ) → returns STS credentials. Need both for "login with Google then upload to S3."

🎯

"Mobile app users log in and upload to S3 directly" → User Pool (authenticate) + Identity Pool (get STS creds for S3). "API Gateway validates JWT from Cognito" → Cognito User Pool authorizer on API GW. "Add user's department to JWT claims" → Pre-token generation Lambda trigger.

⚠️

Common traps:

  • "Cognito User Pool tokens grant direct access to AWS services like S3" — FALSE; User Pool tokens are JWTs for app-level auth. You need Identity Pools to exchange them for STS credentials.
  • "Identity Pools require Cognito User Pools as the IdP" is FALSE; Identity Pools also accept social providers (Google, Facebook, Apple), OIDC and SAML IdPs, and unauthenticated (guest) access.
🛡️ API Gateway Authorization — Lambda Authorizers & Bearer Tokens
API Gateway · IAM
Authorization Method Decision Tree
Scenario | Recommended Authorizer
Internal AWS service calling API | IAM Authorization (Sig V4)
Users authenticated via Cognito User Pool | Cognito User Pool Authorizer
Custom JWT from 3rd-party IdP (Auth0, Okta) | Lambda Authorizer (Token type)
Complex rules: IP allowlist, multi-header logic | Lambda Authorizer (Request type)
Throttle / meter partner integrations | API Keys + Usage Plans
Lambda Authorizer Flow
  • API GW invokes your Lambda with the Bearer token (or full request)
  • Lambda validates the token (e.g., verifies JWT signature, checks expiry)
  • Lambda returns an IAM policy document: {"principalId": "user123", "policyDocument": {...}}
  • Policy caching: API GW caches the returned policy by token (TTL configurable). Reduces Lambda invocations per request.
# Lambda authorizer response structure
{
  "principalId": "user-123",
  "policyDocument": {
    "Version": "2012-10-17",
    "Statement": [{
      "Action": "execute-api:Invoke",
      "Effect": "Allow",
      "Resource": "arn:aws:execute-api:*:*:*"
    }]
  },
  "context": { "userId": "user-123" }
}
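A handler that produces the response structure above might look like the following sketch for a Token-type authorizer. The VALID_TOKENS lookup is a hypothetical stand-in for real token validation (JWT signature and expiry checks), and the helper name is illustrative. The event fields authorizationToken and methodArn are the ones API Gateway passes to a Token authorizer.

```python
# Hypothetical token table; real code would verify a JWT instead.
VALID_TOKENS = {"demo-token": "user-123"}


def build_policy(principal_id, effect, method_arn, context=None):
    """Build the authorizer response API Gateway expects."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
        "context": context or {},
    }


def lambda_handler(event, context):
    token = event.get("authorizationToken", "")
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    user_id = VALID_TOKENS.get(token)
    if user_id is None:
        return build_policy("anonymous", "Deny", event["methodArn"])
    return build_policy(user_id, "Allow", event["methodArn"], {"userId": user_id})
```

Scoping Resource to the incoming methodArn is tighter than a wildcard, but with policy caching enabled the cached policy is reused for other methods, so a broader Resource may be needed in that case.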
🎯

Pass context from authorizer to Lambda backend: Add context field in authorizer response. The backend Lambda receives it via event.requestContext.authorizer.userId. Avoids redundant token parsing in every backend function.

⚠️

Common traps:

  • "Lambda authorizer is called for every API request" — FALSE when caching is enabled. The policy is cached by token value for the configured TTL.
  • "API Keys provide strong security authentication" — FALSE; API Keys are for usage tracking and throttling, not for authentication. Never use them as a security mechanism.
Task 2.2

Implement Encryption by Using AWS Services

KMS, envelope encryption, S3 encryption options, in-transit TLS, ACM, and key rotation.

Skills in:
  • Define encryption at rest and in transit; describe differences between client-side and server-side encryption
  • Describe certificate management (AWS Private CA); use encryption keys to encrypt or decrypt data
  • Generate certificates and SSH keys for development; use encryption across account boundaries
  • Enable and disable key rotation
🔐 AWS KMS — Key Types, Key Policy & Envelope Encryption
KMS · Exam Fave

AWS KMS is the foundation of encryption on AWS. Every developer must understand the three key types, envelope encryption, and how to call KMS APIs from code.

KMS Key Types
Key Type | Who Manages | Key Policy | Cost | Use Case
AWS Managed Key | AWS (auto-rotated every year) | Managed by AWS | Free | Default for S3, RDS, EBS (e.g., aws/s3)
Customer Managed Key (CMK) | You (optional auto-rotation) | You control it | $1/month + API calls | Custom encryption, cross-account, key policy control
Imported Key Material | You (bring your own key) | You control it | $1/month | Compliance: must own key material
Envelope Encryption
  • KMS can only encrypt data up to 4 KB directly. For larger data, use envelope encryption:
  • Step 1: Call GenerateDataKey → KMS returns a plaintext data key (DEK) AND an encrypted DEK
  • Step 2: Encrypt your data locally with the plaintext DEK (AES-256)
  • Step 3: Store the encrypted DEK alongside the encrypted data. Discard the plaintext DEK.
  • Decryption: Call Decrypt with the encrypted DEK → get plaintext DEK → decrypt data locally
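The steps above can be sketched as follows. Two loud assumptions: toy_cipher is an insecure XOR keystream used only so the example runs with the standard library (real code would use AES-GCM from a crypto library), and FakeKms is an in-memory stand-in that mimics the request and response shape of the boto3 KMS calls generate_data_key and decrypt. With a real client, kms would be boto3.client("kms").

```python
import hashlib
import os


def toy_cipher(key, data):
    """NOT secure: XOR with a SHA-256 keystream, standing in for AES-GCM."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kms, key_id, plaintext):
    # Step 1: ask KMS for a data key (plaintext copy + encrypted copy).
    resp = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    # Step 2: encrypt locally with the plaintext data key.
    ciphertext = toy_cipher(resp["Plaintext"], plaintext)
    # Step 3: keep only the encrypted data key next to the ciphertext.
    return {"encrypted_dek": resp["CiphertextBlob"], "ciphertext": ciphertext}


def envelope_decrypt(kms, envelope):
    # KMS unwraps the data key; the payload itself never goes to KMS.
    dek = kms.decrypt(CiphertextBlob=envelope["encrypted_dek"])["Plaintext"]
    return toy_cipher(dek, envelope["ciphertext"])


class FakeKms:
    """In-memory stand-in for the two KMS calls used above."""

    def __init__(self):
        self._wrapped = {}

    def generate_data_key(self, KeyId, KeySpec):
        dek, blob = os.urandom(32), os.urandom(16)
        self._wrapped[blob] = dek
        return {"Plaintext": dek, "CiphertextBlob": blob}

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": self._wrapped[CiphertextBlob]}
```

Only the small data key crosses the network to KMS, which is why the 4 KB limit does not constrain the size of the payload.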
Encryption Context
  • Optional additional authenticated data (AAD) — key-value pairs sent with encrypt/decrypt calls
  • If you encrypt with context {"purpose":"payments"}, you must provide the same context to decrypt
  • Appears in CloudTrail logs — useful for auditing which application/service used a key
Key Policy Essentials
  • Every KMS key has a key policy (resource-based policy). Unlike IAM, the key policy must explicitly allow the AWS account or it has no access — even root.
  • Cross-account: Key policy must allow the external account → then IAM policy in that account must also allow kms:Decrypt
Key Rotation
Aspect | Automatic Rotation | Manual Rotation
Frequency | Every 365 days by default; KMS now accepts a custom period of 90 to 2,560 days | Any schedule you choose
Old key material | Retained for decryption of old ciphertext | Must keep the old key enabled to decrypt old ciphertext
Application change? | No (same Key ID) | Yes (update alias to point to new key)
Available for | CMKs with KMS-generated key material (not imported key material) | All key types
💡

Envelope Encryption = DEK inside KMS envelope: Your data is encrypted with a local DEK. The DEK is "wrapped" (encrypted) by KMS. Only KMS can unwrap it. The data never goes to KMS.

🎯

"Encrypt a 500 MB file with KMS" → Use GenerateDataKey + encrypt locally (envelope encryption). "Compliance requires 90-day rotation" → Set a 90-day automatic rotation period on the CMK (the default is 365 days), or rotate manually by creating a new key and repointing the alias. "Audit which app used KMS" → Add Encryption Context + query CloudTrail.

⚠️

Common traps:

  • "AWS Managed Keys can be disabled or deleted by the customer" — FALSE; you have no control over AWS Managed Keys. Use CMKs for control.
  • "KMS automatic rotation changes the Key ID" — FALSE; rotation only changes the key material. The Key ID and alias stay the same — no application changes needed.
  • "Imported key material supports automatic rotation" — FALSE; automatic rotation is only for CMKs with KMS-generated key material.
🔒 Encryption Options — S3, At Rest vs. In Transit, Client vs. Server-Side
S3 · KMS · High Frequency
S3 Server-Side Encryption Options
Method | Key Management | Key Access | Use Case
SSE-S3 | S3 manages keys entirely | Transparent to user | Default encryption; no key management needed
SSE-KMS | AWS KMS CMK | You control key policy; CloudTrail logs access | Audit who accessed, cross-account encryption
SSE-C | Customer provides key in request header | AWS never stores the key | Compliance: must own key material; key not on AWS
Client-Side | Encrypted before sending to S3 | AWS never sees plaintext | Maximum control; AWS cannot access data even with account compromise
Encryption in Transit
  • All AWS SDK and console calls use HTTPS (TLS) by default
  • Enforce HTTPS on S3: Bucket policy with Deny when aws:SecureTransport = false
  • ACM (AWS Certificate Manager): Free public TLS certificates for ALBs, CloudFront, API GW. Auto-renews.
  • CloudFront certificates must be in us-east-1 region — hard exam fact
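The HTTPS-only bucket policy mentioned above can be generated as a plain dictionary and serialized with json.dumps before it is attached to the bucket. The function name and Sid are illustrative; the condition key aws:SecureTransport and the Deny pattern come from the bullet above.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Bucket policy that denies every S3 request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",  # illustrative statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",    # bucket-level actions
                f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
```

Both ARNs are listed because bucket-level and object-level actions match different resources; omitting one leaves a gap in the Deny.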
Client-Side vs. Server-Side
Aspect | Server-Side (SSE) | Client-Side
Who encrypts | AWS service (S3, RDS, etc.) | Your application code
AWS sees plaintext? | Briefly (SSE-S3, SSE-KMS) | Never
Complexity | Low (checkbox / config) | Higher (you manage encryption library)
Performance overhead | None to you | CPU on client
🎯

"AWS must NEVER see the key" → SSE-C or Client-Side encryption. "Audit every access to S3 objects" → SSE-KMS (every decrypt call appears in CloudTrail). "Simplest encryption with no management" → SSE-S3 (enabled by default on all new buckets since Jan 2023).

⚠️

Common traps:

  • "SSE-KMS means AWS cannot read your data" — FALSE; with SSE-KMS, AWS KMS holds the key. With SSE-C or client-side encryption, AWS cannot read plaintext. Compliance may require one over the other.
  • "ACM certificates can be used in any region for CloudFront" — FALSE; CloudFront requires ACM certificates in us-east-1.
Task 2.3

Manage Sensitive Data in Application Code

Secrets Manager, Parameter Store, environment variable encryption, data classification, and masking.

Skills in:
  • Describe data classification (PII, PHI); encrypt environment variables containing sensitive data
  • Use secret management services to secure sensitive data
  • Sanitize sensitive data; implement application-level data masking and sanitization
  • Implement data access patterns for multi-tenant applications
🗝️ AWS Secrets Manager vs. SSM Parameter Store
Secrets · Exam Fave

Choosing between Secrets Manager and Parameter Store is a common DVA-C02 scenario. Secrets Manager is purpose-built for secrets with rotation; Parameter Store is for configuration and simple secrets.

Comparison
Feature | Secrets Manager | SSM Parameter Store
Purpose | Secrets (DB passwords, API keys) | Config + secrets
Auto rotation | ✅ Native (uses Lambda) | ❌ No native rotation
RDS integration | ✅ Native DB credential rotation | ❌ Manual only
Encryption | Always KMS-encrypted | SecureString uses KMS; String is plaintext
Cost | $0.40/secret/month + $0.05/10K API calls | Standard: free; Advanced: $0.05/parameter/month (higher-throughput API calls billed at $0.05/10K)
Max value size | 64 KB | 4 KB (Standard), 8 KB (Advanced)
Hierarchy / paths | Limited naming | ✅ Path-based: /app/prod/db-url
Best for | Database passwords, API keys needing rotation | App config, feature flags, non-rotating secrets
Secrets Manager Auto Rotation
  • Rotation uses a Lambda function (AWS provides templates for RDS, Redshift, DocumentDB)
  • Rotation schedule: interval in days or cron expression
  • On rotation: new secret version is created → application fetches latest automatically using the AWSCURRENT label
  • Versions: AWSCURRENT (active), AWSPREVIOUS (just rotated), AWSPENDING (being created)
Best Practice: Fetch at Init, Cache in Memory
# Fetch secret once at Lambda cold start — cache for warm invocations
import boto3, json
client = boto3.client('secretsmanager')
secret = json.loads(client.get_secret_value(
    SecretId='prod/myapp/db'
)['SecretString'])
DB_PASSWORD = secret['password']  # cached globally
🎯

"DB password needs auto-rotation every 30 days" → Secrets Manager. "App config values like feature flags" → Parameter Store (free). "Multiple apps share same config path" → Parameter Store with path hierarchy. "Secrets Manager or Env Vars?" → Always Secrets Manager for production secrets.

⚠️

Common traps:

  • "SSM Parameter Store SecureString auto-rotates" — FALSE; you must write your own rotation logic. Secrets Manager has this built-in.
  • "Storing secrets in Lambda environment variables is secure" — PARTIALLY; env vars are KMS-encrypted at rest but are visible in the Lambda console and to any principal with GetFunctionConfiguration access. Use Secrets Manager for sensitive production credentials.
🏷️ Data Classification, Masking & Multi-Tenant Patterns
Security · Medium Frequency
Data Classification
Type | Examples | Regulation
PII (Personally Identifiable Information) | Name, email, SSN, phone, IP address | GDPR, CCPA
PHI (Protected Health Information) | Medical records, diagnoses, insurance IDs | HIPAA
Cardholder data (payment card data) | Credit card numbers, CVVs | PCI-DSS
Data Masking & Sanitization
  • Masking: Replace sensitive data with a representative value (e.g., 4111-****-****-1234 for credit card)
  • Tokenization: Replace sensitive data with a non-sensitive token; original data stored in a secure vault
  • Sanitization: Remove or replace dangerous characters to prevent injection (SQL injection, XSS)
  • Never log PII/PHI — use structured logging with field-level exclusion
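A minimal sketch of masking and field-level exclusion before logging. The SENSITIVE_FIELDS set and both function names are hypothetical choices for this example; a real application would derive the field list from its own data classification.

```python
import re

# Hypothetical list of fields that must never reach the logs.
SENSITIVE_FIELDS = {"ssn", "email", "password"}


def mask_card_number(card):
    """Keep the first and last four digits, mask the middle eight."""
    digits = re.sub(r"\D", "", card)  # drop spaces, dashes, etc.
    return f"{digits[:4]}-****-****-{digits[-4:]}"


def redact(record):
    """Return a copy of the record that is safe to pass to a logger."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }
```

Calling redact on the record before it reaches the logger keeps the exclusion in one place instead of relying on every log statement to remember it.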
Multi-Tenant Data Isolation Patterns
Pattern | Isolation Level | AWS Implementation
Silo (per-tenant DB) | High (physical isolation) | Separate RDS instances; separate DynamoDB tables
Pool (shared DB, tenant ID column) | Medium (logical isolation) | DynamoDB partition key = tenantId; RDS with row-level security
Bridge (hybrid) | Medium-High | Shared infrastructure, per-tenant encryption keys (KMS per tenant)
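For the pool pattern on DynamoDB, logical isolation is commonly enforced with the IAM condition key dynamodb:LeadingKeys. The sketch below builds such a policy as a dictionary. It assumes the partition key value equals the caller's Cognito identity ID, which the policy variable ${cognito-identity.amazonaws.com:sub} resolves to at evaluation time; the function name and the chosen actions are illustrative.

```python
def partition_scoped_policy(table_arn):
    """IAM policy limiting a caller to items whose partition key is their own ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": table_arn,
            "Condition": {
                # Every partition key touched by the request must equal the
                # caller's Cognito identity ID (resolved by IAM, not Python).
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
                }
            },
        }],
    }
```

Scan is left out of the action list on purpose, since a Scan reads across partitions and would not fit a per-partition restriction.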
🎯

"Scan S3 for PII/PHI" → Amazon Macie. "Multi-tenant app — isolate data per tenant in DynamoDB" → Partition key = tenantId. Use IAM condition dynamodb:LeadingKeys to enforce each user can only access their own partitions via Cognito Identity Pool role.

Domain 3 Overview

Package, deploy, and automate the release of AWS applications. Covers AWS SAM, CloudFormation, CDK, CI/CD pipeline services (CodeBuild, CodeDeploy, CodePipeline), deployment strategies, and testing approaches.

⚡ 24% of scored content
Task 3.1

Prepare Application Artifacts to be Deployed to AWS

Lambda packaging, AWS SAM, CloudFormation, CDK, AppConfig, and IaC templates.

Skills in:
  • Manage dependencies within the package (env vars, config files, container images)
  • Organize files and directory structure for deployment
  • Use code repositories in deployment environments
  • Apply application requirements for resources (memory, cores)
  • Prepare application configurations for specific environments (using AWS AppConfig)
📦 AWS SAM — Serverless Application Model
SAM · Exam Fave

AWS SAM is a shorthand extension of CloudFormation for serverless apps. The exam tests SAM template syntax, local testing, and the build/deploy workflow.

SAM Template Structure
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31   # Declares this is a SAM template
Globals:
  Function:
    Timeout: 30
    Runtime: python3.12
    Environment:
      Variables:
        TABLE_NAME: !Ref MyTable
Resources:
  MyTable:
    Type: AWS::Serverless::SimpleTable   # Target of the !Ref above
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      CodeUri: ./src/
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /hello
            Method: get
SAM Resource Types
SAM Type | Expands To
AWS::Serverless::Function | Lambda Function + IAM Role + optional Event Source Mapping
AWS::Serverless::Api | API Gateway REST API + Stage + Deployment
AWS::Serverless::HttpApi | API Gateway HTTP API
AWS::Serverless::SimpleTable | DynamoDB table (single primary key)
AWS::Serverless::StateMachine | Step Functions state machine
SAM CLI Workflow
  • sam init — scaffold a new SAM app from a template
  • sam build — package code and dependencies into .aws-sam/build
  • sam local invoke — run a Lambda function locally with a test event JSON
  • sam local start-api — start a local API Gateway emulator
  • sam deploy --guided — deploy to AWS; creates/updates CloudFormation stack
  • sam logs — tail Lambda logs from CloudWatch
🎯

SAM Transform: The line Transform: AWS::Serverless-2016-10-31 is what makes a template a SAM template. During deployment, CloudFormation calls the SAM transform to expand SAM-specific resource types into standard CloudFormation resources.

⚠️

Common traps:

  • "SAM is a completely separate service from CloudFormation" — FALSE; SAM is a superset/extension of CloudFormation. sam deploy creates a CloudFormation stack.
  • "sam local invoke tests the same environment as AWS Lambda" — FALSE; local invocation runs in a Docker container that simulates Lambda but is not identical (no VPC, different IAM, different resource limits).
🏗️ AWS CloudFormation — Stacks, Templates & Change Sets
CloudFormation · High Frequency
Core Concepts
  • Stack: A collection of AWS resources managed as a single unit. Create/update/delete together.
  • Template: JSON or YAML file defining the desired state. Sections: Parameters, Mappings, Conditions, Resources (required), Outputs
  • Change Set: Preview the impact of a template update before executing. Shows what will be Added/Modified/Removed.
  • Drift Detection: Identify resources that have been manually modified outside CloudFormation
  • Stack Policy: Protects stack resources from unintended updates during stack update operations
Key Template Functions
Function | Purpose | Example
!Ref | Reference parameter or resource | !Ref MyBucket
!GetAtt | Get attribute of a resource | !GetAtt MyTable.Arn
!Sub | String substitution | !Sub "arn:aws:s3:::${BucketName}/*"
!ImportValue | Import an exported Output from another stack | Cross-stack references
!If | Choose between two values based on a condition | !If [IsProd, ProdConfig, DevConfig]
Nested Stacks & StackSets
  • Nested Stacks: Reusable template components — a root stack references child stack templates via S3 URLs. Used to break up large templates.
  • StackSets: Deploy a single template to multiple accounts and regions simultaneously — essential for multi-account governance
AWS AppConfig
  • Managed feature flags and configuration — separate configuration from code deployments
  • Supports validation (JSON schema) before config goes live
  • Supports gradual deployment of config changes (canary, linear, all-at-once)
  • Applications fetch config at runtime without code redeployment
🎯

"Preview what changes before updating stack" → Change Set. "Deploy same template to 50 accounts" → StackSets. "Share VPC ID between stacks" → Outputs + !ImportValue. "Separate feature flags from Lambda code" → AppConfig.

⚠️

Common traps:

  • "CloudFormation can import existing resources into a stack" — TRUE via Resource Import, but only for supported resource types.
  • !ImportValue creates a hard dependency: CloudFormation refuses to delete the exporting stack, or to change or remove the exported Output, while any other stack still imports it. Remove the imports (or delete the importing stacks) first.
Task 3.2

Test Applications in Development Environments

API Gateway stages and stage variables, Lambda testing strategies, integration tests, mock APIs.

Skills in:
  • Test deployed code using AWS services and tools
  • Write integration tests and mock APIs for external dependencies
  • Test applications by using development endpoints (configuring stages in Amazon API Gateway)
  • Deploy application stack updates to existing environments (e.g., deploying SAM template to a different staging environment)
  • Test event-driven applications
🧪 API Gateway Stages, Stage Variables & Testing
API Gateway · Medium Frequency
Stage-Based Testing Pattern
  • Deploy same API to dev, test, and prod stages — each stage is an independent deployment
  • Stage Variables: Use ${stageVariables.lambdaAlias} in Lambda integration ARN to route each stage to a different Lambda alias
  • Canary deployments on stages: Route X% of traffic to a new deployment version while keeping the rest on stable — built into API GW stage settings
Testing Approaches
Test Type | Tool | What It Tests
Unit test | pytest, Jest, JUnit | Individual functions in isolation
Local integration | sam local invoke / start-api | Lambda handler with real event payloads
Integration test | Deploy to dev stage | Full stack: API GW → Lambda → DynamoDB
Mock API | API GW mock integration | Frontend dev without backend
Event test | Lambda console test, SAM CLI | Event-driven functions (S3, SQS, DDB Streams)
Testing Event-Driven Apps
  • Use SAM CLI event templates: sam local generate-event s3 put — generates sample S3 event JSON
  • For SQS: sam local generate-event sqs receive-message
  • Test Lambda → SQS → Lambda pipelines locally with sam local start-lambda + real SQS queue in dev account
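Before reaching for the SAM CLI, an SQS-triggered handler can be unit-tested with a hand-built event. The sketch below is illustrative: the handler, the amount field, and make_sqs_event are all made up for the example, and the event contains only the fields this handler reads (real SQS events carry more per record, such as receiptHandle and attributes).

```python
import json


def lambda_handler(event, context):
    """Sum the 'amount' field across all SQS messages in the batch."""
    total = 0
    for record in event["Records"]:
        body = json.loads(record["body"])  # SQS delivers the body as a string
        total += body["amount"]
    return {"messages": len(event["Records"]), "total": total}


def make_sqs_event(*bodies):
    """Build a minimal SQS event for tests from plain dictionaries."""
    return {
        "Records": [
            {"messageId": str(i), "eventSource": "aws:sqs", "body": json.dumps(b)}
            for i, b in enumerate(bodies)
        ]
    }


def test_handler_sums_amounts():
    event = make_sqs_event({"amount": 5}, {"amount": 7})
    assert lambda_handler(event, None) == {"messages": 2, "total": 12}
```

The same test can later run against the JSON produced by sam local generate-event to confirm the handler copes with the full record shape.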
🎯

"Test new Lambda version without affecting prod traffic" → Deploy to Lambda alias pointing to new version. Use API GW stage variable to route dev stage to dev alias. "Frontend team needs API without backend" → API GW Mock Integration returns hardcoded response.

Task 3.3

Automate Deployment Testing

Lambda versions and aliases, container image tags, Amplify branches, and IaC deployment automation.

Skills in:
  • Create application test events (JSON payloads for Lambda, API GW, SAM)
  • Deploy API resources to various environments
  • Create application environments using approved versions (Lambda aliases, container image tags, Amplify branches)
  • Implement and deploy IaC templates (SAM, CloudFormation)
  • Manage environments in individual AWS services (dev/test/prod in API GW)
  • Use Amazon Q Developer to generate automated tests
🏷️ Lambda Versions & Aliases — Immutable Releases & Traffic Shifting
Lambda · High Frequency
Versions
  • Publishing a version creates an immutable snapshot of the function code + config
  • Versions are numbered: :1, :2, :3
  • $LATEST is the mutable, unpublished version — always points to the latest code
  • You cannot update a published version — it is read-only
Aliases
  • Named pointers to specific versions (e.g., live → v5, beta → v6)
  • Weighted routing: An alias can split traffic between two versions — e.g., live = 90% v5 + 10% v6. Used for blue/green and canary testing.
  • API Gateway stage variable or event source mapping references the alias — swap the version behind the alias without changing integrations
graph LR
  D["dev stage\n(API GW)"] --> DA["Lambda alias: dev\n→ $LATEST"]
  P["prod stage\n(API GW)"] --> PA["Lambda alias: live\n90% v5 / 10% v6"]
  PA -->|90%| V5["Version 5\n(stable)"]
  PA -->|10%| V6["Version 6\n(new)"]
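The 90/10 split on the live alias corresponds to the RoutingConfig parameter of the Lambda UpdateAlias API. The sketch below only builds the keyword arguments so it can run without AWS access; the helper name is illustrative, and the function name and version numbers in the usage note are placeholders.

```python
def weighted_alias_update(function_name, alias, stable_version,
                          canary_version, canary_weight):
    """Build kwargs for the Lambda update_alias call with weighted routing."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias,
        # The alias's primary version receives the remaining traffic.
        "FunctionVersion": stable_version,
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight}
        },
    }
```

With boto3 this would be applied as boto3.client("lambda").update_alias(**weighted_alias_update("MyFunction", "live", "5", "6", 0.10)). Rolling back means calling update_alias again with an empty AdditionalVersionWeights map.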
Container Image Tags & ECR
  • Lambda supports container images up to 10 GB (vs 250 MB unzipped for a ZIP deployment package)
  • Images stored in Amazon ECR — tag with :latest, :v1.2.0, or commit SHA for traceability
  • CI/CD pipeline: build → push to ECR with version tag → update Lambda to use new image URI → run tests → promote tag to :stable
🎯

"Test 10% of prod traffic on new Lambda version" → Lambda alias with weighted routing (90/10). "Roll back to previous version instantly" → Update alias to point back to previous version number. "Prevent accidental deploy of untested code" → Require published version (not $LATEST) in prod alias.

⚠️

Common traps:

  • "Updating an alias to point to a new version causes downtime" — FALSE; alias update is near-instant and atomic.
  • "$LATEST is a publishable version" — FALSE; $LATEST is always the unpublished current state. Publishing creates numbered versions from $LATEST.
  • "Lambda aliases can point to another alias" — FALSE; aliases can only point to specific version numbers or $LATEST, not to other aliases.
Task 3.4

Deploy Code by Using AWS CI/CD Services

CodeCommit, CodeBuild, CodeDeploy, CodePipeline, deployment strategies, rollback.

Skills in:
  • Describe Lambda deployment packaging options and API Gateway stages with custom domains
  • Update existing IaC templates; manage application environments using AWS services
  • Deploy an application version using deployment strategies; commit code to invoke build/test/deploy actions
  • Use orchestrated workflows to deploy code to different environments
  • Perform application rollbacks; use labels and branches for version and release management
  • Configure deployment strategies (blue/green, canary, rolling) for application releases
🚀 AWS CI/CD Pipeline — CodeCommit, CodeBuild, CodeDeploy & CodePipeline
CI/CD · Exam Fave

The four Code* services form the AWS-native CI/CD toolchain. Know each service's role and how they integrate in a pipeline.

Service Roles
Service | Purpose | Key Concept
AWS CodeCommit | Git repository (source control) | Private Git repos; triggers CodePipeline on push
AWS CodeBuild | Fully managed build & test service | Runs buildspec.yml; Docker-based; pay-per-minute
AWS CodeDeploy | Automated deployment to compute targets | EC2, Lambda, ECS; uses appspec.yml
AWS CodePipeline | Orchestrates the full CI/CD workflow | Stages: Source → Build → Test → Deploy; integrates with GitHub/Bitbucket too
AWS CodeArtifact | Artifact repository (npm, Maven, PyPI) | Cache and share packages; pull-through cache from public repos
buildspec.yml — CodeBuild
version: 0.2
phases:
  install:
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - python -m pytest tests/
      - sam build
  post_build:
    commands:
      # Write the packaged template to the file listed under artifacts
      - sam package --s3-bucket $ARTIFACT_BUCKET --output-template-file packaged-template.yaml
artifacts:
  files:
    - packaged-template.yaml
appspec.yml — CodeDeploy for Lambda
version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: MyFunction
        Alias: live
        CurrentVersion: "1"   # placeholder: version the alias points to now
        TargetVersion: "2"    # placeholder: version to shift traffic to
Hooks:
  - BeforeAllowTraffic: PreTrafficHookFunction
  - AfterAllowTraffic: PostTrafficHookFunction
🎯

CodeBuild vs. CodeDeploy: CodeBuild compiles, tests, packages. CodeDeploy deploys the artifact to a target. CodePipeline orchestrates both. Common exam trap: mixing up which service does what.

⚠️

Common traps:

  • "CodePipeline can only use CodeCommit as source" — FALSE; CodePipeline supports GitHub, GitHub Enterprise, GitLab, Bitbucket, S3, and ECR as sources.
  • "CodeBuild replaces CodeDeploy" — FALSE; they serve different purposes. CodeBuild builds and tests; CodeDeploy deploys the built artifact.
🔄 Deployment Strategies — Blue/Green, Canary, Rolling, All-at-Once
CI/CD · Exam Fave
Strategy Comparison
Strategy | How It Works | Downtime? | Rollback Speed | Cost
All-at-once | Deploy to all instances simultaneously | Yes (brief) | Redeploy (slow) | Cheapest
Rolling | Deploy to one batch at a time | Reduced capacity briefly | Redeploy | Low
Rolling with additional batch | Add new batch before removing old | No | Redeploy | Medium
Immutable | Launch new ASG with new version; swap on success | No | Terminate new instances (fast) | High (double capacity)
Blue/Green | Full duplicate environment; switch DNS/ALB | No | Instant (swap back) | Highest (2× environment)
Canary | X% traffic to new, rest to old | No | Shift traffic back | Medium
Lambda-Specific Deployment via CodeDeploy
  • LambdaAllAtOnce: Instantly shifts all traffic to new version
  • LambdaCanary10Percent5Minutes: 10% to new version → wait 5 min → 100% if hooks pass
  • LambdaLinear10PercentEvery1Minute: Increase by 10% every minute until 100%
  • Pre/Post traffic hooks: Lambda functions that run before/after traffic shift to validate the deployment (e.g., run smoke tests)
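To make the difference between the canary and linear configurations concrete, here is a small sketch that computes when each percentage of traffic reaches the new version. The `traffic_schedule` function is a hypothetical helper written for this guide, not part of any AWS SDK, and it ignores hooks and alarms, which can stop a real deployment at any step.

```python
def traffic_schedule(kind: str, percent: int, interval_minutes: int):
    """Return [(minute, percent_on_new_version), ...] for a shift pattern.

    kind is "canary" (one step, then everything) or "linear" (equal steps).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    if kind == "canary":
        return [(0, percent), (interval_minutes, 100)]
    if kind == "linear":
        schedule, minute, shifted = [], 0, 0
        while shifted < 100:
            shifted = min(100, shifted + percent)
            schedule.append((minute, shifted))
            minute += interval_minutes
        return schedule
    raise ValueError(f"unknown kind: {kind}")


# LambdaCanary10Percent5Minutes
print(traffic_schedule("canary", 10, 5))   # [(0, 10), (5, 100)]
# LambdaLinear10PercentEvery1Minute
print(traffic_schedule("linear", 10, 1))
```

Canary exposes a small slice once and then commits; linear spreads the same risk over ten equal steps and reaches 100% at minute 9.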
Elastic Beanstalk Deployment Policies
Policy | Downtime | Best For
All at once | Yes | Dev/test rapid deployments
Rolling | No (reduced capacity) | Production with some risk tolerance
Rolling with additional batch | No (full capacity maintained) | Production zero-downtime
Immutable | No | Production — fastest rollback
Blue/Green (swap URLs) | No | Production — full environment isolation
💡

Speed of rollback (fastest to slowest): Blue/Green (instant DNS swap) → Canary (shift traffic back) → Immutable (terminate ASG) → Rolling (redeploy previous version).

🎯

"Zero downtime with fastest rollback for Lambda" → CodeDeploy LambdaCanary + pre/post traffic hooks. "Zero downtime for EC2 with no capacity impact" → Rolling with additional batch. "Instant full rollback for production" → Blue/Green. "Cheapest, accepts downtime" → All-at-once.

⚠️

Common traps:

  • "Rolling deployment maintains 100% capacity" — FALSE; standard rolling takes instances offline during update, reducing capacity. Use Rolling with additional batch to maintain full capacity.
  • "Blue/Green deployment is always cheaper than rolling" — FALSE; Blue/Green requires double the infrastructure temporarily — it is the most expensive strategy.

Domain 4 Overview

Debug, monitor, and optimize AWS applications. Covers CloudWatch Logs and Insights, AWS X-Ray distributed tracing, custom metrics (including EMF), structured logging, and performance optimization with caching, concurrency tuning, and messaging efficiency.

⚡ 18% of scored content
Task 4.1

Assist in a Root Cause Analysis

CloudWatch Logs, Logs Insights queries, X-Ray traces, custom metrics, dashboards, and deployment failure debugging.

Skills in:
  • Debug code to identify defects; interpret application metrics, logs, and traces
  • Query logs to find relevant data; implement custom metrics (CloudWatch EMF)
  • Review application health using dashboards and insights
  • Troubleshoot deployment failures using service output logs
  • Debug service integration issues in applications
📋 Amazon CloudWatch Logs — Log Groups, Insights & Metric Filters
CloudWatch · Exam Fave

CloudWatch Logs is the primary log aggregation service on AWS. Knowing how to query, filter, and act on logs is a key DVA-C02 skill.

Hierarchy
  • Log Group: Container for related log streams (e.g., one per Lambda function or ECS service). Retention configured here (1 day – 10 years, or never expire).
  • Log Stream: Sequence of log events from a single source (e.g., one Lambda execution environment, one EC2 instance).
  • Log Event: A single timestamped record.
CloudWatch Logs Insights — Query Syntax
# Find recent Lambda errors (the time range, e.g. the last hour, is chosen when you run the query, not in the query text)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# Calculate p99 Lambda duration
filter @type = "REPORT"
| stats pct(@duration, 99) as p99Duration by bin(5m)

# Count errors by error type
filter @message like /Exception/
| parse @message "* Exception: *" as exType, exMsg
| stats count(*) by exType
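The same queries can be run from code. Logs Insights is asynchronous: you start a query, then poll for its results. The sketch below assumes a CloudWatch Logs client such as `boto3.client("logs")` is passed in; `run_insights_query` and its parameters are hypothetical names chosen for this example.

```python
import time


def run_insights_query(logs, log_group, query, start, end,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    `logs` is a CloudWatch Logs client, e.g. boto3.client("logs").
    `start` and `end` are epoch seconds. Returns rows as plain dicts.
    """
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=int(start),
        endTime=int(end),
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs.get_query_results(queryId=query_id)
        if response["status"] == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [{c["field"]: c["value"] for c in row}
                    for row in response["results"]]
        if response["status"] in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {response['status']}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete in time")
```

Note that the time window is an argument to `start_query`, which is why it never appears in the query string itself.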
Metric Filters
  • Extract numeric values from log events and publish them as CloudWatch custom metrics
  • Example: Count occurrences of ERROR in logs → create metric ErrorCount → set alarm on it
  • Pattern: { $.level = "ERROR" } (JSON log format) or simple text pattern ERROR
  • Each metric filter belongs to one log group and publishes to one CloudWatch metric; add more filters to the same log group for additional metrics
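As an illustration of the ErrorCount example, the sketch below creates a metric filter with the JSON pattern shown above. It assumes a CloudWatch Logs client such as `boto3.client("logs")` is passed in; the function name and the `MyApp` namespace are placeholders.

```python
def create_error_metric_filter(logs, log_group, namespace="MyApp"):
    """Count JSON log events whose level is ERROR as a custom metric.

    `logs` is a CloudWatch Logs client, e.g. boto3.client("logs").
    """
    params = {
        "logGroupName": log_group,
        "filterName": "ErrorCount",
        "filterPattern": '{ $.level = "ERROR" }',
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": namespace,
            "metricValue": "1",   # add 1 for every matching event
            "defaultValue": 0.0,  # report 0 when nothing matches
        }],
    }
    logs.put_metric_filter(**params)
    return params
```

Setting a default value of 0 keeps the metric populated during quiet periods, so an alarm on it does not drift into INSUFFICIENT_DATA.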
Lambda Logging
  • Lambda automatically sends stdout/stderr to CloudWatch Logs (/aws/lambda/functionName)
  • Lambda Logs format: START, END, REPORT lines contain requestId, duration, billed duration, memory used, init duration (cold start)
  • CloudWatch Lambda Insights: Enhanced monitoring — CPU, memory, disk, network utilization via a Lambda layer
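The REPORT line is easy to mine for timeout and cold-start evidence. The parser below is a minimal sketch that assumes the default plain-text log format, in which the fields are tab-separated `Name: value unit` pairs; `parse_report_line` is a hypothetical helper.

```python
def parse_report_line(line: str) -> dict:
    """Turn a Lambda REPORT log line into a dict of fields."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    fields = {}
    # Fields are tab-separated "Name: value unit" pairs.
    for part in line[len("REPORT"):].strip().split("\t"):
        name, _, value = part.strip().partition(": ")
        if not value:
            continue
        number = value.split()[0]
        is_numeric = number.replace(".", "", 1).isdigit()
        fields[name] = float(number) if is_numeric else value
    return fields


sample = ("REPORT RequestId: abc-123\tDuration: 12.34 ms\tBilled Duration: 13 ms\t"
          "Memory Size: 128 MB\tMax Memory Used: 64 MB\tInit Duration: 156.78 ms\t")
report = parse_report_line(sample)
cold_start = "Init Duration" in report  # only present on cold starts
print(report["Duration"], cold_start)
```

Comparing `Max Memory Used` with `Memory Size` across many invocations is a quick way to spot over-provisioned functions.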
Common Root Cause Patterns
Symptom | Likely Cause | Where to Look
Lambda 502 errors from API GW | Lambda function exception or timeout | Lambda log group for error detail
Lambda throttling (429) | Concurrency limit reached | CloudWatch metric: Throttles
Lambda timeout | Function exceeds timeout config | REPORT line: Duration > configured timeout
DynamoDB ProvisionedThroughputExceededException | Hot partition or insufficient capacity | CW metric: ConsumedWriteCapacityUnits, ThrottledRequests
API GW 5xx errors | Backend Lambda or integration error | API GW execution logs; enable in stage settings
🎯

Enable API Gateway execution logging in the stage settings to see the full request/response cycle. Access logs record who called the API but not what happened inside the integration, so they are insufficient for debugging integration errors. Set the log level to ERROR for failures only, or INFO for every request.

⚠️

Common traps:

  • "CloudWatch Logs Insights queries run in real-time as logs arrive" — FALSE; Logs Insights queries historical log data. For real-time filtering, use CloudWatch Live Tail or Subscription Filters.
  • "Lambda always writes logs to CloudWatch automatically" — TRUE only if the execution role has logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents permissions.
🔍 AWS X-Ray — Distributed Tracing, Segments & Service Map
X-Ray · High Frequency

X-Ray provides end-to-end distributed tracing across microservices. It is the go-to tool for identifying latency bottlenecks and errors across a request chain on the DVA-C02 exam.

Core Concepts
Concept | Description
Trace | Complete end-to-end request across all services — identified by unique Trace ID
Segment | Data from a single service (e.g., Lambda function, EC2 app) for one request
Subsegment | Granular unit within a segment (e.g., DynamoDB call, HTTP downstream call, custom code block)
Annotation | Indexed key-value pair — filterable in X-Ray console (e.g., userId, tenantId)
Metadata | Non-indexed key-value pair — visible in trace but not searchable
Service Map | Visual graph of all services in the request chain with latency and error rates
Enabling X-Ray
  • Lambda: Enable Active Tracing in function config. Lambda sends trace data automatically. Add X-Ray SDK for subsegments.
  • API Gateway: Enable X-Ray Tracing on the stage settings. Adds trace header to requests.
  • ECS/EC2: Run X-Ray Daemon as a sidecar or daemon process. SDK sends trace data to daemon → daemon batches and sends to X-Ray API.
  • SDK instrumentation: Wrap AWS SDK calls and HTTP clients with X-Ray SDK to automatically create subsegments
X-Ray SDK — Annotations vs. Metadata
# Python (aws-xray-sdk). user_id, request_body, and table are assumed to come from your handler.
from aws_xray_sdk.core import xray_recorder

# Annotation (indexed: use for filtering)
xray_recorder.put_annotation("userId", user_id)
xray_recorder.put_annotation("plan", "premium")

# Metadata (not indexed: for debugging context)
xray_recorder.put_metadata("requestPayload", request_body)

# Custom subsegment
with xray_recorder.in_subsegment("database-query") as sub:
    result = table.query(...)
    sub.put_annotation("recordCount", len(result['Items']))
Sampling Rules
  • X-Ray does not record every request — it samples to reduce cost and noise
  • Default rule: Record first request per second + 5% of additional requests per service host
  • Custom sampling rules: Target specific services, URLs, or HTTP methods with different rates
  • Sampling rules are centrally configured — no code change needed to adjust sampling
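The default rule combines two mechanisms: a per-second reservoir (always trace the first request) and a fixed rate applied to everything after it. The class below is a simplified model of that behavior for a single host, written for illustration only; it is not the X-Ray SDK's sampler, and `ReservoirSampler` is a hypothetical name.

```python
import random


class ReservoirSampler:
    """Simplified model of one sampling rule: a per-second reservoir plus a fixed rate."""

    def __init__(self, reservoir=1, fixed_rate=0.05, rng=None):
        self.reservoir = reservoir
        self.fixed_rate = fixed_rate
        self._rng = rng or random.Random()
        self._second = None
        self._taken = 0

    def should_sample(self, now: float) -> bool:
        second = int(now)
        if second != self._second:  # a new second refills the reservoir
            self._second, self._taken = second, 0
        if self._taken < self.reservoir:
            self._taken += 1
            return True
        # Reservoir exhausted: sample a fixed fraction of the remainder.
        return self._rng.random() < self.fixed_rate


sampler = ReservoirSampler()
traced = sum(sampler.should_sample(100.0 + i / 1000) for i in range(1000))
print(traced)  # 1 guaranteed trace plus roughly 5% of the other 999 requests
```

The reservoir guarantees that low-traffic services still produce traces, while the fixed rate keeps cost bounded under heavy load.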
💡

Annotation vs. Metadata: Annotation = Actionable / searchable (like a database index). Metadata = More context (like a comment — descriptive but not queryable).

🎯

"Find all requests from tenantId=ABC that took >2s" → Add tenantId as X-Ray annotation → filter in X-Ray console. "Which downstream service is causing latency" → X-Ray Service Map — look for high response time subsegments. "Lambda function not sending traces" → Check Active Tracing is on AND execution role has xray:PutTraceSegments + xray:PutTelemetryRecords.

⚠️

Common traps:

  • "X-Ray traces every single request by default" — FALSE; X-Ray uses sampling. To trace every request, set a custom sampling rule with 100% fixed rate (not recommended at high volume).
  • "X-Ray metadata is searchable in the console like annotations" — FALSE; only annotations are indexed and filterable. Metadata is visible within a specific trace detail but cannot be used to search/filter across traces.
  • "X-Ray is enabled automatically on Lambda functions" — FALSE; you must explicitly enable Active Tracing in the Lambda function configuration.
Task 4.2

Instrument Code for Observability

Logging vs. monitoring vs. observability, custom metrics, EMF, CloudWatch Alarms, X-Ray SDK, structured logging, health checks.

Skills in:
  • Describe differences between logging, monitoring, and observability
  • Implement an effective logging strategy to record application behavior and state
  • Implement code that emits custom metrics; add annotations for tracing services
  • Implement notification alerts for specific actions (quota limits, deployment completions)
  • Implement tracing using AWS services and tools; implement structured logging
  • Configure application health checks and readiness probes
📊 CloudWatch Custom Metrics, EMF & Alarms
CloudWatch · High Frequency

Publishing custom metrics lets you monitor business-level KPIs alongside infrastructure metrics. EMF is the modern, cost-effective approach for Lambda functions.

Logging vs. Monitoring vs. Observability
Pillar | What It Provides | AWS Service
Logging | Discrete event records — what happened | CloudWatch Logs, CloudTrail
Monitoring | Time-series metrics — how things are performing | CloudWatch Metrics & Alarms
Tracing | Request flow across services — why it happened | AWS X-Ray
Observability | Combine all three to understand system state | CloudWatch Application Signals, X-Ray, OTel
Custom Metrics — PutMetricData
# Emit custom metric: order processing time
# processing_ms is assumed to be measured earlier in your handler
import boto3

cw = boto3.client('cloudwatch')
cw.put_metric_data(
    Namespace='MyApp/Orders',
    MetricData=[{
        'MetricName': 'ProcessingDuration',
        'Value': processing_ms,
        'Unit': 'Milliseconds',
        'Dimensions': [{
            'Name': 'Environment',
            'Value': 'prod'
        }]
    }]
)
Embedded Metric Format (EMF) — Preferred for Lambda
  • Embed metric values in structured log JSON — CloudWatch automatically extracts them as metrics
  • Zero additional API calls — metrics are extracted from your existing logs
  • No PutMetricData API charges; you pay for log ingestion, and the extracted metrics are still billed as custom metrics
  • Use the aws-embedded-metrics library (Python/Node.js/Java)
# Python EMF — metrics extracted automatically from logs
import time
from aws_embedded_metrics import metric_scope

@metric_scope
def lambda_handler(event, context, metrics):
    start = time.perf_counter()
    # ... process the order here ...
    elapsed_ms = (time.perf_counter() - start) * 1000

    metrics.set_namespace("MyApp")
    metrics.put_dimensions({"Service": "OrderProcessor"})
    metrics.put_metric("OrderCount", 1, "Count")
    metrics.put_metric("ProcessingTime", elapsed_ms, "Milliseconds")
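Under the hood, EMF is just a JSON log line with an `_aws` metadata block that tells CloudWatch which top-level keys are metrics and which are dimensions. The sketch below builds such a record by hand with the standard library, which can be useful for understanding what the library emits. The `emf_record` helper is hypothetical; in practice the library above is the simpler choice.

```python
import json
import time


def emf_record(namespace, dimensions, metrics, timestamp_ms=None):
    """Build one EMF log record.

    dimensions: {"Service": "OrderProcessor"}
    metrics:    {"OrderCount": (1, "Count")}
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set
                "Metrics": [{"Name": name, "Unit": unit}
                            for name, (_, unit) in metrics.items()],
            }],
        },
    }
    record.update(dimensions)                                        # dimension values
    record.update({name: value for name, (value, _) in metrics.items()})  # metric values
    return record


# In Lambda, printing the JSON line is enough; CloudWatch extracts the metrics.
print(json.dumps(emf_record("MyApp", {"Service": "OrderProcessor"},
                            {"OrderCount": (1, "Count")})))
```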
CloudWatch Alarms
Alarm Type | Triggers When | Use Case
Threshold Alarm | Metric crosses a static threshold | CPU > 80%, Error count > 5
Anomaly Detection | Metric deviates from ML-predicted band | Traffic drops/spikes without fixed threshold
Composite Alarm | Combination of multiple alarms (AND/OR) | Alert only when both high CPU AND high errors
Math Expression Alarm | Metric math result crosses threshold | Error rate = errors / requests > 1%
Alarm States
  • OK: Metric is within the threshold
  • ALARM: Metric has breached the threshold for the configured datapoints-to-alarm
  • INSUFFICIENT_DATA: Not enough data points yet (common with new metrics or long evaluation periods)
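The "datapoints-to-alarm" setting means an alarm fires when M of the last N evaluation periods breach the threshold. The function below is a deliberately simplified model of that rule for a greater-than comparison. It is an illustration only: real CloudWatch evaluation also applies a configurable missing-data treatment, which this sketch does not model, and `alarm_state` is a hypothetical name.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified M-out-of-N evaluation for a GreaterThanThreshold alarm.

    datapoints: most recent values, oldest first; None marks a missing period.
    """
    window = datapoints[-evaluation_periods:]
    present = [v for v in window if v is not None]
    if len(window) < evaluation_periods or not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for v in present if v > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# 2 of the last 3 periods above 5 errors -> ALARM
print(alarm_state([1, 9, 9], threshold=5, evaluation_periods=3, datapoints_to_alarm=2))
```

Requiring 2 of 3 rather than 1 of 1 is a common way to avoid paging on a single noisy datapoint.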
🎯

"Notify on Lambda throttling" → CloudWatch Alarm on Throttles metric → SNS → email/PagerDuty. "Track business KPI in Lambda" → Use EMF (no extra API calls, extracted from logs automatically). "Alert only when multiple conditions are true" → Composite Alarm.

⚠️

Common traps:

  • "An alarm in INSUFFICIENT_DATA state means something is wrong" — FALSE; insufficient data is normal for new alarms or when a metric has gaps (e.g., Lambda not invoked recently).
  • "CloudWatch custom metrics are retained indefinitely" — FALSE; CloudWatch retains data at different resolutions: 1-second for 3 hours, 1-minute for 15 days, 5-minute for 63 days, 1-hour for 15 months.
📝 Structured Logging, Health Checks & Observability Best Practices
Logging · Medium Frequency
Structured Logging (JSON)
  • Log as JSON instead of plain text — machine-parseable, queryable with Logs Insights
  • Always include: requestId, userId, timestamp, level, message, duration
  • Never include PII, credentials, or sensitive data in logs
  • Use a correlation ID (requestId or traceId) to link logs across multiple services for the same request
# Structured log entry — queryable with Logs Insights (values are illustrative)
{
  "timestamp": "2024-01-01T12:00:00Z",
  "level": "INFO",
  "message": "Order processed",
  "requestId": "abc-123",
  "userId": "u-456",
  "action": "processOrder",
  "orderId": "ord-789",
  "durationMs": 45,
  "status": "success"
}
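One way to produce entries like this from Python is a custom formatter on the standard `logging` module. The sketch below uses only the standard library; `JsonFormatter` and the `fields` key passed through `extra` are conventions invented for this example, not an AWS API.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged in.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order processed",
            extra={"fields": {"requestId": "abc-123", "durationMs": 45}})
```

Because every field becomes a top-level JSON key, Logs Insights can filter on `requestId` or aggregate `durationMs` without a `parse` step.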
Application Health Checks
Type | What It Checks | Used By
ALB Target Health Check | HTTP endpoint returns an expected status code (200 by default) | ALB removes unhealthy targets from rotation
ECS Container Health Check | Command (e.g., an HTTP probe) runs successfully inside container | ECS replaces unhealthy tasks
Route 53 Health Check | Endpoint availability over HTTP, HTTPS, or TCP | DNS failover routing policies
CloudWatch Synthetics Canary | Scripted browser/API test runs on a schedule | Proactive endpoint monitoring
Subscription Filters — Real-Time Log Routing
  • Stream filtered log data from a CloudWatch log group to Lambda, Kinesis, or Firehose in real time
  • Use case: Forward ERROR logs from all Lambda functions to a centralized Kinesis stream → Firehose → OpenSearch for analysis
  • Each log group supports up to 2 subscription filters, whatever the destination type
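When the destination is Lambda, the log events arrive base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. The handler below is a minimal sketch of unpacking that payload with the standard library. The function names are placeholders, and the forwarding step is intentionally left out.

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Unpack the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    return payload["logGroup"], [e["message"] for e in payload["logEvents"]]


def lambda_handler(event, context):
    log_group, messages = decode_log_events(event)
    errors = [m for m in messages if "ERROR" in m]
    # Forwarding to a downstream store is left out of this sketch.
    return {"logGroup": log_group, "errorCount": len(errors)}
```

Filtering with the subscription's own filter pattern is usually preferable to filtering in the handler, since it avoids invoking the function for events that will be discarded.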
🎯

"Real-time log analysis pipeline" → CloudWatch Logs Subscription Filter → Kinesis Data Firehose → S3 / OpenSearch. "Proactive endpoint test every 5 min" → CloudWatch Synthetics Canary. "Correlate logs across Lambda, API GW, DynamoDB for one request" → Use X-Ray trace ID as correlation ID in all log statements.

Task 4.3

Optimize Applications by Using AWS Services and Features

Lambda concurrency, caching strategies (ElastiCache, DAX, CloudFront, API GW), messaging optimization, and application profiling.

Skills in:
  • Define concurrency; profile application performance
  • Determine minimum memory and compute power for an application
  • Use subscription filter policies to optimize messaging
  • Cache content based on request headers; implement application-level caching
  • Optimize application resource usage; analyze application performance issues
  • Use application logs to identify performance bottlenecks
Lambda Concurrency Deep Dive — Limits, Throttling & Burst
Lambda · High Frequency

Concurrency management is one of the most tested Lambda optimization topics. Understanding reserved vs. provisioned concurrency and throttling behavior is essential.

Concurrency Model
  • Each in-flight request uses one concurrent execution
  • Account default limit: 1,000 concurrent executions (soft limit — can be increased via Service Quotas)
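  • Scaling rate: each function can add up to 1,000 concurrent executions every 10 seconds until the account limit is reached (this replaced the older regional burst quota of 500–3,000 followed by 500 per minute, which older study material may still quote)
  • When the concurrency limit is hit → throttling: TooManyRequestsException (HTTP 429)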
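Because each in-flight request occupies one execution, required concurrency can be estimated as request rate multiplied by average duration. The sketch below applies that estimate; the function names are hypothetical and the 1,000 default mirrors the account limit mentioned above.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate concurrent executions as arrival rate times time in flight."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def is_throttled(requests_per_second, avg_duration_seconds, limit=1000):
    """True when the estimate exceeds the concurrency limit."""
    return required_concurrency(requests_per_second, avg_duration_seconds) > limit


print(required_concurrency(100, 0.5))  # 50
print(is_throttled(3000, 0.5))         # True: 1,500 needed against a limit of 1,000
```

The estimate also shows why reducing duration is a concurrency optimization: halving the average duration halves the executions needed for the same traffic.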
Concurrency Types Recap
Type | Purpose | Cost when idle | Eliminates cold start?
Unreserved | Default — shared pool | No | No
Reserved | Reserve N for the function and cap it at N; protect downstream | No (just a limit) | No
Provisioned | Pre-warm N environments | Yes (charged continuously) | Yes
Throttling Behavior by Invocation Type
Invocation Type | Throttle Behavior
Synchronous (API GW → Lambda) | Returns 429 immediately to caller; no retry
Asynchronous (S3, SNS, EventBridge) | Lambda retries for up to 6 hours; backs off automatically
Event Source Mapping (SQS, Kinesis) | Messages stay in queue/stream; Lambda retries when concurrency available
Lambda Power Tuning
  • Open-source tool (AWS Lambda Power Tuning) runs your function at multiple memory sizes
  • Finds the optimal memory setting for cost, speed, or both
  • More memory ≠ always more expensive; Lambda allocates CPU in proportion to memory, so a shorter duration often pays for the memory increase
  • Rule: Cost ≈ duration × memory. A higher memory setting can run enough faster to be cheaper overall.
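The cost rule can be checked with a few lines of arithmetic. The sketch below compares memory settings using made-up duration measurements; the per-GB-second price is an assumption (a commonly published x86 rate) and should be replaced with the current price for your Region and architecture.

```python
def invocation_cost(memory_mb, duration_ms, price_per_gb_second=0.0000166667):
    """Compute-only cost of one invocation (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second


# Hypothetical measurements: average duration (ms) at each memory setting (MB).
measurements = {128: 2400, 512: 520, 1024: 300, 2048: 290}
costs = {mb: invocation_cost(mb, ms) for mb, ms in measurements.items()}
cheapest = min(costs, key=costs.get)
print(cheapest)  # 512
```

With these illustrative numbers, 512 MB wins: it is four times the memory of 128 MB but runs more than four times faster. Beyond 1,024 MB the duration barely improves, so the extra memory is pure cost.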
💡

Reserved Concurrency = Ceiling (caps usage). Provisioned Concurrency = Floor (pre-warms environments). Set both: Reserved prevents runaway scale; Provisioned ensures no cold starts up to that level.

🎯

"Prevent Lambda from overwhelming RDS" → Reserved Concurrency (cap Lambda). "Eliminate cold starts for latency-sensitive function" → Provisioned Concurrency. "Lambda throttled during traffic spike" → Request concurrency limit increase via Service Quotas. "Find cheapest memory setting" → Lambda Power Tuning tool.

⚠️

Common traps:

  • "Setting reserved concurrency to 0 disables the function" — TRUE — this is actually a useful technique to temporarily disable a function by throttling all executions.
  • "Provisioned concurrency guarantees no throttling" — FALSE; Provisioned Concurrency pre-warms execution environments but the function can still be throttled if it exceeds the reserved or account concurrency limit.
  • "Lambda always scales to handle any load instantly" — FALSE; scaling is rate-limited. Each function can add at most 1,000 concurrent executions every 10 seconds, and never beyond the account concurrency limit.
🚀 Caching Strategies — CloudFront, API GW, ElastiCache & DAX
Performance · High Frequency

Caching at the right layer dramatically reduces latency and cost. The DVA-C02 exam tests which cache layer to choose for a given scenario.

Cache Layer Comparison
Cache Layer | Service | What It Caches | Best For
CDN / Edge | CloudFront | Static assets, API responses | Global users; reduce origin load; low latency worldwide
API Layer | API Gateway cache | API endpoint responses | Repeated identical API requests; reduce Lambda invocations
Application Layer | ElastiCache (Redis/Memcached) | DB query results, session data | RDS read offload; session store; complex data structures
Database Layer | DynamoDB DAX | DynamoDB GetItem/Query results | DynamoDB hot reads; microsecond latency
CloudFront Cache Behaviors
  • Cache based on request headers: Add headers to the Cache Key to vary cached responses (e.g., Accept-Language for localized content). More headers = fewer cache hits.
  • TTL: Set via Cache-Control: max-age=N response header from origin, or override in CloudFront policy (min/max/default TTL)
  • Cache invalidation: CreateInvalidation API or console — invalidate specific paths. Cost: first 1,000 paths/month free, then $0.005/path.
  • Origin Shield: Additional caching layer between CloudFront edge and origin — reduces origin load further
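The effect of cache-key headers on hit rate is easy to demonstrate. The sketch below models a cache key as the path plus the values of whichever headers are included in the key, then counts distinct entries for the same three requests. It is a toy model of the concept, not CloudFront's actual key construction, and `cache_key` is a hypothetical helper.

```python
def cache_key(path, headers, key_headers):
    """Build a cache key from the path plus only the headers included in the key."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return (path,) + tuple(
        (h.lower(), lowered.get(h.lower(), "")) for h in sorted(key_headers)
    )


requests = [
    ("/home", {"Accept-Language": "en", "User-Agent": "A"}),
    ("/home", {"Accept-Language": "en", "User-Agent": "B"}),
    ("/home", {"Accept-Language": "de", "User-Agent": "A"}),
]

by_language = {cache_key(p, h, ["Accept-Language"]) for p, h in requests}
by_language_and_agent = {
    cache_key(p, h, ["Accept-Language", "User-Agent"]) for p, h in requests
}
print(len(by_language), len(by_language_and_agent))  # 2 3
```

Keying on `Accept-Language` alone lets the two English requests share one entry. Adding `User-Agent` splits them, so every request becomes a miss.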
API Gateway Caching
  • Cache enabled per stage; capacity from 0.5 GB to 237 GB
  • TTL: 0–3600 seconds (default 300s)
  • Cache invalidation: client sends Cache-Control: max-age=0 header (if you grant them permission)
  • Cache hit metric: CacheHitCount / CacheMissCount in CloudWatch
SNS Subscription Filter Policies
  • Filter policies on SNS subscriptions prevent unwanted messages from being delivered to a subscriber
  • Filtering happens at SNS — subscribers only receive messages matching their policy
  • Reduces downstream processing costs (fewer Lambda invocations, fewer SQS messages)
  • Filter on message attributes: {"eventType": ["ORDER_PLACED", "ORDER_SHIPPED"]}
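The matching rule for the policy above can be sketched in a few lines: every key in the policy must be present on the message with one of the listed values. The function below models only exact string matching. SNS filter policies support additional operators that this illustration deliberately leaves out, and `matches` is a hypothetical name.

```python
def matches(filter_policy: dict, attributes: dict) -> bool:
    """Simplified matcher: every policy key must appear in the message
    attributes with one of the listed string values (exact match only)."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )


policy = {"eventType": ["ORDER_PLACED", "ORDER_SHIPPED"]}

print(matches(policy, {"eventType": "ORDER_PLACED"}))     # True: delivered
print(matches(policy, {"eventType": "ORDER_CANCELLED"}))  # False: filtered out
print(matches(policy, {}))                                # False: attribute missing
```

Values within one key are combined with OR, while separate keys are combined with AND, which is why a message missing any policy key is never delivered.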
🎯

"Reduce Lambda invocations for repeated API calls" → API GW cache. "Reduce RDS load from read-heavy app" → ElastiCache lazy loading. "Serve static content globally with low latency" → CloudFront. "Only process ORDER events from SNS topic" → SNS subscription filter policy on your SQS subscriber.

⚠️

Common traps:

  • "CloudFront caches POST requests" — FALSE; CloudFront only caches GET and HEAD requests by default. POST, PUT, DELETE pass through to origin (not cached).
  • "Adding more headers to the CloudFront cache key improves cache hit rate" — FALSE; more headers in the cache key means more unique cache entries → lower cache hit rate. Use only headers that genuinely vary the response.
  • "SNS filter policies are applied at the publisher" — FALSE; filter policies are configured on the subscription (subscriber side). The publisher sends every message to the topic; SNS evaluates each subscription's policy and delivers only the matches.
🔧 SQS Optimization & Application Performance Profiling
Performance · Medium Frequency
SQS Optimization Techniques
Technique | How | Benefit
Long Polling | Set WaitTimeSeconds = 1–20 | Reduces empty API calls; reduces cost
Batch Processing | Receive up to 10 messages per call (MaxNumberOfMessages) | Fewer API calls; higher throughput
Message Batching (send) | Use SendMessageBatch (up to 10 messages) | 10× throughput per API call
Visibility Timeout Tuning | Set to > max processing time | Prevent duplicate processing
Dead-Letter Queue | Set maxReceiveCount | Isolate poison-pill messages; prevent infinite retry loop
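Several of these techniques combine naturally in one consumer loop. The sketch below long-polls for a batch, processes each message, and deletes only the ones that succeeded, so failures reappear after the visibility timeout and eventually reach the dead-letter queue. It assumes an SQS client such as `boto3.client("sqs")` is passed in; `drain_once` and the `handle` callback are hypothetical names.

```python
def drain_once(sqs, queue_url, handle):
    """Long-poll for one batch, process it, and delete what succeeded.

    `sqs` is an SQS client, e.g. boto3.client("sqs"); `handle` is a
    callback that raises on failure.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # batch receive
        WaitTimeSeconds=20,      # long polling
    )
    done = []
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            continue  # left in the queue; reappears after the visibility timeout
        done.append({"Id": message["MessageId"],
                     "ReceiptHandle": message["ReceiptHandle"]})

    if done:
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=done)
    return len(done)
```

Deleting in one batch call rather than per message applies the same API-call saving on the delete side that batch receive provides on the read side.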
Application Performance Analysis
  • CloudWatch Metrics: Identify CPU/memory saturation, DB connection counts, queue depth trends over time
  • X-Ray Traces: Find which subsegment (DB call, downstream HTTP, internal code) contributes most to latency
  • CloudWatch Logs Insights: Query p99 latency: stats pct(@duration, 99) by bin(5m)
  • Lambda REPORT logs: Init Duration reveals cold start cost; Max Memory Used reveals memory waste
Performance Bottleneck Decision Tree
Symptom | Root Cause | Fix
High Lambda duration | Slow downstream DB/API call | Add DAX/ElastiCache; use async pattern
Lambda cold starts | No pre-warmed environment | Provisioned Concurrency; reduce package size
DynamoDB throttling | Hot partition or under-provisioned | Better partition key; On-Demand mode; DAX
RDS connection errors under load | Connection pool exhausted | RDS Proxy; reduce Lambda reserved concurrency
High API GW latency | Repeated identical requests hitting Lambda | Enable API GW caching
SQS high message age | Consumers too slow | Increase Lambda concurrency; increase batch size
🎯

Full observability stack for Lambda: X-Ray (tracing) + CloudWatch Logs (structured JSON) + EMF (custom metrics) + CloudWatch Alarms (notification) + Lambda Insights layer (system metrics). This combination gives you complete visibility without external tools.

⚠️

Common traps:

  • "Long polling always returns messages faster" — FALSE; long polling waits up to WaitTimeSeconds for a message. If no message arrives, it returns empty after the wait. It is more efficient, not necessarily faster.
  • "Increasing Lambda memory always reduces cost" — FALSE; if the function is not compute-bound, increasing memory just costs more with no duration benefit. Use Power Tuning to find the optimum.
