DynamoDB — NoSQL Database for Serverless Applications
Why DynamoDB Matters
DynamoDB is AWS's fully managed NoSQL database service. It delivers single-digit millisecond latency at any scale, requires no server maintenance, and integrates with Lambda for serverless applications.
Why this matters for your career:
- DynamoDB is a common default database choice for serverless applications on AWS
- It powers some of the largest applications in the world (Netflix, Airbnb, Lyft)
- Understanding DynamoDB design is essential for AWS Solutions Architect certification
- DynamoDB's on-demand pricing model (pay per request) aligns well with serverless workloads
Core Concepts
Tables, Items, Attributes
Table: Users
Item 1: { "id": "user_123", "name": "Alice", "email": "alice@example.com", "age": 30 }
Item 2: { "id": "user_456", "name": "Bob", "email": "bob@example.com", "age": 25 }
Item 3: { "id": "user_789", "name": "Charlie", "email": "charlie@example.com", "age": 35 }
- Table: Collection of items (like a SQL table)
- Item: A single record (like a SQL row)
- Attribute: A piece of data (like a SQL column)
DynamoDB is schemaless — different items can have different attributes.
Primary Keys
| Key Type | Components | Example | Use Case |
|----------|------------|---------|----------|
| Simple (Partition Key) | 1 attribute | id (partition key) | Simple lookup by ID |
| Composite (Partition Key + Sort Key) | 2 attributes | userId (partition) + timestamp (sort) | Time-series data, one-to-many |
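As a rough sketch, the composite-key row above could be written as a table definition like the one below. The table name `Orders`, the string key types, and the on-demand billing mode are assumptions made for illustration; the dictionary is shaped like the keyword arguments that boto3's `create_table` accepts.

```python
def orders_table_definition():
    """Build keyword arguments for create_table with a composite primary key.

    Assumed names: table 'Orders', partition key 'userId', sort key 'timestamp'.
    """
    return {
        "TableName": "Orders",
        # Only key attributes are declared up front; all other attributes
        # are schemaless and can vary from item to item.
        "AttributeDefinitions": [
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userId", "KeyType": "HASH"},      # partition key
            {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand mode
    }


def create_orders_table():
    """Create the table. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the definition can be inspected offline

    client = boto3.client("dynamodb")
    return client.create_table(**orders_table_definition())
```

In the key schema, `HASH` marks the partition key and `RANGE` marks the sort key.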
Secondary Indexes
| Index Type | Description | Consumed Capacity |
|------------|-------------|-------------------|
| Local Secondary Index (LSI) | Same partition key, different sort key | Shares the base table's read and write capacity |
| Global Secondary Index (GSI) | Different partition key and optional sort key | Own provisioned throughput, separate from the base table |
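To illustrate how a GSI is used at query time, here is a minimal sketch that builds the parameters for a query against an index. It assumes a `Users` table with a GSI named `email-index` whose partition key is `email`; both the index name and the helper function names are hypothetical.

```python
def users_by_email_query(email, index_name="email-index"):
    """Build low-level query parameters that target a GSI.

    Assumes a hypothetical GSI 'email-index' with partition key 'email'.
    """
    return {
        "TableName": "Users",
        "IndexName": index_name,  # without this, the query targets the base table
        "KeyConditionExpression": "#email = :email",
        "ExpressionAttributeNames": {"#email": "email"},
        # The low-level client expects typed values: {"S": ...} is a string.
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


def find_users_by_email(email):
    """Run the query. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the parameters can be built offline

    client = boto3.client("dynamodb")
    return client.query(**users_by_email_query(email))["Items"]
```

Note that a GSI supports only eventually consistent reads, so a very recent write may not yet be visible through the index.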
Capacity Modes
| Mode | Description | Best For |
|------|-------------|----------|
| Provisioned | Specify RCU and WCU | Predictable workloads |
| On-Demand | Pay per request | Unpredictable workloads, new applications |
RCU and WCU
- 1 RCU (Read Capacity Unit) = 1 strongly consistent read/sec for an item up to 4 KB, or 2 eventually consistent reads/sec
- 1 WCU (Write Capacity Unit) = 1 write/sec for an item up to 1 KB
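The definitions above translate into simple arithmetic: item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) before multiplying by the request rate. The helper names below are hypothetical and the sketch ignores transactional requests, which are billed differently.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the RCUs needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # round up to 4 KB blocks
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate the WCUs needed for a steady write rate."""
    units_per_write = math.ceil(item_size_kb)  # round up to 1 KB blocks
    return units_per_write * writes_per_second


print(read_capacity_units(6, 10))         # 20
print(read_capacity_units(6, 10, False))  # 10
print(write_capacity_units(2.5, 5))       # 15
```

For example, a 6 KB item occupies two 4 KB read blocks, so 10 strongly consistent reads per second need 20 RCU.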
Querying and Scanning
Query
A Query operation finds items based on the partition key. It is the most efficient way to retrieve data.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# Query by partition key
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123')
)

# Query with sort key condition
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123') & Key('createdAt').between('2025-01-01', '2025-01-31')
)

# Query with limit and pagination
# ExclusiveStartKey must be omitted when there is no previous page;
# passing None fails parameter validation
query_kwargs = {
    'KeyConditionExpression': Key('userId').eq('user_123'),
    'Limit': 10,
}
last_key = response.get('LastEvaluatedKey')
if last_key:
    query_kwargs['ExclusiveStartKey'] = last_key
response = table.query(**query_kwargs)
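The single-page pattern above can be extended into a loop that follows `LastEvaluatedKey` until the result set is exhausted. This is a minimal sketch; `query_all_items` is a hypothetical helper, and `table` is assumed to be a boto3 `Table` resource (or any object with a compatible `query` method).

```python
def query_all_items(table, **query_kwargs):
    """Collect every item matching a query by following pagination keys."""
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            return items
        # Resume the next request where the previous page stopped.
        query_kwargs["ExclusiveStartKey"] = last_key
```

With the `Orders` table from above, a call might look like `query_all_items(table, KeyConditionExpression=Key('userId').eq('user_123'))`. Loading every page into memory is only reasonable for small result sets; for larger ones, processing each page as it arrives is usually the safer design.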
Scan
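A Scan operation examines every item in the table and consumes read capacity for all of them, regardless of how many items it returns. It is expensive and should be avoided on large tables in production.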
from boto3.dynamodb.conditions import Attr

table = dynamodb.Table('Users')

# Scan all users (expensive!)
response = table.scan()

# Scan with filter (still reads all items, just returns fewer)
response = table.scan(
    FilterExpression=Attr('age').gte(30)
)
Query vs. Scan
| Operation | Efficiency | Use When |
|-----------|------------|----------|
| Query | High (reads only relevant partition) | You know the partition key |
| Scan | Low (reads every item) | You don't know the partition key (avoid if possible) |
Best Practices
| Practice | Reason |
|----------|--------|
| Design tables for your access patterns | DynamoDB is not flexible like SQL; plan queries first |
| Use composite keys for time-series data | Partition key = entity ID, sort key = timestamp |
| Create GSIs for alternate access patterns | Query by different attributes |
| Use sparse indexes | GSI only includes items with the indexed attribute |
| Avoid hot partitions | Distribute reads/writes evenly across partition keys |
| Use DynamoDB Accelerator (DAX) for caching | In-memory cache for read-heavy workloads |
| Enable auto-scaling for provisioned mode | Handle traffic spikes without manual intervention |
| Use TTL for automatic expiry | Automatically delete expired items |
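Two of the practices above lend themselves to a short sketch: spreading writes for a busy key across several partitions by appending a calculated suffix, and computing the value of a TTL attribute. The function names, the `#` separator, and the shard count of 10 are assumptions for illustration. DynamoDB TTL expects a Number attribute holding a Unix epoch time in seconds.

```python
import hashlib
import time


def sharded_partition_key(base_key, item_id, shard_count=10):
    """Append a deterministic shard suffix so one busy key spreads over partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"


def ttl_timestamp(seconds_from_now, now=None):
    """Return an expiry time as epoch seconds, the format TTL attributes use."""
    current = time.time() if now is None else now
    return int(current) + seconds_from_now


# Hypothetical item: the attribute name 'expiresAt' must match the TTL
# attribute configured on the table.
item = {
    "pk": sharded_partition_key("2025-01-15", "order_42"),
    "orderId": "order_42",
    "expiresAt": ttl_timestamp(7 * 24 * 3600),  # expire in seven days
}
```

Because the suffix is derived from the item ID, a reader that knows the ID can recompute the same partition key. Reading everything for a base key means querying each shard and merging the results, which is the trade-off of this design.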
DynamoDB with Lambda
import json
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')


def to_json(data):
    # DynamoDB returns numbers as Decimal, which json.dumps cannot serialize by default
    def convert(value):
        if isinstance(value, Decimal):
            return int(value) if value % 1 == 0 else float(value)
        raise TypeError(f'Cannot serialize {type(value).__name__}')
    return json.dumps(data, default=convert)


def lambda_handler(event, context):
    action = event.get('action')

    if action == 'create':
        item = event['item']
        table.put_item(Item=item)
        return {'statusCode': 201, 'body': to_json(item)}

    elif action == 'get':
        response = table.get_item(Key={'id': event['id']})
        item = response.get('Item')
        if item is None:
            return {'statusCode': 404, 'body': to_json({'error': 'Not found'})}
        return {'statusCode': 200, 'body': to_json(item)}

    elif action == 'update':
        updates = event['updates']
        if not updates:
            return {'statusCode': 400, 'body': to_json({'error': 'No updates provided'})}
        # Use indexed placeholders (#attr0, :val0, ...) so reserved words and
        # attribute names with special characters are safe in the expression
        names, values, assignments = {}, {}, []
        for i, (k, v) in enumerate(updates.items()):
            names[f'#attr{i}'] = k
            values[f':val{i}'] = v
            assignments.append(f'#attr{i} = :val{i}')
        table.update_item(
            Key={'id': event['id']},
            UpdateExpression='SET ' + ', '.join(assignments),
            ExpressionAttributeNames=names,
            ExpressionAttributeValues=values
        )
        return {'statusCode': 200, 'body': to_json({'message': 'Updated'})}

    elif action == 'delete':
        table.delete_item(Key={'id': event['id']})
        return {'statusCode': 200, 'body': to_json({'message': 'Deleted'})}

    elif action == 'query':
        response = table.query(
            KeyConditionExpression=Key('id').eq(event['id']),
            Limit=event.get('limit', 10)
        )
        return {'statusCode': 200, 'body': to_json(response['Items'])}

    return {'statusCode': 400, 'body': to_json({'error': 'Unknown action'})}
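To make the handler's expected input concrete, the following sketch lists one sample event per action. The field names (`action`, `item`, `id`, `updates`, `limit`) are taken from the handler above; the values are made up for illustration.

```python
import json

# One sample event per action the handler understands.
SAMPLE_EVENTS = {
    "create": {
        "action": "create",
        "item": {"id": "user_123", "name": "Alice", "email": "alice@example.com"},
    },
    "get": {"action": "get", "id": "user_123"},
    "update": {
        "action": "update",
        "id": "user_123",
        "updates": {"name": "Alice Smith"},
    },
    "delete": {"action": "delete", "id": "user_123"},
    "query": {"action": "query", "id": "user_123", "limit": 5},
}

# Print each event as JSON, for example to paste into a Lambda test event.
for name, event in SAMPLE_EVENTS.items():
    print(name, json.dumps(event))
```

Any other `action` value falls through to the final `return` and produces a 400 response.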
Summary
DynamoDB is a powerful NoSQL database for serverless applications. Design your tables around your access patterns, use composite keys for time-series data, create GSIs for alternate queries, and avoid expensive scan operations. Always plan your data model before writing code.
Key takeaways:
- DynamoDB is a fully managed, schemaless NoSQL database
- Primary key: simple (partition key) or composite (partition + sort key)
- Query is efficient (uses partition key); Scan is expensive (reads everything)
- Capacity modes: provisioned (set RCU/WCU, predictable traffic) vs. on-demand (pay per request, unpredictable traffic)
- GSIs enable alternate access patterns (own capacity)
- Avoid hot partitions — distribute keys evenly
- Use TTL for automatic data expiry
- DAX provides in-memory caching for read-heavy workloads
What's Next: DynamoDB Streams & EventBridge
The next chapter covers DynamoDB Streams and EventBridge — building event-driven serverless applications with real-time data processing.
Capacity Mode Decision Guide
| Factor | Provisioned | On-Demand |
|--------|-------------|-----------|
| Traffic pattern | Predictable | Spiky or unknown |
| Cost optimization | Up to 80% cheaper with auto-scaling | More expensive for steady state |
| Management | Must set RCU/WCU and auto-scaling | Zero management |
| Use case | Production with stable traffic | New apps, dev/test, unpredictable spikes |
Choose provisioned for cost savings on predictable workloads. Choose on-demand to avoid capacity planning for new or variable-traffic applications.