DynamoDB — NoSQL Database for Serverless Applications

Why DynamoDB Matters

DynamoDB is AWS's fully managed NoSQL database service. It delivers single-digit millisecond latency at any scale, requires no servers to provision or patch, and integrates directly with Lambda for serverless applications.

Why this matters for your career:

- DynamoDB is the usual first-choice database for serverless applications on AWS
- It powers some of the largest applications in the world (Netflix, Airbnb, Lyft)
- DynamoDB data modeling features prominently in the AWS Solutions Architect certification
- On-demand capacity mode bills per request, which matches the pay-per-use model of serverless

Core Concepts

Tables, Items, Attributes

Table: Users
  Item 1: { "id": "user_123", "name": "Alice",   "email": "alice@example.com",   "age": 30 }
  Item 2: { "id": "user_456", "name": "Bob",     "email": "bob@example.com",     "age": 25 }
  Item 3: { "id": "user_789", "name": "Charlie", "email": "charlie@example.com", "age": 35 }

- Table: Collection of items (like a SQL table)
- Item: A single record (like a SQL row)
- Attribute: A piece of data (like a SQL column)

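DynamoDB is schemaless apart from the primary key: every item must contain the key attributes, but all other attributes can vary from item to item (one user might have a phone attribute and another none).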

Primary Keys

| Key Type | Components | Example | Use Case |
|----------|------------|---------|----------|
| Simple (Partition Key) | 1 attribute | id (partition key) | Simple lookup by ID |
| Composite (Partition Key + Sort Key) | 2 attributes | userId (partition) + timestamp (sort) | Time-series data, one-to-many |
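Both key types are declared when the table is created. The sketch below builds the keyword arguments for boto3's `create_table` without calling AWS, so the structure can be inspected offline. The helper name `create_table_params` is hypothetical, it assumes string (`'S'`) key attributes, and it defaults to on-demand billing unless a `(read, write)` capacity pair is passed.

```python
from typing import Any, Dict, Optional, Tuple


def create_table_params(table_name: str, partition_key: str,
                        sort_key: Optional[str] = None,
                        provisioned: Optional[Tuple[int, int]] = None) -> Dict[str, Any]:
    """Build CreateTable arguments for a simple or composite primary key.

    Assumes string ('S') key attributes. `provisioned` is an optional
    (read_capacity_units, write_capacity_units) pair; without it the table
    uses on-demand (PAY_PER_REQUEST) billing.
    """
    key_schema = [{'AttributeName': partition_key, 'KeyType': 'HASH'}]
    attribute_definitions = [{'AttributeName': partition_key, 'AttributeType': 'S'}]
    if sort_key is not None:
        key_schema.append({'AttributeName': sort_key, 'KeyType': 'RANGE'})
        attribute_definitions.append({'AttributeName': sort_key, 'AttributeType': 'S'})

    params: Dict[str, Any] = {
        'TableName': table_name,
        'KeySchema': key_schema,
        # Only key attributes are declared; every other attribute stays schemaless
        'AttributeDefinitions': attribute_definitions,
    }
    if provisioned is None:
        params['BillingMode'] = 'PAY_PER_REQUEST'
    else:
        rcu, wcu = provisioned
        params['BillingMode'] = 'PROVISIONED'
        params['ProvisionedThroughput'] = {'ReadCapacityUnits': rcu, 'WriteCapacityUnits': wcu}
    return params


# Composite key matching the Orders queries in this chapter
orders_params = create_table_params('Orders', 'userId', 'createdAt')
# Simple key with provisioned capacity
users_params = create_table_params('Users', 'id', provisioned=(5, 5))
# With AWS credentials configured:
#   import boto3
#   boto3.client('dynamodb').create_table(**orders_params)
```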

Secondary Indexes

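| Index Type | Description | Consumed Capacity |
|------------|-------------|-------------------|
| Local Secondary Index (LSI) | Same partition key as the table, different sort key; must be defined when the table is created; supports strongly consistent reads | Shares the base table's read/write capacity |
| Global Secondary Index (GSI) | Partition key and optional sort key can be any attributes; can be added or removed at any time; eventually consistent reads only | Own read/write capacity (in provisioned mode) |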
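Because a GSI can be added to an existing table, it is created through `UpdateTable` rather than `CreateTable`. Below is a sketch of the request arguments for a hypothetical `EmailIndex` on the `Users` table's `email` attribute, assuming the table uses on-demand billing; it is plain data and makes no AWS call.

```python
# Hypothetical GSI on the Users table's email attribute (on-demand table assumed)
add_email_index = {
    'TableName': 'Users',
    # Only attributes used as index keys need a type declaration
    'AttributeDefinitions': [{'AttributeName': 'email', 'AttributeType': 'S'}],
    'GlobalSecondaryIndexUpdates': [{
        'Create': {
            'IndexName': 'EmailIndex',
            'KeySchema': [{'AttributeName': 'email', 'KeyType': 'HASH'}],
            # Project every attribute so index queries return whole items
            'Projection': {'ProjectionType': 'ALL'},
            # A provisioned-mode table would also need 'ProvisionedThroughput' here,
            # because a GSI has its own capacity
        }
    }],
}

# With AWS credentials configured:
#   import boto3
#   boto3.client('dynamodb').update_table(**add_email_index)
# Once the index status is ACTIVE, query it through the resource API:
#   table.query(IndexName='EmailIndex',
#               KeyConditionExpression=Key('email').eq('alice@example.com'))
```

Only items that actually have an `email` attribute appear in this index, which is the same property that makes sparse indexes useful.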

Capacity Modes

| Mode | Description | Best For |
|------|-------------|----------|
| Provisioned | Specify RCU and WCU | Predictable workloads |
| On-Demand | Pay per request | Unpredictable workloads, new applications |

RCU and WCU

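1 RCU (Read Capacity Unit) = one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB; larger items use one RCU per 4 KB, rounded up
1 WCU (Write Capacity Unit) = one write per second for an item up to 1 KB; larger items use one WCU per 1 KB, rounded up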
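The rounding rules above turn capacity planning into simple arithmetic. This sketch applies them to steady single-item reads and writes (GetItem/PutItem style); the function names are illustrative, sizes are in KB, and it does not cover Query/Scan, which sum the sizes of all items read before rounding.

```python
import math


def read_units_per_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    # Reads are metered in 4 KB steps, rounded up; eventually consistent reads cost half
    units = max(1, math.ceil(item_size_kb / 4))
    return units if strongly_consistent else units / 2


def required_rcu(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    # Capacity to provision for a steady rate of single-item reads
    return math.ceil(read_units_per_read(item_size_kb, strongly_consistent) * reads_per_sec)


def required_wcu(item_size_kb: float, writes_per_sec: int) -> int:
    # Writes are metered in 1 KB steps, rounded up
    return max(1, math.ceil(item_size_kb)) * writes_per_sec


print(required_rcu(6, 10))         # 20: a 6 KB item needs 2 RCU per strongly consistent read
print(required_rcu(6, 10, False))  # 10: eventually consistent reads cost half
print(required_wcu(2.5, 5))        # 15: a 2.5 KB item needs 3 WCU per write
```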

Querying and Scanning

Query

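A Query operation retrieves the items that share one partition key value, optionally narrowed by a condition on the sort key. Because it reads only that partition, it is the most efficient way to retrieve a set of related items (use GetItem when you need exactly one item by its full primary key).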

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# Query by partition key
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123')
)

# Query with sort key condition
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123') & Key('createdAt').between('2025-01-01', '2025-01-31')
)

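# Query with limit and pagination: fetch a page of 10, then pass its
# LastEvaluatedKey as ExclusiveStartKey (omit the parameter when there is
# no LastEvaluatedKey; boto3 rejects ExclusiveStartKey=None)
query_kwargs = {'KeyConditionExpression': Key('userId').eq('user_123'), 'Limit': 10}
first_page = table.query(**query_kwargs)
if 'LastEvaluatedKey' in first_page:
    second_page = table.query(ExclusiveStartKey=first_page['LastEvaluatedKey'], **query_kwargs)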
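Each Query call returns at most 1 MB of data, and `Limit` caps how many items DynamoDB evaluates per call rather than the total you receive, so reading a whole partition means looping until `LastEvaluatedKey` disappears. The generator below is a minimal sketch of that loop; `paginate` is a hypothetical helper that accepts any callable with the `table.query` / `table.scan` calling convention, which lets it run here against an in-memory fake instead of AWS.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(fetch_page: Callable[..., Dict[str, Any]], **request: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated Query or Scan.

    fetch_page takes keyword arguments and returns a dict with 'Items'
    and, when more pages remain, 'LastEvaluatedKey'.
    """
    kwargs = dict(request)
    while True:
        page = fetch_page(**kwargs)
        yield from page.get('Items', [])
        last_key = page.get('LastEvaluatedKey')
        if last_key is None:
            return
        # Resume after the last key evaluated on the previous page
        kwargs['ExclusiveStartKey'] = last_key


# In-memory fake standing in for table.query: five items served in pages of `Limit`
FAKE_ITEMS = [{'userId': 'user_123', 'createdAt': f'2025-01-0{d}'} for d in range(1, 6)]


def fake_query(Limit: int, ExclusiveStartKey: Optional[Dict[str, Any]] = None, **_: Any) -> Dict[str, Any]:
    start = 0 if ExclusiveStartKey is None else FAKE_ITEMS.index(ExclusiveStartKey) + 1
    page = FAKE_ITEMS[start:start + Limit]
    result: Dict[str, Any] = {'Items': page}
    if start + Limit < len(FAKE_ITEMS):
        result['LastEvaluatedKey'] = page[-1]
    return result


items = list(paginate(fake_query, Limit=2))
# Against DynamoDB (hypothetical usage):
#   items = list(paginate(table.query, KeyConditionExpression=Key('userId').eq('user_123')))
```

boto3's low-level client also provides built-in paginators through `get_paginator('query')`, but those yield items in the raw attribute-value format (`{'S': ...}`) rather than plain Python values.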

Scan

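A Scan operation reads every item in the table, returning at most 1 MB per call. It consumes capacity for everything it reads, so avoid it on large tables and in frequently run or latency-sensitive code paths; reserve it for small tables, exports, or one-off maintenance jobs.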

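from boto3.dynamodb.conditions import Attr

users = dynamodb.Table('Users')

# Scan all users (expensive! each call returns at most 1 MB; page with LastEvaluatedKey)
response = users.scan()

# Scan with filter (still reads all items, just returns fewer)
response = users.scan(
    FilterExpression=Attr('age').gte(30)
)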

Query vs. Scan

| Operation | Efficiency | Use When |
|-----------|------------|----------|
| Query | High (reads only the relevant partition) | You know the partition key |
| Scan | Low (reads every item) | You don't know the partition key (avoid if possible) |
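The difference in work is easiest to see by counting items read. The toy model below is a hypothetical in-memory stand-in, not how DynamoDB is implemented: it groups items by partition key, lets Query touch one group, and makes Scan read everything before applying a filter. Real capacity is charged by bytes read, so item counts are only a proxy.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]


class ToyTable:
    """In-memory stand-in that groups items by partition key value."""

    def __init__(self, partition_key: str, items: List[Item]) -> None:
        self.partitions: Dict[Any, List[Item]] = defaultdict(list)
        for item in items:
            self.partitions[item[partition_key]].append(item)

    def query(self, pk_value: Any) -> Tuple[List[Item], int]:
        # Query touches only the partition for pk_value; returns (items, items_read)
        matched = list(self.partitions.get(pk_value, []))
        return matched, len(matched)

    def scan(self, keep: Callable[[Item], bool] = lambda item: True) -> Tuple[List[Item], int]:
        # Scan reads every item; the filter only trims what is returned
        read = [item for items in self.partitions.values() for item in items]
        return [item for item in read if keep(item)], len(read)


# 100 users with 10 orders each
orders = [{'userId': f'user_{u}', 'orderId': n} for u in range(100) for n in range(10)]
toy = ToyTable('userId', orders)

query_items, query_read = toy.query('user_7')
scan_items, scan_read = toy.scan(lambda order: order['userId'] == 'user_7')
print(len(query_items), query_read)  # 10 10
print(len(scan_items), scan_read)    # 10 1000
```

Both calls return the same 10 orders, but the filtered Scan reads 100 times as many items to get them.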

Best Practices

| Practice | Reason |
|----------|--------|
| Design tables for your access patterns | DynamoDB has no joins, and efficient queries must go through a key or index; plan queries first |
| Use composite keys for time-series data | Partition key = entity ID, sort key = timestamp |
| Create GSIs for alternate access patterns | Query by attributes other than the primary key |
| Use sparse indexes | A GSI only includes items that have the indexed attribute, so it stays small |
| Avoid hot partitions | Distribute reads/writes evenly across partition key values |
| Use DynamoDB Accelerator (DAX) for caching | In-memory cache for read-heavy workloads |
| Enable auto scaling for provisioned mode | Adjusts capacity as sustained traffic changes; it reacts over minutes, so sudden spikes can still be throttled |
| Use TTL for automatic expiry | Deletes items after their expiry timestamp without consuming write capacity; deletion is asynchronous, not immediate |
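Two of these practices come down to how you shape key and attribute values. The sketch below shows write sharding, one common way to avoid a hot partition: a random suffix spreads writes for one busy logical key (here a date) across several partition key values, at the cost of readers querying every suffix. It also shows a TTL attribute, which must be a Number holding a Unix epoch time in seconds. `SHARD_COUNT`, the `#` separator, and the `expiresAt` attribute name are hypothetical choices.

```python
import random
import time
from typing import Any, Dict, List, Optional

SHARD_COUNT = 10  # hypothetical; size it to the write rate of the hot key


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT,
                rng: Optional[random.Random] = None) -> str:
    # A random suffix spreads writes for one logical key across shard_count partition key values
    chooser = rng if rng is not None else random
    return f'{base_key}#{chooser.randrange(shard_count)}'


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> List[str]:
    # Readers must query every suffix and merge the results
    return [f'{base_key}#{i}' for i in range(shard_count)]


def with_ttl(item: Dict[str, Any], ttl_seconds: int, attribute: str = 'expiresAt',
             now: Optional[float] = None) -> Dict[str, Any]:
    # TTL values are Unix epoch seconds; TTL itself is enabled on the table
    # for this attribute name (UpdateTimeToLive API)
    current = time.time() if now is None else now
    return {**item, attribute: int(current + ttl_seconds)}


print(sharded_key('2025-01-15'))        # random suffix in 0..9
print(all_shard_keys('2025-01-15', 3))  # ['2025-01-15#0', '2025-01-15#1', '2025-01-15#2']
print(with_ttl({'id': 'session_1'}, 3600, now=1_700_000_000))
# {'id': 'session_1', 'expiresAt': 1700003600}
```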

DynamoDB with Lambda

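import json
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')


def _json_default(value):
    # The resource API returns DynamoDB numbers as Decimal, which json.dumps cannot handle
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    raise TypeError(f'Object of type {type(value).__name__} is not JSON serializable')


def _response(status_code, body):
    return {'statusCode': status_code, 'body': json.dumps(body, default=_json_default)}


def _to_dynamodb(data):
    # boto3 rejects Python floats; a JSON round trip turns them into Decimal
    return json.loads(json.dumps(data), parse_float=Decimal)


def lambda_handler(event, context):
    action = event.get('action')

    if action == 'create':
        item = _to_dynamodb(event['item'])
        table.put_item(Item=item)  # silently overwrites an existing item with the same id
        return _response(201, item)

    elif action == 'get':
        response = table.get_item(Key={'id': event['id']})
        item = response.get('Item')
        if item is None:
            return _response(404, {'error': 'Not found'})
        return _response(200, item)

    elif action == 'update':
        # updates must not include the key attribute 'id'; a missing id creates a new item
        updates = _to_dynamodb(event['updates'])
        if not updates:
            return _response(400, {'error': 'No attributes to update'})
        # Placeholders (#n0/:v0, ...) avoid clashes with reserved words and special characters
        set_clauses, names, values = [], {}, {}
        for i, (attr, value) in enumerate(updates.items()):
            set_clauses.append(f'#n{i} = :v{i}')
            names[f'#n{i}'] = attr
            values[f':v{i}'] = value
        table.update_item(
            Key={'id': event['id']},
            UpdateExpression='SET ' + ', '.join(set_clauses),
            ExpressionAttributeNames=names,
            ExpressionAttributeValues=values,
        )
        return _response(200, {'message': 'Updated'})

    elif action == 'delete':
        table.delete_item(Key={'id': event['id']})
        return _response(200, {'message': 'Deleted'})

    elif action == 'query':
        # With a simple primary key ('id' only) this matches at most one item;
        # Query pays off on tables that also have a sort key
        response = table.query(
            KeyConditionExpression=Key('id').eq(event['id']),
            Limit=int(event.get('limit', 10)),
        )
        return _response(200, response['Items'])

    return _response(400, {'error': 'Unknown action'})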

Summary

DynamoDB is a powerful NoSQL database for serverless applications. Design your tables around your access patterns, use composite keys for time-series data, create GSIs for alternate queries, and avoid expensive scan operations. Always plan your data model before writing code.

Key takeaways:

- DynamoDB is a fully managed, schemaless NoSQL database
- Primary key: simple (partition key) or composite (partition key + sort key)
- Query is efficient (reads one partition); Scan is expensive (reads every item)
- Capacity: provisioned RCU/WCU for predictable workloads, on-demand for unpredictable ones
- GSIs enable alternate access patterns (with their own capacity in provisioned mode)
- Avoid hot partitions by distributing keys evenly
- Use TTL for automatic data expiry
- DAX provides in-memory caching for read-heavy workloads

What's Next: DynamoDB Streams & EventBridge

The next chapter covers DynamoDB Streams and EventBridge — building event-driven serverless applications with real-time data processing.

Capacity Mode Decision Guide

| Factor | Provisioned | On-Demand |
|--------|-------------|-----------|
| Traffic pattern | Predictable | Spiky or unknown |
| Cost optimization | Usually cheaper for steady, well-utilized traffic | More expensive for steady state |
| Management | Must set RCU/WCU and auto scaling | No capacity planning |
| Use case | Production with stable traffic | New apps, dev/test, unpredictable spikes |

Choose provisioned for cost savings on predictable workloads. Choose on-demand to avoid capacity planning for new or variable-traffic applications.
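The trade-off can be estimated with a small cost model. This is a sketch under stated assumptions: every price below is a made-up placeholder (substitute current rates for your region), each request is assumed to consume exactly one unit (1 KB writes, 4 KB strongly consistent reads), and a month is approximated as 730 hours. `target_utilization` models how much headroom you provision above average traffic.

```python
import math
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for monthly estimates
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical unit prices, not AWS's actual rates."""
    wcu_hour: float           # provisioned: price per WCU per hour
    rcu_hour: float           # provisioned: price per RCU per hour
    write_per_million: float  # on-demand: price per million write request units
    read_per_million: float   # on-demand: price per million read request units


def provisioned_monthly_cost(avg_writes: float, avg_reads: float,
                             target_utilization: float, prices: UnitPrices) -> float:
    # Provision enough capacity that average traffic sits at the target utilization
    wcu = math.ceil(avg_writes / target_utilization)
    rcu = math.ceil(avg_reads / target_utilization)
    return HOURS_PER_MONTH * (wcu * prices.wcu_hour + rcu * prices.rcu_hour)


def on_demand_monthly_cost(avg_writes: float, avg_reads: float, prices: UnitPrices) -> float:
    # Pay only for the requests actually made
    writes = avg_writes * SECONDS_PER_MONTH
    reads = avg_reads * SECONDS_PER_MONTH
    return (writes * prices.write_per_million + reads * prices.read_per_million) / 1_000_000


prices = UnitPrices(wcu_hour=0.001, rcu_hour=0.0002, write_per_million=1.0, read_per_million=0.2)
steady = provisioned_monthly_cost(10, 50, 0.5, prices)
pay_per_request = on_demand_monthly_cost(10, 50, prices)
print(round(steady, 2), round(pay_per_request, 2))  # 29.2 52.56
```

With these placeholder prices, steady traffic at 50% utilization favors provisioned mode; lowering `target_utilization` (more headroom for spikes) raises the provisioned cost and narrows the gap.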