DynamoDB — NoSQL Database for Serverless Applications

Why DynamoDB Matters

DynamoDB is AWS's fully managed NoSQL database service. It delivers single-digit millisecond performance at any scale, requires no server maintenance, and integrates directly with Lambda for serverless applications.

Why this matters for your career:

  • DynamoDB is a common default database choice for serverless applications on AWS
  • It powers some of the largest applications in the world (Netflix, Airbnb, Lyft)
  • Understanding DynamoDB design is essential for AWS Solutions Architect certification
  • DynamoDB's on-demand pricing (pay per request) matches the pay-for-what-you-use model of serverless

Core Concepts

Tables, Items, Attributes

Table: Users
  Item 1: { "id": "user_123", "name": "Alice",   "email": "alice@example.com",   "age": 30 }
  Item 2: { "id": "user_456", "name": "Bob",     "email": "bob@example.com",     "age": 25 }
  Item 3: { "id": "user_789", "name": "Charlie", "email": "charlie@example.com", "age": 35 }

- Table: Collection of items (like a SQL table)
- Item: A single record (like a SQL row)
- Attribute: A piece of data (like a SQL column)

DynamoDB is schemaless apart from the primary key: every item must include the table's key attributes, but all other attributes can differ from item to item.

Primary Keys

| Key Type | Components | Example | Use Case |
|----------|------------|---------|----------|
| Simple (Partition Key) | 1 attribute | id (partition key) | Simple lookup by ID |
| Composite (Partition Key + Sort Key) | 2 attributes | userId (partition) + timestamp (sort) | Time-series data, one-to-many |
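As a rough illustration of the composite key row above, the sketch below builds the arguments that `create_table` expects for a table keyed by `userId` and `createdAt`. The helper name `composite_key_table_definition`, the string key types, and the on-demand billing mode are assumptions made for this example, not requirements.

```python
def composite_key_table_definition(table_name, partition_key, sort_key):
    """Build keyword arguments for create_table with a composite primary key.

    Hypothetical helper for illustration; both keys are assumed to be strings.
    """
    return {
        "TableName": table_name,
        # HASH marks the partition key, RANGE marks the sort key
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ],
        # Only key attributes are declared up front; "S" means string
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
            {"AttributeName": sort_key, "AttributeType": "S"},
        ],
        # Assumption: on-demand billing, so no throughput settings are needed
        "BillingMode": "PAY_PER_REQUEST",
    }


definition = composite_key_table_definition("Orders", "userId", "createdAt")

# With AWS credentials configured, the table could then be created with:
#   import boto3
#   boto3.client("dynamodb").create_table(**definition)
```

Non-key attributes never appear in the definition, which is what makes the rest of the item schemaless.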

Secondary Indexes

| Index Type | Description | Consumed Capacity |
|------------|-------------|-------------------|
| Local Secondary Index (LSI) | Same partition key, different sort key | Shares the base table's capacity |
| Global Secondary Index (GSI) | Different partition key, optional sort key | Own throughput, separate from the base table |
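To make the GSI row more concrete, here is a minimal sketch of an index definition and of the arguments for querying it. The index name `email-index`, the `email` attribute, and both helper functions are hypothetical and chosen to match the Users example earlier in this chapter.

```python
def email_index_definition(index_name="email-index", attribute="email"):
    """One entry for the GlobalSecondaryIndexes list passed to create_table.

    Hypothetical helper; assumes an on-demand table, so no per-index
    throughput is specified.
    """
    return {
        "IndexName": index_name,
        # The index has its own partition key, independent of the table's
        "KeySchema": [{"AttributeName": attribute, "KeyType": "HASH"}],
        # Copy every attribute of each item into the index
        "Projection": {"ProjectionType": "ALL"},
    }


def gsi_query_kwargs(index_name, attribute, value):
    """Arguments for Table.query that target the index instead of the table."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "#attr = :value",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":value": value},
    }


index = email_index_definition()
kwargs = gsi_query_kwargs("email-index", "email", "alice@example.com")

# With a boto3 Table resource, the lookup would be: table.query(**kwargs)
```

The attribute used as an index key must also be listed in the table's `AttributeDefinitions`.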

Capacity Modes

| Mode | Description | Best For |
|------|-------------|----------|
| Provisioned | Specify RCU and WCU | Predictable workloads |
| On-Demand | Pay per request | Unpredictable workloads, new applications |

RCU and WCU

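  • 1 RCU (Read Capacity Unit) = 1 strongly consistent read/sec for an item up to 4 KB, or 2 eventually consistent reads/sec
  • 1 WCU (Write Capacity Unit) = 1 write/sec for an item up to 1 KB
  • Larger items consume additional units, rounded up to the next 4 KB (reads) or 1 KB (writes)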
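The definitions above can be turned into a small capacity estimate. This is a sketch of the arithmetic only; the function names are made up for illustration, and it ignores transactional operations and index writes.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate RCUs: each read costs one unit per 4 KB of item size, rounded up."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate WCUs: each write costs one unit per 1 KB of item size, rounded up."""
    return math.ceil(math.ceil(item_size_kb) * writes_per_second)


# 10 KB items read 100 times per second
strong = read_capacity_units(10, 100)                                # 3 units x 100 = 300
eventual = read_capacity_units(10, 100, strongly_consistent=False)   # half of 300 = 150

# 2.5 KB items written 50 times per second
writes = write_capacity_units(2.5, 50)                               # 3 units x 50 = 150
```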

Querying and Scanning

Query

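A Query operation finds items that share a partition key, optionally narrowed by a condition on the sort key. It is the most efficient way to retrieve a group of related items; to fetch a single item by its full primary key, use get_item.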

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# Query by partition key
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123')
)

# Query with sort key condition
response = table.query(
    KeyConditionExpression=Key('userId').eq('user_123') & Key('createdAt').between('2025-01-01', '2025-01-31')
)

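# Query with limit: fetch the first page of up to 10 items
query_args = {
    'KeyConditionExpression': Key('userId').eq('user_123'),
    'Limit': 10,
}
response = table.query(**query_args)

# Pagination: LastEvaluatedKey is present only when more items remain,
# so pass ExclusiveStartKey only in that case (boto3 rejects None)
if 'LastEvaluatedKey' in response:
    response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **query_args)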
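To read every page rather than one, the pagination step can be wrapped in a loop. The generator below is a minimal sketch: `iter_user_orders` is a hypothetical name, `table` is assumed to be a boto3 Table resource (or any object with a compatible `query` method), and the key condition is written as a plain string expression so the snippet has no imports.

```python
def iter_user_orders(table, user_id, page_size=10):
    """Yield every order for one user, following LastEvaluatedKey across pages."""
    kwargs = {
        "KeyConditionExpression": "userId = :uid",
        "ExpressionAttributeValues": {":uid": user_id},
        "Limit": page_size,
    }
    while True:
        response = table.query(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No key returned means the final page has been read
            break
        # Resume the next query right after the last item of this page
        kwargs["ExclusiveStartKey"] = last_key


# Usage with a real table (requires AWS credentials):
#   import boto3
#   table = boto3.resource("dynamodb").Table("Orders")
#   for order in iter_user_orders(table, "user_123"):
#       print(order)
```

Because it is a generator, callers that stop early never trigger the remaining page requests.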

Scan

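A Scan operation examines every item in the table and consumes read capacity for all of them, regardless of how many it returns. It is expensive on large tables and should be avoided on production request paths.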

from boto3.dynamodb.conditions import Attr

table = dynamodb.Table('Users')

# Scan all users (expensive!)
response = table.scan()

# Scan with filter (still reads all items, just returns fewer)
response = table.scan(
    FilterExpression=Attr('age').gte(30)
)
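One way to see that a filter does not reduce the work done is to compare the `Count` and `ScannedCount` fields of a scan response. The helper below is a hypothetical sketch, and the sample response is hand-written to mirror the three-item Users table from earlier rather than captured from a real call.

```python
def scan_efficiency(response):
    """Return the fraction of examined items that the scan actually returned."""
    scanned = response.get("ScannedCount", 0)
    returned = response.get("Count", 0)
    return returned / scanned if scanned else 0.0


# Hand-written example: filtering the three sample users on age >= 30
sample_response = {
    "Items": [
        {"id": "user_123", "name": "Alice", "age": 30},
        {"id": "user_789", "name": "Charlie", "age": 35},
    ],
    "Count": 2,          # items that passed the filter
    "ScannedCount": 3,   # items read (and billed) before filtering
}

efficiency = scan_efficiency(sample_response)
```

A ratio far below 1 on a large table is a sign that the access pattern would be better served by a key or an index.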

Query vs. Scan

| Operation | Efficiency | Use When |
|-----------|------------|----------|
| Query | High (reads only relevant partition) | You know the partition key |
| Scan | Low (reads every item) | You don't know the partition key (avoid if possible) |

Best Practices

| Practice | Reason |
|----------|--------|
| Design tables for your access patterns | DynamoDB is not flexible like SQL; plan queries first |
| Use composite keys for time-series data | Partition key = entity ID, sort key = timestamp |
| Create GSIs for alternate access patterns | Query by different attributes |
| Use sparse indexes | GSI only includes items with the indexed attribute |
| Avoid hot partitions | Distribute reads/writes evenly across partition keys |
| Use DynamoDB Accelerator (DAX) for caching | In-memory cache for read-heavy workloads |
| Enable auto-scaling for provisioned mode | Handle traffic spikes without manual intervention |
| Use TTL for automatic expiry | Automatically delete expired items |
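Two of the practices in the table lend themselves to a short sketch: spreading writes for a busy key across several partitions, and computing a TTL value. Both helpers are hypothetical. The shard count of 10 and the `#` separator are arbitrary choices for illustration. The TTL helper relies on the fact that DynamoDB expects the TTL attribute to be a number holding a Unix timestamp in seconds.

```python
import hashlib
import time


def sharded_partition_key(base_key, item_id, shard_count=10):
    """Append a stable shard suffix so one busy key spreads over several partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"


def ttl_timestamp(days_from_now, now=None):
    """Return an expiry time as Unix epoch seconds, the format TTL expects."""
    now = time.time() if now is None else now
    return int(now + days_from_now * 86400)


# The same item always maps to the same shard
key = sharded_partition_key("events-2025-01-15", "order_42")

# An item that should expire one day after a fixed reference time
expires_at = ttl_timestamp(1, now=1_700_000_000)
```

The trade-off with sharded keys is on the read side: fetching everything for the base key means querying each shard suffix and merging the results.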

DynamoDB with Lambda

import json
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

def lambda_handler(event, context):
    action = event.get('action')

    if action == 'create':
        item = event['item']
        table.put_item(Item=item)
        return {'statusCode': 201, 'body': json.dumps(item)}

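    elif action == 'get':
        key = {'id': event['id']}
        response = table.get_item(Key=key)
        item = response.get('Item')
        if item is None:
            return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}
        # boto3 returns numbers as Decimal, which json cannot encode; default=str converts them
        return {'statusCode': 200, 'body': json.dumps(item, default=str)}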

    elif action == 'update':
        key = {'id': event['id']}
        updates = event['updates']
        # Index-based placeholders stay valid even when an attribute name
        # is a reserved word or contains special characters
        assignments = []
        attr_names = {}
        attr_values = {}
        for i, (k, v) in enumerate(updates.items()):
            assignments.append(f'#attr{i} = :val{i}')
            attr_names[f'#attr{i}'] = k
            attr_values[f':val{i}'] = v

        table.update_item(
            Key=key,
            UpdateExpression='SET ' + ', '.join(assignments),
            ExpressionAttributeNames=attr_names,
            ExpressionAttributeValues=attr_values
        )
        return {'statusCode': 200, 'body': json.dumps({'message': 'Updated'})}

    elif action == 'delete':
        key = {'id': event['id']}
        table.delete_item(Key=key)
        return {'statusCode': 200, 'body': json.dumps({'message': 'Deleted'})}

    elif action == 'query':
        response = table.query(
            KeyConditionExpression=Key('id').eq(event['id']),
            Limit=event.get('limit', 10)
        )
        # default=str handles the Decimal values boto3 returns for numbers
        return {'statusCode': 200, 'body': json.dumps(response['Items'], default=str)}

    return {'statusCode': 400, 'body': json.dumps({'error': 'Unknown action'})}
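The expression building in the update branch is the easiest part of the handler to get wrong, and it can be checked without any AWS access by isolating it in a helper. The function below is a sketch with a hypothetical name, `build_update_expression`. It shows one reasonable placeholder scheme and is not the only way to structure this.

```python
def build_update_expression(updates):
    """Turn {'attribute': value} pairs into the three update_item arguments.

    Returns (expression, attribute_names, attribute_values).
    """
    if not updates:
        raise ValueError("updates must contain at least one attribute")

    names, values, assignments = {}, {}, []
    for i, (attribute, value) in enumerate(updates.items()):
        # "#n0" stands in for the attribute name, ":v0" for its new value
        names[f"#n{i}"] = attribute
        values[f":v{i}"] = value
        assignments.append(f"#n{i} = :v{i}")
    return "SET " + ", ".join(assignments), names, values


expression, names, values = build_update_expression({"name": "Alice", "age": 31})

# With a boto3 Table resource, these would be passed as:
#   table.update_item(
#       Key={"id": "user_123"},
#       UpdateExpression=expression,
#       ExpressionAttributeNames=names,
#       ExpressionAttributeValues=values,
#   )
```

Here `name` is a DynamoDB reserved word, so writing it directly into the expression would be rejected, which is why the name placeholders matter.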

Summary

DynamoDB is a powerful NoSQL database for serverless applications. Design your tables around your access patterns, use composite keys for time-series data, create GSIs for alternate queries, and avoid expensive scan operations. Always plan your data model before writing code.

Key takeaways:

  • DynamoDB is a fully managed, schemaless NoSQL database
  • Primary key: simple (partition key) or composite (partition + sort key)
  • Query is efficient (uses partition key); Scan is expensive (reads everything)
  • Capacity modes: provisioned (set RCU/WCU, predictable traffic) vs. on-demand (pay per request, unpredictable traffic)
  • GSIs enable alternate access patterns (own capacity)
  • Avoid hot partitions — distribute keys evenly
  • Use TTL for automatic data expiry
  • DAX provides in-memory caching for read-heavy workloads

What's Next: DynamoDB Streams & EventBridge

The next chapter covers DynamoDB Streams and EventBridge — building event-driven serverless applications with real-time data processing.

Capacity Mode Decision Guide

| Factor | Provisioned | On-Demand |
|--------|-------------|-----------|
| Traffic pattern | Predictable | Spiky or unknown |
| Cost optimization | Usually cheaper for steady, well-utilized traffic, especially with auto-scaling | More expensive for steady state |
| Management | Must set RCU/WCU and auto-scaling | Zero management |
| Use case | Production with stable traffic | New apps, dev/test, unpredictable spikes |

Choose provisioned for cost savings on predictable workloads. Choose on-demand to avoid capacity planning for new or variable-traffic applications.
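The comparison can be made concrete with a back-of-the-envelope calculation. The sketch below takes every price as a parameter, and the figures in the usage lines are placeholders, not AWS prices; substitute the current rates for your region. It also simplifies by treating each read or write as one request unit.

```python
def monthly_provisioned_cost(rcu, wcu, rcu_hour_price, wcu_hour_price, hours=730):
    """Cost of holding fixed capacity for a month (about 730 hours)."""
    return hours * (rcu * rcu_hour_price + wcu * wcu_hour_price)


def monthly_on_demand_cost(reads, writes, price_per_million_reads, price_per_million_writes):
    """Cost of paying per request, priced per million request units."""
    return (reads / 1_000_000 * price_per_million_reads
            + writes / 1_000_000 * price_per_million_writes)


# Placeholder prices for illustration only
provisioned = monthly_provisioned_cost(
    rcu=100, wcu=50, rcu_hour_price=0.0001, wcu_hour_price=0.0005
)
on_demand = monthly_on_demand_cost(
    reads=200_000_000, writes=100_000_000,
    price_per_million_reads=0.25, price_per_million_writes=1.25,
)
cheaper = "provisioned" if provisioned < on_demand else "on-demand"
```

Running the same numbers for a low-traffic month shows the other side of the trade-off: on-demand cost falls with the request count, while provisioned cost stays fixed.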
