Data Structures — Arrays, Lists, Trees, Hash Tables
Why Data Structures Matter
Data structures are the building blocks of all software. Choosing the right data structure can make the difference between a program that runs in seconds and one that takes hours. Every serious software engineer must understand the strengths and weaknesses of each data structure.
Why this matters for your career:
- Data structure questions dominate technical interviews at every major tech company
- Choosing the right data structure is the first step in algorithm design
- Performance-critical applications depend on optimal data structure selection
- Understanding data structures helps you use standard library containers effectively
Big O Time Complexity Reference
| Data Structure | Access | Search | Insert | Delete | |---------------|--------|--------|--------|--------| | Array | O(1) | O(n) | O(n) | O(n) | | Stack | O(n) | O(n) | O(1)* | O(1)* | | Queue | O(n) | O(n) | O(1)* | O(1)* | | Singly Linked List | O(n) | O(n) | O(1) | O(1) | | Doubly Linked List | O(n) | O(n) | O(1) | O(1) | | Hash Table | O(1) avg | O(1) avg | O(1) avg | O(1) avg | | Binary Search Tree | O(log n) avg | O(log n) avg | O(log n) avg | O(log n) avg | | Balanced BST (AVL) | O(log n) | O(log n) | O(log n) | O(log n) | | Heap | O(1) min | O(n) | O(log n) | O(log n) | | Trie | O(k) | O(k) | O(k) | O(k) | | Adjacency List (Graph) | O(1) | O(V+E) | O(1) | O(V+E) |
*Stack and Queue insert/delete are at the top/front only.
Arrays
Arrays store elements in contiguous memory locations. They provide O(1) random access by index but O(n) insertion and deletion.
# Static array (fixed size)
arr = [0] * 10 # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
arr[0] = 42 # O(1) access
# Dynamic array (like Python list)
dynamic = [1, 2, 3]
dynamic.append(4) # O(1) amortized
dynamic.insert(0, 0) # O(n) — shifts all elements
dynamic.pop() # O(1)
dynamic.pop(0) # O(n) — shifts all elements
Use when: You need fast index-based access, know the size in advance, or want memory locality (cache-friendly).
Linked Lists
Linked lists store elements in nodes, each pointing to the next. They provide O(1) insertion and deletion but O(n) access.
class ListNode:
def __init__(self, val=0, next=None):
self.val = val
self.next = next
# Create: 1 → 2 → 3 → None
head = ListNode(1)
head.next = ListNode(2)
head.next.next = ListNode(3)
# Insert 0 at front: O(1)
new_head = ListNode(0)
new_head.next = head
head = new_head
# Now: 0 → 1 → 2 → 3 → None
# Delete 1: O(1) if we have reference to previous node
head.next = head.next.next
# Now: 0 → 2 → 3 → None
Use when: You need frequent insertions/deletions, don't need random access, or are implementing queues/stacks.
Stacks and Queues
Stack (LIFO)
stack = []
stack.append(1) # push
stack.append(2) # push
stack.append(3) # push
print(stack.pop()) # → 3
print(stack.pop()) # → 2
print(stack.pop()) # → 1
Use for: Undo/redo, function call stack, expression evaluation, DFS traversal.
Queue (FIFO)
from collections import deque
queue = deque()
queue.append(1) # enqueue (right)
queue.append(2) # enqueue
queue.append(3) # enqueue
print(queue.popleft()) # → 1 (dequeue from left)
print(queue.popleft()) # → 2
print(queue.popleft()) # → 3
Use for: Task scheduling, BFS traversal, message queues, print spooling.
Hash Tables
Hash tables map keys to values using a hash function. They provide average O(1) lookups.
hash_table = {}
hash_table["name"] = "Alice" # Insert O(1) avg
hash_table["age"] = 30 # Insert O(1) avg
print(hash_table["name"]) # Lookup O(1) avg → "Alice"
del hash_table["age"] # Delete O(1) avg
# Handle collisions (Python uses open addressing)
# Good hash functions minimize collisions
# Load factor = entries / buckets
Use when: You need fast lookups, de-duplication, counting, or caching.
Trees
Binary Search Tree (BST)
class TreeNode:
def __init__(self, val):
self.val = val
self.left = None
self.right = None
def insert(root, val):
if root is None:
return TreeNode(val)
if val < root.val:
root.left = insert(root.left, val)
else:
root.right = insert(root.right, val)
return root
def search(root, val):
if root is None or root.val == val:
return root
if val < root.val:
return search(root.left, val)
return search(root.right, val)
# Traversals
def inorder(root):
if root:
inorder(root.left)
print(root.val)
inorder(root.right)
Use when: You need sorted data, range queries, or hierarchical relationships.
Heaps
A heap is a complete binary tree where each parent is smaller (min-heap) or larger (max-heap) than its children.
import heapq
# Min-heap (default)
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
print(heapq.heappop(heap)) # → 1 (smallest)
print(heapq.heappop(heap)) # → 3
print(heapq.heappop(heap)) # → 5
# Max-heap (by negating values)
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -1)
heapq.heappush(max_heap, -3)
print(-heapq.heappop(max_heap)) # → 5 (largest)
Use for: Priority queues, scheduling, Dijkstra's algorithm, median finding.
Graphs
Graphs consist of vertices (nodes) and edges (connections).
# Adjacency list representation
graph = {
'A': ['B', 'C'],
'B': ['A', 'D', 'E'],
'C': ['A', 'F'],
'D': ['B'],
'E': ['B', 'F'],
'F': ['C', 'E']
}
def bfs(graph, start):
visited = set()
queue = deque([start])
while queue:
node = queue.popleft()
if node not in visited:
visited.add(node)
queue.extend(neighbor for neighbor in graph[node]
if neighbor not in visited)
return visited
def dfs(graph, start, visited=None):
if visited is None:
visited = set()
visited.add(start)
for neighbor in graph[start]:
if neighbor not in visited:
dfs(graph, neighbor, visited)
return visited
Use for: Social networks, maps, dependency resolution, network routing.
How to Choose a Data Structure
| Requirement | Recommended Data Structure | |-------------|---------------------------| | Fast index-based access | Array | | Fast insert/delete at front | Linked List | | LIFO access | Stack | | FIFO access | Queue | | Fast key-value lookup | Hash Table | | Sorted data with range queries | Balanced BST (e.g., AVL, Red-Black) | | Always need min/max element | Heap | | Hierarchical data | Tree | | Relationships between items | Graph | | String prefix search | Trie |
Summary
Data structures are the foundation of efficient algorithm design. Arrays give fast access but slow inserts. Linked lists give fast inserts but slow access. Hash tables give fast lookups. Trees maintain order. Heaps prioritize extremes. Graphs model relationships. Choose the right tool for your specific performance requirements.
Key takeaways:
- Arrays: O(1) access, O(n) insert/delete — use for known-size, index-heavy workloads
- Linked Lists: O(1) insert/delete (given position), O(n) search — use for frequent modifications
- Hash Tables: O(1) average for all operations — use for lookups and de-duplication
- BSTs: O(log n) average — use for sorted data and range queries
- Heaps: O(log n) min/max extraction — use for priority queues
- Graphs: model arbitrary relationships — use for networks, maps, dependencies
- The right data structure can turn O(n²) into O(n log n) or O(n)
What's Next: Algorithm Basics
The next chapter covers algorithm fundamentals — sorting, searching, recursion, and the principles of algorithm design and analysis.