Data Structures — Arrays, Lists, Trees, Hash Tables

Why Data Structures Matter

Data structures are the building blocks of all software. Choosing the right data structure can make the difference between a program that runs in seconds and one that takes hours. Every serious software engineer must understand the strengths and weaknesses of each data structure.

Why this matters for your career:

Data structure questions dominate technical interviews at every major tech company
Choosing the right data structure is the first step in algorithm design
Performance-critical applications depend on optimal data structure selection
Understanding data structures helps you use standard library containers effectively

Big O Time Complexity Reference

| Data Structure | Access | Search | Insert | Delete | |---------------|--------|--------|--------|--------| | Array | O(1) | O(n) | O(n) | O(n) | | Stack | O(n) | O(n) | O(1)* | O(1)* | | Queue | O(n) | O(n) | O(1)* | O(1)* | | Singly Linked List | O(n) | O(n) | O(1) | O(1) | | Doubly Linked List | O(n) | O(n) | O(1) | O(1) | | Hash Table | O(1) avg | O(1) avg | O(1) avg | O(1) avg | | Binary Search Tree | O(log n) avg | O(log n) avg | O(log n) avg | O(log n) avg | | Balanced BST (AVL) | O(log n) | O(log n) | O(log n) | O(log n) | | Heap | O(1) min | O(n) | O(log n) | O(log n) | | Trie | O(k) | O(k) | O(k) | O(k) | | Adjacency List (Graph) | O(1) | O(V+E) | O(1) | O(V+E) |

*Stack and Queue insert/delete are at the top/front only.

Arrays

Arrays store elements in contiguous memory locations. They provide O(1) random access by index but O(n) insertion and deletion.

# Static array (fixed size)
arr = [0] * 10  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
arr[0] = 42     # O(1) access

# Dynamic array (like Python list)
dynamic = [1, 2, 3]
dynamic.append(4)   # O(1) amortized
dynamic.insert(0, 0)  # O(n) — shifts all elements
dynamic.pop()      # O(1)
dynamic.pop(0)     # O(n) — shifts all elements

Use when: You need fast index-based access, know the size in advance, or want memory locality (cache-friendly).

Linked Lists

Linked lists store elements in nodes, each pointing to the next. They provide O(1) insertion and deletion but O(n) access.

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

# Create: 1 → 2 → 3 → None
head = ListNode(1)
head.next = ListNode(2)
head.next.next = ListNode(3)

# Insert 0 at front: O(1)
new_head = ListNode(0)
new_head.next = head
head = new_head
# Now: 0 → 1 → 2 → 3 → None

# Delete 1: O(1) if we have reference to previous node
head.next = head.next.next
# Now: 0 → 2 → 3 → None

Use when: You need frequent insertions/deletions, don't need random access, or are implementing queues/stacks.

Stacks and Queues

Stack (LIFO)

stack = []
stack.append(1)  # push
stack.append(2)  # push
stack.append(3)  # push
print(stack.pop())  # → 3
print(stack.pop())  # → 2
print(stack.pop())  # → 1

Use for: Undo/redo, function call stack, expression evaluation, DFS traversal.

Queue (FIFO)

from collections import deque

queue = deque()
queue.append(1)    # enqueue (right)
queue.append(2)    # enqueue
queue.append(3)    # enqueue
print(queue.popleft())  # → 1 (dequeue from left)
print(queue.popleft())  # → 2
print(queue.popleft())  # → 3

Use for: Task scheduling, BFS traversal, message queues, print spooling.

Hash Tables

Hash tables map keys to values using a hash function. They provide average O(1) lookups.

hash_table = {}
hash_table["name"] = "Alice"    # Insert O(1) avg
hash_table["age"] = 30          # Insert O(1) avg
print(hash_table["name"])        # Lookup O(1) avg → "Alice"
del hash_table["age"]            # Delete O(1) avg

# Handle collisions (Python uses open addressing)
# Good hash functions minimize collisions
# Load factor = entries / buckets

Use when: You need fast lookups, de-duplication, counting, or caching.

Trees

Binary Search Tree (BST)

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    if root is None:
        return TreeNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root

def search(root, val):
    if root is None or root.val == val:
        return root
    if val < root.val:
        return search(root.left, val)
    return search(root.right, val)

# Traversals
def inorder(root):
    if root:
        inorder(root.left)
        print(root.val)
        inorder(root.right)

Use when: You need sorted data, range queries, or hierarchical relationships.

Heaps

A heap is a complete binary tree where each parent is smaller (min-heap) or larger (max-heap) than its children.

import heapq

# Min-heap (default)
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
print(heapq.heappop(heap))  # → 1 (smallest)
print(heapq.heappop(heap))  # → 3
print(heapq.heappop(heap))  # → 5

# Max-heap (by negating values)
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -1)
heapq.heappush(max_heap, -3)
print(-heapq.heappop(max_heap))  # → 5 (largest)

Use for: Priority queues, scheduling, Dijkstra's algorithm, median finding.

Graphs

Graphs consist of vertices (nodes) and edges (connections).

# Adjacency list representation
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            queue.extend(neighbor for neighbor in graph[node]
                         if neighbor not in visited)
    return visited

def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited

Use for: Social networks, maps, dependency resolution, network routing.

How to Choose a Data Structure

| Requirement | Recommended Data Structure | |-------------|---------------------------| | Fast index-based access | Array | | Fast insert/delete at front | Linked List | | LIFO access | Stack | | FIFO access | Queue | | Fast key-value lookup | Hash Table | | Sorted data with range queries | Balanced BST (e.g., AVL, Red-Black) | | Always need min/max element | Heap | | Hierarchical data | Tree | | Relationships between items | Graph | | String prefix search | Trie |

Summary

Data structures are the foundation of efficient algorithm design. Arrays give fast access but slow inserts. Linked lists give fast inserts but slow access. Hash tables give fast lookups. Trees maintain order. Heaps prioritize extremes. Graphs model relationships. Choose the right tool for your specific performance requirements.

Key takeaways:

Arrays: O(1) access, O(n) insert/delete — use for known-size, index-heavy workloads
Linked Lists: O(1) insert/delete (given position), O(n) search — use for frequent modifications
Hash Tables: O(1) average for all operations — use for lookups and de-duplication
BSTs: O(log n) average — use for sorted data and range queries
Heaps: O(log n) min/max extraction — use for priority queues
Graphs: model arbitrary relationships — use for networks, maps, dependencies
The right data structure can turn O(n²) into O(n log n) or O(n)

What's Next: Algorithm Basics

The next chapter covers algorithm fundamentals — sorting, searching, recursion, and the principles of algorithm design and analysis.