Huffman Coding

🔥 Vibe Prompt

"Implement Huffman coding for 'hello world'. Output the code table and compression ratio."

import heapq
from collections import Counter

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None
    def __lt__(self, other):
        return self.freq < other.freq

def huffman_encoding(text):
    freq = Counter(text)
    heap = [HuffmanNode(c, f) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)
    codes = {}
    def build_code(node, prefix=""):
        if node.char:
            codes[node.char] = prefix
        else:
            build_code(node.left, prefix + "0")
            build_code(node.right, prefix + "1")
    build_code(heap[0])
    encoded = ''.join(codes[c] for c in text)
    return codes, encoded, len(encoded) / (len(text) * 8)

codes, encoded, ratio = huffman_encoding("hello world")
print(f"Codes: {codes}")
print(f"Compression ratio: {ratio:.1%}")

Chapter Summary

Understand core concepts and principles
Master implementation methods and techniques
Familiar with common issues and solutions
Able to apply in real projects

Implementation Example

Basic Example

# This section provides a complete implementation example

Steps

Setup: Configure development environment
Data: Prepare required data
Implementation: Build core functionality
Testing: Verify correctness
Optimization: Improve performance

Common Errors

| Error Type | Cause | Solution | |------------|-------|----------| | Compilation | Syntax | Check code syntax | | Runtime | Environment | Verify dependencies installed | | Logic | Algorithm | Step-by-step debugging | | Performance | Efficiency | Use profilers |

Code Example

import sys

def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()

References

Official documentation
API reference
Open source examples
Community discussions

Huffman Coding Algorithm

Huffman coding assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes, achieving compression.

Algorithm Steps

Count the frequency of each character
Create a leaf node for each character with its frequency
Build a min-heap of all leaf nodes
While more than one node remains:
- Extract the two nodes with smallest frequencies
- Create a new internal node with frequency = sum of both
- Make the two nodes its left and right children
- Insert the new node into the heap
The remaining node is the root of the Huffman tree
Assign codes by traversing the tree (left = 0, right = 1)

Implementation

import heapq
from collections import defaultdict, Counter

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq


def build_huffman_tree(text):
    """Build Huffman tree from input text."""
    freq = Counter(text)
    heap = [HuffmanNode(char, f) for char, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        internal = HuffmanNode(None, left.freq + right.freq)
        internal.left = left
        internal.right = right
        heapq.heappush(heap, internal)

    return heap[0] if heap else None


def assign_codes(node, code="", codes=None):
    """Assign binary codes by traversing the Huffman tree."""
    if codes is None:
        codes = {}
    if node is None:
        return codes
    if node.char is not None:  # Leaf node
        codes[node.char] = code or "0"
        return codes
    assign_codes(node.left, code + "0", codes)
    assign_codes(node.right, code + "1", codes)
    return codes


def huffman_encode(text):
    """Encode text using Huffman coding."""
    root = build_huffman_tree(text)
    codes = assign_codes(root)
    encoded = "".join(codes[ch] for ch in text)
    return encoded, codes, root


# Example
text = "this is an example for huffman encoding"
encoded, codes, root = huffman_encode(text)

print(f"Original: {len(text) * 8} bits")
print(f"Encoded: {len(encoded)} bits")
print(f"Compression ratio: {len(encoded) / (len(text) * 8) * 100:.1f}%")
print(f"\nCharacter codes:")
for char, code in sorted(codes.items(), key=lambda x: len(x[1])):
    print(f"  '{char}': {code}")

Decoding

def huffman_decode(encoded, root):
    """Decode a Huffman-encoded string back to text."""
    if root is None:
        return ""
    decoded = []
    node = root
    for bit in encoded:
        node = node.left if bit == "0" else node.right
        if node.char is not None:  # Leaf reached
            decoded.append(node.char)
            node = root
    return "".join(decoded)

# Verify
original = "this is an example for huffman encoding"
encoded, codes, root = huffman_encode(original)
decoded = huffman_decode(encoded, root)
print(f"Original: {original}")
print(f"Decoded:  {decoded}")
print(f"Match: {original == decoded}")

Compression Analysis

| Metric | Value | |--------|-------| | Original text length | 36 characters | | Original bits (ASCII) | 288 bits | | Huffman encoded bits | ~120 bits (varies by frequencies) | | Compression ratio | ~58% reduction | | Optimality | Huffman is optimal for symbol-by-symbol coding |

Applications

| Application | Description | |-------------|-------------| | JPEG | Huffman coding for compressed image data | | PNG | Deflate algorithm (uses Huffman) | | MP3 | Huffman coding for audio compression | | ZIP | Deflate compression (LZ77 + Huffman) | | Gzip | File compression (uses Huffman) |

Time Complexity

| Operation | Complexity | |-----------|-----------| | Frequency count | O(n) where n = text length | | Building heap | O(k) where k = unique characters | | Building tree | O(k log k) | | Assigning codes | O(k) tree traversal | | Encoding | O(n) | | Decoding | O(n) |

Summary

Huffman coding is a greedy algorithm for optimal data compression. It assigns shorter binary codes to more frequent characters. The Huffman tree is built bottom-up by repeatedly merging the two least frequent nodes. Huffman is optimal for symbol-by-symbol coding with known frequencies.

Key takeaways:

Huffman = variable-length codes based on frequency
More frequent characters = shorter codes
Build tree by merging two smallest frequency nodes
Left edge = 0, Right edge = 1
Huffman is optimal (minimum weighted path length)
Used in JPEG, PNG, MP3, ZIP, Gzip
Time: O(n + k log k) where n = text length, k = unique chars
Decoding requires the Huffman tree (or code table)

What's Next: Set Cover

The next chapter covers the Set Cover problem — a greedy approximation algorithm for covering all elements with minimum subsets.