Huffman Coding
๐ฅ Vibe Prompt
"Implement Huffman coding for 'hello world'. Output the code table and compression ratio."
import heapq
from collections import Counter
class HuffmanNode:
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __lt__(self, other):
return self.freq < other.freq
def huffman_encoding(text):
freq = Counter(text)
heap = [HuffmanNode(c, f) for c, f in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
merged = HuffmanNode(None, left.freq + right.freq)
merged.left = left
merged.right = right
heapq.heappush(heap, merged)
codes = {}
def build_code(node, prefix=""):
if node.char:
codes[node.char] = prefix
else:
build_code(node.left, prefix + "0")
build_code(node.right, prefix + "1")
build_code(heap[0])
encoded = ''.join(codes[c] for c in text)
return codes, encoded, len(encoded) / (len(text) * 8)
codes, encoded, ratio = huffman_encoding("hello world")
print(f"Codes: {codes}")
print(f"Compression ratio: {ratio:.1%}")
Chapter Summary
- Understand core concepts and principles
- Master implementation methods and techniques
- Familiar with common issues and solutions
- Able to apply in real projects
Further Reading
- Official documentation and API references
- Open source examples on GitHub
- Technical books and online courses
- Community discussions and tech blogs
Implementation Example
Basic Example
# This section provides a complete implementation example
Steps
- Setup: Configure development environment
- Data: Prepare required data
- Implementation: Build core functionality
- Testing: Verify correctness
- Optimization: Improve performance
Common Errors
| Error Type | Cause | Solution | |------------|-------|----------| | Compilation | Syntax | Check code syntax | | Runtime | Environment | Verify dependencies installed | | Logic | Algorithm | Step-by-step debugging | | Performance | Efficiency | Use profilers |
Code Example
import sys
def main():
print("Hello, World!")
if __name__ == "__main__":
main()
References
- Official documentation
- API reference
- Open source examples
- Community discussions
Huffman Coding Algorithm
Huffman coding assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes, achieving compression.
Algorithm Steps
- Count the frequency of each character
- Create a leaf node for each character with its frequency
- Build a min-heap of all leaf nodes
- While more than one node remains:
- Extract the two nodes with smallest frequencies
- Create a new internal node with frequency = sum of both
- Make the two nodes its left and right children
- Insert the new node into the heap
- The remaining node is the root of the Huffman tree
- Assign codes by traversing the tree (left = 0, right = 1)
Implementation
import heapq
from collections import defaultdict, Counter
class HuffmanNode:
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __lt__(self, other):
return self.freq < other.freq
def build_huffman_tree(text):
"""Build Huffman tree from input text."""
freq = Counter(text)
heap = [HuffmanNode(char, f) for char, f in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
internal = HuffmanNode(None, left.freq + right.freq)
internal.left = left
internal.right = right
heapq.heappush(heap, internal)
return heap[0] if heap else None
def assign_codes(node, code="", codes=None):
"""Assign binary codes by traversing the Huffman tree."""
if codes is None:
codes = {}
if node is None:
return codes
if node.char is not None: # Leaf node
codes[node.char] = code or "0"
return codes
assign_codes(node.left, code + "0", codes)
assign_codes(node.right, code + "1", codes)
return codes
def huffman_encode(text):
"""Encode text using Huffman coding."""
root = build_huffman_tree(text)
codes = assign_codes(root)
encoded = "".join(codes[ch] for ch in text)
return encoded, codes, root
# Example
text = "this is an example for huffman encoding"
encoded, codes, root = huffman_encode(text)
print(f"Original: {len(text) * 8} bits")
print(f"Encoded: {len(encoded)} bits")
print(f"Compression ratio: {len(encoded) / (len(text) * 8) * 100:.1f}%")
print(f"\nCharacter codes:")
for char, code in sorted(codes.items(), key=lambda x: len(x[1])):
print(f" '{char}': {code}")
Decoding
def huffman_decode(encoded, root):
"""Decode a Huffman-encoded string back to text."""
if root is None:
return ""
decoded = []
node = root
for bit in encoded:
node = node.left if bit == "0" else node.right
if node.char is not None: # Leaf reached
decoded.append(node.char)
node = root
return "".join(decoded)
# Verify
original = "this is an example for huffman encoding"
encoded, codes, root = huffman_encode(original)
decoded = huffman_decode(encoded, root)
print(f"Original: {original}")
print(f"Decoded: {decoded}")
print(f"Match: {original == decoded}")
Compression Analysis
| Metric | Value | |--------|-------| | Original text length | 36 characters | | Original bits (ASCII) | 288 bits | | Huffman encoded bits | ~120 bits (varies by frequencies) | | Compression ratio | ~58% reduction | | Optimality | Huffman is optimal for symbol-by-symbol coding |
Applications
| Application | Description | |-------------|-------------| | JPEG | Huffman coding for compressed image data | | PNG | Deflate algorithm (uses Huffman) | | MP3 | Huffman coding for audio compression | | ZIP | Deflate compression (LZ77 + Huffman) | | Gzip | File compression (uses Huffman) |
Time Complexity
| Operation | Complexity | |-----------|-----------| | Frequency count | O(n) where n = text length | | Building heap | O(k) where k = unique characters | | Building tree | O(k log k) | | Assigning codes | O(k) tree traversal | | Encoding | O(n) | | Decoding | O(n) |
Summary
Huffman coding is a greedy algorithm for optimal data compression. It assigns shorter binary codes to more frequent characters. The Huffman tree is built bottom-up by repeatedly merging the two least frequent nodes. Huffman is optimal for symbol-by-symbol coding with known frequencies.
Key takeaways:
- Huffman = variable-length codes based on frequency
- More frequent characters = shorter codes
- Build tree by merging two smallest frequency nodes
- Left edge = 0, Right edge = 1
- Huffman is optimal (minimum weighted path length)
- Used in JPEG, PNG, MP3, ZIP, Gzip
- Time: O(n + k log k) where n = text length, k = unique chars
- Decoding requires the Huffman tree (or code table)
What's Next: Set Cover
The next chapter covers the Set Cover problem โ a greedy approximation algorithm for covering all elements with minimum subsets.