title: "Edit Distance (Levenshtein Distance)" description: "Implement Levenshtein Distance for spell checking, DNA sequence alignment, and fuzzy search." order: 4

Edit Distance (Levenshtein Distance)

What is Edit Distance?

Levenshtein Distance calculates the minimum edits (insert, delete, substitute) to transform one string into another.

| Operation | Cost | Example | |-----------|------|---------| | Insert | 1 | "cat" -> "cats" | | Delete | 1 | "cats" -> "cat" | | Substitute | 1 | "cat" -> "cut" | | Match | 0 | "cat" -> "cat" |

Implementation

def edit_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): dp[i][0] = i
    for j in range(n + 1): dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[m][n]

print(f"kitten -> sitting: {edit_distance('kitten', 'sitting')} steps")
print(f"Vibe -> Vibes: {edit_distance('Vibe', 'Vibes')} steps")

Applications

| Domain | Use Case | |--------|----------| | Spell Checker | Suggest correct spelling for typos | | DNA Sequencing | Calculate genetic mutation distance | | Plagiarism Detection | Detect modified plagiarism | | Speech Recognition | Match transcribed text to known phrases | | Fuzzy Search | Find matches even with typos |

Why Edit Distance Matters

| Scenario | Application | |----------|-------------| | Spell Check | "recieve" vs "receive" distance = 1 | | DNA Alignment | Gene sequence similarity | | Plagiarism Detection | Smaller distance = more similar | | Auto Grading | Error count from OCR results | | Google Search | "Did you mean: xxx" |

Space Optimization

def edit_distance_optimized(s1, s2):
    m, n = len(s1), len(s2)
    if m < n: s1, s2 = s2, s1; m, n = n, m
    prev = list(range(n + 1))
    curr = [0] * (n + 1)
    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            cost = 0 if s1[i-1] == s2[j-1] else 1
            curr[j] = min(prev[j] + 1, curr[j-1] + 1, prev[j-1] + cost)
        prev, curr = curr, prev
    return prev[n]

Next Chapter: Spell Check API

Edit distance wrapped as a FastAPI service - the practical application.

Applications of Edit Distance

| Domain | Application | |--------|-------------| | Spell Checking | Suggest corrections for misspelled words | | DNA Sequencing | Measure similarity between genetic sequences | | Plagiarism Detection | Compare documents for copied content | | Speech Recognition | Match recognized words to dictionary entries | | Version Control | Track differences between file versions (git diff) | | Autocomplete | Rank suggestion candidates by edit distance | | Machine Translation | Align source and target language sentences |

How Edit Distance Works

The algorithm uses dynamic programming to build an (m+1) x (n+1) table where dp[i][j] represents the minimum edit distance between the first i characters of s1 and the first j characters of s2.

Recurrence Relation

dp[i][j] = min(
    dp[i-1][j] + 1,      # delete s1[i]
    dp[i][j-1] + 1,      # insert s2[j]
    dp[i-1][j-1] + cost  # substitute (cost = 0 if same, 1 if different)
)

Example Table for "kitten" → "sitting"

| | ε | s | i | t | t | i | n | g | |---|---|---|---|---|---|---|---|---| | ε | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | k | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | i | 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 | | t | 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 | | t | 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 | | e | 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 | | n | 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 |

The final answer dp[6][7] = 3. The minimum edit distance from "kitten" to "sitting" is 3 operations:

Substitute k → s
Substitute e → i
Insert g

Full DP Table Implementation

def edit_distance_full_table(s1: str, s2: str) -> int:
    """
    Compute minimum edit distance between s1 and s2.
    Operations: insert, delete, substitute (all cost 1).
    """
    m, n = len(s1), len(s2)

    # Create DP table
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialize first row and column
    for i in range(m + 1):
        dp[i][0] = i  # deleting all characters from s1
    for j in range(n + 1):
        dp[0][j] = j  # inserting all characters of s2

    # Fill the DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,      # delete
                dp[i][j - 1] + 1,      # insert
                dp[i - 1][j - 1] + cost  # substitute
            )

    return dp[m][n]

# Test
print(edit_distance_full_table("kitten", "sitting"))  # 3
print(edit_distance_full_table("horse", "ros"))       # 3
print(edit_distance_full_table("", "abc"))            # 3
print(edit_distance_full_table("same", "same"))       # 0

Weighted Edit Distance

Different operations can have different costs:

def weighted_edit_distance(s1: str, s2: str,
                          insert_cost=1, delete_cost=1, substitute_cost=2):
    """
    Edit distance with configurable operation costs.
    Useful when substitution should cost more than insert/delete.
    """
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i * delete_cost
    for j in range(n + 1):
        dp[0][j] = j * insert_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else substitute_cost
            dp[i][j] = min(
                dp[i - 1][j] + delete_cost,
                dp[i][j - 1] + insert_cost,
                dp[i - 1][j - 1] + cost
            )

    return dp[m][n]

print(weighted_edit_distance("kitten", "sitting"))  # 5 (2 changes x 2 + 1 insert = 5)

Damerau-Levenshtein Distance

Adds the transposition operation (swap adjacent characters):

def damerau_levenshtein(s1: str, s2: str) -> int:
    """
    Edit distance with transpositions.
    "cat" → "act" costs 1 (swap adjacent c and a) instead of 2.
    """
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,
                dp[i][j - 1] + 1,
                dp[i - 1][j - 1] + cost
            )
            # Transposition check
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + cost)

    return dp[m][n]

print(damerau_levenshtein("cat", "act"))    # 1 (transposition)
print(damerau_levenshtein("kitten", "sitting"))  # 3

Finding the Operations (Backtrace)

def edit_distance_with_operations(s1: str, s2: str):
    """
    Compute edit distance and return the operations.
    """
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,
                dp[i][j - 1] + 1,
                dp[i - 1][j - 1] + cost
            )

    # Backtrace to find operations
    operations = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            operations.append(f"Delete '{s1[i-1]}' at position {i-1}")
            i -= 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            operations.append(f"Insert '{s2[j-1]}' at position {j-1}")
            j -= 1
        else:
            if s1[i - 1] != s2[j - 1]:
                operations.append(f"Substitute '{s1[i-1]}' → '{s2[j-1]}' at position {i-1}")
            i -= 1
            j -= 1

    operations.reverse()
    return dp[m][n], operations

# Test
dist, ops = edit_distance_with_operations("kitten", "sitting")
print(f"Distance: {dist}")
print("Operations:")
for op in ops:
    print(f"  {op}")

Similarity Score

Convert edit distance to a similarity percentage:

def similarity(s1: str, s2: str) -> float:
    """
    Compute string similarity as a percentage (0.0 to 1.0).
    1.0 = identical, 0.0 = completely different.
    """
    distance = edit_distance_full_table(s1, s2)
    max_len = max(len(s1), len(s2))
    if max_len == 0:
        return 1.0
    return 1.0 - (distance / max_len)

print(similarity("kitten", "sitting"))   # ~0.57
print(similarity("same", "same"))        # 1.0
print(similarity("abc", "xyz"))          # 0.0

Summary

Edit distance (Levenshtein distance) measures how many single-character operations are needed to transform one string into another. Dynamic programming fills a table using the recurrence: min(delete, insert, substitute). Variants include weighted costs, Damerau-Levenshtein (adds transposition), and similarity scoring.

Key takeaways:

DP table: dp[i][j] = min cost to convert first i chars of s1 to first j chars of s2
Recurrence: min(delete, insert, substitute) with cost 0 if characters match
Space optimization: only keep two rows (O(min(m,n)) instead of full O(mn))
Weighted version: different costs for insert, delete, substitute
Damerau-Levenshtein: adds transposition (swap adjacent characters)
Backtrace: reconstruct the actual operations from the DP table
Similarity: 1 - (distance / max_len) gives a 0-1 similarity score
Applications: spell check, DNA analysis, plagiarism detection, autocomplete

What's Next: Spell Check API

The next chapter wraps edit distance into a FastAPI spell-check service — a practical API that suggests corrections for misspelled words.