String Similarity Algorithms

Common string similarity algorithms sim: (Σ* × Σ*) → [0, 1].

Open value range algorithms (like Hamming) are normalized.

Hamming

Complexity: O(n)

def hamming_distance(str1: str, str2: str) -> int

def hamming(str1: str, str2: str) -> float

The shorter string is padded with blank symbols to apply the algorithm.

Levenshtein

Complexity: O(n²)

def levenshtein_distance(str1: str, str2: str) -> int

def levenshtein(str1: str, str2: str) -> float

Damerau-Levenshtein

Complexity: O(n²)

def damerau_levenshtein_distance(str1: str, str2: str) -> int

def damerau_levenshtein(str1: str, str2: str) -> float

Jaro

Complexity: O(n²)

def jaro(str1: str, str2: str) -> float

Jaro-Winkler

Complexity: O(n²)

def jaro_winkler(str1: str, str2: str, p: float = 0.1) -> float

Jaccard

Complexity: O(n)

def jaccard(str1: str, str2: str) -> float

The set based similarity algorithms use character and index combination to mimic set element identity ({ (character, index) ∀ c ∈ S₁, S₂ }).

Sørensen-Dice

Complexity: O(n)

def sorensen_dice(str1: str, str2: str) -> float

Szymkiewicz-Simpson

Complexity: O(n)

def szymkiewicz_simpson(str1: str, str2: str) -> float

Szymkiewicz-Simpson is also simply known as “overlap”.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.vscode		.vscode
string_similarity_algorithms		string_similarity_algorithms
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

String Similarity Algorithms

Hamming

Levenshtein

Damerau-Levenshtein

Jaro

Jaro-Winkler

Jaccard

Sørensen-Dice

Szymkiewicz-Simpson

About

Uh oh!

Languages

License

t-ski/string-similarity-algorithms

Folders and files

Latest commit

History

Repository files navigation

String Similarity Algorithms

Hamming

Levenshtein

Damerau-Levenshtein

Jaro

Jaro-Winkler

Jaccard

Sørensen-Dice

Szymkiewicz-Simpson

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages