A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.
This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.
- 🚀 3-5x Faster: Typically 3-5x faster than Python's difflib, with measured speedups ranging from 1.29x to 5.41x depending on file size and change pattern (see the Performance section for detailed benchmarks)
- 100% Compatible: Drop-in replacement for difflib.unified_diff with identical output
- Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
- Easy to use: Simple Python API with PyO3 bindings
pip install difflib-rs
# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs
# Set up virtual environment
python -m venv venv
source venv/bin/activate
# Install build dependencies
pip install maturin pytest
# Build and install
maturin develop --release
This is a drop-in replacement for Python's difflib.unified_diff. Simply replace your import:
- from difflib import unified_diff
+ from difflib_rs import unified_diff
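Where the compiled extension might not be available in every environment, one possible pattern is to import it with a fallback to the standard library. This is only a sketch: it assumes the module is importable as `difflib_rs` (as in the import above) and relies on the stated API compatibility; the `BACKEND` name is purely illustrative.

```python
# Prefer the Rust implementation, but fall back to the standard library
# when difflib_rs is not installed (assumption: same call signature).
try:
    from difflib_rs import unified_diff
    BACKEND = "difflib_rs"
except ImportError:
    from difflib import unified_diff
    BACKEND = "difflib"

a = ["line1\n", "line2\n", "line3\n"]
b = ["line1\n", "modified\n", "line3\n"]

# list() works whether the backend returns a generator or a list.
diff_lines = list(unified_diff(a, b, fromfile="a.txt", tofile="b.txt"))
print(f"backend: {BACKEND}")
print("".join(diff_lines), end="")
```

Because both backends are expected to produce the same output, the rest of the calling code should not need to know which one was imported.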
# Compare two sequences of lines
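# Lines keep their trailing newline so that print(line, end='') below
# emits one diff line per output line
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']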
diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02'
)
for line in diff:
    print(line, end='')
Note: Currently, only unified_diff is supported. Other difflib functions are not implemented, but pull requests are welcome!
Most agents (including Sweep) can add support for any other methods if needed. A copy of the Python implementation is provided in src/difflib.py for reference.
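As a hedged illustration of the `n` and `lineterm` parameters, the sketch below uses the standard library's `difflib.unified_diff` as a stand-in; since the package documents the same signature, the same calls should apply to `difflib_rs.unified_diff`. The file names and input lines are made up for the example.

```python
from difflib import unified_diff  # stand-in; difflib_rs documents the same signature

a = [f"line{i}" for i in range(1, 11)]  # line1 .. line10, no trailing newlines
b = list(a)
b[4] = "changed"                        # modify line5 only

# lineterm="" suits inputs without trailing newlines: the header lines then
# carry no newline either, so every yielded string can be printed as-is.
wide = list(unified_diff(a, b, fromfile="a.txt", tofile="b.txt", n=3, lineterm=""))
narrow = list(unified_diff(a, b, fromfile="a.txt", tofile="b.txt", n=0, lineterm=""))

for line in narrow:
    print(line)
# --- a.txt
# +++ b.txt
# @@ -5 +5 @@
# -line5
# +changed
```

With the default `n=3`, the same change is reported with three lines of context on each side, so the hunk header becomes `@@ -2,7 +2,7 @@`.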
For additional convenience, use unified_diff_str directly with (unsplit) strings:
from difflib_rs import unified_diff_str
# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""
text_b = """line1
modified
line3"""
# The function handles splitting internally (more efficient)
diff = unified_diff_str(
    text_a, text_b,
    fromfile='original.txt',
    tofile='modified.txt',
    keepends=False  # Whether to keep line endings in the diff
)
for line in diff:
    print(line, end='')
The unified_diff_str function:
- Takes strings directly instead of lists
- Handles line splitting internally in Rust (faster than Python's splitlines())
- Supports \n, \r\n, and \r line endings
- Has a keepends parameter to preserve line endings in the output
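To make the behaviour listed above concrete, here is a rough pure-Python analogue. It is not the library's Rust implementation: `split_lines` and `diff_strings` are hypothetical helpers, the standard library's `unified_diff` stands in for the Rust one, and the choice of `lineterm` is an assumption made so the example prints cleanly.

```python
import re
from difflib import unified_diff  # stand-in for difflib_rs.unified_diff

# Only the three terminators listed above; "\r\n" is tried first so it
# counts as one line break rather than two.
_LINE_BREAK = re.compile(r"\r\n|\n|\r")

def split_lines(text, keepends=False):
    """Hypothetical helper: split text on LF, CRLF and CR."""
    lines, pos = [], 0
    for match in _LINE_BREAK.finditer(text):
        end = match.end() if keepends else match.start()
        lines.append(text[pos:end])
        pos = match.end()
    if pos < len(text):  # trailing text without a final terminator
        lines.append(text[pos:])
    return lines

def diff_strings(text_a, text_b, fromfile="", tofile="", keepends=False):
    """Hypothetical pure-Python analogue of unified_diff_str."""
    return unified_diff(
        split_lines(text_a, keepends),
        split_lines(text_b, keepends),
        fromfile=fromfile,
        tofile=tofile,
        # Assumption: when line endings are dropped, headers get none either.
        lineterm="\n" if keepends else "",
    )

result = list(diff_strings("line1\nline2\r\nline3", "line1\nmodified\r\nline3",
                           fromfile="original.txt", tofile="modified.txt"))
for line in result:
    print(line)
```

Note that Python's own `str.splitlines()` also splits on several other characters (for example form feed), which is why the sketch uses an explicit pattern for the three terminators named above.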
The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:
| File Size | Python Time | Rust Time | Speedup | Output Lines | 
|---|---|---|---|---|
| 100 lines | 86.0μs | 38.3μs | 2.24x | 71 | 
| 500 lines | 450.6μs | 130.3μs | 3.46x | 300 | 
| 1,000 lines | 910.2μs | 220.8μs | 4.12x | 587 | 
| 2,000 lines | 2203.1μs | 482.3μs | 4.57x | 1,222 |

| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 167.9μs | 49.3μs | 3.41x | 131 | 
| 500 lines | 1028.5μs | 252.0μs | 4.08x | 655 | 
| 1,000 lines | 1925.0μs | 414.3μs | 4.65x | 1,285 |

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 2842.0μs | 859.7μs | 3.31x | 47 | 
| 10,000 lines | 5 | 5003.2μs | 1471.3μs | 3.40x | 47 | 
| 20,000 lines | 5 | 8470.5μs | 2821.6μs | 3.00x | 47 |

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 7985.5μs | 1579.4μs | 5.06x | 1,869 | 
| 10,000 lines | 500 | 14692.5μs | 2833.8μs | 5.18x | 3,793 | 
| 20,000 lines | 1,000 | 34949.0μs | 6461.2μs | 5.41x | 7,569 |

| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 1773.1μs | 406.1μs | 4.37x | 
| Completely different (1,000 lines) | 284.5μs | 219.8μs | 1.29x |

Performance comparison of unified_diff_str vs unified_diff with Python splitlines():

| File Size | Python split + Rust diff | All Rust (unified_diff_str) | Speedup |
|---|---|---|---|
| 100 lines | 54.8μs | 21.1μs | 2.59x | 
| 500 lines | 169.9μs | 118.3μs | 1.44x | 
| 1,000 lines | 316.1μs | 248.3μs | 1.27x |
| 2,000 lines | 654.8μs | 550.4μs | 1.19x |
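Numbers like those in the tables above can be approximated with a small timing harness. The sketch below is not the project's benchmark suite: the synthetic inputs, change counts, and helper names (`make_inputs`, `best_time_us`) are assumptions for illustration, and it only times `difflib_rs` when that module is importable.

```python
import random
import timeit
from difflib import unified_diff as py_unified_diff

try:
    from difflib_rs import unified_diff as rs_unified_diff
except ImportError:  # extension not installed: only time the baseline
    rs_unified_diff = None

def make_inputs(num_lines, num_changes, seed=0):
    """Build a synthetic file and a copy with num_changes modified lines."""
    rng = random.Random(seed)
    a = [f"line {i}\n" for i in range(num_lines)]
    b = list(a)
    for idx in rng.sample(range(num_lines), num_changes):
        b[idx] = f"changed {idx}\n"
    return a, b

def best_time_us(diff_fn, a, b, repeat=5):
    """Best-of-repeat wall time in microseconds, consuming the whole diff."""
    timer = timeit.Timer(lambda: list(diff_fn(a, b, fromfile="a", tofile="b")))
    return min(timer.repeat(repeat=repeat, number=1)) * 1e6

a, b = make_inputs(num_lines=1000, num_changes=100)
py_us = best_time_us(py_unified_diff, a, b)
print(f"difflib:    {py_us:.1f}us")

if rs_unified_diff is not None:
    rs_us = best_time_us(rs_unified_diff, a, b)
    print(f"difflib_rs: {rs_us:.1f}us  speedup: {py_us / rs_us:.2f}x")
    # Spot-check the identical-output claim on the same inputs.
    same = (list(py_unified_diff(a, b, fromfile="a", tofile="b"))
            == list(rs_unified_diff(a, b, fromfile="a", tofile="b")))
    print("identical output:", same)
```

The speedup column is simply the Python time divided by the Rust time; for the 1,000-line row of the first table, 910.2μs / 220.8μs ≈ 4.12x.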
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'):
    """
    Compare two sequences of lines; generate the unified diff.
    
    Unified diffs are a compact way of showing line changes and a few
    lines of context. The number of context lines is set by n which
    defaults to three.
    
    Parameters:
        a: Sequence of lines to compare (the 'from' file)
        b: Sequence of lines to compare (the 'to' file)
        fromfile: Label to use for the 'from' file in the diff header
        tofile: Label to use for the 'to' file in the diff header
        fromfiledate: Modification date of the 'from' file
        tofiledate: Modification date of the 'to' file
        n: Number of context lines (default: 3)
        lineterm: Line terminator to use (default: '\n')
    
    Returns:
        Generator yielding unified diff format strings
    
    Note: This is a high-performance Rust implementation that provides
    3-5x speedup over Python's difflib while maintaining 100% compatibility.
    """
    pass
# Activate virtual environment
source venv/bin/activate
# Run tests
python -m pytest tests/ -v
# Run benchmarks
python -m pytest tests/test_benchmark.py -s
# Build the package with optimizations
maturin develop --release
If you want a feature or have an idea, just create a pull request! Contributions are welcome.
Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.