A comprehensive, production-ready JSON repair library for Elixir that intelligently fixes malformed JSON strings from any source—LLMs, legacy systems, data pipelines, streaming APIs, and human input.
JsonRemedy uses a sophisticated pre-processing stage followed by a 5-layer repair pipeline where each layer employs the most appropriate technique: pattern detection, content cleaning, state machines for structural repairs, character-by-character parsing for syntax normalization, and battle-tested parsers for validation. The result is a robust system that handles virtually any JSON malformation while preserving valid content.
Malformed JSON is everywhere in real-world systems:
// LLM output with mixed issues
```json
{
users: [
{name: 'Alice Johnson', active: True, scores: [95, 87, 92,]},
{name: "Bob Smith", active: False /* incomplete
],
metadata: None
# Legacy Python system output
{'users': [{'name': 'Alice', 'verified': True, 'data': None}]}// Copy-paste from JavaScript console
{name: "Alice", getValue: function() { return "test"; }, data: [1,2,3]}// Streaming API with connection drop
{"status": "processing", "results": [{"id": 1, "name": "Alice"
// Human input with common mistakes
{name: Alice, "age": 30, "scores": [95 87 92], active: true,}
Standard JSON parsers fail completely on these inputs. JsonRemedy fixes them intelligently.
Runs before the main layer pipeline to handle complex patterns that would otherwise be broken by subsequent layers:
- Multiple JSON detection:
[]{}→[[], {}]- Aggregates consecutive JSON values - Object boundary merging:
{"a":"b"},"c":"d"}→{"a":"b","c":"d"}- Merges split objects - Ellipsis filtering:
[1,2,3,...]→[1,2,3]- Removes unquoted ellipsis placeholders (LLM truncation markers) - Keyword filtering:
{"a":1, COMMENT "b":2}→{"a":1,"b":2}- Removes debug keywords (COMMENT, DEBUG_INFO, PLACEHOLDER, TODO, etc.)
Inspired by patterns from the json_repair Python library
- Code fences:
json ...→ clean JSON - Comments:
// line commentsand/* block comments */→ removed - Hash comments:
# python-style comments→ removed - Wrapper text: Extracts JSON from prose, HTML tags, API responses
- Trailing text removal:
[{"id": 1}]\n1 Volume(s) created→[{"id": 1}](v0.1.3+) - Encoding normalization: UTF-8 handling and cleanup
- Missing closing delimiters:
{"name": "Alice"→{"name": "Alice"} - Array element terminators: Multi-item arrays recover missing braces/brackets between elements (v0.1.10)
- Extra delimiters:
{"name": "Alice"}}}→{"name": "Alice"} - Mismatched delimiters:
[{"name": "Alice"}]→ proper structure - Missing opening braces:
["key": "value"]→[{"key": "value"}] - Concatenated objects:
{"a":1}{"b":2}→[{"a":1},{"b":2}] - Misplaced colons:
{"a": 1 : "b": 2}→{"a": 1, "b": 2} - Complex nesting: Intelligent repair of deeply nested structures
- Quote variants:
'single',"smart",""doubled""→"standard" - Unquoted keys:
{name: "value"}→{"name": "value"} - Boolean variants:
True,TRUE,false→true,false - Null variants:
None,NULL,Null→null - Trailing commas:
[1, 2, 3,]→[1, 2, 3] - Missing commas:
[1 2 3]→[1, 2, 3] - Missing colons:
{"name" "value"}→{"name": "value"} - Escape sequences:
\n,\t,\uXXXX→ proper Unicode - Unescaped quotes:
"text "quoted" text"→ proper escaping - Trailing backslashes: Streaming artifact cleanup
- Unquoted HTML values:
"body":<!DOCTYPE HTML>...→"body":"<!DOCTYPE HTML>..."(v0.1.5+)- Handles full HTML error pages from APIs
- DOCTYPE declarations, comments, void elements
- Self-closing tags, nested structures
- Proper escaping of quotes, newlines, special chars
- Byte + grapheme metadata for consumer slices (v0.1.8)
🔧 Hardcoded Patterns (ported from json_repair Python library)
Layer 3 includes battle-tested cleanup patterns for edge cases commonly found in LLM output:
- Smart quotes:
"curly",«guillemets»,‹angles›→"standard" - Doubled quotes:
""value""→"value"(preserves empty strings"") - Number formats:
1,234,567→1234567(removes thousands separators) - Unicode escapes:
\u263a→☺(opt-in via:enable_escape_normalization) - Hex escapes:
\x41→A(opt-in via:enable_escape_normalization)
These patterns run as a pre-processing step and can be controlled via feature flags. See examples/hardcoded_patterns_examples.exs for demonstrations.
- Jason.decode optimization: Valid JSON uses battle-tested parser
- Performance monitoring: Automatic fallback for complex repairs
- Early exit: Stop processing when JSON is clean
- Lenient number parsing:
123,456→123(with backtracking) - Planned - Number fallback: Malformed numbers become strings vs. failing - Planned
- Literal disambiguation: Smart detection of booleans vs. strings - Planned
- Aggressive error recovery: Extract meaningful data from severely malformed input - Planned
- Stream-safe parsing: Handle incomplete or truncated JSON - Planned
JsonRemedy understands JSON structure to preserve valid content:
# ✅ PRESERVE: Comma inside string content
{"message": "Hello, world", "status": "ok"}
# ✅ REMOVE: Trailing comma
{"items": [1, 2, 3,]}
# ✅ PRESERVE: Numbers stay numbers
{"count": 42}
# ✅ QUOTE: Unquoted keys get quoted
{name: "Alice"}
# ✅ PRESERVE: Boolean content in strings
{"note": "Set active to True"}
# ✅ NORMALIZE: Boolean values
{"active": True}
# ✅ PRESERVE: Escape sequences in strings
{"path": "C:\\Users\\Alice"}
# ✅ PARSE: Unicode escapes
{"unicode": "\\u0048\\u0065\\u006c\\u006c\\u006f"}Add JsonRemedy to your mix.exs:
def deps do
[
{:json_remedy, "~> 0.1.10"}
]
end# Simple repair and parse
malformed = ~s|{name: "Alice", age: 30, active: True}|
{:ok, data} = JsonRemedy.repair(malformed)
# => %{"name" => "Alice", "age" => 30, "active" => true}
# Get the repaired JSON string
{:ok, fixed_json} = JsonRemedy.repair_to_string(malformed)
# => "{\"name\":\"Alice\",\"age\":30,\"active\":true}"
# Track what was repaired
{:ok, data, repairs} = JsonRemedy.repair(malformed, logging: true)
# => repairs: [
# %{layer: :syntax_normalization, action: "quoted unquoted key 'name'"},
# %{layer: :syntax_normalization, action: "normalized boolean True -> true"}
# ]# LLM output with multiple issues
llm_output = """
Here's the user data you requested:
```json
{
// User information
users: [
{
name: 'Alice Johnson',
email: "alice@example.com",
age: 30,
active: True,
scores: [95, 87, 92,], // Test scores
profile: {
city: "New York",
interests: ["coding", "music", "travel",]
},
},
{
name: 'Bob Smith',
email: "bob@example.com",
age: 25,
active: False
// Missing comma above
}
],
metadata: {
total: 2,
updated: "2024-01-15"
// Missing closing brace
```
That should give you what you need!
"""
{:ok, clean_data} = JsonRemedy.repair(llm_output)
# Works perfectly! Handles code fences, comments, quotes, booleans, trailing commas, missing delimiters# Legacy Python-style JSON
python_json = ~s|{'users': [{'name': 'Alice', 'active': True, 'metadata': None}]}|
{:ok, data} = JsonRemedy.repair(python_json)
# => %{"users" => [%{"name" => "Alice", "active" => true, "metadata" => nil}]}
# JavaScript object literals
js_object = ~s|{name: "Alice", getValue: function() { return 42; }, data: [1,2,3]}|
{:ok, data} = JsonRemedy.repair(js_object)
# => %{"name" => "Alice", "data" => [1, 2, 3]} (function removed)
# Streaming/incomplete data
incomplete = ~s|{"status": "processing", "data": [1, 2, 3|
{:ok, data} = JsonRemedy.repair(incomplete)
# => %{"status" => "processing", "data" => [1, 2, 3]}
# Human input with common mistakes
human_input = ~s|{name: Alice, age: 30, scores: [95 87 92], active: true,}|
{:ok, data} = JsonRemedy.repair(human_input)
# => %{"name" => "Alice", "age" => 30, "scores" => [95, 87, 92], "active" => true}JsonRemedy includes comprehensive examples demonstrating real-world usage scenarios.
To see all examples in action with their full output:
./run-examples.shThis will execute all example scripts and show a summary of results.
Run specific examples to see detailed output:
mix run examples/basic_usage.exsLearn the fundamentals with step-by-step examples:
- Fixing unquoted keys
- Normalizing quote styles
- Handling boolean/null variants
- Repairing structural issues
- Processing LLM outputs
mix run examples/hardcoded_patterns_examples.exsDemonstrates advanced cleanup patterns ported from Python's json_repair library:
- Smart quotes normalization: Curly quotes, guillemets, angle quotes
- Doubled quotes repair:
""value""→"value" - Number format cleaning:
1,234,567→1234567 - Unicode/hex escapes:
\u263a→☺,\x41→A - International text: UTF-8 support with smart quotes
- Combined patterns: Real-world LLM output examples
mix run examples/html_content_examples.exsDemonstrates handling of unquoted HTML content in JSON values (common when APIs return error pages):
- API 503 Service Unavailable: Full HTML error page in JSON response
- API 404 Not Found: HTML 404 page with comments and metadata
- Simple HTML fragments: Bio fields and content with HTML tags
- Multiple HTML values: Arrays of templates with HTML content
- Complex nested HTML: HTML with JSON-like attributes and embedded scripts
This example showcases the HTML detection and quoting capabilities added in v0.1.5, which handle real-world scenarios where API endpoints return HTML error pages instead of JSON.
mix run examples/html_metadata_examples.exsInspect the metadata returned when quoting HTML fragments:
- Grapheme vs byte counts for emoji-rich HTML bodies
- Byte-accurate offsets from non-zero starting positions
- Guidance for integrating the metadata into downstream pipelines
mix run examples/real_world_scenarios.exsSee JsonRemedy handle realistic problematic JSON:
- LLM/ChatGPT outputs with code fences and mixed syntax
- Legacy system exports with comments and non-standard formatting
- User form input with mixed quote styles and missing delimiters
- Configuration files with comments and trailing commas
- API responses with inconsistent formatting
- Database dumps with structural issues
- JavaScript object literals with functions and invalid syntax
- Log outputs with embedded JSON in text
mix run examples/quick_performance.exsUnderstand JsonRemedy's performance characteristics:
- Fast path optimization for valid JSON
- Layer-specific performance breakdown
- Throughput measurements for different input sizes
- Memory usage patterns
mix run examples/simple_stress_test.exsVerify reliability under load:
- Repeated repair operations
- Nested structure handling
- Large array processing
- Memory usage stability
mix run examples/windows_ci_examples.exsValidate the cross-platform pipeline:
- Confirms the GitHub Actions workflow includes a Windows runner
- Lists the PowerShell commands executed in CI
- Helps contributors mirror the job locally when debugging CRLF issues
Here's what you'll see when running the real-world scenarios:
=== JsonRemedy Real-World Scenarios ===
Example 1: LLM/ChatGPT Output with Code Fences
==============================================
Input (LLM response with code fences and explanatory text):
Here's the user data you requested:
```json
{
"users": [
{name: "Alice Johnson", age: 32, role: "engineer"},
{name: "Bob Smith", age: 28, role: "designer"}
],
"metadata": {
generated_at: "2024-01-15",
total_count: 2,
active_only: True
}
}
Processing LLM Output through JsonRemedy pipeline...
✓ Layer 1 (Content Cleaning): Applied 1 repairs
✓ Layer 3 (Syntax Normalization): Applied 4 repairs
✓ Layer 4 (Validation): SUCCESS - Valid JSON produced!
{ "users": [ { "name": "Alice Johnson", "age": 32, "role": "engineer" }, { "name": "Bob Smith", "age": 28, "role": "designer" } ], "metadata": { "generated_at": "2024-01-15", "total_count": 2, "active_only": true } }
Total repairs applied: 5 Repair summary:
- removed code fences and wrapper text
- normalized unquoted key 'name' to "name"
- normalized unquoted key 'age' to "age"
- normalized unquoted key 'role' to "role"
- normalized boolean True -> true
All examples include detailed output showing:
- **Input analysis**: What's wrong with the JSON
- **Layer-by-layer processing**: Which layers made repairs
- **Final output**: Clean, valid JSON
- **Repair summary**: Detailed log of all fixes applied
- **Performance metrics**: Timing and throughput data
### 🎯 **Custom Examples**
Create your own examples using the same patterns:
```elixir
# examples/my_custom_example.exs
defmodule MyCustomExample do
def test_my_json do
malformed = ~s|{my: 'problematic', json: True}|
case JsonRemedy.repair(malformed, logging: true) do
{:ok, result, context} ->
IO.puts("✓ Repaired successfully!")
IO.puts("Result: #{Jason.encode!(result, pretty: true)}")
IO.puts("Repairs: #{length(context.repairs)}")
{:error, reason} ->
IO.puts("✗ Failed: #{reason}")
end
end
end
MyCustomExample.test_my_json()
Run with: mix run examples/my_custom_example.exs
All examples have been thoroughly tested and optimized for v0.1.1:
| Example | Status | Performance | Notes |
|---|---|---|---|
| Basic Usage | ✅ Stable | ~10ms | 8 fundamental examples, all patterns work |
| Real World Scenarios | ✅ Stable | ~15-30s | 8 complex scenarios, handles LLM/legacy data |
| Quick Performance | ✅ Stable | ~2-5s | 4 benchmarks, includes throughput analysis |
| Simple Stress Test | ✅ Stable | ~10-15s | 1000+ operations, memory stability verified |
| Performance Benchmarks | May hang | Complex analysis may timeout on large datasets |
The examples/performance_benchmarks.exs may hang when processing large datasets (5000+ objects). This is a computational complexity issue, not a library bug:
# These work fine:
mix run examples/performance_benchmarks.exs # May hang on large datasets
# Alternatives that complete successfully:
mix run examples/quick_performance.exs # Lightweight performance testing
mix run examples/simple_stress_test.exs # Stress testing without hangingWorkaround: For comprehensive benchmarking, use smaller dataset sizes or the quick performance example which provides sufficient performance insights.
- ✅ Fixed all compilation warnings across example files
- ✅ Corrected pattern matching for layer return values
- ✅ Added division-by-zero protection in throughput calculations
- ✅ Improved error handling for edge cases
- ✅ Enhanced Layer 4 validation pipeline integration
JsonRemedy is currently in Phase 1 implementation with Layers 1-4 fully operational:
| Layer | Status | Description |
|---|---|---|
| Layer 1 | ✅ Complete | Content cleaning (code fences, comments, encoding) |
| Layer 2 | ✅ Complete | Structural repair (delimiters, nesting, concatenation) |
| Layer 3 | ✅ Complete | Syntax normalization (quotes, booleans, commas) |
| Layer 4 | ✅ Complete | Fast validation (Jason.decode optimization) |
| Layer 5 | ⏳ Planned | Tolerant parsing (aggressive error recovery) |
The current implementation handles ~95% of real-world malformed JSON through Layers 1-4. Layer 5 will add edge case handling for the remaining challenging scenarios.
Current Release (v0.1.1): Production-ready Layers 1-4
- ✅ Complete JSON repair pipeline
- ✅ Handles LLM outputs, legacy systems, human input
- ✅ Performance optimized with fast-path validation
- ✅ Comprehensive test coverage and documentation
Next Release (v0.2.0): Layer 5 - Tolerant Parsing
- ⏳ Custom recursive descent parser
- ⏳ Aggressive error recovery for edge cases
- ⏳ Malformed number handling (e.g.,
123,456→123) - ⏳ Stream-safe parsing for incomplete JSON
- ⏳ Literal disambiguation algorithms
Based on comprehensive analysis of the json_repair Python library, the following patterns were initially documented as missing but are now fully implemented in v0.1.5:
Implemented Advanced Patterns (all tests passing):
-
Multiple JSON Values Aggregation ✅ -
test/missing_patterns/pattern1_multiple_json_test.exs- Pattern:
[]{}→[[],{}] - Status: ✅ 10/10 tests pass
- Implementation:
MultipleJsonDetectorutility in pre-processing pipeline - Wraps multiple complete JSON values into an array
- Pattern:
-
Object Boundary Merging ✅ -
test/missing_patterns/pattern2_object_merging_test.exs- Pattern:
{"a":"b"},"c":"d"}→{"a":"b","c":"d"} - Status: ✅ 10/10 tests pass
- Implementation:
ObjectMergermodule in Layer 3 - Merges additional key-value pairs after premature object close
- Pattern:
-
Ellipsis Filtering ✅ -
test/missing_patterns/pattern3_ellipsis_test.exs- Pattern:
[1,2,3,...]→[1,2,3] - Status: ✅ 10/10 tests pass
- Implementation:
EllipsisFiltermodule in Layer 3 - Filters unquoted
...placeholders from arrays (common in LLM output)
- Pattern:
-
Comment Keywords Filtering ✅ -
test/missing_patterns/pattern4_comment_keywords_test.exs- Pattern:
{"a":1, COMMENT "b":2}→{"a":1,"b":2} - Status: ✅ 10/10 tests pass
- Implementation:
KeywordFiltermodule in Layer 3 - Filters unquoted keywords:
COMMENT,SHOULD_NOT_EXIST,DEBUG_INFO,PLACEHOLDER,TODO,FIXME, etc.
- Pattern:
These advanced patterns handle edge cases commonly found in LLM outputs, debug logs, and malformed API responses. All 40 pattern tests pass with 100% success rate.
JsonRemedy's strength comes from its pragmatic, layered approach where each stage uses the optimal technique:
defmodule JsonRemedy.LayeredRepair do
def repair(input) do
input
|> PreProcessing.detect_and_fix() # Pre-process: Multiple JSON, object merging, filtering
|> Layer1.content_cleaning() # Cleaning: Remove wrappers, comments, normalize encoding
|> Layer2.structural_repair() # State machine: Fix delimiters, nesting, structure
|> Layer3.syntax_normalization() # Char parsing: Fix quotes, booleans, commas
|> Layer4.validation_attempt() # Jason.decode: Fast path for clean JSON
|> Layer5.tolerant_parsing() # Custom parser: Handle edge cases gracefully (FUTURE)
end
endTechnique: Pattern detection and early transformation
- Detects and aggregates multiple consecutive JSON values
- Merges split objects with boundary issues
- Filters ellipsis and debug keywords
- Runs before Layer 1 to prevent pattern interference
Technique: String operations
- Removes code fences, comments, wrapper text
- Normalizes encoding and whitespace
- Extracts JSON from prose and HTML
- Handles streaming artifacts
Technique: State machine with context tracking
- Fixes missing/extra/mismatched delimiters
- Handles complex nesting scenarios
- Wraps concatenated objects
- Preserves content inside strings
Technique: Character-by-character parsing with context awareness
- Standardizes quotes, booleans, null values
- Fixes commas and colons intelligently
- Handles escape sequences properly
- Preserves string content while normalizing structure
Technique: Battle-tested Jason.decode
- Attempts standard parsing for maximum speed
- Returns immediately if successful (common case)
- Provides performance benchmark
Technique: Custom recursive descent with error recovery (planned)
- Handles edge cases that preprocessing can't fix (planned)
- Uses pattern matching where appropriate (planned)
- Aggressive error recovery (planned)
- Graceful failure modes (planned)
# Main repair function
JsonRemedy.repair(json_string, opts \\ [])
# Returns: {:ok, term} | {:ok, term, repairs} | {:error, reason}
# Repair to JSON string
JsonRemedy.repair_to_string(json_string, opts \\ [])
# Returns: {:ok, json_string} | {:error, reason}
# Repair from file
JsonRemedy.from_file(path, opts \\ [])
# Returns: {:ok, term} | {:ok, term, repairs} | {:error, reason}[
# Return detailed repair log as third tuple element
logging: true,
# How aggressive to be with repairs
strictness: :lenient, # :strict | :lenient | :permissive
# Stop after successful layer (for performance)
early_exit: true,
# Maximum input size (security)
max_size_mb: 10,
# Processing timeout
timeout_ms: 5000,
# Custom repair rules for Layer 3
custom_rules: [
%{
name: "fix_custom_pattern",
pattern: ~r/special_pattern/,
replacement: "fixed_pattern",
condition: nil
}
]
]# Layer-specific processing (for custom pipelines)
JsonRemedy.Layer1.ContentCleaning.process(input, context)
JsonRemedy.Layer2.StructuralRepair.process(input, context)
JsonRemedy.Layer3.SyntaxNormalization.process(input, context)
# Individual repair functions
JsonRemedy.Layer3.SyntaxNormalization.normalize_quotes(input)
JsonRemedy.Layer3.SyntaxNormalization.fix_commas(input)
JsonRemedy.Layer3.SyntaxNormalization.normalize_escape_sequences(input)
# Health checking
JsonRemedy.health_check()
# => %{status: :healthy, layers: [...], performance: {...}}For large files or real-time processing:
# Process large files efficiently
"huge_log.jsonl"
|> File.stream!()
|> JsonRemedy.repair_stream()
|> Stream.map(&process_record/1)
|> Stream.each(&store_record/1)
|> Stream.run()
# Real-time stream processing with buffering
websocket_stream
|> JsonRemedy.repair_stream(buffer_incomplete: true, chunk_size: 1024)
|> Stream.each(&handle_json/1)
|> Stream.run()
# Batch processing with error collection
inputs
|> JsonRemedy.repair_stream(collect_errors: true)
|> Enum.reduce({[], []}, fn
{:ok, data} -> {[data | successes], errors}
{:error, err} -> {successes, [err | errors]}
end)JsonRemedy prioritizes correctness first, performance second with intelligent optimization:
Note: Performance benchmarks below reflect Layers 1-4 implementation. Layer 5 performance will be added in v0.2.0.
Input Type | Throughput | Memory | Notes
------------------------------|---------------|-----------|------------------
Valid JSON (Layer 4 only) | TODO: | TODO: | Jason.decode fast path
Simple malformed | TODO: | TODO: | Layers 1-3 processing
Complex malformed | TODO: | TODO: | Full pipeline
Large files (streaming) | TODO: | TODO: | Constant memory usage
LLM output (typical) | TODO: | TODO: | Mixed complexity
- Fast path: Valid JSON uses Jason.decode directly
- Intelligent layering: Early exit when repairs succeed
- Memory efficient: Streaming support for large files
- Predictable: Performance degrades gracefully with complexity
- Monitoring: Built-in performance tracking and health checks
Run benchmarks:
mix run bench/comprehensive_benchmark.exs
mix run bench/memory_profile.exsdefmodule MyApp.LLMProcessor do
def extract_structured_data(llm_response) do
case JsonRemedy.repair(llm_response, logging: true, timeout_ms: 3000) do
{:ok, data, []} ->
{:clean, data}
{:ok, data, repairs} ->
Logger.info("LLM output required #{length(repairs)} repairs")
maybe_retrain_model(repairs)
{:repaired, data}
{:error, reason} ->
Logger.error("Unparseable LLM output: #{reason}")
{:unparseable, reason}
end
end
defp maybe_retrain_model(repairs) do
# Analyze repair patterns to improve LLM prompts
serious_issues = Enum.filter(repairs, &(&1.layer == :structural_repair))
if length(serious_issues) > 3, do: schedule_model_retraining()
end
enddefmodule DataPipeline.JSONHealer do
def process_external_api(response) do
response.body
|> JsonRemedy.repair(strictness: :lenient, max_size_mb: 50)
|> case do
{:ok, data} ->
validate_and_transform(data)
{:error, reason} ->
send_to_deadletter_queue(response, reason)
{:error, :unparseable}
end
end
def heal_legacy_export(file_path) do
file_path
|> JsonRemedy.from_file(logging: true)
|> case do
{:ok, data, repairs} when length(repairs) > 0 ->
Logger.warn("Legacy file required healing: #{inspect(repairs)}")
maybe_update_source_system(file_path, repairs)
{:ok, data}
result -> result
end
end
enddefmodule MyApp.ConfigLoader do
def load_with_auto_repair(path) do
case JsonRemedy.from_file(path, logging: true) do
{:ok, config, []} ->
{:ok, config}
{:ok, config, repairs} ->
Logger.warn("Config file auto-repaired: #{format_repairs(repairs)}")
maybe_write_fixed_config(path, config, repairs)
{:ok, config}
{:error, reason} ->
{:error, "Config file unrecoverable: #{reason}"}
end
end
defp maybe_write_fixed_config(path, config, repairs) do
if mostly_syntax_fixes?(repairs) do
backup_path = path <> ".backup"
File.cp!(path, backup_path)
fixed_json = Jason.encode!(config, pretty: true)
File.write!(path, fixed_json)
Logger.info("Auto-fixed config saved. Backup at #{backup_path}")
end
end
enddefmodule LogProcessor do
def process_json_logs(file_path) do
file_path
|> File.stream!(read_ahead: 100_000)
|> JsonRemedy.repair_stream(
buffer_incomplete: true,
collect_errors: true,
timeout_ms: 1000
)
|> Stream.filter(&valid_log_entry?/1)
|> Stream.map(&enrich_log_entry/1)
|> Stream.chunk_every(1000)
|> Stream.each(&bulk_insert_logs/1)
|> Stream.run()
end
def process_realtime_stream(websocket_pid) do
websocket_pid
|> stream_from_websocket()
|> JsonRemedy.repair_stream(
buffer_incomplete: true,
max_buffer_size: 64_000,
early_exit: true
)
|> Stream.each(&handle_realtime_event/1)
|> Stream.run()
end
enddefmodule QualityControl do
def analyze_data_quality(source) do
results = source
|> stream_data()
|> JsonRemedy.repair_stream(logging: true)
|> Enum.reduce(%{total: 0, clean: 0, repaired: 0, failed: 0, repairs: []},
fn result, acc ->
case result do
{:ok, _data, []} ->
%{acc | total: acc.total + 1, clean: acc.clean + 1}
{:ok, _data, repairs} ->
%{acc | total: acc.total + 1, repaired: acc.repaired + 1,
repairs: acc.repairs ++ repairs}
{:error, _} ->
%{acc | total: acc.total + 1, failed: acc.failed + 1}
end
end)
generate_quality_report(results)
end
defp generate_quality_report(%{total: total, clean: clean, repaired: repaired,
failed: failed, repairs: repairs}) do
%{
summary: %{
quality_score: (clean + repaired) / total * 100,
clean_percentage: clean / total * 100,
repair_rate: repaired / total * 100,
failure_rate: failed / total * 100
},
top_issues: repair_frequency_analysis(repairs),
recommendations: generate_recommendations(repairs)
}
end
end| Feature | JsonRemedy | Poison | Jason | Python json-repair | JavaScript jsonrepair |
|---|---|---|---|---|---|
| Repair Capability | ✅ Comprehensive | ❌ None | ❌ None | ||
| Architecture | 🏗️ 5-layer pipeline | 📦 Monolithic | 📦 Monolithic | 📦 Single-pass | 📦 Single-pass |
| Context Awareness | ✅ Advanced | ❌ No | ❌ No | ||
| Streaming Support | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
| Repair Logging | ✅ Detailed | ❌ No | ❌ No | ❌ No | |
| Performance | ⚡ Optimized | ⚡ Good | 🚀 Excellent | 🐌 Slow | ⚡ Good |
| Unicode Support | ✅ Full | ✅ Yes | ✅ Yes | ✅ Yes | |
| Error Recovery | ✅ Aggressive | ❌ No | ❌ No | ||
| LLM Output | ✅ Specialized | ❌ No | ❌ No | ||
| Production Ready | ✅ Yes | ✅ Yes | ✅ Yes |
# Define domain-specific repair rules
custom_rules = [
%{
name: "fix_currency_format",
pattern: ~r/\$(\d+)/,
replacement: ~S({"amount": \1, "currency": "USD"}),
condition: &(!JsonRemedy.LayerBehaviour.inside_string?(&1, 0))
},
%{
name: "normalize_dates",
pattern: ~r/(\d{4})-(\d{2})-(\d{2})/,
replacement: ~S("\1-\2-\3T00:00:00Z"),
condition: nil
}
]
{:ok, data} = JsonRemedy.repair(input, custom_rules: custom_rules)# System health and performance monitoring
health = JsonRemedy.health_check()
# => %{
# status: :healthy,
# layers: [
# %{layer: :content_cleaning, status: :healthy, avg_time_us: 45},
# %{layer: :structural_repair, status: :healthy, avg_time_us: 120},
# # ...
# ],
# performance: %{
# cache_hit_rate: 0.85,
# avg_repair_time_us: 850,
# memory_usage_mb: 12.3
# }
# }
# Performance statistics
stats = JsonRemedy.performance_stats()
# => %{success_rate: 0.94, avg_time_us: 680, cache_hits: 1205}# Detailed error analysis for debugging
case JsonRemedy.repair(malformed_input, logging: true) do
{:ok, data, repairs} ->
analyze_repair_patterns(repairs)
{:success, data}
{:error, reason} ->
case JsonRemedy.analyze_failure(malformed_input) do
{:analyzable, issues} ->
Logger.error("Repair failed: #{inspect(issues)}")
{:partial_analysis, issues}
{:unanalyzable, _} ->
{:complete_failure, reason}
end
end- LLM output malformations (code fences, mixed syntax, comments)
- Legacy system format conversion (Python, JavaScript object literals)
- Human input errors (missing quotes, trailing commas, typos)
- Streaming data issues (incomplete transmission, encoding problems)
- Copy-paste artifacts (doubled quotes, escape sequence issues)
- Invent missing data: Won't guess incomplete key-value pairs
- Fix semantic errors: Won't correct logically invalid data
- Handle arbitrary text: Requires recognizable JSON-like structure
- Guarantee perfect preservation: May alter semantics in edge cases
- Process infinite inputs: Has reasonable size and time limits
- Pragmatic over pure: Uses the optimal technique for each layer
- Correctness over performance: Prioritizes getting the right answer
- Transparency over magic: Comprehensive logging of all changes
- Robustness over efficiency: Graceful handling of edge cases
- Composable over monolithic: Each layer can be used independently
- Production-ready: Comprehensive error handling and monitoring
# Built-in security features
JsonRemedy.repair(input, [
max_size_mb: 10, # Prevent memory exhaustion
timeout_ms: 5000, # Prevent infinite processing
max_nesting_depth: 50, # Prevent stack overflow
disable_custom_rules: true # Disable user rules in untrusted contexts
])JsonRemedy follows a test-driven development approach with comprehensive quality standards:
# Development setup
git clone https://github.com/nshkrdotcom/json_remedy.git
cd json_remedy
mix deps.get
# Run test suites
mix test # All tests
mix test --only unit # Unit tests only
mix test --only integration # Integration tests
mix test --only performance # Performance validation
mix test --only property # Property-based tests
# Quality assurance
mix credo --strict # Code quality
mix dialyzer # Type analysis
mix format --check-formatted # Code formatting
mix test.coverage # Coverage analysis
# Windows CI parity (PowerShell)
mix run examples/windows_ci_examples.exs
# Benchmarking
mix run bench/comprehensive_benchmark.exs
mix run bench/memory_profile.exslib/
├── json_remedy.ex # Main API with pre-processing
├── json_remedy/
│ ├── layer_behaviour.ex # Common interface for all layers
│ ├── utils/
│ │ └── multiple_json_detector.ex # ✅ Pre-processing: Multiple JSON aggregation
│ ├── layer1/
│ │ └── content_cleaning.ex # ✅ Code fences, comments, wrappers
│ ├── layer2/
│ │ └── structural_repair.ex # ✅ Delimiters, nesting, state machine
│ ├── layer3/
│ │ ├── syntax_normalization.ex # ✅ Quotes, booleans, char-by-char parsing
│ │ ├── object_merger.ex # ✅ Pre-processing: Object boundary merging
│ │ ├── ellipsis_filter.ex # ✅ Filter unquoted ellipsis
│ │ └── keyword_filter.ex # ✅ Filter debug keywords
│ ├── layer4/
│ │ └── validation.ex # ✅ Jason.decode optimization
│ ├── layer5/ # ⏳ PLANNED
│ │ └── tolerant_parsing.ex # ⏳ Custom parser with error recovery
│ ├── pipeline.ex # Layer orchestration with pre-processing
│ ├── performance.ex # Monitoring and health checks
│ └── config.ex # Configuration management
# 1. Add repair rule to Layer 3
@repair_rules [
%{
name: "fix_my_pattern",
pattern: ~r/custom_pattern/,
replacement: "fixed_pattern",
condition: &my_condition_check/1
}
# existing rules...
]
# 2. Add test cases
test "fixes my custom pattern" do
input = "input with custom_pattern"
expected = "input with fixed_pattern"
{:ok, result, context} = SyntaxNormalization.process(input, %{repairs: [], options: []})
assert result == expected
assert Enum.any?(context.repairs, &String.contains?(&1.action, "fix_my_pattern"))
end
# 3. Add to API documentation
@doc """
Fix my custom pattern in JSON strings.
"""
@spec fix_my_pattern(input :: String.t()) :: {String.t(), [repair_action()]}
def fix_my_pattern(input), do: apply_rule(input, @my_pattern_rule)- Layer 5 completion (tolerant parsing) - Layer 4 already complete
- Advanced escape sequence handling (
\uXXXX,\xXX) - Concatenated JSON object wrapping
- Performance optimizations for large files
- Enhanced streaming API with better buffering
- Plug middleware for automatic request repair
- Phoenix LiveView helpers and components
- Ecto custom types for automatic JSON repair
- Broadway integration for data pipeline processing
- CLI tool with advanced options
- JSON5 extended syntax support
- Machine learning-based repair pattern detection
- Advanced caching and memoization
- Distributed processing for massive datasets
- Custom DSL for complex repair rules
JsonRemedy is released under the MIT License. See LICENSE for details.
JsonRemedy: Industrial-strength JSON repair for the real world. When your JSON is broken, we fix it right.
Built with ❤️ by developers who understand that perfect JSON is a luxury, but working JSON is a necessity.