Feat/new compiler #92

shreyas-londhe · 2025-03-28T15:39:56Z

Note

Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.

Tooling/Scripts (new):
- Add Bun/TypeScript scripts for generating regex artifacts: circom/scripts/gen-regex.ts, noir/scripts/gen-regex.ts, noir/scripts/gen-inputs.ts.
- Introduce shared utilities (utils/*) for logging, file ops, subprocess, and types.
- Add project config: tsconfig.json, jest.config.ts, jest.setup.ts, bun.lock, and package.json.
Tests (new):
- Add Jest integration and unit tests under scripts/__tests__ for script workflows and utilities.
Compiler (removed):
- Remove legacy Rust compiler package (packages/compiler/*).
Circom Tests (removed):
- Remove numerous legacy Circom test circuits and test cases under packages/circom/tests/*.

Written by Cursor Bugbot for commit 3d37c31.

Summary by CodeRabbit

New Features
- Added comprehensive regex verification circuits for email headers (body hash, subject, sender, recipient, message ID) and email addresses.
- Introduced Noir proving framework support alongside Circom.
- Added compiler infrastructure for generating regex verification circuits from patterns.
Infrastructure & Build
- Updated GitHub Actions workflow for improved testing with Rust and Bun tooling.
- Simplified and consolidated project configuration across build tools.
- Added pre-push Git hooks to enforce circuit regeneration when compiler changes.
Documentation
- Completely revised README with detailed setup, installation, and troubleshooting guidance.
- Added comprehensive documentation for Circom and Noir subprojects.
- Included usage examples and common commands for developers.
Chores
- Cleaned up ESLint, Prettier, and Babel configurations.
- Restructured workspace manifest for monorepo organization.
- Updated .gitignore for Rust and Prover-related artifacts.

socket-security · 2025-03-31T07:26:18Z

Review the following changes in direct dependencies.

Package	Supply Chain Security	Quality
npm/@types/bun@1.3.1
npm/@zk-email/circuits@6.4.1-alpha.0
npm/circom_tester@0.0.20 ⏵ 0.0.24	⁺³	^-1
cargo/thiserror@2.0.12
npm/@types/node@22.13.0 ⏵ 20.19.24
cargo/serde@1.0.219
cargo/serde_json@1.0.140
cargo/wasm-bindgen@0.2.100
npm/bun-types@1.3.1
npm/prettier@3.6.2
npm/typescript@5.9.3
cargo/wasm-bindgen-test@0.2.50
npm/ts-jest@29.4.5
cargo/heck@0.4.1
cargo/regex-syntax@0.8.5
cargo/regex-automata@0.4.7
cargo/clap@4.2.1
cargo/serde-wasm-bindgen@0.6.5

error handling in gen circuit input

- Restore individual test functions in Noir circuit files that were previously removed
- Add circuit input test cases and update sample data
- Update script optimizations for better binary preference
- Add comprehensive manual test guidelines to README files
- Document safe test formatting to prevent accidental deletion by gen-inputs script

Co-Authored-By: Colleague <colleague@example.com>
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

…pts-used-for-generating-circom-and-noir feat: migrate Python scripts to TypeScript with comprehensive testing and tooling improvements

BornPsych · 2025-09-30T15:37:39Z

@cursor please review this

cursor · 2025-09-30T15:37:46Z

I've thoroughly reviewed the "Feat/new compiler" PR. I started by exploring the codebase, examining package.json files for dependency changes, and then delved into the compiler's structure, error handling, and test files. I also paid close attention to the Socket security report and the Cargo.toml files for Rust dependencies.

Here's a summary of my findings:

This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:

Architecture & Design: Modular structure, clear API, and support for multiple frameworks.
Error Handling: Comprehensive error taxonomy with user-friendly messages.
Developer Experience: Excellent CLI tool, automatic template regeneration, and good documentation.
Code Quality: Follows Rust best practices with clean abstractions.

However, I identified several areas for improvement and potential issues:

Security Concerns: The Socket report highlighted the addition of @zk-email/circuits@6.4.1-alpha.0 and an upgrade of circom_tester. Using alpha versions in production is a risk.
Test Coverage: I found no unit tests in the Rust compiler code, and integration tests appear limited.
Dependencies: There's a dependency on a specific GitHub branch (comptime from feat/comptime-codegen), which could lead to instability.
Documentation: Some inline documentation is missing, and there's no architecture diagram.

My main recommendations are to:

Add comprehensive unit tests for core compiler functionality.
Replace alpha dependencies with stable versions.
Address the external GitHub branch dependency.
Add performance benchmarks.

Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.

5 tasks completed

Review PR changes for new compiler feature
Analyze new dependencies added (@types/bun, @zk-email/circuits)
Review Socket security report findings
Verify new compiler implementation and integration
Provide comprehensive review feedback

…re, preserves special groups

…oups inside of public patterns into non capture

…group conversion

The convert_capturing_to_non_capturing function was naively converting all '(' to '(?:' without considering:
- Escaped parentheses like \( which match literal parens
- Parentheses inside character classes like [()] which are literals

This caused regex semantics to break. For example:
- \(foo\) would become \(?:foo\) (wrong)
- [()] would become [(?:)] (matches extra chars)

Fixed by tracking two states during the string scan:
- in_char_class: are we inside [...]?
- escaped: was the previous char a backslash?

Only convert ( to (?: when it is not escaped, not in a char class, and not already followed by ?.

Added 5 new tests covering escaped parens, character classes, and mixed cases. All 23 tests pass.

Addresses PR review feedback: #109 (comment)
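The two-state scan described in this commit can be sketched as follows. This is a hypothetical reimplementation for illustration, not the code from the PR: it tracks `escaped` and `in_char_class`, rewrites a bare `(` to `(?:`, and leaves any group already followed by `?` untouched. As an extra assumption, it treats a `]` that is the first character of a class (optionally after `^`) as a literal, and it does not handle nested classes such as `[a[bc]]`.

```rust
/// Hypothetical sketch: rewrite unescaped `(` outside character classes to `(?:`.
fn convert_capturing_to_non_capturing(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 8);
    let mut escaped = false;
    let mut in_char_class = false;
    // Index where the current class body starts; a `]` there is a literal.
    let mut class_body_start = 0;

    for (i, &c) in chars.iter().enumerate() {
        out.push(c);
        if escaped {
            // The previous character was a backslash, so `c` is a literal.
            escaped = false;
            continue;
        }
        match c {
            '\\' => escaped = true,
            '[' if !in_char_class => {
                in_char_class = true;
                // Skip an optional negation `^` when locating the class body.
                class_body_start = if chars.get(i + 1) == Some(&'^') { i + 2 } else { i + 1 };
            }
            ']' if in_char_class && i != class_body_start => in_char_class = false,
            // `(` was already pushed; appending `?:` turns it into `(?:`.
            '(' if !in_char_class && chars.get(i + 1) != Some(&'?') => out.push_str("?:"),
            _ => {}
        }
    }
    out
}
```

With this sketch, `(a)(?:b)` becomes `(?:a)(?:b)`, while `\(foo\)` and `[()]` come back unchanged.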

Extended convert_capturing_to_non_capturing to handle both PCRE-style (?<name>...) and Rust-style (?P<name>...) named capture groups, not just bare capturing groups. This prevents named captures in private patterns from interfering with public capture group numbering.

Key changes:
- Detect and convert both named capture syntaxes to (?:...)
- Distinguish lookbehind assertions (?<=, ?<!) from named captures
- Updated test_preserve_named_groups to test_convert_pcre_named_groups with correct expected behavior
- Added 7 new tests covering both named capture styles and edge cases

Fixes code review finding that named captures were being preserved instead of converted, which would cause capture group numbering mismatches in generated circuits.
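The prefix check that separates named captures from lookbehind assertions might look roughly like this. It is a hypothetical helper (the `GroupPrefix` enum and `classify_group` are illustrative names, not from the PR). Given the text right after an unescaped `(`, it reports a plain capturing group, a PCRE-style `(?<name>` or Rust-style `(?P<name>` named capture along with how many characters to drop, a lookbehind, or some other `(?` construct. The key ordering detail is that `?<=` and `?<!` are tested before the generic `?<`.

```rust
/// Illustrative classification of what follows an unescaped `(`.
#[derive(Debug, PartialEq)]
enum GroupPrefix {
    /// Plain capturing group `(`.
    Capturing,
    /// Named capture; `skip` counts chars after `(` up to and including `>`.
    NamedCapture { skip: usize },
    /// Lookbehind `(?<=` or `(?<!`; must be preserved as-is.
    Lookbehind,
    /// Any other `(?...` construct (non-capturing group, flags, lookahead).
    OtherSpecial,
}

fn classify_group(after_paren: &str) -> GroupPrefix {
    let name_start = if after_paren.starts_with("?P<") {
        3
    } else if after_paren.starts_with("?<=") || after_paren.starts_with("?<!") {
        return GroupPrefix::Lookbehind;
    } else if after_paren.starts_with("?<") {
        2
    } else if after_paren.starts_with('?') {
        return GroupPrefix::OtherSpecial;
    } else {
        return GroupPrefix::Capturing;
    };
    // Find the `>` closing the group name; without one, leave the group alone.
    match after_paren[name_start..].find('>') {
        Some(rel) => GroupPrefix::NamedCapture { skip: name_start + rel + 1 },
        None => GroupPrefix::OtherSpecial,
    }
}
```

A caller that gets `NamedCapture { skip }` would emit `(?:` and resume copying `skip` characters after the opening parenthesis. `Lookbehind` and `OtherSpecial` groups are copied unchanged.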

…ure-groups-inside-of-private-patterns Rutefig/reg 574 fix prevent capture groups inside of private patterns

coderabbitai · 2025-10-31T09:46:07Z

Walkthrough

This PR introduces a comprehensive compiler infrastructure for generating zero-knowledge regex circuits. It restructures the project around a Rust-based compiler that converts regex patterns into circuit code for both Circom and Noir frameworks, adds supporting tooling, test infrastructure, and documentation across multiple proof systems.

Changes

Cohort / File(s)	Summary
Configuration & Tooling Cleanup `.cargo/config`, `.eslintrc.json`, `.eslintignore`, `babel.config.js`, `.prettierrc`	Removed deprecated/obsolete configuration files for Cargo, ESLint, Prettier, and Babel
Build Configuration Updates `Cargo.toml`, `.gitignore`, `bunfig.toml`	Restructured workspace to single compiler member; updated dependency management; revised ignore patterns; added Bun test timeout config
GitHub Actions `.github/workflows/test.yaml`, `.github/workflows/test.yml`	Replaced Node/Yarn-based workflow with Rust/Bun-based CI pipeline including Circom 2.1.9 setup and caching
Root Documentation `README.md`, `LICENSE`	Comprehensive README rewrite with installation, setup, and troubleshooting guidance; minor license URL standardization
Compiler Core `compiler/src/lib.rs`, `compiler/src/driver.rs`, `compiler/src/error.rs`, `compiler/src/types.rs`, `compiler/src/utils.rs`, `compiler/src/wasm.rs`, `compiler/Cargo.toml`, `compiler/README.md`, `compiler/package.json`	Complete Rust compiler library for regex-to-circuit generation; public API for compilation, decomposed regex handling, circuit input generation, and WASM bindings; structured error handling with error codes; utility functions for template validation and output serialization
Compiler IR Pipeline `compiler/src/ir/mod.rs`, `compiler/src/ir/nfa.rs`, `compiler/src/ir/intermediate.rs`, `compiler/src/ir/graph.rs`	Intermediate representation and NFA graph structures; epsilon-transition elimination; DFA construction with capture group support; path-finding and reachability analysis
Compiler Passes `compiler/src/passes/mod.rs`, `compiler/src/passes/error.rs`, `compiler/src/passes/builder.rs`	NFA builder pipeline from regex patterns; Thompson NFA conversion; error types and translation to compiler errors
Backend Code Generation `compiler/src/backend/mod.rs`, `compiler/src/backend/circom.rs`, `compiler/src/backend/noir.rs`, `compiler/src/backend/shared.rs`	Code generators for Circom and Noir proving frameworks; circuit data extraction and input generation; shared logic for state transitions and capture group handling
CLI Binary `compiler/src/bin/zk-regex.rs`	Command-line interface with subcommands for decomposed/raw regex compilation and circuit input generation
Circom Integration `circom/Cargo.toml` (not in diff, implied by package.json), `circom/package.json`, `circom/README.md`, `circom/tsconfig.json`, `circom/bunfig.toml`, `circom/index.ts`	Circom-specific package configuration, documentation, and test setup
Circom Regex Templates `circom/circuits/common/body_hash_regex.circom`, `circom/circuits/common/email_addr_regex.circom`, `circom/circuits/common/email_domain_regex.circom`, `circom/circuits/common/from_all_regex.circom`, `circom/circuits/common/message_id_regex.circom`, `circom/circuits/common/reversed_bracket_regex.circom`, `circom/circuits/common/simple_regex.circom`, `circom/circuits/common/subject_all_regex.circom`, `circom/circuits/common/to_all_regex.circom`	Circom 2.1.5 templates implementing DFA-based regex matching with capture group support; deterministic state transitions; per-byte validity checking; final accept-state validation
Circom DFA Graphs `circom/circuits/common/*_graph.json`	JSON representations of deterministic finite automata for regex patterns (email, subject, body hash, etc.); state transitions and capture group metadata
Circom Regex Helpers `circom/circuits/regex_helpers.circom`	Utility templates for regex matching: MultiOR aggregation, byte/range transition checking with optional capture group validation, and substring capture extraction
Circom Tests `circom/circuits/tests/body_hash.test.ts`, `circom/circuits/tests/circuits/test_body_hash_regex.circom`	Test suite for body hash regex using circom_tester; witness computation and constraint validation
Circom Regex Definitions `circom/regexes/*.json`	Multi-part regex patterns (body hash, email address, domains, headers) defining pattern sequences and capture group constraints
Circom Code Generation Script `circom/scripts/gen-regex.ts`	TypeScript script to invoke zk-regex compiler for each JSON regex, building release binary and generating Circom circuit code
Circom .gitignore `circom/.gitignore`	Comprehensive ignore patterns for Circom build artifacts, dependencies, caches, IDE files
Noir Integration `noir/Nargo.toml`, `noir/README.md`	Noir project manifest with compiler version and sort dependency; documentation for Noir test integration
Noir Pattern Configurations `noir/common/*.json`	Multi-part regex pattern definitions for Noir (body hash, email address/domain, message-id, etc.)
Noir Sample Data `noir/common/sample_haystacks/body_hash.json`, `noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json`	Sample test inputs and expected circuit input traces for Noir test generation
Git Hooks `git-hooks/pre-push`	Pre-push hook enforcing compiler rebuild and template regeneration; blocks push if changes detected to enable review

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as zk-regex CLI
    participant Driver
    participant Builder as NFABuilder
    participant Backend as Circom/Noir<br/>Backend
    participant Output

    User->>CLI: gen-from-raw(pattern, template, framework)
    CLI->>Driver: compile(pattern, config)
    Driver->>Builder: NFAGraph::build(pattern)
    Builder->>Builder: Thompson NFA → IntermediateNFA
    Builder->>Builder: Epsilon elimination
    Builder->>Output: NFAGraph (with states,<br/>transitions, captures)
    Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
    Backend->>Backend: extract start/accept states
    Backend->>Backend: generate per-byte<br/>transition logic
    Backend->>Backend: emit capture<br/>group handling
    Backend->>Output: Circom/Noir code
    Output-->>User: (NFAGraph, Code)

    User->>CLI: gen-circuit-inputs(nfa, haystack, framework)
    CLI->>Driver: generate_circuit_inputs(...)
    Driver->>Builder: get_path_to_accept(haystack)
    Builder->>Builder: PikeVM traverse haystack
    Builder->>Builder: extract state path &<br/>capture metadata
    Driver->>Output: CircomInputs / NoirInputs
    Output-->>User: Prover-ready JSON
```
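The circuit-input half of the diagram comes down to recovering one accepting state path for a haystack. The diagram says the compiler does this with a PikeVM traversal. The block below swaps in a plain breadth-first search over (position, state) pairs on a hypothetical epsilon-free byte NFA (`Nfa` and `path_to_accept` are illustrative names). It also requires the whole haystack to match instead of searching for a substring, so it only shows the shape of the result: one state per byte boundary.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical epsilon-free NFA: `transitions[s]` lists (byte, next_state) pairs.
struct Nfa {
    transitions: Vec<Vec<(u8, usize)>>,
    start_states: Vec<usize>,
    accept_states: Vec<usize>,
}

/// Returns the state sequence (len = haystack.len() + 1) of an accepting run
/// over the whole haystack, found by breadth-first search, or None.
fn path_to_accept(nfa: &Nfa, haystack: &[u8]) -> Option<Vec<usize>> {
    // parent[(pos, state)] = state at pos - 1 that led here (None for starts).
    let mut parent: HashMap<(usize, usize), Option<usize>> = HashMap::new();
    let mut queue = VecDeque::new();
    for &s in &nfa.start_states {
        if parent.insert((0, s), None).is_none() {
            queue.push_back((0, s));
        }
    }
    while let Some((pos, state)) = queue.pop_front() {
        if pos == haystack.len() {
            if nfa.accept_states.contains(&state) {
                // Follow parent links back to a start state, then reverse.
                let mut path = vec![state];
                let (mut p, mut s) = (pos, state);
                while let Some(Some(prev)) = parent.get(&(p, s)) {
                    path.push(*prev);
                    p -= 1;
                    s = *prev;
                }
                path.reverse();
                return Some(path);
            }
            continue;
        }
        for &(byte, next) in &nfa.transitions[state] {
            if byte == haystack[pos] && !parent.contains_key(&(pos + 1, next)) {
                parent.insert((pos + 1, next), Some(state));
                queue.push_back((pos + 1, next));
            }
        }
    }
    None
}
```

For an NFA with edges 0 -a-> 1 -b-> 2 and accept state 2, the haystack `ab` yields the path `[0, 1, 2]`.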

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

Compiler IR & NFA Implementation (compiler/src/ir/, compiler/src/passes/): Complex epsilon-transition elimination, path-finding logic, and state reachability analysis with non-obvious correctness constraints.
Backend Code Generation (compiler/src/backend/circom.rs, compiler/src/backend/noir.rs): Dense circuit code emission with conditional logic for capture groups, state aggregation, and per-byte transition mapping; correctness critical for circuit validity.
Circom Template Logic (circom/circuits/common/*_regex.circom): Complex per-position state tracking, transition validity accumulation, and accept-state verification; errors could silently fail circuit constraints.
Driver & Error Handling (compiler/src/driver.rs, compiler/src/error.rs): Error mapping and recovery paths across multiple compilation stages.
WASM Bindings (compiler/src/wasm.rs): Interface between JavaScript and Rust compiler; serialization/deserialization edge cases and error translation.
Git Hook Script (git-hooks/pre-push): Stash/unstash logic and error recovery paths in shell scripting.
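Epsilon elimination, the first item above, is hard to check by eye. The usual construction computes each state's epsilon closure, then copies the byte transitions and the acceptance of every state in that closure onto the state itself. The sketch below shows this textbook technique on a hypothetical adjacency-list NFA (`EpsNfa`, `epsilon_closure`, and `eliminate_epsilons` are illustrative names, not the PR's `IntermediateNFA` API). It ignores capture-group bookkeeping, which the real pass also has to carry through.

```rust
use std::collections::BTreeSet;

/// Hypothetical NFA with explicit epsilon edges.
struct EpsNfa {
    epsilon: Vec<Vec<usize>>,     // epsilon[s]: states one epsilon edge away
    bytes: Vec<Vec<(u8, usize)>>, // bytes[s]: (byte, next) transitions
    accept: BTreeSet<usize>,
}

/// All states reachable from `s` using only epsilon edges, including `s`.
fn epsilon_closure(nfa: &EpsNfa, s: usize) -> BTreeSet<usize> {
    let mut closure = BTreeSet::from([s]);
    let mut stack = vec![s];
    while let Some(u) = stack.pop() {
        for &v in &nfa.epsilon[u] {
            if closure.insert(v) {
                stack.push(v);
            }
        }
    }
    closure
}

/// Removes epsilon edges: state s inherits every byte transition in its
/// closure and is accepting if its closure contains an accepting state.
fn eliminate_epsilons(nfa: &EpsNfa) -> (Vec<Vec<(u8, usize)>>, BTreeSet<usize>) {
    let n = nfa.bytes.len();
    let mut bytes = vec![Vec::new(); n];
    let mut accept = BTreeSet::new();
    for s in 0..n {
        let closure = epsilon_closure(nfa, s);
        for &c in &closure {
            for &t in &nfa.bytes[c] {
                if !bytes[s].contains(&t) {
                    bytes[s].push(t);
                }
            }
        }
        if closure.iter().any(|c| nfa.accept.contains(c)) {
            accept.insert(s);
        }
    }
    (bytes, accept)
}
```

Unreachable states left behind by this step are what a later compaction pass would drop.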

Poem

🐰 A regex becomes a circuit, a pattern becomes proof,
Thompson builds the scaffold, epsilon fades aloof,
States dance through the bytes, captures held tight,
Zero-knowledge promised in circuits of might! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title Check	❓ Inconclusive	The PR introduces a comprehensive new compiler system for generating zero-knowledge regex verification circuits across multiple proving frameworks (Circom and Noir), representing a substantial architectural addition to the codebase. The title "Feat/new compiler" does reference a real component of the changeset, but it uses vague terminology that fails to convey meaningful information about what the compiler does, why it was added, or its significance. A teammate scanning the repository history would not understand from this title alone that the PR introduces a full zk-regex compiler with circuit generation capabilities, multiple framework support, or its role in the project. Additionally, the title uses a branch-naming convention format (feat/...) rather than a proper PR title format.	Consider revising the title to be more descriptive and specific, such as "Add ZK-Regex compiler for Circom and Noir circuit generation" or "Implement compiler for regex verification circuit generation" to clearly communicate the purpose and scope of this major addition. This would help future reviewers quickly understand the primary contribution when scanning the commit history.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

cursor

This PR is being reviewed by Cursor Bugbot

cursor · 2025-10-31T09:58:06Z

compiler/src/bin/zk-regex.rs

+                            eprintln!("Supported frameworks: circom, noir");
+                            std::process::exit(1);
+                        }
+                    };


Bug

The proving_framework argument is parsed inconsistently across commands. While Decomposed and Raw use ProvingFramework::from_str, GenerateCircuitInput manually parses it with a match statement. This duplicates logic and creates maintenance challenges, potentially leading to behavioral inconsistencies.
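One way to remove the duplication is to route every subcommand through a single `FromStr` implementation. The sketch below uses a hypothetical stand-in for `ProvingFramework` (the PR's real enum and its error type may differ) and a hypothetical `parse_framework_or_exit` helper. The helper keeps the existing "Supported frameworks: circom, noir" hint and exit code 1.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the compiler's `ProvingFramework` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProvingFramework {
    Circom,
    Noir,
}

impl FromStr for ProvingFramework {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "circom" => Ok(ProvingFramework::Circom),
            "noir" => Ok(ProvingFramework::Noir),
            other => Err(format!(
                "Unsupported proving framework '{other}'. Supported frameworks: circom, noir"
            )),
        }
    }
}

/// Single parsing path intended for Decomposed, Raw, and GenerateCircuitInput alike.
fn parse_framework_or_exit(raw: &str) -> ProvingFramework {
    raw.parse().unwrap_or_else(|err: String| {
        eprintln!("{err}");
        std::process::exit(1)
    })
}
```

Because the match lives in one place, adding a third framework later would only touch `from_str`.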

cursor · 2025-10-31T09:58:06Z

noir/scripts/gen-inputs.ts

+}
+
+// Run the script if this file is executed directly
+if (import.meta.url === `file://${process.argv[1]}`) {


Bug

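The direct-execution check for the main function in gen-inputs.ts and gen-regex.ts compares import.meta.url (a URL string) with a hand-built file:// string made from process.argv[1] (a file path). The two only match when the path needs no URL encoding or normalization: a path containing spaces, for example, is percent-encoded in import.meta.url, and Windows paths take a different form. When they differ, the scripts' main functions never run when executed directly.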

Additional Locations (1)

noir/scripts/gen-regex.ts#L203-L204

coderabbitai

Actionable comments posted: 25

🧹 Nitpick comments (12)

circom/regexes/reversed_bracket.json (1)

5-5: Consider whether greedy matching is intentional.

The pattern <.* uses greedy matching (.*) which will match everything after the < character. If the intent is to match content only up to the next delimiter or end of string, you might want to use a more constrained pattern.

However, if the greedy behavior is intentional for your use case, this is fine as-is.

circom/circuits/common/to_all_graph.json (1)

1-50: Clarify generation and maintenance process for DFA graph file.

This 2256-line JSON file encodes a deterministic finite automaton (DFA) for the regex pattern (?:\r\n|^)to:([^\r\n]+)\r\n. The file structure suggests automated generation rather than manual authoring.

Key concerns:

Is this auto-generated? If yes, add build artifacts to .gitignore or document the generation process.

Maintenance model: If manually maintained, this is unmaintainable. If generated, provide:

Documentation on the generator tool

Instructions for regeneration when regex patterns change

Validation that generated states are correct and complete

Reachability verification: Confirm all states are reachable and accept_states are terminal.

Without clarity on ownership, future maintainers won't know how to update this when regex patterns change.
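A reachability check along the lines suggested above could look like the following. It assumes the `*_graph.json` contents have already been parsed into a hypothetical `Dfa` struct with one adjacency list per state (the real JSON schema is not reproduced here). It reports states that no start state reaches, and states from which no accept state can be reached.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory form of a parsed `*_graph.json`:
/// `edges[s]` lists the target states of s's byte transitions.
struct Dfa {
    edges: Vec<Vec<usize>>,
    start_states: Vec<usize>,
    accept_states: Vec<usize>,
}

/// Breadth-first search over `adj` from `roots`; returns a visited flag per state.
fn bfs(adj: &[Vec<usize>], roots: &[usize]) -> Vec<bool> {
    let mut seen = vec![false; adj.len()];
    let mut queue: VecDeque<usize> = VecDeque::new();
    for &r in roots {
        if !seen[r] {
            seen[r] = true;
            queue.push_back(r);
        }
    }
    while let Some(u) = queue.pop_front() {
        for &v in &adj[u] {
            if !seen[v] {
                seen[v] = true;
                queue.push_back(v);
            }
        }
    }
    seen
}

/// Returns (unreachable, dead): states not reachable from any start state,
/// and states from which no accept state is reachable.
fn check_dfa(dfa: &Dfa) -> (Vec<usize>, Vec<usize>) {
    let forward = bfs(&dfa.edges, &dfa.start_states);
    // Reverse every edge so a BFS from the accept states finds co-reachable states.
    let mut reverse = vec![Vec::new(); dfa.edges.len()];
    for (u, targets) in dfa.edges.iter().enumerate() {
        for &v in targets {
            reverse[v].push(u);
        }
    }
    let backward = bfs(&reverse, &dfa.accept_states);
    let unreachable = (0..dfa.edges.len()).filter(|&s| !forward[s]).collect();
    let dead = (0..dfa.edges.len()).filter(|&s| !backward[s]).collect();
    (unreachable, dead)
}
```

Running a check like this in CI after regeneration would flag a graph file that drifted from its regex.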

circom/regexes/message_id.json (1)
1-7: Consider expanding the message-id character class to align with RFC 5322.

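The PublicPattern on line 4 uses <[A-Za-z0-9=@\\.\\+_-]+>, which may be too restrictive. RFC 5322 allows additional characters in message-ids: the left-hand side is dot-atom-text, whose atext set also includes characters such as /, !, #, $, %, ? and ~, and the right-hand side may be a bracketed domain literal that can contain colons (:). Modern message-ids may include these characters.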

If stricter validation is intended, this is acceptable. Otherwise, consider expanding the pattern:
-        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_-]+>", 128] },
+        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_\\-:/]+>", 128] },
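To gauge how many real message-ids the current class rejects, a whole-string check that mirrors both character sets can be run over sample headers. This is a hypothetical helper written without the regex crate. It checks a complete `<...>` token in isolation, which is a simplification of how the pattern is used inside the decomposed regex.

```rust
/// Punctuation the current PublicPattern allows besides ASCII letters and digits.
const CURRENT: &[u8] = b"=@.+_-";
/// The same set with the `:` and `/` additions from the suggested diff.
const EXPANDED: &[u8] = b"=@.+_-:/";

/// Whole-string check for `<[A-Za-z0-9{extra}]+>`.
fn matches_message_id(s: &str, extra: &[u8]) -> bool {
    match s.as_bytes() {
        // Opening `<`, a non-empty body, and a closing `>`.
        [b'<', body @ .., b'>'] => {
            !body.is_empty()
                && body.iter().all(|b| b.is_ascii_alphanumeric() || extra.contains(b))
        }
        _ => false,
    }
}
```

For example, `<abc/def@example.com>` fails with `CURRENT` but passes with `EXPANDED`.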
circom/README.md (1)

1-62: LGTM - Comprehensive documentation with minor style improvements available.

The README provides clear installation instructions, prerequisites, and troubleshooting guidance. The content is accurate and aligns well with the Circom integration introduced in this PR.

For improved markdown consistency, consider fixing the indentation issues flagged by markdownlint. Nested list items should use 2-space indentation instead of 4-space:

Lines 13-14, 19-23, 27-31, 35-40 have 4-space indentation

Line 51 uses emphasis instead of a proper heading

These are purely stylistic and can be addressed in a future cleanup if desired.

noir/common/message_id.json (1)

1-7: Pattern duplication with circom/regexes/message_id.json.

This file is identical to circom/regexes/message_id.json. While having separate regex definitions for Circom and Noir toolchains is reasonable, consider whether these could be maintained in a single source to prevent drift.

Additionally, the same character class limitation applies here: the pattern may be too restrictive for modern message-ids per RFC 5322. See the review comment on circom/regexes/message_id.json for details.
compiler/README.md (2)
13-40: Fix markdown list indentation to comply with MD007.

Lines use 4-space indentation for nested list items; markdown style guide requires 2 spaces. Update the list indentation throughout (lines 13–40, 51, 65, 81, 97) for consistency.
-    -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
-        Parses the input regex `pattern` string.
+  -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
+      Parses the input regex `pattern` string.
51-51: Replace bold text with heading syntax for "Examples" sections (MD036).

Lines 51, 65, 81, and 97 use bold text (e.g., **Example 1:**) instead of proper heading syntax. Convert to heading level (e.g., ### Example 1:) for better accessibility and document structure.
-**Example 1: Compile a simple regex to NFA**
+### Example 1: Compile a simple regex to NFA
Also applies to: 65-65, 81-81, 97-97
compiler/src/utils.rs (2)
168-196: Consider validating the file_extension parameter.

The function accepts file_extension without validation. Invalid extensions could create confusion (e.g., extensions with path separators, empty strings, or non-standard characters).

Add basic validation:
 pub fn save_outputs(
     nfa: &NFAGraph,
     circom_code: String,
     output_dir: &PathBuf,
     template_name: &str,
     file_extension: &str,
 ) -> Result<(), Box<dyn std::error::Error>> {
     validate_cli_template_name(template_name)?;
+    
+    // Validate file extension
+    if file_extension.is_empty() || file_extension.contains('/') || file_extension.contains('\\') {
+        return Err("Invalid file extension".into());
+    }

     // Create output directory if it doesn't exist
198-464: Strong test coverage! Consider adding edge case tests.

The test suite is comprehensive and well-organized. To complement the issues raised above, consider adding tests for:

Escaped brackets in character classes: r"[\[]" and r"[\]]"

Character class starting with ]: r"[]a]"

Malformed named captures: r"(?<name...)" (missing >)

Empty template name validation
compiler/src/wasm.rs (1)
155-189: Minor optimization: avoid cloning max_substring_bytes twice.

The code clones max_substring_bytes on line 161 (and 180 for Noir), but then the original is also referenced in the closure. Since it's only Option<Vec<usize>>, the clone is cheap, but the pattern is slightly redundant.

Consider restructuring to avoid the clone:
+    let num_capture_groups = nfa.num_capture_groups;
+    let max_bytes_len = max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0);
+
     let code = match proving_framework {
         ProvingFramework::Circom => {
-            let max_bytes_clone = config.max_bytes.clone();
             crate::backend::generate_circom_code(
                 &nfa,
                 &template_name.0,
                 &raw_regex.0,
-                max_substring_bytes.clone(),
+                max_substring_bytes,
             )
             .map_err(|nfa_err| {
                 match nfa_err {
                     crate::passes::NFAError::InvalidCapture(_) => {
                         CompilerError::invalid_capture_config(
-                            nfa.num_capture_groups,
-                            max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0),
+                            num_capture_groups,
+                            max_bytes_len,
                         )
                     }
compiler/src/lib.rs (1)
73-88: Consider exposing the optimize parameter.

The optimize flag is hardcoded to true in the CompilationConfig. Depending on use cases, users might want to disable optimization for debugging or to compare output.

Consider adding an optional parameter:
pub fn gen_from_raw(
    pattern: &str,
    max_bytes: Option<Vec<usize>>,
    template_name: &str,
    proving_framework: ProvingFramework,
    optimize: Option<bool>,
) -> CompilerResult<(NFAGraph, String)> {
    let config = CompilationConfig {
        template_name: template_name.to_string(),
        proving_framework,
        max_bytes,
        optimize: optimize.unwrap_or(true),
    };
    // ...
}
compiler/src/ir/intermediate.rs (1)
340-396: State compaction logic is correct but clones nodes.

The compact_states method correctly remaps all state indices and transitions. The node cloning on line 353 is safe but could be optimized if this becomes a performance bottleneck.

If performance is critical, consider using std::mem::take:
let mut node = std::mem::take(&mut self.nodes[old_idx]);
However, std::mem::take requires the node type to implement Default and leaves placeholder nodes in self.nodes until compaction finishes, so the current clone approach is the safer choice.
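For comparison, a take-based compaction can move each surviving node out of the old vector instead of cloning it, then swap in the compacted vector in one step so callers never see the placeholders. This is a hypothetical sketch (`Node`, `compact`, and the `keep` mask are illustrative, not the PR's `IntermediateNFA` types). It assumes kept nodes only point to other kept nodes and panics otherwise.

```rust
use std::collections::HashMap;

/// Hypothetical node: outgoing transitions as (byte, target index).
#[derive(Debug, Default, Clone, PartialEq)]
struct Node {
    transitions: Vec<(u8, usize)>,
}

/// Drops nodes where `keep[i]` is false, renumbers transition targets,
/// and returns the old-index -> new-index map.
fn compact(nodes: &mut Vec<Node>, keep: &[bool]) -> HashMap<usize, usize> {
    // Old index -> new index for every surviving node.
    let remap: HashMap<usize, usize> = (0..nodes.len())
        .filter(|&i| keep[i])
        .enumerate()
        .map(|(new, old)| (old, new))
        .collect();

    let mut compacted = Vec::with_capacity(remap.len());
    for old in 0..nodes.len() {
        if !keep[old] {
            continue;
        }
        // Moves the node out, leaving `Node::default()` in the old slot.
        let mut node = std::mem::take(&mut nodes[old]);
        for t in node.transitions.iter_mut() {
            t.1 = remap[&t.1];
        }
        compacted.push(node);
    }
    // Replace the old vector (and its placeholders) in one step.
    *nodes = compacted;
    remap
}
```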

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb1bda and 3d37c31.

⛔ Files ignored due to path filters (5)

.DS_Store is excluded by !**/.DS_Store
Cargo.lock is excluded by !**/*.lock
bun.lock is excluded by !**/*.lock
circom/bun.lock is excluded by !**/*.lock
packages/circom/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (78)

.cargo/config (0 hunks)
.eslintignore (0 hunks)
.eslintrc.json (0 hunks)
.github/workflows/test.yaml (0 hunks)
.github/workflows/test.yml (1 hunks)
.gitignore (1 hunks)
.prettierrc (0 hunks)
Cargo.toml (1 hunks)
LICENSE (1 hunks)
README.md (2 hunks)
babel.config.js (0 hunks)
bunfig.toml (1 hunks)
circom/.gitignore (1 hunks)
circom/README.md (1 hunks)
circom/bunfig.toml (1 hunks)
circom/circuits/common/body_hash_regex.circom (1 hunks)
circom/circuits/common/email_addr_graph.json (1 hunks)
circom/circuits/common/email_addr_regex.circom (1 hunks)
circom/circuits/common/email_domain_graph.json (1 hunks)
circom/circuits/common/email_domain_regex.circom (1 hunks)
circom/circuits/common/from_all_graph.json (1 hunks)
circom/circuits/common/from_all_regex.circom (1 hunks)
circom/circuits/common/message_id_graph.json (1 hunks)
circom/circuits/common/message_id_regex.circom (1 hunks)
circom/circuits/common/reversed_bracket_regex.circom (1 hunks)
circom/circuits/common/simple_graph.json (1 hunks)
circom/circuits/common/simple_regex.circom (1 hunks)
circom/circuits/common/subject_all_graph.json (1 hunks)
circom/circuits/common/subject_all_regex.circom (1 hunks)
circom/circuits/common/to_all_graph.json (1 hunks)
circom/circuits/common/to_all_regex.circom (1 hunks)
circom/circuits/regex_helpers.circom (1 hunks)
circom/circuits/tests/body_hash.test.ts (1 hunks)
circom/circuits/tests/circuits/test_body_hash_regex.circom (1 hunks)
circom/index.ts (1 hunks)
circom/package.json (1 hunks)
circom/regexes/body_hash.json (1 hunks)
circom/regexes/email_addr.json (1 hunks)
circom/regexes/email_domain.json (1 hunks)
circom/regexes/from_all.json (1 hunks)
circom/regexes/message_id.json (1 hunks)
circom/regexes/reversed_bracket.json (1 hunks)
circom/regexes/simple.json (1 hunks)
circom/regexes/subject_all.json (1 hunks)
circom/regexes/to_all.json (1 hunks)
circom/scripts/gen-regex.ts (1 hunks)
circom/tsconfig.json (1 hunks)
compiler/Cargo.toml (1 hunks)
compiler/README.md (1 hunks)
compiler/package.json (1 hunks)
compiler/src/backend/circom.rs (1 hunks)
compiler/src/backend/mod.rs (1 hunks)
compiler/src/backend/noir.rs (1 hunks)
compiler/src/backend/shared.rs (1 hunks)
compiler/src/bin/zk-regex.rs (1 hunks)
compiler/src/driver.rs (1 hunks)
compiler/src/error.rs (1 hunks)
compiler/src/ir/graph.rs (1 hunks)
compiler/src/ir/intermediate.rs (1 hunks)
compiler/src/ir/mod.rs (1 hunks)
compiler/src/ir/nfa.rs (1 hunks)
compiler/src/lib.rs (1 hunks)
compiler/src/passes/builder.rs (1 hunks)
compiler/src/passes/error.rs (1 hunks)
compiler/src/passes/mod.rs (1 hunks)
compiler/src/types.rs (1 hunks)
compiler/src/utils.rs (1 hunks)
compiler/src/wasm.rs (1 hunks)
git-hooks/pre-push (1 hunks)
noir/Nargo.toml (1 hunks)
noir/README.md (1 hunks)
noir/common/body_hash.json (1 hunks)
noir/common/email_addr.json (1 hunks)
noir/common/email_domain.json (1 hunks)
noir/common/from_all.json (1 hunks)
noir/common/message_id.json (1 hunks)
noir/common/sample_haystacks/body_hash.json (1 hunks)
noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json (1 hunks)

💤 Files with no reviewable changes (6)

.github/workflows/test.yaml
.prettierrc
.eslintignore
.eslintrc.json
babel.config.js
.cargo/config

🧰 Additional context used

🧬 Code graph analysis (16)

compiler/src/passes/builder.rs (3)
- compiler/src/backend/circom.rs (2): nfa (229-231), nfa (233-235)
- compiler/src/backend/noir.rs (2): nfa (191-193), nfa (195-197)
- compiler/src/ir/intermediate.rs (1): new (53-61)

compiler/src/ir/nfa.rs (1)
- compiler/src/ir/intermediate.rs (1): new (53-61)

compiler/src/ir/mod.rs (2)
- compiler/src/backend/circom.rs (2): nfa (229-231), nfa (233-235)
- compiler/src/backend/noir.rs (2): nfa (191-193), nfa (195-197)

compiler/src/bin/zk-regex.rs (6)
- compiler/src/lib.rs (3): gen_circuit_inputs (160-297), gen_from_decomposed (114-126), gen_from_raw (73-88)
- compiler/src/utils.rs (2): save_outputs (168-196), validate_cli_template_name (156-166)
- compiler/src/types.rs (1): from_str (51-59)
- compiler/src/error.rs (2): code (139-148), user_message (151-224)
- compiler/src/ir/nfa.rs (1): from_json (80-84)
- compiler/src/passes/error.rs (1): from (53-166)

compiler/src/backend/shared.rs (5)
- compiler/src/passes/error.rs (1): from (53-166)
- compiler/src/backend/noir.rs (8): inputs (272-276), inputs (283-287), inputs (291-295), nfa (191-193), nfa (195-197), start_states (350-353), accept_states (379-382), start (430-430)
- compiler/src/backend/circom.rs (2): nfa (229-231), nfa (233-235)
- compiler/src/ir/intermediate.rs (1): new (53-61)
- compiler/src/ir/nfa.rs (1): new (63-71)

circom/circuits/tests/body_hash.test.ts (1)
- compiler/src/wasm.rs (1): genCircuitInputs (198-217)

compiler/src/driver.rs (6)
- compiler/src/lib.rs (1): compile (39-41)
- compiler/src/backend/circom.rs (3): nfa (229-231), nfa (233-235), generate_circom_code (62-432)
- compiler/src/backend/noir.rs (3): nfa (191-193), nfa (195-197), generate_noir_code (30-265)
- compiler/src/passes/builder.rs (1): build (21-37)
- compiler/src/error.rs (3): code (139-148), invalid_capture_config (278-290), circuit_generation_failed (268-276)
- compiler/src/passes/error.rs (1): from (53-166)

circom/scripts/gen-regex.ts (6)
- compiler/src/bin/zk-regex.rs (1): main (83-225)
- scripts/utils/logger.ts (2): logger (41-41), error (36-38)
- scripts/utils/file-operations.ts (3): fileExists (75-82), ensureDirectory (24-30), listFilesWithExtension (99-104)
- scripts/utils/types.ts (1): ScriptError (69-74)
- scripts/utils/string-utils.ts (1): toPascalCase (15-19)
- scripts/utils/subprocess.ts (1): executeCargo (196-267)

compiler/src/lib.rs (6)
- compiler/src/driver.rs (1): compile (59-130)
- compiler/src/passes/builder.rs (1): build (21-37)
- compiler/src/backend/shared.rs (3): from (33-44), from (48-59), generate_circuit_inputs (127-228)
- compiler/src/passes/error.rs (1): from (53-166)
- compiler/src/utils.rs (1): decomposed_to_composed_regex (131-154)
- compiler/src/error.rs (2): input_too_long (292-300), no_match_found (302-310)

compiler/src/utils.rs (3)
- compiler/src/ir/intermediate.rs (1): new (53-61)
- compiler/src/ir/nfa.rs (1): new (63-71)
- compiler/src/lib.rs (1): gen_from_decomposed (114-126)

compiler/src/ir/graph.rs (5)
- compiler/src/backend/circom.rs (2): nfa (229-231), nfa (233-235)
- compiler/src/backend/noir.rs (3): nfa (191-193), nfa (195-197), start (430-430)
- compiler/src/ir/intermediate.rs (1): new (53-61)
- compiler/src/ir/nfa.rs (2): new (63-71), node (96-99)
- compiler/src/backend/shared.rs (2): path (163-163), path (164-164)

compiler/src/passes/error.rs (2)
- compiler/src/wasm.rs (5): from (21-26), from (30-42), from (54-56), from (60-62), from (66-68)
- compiler/src/error.rs (2): nfa_construction_failed (259-266), serialization_error (312-318)

compiler/src/backend/noir.rs (1)
- compiler/src/backend/shared.rs (4): escape_regex_for_display (230-243), generate_circuit_data (62-125), from (33-44), from (48-59)

compiler/src/backend/circom.rs (1)
- compiler/src/backend/shared.rs (2): escape_regex_for_display (230-243), generate_circuit_data (62-125)

compiler/src/ir/intermediate.rs (3)

compiler/src/backend/circom.rs (2)

nfa (229-231)

nfa (233-235)

compiler/src/backend/noir.rs (3)

nfa (191-193)

nfa (195-197)

start (430-430)

compiler/src/ir/nfa.rs (2)

new (63-71)

node (96-99)

compiler/src/wasm.rs (6)

compiler/src/backend/circom.rs (3)

generate_circom_code (62-432)

nfa (229-231)

nfa (233-235)

compiler/src/backend/noir.rs (6)

generate_noir_code (30-265)

nfa (191-193)

nfa (195-197)

inputs (272-276)

inputs (283-287)

inputs (291-295)

compiler/src/lib.rs (2)

compile (39-41)

gen_circuit_inputs (160-297)

compiler/src/utils.rs (1)

decomposed_to_composed_regex (131-154)

compiler/src/passes/error.rs (1)

from (53-166)

compiler/src/error.rs (3)

code (139-148)

serialization_error (312-318)

invalid_capture_config (278-290)

🪛 actionlint (1.7.8)

.github/workflows/test.yml

23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 markdownlint-cli2 (0.18.1)

README.md

14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

compiler/README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

19-19: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

29-29: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

38-38: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

51-51: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

65-65: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

81-81: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

97-97: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

circom/README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

19-19: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

29-29: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

38-38: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

51-51: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Socket Security: Pull Request Alerts
GitHub Check: Cursor Bugbot
GitHub Check: test
GitHub Check: test

🔇 Additional comments (34)

LICENSE (1)

674-674: URL standardization looks correct.

The reference to https://www.gnu.org/licenses/why-not-lgpl.html is the canonical FSF link for GPL v3 licensing guidance. This is a standard maintenance update with no license compliance implications.

noir/Nargo.toml (2)

1-5: Populate authors and review version constraints.

The authors field (line 4) is empty, which is a metadata quality issue. For published or team projects, author information should be populated. Additionally, the compiler_version constraint uses only a lower bound (>=1.0.0) with no upper limit, which could lead to compatibility issues if future Noir releases introduce breaking changes.

Please populate the authors field with actual author name(s), e.g., authors = ["Your Name"]

Consider adding an upper bound to compiler_version to maintain compatibility, e.g., compiler_version = ">=1.0.0, <2.0.0" (adjust based on stability guarantees)

7-8: Verify the dependency specification and git tag availability.

The sort dependency is sourced from a GitHub repository via a git tag (v0.2.3). This is the standard way to declare external dependencies in Nargo.toml (Nargo resolves dependencies by git tag or local path, not from a crates.io-style registry), but please confirm:

The git repository URL and tag v0.2.3 are accessible and stable.

This dependency sourcing approach aligns with the project's practices for managing external dependencies.

The tag points to a compatible and stable version of the library.

If the dependency is well-maintained and the tag is stable, consider adding a comment explaining why a git-sourced dependency is preferred over a registry release (if available).

git-hooks/pre-push (1)

17-30: comment

circom/regexes/simple.json (1)

1-3: LGTM!

Valid JSON structure defining a simple regex pattern for testing the compiler pipeline.

noir/common/email_addr.json (1)

1-10: LGTM!

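Email validation pattern with a maximum length of 320 characters (64-octet local part + "@" + 255-octet domain, the figure given in RFC 3696; an RFC 3696 erratum lowers the usable limit to 254 because of the SMTP path limit). The pattern includes standard email characters and properly escaped special characters.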
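For reference, the 320 figure is the sum of the RFC 5321 component limits, whereas the RFC 3696 erratum derives 254 from the 256-octet SMTP path limit, which includes the enclosing angle brackets. A small Rust sketch of both computations; the constant and function names are illustrative only:

```rust
/// Maximum local-part length in octets (RFC 5321).
const MAX_LOCAL_PART: usize = 64;
/// Maximum domain length in octets (RFC 5321).
const MAX_DOMAIN: usize = 255;
/// Maximum reverse/forward path length in octets, including "<" and ">" (RFC 5321).
const MAX_PATH: usize = 256;

/// local-part + "@" + domain: the 320-character figure.
const fn component_max_address_len() -> usize {
    MAX_LOCAL_PART + 1 + MAX_DOMAIN
}

/// Path limit minus the two angle brackets: the 254-character figure.
const fn path_max_address_len() -> usize {
    MAX_PATH - 2
}
```

Since 320 is larger than 254, keeping the 320-byte bound stays a safe upper bound; 254 would only tighten it.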

noir/README.md (1)

1-20: LGTM!

Clear and helpful documentation with an important warning about test placement. The guidance prevents potential issues with automated test generation scripts.

circom/regexes/to_all.json (1)

1-7: LGTM!

Well-structured multi-part pattern for parsing email "To:" headers. Correctly handles CRLF line endings and constrains the captured content to a reasonable maximum length.

circom/circuits/common/simple_graph.json (1)

1-71: LGTM with observation.

The NFA graph correctly represents the regex "a*b" with proper state transitions. The multiple start states (0, 2, 3) are unusual but may be intentional for the compiler's graph representation or optimization strategy.

Note: Ensure the graph consumer correctly handles multiple start states if this is a generated artifact from the compiler pipeline.
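Multiple start states are straightforward for a set-based NFA simulation: all of them are simply active before the first byte. A minimal Rust sketch, assuming a hypothetical edge-list representation (this is not the actual schema of simple_graph.json, and the state numbering is invented):

```rust
use std::collections::BTreeSet;

/// Hypothetical edge-list NFA (not the schema of simple_graph.json).
struct Nfa {
    edges: Vec<(usize, u8, usize)>, // (from, byte, to)
    start_states: BTreeSet<usize>,
    accept_states: BTreeSet<usize>,
}

impl Nfa {
    /// Full-match simulation. Every start state is active before the first
    /// byte, so one start state or several are handled identically.
    fn accepts(&self, input: &[u8]) -> bool {
        let mut active = self.start_states.clone();
        for &byte in input {
            let next: BTreeSet<usize> = self
                .edges
                .iter()
                .filter(|&&(from, b, _)| b == byte && active.contains(&from))
                .map(|&(_, _, to)| to)
                .collect();
            if next.is_empty() {
                return false; // no live path left
            }
            active = next;
        }
        active.iter().any(|s| self.accept_states.contains(s))
    }
}

/// `a*b` with two start states: 0 loops on 'a' and exits on 'b';
/// 2 is a redundant start that also exits on 'b'. State 1 accepts.
fn a_star_b() -> Nfa {
    Nfa {
        edges: vec![(0, b'a', 0), (0, b'b', 1), (2, b'b', 1)],
        start_states: BTreeSet::from([0, 2]),
        accept_states: BTreeSet::from([1]),
    }
}
```

A consumer that instead assumed a single start state (for example, only reading the first entry) would be the case to watch for.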

circom/regexes/from_all.json (1)

1-7: LGTM — new regex pattern data asset.

This JSON file adds a regex definition for parsing email "from:" headers. The structure and patterns are consistent with the regex data assets introduced elsewhere in the PR.

noir/common/from_all.json (1)

1-7: LGTM — matching Noir regex pattern.

This mirrors the Circom version for cross-framework consistency. Both frameworks now share the same "from:" header parsing patterns.

compiler/Cargo.toml (1)

13-23: Verify workspace dependency versions in root Cargo.toml.

All dependencies are correctly delegated to the workspace. However, ensure the versions specified in the root workspace manifest are appropriate and not using unstable sources (except where necessary).

circom/regexes/body_hash.json (1)

1-8: LGTM — DKIM body hash regex pattern.

Well-structured regex for extracting DKIM body hash values. The base64 pattern and 44-character limit align with SHA256-based DKIM signatures.
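The 44-character limit follows from base64 arithmetic: a SHA-256 digest is 32 bytes, and padded base64 emits 4 characters for every started 3-byte group. A minimal Rust sketch (the helper name is hypothetical):

```rust
/// SHA-256 digest size in bytes.
const SHA256_DIGEST_BYTES: usize = 32;

/// Length of the padded (RFC 4648) base64 encoding of `n_bytes` bytes:
/// each started 3-byte group becomes 4 output characters.
fn padded_base64_len(n_bytes: usize) -> usize {
    (n_bytes + 2) / 3 * 4
}
```

For 32 bytes this gives 11 groups, hence 44 characters, the last of which is a single `=` pad.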

circom/tsconfig.json (1)

1-27: LGTM — well-configured TypeScript settings.

The configuration enforces strict type checking while maintaining development flexibility. Modern settings for bundler-based workflows are appropriate.

noir/common/sample_haystacks/body_hash.json (1)

1-9: LGTM - Well-structured test fixture.

The test cases appropriately cover:

Pass case 1: body hash pattern without leading CRLF (line start anchor)

Pass case 2: body hash pattern with leading CRLF

Fail case: pattern preceded by another header field ("to:"), ensuring the regex doesn't match mid-line

circom/circuits/tests/circuits/test_body_hash_regex.circom (1)

5-5: No issues identified—parameter values are correct.

The BodyHashRegex template is instantiated with maxHaystackBytes=300 and maxMatchBytes=299. These parameters are appropriately configured: the haystack (input data to search) is 300 bytes, and the maximum match length is 299 bytes. Having the match length slightly less than the haystack size is a standard and intentional configuration.

compiler/src/passes/mod.rs (1)

6-9: Dismiss this review comment — the builder module is correctly kept private.

The builder module contains no public type definitions, only impl blocks that add methods to public types (NFAGraph, IntermediateNFA) defined in the ir/ module. External code already has full access to builder functionality through public methods like NFAGraph::build(). Keeping the builder module private is appropriate; it's an internal implementation detail, not a public API surface requiring exposure.

Likely an incorrect or invalid review comment.
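The visibility rule this relies on can be checked with a small, self-contained Rust sketch. The module and type names mirror the comment, but the bodies are hypothetical: an inherent `pub` method defined in a private module is callable wherever the type itself is visible.

```rust
// The type lives in a public module...
pub mod ir {
    pub struct NFAGraph {
        pub pattern: String,
    }
}

// ...while its constructor lives in a private module. Nothing in `builder`
// is nameable from outside, yet the inherent `pub fn build` is reachable as
// `ir::NFAGraph::build` because methods follow the type, not the module.
mod builder {
    use super::ir::NFAGraph;

    impl NFAGraph {
        pub fn build(pattern: &str) -> NFAGraph {
            NFAGraph { pattern: pattern.to_string() }
        }
    }
}
```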

circom/circuits/common/subject_all_regex.circom (1)

43-148: Solid DFA wiring

The transition matrix, capture propagation, and path-length/linkage checks line up cleanly with (?:\r\n|^)subject:([^\r\n]+)\r\n; I don’t see correctness gaps here.
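To make the matched language concrete, a hand-written Rust matcher for `(?:\r\n|^)subject:([^\r\n]+)\r\n` is sketched below. It is illustrative only (the function name is hypothetical) and says nothing about how the circuit is built internally:

```rust
/// Returns the capture of the first match of `(?:\r\n|^)subject:([^\r\n]+)\r\n`.
fn capture_subject(haystack: &str) -> Option<&str> {
    const PREFIX: &str = "subject:";
    // Candidate starts: position 0 (the `^` branch) and right after every "\r\n".
    let starts = std::iter::once(0).chain(haystack.match_indices("\r\n").map(|(i, _)| i + 2));
    for start in starts {
        let rest = &haystack[start..];
        if let Some(after) = rest.strip_prefix(PREFIX) {
            // `[^\r\n]+` runs up to the first CR or LF and must be non-empty;
            // the group must then be followed by "\r\n".
            let value_len = after
                .find(|c: char| c == '\r' || c == '\n')
                .unwrap_or(after.len());
            if value_len > 0 && after[value_len..].starts_with("\r\n") {
                return Some(&after[..value_len]);
            }
        }
    }
    None
}
```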

circom/circuits/common/message_id_regex.circom (1)

43-153: Regex automaton looks sound

State transitions and capture-group handling match the message-id pattern, and the accept-state accounting mirrors the other proven templates—looks good.

circom/circuits/common/subject_all_graph.json (1)

1-2321: Graph data matches the circuit

The DFA states, transitions, and capture annotations mirror the Circom template’s expectations, so the data asset is consistent.

circom/circuits/common/from_all_regex.circom (1)

43-167: From-header automaton looks consistent

The transitions and capture plumbing reflect the intended (from: …) header pattern, with the same path-length safeguards as the other templates. No issues spotted.

compiler/src/wasm.rs (2)

45-69: Good use of newtype pattern for type safety at WASM boundaries.

The wrapper types provide clear type distinctions at the WASM interface, which helps prevent parameter ordering mistakes.

194-242: Clean error handling in circuit input generation.

The function properly deserializes the NFA graph, generates inputs, and serializes the result with appropriate error handling at each step.

compiler/src/lib.rs (3)

24-41: Clean and well-documented API entry point.

The compile function provides a clear, simple interface for NFA construction with proper error type conversion.

90-126: Clean decomposition and delegation pattern.

The function appropriately delegates to gen_from_raw after composing the regex, avoiding code duplication.
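A minimal sketch of the compose step that gen_from_decomposed delegates to, assuming parts shaped like the `{ "Pattern": ... }` / `{ "PublicPattern": [regex, max_len] }` entries in the regex JSON files. The `Part` enum and `compose` function are hypothetical, and wrapping each public part in a plain capture group is an assumption about what decomposed_to_composed_regex does:

```rust
/// Hypothetical mirror of the "parts" entries in the regex JSON files.
enum Part {
    Pattern(String),
    PublicPattern(String, usize), // (regex, max_len)
}

/// Concatenates the parts into one regex, wrapping each public part in a
/// capture group, and returns the per-capture max lengths in order.
fn compose(parts: &[Part]) -> (String, Vec<usize>) {
    let mut regex = String::new();
    let mut max_lens = Vec::new();
    for part in parts {
        match part {
            Part::Pattern(p) => regex.push_str(p),
            Part::PublicPattern(p, max_len) => {
                regex.push('(');
                regex.push_str(p);
                regex.push(')');
                max_lens.push(*max_len);
            }
        }
    }
    (regex, max_lens)
}
```

The raw-regex path then only has to handle a single string plus the collected lengths, which is what makes the delegation free of duplication.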

267-294: Thorough error context enrichment for debugging.

The code carefully adds input context to errors that lack it, which will significantly help users debug issues. The pattern of augmenting base errors with additional context is a good practice.

compiler/src/ir/intermediate.rs (3)

13-61: Well-designed intermediate representation for epsilon elimination.

The structures cleanly separate epsilon and byte transitions, and the new method properly initializes all fields.

111-145: Clean conversion pipeline from intermediate to final NFA.

The finalize method correctly orchestrates epsilon elimination and final validation, with clear separation of concerns.

184-201: Verify capture group merging logic during epsilon elimination—no automated tests found.

The concern in the original review is valid: no tests were found that specifically cover epsilon transitions combined with capture groups. The capture merging logic (adding start captures from source closure and end captures from target closure) appears algorithmically sound, but is complex and lacks dedicated test coverage.

Tests that should exist but were not found:

Epsilon transitions with capture groups

Nested capture groups across epsilon transitions

Multiple chained epsilon transitions with captures

Start/end captures at different positions
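As a starting point for such tests, the core of epsilon elimination (closure computation, then lifting byte edges onto each source state) can be sketched as below. The types are hypothetical, and capture merging and accept-state propagation are deliberately left out, so this is not a model of the real finalize step:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical intermediate NFA: epsilon edges kept apart from byte edges.
struct IntermediateNfa {
    epsilon: BTreeMap<usize, Vec<usize>>,
    bytes: Vec<(usize, u8, usize)>, // (from, byte, to)
}

impl IntermediateNfa {
    /// All states reachable from `start` via epsilon edges only (including `start`).
    fn epsilon_closure(&self, start: usize) -> BTreeSet<usize> {
        let mut seen = BTreeSet::from([start]);
        let mut stack = vec![start];
        while let Some(s) = stack.pop() {
            for &t in self.epsilon.get(&s).into_iter().flatten() {
                if seen.insert(t) {
                    stack.push(t); // visit each state once, so cycles terminate
                }
            }
        }
        seen
    }

    /// Byte edges of the epsilon-free NFA: `s --b--> t` whenever some state
    /// in closure(s) has a direct byte edge on `b` to `t`.
    fn eliminate_epsilons(&self, states: &[usize]) -> BTreeSet<(usize, u8, usize)> {
        let mut out = BTreeSet::new();
        for &s in states {
            let closure = self.epsilon_closure(s);
            for &(from, b, to) in &self.bytes {
                if closure.contains(&from) {
                    out.insert((s, b, to));
                }
            }
        }
        out
    }
}
```

A capture-aware test would extend the chained case exercised here by attaching start/end markers to the edges and asserting where they end up after lifting.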

compiler/src/error.rs (4)

9-74: Well-organized error code system with clear categorization.

The error codes are logically grouped into ranges, making it easy to identify error categories at a glance. The Display implementation is complete.

76-135: Excellent error design with rich context and actionable suggestions.

The error variants provide comprehensive context for debugging while keeping related information grouped logically. The consistent inclusion of suggestion fields is particularly user-friendly.

137-235: Clean implementation of error utility methods.

The methods provide clear access to error codes and well-formatted user messages. The is_recoverable() classification is useful for error handling strategies.

237-319: Comprehensive and consistent convenience constructors.

The factory methods provide easy-to-use interfaces for creating common errors with appropriate codes and helpful suggestions. All constructors properly populate their respective error variants.

compiler/src/utils.rs (1)

131-154: No validation needed in decomposed_to_composed_regex—downstream backend code already validates max_len values.

The backend generators validate that max_substring_bytes contains no zero values and return InvalidCapture errors if validation fails. The decomposed_to_composed_regex function correctly collects max_len values; validation happens in the appropriate downstream layer.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-10-31T09:59:23Z

.github/workflows/test.yml

+        toolchain: stable
+
+    - name: Cache Cargo
+      uses: actions/cache@v3


⚠️ Potential issue | 🟠 Major

Update actions/cache to v4 for GitHub Actions compatibility.

The actions/cache@v3 action is outdated per actionlint. Update to v4 to ensure compatibility with current GitHub Actions standards and to receive security updates.

-      uses: actions/cache@v3
+      uses: actions/cache@v4

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

uses: actions/cache@v3

uses: actions/cache@v4

🧰 Tools

🪛 actionlint (1.7.8)

23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents

In .github/workflows/test.yml around line 23, the workflow references actions/cache@v3 which is outdated; update the action reference to actions/cache@v4 to satisfy actionlint and gain compatibility/security updates. Modify the uses entry to point to actions/cache@v4 (no other changes required), commit the workflow, and verify the workflow lints and runs successfully in CI.

coderabbitai · 2025-10-31T09:59:23Z

Cargo.toml

-# mail-parser = { version = "0.8", git = "https://github.com/stalwartlabs//mail-parser", tag = "0.8.0" }
+[workspace.dependencies]
+clap = "4.2.1"
+comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }


⚠️ Potential issue | 🔴 Critical

CRITICAL: Replace feature-branch git dependency with stable crate version.

The comptime dependency uses an unstable feature branch from a fork, which creates a production risk. The official comptime crate version 1.0.0 is available on crates.io. Unless the fork contains essential unreleased features, migrate to the stable version:

-comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
+comptime = "1.0.0"

If fork-specific functionality is required, open an issue with the upstream maintainers or document the dependency override rationale in comments.

Also applies to: 9-9

🤖 Prompt for AI Agents

In Cargo.toml around lines 9 to 9, the comptime dependency is pointing at a feature branch of a forked repo; replace that git dependency with the stable crates.io release by changing the dependency to use version = "1.0.0" (remove the git/branch fields), run cargo update / cargo build to refresh Cargo.lock, and search the repo for any other occurrences of the git override to update or document; if fork-specific behavior is truly required, add a short comment explaining the rationale and open an upstream issue instead of keeping the git dependency.

coderabbitai · 2025-10-31T09:59:24Z

circom/.gitignore

+
+# logs
+logs
+_.log


⚠️ Potential issue | 🟡 Minor

Fix typo in log file pattern.

Line 15 has _.log which will only match a file literally named "_.log". This should be *.log to match all log files with the .log extension.

Apply this diff:

-_.log
+*.log

📝 Committable suggestion


Suggested change

_.log

*.log

🤖 Prompt for AI Agents

In circom/.gitignore around line 15, the entry currently reads "_.log" which only matches a literal file named "_.log"; change this to "*.log" so the gitignore will match all files ending with the .log extension; update the line to use an asterisk prefix (no other changes needed).

coderabbitai · 2025-10-31T09:59:24Z

circom/circuits/common/email_addr_graph.json

@@ -0,0 +1,943 @@
+{
+  "regex": "([A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./@]+@[A-Za-z0-9.\\-]+)",


⚠️ Potential issue | 🟠 Major

Remove extra ‘@’ from the local-part class.

The generated regex on Line 2 still includes @ in the local-part character class, so this graph will accept inputs like foo@bar@example.com: the first @ keeps you in the local-part loop, and the second @ transitions to the domain. That violates the single-@ requirement for mailboxes and will yield false positives. Please drop @ from the first character class (and regen this graph) so only the delimiter @ switches to the domain portion.

🤖 Prompt for AI Agents

In circom/circuits/common/email_addr_graph.json around line 2 the local-part character class incorrectly includes '@', allowing multiple '@' characters (e.g., foo@bar@example.com); remove the '@' from the first character class so only the single delimiter '@' separates local-part and domain, update the regex accordingly, and regenerate the graph artifact so the change is reflected in the generated file.
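The double-'@' acceptance is easy to confirm with a hand-written, anchored (full-string) matcher for `local+ '@' domain+`. This Rust sketch is illustrative and does not use the generated graph; note that it assumes a full-string match, so if the surrounding pattern is unanchored a substring such as `bar@example.com` could still match.

```rust
/// Local-part class from the generated regex; `allow_at` toggles the stray '@'.
fn is_local_char(c: char, allow_at: bool) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+=?-^_`{|}~./".contains(c) || (allow_at && c == '@')
}

/// Domain class `[A-Za-z0-9.\-]`.
fn is_domain_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '.' || c == '-'
}

/// Anchored match of `local+ '@' domain+`. Every '@' is tried as the
/// delimiter, mirroring how an NFA explores all splits at once.
fn matches_email(s: &str, local_allows_at: bool) -> bool {
    s.char_indices()
        .filter(|&(_, c)| c == '@')
        .any(|(i, _)| {
            let (local, domain) = (&s[..i], &s[i + 1..]);
            !local.is_empty()
                && !domain.is_empty()
                && local.chars().all(|c| is_local_char(c, local_allows_at))
                && domain.chars().all(is_domain_char)
        })
}
```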

coderabbitai · 2025-10-31T09:59:24Z

circom/circuits/common/email_addr_regex.circom

+    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
+    signal isTransitionLinked[maxMatchBytes];
+
+    signal isValidStartState;
+
+    signal reachedAcceptState[maxMatchBytes];
+
+    component isValidTraversal[maxMatchBytes];
+
+    // Select the haystack from the input
+    signal haystack[maxMatchBytes] <== SelectSubArray(maxHaystackBytes, maxMatchBytes)(inHaystack, matchStart, matchLength);
+
+    // Check if the first state in the haystack is a valid start state
+    isValidStartState <== IsEqual()([startStates[0], currStates[0]]);
+
+    for (var i = 0; i < maxMatchBytes; i++) {
+        isWithinPathLength[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength]);
+
+        // Check if the traversal is a valid path
+        if (i < maxMatchBytes-2) {
+            isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
+            isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
+            isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
+        }


⚠️ Potential issue | 🔴 Critical

Enforce transition linkage for the final step

When matchLength == maxMatchBytes, the constraint block at Line 53 never executes for i = maxMatchBytes - 2, so nextStates[maxMatchBytes-2] is never forced to equal currStates[maxMatchBytes-1]. An attacker can supply an inconsistent final transition and still satisfy all constraints, breaking soundness for full-length matches. Please extend the guard/value array so the last link is checked as well. Example fix:

-    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
+    signal isWithinPathLengthMinusOne[maxMatchBytes-1];
@@
-        if (i < maxMatchBytes-2) {
+        if (i < maxMatchBytes-1) {
             isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
             isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
             isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
         }

🤖 Prompt for AI Agents

In circom/circuits/common/email_addr_regex.circom around lines 34 to 57, the arrays and loop guard for checking transition linkage stop at maxMatchBytes-2 so the final link (between nextStates[maxMatchBytes-2] and currStates[maxMatchBytes-1]) is not enforced when matchLength == maxMatchBytes; extend the arrays and adjust the loop condition so the last link is validated: declare isWithinPathLengthMinusOne and isTransitionLinked with size maxMatchBytes-1 (instead of maxMatchBytes-2 and maxMatchBytes), change the in-loop guard to if (i < maxMatchBytes-1) and compute isWithinPathLengthMinusOne[i] with LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]) then set isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]) and constrain isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i] so the final transition is enforced for full-length matches.
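The off-by-one can be reproduced outside Circom. The Rust sketch below mirrors only the linkage check (the function name and the u32 state encoding are assumptions) and shows that a broken final link passes with the current bound of maxMatchBytes-2 but is caught with the proposed maxMatchBytes-1:

```rust
/// Mirrors the Circom linkage loop: for every i below `guard_bound`, when
/// i < matchLength - 1 the constraint nextStates[i] == currStates[i + 1] applies.
/// Returns true if all enforced links hold (i.e. the witness would be accepted).
fn links_enforced(curr: &[u32], next: &[u32], match_len: usize, guard_bound: usize) -> bool {
    (0..guard_bound).all(|i| {
        let within_path = i + 1 < match_len; // i < matchLength - 1, without underflow
        !within_path || next[i] == curr[i + 1]
    })
}
```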

coderabbitai · 2025-10-31T09:59:28Z

compiler/src/utils.rs

+            } else if i + 2 < chars.len() && chars[i + 2] == '<' {
+                // Could be: (?<=...) positive lookbehind, or (?<!...) negative lookbehind, or (?<name>...) PCRE named capture
+                if i + 3 < chars.len() && (chars[i + 3] == '=' || chars[i + 3] == '!') {
+                    // Lookbehind assertion: (?<=...) or (?<!...)
+                    // These are special groups, preserve as-is
+                    result.push(ch);
+                } else {
+                    // PCRE named capture: (?<name>...)
+                    // Convert to non-capturing and skip the name
+                    result.push_str("(?:");
+                    i += 2; // Skip '?' and '<'
+                    // Skip until we find the closing '>'
+                    while i + 1 < chars.len() && chars[i + 1] != '>' {
+                        i += 1;
+                    }
+                    if i + 1 < chars.len() && chars[i + 1] == '>' {
+                        i += 1; // Skip the '>'
+                    }
+                }
+            } else if i + 3 < chars.len() && chars[i + 2] == 'P' && chars[i + 3] == '<' {
+                // Rust named capture: (?P<name>...)
+                // Convert to non-capturing and skip the name
+                result.push_str("(?:");
+                i += 3; // Skip '?', 'P', and '<'
+                // Skip until we find the closing '>'
+                while i + 1 < chars.len() && chars[i + 1] != '>' {
+                    i += 1;
+                }
+                if i + 1 < chars.len() && chars[i + 1] == '>' {
+                    i += 1; // Skip the '>'
+                }


⚠️ Potential issue | 🟠 Major

Validate named capture group syntax to prevent silent pattern corruption.

The code skips to the closing > for named captures but doesn't validate that it exists. Malformed patterns like (?<name... without a closing > will scan to the end of the pattern, potentially corrupting the output.

Consider adding validation:

 // Skip until we find the closing '>'
 while i + 1 < chars.len() && chars[i + 1] != '>' {
     i += 1;
 }
 if i + 1 < chars.len() && chars[i + 1] == '>' {
     i += 1; // Skip the '>'
+} else {
+    // Malformed named capture: no closing '>'
+    return "(?:MALFORMED_CAPTURE)".to_string();
 }

Or better yet, document that this function expects well-formed regex patterns and should be called after regex validation.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In compiler/src/utils.rs around lines 68 to 98, the named-capture skipping logic does not verify a closing '>' and will run to the end of the input on malformed patterns; update the logic to detect whether the loop found a '>' and handle the failure explicitly: after the loop check if the terminating '>' was not found and either (a) return an error/Result indicating a malformed regex so callers can validate before calling, or (b) restore/emit the original characters (do not drop them) and stop trying to skip the name (to avoid silently corrupting output); implement one of these behaviors consistently for both PCRE (?<name>) and Rust (?P<name>) branches and add a brief comment explaining the chosen failure mode.
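One way to implement option (a) is to make the name scan return a `Result`. The Rust sketch below is a hypothetical stand-in (neither function name is from the codebase) that rewrites both `(?<name>` and `(?P<name>` to `(?:`, preserves lookbehinds, and errors on a missing `>`. Like a naive scanner, it ignores escaped parentheses and character classes.

```rust
/// Index of the '>' closing a capture name that starts at `name_start`,
/// or an error if the name is never terminated.
fn find_name_end(chars: &[char], name_start: usize) -> Result<usize, String> {
    chars
        .get(name_start..)
        .unwrap_or(&[])
        .iter()
        .position(|&c| c == '>')
        .map(|offset| name_start + offset)
        .ok_or_else(|| format!("unterminated named capture group at index {}", name_start))
}

/// Rewrites `(?<name>` and `(?P<name>` to `(?:`, keeps `(?<=` / `(?<!` as-is,
/// and fails instead of silently consuming the rest of a malformed pattern.
fn strip_capture_names(pattern: &str) -> Result<String, String> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut i = 0;
    while i < chars.len() {
        let rest = &chars[i..];
        let name_start = if rest.starts_with(&['(', '?', 'P', '<']) {
            Some(i + 4) // Rust-style (?P<name>
        } else if rest.starts_with(&['(', '?', '<'])
            && !matches!(rest.get(3).copied(), Some('=') | Some('!'))
        {
            Some(i + 3) // PCRE-style (?<name>, not a lookbehind
        } else {
            None
        };
        match name_start {
            Some(start) => {
                let end = find_name_end(&chars, start)?;
                out.push_str("(?:");
                i = end + 1; // resume just after '>'
            }
            None => {
                out.push(chars[i]);
                i += 1;
            }
        }
    }
    Ok(out)
}
```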

coderabbitai · 2025-10-31T09:59:28Z

compiler/src/utils.rs

+pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    // Convert to PascalCase to normalize
+    let pascal_name = name.to_pascal_case();
+
+    // Verify it's valid PascalCase
+    if pascal_name != name {
+        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
+    }
+
+    Ok(name.to_string())
+}


⚠️ Potential issue | 🟡 Minor

Validate against empty template names.

The function doesn't check for empty strings. An empty template name would pass validation since "".to_pascal_case() == "".

Add an early check:

 pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    if name.is_empty() {
+        return Err("Template name cannot be empty".into());
+    }
+
     // Convert to PascalCase to normalize
     let pascal_name = name.to_pascal_case();

📝 Committable suggestion


Suggested change

pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
    // Convert to PascalCase to normalize
    let pascal_name = name.to_pascal_case();
    // Verify it's valid PascalCase
    if pascal_name != name {
        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
    }
    Ok(name.to_string())
}

pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
    if name.is_empty() {
        return Err("Template name cannot be empty".into());
    }
    // Convert to PascalCase to normalize
    let pascal_name = name.to_pascal_case();
    // Verify it's valid PascalCase
    if pascal_name != name {
        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
    }
    Ok(name.to_string())
}

🤖 Prompt for AI Agents

In compiler/src/utils.rs around lines 156 to 166, the validate_cli_template_name function does not reject empty strings (""), which currently pass since "".to_pascal_case() == ""; add an early check at the top to return Err("Template name cannot be empty" or similar) if name.trim().is_empty(), then proceed with the existing PascalCase normalization and comparison; ensure the error message is consistent with other validation errors.

coderabbitai · 2025-10-31T09:59:28Z

compiler/src/wasm.rs

+impl From<WasmError> for JsValue {
+    fn from(err: WasmError) -> Self {
+        // Return a structured error message that JavaScript can parse
+        match err {
+            WasmError::CompilerError { code, message } => {
+                let error_json = format!(
+                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
+                    code,
+                    message.replace('"', "\\\"").replace('\n', "\\n")
+                );
+                JsValue::from_str(&error_json)
+            }
+        }
+    }


⚠️ Potential issue | 🟠 Major

Use proper JSON serialization instead of manual string escaping.

The manual JSON construction with string replacement is fragile and doesn't handle all special characters (e.g., backslashes, \r, \t, and other control characters). This could lead to invalid JSON or injection issues.

Use serde_json for safe serialization:

+use serde_json::json;
+
 impl From<WasmError> for JsValue {
     fn from(err: WasmError) -> Self {
-        // Return a structured error message that JavaScript can parse
         match err {
             WasmError::CompilerError { code, message } => {
-                let error_json = format!(
-                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
-                    code,
-                    message.replace('"', "\\\"").replace('\n', "\\n")
-                );
-                JsValue::from_str(&error_json)
+                let error_json = json!({
+                    "type": "CompilerError",
+                    "code": code.to_string(),
+                    "message": message
+                });
+                JsValue::from_str(&error_json.to_string())
             }
         }
     }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In compiler/src/wasm.rs around lines 29 to 42, the From<WasmError> impl manually constructs JSON with string replacements which is unsafe for many characters; replace the manual formatting with proper serde_json serialization: derive or build a small serializable struct or use serde_json::json! to produce a valid JSON string, call serde_json::to_string (or to_string_pretty) and pass the result to JsValue::from_str, and handle serialization errors (e.g., log and return a generic error JSON) and add/import serde_json in Cargo.toml and at the top of the file.
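To make the concern concrete, the std-only sketch below escapes a string per RFC 8259 (quote, backslash, and every control character from U+0000 to U+001F) alongside the replace-based approach from the diff. It is illustrative only; the suggested serde_json route performs this escaping for you. `compiler_error_json` is a hypothetical helper that assumes `code` is a string, since the field's real type is not shown in the review.

```rust
/// Escapes `s` for use inside a JSON string literal following RFC 8259:
/// `"` and `\` must be escaped, and so must every control character U+0000..=U+001F.
fn escape_json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// The strategy from the reviewed `From<WasmError>` impl, kept for comparison:
/// only `"` and `\n` are escaped.
fn naive_escape(s: &str) -> String {
    s.replace('"', "\\\"").replace('\n', "\\n")
}

/// Hypothetical helper: same error shape as the reviewed code, fully escaped.
fn compiler_error_json(code: &str, message: &str) -> String {
    format!(
        r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
        escape_json_str(code),
        escape_json_str(message)
    )
}
```

With the naive version, a message such as `C:\tmp` passes through unchanged, so a JSON parser reads `\t` as a tab and silently corrupts the message; a sequence such as `\q` would make the JSON invalid outright.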

coderabbitai · 2025-10-31T09:59:28Z

noir/common/email_domain.json

+{
+    "parts": [
+        { "Pattern": "[A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./]+@" },
+        { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }


⚠️ Potential issue | 🟠 Major

Remove '@' from domain character class.

The PublicPattern includes @ in the character class [A-Za-z0-9.\\-@]+, but email domains should not contain @ symbols. Since Part 1 already ends with @, allowing it in the domain part would permit malformed emails like user@domain@extra.com.

Apply this fix:

- { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
+ { "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }


🤖 Prompt for AI Agents

In noir/common/email_domain.json around line 4, the PublicPattern character class wrongly includes '@' allowing malformed domains; remove the '@' from the class so the domain regex does not accept '@' characters, and ensure the hyphen is either escaped or placed last in the class (e.g. [A-Za-z0-9.\-]+ or [A-Za-z0-9.-]+) while keeping the same length limit (128).
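A simplified std-only model of why the `@` matters. It assumes the whole input is matched as `<local>@<domain>` and that the local part ends at the first `@` (the Pattern part's class contains no `@`). The local part's own character class is not checked here, and treating 128 as the maximum domain length is an assumption about the PublicPattern tuple.

```rust
/// Original PublicPattern class: [A-Za-z0-9.\-@]
fn in_old_domain_class(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '.' || c == '-' || c == '@'
}

/// Class from the suggested fix: [A-Za-z0-9.\-]
fn in_new_domain_class(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '.' || c == '-'
}

/// Simplified model of `<local>@<domain>` matched against the whole input:
/// the local part is everything before the first '@', and the domain must be
/// 1..=128 characters drawn from `class`.
fn matches_email(input: &str, class: fn(char) -> bool) -> bool {
    match input.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && (1..=128).contains(&domain.len())
                && domain.chars().all(class)
        }
        None => false,
    }
}
```

Under this model, `user@domain@extra.com` is accepted by the original class because the second `@` is treated as a domain character, and rejected once `@` is removed from the class.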

coderabbitai · 2025-10-31T09:59:28Z

README.md


- [ ] support character class regex grammar (so as to simplify the regular expressions)
- [ ] better compatibility with regex grammar (will need a set of different tests to assert the compatibility between circuit and regex in languages) -->
+## License

-## Assumptions
-Some email providers put not only the sender's email address but also their username to the From field.
-ALthough its concrete formats differ among email providers, our FromAddrRegex template assumes that the email address appears at the end of the From field.
-If this assumption does not hold, i.e., the username appears after the email address, an adversary can output an arbitrary email address from that template by including a dummy email address in the username.
+This project is licensed under the [Specify License Here - e.g., MIT License or Apache 2.0].


⚠️ Potential issue | 🟡 Minor

Fix license placeholder.

The README now ends with “Specify License Here…”, which contradicts the actual MIT license declared elsewhere. Please replace the placeholder with the real license text or link so downstream users get the correct compliance information.

🤖 Prompt for AI Agents

In README.md around lines 229 to 232, the license section still contains a placeholder; replace "Specify License Here - e.g., MIT License or Apache 2.0" with the actual MIT license reference by updating the heading to "License" and either inserting the short MIT license blurb plus year/owner or linking to the full text (e.g., "This project is licensed under the MIT License - see LICENSE file or https://opensource.org/licenses/MIT"), ensuring it matches the project's LICENSE file.

shreyas-londhe and others added 16 commits March 3, 2025 07:56

fix: 3 char issue

e861888

feat: v2 fresh start

02150d4

feat: remove epsilon states (wip)

6ec7675

fix: epsilon-nfa -> nfa

e309004

feat: circom codegen

6667e88

fix: conditional capture group logic

7613e48

feat: added substring extraction logic

a61a224

feat: basic case working

09c7cd8

noir compiler for regex matching (no substring capture

499db3e

feat: basic compiler functions (untested)

f45751c

update to work with bin and new compiler

f755d92

split functionality, add packed capture group data to lut

9bb586f

fix: capture group logic

2901ec1

feat: added function to generate inputs for circuit

7930a41

feat: basic e2e working

20d465a

feat: added wasm functions

8a24437

shreyas-londhe and others added 13 commits April 2, 2025 18:20

feat: minor changes

2121a41

feat: added sample circom circuits

20ee1d3

merge

d9d839b

input gen works for non-capture

0738447

feat: improve epsilon removal

7f7a6e9

optimized substring capture single case

e39ea6c

common utilities moved to common

90c2027

codegen for capture groups

5e1d6cb

codegen works with non-captures and captures

29ab2b6

constrain capture start end masks

0735934

fix: epsilon removal

49573cf

Merge branch 'feat/new-compiler' into feat/new-compiler-noir

3983096

updated regex compiler

c7cc953

BornPsych and others added 6 commits September 12, 2025 02:16

Merge pull request #107 from zkemail/fix/error-handling

35014c2

error handling in gen circuit input

chore: added github actions for the test

0f79735

Merge pull request #106 from zkemail/rutefig/reg-531-move-python-scripts-used-for-generating-circom-and-noir

d18c9e9

feat: migrate Python scripts to TypeScript with comprehensive testing and tooling improvements

chore: added license file to repo

29f8cc8

Merge branch 'main' into feat/new-compiler

8c50a00

BornPsych marked this pull request as ready for review September 29, 2025 06:59

BornPsych added 11 commits September 29, 2025 13:02

chore: ran cargo clippy fix

3dd1eb5

fix: force new build process

298b3f7

fix: changes in pre-push file for preserving tests

c1ac7ad

chore: regenerate templates with updated import paths

a8a670c

fix: added code to preserve mannual code

5124dff

chore: increased the test timeout for CI tests

fa0b546

chore: removed the unsupported bun deps

7de226d

fix: CI for body hash test

9fc345a

fix: CI error due to wrong version of bun

a54aced

fix: add root bunfig.toml with 30s timeout for all tests

2d5cd24

fix: pin Bun version to 1.2.22 in CI workflow

09b8a86

rutefig added 7 commits October 30, 2025 09:59

fix: rust compilation errors - unstable expressions

89c9442

chore: convert bare capture groups on private patterns into non capture, preserves special groups

181ddd8

fix: doctest trying to access to private function

12b8a6f

chore: added test to check if gen_from_decomposed converts capture groups inside of public patterns into non capture

bb530c5

Merge pull request #109 from zkemail/rutefig/reg-574-fix-prevent-capture-groups-inside-of-private-patterns

3d37c31

Rutefig/reg 574 fix prevent capture groups inside of private patterns

cursor bot reviewed Oct 31, 2025


coderabbitai bot reviewed Oct 31, 2025



Feat/new compiler #92


shreyas-londhe commented Mar 28, 2025 • edited by coderabbitai bot


socket-security bot commented Mar 31, 2025 • edited


BornPsych commented Sep 30, 2025


cursor bot commented Sep 30, 2025 • edited


coderabbitai bot commented Oct 31, 2025 • edited




7 participants
