Skip to content

Conversation

@shreyas-londhe
Copy link
Member

@shreyas-londhe shreyas-londhe commented Mar 28, 2025

Note

Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.

  • Tooling/Scripts (new):
    • Add Bun/TypeScript scripts for generating regex artifacts: circom/scripts/gen-regex.ts, noir/scripts/gen-regex.ts, noir/scripts/gen-inputs.ts.
    • Introduce shared utilities (utils/*) for logging, file ops, subprocess, and types.
    • Add project config: tsconfig.json, jest.config.ts, jest.setup.ts, bun.lock, and package.json.
  • Tests (new):
    • Add Jest integration and unit tests under scripts/__tests__ for script workflows and utilities.
  • Compiler (removed):
    • Remove legacy Rust compiler package (packages/compiler/*).
  • Circom Tests (removed):
    • Remove numerous legacy Circom test circuits and test cases under packages/circom/tests/*.

Written by Cursor Bugbot for commit 3d37c31. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • New Features

    • Added comprehensive regex verification circuits for email headers (body hash, subject, sender, recipient, message ID) and email addresses.
    • Introduced Noir proving framework support alongside Circom.
    • Added compiler infrastructure for generating regex verification circuits from patterns.
  • Infrastructure & Build

    • Updated GitHub Actions workflow for improved testing with Rust and Bun tooling.
    • Simplified and consolidated project configuration across build tools.
    • Added pre-push Git hooks to enforce circuit regeneration when compiler changes.
  • Documentation

    • Completely revised README with detailed setup, installation, and troubleshooting guidance.
    • Added comprehensive documentation for Circom and Noir subprojects.
    • Included usage examples and common commands for developers.
  • Chores

    • Cleaned up ESLint, Prettier, and Babel configurations.
    • Restructured workspace manifest for monorepo organization.
    • Updated .gitignore for Rust and Prover-related artifacts.

BornPsych and others added 6 commits September 12, 2025 02:16
error handling in gen circuit input
- Restore individual test functions in Noir circuit files that were previously removed
- Add circuit input test cases and update sample data
- Update script optimizations for better binary preference
- Add comprehensive manual test guidelines to README files
- Document safe test formatting to prevent accidental deletion by gen-inputs script

Co-Authored-By: Colleague <colleague@example.com>

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…pts-used-for-generating-circom-and-noir

feat: migrate Python scripts to TypeScript with comprehensive testing and tooling improvements
@BornPsych BornPsych marked this pull request as ready for review September 29, 2025 06:59
@BornPsych
Copy link

@cursor please review this

@cursor
Copy link

cursor bot commented Sep 30, 2025

I've thoroughly reviewed the "Feat/new compiler" PR. I started by exploring the codebase, examining package.json files for dependency changes, and then delved into the compiler's structure, error handling, and test files. I also paid close attention to the Socket security report and the Cargo.toml files for Rust dependencies.

Here's a summary of my findings:

This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:

  • Architecture & Design: Modular structure, clear API, and support for multiple frameworks.
  • Error Handling: Comprehensive error taxonomy with user-friendly messages.
  • Developer Experience: Excellent CLI tool, automatic template regeneration, and good documentation.
  • Code Quality: Follows Rust best practices with clean abstractions.

However, I identified several areas for improvement and potential issues:

  • Security Concerns: The Socket report highlighted the addition of @zk-email/circuits@6.4.1-alpha.0 and an upgrade of circom_tester. Using alpha versions in production is a risk.
  • Test Coverage: I found no unit tests in the Rust compiler code, and integration tests appear limited.
  • Dependencies: There's a dependency on a specific GitHub branch (comptime from feat/comptime-codegen), which could lead to instability.
  • Documentation: Some inline documentation is missing, and there's no architecture diagram.

My main recommendations are to:

  • Add comprehensive unit tests for core compiler functionality.
  • Replace alpha dependencies with stable versions.
  • Address the external GitHub branch dependency.
  • Add performance benchmarks.

Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.


5 tasks completed

  • Review PR changes for new compiler feature
  • Analyze new dependencies added (@types/bun, @zk-email/circuits)
  • Review Socket security report findings
  • Verify new compiler implementation and integration
  • Provide comprehensive review feedback
    Open in Cursor Open in Web

Learn more about Cursor Agents

…oups inside of public patterns into non capture
…group conversion

The convert_capturing_to_non_capturing function was naively converting
all '(' to '(?:' without considering:
- Escaped parentheses like \( which match literal parens
- Parentheses inside character classes like [()] which are literals

This caused regex semantics to break. For example:
- \(foo\) would become \(?:foo\) (wrong)
- [()] would become [(?:)] (matches extra chars)

Fixed by tracking two states during string scan:
- in_char_class: are we inside [...]?
- escaped: was previous char a backslash?

Only convert ( to (?:) when it's not escaped, not in a char class,
and not already followed by ?.

Added 5 new tests covering escaped parens, character classes, and
mixed cases. All 23 tests pass.

Addresses PR review feedback: #109 (comment)
Extended convert_capturing_to_non_capturing to handle both PCRE-style
(?<name>...) and Rust-style (?P<name>...) named capture groups, not
just bare capturing groups. This prevents named captures in private
patterns from interfering with public capture group numbering.

Key changes:
- Detect and convert both named capture syntaxes to (?:...)
- Distinguish lookbehind assertions (?<=, ?<!) from named captures
- Updated test_preserve_named_groups to test_convert_pcre_named_groups
  with correct expected behavior
- Added 7 new tests covering both named capture styles and edge cases

Fixes code review finding that named captures were being preserved
instead of converted, which would cause capture group numbering
mismatches in generated circuits.
…ure-groups-inside-of-private-patterns

Rutefig/reg 574 fix prevent capture groups inside of private patterns
@coderabbitai
Copy link

coderabbitai bot commented Oct 31, 2025

Walkthrough

This PR introduces a comprehensive compiler infrastructure for generating zero-knowledge regex circuits. It restructures the project around a Rust-based compiler that converts regex patterns into circuit code for both Circom and Noir frameworks, adds supporting tooling, test infrastructure, and documentation across multiple proof systems.

Changes

Cohort / File(s) Summary
Configuration & Tooling Cleanup
.cargo/config, .eslintrc.json, .eslintignore, babel.config.js, .prettierrc
Removed deprecated/obsolete configuration files for cargo, ESLint, and Babel
Build Configuration Updates
Cargo.toml, .gitignore, bunfig.toml
Restructured workspace to single compiler member; updated dependency management; revised ignore patterns; added Bun test timeout config
GitHub Actions
.github/workflows/test.yaml, .github/workflows/test.yml
Replaced Node/Yarn-based workflow with Rust/Bun-based CI pipeline including Circom 2.1.9 setup and caching
Root Documentation
README.md, LICENSE
Comprehensive README rewrite with installation, setup, and troubleshooting guidance; minor license URL standardization
Compiler Core
compiler/src/lib.rs, compiler/src/driver.rs, compiler/src/error.rs, compiler/src/types.rs, compiler/src/utils.rs, compiler/src/wasm.rs, compiler/Cargo.toml, compiler/README.md, compiler/package.json
Complete Rust compiler library for regex-to-circuit generation; public API for compilation, decomposed regex handling, circuit input generation, and WASM bindings; structured error handling with error codes; utility functions for template validation and output serialization
Compiler IR Pipeline
compiler/src/ir/mod.rs, compiler/src/ir/nfa.rs, compiler/src/ir/intermediate.rs, compiler/src/ir/graph.rs
Intermediate representation and NFA graph structures; epsilon-transition elimination; DFA construction with capture group support; path-finding and reachability analysis
Compiler Passes
compiler/src/passes/mod.rs, compiler/src/passes/error.rs, compiler/src/passes/builder.rs
NFA builder pipeline from regex patterns; Thompson NFA conversion; error types and translation to compiler errors
Backend Code Generation
compiler/src/backend/mod.rs, compiler/src/backend/circom.rs, compiler/src/backend/noir.rs, compiler/src/backend/shared.rs
Code generators for Circom and Noir proving frameworks; circuit data extraction and input generation; shared logic for state transitions and capture group handling
CLI Binary
compiler/src/bin/zk-regex.rs
Command-line interface with subcommands for decomposed/raw regex compilation and circuit input generation
Circom Integration
circom/Cargo.toml (not in diff, implied by package.json), circom/package.json, circom/README.md, circom/tsconfig.json, circom/bunfig.toml, circom/index.ts
Circom-specific package configuration, documentation, and test setup
Circom Regex Templates
circom/circuits/common/body_hash_regex.circom, circom/circuits/common/email_addr_regex.circom, circom/circuits/common/email_domain_regex.circom, circom/circuits/common/from_all_regex.circom, circom/circuits/common/message_id_regex.circom, circom/circuits/common/reversed_bracket_regex.circom, circom/circuits/common/simple_regex.circom, circom/circuits/common/subject_all_regex.circom, circom/circuits/common/to_all_regex.circom
Circom 2.1.5 templates implementing DFA-based regex matching with capture group support; deterministic state transitions; per-byte validity checking; final accept-state validation
Circom DFA Graphs
circom/circuits/common/*_graph.json
JSON representations of deterministic finite automatons for regex patterns (email, subject, body hash, etc.); state transitions and capture group metadata
Circom Regex Helpers
circom/circuits/regex_helpers.circom
Utility templates for regex matching: MultiOR aggregation, byte/range transition checking with optional capture group validation, and substring capture extraction
Circom Tests
circom/circuits/tests/body_hash.test.ts, circom/circuits/tests/circuits/test_body_hash_regex.circom
Test suite for body hash regex using circom_tester; witness computation and constraint validation
Circom Regex Definitions
circom/regexes/*.json
Multi-part regex patterns (body hash, email address, domains, headers) defining pattern sequences and capture group constraints
Circom Code Generation Script
circom/scripts/gen-regex.ts
TypeScript script to invoke zk-regex compiler for each JSON regex, building release binary and generating Circom circuit code
Circom .gitignore
circom/.gitignore
Comprehensive ignore patterns for Circom build artifacts, dependencies, caches, IDE files
Noir Integration
noir/Nargo.toml, noir/README.md
Noir project manifest with compiler version and sort dependency; documentation for Noir test integration
Noir Pattern Configurations
noir/common/*.json
Multi-part regex pattern definitions for Noir (body hash, email address/domain, message-id, etc.)
Noir Sample Data
noir/common/sample_haystacks/body_hash.json, noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json
Sample test inputs and expected circuit input traces for Noir test generation
Git Hooks
git-hooks/pre-push
Pre-push hook enforcing compiler rebuild and template regeneration; blocks push if changes detected to enable review

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as zk-regex CLI
    participant Driver
    participant Builder as NFABuilder
    participant Backend as Circom/Noir<br/>Backend
    participant Output

    User->>CLI: gen-from-raw(pattern, template, framework)
    CLI->>Driver: compile(pattern, config)
    Driver->>Builder: NFAGraph::build(pattern)
    Builder->>Builder: Thompson NFA → IntermediateNFA
    Builder->>Builder: Epsilon elimination
    Builder->>Output: NFAGraph (with states,<br/>transitions, captures)
    Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
    Backend->>Backend: extract start/accept states
    Backend->>Backend: generate per-byte<br/>transition logic
    Backend->>Backend: emit capture<br/>group handling
    Backend->>Output: Circom/Noir code
    Output-->>User: (NFAGraph, Code)

    User->>CLI: gen-circuit-inputs(nfa, haystack, framework)
    CLI->>Driver: generate_circuit_inputs(...)
    Driver->>Builder: get_path_to_accept(haystack)
    Builder->>Builder: PikeVM traverse haystack
    Builder->>Builder: extract state path &<br/>capture metadata
    Driver->>Output: CircomInputs / NoirInputs
    Output-->>User: Prover-ready JSON
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

  • Compiler IR & NFA Implementation (compiler/src/ir/, compiler/src/passes/): Complex epsilon-transition elimination, path-finding logic, and state reachability analysis with non-obvious correctness constraints.
  • Backend Code Generation (compiler/src/backend/circom.rs, compiler/src/backend/noir.rs): Dense circuit code emission with conditional logic for capture groups, state aggregation, and per-byte transition mapping; correctness critical for circuit validity.
  • Circom Template Logic (circom/circuits/common/*_regex.circom): Complex per-position state tracking, transition validity accumulation, and accept-state verification; errors could silently fail circuit constraints.
  • Driver & Error Handling (compiler/src/driver.rs, compiler/src/error.rs): Error mapping and recovery paths across multiple compilation stages.
  • WASM Bindings (compiler/src/wasm.rs): Interface between JavaScript and Rust compiler; serialization/deserialization edge cases and error translation.
  • Git Hook Script (git-hooks/pre-push): Stash/unstash logic and error recovery paths in shell scripting.

Poem

🐰 A regex becomes a circuit, a pattern becomes proof,
Thompson builds the scaffold, epsilon fades aloof,
States dance through the bytes, captures held tight,
Zero-knowledge promised in circuits of might!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title Check ❓ Inconclusive The PR introduces a comprehensive new compiler system for generating zero-knowledge regex verification circuits across multiple proving frameworks (Circom and Noir), representing a substantial architectural addition to the codebase. The title "Feat/new compiler" does reference a real component of the changeset, but it uses vague terminology that fails to convey meaningful information about what the compiler does, why it was added, or its significance. A teammate scanning the repository history would not understand from this title alone that the PR introduces a full zk-regex compiler with circuit generation capabilities, multiple framework support, or its role in the project. Additionally, the title uses a branch-naming convention format (feat/...) rather than a proper PR title format. Consider revising the title to be more descriptive and specific, such as "Add ZK-Regex compiler for Circom and Noir circuit generation" or "Implement compiler for regex verification circuit generation" to clearly communicate the purpose and scope of this major addition. This would help future reviewers quickly understand the primary contribution when scanning the commit history.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/new-compiler

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

eprintln!("Supported frameworks: circom, noir");
std::process::exit(1);
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Bug

The proving_framework argument is parsed inconsistently across commands. While Decomposed and Raw use ProvingFramework::from_str, GenerateCircuitInput manually parses it with a match statement. This duplicates logic and creates maintenance challenges, potentially leading to behavioral inconsistencies.

Fix in Cursor Fix in Web

}

// Run the script if this file is executed directly
if (import.meta.url === `file://${process.argv[1]}`) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Bug

The direct execution check for the main function in gen-inputs.ts and gen-regex.ts uses an incorrect comparison between import.meta.url (a URL string) and process.argv[1] (a file path). This mismatch means the scripts' main functions never run when executed directly.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 25

🧹 Nitpick comments (12)
circom/regexes/reversed_bracket.json (1)

5-5: Consider whether greedy matching is intentional.

The pattern <.* uses greedy matching (.*) which will match everything after the < character. If the intent is to match content only up to the next delimiter or end of string, you might want to use a more constrained pattern.

However, if the greedy behavior is intentional for your use case, this is fine as-is.

circom/circuits/common/to_all_graph.json (1)

1-50: Clarify generation and maintenance process for DFA graph file.

This 2256-line JSON file encodes a deterministic finite automaton (DFA) for the regex pattern (?:\r\n|^)to:([^\r\n]+)\r\n. The file structure suggests automated generation rather than manual authoring.

Key concerns:

  1. Is this auto-generated? If yes, add build artifacts to .gitignore or document the generation process.
  2. Maintenance model: If manually maintained, this is unmaintainable. If generated, provide:
    • Documentation on the generator tool
    • Instructions for regeneration when regex patterns change
    • Validation that generated states are correct and complete
  3. Reachability verification: Confirm all states are reachable and accept_states are terminal.

Without clarity on ownership, future maintainers won't know how to update this when regex patterns change.

I can help document the generation process or create a verification script. Would you like me to generate a script to validate DFA reachability and accept state correctness?

circom/regexes/message_id.json (1)

1-7: Consider expanding the message-id character class to align with RFC 5322.

The PublicPattern on line 4 uses <[A-Za-z0-9=@\\.\\+_-]+>, which may be too restrictive. RFC 5322 allows additional characters in message-id local and domain parts, such as colons (:) and slashes (/). Modern message-ids may include these characters.

If stricter validation is intended, this is acceptable. Otherwise, consider expanding the pattern:

-        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_-]+>", 128] },
+        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_\\-:/]+>", 128] },
circom/README.md (1)

1-62: LGTM - Comprehensive documentation with minor style improvements available.

The README provides clear installation instructions, prerequisites, and troubleshooting guidance. The content is accurate and aligns well with the Circom integration introduced in this PR.

For improved markdown consistency, consider fixing the indentation issues flagged by markdownlint. Nested list items should use 2-space indentation instead of 4-space:

  • Lines 13-14, 19-23, 27-31, 35-40 have 4-space indentation
  • Line 51 uses emphasis instead of a proper heading

These are purely stylistic and can be addressed in a future cleanup if desired.

noir/common/message_id.json (1)

1-7: Pattern duplication with circom/regexes/message_id.json.

This file is identical to circom/regexes/message_id.json. While having separate regex definitions for Circom and Noir toolchains is reasonable, consider whether these could be maintained in a single source to prevent drift.

Additionally, the same character class limitation applies here: the pattern may be too restrictive for modern message-ids per RFC 5322. See the review comment on circom/regexes/message_id.json for details.

compiler/README.md (2)

13-40: Fix markdown list indentation to comply with MD007.

Lines use 4-space indentation for nested list items; markdown style guide requires 2 spaces. Update the list indentation throughout (lines 13–40, 51, 65, 81, 97) for consistency.

-    -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
-        Parses the input regex `pattern` string.
+  -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
+      Parses the input regex `pattern` string.

51-51: Replace bold text with heading syntax for "Examples" sections (MD036).

Lines 51, 65, 81, and 97 use bold text (e.g., **Example 1:**) instead of proper heading syntax. Convert to heading level (e.g., ### Example 1:) for better accessibility and document structure.

-**Example 1: Compile a simple regex to NFA**
+### Example 1: Compile a simple regex to NFA

Also applies to: 65-65, 81-81, 97-97

compiler/src/utils.rs (2)

168-196: Consider validating the file_extension parameter.

The function accepts file_extension without validation. Invalid extensions could create confusion (e.g., extensions with path separators, empty strings, or non-standard characters).

Add basic validation:

 pub fn save_outputs(
     nfa: &NFAGraph,
     circom_code: String,
     output_dir: &PathBuf,
     template_name: &str,
     file_extension: &str,
 ) -> Result<(), Box<dyn std::error::Error>> {
     validate_cli_template_name(template_name)?;
+    
+    // Validate file extension
+    if file_extension.is_empty() || file_extension.contains('/') || file_extension.contains('\\') {
+        return Err("Invalid file extension".into());
+    }

     // Create output directory if it doesn't exist

198-464: Strong test coverage! Consider adding edge case tests.

The test suite is comprehensive and well-organized. To complement the issues raised above, consider adding tests for:

  • Escaped brackets in character classes: r"[\[]" and r"[\]]"
  • Character class starting with ]: r"[]a]"
  • Malformed named captures: r"(?<name...)" (missing >)
  • Empty template name validation
compiler/src/wasm.rs (1)

155-189: Minor optimization: avoid cloning max_substring_bytes twice.

The code clones max_substring_bytes on line 161 (and 180 for Noir), but then the original is also referenced in the closure. Since it's only Option<Vec<usize>>, the clone is cheap, but the pattern is slightly redundant.

Consider restructuring to avoid the clone:

+    let num_capture_groups = nfa.num_capture_groups;
+    let max_bytes_len = max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0);
+
     let code = match proving_framework {
         ProvingFramework::Circom => {
-            let max_bytes_clone = config.max_bytes.clone();
             crate::backend::generate_circom_code(
                 &nfa,
                 &template_name.0,
                 &raw_regex.0,
-                max_substring_bytes.clone(),
+                max_substring_bytes,
             )
             .map_err(|nfa_err| {
                 match nfa_err {
                     crate::passes::NFAError::InvalidCapture(_) => {
                         CompilerError::invalid_capture_config(
-                            nfa.num_capture_groups,
-                            max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0),
+                            num_capture_groups,
+                            max_bytes_len,
                         )
                     }
compiler/src/lib.rs (1)

73-88: Consider exposing the optimize parameter.

The optimize flag is hardcoded to true in the CompilationConfig. Depending on use cases, users might want to disable optimization for debugging or to compare output.

Consider adding an optional parameter:

pub fn gen_from_raw(
    pattern: &str,
    max_bytes: Option<Vec<usize>>,
    template_name: &str,
    proving_framework: ProvingFramework,
    optimize: Option<bool>,
) -> CompilerResult<(NFAGraph, String)> {
    let config = CompilationConfig {
        template_name: template_name.to_string(),
        proving_framework,
        max_bytes,
        optimize: optimize.unwrap_or(true),
    };
    // ...
}
compiler/src/ir/intermediate.rs (1)

340-396: State compaction logic is correct but clones nodes.

The compact_states method correctly remaps all state indices and transitions. The node cloning on line 353 is safe but could be optimized if this becomes a performance bottleneck.

If performance is critical, consider using std::mem::take:

let mut node = std::mem::take(&mut self.nodes[old_idx]);

Though this would require making self.nodes temporarily inconsistent, so the current clone approach is safer.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb1bda and 3d37c31.

⛔ Files ignored due to path filters (5)
  • .DS_Store is excluded by !**/.DS_Store
  • Cargo.lock is excluded by !**/*.lock
  • bun.lock is excluded by !**/*.lock
  • circom/bun.lock is excluded by !**/*.lock
  • packages/circom/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (78)
  • .cargo/config (0 hunks)
  • .eslintignore (0 hunks)
  • .eslintrc.json (0 hunks)
  • .github/workflows/test.yaml (0 hunks)
  • .github/workflows/test.yml (1 hunks)
  • .gitignore (1 hunks)
  • .prettierrc (0 hunks)
  • Cargo.toml (1 hunks)
  • LICENSE (1 hunks)
  • README.md (2 hunks)
  • babel.config.js (0 hunks)
  • bunfig.toml (1 hunks)
  • circom/.gitignore (1 hunks)
  • circom/README.md (1 hunks)
  • circom/bunfig.toml (1 hunks)
  • circom/circuits/common/body_hash_regex.circom (1 hunks)
  • circom/circuits/common/email_addr_graph.json (1 hunks)
  • circom/circuits/common/email_addr_regex.circom (1 hunks)
  • circom/circuits/common/email_domain_graph.json (1 hunks)
  • circom/circuits/common/email_domain_regex.circom (1 hunks)
  • circom/circuits/common/from_all_graph.json (1 hunks)
  • circom/circuits/common/from_all_regex.circom (1 hunks)
  • circom/circuits/common/message_id_graph.json (1 hunks)
  • circom/circuits/common/message_id_regex.circom (1 hunks)
  • circom/circuits/common/reversed_bracket_regex.circom (1 hunks)
  • circom/circuits/common/simple_graph.json (1 hunks)
  • circom/circuits/common/simple_regex.circom (1 hunks)
  • circom/circuits/common/subject_all_graph.json (1 hunks)
  • circom/circuits/common/subject_all_regex.circom (1 hunks)
  • circom/circuits/common/to_all_graph.json (1 hunks)
  • circom/circuits/common/to_all_regex.circom (1 hunks)
  • circom/circuits/regex_helpers.circom (1 hunks)
  • circom/circuits/tests/body_hash.test.ts (1 hunks)
  • circom/circuits/tests/circuits/test_body_hash_regex.circom (1 hunks)
  • circom/index.ts (1 hunks)
  • circom/package.json (1 hunks)
  • circom/regexes/body_hash.json (1 hunks)
  • circom/regexes/email_addr.json (1 hunks)
  • circom/regexes/email_domain.json (1 hunks)
  • circom/regexes/from_all.json (1 hunks)
  • circom/regexes/message_id.json (1 hunks)
  • circom/regexes/reversed_bracket.json (1 hunks)
  • circom/regexes/simple.json (1 hunks)
  • circom/regexes/subject_all.json (1 hunks)
  • circom/regexes/to_all.json (1 hunks)
  • circom/scripts/gen-regex.ts (1 hunks)
  • circom/tsconfig.json (1 hunks)
  • compiler/Cargo.toml (1 hunks)
  • compiler/README.md (1 hunks)
  • compiler/package.json (1 hunks)
  • compiler/src/backend/circom.rs (1 hunks)
  • compiler/src/backend/mod.rs (1 hunks)
  • compiler/src/backend/noir.rs (1 hunks)
  • compiler/src/backend/shared.rs (1 hunks)
  • compiler/src/bin/zk-regex.rs (1 hunks)
  • compiler/src/driver.rs (1 hunks)
  • compiler/src/error.rs (1 hunks)
  • compiler/src/ir/graph.rs (1 hunks)
  • compiler/src/ir/intermediate.rs (1 hunks)
  • compiler/src/ir/mod.rs (1 hunks)
  • compiler/src/ir/nfa.rs (1 hunks)
  • compiler/src/lib.rs (1 hunks)
  • compiler/src/passes/builder.rs (1 hunks)
  • compiler/src/passes/error.rs (1 hunks)
  • compiler/src/passes/mod.rs (1 hunks)
  • compiler/src/types.rs (1 hunks)
  • compiler/src/utils.rs (1 hunks)
  • compiler/src/wasm.rs (1 hunks)
  • git-hooks/pre-push (1 hunks)
  • noir/Nargo.toml (1 hunks)
  • noir/README.md (1 hunks)
  • noir/common/body_hash.json (1 hunks)
  • noir/common/email_addr.json (1 hunks)
  • noir/common/email_domain.json (1 hunks)
  • noir/common/from_all.json (1 hunks)
  • noir/common/message_id.json (1 hunks)
  • noir/common/sample_haystacks/body_hash.json (1 hunks)
  • noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json (1 hunks)
💤 Files with no reviewable changes (6)
  • .github/workflows/test.yaml
  • .prettierrc
  • .eslintignore
  • .eslintrc.json
  • babel.config.js
  • .cargo/config
🧰 Additional context used
🧬 Code graph analysis (16)
compiler/src/passes/builder.rs (3)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (2)
  • nfa (191-193)
  • nfa (195-197)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/mod.rs (2)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (2)
  • nfa (191-193)
  • nfa (195-197)
compiler/src/bin/zk-regex.rs (6)
compiler/src/lib.rs (3)
  • gen_circuit_inputs (160-297)
  • gen_from_decomposed (114-126)
  • gen_from_raw (73-88)
compiler/src/utils.rs (2)
  • save_outputs (168-196)
  • validate_cli_template_name (156-166)
compiler/src/types.rs (1)
  • from_str (51-59)
compiler/src/error.rs (2)
  • code (139-148)
  • user_message (151-224)
compiler/src/ir/nfa.rs (1)
  • from_json (80-84)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/backend/shared.rs (5)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/backend/noir.rs (8)
  • inputs (272-276)
  • inputs (283-287)
  • inputs (291-295)
  • nfa (191-193)
  • nfa (195-197)
  • start_states (350-353)
  • accept_states (379-382)
  • start (430-430)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
  • new (63-71)
circom/circuits/tests/body_hash.test.ts (1)
compiler/src/wasm.rs (1)
  • genCircuitInputs (198-217)
compiler/src/driver.rs (6)
compiler/src/lib.rs (1)
  • compile (39-41)
compiler/src/backend/circom.rs (3)
  • nfa (229-231)
  • nfa (233-235)
  • generate_circom_code (62-432)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • generate_noir_code (30-265)
compiler/src/passes/builder.rs (1)
  • build (21-37)
compiler/src/error.rs (3)
  • code (139-148)
  • invalid_capture_config (278-290)
  • circuit_generation_failed (268-276)
compiler/src/passes/error.rs (1)
  • from (53-166)
circom/scripts/gen-regex.ts (6)
compiler/src/bin/zk-regex.rs (1)
  • main (83-225)
scripts/utils/logger.ts (2)
  • logger (41-41)
  • error (36-38)
scripts/utils/file-operations.ts (3)
  • fileExists (75-82)
  • ensureDirectory (24-30)
  • listFilesWithExtension (99-104)
scripts/utils/types.ts (1)
  • ScriptError (69-74)
scripts/utils/string-utils.ts (1)
  • toPascalCase (15-19)
scripts/utils/subprocess.ts (1)
  • executeCargo (196-267)
compiler/src/lib.rs (6)
compiler/src/driver.rs (1)
  • compile (59-130)
compiler/src/passes/builder.rs (1)
  • build (21-37)
compiler/src/backend/shared.rs (3)
  • from (33-44)
  • from (48-59)
  • generate_circuit_inputs (127-228)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/utils.rs (1)
  • decomposed_to_composed_regex (131-154)
compiler/src/error.rs (2)
  • input_too_long (292-300)
  • no_match_found (302-310)
compiler/src/utils.rs (3)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
  • new (63-71)
compiler/src/lib.rs (1)
  • gen_from_decomposed (114-126)
compiler/src/ir/graph.rs (5)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • start (430-430)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (2)
  • new (63-71)
  • node (96-99)
compiler/src/backend/shared.rs (2)
  • path (163-163)
  • path (164-164)
compiler/src/passes/error.rs (2)
compiler/src/wasm.rs (5)
  • from (21-26)
  • from (30-42)
  • from (54-56)
  • from (60-62)
  • from (66-68)
compiler/src/error.rs (2)
  • nfa_construction_failed (259-266)
  • serialization_error (312-318)
compiler/src/backend/noir.rs (1)
compiler/src/backend/shared.rs (4)
  • escape_regex_for_display (230-243)
  • generate_circuit_data (62-125)
  • from (33-44)
  • from (48-59)
compiler/src/backend/circom.rs (1)
compiler/src/backend/shared.rs (2)
  • escape_regex_for_display (230-243)
  • generate_circuit_data (62-125)
compiler/src/ir/intermediate.rs (3)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • start (430-430)
compiler/src/ir/nfa.rs (2)
  • new (63-71)
  • node (96-99)
compiler/src/wasm.rs (6)
compiler/src/backend/circom.rs (3)
  • generate_circom_code (62-432)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (6)
  • generate_noir_code (30-265)
  • nfa (191-193)
  • nfa (195-197)
  • inputs (272-276)
  • inputs (283-287)
  • inputs (291-295)
compiler/src/lib.rs (2)
  • compile (39-41)
  • gen_circuit_inputs (160-297)
compiler/src/utils.rs (1)
  • decomposed_to_composed_regex (131-154)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/error.rs (3)
  • code (139-148)
  • serialization_error (312-318)
  • invalid_capture_config (278-290)
🪛 actionlint (1.7.8)
.github/workflows/test.yml

23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 markdownlint-cli2 (0.18.1)
README.md

14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

compiler/README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


19-19: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


38-38: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


51-51: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


65-65: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


81-81: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


97-97: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

circom/README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


19-19: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


38-38: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


51-51: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (34)
LICENSE (1)

674-674: URL standardization looks correct.

The reference to https://www.gnu.org/licenses/why-not-lgpl.html is the canonical FSF link for GPL v3 licensing guidance. This is a standard maintenance update with no license compliance implications.

noir/Nargo.toml (2)

1-5: Populate authors and review version constraints.

The authors field (line 4) is empty, which is a metadata quality issue. For published or team projects, author information should be populated. Additionally, the compiler_version constraint uses only a lower bound (>=1.0.0) with no upper limit, which could lead to compatibility issues if future Noir releases introduce breaking changes.

  • Please populate the authors field with actual author name(s), e.g., authors = ["Your Name"]
  • Consider adding an upper bound to compiler_version to maintain compatibility, e.g., compiler_version = ">=1.0.0, <2.0.0" (adjust based on stability guarantees)

7-8: Verify the dependency specification and git tag availability.

The sort dependency is sourced from a GitHub repository via a git tag (v0.2.3) rather than a version-pinned crates.io release. While this approach is valid in Noir, please confirm:

  1. The git repository URL and tag v0.2.3 are accessible and stable.
  2. This dependency sourcing approach aligns with the project's practices for managing external dependencies.
  3. The tag points to a compatible and stable version of the library.

If the dependency is well-maintained and the tag is stable, consider adding a comment explaining why a git-sourced dependency is preferred over a registry release (if available).

git-hooks/pre-push (1)

17-30: comment

circom/regexes/simple.json (1)

1-3: LGTM!

Valid JSON structure defining a simple regex pattern for testing the compiler pipeline.

noir/common/email_addr.json (1)

1-10: LGTM!

Email validation pattern with appropriate maximum length (320 characters per RFC 3696). The pattern includes standard email characters and properly escaped special characters.

noir/README.md (1)

1-20: LGTM!

Clear and helpful documentation with an important warning about test placement. The guidance prevents potential issues with automated test generation scripts.

circom/regexes/to_all.json (1)

1-7: LGTM!

Well-structured multi-part pattern for parsing email "To:" headers. Correctly handles CRLF line endings and constrains the captured content to a reasonable maximum length.

circom/circuits/common/simple_graph.json (1)

1-71: LGTM with observation.

The NFA graph correctly represents the regex "a*b" with proper state transitions. The multiple start states (0, 2, 3) are unusual but may be intentional for the compiler's graph representation or optimization strategy.

Note: Ensure the graph consumer correctly handles multiple start states if this is a generated artifact from the compiler pipeline.

circom/regexes/from_all.json (1)

1-7: LGTM — new regex pattern data asset.

This JSON file adds a regex definition for parsing email "from:" headers. The structure and patterns are consistent with the regex data assets introduced elsewhere in the PR.

noir/common/from_all.json (1)

1-7: LGTM — matching Noir regex pattern.

This mirrors the Circom version for cross-framework consistency. Both frameworks now share the same "from:" header parsing patterns.

compiler/Cargo.toml (1)

13-23: Verify workspace dependency versions in root Cargo.toml.

All dependencies are correctly delegated to the workspace. However, ensure the versions specified in the root workspace manifest are appropriate and not using unstable sources (except where necessary).

circom/regexes/body_hash.json (1)

1-8: LGTM — DKIM body hash regex pattern.

Well-structured regex for extracting DKIM body hash values. The base64 pattern and 44-character limit align with SHA256-based DKIM signatures.

circom/tsconfig.json (1)

1-27: LGTM — well-configured TypeScript settings.

The configuration enforces strict type checking while maintaining development flexibility. Modern settings for bundler-based workflows are appropriate.

noir/common/sample_haystacks/body_hash.json (1)

1-9: LGTM - Well-structured test fixture.

The test cases appropriately cover:

  • Pass case 1: body hash pattern without leading CRLF (line start anchor)
  • Pass case 2: body hash pattern with leading CRLF
  • Fail case: pattern preceded by another header field ("to:"), ensuring the regex doesn't match mid-line
circom/circuits/tests/circuits/test_body_hash_regex.circom (1)

5-5: No issues identified—parameter values are correct.

The BodyHashRegex template is instantiated with maxHaystackBytes=300 and maxMatchBytes=299. These parameters are appropriately configured: the haystack (input data to search) is 300 bytes, and the maximum match length is 299 bytes. Having the match length slightly less than the haystack size is a standard and intentional configuration.

compiler/src/passes/mod.rs (1)

6-9: Dismiss this review comment — the builder module is correctly kept private.

The builder module contains no public type definitions, only impl blocks that add methods to public types (NFAGraph, IntermediateNFA) defined in the ir/ module. External code already has full access to builder functionality through public methods like NFAGraph::build(). Keeping the builder module private is appropriate; it's an internal implementation detail, not a public API surface requiring exposure.

Likely an incorrect or invalid review comment.

circom/circuits/common/subject_all_regex.circom (1)

43-148: Solid DFA wiring

The transition matrix, capture propagation, and path-length/linkage checks line up cleanly with (?:\r\n|^)subject:([^\r\n]+)\r\n; I don’t see correctness gaps here.

circom/circuits/common/message_id_regex.circom (1)

43-153: Regex automaton looks sound

State transitions and capture-group handling match the message-id pattern, and the accept-state accounting mirrors the other proven templates—looks good.

circom/circuits/common/subject_all_graph.json (1)

1-2321: Graph data matches the circuit

The DFA states, transitions, and capture annotations mirror the Circom template’s expectations, so the data asset is consistent.

circom/circuits/common/from_all_regex.circom (1)

43-167: From-header automaton looks consistent

The transitions and capture plumbing reflect the intended (from: …) header pattern, with the same path-length safeguards as the other templates. No issues spotted.

compiler/src/wasm.rs (2)

45-69: Good use of newtype pattern for type safety at WASM boundaries.

The wrapper types provide clear type distinctions at the WASM interface, which helps prevent parameter ordering mistakes.


194-242: Clean error handling in circuit input generation.

The function properly deserializes the NFA graph, generates inputs, and serializes the result with appropriate error handling at each step.

compiler/src/lib.rs (3)

24-41: Clean and well-documented API entry point.

The compile function provides a clear, simple interface for NFA construction with proper error type conversion.


90-126: Clean decomposition and delegation pattern.

The function appropriately delegates to gen_from_raw after composing the regex, avoiding code duplication.


267-294: Thorough error context enrichment for debugging.

The code carefully adds input context to errors that lack it, which will significantly help users debug issues. The pattern of augmenting base errors with additional context is a good practice.

compiler/src/ir/intermediate.rs (3)

13-61: Well-designed intermediate representation for epsilon elimination.

The structures cleanly separate epsilon and byte transitions, and the new method properly initializes all fields.


111-145: Clean conversion pipeline from intermediate to final NFA.

The finalize method correctly orchestrates epsilon elimination and final validation, with clear separation of concerns.


184-201: Verify capture group merging logic during epsilon elimination—no automated tests found.

The concern in the original review is valid: no tests were found that specifically cover epsilon transitions combined with capture groups. The capture merging logic (adding start captures from source closure and end captures from target closure) appears algorithmically sound, but is complex and lacks dedicated test coverage.

Tests that should exist but were not found:

  • Epsilon transitions with capture groups
  • Nested capture groups across epsilon transitions
  • Multiple chained epsilon transitions with captures
  • Start/end captures at different positions
compiler/src/error.rs (4)

9-74: Well-organized error code system with clear categorization.

The error codes are logically grouped into ranges, making it easy to identify error categories at a glance. The Display implementation is complete.


76-135: Excellent error design with rich context and actionable suggestions.

The error variants provide comprehensive context for debugging while keeping related information grouped logically. The consistent inclusion of suggestion fields is particularly user-friendly.


137-235: Clean implementation of error utility methods.

The methods provide clear access to error codes and well-formatted user messages. The is_recoverable() classification is useful for error handling strategies.


237-319: Comprehensive and consistent convenience constructors.

The factory methods provide easy-to-use interfaces for creating common errors with appropriate codes and helpful suggestions. All constructors properly populate their respective error variants.

compiler/src/utils.rs (1)

131-154: No validation needed in decomposed_to_composed_regex—downstream backend code already validates max_len values.

The backend generators validate that max_substring_bytes contains no zero values and return InvalidCapture errors if validation fails. The decomposed_to_composed_regex function correctly collects max_len values; validation happens in the appropriate downstream layer.

Likely an incorrect or invalid review comment.

toolchain: stable

- name: Cache Cargo
uses: actions/cache@v3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Update actions/cache to v4 for GitHub Actions compatibility.

The actions/cache@v3 action is outdated per actionlint. Update to v4 to ensure compatibility with current GitHub Actions standards and to receive security updates.

-      uses: actions/cache@v3
+      uses: actions/cache@v4
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uses: actions/cache@v3
uses: actions/cache@v4
🧰 Tools
🪛 actionlint (1.7.8)

23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
In .github/workflows/test.yml around line 23, the workflow references
actions/cache@v3 which is outdated; update the action reference to
actions/cache@v4 to satisfy actionlint and gain compatibility/security updates.
Modify the uses entry to point to actions/cache@v4 (no other changes required),
commit the workflow, and verify the workflow lints and runs successfully in CI.

# mail-parser = { version = "0.8", git = "https://github.com/stalwartlabs//mail-parser", tag = "0.8.0" }
[workspace.dependencies]
clap = "4.2.1"
comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Replace feature-branch git dependency with stable crate version.

The comptime dependency uses an unstable feature branch from a fork, which creates a production risk. The official comptime crate version 1.0.0 is available on crates.io. Unless the fork contains essential unreleased features, migrate to the stable version:

-comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
+comptime = "1.0.0"

If fork-specific functionality is required, open an issue with the upstream maintainers or document the dependency override rationale in comments.

Also applies to: 9-9

🤖 Prompt for AI Agents
In Cargo.toml around lines 9 to 9, the comptime dependency is pointing at a
feature branch of a forked repo; replace that git dependency with the stable
crates.io release by changing the dependency to use version = "1.0.0" (remove
the git/branch fields), run cargo update / cargo build to refresh Cargo.lock,
and search the repo for any other occurrences of the git override to update or
document; if fork-specific behavior is truly required, add a short comment
explaining the rationale and open an upstream issue instead of keeping the git
dependency.


# logs
logs
_.log
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in log file pattern.

Line 15 has _.log which will only match a file literally named "_.log". This should be *.log to match all log files with the .log extension.

Apply this diff:

-_.log
+*.log
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_.log
*.log
🤖 Prompt for AI Agents
In circom/.gitignore around line 15, the entry currently reads "_.log" which
only matches a literal file named "_.log"; change this to "*.log" so the
gitignore will match all files ending with the .log extension; update the line
to use an asterisk prefix (no other changes needed).

@@ -0,0 +1,943 @@
{
"regex": "([A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./@]+@[A-Za-z0-9.\\-]+)",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove extra ‘@’ from the local-part class.

The generated regex on Line 2 still includes @ in the local-part character class, so this graph will accept inputs like foo@bar@example.com: the first @ keeps you in the local-part loop, and the second @ transitions to the domain. That violates the single-@ requirement for mailboxes and will yield false positives. Please drop @ from the first character class (and regen this graph) so only the delimiter @ switches to the domain portion.

🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_graph.json around line 2 the local-part
character class incorrectly includes '@', allowing multiple '@' characters
(e.g., foo@bar@example.com); remove the '@' from the first character class so
only the single delimiter '@' separates local-part and domain, update the regex
accordingly, and regenerate the graph artifact so the change is reflected in the
generated file.

Comment on lines +34 to +57
signal isWithinPathLengthMinusOne[maxMatchBytes-2];
signal isTransitionLinked[maxMatchBytes];

signal isValidStartState;

signal reachedAcceptState[maxMatchBytes];

component isValidTraversal[maxMatchBytes];

// Select the haystack from the input
signal haystack[maxMatchBytes] <== SelectSubArray(maxHaystackBytes, maxMatchBytes)(inHaystack, matchStart, matchLength);

// Check if the first state in the haystack is a valid start state
isValidStartState <== IsEqual()([startStates[0], currStates[0]]);

for (var i = 0; i < maxMatchBytes; i++) {
isWithinPathLength[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength]);

// Check if the traversal is a valid path
if (i < maxMatchBytes-2) {
isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Enforce transition linkage for the final step

When matchLength == maxMatchBytes, the constraint block at Line 53 never executes for i = maxMatchBytes - 2, so nextStates[maxMatchBytes-2] is never forced to equal currStates[maxMatchBytes-1]. An attacker can supply an inconsistent final transition and still satisfy all constraints, breaking soundness for full-length matches. Please extend the guard/value array so the last link is checked as well. Example fix:

-    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
+    signal isWithinPathLengthMinusOne[maxMatchBytes-1];
@@
-        if (i < maxMatchBytes-2) {
+        if (i < maxMatchBytes-1) {
             isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
             isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
             isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
         }
🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_regex.circom around lines 34 to 57, the
arrays and loop guard for checking transition linkage stop at maxMatchBytes-2 so
the final link (between nextStates[maxMatchBytes-2] and
currStates[maxMatchBytes-1]) is not enforced when matchLength == maxMatchBytes;
extend the arrays and adjust the loop condition so the last link is validated:
declare isWithinPathLengthMinusOne and isTransitionLinked with size
maxMatchBytes-1 (instead of maxMatchBytes-2 and maxMatchBytes), change the
in-loop guard to if (i < maxMatchBytes-1) and compute
isWithinPathLengthMinusOne[i] with LessThan(log2Ceil(maxMatchBytes))([i,
matchLength-1]) then set isTransitionLinked[i] <== IsEqual()([nextStates[i],
currStates[i+1]]) and constrain isTransitionLinked[i] *
isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i] so the final
transition is enforced for full-length matches.

Comment on lines +68 to +98
} else if i + 2 < chars.len() && chars[i + 2] == '<' {
// Could be: (?<=...) positive lookbehind, or (?<!...) negative lookbehind, or (?<name>...) PCRE named capture
if i + 3 < chars.len() && (chars[i + 3] == '=' || chars[i + 3] == '!') {
// Lookbehind assertion: (?<=...) or (?<!...)
// These are special groups, preserve as-is
result.push(ch);
} else {
// PCRE named capture: (?<name>...)
// Convert to non-capturing and skip the name
result.push_str("(?:");
i += 2; // Skip '?' and '<'
// Skip until we find the closing '>'
while i + 1 < chars.len() && chars[i + 1] != '>' {
i += 1;
}
if i + 1 < chars.len() && chars[i + 1] == '>' {
i += 1; // Skip the '>'
}
}
} else if i + 3 < chars.len() && chars[i + 2] == 'P' && chars[i + 3] == '<' {
// Rust named capture: (?P<name>...)
// Convert to non-capturing and skip the name
result.push_str("(?:");
i += 3; // Skip '?', 'P', and '<'
// Skip until we find the closing '>'
while i + 1 < chars.len() && chars[i + 1] != '>' {
i += 1;
}
if i + 1 < chars.len() && chars[i + 1] == '>' {
i += 1; // Skip the '>'
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate named capture group syntax to prevent silent pattern corruption.

The code skips to the closing > for named captures but doesn't validate that it exists. Malformed patterns like (?<name... without a closing > will scan to the end of the pattern, potentially corrupting the output.

Consider adding validation:

                     // Skip until we find the closing '>'
                     while i + 1 < chars.len() && chars[i + 1] != '>' {
                         i += 1;
                     }
                     if i + 1 < chars.len() && chars[i + 1] == '>' {
                         i += 1; // Skip the '>'
+                    } else {
+                        // Malformed named capture - no closing '>'
+                        return format!("(?:MALFORMED_CAPTURE)");
                     }

Or better yet, document that this function expects well-formed regex patterns and should be called after regex validation.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 68 to 98, the named-capture skipping logic
does not verify a closing '>' and will run to the end of the input on malformed
patterns; update the logic to detect whether the loop found a '>' and handle the
failure explicitly: after the loop check if the terminating '>' was not found
and either (a) return an error/Result indicating a malformed regex so callers
can validate before calling, or (b) restore/emit the original characters (do not
drop them) and stop trying to skip the name (to avoid silently corrupting
output); implement one of these behaviors consistently for both PCRE (?<name>)
and Rust (?P<name>) branches and add a brief comment explaining the chosen
failure mode.

Comment on lines +156 to +166
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
// Convert to PascalCase to normalize
let pascal_name = name.to_pascal_case();

// Verify it's valid PascalCase
if pascal_name != name {
return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
}

Ok(name.to_string())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate against empty template names.

The function doesn't check for empty strings. An empty template name would pass validation since "".to_pascal_case() == "".

Add an early check:

 pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    if name.is_empty() {
+        return Err("Template name cannot be empty".into());
+    }
+
     // Convert to PascalCase to normalize
     let pascal_name = name.to_pascal_case();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
// Convert to PascalCase to normalize
let pascal_name = name.to_pascal_case();
// Verify it's valid PascalCase
if pascal_name != name {
return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
}
Ok(name.to_string())
}
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
if name.is_empty() {
return Err("Template name cannot be empty".into());
}
// Convert to PascalCase to normalize
let pascal_name = name.to_pascal_case();
// Verify it's valid PascalCase
if pascal_name != name {
return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
}
Ok(name.to_string())
}
🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 156 to 166, the validate_cli_template_name
function does not reject empty strings (""), which currently pass since
"".to_pascal_case() == ""; add an early check at the top to return Err("Template
name cannot be empty" or similar) if name.trim().is_empty(), then proceed with
the existing PascalCase normalization and comparison; ensure the error message
is consistent with other validation errors.

Comment on lines +29 to +42
impl From<WasmError> for JsValue {
fn from(err: WasmError) -> Self {
// Return a structured error message that JavaScript can parse
match err {
WasmError::CompilerError { code, message } => {
let error_json = format!(
r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
code,
message.replace('"', "\\\"").replace('\n', "\\n")
);
JsValue::from_str(&error_json)
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use proper JSON serialization instead of manual string escaping.

The manual JSON construction with string replacement is fragile and doesn't handle all special characters (e.g., \r, \t, \, control characters). This could lead to invalid JSON or injection issues.

Use serde_json for safe serialization:

+use serde_json::json;
+
 impl From<WasmError> for JsValue {
     fn from(err: WasmError) -> Self {
-        // Return a structured error message that JavaScript can parse
         match err {
             WasmError::CompilerError { code, message } => {
-                let error_json = format!(
-                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
-                    code,
-                    message.replace('"', "\\\"").replace('\n', "\\n")
-                );
-                JsValue::from_str(&error_json)
+                let error_json = json!({
+                    "type": "CompilerError",
+                    "code": code,
+                    "message": message
+                });
+                JsValue::from_str(&error_json.to_string())
             }
         }
     }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In compiler/src/wasm.rs around lines 29 to 42, the From<WasmError> impl manually
constructs JSON with string replacements which is unsafe for many characters;
replace the manual formatting with proper serde_json serialization: derive or
build a small serializable struct or use serde_json::json! to produce a valid
JSON string, call serde_json::to_string (or to_string_pretty) and pass the
result to JsValue::from_str, and handle serialization errors (e.g., log and
return a generic error JSON) and add/import serde_json in Cargo.toml and at the
top of the file.

{
"parts": [
{ "Pattern": "[A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./]+@" },
{ "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove '@' from domain character class.

The PublicPattern includes @ in the character class [A-Za-z0-9.\\-@]+, but email domains should not contain @ symbols. Since Part 1 already ends with @, allowing it in the domain part would permit malformed emails like user@domain@extra.com.

Apply this fix:

-        { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
+        { "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{ "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
{ "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
🤖 Prompt for AI Agents
In noir/common/email_domain.json around line 4, the PublicPattern character
class wrongly includes '@' allowing malformed domains; remove the '@' from the
class so the domain regex does not accept '@' characters, and ensure the hyphen
is either escaped or placed last in the class (e.g. [A-Za-z0-9.\-]+ or
[A-Za-z0-9.-]+) while keeping the same length limit (128).

Comment on lines 229 to +232

- [ ] support character class regex grammar (so as to simplify the regular expressions)
- [ ] better compatibility with regex grammar (will need a set of different tests to assert the compatibility between circuit and regex in languages) -->
## License

## Assumptions
Some email providers put not only the sender's email address but also their username to the From field.
ALthough its concrete formats differ among email providers, our FromAddrRegex template assumes that the email address appears at the end of the From field.
If this assumption does not hold, i.e., the username appears after the email address, an adversary can output an arbitrary email address from that template by including a dummy email address in the username.
This project is licensed under the [Specify License Here - e.g., MIT License or Apache 2.0].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix license placeholder.

The README now ends with “Specify License Here…”, which contradicts the actual MIT license declared elsewhere. Please replace the placeholder with the real license text or link so downstream users get the correct compliance information.

🤖 Prompt for AI Agents
In README.md around lines 229 to 232, the license section still contains a
placeholder; replace "Specify License Here - e.g., MIT License or Apache 2.0"
with the actual MIT license reference by updating the heading to "License" and
either inserting the short MIT license blurb plus year/owner or linking to the
full text (e.g., "This project is licensed under the MIT License - see LICENSE
file or https://opensource.org/licenses/MIT"), ensuring it matches the project's
LICENSE file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants