fix: panic in regexp.MustCompile when building wildcarddirectories with non ascii characters #1947
        
Pull Request Overview
This PR fixes a panic in regexp.MustCompile when building wildcard directories with non-ASCII characters in file paths. The issue was caused by the regex pattern using \w (which only matches ASCII word characters in Go's regexp package), making it incompatible with Unicode characters commonly found in international file paths.
Key Changes:
- Updated the reserved character pattern to explicitly escape only regex metacharacters instead of using negated character classes
- Replaced the Go-incompatible \w pattern with an explicit list of special regex characters
- Added comprehensive test coverage for non-ASCII characters in wildcard directory paths
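The failure mode described in the overview can be reproduced in isolation. The sketch below is an assumption about the mechanism, not the code in internal/vfs/utilities.go: it uses the pre-fix pattern quoted in this PR together with two hypothetical helpers, `escapeReserved` (prefixes every match with a backslash) and `compileErr` (reports whether the result compiles). Since `\w` is ASCII-only in Go, non-ASCII letters fall into the negated class, gain a leading backslash, and Go's regexp parser then rejects the escaped string.

```go
package main

import (
	"fmt"
	"regexp"
)

// oldReserved is the pre-fix pattern quoted in this PR.
var oldReserved = regexp.MustCompile(`[^\w\s/]`)

// escapeReserved is a hypothetical helper (not taken from the PR): it puts a
// backslash in front of every character the pattern matches.
func escapeReserved(path string) string {
	return oldReserved.ReplaceAllStringFunc(path, func(m string) string {
		return `\` + m
	})
}

// compileErr is a hypothetical helper: it returns the error from compiling
// pattern, or nil. Using Compile instead of MustCompile avoids the panic.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

func main() {
	escaped := escapeReserved("/æøå/src")
	fmt.Println(escaped)

	if err := compileErr(escaped); err != nil {
		fmt.Println("compile failed:", err)
	}
}
```

An ASCII-only path such as `/abc/src` passes through `escapeReserved` unchanged, which would be consistent with the panic only showing up for paths containing non-ASCII characters.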
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| internal/vfs/utilities.go | Fixed regex pattern to escape only actual regex metacharacters, removing the unsupported \w pattern that caused panics with Unicode |
| internal/tsoptions/wildcarddirectories_test.go | Added new test cases covering Norwegian, Japanese, Chinese, and other Unicode characters in file paths | 
| // so we only escape characters that have special meaning in regex. | ||
| var ( | ||
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[^\w\s/]`) | ||
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`) | 
    
      
    
Copilot AI, Oct 24, 2025
    
  
The backslash in the character class should not be escaped. In Go regex, within a character class [], a literal backslash only needs to appear once. The pattern [\\.\+*?()\[\]{}^$|#] attempts to escape the backslash itself, which may not match literal backslashes correctly. Change to [\\.+*?()\[\]{}^$|#] (single backslash, no escape for +).
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`) | |
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.+*?()\[\]{}^$|#]`) | 
    
      
    
Copilot AI, Oct 24, 2025
    
  
The + character does not need to be escaped inside a character class. In Go regex character classes, + is treated as a literal character. Change \+ to + for consistency and correctness.
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`) | |
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.+*?()\[\]{}^$|#]`) | 
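Both review comments concern escaping inside a character class, so a quick check of the two patterns may help. The sketch below compiles the pattern from the diff and the one from the suggested change, then compares what each matches. The variable names and the `sameVerdict` helper are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// withEscapedPlus is the pattern from the PR diff.
	withEscapedPlus = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
	// withPlainPlus is the pattern from the suggested change.
	withPlainPlus = regexp.MustCompile(`[\\.+*?()\[\]{}^$|#]`)
)

// sameVerdict is a hypothetical helper: it reports whether both patterns
// agree on whether s contains a reserved character.
func sameVerdict(s string) bool {
	return withEscapedPlus.MatchString(s) == withPlainPlus.MatchString(s)
}

func main() {
	samples := []string{`\`, ".", "+", "*", "?", "(", ")", "[", "]",
		"{", "}", "^", "$", "|", "#", "a", "/", "æ"}
	for _, s := range samples {
		fmt.Printf("%q escaped-plus=%v plain-plus=%v same=%v\n",
			s, withEscapedPlus.MatchString(s), withPlainPlus.MatchString(s), sameVerdict(s))
	}
}
```

In a raw string, `\\` inside a class denotes one literal backslash and `\+` denotes a literal `+`, so the two spellings should differ only in style.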
I'm not sure I understand; if this were panicking, wouldn't everything be broken right now?
Sorry, I should've clarified in the PR description: apparently this is only an issue when there are non-ASCII characters in the path on Windows machines.
Ah, this is oxc-project/tsgolint#318. I didn't realize you meant a knock-on effect from a later regex call.
| // It may be inefficient (we could just match (/[-[\]{}()*+?.,\\^$|#\s]/g), but this is future | ||
| // proof. | ||
| // Reserved characters - only escape actual regex metacharacters. | ||
| // Go's regexp doesn't support \x escape sequences for arbitrary characters, | 
We shouldn't be using regexp at all; does switching to regexp2 fix this?
Yeah, I don't understand why getWildcardDirectories uses regexp instead of regexp2.
(I would rather we not use regexp/regexp2 at all in the codebase but that's another problem.)
I would guess it's still a problem with regexp2.
I removed this regex in favor of a simple for loop, though: 6bf5543
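For readers wondering what a loop-based approach could look like, here is a minimal sketch. It is not the content of commit 6bf5543, which is not shown in this conversation. The `regexMeta` constant and the `escapeMeta` helper are hypothetical, and the sketch only covers escaping; it leaves out any special handling of `*` and `?` as wildcards.

```go
package main

import (
	"fmt"
	"strings"
)

// regexMeta lists the characters escaped by the pattern in this PR's diff.
const regexMeta = `\.+*?()[]{}^$|#`

// escapeMeta is a hypothetical loop-based replacement for the regex: it
// backslash-escapes the characters in regexMeta and copies every other rune,
// including non-ASCII ones, unchanged.
func escapeMeta(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if strings.ContainsRune(regexMeta, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeMeta("/æøå/src/a.b"))
}
```

Iterating with `range` walks the string rune by rune, so multi-byte characters are never split and never receive a stray backslash.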
It shouldn't be, given regexp2 has a mode that emulates ECMAScript regexes?
If we are going to use regexp and not regexp2, then we can probably just use regexp.QuoteMeta.
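As a rough illustration of the `regexp.QuoteMeta` suggestion, the sketch below quotes a path that mixes non-ASCII characters with regex metacharacters and compiles the result. The `literalPattern` helper and the sample path are hypothetical; how this would be wired into getWildcardDirectories is not shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalPattern is a hypothetical helper: it builds an anchored pattern
// that matches path exactly, using QuoteMeta for the escaping.
func literalPattern(path string) *regexp.Regexp {
	return regexp.MustCompile("^" + regexp.QuoteMeta(path) + "$")
}

func main() {
	path := "/プロジェクト/src/a.b(c)"
	re := literalPattern(path)
	fmt.Println(re.String())
	fmt.Println(re.MatchString(path))
}
```

One caveat under the same assumptions: `QuoteMeta` also escapes `*` and `?`, so any code that treats those as wildcards would need to handle them separately.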
The latest commit is not exactly what I mean; I wasn't trying to say to continue to change the escaping, but rather fix  I feel somewhat less confident about hand escaping regexes like this unless we can be sure that those characters are definitely all that matter? Otherwise, we'd just use https://pkg.go.dev/regexp#QuoteMeta, right?
136d7fe to 853ac9d
Ah, I see. I reverted that commit. The problem is that  Hopefully I understood that correctly 🙂
Why does Strada not need the escaping change, though?
| // so we only escape characters that have special meaning in regex. | ||
| var ( | ||
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[^\w\s/]`) | ||
| reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`) | 
The goal of suggesting regexp2 was to avoid making this particular change. If we still want to restrict the characters, it makes me think we should just use QuoteMeta and not change anything else.
No description provided.