Skip to content

Conversation

@camc314
Copy link
Contributor

@camc314 camc314 commented Oct 24, 2025

No description provided.

Copilot AI review requested due to automatic review settings October 24, 2025 13:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a panic in regexp.MustCompile when building wildcard directories with non-ASCII characters in file paths. The issue was caused by the regex pattern using \w (which only matches ASCII word characters in Go's regexp package), making it incompatible with Unicode characters commonly found in international file paths.

Key Changes:

  • Updated the reserved character pattern to explicitly escape only regex metacharacters instead of using negated character classes
  • Replaced Go-incompatible \w pattern with explicit list of special regex characters
  • Added comprehensive test coverage for non-ASCII characters in wildcard directory paths

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
internal/vfs/utilities.go Fixed regex pattern to escape only actual regex metacharacters, removing unsupported \w pattern that caused panics with Unicode
internal/tsoptions/wildcarddirectories_test.go Added new test cases covering Norwegian, Japanese, Chinese, and other Unicode characters in file paths

// so we only escape characters that have special meaning in regex.
var (
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[^\w\s/]`)
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backslash in the character class should not be escaped. In Go regex, within a character class [], a literal backslash only needs to appear once. The pattern [\\.\+*?()\[\]{}^$|#] attempts to escape the backslash itself, which may not match literal backslashes correctly. Change to [\\.+*?()\[\]{}^$|#] (single backslash, no escape for +).

Suggested change
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.+*?()\[\]{}^$|#]`)

Copilot uses AI. Check for mistakes.
// so we only escape characters that have special meaning in regex.
var (
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[^\w\s/]`)
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The + character does not need to be escaped inside a character class. In Go regex character classes, + is treated as a literal character. Change \+ to + for consistency and correctness.

Suggested change
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.+*?()\[\]{}^$|#]`)

Copilot uses AI. Check for mistakes.
@jakebailey
Copy link
Member

I'm not sure I understand; if this were panicking wouldn't everything be broken right now?

@camc314
Copy link
Contributor Author

camc314 commented Oct 24, 2025

I'm not sure I understand; if this were panicking wouldn't everything be broken right now?

sorry should've clarified in the PR description, apparently this is only an issue when you've got non-ascii characters inside the path on windows machines.

panic: regexp: Compile(`(?i)^((C\:/Users/TobiasL\ægreid/dev/app/frontend/packages/react/src(/.+?)?/[^/]*\.test\.ts[^/]*)|(C\:/Users/TobiasL\ægreid/dev/app/frontend/packages/react/src(/.+?)?/[^/]*\.stories\.ts[^/]*)|(C\:/Users/TobiasL\ægreid/dev/app/frontend/packages/react/src(/.+?)?/[^/]*\.mdx))($|/)`): error parsing regexp: invalid escape sequence: `\æ`
goroutine 299 [running]:
regexp.MustCompile({0xc0000b0840, 0x151})
        /opt/hostedtoolcache/go/1.25.0/x64/src/regexp/regexp.go:313 +0xb4
github.com/microsoft/typescript-go/internal/tsoptions.getWildcardDirectories({0xc000500f70, 0x1, 0x70?}, {0xc0000a9840?, 0xc000118900?, 0x70?}, {0x28?, {0xc000118c60?, 0x7ff62e770331?}})

@camc314 camc314 changed the title fix: panic in regexp.MustCompile when building wildcarddirectories fix: panic in regexp.MustCompile when building wildcarddirectories with non ascii characters Oct 24, 2025
@jakebailey
Copy link
Member

Ah, this is oxc-project/tsgolint#318. I didn't realize you meant a knock-on effect from a later regex call.

// It may be inefficient (we could just match (/[-[\]{}()*+?.,\\^$|#\s]/g), but this is future
// proof.
// Reserved characters - only escape actual regex metacharacters.
// Go's regexp doesn't support \x escape sequences for arbitrary characters,
Copy link
Member

@jakebailey jakebailey Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using regexp at all; does switching to regexp2 fix this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't understand why getWildcardDirectories uses regexp instead of regexp2.

(I would rather we not use regexp/regexp2 at all in the codebase but that's another problem.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would guess it's still a problem with regexp2.

I remove this regex in favor of a simple for loop though 6bf5543

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be, given regexp2 has a mode that emulates ECMAScript regexes?

If we are going to use regexp and not regexp2, then we can probably just use regexp.QuoteMeta.

@jakebailey
Copy link
Member

The latest commit is not exactly what I mean; I wasn't trying to say to continue to change the escaping, but rather fix getWildcardDirectories to not use Go's regexp package, given the original code was more or less trying to emulate ECMAScript regexes which can be done with regexp2.

I feel somewhat less confident about hand escaping regexes like this unless we can be sure that those characters are definitely all that matter?

Otherwise, we'd just use https://pkg.go.dev/regexp#QuoteMeta, right?

@camc314 camc314 force-pushed the c/fix-panic-when-compiling-regex branch from 136d7fe to 853ac9d Compare October 25, 2025 11:49
@camc314
Copy link
Contributor Author

camc314 commented Oct 25, 2025

The latest commit is not exactly what I mean; I wasn't trying to say to continue to change the escaping, but rather fix getWildcardDirectories to not use Go's regexp package, given the original code was more or less trying to emulate ECMAScript regexes which can be done with regexp2.

I feel somewhat less confident about hand escaping regexes like this unless we can be sure that those characters are definitely all that matter?

Otherwise, we'd just use https://pkg.go.dev/regexp#QuoteMeta, right?

Ah i see. i reverted that commit. The problem is that GetRegularExpressionForWildcard combines multiple strings, so it has to be escaped, hence we can't just omit the regex escaping change. Regardless, i changed wildcarddirectories to use regexp2.

Hopefully I understood that correctly 🙂

@jakebailey
Copy link
Member

Why does Strada not need the escaping change, though?

// so we only escape characters that have special meaning in regex.
var (
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[^\w\s/]`)
reservedCharacterPattern *regexp.Regexp = regexp.MustCompile(`[\\.\+*?()\[\]{}^$|#]`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of suggesting regex2 was to avoid making this particular change. If we still want to restrict the characters, it makes me think we should just use QuoteMeta and not change anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants