-
Notifications
You must be signed in to change notification settings - Fork 0
Parse whitespaces #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| extras: $ => [ | ||
| // This allows escaping newlines everywhere, although this is only valid in | ||
| // preprocessor statements | ||
| /\s|\\\r?\n/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this regex can't be removed without breaking preprocessor support!
| alias($.preproc_call_expression, $.call_expression), | ||
| )), | ||
| /\r?\n/, | ||
| $._external_end_of_statement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the addition of $.whitespace as an external rule, this change becomes necessary. Otherwise, tree-sitter-fortran fails to recognize newline characters, since they may be consumed silently by the external scanner. The issue is that the external scanner reads tokens directly from the input buffer without checking valid_symbols, which is a bug that should be reported upstream. This behavior has made proper whitespace parsing extremely difficult.
alvrogd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing with several open-source Fortran projects, it looks like these changes do not introduce any syntax regressions in the grammar:
Awesome! After looking at the tests in this PR, I understand you have observed in general that the rest of the parsing tree always stays the same.
Thanks also for the clarifications on the grammar changes. A lot of hours must have been put into this PR; it's a lot of rationale to unpack for each change, so the comments really help to follow it.
ruifm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Live reviewed, overal LGTM, great job!
- Patch #3 did not pass the internal testsuite - Patches #20 and stadelmanma#22 were merged together
5415a9e to
0e770a5
Compare
jgonzac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after live review, except the optional($.whitespace) workaround but it looks like it works.
Good job anyway!
| // HACK: This `optional($.whitespace)` is a workaround to prevent | ||
| // `tree-sitter` from misinterpreting `#define FOO (bar * baz)` as a | ||
| // `preproc_function_def`. For reasons related to how `token.immediate()` | ||
| // detects adjacent tokens, explicitly naming the whitespace rule causes | ||
| // the parser to think the opening parenthesis after `FOO` is immediately | ||
| // following it, which makes it match the function-like form instead of | ||
| // the object-like one. Adding an optional `whitespace` node here breaks | ||
| // that ambiguity and forces `tree-sitter` to correctly parse it as a | ||
| // regular `preproc_def` | ||
| optional($.whitespace), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand yet how this can disambiguate the rules 🤷🏼♂️
This PR adds whitespace parsing support to
tree-sitter-fortran. This PR supersedes #19.After testing with several open-source Fortran projects, it looks like these changes do not introduce any syntax regressions in the grammar:
codeefeature/ParseWhitespaces