Conversation

@inaki-amatria
Member

This PR adds whitespace parsing support to tree-sitter-fortran. To make this work, I merged preproc_def and preproc_function_def, since keeping them separate caused parsing conflicts. Given that macro support is already limited, losing the distinction between these two nodes should be acceptable for now.

@inaki-amatria inaki-amatria requested a review from a team October 13, 2025 11:26
@inaki-amatria inaki-amatria self-assigned this Oct 13, 2025
@inaki-amatria inaki-amatria requested review from alvrogd and jgonzac and removed request for a team October 13, 2025 11:26
@inaki-amatria inaki-amatria force-pushed the feature/ParseWhitespaces branch from a156383 to 59e6ad5 Compare October 13, 2025 11:29

@jgonzac jgonzac left a comment

If this really works, it's a big step!

// This allows escaping newlines everywhere, although this is only valid in
// preprocessor statements
/\s|\\\r?\n/,
$.whitespace,

Do you know how this interacts with the \s pattern above, which also matches whitespace?

Member Author

whitespace does not consume all the tokens \s consumes, so the rule is still required

@jgonzac jgonzac Oct 15, 2025

I mean, could there be some whitespace characters matched by \s that should instead be handled first by $.whitespace? I suppose not, but I would like to know whether that can happen.

@jgonzac jgonzac Oct 15, 2025

I would try replacing \s with [\f\t\n\v\r] and see if anything changes.

> whitespace does not consume all the tokens \s consumes, so the rule is still required

I'm not sure what kind of ambiguity this suggests. Why not? Why are both required?
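
For reference, the overlap being discussed can be sketched in C. The scanner in this PR tests characters with iswblank, which in the default C locale accepts only space and horizontal tab, whereas \s also matches line breaks, vertical tab and form feed. The helper names below are hypothetical and not part of the PR, and the \s helper is an ASCII-only approximation (the regex engine used by the grammar may match additional Unicode whitespace).

```c
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical helpers for illustration only; none of these exist in the PR. */

/* Characters the scanner's scan_whitespace would accept (it tests iswblank). */
bool is_blank_like_scanner(wint_t c) {
    return iswblank(c) != 0;
}

/* ASCII-only approximation of the \s regex class (assumption: the real
 * regex engine may also match non-ASCII whitespace). */
bool is_regex_space_ascii(wint_t c) {
    return c == L' ' || c == L'\t' || c == L'\n' ||
           c == L'\v' || c == L'\f' || c == L'\r';
}

/* The character class suggested in the review: [\f\t\n\v\r]. */
bool in_suggested_class(wint_t c) {
    return c == L'\f' || c == L'\t' || c == L'\n' ||
           c == L'\v' || c == L'\r';
}

/* True when a character is matched by \s but would not be consumed by the
 * scanner's blank check, i.e. the part the regex rule still has to cover. */
bool only_covered_by_regex(wint_t c) {
    return is_regex_space_ascii(c) && !is_blank_like_scanner(c);
}
```

Under these assumptions, \n, \r, \v and \f are matched by \s but not by the scanner's blank check, which is consistent with the author's reply above. Note also that the suggested class [\f\t\n\v\r] drops the space but still contains \t, so it would continue to overlap with $.whitespace on tabs.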

@inaki-amatria inaki-amatria force-pushed the feature/ParseWhitespaces branch from 59e6ad5 to 7c934b0 Compare October 13, 2025 14:41
@inaki-amatria inaki-amatria marked this pull request as ready for review October 15, 2025 14:28

@daniel-otero daniel-otero left a comment

Are we considering upstreaming some of this? Starting with a hidden whitespace node might be a good middle ground.

Otherwise, this looks like a big departure from upstream.

Comment on lines 163 to 180

preproc_function_def: $ => seq(
  preprocessor('define'),
  field('name', $.identifier),
  field('parameters', $.preproc_params),
  field('value', optional($.preproc_arg)),
  token.immediate(/\r?\n/),
),

preproc_params: $ => seq(
  token.immediate('('), commaSep(choice($.identifier, '...')), ')',
),

preproc_call: $ => seq(

This looks more like a removal than a merge, right? I don't see where the merge happens.

src/scanner.c Outdated
Comment on lines 455 to 483
static bool scan_whitespace(Scanner *scanner, TSLexer *lexer,
                            bool lex_whitespace) {
  if (!iswblank(lexer->lookahead)) {
    return false;
  }

  while (iswblank(lexer->lookahead)) {
    if (lex_whitespace) {
      advance(lexer);
      continue;
    }
    skip(lexer);
  }

  if (!lex_whitespace) {
    return false;
  }

  lexer->mark_end(lexer);
  lexer->result_symbol = WHITESPACE;

  return true;
}

static bool scan(Scanner *scanner, TSLexer *lexer, const bool *valid_symbols) {
  // Consume any leading whitespace except newlines
  if (scan_whitespace(scanner, lexer, valid_symbols[WHITESPACE])) {
    return true;
  }

Why is this needed instead of a simple grammar rule? It seems like there is something you are not telling us here.
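
The question is left unanswered in this thread, but the two modes of scan_whitespace can at least be traced with a small stand-in for the lexer. This is a minimal sketch, not the PR's code: MockLexer and the mock_* functions are hypothetical, and it assumes the PR's advance and skip helpers consume one character each, with skip excluding that character from the resulting token.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical stand-in for TSLexer: just enough state to trace the scan. */
typedef struct {
    const char *input;  /* NUL-terminated source text */
    size_t pos;         /* current read position */
    size_t token_start; /* moved forward by skip: skipped chars are not in the token */
    size_t token_end;   /* set where the PR calls mark_end */
} MockLexer;

int mock_lookahead(const MockLexer *lx) {
    return (unsigned char)lx->input[lx->pos];
}

/* Consume one character and keep it as part of the token. */
void mock_advance(MockLexer *lx) {
    lx->pos++;
}

/* Consume one character and exclude it from the token. */
void mock_skip(MockLexer *lx) {
    lx->pos++;
    lx->token_start = lx->pos;
}

/* Same control flow as scan_whitespace in the PR, run against the mock. */
bool mock_scan_whitespace(MockLexer *lx, bool lex_whitespace) {
    if (!iswblank(mock_lookahead(lx))) {
        return false; /* nothing to do: next character is not a blank */
    }
    while (iswblank(mock_lookahead(lx))) {
        if (lex_whitespace) {
            mock_advance(lx); /* blanks become part of a WHITESPACE token */
        } else {
            mock_skip(lx);    /* blanks are discarded before the next token */
        }
    }
    if (!lex_whitespace) {
        return false; /* blanks were skipped, no token produced */
    }
    lx->token_end = lx->pos;
    return true; /* a WHITESPACE token spanning the blanks */
}
```

With the input "  \tx = 1", a call with lex_whitespace set to true reports a token covering the first three characters, while a call with false produces no token and leaves the token start positioned at the x. A leading newline is rejected in both modes because iswblank does not accept it.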

@inaki-amatria inaki-amatria force-pushed the feature/ParseWhitespaces branch from 7c934b0 to e946912 Compare October 16, 2025 13:00
@inaki-amatria inaki-amatria force-pushed the feature/ParseWhitespaces branch from e946912 to 202d895 Compare October 16, 2025 13:01
@inaki-amatria
Member Author

I will close this PR since the approach has changed significantly. We can continue the discussion in a new PR, focusing the review on the updated implementation.
