Parse whitespaces #18

inaki-amatria · 2025-10-13T11:26:35Z

This PR adds whitespace parsing support to tree-sitter-fortran. To make this work, I merged preproc_def and preproc_function_def, since keeping them separate caused parsing conflicts. Given that macro support is already limited, losing the distinction between these two nodes should be acceptable for now.

jgonzac

If this really works, it's a big step!

src/scanner.c

jgonzac · 2025-10-13T13:18:28Z

grammar.js

    // This allows escaping newlines everywhere, although this is only valid in
    // preprocessor statements
    /\s|\\\r?\n/,
+    $.whitespace,


Do you know how this interferes with the \s above, which also includes whitespace?

whitespace does not consume all the tokens \s consumes, so the rule is still required

I mean, could there be some whitespace characters in \s that should be handled first by $.whitespace? I suppose not, but I would like to know whether that can happen.

I would try replacing \s with [\f\t\n\v\r] and see if anything changes.
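To make the overlap being discussed concrete, here is a minimal sketch that checks which ASCII characters matched by \s would also be picked up by an iswblank() loop like the one in src/scanner.c. The helper names are hypothetical and not part of the PR, and the sketch assumes the default "C" locale, where iswblank() accepts only space and horizontal tab.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* ASCII subset of what the regex class \s matches. JavaScript's \s also
 * matches several Unicode spaces, which this sketch ignores. */
static const wchar_t REGEX_SPACE_ASCII[] = L" \t\n\v\f\r";

/* Hypothetical helper: true if an iswblank() loop would consume `c`. */
static bool claimed_by_whitespace_token(wchar_t c) {
    return iswblank((wint_t)c) != 0;
}

/* Hypothetical helper: copies the \s characters that iswblank() rejects
 * into `out` (at most `cap` of them) and returns how many were copied. */
static size_t left_for_regex_extra(wchar_t *out, size_t cap) {
    size_t n = 0;
    for (const wchar_t *p = REGEX_SPACE_ASCII; *p != L'\0'; ++p) {
        if (!claimed_by_whitespace_token(*p) && n < cap) {
            out[n++] = *p;
        }
    }
    return n;
}
```

Under these assumptions, only \n, \v, \f and \r are left for the regex, which is consistent with the statement that the regex rule is still required. It also suggests that the \t in the proposed [\f\t\n\v\r] class would still overlap with the external token.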

daniel-otero

Are we considering upstreaming some of this? Starting with a hidden whitespace node could be a good middle ground.

Otherwise, this looks like a big departure from upstream.

daniel-otero · 2025-10-15T16:31:28Z

grammar.js


-    preproc_function_def: $ => seq(
-      preprocessor('define'),
-      field('name', $.identifier),
-      field('parameters', $.preproc_params),
-      field('value', optional($.preproc_arg)),
-      token.immediate(/\r?\n/),
-    ),
-
-    preproc_params: $ => seq(
-      token.immediate('('), commaSep(choice($.identifier, '...')), ')',
-    ),
-
    preproc_call: $ => seq(


Rather than a merge, I would say this is simply a removal, right? I don't see where the merge happens.

daniel-otero · 2025-10-15T16:33:09Z

grammar.js

    // This allows escaping newlines everywhere, although this is only valid in
    // preprocessor statements
    /\s|\\\r?\n/,
+    $.whitespace,


whitespace does not consume all the tokens \s consumes, so the rule is still required

I'm not sure what kind of ambiguity this suggests... Why not? Why are both required?

daniel-otero · 2025-10-15T16:34:35Z

src/scanner.c

+static bool scan_whitespace(Scanner *scanner, TSLexer *lexer,
+                            bool lex_whitespace) {
+    if (!iswblank(lexer->lookahead)) {
+        return false;
+    }
+
    while (iswblank(lexer->lookahead)) {
+        if (lex_whitespace) {
+            advance(lexer);
+            continue;
+        }
        skip(lexer);
    }

+    if (!lex_whitespace) {
+        return false;
+    }
+
+    lexer->mark_end(lexer);
+    lexer->result_symbol = WHITESPACE;
+
+    return true;
+}
+
+static bool scan(Scanner *scanner, TSLexer *lexer, const bool *valid_symbols) {
+    // Consume any leading whitespace except newlines
+    if (scan_whitespace(scanner, lexer, valid_symbols[WHITESPACE])) {
+        return true;
+    }


Why is this needed instead of a simple grammar rule? It seems like there is something you are not telling us here.
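For readers following the diff, the two paths through scan_whitespace can be sketched with a mock lexer. This is an illustration only: MockLexer and its fields are hypothetical stand-ins for TSLexer, and the advance/skip helpers are assumed to behave like the ones the scanner already uses (advance keeps the character in the token, skip drops it).

```c
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical stand-in for TSLexer; only what this sketch needs. */
typedef struct {
    const char *input;  /* NUL-terminated source text */
    size_t pos;         /* current read position */
    size_t token_start; /* moves forward whenever a character is skipped */
    size_t token_end;   /* set where the real code calls mark_end */
} MockLexer;

static int lookahead(const MockLexer *lx) {
    return (unsigned char)lx->input[lx->pos];
}

/* advance: consume the character as part of the token. */
static void advance(MockLexer *lx) { lx->pos++; }

/* skip: consume the character but exclude it from the token. */
static void skip(MockLexer *lx) {
    lx->pos++;
    lx->token_start = lx->pos;
}

/* Mirrors the control flow of the diff: blanks become a token only when
 * the parser says WHITESPACE is valid; otherwise they are skipped. */
static bool scan_whitespace(MockLexer *lx, bool lex_whitespace) {
    if (!iswblank((wint_t)lookahead(lx))) {
        return false;
    }
    while (iswblank((wint_t)lookahead(lx))) {
        if (lex_whitespace) {
            advance(lx);
        } else {
            skip(lx);
        }
    }
    if (!lex_whitespace) {
        return false; /* blanks were dropped; let other tokens be scanned */
    }
    lx->token_end = lx->pos;
    return true;
}
```

With the input "  \tx", calling scan_whitespace(&lx, true) yields a token covering the first three characters, while scan_whitespace(&lx, false) returns false after moving past them. The sketch does not settle whether a grammar rule could do the same job; it only shows that the choice depends on valid_symbols[WHITESPACE], which the external scanner receives at lex time.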

inaki-amatria · 2025-10-16T13:29:27Z

I will close this PR since the approach has changed significantly. We can continue the discussion in a new PR, focusing the review on the updated implementation.

inaki-amatria requested a review from a team October 13, 2025 11:26

inaki-amatria self-assigned this Oct 13, 2025

inaki-amatria requested review from alvrogd and jgonzac and removed request for a team October 13, 2025 11:26

inaki-amatria force-pushed the feature/ParseWhitespaces branch from a156383 to 59e6ad5 Compare October 13, 2025 11:29

jgonzac approved these changes Oct 13, 2025

inaki-amatria force-pushed the feature/ParseWhitespaces branch from 59e6ad5 to 7c934b0 Compare October 13, 2025 14:41

inaki-amatria marked this pull request as ready for review October 15, 2025 14:28

daniel-otero reviewed Oct 15, 2025

inaki-amatria force-pushed the feature/ParseWhitespaces branch from 7c934b0 to e946912 Compare October 16, 2025 13:00

Parse whitespaces

202d895

inaki-amatria force-pushed the feature/ParseWhitespaces branch from e946912 to 202d895 Compare October 16, 2025 13:01

inaki-amatria closed this Oct 16, 2025

inaki-amatria mentioned this pull request Oct 17, 2025

Parse whitespaces #19

Closed
