
refactor(syntax): Group SyntaxKind and unify token/terminal kinds under LexemeKind.#10154

Draft
orizi wants to merge 1 commit into main from orizi/06-23-refactor_syntax_group_syntaxkind_and_unify_token_terminal_kinds_under_lexemekind


orizi (Collaborator) commented Jun 23, 2026

Summary

Restructures the generated SyntaxKind enum so that the token and terminal kinds are no longer two parallel, duplicated variant lists.

A terminal node and the token that backs it share one lexical identity, so both now reuse a single LexemeKind enum. Trivia tokens (whitespace, newlines, comments, skipped) move to TriviaKind, and enum missing-variants to MissingKind:

    enum SyntaxKind {
        Token(LexemeKind),
        TriviaToken(TriviaKind),
        Terminal(LexemeKind),
        Missing(MissingKind),
        // ...other node kinds stay flat (ExprBinary, ExprList, ...)
    }
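A minimal sketch of the grouped layout, with a handful of illustrative variants standing in for the generated ones (the real enums come from the spec and are much larger). It shows one LexemeKind value serving both the Token and Terminal wrappers; the `lexeme()` helper and every variant name below are hypothetical.

```rust
/// Lexical identity shared by a token and the terminal node that wraps it.
/// Variants are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind {
    Identifier,
    Plus,
    Fn,
}

/// Trivia tokens (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TriviaKind {
    Whitespace,
    Newline,
    SingleLineComment,
}

/// Missing-variant kinds (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MissingKind {
    Expr,
    Statement,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    Token(LexemeKind),
    TriviaToken(TriviaKind),
    Terminal(LexemeKind),
    Missing(MissingKind),
    // Other node kinds stay flat.
    ExprBinary,
    ExprList,
}

impl SyntaxKind {
    /// Hypothetical helper: the lexeme behind a token or terminal, if any.
    /// Both wrappers carry the same LexemeKind, so one arm covers them.
    pub fn lexeme(self) -> Option<LexemeKind> {
        match self {
            SyntaxKind::Token(kind) | SyntaxKind::Terminal(kind) => Some(kind),
            _ => None,
        }
    }
}
```

Because Token and Terminal wrap the same type, a terminal and its backing token can be compared by lexeme without any name mapping.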

The lexer/parser layer now operates on LexemeKind directly (LexerTerminal, peek().kind, Terminal::KIND, MissingToken, operator-precedence and skip-until predicates); SyntaxKind is reserved for green-tree node kinds. All kind enums are generated from the spec through a single classifier, so the lists can't drift apart again.
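To illustrate what operating on LexemeKind directly means for the parser, here is a sketch of an operator-precedence lookup and a skip-until recovery predicate keyed on LexemeKind rather than SyntaxKind. The function names `binary_precedence`, `is_statement_boundary`, and `skip_until`, the variants, and the precedence numbers are all hypothetical, not the parser's actual API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind {
    Identifier,
    Plus,
    Star,
    Semicolon,
    RBrace,
    EndOfFile,
}

/// Hypothetical binary-operator precedence table keyed on LexemeKind.
/// Lower numbers bind tighter; the values are illustrative.
pub fn binary_precedence(kind: LexemeKind) -> Option<usize> {
    match kind {
        LexemeKind::Star => Some(2),
        LexemeKind::Plus => Some(3),
        _ => None,
    }
}

/// Hypothetical skip-until predicate used for error recovery.
pub fn is_statement_boundary(kind: LexemeKind) -> bool {
    matches!(kind, LexemeKind::Semicolon | LexemeKind::RBrace | LexemeKind::EndOfFile)
}

/// Counts the tokens skipped before the first one satisfying `pred`.
pub fn skip_until(tokens: &[LexemeKind], pred: fn(LexemeKind) -> bool) -> usize {
    tokens.iter().take_while(|&&k| !pred(k)).count()
}
```

Because these helpers take a LexemeKind, a caller cannot accidentally pass a node kind such as ExprBinary.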

Debug/Display preserve the historical flat names (TerminalIdentifier, TokenWhitespace, ExprMissing), so diagnostics and golden tests are unaffected by the grouping. colored_printer::set_color now matches exhaustively over LexemeKind/TriviaKind instead of falling through to a panic! catch-all (resolving the long-standing "Can this be made exhaustive?" TODO).
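One way Display can keep the historical flat names is to re-attach the old prefix or suffix when formatting each grouped variant. This is a sketch under that assumption, with illustrative variants; the `Expr` variant of MissingKind and the choice to have Debug delegate to Display are hypothetical, not necessarily how the generated code does it.

```rust
use std::fmt;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind { Identifier, Plus }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TriviaKind { Whitespace, Newline }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MissingKind { Expr }

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    Token(LexemeKind),
    TriviaToken(TriviaKind),
    Terminal(LexemeKind),
    Missing(MissingKind),
    ExprBinary,
}

impl fmt::Display for SyntaxKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Re-attach the historical prefix/suffix so output stays flat.
            SyntaxKind::Token(k) => write!(f, "Token{k:?}"),
            SyntaxKind::TriviaToken(k) => write!(f, "Token{k:?}"),
            SyntaxKind::Terminal(k) => write!(f, "Terminal{k:?}"),
            SyntaxKind::Missing(k) => write!(f, "{k:?}Missing"),
            SyntaxKind::ExprBinary => write!(f, "ExprBinary"),
        }
    }
}

// Debug delegates to Display so `{:?}` also prints the flat name.
impl fmt::Debug for SyntaxKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}
```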

Type of change

▎ None of the listed categories is "refactor"; this is an internal restructuring with no intended functional change (see below for the one user-visible diagnostic wording change).

  • Bug fix (fixes incorrect behavior)
  • New feature
  • Performance improvement
  • Documentation change with concrete technical impact
  • Style, wording, formatting, or typo-only change
  • Refactor / internal cleanup (no functional change)

Why is this change needed?

The flat SyntaxKind listed every lexical kind twice, once as TokenX and once as TerminalX (81 paired names), with nothing tying the two lists together; they could (and did) drift. A terminal and its backing token are the same lexical kind and should share one source of truth.
Separately, colored_printer::set_color was a non-exhaustive match with a catch-all panic!, which crashed on perfectly valid token kinds it didn't enumerate. Grouping the kinds lets that match be exhaustive over LexemeKind/TriviaKind, so the compiler now forces a coloring decision for any new token instead of allowing a latent panic.
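The exhaustiveness argument in a minimal sketch: with no `_ =>` arm, adding a variant to LexemeKind or TriviaKind makes these functions fail to compile until someone picks a color. The `Color` enum, the function names, and the variant-to-color mapping are hypothetical stand-ins for colored_printer's real logic.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind { Identifier, LiteralNumber, Fn, Plus }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TriviaKind { Whitespace, Newline, SingleLineComment, Skipped }

/// Hypothetical palette.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color { Plain, Keyword, Literal, Operator, Comment, Dimmed }

// No `_ =>` arm: a new LexemeKind variant is a compile error here,
// not a runtime panic.
pub fn lexeme_color(kind: LexemeKind) -> Color {
    match kind {
        LexemeKind::Identifier => Color::Plain,
        LexemeKind::LiteralNumber => Color::Literal,
        LexemeKind::Fn => Color::Keyword,
        LexemeKind::Plus => Color::Operator,
    }
}

// Same guarantee for trivia.
pub fn trivia_color(kind: TriviaKind) -> Color {
    match kind {
        TriviaKind::Whitespace | TriviaKind::Newline => Color::Plain,
        TriviaKind::SingleLineComment => Color::Comment,
        TriviaKind::Skipped => Color::Dimmed,
    }
}
```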

What was the behavior or documentation before?

  • SyntaxKind was a flat enum with parallel Token* and Terminal* variants generated from the spec, with nothing tying the two lists together.
  • colored_printer::set_color panicked (Unexpected syntax kind: ...) on token kinds missing from its match.
  • The "missing token" diagnostic rendered the internal kind name, e.g. Missing token TerminalIdentifier.

What is the behavior or documentation after?

  • SyntaxKind groups token/terminal/missing kinds into LexemeKind (shared by Token and Terminal), TriviaKind, and MissingKind; other node kinds stay flat. The token/terminal duplication is gone.
  • set_color matches exhaustively; no catch-all panic.
  • The only user-visible change: that diagnostic now reads Missing token Identifier. (the LexemeKind name), which is friendlier than the old TerminalIdentifier.
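A sketch of why the wording changed: once the diagnostic formats a LexemeKind rather than a SyntaxKind::Terminal, the rendered name no longer carries the Terminal prefix. `kind_to_string` is named after the function the commit message mentions, but its signature here, and the `missing_token_message` helper, are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind { Identifier, Semicolon }

/// Hypothetical stand-in for kind_to_string: renders the lexeme's own name,
/// so there is no Token/Terminal prefix to print.
pub fn kind_to_string(kind: LexemeKind) -> String {
    format!("{kind:?}")
}

/// Hypothetical builder for the "missing token" diagnostic text.
pub fn missing_token_message(kind: LexemeKind) -> String {
    format!("Missing token {}.", kind_to_string(kind))
}
```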

Everything else (Debug/Display strings, all other diagnostics, golden test output) is unchanged.

Related issue or discussion (if any)

None.

Additional context

  • All three generated files (kind.rs, ast.rs, key_fields.rs) and the ~1000 hand-written call sites were migrated; the kind enums are emitted by the codegen, so re-running cargo run --bin generate-syntax reproduces them.
  • Verified: cargo build --workspace --all-targets clean, clippy -D warnings clean, rust_fmt.sh clean, and the parser / syntax / formatter / doc / semantic test suites pass (two parser golden files regenerated for the diagnostic-wording change).
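The single-classifier idea can be pictured as one function that routes every spec entry to exactly one group, so TokenX and TerminalX collapse into a single LexemeKind variant. This sketch assumes classification by the historical flat names purely for illustration; the real codegen may classify on spec metadata instead, and `KindGroup`, `classify`, `lexeme_variants`, and the trivia list are hypothetical.

```rust
/// Where a spec entry lands in the grouped kind enums.
#[derive(Debug, PartialEq, Eq)]
pub enum KindGroup<'a> {
    Lexeme(&'a str),
    Trivia(&'a str),
    Missing(&'a str),
    Node(&'a str),
}

/// Hypothetical set of trivia token names.
const TRIVIA: &[&str] = &["Whitespace", "Newline", "SingleLineComment", "Skipped"];

/// Routes one flat spec name to its group.
pub fn classify(name: &str) -> KindGroup<'_> {
    if let Some(rest) = name.strip_prefix("Token") {
        if TRIVIA.contains(&rest) { KindGroup::Trivia(rest) } else { KindGroup::Lexeme(rest) }
    } else if let Some(rest) = name.strip_prefix("Terminal") {
        KindGroup::Lexeme(rest)
    } else if let Some(rest) = name.strip_suffix("Missing") {
        KindGroup::Missing(rest)
    } else {
        KindGroup::Node(name)
    }
}

/// Collects unique LexemeKind variants; a Token/Terminal pair yields one entry.
pub fn lexeme_variants<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = Vec::new();
    for &name in names {
        if let KindGroup::Lexeme(v) = classify(name) {
            if !out.contains(&v) {
                out.push(v);
            }
        }
    }
    out
}
```

Since both lists are derived from the same routing function, a lexeme cannot exist on the token side without also existing on the terminal side.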

@reviewable-StarkWare


This change is Reviewable

orizi commented Jun 23, 2026

Collaborator Author

This stack of pull requests is managed by Graphite.

TomerStarkware (Collaborator) left a comment

@TomerStarkware reviewed 23 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on eytan-starkware and orizi).


crates/cairo-lang-syntax/src/node/kind.rs line 383 at r1 (raw file):

    }
    pub fn is_keyword_token(&self) -> bool {
        matches!(

if let SyntaxKind::Token(lexeme_kind) = self && lexeme_kind.is_keyword_token() {
    true
} else {
    false
}


crates/cairo-lang-syntax/src/node/kind.rs line 420 at r1 (raw file):

    }
    pub fn is_keyword_terminal(&self) -> bool {
        matches!(

if let SyntaxKind::Terminal(lexeme_kind) = self && lexeme_kind.is_keyword_token() {
    true
} else {
    false
}
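For comparison, a sketch of the same two predicates written with `matches!` and a match guard. It delegates to the LexemeKind predicate without enumerating keyword variants and, unlike the let-chain form above, does not require the 2024 edition. The variants and the keyword set are illustrative.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LexemeKind { Identifier, Fn, Let }

impl LexemeKind {
    // Hypothetical keyword set; the real one comes from the generated spec.
    pub fn is_keyword_token(&self) -> bool {
        matches!(self, LexemeKind::Fn | LexemeKind::Let)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind { Token(LexemeKind), Terminal(LexemeKind), ExprBinary }

impl SyntaxKind {
    // Only Token wrappers count; the guard defers to LexemeKind.
    pub fn is_keyword_token(&self) -> bool {
        matches!(self, SyntaxKind::Token(kind) if kind.is_keyword_token())
    }

    // Only Terminal wrappers count; same lexeme-level predicate.
    pub fn is_keyword_terminal(&self) -> bool {
        matches!(self, SyntaxKind::Terminal(kind) if kind.is_keyword_token())
    }
}
```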

orizi force-pushed the orizi/06-23-refactor_syntax_group_syntaxkind_and_unify_token_terminal_kinds_under_lexemekind branch from f20c3f9 to 2c76ca2 on June 28, 2026 12:29

orizi (Collaborator, Author) left a comment

@orizi made 2 comments.
Reviewable status: 21 of 23 files reviewed, 2 unresolved discussions (waiting on eytan-starkware and TomerStarkware).


crates/cairo-lang-syntax/src/node/kind.rs line 383 at r1 (raw file):

Previously, TomerStarkware wrote…

if let SyntaxKind::Token(lexeme_kind) = self && lexeme_kind.is_keyword_token() {
    true
} else {
    false
}

Done.


crates/cairo-lang-syntax/src/node/kind.rs line 420 at r1 (raw file):

Previously, TomerStarkware wrote…

if let SyntaxKind::Terminal(lexeme_kind) = self && lexeme_kind.is_keyword_token() {
    true
} else {
    false
}

Done.

refactor(syntax): Group SyntaxKind and unify token/terminal kinds under LexemeKind.

SyntaxKind's flat token/terminal/missing variants are grouped into nested enums. A terminal node and the token that backs it share one lexical identity, so both reuse a single `LexemeKind` instead of duplicated `TokenKind`/`TerminalKind` lists; trivia tokens (whitespace, newlines, comments, skipped) move to `TriviaKind`, and enum missing-variants to `MissingKind`:

    enum SyntaxKind {
        Token(LexemeKind),
        TriviaToken(TriviaKind),
        Terminal(LexemeKind),
        Missing(MissingKind),
        // ...other node kinds stay flat
    }

The parser/lexer layer now operates on `LexemeKind` directly (LexerTerminal, peek().kind, Terminal::KIND, MissingToken, operator-precedence and skip predicates), reserving SyntaxKind for green-tree node kinds. Debug/Display preserve the historical flat names (TerminalIdentifier, TokenWhitespace, ExprMissing) so diagnostics and golden tests are unaffected by the grouping; the lone visible change is the friendlier "Missing token Identifier." (was "TerminalIdentifier"), since kind_to_string now renders a LexemeKind.

colored_printer::set_color matches exhaustively over LexemeKind/TriviaKind, removing its panicking catch-all. All kind enums are generated from the spec via a single classifier, so the kind lists can't drift apart.
orizi force-pushed the orizi/06-23-refactor_syntax_group_syntaxkind_and_unify_token_terminal_kinds_under_lexemekind branch from 2c76ca2 to 4df3abb on June 29, 2026 13:19