Skip to content

(opt): keep syntax node ids stable across reparses#10105

Merged
eytan-starkware merged 1 commit into
mainfrom
eytan_graphite/syntax-node-canonical-source
Jun 28, 2026
Merged

(opt): keep syntax node ids stable across reparses#10105
eytan-starkware merged 1 commit into
mainfrom
eytan_graphite/syntax-node-canonical-source

Conversation

@eytan-starkware

@eytan-starkware eytan-starkware commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Introduces a distinction between canonical and detached syntax tree roots to make stable pointer node IDs stable across file edits.

Previously, all root nodes were created via SyntaxNode::new_root / SyntaxNode::new_root_with_offset, both backed by a green-keyed tracked function (new_root_node). Because the key included the green tree, any content change minted fresh node IDs, preventing salsa early cutoff from firing downstream.

Now there are two explicit constructors:

  • SyntaxNode::new_canonical_root — called exclusively from the file_syntax_data parse query. The root's tracked struct is seeded by that file-keyed query, so reparsing changed content reuses the same node IDs. This is the only place a canonical root is created; all consumers must reach it via db.file_syntax(file_id).
  • SyntaxNode::new_detached_root / new_detached_root_with_offset — backed by the green-keyed new_detached_root_node query, for transient trees (proc-macro token streams, snippet formatting, tests) that have no per-file parse query to seed from.

The defs cache is updated to match: on-disk file roots are reconstructed at load time by re-running db.file_syntax (so cached stable ptrs land on canonical nodes), while virtual/plugin-generated file roots store their green tree and are rebuilt as detached roots.


Type of change

Please check one:

  • Bug fix (fixes incorrect behavior)
  • New feature
  • Performance improvement
  • Documentation change with concrete technical impact
  • Style, wording, formatting, or typo-only change

Why is this change needed?

Stable pointers and IDs derived from them are used throughout the compiler for incremental computation. When a file was edited, the green-keyed root constructor minted new node IDs even for unchanged subtrees, defeating salsa early cutoff and causing unnecessary recomputation. By seeding the canonical root from a file-keyed query, node IDs are reused across edits, restoring early cutoff behavior.


What was the behavior or documentation before?

All root nodes were created through a single green-keyed tracked function. Every reparse of a changed file produced new node IDs for the root and all its descendants, even if the content was structurally identical.


What is the behavior or documentation after?

The canonical root for each file is created once inside file_syntax_data via SyntaxNode::new_canonical_root. Reparsing a changed file reuses the same root node ID (salsa reruns the query but the tracked struct identity is preserved), allowing downstream queries that depend on stable ptrs to benefit from early cutoff. Detached roots (transient trees, macro helpers, tests) continue to use the green-keyed path and carry no stability guarantee.


Related issue or discussion (if any)


Additional context

The db_test comparison for the parser test was updated to compare green trees rather than node identity, since the parsed root (canonical) and the hand-built root (detached) are now distinct nodes even when their content is identical.

eytan-starkware commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@reviewable-StarkWare

Copy link
Copy Markdown

This change is Reviewable

@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from b65f1d0 to 025c2b9 Compare June 16, 2026 11:49
@eytan-starkware eytan-starkware marked this pull request as ready for review June 16, 2026 11:54
@cursor

cursor Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Changes core syntax node identity and the defs-cache deserialization path; mistakes could break incremental recomputation or cache load correctness across the compiler.

Overview
Splits syntax tree root construction into canonical roots (stable node ids across file edits) and detached roots (standalone / green-keyed trees).

Canonical roots are created only in the file_syntax_data parse query via SyntaxNode::new_canonical_root, so reparsing after a content change reuses the same root node id and downstream salsa early cutoff can fire on stable pointers. Callers are expected to use db.file_syntax(file_id) instead of minting their own root.

Detached new_detached_root / new_detached_root_with_offset replace the old new_root APIs for proc-macro snippets, formatter token-tree parsing, tests, and similar transient parses. The parser adds parse_*_green helpers; file_syntax_data builds the canonical tree from green there.

The defs cache no longer serializes green trees for file roots: on load, roots are restored through db.file_syntax so cached stable ptrs align with canonical nodes (including plugin-generated external files). Lowering cache tests add cached_external_file_root_is_canonical and refactor cache setup through generate_cached_db.

Reviewed by Cursor Bugbot for commit b44c9c9. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread crates/cairo-lang-defs/src/cache/mod.rs Outdated

@orizi orizi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orizi reviewed 5 files and all commit messages, and made 1 comment.
Reviewable status: 5 of 8 files reviewed, 2 unresolved discussions (waiting on eytan-starkware and TomerStarkware).


crates/cairo-lang-parser/src/parser.rs line 206 at r2 (raw file):

    /// Parses a file expr, returning the green root. See [Self::parse_file_green].
    pub fn parse_file_expr_green(

should some of the non _green fns be removed?

@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from 025c2b9 to e2bb34f Compare June 17, 2026 06:48

@eytan-starkware eytan-starkware left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eytan-starkware made 1 comment.
Reviewable status: 5 of 9 files reviewed, 2 unresolved discussions (waiting on orizi and TomerStarkware).


crates/cairo-lang-parser/src/parser.rs line 206 at r2 (raw file):

Previously, orizi wrote…

should some of the non _green fns be removed?

Yes. Done. Also changed a util to rely on the query so it wont use a detached root.

@orizi orizi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orizi reviewed 3 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: 8 of 9 files reviewed, 2 unresolved discussions (waiting on eytan-starkware and TomerStarkware).


crates/cairo-lang-defs/src/cache/mod.rs line 1645 at r3 (raw file):

    /// A file whose content is not re-derivable at load time (virtual / plugin-generated): its
    /// green is stored and the root is rebuilt detached, standing on its own.
    // TODO: give plugin-generated code a canonical source too, so this variant can be removed and

should be solved or have an assignee.

@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from e2bb34f to 061e908 Compare June 17, 2026 08:30

@eytan-starkware eytan-starkware left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eytan-starkware made 1 comment.
Reviewable status: 8 of 9 files reviewed, 2 unresolved discussions (waiting on orizi and TomerStarkware).


crates/cairo-lang-defs/src/cache/mod.rs line 1645 at r3 (raw file):

Previously, orizi wrote…

should be solved or have an assignee.

Done.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 061e908. Configure here.

Comment thread crates/cairo-lang-defs/src/cache/mod.rs Outdated

@orizi orizi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orizi reviewed all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: 8 of 9 files reviewed, 2 unresolved discussions (waiting on eytan-starkware and TomerStarkware).


crates/cairo-lang-defs/src/cache/mod.rs at r4 (raw file):
awaiting updates discussed f2f.

@eytan-starkware eytan-starkware changed the base branch from main to graphite-base/10105 June 24, 2026 12:02
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from 061e908 to b2a1d57 Compare June 24, 2026 12:02
@eytan-starkware eytan-starkware changed the base branch from graphite-base/10105 to eytan_graphite/support-external-file-cache June 24, 2026 12:02

@orizi orizi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orizi reviewed 6 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on eytan-starkware and TomerStarkware).

@eytan-starkware eytan-starkware force-pushed the eytan_graphite/support-external-file-cache branch from 1334cde to 9cb7a3d Compare June 25, 2026 10:48
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from b2a1d57 to 783ecae Compare June 25, 2026 10:48
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from 783ecae to 34274a7 Compare June 25, 2026 11:09
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/support-external-file-cache branch from 9cb7a3d to 329a5a8 Compare June 25, 2026 11:09
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from 34274a7 to d40ef8c Compare June 25, 2026 11:35
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/support-external-file-cache branch from 329a5a8 to 3f5bf0b Compare June 25, 2026 11:35

@eytan-starkware eytan-starkware left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eytan-starkware made 1 comment.
Reviewable status: 9 of 12 files reviewed, 1 unresolved discussion (waiting on orizi and TomerStarkware).


crates/cairo-lang-defs/src/cache/mod.rs at r4 (raw file):

Previously, orizi wrote…

awaiting updates discussed f2f.

Done.

@eytan-starkware eytan-starkware force-pushed the eytan_graphite/support-external-file-cache branch from 3f5bf0b to c9b0603 Compare June 28, 2026 10:27
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch 2 times, most recently from 19ba634 to b16c4fc Compare June 28, 2026 10:56
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/support-external-file-cache branch from c9b0603 to 3ce0717 Compare June 28, 2026 10:56
@graphite-app graphite-app Bot changed the base branch from eytan_graphite/support-external-file-cache to graphite-base/10105 June 28, 2026 11:19
Seed the canonical per-file root from the file-keyed parse query (file_syntax) instead of a
green-keyed detached root, so reparsing a changed file reuses the previous node ids — early
cutoff downstream relies on stable stable-ptrs. Split root construction into a canonical
constructor (file_syntax only) and a detached one (for standalone/transient trees).

The defs cache now records each root by file id alone and re-derives the tree via file_syntax
on load, unifying cached stable ptrs with the live tree. External (plugin-generated) files
re-derive the same way; their content is served from the cache blob (see the external-file
content cache below this in the stack), so re-deriving never cycles back into module data.
@eytan-starkware eytan-starkware force-pushed the eytan_graphite/syntax-node-canonical-source branch from b16c4fc to b44c9c9 Compare June 28, 2026 13:57
@eytan-starkware eytan-starkware changed the base branch from graphite-base/10105 to main June 28, 2026 13:57

@orizi orizi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@orizi reviewed 7 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on TomerStarkware).

@eytan-starkware eytan-starkware added this pull request to the merge queue Jun 28, 2026
Merged via the queue into main with commit 55eae8a Jun 28, 2026
106 checks passed
@eytan-starkware eytan-starkware deleted the eytan_graphite/syntax-node-canonical-source branch June 28, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants