Skip to content

feat: hand written parser#40717

Draft
amitkumarashutosh wants to merge 21 commits into
RocketChat:developfrom
amitkumarashutosh:feat/hand-written-parser
Draft

feat: hand written parser#40717
amitkumarashutosh wants to merge 21 commits into
RocketChat:developfrom
amitkumarashutosh:feat/hand-written-parser

Conversation

@amitkumarashutosh

@amitkumarashutosh amitkumarashutosh commented May 28, 2026

Copy link
Copy Markdown
Contributor

Handwritten Parser Foundation

This PR introduces a handwritten TypeScript parser implementation to replace the
PeggyJS-generated parser, building it up incrementally while maintaining AST
compatibility with the existing grammar.

What's Added

  • parser.ts — core handwritten parser with block dispatch and inline parsing loop
  • scanner.ts — cursor abstraction with position() / backtrack() for backtracking
  • chars.ts — character classification helpers (isAlpha, isDigit, isPlainChar, etc.) and the emoticon table
  • Reused existing AST utility helpers from utils.ts for node generation, and types from definitions.ts
  • Preserved the existing public API and options structure in index.ts

Parsing Features Implemented

  • Plain paragraph parsing — wraps each line in a paragraph node
  • Line break detection — blank lines emit lineBreak nodes; trailing newlines are swallowed
  • Escape sequences — \*, \_, \~, \`, \#, \.
  • Inline code — code with no markup parsing inside
  • Bold — *text* and **text** with whitespace-only and triple-asterisk edge cases
  • Italic — _text_ and __text__ with word-boundary guards to protect snake_case
  • Strikethrough — ~text~ and ~~text~~ with whitespace-only and triple-tilde edge cases
  • Re-entrancy guards (skipBold, skipItalic, skipStrike) mirroring PeggyJS skip flags
  • Heading — # through #### prefix with required space, inline content inside
  • Code fence — ``` with optional language tag, raw content (no markup inside)
  • User mention — @username supporting special characters like . - _ : @ in names
  • Channel mention — #channel with word-boundary guard
  • Blockquote — > prefixed lines with inline markup support inside
  • Inline and block spoiler — ||text|| inline and multi-line ||\n...\n|| block support
  • Markdown and angle bracket link — [title](url) and <url|title>
  • Unordered and ordered list — -/* and 1. markers with inline content per item; asterisk bullets respect the grammar's trailing-* guard (so * *, * Hello* stay inline)
  • KaTeX — block and inline parsing under the dollar-sign and parenthesis syntax options
  • Auto-link URL — bare URL / domain detection, TLD-validated via tldts
  • Emoji shortcode — :emoji: parsing, plus big-emoji handling for shortcode-only messages
  • Email autolinking — local@domainmailto: link, with TLD validation and mailto: prefix parts
  • Phone autolinking — +numbertel: link via phoneChecker, with prefix/grouping variants
  • Timestamp — <t:...> supporting Unixtime, ISO-8601 (with/without milliseconds and timezone), relative HH:MM[:SS] times, and the t/T/d/D/f/F/R format specifiers
  • Emoticon — :), :D, etc. via a symbol→shortcode table with longest-match and word-boundary dispatch, integrated into big-emoji detection

Test Progress

  • Passing Tests: 573/634

Goal

  • This PR lays the groundwork for the High-Performance Message Parser Rewrite project.
  • Future commits will continue implementing parsing behavior while maintaining AST compatibility with the existing PeggyJS parser and working toward full test-suite parity.

Notes

  • No user-facing behavior changes are intended yet.
  • This is an infrastructure/foundation step toward a complete handwritten parser implementation.

@dionisio-bot

dionisio-bot Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@changeset-bot

changeset-bot Bot commented May 28, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: bfb7b12

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ae37847e-ff8c-4d1c-925e-fb0c2b15c439

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@ahmed-n-abdeltwab ahmed-n-abdeltwab Jun 8, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @amitkumarashutosh , good work getting this done. The class is clean and readable overall. I do have a few thoughts on the OOP design. The core issue is that methods like advance(), save(), restore(), and pos being public are exposing the internal mechanics of how the scanner works, rather than what the scanner does from the caller's perspective. This breaks encapsulation — callers shouldn't need to know or care how the scanner moves through the input internally. Let me use an example. Imagine you order a car from Factory A. Its public interface is start(), drive(), brake(), and getSpeed(). Now Factory B delivers a car with those same methods plus increaseSpeed(), decreaseSpeed(), and setRPM(). Most people would prefer Factory A's car — not because Factory B's car is broken, but because Factory A made a decision about what the driver should and shouldn't need to think about. The engine is managing speed under the hood, the driver just presses the pedal. Factory B is forcing the driver to manage internals that should be invisible, which makes the experience more complex and error-prone. Expose too much, and the driver causes a crash not because they were reckless, but because the interface invited the mistake. The same applies here — the Scanner class is our car, and the caller is our driver. advance(n) is a raw cursor movement that exposes internal position management directly to the caller. Instead, the scanner should move itself forward as a side effect of higher-level operations. Methods like consume(), consumeWhile(predicate), or expect(literal) would advance internally and return meaningful results — the caller should be asking what to consume, not how many characters to move. save() and restore() expose the backtracking mechanism as a manual two-step process. This is fragile because a caller can save and forget to restore. A pattern like tryConsume(literal) or a scoped attempt(() => ...) callback would hide the save/restore cycle entirely and make misuse much harder. pos being public lets anyone outside the class read and reason about the raw cursor index. Position is an implementation detail. If the caller needs location info for something like error reporting, a dedicated method like getLocation() returning something meaningful would serve that need without leaking the internals. sliceFrom(savedPos) requires the caller to hold a raw position integer and pass it back in, which ties the caller directly to internal state. A mark() method returning an opaque token, paired with getSliceSinceMarked(token), would achieve the same result without exposing the position type at all.

A well-designed class should make the right usage easy and the wrong usage hard. Right now the public API of Scanner requires callers to know a lot about how the internals work in order to use it correctly. The goal is to push that knowledge inward — hide the cursor, hide the backtracking, and expose only the operations that make sense at the scanning level.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

Comment thread packages/message-parser/src/parser.ts
Comment thread packages/message-parser/src/parser.ts Outdated
Comment on lines +1525 to +1527
if (scanner.matches('mailto:')) {
scanner.advance(7);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a magic number slipped through here! Could you please fix this one and double-check the rest of the code for any others?

@amitkumarashutosh amitkumarashutosh force-pushed the feat/hand-written-parser branch from 37e61da to c673fba Compare June 15, 2026 09:42
@ahmed-n-abdeltwab

Copy link
Copy Markdown
Contributor

Hey @amitkumarashutosh , I'm getting a build failure in message-parser due to an unused variable (peekPos). Have you run into this error? or it's just me

@amitkumarashutosh amitkumarashutosh force-pushed the feat/hand-written-parser branch from c673fba to 65fd782 Compare June 15, 2026 11:40
@amitkumarashutosh

Copy link
Copy Markdown
Contributor Author

Hey @amitkumarashutosh , I'm getting a build failure in message-parser due to an unused variable (peekPos). Have you run into this error? or it's just me

@ahmed-n-abdeltwab, I resolved the unused variable (peekPos) issue. The build is working now.

@ahmed-n-abdeltwab

Copy link
Copy Markdown
Contributor

Hey @amitkumarashutosh , I'm getting a build failure in message-parser due to an unused variable (peekPos). Have you run into this error? or it's just me

@ahmed-n-abdeltwab, I resolved the unused variable (peekPos) issue. The build is working now.

Thanks. It's always a good idea to double-check everything before pushing a commit. It's okay to make these kinds of mistakes; I do it all the time and they're hard to catch. That's why I usually run the build to ensure everything compiles properly, and then run the tests and lint checks to make sure I don't mess up the code or the project. It's like evolving a third eye that's looking out for those mistakes! :)

@amitkumarashutosh amitkumarashutosh force-pushed the feat/hand-written-parser branch from ac46e14 to c67061d Compare June 18, 2026 02:28
Comment on lines +744 to +746
if (scanner.matches('mailto:')) {
scanner.consume(7);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a magic number

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants