Skip to content

fix: avoid quadratic re-scan of comments after a value#1689

Merged
baylesj merged 2 commits into
open-source-parsers:masterfrom
hjanuschka:fix-quadratic-comment-rescan
Jun 14, 2026
Merged

fix: avoid quadratic re-scan of comments after a value#1689
baylesj merged 2 commits into
open-source-parsers:masterfrom
hjanuschka:fix-quadratic-comment-rescan

Conversation

@hjanuschka

@hjanuschka hjanuschka commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

OurReader::readComment() decides whether a comment attaches to the previous value (commentAfterOnSameLine) by scanning from the end of that value to the comment via containsNewLine(lastValueEnd_, commentBegin). lastValueEnd_ only advances when a new value is read, so a long run of comments after a value (error recovery, or a value followed by many comments) re-scans the same growing prefix for every comment -- O(n^2).

A comment can only be on the same line as the last value when no newline separates them, and that gap only grows as more comments are consumed, so it never needs re-examining after the first comment. Setting lastValueHasAComment_ after the first comment makes the rest skip the scan. Output is unchanged.

Found via a jsoncpp_fuzzer timeout (Chromium issue 521541633): a 400KB input scanned 2.24 GB across 8384 containsNewLine calls and took ~18s. After the fix: 122 KB / 1 call, ~56ms.

Adds CharReaderTest/parseCommentsAfterValueNotQuadratic (a value followed by many trailing comments, bounded parse time): fails multi-second without the fix, passes in ms with it. Existing tests unaffected.

hjanuschka and others added 2 commits June 13, 2026 14:30
OurReader::readComment() decides whether a comment should be attached to
the previous value (commentAfterOnSameLine) by scanning the input from the
end of that value up to the comment with containsNewLine(). lastValueEnd_
only advances when a new value is read, so a long run of comments after a
value (e.g. during error recovery, or a value followed by many comments)
made every comment re-scan the same growing prefix, giving O(n^2) parse
time. A jsoncpp_fuzzer testcase took ~18s for a 400KB input.

A comment can only ever be on the same line as the last value if no
newline separates them, and the gap to inspect only grows as further
comments are consumed, so once the gap has been examined for the first
comment it never needs to be examined again. Mark lastValueHasAComment_
after the first comment following a value so subsequent comments skip the
scan. Parsing the testcase drops from ~18s to ~56ms with identical output.

Add a regression test that parses a value followed by a large number of
trailing comments and requires it to complete well under a generous time
bound.
Replace the wall-clock bound in the comment regression test with a
direct, deterministic assertion on work done. The parse output is
identical with and without the fix, so the only observable difference is
how much the parser scans; a time bound is also flaky under
valgrind/sanitizers/loaded CI.

Add an instrumentation counter for the bytes examined by
OurReader::containsNewLine, exposed via a JSON_API seam, and assert it
stays linear in the input (scanned < 4 * doc.size()) rather than
O(comments * gap). The counter is thread_local (no race during
concurrent parsing) and the increment is negligible, running only while
parsing comments. It is compiled unconditionally because the ABI
compatibility job builds the test suite against a separately-installed
Release library, so the symbol must exist there. Rename the test to
parseCommentsAfterValueScansLinearly to describe what it checks, and
link crbug.com/521541633.

Verified: the test fails when the fix is reverted and passes with it, in
Debug and Release, and the seam links against a Release-installed shared
library (the ABI compatibility scenario).
@baylesj

baylesj commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

The production changes are good, but I'm reluctant to land a non-deterministic timer based test. Updating the test format to be based on execution count instead of a potentially flaky clock timer, and then landing this change.

Thank you for your CL!

@baylesj baylesj force-pushed the fix-quadratic-comment-rescan branch from 1eeb764 to d0c03d1 Compare June 14, 2026 03:11
@baylesj baylesj merged commit 8519b83 into open-source-parsers:master Jun 14, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants