Skip to content

perf: Make xml_add_child() append in constant time by default#479

Merged
jeroen merged 1 commit into
r-lib:mainfrom
krlmlr:claude/admiring-keller-nkmv37
Jun 22, 2026
Merged

perf: Make xml_add_child() append in constant time by default#479
jeroen merged 1 commit into
r-lib:mainfrom
krlmlr:claude/admiring-keller-nkmv37

Conversation

@krlmlr

@krlmlr krlmlr commented Jun 20, 2026

Copy link
Copy Markdown
Member

@jeroen: This is the cause for RSQLite with covr on GitHub Actions taking over 5 hours. Example run: https://github.com/r-dbi/RSQLite/actions/runs/27809553863

Cost driver: covr builds an XML tree by appending children. We can fix at the covr end, a fix here is likely to have a much more widespread impact. I can share a more detailed analysis and benchmarks if necessary.

Two independent Claude sessions came up with basically the same solution. The .where = NULL default is mine, Claude proposed .where = Inf . Reviewed manually. The xml_length() change only helps when a specific insertion point is passed. I'm not really sure about the logic of the "document" and "nodeset" methods, happy to add more tests.

The default .where enumerated all existing children on every call, making
repeated appends to the same node O(n^2). Default to .where = NULL (append
after the last child) and short-circuit to libxml2's O(1) append without
counting; for an explicit finite .where, count with xml_length() (in C)
rather than materialising the children with length(xml_children()).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SieDyiSrnYqRKyafhGHWuw
@krlmlr

krlmlr commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

With the PR, the covr run shrinks to 6 minutes total, 2.5 minutes for covr: https://github.com/r-dbi/RSQLite/actions/runs/27885610548/job/82520936461?pr=479

@jeroen

jeroen commented Jun 21, 2026

Copy link
Copy Markdown
Member

@copilot does libxml2 have an efficient way to get the number of children from a node, rather than length(xml_children(.x))?

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes xml_add_child()’s default insertion behavior so that repeated appends avoid an O(n) child enumeration on every call, making XML tree construction significantly faster in common “build up a node one child at a time” workloads (e.g., covr output generation).

Changes:

  • Change xml_add_child()’s default .where from length(xml_children(.x)) to NULL and use a fast append path when .where is NULL (or Inf).
  • Optimize explicit “append past the end” cases by using xml_length(.x) (C-backed) instead of materializing xml_children(.x) in R.
  • Add a regression test for ordered appends and update documentation/NEWS to reflect the new default.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File Description
R/xml_modify.R Implements the new .where = NULL default and fast append logic; adjusts nodeset forwarding behavior.
tests/testthat/test-xml_modify.R Adds coverage ensuring default appends preserve insertion order and .where past end appends.
NEWS.md Notes the performance improvement and the new default semantics.
man/xml_replace.Rd Updates the usage and .where parameter documentation to match the new default.
Files not reviewed (1)
  • man/xml_replace.Rd: Generated file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/xml_modify.R
Comment on lines 204 to +207
node <- create_node(.value, .parent = .x, .copy = .copy, ...)

if (.where == 0L) {
if (is.null(.where) || is.infinite(.where)) {
# Appending at the end is the common case, e.g. when building a document one
@krlmlr

krlmlr commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

This PR uses xml_length() . Claude claims there is no such way in libxml2, only head and tail can be accessed efficiently. Haven't verified that.

In a recent R:

FALSE || c(TRUE, FALSE, TRUE)
#> Error in `FALSE || c(TRUE, FALSE, TRUE)`:
#> ! 'length = 3' in coercion to 'logical(1)'

Created on 2026-06-21 with reprex v2.1.1

I suspect we'd need to establish argument checks at a large scale if we wanted to.

Should we run a round of revdepchecks just for this PR, or only before the release?

@jeroen

jeroen commented Jun 22, 2026

Copy link
Copy Markdown
Member

I guess internally libxml2 keeps the xml tree as a series of double-linked lists, so it has to iterate through the entire structure to find the length. I will merge it and trigger a revdep-check to see if there are any surprises.

@jeroen jeroen merged commit 07127e4 into r-lib:main Jun 22, 2026
23 checks passed
@krlmlr krlmlr deleted the claude/admiring-keller-nkmv37 branch June 22, 2026 08:30
@jeroen

jeroen commented Jun 22, 2026

Copy link
Copy Markdown
Member

I did a revdepcheck and there are no new issues with this change. 👍

@jeroen

jeroen commented Jun 22, 2026

Copy link
Copy Markdown
Member

This is now on CRAN on version 1.6.0.

krlmlr added a commit to r-dbi/RSQLite that referenced this pull request Jun 23, 2026
krlmlr added a commit to krlmlr/actions-sync that referenced this pull request Jun 23, 2026
krlmlr added a commit to krlmlr/actions-sync that referenced this pull request Jun 23, 2026
@krlmlr

krlmlr commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

Thanks, great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants