perf: Make xml_add_child() append in constant time by default#479
Conversation
The default .where enumerated all existing children on every call, making repeated appends to the same node O(n^2). Default to .where = NULL (append after the last child) and short-circuit to libxml2's O(1) append without counting; for an explicit finite .where, count with xml_length() (in C) rather than materialising the children with length(xml_children()). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SieDyiSrnYqRKyafhGHWuw
|
With the PR, the covr run shrinks to 6 minutes total, 2.5 minutes for covr: https://github.com/r-dbi/RSQLite/actions/runs/27885610548/job/82520936461?pr=479 |
|
@copilot does libxml2 have an efficient way to get the number of children from a node, rather than |
There was a problem hiding this comment.
Pull request overview
This PR changes xml_add_child()’s default insertion behavior so that repeated appends avoid an O(n) child enumeration on every call, making XML tree construction significantly faster in common “build up a node one child at a time” workloads (e.g., covr output generation).
Changes:
- Change
xml_add_child()’s default.wherefromlength(xml_children(.x))toNULLand use a fast append path when.whereisNULL(orInf). - Optimize explicit “append past the end” cases by using
xml_length(.x)(C-backed) instead of materializingxml_children(.x)in R. - Add a regression test for ordered appends and update documentation/NEWS to reflect the new default.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
R/xml_modify.R |
Implements the new .where = NULL default and fast append logic; adjusts nodeset forwarding behavior. |
tests/testthat/test-xml_modify.R |
Adds coverage ensuring default appends preserve insertion order and .where past end appends. |
NEWS.md |
Notes the performance improvement and the new default semantics. |
man/xml_replace.Rd |
Updates the usage and .where parameter documentation to match the new default. |
Files not reviewed (1)
- man/xml_replace.Rd: Generated file
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| node <- create_node(.value, .parent = .x, .copy = .copy, ...) | ||
|
|
||
| if (.where == 0L) { | ||
| if (is.null(.where) || is.infinite(.where)) { | ||
| # Appending at the end is the common case, e.g. when building a document one |
|
This PR uses In a recent R: FALSE || c(TRUE, FALSE, TRUE)
#> Error in `FALSE || c(TRUE, FALSE, TRUE)`:
#> ! 'length = 3' in coercion to 'logical(1)'Created on 2026-06-21 with reprex v2.1.1 I suspect we'd need to establish argument checks at a large scale if we wanted to. Should we run a round of revdepchecks just for this PR, or only before the release? |
|
I guess internally libxml2 keeps the xml tree as a series of double-linked lists, so it has to iterate through the entire structure to find the length. I will merge it and trigger a revdep-check to see if there are any surprises. |
|
I did a revdepcheck and there are no new issues with this change. 👍 |
|
This is now on CRAN on version 1.6.0. |
…(#757)" This reverts commit 0742769.
…(#757)" This reverts commit 0742769.
|
Thanks, great! |
@jeroen: This is the cause for RSQLite with covr on GitHub Actions taking over 5 hours. Example run: https://github.com/r-dbi/RSQLite/actions/runs/27809553863
Cost driver: covr builds an XML tree by appending children. We can fix at the covr end, a fix here is likely to have a much more widespread impact. I can share a more detailed analysis and benchmarks if necessary.
Two independent Claude sessions came up with basically the same solution. The
.where = NULLdefault is mine, Claude proposed.where = Inf. Reviewed manually. Thexml_length()change only helps when a specific insertion point is passed. I'm not really sure about the logic of the "document" and "nodeset" methods, happy to add more tests.