diff --git a/_freeze/posts/2025-10-25-tidySummarizedExperiment-optimization/index/execute-results/html.json b/_freeze/posts/2025-10-25-tidySummarizedExperiment-optimization/index/execute-results/html.json
index 57a33a3b..1c92e125 100644
--- a/_freeze/posts/2025-10-25-tidySummarizedExperiment-optimization/index/execute-results/html.json
+++ b/_freeze/posts/2025-10-25-tidySummarizedExperiment-optimization/index/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "7be71d376f53b732577c0b82cd833d8f",
+  "hash": "01ebaaa9fba5f733bbae5ac1ff9424dc",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend\"\nauthor: \"Stefano Mangiola\"\ncontributors:\n  - Stefano Mangiola\n  - Michael Love\n  - Justin Landis\n  - Pierre-Paul Axisa\ndate: \"2025-10-25\"\npackage: tidySummarizedExperiment\ntags:\n  - tidyomics/tidyomicsBlog\n  - optimization\n  - performance\n  - plyxp\n  - SummarizedExperiment\n  - benchmarking\ndescription: \"Performance optimisation of tidySummarizedExperiment and related benchmark.\"\nimage: benchmark_plot.png\nformat:\n  html:\n    toc: true\n    toc-float: true\n    theme: yeti\n    css: ../../../styles.css\nexecute:\n  freeze: true\n---\n\n![tidySummarizedExperiment logo](logo.png){width=\"150px\" fig-align=\"left\"}\n\n*Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa*\n\n\n::: {.cell}\n\n:::\n\n\n\nThe generality of [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment) makes it easy to interface with several [`tidyverse`](https://www.tidyverse.org/) packages (e.g. [`dplyr`](https://CRAN.R-project.org/package=dplyr), [`tidyr`](https://CRAN.R-project.org/package=tidyr), [`ggplot2`](https://CRAN.R-project.org/package=ggplot2), [`purrr`](https://CRAN.R-project.org/package=purrr), [`plotly`](https://CRAN.R-project.org/package=plotly)). This is possible thanks to its approach of converting [`SummarizedExperiment`](https://bioconductor.org/packages/SummarizedExperiment) objects to tibbles, performing operations, and converting back to the original format. This conversion process introduces substantial overhead when working with large-scale datasets. Each operation requires multiple data transformations, with the conversion to tibble format creating memory copies of the entire dataset, followed by the reverse conversion back to [`SummarizedExperiment`](https://bioconductor.org/packages/SummarizedExperiment). 
For datasets containing hundreds of samples and tens of thousands of genes, these repeated conversions can consume memory and add significant computational overhead to even simple operations such as filtering or grouping. \n\nWith the new [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment) release ([v1.19.7](https://github.com/tidyomics/tidySummarizedExperiment/releases/tag/v1.19.7)), we have introduced new optimisations that address these performance limitations. This optimisation is powered by:\n\n1) Check for the query domain (assay, colData, rowData), and execute specialised operation.\n2) Use of [`plyxp`](https://bioconductor.org/packages/plyxp) for complex domain-specific queries.\n\n_plyxp_ is a tidyomics package developed by [Justin Landis](https://github.com/jtlandis), and first released as part of Bioconductor 3.20 in October 2024. \nIt uses data-masking functionality from the [rlang](https://rlang.r-lib.org/) package to perform efficient operations on _SummarizedExperiment_ objects.\n\n### Motivation and design principles\n\nThis benchmark supports ongoing work to improve the performance of [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment). In this benchmark, we show up to 30x improvement in operations such as `mutate()`.\n\nThe current optimisation is grounded in three principles:\n\n- Decompose operation series: break `mutate(a=..., b=..., c=...)` into single operations for simpler handling and clearer routing. Reference implementation in `R/mutate.R` (decomposition step) at [L146](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L146).\n- Analyse scope: infer whether each expression targets `colData`, `rowData`, `assays`, or a mix (noting that the current analyser is likely over-engineered and could be simplified). 
See [L149](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L149).\n- Route mixed operations via plyxp: when an expression touches multiple slots, prefer the plyxp path for correctness and performance. See [L155](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L155).\n\nThese design choices aim to preserve dimnames, avoid unnecessary tibble round-trips, and provide predictable performance across simple and mixed-slot scenarios.\n\n### Example of code optimisation\n\nThis was the `mutate()` method before optimisation. The previous implementation relied on \n`as_tibble() |> dplyr::mutate() |> update_SE_from_tibble(.data)`\n\nThe function `update_SE_from_tibble` interprets the input tibble and converts it back to a `SummarizedExperiment`. Although this step provides great generality and flexibility, it is particularly expensive because it must infer whether columns are sample-wise or feature-wise.\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show pre-optimization source\"}\nmutate.SummarizedExperiment <- function(.data, ...) {\n    # Legacy implementation of mutate() for SummarizedExperiment:\n    # - Validates requested edits against special/view-only columns\n    # - Performs mutate() via tibble round-trip, then reconstructs the SE\n    # Check that we are not modifying a key column\n    cols <- enquos(...) 
|> names()\n    \n    # Deprecation of special column names:\n    # capture all quoted args to detect deprecated special-column usage\n    .cols <- enquos(..., .ignore_empty=\"all\") %>% \n        map(~ quo_name(.x)) %>% unlist()\n    if (is_sample_feature_deprecated_used(.data, .cols)) {\n        # Record deprecated usage into metadata for backward compatibility\n        .data <- ping_old_special_column_into_metadata(.data)\n    }\n    \n    # Identify view-only/special columns (sample/feature keys, etc.)\n    # Use a small slice to reduce overhead while probing structure\n    special_columns <- get_special_columns(\n        # Decrease the size of the dataset\n        .data[1:min(100, nrow(.data)), 1:min(20, ncol(.data))]\n    ) |> c(get_needed_columns(.data))\n    \n    # Are any requested targets among special/view-only columns?\n    tst <-\n        intersect(\n            cols,\n            special_columns\n        ) |> \n        length() |>\n        gt(0)\n\n    if (tst) {\n        columns <-\n            special_columns |>\n                paste(collapse=\", \")\n        stop(\n            \"tidySummarizedExperiment says:\",\n            \" you are trying to rename a column that is view only\",\n            columns,\n            \"(it is not present in the colData).\",\n            \" If you want to mutate a view-only column,\",\n            \" make a copy and mutate that one.\"\n        )\n    }\n\n    # If Ranges column not in query, prefer faster tibble conversion\n    # Skip expanding GRanges columns when not referenced\n    skip_GRanges <-\n        get_GRanges_colnames() %in% \n        cols |>\n        not()\n    \n    # Round-trip: SE -> tibble -> dplyr::mutate -> SE\n    .data |>\n        as_tibble(skip_GRanges=skip_GRanges) |>\n        dplyr::mutate(...) |>\n        update_SE_from_tibble(.data)\n}\n```\n:::\n\n\nThe new implementation captures all easy cases, such as sample-only and feature-only metadata `mutate()`. 
If `mutate()` is a mixed operation that can be factored out to sample- and feature-wise operation it is handled by `plyxp`. Otherwise, the general solution is used.\n\nKey components to compare:\n- The pre-optimization code always uses a tibble round-trip (`as_tibble() |> dplyr::mutate() |> update_SE_from_tibble()`).\n- The optimized code first analyzes scope (`colData`, `rowData`, `assay`, or mixed) and dispatches to specialized paths.\n- The fallback still exists (`mutate_via_tibble`) for complex cases, preserving generality.\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show post-optimization source\"}\nmutate.SummarizedExperiment <- function(.data, ...) {\n\n       # Check if query is composed (multiple expressions)\n    if (is_composed(\"mutate\", ...)) return(decompose_tidy_operation(\"mutate\", ...)(.data))\n\n        # Check for scope and dispatch elegantly\n        scope_report <- analyze_query_scope_mutate(.data, ...)\n        scope <- scope_report$scope\n\n        result <-\n        if(scope == \"coldata_only\") modify_samples(.data, \"mutate\", ...)\n        else if(scope == \"rowdata_only\") modify_features(.data, \"mutate\", ...)\n        else if(scope == \"assay_only\") mutate_assay(.data, ...)\n        else if(scope == \"mixed\") modify_se_plyxp(.data, \"mutate\", scope_report, ...)\n        else mutate_via_tibble(.data, ...)\n\n        # Record latest mutate scope into metadata for testing/introspection\n        meta <- S4Vectors::metadata(result)\n        if (is.null(meta)) meta <- list()\n        meta$latest_mutate_scope_report <- scope_report\n        S4Vectors::metadata(result) <- meta\n\n        return(result)\n\n}\n```\n:::\n\n\n\n# Benchmarking Overview\n\nThis vignette benchmarks a set of [`mutate()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/mutate.html), [`filter()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/filter.html), 
[`select()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/select.html), and [`distinct()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/distinct.html) scenarios (see [documentation](https://bioconductor.org/packages/tidySummarizedExperiment)) comparing performance before and after optimisation, by explicitly checking out specific commits via git worktree, loading each commit's code with `devtools::load_all()`, running the same scenarios multiple times, and comparing the runtimes with ggplot boxplots.\n\n- Before optimisation: [commit 87445757d2d0332e7d335d22cd28f73568b7db66](https://github.com/tidyomics/tidySummarizedExperiment/commit/87445757d2d0332e7d335d22cd28f73568b7db66)\n- After optimisation: [commit 9f7c26e0519c92f9682b270d566da127367bcbc0](https://github.com/tidyomics/tidySummarizedExperiment/commit/9f7c26e0519c92f9682b270d566da127367bcbc0)\n\n\n### Setup helper functions\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nsuppressPackageStartupMessages({\n  library(ggplot2)\n  library(dplyr)\n  library(SummarizedExperiment)\n  library(rlang)\n  library(devtools)\n  library(airway)\n  library(microbenchmark)\n  library(reactable)\n  library(patchwork)\n})\n\nload_branch_code <- function(worktree_dir) {\n  if (!requireNamespace(\"devtools\", quietly = TRUE)) stop(\"Please install devtools to run this vignette.\")\n  # Debug: print the directory we're looking for\n  cat(\"Looking for worktree directory:\", worktree_dir, \"\\n\")\n  cat(\"Directory exists:\", dir.exists(worktree_dir), \"\\n\")\n  cat(\"Current working directory:\", getwd(), \"\\n\")\n  # Check if directory exists\n  if (!dir.exists(worktree_dir)) {\n    stop(paste(\"Worktree directory does not exist:\", worktree_dir))\n  }\n  suppressMessages(devtools::load_all(worktree_dir, quiet = TRUE))\n}\n\ncreate_airway_test_se <- function() {\n  suppressPackageStartupMessages(library(airway))\n  data(airway)\n  se <- airway\n  
se[1:200, ]\n}\n\nbenchmark_scenarios <- function() {\n  list(\n    coldata_simple_assignment = quo({ se |> mutate(new_dex = dex) }),\n    coldata_arithmetic = quo({ se |> mutate(avgLength_plus_5 = avgLength + 5) }),\n    coldata_concat = quo({ se |> mutate(sample_info = paste(cell, dex, SampleName, sep = \"_\")) }),\n    coldata_grouped_mean = quo({ se |> group_by(dex) |> mutate(avgLength_group_mean = mean(avgLength)) |> ungroup() }),\n    assay_simple_assignment = quo({ se |> mutate(counts_copy = counts) }),\n    assay_plus_one = quo({ se |> mutate(counts_plus_1 = counts + 1) }),\n    assay_log = quo({ se |> mutate(log_counts_manual = log2(counts + 1)) }),\n    complex_conditional_coldata = quo({ se |> mutate(length_group = ifelse(avgLength > mean(avgLength), \"longer\", \"shorter\")) }),\n    complex_nested = quo({ se |> mutate(complex_category = ifelse(dex == \"trt\" & avgLength > mean(avgLength), \"treated_long\", ifelse(dex == \"untrt\", \"untreated\", \"other\"))) }),\n    mixed_assay_coldata = quo({ se |> mutate(new_counts = counts * avgLength) }),\n    multiple_simple_assay = quo({ se |> mutate(normalized_counts = counts / 1000, sqrt_counts = sqrt(counts)) }),\n    chained_mutates = quo({ se |> mutate(tmp = avgLength * 2) |> mutate(flag = ifelse(tmp > mean(tmp), 1, 0)) }),\n\n    # Filter benchmarks (scoped and non-rectangular)\n    filter_coldata_simple = quo({ se |> filter(dex == \"trt\") }),\n    filter_coldata_numeric = quo({ se |> filter(avgLength > median(avgLength)) }),\n    filter_assay_nonrect = quo({ se |> filter(counts > 0) }),\n\n    # Select benchmarks (covering colData-only, rowData-only, assays-only, mixed)\n    select_coldata_simple = quo({ se |> select(.sample, dex) }),\n    select_rowdata_simple = quo({ se |> select(.feature) }),\n    select_assay_only = quo({ se |> select(counts) }),\n    select_mixed_keys_counts = quo({ se |> select(.sample, .feature, counts) }),\n    select_coldata_wide = quo({ se |> select(.sample, dex, avgLength, 
SampleName) }),\n\n    # Distinct benchmarks (covering colData-only, rowData-only, assays-only, mixed)\n    distinct_coldata_simple = quo({ se |> distinct(dex) }),\n    distinct_coldata_multiple = quo({ se |> distinct(dex, avgLength) }),\n    distinct_rowdata_simple = quo({ se |> distinct(.feature) }),\n    distinct_assay_only = quo({ se |> distinct(counts) }),\n    distinct_mixed_keys_counts = quo({ se |> distinct(.sample, .feature, counts) }),\n    distinct_coldata_wide = quo({ se |> distinct(.sample, dex, avgLength, SampleName) }),\n    distinct_with_keep_all = quo({ se |> distinct(dex, .keep_all = TRUE) }),\n    distinct_complex_expression = quo({ se |> distinct(dex, avgLength) })\n  )\n}\n\nrun_one <- function(expr_quo, reps = 5L) {\n  se_base <- create_airway_test_se()\n  mb <- microbenchmark::microbenchmark(\n    eval_tidy(expr_quo),\n    times = reps,\n    setup = { se <- se_base },          # reuse the same input, avoid recreating inside the timed expr\n    control = list(warmup = 2L)\n  )\n  # microbenchmark returns nanoseconds; convert to milliseconds\n  as.numeric(mb$time) / 1e6\n}\n\nrun_all_scenarios <- function(branch_label, reps = 7L) {\n  scenarios <- benchmark_scenarios()\n  out <- list()\n  for (nm in names(scenarios)) {\n    tms <- run_one(scenarios[[nm]], reps = reps)\n    out[[length(out) + 1L]] <- data.frame(\n      branch = branch_label,\n      scenario = nm,\n      replicate = seq_along(tms),\n      elapsed_ms = tms,\n      stringsAsFactors = FALSE\n    )\n  }\n  bind_rows(out)\n}\n\n# Parallel version: run each scenario on a separate worker\nrun_all_scenarios_parallel <- function(branch_label, reps = 20L, workers = 1L, initializer = NULL) {\n  scenarios <- benchmark_scenarios()\n  nms <- names(scenarios)\n  old_plan <- future::plan()\n  on.exit(future::plan(old_plan), add = TRUE)\n  future::plan(future::multisession, workers = workers)\n  res <- future.apply::future_lapply(nms, function(nm) {\n    if (!is.null(initializer)) initializer()\n 
   tms <- run_one(scenarios[[nm]], reps = reps)\n    data.frame(\n      branch = branch_label,\n      scenario = nm,\n      replicate = seq_along(tms),\n      elapsed_ms = tms,\n      stringsAsFactors = FALSE\n    )\n  }, future.seed = TRUE)\n  dplyr::bind_rows(res)\n}\n```\n:::\n\n\n### Run benchmarks on both branches\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\n# Worktree directories (already exist in the post directory)\nwt_before <- \".__bench_before__\"\nwt_after <- \".__bench_after__\"\n\n# Verify worktrees exist\nif (!dir.exists(wt_before)) {\n  stop(\"Worktree directory does not exist: \", wt_before)\n}\nif (!dir.exists(wt_after)) {\n  stop(\"Worktree directory does not exist: \", wt_after)\n}\n\n# Before optimisation (commit 87445757)\nload_branch_code(wt_before)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nLooking for worktree directory: .__bench_before__ \nDirectory exists: TRUE \nCurrent working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization \n```\n\n\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nres_before <- run_all_scenarios(branch_label = \"before_optimization\", reps = 10L)\n\n# After optimisation (commit 9f7c26e)\nload_branch_code(wt_after)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nLooking for worktree directory: .__bench_after__ \nDirectory exists: TRUE \nCurrent working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization \n```\n\n\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nres_after <- run_all_scenarios(branch_label = \"after_optimization\", reps = 10L)\n\nresults <- dplyr::bind_rows(res_before, res_after) |>\n  dplyr::mutate(operation = dplyr::case_when(\n    grepl(\"^filter\", scenario) ~ \"filter\",\n    grepl(\"^select\", scenario) ~ \"select\",\n    grepl(\"^distinct\", scenario) 
~ \"distinct\",\n    TRUE ~ \"mutate\"\n  ))\n\nsummary_table <- results |>\n  group_by(branch, scenario) |>\n  summarise(median_ms = median(elapsed_ms), .groups = \"drop\") |>\n  tidyr::pivot_wider(names_from = branch, values_from = median_ms) |> \n  dplyr::mutate(speedup = round(before_optimization / after_optimization, 2))\n```\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"reactable html-widget html-fill-item\" id=\"htmlwidget-be62d0a7b115f97e1d95\" style=\"width:auto;height:auto;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-be62d0a7b115f97e1d95\">{\"x\":{\"tag\":{\"name\":\"Reactable\",\"attribs\":{\"data\":{\"scenario\":[\"assay_log\",\"assay_plus_one\",\"assay_simple_assignment\",\"chained_mutates\",\"coldata_arithmetic\",\"coldata_concat\",\"coldata_grouped_mean\",\"coldata_simple_assignment\",\"complex_conditional_coldata\",\"complex_nested\",\"distinct_assay_only\",\"distinct_coldata_multiple\",\"distinct_coldata_simple\",\"distinct_coldata_wide\",\"distinct_complex_expression\",\"distinct_mixed_keys_counts\",\"distinct_rowdata_simple\",\"distinct_with_keep_all\",\"filter_assay_nonrect\",\"filter_coldata_numeric\",\"filter_coldata_simple\",\"mixed_assay_coldata\",\"multiple_simple_assay\",\"select_assay_only\",\"select_coldata_simple\",\"select_coldata_wide\",\"select_mixed_keys_counts\",\"select_rowdata_simple\"],\"after_optimization\":[12.1772705,12.120708,13.033438,21.8921255,11.0978115,9.9674795,96.1601455,13.26825,11.3534175,11.614917,191.317604,103.8458955,104.324063,105.13925,101.9207915,216.8330205,100.924542,102.734958,78.236541,21.618147,23.073854,29.388625,26.1888745,173.0589375,99.4489585,106.7828745,122.281708,96.581313],\"before_optimization\":[289.9473125,301.021812,302.3901255,531.5186875,275.1587495,286.2960215,90.411437,304.5340005,272.4593545,276.3903955,203.7173125,122.818604,121.259167,121.5855,125.015354,204.204646,117.6186455,120.1943755,106.376646,27.2320835,26.475958,292.0701245
,307.999,182.769959,126.767563,113.77525,120.5803335,116.265104],\"speedup\":[23.81,24.84,23.2,24.28,24.79,28.72,0.94,22.95,24,23.8,1.06,1.18,1.16,1.16,1.23,0.94,1.17,1.17,1.36,1.26,1.15,9.94,11.76,1.06,1.27,1.07,0.99,1.2]},\"columns\":[{\"id\":\"scenario\",\"name\":\"Scenario\",\"type\":\"character\",\"minWidth\":220,\"align\":\"left\"},{\"id\":\"after_optimization\",\"name\":\"After (ms)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":1},\"aggregated\":{\"digits\":1}}},{\"id\":\"before_optimization\",\"name\":\"Before (ms)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":1},\"aggregated\":{\"digits\":1}}},{\"id\":\"speedup\",\"name\":\"Speedup (x)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":2},\"aggregated\":{\"digits\":2}}}],\"filterable\":true,\"searchable\":true,\"defaultPageSize\":10,\"highlight\":true,\"bordered\":true,\"striped\":true,\"compact\":true,\"dataKey\":\"ea7124abc1135d7cf55e9fa312c89d3f\"},\"children\":[]},\"class\":\"reactR_markup\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n# Visualize with combined performance plots\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\ndodge_w <- 0.7\n\np_box <- ggplot(results, aes(x = scenario, y = elapsed_ms, fill = branch)) +\n  geom_boxplot(position = position_dodge(width = dodge_w), width = 0.7, outlier.shape = NA) +\n\n  # Add jittered points aligned with the dodged boxplots\n  geom_point(\n    position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = dodge_w), \n    alpha = 0.6, \n    size = 0.5\n  ) +\n  scale_y_log10() + \n  coord_flip() +\n  facet_grid(operation ~ ., scales = \"free_y\", space = \"free_y\") +\n  annotation_logticks(sides = \"b\") +\n  labs(title = \"Performance comparison: Before vs After optimization\",\n       x = \"Scenario\",\n       y = \"Elapsed (ms)\") +\n  
theme_bw() +\n  \n  # Angle x labels  \n  theme(legend.position = \"top\", axis.text.x = element_text(angle = 45, hjust = 1))\n\n# Speedup summary panel (median before/after ratio)\nspeedup_plot_data <- summary_table |>\n  dplyr::mutate(operation = dplyr::case_when(\n    grepl(\"^filter\", scenario) ~ \"filter\",\n    grepl(\"^select\", scenario) ~ \"select\",\n    grepl(\"^distinct\", scenario) ~ \"distinct\",\n    TRUE ~ \"mutate\"\n  ))\n\np_speedup <- ggplot(\n  speedup_plot_data,\n  aes(x = speedup, y = reorder(scenario, speedup))\n) +\n  geom_col(width = 0.7, fill = \"grey70\", color = \"grey40\") +\n  facet_grid(operation ~ ., scales = \"free_y\", space = \"free_y\") +\n  labs(\n    title = \"Median speedup by scenario\",\n    x = \"Speedup (before/after, x)\",\n    y = NULL\n  ) +\n  theme_bw() +\n  theme(legend.position = \"none\")\n\ncombined_plot <- p_box + p_speedup + patchwork::plot_layout(widths = c(2.3, 1))\ncombined_plot\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/plot-1.png){width=1344}\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\n# Save the combined figure\nggsave(\"benchmark_plot.png\", plot = combined_plot, width = 14, height = 8)\n```\n:::\n\n\n### Interpreting the benchmark results\n\n\n\nAcross all scenarios, speedup ranges from **0.94x** to **28.72x**.\n\nOperations with the strongest gains are: **coldata_concat (28.72x), assay_plus_one (24.84x), coldata_arithmetic (24.79x)**.\n\nLower-gain outliers are: **coldata_grouped_mean (0.94x), distinct_mixed_keys_counts (0.94x), select_mixed_keys_counts (0.99x)**.\n\nBy operation family, median speedup is: **mutate (23.8x), filter (1.26x), distinct (1.17x), select (1.07x)**.\n\n# Session Info\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\nR version 4.5.3 (2026-03-11)\nPlatform: x86_64-apple-darwin20\nRunning under: macOS Sonoma 14.6.1\n\nMatrix products: default\nBLAS:   
/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib \nLAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1\n\nlocale:\n[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8\n\ntime zone: Australia/Adelaide\ntzcode source: internal\n\nattached base packages:\n[1] stats4    stats     graphics  grDevices utils     datasets  methods  \n[8] base     \n\nother attached packages:\n [1] tidySummarizedExperiment_1.19.7 tidyr_1.3.2                    \n [3] testthat_3.3.2                  ttservice_0.5.3                \n [5] patchwork_1.3.2                 reactable_0.4.5                \n [7] rlang_1.1.7                     microbenchmark_1.5.0           \n [9] airway_1.30.0                   SummarizedExperiment_1.40.0    \n[11] Biobase_2.70.0                  GenomicRanges_1.62.1           \n[13] Seqinfo_1.0.0                   IRanges_2.44.0                 \n[15] S4Vectors_0.48.0                BiocGenerics_0.56.0            \n[17] generics_0.1.4                  MatrixGenerics_1.22.0          \n[19] matrixStats_1.5.0               dplyr_1.2.0                    \n[21] ggplot2_4.0.2                   devtools_2.4.6                 \n[23] usethis_3.2.1                  \n\nloaded via a namespace (and not attached):\n [1] tidyselect_1.2.1    viridisLite_0.4.3   farver_2.1.2       \n [4] S7_0.2.1            fastmap_1.2.0       lazyeval_0.2.2     \n [7] digest_0.6.39       plyxp_1.4.3         lifecycle_1.0.5    \n[10] ellipsis_0.3.2      magrittr_2.0.4      compiler_4.5.3     \n[13] tools_4.5.3         yaml_2.3.12         data.table_1.18.2.1\n[16] knitr_1.51          S4Arrays_1.10.1     labeling_0.4.3     \n[19] htmlwidgets_1.6.4   pkgbuild_1.4.8      DelayedArray_0.36.0\n[22] RColorBrewer_1.1-3  pkgload_1.5.0       abind_1.4-8        \n[25] withr_3.0.2         purrr_1.2.1         desc_1.4.3         \n[28] grid_4.5.3          fansi_1.0.7         scales_1.4.0       
\n[31] cli_3.6.5           rmarkdown_2.30      ragg_1.5.1         \n[34] remotes_2.5.0       otel_0.2.0          rstudioapi_0.18.0  \n[37] httr_1.4.8          sessioninfo_1.2.3   cachem_1.1.0       \n[40] stringr_1.6.0       XVector_0.50.0      vctrs_0.7.1        \n[43] Matrix_1.7-4        jsonlite_2.0.0      systemfonts_1.3.2  \n[46] crosstalk_1.2.2     plotly_4.12.0       glue_1.8.0         \n[49] reactR_0.6.1        stringi_1.8.7       gtable_0.3.6       \n[52] tibble_3.3.1        pillar_1.11.1       htmltools_0.5.9    \n[55] brio_1.1.5          R6_2.6.1            textshaping_1.0.5  \n[58] rprojroot_2.1.1     evaluate_1.0.5      lattice_0.22-9     \n[61] memoise_2.0.1       SparseArray_1.10.9  xfun_0.56          \n[64] fs_1.6.7            pkgconfig_2.0.3    \n```\n\n\n:::\n:::\n\n\n\n",
+    "markdown": "---\ntitle: \"Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend\"\nauthor: \"Stefano Mangiola\"\ncontributors:\n  - Stefano Mangiola\n  - Michael Love\n  - Justin Landis\n  - Pierre-Paul Axisa\ndate: \"2026-03-22\"\npackage: tidySummarizedExperiment\ntags:\n  - tidyomics/tidyomicsBlog\n  - optimization\n  - performance\n  - plyxp\n  - SummarizedExperiment\n  - benchmarking\ndescription: \"Performance optimisation of tidySummarizedExperiment and related benchmark.\"\nimage: benchmark_plot.png\nformat:\n  html:\n    toc: true\n    toc-float: true\n    theme: yeti\n    css: ../../../styles.css\nexecute:\n  freeze: true\n---\n\n![tidySummarizedExperiment logo](logo.png){width=\"150px\" fig-align=\"left\"}\n\n*Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa*\n\n\n::: {.cell}\n\n:::\n\n\n\nThe generality of [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment) makes it easy to interface with several [`tidyverse`](https://www.tidyverse.org/) packages (e.g. [`dplyr`](https://CRAN.R-project.org/package=dplyr), [`tidyr`](https://CRAN.R-project.org/package=tidyr), [`ggplot2`](https://CRAN.R-project.org/package=ggplot2), [`purrr`](https://CRAN.R-project.org/package=purrr), [`plotly`](https://CRAN.R-project.org/package=plotly)). This is possible thanks to its approach of converting [`SummarizedExperiment`](https://bioconductor.org/packages/SummarizedExperiment) objects to tibbles, performing operations, and converting back to the original format. This conversion process introduces substantial overhead when working with large-scale datasets. Each operation requires multiple data transformations, with the conversion to tibble format creating memory copies of the entire dataset, followed by the reverse conversion back to [`SummarizedExperiment`](https://bioconductor.org/packages/SummarizedExperiment). 
For datasets containing hundreds of samples and tens of thousands of genes, these repeated conversions can consume memory and add significant computational overhead to even simple operations such as filtering or grouping. \n\nWith the new [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment) release ([v1.19.7](https://github.com/tidyomics/tidySummarizedExperiment/releases/tag/v1.19.7)), we have introduced new optimisations that address these performance limitations. This optimisation is powered by:\n\n1) Check for the query domain (assay, colData, rowData), and execute specialised operation.\n2) Use of [`plyxp`](https://bioconductor.org/packages/plyxp) for complex domain-specific queries.\n\n_plyxp_ is a tidyomics package developed by [Justin Landis](https://github.com/jtlandis), and first released as part of Bioconductor 3.20 in October 2024. \nIt uses data-masking functionality from the [rlang](https://rlang.r-lib.org/) package to perform efficient operations on _SummarizedExperiment_ objects.\n\n### Motivation and design principles\n\nThis benchmark supports ongoing work to improve the performance of [`tidySummarizedExperiment`](https://bioconductor.org/packages/tidySummarizedExperiment). In this benchmark, we show up to 30x improvement in operations such as `mutate()`.\n\nThe current optimisation is grounded in three principles:\n\n- Decompose operation series: break `mutate(a=..., b=..., c=...)` into single operations for simpler handling and clearer routing. Reference implementation in `R/mutate.R` (decomposition step) at [L146](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L146).\n- Analyse scope: infer whether each expression targets `colData`, `rowData`, `assays`, or a mix (noting that the current analyser is likely over-engineered and could be simplified). 
See [L149](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L149).\n- Route mixed operations via plyxp: when an expression touches multiple slots, prefer the plyxp path for correctness and performance. See [L155](https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L155).\n\nThese design choices aim to preserve dimnames, avoid unnecessary tibble round-trips, and provide predictable performance across simple and mixed-slot scenarios.\n\n### Example of code optimisation\n\nThis was the `mutate()` method before optimisation. The previous implementation relied on \n`as_tibble() |> dplyr::mutate() |> update_SE_from_tibble(.data)`\n\nThe function `update_SE_from_tibble` interprets the input tibble and converts it back to a `SummarizedExperiment`. Although this step provides great generality and flexibility, it is particularly expensive because it must infer whether columns are sample-wise or feature-wise.\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show pre-optimization source\"}\nmutate.SummarizedExperiment <- function(.data, ...) {\n    # Legacy implementation of mutate() for SummarizedExperiment:\n    # - Validates requested edits against special/view-only columns\n    # - Performs mutate() via tibble round-trip, then reconstructs the SE\n    # Check that we are not modifying a key column\n    cols <- enquos(...) 
|> names()\n    \n    # Deprecation of special column names:\n    # capture all quoted args to detect deprecated special-column usage\n    .cols <- enquos(..., .ignore_empty=\"all\") %>% \n        map(~ quo_name(.x)) %>% unlist()\n    if (is_sample_feature_deprecated_used(.data, .cols)) {\n        # Record deprecated usage into metadata for backward compatibility\n        .data <- ping_old_special_column_into_metadata(.data)\n    }\n    \n    # Identify view-only/special columns (sample/feature keys, etc.)\n    # Use a small slice to reduce overhead while probing structure\n    special_columns <- get_special_columns(\n        # Decrease the size of the dataset\n        .data[1:min(100, nrow(.data)), 1:min(20, ncol(.data))]\n    ) |> c(get_needed_columns(.data))\n    \n    # Are any requested targets among special/view-only columns?\n    tst <-\n        intersect(\n            cols,\n            special_columns\n        ) |> \n        length() |>\n        gt(0)\n\n    if (tst) {\n        columns <-\n            special_columns |>\n                paste(collapse=\", \")\n        stop(\n            \"tidySummarizedExperiment says:\",\n            \" you are trying to rename a column that is view only\",\n            columns,\n            \"(it is not present in the colData).\",\n            \" If you want to mutate a view-only column,\",\n            \" make a copy and mutate that one.\"\n        )\n    }\n\n    # If Ranges column not in query, prefer faster tibble conversion\n    # Skip expanding GRanges columns when not referenced\n    skip_GRanges <-\n        get_GRanges_colnames() %in% \n        cols |>\n        not()\n    \n    # Round-trip: SE -> tibble -> dplyr::mutate -> SE\n    .data |>\n        as_tibble(skip_GRanges=skip_GRanges) |>\n        dplyr::mutate(...) |>\n        update_SE_from_tibble(.data)\n}\n```\n:::\n\n\nThe new implementation captures all easy cases, such as sample-only and feature-only metadata `mutate()`. 
If `mutate()` is a mixed operation that can be factored into sample-wise and feature-wise operations, it is handled by `plyxp`. Otherwise, the general tibble-based solution is used.\n\nKey components to compare:\n\n- The pre-optimization code always uses a tibble round-trip (`as_tibble() |> dplyr::mutate() |> update_SE_from_tibble()`).\n- The optimized code first analyzes scope (`colData`, `rowData`, `assay`, or mixed) and dispatches to specialized paths.\n- The fallback still exists (`mutate_via_tibble`) for complex cases, preserving generality.\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show post-optimization source\"}\nmutate.SummarizedExperiment <- function(.data, ...) {\n\n       # Check if query is composed (multiple expressions)\n    if (is_composed(\"mutate\", ...)) return(decompose_tidy_operation(\"mutate\", ...)(.data))\n\n        # Check for scope and dispatch elegantly\n        scope_report <- analyze_query_scope_mutate(.data, ...)\n        scope <- scope_report$scope\n\n        result <-\n        if(scope == \"coldata_only\") modify_samples(.data, \"mutate\", ...)\n        else if(scope == \"rowdata_only\") modify_features(.data, \"mutate\", ...)\n        else if(scope == \"assay_only\") mutate_assay(.data, ...)\n        else if(scope == \"mixed\") modify_se_plyxp(.data, \"mutate\", scope_report, ...)\n        else mutate_via_tibble(.data, ...)\n\n        # Record latest mutate scope into metadata for testing/introspection\n        meta <- S4Vectors::metadata(result)\n        if (is.null(meta)) meta <- list()\n        meta$latest_mutate_scope_report <- scope_report\n        S4Vectors::metadata(result) <- meta\n\n        return(result)\n\n}\n```\n:::\n\n\n\n# Benchmarking Overview\n\nThis vignette benchmarks a set of [`mutate()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/mutate.html), [`filter()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/filter.html), 
[`select()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/select.html), and [`distinct()`](https://tidyomics.github.io/tidySummarizedExperiment/reference/distinct.html) scenarios (see [documentation](https://bioconductor.org/packages/tidySummarizedExperiment)) comparing performance before and after optimisation, by explicitly checking out specific commits via git worktree, loading each commit's code with `devtools::load_all()`, running the same scenarios multiple times, and comparing the runtimes with ggplot boxplots.\n\n- Before optimisation: [commit 87445757d2d0332e7d335d22cd28f73568b7db66](https://github.com/tidyomics/tidySummarizedExperiment/commit/87445757d2d0332e7d335d22cd28f73568b7db66)\n- After optimisation: [commit 9f7c26e0519c92f9682b270d566da127367bcbc0](https://github.com/tidyomics/tidySummarizedExperiment/commit/9f7c26e0519c92f9682b270d566da127367bcbc0)\n\n\n### Setup helper functions\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nsuppressPackageStartupMessages({\n  library(ggplot2)\n  library(dplyr)\n  library(SummarizedExperiment)\n  library(rlang)\n  library(devtools)\n  library(airway)\n  library(microbenchmark)\n  library(reactable)\n  library(patchwork)\n})\n\nload_branch_code <- function(worktree_dir) {\n  if (!requireNamespace(\"devtools\", quietly = TRUE)) stop(\"Please install devtools to run this vignette.\")\n  # Debug: print the directory we're looking for\n  cat(\"Looking for worktree directory:\", worktree_dir, \"\\n\")\n  cat(\"Directory exists:\", dir.exists(worktree_dir), \"\\n\")\n  cat(\"Current working directory:\", getwd(), \"\\n\")\n  # Check if directory exists\n  if (!dir.exists(worktree_dir)) {\n    stop(paste(\"Worktree directory does not exist:\", worktree_dir))\n  }\n  suppressMessages(devtools::load_all(worktree_dir, quiet = TRUE))\n}\n\ncreate_airway_test_se <- function() {\n  suppressPackageStartupMessages(library(airway))\n  data(airway)\n  se <- airway\n  
se[1:200, ]\n}\n\nbenchmark_scenarios <- function() {\n  list(\n    coldata_simple_assignment = quo({ se |> mutate(new_dex = dex) }),\n    coldata_arithmetic = quo({ se |> mutate(avgLength_plus_5 = avgLength + 5) }),\n    coldata_concat = quo({ se |> mutate(sample_info = paste(cell, dex, SampleName, sep = \"_\")) }),\n    coldata_grouped_mean = quo({ se |> group_by(dex) |> mutate(avgLength_group_mean = mean(avgLength)) |> ungroup() }),\n    assay_simple_assignment = quo({ se |> mutate(counts_copy = counts) }),\n    assay_plus_one = quo({ se |> mutate(counts_plus_1 = counts + 1) }),\n    assay_log = quo({ se |> mutate(log_counts_manual = log2(counts + 1)) }),\n    complex_conditional_coldata = quo({ se |> mutate(length_group = ifelse(avgLength > mean(avgLength), \"longer\", \"shorter\")) }),\n    complex_nested = quo({ se |> mutate(complex_category = ifelse(dex == \"trt\" & avgLength > mean(avgLength), \"treated_long\", ifelse(dex == \"untrt\", \"untreated\", \"other\"))) }),\n    mixed_assay_coldata = quo({ se |> mutate(new_counts = counts * avgLength) }),\n    multiple_simple_assay = quo({ se |> mutate(normalized_counts = counts / 1000, sqrt_counts = sqrt(counts)) }),\n    chained_mutates = quo({ se |> mutate(tmp = avgLength * 2) |> mutate(flag = ifelse(tmp > mean(tmp), 1, 0)) }),\n\n    # Filter benchmarks (scoped and non-rectangular)\n    filter_coldata_simple = quo({ se |> filter(dex == \"trt\") }),\n    filter_coldata_numeric = quo({ se |> filter(avgLength > median(avgLength)) }),\n    filter_assay_nonrect = quo({ se |> filter(counts > 0) }),\n\n    # Select benchmarks (covering colData-only, rowData-only, assays-only, mixed)\n    select_coldata_simple = quo({ se |> select(.sample, dex) }),\n    select_rowdata_simple = quo({ se |> select(.feature) }),\n    select_assay_only = quo({ se |> select(counts) }),\n    select_mixed_keys_counts = quo({ se |> select(.sample, .feature, counts) }),\n    select_coldata_wide = quo({ se |> select(.sample, dex, avgLength, 
SampleName) }),\n\n    # Distinct benchmarks (covering colData-only, rowData-only, assays-only, mixed)\n    distinct_coldata_simple = quo({ se |> distinct(dex) }),\n    distinct_coldata_multiple = quo({ se |> distinct(dex, avgLength) }),\n    distinct_rowdata_simple = quo({ se |> distinct(.feature) }),\n    distinct_assay_only = quo({ se |> distinct(counts) }),\n    distinct_mixed_keys_counts = quo({ se |> distinct(.sample, .feature, counts) }),\n    distinct_coldata_wide = quo({ se |> distinct(.sample, dex, avgLength, SampleName) }),\n    distinct_with_keep_all = quo({ se |> distinct(dex, .keep_all = TRUE) }),\n    distinct_complex_expression = quo({ se |> distinct(dex, avgLength) })\n  )\n}\n\nrun_one <- function(expr_quo, reps = 5L) {\n  se_base <- create_airway_test_se()\n  mb <- microbenchmark::microbenchmark(\n    eval_tidy(expr_quo),\n    times = reps,\n    setup = { se <- se_base },          # reuse the same input, avoid recreating inside the timed expr\n    control = list(warmup = 2L)\n  )\n  # microbenchmark returns nanoseconds; convert to milliseconds\n  as.numeric(mb$time) / 1e6\n}\n\nrun_all_scenarios <- function(branch_label, reps = 7L) {\n  scenarios <- benchmark_scenarios()\n  out <- list()\n  for (nm in names(scenarios)) {\n    tms <- run_one(scenarios[[nm]], reps = reps)\n    out[[length(out) + 1L]] <- data.frame(\n      branch = branch_label,\n      scenario = nm,\n      replicate = seq_along(tms),\n      elapsed_ms = tms,\n      stringsAsFactors = FALSE\n    )\n  }\n  bind_rows(out)\n}\n\n# Parallel version: run each scenario on a separate worker\nrun_all_scenarios_parallel <- function(branch_label, reps = 20L, workers = 1L, initializer = NULL) {\n  scenarios <- benchmark_scenarios()\n  nms <- names(scenarios)\n  old_plan <- future::plan()\n  on.exit(future::plan(old_plan), add = TRUE)\n  future::plan(future::multisession, workers = workers)\n  res <- future.apply::future_lapply(nms, function(nm) {\n    if (!is.null(initializer)) initializer()\n 
   tms <- run_one(scenarios[[nm]], reps = reps)\n    data.frame(\n      branch = branch_label,\n      scenario = nm,\n      replicate = seq_along(tms),\n      elapsed_ms = tms,\n      stringsAsFactors = FALSE\n    )\n  }, future.seed = TRUE)\n  dplyr::bind_rows(res)\n}\n```\n:::\n\n\n### Run benchmarks on both branches\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\n# Worktree directories (already exist in the post directory)\nwt_before <- \".__bench_before__\"\nwt_after <- \".__bench_after__\"\n\n# Verify worktrees exist\nif (!dir.exists(wt_before)) {\n  stop(\"Worktree directory does not exist: \", wt_before)\n}\nif (!dir.exists(wt_after)) {\n  stop(\"Worktree directory does not exist: \", wt_after)\n}\n\n# Before optimisation (commit 87445757)\nload_branch_code(wt_before)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nLooking for worktree directory: .__bench_before__ \nDirectory exists: TRUE \nCurrent working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization \n```\n\n\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nres_before <- run_all_scenarios(branch_label = \"before_optimization\", reps = 10L)\n\n# After optimisation (commit 9f7c26e)\nload_branch_code(wt_after)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nLooking for worktree directory: .__bench_after__ \nDirectory exists: TRUE \nCurrent working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization \n```\n\n\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\nres_after <- run_all_scenarios(branch_label = \"after_optimization\", reps = 10L)\n\nresults <- dplyr::bind_rows(res_before, res_after) |>\n  dplyr::mutate(operation = dplyr::case_when(\n    grepl(\"^filter\", scenario) ~ \"filter\",\n    grepl(\"^select\", scenario) ~ \"select\",\n    grepl(\"^distinct\", scenario) 
~ \"distinct\",\n    TRUE ~ \"mutate\"\n  ))\n\nsummary_table <- results |>\n  group_by(branch, scenario) |>\n  summarise(median_ms = median(elapsed_ms), .groups = \"drop\") |>\n  tidyr::pivot_wider(names_from = branch, values_from = median_ms) |> \n  dplyr::mutate(speedup = round(before_optimization / after_optimization, 2))\n```\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"reactable html-widget html-fill-item\" id=\"htmlwidget-226268d0e09b28952dc5\" style=\"width:auto;height:auto;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-226268d0e09b28952dc5\">{\"x\":{\"tag\":{\"name\":\"Reactable\",\"attribs\":{\"data\":{\"scenario\":[\"assay_log\",\"assay_plus_one\",\"assay_simple_assignment\",\"chained_mutates\",\"coldata_arithmetic\",\"coldata_concat\",\"coldata_grouped_mean\",\"coldata_simple_assignment\",\"complex_conditional_coldata\",\"complex_nested\",\"distinct_assay_only\",\"distinct_coldata_multiple\",\"distinct_coldata_simple\",\"distinct_coldata_wide\",\"distinct_complex_expression\",\"distinct_mixed_keys_counts\",\"distinct_rowdata_simple\",\"distinct_with_keep_all\",\"filter_assay_nonrect\",\"filter_coldata_numeric\",\"filter_coldata_simple\",\"mixed_assay_coldata\",\"multiple_simple_assay\",\"select_assay_only\",\"select_coldata_simple\",\"select_coldata_wide\",\"select_mixed_keys_counts\",\"select_rowdata_simple\"],\"after_optimization\":[12.4554805,12.328708,11.681375,21.5625,11.0071665,11.0921455,93.9895835,11.321708,11.3674785,11.787646,183.6270415,104.3887705,103.090917,172.007979,108.229458,211.791104,102.567687,116.4211665,74.754,21.781209,22.755375,29.610208,25.965729,174.4993535,98.666563,102.3431045,115.9610005,97.5994795],\"before_optimization\":[289.085042,289.313729,288.7251875,548.727708,288.0155005,281.744729,88.8149785,291.0410415,271.0590425,274.0804585,197.556417,118.027792,116.40925,119.396937,117.7116455,199.8871875,114.4365215,118.1221665,104.8900625,25.817354,26.3123745,283.314792,29
9.2484995,181.386271,123.428688,113.5046455,117.837937,113.6867925],\"speedup\":[23.21,23.47,24.72,25.45,26.17,25.4,0.94,25.71,23.85,23.25,1.08,1.13,1.13,0.69,1.09,0.94,1.12,1.01,1.4,1.19,1.16,9.57,11.52,1.04,1.25,1.11,1.02,1.16]},\"columns\":[{\"id\":\"scenario\",\"name\":\"Scenario\",\"type\":\"character\",\"minWidth\":220,\"align\":\"left\"},{\"id\":\"after_optimization\",\"name\":\"After (ms)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":1},\"aggregated\":{\"digits\":1}}},{\"id\":\"before_optimization\",\"name\":\"Before (ms)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":1},\"aggregated\":{\"digits\":1}}},{\"id\":\"speedup\",\"name\":\"Speedup (x)\",\"type\":\"numeric\",\"minWidth\":120,\"align\":\"left\",\"format\":{\"cell\":{\"digits\":2},\"aggregated\":{\"digits\":2}}}],\"filterable\":true,\"searchable\":true,\"defaultPageSize\":10,\"highlight\":true,\"bordered\":true,\"striped\":true,\"compact\":true,\"dataKey\":\"5e96953b8fc47c84831c51a6f5bf258f\"},\"children\":[]},\"class\":\"reactR_markup\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n# Visualize with combined performance plots\n\n\n::: {.cell}\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\ndodge_w <- 0.7\n\np_box <- ggplot(results, aes(x = scenario, y = elapsed_ms, fill = branch)) +\n  geom_boxplot(position = position_dodge(width = dodge_w), width = 0.7, outlier.shape = NA) +\n\n  # Add jittered points aligned with the dodged boxplots\n  geom_point(\n    position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = dodge_w), \n    alpha = 0.6, \n    size = 0.5\n  ) +\n  scale_y_log10() + \n  coord_flip() +\n  facet_grid(operation ~ ., scales = \"free_y\", space = \"free_y\") +\n  annotation_logticks(sides = \"b\") +\n  labs(title = \"Performance comparison: Before vs After optimization\",\n       x = \"Scenario\",\n       y = \"Elapsed (ms)\") +\n  
theme_bw() +\n  \n  # Angle x labels  \n  theme(legend.position = \"top\", axis.text.x = element_text(angle = 45, hjust = 1))\n\n# Speedup summary panel (median before/after ratio)\nspeedup_plot_data <- summary_table |>\n  dplyr::mutate(operation = dplyr::case_when(\n    grepl(\"^filter\", scenario) ~ \"filter\",\n    grepl(\"^select\", scenario) ~ \"select\",\n    grepl(\"^distinct\", scenario) ~ \"distinct\",\n    TRUE ~ \"mutate\"\n  ))\n\np_speedup <- ggplot(\n  speedup_plot_data,\n  aes(x = speedup, y = reorder(scenario, speedup))\n) +\n  geom_col(width = 0.7, fill = \"grey70\", color = \"grey40\") +\n  facet_grid(operation ~ ., scales = \"free_y\", space = \"free_y\") +\n  labs(\n    title = \"Median speedup by scenario\",\n    x = \"Speedup (before/after, x)\",\n    y = NULL\n  ) +\n  theme_bw() +\n  theme(legend.position = \"none\")\n\ncombined_plot <- p_box + p_speedup + patchwork::plot_layout(widths = c(2.3, 1))\ncombined_plot\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/plot-1.png){width=1344}\n:::\n\n```{.r .cell-code  code-fold=\"true\" code-summary=\"Show the code\"}\n# Save the combined figure\nggsave(\"benchmark_plot.png\", plot = combined_plot, width = 14, height = 8)\n```\n:::\n\n\n### Interpreting the benchmark results\n\n\n\nAcross all scenarios, speedup ranges from **0.69x** to **26.17x**.\n\nOperations with the strongest gains are: **coldata_arithmetic (26.17x), coldata_simple_assignment (25.71x), chained_mutates (25.45x)**.\n\nLower-gain outliers are: **distinct_coldata_wide (0.69x), coldata_grouped_mean (0.94x), distinct_mixed_keys_counts (0.94x)**.\n\nBy operation family, median speedup is: **mutate (23.66x), filter (1.19x), select (1.11x), distinct (1.08x)**.\n\n# Session Info\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\nR version 4.5.3 (2026-03-11)\nPlatform: x86_64-apple-darwin20\nRunning under: macOS Sonoma 14.6.1\n\nMatrix products: default\nBLAS:   
/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib \nLAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1\n\nlocale:\n[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8\n\ntime zone: Australia/Adelaide\ntzcode source: internal\n\nattached base packages:\n[1] stats4    stats     graphics  grDevices utils     datasets  methods  \n[8] base     \n\nother attached packages:\n [1] tidySummarizedExperiment_1.19.7 tidyr_1.3.2                    \n [3] testthat_3.3.2                  ttservice_0.5.3                \n [5] patchwork_1.3.2                 reactable_0.4.5                \n [7] rlang_1.1.7                     microbenchmark_1.5.0           \n [9] airway_1.30.0                   SummarizedExperiment_1.40.0    \n[11] Biobase_2.70.0                  GenomicRanges_1.62.1           \n[13] Seqinfo_1.0.0                   IRanges_2.44.0                 \n[15] S4Vectors_0.48.0                BiocGenerics_0.56.0            \n[17] generics_0.1.4                  MatrixGenerics_1.22.0          \n[19] matrixStats_1.5.0               dplyr_1.2.0                    \n[21] ggplot2_4.0.2                   devtools_2.4.6                 \n[23] usethis_3.2.1                  \n\nloaded via a namespace (and not attached):\n [1] tidyselect_1.2.1    viridisLite_0.4.3   farver_2.1.2       \n [4] S7_0.2.1            fastmap_1.2.0       lazyeval_0.2.2     \n [7] digest_0.6.39       plyxp_1.4.3         lifecycle_1.0.5    \n[10] ellipsis_0.3.2      magrittr_2.0.4      compiler_4.5.3     \n[13] tools_4.5.3         yaml_2.3.12         data.table_1.18.2.1\n[16] knitr_1.51          S4Arrays_1.10.1     labeling_0.4.3     \n[19] htmlwidgets_1.6.4   pkgbuild_1.4.8      DelayedArray_0.36.0\n[22] RColorBrewer_1.1-3  pkgload_1.5.0       abind_1.4-8        \n[25] withr_3.0.2         purrr_1.2.1         desc_1.4.3         \n[28] grid_4.5.3          fansi_1.0.7         scales_1.4.0       
\n[31] cli_3.6.5           rmarkdown_2.30      ragg_1.5.1         \n[34] remotes_2.5.0       otel_0.2.0          rstudioapi_0.18.0  \n[37] httr_1.4.8          sessioninfo_1.2.3   cachem_1.1.0       \n[40] stringr_1.6.0       XVector_0.50.0      vctrs_0.7.1        \n[43] Matrix_1.7-4        jsonlite_2.0.0      systemfonts_1.3.2  \n[46] crosstalk_1.2.2     plotly_4.12.0       glue_1.8.0         \n[49] reactR_0.6.1        stringi_1.8.7       gtable_0.3.6       \n[52] tibble_3.3.1        pillar_1.11.1       htmltools_0.5.9    \n[55] brio_1.1.5          R6_2.6.1            textshaping_1.0.5  \n[58] rprojroot_2.1.1     evaluate_1.0.5      lattice_0.22-9     \n[61] memoise_2.0.1       SparseArray_1.10.9  xfun_0.56          \n[64] fs_1.6.7            pkgconfig_2.0.3    \n```\n\n\n:::\n:::\n\n\n\n",
     "supporting": [
       "index_files"
     ],
diff --git a/posts/2025-10-25-tidySummarizedExperiment-optimization/index.qmd b/posts/2025-10-25-tidySummarizedExperiment-optimization/index.qmd
index 1330813a..b47182ba 100644
--- a/posts/2025-10-25-tidySummarizedExperiment-optimization/index.qmd
+++ b/posts/2025-10-25-tidySummarizedExperiment-optimization/index.qmd
@@ -6,7 +6,7 @@ contributors:
   - Michael Love
   - Justin Landis
   - Pierre-Paul Axisa
-date: "2025-10-25"
+date: "2026-03-22"
 package: tidySummarizedExperiment
 tags:
   - tidyomics/tidyomicsBlog