Skip to content

Sim-vs-obs plots for the 3 BU SIPNET ensembles (#238)#31

Open
AritraDey-Dev wants to merge 7 commits into
mainfrom
cal-val-plots-238
Open

Sim-vs-obs plots for the 3 BU SIPNET ensembles (#238)#31
AritraDey-Dev wants to merge 7 commits into
mainfrom
cal-val-plots-238

Conversation

@AritraDey-Dev

@AritraDey-Dev AritraDey-Dev commented Jun 10, 2026

Copy link
Copy Markdown
Member

Sim-vs-obs plots for the 3 BU SIPNET ensembles (Salinas SOCS, Modesto Nichols, Russell Ranch).

Closes ccmmf/organization#238.

Salinas SOCS — TotSoilCarb

salinas

Modesto Nichols — N₂O flux

modesto

Russell Ranch — TotSoilCarb (derived)

image

@AritraDey-Dev AritraDey-Dev deleted the cal-val-plots-238 branch June 10, 2026 14:48
@AritraDey-Dev AritraDey-Dev restored the cal-val-plots-238 branch June 10, 2026 14:51
@AritraDey-Dev AritraDey-Dev reopened this Jun 10, 2026
@AritraDey-Dev AritraDey-Dev requested a review from dlebauer June 10, 2026 14:55
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>

@divine7022 divine7022 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for plots, highlevel
not totally sure on where it belongs at this moment but for sure not workflows.
for now i would suggest have this under benckmarking/ dir or similar in uncertainity repo

droped some inline, nice to have a look

# ----------------------------------------------------------------------------
WORKBOOK_DIR <- Sys.getenv(
"WORKBOOK_DIR",
unset = "/projectnb2/dietzelab/ccmmf/usr/adey2/workflows"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small thing, nothing blocking. right now WORKBOOK_DIR points at your scc folder and the script grabs the raw xlsx straight from there

i just got cal-val-data repo stood up where the cleaned cal/val data lives as csv. once that settles it would be nicer to read from those csvs instead of the workbook so anybody can run this. so no need to touch anything now, just leaving a note so we circle back and hook it up later

# Read a single PEcAn-style NetCDF (one year file) and return a tibble with
# posix + the requested variable. Time is encoded in the standard CF way
# (units = "days since YYYY-01-01" or similar).
read_one_nc <- function(nc_path, variable) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_one_nc is hand rolling the netcdf read here and parsing cf time units by chopping the string apart.

any reason not to lean on read.output/ ncdf4 helpers we already have for pulling the time origin and doing the unit conversion ? feels like a lot less to get wrong than re rolling it ourselves

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

came here to say this - PEcAn.utils::read.output()

more generally, there needs to be a strategy that 1) uses / improves existing PEcAn functionality and 2) writes new functions as if they are going to be merged into PEcAn at some point (see e.g. how I've organized the R folder in the downscaling repository).

We don't want to be held up by PEcAn review process, but we also don't want to re-invent the wheel and create one-off functions that aren't as robust or PEcAn-compatible as the ones that already exist.

@divine7022 divine7022 Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these plot pngs are checked into the repo, and since they get regenerated they will keep churning in the diffs and go stale fast. might be cleaner drop the plots folder and just let folks rebuild them from the script, that way repo only carries code and not the output

this totally fine now, in intent to get it documented for records is more priority

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, please remove these plots from the PR

@dlebauer dlebauer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to coordinate with the work that @ayushman1210 is doing in his GSOC project to make sure there isn't unnecessary duplication of work.

Other points

  • observations appear to be stock measurement at a single time point, while the model states are calculated as annual means
  • separate out dataset specific from generalized calculations, and separate calculations from plotting functions. e.g. read data --> compute metrics --> plot. Again, lets align with @ayushman1210.
  • separate out workflow documentation and dataset specific assumptions
  • add documentation (per ccmmf/organization#238).

Consider making suggested changes and then submitting as a PR to the uncertainty and cal-val-data repositories.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, please remove these plots from the PR

Comment on lines +25 to +26
MAGIC_MAIN <- file.path(WORKBOOK_DIR, "MAGiC Calibration Validation Dataset(1).xlsx")
MAGIC_RUSSELL <- file.path(WORKBOOK_DIR, "MAGiC Calibration Validation Data_ Russell Ranch.xlsx")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update to use data in cal-val-data repo


# Load ensemble output across all ENS-* dirs for one site and one variable.
# Returns a tibble: ens_num, posix, year, <variable>.
read_ensemble <- function(run_dir, variable, start_year, end_year) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this functionality has been implemented multiple times; there is a PEcAn function PEcAn.utils::nc_merge_all_sites_by_year()

Before that function existed, I wrote similar functionality in
ccmmf/downscaling scripts/030_extract_sipnet_output.R - though I am not sure if @divine7022 has refactored that in one of the open PRs. It can output a single netcdf or long CSV.

dplyr::filter(!is.na(value))
}

# Derive Russell Ranch SOC stock (Mg C ha-1) from per-layer C% and BD.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datasets and dataset-specific harmonization scripts should live together in a single repository (e.g. cal-val-data).

We don't want to prematurely optimize, but keep in mind:

  • writing a dataset specific function like this is appropriate for the first pass. and can be part of a dataset specific harmonization script as mentioned above. But general functionality should be extracted so that it doesn't have to be rewritten.
  • when refactoring, consider breaking out specific conversion functions into utility functions in data.land (e.g. https://github.com/PecanProject/pecan/blob/develop/modules/data.land/R/soil_utils.R)

}

# Derive Russell Ranch SOC stock (Mg C ha-1) from per-layer C% and BD.
# Per David's guidance (Slack):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Per David's guidance (Slack):

title = sprintf("%s — %s (%d–%d)",
site$name, site$treatment,
site$start_year, site$end_year),
subtitle = "Total Soil Carbon — derived obs (0-30 cm: C% x BD x depth, summed)\nAssumptions: total C = SOC (pH < 7); coarse fraction ignored (sandy loam)",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coarse fraction is not ignored; it is assumed to be zero based on soil type.

for (site in SITES) {
if (site$short == "russell") {
# Russell Ranch workbook stores per-layer C% + BD, not SOC stock directly.
# Derive the 0-30 cm SOC stock per David's guidance (Slack), then plot.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to include 'per David's guidance (Slack)' in the source code; this type of information goes in the PR description

# 2. Compute SOC stock per layer = C% * BD * depth, then sum across layers.
# 3. Assumptions:
# - total C = SOC (pH < 7 at this site)
# - coarse fraction negligible (sandy loam, not mentioned in source text)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please cite source that supports this assumption

Comment on lines +83 to +85
posix <- as.POSIXct(origin_date) + t_vals * 3600
} else {
posix <- as.POSIXct(origin_date) + t_vals * 86400

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use named constants, or, e.g.

year2s <- PEcAn.utils::ud_convert(1, 'y', 's')

}

# Plot ensemble band + observed points.
plot_sim_vs_obs <- function(site, variable, label, obs_var, units_conv = 1,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 319 ff duplicate the core plotting block already implemented in plot_sim_vs_obs(). The special case seems to be how the observation dataframe is constructed.

Consider refactoring plot_sim_vs_obs() so it accepts an optional precomputed obs dataframe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sim-vs-obs plots for Salinas+Modesto+Russell Ranch SIPNET ensembles

3 participants