Sim-vs-obs plots for the 3 BU SIPNET ensembles (#238)#31
Conversation
7816e40 to
f58370b
Compare
f58370b to
ce2a791
Compare
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
Signed-off-by: Aritra Dey <adey01027@gmail.com>
ce2a791 to
3749a56
Compare
divine7022
left a comment
There was a problem hiding this comment.
thanks for plots, highlevel
not totally sure on where it belongs at this moment but for sure not workflows.
for now i would suggest have this under benckmarking/ dir or similar in uncertainity repo
droped some inline, nice to have a look
| # ---------------------------------------------------------------------------- | ||
| WORKBOOK_DIR <- Sys.getenv( | ||
| "WORKBOOK_DIR", | ||
| unset = "/projectnb2/dietzelab/ccmmf/usr/adey2/workflows" |
There was a problem hiding this comment.
small thing, nothing blocking. right now WORKBOOK_DIR points at your scc folder and the script grabs the raw xlsx straight from there
i just got cal-val-data repo stood up where the cleaned cal/val data lives as csv. once that settles it would be nicer to read from those csvs instead of the workbook so anybody can run this. so no need to touch anything now, just leaving a note so we circle back and hook it up later
| # Read a single PEcAn-style NetCDF (one year file) and return a tibble with | ||
| # posix + the requested variable. Time is encoded in the standard CF way | ||
| # (units = "days since YYYY-01-01" or similar). | ||
| read_one_nc <- function(nc_path, variable) { |
There was a problem hiding this comment.
read_one_nc is hand rolling the netcdf read here and parsing cf time units by chopping the string apart.
any reason not to lean on read.output/ ncdf4 helpers we already have for pulling the time origin and doing the unit conversion ? feels like a lot less to get wrong than re rolling it ourselves
There was a problem hiding this comment.
came here to say this - PEcAn.utils::read.output()
more generally, there needs to be a strategy that 1) uses / improves existing PEcAn functionality and 2) writes new functions as if they are going to be merged into PEcAn at some point (see e.g. how I've organized the R folder in the downscaling repository).
We don't want to be held up by PEcAn review process, but we also don't want to re-invent the wheel and create one-off functions that aren't as robust or PEcAn-compatible as the ones that already exist.
There was a problem hiding this comment.
these plot pngs are checked into the repo, and since they get regenerated they will keep churning in the diffs and go stale fast. might be cleaner drop the plots folder and just let folks rebuild them from the script, that way repo only carries code and not the output
this totally fine now, in intent to get it documented for records is more priority
There was a problem hiding this comment.
Agreed, please remove these plots from the PR
There was a problem hiding this comment.
Please make sure to coordinate with the work that @ayushman1210 is doing in his GSOC project to make sure there isn't unnecessary duplication of work.
Other points
- observations appear to be stock measurement at a single time point, while the model states are calculated as annual means
- separate out dataset specific from generalized calculations, and separate calculations from plotting functions. e.g. read data --> compute metrics --> plot. Again, lets align with @ayushman1210.
- separate out workflow documentation and dataset specific assumptions
- add documentation (per ccmmf/organization#238).
Consider making suggested changes and then submitting as a PR to the uncertainty and cal-val-data repositories.
There was a problem hiding this comment.
Agreed, please remove these plots from the PR
| MAGIC_MAIN <- file.path(WORKBOOK_DIR, "MAGiC Calibration Validation Dataset(1).xlsx") | ||
| MAGIC_RUSSELL <- file.path(WORKBOOK_DIR, "MAGiC Calibration Validation Data_ Russell Ranch.xlsx") |
There was a problem hiding this comment.
update to use data in cal-val-data repo
|
|
||
| # Load ensemble output across all ENS-* dirs for one site and one variable. | ||
| # Returns a tibble: ens_num, posix, year, <variable>. | ||
| read_ensemble <- function(run_dir, variable, start_year, end_year) { |
There was a problem hiding this comment.
this functionality has been implemented multiple times; there is a PEcAn function PEcAn.utils::nc_merge_all_sites_by_year()
Before that function existed, I wrote similar functionality in
ccmmf/downscaling scripts/030_extract_sipnet_output.R - though I am not sure if @divine7022 has refactored that in one of the open PRs. It can output a single netcdf or long CSV.
| dplyr::filter(!is.na(value)) | ||
| } | ||
|
|
||
| # Derive Russell Ranch SOC stock (Mg C ha-1) from per-layer C% and BD. |
There was a problem hiding this comment.
datasets and dataset-specific harmonization scripts should live together in a single repository (e.g. cal-val-data).
We don't want to prematurely optimize, but keep in mind:
- writing a dataset specific function like this is appropriate for the first pass. and can be part of a dataset specific harmonization script as mentioned above. But general functionality should be extracted so that it doesn't have to be rewritten.
- when refactoring, consider breaking out specific conversion functions into utility functions in data.land (e.g. https://github.com/PecanProject/pecan/blob/develop/modules/data.land/R/soil_utils.R)
| } | ||
|
|
||
| # Derive Russell Ranch SOC stock (Mg C ha-1) from per-layer C% and BD. | ||
| # Per David's guidance (Slack): |
There was a problem hiding this comment.
| # Per David's guidance (Slack): |
| title = sprintf("%s — %s (%d–%d)", | ||
| site$name, site$treatment, | ||
| site$start_year, site$end_year), | ||
| subtitle = "Total Soil Carbon — derived obs (0-30 cm: C% x BD x depth, summed)\nAssumptions: total C = SOC (pH < 7); coarse fraction ignored (sandy loam)", |
There was a problem hiding this comment.
coarse fraction is not ignored; it is assumed to be zero based on soil type.
| for (site in SITES) { | ||
| if (site$short == "russell") { | ||
| # Russell Ranch workbook stores per-layer C% + BD, not SOC stock directly. | ||
| # Derive the 0-30 cm SOC stock per David's guidance (Slack), then plot. |
There was a problem hiding this comment.
don't need to include 'per David's guidance (Slack)' in the source code; this type of information goes in the PR description
| # 2. Compute SOC stock per layer = C% * BD * depth, then sum across layers. | ||
| # 3. Assumptions: | ||
| # - total C = SOC (pH < 7 at this site) | ||
| # - coarse fraction negligible (sandy loam, not mentioned in source text) |
There was a problem hiding this comment.
please cite source that supports this assumption
| posix <- as.POSIXct(origin_date) + t_vals * 3600 | ||
| } else { | ||
| posix <- as.POSIXct(origin_date) + t_vals * 86400 |
There was a problem hiding this comment.
use named constants, or, e.g.
year2s <- PEcAn.utils::ud_convert(1, 'y', 's')
| } | ||
|
|
||
| # Plot ensemble band + observed points. | ||
| plot_sim_vs_obs <- function(site, variable, label, obs_var, units_conv = 1, |
There was a problem hiding this comment.
lines 319 ff duplicate the core plotting block already implemented in plot_sim_vs_obs(). The special case seems to be how the observation dataframe is constructed.
Consider refactoring plot_sim_vs_obs() so it accepts an optional precomputed obs dataframe.
Sim-vs-obs plots for the 3 BU SIPNET ensembles (Salinas SOCS, Modesto Nichols, Russell Ranch).
Closes ccmmf/organization#238.
Salinas SOCS — TotSoilCarb
Modesto Nichols — N₂O flux
Russell Ranch — TotSoilCarb (derived)