---
title: "DOI Redux"
author: "Simon Goring"
date: "January 31, 2018"
output:
  html_document:
    includes:
      before_body: resources/header.html
      after_body: resources/footer.html
    code_folding: null
    keep_md: false
    mathjax: null
    self_contained: true
    number_sections: no
    highlight: null
    toc: no
    theme: yeti
    md_extensions: -autolink_bare_uris
---
```{r setup, include=FALSE}
source('R/validate_dois.R')
```
## Check DOIs
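This code first retrieves the DOIs that Neotoma has minted to date, extracts the Neotoma dataset ID from each DOI's `relatedIdentifier`, and caches the table as `neotomadois.csv`.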
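```{r check_dois}
# Retrieve the minted DOIs (validate_dois() is defined in R/validate_dois.R).
output <- validate_dois()
# Capture the dataset ID that follows "downloads/" in each relatedIdentifier.
output$neotoma_dataset <- stringr::str_match(output$relatedIdentifier, "downloads/(\\d*),")[, 2]
# Cache the results (positional file argument works with old and new readr).
readr::write_csv(output, './neotomadois.csv')
# To reuse the cached table instead of re-querying, uncomment:
# output <- readr::read_csv('./neotomadois.csv')
```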
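
The `str_match()` call keeps the digits between `downloads/` and the next comma in each `relatedIdentifier`. The display-only sketch below reproduces that extraction in base R with `regexec()` and `regmatches()`, using two made-up identifier strings; the real format of the `relatedIdentifier` values returned by `validate_dois()` is an assumption here.

```r
# Hypothetical relatedIdentifier values; real values may be formatted differently.
related <- c("https://example.org/data/downloads/1001,Other",
             "no dataset link here")

# Same pattern as the str_match() call above.
pattern <- "downloads/(\\d*),"
m <- regmatches(related, regexec(pattern, related))

# Element 2 of each match is the captured ID; a non-match yields NA,
# mirroring str_match(...)[, 2].
dataset_id <- vapply(m,
                     function(x) if (length(x) > 1) x[2] else NA_character_,
                     character(1))
dataset_id
```
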
This shows that `r nrow(output)` DOIs have been minted, but they are associated with only `r length(unique(output$neotoma_dataset))` unique datasets. The mismatch stems from a problem with the initial implementation and with the permissions on the Neotoma Windows Server.

The current implementation reports all DOIs for each dataset, since many of these duplicate DOIs share common underlying metadata. We will work towards cleaning up this issue in the future.
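
To see which datasets carry more than one DOI, the DOI table can be tabulated by `neotoma_dataset`. The display-only sketch below uses a small made-up table; the `identifier` column name and the DOI strings are hypothetical, while `neotoma_dataset` matches the column created in `check_dois`.

```r
# Made-up stand-in for the `output` table built above.
dois <- data.frame(
  identifier      = c("10.0000/a", "10.0000/b", "10.0000/c", "10.0000/d"),
  neotoma_dataset = c("1001", "1001", "1002", "1003"),
  stringsAsFactors = FALSE
)

# Number of DOIs minted for each dataset.
per_dataset <- table(dois$neotoma_dataset)

# Datasets with more than one DOI are the duplicates described above.
duplicated_ds <- names(per_dataset)[per_dataset > 1]

nrow(dois)                            # total DOIs
length(unique(dois$neotoma_dataset))  # unique datasets
duplicated_ds
```
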
```{r build_pages, echo = TRUE, warning = FALSE, message = FALSE, results = 'hide'}
# Dataset IDs that rendered successfully, and failures with their errors.
counter <- list(good = c(),
                bad  = list())

# IDs of every dataset in Neotoma (no pipe, so magrittr need not be loaded).
all_ds <- unlist(lapply(neotoma::get_dataset(),
                        function(x) x$dataset.meta$dataset.id))

end_point <- '.'
sitemap <- '<?xml version="1.0"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'

# Loop over the dataset IDs themselves, not their positions in all_ds.
for (ds_id in all_ds) {
  # One folder per dataset, e.g. ./1001/index.html
  dir.create(file.path(end_point, ds_id), showWarnings = FALSE)
  tester <- try(rmarkdown::render('static_page.Rmd',
                                  output_file = file.path(end_point, ds_id, 'index.html'),
                                  envir = globalenv(), quiet = TRUE))
  if (!inherits(tester, "try-error")) {
    counter$good <- c(counter$good, ds_id)
    # Keep one shared copy of the JS/CSS dependencies, taken from the first
    # successful page; copying into end_point yields ./index_files.
    if (!dir.exists(file.path(end_point, 'index_files'))) {
      file.copy(file.path(end_point, ds_id, 'index_files'), end_point, recursive = TRUE)
    }
    sitemap <- paste0(sitemap, '<url><loc>http://data.neotomadb.org/datasets/', ds_id, '/</loc></url>\n')
    # Remove this page's own copy of the JS dependencies . . .
    unlink(file.path(end_point, ds_id, 'index_files'),
           force = TRUE, recursive = TRUE)
    # . . . and replace it with a relative symlink to the shared copy.
    # Note, this needs to be run as an admin.
    aa <- system(paste0("ln -sr ", end_point, "/index_files/ ",
                        end_point, "/", ds_id, "/index_files"), intern = TRUE)
  } else {
    counter$bad[[length(counter$bad) + 1]] <- list(ds_id, tester)
  }
}

sitemap <- paste0(sitemap, '</urlset>')
writeLines(sitemap, "../sitemap.xml")
```
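
The loop keeps a single shared `index_files` folder and replaces each page's own copy with a relative symlink created by shelling out to `ln -sr`. The display-only sketch below builds the same layout with base R's `file.symlink()` in a temporary directory, using a made-up dataset ID and dependency file. It assumes a POSIX filesystem; on Windows, creating symlinks usually needs elevated privileges, which may be why the chunk notes that it must run as an admin.

```r
# Fresh throwaway directory standing in for `end_point`.
end_point <- tempfile("landing_pages")
dir.create(file.path(end_point, "index_files", "leaflet"), recursive = TRUE)
writeLines("/* shared dependency */",
           file.path(end_point, "index_files", "leaflet", "leaflet.css"))

# Hypothetical dataset folder whose own index_files copy has been removed.
ds_id <- 1001
dir.create(file.path(end_point, ds_id))

# A relative target is resolved from the link's folder:
# <end_point>/1001/index_files -> ../index_files -> <end_point>/index_files
link <- file.path(end_point, ds_id, "index_files")
ok <- file.symlink(file.path("..", "index_files"), link)

# Relative references in the page (index_files/...) now reach the shared copy.
file.exists(file.path(link, "leaflet", "leaflet.css"))
```
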
This produces `r length(counter$good)` dataset folders with valid landing pages in the Neotoma data landing page set, and `r length(counter$bad)` dataset IDs for which no landing page could be rendered.

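
Each entry in `counter$bad` holds the dataset ID and the `try-error` object returned by `rmarkdown::render()`. A minimal display-only sketch for turning those entries into a table of IDs and error messages, using two simulated failures (the IDs and messages are made up):

```r
# Simulated failures in the same shape as counter$bad: list(ds_id, try-error).
bad <- list(
  list(1005, try(stop("no site coordinates"), silent = TRUE)),
  list(1009, try(stop("pandoc error"), silent = TRUE))
)

# Each try-error carries the original condition, whose message is clean text.
failures <- data.frame(
  dataset = vapply(bad, function(x) x[[1]], numeric(1)),
  error   = vapply(bad,
                   function(x) conditionMessage(attr(x[[2]], "condition")),
                   character(1)),
  stringsAsFactors = FALSE
)
failures
```
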
The landing pages contain dynamic elements, including navigable Leaflet maps. The pages themselves are generated with R Markdown, by rendering `static_page.Rmd` once per dataset.

## Future Steps
Ideally, this platform will move to a more dynamic system that interacts with the database directly and continuously, rather than relying on pages rendered in a batch.