
Saving to CSV does not quote text fields, leading to data corruption #85

@gavinsimpson


Consider https://apps.neotomadb.org/explorer/?siteids=26991, and in particular the pollen data set https://data.neotomadb.org/47520.

The Name field for this data set contains, for example in row 9, the taxon name "?Microthyrium (type 8, HdV)", and there are others like it in this file. The comma in the name corrupts the data when the downloaded file is read back in, because the CSV file is created without either escaping the comma delimiter or quoting text columns. Quoting text columns would seem to be the easiest solution.
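To illustrate why quoting fixes this, here is a minimal base-R sketch. It uses a small hypothetical data frame (not the Neotoma file itself) containing the same taxon name, writes it once with `quote = FALSE` to mimic the current download and once with `write.csv()`'s default quoting, then compares the two.

```r
# Hypothetical toy data: one taxon name contains the delimiter, as in row 9
df <- data.frame(
  name  = c("?Microthyrium (type 8, HdV)", "Pinus"),
  count = c(3L, 120L),
  stringsAsFactors = FALSE
)

# Mimic the current export: no quoting, so the comma in the name
# becomes an extra field separator on that line
unquoted <- capture.output(write.csv(df, quote = FALSE, row.names = FALSE))

# write.csv() quotes character columns by default
quoted <- capture.output(write.csv(df, row.names = FALSE))

# Naive comma split: the unquoted taxon line has one field too many
unquoted_fields <- lengths(strsplit(unquoted, ","))
print(unquoted_fields)
# [1] 2 3 2

# The quoted text round-trips: the name comes back intact
roundtrip <- read.csv(text = quoted, stringsAsFactors = FALSE)
print(identical(roundtrip$name, df$name))
# [1] TRUE
```

Quoting only the fields that need it (those containing a comma, a double quote, or a line break, as RFC 4180 describes) would work equally well; quoting every text column is simply the least error-prone option for the exporter.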

If you download the CSV file for the pollen data set and try to read it into R with, for example:

```r
library(readr)
pollen <- read_csv("dataset47520_site26991.csv")
```

You'll see the problem:

```
Warning message:
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
> problems(pollen)
# A tibble: 10 × 5
     row   col expected    actual      file 
   <int> <int> <chr>       <chr>       <chr>
 1     9   202 201 columns 202 columns ""   
 2    13   202 201 columns 202 columns ""   
 3    14   202 201 columns 202 columns ""   
 4    22   202 201 columns 202 columns ""   
 5    52   202 201 columns 202 columns ""   
 6    53   202 201 columns 202 columns ""   
 7    57   202 201 columns 202 columns ""   
 8    80   202 201 columns 202 columns ""   
 9   123   202 201 columns 202 columns ""   
10   124   202 201 columns 202 columns ""   
```
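Another way to locate the affected lines, independent of the readr warnings, is base R's `count.fields()`, which counts delimited fields per line. The sketch below runs it on a few hypothetical lines standing in for the download; on the real file you would pass the file path instead. Lines whose field count differs from the header line are the ones where an unquoted comma has split a taxon name.

```r
# Hypothetical lines standing in for the downloaded CSV
csv_lines <- c(
  "name,count",
  "?Microthyrium (type 8, HdV),3",
  "Pinus,120"
)

con <- textConnection(csv_lines)
# Count comma-separated fields per line, treating only double quotes as quotes
n_fields <- count.fields(con, sep = ",", quote = "\"")
close(con)

# Line numbers (header = line 1) whose field count differs from the header
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
# [1] 2
```

On the actual download this would flag the lines with 202 fields against the 201-field header, matching what `problems()` reports above.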
