reading in csv files, some of which have column names and some of which don't
Like Bert, I can't see an easy approach for datasets that have
character rather than numeric data. But here's a simple approach for
distinguishing files that have possible character headers but numeric
data.
readheader <- function(filename) {
  # Read only the first row, without treating it as a header
  possibleheader <- read.table(filename, nrows=1, sep=",", header=FALSE)
  # If every field in the first row is numeric, it is data, not column names
  if(all(sapply(possibleheader, is.numeric))) {
    # no header
    infile <- read.table(filename, sep=",", header=FALSE)
  } else {
    # has header
    infile <- read.table(filename, sep=",", header=TRUE)
  }
  infile
}
#### file noheader.csv ####
1,1,1
2,2,2
3,3,3
#### file hasheader.csv ####
a,b,c
1,1,1
2,2,2
3,3,3
########################
readheader("noheader.csv")
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3  3  3
readheader("hasheader.csv")
  a b c
1 1 1 1
2 2 2 2
3 3 3 3

Sarah
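
For files whose data are character rather than numeric, one possible variation is to compare the first row against the column names the files are expected to have, since the original question says all files share the same column structure. The following is only a sketch under that assumption: the helper name read_maybe_header and its expected_names argument are hypothetical, and the names "a", "b", "c" simply reuse the example files.

```r
# Hypothetical helper: read a csv that may or may not start with a header row.
# expected_names is an assumption: the column names a header row would contain.
read_maybe_header <- function(filename, expected_names) {
  # Read only the first row, as character, without treating it as a header
  first_row <- read.csv(filename, nrows = 1, header = FALSE,
                        colClasses = "character")
  first_values <- trimws(unlist(first_row, use.names = FALSE))

  # The file has a header if its first row matches the expected names
  # (case-insensitive); isTRUE() guards against NA in the comparison
  has_header <- length(first_values) == length(expected_names) &&
    isTRUE(all(tolower(first_values) == tolower(expected_names)))

  dat <- read.csv(filename, header = has_header, colClasses = "character")
  # Give every file the same column names so the pieces can be combined
  names(dat) <- expected_names
  dat
}

# Demo: recreate the two example files in a temporary directory
tmp <- tempdir()
files <- file.path(tmp, c("noheader.csv", "hasheader.csv"))
writeLines(c("1,1,1", "2,2,2", "3,3,3"), files[1])
writeLines(c("a,b,c", "1,1,1", "2,2,2", "3,3,3"), files[2])

# Read every file as character and stack the results
combined <- do.call(rbind, lapply(files, read_maybe_header,
                                  expected_names = c("a", "b", "c")))
```

This keeps every record, because a first row is dropped only when it matches the expected names. It would misfire if a data row happened to be identical to the column names, and it assumes every file has exactly as many columns as expected_names.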
On Tue, Aug 13, 2019 at 2:00 PM Christopher W Ryan <cryan at binghamton.edu> wrote:
Alas, we spend so much time and energy on data wrangling . . . .
I'm given a collection of csv files to work with---"found data". They arose
via saving Excel files to csv format. They all have the same column
structure, except that some were saved with column names and some were not.
I have a code snippet that I've used before to traverse a directory and
read into R all the csv files of a certain filename pattern within it, and
combine them all into a single dataframe:
library(dplyr)
## specify the csv files that I will want to access
files.to.read <- list.files(path = "H:/EH",
                            pattern = "WICLeadLabOrdersDone.+",
                            all.files = FALSE, full.names = TRUE,
                            recursive = FALSE, ignore.case = FALSE,
                            include.dirs = FALSE, no.. = FALSE)
## function to read csv files back in
read.csv.files <- function(filename) {
  bb <- read.csv(filename, colClasses = "character", header = TRUE)
  bb
}
## now read the csv files, as all character
b <- lapply(files.to.read, read.csv.files)
ddd <- bind_rows(b)
But this assumes that all files have column names in their first row. In
this case, some don't. Any advice how to handle it so that those with
column names and those without are read in and combined properly? The only
thing I've come up with so far is:
## function to read csv files back in
## Unfortunately, some of the csv files are saved with column headers, and
## some are saved without them.
## This presents a problem when defining the function to read them: header
## = TRUE or header = FALSE?
## The best solution I can think of as of 13 August 2019 is to use header =
## FALSE and skip the first row of every file. This will sacrifice one record
## from each csv of about 80 files
read.csv.files <- function(filename) {
  bb <- read.csv(filename, colClasses = "character", header = FALSE,
                 skip = 1)
  bb
}
This sacrifices about 80 out of about 1600 records. For my purposes in this
instance, this may be acceptable, but of course I'd rather not.
Thanks.
--Chris Ryan
Sarah Goslee (she/her) http://www.numberwright.com