how to make the code more efficient using lapply
Hi Stephen,
I am not sure that the "for loop" is the source of slowness.
You seem to be doing a lot of unnecessary work each time through the loop.
e.g. no need to check if it's the last file, just move that section outside
of the loop.
It will be executed when the loop finishes. As it is you are calling
list.files() each time
through the loop which could be slow.
In any case here's a possible way to do it. Warning: untested!
f <- function(fn) {
temp<-read_xlsx(fn,sheet=1,range=cell_cols(c(1,30,38:42)))
temp<-temp[temp$Id %in% c("geneA","geneB","geneC"),]
}
myL <- lapply( X=list.files(), FUN=f )
temp.df.all<-do.call("rbind",myL)
names(temp.df.all)<-gsub("^.*] ","",names(temp.df.all))
write_xlsx(temp.df.all, path="output.xlsx")
HTH,
Eric
On Fri, May 25, 2018 at 9:24 AM, Stephen HonKit Wong <stephen66 at gmail.com>
wrote:
Dear All,
I have a following for-loop code which is basically intended to read in
many excel files (each file has many columns and rows) in a directory and
extract the some rows and columns out of each file and then combine them
together into a dataframe. I use for loop which can do the work but quite
slow. How to make it faster using lapply function ? Thanks in advance!
temp.df<-c() # create an empty list to store the extracted result from each
excel file inside for-loop
for (i in list.files()) { # loop through each excel file in the directory
temp<-read_xlsx(i,sheet=1,range=cell_cols(c(1,30,38:42))) # from
package
"readxl" to read in excel file
temp<-temp[grep("^geneA$|^geneB$|^geneC$",temp$Id),] # extract rows
based on temp$id
names(temp)<-gsub("^.*] ","",names(temp)) # clean up column names
temp.df<-append(temp.df, list(as.data.frame(temp))) # change the
dataframe to list, so it can be append to list.
if (i == list.files()[length(list.files())]){ # if it is last excel
file,
then combine all the rows in the list into a dataframe because they all
have same column names
temp.df.all<-do.call("rbind",temp.df)
write_xlsx(temp.df.all, path="output.xlsx") # write_xlsx from package
writexl.
}
}
*Stephen*
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/ posting-guide.html and provide commented, minimal, self-contained, reproducible code.