About populating a dataframe in a loop

7 messages · Richard M. Heiberger, jeremiah rounds, lily li +1 more

#
Hi R users,

I have a question about filling a dataframe in R using a for loop.

I created an empty dataframe first and then filled it, using the code:
pre.mat = data.frame()
for(i in 1:10){
    mat.temp = data.frame(some values filled in)
    pre.mat = rbind(pre.mat, mat.temp)
}
However, the resulting dataframe does not have all the rows that I expected.
What is the problem, and how can I solve it? Thanks.
#
Hello,

It works for me:

set.seed(6574)

pre.mat = data.frame()
for(i in 1:10){
     mat.temp = data.frame(x = rnorm(5), A = sample(LETTERS, 5, TRUE))
     pre.mat = rbind(pre.mat, mat.temp)
}

nrow(pre.mat)  # should be 50


Can you give us an example that doesn't work?

Rui Barradas

On 06-01-2017 18:00, lily li wrote:
#
Hi Rui,

Thanks for your reply. Yes, when I rbind two data frames, it works.
However, with more than 50 of them, it got stuck for hours. When I
terminated the process and opened the csv file separately, it contained
only one data frame. What is the problem? Thanks.
On Fri, Jan 6, 2017 at 11:12 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
Incrementally increasing the size of an array is not efficient in R.
The recommended technique is to allocate as much space as you will
need, and then fill it.
   user  system elapsed
  0.011   0.000   0.011
[1] 1001    5
   user  system elapsed
  0.001   0.000   0.001
[1] 1001    5
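
The code behind these timings is not shown here. As a rough sketch of the two approaches being contrasted (growing a data frame row by row versus allocating it first and then filling it), something like the following could be used. The helper `make_row`, the names `grown` and `filled`, and the five numeric columns are made up for illustration; this is not the original code, and timings will vary by machine.

```r
n <- 1000

# Hypothetical helper: one row of five numeric values for iteration i
make_row <- function(i) c(i, i^2, sqrt(i), log(i), 1 / i)

# Approach 1: grow the data frame one row at a time (copies it on every pass)
system.time({
  grown <- data.frame()
  for (i in seq_len(n)) {
    grown <- rbind(grown, as.data.frame(t(make_row(i))))
  }
})
dim(grown)

# Approach 2: allocate all rows up front (filled with NA), then fill in place
system.time({
  filled <- as.data.frame(matrix(NA_real_, nrow = n, ncol = 5))
  for (i in seq_len(n)) {
    filled[i, ] <- make_row(i)
  }
})
dim(filled)
```

With this kind of preallocation, any row that is never assigned simply keeps its NA placeholder, so it helps to allocate exactly as many rows as will be filled, or to drop the unused rows afterwards.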
On Fri, Jan 6, 2017 at 11:46 PM, lily li <chocold12 at gmail.com> wrote:
#
As a rule, never rbind in a loop. It has O(n^2) run time because each rbind
call can itself be O(n) (where n is the number of data.frames). Instead,
put them all into a list, built with lapply or preallocated with
vector("list", length=), and then combine them with data.table::rbindlist,
do.call(rbind, thelist), or the equivalent from dplyr (bind_rows). All of
these will be much more efficient.
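
A minimal sketch of that pattern, reusing the toy data from the earlier example in this thread; the names `pieces` and `pieces2` are made up for illustration:

```r
set.seed(1)
n <- 10

# Preallocate a list with one slot per data frame
pieces <- vector("list", length = n)
for (i in seq_len(n)) {
  # Toy per-iteration result: 5 rows, two columns
  pieces[[i]] <- data.frame(x = rnorm(5), A = sample(LETTERS, 5, TRUE))
}

# Combine once at the end instead of calling rbind n times
pre.mat <- do.call(rbind, pieces)
nrow(pre.mat)  # 50

# Same idea, with lapply building the list directly
pieces2 <- lapply(seq_len(n), function(i) data.frame(i = i, x = rnorm(5)))
pre.mat2 <- do.call(rbind, pieces2)
```

If the data.table or dplyr packages are installed, data.table::rbindlist(pieces) or dplyr::bind_rows(pieces) can replace the do.call(rbind, pieces) step.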
On Fri, Jan 6, 2017 at 8:46 PM, lily li <chocold12 at gmail.com> wrote:
#
Thanks, Richard. But if the data do not fill the preallocated data frame,
will the unfilled rows contain NA values?


On Fri, Jan 6, 2017 at 10:07 PM, Richard M. Heiberger <rmh at temple.edu> wrote:
#
Hello,

I believe you should follow Jeremiah's suggestion: first read all the csv
files into a list and then rbind them.
Something like the following:

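# pattern is a regular expression, so match a literal ".csv" at the end of the name
file_list <- list.files(pattern = "\\.csv$")
df_list <- lapply(file_list, read.csv)   # one data frame per file
result <- do.call(rbind, df_list)        # combine them in a single call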

Hope this helps,

Rui Barradas

On 07-01-2017 06:51, lily li wrote: