Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 11/20/13 12:16 PM, "Noah Silverman" <noahsilverman at g.ucla.edu> wrote:
>Hello,
>
>I have a fairly large data.frame. (About 150,000 rows of 100
>variables.) There are case IDs, and multiple entries for each ID, with a
>date stamp. (i.e. records of people's activity.)
>
>
>I need to iterate over each person (record ID) in the data set, and then
>process their data for each date. The processing part is fast, the date
>part is fast. Locating the records is slow. I've even tried using
>data.table, with ID set as the index, and it is still slow.
>
>The line with the slow process (according to Rprof) is:
>
>
>j <- which( d$id == person )
>
>(I then process all the records indexed by j, which seems fast enough.)
>
>where d is my data.frame or data.table
>
>I thought that using the data.table indexing would speed things up, but
>not in this case.
>
>Any ideas on how to speed this up?
>
>
>Thanks!
>
>--
>Noah Silverman, M.S., C.Phil
>UCLA Department of Statistics
>8117 Math Sciences Building
>Los Angeles, CA 90095
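
A possible way around the slow lookup, as a minimal sketch only: the line
j <- which( d$id == person ) rescans the whole id column once per person, so
the cost grows with (number of people) x (number of rows). Computing all the
row-index groups in one pass with split() avoids the repeated scans. Only the
names d and id come from the question; the toy data, the value column, and the
process_person() function below are assumptions made up for illustration.

```r
# Toy stand-in for the poster's data. Only 'd' and 'id' come from the
# question; 'date' and 'value' are hypothetical columns for illustration.
set.seed(1)
n <- 20000
d <- data.frame(
  id    = sample(1:1000, n, replace = TRUE),
  date  = as.Date("2013-01-01") + sample(0:29, n, replace = TRUE),
  value = rnorm(n)
)

# Slow pattern from the question: one full scan of d$id per person.
#   for (person in unique(d$id)) { j <- which(d$id == person); ... }

# Alternative: build every person's row indices in a single pass.
# The result is a list named by id; each element holds that id's row numbers.
rows_by_id <- split(seq_len(nrow(d)), d$id)

# Hypothetical per-person processing: mean of 'value' for each date.
process_person <- function(j) tapply(d$value[j], d$date[j], mean)

# Apply the processing to each person's rows without searching d$id again.
result <- lapply(rows_by_id, process_person)
```

Each element of rows_by_id holds the same indices that which( d$id == person )
would return for that person, so the existing per-person code can be reused
with j taken from the list. If data.table is preferred, grouping with by = id
inside the table, rather than calling which() on a column in a loop, should
avoid the repeated scans in a similar way.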