Message-ID: <fff7708f0902271447l31b12659lc29c86b109d2373f@mail.gmail.com>
Date: 2009-02-27T22:47:24Z
From: Andrew Ziem
Subject: Optimize for loop / find last record for each person
In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D700C7F857@NA-PA-VBE03.na.tibco.com>

On Fri, Feb 27, 2009 at 2:10 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Andrew, it makes it easier to help if you supply a typical
> input and expected output along with your code.  I tried
> your code with the following input:

I'll be careful to avoid these mistakes.  Also, I should not have named
the variable history, since that name is already taken by a built-in R
function, and I should have mentioned that the data is sorted with the
most recent dates first.  Talk about a bad day! :)

In my original message I left out this code, which runs before the for loop:

history["order"] <- NA
history[1, "order"] <- 1
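The for loop itself is not shown in this message, so the following is only a sketch under an assumption: that the loop numbered each person's records 1, 2, ... in their existing most-recent-first order, which is what initializing the first row's `order` to 1 suggests. If that assumption holds, base R's `ave()` can build the whole column without a loop; the record with `order == 1` is then each person's most recent one.

```r
# ASSUMPTION: the original loop (not shown) numbered each person's rows
# 1, 2, ... in their current order (person_id, then date_ descending).
history_ <- data.frame(person_id = c(1, 2, 2),
                       date_ = c("2009-01-01", "2009-02-03", "2009-02-02"),
                       x = c(0.01, 0.05, 0.06))

# ave() splits the row indices by person_id and applies seq_along() to
# each group, giving a within-person sequence number for every row.
history_$order <- ave(seq_along(history_$person_id), history_$person_id,
                      FUN = seq_along)
history_
```

Because `ave()` works on whole groups at once, its cost does not depend on an R-level loop over rows, which is usually where a row-by-row for loop loses time.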

Here's a sample data set:
history_ <- data.frame(person_id = c(1, 2, 2),
                       date_ = c("2009-01-01", "2009-02-03", "2009-02-02"),
                       x = c(0.01, 0.05, 0.06))
history_

Jorge's suggestion [1] works for me, and it seems much faster than my
for loop.  I simply adapted it by replacing Jorge's variable x with a
sequential identifier already in the database.
[1] https://stat.ethz.ch/pipermail/r-help/2009-February/189981.html

> The following function, f2, does what I think you are saying
> you want.  It sorts the data by person_id, breaking ties with
> date, and then selects the rows where the person_id entry does

My data is already sorted by the SQL database, like this:
 ORDER BY person_id, date_ DESC
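With rows in that order, each person's first row is their most recent record, so the "last record for each person" can be picked out without a loop. The sketch below uses base R's `duplicated()` on the sample data; it is one common idiom and not necessarily the exact code Jorge or William posted. It also shows how to reproduce the SQL ordering in R in case the data does not arrive pre-sorted, and an equivalent "person_id differs from the previous row" test in the spirit of the approach quoted above.

```r
history_ <- data.frame(person_id = c(1, 2, 2),
                       date_ = as.Date(c("2009-01-01", "2009-02-03", "2009-02-02")),
                       x = c(0.01, 0.05, 0.06))

# Only needed if rows do not already arrive in
# "ORDER BY person_id, date_ DESC" order: negating the numeric
# value of each Date sorts newest first within each person.
history_ <- history_[order(history_$person_id, -as.numeric(history_$date_)), ]

# In that order, the first row seen for each person_id is the most recent.
latest <- history_[!duplicated(history_$person_id), ]

# Same selection via neighbour comparison: keep row 1 and every row
# whose person_id differs from the row before it.
n <- nrow(history_)
is_first <- c(TRUE, history_$person_id[-1] != history_$person_id[-n])
stopifnot(identical(latest, history_[is_first, ]))

latest
```

For the sample data this keeps person 1's 2009-01-01 row and person 2's 2009-02-03 row.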

Thanks everyone for responding and expanding my knowledge of R!


Best regards,
Andrew