lapply or data.table to find a unit's previous transaction
William,
This is a general r-help question, not a high performance computing question,
and should be directed to r-help for more useful and relevant answers, and to
keep this list (and its archives) on topic.
Also, when you re-post to r-help, you will get a more helpful answer if you
have a fully reproducible example, per the posting guidelines.
Regards,
- Brian
On 06/02/2010 11:43 AM, William Rogers wrote:
I have a dataset of property transactions that inlcudes the transaction ID
(TranID), property ID (UnitID), and transaction date (TranDt). I need to
create a data frame (or data table) that includes the previous transaction
date, if one exists.
This is an easy problem in SQL, where I just run a subquery, but I'm trying
to make R my one-stop-shopping program. The following code works on a
subset of my data, but I can't run this on my full dataset becuase my
computer runs out of memory after about 30 minutes.
#First I create a list of all the previous transactions by unit
TranList<- as.matrix(Data$TranID, ncol= 1) #Matrix of transaction IDs
PreTran<- lapply(TranList, #Make a list of all the prevoius
transactions
function(x) (with(Data, #TranID= cat(UnitID, TranDt)
Data[
UnitID== substr(x, 1, 9)&
TranDt< Data[TranID== x, "TranDt"], ]
))
)
#I do get warnings about missing data because some transactions have no
predecessor.
#Some transactions have no previous transactions, others have many so I pick
the most recent
BeforeTran<- lapply(seq_along(PreTran), function(x) (
with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))
#I need to add the current transaction's TranID to the list so I can merge
later
BeforeTran<- lapply(seq_along(PreTran), function(x) (
transform(BeforeTran[[x]], TranID= TranList[x, 1])))
#Finally, I convert from a list to a data frame
BeforeTran<- do.call("rbind", BeforeTran)
#I have used a combination of data.table and for loops, but that seems
cheesey and doesn't preform much better.
library(data.table)
#First I create a list of all the previous transactions by unit
TranList<- vector(nrow(Data), mode= "list")
names(TranList)<- levels(Data$TranID)
#Convert my data frame to a data table
DataDT<- data.table(Data, key= "UnitID")
#Use a for loop and data.table to find the date of the previous transaction
for (i in levels(Data$TranID)) {
if (DataDT[J(substr(i, 1, 9))& #I run an IF statement to avoid 'missing
data' errors
TranDt<= (DataDT[TranID== i, TranDt]), #But it slows things down
length(TranDt)]> 1)
TranList[[i]]<- cbind(TranID= i, #This finds the last transaction date and
combines
DataDT[J(substr(i, 1, 9))& #the current transaction ID
TranDt< (DataDT[TranID== i, TranDt]),
list(TranDt= max(TranDt))])
}
#Finally, I convert from a list to a data table
BeforeTran<- do.call("rbind", TranList)
My intution says that this code doesn't take advantage of data.table's
attributes.
Are there any ideas out there? Thank you.
P.S. I've tried plyr and it does not help my memory problem.