Pointer to covariates?
On Wed, 20 Feb 2002, Gabor Grothendieck wrote:
In the first line, use the dist function, found in library mva, to get the distance between each pair of covariate rows. From this, calculate an incidence matrix in which element i,j is true if the covariates in row i of dat equal those in row j (and false otherwise). In the second line, for each row, find the indices of the matching rows and take the minimum of those as the key.

    incid <- as.matrix(dist(dat[,-1],method="max"))==0
    keys <- apply(incid, 1, function(r) min(which(r)))
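As an illustration only, here is a sketch of the two steps described above, run on the three-row example from the original question. It assumes a current R release, where dist is available from the stats package without loading mva. The name keys_first is made up for this sketch.

```r
# Toy data from the original question: response y, covariates x1 and x2.
dat <- data.frame(y = c(1, 2, 3), x1 = c(1, 0, 1), x2 = c(0, 1, 0))

# Step 1: n x n logical matrix; TRUE where two rows have identical covariates
# (maximum distance of 0 means every covariate agrees).
incid <- as.matrix(dist(dat[, -1], method = "maximum")) == 0

# Step 2: for each row, the index of the first row with the same covariates.
keys_first <- apply(incid, 1, function(r) min(which(r)))

print(incid)
print(keys_first)
```

Note that incid has one row and one column per observation, so its size grows with the square of the number of rows in dat.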
Thank you very much! This is very fast, much faster than my attempts so far, but it has two drawbacks:

1. It gives pointers to first occurrences in the _original_ data frame, not the 'unique' version.
2. The first step results in a _huge_ matrix 'incid', too huge for my applications.

However, this is a promising first attempt, and I will try to refine the idea. Again, thanks!

Göran
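One possible way around both drawbacks, offered as a sketch and not as anything proposed in this thread: collapse each covariate row into a single string and look it up with match(), which needs memory proportional to the number of rows instead of an n x n matrix. The helper row_id is hypothetical, and the sketch assumes the covariates survive conversion to character without loss (numeric values are compared through their printed representation).

```r
# Toy data from the original question: response y, covariates x1 and x2.
dat <- data.frame(y = c(1, 2, 3), x1 = c(1, 0, 1), x2 = c(0, 1, 0))

# drop = FALSE keeps a data frame even when there is only one covariate.
covar <- unique(dat[, -1, drop = FALSE])
y <- dat[, 1]

# row_id() is a hypothetical helper (not from the thread): it pastes the
# columns of each row into one string so that rows can be compared by match().
row_id <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# For each row of dat, the row number in covar holding its covariates.
keys <- match(row_id(dat[, -1, drop = FALSE]), row_id(covar))

# Drawback 1 alone: turn first-occurrence indices in the original data frame
# into row numbers of the unique version. unique() keeps first occurrences
# in their original order, so match() against it gives the renumbering.
first_occ <- c(1, 2, 1)
keys_from_first <- match(first_occ, unique(first_occ))
```

With these definitions, covar[keys, ] reproduces the covariate columns of dat row by row, which is a convenient check on real data.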
--- Göran Broström <gb at stat.umu.se> wrote:
I have a dataframe 'dat' with one response and some covariates. Many observations (rows), but only a few unique combinations of the covariates. Let's say that the response is in column 1, and the covariates in columns 2:k. I want to do
    covar <- unique.data.frame(dat[, 2:k])
    y <- dat[, 1]
    keys <- ??????
where 'keys' should be a vector of length length(y) containing, for each response, the row number in 'covar' where it will find its covariates. Example:
    dat
      y x1 x2
    1 1  1  0
    2 2  0  1
    3 3  1  0

    unique.data.frame(dat[, 2:3])
      x1 x2
    1  1  0
    2  0  1

    keys
    1 1
    2 2
    3 1

But how do I get 'keys'?

-- 
Göran Broström                      tel: +46 90 786 5223
professor                           fax: +46 90 786 6614
Department of Statistics            http://www.stat.umu.se/egna/gb/
Umeå University
SE-90187 Umeå, Sweden               e-mail: gb at stat.umu.se

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._