Trouble retrieving the second largest value from each row of a data.frame
On Jul 24, 2010, at 9:27 PM, David Winsemius wrote:
On Jul 24, 2010, at 8:09 PM, David Winsemius wrote:
On Jul 24, 2010, at 4:54 PM, <mpward at illinois.edu> wrote:
THANKS, but I have one issue and one question. For some reason the "secondstrongest" values for rows 3 and 6 are incorrect (they are the strongest); the remaining 10 are correct?
In my run of Wiley's code I instead get identical values for rows 2, 5 and 6. Holtman's and my solutions did not suffer from that defect, although mine suffered from my misreading of your request, thinking that you wanted the top 3. The fix is trivial.
These data are being used to track radio-tagged birds; they are from automated radio telemetry receivers. I will be applying the following formula:
diff <- ((strongest - secondstrongest)/100)
bearingdiff <- 30-(-0.0624*(diff**2))-(2.8346*diff)
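As a quick check of the formula above, here is a minimal sketch for row 1 of the example data (strongest = -11072, secondstrongest = -11707). The name sig.diff is only an illustration, chosen so that the base function diff() is not masked; ^ is R's power operator and is equivalent to the ** used in the formula.

```r
# Row 1 of the example data (values taken from the thread)
strongest <- -11072
secondstrongest <- -11707

# Signal difference scaled by 100, as in the posted formula
sig.diff <- (strongest - secondstrongest) / 100

# Bearing offset from the strongest antenna, per the posted formula
bearingdiff <- 30 - (-0.0624 * sig.diff^2) - (2.8346 * sig.diff)
bearingdiff
```

Because the arithmetic is vectorized, the same two lines should work unchanged when strongest and secondstrongest are whole columns rather than single numbers.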
vals <- c("value0", "value60", "value120", "value180", "value240",
"value300")
value.str2 <- (match(yourdata$secondstrongestantenna, vals)-1)*60
Had a misspelling ... rather: match(yourdata$secondstrongantenna, vals)
value.str1 <- (match(yourdata$strongestantenna, vals)-1)*60
change.ind <- abs(match(yourdata$strongestantenna, vals) - which(match(yourdata$secondstrongantenna, vals) )
OOOPs, that should have been:
change.ind <- abs(match(yourdata$strongestantenna, vals) - match(yourdata$secondstrongantenna, vals))
A) Then the bearingdiff is added to the strongestantenna (value0 = 0 degrees) if the secondstrongestantenna is greater (e.g. value0 and value60),
B) or if the secondstrongestantenna is smaller than the strongestantenna, then the bearingdiff is subtracted from the strongestantenna.
yourdata$finalbearing <- with(yourdata, ifelse (value.str2>value.str1, bearingdiff+value.str1, value.str1- bearingdiff) )
C) The only exception is that if value0 (0 degrees) is strongest and value300 (300 degrees) is the secondstrongestantenna, then the bearing is 360-bearingdiff.
yourdata$finalbearing <- with(yourdata, ifelse (strongestantenna == "value0" & secondstrongantenna == "value300", 360- bearingdiff, finalbearing) );
D) Also the strongestantenna and secondstrongestantenna have to be next to each other (e.g. value0 with value60, value240 with value300, value0 with value300) or the results should be NA.
After setting finalbearing with A, B, and C then:
yourdata$finalbearing <- with(yourdata, ifelse (
change.ind <5 & change.ind > 1 ,
NA, finalbearing) )
Better result with proper creation of value.str2:
yourdata
   strongest secondstrongest strongestantenna secondstrongantenna finalbearing
1     -11072          -11707         value120             value60    105.48359
2     -11176          -11799         value120            value180    134.76237
3     -11113          -11778         value120             value60    106.09061
4     -11071          -11561         value120            value240           NA
5     -11067          -11638         value120            value180    135.84893
6     -11068          -11698           value0             value60     14.61868
7     -11092          -11607         value120            value240           NA
8     -11061          -11426         value120            value240           NA
9     -11137          -11736         value120             value60    104.74034
10    -11146          -11779         value300              value0    285.44272
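For anyone who wants to rerun steps A to D in one go, here is a self-contained sketch that puts the pieces of this thread together on three rows of the example data (rows 1, 2 and 4). It assumes the four columns already exist with the names used above; idx1, idx2 and sig.diff are illustrative names that are not in the original posts.

```r
# Rows 1, 2 and 4 of the example data, typed in by hand
yourdata <- data.frame(
  strongest           = c(-11072, -11176, -11071),
  secondstrongest     = c(-11707, -11799, -11561),
  strongestantenna    = c("value120", "value120", "value120"),
  secondstrongantenna = c("value60", "value180", "value240"),
  stringsAsFactors    = FALSE
)
vals <- c("value0", "value60", "value120", "value180", "value240", "value300")

# Position of each antenna in vals, then converted to degrees (0, 60, ..., 300)
idx1 <- match(yourdata$strongestantenna, vals)
idx2 <- match(yourdata$secondstrongantenna, vals)
value.str1 <- (idx1 - 1) * 60
value.str2 <- (idx2 - 1) * 60
change.ind <- abs(idx1 - idx2)

# Bearing offset from the posted formula
sig.diff <- (yourdata$strongest - yourdata$secondstrongest) / 100
bearingdiff <- 30 - (-0.0624 * sig.diff^2) - (2.8346 * sig.diff)

# A and B: add or subtract the offset depending on which antenna is larger
finalbearing <- ifelse(value.str2 > value.str1,
                       value.str1 + bearingdiff,
                       value.str1 - bearingdiff)
# C: value0 strongest with value300 second strongest
finalbearing <- ifelse(yourdata$strongestantenna == "value0" &
                         yourdata$secondstrongantenna == "value300",
                       360 - bearingdiff, finalbearing)
# D: antennas that are not neighbours give NA
finalbearing <- ifelse(change.ind > 1 & change.ind < 5, NA, finalbearing)

yourdata$finalbearing <- finalbearing
yourdata
```

The first two bearings should agree with rows 1 and 2 of the table above, and the third should be NA because value120 and value240 are not adjacent.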
I have been trying to use a series of if,else statements to produce these bearings,
ifelse is the correct construct for processing vectors -- David.
but all I am producing is errors. Any suggestions would be appreciated.
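To illustrate the remark about ifelse, here is a small sketch with made-up numbers (x is not from the data). A plain if() expects a single TRUE or FALSE, so handing it a whole column gives a warning or an error depending on the R version, whereas ifelse() tests every element and returns a vector of the same length.

```r
x <- c(10, 200, 30)

# ifelse() evaluates the condition element by element:
# where x > 100 it returns x - 100, elsewhere x + 100
res <- ifelse(x > 100, x - 100, x + 100)
res
# 110 100 130
```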
Again THANKS for your efforts.
Mike
---- Original message ----
Date: Fri, 23 Jul 2010 23:01:56 -0700
From: Joshua Wiley <jwiley.psych at gmail.com>
Subject: Re: [R] Trouble retrieving the second largest value from
each row of a data.frame
To: mpward at illinois.edu
Cc: r-help at r-project.org
Hi,
Here is a little function that will do what you want and return a
nice output:
#Function To calculate top two values and return
my.finder <- function(mydata) {
  my.fun <- function(data) {
    strongest <- which.max(data)
    secondstrongest <- which.max(data[-strongest])
    strongestantenna <- names(data)[strongest]
    secondstrongantenna <- names(data[-strongest])[secondstrongest]
    value <- matrix(c(data[strongest], data[secondstrongest],
                      strongestantenna, secondstrongantenna), ncol =4)
    return(value)
  }
  dat <- apply(mydata, 1, my.fun)
  dat <- t(dat)
  dat <- as.data.frame(dat, stringsAsFactors = FALSE)
  colnames(dat) <- c("strongest", "secondstrongest",
                     "strongestantenna", "secondstrongantenna")
  dat[ , "strongest"] <- as.numeric(dat[ , "strongest"])
  dat[ , "secondstrongest"] <- as.numeric(dat[ , "secondstrongest"])
  return(dat)
}
#Using your example data:
yourdata <- structure(list(value0 = c(-13007L, -12838L, -12880L,
-12805L,
-12834L, -11068L, -12807L, -12770L, -12988L, -11779L), value60 =
c(-11707L,
-13210L, -11778L, -11653L, -13527L, -11698L, -14068L, -11665L,
-11736L, -12873L), value120 = c(-11072L, -11176L, -11113L, -11071L,
-11067L, -12430L, -11092L, -11061L, -11137L, -12973L), value180 =
c(-12471L,
-11799L, -12439L, -12385L, -11638L, -12430L, -11709L, -12373L,
-12570L, -12537L), value240 = c(-12838L, -13210L, -13089L, -11561L,
-13527L, -12430L, -11607L, -11426L, -13467L, -12973L), value300 =
c(-13357L,
-13845L, -13880L, -13317L, -13873L, -12814L, -13025L, -12805L,
-13739L, -11146L)), .Names = c("value0", "value60", "value120",
"value180", "value240", "value300"), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
my.finder(yourdata) #and what you want is in a nicely labeled data frame
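A note on the defect reported in the follow-ups: in my.fun, which.max(data[-strongest]) returns a position within the shortened vector, but the value is then looked up as data[secondstrongest] in the full vector. When the second strongest column lies to the right of the strongest one, that lookup is off by one, which appears to explain the repeated values in rows 2, 5 and 6. Below is one possible way around it, sketched with order(); find.top.two is a hypothetical name, not the fix posted by anyone in this thread.

```r
# Sketch: top two values per row and their column names, using order()
find.top.two <- function(mydata) {
  m <- as.matrix(mydata)
  rows <- seq_len(nrow(m))
  # For each row, column positions sorted from largest to smallest value;
  # keep only the first two positions
  ord <- t(apply(m, 1, order, decreasing = TRUE))[, 1:2, drop = FALSE]
  data.frame(
    strongest           = m[cbind(rows, ord[, 1])],
    secondstrongest     = m[cbind(rows, ord[, 2])],
    strongestantenna    = colnames(m)[ord[, 1]],
    secondstrongantenna = colnames(m)[ord[, 2]],
    stringsAsFactors    = FALSE
  )
}

# Rows 2 and 6 of the example data, typed in by hand
two.rows <- data.frame(
  value0   = c(-12838, -11068), value60  = c(-13210, -11698),
  value120 = c(-11176, -12430), value180 = c(-11799, -12430),
  value240 = c(-13210, -12430), value300 = c(-13845, -12814)
)
res <- find.top.two(two.rows)
res
```

Both positions index the original row, so the value and the label always refer to the same column. Ties are resolved in favour of the leftmost column, which may or may not be what is wanted.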
#A potential problem is that it is not very efficient
#Here is a test using a matrix of 100,000 rows
#sampled from the same range as your data
#with the same number of columns
data.test <- matrix(
sample(seq(min(yourdata),max(yourdata)), size = 500000, replace =
TRUE),
ncol = 5)
system.time(my.finder(data.test))
#On my system I get
system.time(my.finder(data.test))
   user  system elapsed
   2.89    0.00    2.89

Hope that helps,

Josh

On Fri, Jul 23, 2010 at 6:20 PM, <mpward at illinois.edu> wrote:
I have a data frame with a couple million lines and want to retrieve the largest and second largest values in each row, along with the label of the column these values are in. For example, row 1:
strongest=-11072
secondstrongest=-11707
strongestantenna=value120
secondstrongantenna=value60

Below is the code I am using and a truncated data.frame. Retrieving the largest value was easy, but I have been getting errors every way I have tried to retrieve the second largest value. I have not even tried to retrieve the labels for the value yet. Any help would be appreciated.

Mike
data<- data.frame(value0,value60,value120,value180,value240,value300)
data
   value0 value60 value120 value180 value240 value300
1  -13007  -11707   -11072   -12471   -12838   -13357
2  -12838  -13210   -11176   -11799   -13210   -13845
3  -12880  -11778   -11113   -12439   -13089   -13880
4  -12805  -11653   -11071   -12385   -11561   -13317
5  -12834  -13527   -11067   -11638   -13527   -13873
6  -11068  -11698   -12430   -12430   -12430   -12814
7  -12807  -14068   -11092   -11709   -11607   -13025
8  -12770  -11665   -11061   -12373   -11426   -12805
9  -12988  -11736   -11137   -12570   -13467   -13739
10 -11779  -12873   -12973   -12537   -12973   -11146

#largest value in the row
strongest<-apply(data,1,max)
#second largest value in the row
n<-function(data)(1/(min(1/(data[1,]-max(data[1,]))))+ (max(data[1,])))
secondstrongest<-apply(data,1,n)
Error in data[1, ] : incorrect number of dimensions
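For context on this error: apply(data, 1, n) hands each row to n as a plain vector, so data[1, ] inside the function asks for two dimensions on something that has only one. A minimal sketch of a row-wise second largest value that works on a vector is given below; second.largest and small are illustrative names, and the two rows are rows 1 and 2 of the example data.

```r
# Second largest element of a vector: sort in decreasing order, take the 2nd
second.largest <- function(x) sort(x, decreasing = TRUE)[2]

# Rows 1 and 2 of the example data, typed in by hand
small <- data.frame(
  value0   = c(-13007, -12838), value60  = c(-11707, -13210),
  value120 = c(-11072, -11176), value180 = c(-12471, -11799),
  value240 = c(-12838, -13210), value300 = c(-13357, -13845)
)

# apply() passes each row as a vector, so no [1, ] indexing is needed
secondstrongest <- unname(apply(small, 1, second.largest))
secondstrongest
```

Sorting every row is simple rather than fast, so on a couple million rows it may be worth timing this against the other approaches in the thread.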
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/