Hi!
OK, I'm trying to select some "useful outliers" from
my dataset. I defined 11 "threshold" values (one for each
level of a variable, the sampling site) as follows:
thresholds <- function(x)
{
  # per-site mean and standard deviation, ignoring NAs
  med <- tapply(x, mm$NAME, FUN = mean, na.rm = TRUE)
  standev <- tapply(x, mm$NAME, FUN = sd, na.rm = TRUE)
  standev + med
}
thresholds(mm$chl)
Now I'd like to select those values from the vector mm$chl
that are higher than each "threshold" value, but how
can I compare a vector with 1885 elements with one
that has only 11?
Sorry for this (probably) stupid question...
and thanks in advance.
Alessandro
vector vs array
6 messages · Adaikalavan Ramasamy, alessandro carletti, Dimitris Rizopoulos +1 more
General notes:
a) Please try to give a simple example.
b) Please avoid the rightwards assignment (i.e. "->"). Even though it is
perfectly legal, it is confusing, especially when you are
posting to a mailing list.
1) Here is a reproducible example
set.seed(1) # for reproducibility
v <- abs( rnorm(1000) )
thr <- c( 0.5, 1.0, 2.0, 3.0 )
2) If you simply want to count the number of points above a threshold
sapply( thr, function(x) sum(v > x) )
[1] 620 326 60 3
3) Or you can cut the data at the threshold limits (be careful at the edges
if you have discrete data) and tabulate the resulting intervals
table( cut( v, breaks=c( -Inf, thr, Inf ) ) )
(-Inf,0.5]    (0.5,1]      (1,2]      (2,3]    (3,Inf]
       380        294        266         57          3
4) If you want to turn the problem on its head and ask which
threshold would put 99%, 99.9% and 99.99% of the data below
it, you can use quantiles
quantile( v, c(0.99, 0.999, 0.9999) )
     99%    99.9%   99.99%
2.529139 3.056497 3.734899
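On the edge caveat in point 3, here is a small sketch of how cut() treats values that fall exactly on a threshold. The data (v_int) and the two thresholds are made up for illustration and are not from the thread; right is a standard argument of cut().

```r
# Made-up discrete data in which several values sit exactly on a threshold
v_int <- c(1, 2, 2, 3, 3, 3, 4)
thr   <- c(2, 3)

# Default (right = TRUE): intervals are (a, b], so a value equal to a
# threshold is counted in the interval below it
right_closed <- table(cut(v_int, breaks = c(-Inf, thr, Inf)))

# right = FALSE: intervals are [a, b), so a value equal to a threshold
# is counted in the interval above it
left_closed <- table(cut(v_int, breaks = c(-Inf, thr, Inf), right = FALSE))

# Strict "greater than" counts, as in point 2, for comparison
n_above <- sapply(thr, function(x) sum(v_int > x))

print(right_closed)
print(left_closed)
print(n_above)
```

With the default intervals the counts agree with the strict v > x comparison of point 2; with right = FALSE they correspond to v >= x instead, so the choice matters whenever data can equal a threshold.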
Regards, Adai
On Mon, 2005-08-08 at 08:34 -0700, alessandro carletti wrote:
[...]
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Ok, thanks, I'll try with a simpler example: I have a vector with 4 levels

dataframe 1
station temp
aaa     12
aaa     13
bbb     12
bbb     20
aaa     23
bbb     21
ccc     30
ccc     18
ddd     15
aaa     11
ddd     15
ddd     10

and a thresholds vector

station thr
aaa     20
bbb     18
ccc     25
ddd     10

I want to select from dataframe 1 each value (level by level) > its own threshold value. How to do it automatically? (vector temp and vector thr have different length)
Thanks
You could use something like this:
dat1 <- data.frame(station = rep(letters[1:5], 4),
                   temp = round(rnorm(20, 15, 3)))
dat2 <- data.frame(station = letters[1:5],
                   temp = round(rnorm(5, 15, 4)))
################
# after the merge, temp.x holds the values and temp.y the thresholds
dat <- merge(dat1, dat2, by = "station")
do.call("rbind", lapply(split(dat, dat$station), function(x){
    out <- x[x$temp.x > x$temp.y, ]
    if(nrow(out)) out else rep(NA, length(x))
}))
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "alessandro carletti" <alxmilton at yahoo.it>
To: "rHELP" <R-help at stat.math.ethz.ch>
Sent: Tuesday, August 09, 2005 9:58 AM
Subject: [R] more on vector vs array
[...]
If 'thr' were a vector with the stations as names, then you could do (untested):
above <- dataframe1[, 'temp'] > thr[as.character(dataframe1[, 'station'])]
Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
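A minimal sketch of this named-vector lookup, using the example data from the follow-up question. The names dataframe1 and thr follow the posts; everything in the second half (mm_demo, site_thr, outliers) is made-up stand-in data and naming, assumed only to illustrate that tapply() already returns a result named by group, so the same indexing should carry over to the 11 per-site thresholds of the original question.

```r
# Example data from the follow-up question
dataframe1 <- data.frame(
  station = c("aaa", "aaa", "bbb", "bbb", "aaa", "bbb",
              "ccc", "ccc", "ddd", "aaa", "ddd", "ddd"),
  temp    = c(12, 13, 12, 20, 23, 21, 30, 18, 15, 11, 15, 10)
)

# Thresholds as a vector named by station
thr <- c(aaa = 20, bbb = 18, ccc = 25, ddd = 10)

# Indexing by name expands the 4 thresholds to one per row (length 12)
thr_per_row <- thr[as.character(dataframe1$station)]

# TRUE where a value exceeds the threshold of its own station
above <- dataframe1$temp > thr_per_row
selected <- dataframe1[above, ]
print(selected)

# Hypothetical stand-in for the original 'mm' data (not from the thread)
set.seed(42)
mm_demo <- data.frame(
  NAME = rep(c("s1", "s2", "s3"), each = 10),
  chl  = rnorm(30, mean = 5, sd = 2)
)

# Per-site mean + sd; tapply() names the result by site
site_thr <- tapply(mm_demo$chl, mm_demo$NAME, mean, na.rm = TRUE) +
            tapply(mm_demo$chl, mm_demo$NAME, sd,   na.rm = TRUE)

# Same lookup: one threshold per row, then a plain vectorised comparison
is_out   <- as.vector(mm_demo$chl > site_thr[as.character(mm_demo$NAME)])
outliers <- mm_demo[is_out, ]
```

The as.character() call matters when station is a factor: indexing with a factor would use its integer codes rather than its labels.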
Nice one. But I think you could replace the last line (the one with
do.call) with the simpler
w <- which( dat[ ,2] > dat[ ,3] )
w
[1] 6 11 13 14 16 18 20
dat[ w, ]
   station temp.x temp.y
6        b     18     16
11       c     17     15
13       d     16     14
14       d     17     14
16       d     17     14
18       e     16     15
20       e     19     15
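The same merge-then-compare idea can be sketched on the example data from the follow-up question. The name thr_df and the use of column names instead of positions are assumptions made for illustration, not part of the posts above.

```r
# Example data from the follow-up question
dataframe1 <- data.frame(
  station = c("aaa", "aaa", "bbb", "bbb", "aaa", "bbb",
              "ccc", "ccc", "ddd", "aaa", "ddd", "ddd"),
  temp    = c(12, 13, 12, 20, 23, 21, 30, 18, 15, 11, 15, 10)
)

# Thresholds as a second data frame (hypothetical name 'thr_df')
thr_df <- data.frame(
  station = c("aaa", "bbb", "ccc", "ddd"),
  thr     = c(20, 18, 25, 10)
)

# merge() attaches the matching threshold to every row; the value columns
# have different names here, so no .x/.y suffixes are added
dat <- merge(dataframe1, thr_df, by = "station")

# which() keeps only the rows where the comparison is TRUE and
# silently skips any NA comparisons
w <- which(dat$temp > dat$thr)
above_thr <- dat[w, ]
print(above_thr)
```

Referring to the columns by name rather than by position keeps the comparison valid if the column order of the merged data frame changes.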
Thank you.
Regards, Adai
On Tue, 2005-08-09 at 10:19 +0200, Dimitris Rizopoulos wrote:
[...]