vector vs array

6 messages · Adaikalavan Ramasamy, alessandro carletti, Dimitris Rizopoulos +1 more

#
Hi!
OK, I'm trying to select some "useful outliers" from
my dataset: I defined 11 "threshold" values (1 for each
level of a variable, the sampling site) as follows:


tresholds<-function(x)
{
tapply(x,mm$NAME,FUN=mean ,simplify = T, na.rm=T)->med


tapply(x,mm$NAME,FUN=sd ,simplify = T,
na.rm=T)->standev 

standev+med

}
tresholds(mm$chl)


Now I'd like to select those values from vector mm$chl
that are higher than their own "threshold value", but how
can I compare a vector with 1885 elements with one that
has 11?
Sorry for this (probably) stupid question...
and thanks in advance.
Alessandro
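
[Note] One way to line up the 11 per-site thresholds with the 1885 observations is to expand each threshold to observation length, so that both vectors have the same length. Below is a minimal sketch using ave(); the small `mm` is a made-up stand-in for the real data, and `thr_by_obs` is a name introduced here for illustration only.

```r
# Toy stand-in for mm (hypothetical values; the real data has 1885 rows, 11 sites)
mm <- data.frame(
  NAME = rep(c("s1", "s2"), each = 5),
  chl  = c(1, 2, 3, 4, 10,   5, 5, 5, 5, NA)
)

# Per-observation threshold: mean + sd of the observation's own site,
# repeated for every row belonging to that site
thr_by_obs <- ave(mm$chl, mm$NAME,
                  FUN = function(z) mean(z, na.rm = TRUE) + sd(z, na.rm = TRUE))

# Rows whose chl exceeds the threshold of their own site;
# which() drops the NA comparisons
mm[which(mm$chl > thr_by_obs), ]
```

Because ave() returns a vector as long as its input, the comparison is element by element and no explicit loop over sites is needed.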
#
General Notes :
a) Please try to give a simple example
b) Please avoid the rightwards assignment (i.e. "->"). Even though it is
perfectly legal, it is confusing, especially when you are
posting to a mailing list.


1) Here is a reproducible example

 set.seed(1)                         # for reproducibility
 v   <- abs( rnorm(1000) )
 thr <- c( 0.5, 1.0, 2.0, 3.0 )


2) If you simply want to count the number of points above a threshold

 sapply( thr, function(x) sum(v > x) )
 [1] 620 326  60   3


3) Or you can cut the data at the threshold limits (be careful at the edges
if you have discrete data) and then tabulate the resulting intervals

 table( cut( v, breaks=c( -Inf, thr, Inf ) ) )

 (-Inf,0.5]    (0.5,1]      (1,2]      (2,3]    (3,Inf]
        380        294        266         57          3
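
On the caution about edges: by default cut() builds right-closed intervals, so a value exactly equal to a threshold is counted in the lower bin. A small sketch with made-up discrete values (`d`, `brk`, `tab_default` and `tab_left` are hypothetical names, not from the original post):

```r
d   <- c(1, 2, 2, 3, 3, 3)     # hypothetical discrete data
brk <- c(-Inf, 2, 3, Inf)      # thresholds at 2 and 3

# Default (right = TRUE): intervals are (a, b], so the 2s fall in the first bin
tab_default <- table(cut(d, breaks = brk))

# right = FALSE: intervals are [a, b), so the 2s fall in the second bin
tab_left <- table(cut(d, breaks = brk, right = FALSE))

tab_default
tab_left
```

With continuous data such as rnorm() output, ties with a threshold are practically impossible, so the choice only matters for discrete or rounded values.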

 
4) If you want to turn the problem on its head and ask which
threshold would leave 99%, 99.9% and 99.99% of the data below
it, you can use quantiles

 quantile( v, c(0.99, 0.999, 0.9999) )
      99%    99.9%   99.99%
 2.529139 3.056497 3.734899


Regards, Adai
On Mon, 2005-08-08 at 08:34 -0700, alessandro carletti wrote:
#
Ok, thanks,
I'll try with a simpler example:

I have a vector with 4 levels

dataframe 1
station   temp
aaa        12
aaa        13
bbb        12
bbb        20
aaa        23
bbb        21
ccc        30
ccc        18
ddd        15
aaa        11
ddd        15
ddd        10


and a thresholds vector

station    thr  
aaa         20
bbb         18
ccc         25
ddd         10


I want to select from dataframe 1 each value (level by
level) that is > its own threshold value.
How can I do it automatically? (vector temp and vector
thr have different lengths)

Thanks
#
you could use something like this:

dat1 <- data.frame(station = rep(letters[1:5], 4),
                   temp = round(rnorm(20, 15, 3)))
dat2 <- data.frame(station = letters[1:5],
                   temp = round(rnorm(5, 15, 4)))
################
dat <- merge(dat1, dat2, by = "station")
do.call("rbind", lapply(split(dat, dat$station), function(x){
        out <- x[x$temp.x > x$temp.y, ]
        if(nrow(out)) out else rep(NA, length(x))
    }))


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "alessandro carletti" <alxmilton at yahoo.it>
To: "rHELP" <R-help at stat.math.ethz.ch>
Sent: Tuesday, August 09, 2005 9:58 AM
Subject: [R] more on vector vs array
#
If 'thr' were a vector with the stations as names,
then you could do (untested):

above <- dataframe1[, 'temp'] > thr[as.character(dataframe1[, 'station'])]
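
For what it is worth, here is that idea tried on the station/temp example posted earlier in the thread. This is only a sketch: the data frame and the named vector are typed in by hand from that message, and the object names `dataframe1` and `thr` are assumed from the line above.

```r
# Example data copied from the earlier message
dataframe1 <- data.frame(
  station = c("aaa", "aaa", "bbb", "bbb", "aaa", "bbb",
              "ccc", "ccc", "ddd", "aaa", "ddd", "ddd"),
  temp    = c(12, 13, 12, 20, 23, 21, 30, 18, 15, 11, 15, 10)
)

# Thresholds as a vector named by station
thr <- c(aaa = 20, bbb = 18, ccc = 25, ddd = 10)

# Indexing thr by station name repeats each threshold once per row,
# so both sides of the comparison have 12 elements
above <- dataframe1[, "temp"] > thr[as.character(dataframe1[, "station"])]

dataframe1[above, ]
```

The as.character() call matters when station is a factor: indexing by a factor would use its integer codes rather than its labels.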

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
alessandro carletti wrote:

            
#
Nice one. But I think you could replace the last line (the one with
do.call) with the simpler

 w <- which( dat[ ,2] > dat[ ,3] )
 w
 [1]  6 11 13 14 16 18 20

 dat[ w, ]
    station temp.x temp.y
 6        b     18     16
 11       c     17     15
 13       d     16     14
 14       d     17     14
 16       d     17     14
 18       e     16     15
 20       e     19     15
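
The same merge-then-compare idea can be written with column names instead of column positions. A sketch on the station/temp data posted earlier in the thread (the names `temps`, `limits` and `both` are made up here for illustration):

```r
# Observations and per-station thresholds, copied from the earlier message
temps <- data.frame(
  station = c("aaa", "aaa", "bbb", "bbb", "aaa", "bbb",
              "ccc", "ccc", "ddd", "aaa", "ddd", "ddd"),
  temp    = c(12, 13, 12, 20, 23, 21, 30, 18, 15, 11, 15, 10)
)
limits <- data.frame(
  station = c("aaa", "bbb", "ccc", "ddd"),
  thr     = c(20, 18, 25, 10)
)

# merge() attaches the matching threshold to every observation
both <- merge(temps, limits, by = "station")

# Keep the rows that exceed their own station's threshold
both[both$temp > both$thr, ]
```

Referring to `temp` and `thr` by name keeps the comparison correct even if the column order changes.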

Thank you.

Regards, Adai
On Tue, 2005-08-09 at 10:19 +0200, Dimitris Rizopoulos wrote: