Replacing elements of a list over a certain threshold

8 messages · James, Christos Argyropoulos, Joris Meys +4 more

#
Dear List,

I have a list of length ~1000 filled with numerics. I need to replace 
the elements of this list that are above a certain numerical threshold 
with the value of the threshold.

e.g.
example=list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
threshold=5
<magic code goes here>
example=(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1).

I have written a crude script that achieves this but it's very slow. Is 
there a way to do this using some R function?

Crude script: http://pastebin.com/3KSfi8nD
#
"magic" code:
example[example>threshold] <-threshold
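
Applied to the list from the original question, this could look like the minimal sketch below. It assumes every list element is a single number, which is what lets the comparison return one TRUE/FALSE per element; the name `over` is only illustrative.

```r
example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
threshold <- 5

# Comparing a list of length-one numerics with a number yields a logical vector
over <- example > threshold

# Replace only the flagged elements; the object remains a list
example[over] <- threshold

# Flatten for display
print(unlist(example))
```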

Cheers
Joris
On Mon, Jun 21, 2010 at 12:34 PM, Jim Hargreaves <james at ipec.co.uk> wrote:

  
    
#
You shouldn't use sapply/lapply for this; use logical indexing instead:
[1] 3 4 6 9 3 9
user  system elapsed
      0       0       0
[1] 3 4 5 5 3 5
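
The commands behind the output above did not survive in the archive. A sketch that would print the same values is shown here, assuming the input was the numeric vector on the first output line and the threshold was 5 (both inferred from the output, not stated in the message):

```r
# Assumed input, chosen to match the first printed line above
x <- c(3, 4, 6, 9, 3, 9)
print(x)

# Time the indexed replacement; the assignment takes effect in the calling environment
print(system.time(x[x > 5] <- 5))

# Values above 5 are now capped at 5
print(x)
```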



On Mon, Jun 21, 2010 at 2:03 PM, Christos Argyropoulos
<argchris at hotmail.com> wrote:

  
    
1 day later
#
Jim Hargreaves <james at ipec.co.uk> [Mon, Jun 21, 2010 at 12:34:01PM CEST]:
lapply(example, min, 5)
#
On Jun 22, 2010, at 2:14 PM, Johannes Huesing wrote:

            
Perhaps wrapped in unlist( ) if a vector is desired.

The same strategy would work with pmin and would probably be faster
(albeit not a big deal if the list is only 1000 elements long):

unlist( pmin(example, 5) )
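
As a rough comparison of the two suggestions, the sketch below caps a shortened version of the example list both ways. Here the list is flattened before calling pmin, which is a variation on the line above rather than the same call; `via_lapply` and `via_pmin` are illustrative names.

```r
example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Element-wise: min(element, 5) for each list entry, then flatten to a vector
via_lapply <- unlist(lapply(example, min, 5))

# Vectorised: flatten first, then take the parallel minimum against 5
via_pmin <- pmin(unlist(example), 5)

# Both routes should give the same numeric vector
print(identical(via_lapply, via_pmin))
```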
David Winsemius, MD
West Hartford, CT
#
Is it essential that the dataset 'example' be a 'list'
and not a 'numeric' object (created in this case by
calling 'c' instead of 'list')?

Lists can contain elements of various types, and there
are usually time and memory penalties for allowing that
flexibility.  Numeric (or character or complex or logical)
objects contain only one type of element, generally
use less space than the equivalent list, and generally
take less time to process.
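
One way to see the space difference is to compare the two representations of the same data with object.size(). This is only a sketch: the length of 1000 is taken from the original question, the names `x` and `xl` are illustrative, and the exact byte counts depend on the R version and platform.

```r
x  <- runif(1000, min = 0, max = 10)  # numeric vector
xl <- as.list(x)                      # list holding one length-one numeric per element

# The list carries per-element overhead, so it reports a larger size
print(object.size(x))
print(object.size(xl))
```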

E.g., here are timings for several of the algorithms
that have been suggested on equivalent list and numeric
objects of length 10^5:
  > x.orig <- runif(10^5, min=0, max=10) # numeric object
  > xl.orig <- as.list(x.orig) # list object, one scalar numeric vector per element
  > x <- x.orig ; system.time(x[x>5] <- 5)
     user  system elapsed
    0.000   0.000   0.004
  >  x <- x.orig ; system.time(x <- pmin(x, 5))
     user  system elapsed
    0.010   0.000   0.002
  > xl <- xl.orig ; system.time(xl[xl>5] <- 5)
     user  system elapsed
    0.020   0.000   0.013
  >  xl <- xl.orig ; system.time(xl <- pmin(xl, 5))
     user  system elapsed
    0.080   0.000   0.084
  > xl <- xl.orig ; system.time(xl <- lapply(xl, min, 5))
     user  system elapsed
    0.130   0.000   0.135

In addition to the time penalty, it just seems unnatural
to use a list to store numbers when a numeric object could
do the job.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com