Testing for Inequality à la "select case"

11 messages · diegol, Baptiste Auguie, Stavros Macrakis

#
Using R 2.7.0 under WinXP.

I need to write a function that takes a non-negative vector and returns the
parallel maximum between a percentage of this argument and a fixed value.
Both the percentages and the fixed values depend on which interval x falls
in. Intervals are as follows:
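---------------------------------------------------------------
Lower bound |  Upper bound  |  Percent (%)  |  Minimum
---------------------------------------------------------------
0           |  20000        |  65           |  0
20000       |  100000       |  40           |  14000
100000      |  250000       |  30           |  40000
250000      |  700000       |  25           |  75000
700000      |  1000000      |  20           |  175000
1000000     |  inf          |  --           |  250000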

Once the interval is determined, the values in x are multiplied by the
percentages applying to the range in the 3rd column. 
If the result is less than the fourth column, then the latter is used.
For values of x falling in the last interval, 250,000 must be used.


My best attempt at it in R:

	MyRange <- function(x){

	# Interval index (1 to 6) for each element of x
	range_aux = ifelse(x<=20000, 1,
	            ifelse(x<=100000, 2,
	            ifelse(x<=250000, 3,
	            ifelse(x<=700000, 4,
	            ifelse(x<=1000000, 5, 6)))))
	percent = c(0.65, 0.4, 0.3, 0.25, 0.2, 0)
	minimum = c(0, 14000, 40000, 75000, 175000, 250000)

	pmax(x * percent[range_aux], minimum[range_aux])

	}


This could be done in Excel much tidier in my opinion (especially the
range_aux part), element by element (cell by cell), 

with a VBA function as follows:

	Function MyRange(x as Double) as Double

	Select Case x
	    Case Is <= 20000
	        MyRange = 0.65 * x
	    Case Is <= 100000
	        MyRange = IIf(0.4 * x < 14000, 14000, 0.4 * x)
	    Case Is <= 250000
	        MyRange = IIf(0.3 * x < 40000, 40000, 0.3 * x)
	    Case Is <= 700000
	        MyRange = IIf(0.25 * x < 75000, 75000, 0.25 * x)
	    Case Is <= 1000000
	        MyRange = IIf(0.2 * x < 175000, 175000, 0.2 * x)
	    Case Else
	        ' This is always 250000. I left it this way so it is analogous to the R function
	        MyRange = IIf(0 * x < 250000, 250000, 0 * x)
	End Select

	End Function


Any way to improve my R function? I have searched the help archive and the
closest I have found is the switch function, which tests for equality only.
Thank you in advance for reading this.


-----
~~~~~~~~~~~~~~~~~~~~~~~~~~
Diego Mazzeo
Actuarial Science Student
Facultad de Ciencias Económicas
Universidad de Buenos Aires
Buenos Aires, Argentina
#
Hi,

I think you could get a cleaner solution by using ?cut to split your data
into the given ranges (the breaks argument), and then using the resulting
factor to look up the appropriate percentage.
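
A minimal sketch of this idea, not taken from the thread: it assumes that
cut() is called with labels = FALSE, so that it returns integer interval
codes which can index the percentage and minimum vectors directly. The
names my_range_cut, breaks, percent and minimum are hypothetical.

```r
# Interval limits, percentages and minimums from the table in the question.
breaks  <- c(0, 20, 100, 250, 700, 1000, Inf) * 1000
percent <- c(65, 40, 30, 25, 20, 0) / 100
minimum <- c(0, 14, 40, 75, 175, 250) * 1000

# Hypothetical helper name, used for illustration only.
my_range_cut <- function(x) {
  # labels = FALSE gives integer codes 1..6 instead of a factor.
  # Intervals are (lower, upper] by default; include.lowest = TRUE
  # also places x == 0 in the first interval instead of returning NA.
  idx <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  pmax(x * percent[idx], minimum[idx])
}

my_range_cut(c(0, 20000, 30000, 50000, 2e6))
# values: 0, 13000, 14000, 20000, 250000
```

Because the codes are plain integers, no factor handling is needed, and the
function works on the whole vector at once.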


Hope this helps,

baptiste
On 15 Mar 2009, at 20:12, diegol wrote:

            
_____________________________

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag
#
On Sun, Mar 15, 2009 at 4:12 PM, diegol <diegol81 at gmail.com> wrote:
If you'd do it element-by-element in Excel, why not do it
element-by-element in R?

Create a table with the limits of the ranges

    range= c(20,100,250,700,1000,Inf)*1000

and then find the index of the appropriate case using something like

    idx <- which(x<=range)[1]

Then the formula becomes simply

    pmax( x*perc[idx], min[idx] )

Putting it all together:

mr <-
  local({
    # Local constants
    range= c(20,100,250,700,1000,Inf)*1000
    perc = c(65,40,30,25,20,0)/100
    min =  c(0,14,40,75,175,250)*1000

    function(x)
      { idx <- which(x<=range)[1]
        pmax( x*perc[idx], min[idx] )
      }
  })

Make sense?

          -s
#
Hello Stavros,
Well, actually I was hoping for a vectorized solution so as to avoid
looping. I need to use this formula on rather lengthy vectors and I wanted
to put R's efficiency to some good use. In any case, I had not come up with
your solution. For now, I'd stick to my ugly version.
And yes, it makes perfect sense.

    > x <- 1:150 * 10000
    > y <- numeric(150)
    > for (i in 1:150) y[i] <- mr(x[i])
    > identical(MyRange(x), y)
    [1] TRUE

I would, however, use max instead of pmax, since the argument to mr() must
be a vector of length 1. The final version looks like this (I also added a
line that rejects vectors of length > 1):

mr <-
  local({
    # Local constants
    range= c(20,100,250,700,1000,Inf)*1000
    perc = c(65,40,30,25,20,0)/100
    min =  c(0,14,40,75,175,250)*1000

    function(x)
      { if (length(x) > 1) stop("x must have length 1")
        idx <- which(x<=range)[1]
        max( x*perc[idx], min[idx] )
      }
  })
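
As a side note, the explicit for loop used in the check above could
probably be written with vapply. This is only a sketch: it repeats the
scalar mr() so that it runs on its own, and it still calls mr() once per
element, so it is a tidier loop rather than a truly vectorized solution.

```r
# Scalar version from this message: valid for a single value of x only.
mr <- local({
  range <- c(20, 100, 250, 700, 1000, Inf) * 1000
  perc  <- c(65, 40, 30, 25, 20, 0) / 100
  min   <- c(0, 14, 40, 75, 175, 250) * 1000
  function(x) {
    if (length(x) > 1) stop("x must have length 1")
    idx <- which(x <= range)[1]   # first interval whose upper bound covers x
    max(x * perc[idx], min[idx])
  }
})

x <- 1:150 * 10000
# vapply calls mr() once per element and checks that every result is a
# single numeric value.
y <- vapply(x, mr, numeric(1))
```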

Thank you very much for your help.
Diego
Stavros Macrakis-2 wrote:
#
Hello Baptiste,

I am not very sure how I'd go about that. Taking the range, perc and min
vectors from Stavros' response:

    range= c(20,100,250,700,1000,Inf)*1000
    perc = c(65,40,30,25,20,0)/100
    min =  c(0,14,40,75,175,250)*1000

For range to work as the breaks argument to "cut", I think an additional
first element is needed:

    range = c(0, range)

Now I create a dummy vector x and apply cut to create a factor z:

    x <- 1:150 * 10000
    z <- cut(x = x, breaks = range)

The thing is, I cannot figure out how to use this z factor to create
vectors of the same length as x holding the corresponding elements of "perc"
and "min" defined above. Admittedly, I have never felt very comfortable with
factors. Could you please give me some advice?

Thank you very much.
baptiste auguie-2 wrote:
#
Hi,

I don't use ?cut and ?split very much either, so this may not be good  
advice. From what I understood of your problem, I would try something  
along those lines,
but it's getting late here, so I may well be missing something important.

Hope this helps,

baptiste
On 15 Mar 2009, at 23:19, diegol wrote:

            
#
Hello Baptiste,

Thanks so much for your help. This function, which is basically your input
wrapped in curly brackets, seems to work all right:

    mr_2 <- function(x){
        range= c(20,100,250,700,1000,Inf)*1000
        perc = c(65,40,30,25,20,0)/100
        min =  c(0,14,40,75,175,250)*1000

        range = c(0, range)

        percent <- x
        minimum <- x

        z <- cut(x = x, breaks = range)
        levs <- levels(z)

        split(percent, z, drop = FALSE) <- perc
        split(minimum, z, drop = FALSE) <- min

        mydf <- data.frame(x, range= z, percent, minimum)
        mydf <- within(mydf, product  <-  x * percent)
        mydf$result <- with(mydf, ifelse(product < minimum, minimum, product))

        mydf$result
    }

    # Basic Test
    x <- 1:150 * 10000
    identical(MyRange(x), mr_2(x))
    [1] TRUE

    # Yet another test
    # (I will have a more in-depth look at "split", "with" and "within" to
    # feel more comfortable)
    x <- 150:1 * 10000
    identical(MyRange(x), mr_2(x))
    [1] TRUE

Again, thank you very much to both of you.

Have a great week.
Diego
baptiste auguie-2 wrote:
#
On Sun, Mar 15, 2009 at 6:34 PM, diegol <diegol81 at gmail.com> wrote:
That's what I meant by element-by-element. A vector in R corresponds
to a row or a column in Excel, and a vector operation in R corresponds
to a row or column of formulae, e.g.

Excel
     A      B       C
1)  5      10      a1+b1  (= 15)
2)  3       2       a2+b2  (= 5)
etc.

R
A <- c(5,3)
B <- c(10,2)
C <- A + B

max and pmax are equivalent in this case, since mr() receives a single
value. I just use pmax as my default because it acts like other arithmetic
operators (+, *, etc.), which perform pointwise (element-by-element)
operations.
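
A small illustration of the difference, using made-up vectors a and b that
are not part of the thread:

```r
a <- c(1, 5, 3)
b <- c(4, 2, 6)

max(a, b)    # a single number: the largest value across both vectors (6)
pmax(a, b)   # element-wise maximum, same length as the inputs (4, 5, 6)

# With length-one inputs the two give the same value.
max(2, 7) == pmax(2, 7)
```

So for a length-one argument either function works, but on whole vectors
only pmax keeps one result per element.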

I do wonder if there isn't a simpler/cleaner/faster way to write
which(x<=range)[1], but then again, in most applications it is plenty fast
enough.

             -s
#
Using cut/split seems like gross overkill here.  Among other things,
you don't need to generate labels for all the different ranges.

   which(x<=range)[1]

seems straightforward enough to me, but you could also use the
built-in function findInterval.

              -s
#
Steve, I still don't understand the analogy. I agree that in this case the R
approach is vectorized. However, your function, just as you first proposed
it, will not work without a loop.

It's true. I changed it because I had applied your original version of mr()
to the entire vector x, which gave an incorrect result (perhaps "range" was
recycled in "idx <- which(x<=range)[1]"). If I used max instead of pmax and
ever happened to use mr() without a loop, the length of the result would be
strange enough for me to notice the error. But then again, I added the "if
(length(x) >1) stop("x must have length 1")" line, so using max or pmax now
doesn't really make a difference, apart perhaps from run time.

I could edit the mr_2() function a little to make it more efficient. I
left it mostly unchanged so the thread would be easier to follow. For
example, I could replace the last four lines with just:

    product <- x*percent
    ifelse(product< minimum, minimum, product)

But I believe you are referring to the cut/split functions. I agree that
"which(x<=range)[1]" is straightforward, but using such an expression will
require a loop to do the trick, which I want to avoid. Am I missing
something?


Regards,
Diego
Stavros Macrakis-2 wrote:
#
On Sun, Mar 15, 2009 at 11:46 PM, diegol <diegol81 at gmail.com> wrote:
Actually Stavros (Σταύρος), not Stephen/Steve (Στέφανος).  Both Greek,
but different names.

The approach is vectorized over the range parameter, but not vectorized
over the x parameter.  If you want to vectorize over x, you can use
findInterval:

mr <-
 local({
   # Local constants
   range= c(0,20,100,250,700,1000,Inf)*1000
   perc = c(65,40,30,25,20,0)/100
   min =  c(0,14,40,75,175,250)*1000

   function(x)
     { idx <- findInterval(x,range)
       pmax( x*perc[idx], min[idx] )
     }
 })

And this time, you *do* need pmax.  I did refer to cut/split, but only
to say they were unnecessary.
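
One caveat worth checking: by default findInterval treats intervals as
closed on the left, [lower, upper), so a value that sits exactly on a limit
(20000, 100000, ...) lands in the upper interval, whereas the x<=20000
tests in MyRange put it in the lower one. The sketch below is an
assumption-laden variant, not part of the thread: it relies on the
left.open argument, which as far as I know needs R 3.3.0 or later, and the
name my_range_fi is hypothetical.

```r
# Upper limits of the first five intervals; the sixth is open-ended.
breaks  <- c(20, 100, 250, 700, 1000) * 1000
percent <- c(65, 40, 30, 25, 20, 0) / 100
minimum <- c(0, 14, 40, 75, 175, 250) * 1000

# Hypothetical helper name; left.open is assumed to be available.
my_range_fi <- function(x) {
  # With left.open = TRUE, findInterval counts the breaks strictly below x,
  # so x == 20000 gives 0 and x == 20001 gives 1; adding 1 yields the
  # interval index 1..6.
  idx <- findInterval(x, breaks, left.open = TRUE) + 1
  pmax(x * percent[idx], minimum[idx])
}

# Boundary behaviour at x == 20000:
findInterval(20000, c(0, breaks, Inf))              # 2, the second interval
findInterval(20000, breaks, left.open = TRUE) + 1   # 1, the first interval
```

With this variant the boundary values should be treated the same way as in
the original ifelse chain, while staying vectorized over x.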

          -s