Conditional Statistics

4 messages · Joseph Norman Thomson, Simon Blomberg, John Kane +1 more

#
Hello,

I am a new user of R. I am coming from SAS and do statistics on stock
market data, economic data, and social data. My question is this: How
can you get the mean, standard dev, etc. of a variable based on a
conditional statement on either the same variable or a different
variable in the same data set? So if I had the closing prices of the
S&P from 01/01/1990-12/31/1990, how could I get the average price of
the S&P from 02/01/1990-03/15/1990? Or the average price of the S&P on
Mondays (assuming a dummy var is created for 1 = Monday, 0 = else). I
understand that you can create subsets and new data sets based on the
conditional statements; but is there an easier way to do this by
typing a line into the mean() statement? That was extremely easy in
SAS where you could say:

proc means data=sp500;
var price;
where monday = 1;
run;

Thank you for your help.

Joe
#
You can use the tapply function to do this. You can't type a line into 
the mean statement. (See ?mean for what you can type in there). The 
general approach is to have a vector of data (stock prices) and a 
categorical variable (day of week). Then break up the data vector 
according to the levels in the categorical variable, and calculate the 
mean values:

Weekmeans <- tapply(data.vector, catvariable, mean)

This will give you the means for all days. If you really want just one 
mean (say, for Monday only), you could do:

Monmean <- mean(data.vector[catvariable=="Monday"])

Similarly, if you want the standard deviation for each day of the week, 
you would use:

WeekSD <- tapply(data.vector, catvariable, sd)
MonSD <- sd(data.vector[catvariable=="Monday"])
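
To make the above concrete, here is a minimal sketch on made-up data. The data frame sp500 and its columns date, price, wday and monday are assumptions for illustration (named after the SAS example in the question), and the prices are simulated, not real S&P values. It writes the SAS-style "where" condition as a logical index, including the date-range case from the original question.

```r
# Hypothetical data: simulated daily "prices" for 1990 (not real S&P 500 values)
set.seed(1)
dates <- seq(as.Date("1990-01-01"), as.Date("1990-12-31"), by = "day")
sp500 <- data.frame(date  = dates,
                    price = 330 + cumsum(rnorm(length(dates))))

# ISO weekday number (1 = Monday, ..., 7 = Sunday); unlike weekdays(),
# this does not depend on the locale
sp500$wday   <- as.integer(format(sp500$date, "%u"))
sp500$monday <- as.integer(sp500$wday == 1)   # dummy: 1 = Monday, 0 = else

# Equivalent of "where monday = 1": subset the vector with a logical condition
mon_mean <- mean(sp500$price[sp500$monday == 1])
mon_sd   <- sd(sp500$price[sp500$monday == 1])

# Average price between 1 February and 15 March 1990 (inclusive)
in_range   <- sp500$date >= as.Date("1990-02-01") &
              sp500$date <= as.Date("1990-03-15")
range_mean <- mean(sp500$price[in_range])

# with() avoids repeating the data frame name
range_mean2 <- with(sp500, mean(price[in_range]))

# Means and standard deviations for every weekday at once
week_means <- tapply(sp500$price, sp500$wday, mean)
week_sd    <- tapply(sp500$price, sp500$wday, sd)
```

The condition inside the square brackets can refer to the same variable or to a different one, as long as it yields one TRUE/FALSE value per row.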

You will find that some things that are easy in SAS require a little 
more thought in R, and vice versa. Certainly, the philosophical approach 
to data analysis in R is different to that in SAS. There are a couple of 
books for R for SAS users. They might help you.

Cheers,

Simon.
On 08/01/13 11:17, Joseph Norman Thomson wrote:
#
I think Simon has provided a good answer to the actual question but as a refugee from SAS I'd suggest having a look at www.et.bs.ehu.es/~etptupaf/pub/R/RforSAS&SPSSusers.pdf or getting the book Muenchen, R. A. (2008). R for SAS and SPSS Users (1st ed.). Springer.

R and SAS approach things very differently at times.

John Kane
Kingston ON Canada
#
tapply has already been referred to. You may also find aggregate() useful, as it gives you back a data frame that includes the conditioning variables if you tell it to. Also ave(), if you want to do something like mean-centring a data set based on group means rather than the grand mean.
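
A rough sketch of both functions, again on simulated data; the data frame sp500 and the columns wday, price and centred are hypothetical names chosen for illustration, not anything from the original poster's data.

```r
# Hypothetical data: four weeks of simulated prices (not real S&P 500 values)
set.seed(2)
dates <- seq(as.Date("1990-01-01"), by = "day", length.out = 28)
sp500 <- data.frame(date  = dates,
                    wday  = as.integer(format(dates, "%u")),  # 1 = Monday
                    price = 330 + cumsum(rnorm(28)))

# aggregate() returns a data frame: one row per level of wday,
# with the conditioning variable kept as a column
day_means <- aggregate(price ~ wday, data = sp500, FUN = mean)

# ave() returns a vector as long as its input, holding each row's group mean
# (FUN defaults to mean), so subtracting it centres each price on the mean
# of its own weekday rather than on the grand mean
group_mean    <- ave(sp500$price, sp500$wday)
sp500$centred <- sp500$price - group_mean
```

tapply(sp500$price, sp500$wday, mean) would give the same numbers as day_means$price, but as a named array rather than a data frame.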

S Ellison
