
Mean-Centering Question

9 messages · Ray DiGiacomo, Jr., Elizabeth Fuller Bettini, David Winsemius +2 more

#
On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

            
dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")
I needed to modify this to avoid errors from colMeans, which expects a  
matrix or data frame, whereas aggregate passes each column to FUN as a  
plain vector:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }

aggregate(dat[3:4], dat[1], FUN=specialFunction2)

     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
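A minimal sketch of why the colMeans() version fails inside aggregate(): aggregate() hands FUN one column's values for one group as a plain vector, and colMeans() rejects anything without at least two dimensions. The numbers are the Los Angeles "Units" values from the thread; the variable names are illustrative only.

```r
# Los Angeles "Units" values from the thread, on the log scale
x <- log(c(61, 49, 40))

# colMeans() needs an array with two or more dimensions, so a bare vector
# (which is what aggregate() passes to FUN) raises an error
msg <- tryCatch(colMeans(x), error = function(e) conditionMessage(e))
print(msg)

# mean() is the vector counterpart, so the centered values come out directly
centered <- x - mean(x)
print(centered)
```

The centered values match the Units.1 to Units.3 entries for Los Angeles in the aggregate() output above.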
OK. So then I tried this with your function and was surprised to see  
that it also works:

 > by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
      Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
------------------------------------------------------------------
Location: New York
      Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
------------------------------------------------------------------
Location: Paris
      Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819
I guess I don't. I cannot reproduce the problem, and my other methods  
worked as well. This also works with your version and with mine, but I  
get the deprecation message for `mean.data.frame` from mine:

 > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
      Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
      Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
      Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819
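The deprecation message comes from calling mean() on a whole data frame, which is what each split() piece is. A minimal sketch, assuming a current R release in which mean() no longer has a data-frame method: specialFunction2 then returns NA values (with a warning) for a data-frame argument, while centering each column with sweep() still works. The object names pieces, bad and good are illustrative.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Vector version from the thread: fine for aggregate(), which passes vectors
specialFunction2 <- function(x) { log(x) - mean(log(x), na.rm = TRUE) }

pieces <- split(dat[3:4], dat[1])   # each piece is a 3-row data frame

# mean() of a data frame is not a column mean: recent R returns NA with a warning
bad <- suppressWarnings(specialFunction2(pieces[["Los Angeles"]]))
print(bad)

# Centering each column of each piece avoids the problem
good <- lapply(pieces, function(x) sweep(log(x), 2, colMeans(log(x)), "-"))
print(good[["Los Angeles"]])
```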

  
    
#
On Dec 8, 2012, at 7:06 PM, Elizabeth Fuller Bettini wrote:

            
You subscribed and only you know the password that allows you to  
control the subscription options. Please use the links at the bottom  
of every posting to Rhelp.
David Winsemius, MD
Alameda, CA, USA
#
Hi,

It works for me also:
by(dat1[c("Units","AveragePrice")],dat1[,1],specialFunction)
#dat1[, 1]: Los Angeles
#        Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
----------------------------------------------
#or

by(cbind(Units=dat1[,3],AveragePrice=dat1[,4]),dat1[,1],specialFunction)
#INDICES: Los Angeles
#        Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
--------------------------------------------

A.K.






----- Original Message -----
From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
To: R Help <r-help at r-project.org>
Cc: 
Sent: Saturday, December 8, 2012 6:54 PM
Subject: [R] Mean-Centering Question

Hello,

I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.

Here is an example dataset, which is similar to my dataset:

Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2

I want to mean-center the "Units" and "AveragePrice" columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

If I use only "one" column in the first argument of the "by" function,
everything is fine. For example, the following code works:

by(data[c("Units")],
data["Location"],
specialFunction)

But the following code will "not" work, because I have "two" columns in the
first argument...

by(data[c("Units", "AveragePrice")],
data["Location"],
specialFunction)

Does anyone have any ideas as to what I am doing wrong?

Please note that I'm trying to get the following results (for the "Los
Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
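The target numbers can be reproduced by centering each column on its own mean. A minimal sketch for the Los Angeles group, using sapply() over the columns of a data frame; the object names la and centered are illustrative.

```r
# Los Angeles rows from the example data
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))

# A data frame is a list of columns, so sapply() visits one column at a time;
# each column is logged and centered on its own mean.
centered <- sapply(la, function(v) log(v) - mean(log(v)))
print(centered)
```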

Best Regards,

Ray DiGiacomo, Jr.
Healthcare Predictive Analytics Specialist
President, Lion Data Systems LLC
President, The Orange County R User Group
Board Member, TDWI
rayd at liondatasystems.com
(m) 408-425-7851
San Juan Capistrano, California USA
twitter.com/liondatasystems
linkedin.com/in/raydigiacomojr
youtube.com/user/liondatasystems/videos


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi,

You could also use:
newFunction1<-function(x) {t(t(log(x))-colMeans(log(x)))}

res1<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction1)
res1
#Location: Los Angeles
#         Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------
#Location: New York
#        Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------
#Location: Paris
#        Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992
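Why the double transpose in newFunction1 works: after t(), each original row becomes a column of length 2, so R's column-major recycling lines the two column means up with the right variables, and the outer t() restores the original shape. A minimal sketch on the Los Angeles rows, checked against sweep(); the object names are illustrative.

```r
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))
lx <- log(la)
means <- colMeans(lx)            # one mean per variable (length 2)

# t(lx) is 2 x 3: rows are variables, columns are observations.
# Subtracting a length-2 vector recycles it down each column, so every
# observation has the Units mean removed from Units and the
# AveragePrice mean removed from AveragePrice.
via_t <- t(t(lx) - means)

# sweep() states the same intent explicitly: subtract STATS along MARGIN 2
via_sweep <- sweep(as.matrix(lx), 2, means, "-")

print(via_t)
print(all.equal(unname(via_t), unname(via_sweep)))
```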


newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
res<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction)
res
#Location: Los Angeles
#         Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------
#Location: New York
#        Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------
#Location: Paris
#        Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992

#identical(res, res1) will be FALSE: the list elements of res are data frames, while those of res1 are matrices.
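One way to confirm that res and res1 hold the same numbers even though identical() is FALSE: check the element classes, then compare the values with dimnames stripped. A minimal sketch, assuming dat1 is the example data posted at the top of the thread (A.K.'s name for it); cols and same_values are illustrative names.

```r
# Assumption: dat1 is the example data posted at the top of the thread
dat1 <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

newFunction1 <- function(x) { t(t(log(x)) - colMeans(log(x))) }
newFunction  <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }

cols <- c("Units", "AveragePrice")
res1 <- by(dat1[cols], dat1["Location"], newFunction1)
res  <- by(dat1[cols], dat1["Location"], newFunction)

# sweep() on a data frame returns a data frame; t() always returns a matrix
print(class(res[[1]]))
print(class(res1[[1]]))

# Compare the numbers group by group, ignoring class and dimnames
same_values <- vapply(seq_along(res), function(i)
  isTRUE(all.equal(unname(as.matrix(res[[i]])), unname(res1[[i]]))),
  logical(1))
print(same_values)
```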

A.K.


----- Original Message -----
From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
To: R Help <r-help at r-project.org>
Cc: 
Sent: Saturday, December 8, 2012 11:11 PM
Subject: Re: [R] Mean-Centering Question

Hi David and Arun,

Thanks for looking into this. I think I have found a solution.

The "by" function runs without errors, but both values returned in the
second row of the "Los Angeles" output are incorrect. (These incorrect
values were highlighted in red in the original HTML message.)

I think my original custom function was causing the incorrect values
because the subtraction inside it took a data frame with three rows and
subtracted a vector of only two column means, so R "recycled" that vector
down the columns and the second row had the other column's mean subtracted.
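The recycling described above can be reproduced directly. A minimal sketch on the Los Angeles rows: the length-2 vector of column means is recycled element by element down the columns of the 3-row data frame, so the second Units value loses the AveragePrice mean and vice versa. The object names la, lx, wrong and recycled are illustrative.

```r
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))
lx <- log(la)
means <- colMeans(lx)                # c(Units mean, AveragePrice mean)

wrong <- lx - means                  # what the original specialFunction computes

# Recycling a length-2 vector over 6 cells, column by column, gives
#   Units column:        Units mean, AveragePrice mean, Units mean
#   AveragePrice column: AveragePrice mean, Units mean, AveragePrice mean
recycled <- matrix(rep_len(means, 6), nrow = 3)
print(recycled)

# The faulty result equals subtracting that recycled matrix cell by cell
print(wrong)
print(all.equal(unname(as.matrix(wrong)), unname(as.matrix(lx) - recycled)))
```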

Using the "sweep" function fixes the problem. This is what I did to fix
things:

# here is my "new" custom function
newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }

# this gives the correct values
by(PullData[c("Units","AveragePrice")],
   PullData[c("StoreLocation")],
   newFunction)
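For centering alone, base R's scale() with scale = FALSE subtracts each column's mean and could stand in for the sweep() call. This is a swapped-in alternative, not what was used in the thread, and it returns a matrix carrying a "scaled:center" attribute. A minimal sketch, assuming the example data from the top of the thread, since PullData and StoreLocation were not posted; centerFunction, res_scale and res_sweep are hypothetical names.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper: scale() centers each column on its mean;
# scale = FALSE skips dividing by the standard deviation
centerFunction <- function(x) scale(log(x), center = TRUE, scale = FALSE)

res_scale <- by(dat[c("Units", "AveragePrice")], dat["Location"], centerFunction)
print(res_scale)

# Same numbers as the sweep() version, apart from the extra attribute
newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
res_sweep <- by(dat[c("Units", "AveragePrice")], dat["Location"], newFunction)
print(all.equal(as.matrix(res_sweep[[1]]), res_scale[[1]], check.attributes = FALSE))
```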

- Ray
On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:

            
#
If you are willing to rethink the definition of your special function, the
process can be simplified. The function lmc() log-mean centers a single
grouped numeric vector. Then sapply() can be used to center a batch of them.
scale=FALSE), g)
Location X..TimePeriod        Units AveragePrice
1 Los Angeles        5/1/11  0.213682659  0.071790268
2 Los Angeles        5/8/11 -0.005370907 -0.072872965
3 Los Angeles       5/15/11 -0.208311751  0.001082696
4    New York        5/1/11  0.235465925  0.101474328
5    New York        5/8/11 -0.090253520 -0.087116841
6    New York       5/15/11 -0.145212404 -0.014357487
7       Paris        5/1/11  0.219331999  0.117331641
8       Paris        5/8/11 -0.048703076 -0.049141723
9       Paris       5/15/11 -0.170628923 -0.068189918
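The definition of lmc() did not survive in this copy of the message; only the trailing fragment `scale=FALSE), g)` remains. The sketch below is a hypothetical stand-in, not Carlson's code: it assumes lmc() takes a numeric vector x and a grouping vector g and uses base R's ave() to subtract each group's mean of log(x), after which sapply() applies it to both columns. On the example data it gives the same Units and AveragePrice values as the table above.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical lmc(): log-mean-center x within the groups defined by g.
# ave() returns a vector the same length as x, in the original row order.
lmc <- function(x, g) ave(log(x), g, FUN = function(v) v - mean(v))

g <- dat$Location
centered <- sapply(dat[c("Units", "AveragePrice")], lmc, g = g)

result <- data.frame(dat[c("Location", "TimePeriod")], centered)
print(result)
```

Because ave() keeps the original row order, the centered columns can be bound straight back onto the identifying columns without re-sorting.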

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352