How to z-standardize for subgroups?

5 messages · Jorge Ivan Velez, John Kane, Chuck Cleland +1 more

#
Hi folks,
I have a data frame df.vars with the following structure:


var1   var2   var3   group

Group is a factor.

Now I want to standardize vars 1-3 (actually, there are many more)
within each group, so I define

z.mean.sd <- function(data){
	return.values <- (data  - mean(data)) / (sd(data))
	return(return.values)
}

now I can call for each var

z.var1 <- by(df.vars$var1, df.vars$group, z.mean.sd)

which gives me the standardised data for each subgroup in a list with  
the subgroups

z.var1 <- unlist(z.var1)

then gives me the z-standardised data for var1 in one vector. Great!
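
One caveat with by() followed by unlist(): the values come back grouped by factor level, so they line up with the original rows only when the data are already sorted by group. A minimal sketch on made-up data (the data frame toy and its values are illustrative assumptions, not from the thread), comparing that route with ave(), which returns one value per row in the original row order:

```r
# Standardizing function from the post above
z.mean.sd <- function(data) {
  (data - mean(data)) / sd(data)
}

# Hypothetical toy data with the groups interleaved rather than sorted
toy <- data.frame(
  var1  = c(1, 10, 2, 20, 3, 30),
  group = factor(c("a", "b", "a", "b", "a", "b"))
)

# by() + unlist(): all of group "a" first, then all of group "b"
z.by <- unlist(by(toy$var1, toy$group, z.mean.sd))

# ave(): one value per row, in the original row order
z.ave <- ave(toy$var1, toy$group, FUN = z.mean.sd)

unname(z.by)   # -1  0  1 -1  0  1  (sorted by group)
z.ave          # -1 -1  0  0  1  1  (matches the rows of toy)
```

If the result is going to be bound back onto the original data frame, the ave() form avoids having to re-sort anything.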

Now I would like to do this for the whole dataframe, but probably I am  
not thinking vectorwise enough.

z.df.vars <- by(df.vars, df.vars$group, z.mean.sd)

does not work. I banged my head on other solutions, trying sapply
and tapply, but did not succeed. Do I need to loop and put everything
together by hand? I would also like to keep the column names in the result.
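
The whole-data-frame call most likely fails because by() hands z.mean.sd a data frame chunk (factor column included) rather than a numeric vector. A minimal sketch, assuming made-up data and a hypothetical helper z.chunk that standardizes only the numeric columns of each chunk. Note that the combined result is ordered by group, not by original row:

```r
z.mean.sd <- function(data) {
  (data - mean(data)) / sd(data)
}

# Hypothetical stand-in for df.vars; the values are invented
df.vars <- data.frame(
  var1  = c(1, 2, 3, 10, 20, 30),
  var2  = c(5, 7, 9, 2, 4, 6),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# Hypothetical helper: standardize each numeric column of one group's chunk
z.chunk <- function(chunk) {
  num <- vapply(chunk, is.numeric, logical(1))
  chunk[num] <- lapply(chunk[num], z.mean.sd)
  chunk
}

z.list <- by(df.vars, df.vars$group, z.chunk)  # one data frame per group
z.df.vars <- do.call(rbind, z.list)            # stack them; column names are kept

names(z.df.vars)
```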

-karsten


---------------------------------------------------------------------------------------------
Karsten D. Wolf
Didactical Design of Interactive
Learning Environments
Universität Bremen - Fachbereich 12
web: http://www.ifeb.uni-bremen.de/wolf/
#
http://finzi.psych.upenn.edu/R/library/QuantPsyc/html/Make.Z.html

Make.Z in the QuantPsyc package may already do it.
--- On Sun, 11/29/09, Karsten Wolf <wolf at uni-bremen.de> wrote:

            
#
On 11/29/2009 4:23 PM, John Kane wrote:
For a single variable, you could use ave() and scale() together like this:

with(iris, ave(Sepal.Width, Species, FUN = scale))

  To scale more than one variable in a concise call, consider something
along these lines:

apply(iris[,1:4], 2, function(x){ave(x, iris$Species, FUN = scale)})
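
The apply() call returns a numeric matrix without the Species column. If a data frame that keeps its column names and the grouping factor is wanted, one possible variant (a sketch, not part of the reply above; the name iris.z is an assumption) assigns the scaled columns back with lapply(). as.vector() is used because scale() returns a one-column matrix:

```r
iris.z <- iris
num <- vapply(iris.z, is.numeric, logical(1))   # the four measurement columns

# Replace each numeric column by its within-species z-scores
iris.z[num] <- lapply(iris.z[num], function(x) {
  ave(x, iris.z$Species, FUN = function(v) as.vector(scale(v)))
})

# Group means of a standardized column should be (numerically) zero
tapply(iris.z$Sepal.Width, iris.z$Species, mean)
```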

hope this helps,

Chuck Cleland

  
    
#
Hi Jorge, Chuck and Kane,
thanks for your input!
The following code based on Jorge's answer did the trick to  
standardize for subgroups within multiple columns:

# define a standardize function, but you could also define your custom
# standardize function here
z.mean.sd <- function(data){
	return.values <- (data - mean(data, na.rm = TRUE)) / sd(data, na.rm = TRUE)
	return(return.values)
}
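
With na.rm = TRUE the mean and standard deviation are computed from the non-missing values, and missing entries stay NA in the result. scale() should treat missing values the same way, so on a single vector both routes are expected to agree. A small check on made-up numbers (the vector x is an assumption for illustration):

```r
z.mean.sd <- function(data) {
  (data - mean(data, na.rm = TRUE)) / sd(data, na.rm = TRUE)
}

x <- c(1, 2, NA, 4)               # made-up values with one missing entry

z.custom <- z.mean.sd(x)
z.scale  <- as.vector(scale(x))   # scale() returns a one-column matrix

all.equal(z.custom, z.scale)      # compares the two results, NA included
```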

# assume there is some data.frame called sole.data with a group factor
# sole.data$studie already read into R
sole.data <- read.csv2("SoLe.dat")
attach(sole.data)
# assume I have created a subset of the data.frame, cor.vars, with only
# some of the vars needed to be standardized
cor.vars <- data.frame(var02, var04, var07, var10, var17, var24, var36)

z.cor.vars <- apply(cor.vars, 2, tapply, sole.data$studie, z.mean.sd)
z.cor.vars <- sapply(z.cor.vars, unlist, USE.NAMES = FALSE)
z.cor.vars

BUT then Chuck's answer was much more elegant than my first woodpecker  
solution

apply(iris[,1:4], 2, function(x){ave(x, iris$Species, FUN = scale)})

could be translated into

apply(sole.data[,c(2,4,7,10,17,24,36)], 2, function(x){ave(x, sole.data$studie, FUN=scale)})
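
Numeric column positions can end up pointing at the wrong columns if the data frame is later reordered or extended. A possible variant selects by name instead. This is only a sketch: the toy data frame below is a made-up stand-in for sole.data (two of the variable names from the thread plus studie), and its values are invented:

```r
# Made-up stand-in for sole.data; values are illustrative only
sole.toy <- data.frame(
  var02  = c(1, 2, 3, 4, 6, 8),
  var04  = c(10, 20, 30, 5, 5, 8),
  studie = factor(c("s1", "s1", "s1", "s2", "s2", "s2"))
)

vars <- c("var02", "var04")   # columns to standardize, chosen by name

z.toy <- apply(sole.toy[, vars], 2, function(x) {
  ave(x, sole.toy$studie, FUN = scale)
})

colnames(z.toy)   # the column names carry over to the result matrix
```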

Thanks for the beauty of this code with an anonymous function call :)

-karsten



On 29.11.2009 at 16:47, Jorge Ivan Velez wrote: