Computing sd across an array with missing values
It just so happens that I created a vectorized SD function the other
day. See the column and row versions below. If there are any
rows/columns with only one non-NA value, it will return NaN.
col_sd = function(x){
dimx = dim(x)
x.mean = .Internal(colMeans(x,dimx[1],dimx[2],na.rm=TRUE))
err = t(t(x)-x.mean)
err.sq = err*err
sum.err.sq = .Internal(colSums(err.sq,dimx[1],dimx[2],na.rm=TRUE))
n = .Internal(colSums(!is.na(x),dimx[1],dimx[2],na.rm=TRUE))
sqrt(sum.err.sq/(n-1))
}
row_sd = function(x){
dimx = dim(x)
x.mean = .Internal(rowMeans(x,dimx[1],dimx[2],na.rm=TRUE))
err = x-x.mean
err.sq = err*err
sum.err.sq = .Internal(rowSums(err.sq,dimx[1],dimx[2],na.rm=TRUE))
n = .Internal(rowSums(!is.na(x),dimx[1],dimx[2],na.rm=TRUE))
sqrt(sum.err.sq/(n-1))
}
On Wed, Feb 25, 2009 at 5:11 PM, Jorge Ivan Velez
<jorgeivanvelez at gmail.com> wrote:
Dear Matt, You're absolutely right. The reason why my code gives different results is because it calculates the standard deviation for each row/column in all the arrays rather than for every cell. My bad. You can easily get the results you posted running
apply(p,1:2,sd,na.rm=TRUE)
Here is another option which gives the same results you posted :-) ?Unfortunately there is not timing improvement :( ? (Note that I ran both sd1 and sd2 functions using pp instead of p) # ------------- # System times # -------------
pp <- array(c(1:5, rep(NA, times = 3)), dim = c(1000, 1000, 60)) system.time(apply(pp, 1:2, sd1))
# ? user ?system elapsed # ?93.16 ? ?0.23 ? 94.87
system.time(apply(pp, 1:2, sd2))
# ? user ?system elapsed
# ?66.51 ? ?0.30 ? 67.84
# ----------
# Functions
# ----------
sd1<-function(x) sd(x,na.rm=TRUE)
sd2<- function(i){
if(sum(!is.na(i))==0) temp.sd <- NA
else temp.sd <- sd(i, na.rm = TRUE)
temp.sd
}
Regards,
Jorge
On Wed, Feb 25, 2009 at 3:23 PM, Matt Oliver <moliver at udel.edu> wrote:
Hi Jorge, this does not seem to return the same thing as
p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3))
sd_fun <- function(i){
if(sum(!is.na(i))==0){
temp.sd <- NA
}else{
temp.sd <- sd(i, na.rm = TRUE)
}
return(temp.sd)
}
apply(p, 1:2, sd_fun)
Am I missing something basic here?
On Wed, Feb 25, 2009 at 11:47 AM, Jorge Ivan Velez <
jorgeivanvelez at gmail.com> wrote:
Hi Mark, There is a typo in the first way. My apologies. It should be: # First res<-apply(p,3,function(X) ? ? ? ?c(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,1,sd,na.rm=TRUE)) ? ? ? ?) res HTH, Jorge On Wed, Feb 25, 2009 at 11:42 AM, Jorge Ivan Velez < jorgeivanvelez at gmail.com> wrote:
Dear Matt, Here are two ways: # Data p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3)) # First res<-apply(p,3,function(X) ? ? ? ?c(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,2,sd,na.rm=TRUE)) ? ? ? ?) res # Second res2<-apply(p,3,function(X) list(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,1,sd,na.rm=TRUE)) ? ? ? ?) lapply(res2,function(x) do.call(rbind,x)) HTH, Jorge On Wed, Feb 25, 2009 at 10:53 AM, Matt Oliver <moliver at udel.edu> wrote:
Dear help, suppose I have this array and want to compute sd aross rows
and
columns.
p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3))
apply(p, 1:2, sd) fails because sd requires at least 2 numbers to
compute sd
apply(p, 1:2, sd, na.rm = TRUE) fails for the same reason
I crafted my own function that does what I want
sd_fun <- function(i){
if(sum(!is.na(i))==0){
temp.sd <- NA
}else{
temp.sd <- sd(i, na.rm = TRUE)
}
return(temp.sd)
}
apply(p, 1:2, sd_fun)
This does what I want, but when I scale up to large arrays like
pp <- array(c(1:5, rep(NA, times = 3)), dim = c(1000, 1000, 60))
the apply function takes a long time to run.
Is there a faster, more efficient way to do this?
Thanks in advance
Matt
--
Matthew J. Oliver
Assistant Professor
College of Marine and Earth Studies
University of Delaware
700 Pilottown Rd.
Lewes, DE, 19958
302-645-4079
http://www.ocean.udel.edu/people/profile.aspx?moliver
? ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Matthew J. Oliver Assistant Professor College of Marine and Earth Studies University of Delaware 700 Pilottown Rd. Lewes, DE, 19958 302-645-4079 http://www.ocean.udel.edu/people/profile.aspx?moliver
? ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~