
Averaging 'blocks' of data

8 messages · Dylan Beaudette, Gabor Grothendieck, Steve Murray +3 more

#
Dear all,

I have a large dataset which I hope to reduce in size to make it more usable. I hope to do this by taking the average of each 60 x 60 block of values and forming a new data frame out of the averaged values.

How would I go about taking averages of 60 x 60 'blocks' in R, and cycling through the whole dataset, recording each calculated value in a new table/data frame?

Many thanks for any advice offered.

Steve
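For data that fits in memory as a numeric matrix, one way to do this is to view the matrix as a 4-d array (row within block, row block, column within block, column block) and average over the within-block dimensions. This is a minimal sketch, assuming the row and column counts are exact multiples of the block size; `blockMeans` is a hypothetical helper name, and a data frame would first need `as.matrix()`.

```r
# Hypothetical helper: average each bs x bs block of a numeric matrix.
# Assumes nrow(m) and ncol(m) are exact multiples of bs.
blockMeans <- function(m, bs) {
  nr <- nrow(m)
  nc <- ncol(m)
  stopifnot(nr %% bs == 0, nc %% bs == 0)
  # R stores matrices column by column, so the same values can be viewed as
  # [row within block, row block, column within block, column block]
  a <- array(m, dim = c(bs, nr / bs, bs, nc / bs))
  # average over the within-block dimensions, keeping the block indices
  apply(a, c(2, 4), mean)
}

# Small check: a 4 x 6 matrix averaged in 2 x 2 blocks gives a 2 x 3 result
m <- matrix(1:24, nrow = 4)
blockMeans(m, 2)
#      [,1] [,2] [,3]
# [1,]  3.5 11.5 19.5
# [2,]  5.5 13.5 21.5
```

With bs = 60 each output cell is the mean of one 60 x 60 block. Note that `array()` makes a full copy of the data, so this only helps once a chunk is already in memory.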
#
On Sun, Sep 7, 2008 at 12:32 PM, Steve Murray <smurray444 at hotmail.com> wrote:
What does the data look like: a vector, matrix, or list?
Some form of apply(), tapply(), mapply(), or lapply() would probably
do what you want.
Here is a start:

# step 1. too much data: 10x10 matrix
m <- matrix(runif(100), ncol=10)

# step 2. reduce down to a 10x1 vector, averaging-by-row:
apply(m, 1, mean)

# step 3 profit.

Dylan
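The row-averaging idea above can be pushed from single rows to groups of rows and columns. A sketch, assuming a block size of 5 for this 10 x 10 toy matrix: `rowsum()` adds up the rows that share a group label, so applying it to the rows and then (via `t()`) to the columns gives the sum of every block.

```r
# Toy data as above: a 10 x 10 matrix of random values
set.seed(1)
m <- matrix(runif(100), ncol = 10)

# rowMeans() gives the same result as apply(m, 1, mean), only faster
stopifnot(isTRUE(all.equal(rowMeans(m), apply(m, 1, mean))))

# Extending the idea to blocks; bs = 5 is an assumption for this demo
bs <- 5
rowGroup <- (seq_len(nrow(m)) - 1) %/% bs   # 0 0 0 0 0 1 1 1 1 1
colGroup <- (seq_len(ncol(m)) - 1) %/% bs
# sum rows within each row group, then columns within each column group
blockSums <- t(rowsum(t(rowsum(m, rowGroup)), colGroup))
blockAvg <- blockSums / (bs * bs)
blockAvg   # 2 x 2 matrix of block means
```

Swapping in bs = 60 gives 60 x 60 block means for any matrix whose dimensions are multiples of 60.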
#
This was answered last month:

http://tolstoy.newcastle.edu.au/R/e4/help/08/08/19091.html
On Sun, Sep 7, 2008 at 3:32 PM, Steve Murray <smurray444 at hotmail.com> wrote:
#
Gabor - thanks for your suggestion... I had checked the previous post, but I found (as a new user of R) this approach to be too complicated and I had problems getting the correct output values. If there is a simpler way of doing this, then please feel free to let me know.

Dylan - thanks, your approach is a good start. In answer to your questions, my data are a data frame of 16800 rows and 43200 columns - I will probably have to read the dataset in segments though, as it won't fit into memory!  I've been able to follow your example - how could I apply this technique to find the average of each 60 x 60 block?

Any other suggestions are of course welcome!

Many thanks again,

Steve
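A quick back-of-the-envelope check of those dimensions shows why reading in segments is needed and how small the result will be. This assumes every value is stored as an R double (8 bytes); a data frame read with read.table() would need considerably more than this during import.

```r
# Dimensions reported above: 16800 rows x 43200 columns, 60 x 60 blocks
nRow <- 16800
nCol <- 43200
bs <- 60

nValues <- nRow * nCol          # 725,760,000 values
bytes <- nValues * 8            # 8 bytes per double
bytes / 1024^3                  # about 5.4 GiB, before any copies

# One 60-row segment is much more manageable
60 * nCol * 8 / 1024^2          # about 19.8 MiB

# Size of the averaged grid: one value per 60 x 60 block
c(nRow / bs, nCol / bs)         # 280 rows x 720 columns
(nRow / bs) * (nCol / bs)       # 201,600 values
```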
#
Here is a way to do it by reading in 60 lines at a time and computing the means:

# create some test data: 360 "lines" of 16800 values each
nLines <- 360
nCol <- 16800
x <- matrix(runif(nLines * nCol), nrow=nCol)
dataFile <- tempfile()  # a writable scratch file instead of "/tempxx.txt"
# cat() puts every value on one line; scan() below only counts values, so that is fine
cat(x, file=dataFile)


# now process the data 60 lines at a time, averaging each 60x60 block
result <- matrix(0, nrow=nLines / 60, ncol=nCol / 60)
nextLine <- 1  # next output row in the result
# create a list of column indices to use to partition each 60-line block
colIndex <- split(seq(nCol), (seq(nCol) - 1) %/% 60)
input <- file(dataFile, "r")
while (TRUE){
    # use 'scan' to read in 60 lines' worth of values at a time
    block <- scan(input, what=0, n=60*nCol, quiet=TRUE)
    if (length(block) != 60 * nCol) break  # exit if done
    # convert to a 60 x nCol matrix
    block <- matrix(block, nrow=60, byrow=TRUE)
    # compute the mean of each 60x60 block and store it
    result[nextLine,] <- sapply(colIndex, function(.blk){
        mean(block[, .blk])
    })
    nextLine <- nextLine + 1
}
close(input)
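The same streaming approach can be wrapped in a function and checked against an in-memory answer on a small file. A sketch, assuming a whitespace-separated file with a known number of values per line; `blockMeansFromFile` is a hypothetical name, and it replaces the per-block `sapply()` with `colSums()` plus `rowsum()`, a swapped-in technique that gives the same means.

```r
# Hypothetical helper: read a whitespace-separated file bs lines at a time and
# return the matrix of bs x bs block means. nCol (values per line) must be known.
blockMeansFromFile <- function(path, nCol, bs = 60) {
  stopifnot(nCol %% bs == 0)
  colGroup <- (seq_len(nCol) - 1) %/% bs   # block number of each column
  con <- file(path, "r")
  on.exit(close(con))                      # close the file even on error
  out <- list()
  repeat {
    # on an open connection, scan() continues where the last call stopped
    v <- scan(con, what = 0, n = bs * nCol, quiet = TRUE)
    if (length(v) < bs * nCol) break       # end of file; a partial block is dropped
    block <- matrix(v, nrow = bs, byrow = TRUE)
    # sum each column of the block, then add those sums within each column group
    sums <- rowsum(colSums(block), colGroup)
    out[[length(out) + 1]] <- as.vector(sums) / (bs * bs)
  }
  do.call(rbind, out)
}

# Small self-check: 6 lines of 8 values, averaged in 2 x 2 blocks
set.seed(42)
x <- matrix(runif(6 * 8), nrow = 6)
path <- tempfile()
write.table(x, path, row.names = FALSE, col.names = FALSE)
res <- blockMeansFromFile(path, nCol = 8, bs = 2)
dim(res)                                   # 3 4
all.equal(res[2, 3], mean(x[3:4, 5:6]))    # TRUE
```

For the full dataset this would be called as blockMeansFromFile(path, nCol = 43200), keeping only 60 lines in memory at a time.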
On Sun, Sep 7, 2008 at 4:46 PM, Steve Murray <smurray444 at hotmail.com> wrote:

  
    
#
I'm not sure I exactly understand your problem, but if you are 
looking for a recursive algorithm that updates the average as each 
record is added, one at a time, consider:

y[k] = y[k-1] + (x[k] - y[k-1])/k,      where y[0] = 0, k = 1, 2, ...

At each stage, y[k] = (x[1]+...+x[k])/k.
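The recursion translates directly into R. This is a minimal sketch (`runningMean` is a hypothetical name) that checks the claim that y[k] equals the mean of the first k values; for block averaging, the same update could keep each block's mean current as lines are read, without storing them.

```r
# Running (incremental) mean: y[k] = y[k-1] + (x[k] - y[k-1]) / k, with y[0] = 0
runningMean <- function(x) {
  y <- numeric(length(x))
  prev <- 0                          # y[0] = 0
  for (k in seq_along(x)) {
    prev <- prev + (x[k] - prev) / k
    y[k] <- prev
  }
  y
}

x <- c(2, 4, 9)
runningMean(x)                       # 2 3 5
cumsum(x) / seq_along(x)             # same values, computed directly
```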
At 04:46 PM 9/7/2008, Steve Murray wrote:

            
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
#
Hi Steve,

  	You probably want to check out ?by or ?aggregate, maybe using
factor((row(m) - 1) %/% 60) : factor((col(m) - 1) %/% 60) as your index
variable, where m is your data converted to a matrix.

--Adam
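A sketch of the block-index idea, using tapply() (a close relative of by() and aggregate()) because it accepts one label per cell directly. The 120 x 180 random matrix is an assumption standing in for the real data; note that the two index matrices are as large as the data itself, so this suits a chunk rather than the whole file.

```r
# Small example matrix standing in for the real data
set.seed(7)
m <- matrix(runif(120 * 180), nrow = 120)
bs <- 60

rowBlock <- (row(m) - 1) %/% bs     # block number of each cell's row
colBlock <- (col(m) - 1) %/% bs     # block number of each cell's column

# tapply() with two grouping variables returns a matrix of means,
# indexed by [row block, column block]
means <- tapply(m, list(rowBlock, colBlock), mean)
dim(means)                          # 2 3
```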
On Sun, 7 Sep 2008, Steve Murray wrote: