Dear all, I have a large dataset which I hope to reduce in size, to make it more useable. I hope to do this by taking an average of each 60 x 60 block of values and forming a new data frame out of the averaged values. How would I go about taking averages of 60 x 60 'blocks' in R, and cycling through the whole dataset, recording each calculated value in a new table/data frame? Many thanks for any advice offered. Steve
Averaging 'blocks' of data
8 messages · Dylan Beaudette, Gabor Grothendieck, Steve Murray +3 more
On Sun, Sep 7, 2008 at 12:32 PM, Steve Murray <smurray444 at hotmail.com> wrote:
Dear all, I have a large dataset which I hope to reduce in size, to make it more useable. I hope to do this by taking an average of each 60 x 60 block of values and forming a new data frame out of the averaged values.
what does the data look like? vector / matrix / list ?
How would I go about taking averages of 60 x 60 'blocks' in R, and cycling through the whole dataset, recording each calculated value in a new table/data frame?
some form of apply(), tapply(), mapply(), or lapply() would probably do what you want
Many thanks for any advice offered. Steve
Here is a start:

# step 1. too much data: 10x10 matrix
m <- matrix(runif(100), ncol=10)
# step 2. reduce down to a 10x1 vector, averaging-by-row:
apply(m, 1, mean)
# step 3. profit.

Dylan
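The row-averaging idea above can be carried over to square blocks. What follows is only a minimal sketch, not anyone's posted solution: `block_mean` and its argument `b` are hypothetical names, the matrix is assumed to fit in memory, and its dimensions are assumed to be exact multiples of the block size. A 4 x 4 matrix with 2 x 2 blocks stands in for the real data and 60 x 60 blocks.

```r
# Hypothetical helper (not from the thread): mean of each b x b block of m.
# Assumes nrow(m) and ncol(m) are exact multiples of b.
block_mean <- function(m, b) {
  stopifnot(nrow(m) %% b == 0, ncol(m) %% b == 0)
  # group the row and column indices into consecutive runs of length b
  row_groups <- split(seq_len(nrow(m)), (seq_len(nrow(m)) - 1) %/% b)
  col_groups <- split(seq_len(ncol(m)), (seq_len(ncol(m)) - 1) %/% b)
  # for every column group, average each row group's block
  out <- vapply(col_groups, function(cols) {
    vapply(row_groups, function(rows) {
      mean(m[rows, cols, drop = FALSE])
    }, numeric(1))
  }, numeric(length(row_groups)))
  # one row per row group, one column per column group
  matrix(out, nrow = length(row_groups))
}

m <- matrix(1:16, nrow = 4)   # small stand-in for the real data
small_means <- block_mean(m, 2)
print(small_means)
```

For a data frame, `as.matrix()` would be needed first, and for the full dataset the same function could be applied to one band of 60 rows at a time.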
This was answered last month: http://tolstoy.newcastle.edu.au/R/e4/help/08/08/19091.html
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Gabor - thanks for your suggestion... I had checked the previous post, but I found (as a new user of R) this approach to be too complicated and I had problems gaining the correct output values. If there is a simpler way of doing this, then please feel free to let me know. Dylan - thanks, your approach is a good start. In answer to your questions, my data are 43200 columns and 16800 rows as a data frame - I will probably have to read the dataset in segments though, as it won't fit into the memory! I've been able to follow your example - how would I be able to apply this technique for finding the average of each 60 x 60 block? Any other suggestions are of course welcome! Many thanks again, Steve
Here is a way to do it by reading in 60 lines at a time and computing the means:
# create some test data: 16800 x 360 values
n <- 360
x <- matrix(runif(n * 16800), nrow = 16800)
# cat() writes x column by column, so each run of 16800 values acts as one 'line'
cat(x, file = "/tempxx.txt")
# now process the data 60 lines at a time, averaging each 60x60 block
result <- matrix(0, nrow = n / 60, ncol = 280)
nextLine <- 1  # next output row in the result
# create a list of indices to use to partition the columns into groups of 60
colIndex <- split(seq(16800), (seq(16800) - 1) %/% 60)
input <- file("/tempxx.txt", "r")
while (TRUE) {
    # use 'scan' to read in 60 lines' worth of values at a time
    block <- scan(input, what = 0, n = 60 * 16800)
    if (length(block) != 60 * 16800) break  # exit if done
    # convert to a 60 x 16800 matrix, one input line per row
    block <- matrix(block, nrow = 60, byrow = TRUE)
    # compute the mean of each 60x60 block and store it
    result[nextLine, ] <- sapply(colIndex, function(.blk) {
        mean(block[, .blk])
    })
    nextLine <- nextLine + 1
}
close(input)  # release the file connection
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
I'm not sure I understand your problem exactly, but if you are looking for a recursive algorithm that updates the average one record at a time, consider: y[k] = y[k-1] + (x[k] - y[k-1])/k, where y[0] = 0 and k = 1, 2, ... At each stage, y[k] = (x[1] + ... + x[k])/k.
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
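The running-average recurrence y[k] = y[k-1] + (x[k] - y[k-1])/k can be written out in a few lines of R. This is only an illustrative sketch: `running_mean` is a hypothetical name, and the input vector is made up for the example. The result can be compared against the direct formula (x[1] + ... + x[k])/k, which in R is `cumsum(x) / seq_along(x)`.

```r
# Hypothetical helper (not from the thread): running mean via the recurrence
# y[k] = y[k-1] + (x[k] - y[k-1]) / k, starting from y[0] = 0.
running_mean <- function(x) {
  y <- numeric(length(x))
  prev <- 0                      # y[0]
  for (k in seq_along(x)) {
    prev <- prev + (x[k] - prev) / k
    y[k] <- prev
  }
  y
}

x <- c(2, 4, 9, 1)               # made-up example values
y <- running_mean(x)
print(y)
print(cumsum(x) / seq_along(x))  # direct formula, for comparison
```

The recurrence only needs the previous average and the count, so the records never have to be held in memory together.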
Hi Steve, You probably want to check out ?by or ?aggregate, maybe using (rownames(df) %/% 60) : (colnames(df) %/% 60) as your index variable. --Adam
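A small sketch of the index-variable idea, under some assumptions: since rownames() and colnames() return character strings, `%/%` cannot be applied to them directly, so this version builds the block indices from row() and col() on a matrix instead, and it uses tapply() rather than by() or aggregate(). The names `row_block`, `col_block`, and `b` are hypothetical, and a 4 x 4 matrix with 2 x 2 blocks stands in for the real data and 60 x 60 blocks.

```r
m <- matrix(1:16, nrow = 4)   # small stand-in; use as.matrix(df) for a data frame
b <- 2                        # block size (60 in the original question)

# block index of every cell: 0 for the first b rows/columns, 1 for the next b, ...
row_block <- (row(m) - 1) %/% b
col_block <- (col(m) - 1) %/% b

# average all cells that share the same (row block, column block) pair
block_means <- tapply(m, list(row_block, col_block), mean)
print(block_means)
```

This keeps the whole matrix plus two index matrices of the same size in memory, so for the full 43200-column dataset it would probably have to be applied to one segment at a time.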