Dear R-help,
I have a loop, which is set to take about 26 hours to run at the rate it's going
- this is ridiculous and I really need your help to find a more efficient way of
loading up my array gpcc.array:
#My data is stored in a table format with all the data in one long column
#running through every longitude, for every latitude, for every year. The
#original data is stored as gpcc.data2 where dim(gpcc.data2) = [476928,5] where
#the 5th column is the data:
#make the array in the format I need [longitude,latitude,years]
gpcc.array <- array(NA, c(144,72,46))
n <- 0
for(k in 1:46){
  for(j in 1:72){
    for(i in 1:144){
      n <- n+1
      gpcc.array[i,j,k] <- gpcc.data2[n,5]
      print(j)
    }
  }
}
So it runs through all the longs for every lat for every year - which is the
order the data is running down the column in gpcc.data2 so n increases by 1 each
time and each data point is pulled off....
It needs to be a lot quicker, I'd appreciate any ideas!
Many thanks for taking time to read this,
Jenny Barnes
~~~~~~~~~~~~~~~~~~
Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking
Surrey
RH5 6NT
01483 204149
07916 139187
Web: http://climate.mssl.ucl.ac.uk
loop is going to take 26 hours - needs to be quicker!
7 messages · Jenny Barnes, Duncan Murdoch, Marc Schwartz +3 more
On 12/14/2006 7:56 AM, Jenny Barnes wrote:
I think the loop above is equivalent to

gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))

which would certainly be a lot quicker. You should check that the values are loaded in the right order (probably on a smaller example!). If not, you should change the order of indices when you create the array, and use the aperm() function to get them the way you want afterwards.

Duncan Murdoch
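A check on a smaller example, as suggested above, could look something like the sketch below. The toy dimensions (4, 3, 2) and the stand-in object `small` are made up for illustration; only its fifth column plays the role of gpcc.data2[,5].

```r
# Toy sizes standing in for 144 x 72 x 46 (assumed for illustration only)
nlon <- 4; nlat <- 3; nyr <- 2

# Hypothetical stand-in for gpcc.data2: five columns, data in column 5
small <- matrix(0, nrow = nlon * nlat * nyr, ncol = 5)
small[, 5] <- seq_len(nrow(small))

# The original triple loop, without the print() call
loop.result <- array(NA, c(nlon, nlat, nyr))
n <- 0
for (k in 1:nyr) {
  for (j in 1:nlat) {
    for (i in 1:nlon) {
      n <- n + 1
      loop.result[i, j, k] <- small[n, 5]
    }
  }
}

# The one-line version: array() fills with the first index varying fastest
array.result <- array(small[, 5], c(nlon, nlat, nyr))

# Compare the two results
same <- isTRUE(all.equal(loop.result, array.result))
print(same)

# If the column instead ran through years fastest and longitudes slowest,
# fill in that order and permute back to [longitude, latitude, year]
reordered <- aperm(array(small[, 5], c(nyr, nlat, nlon)), c(3, 2, 1))
print(dim(reordered))
```

If the comparison comes out TRUE on the toy data, the same one-line call should be safe on the full column, provided the real data really does run through longitude fastest and year slowest.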
I don't know if it is faster, but adding three columns to gpcc.data2, one for longitude, one for latitude and one for year (using rep(), as they are in sequence), and then using reshape() might be faster?
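One possible reading of the rep() idea is sketched below. Note that it swaps in matrix indexing in place of reshape(), so it is a different technique from the one named above. The toy sizes and the names `dat`, `lon`, `lat`, `yr` and `value` are assumptions for illustration.

```r
# Toy sizes, not the real 144 x 72 x 46 (assumed for illustration)
nlon <- 4; nlat <- 3; nyr <- 2
value <- rnorm(nlon * nlat * nyr)   # hypothetical stand-in for gpcc.data2[, 5]

# Index columns built with rep(): longitude varies fastest, year slowest
dat <- data.frame(
  lon   = rep(seq_len(nlon), times = nlat * nyr),
  lat   = rep(rep(seq_len(nlat), each = nlon), times = nyr),
  yr    = rep(seq_len(nyr), each = nlon * nlat),
  value = value
)

# Fill the array by indexing with a three-column matrix of (i, j, k) positions
arr <- array(NA_real_, c(nlon, nlat, nyr))
arr[cbind(dat$lon, dat$lat, dat$yr)] <- dat$value

# Compare with the direct array() call on the same column
matches <- isTRUE(all.equal(arr, array(value, c(nlon, nlat, nyr))))
print(matches)
```

Explicit index columns cost more memory than the direct array() call, but they would still work if the rows were not sorted, or if some longitude/latitude/year combinations were missing.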
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
Biology (UCT)
Department of Conservation Ecology and Entomology
University of Stellenbosch
Matieland 7602
South Africa
Tel: +27 - (0)72 808 2975 (w)
Fax: +27 - (0)86 516 2782
Fax: +27 - (0)21 808 3304 (w)
Cell: +27 - (0)83 9479 042
email: RKrug at sun.ac.za
Rainer at krugs.de
On Thu, 2006-12-14 at 12:56 +0000, Jenny Barnes wrote:
Take a "whole object" approach to this problem. You are also wasting a lot of time by printing the values of 'j' in the loop.
gpcc.data2 <- matrix(rnorm(476928 * 5), ncol = 5)
dim(gpcc.data2)
[1] 476928 5
str(gpcc.data2)
num [1:476928, 1:5] 2.7385 -0.0438 -0.1084 0.8768 -1.0024 ...
system.time(gpcc.array <- array(gpcc.data2[, 5],
                                dim = c(144, 72, 46)))
[1] 0.024 0.026 0.078 0.000 0.000

You should verify the order of the values and adjust the indices accordingly, if the above results in an out of order array.

HTH,

Marc Schwartz
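To get a feel for the difference between the element-by-element loop and the "whole object" call, a rough comparison along these lines may help. This is only a sketch: it assumes gpcc.data2 is a data frame (the thread does not say), uses two "years" to keep the run short, and the names `toy` and `fill.by.loop` are made up. Timings will vary by machine, so none are quoted here.

```r
# Full grid, but only two "years" so the loop finishes quickly (an assumption)
nlon <- 144; nlat <- 72; nyr <- 2
toy <- as.data.frame(matrix(rnorm(nlon * nlat * nyr * 5), ncol = 5))

# The original approach, wrapped in a hypothetical helper and without print()
fill.by.loop <- function(d) {
  out <- array(NA_real_, c(nlon, nlat, nyr))
  n <- 0
  for (k in seq_len(nyr)) {
    for (j in seq_len(nlat)) {
      for (i in seq_len(nlon)) {
        n <- n + 1
        out[i, j, k] <- d[n, 5]   # one data frame lookup per element
      }
    }
  }
  out
}

time.loop  <- system.time(a1 <- fill.by.loop(toy))
time.array <- system.time(a2 <- array(toy[, 5], dim = c(nlon, nlat, nyr)))

cat("loop:   ", time.loop[["elapsed"]], "s\n")
cat("array():", time.array[["elapsed"]], "s\n")

# Both routes should give the same array
agree <- isTRUE(all.equal(a1, a2))
print(agree)
```

Adding print(j) back inside the innermost loop would make the loop slower still, since it writes one line to the console per element.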
What about gpcc.array <- array(gpcc.data2[,5], dim=c(144,72,46))
=================================
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
Hi

If I understand correctly, you have one column you need to reformat into an array. An array is basically a vector with a dim attribute. Therefore, if your data are properly sorted, you could just use

gpcc.array <- array(gpcc.data2[,5], c(144,72,46))

to reformat column 5 of your data frame. But you should be 100% sure you really want an array and not some other data form.

HTH
Petr
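The point that an array is a vector with a dim attribute can be seen in a few lines. This is a minimal sketch on made-up data (the vector 1:24 and the 4 x 3 x 2 shape are arbitrary choices).

```r
x <- 1:24                  # a plain vector
dim(x) <- c(4, 3, 2)       # attaching a dim attribute turns it into a 4 x 3 x 2 array

print(is.array(x))
print(attributes(x))

# Element [i, j, k] sits at position i + (j - 1) * 4 + (k - 1) * 12 of the vector,
# so [2, 3, 1] is position 2 + 8 + 0 = 10
print(x[2, 3, 1])

# Same object as the one built with array()
same.as.array <- identical(x, array(1:24, c(4, 3, 2)))
print(same.as.array)
```

This is why no loop is needed: the data column is already stored in the order the array expects, and only the dimensions have to be declared.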
Petr Pikal petr.pikal at precheza.cz
David Barron wrote:
What about gpcc.array <- array(gpcc.data2[,5], dim=c(144,72,46))
I guess this will be slightly faster than my suggestion :-) ?
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
Biology (UCT)
Department of Conservation Ecology and Entomology
University of Stellenbosch
Matieland 7602
South Africa
Tel: +27 - (0)72 808 2975 (w)
Fax: +27 - (0)86 516 2782
Fax: +27 - (0)21 808 3304 (w)
Cell: +27 - (0)83 9479 042
email: RKrug at sun.ac.za
Rainer at krugs.de