Hi,
The purpose is to generate weather information(tmin,tmim,prcp) for every 4
KM (every 4KM there is a point represent that grid) for 32 years throughout
Contingent U.S.
I want to run several states simultaneously which I want it run them
parallel.
Thanks,
Ping
On Sun, Feb 2, 2014 at 1:45 PM, Julian Burgos <julian at hafro.is> wrote:
A clear explanation of what you are trying to do would help. I cannot
figure out why you need to generate such a huge amount of files! Writing
data into a disk is a very slow step.
Hi r-sig-geo,
I am using the following script to extraction weather information for
point (from a rasterized shapefile for the COUS) which generate about
460,000 files:
Originally I read the coordinates from a shapefile:
Library(rgdal)
pl <- readOGR(".","US_2_points")
then I used the following code to extract the information from three
NetCDF
files that contains the tmin,tmax and pr for each year :
for (year in 1981:2012)
{
#get the combined netCDF file
tminfile <- paste("tmin","_",year,".nc",sep='')
b_tmin <- brick(tminfile,varname='tmin')
pptfile <- paste("ppt","_",year,".nc",sep='')
b_ppt <- brick(pptfile,varname='ppt')
tmaxfile <- paste("tmax","_",year,".nc",sep='')
b_tmax <- brick(tmaxfile,varname='tmax')
#Get the first year here!!!
print(paste("processing year :",year,sep=''))
for(l in 1:length(pl))
{
v <- NULL
#generate file with the name convention with
t_n(latitude)w(longitude).txt, 5 digits after point should be work
filename <-
paste("e:/PRISM/US3/N",round(coordinates(pl[l,])[2],5),"W",abs(round(coordinates(pl[l,])[1],5)),".wth",sep='')
print(paste("processing file :",filename,sep=''))
tmin <-
as.numeric(round(extract(b_tmin,coordinates(pl[l,])),digits=1))
tmax <-
as.numeric(round(extract(b_tmax,coordinates(pl[l,])),digits=1))
ppt <-
as.numeric(round(extract(b_ppt,coordinates(pl[l,])),digits=2))
v <- cbind(tmax,tmin,ppt)
tablename <- c("tmin","tmax","ppt")
v <- data.frame(v)
colnames(v) <- tablename
v["default"] <- 0
v["year"] <- year
date <-
seq(as.Date(paste(year,"/1/1",sep='')),as.Date(paste(year,"/12/31",sep='')),"days")
month <- as.numeric(substr(date,6,7))
day <- as.numeric(substr(date,9,10))
v["month"] <- month
v["day"] <- day
v <- v[c("year","month","day","default","tmin","tmax","ppt")]
#write into a file with the fixed wide format
if(file.existes(filename))
{
write.fwf(x=v,filename,append=TRUE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6))
write.fwf(x=v,filename,append=FALSE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6))
}
I found it will takes weeks to finish, then I divide the points from the
shape file into each State(to generate a shape file for each state for
points), and using the following function:
weather4Point <- function(state)
{
#get the point file for the state
setwd("c:/PRISM/US1Points/")
pl <- readOGR(".",paste("points4_",state,sep=''))
setwd("e:/PRISM/NetCDF/")
for (year in 1981:2012)
{
#get the combined netCDF file
tminfile <- paste("tmin","_",year,".nc",sep='')
b_tmin <- brick(tminfile,varname='tmin')
pptfile <- paste("ppt","_",year,".nc",sep='')
b_ppt <- brick(pptfile,varname='ppt')
tmaxfile <- paste("tmax","_",year,".nc",sep='')
b_tmax <- brick(tmaxfile,varname='tmax')
#Get the first year here!!!
print(paste("processing year :",year,sep=''))
for(l in 1:length(pl))
{
v <- NULL
#generate file with the name convention with
t_n(latitude)w(longitude).txt, 5 digits after point should be work
filename <-
paste("e:/PRISM/US3/N",round(coordinates(pl[l,])[2],5),"W",abs(round(coordinates(pl[l,])[1],5)),".wth",sep='')
print(paste("processing file :",filename,sep=''))
tmin <-
as.numeric(round(extract(b_tmin,coordinates(pl[l,])),digits=1))
tmax <-
as.numeric(round(extract(b_tmax,coordinates(pl[l,])),digits=1))
ppt <-
as.numeric(round(extract(b_ppt,coordinates(pl[l,])),digits=2))
v <- cbind(tmax,tmin,ppt)
tablename <- c("tmin","tmax","ppt")
v <- data.frame(v)
colnames(v) <- tablename
v["default"] <- 0
v["year"] <- year
date <-
seq(as.Date(paste(year,"/1/1",sep='')),as.Date(paste(year,"/12/31",sep='')),"days")
month <- as.numeric(substr(date,6,7))
day <- as.numeric(substr(date,9,10))
v["month"] <- month
v["day"] <- day
v <- v[c("year","month","day","default","tmin","tmax","ppt")]
v
#print(paste(v), zero.print = ".")
#write into a file with the APEX format
if(file.existes(filename))
{
write.fwf(x=v,filename,append=TRUE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6))
write.fwf(x=v,filename,append=FALSE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6))
}
}
}
}
Then using the following code by opening several R session to make this
process "parallel".
States <- c("AL","AZ","AR","CO","CT","DC","FL","GA","ID","IA",
"IN","KS","KY","LA","MA","ME","MI","MN","MT","NC")
for (state in States)
{
weather4Point(state)
}
However, I still feel slow when I plan to do for the whole US. I am
here is there a better solution to speed up this process(any possibility
to
employ parallel programming in R) ?
Environment information:
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Memory: 16 GB
Processor: Intel(R) Core(tm) i7-3770 CPU @ 3.4Ghz 3.4Ghz
Looking forward to any suggestions?
Regards,
Ping
[[alternative HTML version deleted]]