Melt and Rbind/Rbindlist
Hello Mr. Holtman,

Thank you very much for your reply and suggestion. This is what each Year's data looks like;

tmp1 <- structure(list(
    FIPS = c(1001L, 1003L, 1005L),
    X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
    X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
    X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
    X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
    X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
    X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
  .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3",
             "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
  class = "data.frame", row.names = c(NA, -3L))

The data are in 3-hour blocks for every day, by US FIPS code, from 2026 to 2045;
each year's data is in a different csv file. My goal is to compute the max, min,
and mean by week and month. I used the following code to assign week
numbers to the observations:
nweek <- function(x, format="%Y-%m-%d", origin){
  if(missing(origin)){
    as.integer(format(strptime(x, format=format), "%W"))
  }else{
    x <- as.Date(x, format=format)
    o <- as.Date(origin, format=format)
    w <- as.integer(format(strptime(x, format=format), "%w"))
    2 + as.integer(x - o - w) %/% 7
  }
}
Then the following;
for (i in filelist) {
  nweek(tmp2$date)
}
for (i in filelist) {
  nweek(dates, origin="2026-01-01")
}
for (i in filelist) {
  wkn<-nweek(tmp2$date)
}
Is this efficient? Thank you so much again. I really appreciate it.

Sincerely,
Shouro
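
On the efficiency question: none of the three loops above uses i, so each pass repeats the same call on the same tmp2. Because as.Date() and format() are vectorised, a single call over the whole column should be enough. Below is a minimal base-R sketch, assuming the column names always follow the X2026.01.01.1 pattern shown in the dput above. The helper name parse_block_date is hypothetical and not part of the original code.

```r
# Hypothetical helper: turn names like "X2026.01.01.3" into Date values.
# Assumes characters 2-11 always hold the date as YYYY.MM.DD.
parse_block_date <- function(variable) {
  as.Date(substr(as.character(variable), 2, 11), format = "%Y.%m.%d")
}

# A few column names in the same pattern as the sample data.
variable <- c("X2026.01.01.1", "X2026.01.01.6", "X2026.01.12.2")
dates <- parse_block_date(variable)

# One vectorised call assigns a week number to every observation.
# "%W" is the week of the year, counting Monday as the first day of the week.
week <- as.integer(format(dates, "%W"))
month <- as.integer(format(dates, "%m"))
```

The resulting week and month vectors can be stored as columns of the melted table and used as grouping variables, with no loop over filelist needed for this step.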
On Sun, Feb 1, 2015 at 1:22 AM, jim holtman <jholtman at gmail.com> wrote:
It would have been nice if you had at least supplied a subset (~10 lines)
from a couple of files so we could see what the data looks like and test
out any solution. Since you are using 'data.table', you should probably
also use 'fread' for reading in the data. Here is a possible approach of
reading the data into a list and then creating a single, large data.table:
-------
myDTs <- lapply(filelist, function(.file) {
  tmp1 <- fread(.file, sep=",")
  tmp2 <- melt(tmp1, id="FIPS")
  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
  tmp2 # return value
})
bigDT <- rbindlist(myDTs) # rbind all the data.tables together
# then you should be able to do:
mean.temp <- bigDT[, list(temp.mean=lapply(.SD, mean)),
                   by=c("FIPS","year","month"), .SDcols=c("temp")]
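
The read, melt, rbindlist and aggregate sequence above can be tried without any files. The sketch below uses two small in-memory tables with made-up values in place of the yearly csv files (real code would call fread() on each entry of filelist), and it assumes the melted value column is named temp via value.name. It also computes min and max alongside the mean, which is one possible way to get all three statistics in a single pass.

```r
library(data.table)

# Two small in-memory tables standing in for two yearly csv files.
# The values are made up for illustration only.
yearly <- list(
  data.table(FIPS = c(1001L, 1003L),
             X2026.01.01.1 = c(280, 282), X2026.01.01.2 = c(284, 286)),
  data.table(FIPS = c(1001L, 1003L),
             X2027.02.01.1 = c(270, 272), X2027.02.01.2 = c(274, 276))
)

# Reshape each table to long format and derive year and month
# from the column names (kept as character, not factor).
myDTs <- lapply(yearly, function(dt) {
  long <- melt(dt, id.vars = "FIPS", variable.name = "variable",
               value.name = "temp", variable.factor = FALSE)
  long[, year := as.integer(substr(variable, 2, 5))]
  long[, month := as.integer(substr(variable, 7, 8))]
  long
})

# Stack all years into one data.table.
bigDT <- rbindlist(myDTs)

# Mean, min and max per FIPS code, year and month.
stats <- bigDT[, .(temp.mean = mean(temp),
                   temp.min = min(temp),
                   temp.max = max(temp)),
               by = .(FIPS, year, month)]
```

Grouping by week instead would only need a week column added in the same lapply() step and listed in by.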
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta <shouro at gmail.com> wrote:
I have climate data for 20 years for US counties (FIPS) in csv format, each
file represents one year of data. I have extracted the data and reshaped
the yearly data files using melt();

for (i in filelist) {
  tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
  tmp2 <- melt(tmp1, id="FIPS")
  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
}

Should I *rbind* in the loop here as I have the memory? So, the file (i) tmp2 looks like this:

FIPS  temp      year  month  date
1001  276.7936  2045  1      1/1/2045
1003  276.7936  2045  1      1/1/2045
1005  279.6452  2045  1      1/1/2045
1007  276.7936  2045  1      1/1/2045
1009  272.3748  2045  1      1/1/2045
1011  279.6452  2045  1      1/1/2045

My goal is to calculate the mean by FIPS code by month/week; however, when I
use the following code, I get a NULL value.
mean.temp<- for (i in filelist) {tmp2[, list(temp.mean=lapply(.SD, mean)),
by=c("FIPS","year","month"), .SDcols=c("temp")]}
This works fine for individual years but not with *for (i in filelist)*. What
am I doing wrong? Can I include an rbind/rbindlist in the loop to make a big
data.frame? Any suggestions will be highly appreciated. Thank you.
Sincerely,
Shouro
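
On the NULL value: in R a for loop itself always evaluates to NULL (invisibly), so assigning the loop to mean.temp discards whatever the body computed on each pass. The base-R sketch below illustrates the difference with made-up vectors standing in for the yearly tables; the names inputs, from_loop and from_lapply are hypothetical.

```r
# Made-up stand-ins for the per-file data.
inputs <- list(a = c(1, 2, 3), b = c(10, 20))

# Assigning the loop itself keeps nothing: a for loop evaluates to NULL.
from_loop <- for (x in inputs) mean(x)

# lapply() returns one result per element, which can then be combined.
from_lapply <- lapply(inputs, mean)
```

Collecting per-file results with lapply() in this way gives a list that rbind or rbindlist can stack afterwards.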
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.