Melt and Rbind/Rbindlist

Hello Mr. Holtman,

Thank you very much for your reply and suggestion. This is what each year's
data looks like:

tmp1 <- structure(list(
    FIPS = c(1001L, 1003L, 1005L),
    X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
    X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
    X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
    X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
    X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
    X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
  .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3",
    "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
  class = "data.frame", row.names = c(NA, -3L))

The data is in 3-hour blocks for every day, by US FIPS code, from 2026 to
2045; each year's data is in a different CSV file. My goal is to compute the
max, min, and mean by week and month. I used the following code to assign
week numbers to the observations:

nweek <- function(x, format="%Y-%m-%d", origin){
    if(missing(origin)){
        as.integer(format(strptime(x, format=format), "%W"))
    }else{
        x <- as.Date(x, format=format)
        o <- as.Date(origin, format=format)
        w <- as.integer(format(strptime(x, format=format), "%w"))
        2 + as.integer(x - o - w) %/% 7
    }
}
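
When no origin is given, nweek() reduces to a single vectorised format()
call, so it can be applied to a whole date column at once rather than inside
a loop. A minimal sketch of that branch, assuming made-up example dates (the
`dates` and `wk` names are hypothetical). The format string has to match how
the dates are stored; dates written like 1/1/2045 would need
format="%m/%d/%Y" instead.

```r
# Made-up example dates in the default %Y-%m-%d layout
dates <- c("2026-01-01", "2026-01-05", "2026-01-12")

# "%W" is the week of the year with Monday as the first day of the week;
# days before the first Monday of the year fall in week 0
wk <- as.integer(format(as.Date(dates, format = "%Y-%m-%d"), "%W"))
wk
```

2026-01-01 is a Thursday, so it lands in week 0, and the first Monday
(2026-01-05) starts week 1.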

Then I ran the following:

for (i in filelist) {
nweek(tmp2$date)
}
for (i in filelist) {
nweek(dates, origin="2026-01-01")
}
for (i in filelist) {
wkn<-nweek(tmp2$date)
}
Is this efficient? Thank you so much again. I really appreciate it.

Sincerely,

Shouro

It would have been nice if you had at least supplied a subset (~10 lines)
from a couple of files so we could see what the data looks like and test
out any solution. Since you are using 'data.table', you should probably
also use 'fread' for reading in the data.  Here is a possible approach of
reading the data into a list and then creating a single, large data.table:

-------
library(data.table)

myDTs <- lapply(filelist, function(.file) {
  # check.names=TRUE gives the "X2026.01.01.1" style names that read.csv
  # produced; the substr() positions below assume that layout
  tmp1 <- fread(.file, sep=",", check.names=TRUE)
  # name the melted value column "temp" so it can be used below
  tmp2 <- melt(tmp1, id.vars="FIPS", value.name="temp")
  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
  tmp2  # return value
})

bigDT <- rbindlist(myDTs)  # rbind all the data.tables together

# then you should be able to do:

mean.temp <- bigDT[, list(temp.mean=mean(temp)),
       by=c("FIPS","year","month")]
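
The same approach can be tried without any files. The sketch below is only
an illustration under stated assumptions: two small in-memory tables with
made-up values (y2026, y2027) stand in for the yearly CSVs, reshape_year()
is a hypothetical helper, and the column names follow the X2026.01.01.1
pattern from the sample data. It computes the monthly mean, min, and max,
plus a weekly mean using the "%W" week number.

```r
library(data.table)

# Two toy "yearly files" (made-up values), wide like the real CSVs
y2026 <- data.table(FIPS = c(1001L, 1003L),
                    X2026.01.01.1 = c(285.5, 286.2),
                    X2026.01.01.2 = c(283.5, 285.1),
                    X2026.02.01.1 = c(280.0, 282.6))
y2027 <- data.table(FIPS = c(1001L, 1003L),
                    X2027.01.01.1 = c(284.0, 285.0),
                    X2027.01.01.2 = c(282.0, 283.0),
                    X2027.02.01.1 = c(279.0, 281.0))

# Hypothetical helper: wide -> long, then parse year/month/day
# out of the column names
reshape_year <- function(dt) {
  long <- melt(dt, id.vars = "FIPS", value.name = "temp")
  long[, variable := as.character(variable)]
  long[, `:=`(year  = as.integer(substr(variable, 2, 5)),
              month = as.integer(substr(variable, 7, 8)),
              day   = as.integer(substr(variable, 10, 11)))]
  long
}

# One long table for all years
bigDT <- rbindlist(lapply(list(y2026, y2027), reshape_year))

# Monthly summaries per county
monthly <- bigDT[, .(temp.mean = mean(temp),
                     temp.min  = min(temp),
                     temp.max  = max(temp)),
                 by = .(FIPS, year, month)]

# Week of year (Monday-based, "%W"), then weekly means per county
bigDT[, week := as.integer(format(
  as.Date(sprintf("%04d-%02d-%02d", year, month, day)), "%W"))]
weekly <- bigDT[, .(temp.mean = mean(temp)), by = .(FIPS, year, week)]
```

Doing the aggregation once on the combined table avoids carrying per-file
results around, and the grouping columns keep the years separate.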

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta <shouro at gmail.com> wrote:

I have climate data for 20 years for US counties (FIPS) in csv format, each
file represents one year of data. I have extracted the data and reshaped
the yearly data files using melt():

for (i in filelist) {
  tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
  tmp2 <- melt(tmp1, id="FIPS")
  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
}

Should I *rbind* inside the loop here, given that I have enough memory?
So, the file (i) tmp2 looks like this:

FIPS  temp      year  month  date
1001  276.7936  2045  1      1/1/2045
1003  276.7936  2045  1      1/1/2045
1005  279.6452  2045  1      1/1/2045
1007  276.7936  2045  1      1/1/2045
1009  272.3748  2045  1      1/1/2045
1011  279.6452  2045  1      1/1/2045

My goal is to calculate the mean by FIPS code by month/week; however, when I
use the following code, I get a NULL value.

mean.temp<- for (i in filelist) {tmp2[, list(temp.mean=lapply(.SD, mean),
by=c("FIPS","year","month"), .SDcols=c("temp")]}

This works fine for individual years but not with *for (i in filelist)*. What
am I doing wrong? Can I include an rbind/rbindlist in the loop to make one big
data.frame? Any suggestions will be highly appreciated. Thank you.
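
For context on the NULL: in R a for loop always returns NULL (invisibly), so
assigning the loop itself to a name discards whatever its body computed. A
minimal sketch with made-up numbers (res_loop and res_list are hypothetical
names), with lapply() as one way to collect a result per element instead:

```r
# Assigning a for loop captures the loop's own value, which is NULL
res_loop <- for (i in 1:3) i^2
is.null(res_loop)

# lapply() returns one result per element, collected in a list
res_list <- lapply(1:3, function(i) i^2)
unlist(res_list)
```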

Sincerely,

Shouro

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help