More efficient use of reshape?


Hi all,

I have played a bit with the "reshape" package and function along with
"melt" and "cast", but I feel I still don't have a good handle on how to
use them efficiently. Below I have included an application of "reshape" that
is rather clunky, and I'm hoping someone can offer advice on how to use
reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape  
package, right? And also that the reshape package has been superseded  
by the reshape2 package?
David.

>
> #For this example I am using climate change data available on-line
>
> file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
> clim.data <- read.csv(file, header=TRUE)
>
> library(lubridate)
> library(reshape)
>
> # I've been playing with the lubridate package a bit to work with dates, but
> # as the climate dataset only uses year and month, I have added a "day" to
> # each entry in the "yr_mn" column and then used "dym" from lubridate to
> # generate the POSIXlt formatted dates in a new column, clim.data$date.
>
> clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
> clim.data$date<-dym(clim.data$yr_mn)
>
> # Now to the reshape. The dataframe is in a wide format. The columns GISS,
> # HAD, NOAA, RSS, and UAH are all different sources from which the global
> # temperature anomaly has been calculated since 1880 (actually only 1978 for
> # RSS and UAH). What I would like to do is plot the temperature anomaly vs
> # date and use ggplot to facet by the different data source (GISS, HAD,
> # etc.). Thus I need the data in long format with a date column, a
> # temperature anomaly column, and a data source column. The code below
> # works, but it's really very clunky and I'm sure I am not using these tools
> # as efficiently as I can.
>
> # The varying=list(3:7) specifies the columns in the dataframe that
> # correspond to the sources (GISS, etc.), though in the resulting reshaped
> # dataframe the sources are numbered 1-5, so I have to reassign their names.
> # In addition, the original dataframe has additional data columns I do not
> # want, so after reshaping I create another dataframe with just the columns
> # I need, and then I have to rename them so that I can keep track of what
> # everything is. Whew! Not the most elegant of code.
>
> d<-reshape(clim.data, varying=list(3:7),idvar="date",
> v.names="anomaly",direction="long")
>
> d$time<-ifelse(d$time==1,"GISS",d$time)
> d$time<-ifelse(d$time==2,"HAD",d$time)
> d$time<-ifelse(d$time==3,"NOAA",d$time)
> d$time<-ifelse(d$time==4,"RSS",d$time)
> d$time<-ifelse(d$time==5,"UAH",d$time)
>
> new.data<-data.frame(d$date,d$time,d$anomaly)
> names(new.data)<-c("date","source","anomaly")
>
> I realize this is a mess, though it works. I think with just some help on
> how better to work this example I'll probably get over the learning hump
> and actually figure out how to use these data manipulation functions more
> cleanly.
>
> Any advice or assistance would be appreciated.
> Thanks,
> Nate
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA
I think David was pointing out that reshape() is not a reshape2 function.  It is in the stats package.
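
Since reshape() lives in stats, the wide-to-long step can be done with no add-on package at all. The following is only a sketch under stated assumptions: clim.toy is a made-up two-row stand-in for the real file (its values are invented), and the column names are the ones mentioned in the original post. Passing times= and timevar= labels the sources directly, which avoids the chain of ifelse() calls.

```r
# Where does the name 'reshape' resolve on the search path?
find("reshape")
reshape.home <- environmentName(environment(reshape))

# Hypothetical toy data standing in for clim.data (values are invented).
clim.toy <- data.frame(
  date = as.Date(c("1980-01-01", "1980-02-01")),
  GISS = c(0.30, 0.42),
  HAD  = c(0.11, 0.20),
  NOAA = c(0.25, 0.33),
  RSS  = c(0.05, 0.10),
  UAH  = c(0.02, 0.08),
  AMO  = c(-0.10, -0.12)   # an extra column we do not want to keep
)
sources <- c("GISS", "HAD", "NOAA", "RSS", "UAH")

# Keep only the needed columns, then go wide -> long in one call.
long.base <- reshape(
  clim.toy[, c("date", sources)],
  direction = "long",
  varying   = list(sources),  # the columns to stack
  v.names   = "anomaly",      # name of the stacked value column
  timevar   = "source",       # name of the label column
  times     = sources,        # labels to use instead of 1..5
  idvar     = "date"
)
rownames(long.base) <- NULL
long.base
```

The result has one row per date and source, with columns date, source and anomaly, so no renaming step is needed afterwards.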

I am not sure exactly what you are doing, but perhaps something along the lines of

library(reshape2)
mm <- melt(clim.data, id = c("yr_frac", "yr_mn", "AMO", "NINO34", "SSTA"))

is a start?
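
To make the shape of the result concrete, here is a small sketch of the same idea on made-up data. The data frame clim.toy and its values are invented for illustration; only the column names come from the original post. Naming both id.vars and measure.vars means any other column is dropped, and variable.name / value.name give the output columns their final names in the same call.

```r
library(reshape2)

# Hypothetical toy data standing in for clim.data (values are invented).
clim.toy <- data.frame(
  date = as.Date(c("1980-01-01", "1980-02-01")),
  GISS = c(0.30, 0.42),
  HAD  = c(0.11, 0.20),
  NOAA = c(0.25, 0.33),
  RSS  = c(0.05, 0.10),
  UAH  = c(0.02, 0.08),
  AMO  = c(-0.10, -0.12)   # neither id nor measure, so melt() drops it
)
sources <- c("GISS", "HAD", "NOAA", "RSS", "UAH")

long.melt <- melt(
  clim.toy,
  id.vars       = "date",
  measure.vars  = sources,
  variable.name = "source",   # label column, a factor with the source names
  value.name    = "anomaly"   # stacked temperature anomalies
)
long.melt
```

A data frame in this form can be handed to ggplot2 and faceted by source.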

I also don't think that the more recent versions of ggplot2 automatically attach reshape2, so it may be that you are working with a relatively old installation of ggplot2 and reshape?

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.2.0    directlabels_2.9   RColorBrewer_1.0-5 gridExtra_0.9.1    stringr_0.6.2     
[6] scales_0.2.3       plyr_1.8           reshape2_1.2.1     ggplot2_0.9.3     

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0     gtable_0.1.2     labeling_0.1    
[6] MASS_7.3-22      munsell_0.4      proto_0.3-9.2    tools_2.15.2    
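
The difference between "attached" and "loaded via a namespace (and not attached)" in the output above can be checked directly. This is just a sketch using the tools package as a stand-in, on the assumption that it has not been attached in the session; the same two checks apply to reshape2 once ggplot2 is loaded.

```r
# Load a namespace without attaching the package to the search path.
loadNamespace("tools")

# Loaded: the namespace is available to packages that import from it.
is.loaded <- "tools" %in% loadedNamespaces()

# Attached: the package's exports are visible from the command line.
is.attached <- "package:tools" %in% search()

c(loaded = is.loaded, attached = is.attached)
```

A package that is loaded but not attached can still be used with the :: operator, for example reshape2::melt().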

John Kane
Kingston ON Canada
-----Original Message-----
From: natemiller77 at gmail.com
Sent: Thu, 13 Dec 2012 09:58:34 -0800
To: dwinsemius at comcast.net
Subject: Re: [R] More efficient use of reshape?

Sorry David,

In my attempt to simplify the example and include only the code I felt was
necessary, I left out the loading of ggplot2, which then imports reshape2,
and which was actually used in the code I provided. Sorry for the mistake
and for my misunderstanding of where the reshape function was coming from.
I should have checked that more carefully.

Thanks,
Nate

On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius <dwinsemius at comcast.net> wrote:

On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
