More efficient use of reshape?
5 messages · David Winsemius, Nathan Miller, John Kane
On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
> Hi all, I have played a bit with the "reshape" package and function along with "melt" and "cast", but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of "reshape" that is rather clunky and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.
You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package?
David.
>
> #For this example I am using climate change data available on-line
>
> file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
> clim.data <- read.csv(file, header=TRUE)
>
> library(lubridate)
> library(reshape)
>
> #I've been playing with the lubridate package a bit to work with dates, but
> #as the climate dataset only uses year and month I have
> #added a "day" to each entry in the "yr_mn" column and then used "dym" from
> #lubridate to generate the POSIXlt formatted dates in
> #a new column clim.data$date
>
> clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
> clim.data$date<-dym(clim.data$yr_mn)
>
> #Now to the reshape. The dataframe is in a wide format. The columns GISS,
> #HAD, NOAA, RSS, and UAH are all different sources
> #from which the global temperature anomaly has been calculated since 1880
> #(actually only 1978 for RSS and UAH). What I would like to
> #do is plot the temperature anomaly vs date and use ggplot to facet by the
> #different data source (GISS, HAD, etc.). Thus I need the
> #data in long format with a date column, a temperature anomaly column, and
> #a data source column. The code below works, but it's
> #really very clunky and I'm sure I am not using these tools as efficiently
> #as I can.
>
> #The varying=list(3:7) specifies the columns in the dataframe that
> #correspond to the sources (GISS, etc.), though then in the resulting
> #reshaped dataframe the sources are numbered 1-5, so I have to reassign
> #their names. In addition, the original dataframe has
> #additional data columns I do not want and so after reshaping I create
> #another dataframe with just the columns I need, and
> #then I have to rename them so that I can keep track of what everything is.
> #Whew! Not the most elegant of code.
>
> d<-reshape(clim.data, varying=list(3:7),idvar="date",
> v.names="anomaly",direction="long")
>
> d$time<-ifelse(d$time==1,"GISS",d$time)
> d$time<-ifelse(d$time==2,"HAD",d$time)
> d$time<-ifelse(d$time==3,"NOAA",d$time)
> d$time<-ifelse(d$time==4,"RSS",d$time)
> d$time<-ifelse(d$time==5,"UAH",d$time)
>
> new.data<-data.frame(d$date,d$time,d$anomaly)
> names(new.data)<-c("date","source","anomaly")
>
> I realize this is a mess, though it works. I think with just some help on
> how better to work this example I'll probably get over the learning hump
> and actually figure out how to use these data manipulation functions more
> cleanly.
>
> Any advice or assistance would be appreciated.
> Thanks,
> Nate
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
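For what it's worth, the renumbering and the extra data frame in the quoted code can probably be avoided within stats::reshape() itself. The following is only a sketch on a made-up stand-in for clim.data (the `toy` data frame, its values, and the `extra` column are assumptions for illustration, not the real file): `times` supplies the source labels, `timevar` names the label column, and subsetting before reshaping keeps unwanted columns out.

```r
# Toy stand-in for clim.data; the numbers are invented for illustration only.
toy <- data.frame(
  date  = as.Date(c("2012-01-01", "2012-02-01")),
  GISS  = c(0.10, 0.20),
  HAD   = c(0.30, 0.40),
  NOAA  = c(0.50, 0.60),
  extra = c(9, 9)              # hypothetical column we do not want to keep
)

sources <- c("GISS", "HAD", "NOAA")

long <- reshape(
  toy[, c("date", sources)],   # subset first so unwanted columns never enter
  direction = "long",
  varying   = list(sources),   # the wide columns to stack
  v.names   = "anomaly",       # name of the stacked value column
  timevar   = "source",        # name of the label column (default is "time")
  times     = sources,         # labels to use instead of 1, 2, 3
  idvar     = "date"
)
rownames(long) <- NULL         # drop the generated "id.time" row names

long
```

With the real data, `sources` would presumably be c("GISS", "HAD", "NOAA", "RSS", "UAH"), and the result already has the date, source, and anomaly columns, so the ifelse() relabelling and the renaming step should not be needed.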
I think David was pointing out that reshape() is not a reshape2 function. It is in the stats package.
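A quick way to check this in your own session is to ask R where the visible `reshape` is found. This is a small sketch using base R only; the variable name `where` is just for illustration.

```r
# List the attached packages/environments that provide an object named "reshape".
where <- find("reshape")
where

# Calling it with an explicit namespace removes any doubt about which one runs.
args(stats::reshape)
```

In a session with only the default packages attached, "package:stats" should be among the entries listed.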
I am not sure exactly what you are doing but perhaps something along the lines of
library(reshape2)
mm <- melt(clim.data, id = c("yr_frac", "yr_mn", "AMO", "NINO34", "SSTA"))
is a start?
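To make the melt() idea a bit more concrete, here is a rough sketch on a made-up data frame (`toy` and its values are assumptions, not the real climate file). It assumes reshape2 is installed, and uses the `measure.vars`, `variable.name`, and `value.name` arguments of reshape2's melt() so that the unwanted columns are dropped and the output columns are named in one step.

```r
library(reshape2)

# Toy stand-in for clim.data; the numbers are invented for illustration only.
toy <- data.frame(
  yr_frac = c(2012.04, 2012.13),
  GISS    = c(0.10, 0.20),
  HAD     = c(0.30, 0.40),
  AMO     = c(1, 1)            # a column we do not want in the long data
)

mm <- melt(
  toy,
  id.vars       = "yr_frac",          # column(s) that identify each row
  measure.vars  = c("GISS", "HAD"),   # columns to stack; others are dropped
  variable.name = "source",           # name for the former column names
  value.name    = "anomaly"           # name for the stacked values
)

mm
```

The `source` column comes back as a factor, which is usually what you want for faceting in ggplot2.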
I also don't think that the more recent versions of ggplot2 automatically load reshape2, so it may be that you are working with a relatively old installation of ggplot2 and reshape?
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.2.0 directlabels_2.9 RColorBrewer_1.0-5 gridExtra_0.9.1 stringr_0.6.2
[6] scales_0.2.3 plyr_1.8 reshape2_1.2.1 ggplot2_0.9.3
loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4 digest_0.6.0 gtable_0.1.2 labeling_0.1
[6] MASS_7.3-22 munsell_0.4 proto_0.3-9.2 tools_2.15.2
John Kane
Kingston ON Canada
-----Original Message-----
From: natemiller77 at gmail.com
Sent: Thu, 13 Dec 2012 09:58:34 -0800
To: dwinsemius at comcast.net
Subject: Re: [R] More efficient use of reshape?

Sorry David,

In my attempt to simplify the example and include only the code I felt was necessary, I left out the loading of ggplot2, which then imports reshape2, and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. I should have checked that more carefully.

Thanks,
Nate

On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
>> Hi all, I have played a bit with the "reshape" package and function along with "melt" and "cast", but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of "reshape" that is rather clunky and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.
> You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package? -- David.