
Data Frame housekeeping

5 messages · Scott Hatcher, Jonathan Daily, David Winsemius

#
Since bold text doesn't show up on a text-only mailing list, my suggestion
would be to look into the aggregate function (see ?aggregate).

It would look something like this (assuming your data is in a data frame called mydat):

mydat.new <- aggregate(cbind(STN_ID, YEAR, MM, DAY) ~ ELEM + ?, mydat,
FUN = ?) #this is up to you
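For illustration, here is a minimal runnable version of that call. It assumes a hypothetical toy data frame mydat whose day columns X1 and X2 are already numeric. Grouping only by ELEM and using FUN = mean are placeholder choices, not a recommendation; substitute the grouping variables and summary function you actually need.

```r
# Hypothetical toy data: two months (MM), two elements (ELEM),
# with day columns X1 and X2 already stored as numbers
mydat <- data.frame(
  MM   = c(9, 10, 9, 10),
  ELEM = c(1, 1, 2, 2),
  X1   = c(-233, -3, -339, -207),
  X2   = c(-204, -5, -339, -289)
)

# Formula interface: each cbind() column on the left is summarised
# separately within every group defined by the right-hand side;
# FUN = mean is only a placeholder for your own summary function
mydat.new <- aggregate(cbind(X1, X2) ~ ELEM, data = mydat, FUN = mean)
print(mydat.new)
```

The result has one row per ELEM value, with X1 and X2 replaced by their group means.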

Alternatively, the plyr package is great at transforming data.frames.

On Tue, May 24, 2011 at 3:03 PM, Scott Hatcher
<scott.v.hatcher at gmail.com> wrote:

  
    
#
On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:

            
Assuming this data frame is named 'tst':

library(reshape2)
mtst <- melt(tst[, 1:7], id.vars = 1:4)  # only select the id vars and X1:X3
str(mtst)
#----------
'data.frame':	54 obs. of  6 variables:
  $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
  $ YEAR    : num  1997 1997 1997 1997 1998 ...
  $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
  $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
  $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
  $ value   : chr  "-00233" "-00003" "000025" "000160" ...

dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
#---------
     STN_ID YEAR MM variable      1      2
1  2402594 1997  9       X1 -00233 -00339
2  2402594 1997  9       X2 -00204 -00339
3  2402594 1997  9       X3 -00119 -00343
4  2402594 1997 10       X1 -00003 -00207
5  2402594 1997 10       X2 -00005 -00289
6  2402594 1997 10       X3 -00001 -00278
7  2402594 1997 11       X1 000025 -00242
snipped output
Where is that second column coming from? I don't see it in the data
example.
David Winsemius, MD
West Hartford, CT
#
Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply. Thank you also
for providing something I hoped to gain from joining this mailing list: a
way of discovering incredibly useful packages such as "reshape2", which
you have introduced me to.

I have a follow-up question about your solution (which should produce
exactly what I need):

When I run the cast function to reassemble the data frame, I get:

Error in names(data) <- array_names(res$labels[[2]]) :
   'names' attribute [7] must be the same length as the vector [1]

This suggested to me that the function was returning 7 values where it
expected only 1. To test this, I applied the summary function mean to the
cast, and it ran (although it produced only NAs, because my values were
of class factor). What I don't understand is where these multiple values
are coming from; there should be only a single value corresponding to the
4 id.vars given in the cast function (STN_ID, YEAR, MM, variable).
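One way to test that assumption before casting is to look for id combinations that occur more than once, since cast has to fit every such row into a single cell. The sketch below uses base R only and a hypothetical long-format data frame (named mtst, shaped like melt() output) that deliberately contains one duplicated combination. It also converts the value column to numbers, because mean() returns NA for factors; as.character() comes first because as.numeric() applied directly to a factor returns the internal level codes rather than the printed values.

```r
# Hypothetical long-format data; rows 1 and 4 share the same
# STN_ID/YEAR/MM/variable/ELEM combination by construction
mtst <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = 9,
  ELEM     = c(1, 1, 2, 1),
  variable = factor(c("X1", "X2", "X1", "X1")),
  value    = factor(c("-00233", "-00204", "-00339", "-00233"))
)

# Columns that should identify exactly one value per output cell
ids <- c("STN_ID", "YEAR", "MM", "variable", "ELEM")

# Flag every row whose id combination appears more than once
dup <- duplicated(mtst[ids]) | duplicated(mtst[ids], fromLast = TRUE)
print(mtst[dup, ])

# Convert factor values to numbers via their labels, not their level codes
mtst$value <- as.numeric(as.character(mtst$value))
```

If the printed subset is non-empty, the id variables passed to cast do not uniquely identify each value.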

Thanks again for your help,

Scott Hatcher
On 24/05/2011 5:16 PM, David Winsemius wrote:
#
On May 25, 2011, at 1:16 PM, Scott Hatcher wrote:

            
I used `dcast`.
And I obviously didn't get that error, so there may be a difference
either in the code (which you did not show) or in the data (which you
did not offer in a reproducible form).
If you want further effort, you should address the inadequacies of your
question. It is very possible that you will need to acquaint yourself
with the use of either `dump` or `dput`.
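As a rough sketch of what these produce, here is a hypothetical two-row data frame standing in for 'tst'. dput() prints an R expression that rebuilds the object exactly, so its output can be pasted straight into a message; dump() prints assignment code instead, sent here to the console because file = "".

```r
# Hypothetical stand-in for the poster's data frame
tst <- data.frame(
  STN_ID = 2402594,
  YEAR   = 1997,
  MM     = 9:10,
  ELEM   = 1,
  X1     = c("-00233", "-00003")
)

# Print a structure() expression that recreates 'tst' exactly;
# for a large data set, dput(head(tst, 20)) keeps the message short
dput(tst)

# Print 'tst <- structure(...)' style code; file = "" means the console
dump("tst", file = "")
```

Anyone reading the message can paste the printed structure(...) text back into R and get an identical copy of the data.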