Data Frame housekeeping

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110524/7f921d83/attachment.pl>
My suggestion, since bold doesn't show up in a text only mailing list,
would be to look into the function ?aggregate.

It looks like something like (assuming your data is in a mydat):

mydat.new <- aggregate(cbind(STN_ID, YEAR, MM, DAY) ~ ELEM + ?, mydat,
FUN = ?) #this is up to you

Alternatively, the plyr package is great at transforming data.frames.

On Tue, May 24, 2011 at 3:03 PM, Scott Hatcher
Hello,

I have a large data frame that is organized by date in a peculiar way. I
am seeking advice on how to transform the data into a format that is of
more use to me.

The data is organized as follows:

? ? STN_ID YEAR MM ELEM ? ? ?X1 ? ? ? ? X2 ? ? ? X3 ? ? ? ? X4
X5 ? ? ? ?X6 ? ? ? ? X7
1 ?2402594 1997 ? 9 ? 1 *-00233* *-00204* *-00119* ?-00190 ?-00251
-00243 ?-00249
2 ?2402594 1997 ?10 ?1 ? ? ? ? ? ? ?-00003 ?-00005 ?-00001 ?-00039
-00031 -00036 ?-00033
3 ?2402594 1997 ?11 ?1 ? ? ? ? ? ? ?000025 ?000065 ?000070 ?000069
000115 ?000072 ?000093
4 ?2402594 1997 ?12 ?1 ? ? ? ? ? ? ?000160 ?000114 ?000143 ?000140
000093 ?000068 ?000157
5 ?2402594 1998 ?1 ? ?1 ? ? ? ? ? ? ?000067 ?000095 ?000139 ?000113
000066 ?000081 ?000070
6 ?2402594 1998 ?2 ? ?1 ? ? ? ? ? ? ?000098 ?000102 ?000140 ?000124
000082 ?000111 ?000047
7 ?2402594 1998 ?3 ? ?1 ? ? ? ? ? ? ?-00039 ?-00006 ?000015 ?000015
000016 ?000035 ?000013
8 ?2402594 1998 ?4 ? ?1 ? ? ? ? ? ? ?-00035 ?-00046 ?-00046 ?-00062
-00018 ?-00025 ?-00012
9 ?2402594 1998 ?5 ? ?1 ? ? ? ? ? ? ?000031 ?000011 ?-00005 ?-00061
-00061 ?-00080 ?-00217
10 2402594 1997 ?9 ? ?2 *-00339 ?-00339 ?-00343* ?-00346 ?-00285
-00253 ?-00253
11 2402594 1997 ?10 ?2 ? ? ? ? ? ? ?-00207 ?-00289 ?-00278 ?-00271
-00258 ?-00315 ?-00341
12 2402594 1997 ?11 ?2 ? ? ? ? ? ? ?-00242 ?-00230 ?-00206 ?-00180
-00256 ?-00227 ?-00241
13 2402594 1997 ?12 ?2 ? ? ? ? ? ? ?-00155 ?-00153 ?-00118 ?-00066
-00088 ?-00073 ?-00032
14 2402594 1998 ?1 ? ?2 ? ? ? ? ? ? ?000003 ?-00021 ?-00033 ?-00022
-00014 ?000001 ?000008
15 2402594 1998 ?2 ? ?2 ? ? ? ? ? ? ?000050 ?000077 ?000106 ?000073
000060 ?000060 ?000083
16 2402594 1998 ?3 ? ?2 ? ? ? ? ? ? ?000095 ?000063 ?000030 ?000057
000073 ?000144 ?000090
17 2402594 1998 ?4 ? ?2 ? ? ? ? ? ? ?000128 ?000178 ?000195 ?000157
000160 ?000160 ?000117
18 2402594 1998 ?5 ? ?2 ? ? ? ? ? ? ?000074 ?000064 ?000051 ?000027
000053 ?000063 ?000067

Where "MM" is the month of the year, and ELEM is the variable to which
the values in the X* columns describe (in the actual data there are 31 X
columns, one for each day of the month). The values in bold are the
values that are transferred into the small chart below (which is the
result I hope to get). This is to give a sense of how the data is picked
out of the original data frame.

I would like to organize the data so it looks like this:

? ? ? STN_ID YEAR MM DAY ? ?ELEM1 ELEM2
1 ? ? 2402594 1997 ? 9 ?X1 ? ? ? -00233 -00339
2 ? ? 2402594 1997 ? 9 ?X2 ? ? ? -00204 000077
3 ? ? 2402594 1997 ? 9 ?X3 ? ? ? -00119 000030

Such that I create a new column named "DAY" that is made up of the
numbers following "X" in the original data.frame columns. Also, the ELEM
values are converted to columns and parsed with the ELEM code (in this
case 1 and 2).

I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.

Thanks in advance,
Scott Hatcher

? ? ? ?[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

===============================================
Jon Daily
Technician
===============================================
#!/usr/bin/env outside
# It's great, trust me.

Hello,

I have a large data frame that is organized by date in a peculiar  
way. I
am seeking advice on how to transform the data into a format that is  
of
more use to me.

The data is organized as follows:

    STN_ID YEAR MM ELEM      X1         X2       X3          
X4        X5        X6         X7
1  2402594 1997   9   1 *-00233* *-00204* *-00119*  -00190  -00251   
-00243  -00249
2  2402594 1997  10  1              -00003  -00005  -00001  -00039   
-00031  -00036  -00033
3  2402594 1997  11  1              000025  000065  000070  000069   
000115  000072  000093

Where "MM" is the month of the year, and ELEM is the variable to which
the values in the X* columns describe (in the actual data there are  
31 X
columns, one for each day of the month). The values in bold are the
values that are transferred into the small chart below (which is the
result I hope to get). This is to give a sense of how the data is  
picked
out of the original data frame.
assuming this dataframe is named 'tst':

require(reshape2)
mtst <- melt(tst[, 1:7], id.vars=1:4)  Only select idvars and  X1:X3
  str(mtst)
#----------
'data.frame':	54 obs. of  6 variables:
  $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
  $ YEAR    : num  1997 1997 1997 1997 1998 ...
  $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
  $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
  $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
  $ value   : chr  "-00233" "-00003" "000025" "000160" ...

dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
#---------
     STN_ID YEAR MM variable      1      2
1  2402594 1997  9       X1 -00233 -00339
2  2402594 1997  9       X2 -00204 -00339
3  2402594 1997  9       X3 -00119 -00343
4  2402594 1997 10       X1 -00003 -00207
5  2402594 1997 10       X2 -00005 -00289
6  2402594 1997 10       X3 -00001 -00278
7  2402594 1997 11       X1 000025 -00242
snipped output
I would like to organize the data so it looks like this:

      STN_ID YEAR MM DAY    ELEM1 ELEM2
1     2402594 1997   9  X1       -00233 -00339
2     2402594 1997   9  X2       -00204 000077
3     2402594 1997   9  X3       -00119 000030
Where is that second column coming from. I don't see it in the data  
example
Such that I create a new column named "DAY" that is made up of the
numbers following "X" in the original data.frame columns. Also, the  
ELEM
values are converted to columns and parsed with the ELEM code (in this
case 1 and 2).

I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am  
still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.

Thanks in advance,
Scott Hatcher

	[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply. Also, for 
providing something I hoped would be produced from joining this mailing 
list: a means of discovering incredibly useful packages such as the 
"reshape2" one you have introduced me too.

I have a follow up question to your solution (which should produce 
exactly what I need):

when I run the cast function to reassemble the data frame I get:

Error in names(data) <- array_names(res$labels[[2]]) :
   'names' attribute [7] must be the same length as the vector [1]

This signaled to me that the function was returning 7 values where it 
expected only 1. To test this I applied a summary function "mean" to the 
cast, and the result processed (however it only produced NA's because my 
values were class:factors). What I don't understand is where these 
multiple values are coming from; there should be only a single value 
corresponding to the 4 id.vars given in the cast function 
(STN_ID,YEAR,MM,variable).

Thanks again for your help,

Scott Hatcher
On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:

Hello,

I have a large data frame that is organized by date in a peculiar way. I
am seeking advice on how to transform the data into a format that is of
more use to me.

The data is organized as follows:

    STN_ID YEAR MM ELEM      X1         X2       X3         X4        
X5        X6         X7
1  2402594 1997   9   1 *-00233* *-00204* *-00119*  -00190  -00251  
-00243  -00249
2  2402594 1997  10  1              -00003  -00005  -00001  -00039  
-00031  -00036  -00033
3  2402594 1997  11  1              000025  000065  000070  000069  
000115  000072  000093

Where "MM" is the month of the year, and ELEM is the variable to which
the values in the X* columns describe (in the actual data there are 31 X
columns, one for each day of the month). The values in bold are the
values that are transferred into the small chart below (which is the
result I hope to get). This is to give a sense of how the data is picked
out of the original data frame.
assuming this dataframe is named 'tst':

require(reshape2)
mtst <- melt(tst[, 1:7], id.vars=1:4)  Only select idvars and  X1:X3
 str(mtst)
#----------
'data.frame':    54 obs. of  6 variables:
 $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
 $ YEAR    : num  1997 1997 1997 1997 1998 ...
 $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
 $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
 $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : chr  "-00233" "-00003" "000025" "000160" ...

dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
#---------
    STN_ID YEAR MM variable      1      2
1  2402594 1997  9       X1 -00233 -00339
2  2402594 1997  9       X2 -00204 -00339
3  2402594 1997  9       X3 -00119 -00343
4  2402594 1997 10       X1 -00003 -00207
5  2402594 1997 10       X2 -00005 -00289
6  2402594 1997 10       X3 -00001 -00278
7  2402594 1997 11       X1 000025 -00242
snipped output

I would like to organize the data so it looks like this:

      STN_ID YEAR MM DAY    ELEM1 ELEM2
1     2402594 1997   9  X1       -00233 -00339
2     2402594 1997   9  X2       -00204 000077
3     2402594 1997   9  X3       -00119 000030
Where is that second column coming from. I don't see it in the data 
example
Such that I create a new column named "DAY" that is made up of the
numbers following "X" in the original data.frame columns. Also, the ELEM
values are converted to columns and parsed with the ELEM code (in this
case 1 and 2).

I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.

Thanks in advance,
Scott Hatcher

    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT

Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply. Also, for  
providing something I hoped would be produced from joining this  
mailing list: a means of discovering incredibly useful packages such  
as the "reshape2" one you have introduced me too.

I have a follow up question to your solution (which should produce  
exactly what I need):

when I run the cast function to reassemble the data frame I get:
I used `dcast`.
Error in names(data) <- array_names(res$labels[[2]]) :
 'names' attribute [7] must be the same length as the vector [1]
And I obviously didn't get that error, so there might be a difference  
in either the code (which you did not show), or the data (which you  
did not offer in a reproducible form).
This signaled to me that the function was returning 7 values where  
it expected only 1. To test this I applied a summary function "mean"  
to the cast, and the result processed (however it only produced NA's  
because my values were class:factors). What I don't understand is  
where these multiple values are coming from; there should be only a  
single value corresponding to the 4 id.vars given in the cast  
function (STN_ID,YEAR,MM,variable).
If you want further effort you should address the inadequacies of your  
question. It is very possible that you will need to acquaint yourself  
with the use of either `dump` pr `dput`.
David.
>
> Thanks again for your help,
>
> Scott Hatcher
>
> On 24/05/2011 5:16 PM, David Winsemius wrote:
>>
>> On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
>>
>>> Hello,
>>>
>>> I have a large data frame that is organized by date in a peculiar  
>>> way. I
>>> am seeking advice on how to transform the data into a format that  
>>> is of
>>> more use to me.
>>>
>>> The data is organized as follows:
>>>
>>>    STN_ID YEAR MM ELEM      X1         X2       X3          
>>> X4        X5        X6         X7
>>> 1  2402594 1997   9   1 *-00233* *-00204* *-00119*  -00190   
>>> -00251  -00243  -00249
>>> 2  2402594 1997  10  1              -00003  -00005  -00001   
>>> -00039  -00031  -00036  -00033
>>> 3  2402594 1997  11  1              000025  000065  000070   
>>> 000069  000115  000072  000093
>>>
>>> Where "MM" is the month of the year, and ELEM is the variable to  
>>> which
>>> the values in the X* columns describe (in the actual data there  
>>> are 31 X
>>> columns, one for each day of the month). The values in bold are the
>>> values that are transferred into the small chart below (which is the
>>> result I hope to get). This is to give a sense of how the data is  
>>> picked
>>> out of the original data frame.
>>
>> assuming this dataframe is named 'tst':
>>
>> require(reshape2)
>> mtst <- melt(tst[, 1:7], id.vars=1:4)  Only select idvars and  X1:X3
>> str(mtst)
>> #----------
>> 'data.frame':    54 obs. of  6 variables:
>> $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
>> $ YEAR    : num  1997 1997 1997 1997 1998 ...
>> $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
>> $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
>> $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1  
>> 1 ...
>> $ value   : chr  "-00233" "-00003" "000025" "000160" ...
>>
>> dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
>> #---------
>>    STN_ID YEAR MM variable      1      2
>> 1  2402594 1997  9       X1 -00233 -00339
>> 2  2402594 1997  9       X2 -00204 -00339
>> 3  2402594 1997  9       X3 -00119 -00343
>> 4  2402594 1997 10       X1 -00003 -00207
>> 5  2402594 1997 10       X2 -00005 -00289
>> 6  2402594 1997 10       X3 -00001 -00278
>> 7  2402594 1997 11       X1 000025 -00242
>> snipped output
>>
>>>
>>> I would like to organize the data so it looks like this:
>>>
>>>      STN_ID YEAR MM DAY    ELEM1 ELEM2
>>> 1     2402594 1997   9  X1       -00233 -00339
>>> 2     2402594 1997   9  X2       -00204 000077
>>> 3     2402594 1997   9  X3       -00119 000030
>>
>> Where is that second column coming from. I don't see it in the data  
>> example
>>>
>>> Such that I create a new column named "DAY" that is made up of the
>>> numbers following "X" in the original data.frame columns. Also,  
>>> the ELEM
>>> values are converted to columns and parsed with the ELEM code (in  
>>> this
>>> case 1 and 2).
>>>
>>> I have tried to split apart the columns, transform them, and bind  
>>> them
>>> back together, but my ability to do so just isn't there yet. I am  
>>> still
>>> fairly new to R, and would really appreciate some help in working
>>> towards organizing this data frame.
>>>
>>> Thanks in advance,
>>> Scott Hatcher
>>>
>>>    [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT