Skip to content

Wide to long form conversion

10 messages · Andrew Miles, Dennis Murphy, Jim Lemon +2 more

#
I have some data 'myData' in wide form (attached at the end), and
would like to convert it to long form. I wish to have five variables
in the result:

1) Subj: factor
2) Group: between-subjects factor (2 levels: s / w)
3) Reference: within-subject factor (2 levels: Me / She)
4) F: within-subject factor (2 levels: F1 / F2)
5) J: within-subject factor (2 levels: J1 / J2)

As this is the first time I'm learning such a conversion, could
someone help me out?

Many thanks,
Gang
Group MeF1 MeJ1 SheF1 SheJ1 MeF2 MeJ2 SheF2 SheJ2 Subj
1      s    4    5     6    10    3    6     6     9   S1
2      s    6    5     5     6    4    3     5     6   S2
3      s    7    4     6     5    7    4     5     3   S3
4      s    8    5     8     7    7    1     8     6   S4
5      s   10    6     4     7    9    6     4     6   S5
6      s    5    2     4     7    4    1     4     2   S6
7      s   13    2    10     4   11    2     4     3   S7
8      s    8    1     3    11    6    0     3    10   S8
9      s    6    9     5     8    6    8     5     6   S9
10     s   14    5     6    10   13    5     5    10  S10
11     s   15    2    18     2   14    1    18     2  S11
12     s    6    9     4     9    5   11     3     8  S12
13     s    5    5     0    12    4    3     0     8  S13
14     s    5    6     4     9    4    6     2     6  S14
15     s   14    5    12     3   12    3    11     3  S15
16     s    7    2    11     3    5    2    10     2  S16
17     s    1    7     4     5    1    6     3     5  S17
18     s    6    2     7     4    6    2     7     4  S18
19     s    9    4     8     5   10    4     6     3  S19
20     s    8    2     6     5    9    2     6     4  S20
21     s    6    5     5     7    6    6     5     5  S21
22     s    8    8     3     7    6    7     5     3  S22
23     s   11    4     6     7    1    1     6     4  S23
24     s    6    3     2     4    6    4     2     2  S24
25     s    4    4     6     6    2    3     4     6  S25
26     w    5    9     4     7    3    7     3     5  S26
27     w    7    6     3     5    4    1     0     4  S27
28     w   10    4    14     2    8    4    10     2  S28
29     w    9    7     5     6    8    4     5     3  S29
30     w    9    2     7     5    6    2     6     5  S30
31     w    6    7     6     7    6    5     5     8  S31
32     w    7    6    12     7    6    3    10     7  S32
33     w   12    3     8     9   11    3     4     7  S33
34     w   12    2    10     5    9    2     6     3  S34
35     w    6    3    10     4    5    3     5     3  S35
36     w    9    3     9     9    6    3     7     8  S36
37     w    5   11     7     7    4   11     3     4  S37
38     w    7    4     4     6    7    3     1     5  S38
39     w    6    5     1     8    3    3     0     8  S39
40     w   10    3    10     2    7    3     7     2  S40
41     w    1   11     7     5    1    8     4     3  S41
42     w   10    5     6    10   10    4     3     9  S42
43     w    6    3     9     2    4    2     6     0  S43
44     w    9    5    11     4    5    4     7     3  S44
45     w    8    5     6     3    8    4     2     3  S45
46     w    8    4     8     7    4    1     2     6  S46
47     w   12    2     6     2   10    1     5     2  S47
48     w   10    6     9     8    7    5     7     8  S48
49     w   13    6    15     1   12    4    14     0  S49
50     w    7    8     1    12    4    7     1    11  S50
51     w   12    3     9     4    9    1     7     4  S51
#
Thanks for the pointer! I still couldn't figure out how to convert my
data because the example at stackoverflow seems to have only one
variable (Year) while there are three within-subject variables plus
one between-subjects variable. Any further help?

Gang
On Thu, Oct 6, 2011 at 4:34 PM, Andrew Miles <rstuff.miles at gmail.com> wrote:
#
Hi:
I don't see how you can get all of 3-5 given the way your data is
structured. The problem is that each column contains two of the three
variables you want, but not all three. I can see a way to get

Subj   Group  Ref   Time  F   J
  S1       s      Me     1     4   5
  S1       s      Me     2     3   6
  S1       s     She    1     6  10
  S1       s     She    2     6   9

or an 8 line version with Ref (4 Me, 4 She), Factor (F1, J1, F2, J2)
repeated twice and the appropriate response vector, but not a way
where you have three columns for Ref, F and J. For example, what is
the 'J' for MeF1 or the F for SheJ2?

With that, here are a few stabs using the reshape2 package. The first
step is to do a little renaming of your data frame so that one can use
the colsplit() function to generate a new set of variables.

names(myData) <- c("Group", "Me_F_1",  "Me_J_1",  "She_F_1", "She_J_1",
                   "Me_F_2",  "Me_J_2",  "She_F_2", "She_J_2", "Subj")

library('plyr')
library('reshape2')

# collapses the eight columns to be reshaped into a factor named
# variable with a corresponding variable named value
mData <- melt(myData, id = c('Subj', 'Group'))
head(mData)

# Split the original variables into three new columns, named
# Ref, Var and Time, respectively:
newvars <- colsplit(mData$variable, '_', c('Ref', 'Var', 'Time'))

# Append these to the melted data frame and remove 'variable'
mData2 <- cbind(mData, newvars)[, -3]

# This comes closest to your original intent:
mData3 <- arrange(mData2, Subj, Ref, Var, Time)
head(mData3, 8)

   Subj Group value Ref Var Time
1    S1     s     4  Me   F    1
2    S1     s     3  Me   F    2
3    S1     s     5  Me   J    1
4    S1     s     6  Me   J    2
5    S1     s     6 She   F    1
6    S1     s     6 She   F    2
7    S1     s    10 She   J    1
8    S1     s     9 She   J    2

# Some rearrangements to consider:
mData4 <- cast(mData3, Subj + Group + Ref + Time ~ Var, value_var = 'value')
head(mData4, 4)

  Subj Group Ref Time  F  J
1   S1     s  Me    1  4  5
2   S1     s  Me    2  3  6
3   S1     s She    1  6 10
4   S1     s She    2  6  9

mData5 <- cast(mData3, Subj + Group + Ref + Var ~ Time, value_var = 'value')
head(mData5, 4)

  Subj Group Ref Var  1  2
1   S1     s  Me   F  4  3
2   S1     s  Me   J  5  6
3   S1     s She   F  6  6
4   S1     s She   J 10  9

If you like this one, it's probably a good idea to rename the last two
columns 'Time1' and 'Time2' or something similar.

HTH,
Dennis
On Thu, Oct 6, 2011 at 1:28 PM, Gang Chen <gangchen6 at gmail.com> wrote:
#
On 10/07/2011 07:28 AM, Gang Chen wrote:
Hi Gang,
I don't know whether this is the format you want, but:

library(prettyR)
rep_n_stack(mydata,matrix(c(2,3,6,7,4,5,8,9),nrow=2,byrow=TRUE))

Jim
#
Dennis,

Yes, I was wrong! The variables should be:

1) Subj: factor
2) Group: between-subjects factor (2 levels: s / w)
3) Reference: within-subject factor (2 levels: Me / She)
4) F/J: within-subject factor (2 levels: F / J)
5) Time: within-subject factor (2 levels: 1 / 2)
6) Value

So your mData3 is what I was looking for. Thanks a lot!

Gang
On Thu, Oct 6, 2011 at 9:09 PM, Dennis Murphy <djmuser at gmail.com> wrote:
#
Jim, I really appreciate your help!

I like the power of rep_n_stack, but how can I use rep_n_stack to get
the following result?

  Subj Group value Ref Var Time
1    S1     s     4  Me   F    1
2    S1     s     3  Me   F    2
3    S1     s     5  Me   J    1
4    S1     s     6  Me   J    2
5    S1     s     6 She   F    1
6    S1     s     6 She   F    2
7    S1     s    10 She   J    1
8    S1     s     9 She   J    2
On Fri, Oct 7, 2011 at 7:16 AM, Jim Lemon <jim at bitwrit.com.au> wrote:
#
On Oct 7, 2011, at 7:40 AM, Gang Chen wrote:

            
I was not able to construct a one step solution with `reshape` that  
will contains all the columns. You can do it in about 4 steps by first  
making the data "long" and then adding annotation columns. Using just  
rows 1 and 26 you might get:

reshape(myData[c(1,26), ], idvar=c("Group","Subj"),
        direction="long",
        varying=2:9,
        v.names=c("value") )
         Group Subj time value
s.S1.1      s   S1    1      4
w.S26.1     w  S26    1      5
s.S1.2      s   S1    2      5
w.S26.2     w  S26    2      9
s.S1.3      s   S1    3      6
w.S26.3     w  S26    3      4
s.S1.4      s   S1    4     10
w.S26.4     w  S26    4      7
s.S1.5      s   S1    5      3
w.S26.5     w  S26    5      3
s.S1.6      s   S1    6      6
w.S26.6     w  S26    6      7
s.S1.7      s   S1    7      6
w.S26.7     w  S26    7      3
s.S1.8      s   S1    8      9
w.S26.8     w  S26    8      5

The 'time' variable is not really what you wanted but refers to the  
sequence along the original wide column names
You can add the desired  Ref, Var and Time columms with these  
constructions:

 > str(times<-rep(c(1,2), length=nrow(myData)*8 )  )
  num [1:408] 1 2 1 2 1 2 1 2 1 2 ...
 > str(times<-rep(c("F","J"), each=2, length=nrow(myData)*8 )  )
  chr [1:408] "F" "F" "J" "J" "F" "F" "J" "J" "F" "F" ...
 > str(times<-rep(c("Me","She"), each=4, length=nrow(myData)*8 )  )
  chr [1:408] "Me" "Me" "Me" "Me" "She" "She" "She" "She" ...

longData <- reshape(myData, idvar=c("Group","Subj"),
        ction="long",
        varying=2:9,
        v.names=c("value") )
longData$Time <- rep(c(1,2), length=nrow(myData)*8 )
longData$Var <- rep(c("F","J"), each=2, length=nrow(myData)*8 )
longData$Ref <- rep(c("Me","She"), each=4, length=nrow(myData)*8 )
longData <- longData[order(longData$Subj), ]

Looking at Jim Lemon's response, I think he just misinterpreted the  
structure of your data but gave you a perfectly usable response. You  
could have done much the same thing with a minor modification:

 > str(rep_n_stack(myData,matrix(c(2,3,6,7,4,5,8,9),nrow=1,byrow=TRUE)))
'data.frame':	408 obs. of  4 variables:
  $ Group : Factor w/ 2 levels "s","w": 1 1 1 1 1 1 1 1 1 1 ...
  $ Subj  : Factor w/ 51 levels "S1","S10","S11",..: 1 12 23 34 45 48  
49 50 51 2 ...
  $ group1: Factor w/ 8 levels "Me.F.1","Me.F.2",..: 1 1 1 1 1 1 1 1 1  
1 ...
  $ value1: int  4 6 7 8 10 5 13 8 6 14 ...

Now you can just split apart the 'group1' column with sub() to make  
the three specified columns.
#
On Oct 7, 2011, at 1:30 PM, David Winsemius wrote:

            
It occured to me that the ordering operation probably should have  
preceded teh ancillary column creation so this method is tested:
Group Subj time value Time Var Ref
s.S1.1     s   S1    1     4    1   F  Me
s.S1.2     s   S1    2     5    2   F  Me
s.S1.3     s   S1    3     6    1   J  Me
s.S1.4     s   S1    4    10    2   J  Me
s.S1.5     s   S1    5     3    1   F She
s.S1.6     s   S1    6     6    2   F She
s.S1.7     s   S1    7     6    1   J She
s.S1.8     s   S1    8     9    2   J She
Lemon's method has the advantage that it properly carries along the  
column information
David Winsemius, MD
West Hartford, CT
#
David, thanks a lot for the code! I've learned quite a bit from all
the generous help...

Gang
On Fri, Oct 7, 2011 at 1:37 PM, David Winsemius <dwinsemius at comcast.net> wrote: