Help with Reshaping from Wide to Long

9 messages · Tal Galili, jlwoodard, John L. Woodard +3 more

#
I am trying to reshape data that are in the wide format into the long format. 
The design is a repeated-measures design, which combined 3 levels of shape
(circle, square, triangle) with 3 levels of color (Blue, Red, Green), for a
total of 9 variables.  The wide data look like this:

  Subject CircleBlue CircleRed CircleGreen SquareBlue SquareRed SquareGreen
1     101         95       100         100         95       100         100
2     102         80        90         100         85        90         100
  TriangleBlue TriangleRed TriangleGreen
1           60          80            75
2           55          45            45

I would like to convert the data to the following format so that I could do
a repeated measures ANOVA:

Accuracy  Subject  Shape     Color
95        subj101  Circle    Blue
80        subj102  Circle    Blue
100       subj101  Circle    Red
90        subj102  Circle    Red
100       subj101  Circle    Green
100       subj102  Circle    Green
95        subj101  Square    Blue
85        subj102  Square    Blue
100       subj101  Square    Red
90        subj102  Square    Red
100       subj101  Square    Green
100       subj102  Square    Green
60        subj101  Triangle  Blue
55        subj102  Triangle  Blue
80        subj101  Triangle  Red
45        subj102  Triangle  Red
75        subj101  Triangle  Green
45        subj102  Triangle  Green

I've been able to accomplish the task using the stack command, together with
some fairly lengthy code to reorganize the stacked data frame.  Is it
possible to use the reshape command to accomplish this task in a more
straightforward manner?   Many thanks in advance!

John
#
Tal,
  Thanks for the information.

 I actually did read through the help for the reshape package, though being
relatively new to R, I don't quite understand the ins and outs of the
command.

I tried using the melt command:
x<-melt(accuracy,id='Subject')

but it didn't give me anything different from the stack command.  I'm
trying to get two columns to indicate the Shape and Color.

Thanks also for this information on ezANOVA.  It looks very promising,
though I didn't see an example of repeated measures ANOVA in the help file.

Best regards,

John
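
For the two-column goal described above, one possible base-R sketch is to stack the measurement columns and then split each combined column name into its shape and color parts. The regular expression assumes every name is exactly two capitalized words run together (as in CircleBlue); the object names `long` and `pattern` are illustrative only, and only the two subjects shown in the question are used.

```r
# Two subjects from the question, in wide format.
accuracy <- data.frame(
  Subject = c(101L, 102L),
  CircleBlue = c(95L, 80L), CircleRed = c(100L, 90L), CircleGreen = c(100L, 100L),
  SquareBlue = c(95L, 85L), SquareRed = c(100L, 90L), SquareGreen = c(100L, 100L),
  TriangleBlue = c(60L, 55L), TriangleRed = c(80L, 45L), TriangleGreen = c(75L, 45L)
)

# stack() returns the columns 'values' and 'ind' (the source column name).
long <- stack(accuracy[, -1])
long$Subject <- rep(accuracy$Subject, times = ncol(accuracy) - 1)

# ASSUMPTION: names look like <Shape><Color>, each part one capitalized word.
pattern <- "^([A-Z][a-z]+)([A-Z][a-z]+)$"
long$Shape <- sub(pattern, "\\1", as.character(long$ind))
long$Color <- sub(pattern, "\\2", as.character(long$ind))

# Keep the four columns asked for, in the requested order.
long <- long[, c("values", "Subject", "Shape", "Color")]
names(long)[1] <- "Accuracy"
```

The same two sub() calls should work on the `variable` column that melt() produces, since it also holds the original column names.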
#
Try this:

library(reshape)

accuracy <- structure(list(
    Subject       = c(101L, 102L, 103L, 104L, 105L, 106L),
    CircleBlue    = c(95L, 80L, 80L, 85L, 70L, 70L),
    CircleRed     = c(100L, 90L, 70L, 80L, 75L, 75L),
    CircleGreen   = c(100L, 100L, 95L, 100L, 95L, 75L),
    SquareBlue    = c(95L, 85L, 90L, 90L, 70L, 40L),
    SquareRed     = c(100L, 90L, 100L, 90L, 75L, 60L),
    SquareGreen   = c(100L, 100L, 100L, 90L, 85L, 85L),
    TriangleBlue  = c(60L, 55L, 65L, 65L, 60L, 40L),
    TriangleRed   = c(80L, 45L, 60L, 50L, 40L, 35L),
    TriangleGreen = c(75L, 45L, 55L, 50L, 45L, 50L)
), row.names = c(NA, 6L), class = "data.frame")

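# melt() stacks the nine measurement columns into 'variable' and 'value'
tmp1 <- melt( accuracy, id="Subject" )
colnames( tmp1 ) [ which( colnames( tmp1 ) == "value" ) ] <- "Accuracy"

# Lookup table from each original column name to its Shape and Color; it
# relies on the columns being ordered Circle*, Square*, Triangle*
keys <- data.frame( variable = levels( tmp1$variable )
                   , Shape=rep( c( "Circle", "Square", "Triangle" ), each=3 )
                   , Color=rep( c( "Blue", "Red", "Green" ), times=3 )
                   )
# merge() joins on the shared 'variable' column
tmp2 <- merge( tmp1, keys )

accuracym <- tmp2[ , c("Accuracy", "Subject", "Shape", "Color") ]

On Sat, 17 Jul 2010, John L. Woodard wrote: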

            
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
#
An alternative using the base reshape function:

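# Each vector in 'varying' becomes one long column, and the time variable
# counts positions within a vector (Blue, Red, Green), so the first pass
# indexes color and the second pass indexes shape.
one = reshape(accuracy,idvar='Subject',varying=list(c(2,3,4),c(5,6,7),c(8,9,10)),
               direction='long',timevar='color')
# Columns 3:5 of 'one' now hold the Circle, Square and Triangle values.
two = reshape(one,idvar=c('Subject','color'),varying=list(3:5),
               direction='long',timevar='shape')
two$shape=factor(two$shape,labels=c('Circle','Square','Triangle'))
two$color=factor(two$color,labels=c('Blue','Red','Green'))
names(two)[4] = 'value'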
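
With a two-pass reshape it is easy to attach the shape and color labels to the wrong index, because the time variable counts positions within each `varying` vector rather than the vectors themselves. The sketch below is one way to cross-check the labels against the wide table. It uses only the two subjects from the original question; `wide_row`, `wide_col`, `expected` and `labels_ok` are illustrative names, not part of any package.

```r
accuracy <- data.frame(
  Subject = c(101L, 102L),
  CircleBlue = c(95L, 80L), CircleRed = c(100L, 90L), CircleGreen = c(100L, 100L),
  SquareBlue = c(95L, 85L), SquareRed = c(100L, 90L), SquareGreen = c(100L, 100L),
  TriangleBlue = c(60L, 55L), TriangleRed = c(80L, 45L), TriangleGreen = c(75L, 45L)
)

# Pass 1: positions within each vector are Blue, Red, Green, so this indexes color.
one <- reshape(accuracy, idvar = "Subject",
               varying = list(2:4, 5:7, 8:10),
               direction = "long", timevar = "color")
# Pass 2: columns 3:5 of 'one' are the Circle, Square and Triangle values.
two <- reshape(one, idvar = c("Subject", "color"), varying = list(3:5),
               direction = "long", timevar = "shape")
two$color <- factor(two$color, labels = c("Blue", "Red", "Green"))
two$shape <- factor(two$shape, labels = c("Circle", "Square", "Triangle"))
names(two)[4] <- "value"

# Look every long row up in the wide table by subject and rebuilt column name.
wide_row <- match(two$Subject, accuracy$Subject)
wide_col <- match(paste0(two$shape, two$color), names(accuracy))
expected <- accuracy[cbind(wide_row, wide_col)]
labels_ok <- all(two$value == expected)
```

If `labels_ok` comes out FALSE, the usual culprit is that the two `timevar` names (or the two sets of factor labels) have been swapped.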

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Sat, 17 Jul 2010, Jeff Newmiller wrote:

            
1 day later
#
Hi Phil and Jeff,
    Thanks so much for taking the time to help me solve this issue!  Both
approaches work perfectly.  Each of your approaches helped me learn more
about what R can do.   I really appreciate your help!

Very best regards,

John
1 day later
#
On Sun, Jul 18, 2010 at 6:44 PM, jlwoodard <john.woodard at wayne.edu> wrote:
Hi John,

Now that you've seen some of R's fancy data manipulation footwork,
here's a small taste of the graphing capabilities (the matching of
colors and glyphs with your factor levels is serendipitous :-))

dat <- structure(list(Accuracy = c(95L, 80L, 100L, 90L, 100L, 100L,
95L, 85L, 100L, 90L, 100L, 100L, 60L, 55L, 80L, 45L, 75L, 45L
), Subject = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("subj101", "subj102"
), class = "factor"), Shape = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Circle",
"Triangle", "Square"), class = "factor"), Color = structure(c(3L,
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L,
2L), .Label = c("Red", "Green", "Blue"), class = "factor")), .Names =
c("Accuracy",
"Subject", "Shape", "Color"), row.names = c(NA, -18L), class = "data.frame")

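library(ggplot2)
X11(12, 6)  # opens a 12 x 6 inch device; skip this where X11 is unavailable
# qplot() is deprecated in recent ggplot2 releases; ggplot() draws the same plot
ggplot(dat, aes(Shape:Color, Accuracy, colour = Color, shape = Shape)) +
   geom_point() + facet_grid(. ~ Subject)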

Suggesting (keeping in mind here we have a sample of just 2 subjects):
i) lower accuracy for triangles
ii) lower accuracy for blue (subj102's triangles are an exception)
iii) the upper bound on accuracy is often reached.
iv) the upper bound may mask effects.  For example look at the color
effects for circles and squares -- for subj101 the green effect might
be masked by the upper bound.
v) there is Color:Shape interaction (e.g. the color effects differ for
triangles)
vi) there is likely between-subject variation in the mean and possibly
in effects as well.
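
Points (i) to (iii) can also be looked at numerically. The following is a rough sketch using the same two-subject data, rebuilt here as plain character columns so the block runs on its own; `shape_means`, `color_means`, `cell_means` and `n_at_ceiling` are illustrative names.

```r
dat <- data.frame(
  Accuracy = c(95, 80, 100, 90, 100, 100,
               95, 85, 100, 90, 100, 100,
               60, 55, 80, 45, 75, 45),
  Subject  = rep(c("subj101", "subj102"), times = 9),
  Shape    = rep(c("Circle", "Square", "Triangle"), each = 6),
  Color    = rep(rep(c("Blue", "Red", "Green"), each = 2), times = 3)
)

# Marginal means for each factor, averaged over subjects and the other factor.
shape_means <- tapply(dat$Accuracy, dat$Shape, mean)
color_means <- tapply(dat$Accuracy, dat$Color, mean)

# Shape-by-Color cell means, useful for eyeballing the interaction in (v).
cell_means <- with(dat, tapply(Accuracy, list(Shape, Color), mean))

# How many of the 18 observations sit at the upper bound of 100.
n_at_ceiling <- sum(dat$Accuracy == 100)
```

With only two subjects these are descriptive summaries, not evidence for or against any effect.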

As for analyses, my preference for repeated measures is to use
likelihood-based rather than sums-of-squares-based methods.  Usually
I'd recommend lme4::lmer or nlme::lme, starting with random Subject
intercepts (this appears to be just an RCBD, so a start might be
lmer(Accuracy ~ Shape*Color + (1|Subject), dat)), but the constrained
response and limited sample size (both in terms of the number of subjects
and the conflation between error and interaction) make me think fitting a
meaningful model is not trivial.  Off the cuff, perhaps try a beta or
binomial model, or use logit-transformed Accuracy (noting that
nothing can retrieve the 'theoretical effects' mentioned in (iv)
above, but that may not be of interest).

best,
Kingsford Jones
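
As a rough illustration of the logit-transform idea, here is a base-R sketch using the six-subject data posted earlier in the thread. It swaps in aov() with a Subject error stratum as a sums-of-squares stand-in for the random-intercept model, so it is not the likelihood-based fit recommended above. The clamping bound `eps` is an arbitrary assumption needed to keep 100% scores finite, and the results will depend on it.

```r
accuracy <- data.frame(
  Subject       = 101:106,
  CircleBlue    = c(95, 80, 80, 85, 70, 70),
  CircleRed     = c(100, 90, 70, 80, 75, 75),
  CircleGreen   = c(100, 100, 95, 100, 95, 75),
  SquareBlue    = c(95, 85, 90, 90, 70, 40),
  SquareRed     = c(100, 90, 100, 90, 75, 60),
  SquareGreen   = c(100, 100, 100, 90, 85, 85),
  TriangleBlue  = c(60, 55, 65, 65, 60, 40),
  TriangleRed   = c(80, 45, 60, 50, 40, 35),
  TriangleGreen = c(75, 45, 55, 50, 45, 50)
)

# Wide to long, then recover Shape and Color from the column names.
long <- stack(accuracy[, -1])
names(long) <- c("Accuracy", "Cell")
long$Subject <- factor(rep(accuracy$Subject, times = 9))
long$Shape <- factor(sub("(Blue|Red|Green)$", "", as.character(long$Cell)))
long$Color <- factor(sub("^(Circle|Square|Triangle)", "", as.character(long$Cell)))

# ASSUMPTION: clamp proportions to [eps, 1 - eps] so the logit stays finite.
eps <- 0.025
p <- pmin(pmax(long$Accuracy / 100, eps), 1 - eps)
long$LogitAcc <- qlogis(p)

# Subject as a blocking (error) stratum, mirroring the random-intercept start.
fit <- aov(LogitAcc ~ Shape * Color + Error(Subject), data = long)
summary(fit)
```

Because so many scores sit at the ceiling, the clamped values pile up at a single point, which is one reason a binomial or beta model may be the better route if trial counts are available.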