I have a data frame in wide format that I'd like to convert to long format. For example, in wide format I have: id A1B1 A1B2 A2B1 A2B2 1 1 400 475 420 510 2 2 390 500 470 472 3 3 428 512 555 610 4 4 703 787 801 822 5 5 611 634 721 705 6 6 543 522 612 788 7 7 411 488 506 623 8 8 654 644 711 795 A is one repeated-measures variable with levels 1 and 2. B is a second repeated-measures variable also with levels 1 and 2. I'd like to end up with: id A B value 1 1 1 1 400 2 2 1 1 390 3 3 1 1 428 4 4 1 1 703 5 5 1 1 611 6 6 1 1 543 7 7 1 1 411 8 8 1 1 654 9 1 1 2 475 10 2 1 2 500 11 3 1 2 512 12 4 1 2 787 13 5 1 2 634 14 6 1 2 522 15 7 1 2 488 16 8 1 2 644 17 1 2 1 420 18 2 2 1 470 19 3 2 1 555 20 4 2 1 801 21 5 2 1 721 22 6 2 1 612 23 7 2 1 506 24 8 2 1 711 25 1 2 2 510 26 2 2 2 472 27 3 2 2 610 28 4 2 2 822 29 5 2 2 705 30 6 2 2 788 31 7 2 2 623 32 8 2 2 795 I've been working with reshape and with the reshape package, but with no success for the two-time-variable case. Can anyone help please?
reshape with two time variables
5 messages · Neil Stewart, Hadley Wickham, jim holtman +1 more
On Fri, Jan 30, 2009 at 5:57 PM, Neil Stewart
<neil.stewart at warwick.ac.uk> wrote:
I have a data frame in wide format that I'd like to convert to long format.
For example, in wide format I have:
id A1B1 A1B2 A2B1 A2B2
1 1 400 475 420 510
2 2 390 500 470 472
3 3 428 512 555 610
4 4 703 787 801 822
5 5 611 634 721 705
6 6 543 522 612 788
7 7 411 488 506 623
8 8 654 644 711 795
A is one repeated-measures variable with levels 1 and 2. B is a second
repeated-measures variable also with levels 1 and 2. I'd like to end up
with:
id A B value
1 1 1 1 400
2 2 1 1 390
3 3 1 1 428
4 4 1 1 703
5 5 1 1 611
6 6 1 1 543
7 7 1 1 411
8 8 1 1 654
9 1 1 2 475
10 2 1 2 500
11 3 1 2 512
12 4 1 2 787
13 5 1 2 634
14 6 1 2 522
15 7 1 2 488
16 8 1 2 644
17 1 2 1 420
18 2 2 1 470
19 3 2 1 555
20 4 2 1 801
21 5 2 1 721
22 6 2 1 612
23 7 2 1 506
24 8 2 1 711
25 1 2 2 510
26 2 2 2 472
27 3 2 2 610
28 4 2 2 822
29 5 2 2 705
30 6 2 2 788
31 7 2 2 623
32 8 2 2 795
I've been working with reshape and with the reshape package, but with no
success for the two-time-variable case. Can anyone help please?
If you're using melt from the reshape package, once you've melted the data, you'll need to do something like: m$a <- as.numeric(substr(m$variable, 1, 1)) m$b <- as.numeric(substr(m$variable, 3, 3)) m$variable <- NULL Hadley
This will work:
x <- read.table(textConnection(" id A1B1 A1B2 A2B1 A2B2
+ 1 1 400 475 420 510 + 2 2 390 500 470 472 + 3 3 428 512 555 610 + 4 4 703 787 801 822 + 5 5 611 634 721 705 + 6 6 543 522 612 788 + 7 7 411 488 506 623 + 8 8 654 644 711 795"), header=TRUE)
closeAllConnections()
require(reshape)
x.m <- melt(x, id='id')
x.m$A <- sub("A(\\d+).*", "\\1", x.m$variable, perl=TRUE)
x.m$B <- sub(".*B(\\d+)", "\\1", x.m$variable, perl=TRUE)
x.m$variable <- NULL
x.m
id value A B 1 1 400 1 1 2 2 390 1 1 3 3 428 1 1 4 4 703 1 1 5 5 611 1 1 6 6 543 1 1 7 7 411 1 1 8 8 654 1 1 9 1 475 1 2 10 2 500 1 2 11 3 512 1 2 12 4 787 1 2 13 5 634 1 2 14 6 522 1 2 15 7 488 1 2 16 8 644 1 2 17 1 420 2 1 18 2 470 2 1 19 3 555 2 1 20 4 801 2 1 21 5 721 2 1 22 6 612 2 1 23 7 506 2 1 24 8 711 2 1 25 1 510 2 2 26 2 472 2 2 27 3 610 2 2 28 4 822 2 2 29 5 705 2 2 30 6 788 2 2 31 7 623 2 2 32 8 795 2 2 On Fri, Jan 30, 2009 at 6:57 PM, Neil Stewart
<neil.stewart at warwick.ac.uk> wrote:
I have a data frame in wide format that I'd like to convert to long format.
For example, in wide format I have:
id A1B1 A1B2 A2B1 A2B2
1 1 400 475 420 510
2 2 390 500 470 472
3 3 428 512 555 610
4 4 703 787 801 822
5 5 611 634 721 705
6 6 543 522 612 788
7 7 411 488 506 623
8 8 654 644 711 795
A is one repeated-measures variable with levels 1 and 2. B is a second
repeated-measures variable also with levels 1 and 2. I'd like to end up
with:
id A B value
1 1 1 1 400
2 2 1 1 390
3 3 1 1 428
4 4 1 1 703
5 5 1 1 611
6 6 1 1 543
7 7 1 1 411
8 8 1 1 654
9 1 1 2 475
10 2 1 2 500
11 3 1 2 512
12 4 1 2 787
13 5 1 2 634
14 6 1 2 522
15 7 1 2 488
16 8 1 2 644
17 1 2 1 420
18 2 2 1 470
19 3 2 1 555
20 4 2 1 801
21 5 2 1 721
22 6 2 1 612
23 7 2 1 506
24 8 2 1 711
25 1 2 2 510
26 2 2 2 472
27 3 2 2 610
28 4 2 2 822
29 5 2 2 705
30 6 2 2 788
31 7 2 2 623
32 8 2 2 795
I've been working with reshape and with the reshape package, but with no
success for the two-time-variable case. Can anyone help please?
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
hadley wickham wrote:
On Fri, Jan 30, 2009 at 5:57 PM, Neil Stewart <neil.stewart at warwick.ac.uk> wrote:
I have a data frame in wide format that I'd like to convert to long format.
For example, in wide format I have:
id A1B1 A1B2 A2B1 A2B2
1 1 400 475 420 510
2 2 390 500 470 472
3 3 428 512 555 610
4 4 703 787 801 822
5 5 611 634 721 705
6 6 543 522 612 788
7 7 411 488 506 623
8 8 654 644 711 795
A is one repeated-measures variable with levels 1 and 2. B is a second
repeated-measures variable also with levels 1 and 2. I'd like to end up
with:
id A B value
1 1 1 1 400
2 2 1 1 390
3 3 1 1 428
4 4 1 1 703
5 5 1 1 611
6 6 1 1 543
7 7 1 1 411
8 8 1 1 654
9 1 1 2 475
10 2 1 2 500
11 3 1 2 512
12 4 1 2 787
13 5 1 2 634
14 6 1 2 522
15 7 1 2 488
16 8 1 2 644
17 1 2 1 420
18 2 2 1 470
19 3 2 1 555
20 4 2 1 801
21 5 2 1 721
22 6 2 1 612
23 7 2 1 506
24 8 2 1 711
25 1 2 2 510
26 2 2 2 472
27 3 2 2 610
28 4 2 2 822
29 5 2 2 705
30 6 2 2 788
31 7 2 2 623
32 8 2 2 795
I've been working with reshape and with the reshape package, but with no
success for the two-time-variable case. Can anyone help please?
If you're using melt from the reshape package, once you've melted the data, you'll need to do something like: m$a <- as.numeric(substr(m$variable, 1, 1)) m$b <- as.numeric(substr(m$variable, 3, 3)) m$variable <- NULL
Also, things like this should work (here using stock reshape):
> xx <- reshape(x, direction="long", varying=2:5, v.names="val")
> cbind(xx, expand.grid(B=1:2,A=1:2)[xx$time,])
id time val B A
1.1 1 1 400 1 1
2.1 2 1 390 1 1
3.1 3 1 428 1 1
4.1 4 1 703 1 1
5.1 5 1 611 1 1
6.1 6 1 543 1 1
7.1 7 1 411 1 1
8.1 8 1 654 1 1
1.2 1 2 475 2 1
2.2 2 2 500 2 1
3.2 3 2 512 2 1
4.2 4 2 787 2 1
5.2 5 2 634 2 1
...
Once you got your mind wrapped around it, the pattern might be easier to
generalize to more complex within-subject designs.
O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Thank you all so much for your help.
I've gone with
repeated_measures_data.csv:
sub,A1B1,A1B2,A2B1,A2B2
s1,400,475,420,510
s2,390,500,470,472
s3,428,512,555,610
s4,703,787,801,822
s5,611,634,721,705
s6,543,522,612,788
s7,411,488,506,623
s8,654,644,711,795
library(reshape)
data<-read.csv("repeated_measures_data.csv")
data.long <- melt(data)
data.long$A <- as.factor(substr(data.long$variable,1,2))
data.long$B <- as.factor(substr(data.long$variable,3,4))
data.long$variable <- NULL
data.long <- rename(data.long, c("value"="RT"))
as it offers the smallest chance of me messing up the factor levels.