I have a data frame with roughly 500 rows and 120 variables. I would like to generate a new data frame that will include one row for each PAIR of rows in the original data frame and will include all 120 + 120 = 240 variables from the two rows. I need only one row for each pair, not two rows. Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows. Is there an easy way to do this with R? Thanks in advance, Don Macnaughton
How Can I Concatenate Every Row in a Data Frame with Every Other Row?
4 messages · Donald Macnaughton, jim holtman, Duncan Murdoch +1 more
Try this:
x <- data.frame(a=1:100, b=100:1, c=sample(100))
# assume even number of rows: bind the even/odd together
even <- seq(nrow(x)) %% 2
new.x <- cbind(x[even==1,], x[even==0,])
head(new.x)
    a   b  c a.1 b.1 c.1
1   1 100 69   2  99  60
3   3  98 24   4  97  26
5   5  96 71   6  95  43
7   7  94 17   8  93  70
9   9  92 10  10  91  79
11 11  90 56  12  89  50
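Note that this binds each odd-numbered row to the even-numbered row that follows it, so it gives nrow(x) / 2 rows rather than one row for every pair of rows. A quick check, assuming the same x as in the reply above (the comparison with choose() is an addition, not part of that reply):

```r
x <- data.frame(a = 1:100, b = 100:1, c = sample(100))
even <- seq(nrow(x)) %% 2
new.x <- cbind(x[even == 1, ], x[even == 0, ])

nrow(new.x)          # 50: one row per adjacent (odd, even) pair
choose(nrow(x), 2)   # 4950: one row per unordered pair of rows
```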
On Sat, Mar 21, 2009 at 12:01 PM, Donald Macnaughton <donmac at matstat.com> wrote:
I have a data frame with roughly 500 rows and 120 variables. I would like to generate a new data frame that will include one row for each PAIR of rows in the original data frame and will include all 120 + 120 = 240 variables from the two rows. I need only one row for each pair, not two rows. Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows. Is there an easy way to do this with R? Thanks in advance, Don Macnaughton
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
On 21/03/2009 12:01 PM, Donald Macnaughton wrote:
I have a data frame with roughly 500 rows and 120 variables. I would like to generate a new data frame that will include one row for each PAIR of rows in the original data frame and will include all 120 + 120 = 240 variables from the two rows. I need only one row for each pair, not two rows. Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows. Is there an easy way to do this with R?
Probably the easiest is to generate row indices for each pair, e.g.
n <- nrow(mydata)
row1 <- rep(1:n, n)
row2 <- rep(1:n, each=n)
keep <- row1 < row2
big <- cbind(mydata[row1[keep],], mydata[row2[keep],])
With a simple example
> mydata <- data.frame(a=1:3, b=letters[1:3])
> mydata
  a b
1 1 a
2 2 b
3 3 c
this produces
> big
    a b a b
1   1 a 2 b
1.1 1 a 3 c
2   2 b 3 c
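One thing to watch in the result above is that the column names repeat (a b a b), which makes later selection by name ambiguous. A minimal sketch, assuming the same mydata and using hypothetical suffixes .1 and .2 that are not part of the reply above, renames the two halves before binding them:

```r
mydata <- data.frame(a = 1:3, b = letters[1:3])
n <- nrow(mydata)

# All (row1, row2) index combinations; keep only those with row1 < row2
row1 <- rep(1:n, n)
row2 <- rep(1:n, each = n)
keep <- row1 < row2

first  <- mydata[row1[keep], ]
second <- mydata[row2[keep], ]

# Hypothetical suffixes so that all 2 * ncol(mydata) columns have distinct names
names(first)  <- paste0(names(mydata), ".1")
names(second) <- paste0(names(mydata), ".2")

big <- cbind(first, second)
rownames(big) <- NULL  # replace the 1, 1.1, 2 row labels with 1, 2, 3

names(big)  # "a.1" "b.1" "a.2" "b.2"
```

With distinct names, a column from either half can be picked unambiguously, e.g. big$a.2.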
I hacked at it a bit differently than Duncan. See if these help pages and
this example point another way:
?combn
?"["
> df <- data.frame(a = 1:4, b=LETTERS[1:4])
> n <- nrow(df)
> cbind(df[combn(1:n,2)[1,],], df[combn(1:n,2)[2,],] )
    a b a b
1   1 A 2 B
1.1 1 A 3 C
1.2 1 A 4 D
2   2 B 3 C
2.1 2 B 4 D
3   3 C 4 D
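The example above calls combn() twice with the same arguments. A small variation, offered only as a sketch (the names idx and pairs are introduced here for illustration), computes the index matrix once and confirms the row count from the original question:

```r
df <- data.frame(a = 1:4, b = LETTERS[1:4])
n <- nrow(df)

# 2 x choose(n, 2) matrix: each column holds the row numbers of one pair
idx <- combn(n, 2)

# Row 1 of idx picks the first member of each pair, row 2 the second
pairs <- cbind(df[idx[1, ], ], df[idx[2, ], ])

nrow(pairs)      # 6, i.e. choose(4, 2)
choose(500, 2)   # 124750, matching 500 x 499 / 2 from the question
```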
David Winsemius

On Mar 21, 2009, at 12:01 PM, Donald Macnaughton wrote:
> I have a data frame with roughly 500 rows and 120 variables. I would like
> to generate a new data frame that will include one row for each PAIR of
> rows in the original data frame and will include all 120 + 120 = 240
> variables from the two rows. I need only one row for each pair, not two
> rows. Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows.
>
> Is there an easy way to do this with R?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT