Is there a way to generate a new data frame in which each row is repeated a number of times given by one of its columns?
For example, I would like to generate a new data frame with 70 rows of data[1, 1:3], 67 rows of data[2, 1:3], 75 rows of data[3, 1:3], and so on, so that the total number of rows equals sum(count).
data
 pop fam yesorno count
   1 126       1    70
   1 127       1    67
   1 128       1    75
   1 126       0    20
   1 127       0    23
   1 128       0    15
Thanks,
Chris
Department of Biological Science
Florida State University
319 Stadium Drive
Tallahassee, FL 32306-4295
On May 6, 2011, at 3:15 PM, Christopher G Oakley wrote:
> Is there a way to generate a new dataframe that produces x lines based on the contents of a column?
> for example: I would like to generate a new dataframe with 70 lines of data[1, 1:3], 67 lines of data[2, 1:3], 75 lines of data[3, 1:3] and so on up to numrow = sum(count).
# Better not to use 'data' as the name of an R object, to avoid
# confusion with functions that have an argument named 'data',
# such as the regression modelling functions. R can generally
# tell the difference, but a distinct name makes the code
# easier to read.
DF
  pop fam yesorno count
1   1 126       1    70
2   1 127       1    67
3   1 128       1    75
4   1 126       0    20
5   1 127       0    23
6   1 128       0    15
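Use rep() to generate a vector of repeated row indices (see ?rep). Tabulating that vector shows that each index appears 'count' times:
table(rep(1:nrow(DF), DF$count))
  1  2  3  4  5  6
 70 67 75 20 23 15
Now use that vector as the row index: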
DF.New <- DF[rep(1:nrow(DF), DF$count), 1:3]
str(DF.New)
'data.frame':   270 obs. of  3 variables:
 $ pop    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ fam    : int  126 126 126 126 126 126 126 126 126 126 ...
 $ yesorno: int  1 1 1 1 1 1 1 1 1 1 ...
with(DF.New, table(fam, yesorno))
     yesorno
fam    0  1
  126 20 70
  127 23 67
  128 15 75
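For anyone who wants to run the above from scratch, here is a minimal self-contained sketch. The data.frame() call is only my reconstruction of DF from the values printed above, and the row-name reset is an optional extra, not something the original question asked for.

```r
# Rebuild the example data (values copied from the printout above)
DF <- data.frame(pop     = rep(1L, 6),
                 fam     = rep(126:128, 2),
                 yesorno = rep(c(1L, 0L), each = 3),
                 count   = c(70L, 67L, 75L, 20L, 23L, 15L))

# Repeat each row index 'count' times, then keep the first three columns
idx    <- rep(seq_len(nrow(DF)), DF$count)
DF.New <- DF[idx, 1:3]

# Repeated rows get row names such as 1, 1.1, 1.2, ...; reset them if that matters
rownames(DF.New) <- NULL

# Sanity check: one row per counted observation
stopifnot(nrow(DF.New) == sum(DF$count))
```

seq_len(nrow(DF)) is used in place of 1:nrow(DF) only because it also behaves sensibly for a data frame with zero rows; for this example the two are equivalent.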
If you might need something more generalized to handle generating 'raw' data of various types from a contingency table, search the list archives for the function "expand.dft", which I posted a few years ago and I think found its way into a couple of CRAN packages.
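In the meantime, the same indexing idea can be wrapped in a small function. To be clear, this is not expand.dft: expand_counts() and its count_col argument are hypothetical names used here only for illustration, and the sketch assumes the counts are non-negative whole numbers held in a single column.

```r
# Hypothetical helper (NOT expand.dft): expand a data frame of counts
# into one row per observation, dropping the count column.
expand_counts <- function(x, count_col = "count") {
  n <- x[[count_col]]
  # Assumption: counts are non-missing, non-negative whole numbers
  stopifnot(is.numeric(n), !anyNA(n), all(n >= 0), all(n == round(n)))
  keep <- setdiff(names(x), count_col)
  out  <- x[rep(seq_len(nrow(x)), n), keep, drop = FALSE]
  rownames(out) <- NULL
  out
}

# as.data.frame() on a table stores the counts in a column named "Freq"
tab  <- as.data.frame(table(g = c("a", "a", "b"), h = c("x", "y", "y")))
long <- expand_counts(tab, count_col = "Freq")
```

Rows with a count of zero simply disappear from the result, because rep() repeats their index zero times.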
HTH,
Marc Schwartz