convert count data to binary data - R-help

Christopher G Oakley · 2011-05-06T20:15:04Z

Is there a way to generate a new dataframe that produces x lines based on the contents of a column?

For example: I would like to generate a new dataframe with 70 lines of data[1, 1:3], 67 lines of data[2, 1:3], 75 lines of data[3, 1:3] and so on up to numrow = sum(count).

> data
pop fam yesorno count
1   126 1       70
1   127 1       67
1   128 1       75
1   126 0       20
1   127 0       23
1   128 0       15

Thanks,

Chris

Department of Biological Science
Florida State

Marc Schwartz

Fri, May 6, 2011 6:19 PM

On May 6, 2011, at 3:15 PM, Christopher G Oakley wrote:

# Better not to use 'data' as the name of an R object, to avoid
# confusion with functions where 'data' is the name of an
# argument, such as the regression modeling functions. R is
# generally smart enough to know the difference, but a different
# name makes the code easier to read.

> DF
  pop fam yesorno count
1   1 126       1    70
2   1 127       1    67
3   1 128       1    75
4   1 126       0    20
5   1 127       0    23
6   1 128       0    15
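
For anyone following along, here is one way the example data could be rebuilt. This is only a sketch: the integer column types are an assumption taken from the printed output in this thread, not something the original post spells out.

```r
# Rebuild the example data frame from the printed table.
# Integer columns are assumed, matching the 'int' types shown by str() later on.
DF <- data.frame(
  pop     = rep(1L, 6),
  fam     = rep(c(126L, 127L, 128L), times = 2),
  yesorno = rep(c(1L, 0L), each = 3),
  count   = c(70L, 67L, 75L, 20L, 23L, 15L)
)

DF

# Total number of rows the expanded data frame should end up with
sum(DF$count)
```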


Use rep() to generate a vector of repeated indices (?rep):

> rep(1:nrow(DF), DF$count)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [67] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[100] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[166] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[199] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[232] 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
[265] 6 6 6 6 6 6

> table(rep(1:nrow(DF), DF$count))

 1  2  3  4  5  6 
70 67 75 20 23 15 
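
A smaller illustration of the same idea may make the mechanics easier to see. When the 'times' argument of rep() is a vector of the same length as 'x', each element of 'x' is repeated by the matching entry of 'times'. The values below are made up for the illustration and are not from the data in this thread.

```r
# Each index is repeated by its own count; a count of 0 drops that index.
idx <- rep(1:3, times = c(2, 0, 4))
idx

# The length of the result is the sum of the counts.
length(idx) == sum(c(2, 0, 4))
```

Note that a row with a count of 0 simply disappears from the result, which is usually what is wanted when expanding count data.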


Now use that vector as input:

DF.New <- DF[rep(1:nrow(DF), DF$count), 1:3]

> str(DF.New)
'data.frame':	270 obs. of  3 variables:
 $ pop    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ fam    : int  126 126 126 126 126 126 126 126 126 126 ...
 $ yesorno: int  1 1 1 1 1 1 1 1 1 1 ...

     yesorno
fam    0  1
  126 20 70
  127 23 67
  128 15 75
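
Putting the pieces together, a self-contained sketch of the whole approach, with a check that the expansion reproduces the original counts, might look like this. The use of seq_len() in place of 1:nrow(DF), the row name reset, and the particular cross-tabulation calls are choices made for this sketch, not necessarily what was run in the post above.

```r
DF <- data.frame(
  pop     = rep(1L, 6),
  fam     = rep(c(126L, 127L, 128L), times = 2),
  yesorno = rep(c(1L, 0L), each = 3),
  count   = c(70L, 67L, 75L, 20L, 23L, 15L)
)

# seq_len() behaves the same as 1:nrow(DF) here, but is also safe
# for a data frame with zero rows.
DF.New <- DF[rep(seq_len(nrow(DF)), DF$count), 1:3]

# Row indexing leaves row names such as "1.1", "1.2"; reset them (optional).
rownames(DF.New) <- NULL

# Cross-tabulate the expanded rows and compare with the original counts.
tab.new  <- with(DF.New, table(fam, yesorno))
tab.orig <- xtabs(count ~ fam + yesorno, data = DF)

tab.new
stopifnot(nrow(DF.New) == sum(DF$count), all(tab.new == tab.orig))
```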


If you need something more general for generating 'raw' data of various types from a contingency table, search the list archives for the function "expand.dft", which I posted a few years ago and which I think has found its way into a couple of CRAN packages.
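
To give a rough idea of what such a generalization could look like, here is a minimal sketch built on the same rep() trick. It is not the expand.dft function mentioned above; the name expand_table is hypothetical, and the sketch assumes the input is a table object with no dimension named "Freq".

```r
# Hypothetical helper: expand a contingency table into one row per observation.
expand_table <- function(tab) {
  # One row per table cell, with the cell count in the 'Freq' column
  long <- as.data.frame(tab, stringsAsFactors = FALSE)
  # Repeat each cell's row 'Freq' times and drop the count column
  out <- long[rep(seq_len(nrow(long)), long$Freq),
              names(long) != "Freq", drop = FALSE]
  rownames(out) <- NULL
  out
}

# The fam by yesorno table from this thread, entered by hand
tab <- as.table(matrix(
  c(20, 23, 15, 70, 67, 75), nrow = 3,
  dimnames = list(fam = c("126", "127", "128"), yesorno = c("0", "1"))
))

raw <- expand_table(tab)
str(raw)
```

Because stringsAsFactors = FALSE is used, the classifying columns come back as character vectors; convert them with as.integer() or factor() as needed.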

HTH,

Marc Schwartz