randomly select duplicated entries

4 messages · Juliet Hannah, Henrique Dallazuanna, jim holtman +1 more

#
Using this data as an example

dat <- read.table(textConnection("Id         myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23"), header = TRUE)
closeAllConnections()

How can I create another data set that has no duplicate entries for 'Id',
with the value kept for each 'Id' selected at random from the ones
available for it?

Thanks!

Juliet
#
Try this:

do.call(rbind, lapply(split(dat, dat$Id),
                             function(x)x[sample(1:nrow(x), 1),]))
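
The one-liner above splits dat by Id, draws one row index per piece, and stacks the chosen rows back together. A minimal sketch of the same idea written out step by step; the names pick_one, pieces and res are hypothetical, and set.seed() is added only so the draw can be repeated:

```r
dat <- data.frame(Id    = c(12, 12, 12, 34, 34, 34, 65, 65),
                  myvar = c(1, 2, 6, 9, 4, 8, 15, 23))

# pick_one is a hypothetical helper: return one randomly chosen row of x
pick_one <- function(x) x[sample.int(nrow(x), 1), , drop = FALSE]

set.seed(1)                       # optional, only for a repeatable draw
pieces <- split(dat, dat$Id)      # one data frame per Id
res <- do.call(rbind, lapply(pieces, pick_one))
res
```

The result has one row per Id, and each row is taken unchanged from dat.
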
On 7/9/08, Juliet Hannah <juliet.hannah at gmail.com> wrote:

  
    
#
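How about this:

> dat <- read.table(textConnection("Id         myvar
+ 12 1
+ 12 2
+ 12 6
+ 34 9
+ 34 4
+ 34 8
+ 65 15
+ 65 23"), header = TRUE)
> # The line opening this call is missing from the archived post;
> # lapply(split(...)) and the do.call() line below are assumptions.
> res <- lapply(split(dat, dat$Id), function(.grp) {
+     # nrow(), not length(): length() of a data frame counts its columns
+     .grp[sample(seq_len(nrow(.grp)), 1), ]
+ })
> do.call(rbind, res)
   Id myvar
12 12     1
34 34     9
65 65    15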
On Wed, Jul 9, 2008 at 3:17 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:

  
    
#
on 07/09/2008 02:17 PM Juliet Hannah wrote:
> aggregate(dat$myvar, list(dat$Id), sample, 1)
   Group.1  x
1      12  6
2      34  4
3      65 15

> aggregate(dat$myvar, list(dat$Id), sample, 1)
   Group.1  x
1      12  2
2      34  9
3      65 15

> aggregate(dat$myvar, list(dat$Id), sample, 1)
   Group.1  x
1      12  1
2      34  8
3      65 23


HTH,

Marc Schwartz
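
One caveat when passing sample directly as the function, noted here as a general property of sample() rather than something raised in the thread: if x is a single number of at least 1, sample(x, 1) draws from 1:x instead of returning x. An Id with only one row could therefore come back with a value that is not in the data. The example data has no such Id, so the calls above behave as shown. A minimal sketch of a guard, where safe_pick is a hypothetical helper and the extra Id 99 row is made-up data added only to show the single-row case:

```r
dat2 <- data.frame(Id    = c(12, 12, 12, 34, 34, 34, 65, 65, 99),
                   myvar = c(1, 2, 6, 9, 4, 8, 15, 23, 40))

# safe_pick is a hypothetical helper: sample a position in x, then index,
# so a length-one x is returned unchanged
safe_pick <- function(x) x[sample.int(length(x), 1)]

out <- aggregate(dat2$myvar, list(dat2$Id), safe_pick)
out
```

Sampling a position and then indexing means the returned value always comes from the group itself, however many rows the group has.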