randomly select duplicated entries

4 messages · Juliet Hannah, Henrique Dallazuanna, jim holtman +1 more

Original

Wed, Jul 9, 2008 12:17 PM

Using this data as an example:

dat <- read.table(textConnection("Id         myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23"), header = TRUE)
closeAllConnections()

How can I create another data set that has no duplicate entries
for 'Id', where the row kept for each Id is selected at random
from that Id's available rows?

Thanks!

Juliet

Henrique Dallazuanna

Wed, Jul 9, 2008 1:40 PM

Try this:

# split dat by Id, draw one random row from each piece, then stack the pieces
do.call(rbind, lapply(split(dat, dat$Id),
                      function(x) x[sample(1:nrow(x), 1), ]))
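An alternative sketch (not from the thread) that avoids split() and rbind(): draw one row position per Id with tapply(), then subset dat once, which keeps every column and the original row names. It rebuilds Juliet's data with read.table(text = ...) instead of textConnection(); set.seed() is only there to make the draw reproducible.

```r
# Juliet's example data, read from a string instead of a textConnection
dat <- read.table(text = "Id myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23", header = TRUE)

set.seed(1)  # only so the random draw is reproducible

# For each Id, pick one of that Id's row positions at random.
# i[sample.int(length(i), 1)] also works when a group has a single row.
idx <- as.vector(tapply(seq_len(nrow(dat)), dat$Id,
                        function(i) i[sample.int(length(i), 1)]))

one_per_id <- dat[idx, ]  # one row per Id, ordered by Id
one_per_id
```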

On 7/9/08, Juliet Hannah <juliet.hannah at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

jim holtman

Wed, Jul 9, 2008 1:42 PM

How about this:

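result <- lapply(split(dat, dat$Id), function(.grp){
    # one random row per Id group; use nrow(), since length() of a
    # data frame counts its columns, not its rows
    .grp[sample(seq_len(nrow(.grp)), 1),]
})
do.call(rbind, result)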

   Id myvar
12 12     1
34 34     9
65 65    15
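When picking a random row from a data frame, the row count comes from nrow(), not length(): length() of a data frame is its number of columns. A quick check on the Id 12 group shows what goes wrong if the two are confused (the data is rebuilt here so the snippet runs on its own):

```r
dat <- read.table(text = "Id myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23", header = TRUE)

grp <- split(dat, dat$Id)[["12"]]  # the three rows with Id 12

length(grp)         # 2: a data frame's length is its number of columns
nrow(grp)           # 3: the number of rows
seq(length(grp))    # 1 2   -> row 3 could never be drawn
seq_len(nrow(grp))  # 1 2 3 -> every row is a candidate
```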

On Wed, Jul 9, 2008 at 3:17 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:


Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Marc Schwartz

Wed, Jul 9, 2008 1:52 PM

On 07/09/2008 02:17 PM, Juliet Hannah wrote:

> aggregate(dat$myvar, list(dat$Id), sample, 1)
  Group.1  x
1      12  6
2      34  4
3      65 15

> aggregate(dat$myvar, list(dat$Id), sample, 1)
  Group.1  x
1      12  2
2      34  9
3      65 15

> aggregate(dat$myvar, list(dat$Id), sample, 1)
  Group.1  x
1      12  1
2      34  8
3      65 23
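One caution when passing sample directly to aggregate(): as documented in ?sample, when x is a single number >= 1, sample(x, 1) draws from 1:x instead of returning x. Every Id in this data has at least two rows, so the output is fine, but an Id with only one row would get a random integer rather than its own value. A minimal sketch of a safer wrapper (pick_one and the extra Id 99 are hypothetical, not from the thread) that indexes with sample.int():

```r
dat <- read.table(text = "Id myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23", header = TRUE)

# The quirk: a single number >= 1 is treated as the range 1:x.
set.seed(1)
sample(15, 1)  # an integer from 1:15, not necessarily 15

# Hypothetical helper: always returns one element of x, whatever its length.
pick_one <- function(x) x[sample.int(length(x), 1)]

# Add a hypothetical Id with a single row to show the difference.
dat2 <- rbind(dat, data.frame(Id = 99, myvar = 15))
agg <- aggregate(dat2$myvar, list(dat2$Id), pick_one)
agg  # Id 99 always gets 15
```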


HTH,

Marc Schwartz