Hello,
I would like to randomly select one row per group from a matrix. Here is an example where each group has only one row; the code gives an error message:
test <- matrix(c(4,4, 6,2, 1,2), nrow = 2, ncol = 3, dimnames = list(NULL, c("xcor", "ycor", "id")))
do.call(rbind, lapply(split(test, test[,c("id")]), function(x) x[sample(nrow(x), 1), ]))
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
How can I modify the code so that it works when there are several rows or one row for a given group?
Thanks very much for your time
Have a nice day
Marine
Randomly select one row by group from a matrix
Hi Marine,
your manipulation of the matrix is quite convoluted, and it helps to expand it a bit:
test_lst <- split(test, test[,c("id")])
test_lst$`1`
After splitting, your matrix has turned back into plain vectors, one per group, so nrow(x) returns NULL and sample() has nothing to draw from, hence the "invalid first argument" error.
The reason is that a matrix is, behind the scenes, a vector with a dim attribute, and split() works on that underlying vector, so the dimension information is lost.
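A minimal sketch of the point above, reusing Marine's `test` matrix: splitting the matrix itself yields plain vectors, whereas splitting the row indices and subsetting with `drop = FALSE` keeps every group a matrix, so Marine's original `sample(nrow(x), 1)` works for one-row and many-row groups alike. The names `rows_by_id` and `picked` are illustrative, not from the thread.

```r
test <- matrix(c(4, 4, 6, 2, 1, 2), nrow = 2, ncol = 3,
               dimnames = list(NULL, c("xcor", "ycor", "id")))

# Splitting the matrix itself splits the underlying vector:
# each list element is a plain numeric vector with no dim attribute
test_lst <- split(test, test[, "id"])
str(test_lst)
nrow(test_lst[["1"]])  # NULL, so sample(nrow(x), 1) has nothing to draw from

# Split the row indices instead (illustrative name), then subset by row
rows_by_id <- split(seq_len(nrow(test)), test[, "id"])
picked <- do.call(rbind, lapply(rows_by_id, function(i) {
  # drop = FALSE keeps a one-row group as a 1 x 3 matrix
  grp <- test[i, , drop = FALSE]
  # nrow(grp) is a count >= 1, so sample() draws from 1:nrow(grp)
  grp[sample(nrow(grp), 1), , drop = FALSE]
}))
picked
```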
Do you really need to work with a matrix? I prefer data.frames because they can mix different column types. With a data.frame you can also use the dplyr package, which makes things more readable:
library(dplyr)
test_df <- data.frame(xcor = rnorm(8), ycor = rnorm(8), id = c(1, 2))
grouped_test_df <- group_by(test_df, id)
sample_n(grouped_test_df, 1)
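If adding a package is not wanted, one base-R way to get a similar result on a data frame is to shuffle the rows once and keep the first row seen for each `id`. This is a different technique from `sample_n()` (not what dplyr does internally): because every row of a group is equally likely to come first after the shuffle, it picks one row uniformly per group and needs no special case for one-row groups. The seed and the names `shuffled` and `one_per_id` are illustrative assumptions.

```r
set.seed(42)  # illustrative seed, for reproducibility
test_df <- data.frame(xcor = rnorm(8), ycor = rnorm(8), id = c(1, 2))

# Shuffle all rows once; within each id, every row is equally likely to come first
shuffled <- test_df[sample.int(nrow(test_df)), ]

# Keep the first occurrence of each id: one random row per group
one_per_id <- shuffled[!duplicated(shuffled$id), ]
one_per_id[order(one_per_id$id), ]
```

In dplyr 1.0.0 and later, `sample_n()` is superseded by `slice_sample()`, so the equivalent call there is `slice_sample(grouped_test_df, n = 1)`.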
HTH
Ulrik
On Thu, 18 May 2017 at 17:18 Marine Regis <marine.regis at hotmail.fr> wrote:
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
If I understand correctly, this is easily accomplished in base R via ?tapply and indexing.
e.g.
set.seed(1234) ## for reproducibility
grp <- sample.int(5, size = 30, replace = TRUE) ## a grouping vector
## Could be just a column of your matrix or frame
indx <- tapply(seq_along(grp), grp, sample, size = 1)
indx ## just to show you what you get
 1  2  3  4  5
19 15 10  6 14
## now just use indx to extract rows of your matrix or data frame, d:
selected <- d[indx, ] ## one row per group
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, May 18, 2017 at 8:45 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
You can modify your original code to get what you want:
do.call(rbind, lapply(split(data.frame(test), test[,c("id")]), function(x) as.matrix(x[sample(nrow(x), 1), ])))
# xcor ycor id
# 1 4 6 1
# 2 4 2 2
But Bert's way is simpler:
indx <- tapply(seq_along(test[, "id"]), test[, "id"], sample, size=1)
test[indx, ]
# xcor ycor id
# [1,] 4 6 1
# [2,] 4 2 2
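One caveat with the `tapply()` version, worth checking because Marine's example has exactly one row per group: when `sample()` is given a single number `k >= 1`, it draws from `1:k` rather than returning `k`. So if a group's only row is row 2, `sample(2, size = 1)` may return 1, a row belonging to another group, and the correct-looking output above depends on chance. The `resample()` helper below follows the "safer version" idiom shown in the examples of `?sample`; the name is only a convention, not a base R function.

```r
test <- matrix(c(4, 4, 6, 2, 1, 2), nrow = 2, ncol = 3,
               dimnames = list(NULL, c("xcor", "ycor", "id")))

# The pitfall: with a single number k >= 1, sample() draws from 1:k
sample(2, size = 1)  # may be 1 or 2, not always 2

# Safer helper (conventional name): always index into x,
# so a length-one x returns x itself
resample <- function(x, ...) x[sample.int(length(x), ...)]

# One row index per id, correct even for one-row groups
indx <- tapply(seq_len(nrow(test)), test[, "id"], resample, size = 1)
test[indx, , drop = FALSE]  # drop = FALSE keeps a matrix even with one group
```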
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Thursday, May 18, 2017 12:39 PM
To: Ulrik Stervbo <ulrik.stervbo at gmail.com>
Cc: r-help at r-project.org; Marine Regis <marine.regis at hotmail.fr>
Subject: Re: [R] Randomly select one row by group from a matrix