Deleting rows with special character
On Nov 16, 2012, at 8:26 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi Peter,
On Fri, Nov 16, 2012 at 9:04 AM, Peter Kupfer <peter.kupfer at me.com> wrote:
Dear all, maybe a simple problem, but I found no solution for it. I have a matrix Y with 23 000 rows and 220 columns. The entries are "A", "B" or "C".
A reproducible example with sample data is helpful.
I want to extract all rows (as a matrix ) of the matrix Y where all entries of a row are (for example) "A".
Really? Why not just make a new matrix with the right number of "A" values?
Is there any solution? I tried the stringr package, but it doesn't work out.
Of course there is. Here's one option. But I'm not sure you've really stated your actual problem. This extracts the rows where all values are "A", and might at least get you started toward your real problem.
testdata <- matrix(c(
"A", "B", "C",
"B", "B", "B",
"C", "A", "A",
"A", "A", "A"), ncol=3, byrow=TRUE)
testdata.A <- testdata[apply(testdata, 1, function(x)all(x == "A")), , drop=FALSE]
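To make the intermediate step visible, here is a small sketch of what the apply() call returns and why drop=FALSE is there. The names all.A, with.drop and no.drop are only for illustration and are not part of the original code.

```r
# Same 4 x 3 example matrix as above.
testdata <- matrix(c("A", "B", "C",
                     "B", "B", "B",
                     "C", "A", "A",
                     "A", "A", "A"), ncol = 3, byrow = TRUE)

# One logical value per row: TRUE only if every entry in that row is "A".
all.A <- apply(testdata, 1, function(x) all(x == "A"))
all.A                  # FALSE FALSE FALSE TRUE

# Only one row matches here, so plain indexing collapses it to a vector.
with.drop <- testdata[all.A, ]
is.matrix(with.drop)   # FALSE

# drop = FALSE keeps the result a matrix (1 x 3) even for a single row.
no.drop <- testdata[all.A, , drop = FALSE]
is.matrix(no.drop)     # TRUE
```

Without drop = FALSE the result changes type depending on how many rows match, which tends to break code further down that expects a matrix.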
Using something like rowSums() might be faster in this case, based upon brief testing.
A logical comparison returns TRUE/FALSE, which have numeric equivalents of 1/0, respectively. So rowSums() on the result of testdata == "A" counts the "A" entries in each row, and you can subset the matrix to the rows where that count equals the number of columns, which means every value in the row matches your desired value.
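On a small matrix the individual steps look roughly like this. This is only a sketch to show the mechanism; the names m, is.A, n.A, keep and m.A are made up for the illustration.

```r
# Small illustrative matrix (same values as the earlier 4 x 3 example).
m <- matrix(c("A", "B", "C",
              "B", "B", "B",
              "C", "A", "A",
              "A", "A", "A"), ncol = 3, byrow = TRUE)

# Logical matrix of the same shape: TRUE where the entry is "A".
is.A <- m == "A"

# rowSums() treats TRUE/FALSE as 1/0, so this counts the "A"s per row.
n.A <- rowSums(is.A)
n.A                    # 1 0 2 3

# A row is all "A" exactly when its count equals the number of columns.
keep <- n.A == ncol(m)
m.A <- m[keep, , drop = FALSE]
```

The comparison and the row sums each work on the whole matrix at once, rather than calling an R function once per row as apply() does.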
# Create a 23000 * 220 matrix with random values.
set.seed(1)
testdata <- matrix(sample(c("A", "B", "C"), 23000*220, replace = TRUE), ncol = 220)
# Set 100 random rows to all "A"s
set.seed(2)
testdata[sample(23000, 100), ] <- rep("A", 220)
system.time(Sub1 <-testdata[apply(testdata, 1, function(x)all(x == "A")), ,drop = FALSE])
   user  system elapsed
  0.454   0.047   0.503
system.time(Sub2 <- testdata[rowSums(testdata == "A") == ncol(testdata), , drop = FALSE])
   user  system elapsed
  0.089   0.001   0.090
str(Sub1)
chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...
str(Sub2)
chr [1:100, 1:220] "A" "A" "A" "A" "A" "A" "A" "A" ...
identical(Sub1, Sub2)
[1] TRUE
See ?rowSums, which uses a .Internal, so is fast code.
Regards,
Marc Schwartz