Help searching a matrix for only certain records

Hi,
You could also use data.table (see ?data.table):

n <- 300000
set.seed(51)
mat1 <- as.matrix(data.frame(REC.TYPE = sample(c("SAO","FAO","FL-1","FL-2","FL-15"), n, replace=TRUE),
                             Col2 = rnorm(n), Col3 = runif(n), stringsAsFactors=FALSE))
dat1 <- as.data.frame(mat1, stringsAsFactors=FALSE)
table(mat1[,1])
#
#  FAO  FL-1 FL-15  FL-2   SAO 
#60046 60272 59669 59878 60135 
system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
#   user  system elapsed 
#  0.076   0.004   0.082 
system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
#   user  system elapsed 
#  0.028   0.000   0.030 

system.time(x3 <- mat1[match(mat1[, "REC.TYPE"]
                            , c("SAO", "FL-15")
                            , nomatch = 0) != 0
                            ,, drop = FALSE]
            )
#   user  system elapsed 
#  0.028   0.000   0.028 
table(x3[,1])
#
#FL-15   SAO 
#59669 60135 
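
One thing worth checking before picking the fastest variant: the grepl() pattern above is unanchored, so it would also keep any record type that merely contains "SAO" or "FL-15". That does not matter for the five values in mat1, but it could with other data. Below is a minimal sketch on a tiny matrix; the values "FL-150" and "XSAO" and the object names m, keep, by_in, etc. are made up for illustration and are not part of the data above.

```r
# Tiny illustrative character matrix. "FL-150" and "XSAO" are hypothetical
# record types, included only to show where an unanchored regex over-matches.
m <- cbind(REC.TYPE = c("SAO", "FL-15", "FL-1", "FL-150", "XSAO"),
           Col2     = c("1", "2", "3", "4", "5"))
keep <- c("SAO", "FL-15")

# Exact-value matching: %in% and match() select the same rows.
by_in    <- m[m[, "REC.TYPE"] %in% keep, , drop = FALSE]
by_match <- m[match(m[, "REC.TYPE"], keep, nomatch = 0) != 0, , drop = FALSE]

# Unanchored regex: also keeps "FL-150" and "XSAO".
by_grepl <- m[grepl("(SAO|FL-15)", m[, "REC.TYPE"]), , drop = FALSE]

# Anchored regex: behaves like the exact-value matching above.
by_anchor <- m[grepl("^(SAO|FL-15)$", m[, "REC.TYPE"]), , drop = FALSE]

identical(by_in, by_match)   # same rows from %in% and match()
identical(by_in, by_anchor)  # anchoring restores exact matching
nrow(by_grepl)               # more rows than nrow(by_in)
```

If the record types are fixed strings, %in% or match() is probably the safer choice as well as the faster one; the regex only earns its cost when you actually need pattern matching.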


library(data.table)

dat2 <- data.table(dat1)
system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch=0) != 0,, drop=FALSE])
#   user  system elapsed 
#  0.024   0.000   0.025 
table(x4$REC.TYPE)

#FL-15   SAO 
#59669 60135 
A.K.

----- Original Message -----
From: jim holtman <jholtman at gmail.com>
To: Matt Borkowski <mathias1979 at yahoo.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Sunday, March 3, 2013 11:52 AM
Subject: Re: [R] Help searching a matrix for only certain records

If you are using matrices, then here are several ways of doing it for
size 300,000. You can determine whether the difference of 0.1 seconds is
important in terms of the performance you are after. It is taking you
more time to type in the statements than it is taking them to execute:
+     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
+     , nrow = n
+     , dimnames = list(NULL, "REC.TYPE")
+     )
 FL-15  Other   SAO 
   562 299151    287
   user  system elapsed
   0.17    0.00    0.17
   user  system elapsed
   0.05    0.00    0.05
+                            , c("SAO ", "FL-15")
+                            , nomatch = 0) != 0
+                            ,, drop = FALSE]
+             )
   user  system elapsed
   0.03    0.00    0.03
[1] TRUE
[1] TRUE

        
On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman <jholtman at gmail.com> wrote: