
Faster Solution for a simple code?

4 messages · jim holtman, PIKAL Petr, Chris82

#
Hi R-users,

I wrote some simple code to count how often each pair of numbers in x occurs
in y. For example, 5000000 3200000 occurs two times.
But the code with the loop is extremely slow. x has 6100 lines and y
sometimes has more than 50000 lines.

Is there a faster alternative in R?

thanks.
 

x

5000000	3200000	0
5100000	3100000	0
5200000	3100000	0
5200000	3200000	0


lengthx <- length(x[,1])

y

5000000	3200000	1
5000000	3200000	1
5200000	3100000	1
5200000	3000000	1


lengthy <- length(y[,1])

for (i in 1:lengthx){
  for (j in 1:lengthy){
    if (x[i,1] == y[j,1]){
      if (x[i,2] == y[j,2]){
        x[i,3] <- x[i,3] + 1
      }
    }
  }
}
x

1  5000000    3200000      2
2  5100000    3100000      0
3  5200000    3100000      1
4  5200000    3200000      0
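
The nested loop above makes lengthx * lengthy comparisons (roughly 6100 * 50000 for the sizes mentioned). A minimal sketch of one way to get the same counts without a loop, assuming the sample data from this post is typed in as data frames with R's default column names V1 to V3. The names key_x, key_y, tab and hits are chosen here for illustration only:

```r
# Sample data from the post; V1..V3 are assumed default column names.
x <- data.frame(V1 = c(5000000, 5100000, 5200000, 5200000),
                V2 = c(3200000, 3100000, 3100000, 3200000),
                V3 = 0)
y <- data.frame(V1 = c(5000000, 5000000, 5200000, 5200000),
                V2 = c(3200000, 3200000, 3100000, 3000000),
                V3 = 1)

# Build one text key per row from the first two columns.
key_x <- paste(x$V1, x$V2, sep = ".")
key_y <- paste(y$V1, y$V2, sep = ".")

# Count each distinct key in y once, then look the counts up for every x row.
tab  <- table(key_y)
hits <- as.integer(tab)[match(key_x, names(tab))]
hits[is.na(hits)] <- 0L   # keys of x that never occur in y

x$V3 <- x$V3 + hits
x
```

Because the lookup runs over every row of x, repeated keys in x each receive the full count, as they would in the loop.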
#
try this:
> x
       V1      V2 V3
1 5000000 3200000  0
2 5100000 3100000  0
3 5200000 3100000  0
4 5200000 3200000  0
> y
       V1      V2 V3
1 5000000 3200000  1
2 5000000 3200000  1
3 5200000 3100000  1
4 5200000 3000000  1
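> z <- merge(x, y, by=c("V1", "V2"), all.x=TRUE)
> t(sapply(split(z, z[,1:2], drop=TRUE), function(.grp){
+     if (any(is.na(.grp))) return(c(.grp[1,1], .grp[1,2], 0))
+     c(.grp[1,1], .grp[1,2], nrow(.grp))
+ }))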
                   [,1]    [,2] [,3]
5100000.3100000 5100000 3100000    0
5200000.3100000 5200000 3100000    1
5000000.3200000 5000000 3200000    2
5200000.3200000 5200000 3200000    0

        
On Mon, Apr 13, 2009 at 1:06 PM, Chris82 <rubenbauar at gmx.de> wrote:

  
    
#
Hi

This one could be slightly quicker, but I am not completely sure it is
correct because it gives me different results from yours.

set.seed(111)
x <- x1 <- x2 <- data.frame(a=sample(1:50, 10000, replace=TRUE),
                            b=sample(100:500, 10000, replace=TRUE))
y <- data.frame(a=sample(1:50, 10000, replace=TRUE),
                b=sample(100:500, 10000, replace=TRUE))


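# paste/table approach: one key per row, then count the y keys that occur in x
system.time({
  one <- paste(x[,1], x[,2], sep=".")
  two <- paste(y[,1], y[,2], sep=".")
  tab <- table(two[two %in% one])
  # fills only the first x row per key; unmatched and repeated rows stay NA
  x1[match(names(tab), one), 3] <- tab
})

# merge/split approach from the previous reply
system.time({
  z <- merge(x, y, by=c("a", "b"), all.x=TRUE)
  x2 <- t(sapply(split(z, z[,1:2], drop=TRUE), function(.grp){
    if (any(is.na(.grp))) return(c(.grp[1,1], .grp[1,2], 0))
    c(.grp[1,1], .grp[1,2], nrow(.grp))
  }))
})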

head(x2)
head(x1[order(x1[,1], x1[,2]),])

Regards
Petr
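
One possible reason for the differing results, offered as a guess that is not confirmed anywhere in the thread: in the test data above, x and y contain only the key columns a and b, so merge(..., all.x=TRUE) has no column in which to place an NA and the is.na() check can never fire. The sketch below uses small made-up data and a hypothetical marker column named hit to illustrate this:

```r
# Made-up data: key (1, 10) occurs twice in y, key (2, 20) never.
x <- data.frame(a = c(1, 2), b = c(10, 20))
y <- data.frame(a = c(1, 1), b = c(10, 10))

# Every column is a merge key, so the unmatched row has nowhere to hold an NA.
z <- merge(x, y, by = c("a", "b"), all.x = TRUE)
anyNA(z)

# Assumption: 'hit' is a hypothetical marker column, not part of the original code.
y$hit <- 1L
z2 <- merge(x, y, by = c("a", "b"), all.x = TRUE)

# Count the non-NA markers per key; keys without a match now come out as 0.
counts <- tapply(!is.na(z2$hit), paste(z2$a, z2$b, sep = "."), sum)
counts
```

Repeated keys in x would be a second source of differences, since merge() then returns one row for every pairing of an x row with a matching y row.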


r-help-bounces at r-project.org wrote on 13.04.2009 21:51:18:
#
Thanks!

That's a really fast solution. Now the R process takes a few seconds
instead of a couple of hours with my loop.

greets.
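
Before trusting a fast version on the full data, it can be checked against the original loop on a small random sample. A rough sketch, assuming made-up data sizes (50 and 500 rows) and a paste()/table() count. None of the names or sizes below come from the thread:

```r
set.seed(1)
# Assumption: small random data so that the loop finishes quickly.
x <- data.frame(a = sample(1:10, 50, replace = TRUE),
                b = sample(1:10, 50, replace = TRUE),
                n = 0)
y <- data.frame(a = sample(1:10, 500, replace = TRUE),
                b = sample(1:10, 500, replace = TRUE))

# Reference result: the nested loop from the original post.
slow <- x
t_loop <- system.time(
  for (i in seq_len(nrow(slow))) {
    for (j in seq_len(nrow(y))) {
      if (slow[i, 1] == y[j, 1] && slow[i, 2] == y[j, 2]) {
        slow[i, 3] <- slow[i, 3] + 1
      }
    }
  }
)

# Vectorised count: tabulate the y keys once, then look up every x key.
fast <- x
t_fast <- system.time({
  tab  <- table(paste(y$a, y$b, sep = "."))
  hits <- as.integer(tab)[match(paste(fast$a, fast$b, sep = "."), names(tab))]
  hits[is.na(hits)] <- 0L
  fast$n <- fast$n + hits
})

# Both versions must agree row by row.
stopifnot(all(slow$n == fast$n))
rbind(loop = t_loop, keyed = t_fast)[, "elapsed"]
```

The timings depend on the machine and the data, so only the equality check is meant as a firm result.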
jholtman wrote: