Hi R-users,
I create a simple code to check out how often the same numbers in y occur in
x. For example 5000000 320000 occurs two times.
But the code with the loop is extremly slow. x have 6100 lines and y
sometimes more than 50000 lines.
Is there any alternative code to create with R?
thanks.
x
5000000 3200000 0
5100000 3100000 0
5200000 3100000 0
5200000 3200000 0
lengthx <- length(x[,1])
y
5000000 3200000 1
5000000 3200000 1
5200000 3100000 1
5200000 3000000 1
langthy <- length(y[,1])
for (i in 1:lengthx){
for (j in 1:lengthy){
if (x[i,1] == y[j,1]){
if (x[i,2] == y[j,2]){
x[i,3] <- x[i,3] + 1
}
}
}
}
x
1 5000000 3200000 2
2 5100000 3100000 0
3 5200000 3100000 1
4 5200000 3200000 0
On Mon, Apr 13, 2009 at 1:06 PM, Chris82 <rubenbauar at gmx.de> wrote:
Hi R-users,
I create a simple code to check out how often the same numbers in y occur in
x. For example 5000000 320000 occurs two times.
But the code with the loop is extremly slow. x have 6100 lines and y
sometimes more than 50000 lines.
Is there any alternative code to create with R?
thanks.
x
5000000 3200000 0
5100000 3100000 0
5200000 3100000 0
5200000 3200000 0
lengthx <- length(x[,1])
y
5000000 3200000 1
5000000 3200000 1
5200000 3100000 1
5200000 3000000 1
langthy <- length(y[,1])
for (i in 1:lengthx){
for (j in 1:lengthy){
if (x[i,1] == y[j,1]){
if (x[i,2] == y[j,2]){
x[i,3] <- x[i,3] + 1
}
}
}
}
x
1 ?5000000 ? ?3200000 ? ? ?2
2 ?5100000 ? ?3100000 ? ? ?0
3 ?5200000 ? ?3100000 ? ? ?1
4 ?5200000 ? ?3200000 ? ? ?0
--
View this message in context: http://www.nabble.com/Faster-Solution-for-a-simple-code--tp23024985p23024985.html
Sent from the R help mailing list archive at Nabble.com.
Hi
this one could be slightly quicker but i am not completely sure because it
gives me different results from yours
set.seed(111)
x<-x1<-x2<- data.frame(a=sample(1:50, 10000, replace=T), b=sample(100:500,
10000, replace=T))
y<- data.frame(a=sample(1:50, 10000, replace=T), b=sample(100:500, 10000,
replace=T))
system.time({
one<-paste(x[,1], x[,2], sep=".")
two<-paste(y[,1], y[,2], sep=".")
tab<-table(two[two %in% one])
x1[match(names(tab), one),3]<-tab
})
system.time({
z <- merge(x, y, by=c("a", "b"), all.x=TRUE)
x2<-t(sapply(split(z, z[,1:2], drop=TRUE), function(.grp){
if (any(is.na(.grp))) return(c(.grp[1,1], .grp[1,2], 0))
c(.grp[1,1], .grp[1,2], nrow(.grp))
}))
})
head(x2)
head(x1[order(x1[,1], x1[,2]),])
Regards
Petr
r-help-bounces at r-project.org napsal dne 13.04.2009 21:51:18:
On Mon, Apr 13, 2009 at 1:06 PM, Chris82 <rubenbauar at gmx.de> wrote:
Hi R-users,
I create a simple code to check out how often the same numbers in y
occur in
x. For example 5000000 320000 occurs two times.
But the code with the loop is extremly slow. x have 6100 lines and y
sometimes more than 50000 lines.
Is there any alternative code to create with R?
thanks.
x
5000000 3200000 0
5100000 3100000 0
5200000 3100000 0
5200000 3200000 0
lengthx <- length(x[,1])
y
5000000 3200000 1
5000000 3200000 1
5200000 3100000 1
5200000 3000000 1
langthy <- length(y[,1])
for (i in 1:lengthx){
for (j in 1:lengthy){
if (x[i,1] == y[j,1]){
if (x[i,2] == y[j,2]){
x[i,3] <- x[i,3] + 1
}
}
}
}
x
1 5000000 3200000 2
2 5100000 3100000 0
3 5200000 3100000 1
4 5200000 3200000 0
--
View this message in context:
On Mon, Apr 13, 2009 at 1:06 PM, Chris82 <rubenbauar at gmx.de> wrote:
Hi R-users,
I create a simple code to check out how often the same numbers in y occur
in
x. For example 5000000 320000 occurs two times.
But the code with the loop is extremly slow. x have 6100 lines and y
sometimes more than 50000 lines.
Is there any alternative code to create with R?
thanks.
x
5000000 3200000 0
5100000 3100000 0
5200000 3100000 0
5200000 3200000 0
lengthx <- length(x[,1])
y
5000000 3200000 1
5000000 3200000 1
5200000 3100000 1
5200000 3000000 1
langthy <- length(y[,1])
for (i in 1:lengthx){
for (j in 1:lengthy){
if (x[i,1] == y[j,1]){
if (x[i,2] == y[j,2]){
x[i,3] <- x[i,3] + 1
}
}
}
}
x
1 ?5000000 ? ?3200000 ? ? ?2
2 ?5100000 ? ?3100000 ? ? ?0
3 ?5200000 ? ?3100000 ? ? ?1
4 ?5200000 ? ?3200000 ? ? ?0
--
View this message in context:
http://www.nabble.com/Faster-Solution-for-a-simple-code--tp23024985p23024985.html
Sent from the R help mailing list archive at Nabble.com.