Fastest way to compare a single value with all values in one column of a data frame

If you want this for every row of x whose value in a is smaller than y$a, I'd use

x[x$a < y$a,] <- y

For just the smallest:

x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
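
A minimal sketch of the two one-liners above, run on the five-row toy data from the original question. The data frame definitions are copied from that post (with stringsAsFactors = FALSE instead of the as.character() step); the names x_all and x_min are introduced here only for illustration. The first form overwrites every row whose a is below y$a, while the second touches at most one row.

```r
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15, stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# Replace every row whose a is below y$a (rows 1 and 2 here); the one-row y is recycled.
x_all <- x
x_all[x_all$a < y$a, ] <- y

# Replace only the row holding the overall minimum, and only if that minimum is below y$a.
x_min <- x
x_min[intersect(which(x_min$a < y$a), which.min(x_min$a)), ] <- y
```

Which of the two is wanted depends on whether all smaller rows or only the single smallest row should be overwritten.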

Hello!

I have a large data frame x:
x<-data.frame(item=letters[1:5],a=1:5,b=11:15)  # in actuality, x has 1000 rows
x$item<-as.character(x$item)
I also have a small data frame y with just 1 row:
y<-data.frame(item="f",a=3,b=10)
y$item<-as.character(y$item)

I have to decide if y$a is larger than the smallest of all the values in
x$a. If it is, I want y to replace the whole row in x that has the lowest
value in column a.
This is how I'd do it.

if(y$a>min(x$a)){
 whichmin<-which(x$a==min(x$a))
 x[whichmin,]<-y[1,]
}

I am wondering if there is a faster way of doing it. What would be the
fastest possible way? I'd have to do it, unfortunately, many-many times.

Thank you very much!

-- 
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
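
One possible way to tighten the approach in the question above, offered as a sketch rather than a measured answer: which.min() finds the position of the first minimum in a single pass, so the separate min() and which(... == min(...)) scans can be avoided. The helper name replace_min_row and its col argument are hypothetical, introduced here for illustration. Note one behavioural difference from the code in the question: with tied minima, this version replaces only the first tied row, whereas which(x$a==min(x$a)) selects all of them. Whether it is actually faster on real data is worth checking with system.time().

```r
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15, stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# Hypothetical helper: overwrite the row with the smallest value in `col`
# when the candidate row y has a larger value in that column.
replace_min_row <- function(x, y, col = "a") {
  i <- which.min(x[[col]])  # position of the first minimum; integer(0) if none
  if (length(i) == 1L && y[[col]] > x[[col]][i]) {
    x[i, ] <- y[1, ]
  }
  x
}

x2 <- replace_min_row(x, y)
```

If the replacement has to run very many times, keeping the columns as plain vectors and assigning by index may reduce the overhead of row assignment on a data frame; that is an assumption to verify by timing, not something established in this thread.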
Hi,
I guess you could also use:

x[match(min(x$a),x$a[x$a<y$a]),]<- y
x
#  item a  b
#1    f 3 10
#2    b 2 12
#3    c 3 13
#4    d 4 14
#5    e 5 15
A.K.
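
A note on why the match() expression above is fragile, written as a reading of the code rather than a statement from the thread: match() returns a position inside the filtered vector x$a[x$a<y$a], not a row number of x. On the toy data the two coincide because the minimum sits in row 1. The sketch below reorders the values (x_shuffled and the other variable names are introduced here for illustration) to show that they can differ.

```r
x_shuffled <- data.frame(item = letters[1:5], a = c(5, 4, 1, 3, 2), b = 11:15,
                         stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# Values below y$a come from rows 3 and 5 of x_shuffled.
below <- x_shuffled$a[x_shuffled$a < y$a]

# Position of the minimum inside `below`, not inside x_shuffled.
pos_in_subset <- match(min(x_shuffled$a), below)

# Actual row of the minimum in x_shuffled.
row_in_x <- which.min(x_shuffled$a)
```

Here pos_in_subset is 1 while row_in_x is 3, so indexing x_shuffled with the former would overwrite the wrong row.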

Hi,

Sorry, my previous solution doesn't work.
This should work for your dataset:
set.seed(1851)
x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
x[x$a%in%which.min(x[x$a<y$a,]$a),]<- y #if there are multiple minimum values

set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 330
system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1})
#   user  system elapsed
#  0.000   0.000   0.001
length(x1$a[x1$a==1])
#[1] 0

#For some reason, it stops working once the number of tied minimum values gets large enough

set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 3404
x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1
length(x1$a[x1$a==1])
#[1] 3404 #not getting replaced

#However, if I try:
set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 208
system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1)
#   user  system elapsed
#  0.124   0.016   0.138
length(x1$a[x1$a==1])
#[1] 0

#Tried Jessica's solution:
set.seed(1851)
x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
x
#   item  a  b
#1     a  8 25
#2     a 10 26
#3     f  3 10 #replaced
#4     e 15 26
#5     b 13 20
#6     a  5 23
#7     d  4 29
#8     e  2 24
#9     c  7 30
#10    e 14 24
#11    d  2 20
#12    e 10 21
#13    c 13 27
#14    d 12 23
#15    b 11 26
#16    e  5 22
#17    c  1 26  #it is not replaced
#18    a  8 21
#19    e 10 26
#20    c  2 22

A.K.
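
One possible explanation for the inconsistent behaviour of the %in% which.min(...) expression in the message above, offered as a reading of the code and not confirmed anywhere in the thread: which.min() returns a position, not a value. The expression therefore selects rows whose a equals that position. It hits the minimum rows only when the first minimum is the first element of the filtered subset (position 1) and the minimum value is itself 1. The small vector below is made up for illustration, as are all variable names.

```r
a <- c(7, 2, 9, 1, 2, 1)   # illustrative values, not taken from the thread
y_a <- 3

subset_a <- a[a < y_a]     # c(2, 1, 2, 1)
k <- which.min(subset_a)   # position of the first minimum within the subset

hit_position <- a %in% k   # rows whose value equals that position
hit_value <- a == min(a)   # rows that actually hold the minimum
```

In this example k is 2, so hit_position marks the rows where a is 2 (rows 2 and 5), while the minimum value 1 sits in rows 4 and 6. Comparing against min(a), or indexing directly with which.min(a), avoids mixing positions and values.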

Hi,
Is there any chance that x$a has the same number repeated?

If `item` and `a` are both unique, I guess both solutions should work.

set.seed(1851)
x<- data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:45,20,replace=F),b=sample(20:50,20,replace=F),stringsAsFactors=F)
y<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)

x[intersect(which(x$a < y$a),which.min(x$a)),]
#   item a  b
#17    c 1 48
x[x$a==which.min(x$a[x$a<y$a]),]
#   item a  b
#17    c 1 48
#or 

x[x$a%in%which.min(x$a[x$a<y$a]),]
#   item a  b
#17??? c 1 48

x[x$a%in%which.min(x$a[x$a<y$a]),]<-y

tail(x)
#   item  a  b
#15    q 45 30
#16    g 10 23
#17    z  3 10
#18    r 15 39
#19    l 18 45
#20    t 35 33

#However, if the `item` column is unique but `a` is not, the problem I mentioned previously arises.
set.seed(1851)
x1<- data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:10,20,replace=T),b=sample(20:50,20,replace=F),stringsAsFactors=F)
y1<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)

x1[intersect(which(x1$a < y1$a),which.min(x1$a)),]
#  item a  b
#3    s 1 41
x1[x1$a==which.min(x1$a[x1$a<y1$a]),]
#   item a  b
#3     s 1 41
#11    h 1 46
#17    c 1 48
x1[x1$a==which.min(x1$a[x1$a<y1$a]),]<- y1
A.K.
Sorry - I should have clarified:
My identifiers (in column "item") will always be unique. In other words, one entry in column "item" will never be repeated - neither in x nor in y.
Dimitri
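
Given the uniqueness requirement just stated, it may matter how ties in a are handled. The sketch below uses small hand-made data (values and names chosen here for illustration, not taken from the thread) to contrast replacing every tied minimum row with replacing only the first one. Copying the same y into several rows duplicates its item, so the single-row form is the one that keeps item unique.

```r
x <- data.frame(item = c("a", "b", "c", "d"), a = c(2, 1, 5, 1), b = c(20, 21, 22, 23),
                stringsAsFactors = FALSE)
y <- data.frame(item = "z", a = 3, b = 10, stringsAsFactors = FALSE)

# Replace every tied minimum: y is recycled into rows 2 and 4.
x_ties <- x
if (y$a > min(x_ties$a)) x_ties[x_ties$a == min(x_ties$a), ] <- y

# Replace only the first minimum: at most one row changes.
x_first <- x
i <- which.min(x_first$a)
if (y$a > x_first$a[i]) x_first[i, ] <- y

dup_ties <- anyDuplicated(x_ties$item) > 0    # item "z" now appears twice
dup_first <- anyDuplicated(x_first$item) > 0  # item values remain unique
```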

On Wed, Jan 30, 2013 at 1:27 PM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:

> Thank you, everyone! I'll try to test those different approaches. Really appreciate your help!
> Dimitri
Dimitri Liakhovitski
gfk.com