Skip to content

VLOOKUP in R - tried everything.

4 messages · bjmjarrett, Peter Dalgaard, Ethan Brown +1 more

#
I am attempting to emulate the VLOOKUP function from Excel in R. 

I want to compare one column (coll.minus.release) with another
(release.days) to get the number of parasitoid released at that time
(TotalParasitoids).

for example:

coll.minus.release	release.days	ParasitoidTotal
-12	                             -266	             1700
8	                             -259	             1000
8	                             -225	             1000
28	                             -216	             1000
41	                             -28	             1148
77	                              -12	             1144
105	                               0	             1160
105	                               8	              972
125	                              28	             1146
125	                              41	             1004
125	                              77	             1003
125	                              97	             1010
....
2772	                              NA	               NA
2801	                              NA	               NA
2834	                              NA	               NA


vlookup <- function(x) data[data$release.days==x,6] # as I have three other
columns that are not of interest

vlookup(-12) = 1144, and so on, which is great.

However, when I try:

unlist(sapply(coll.minus.release,vlookup)) to apply it to the whole
coll.minus.release

it works up to a point, as it doesn't give me 132 values for the 132 values
of coll.minus.release. Is this because the table of release.days and
TotalParasitoid has less values than coll.minus.release (108 compared to
132)? To fill the gap I put in 0, and as none of the coll.minus.release
values = 0 I think it wouldn't affect it.

Other things I have tried include findInterval and match. 

data[findInterval(x=data$coll.minus.release,vec=data$release.days,"ParasitoidTotal")] 

didn't work as it said vec must be sorted non-decreasingly and didn't work
when I randomised the release.days and ParasitoidTotal columns as it doesn't
matter which order they are in.

Thanks for reading all the way through - I wanted all the information I felt
you might need to help me in it.

Any help will be greatly appreciated.

--
View this message in context: http://r.789695.n4.nabble.com/VLOOKUP-in-R-tried-everything-tp3571107p3571107.html
Sent from the R help mailing list archive at Nabble.com.
#
On Jun 3, 2011, at 16:59 , bjmjarrett wrote:

            
I wager that a look at setdiff(coll.minus.release,release.days) and vice versa would be illuminating. Notice that with your definition, vlookup(31415926) or any other number absent from release.days gives a zero-length vector.

Presumably, you are looking for match().

  
    
#
Even after I discovered match(), it took me a little while to figure
out how to use it for this task, so to add on to Peter's comment--to
add a column for total for each value of "coll.minus.release", try the
following:

data$ParasitoidMatch <-
data$ParasitoidTotal[match(data$coll.minus.release,
data$release.days)]

Note also that match() only returns the first match it finds, and by
default returns NA for no match.
On Fri, Jun 3, 2011 at 10:43 AM, peter dalgaard <pdalgd at gmail.com> wrote:
#
On Jun 3, 2011, at 12:43 PM, peter dalgaard wrote:

            
If I remember correctly VLOOKUP (and HLOOKUP)  require sorted  
cutpoints and picks the lowest number that the vector element exceeds  
or equals. If so, then I suspect that an indexing strategy with  
findInterval should work correctly if the dataframe is sorted with  
order(). If you post a cut-pasteable example with desited input and  
output, I sure someone can rig something up.