I am attempting to emulate the VLOOKUP function from Excel in R. I want to compare one column (coll.minus.release) with another (release.days) to get the number of parasitoid released at that time (TotalParasitoids). for example: coll.minus.release release.days ParasitoidTotal -12 -266 1700 8 -259 1000 8 -225 1000 28 -216 1000 41 -28 1148 77 -12 1144 105 0 1160 105 8 972 125 28 1146 125 41 1004 125 77 1003 125 97 1010 .... 2772 NA NA 2801 NA NA 2834 NA NA vlookup <- function(x) data[data$release.days==x,6] # as I have three other columns that are not of interest vlookup(-12) = 1144, and so on, which is great. However, when I try: unlist(sapply(coll.minus.release,vlookup)) to apply it to the whole coll.minus.release it works up to a point, as it doesn't give me 132 values for the 132 values of coll.minus.release. Is this because the table of release.days and TotalParasitoid has less values than coll.minus.release (108 compared to 132)? To fill the gap I put in 0, and as none of the coll.minus.release values = 0 I think it wouldn't affect it. Other things I have tried include findInterval and match. data[findInterval(x=data$coll.minus.release,vec=data$release.days,"ParasitoidTotal")] didn't work as it said vec must be sorted non-decreasingly and didn't work when I randomised the release.days and ParasitoidTotal columns as it doesn't matter which order they are in. Thanks for reading all the way through - I wanted all the information I felt you might need to help me in it. Any help will be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/VLOOKUP-in-R-tried-everything-tp3571107p3571107.html Sent from the R help mailing list archive at Nabble.com.
VLOOKUP in R - tried everything.
4 messages · bjmjarrett, Peter Dalgaard, Ethan Brown +1 more
On Jun 3, 2011, at 16:59 , bjmjarrett wrote:
I am attempting to emulate the VLOOKUP function from Excel in R. I want to compare one column (coll.minus.release) with another (release.days) to get the number of parasitoid released at that time (TotalParasitoids). for example: coll.minus.release release.days ParasitoidTotal -12 -266 1700 8 -259 1000 8 -225 1000 28 -216 1000 41 -28 1148 77 -12 1144 105 0 1160 105 8 972 125 28 1146 125 41 1004 125 77 1003 125 97 1010 .... 2772 NA NA 2801 NA NA 2834 NA NA vlookup <- function(x) data[data$release.days==x,6] # as I have three other columns that are not of interest vlookup(-12) = 1144, and so on, which is great. However, when I try: unlist(sapply(coll.minus.release,vlookup)) to apply it to the whole coll.minus.release it works up to a point, as it doesn't give me 132 values for the 132 values of coll.minus.release. Is this because the table of release.days and TotalParasitoid has less values than coll.minus.release (108 compared to 132)? To fill the gap I put in 0, and as none of the coll.minus.release values = 0 I think it wouldn't affect it.
I wager that a look at setdiff(coll.minus.release,release.days) and vice versa would be illuminating. Notice that with your definition, vlookup(31415926) or any other number absent from release.days gives a zero-length vector. Presumably, you are looking for match().
Other things I have tried include findInterval and match. data[findInterval(x=data$coll.minus.release,vec=data$release.days,"ParasitoidTotal")] didn't work as it said vec must be sorted non-decreasingly and didn't work when I randomised the release.days and ParasitoidTotal columns as it doesn't matter which order they are in. Thanks for reading all the way through - I wanted all the information I felt you might need to help me in it. Any help will be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/VLOOKUP-in-R-tried-everything-tp3571107p3571107.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Even after I discovered match(), it took me a little while to figure out how to use it for this task, so to add on to Peter's comment--to add a column for total for each value of "coll.minus.release", try the following: data$ParasitoidMatch <- data$ParasitoidTotal[match(data$coll.minus.release, data$release.days)] Note also that match() only returns the first match it finds, and by default returns NA for no match.
On Fri, Jun 3, 2011 at 10:43 AM, peter dalgaard <pdalgd at gmail.com> wrote:
On Jun 3, 2011, at 16:59 , bjmjarrett wrote:
I am attempting to emulate the VLOOKUP function from Excel in R. I want to compare one column (coll.minus.release) with another (release.days) to get the number of parasitoid released at that time (TotalParasitoids). for example: coll.minus.release ? ?release.days ? ?ParasitoidTotal -12 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-266 ? ? ? ? ? ? ? ? ? ?1700 8 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-259 ? ? ? ? ? ? ? ? ? ?1000 8 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-225 ? ? ? ? ? ? ? ? ? ?1000 28 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -216 ? ? ? ? ? ? ? ? ? ?1000 41 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -28 ? ? ? ? ? ? ? ? ? ? 1148 77 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-12 ? ? ? ? ? ? ? ? ? ?1144 105 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ? ? ? ? 1160 105 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?8 ? ? ? ? ? ? ? ? ? ? ?972 125 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 28 ? ? ? ? ? ? ? ? ? ? 1146 125 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 41 ? ? ? ? ? ? ? ? ? ? 1004 125 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 77 ? ? ? ? ? ? ? ? ? ? 1003 125 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 97 ? ? ? ? ? ? ? ? ? ? 1010 .... 2772 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?NA ? ? ? ? ? ? ? ? ? ? ? NA 2801 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?NA ? ? ? ? ? ? ? ? ? ? ? NA 2834 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?NA ? ? ? ? ? ? ? ? ? ? ? NA vlookup <- function(x) data[data$release.days==x,6] # as I have three other columns that are not of interest vlookup(-12) = 1144, and so on, which is great. However, when I try: unlist(sapply(coll.minus.release,vlookup)) to apply it to the whole coll.minus.release it works up to a point, as it doesn't give me 132 values for the 132 values of coll.minus.release. Is this because the table of release.days and TotalParasitoid has less values than coll.minus.release (108 compared to 132)? To fill the gap I put in 0, and as none of the coll.minus.release values = 0 I think it wouldn't affect it.
I wager that a look at setdiff(coll.minus.release,release.days) and vice versa would be illuminating. Notice that with your definition, vlookup(31415926) or any other number absent from release.days gives a zero-length vector. Presumably, you are looking for match().
Other things I have tried include findInterval and match. data[findInterval(x=data$coll.minus.release,vec=data$release.days,"ParasitoidTotal")] didn't work as it said vec must be sorted non-decreasingly and didn't work when I randomised the release.days and ParasitoidTotal columns as it doesn't matter which order they are in. Thanks for reading all the way through - I wanted all the information I felt you might need to help me in it. Any help will be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/VLOOKUP-in-R-tried-everything-tp3571107p3571107.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk ?Priv: PDalgd at gmail.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Jun 3, 2011, at 12:43 PM, peter dalgaard wrote:
On Jun 3, 2011, at 16:59 , bjmjarrett wrote:
I am attempting to emulate the VLOOKUP function from Excel in R. I want to compare one column (coll.minus.release) with another (release.days) to get the number of parasitoid released at that time (TotalParasitoids). for example: coll.minus.release release.days ParasitoidTotal -12 -266 1700 8 -259 1000 8 -225 1000 28 -216 1000 41 -28 1148 77 -12 1144 105 0 1160 105 8 972 125 28 1146 125 41 1004 125 77 1003 125 97 1010 .... 2772 NA NA 2801 NA NA 2834 NA NA vlookup <- function(x) data[data$release.days==x,6] # as I have three other columns that are not of interest vlookup(-12) = 1144, and so on, which is great. However, when I try: unlist(sapply(coll.minus.release,vlookup)) to apply it to the whole coll.minus.release it works up to a point, as it doesn't give me 132 values for the 132 values of coll.minus.release. Is this because the table of release.days and TotalParasitoid has less values than coll.minus.release (108 compared to 132)? To fill the gap I put in 0, and as none of the coll.minus.release values = 0 I think it wouldn't affect it.
I wager that a look at setdiff(coll.minus.release,release.days) and vice versa would be illuminating. Notice that with your definition, vlookup(31415926) or any other number absent from release.days gives a zero-length vector. Presumably, you are looking for match().
If I remember correctly VLOOKUP (and HLOOKUP) require sorted cutpoints and picks the lowest number that the vector element exceeds or equals. If so, then I suspect that an indexing strategy with findInterval should work correctly if the dataframe is sorted with order(). If you post a cut-pasteable example with desited input and output, I sure someone can rig something up.
David. > > >> >> Other things I have tried include findInterval and match. >> >> data[findInterval(x=data$coll.minus.release,vec=data >> $release.days,"ParasitoidTotal")] >> >> didn't work as it said vec must be sorted non-decreasingly and >> didn't work >> when I randomised the release.days and ParasitoidTotal columns as >> it doesn't >> matter which order they are in. >> >> Thanks for reading all the way through - I wanted all the >> information I felt >> you might need to help me in it. >> >> Any help will be greatly appreciated. >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/VLOOKUP-in-R-tried-everything-tp3571107p3571107.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT