Skip to content

- Trying to replicate VLOOKUP in R - help needed

6 messages · Gregg, Andrew Robinson, Mitchell Maltenfort +1 more

#
PROBLEM: I am trying to replicate something like a VLOOKUP in R but am having no success - need a bit of help.

GIVEN DATA SET (data.table): (looks something like this, but much bigger)

NAME                	 TOTALAUTH	ASSIGNED_COMPANY
ABERDEEN PROVING GROUND	     1	               NA
ADELPHI LABORATORY CENTER    1	               NA
CARLISLE BARRACKS	     1	               NA
DETROIT ARSENAL      	     1	               NA
DUGWAY PROVING GROUND	     1	               NA
FORT A P HILL	             1	               NA
FORT BELVOIR	             1	               NA
FORT BENNING	             1	               NA
FORT BLISS	             1	               NA
FORT BRAGG	             1	               NA
FORT BUCHANAN	             1	               NA


I am trying to update the values in the ASSIGNED_COMPANY column from NAs to a value that matches based on the "key" word like below.

NAME                	 TOTALAUTH	ASSIGNED_COMPANY
ABERDEEN PROVING GROUND	     1	               NEC Aberdeen
ADELPHI LABORATORY CENTER    1	               NEC Adelphi
CARLISLE BARRACKS	     1	               NEC Carlise
DETROIT ARSENAL      	     1	               NEC Detroit
DUGWAY PROVING GROUND	     1	               NEC Dugway
FORT A P HILL	             1	               NEC AP Hill
FORT BELVOIR	             1	               NEC Belvoir
FORT BENNING	             1	               NEC Benning
FORT BLISS	             1	               NEC Bliss
FORT BRAGG	             1	               NEC Bragg
FORT BUCHANAN	             1	               NEC Buchanon


In a nutshell, for instance.......

I want to search for the keyword "ABERDEEN" in the NAME column, and for every row where it exists, I want to update the NA in the ASSIGNED_COMPANY column to "NEC Aberdeen"

I want to search for the keyword "ADELPHI" in the NAME column, and for every row where it exists, I want to update the NA in the ASSIGNED_COMPANY column to "NEC ADELPHI"

....... and so on for every value in the NAME column - so in the end a I have matching names in the ASSIGNED_COMPANY column.

I can use an if statement because it is not vectorized.

If I use an ifelse statement, the "else" rewrites any changes with ""

Something so simple should not be difficult.

Some of the methods I attempted to use are below along with the errors I get...






###################CODE#######################################

library(data.table)
library(dplyr)
library(stringr)


VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")

#METHOD 1 FAILS
VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME, useBytes = TRUE), "NEC Adelphi")

Error in get(.x, .env, mode = "function") : 

  object 'NEC Adelphi' of mode 'function' was not found

#METHOD 2 FAILS
if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
        VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
}

Warning message:
In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
  the condition has length > 1 and only the first element will be used


#METHOD 3 FAILS
ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, "ADELPHI"), ASIP_combined_location_tally$ASSIGNED_COMPANY == ASIP_combined_location_tally$ASSIGNED_COMPANY)

Error in ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME,  : 

  argument "no" is missing, with no default

#METHOD4 FAILS
VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ADELPHI', x = NAME), 'NEC Adelphi', '')) 

VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'CARLISLE', x = NAME), 'NEC Carlisle Barracks', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'DETROIT', x = NAME), 'NEC Detroit Arsenal', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'BELVOIR', x = NAME), 'NEC Fort Belvoir', ''))

-----------the 4th method just over writes all previous changers back to ""





######################################################################

Any help offered would be so very greatly appreciated.

Thanks you.

r/
gregg powell
AZ









-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 509 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20201116/acdd4bad/attachment.sig>
Message-ID: <IkbJMS_4gaUyHHbjriqfAZaaprHTwr2mbKOuwuQQ2NwsgOo8RwZowms0bl4SxKgAQG2qGMtfucx6uIDERjiolsb5Ii-ij_4bjVh_GYEVTDQ=@protonmail.com>
#
Hi Gregg,

it's not clear from your context if all of ASSIGNED _COMPANY is NA or what the classes of the objects are.  Try the following ideas, none of which are tested.  I assume that the data set is called location.

location$ASSIGNED_COMPANY <- as.character(location$NAME)

is.a.FORT <- substr(location$ASSIGNED_COMPANY, 1, 4) == "FORT"

location$ASSIGNED_COMPANY[!is.a.FORT] <-
  sapply(location$ASSIGNED_COMPANY[!is.a.FORT],
  function(x) strsplit(x)[[1]][[1]]) # retains first name if not a fort

location$ASSIGNED_COMPANY[is.a.FORT] <-
  substr(location$ASSIGNED_COMPANY[is.a.FORT], 6,
  nchar(location$ASSIGNED _COMPANY[is.a.FORT])) # Strips FORT from Forts

substr(location$ASSIGNED_COMPANY, 2, nchar(location$ASSIGNED_COMPANY)) <-
  tolower(substr(location$ASSIGNED _COMPANY, 2,
  nchar(location$ASSIGNED _COMPANY))) # lower case word

location$ASSIGNED_COMPANY <- paste("NEC", location$ASSIGNED_COMPANY)

or you can just do

location$ASSIGNED_COMPANY[location$NAME == "ABERDEEN PROVING GROUND"] <- "NEC Aberdeen"

for each option ....

Cheers,

Andrew

--
Andrew Robinson
Director, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On Nov 17, 2020, 8:05 AM +1100, Gregg via R-help <R-help at r-project.org>, wrote:
PROBLEM: I am trying to replicate something like a VLOOKUP in R but am having no success - need a bit of help.

GIVEN DATA SET (data.table): (looks something like this, but much bigger)

NAME TOTALAUTH ASSIGNED_COMPANY
ABERDEEN PROVING GROUND 1 NA
ADELPHI LABORATORY CENTER 1 NA
CARLISLE BARRACKS 1 NA
DETROIT ARSENAL 1 NA
DUGWAY PROVING GROUND 1 NA
FORT A P HILL 1 NA
FORT BELVOIR 1 NA
FORT BENNING 1 NA
FORT BLISS 1 NA
FORT BRAGG 1 NA
FORT BUCHANAN 1 NA


I am trying to update the values in the ASSIGNED_COMPANY column from NAs to a value that matches based on the "key" word like below.

NAME TOTALAUTH ASSIGNED_COMPANY
ABERDEEN PROVING GROUND 1 NEC Aberdeen
ADELPHI LABORATORY CENTER 1 NEC Adelphi
CARLISLE BARRACKS 1 NEC Carlise
DETROIT ARSENAL 1 NEC Detroit
DUGWAY PROVING GROUND 1 NEC Dugway
FORT A P HILL 1 NEC AP Hill
FORT BELVOIR 1 NEC Belvoir
FORT BENNING 1 NEC Benning
FORT BLISS 1 NEC Bliss
FORT BRAGG 1 NEC Bragg
FORT BUCHANAN 1 NEC Buchanon


In a nutshell, for instance.......

I want to search for the keyword "ABERDEEN" in the NAME column, and for every row where it exists, I want to update the NA in the ASSIGNED_COMPANY column to "NEC Aberdeen"

I want to search for the keyword "ADELPHI" in the NAME column, and for every row where it exists, I want to update the NA in the ASSIGNED_COMPANY column to "NEC ADELPHI"

....... and so on for every value in the NAME column - so in the end a I have matching names in the ASSIGNED_COMPANY column.

I can use an if statement because it is not vectorized.

If I use an ifelse statement, the "else" rewrites any changes with ""

Something so simple should not be difficult.

Some of the methods I attempted to use are below along with the errors I get...






###################CODE#######################################

library(data.table)
library(dplyr)
library(stringr)


VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")

#METHOD 1 FAILS
VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME, useBytes = TRUE), "NEC Adelphi")

Error in get(.x, .env, mode = "function") :

object 'NEC Adelphi' of mode 'function' was not found

#METHOD 2 FAILS
if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
}

Warning message:
In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
the condition has length > 1 and only the first element will be used


#METHOD 3 FAILS
ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, "ADELPHI"), ASIP_combined_location_tally$ASSIGNED_COMPANY == ASIP_combined_location_tally$ASSIGNED_COMPANY)

Error in ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, :

argument "no" is missing, with no default

#METHOD4 FAILS
VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ADELPHI', x = NAME), 'NEC Adelphi', ''))

VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'CARLISLE', x = NAME), 'NEC Carlisle Barracks', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'DETROIT', x = NAME), 'NEC Detroit Arsenal', ''))
VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'BELVOIR', x = NAME), 'NEC Fort Belvoir', ''))

-----------the 4th method just over writes all previous changers back to ""





######################################################################

Any help offered would be so very greatly appreciated.

Thanks you.

r/
gregg powell
AZ
#
ASSIGNED_COMPANY[grep("ADELPHI",NAME)] <-"NEC ADELPHI" is what I'd try
On Mon, Nov 16, 2020 at 4:27 PM Andrew Robinson <apro at unimelb.edu.au> wrote:

            

  
  
1 day later
#
Thanks Andrew and Mitch for your help.

With your assistance, I was able to sort this out.

Since I have to do this type of thing of often, and since there is no existing package/function (yet) that makes this easy, if ever I get to the point were I develop enough skill to build and submit a new package, a simple little VLOOKUP(like) function contained in a package would be of great use.

r/
Gregg




??????? Original Message ???????
On Monday, November 16, 2020 1:56 PM, Gregg via R-help <r-help at r-project.org> wrote:

            

        
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 509 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20201118/3038769b/attachment.sig>
Message-ID: <v_VHmXUhXdNTapFe_WtjWKUrZKCexqCncKIeQfEmc5W5w1EZnLrymDqmrib1Lms9pTxDzbO2GT7UrcD-qVetPNXF0fud4sBb-KsWnChspjQ=@protonmail.com>
#
Instead, learn how to use the merge function, or perhaps the dplyr::left_join function. VLOOKUP is really not necessary.
On November 18, 2020 7:11:49 AM PST, Gregg via R-help <r-help at r-project.org> wrote:

  
    
#
I will do that...

Thanks again Jeff.

r/
Gregg Powell




??????? Original Message ???????
On Wednesday, November 18, 2020 8:36 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

            

        
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 509 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20201118/13bdf277/attachment.sig>
Message-ID: <onTAnDYYFxdUFxTS6WINv5JkQ2HQ9zpMyBm-P3ZwpmaA_KyLbiA0neHZTdvb2TDIpQVGTw07IglmO0fWKHZ8-iccOgXsSya_rIVA6cyTuYY=@protonmail.com>