
Please assist -- Unable to remove '-' character from char vector--

5 messages · Jim Lemon, PIKAL Petr, Sunny Singha +1 more

#
Hi,
I have a character vector of year values. Some cells hold a single year
followed by a hyphen, such as '2001-', and some hold a range, such as
1996-2007.
I need to remove the hyphen character '-' from all the values in the
character vector named 'end'. After removing the hyphen, I need to get
the last number from the cells that hold a year range, i.e. if the
cell has the range 1996-2007, the code should return 2007.

How could I get this done?

Below are the values in this character vector:
[1] "2001-"            "1992-"            "2013-"            "2013-"
          "2013-"            "2013-"
 [7] "2003-"            "2010-"            "2009-"            "1986-"
          "2012-"            "2003-"
[13] "2005-"            "2013-"            "2003-"            "2013-"
          "1993–2007, 2010-" "2012-"
[19] "1984–1992, 1996-" "2015-"            "2009-"            "2000-"
          "2005-"            "1997-"
[25] "2012-"            "1997-"            "2002-"            "2006-"
          "1992-"            "2007-"
[31] "1997-"            "1982-"            "2015-"            "2015-"
          "2010-"            "1996–2007, 2011-"
[37] "2004-"            "1999-"            "2007-"            "1996-"
          "2013-"            "2012-"
[43] "2012-"            "2010-"            "2011-"            "1994-"
          "2014-"

I tried the following command: gsub('[-|,]', '', end)
This removed the hyphen from the single-year values but not from the
cells holding year ranges, as the result below shows. Please guide.
[1] "2001"           "1992"           "2013"           "2013"
  "2013"           "2013"           "2003"
 [8] "2010"           "2009"           "1986"           "2012"
  "2003"           "2005"           "2013"
[15] "2003"           "2013"           "1993–2007 2010" "2012"
  "1984–1992 1996" "2015"           "2009"
[22] "2000"           "2005"           "1997"           "2012"
  "1997"           "2002"           "2006"
[29] "1992"           "2007"           "1997"           "1982"
  "2015"           "2015"           "2010"
[36] "1996–2007 2011" "2004"           "1999"           "2007"
  "1996"           "2013"           "2012"
[43] "2012"           "2010"           "2011"           "1994"
  "2014"

Regards,
Sunny Singha
#
Hi Sunny,
Try this:

# notice that I have replaced the fancy hyphens with real hyphens
end<-c("2001-","1992-","2013-","2013-","2013-","2013-",
 "1993-2007","2010-","2012-","1984-1992","1996-","2015-")
splitends<-sapply(end,strsplit,"-")
last_bit(x) return(x[length(x)])
sapply(splitends,last_bit)

Jim

On Mon, Apr 25, 2016 at 4:35 PM, Sunny Singha
<sunnysingha.analytics at gmail.com> wrote:
#
Hi
You probably meant
last_bit <- function(x) return(x[length(x)])
A good final step is to convert the result to numeric:

as.numeric(sapply(splitends,last_bit))
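
Putting the two corrections together, the whole approach might look like the sketch below. It assumes the values contain only the plain ASCII hyphen, as in Jim's sample; the shortened vector and the name years are illustrative, not from the original data. Since strsplit is already vectorised over its input, the outer sapply is optional.

```r
# Shortened sample in the style of Jim's vector (ASCII hyphens only)
end <- c("2001-", "1992-", "1993-2007", "2010-", "1984-1992", "1996-")

# Return the last element of a vector
last_bit <- function(x) x[length(x)]

# strsplit returns one character vector of pieces per input value;
# a trailing hyphen produces no extra empty piece
splitends <- strsplit(end, "-")

# Take the last piece of each value and convert it to a number
years <- as.numeric(vapply(splitends, last_bit, character(1)))
years
```

For a single year such as "2001-" the last piece is the year itself, and for a range such as "1993-2007" it is the end of the range.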

Cheers
Petr
#
Thank you Jim,
The code helped me get what I needed.
I also learnt that there are different types of dashes
(en dash, em dash, hyphen), as explained on this site:
http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/

I got it working with the command below, after going through this
page on Stack Overflow:
http://stackoverflow.com/questions/9223795/how-to-correctly-deal-with-escaped-unicode-characters-in-r-e-g-the-em-dash

splitends<-sapply(end,strsplit,"-|\u2013|,")

where '\u2013' is the Unicode escape for the en dash (U+2013; the em
dash is U+2014) used in the range values.
I had scraped the HTML table from this web page:
https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
and the range values do contain en dash characters.
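
To make the difference concrete, here is a small sketch comparing the original gsub call with one that also lists the en dash. The two-element vector is a hypothetical sample that mirrors the scraped values, and the names ascii_only, both and last_year are only illustrative.

```r
# Hypothetical sample mirroring the scraped values; \u2013 is the en dash
end <- c("2001-", "1993\u20132007, 2010-")

# The original class matches only the ASCII hyphen, '|' and ',' so the
# en dash inside the range is left in place
ascii_only <- gsub("[-|,]", "", end)

# Listing \u2013 in the class removes the en dash as well
both <- gsub("[-\u2013,]", "", end)

# Split on hyphen, en dash or comma and keep the last piece;
# as.numeric ignores the leading space left in " 2010"
parts <- strsplit(end, "-|\u2013|,")
last_year <- as.numeric(vapply(parts, function(x) x[length(x)], character(1)))
last_year
```

Inside square brackets '|' is an ordinary character rather than "or", so it can be dropped from the class without changing the result for this data.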

For now the issue is resolved, but how does one find escapes similar
to '\u2013' for other possible special characters, so that they can be
specified in the regex?
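
One way to find such escapes is to look up the code point of every non-ASCII character in the data. The sketch below uses only base R (strsplit, utf8ToInt, sprintf); the sample string and the names chars, codes and escapes are assumptions for illustration.

```r
# Hypothetical sample: one scraped value containing an en dash
x <- "1993\u20132007, 2010-"

# Split the string into single characters
chars <- strsplit(x, "")[[1]]

# Look up the Unicode code point of each character
codes <- vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Keep the distinct non-ASCII code points (above 127)
non_ascii <- unique(codes[codes > 127])

# Format them as the \uXXXX escapes that R strings and regexes accept
escapes <- sprintf("\\u%04X", non_ascii)
escapes
```

Running this over a whole column (for example on paste(end, collapse = "")) would list every unusual character that a regex for that column needs to cover.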

Regards,
Sunny Singha.
On Mon, Apr 25, 2016 at 12:39 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
#
It's possible to target a contiguous range of Unicode characters with a regex character class, which has its own range operator, '-'. (R's sequence operator ':' does not work there; inside a character class it is just a literal colon.)

x <- "\"em\u2013dash\" \"em–dash\" \" em \u2016 dash\""
gsub('[\u2013-\u2016]', "", x)   # removes every character from U+2013 to U+2016
#[1] "\"emdash\" \"emdash\" \" em  dash\""