
Please assist -- Unable to remove '-' character from char vector--

Thank you Jim,
The code helped me get what I needed.
I also learnt that there are different types of dashes
(en-dash/em-dash/hyphen), as explained on this site:
http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/

I achieved it by executing the command below after going through this page
on Stack Overflow:
http://stackoverflow.com/questions/9223795/how-to-correctly-deal-with-escaped-unicode-characters-in-r-e-g-the-em-dash

splitends<-sapply(end,strsplit,"-|\u2013|,")

where '\u2013' is the Unicode escape for the en-dash character (U+2013;
the em-dash is U+2014) that appears in the range values.
I had scraped the HTML table from this web page:
https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
and the range values do contain en-dash characters.
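
As a minimal sketch of what the split does, here is the same pattern
applied to a few made-up values. The contents of 'end' below are
hypothetical, not the actual scraped column, and strsplit() is called
directly because it is already vectorised over its input:

```r
# Hypothetical sample values: a plain hyphen, an en-dash (U+2013), a comma.
end <- c("1979-1984", "1992\u20132005", "2001,2003")

# Split on a hyphen, an en-dash, or a comma (same pattern as above).
splitends <- strsplit(end, "-|\u2013|,")

print(splitends)
```

Each element of 'splitends' should then hold the two pieces of the
corresponding range as a character vector.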

For now the issue is resolved, but how does one find escapes similar
to '\u2013' for other special characters that may need to be specified
in the regex?
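
One possible approach, sketched here under assumptions, is to list every
non-ASCII character present in the data and print its code point with
utf8ToInt(). The vector 'x' is a hypothetical sample and the strings are
assumed to be UTF-8 encoded:

```r
# Hypothetical sample: an en-dash, an em-dash and a plain hyphen.
x <- c("1992\u20132005", "2010\u20142012", "1979-1984")

# Break the strings into single characters and keep each one once.
chars <- unique(unlist(strsplit(x, "")))

# Keep only characters outside the ASCII range (code point > 127).
is_special <- vapply(chars, utf8ToInt, integer(1)) > 127
special <- chars[is_special]

# Format the code points in the usual U+XXXX notation.
codes <- sprintf("U+%04X",
                 vapply(special, utf8ToInt, integer(1), USE.NAMES = FALSE))

print(codes)
```

A code point reported as U+2013 can then be written as '\u2013' in the
regex. Depending on the R build, a Perl-style pattern such as
"\\p{Pd}" with perl = TRUE may also match any dash punctuation, and
tools::showNonASCII() can help locate the affected elements, but I have
not checked either against the scraped table.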

Regards,
Sunny Singha.
On Mon, Apr 25, 2016 at 12:39 PM, Jim Lemon <drjimlemon at gmail.com> wrote: