Please assist -- Unable to remove '-' character from char vector--
Thank you Jim,
The code helped me get what I needed. I also learnt that there are different types of dashes (hyphen, en-dash, em-dash), as explained on this site:
http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/
I got it working by running the command below, after going through this page on Stack Overflow:
http://stackoverflow.com/questions/9223795/how-to-correctly-deal-with-escaped-unicode-characters-in-r-e-g-the-em-dash
splitends<-sapply(end,strsplit,"-|\u2013|,")
where '\u2013' is the Unicode escape for the en-dash character in the range values (the em-dash would be '\u2014').
I had scraped the HTML table from this web page:
https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
and the range values do contain en-dash characters.
For now the issue is resolved, but how does one find codes like '\u2013' for other special characters that may need to be specified in the regex?
Regards,
Sunny Singha.
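On the closing question, one possible approach is to let R report the code points of any non-ASCII characters in the data, and then write them as '\uXXXX' escapes. The sketch below is only an illustration: the three sample values are made up to resemble the scraped ranges, and the variable names (chars, codes, non_ascii, escapes) are not from the thread. The last step shows an alternative that assumes perl = TRUE is acceptable, using the Unicode "dash punctuation" class \p{Pd} so that the individual dash types need not be listed.

```r
# Sample values; "\u2013" (en-dash) is written as an escape to keep this ASCII-only.
end <- c("2001-", "1993\u20132007, 2010-", "1984\u20131992, 1996-")

# Break the strings into single characters and look up each code point.
chars <- unique(unlist(strsplit(end, "")))
codes <- vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Anything above 127 is outside plain ASCII and worth inspecting.
non_ascii <- codes[codes > 127L]

# Format the code points the way they would be typed in a regex escape.
escapes <- sprintf("\\u%04X", non_ascii)
print(escapes)

# Alternative (assumes perl = TRUE): split on any Unicode dash or a comma,
# plus any whitespace that follows, without naming each dash type.
splitends <- strsplit(end, "[\\p{Pd},]\\s*", perl = TRUE)
print(splitends)
```

The first half is a diagnostic to run once on new data; the second half is what would go into the actual cleaning step.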
On Mon, Apr 25, 2016 at 12:39 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
Hi Sunny,
Try this:
# notice that I have replaced the fancy hyphens with real hyphens
end<-c("2001-","1992-","2013-","2013-","2013-","2013-",
"1993-2007","2010-","2012-","1984-1992","1996-","2015-")
splitends<-sapply(end,strsplit,"-")
# helper that returns the last element of a vector
last_bit<-function(x) return(x[length(x)])
sapply(splitends,last_bit)
Jim
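A variation on the above that does not depend on which kind of dash is present: rather than splitting on the separator, pull out every four-digit run and keep the last one. This is a sketch of a different technique from the strsplit approach, not a correction of it; the sample vector is shortened and the names years and last_year are only for illustration. It also assumes every year has exactly four digits.

```r
# Shortened sample; one value uses an en-dash ("\u2013"), the rest a plain hyphen.
end <- c("2001-", "1993\u20132007, 2010-", "1984-1992", "2015-")

# Extract all runs of four digits from each element (a list of character vectors).
years <- regmatches(end, gregexpr("[0-9]{4}", end))

# Keep the last year found in each element.
last_year <- vapply(years, function(x) x[length(x)], character(1))
print(last_year)
```

Note that for a value such as "1993-2007, 2010-" this returns 2010, the last year in the whole string. If the range and the trailing year should be treated as separate entries, split on the comma first.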
On Mon, Apr 25, 2016 at 4:35 PM, Sunny Singha
<sunnysingha.analytics at gmail.com> wrote:
Hi, I have a char vector with year values. Some cells hold a single year value such as '2001-' and some hold a range such as '1996-2007'. I need to remove the hyphen character '-' from all the values in the character vector named 'end'. After removing the hyphen, I need to get the last number from the cells that hold a year range, i.e., if the cell has the range 1996-2007, the code should return 2007. How could I get this done? Below are the values in this char vector:
end
[1] "2001-" "1992-" "2013-" "2013-" "2013-" "2013-"
[7] "2003-" "2010-" "2009-" "1986-" "2012-" "2003-"
[13] "2005-" "2013-" "2003-" "2013-" "1993–2007, 2010-" "2012-"
[19] "1984–1992, 1996-" "2015-" "2009-" "2000-" "2005-" "1997-"
[25] "2012-" "1997-" "2002-" "2006-" "1992-" "2007-"
[31] "1997-" "1982-" "2015-" "2015-" "2010-" "1996–2007, 2011-"
[37] "2004-" "1999-" "2007-" "1996-" "2013-" "2012-"
[43] "2012-" "2010-" "2011-" "1994-" "2014-"
I tried the command below: gsub('[-|,]', '', end)
This removed the hyphen character from the cells with a single year, but not from the cells with year ranges, as the result below shows. Please guide.
gsub('[-|,]', '', end)
[1] "2001" "1992" "2013" "2013" "2013" "2013" "2003"
[8] "2010" "2009" "1986" "2012" "2003" "2005" "2013"
[15] "2003" "2013" "1993–2007 2010" "2012" "1984–1992 1996" "2015" "2009"
[22] "2000" "2005" "1997" "2012" "1997" "2002" "2006"
[29] "1992" "2007" "1997" "1982" "2015" "2015" "2010"
[36] "1996–2007 2011" "2004" "1999" "2007" "1996" "2013" "2012"
[43] "2012" "2010" "2011" "1994" "2014"
Regards,
Sunny Singha
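For anyone reading this later, a small sketch of why the gsub call above leaves the ranges alone, assuming (as established elsewhere in the thread) that the ranges contain an en-dash rather than a hyphen. Inside square brackets the '|' is a literal character, not "or", so '[-|,]' matches only a hyphen, a pipe, or a comma. The two-element vector x and the names first_try and normalized are hypothetical.

```r
# Two sample values; the range uses an en-dash, written as "\u2013".
x <- c("2001-", "1993\u20132007, 2010-")

# The original attempt: removes hyphens and commas, but the en-dash survives.
first_try <- gsub("[-|,]", "", x)
print(first_try)

# Confirm which separator each element actually contains.
print(grepl("-", x, fixed = TRUE))       # plain hyphen present?
print(grepl("\u2013", x, fixed = TRUE))  # en-dash present?

# One possible fix: turn en-dashes into plain hyphens first,
# after which hyphen-based code treats ranges like any other value.
normalized <- gsub("\u2013", "-", x)
print(normalized)
```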