regular expression in gsub() for strings with leading backslash
6 messages · Duncan Murdoch, Mike Miller, Miao
On 29/04/2011 7:41 PM, Miao wrote:
Hello, can anyone help with gsub() in R? I have a string like the one below, and want to delete all the substrings with a leading backslash, including "\xa0On", "\023", "\xab", and many others. How should I write a regular expression pattern for gsub()? I don't care how many characters follow the backslash.
If those are R strings, none of them contain a backslash. In R, a backslash would always be printed as \\. \x is the introduction to a hexadecimal encoding for a character; the next two characters show the hex digits. So your first string contains a single character \xa0, the third one contains \xab, and so on. The \023 is an octal encoding for a single character.
Duncan Murdoch
txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"

Thanks in advance,
Miao
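Duncan's point can be checked directly in base R. This is a minimal sketch that uses only byte-level functions (`charToRaw()`, `nchar(type = "bytes")`), so it should behave the same in any locale; the variable names `s` and `b` are just for illustration:

```r
# Each escape in R source code stands for ONE byte, not a backslash plus letters.
s <- "\xa0On"                      # hex escape: byte 0xA0, then "O" and "n"
nchar(s, type = "bytes")           # 3 bytes in total
as.integer(charToRaw(s))           # 160 79 110
as.integer(charToRaw("\023"))      # 19, because octal 23 is decimal 19

# A genuine backslash is typed "\\" in source and shown as "\\" by print(),
# but it is still a single character.
b <- "\\"
nchar(b)                           # 1
as.integer(charToRaw(b))           # 92, the ASCII code of a backslash
writeLines(b)                      # writes the character itself: \

# So the OP's string contains no backslash byte at all.
92L %in% as.integer(charToRaw(s))  # FALSE
```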
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On 29/04/2011 9:34 PM, Miao wrote:
Thanks, Duncan, for clarifying this. I'm pretty much a newbie when it comes to these kinds of special characters. In R's gsub(), what regular expression should I use to handle all these situations?
I don't know. This might work:
gsub("[\x01-\x1f\x7f-\xff]", "", x)
(i.e. the range from character 1 to character 31, and 127 to 255), but I
don't know if our regular expression matcher will accept those characters.
Duncan Murdoch
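One way around that uncertainty, offered as a sketch rather than a recipe verified on every platform: write the escapes with doubled backslashes so that R hands the text `\x20` and `\x7e` to the regex engine unchanged, and let PCRE (`perl = TRUE`) decode them. `useBytes = TRUE` makes the match byte-by-byte, which avoids "invalid multibyte string" errors when the input is not valid in the current locale. The class `[^\x20-\x7e]` removes everything outside printable ASCII, which is the same set as Duncan's ranges 1-31 and 127-255:

```r
txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"

# The R literal "[^\\x20-\\x7e]" is the regex [^\x20-\x7e]: any byte that is
# not printable ASCII (space through tilde). PCRE, not R's parser, decodes
# the \x escapes, so the pattern string itself contains only ASCII characters.
clean <- gsub("[^\\x20-\\x7e]", "", txt, perl = TRUE, useBytes = TRUE)
print(clean)
```

Note that \xa0 is a non-breaking space in Latin-1, so deleting it glues "Thing" and "On" together; using `" "` as the replacement instead of `""` would keep the words apart.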
On Fri, Apr 29, 2011 at 6:07 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
[...]
On Fri, 29 Apr 2011, Duncan Murdoch wrote:
On 29/04/2011 7:41 PM, Miao wrote:
Can anyone help with gsub() in R? I have a string like the one below, and want to delete all the substrings with a leading backslash, including "\xa0On", "\023", "\xab", and many others. How should I write a regular expression pattern for gsub()? I don't care how many characters follow the backslash.
If those are R strings, none of them contain a backslash. In R, a backslash would always be printed as \\. \x is the introduction to a hexadecimal encoding for a character; the next two characters show the hex digits. So your first string contains a single character \xa0, the third one contains \xab, and so on. The \023 is an octal encoding for a single character.
If we were dealing with a leading backslash, I guess this would do it:
gsub("^\\\\.*", "", txt)
R would display a double backslash, but I believe that represents a single
backslash. So if the string were saved using write.table, say, only a
single backslash would be stored.
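The pattern `^\\\\.*` deletes a whole string once it starts with a backslash. If the text really did contain backslashes (for example, escapes read verbatim from a file) and the aim was to drop only each backslash-prefixed token, a sketch along these lines might do. The variable names and the rule "a token runs to the next whitespace" are assumptions for illustration:

```r
# Hypothetical input: the OP's sentence, but with REAL backslashes, as it
# might look if read from a file. In R source each "\\" is one backslash.
raw_txt <- "Is This Thing\\xa0On? http://bit.ly/jAbKem wait \\023 for people \\xab and be patient :"
writeLines(raw_txt)   # displays single backslashes

# The R literal "\\\\\\S*" is the regex \\\S*: one literal backslash,
# then any run of non-whitespace characters.
no_escapes <- gsub("\\\\\\S*", "", raw_txt, perl = TRUE)
print(no_escapes)
```

Because a token is taken to run to the next space, "On?" disappears along with \xa0, which matches the OP's "\xa0On" example; the double spaces left behind could be squeezed with `gsub(" +", " ", no_escapes)`.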
> a <- "\\This is a string."
> a
[1] "\\This is a string."
> gsub("^\\\\", "", a)
[1] "This is a string."
> a
[1] "\\This is a string."
> gsub("^\\\\.*", "", a)
[1] ""
> gsub("^\\\\.*", "", c(a, "Another string", "\\more"))
[1] "" "Another string" ""
> write.table(a, file = "a.txt", quote = FALSE, row.names = FALSE, col.names = FALSE)

$ cat a.txt
\This is a string.

Apparently this is not what the OP really wanted. The OP probably wanted to remove characters that were not from the regular ASCII set.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
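If the goal is simply to keep ordinary printable ASCII characters, the regular expression engine can be bypassed entirely by filtering bytes. A minimal sketch; the helper name `keep_printable_ascii` is made up here and is not part of base R. It also drops tabs and newlines, and it does not treat `NA` specially:

```r
# Hypothetical helper: keep only bytes 32-126 (space through tilde)
# in every element of a character vector.
keep_printable_ascii <- function(x) {
  vapply(x, function(s) {
    b <- as.integer(charToRaw(s))            # bytes of one string as 0..255
    rawToChar(as.raw(b[b >= 32 & b <= 126])) # rebuild from the kept bytes
  }, character(1), USE.NAMES = FALSE)
}

txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"
print(keep_printable_ascii(c(txt, "plain text", "\xab")))
```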