I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
What I'd like to do is replace the "n/a " values with NA. On Mac OSX, it works
to do this:
df[df == "n/a"] <- NA
However, it does not work on Ubuntu. See below.
Thanks in advance,
Garrett
x <- df[27, 4] # complete data.frame dput is below
dput(x)
"n/a "
x == "n/a "
[1] FALSE
x == "n/a"
[1] FALSE
str(x)
chr "n/a "
is.na(x)
[1] FALSE
grep("n/a ", x)
integer(0)
grep("n/a", x)
[1] 1
sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.4-3 qmao_1.1.10
[3] FinancialInstrument_0.10.9 quantmod_0.3-17
[5] TTR_0.21-0 Defaults_1.1-1
[7] xts_0.8-3 zoo_1.7-6
loaded via a namespace (and not attached):
[1] grid_2.14.1 lattice_0.20-0 tools_2.14.1
### More detail ###
## Here is the complete data.frame
Is that exactly what you're doing, in a clean session?
x <- rdata[27, 4]
x == "n/a "
[1] TRUE
x == "n/a"
[1] FALSE
Because as long as the space is included, the test should be TRUE.
(I renamed the dput object rdata, because df() is a base function.)
df[df == "n/a"] <- NA
shouldn't work on Mac, or any other system, because no elements of
your data frame are "n/a", but are instead "n/a "
If it were my data, I'd get rid of the spaces at the end of the values before
trying to do anything, either before reading it into R, or with gsub() after.
Sarah
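A minimal sketch of the gsub()-style cleanup suggested above, assuming the trailing character really is an ordinary space. The data frame here is a small hypothetical stand-in (the real dput is not reproduced), reusing the name rdata from this reply.

```r
# Hypothetical stand-in for the poster's data; column names and values are made up.
rdata <- data.frame(
  id  = c("a", "b", "c"),
  val = c("1.5", "n/a ", "2.0"),
  stringsAsFactors = FALSE
)

# Strip trailing ordinary spaces (U+0020) from every column ...
rdata[] <- lapply(rdata, function(col) sub(" +$", "", col))

# ... so that the exact comparison against "n/a" can match.
rdata[rdata == "n/a"] <- NA
```

Note that this only handles ordinary spaces; a trailing character that merely looks like a space would survive it.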
On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
x <- df[27, 4] # complete data.frame dput is below
dput(x)
"n/a "
Hi.
This string contains a no-break space, not a space.
"n/a " == "n/a\uA0"
[1] TRUE
"n/a\uA0"
[1] "n/a "
Hope this helps.
Petr Savicky.
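Given that diagnosis, one way to get the replacement the original post asked for is to compare against the escaped form directly. This is only a sketch: the vector x and the data frame demo are hypothetical stand-ins, not the poster's data.

```r
# Three look-alike strings: no-break space, ordinary space, no trailing character.
x <- c("n/a\u00A0", "n/a ", "n/a")

is_space <- x == "n/a "        # matches only the ordinary-space version
is_nbsp  <- x == "n/a\u00A0"   # matches only the no-break-space version

# Hypothetical one-column data frame holding the problematic value.
demo <- data.frame(val = c("1.5", "n/a\u00A0"), stringsAsFactors = FALSE)
demo[demo == "n/a\u00A0"] <- NA
```

Writing the character as \u00A0 rather than pasting it keeps the script readable, since the two kinds of space look identical on screen.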
Petr,
Thank you! That is great.
Do you know of a way to print a string such that I can see whether it
contains a string or a no-break space?
Thanks,
Garrett
Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?
x<- df[27, 4] # complete data.frame dput is below
dput(x)
"n/a "
x == "n/a "
[1] FALSE
x == "n/a"
[1] FALSE
One would expect the first of these to be TRUE, but the second
shouldn't. On my system that's what happens.
Is this still repeatable in a new session? If so, can you show us what
you get from charToRaw? I get
> charToRaw(x)
[1] 6e 2f 61 20
but perhaps you have some different character in the fourth position,
one which just happens to display as a space.
If it is not repeatable in a new session, then it's hard to guess what
went wrong, but conceivably memory corruption somewhere could have
caused this. It would be worthwhile keeping track of what you were
doing if it ever happens again.
Duncan Murdoch
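To illustrate the charToRaw() check, here is a small sketch comparing an ordinary space with a no-break space. The variable names are made up for the example; enc2utf8() is used so the bytes shown are the UTF-8 ones regardless of the session's native encoding.

```r
plain <- "n/a "        # ends in an ordinary space, U+0020
nbsp  <- "n/a\u00A0"   # ends in a no-break space, U+00A0

raw_plain <- charToRaw(plain)            # 6e 2f 61 20
raw_nbsp  <- charToRaw(enc2utf8(nbsp))   # 6e 2f 61 c2 a0

# The strings print alike but differ in length at the byte level.
length(raw_plain)
length(raw_nbsp)
```

In UTF-8 the no-break space occupies two bytes (c2 a0), which is why the second vector is one element longer.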
Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?
Use tools::showNonASCII(x). On Petr's example, it gives
1: n/a<c2><a0>
Duncan Murdoch
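Besides tools::showNonASCII(), a base-only check can flag which elements of a vector contain a no-break space and show their code points. This is a sketch with hypothetical variable names, not a replacement for the function recommended above.

```r
x <- c("n/a ", "n/a\u00A0")

# TRUE where the element contains a no-break space (U+00A0).
has_nbsp <- grepl("\u00A0", x, fixed = TRUE)

# Unicode code points per element: 32 is an ordinary space, 160 a no-break space.
codes <- lapply(enc2utf8(x), utf8ToInt)
```

Here codes[[1]] ends in 32 and codes[[2]] ends in 160, which makes the difference visible without relying on how the console renders the character.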
Thank you Duncan, that is very helpful.
Although I think we've got it sorted out now, to answer your previous
questions, it is repeatable in a new R session, and the output of
charToRaw is below.
On Ubuntu, I get the following:
charToRaw(x)
[1] 6e 2f 61 c2 a0
On Mac, I get:
charToRaw(x)
[1] 6e 2f 61
Thanks to all for the help,
Garrett
On Fri, Feb 03, 2012 at 10:10:56AM -0600, G See wrote:
Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?
Hi.
For unknown characters, the following may be useful:
x <- "n/a\uA0"  # the trailing character is a no-break space
library(Unicode)
u_char_inspect(as.u_char_seq(x, ""))
    Code  Name                  Char
1  U+006E LATIN SMALL LETTER N  n
2  U+002F SOLIDUS               /
3  U+0061 LATIN SMALL LETTER A  a
4  U+00A0 NO-BREAK SPACE         
Petr Savicky.
Thank you Duncan, that is very helpful.
Although I think we've got it sorted out now, to answer your previous
questions, it is repeatable in a new R session, and the output of
charToRaw is below.
On Ubuntu, I get the following:
charToRaw(x)
[1] 6e 2f 61 c2 a0
So that's a nonbreak space all right. Next question: How did it get there? I'm mildly surprised that it crept into the data frame; I would expect it to happen much more easily with things typed on the keyboard (Alt-Space on my Mac keyboard, e.g.).
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <pdalgd at gmail.com> wrote:
So that's a nonbreak space alright. Next question: How did it get there? I'm mildly surprised that it crept into the data frame, I would expect it to happen much easier with things typed on the keyboard (Alt-Spc on my Mac keyboard, e.g.).
Peter,
I won't venture to guess how, but this will do it.
OK, if you look at the source for that page, it actually contains stuff like
<td align="center">n/a&nbsp;</td>
and &nbsp; is the infamous \uA0 alias nonbreak space. So the odd thing might actually be that the Mac manages to lose the trailing nonbreak space, whereas other systems do not. AFAICS, this boils down to the matching of [[:space:]] inside
XML:::trim
function (x)
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>
A locale dependency, perhaps?
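If the difference does come down to whether [[:space:]] matches \uA0 on a given platform, one way to sidestep it is to name the no-break space explicitly in the pattern. The function below is a hypothetical helper written for illustration; it is not part of the XML package.

```r
# Hypothetical variant of the trim function quoted above: it strips leading and
# trailing whitespace and also no-break spaces, whatever [[:space:]] covers.
trim_nbsp <- function(x) {
  gsub("(^[[:space:]\u00A0]+|[[:space:]\u00A0]+$)", "", x)
}

trimmed <- trim_nbsp(c("n/a\u00A0", " \u00A0n/a \u00A0", "n/a"))
```

After this, all three elements compare equal to "n/a", so a test such as x == "n/a" behaves the same on both systems.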