Cannot get "==" operator to return TRUE

13 messages · G See, Sarah Goslee, Duncan Murdoch +2 more

#
I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
to do this:
df[df == "n/a"] <- NA

However, it does not work on Ubuntu.  See below.

Thanks in advance,
Garrett
"n/a?"
[1] FALSE
[1] FALSE
chr "n/a?"
[1] FALSE
integer(0)
[1] 1
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-3                  qmao_1.1.10
[3] FinancialInstrument_0.10.9 quantmod_0.3-17
[5] TTR_0.21-0                 Defaults_1.1-1
[7] xts_0.8-3                  zoo_1.7-6

loaded via a namespace (and not attached):
[1] grid_2.14.1    lattice_0.20-0 tools_2.14.1
### More detail ###
## Here is the complete data.frame
structure(list(SYMBOL = c("GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?"), PERIOD = c("Q4?2011",
"Q3?2011", "Q2?2011", "Q1?2011", "Q4?2010", "Q3?2010", "Q2?2010",
"Q1?2010", "Q4?2009", "Q3?2009", "Q2?2009", "Q1?2009", "Q4?2008",
"Q3?2008", "Q2?2008", "Q1?2008", "Q4?2007", "Q3?2007", "Q2?2007",
"Q1?2007", "Q4?2006", "Q3?2006", "Q2?2006", "Q1?2006", "Q4?2005",
"Q3?2005", "Q2?2005", "Q1?2005", "Q4?2004", "Q3?2004"),
    `EVENT TITLE` = c("Q4 2011 Google Earnings Release",
    "Q3 2011 Google Inc Earnings Release",
    "Q2 2011 Google Inc Earnings Release",
    "Q1 2011 Google Inc Earnings Release",
    "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
    "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
    "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
    "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
    "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
    "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
    "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
    "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
    "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
    "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
    "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
    "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
    "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
    ), `EPS ESTIMATE` = c("$ 10.49?", "$ 8.74?", "$ 7.85?",
    "$ 8.10?", "$ 8.09?", "$ 6.68?", "$ 6.52?", "$ 6.60?",
    "$ 6.50?", "$ 5.42?", "$ 5.09?", "$ 4.93?", "$ 4.95?",
    "$ 4.76?", "$ 4.74?", "$ 4.52?", "$ 4.44?", "$ 3.78?",
    "$ 3.59?", "$ 3.30?", "$ 2.92?", "$ 2.42?", "$ 2.22?",
    "$ 1.97?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?",
    "n/a?"), `EPS ACTUAL` = c("$ 9.50?", "$ 9.72?", "$ 8.74?",
    "$ 8.08?", "$ 8.75?", "$ 7.64?", "$ 6.45?", "$ 6.76?",
    "$ 6.79?", "$ 5.89?", "$ 5.36?", "$ 5.16?", "$ 5.10?",
    "$ 4.92?", "$ 4.63?", "$ 4.84?", "$ 4.43?", "$ 3.91?",
    "$ 3.56?", "$ 3.68?", "$ 3.18?", "$ 2.62?", "$ 2.49?",
    "$ 2.29?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?",
    "n/a?"), `PREV. YEAR ACTUAL` = c("$ 8.75?", "$ 7.64?",
    "$ 6.45?", "$ 6.76?", "$ 6.79?", "$ 5.89?", "$ 5.36?",
    "$ 5.16?", "$ 5.10?", "$ 4.92?", "$ 4.63?", "$ 4.84?",
    "$ 4.43?", "$ 3.91?", "$ 3.56?", "$ 3.68?", "$ 3.18?",
    "$ 2.62?", "$ 2.49?", "$ 2.29?", "n/a?", "n/a?", "n/a?",
    "n/a?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?"
    ), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:00 CDT",
    "2011-07-14 15:15:00 CDT", "2011-04-14 15:15:00 CDT",
    "2011-01-20 15:15:00 CST",
    "2010-10-14 15:15:00 CDT", "2010-07-15 15:15:00 CDT",
    "2010-04-15 15:15:00 CDT",
    "2010-01-21 15:15:00 CST", "2009-10-15 15:15:00 CDT",
    "2009-07-16 15:15:00 CDT",
    "2009-04-16 15:15:00 CDT", "2009-01-22 15:15:00 CST",
    "2008-10-16 15:15:00 CDT",
    "2008-07-17 15:15:00 CDT", "2008-04-17 15:15:00 CDT",
    "2008-01-31 15:15:00 CST",
    "2007-10-18 15:15:00 CDT", "2007-07-19 15:15:00 CDT",
    "2007-04-19 15:15:00 CDT",
    "2007-01-31 15:15:00 CST", "2006-10-19 15:15:00 CDT",
    "2006-07-20 15:15:00 CDT",
    "2006-04-20 15:15:00 CDT", "2006-01-31 15:15:00 CST",
    "2005-10-20 15:15:00 CDT",
    "2005-07-21 15:15:00 CDT", "2005-04-21 15:15:00 CDT",
    "2005-02-01 15:15:00 CST",
    "2004-10-21 15:15:00 CDT")), .Names = c("SYMBOL", "PERIOD",
"EVENT TITLE", "EPS ESTIMATE", "EPS ACTUAL", "PREV. YEAR ACTUAL",
"TIME"), row.names = 2:31, na.action = structure(31L, .Names = "32",
class = "omit"), class = "data.frame")
#
Is that exactly what you're doing, in a clean session?

x <- rdata[27, 4]
[1] TRUE
[1] FALSE

Because as long as the space is included, the test should be TRUE.

(I renamed the dput object rdata, because df() is a base function.)

df[df == "n/a"] <- NA
shouldn't work on Mac, or any other system, because no elements of
your data frame are "n/a", but are instead "n/a "

If it were my data, I'd get rid of the spaces at the end of the values before
trying to do anything, either before reading it into R, or with gsub() after.

Sarah
On Fri, Feb 3, 2012 at 10:25 AM, G See <gsee000 at gmail.com> wrote:

  
    
#
On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
Hi.

This string contains a no-break space, not a space.

  "n/a?" == "n/a\uA0"

  [1] TRUE

  "n/a\uA0"

  [1] "n/a?"

Hope this helps.

Petr Savicky.
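
To make the point above concrete, here is a minimal sketch of how a no-break space defeats "==" even though the two strings print alike. The names plain, nbsp and cleaned are made up for illustration, and the character class in sub() is just one way to strip either kind of trailing space.

```r
# Two strings that print alike but differ in their last character.
plain <- "n/a "        # ends in an ordinary space (U+0020)
nbsp  <- "n/a\u00a0"   # ends in a no-break space (U+00A0)

plain == nbsp          # FALSE: the final characters differ
nbsp == "n/a"          # FALSE: the no-break space still counts as a character
nchar(nbsp)            # 4

# Strip a trailing space of either kind before comparing.
cleaned <- sub("[ \u00a0]+$", "", c(plain, nbsp))
cleaned == "n/a"       # TRUE TRUE
```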
#
Hi Sarah,

Thank you very much for the response.

In fact, it does work on Mac even without including the space:
Loading required package: XML
[1] FALSE
[1] TRUE

Garrett
On Fri, Feb 3, 2012 at 9:57 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
#
Petr,

Thank you!  That is great.

Do you know of a way to print a string such that I can see whether it
contains a string or a no-break space?

Thanks,
Garrett
On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
#
Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?
On Fri, Feb 3, 2012 at 10:10 AM, G See <gsee000 at gmail.com> wrote:
#
On 12-02-03 10:25 AM, G See wrote:
One would expect the first of these to be TRUE, but not the second.
On my system that's what happens.

Is this still repeatable in a new session?  If so, can you show us what 
you get from charToRaw?  I get

 > charToRaw(x)
[1] 6e 2f 61 20

but perhaps you have some different character in the fourth position, 
one which just happens to display as a space.

If it is not repeatable in a new session, then it's hard to guess what 
went wrong, but conceivably memory corruption somewhere could have 
caused this.  It would be worthwhile keeping track of what you were 
doing if it ever happens again.

Duncan Murdoch
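
A short sketch of the charToRaw() check suggested above, run on two hand-built strings rather than on the original data (x_space and x_nbsp are illustrative names). utf8ToInt() is added only as a convenience, since it reports code points instead of bytes.

```r
x_space <- "n/a "        # ordinary trailing space
x_nbsp  <- "n/a\u00a0"   # trailing no-break space

charToRaw(x_space)   # 6e 2f 61 20
charToRaw(x_nbsp)    # 6e 2f 61 c2 a0  (U+00A0 encoded as two UTF-8 bytes)

# One integer per character, which can be easier to read than raw bytes.
utf8ToInt(x_nbsp)    # 110 47 97 160
```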
#
On 12-02-03 11:10 AM, G See wrote:
Use tools::showNonASCII(x).  On Petr's example, it gives

1: n/a<c2><a0>

Duncan Murdoch
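
As a sketch of how this might be used on a whole column, the example below runs tools::showNonASCII() on a small made-up vector and then builds a logical flag with base functions only. The names vals and has_non_ascii are assumptions for illustration; the byte test (any byte above 0x7f) is one simple way to detect non-ASCII content, not the only one.

```r
vals <- c("$ 2.29", "n/a\u00a0", "n/a")

# Prints only the elements that contain non-ASCII bytes, with those bytes escaped.
tools::showNonASCII(vals)

# A logical vector that can be used for subsetting: TRUE where any byte
# of the element lies outside the ASCII range.
has_non_ascii <- vapply(
  vals,
  function(s) any(as.integer(charToRaw(s)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
vals[has_non_ascii]
```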
#
Thank you Duncan, that is very helpful.

Although I think we've got it sorted out now, to answer your previous
questions,  it is repeatable in a new R session, and the output of
charToRaw is below.

On Ubuntu, I get the following:
[1] 6e 2f 61 c2 a0

On Mac, I get:
[1] 6e 2f 61

Thanks to all for the help,
Garrett

On Fri, Feb 3, 2012 at 10:19 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
#
On Fri, Feb 03, 2012 at 10:10:56AM -0600, G See wrote:
Hi.

For unknown characters, the following may be useful

  x <- "n/a?"

  library(Unicode)
  u_char_inspect(as.u_char_seq(x, ""))

      Code                 Name Char
  1 U+006E LATIN SMALL LETTER N    n
  2 U+002F              SOLIDUS    /
  3 U+0061 LATIN SMALL LETTER A    a
  4 U+00A0       NO-BREAK SPACE    ?

Petr Savicky.
#
On Feb 3, 2012, at 17:23 , G See wrote:

            
So that's a nonbreak space all right. Next question: How did it get there? I'm mildly surprised that it crept into the data frame; I would expect it to happen much more easily with things typed on the keyboard (Alt-Spc on my Mac keyboard, e.g.).
#
On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <pdalgd at gmail.com> wrote:
Peter,
I won't venture to guess how, but this will do it.
[1] 6e 2f 61 c2 a0

Garrett
#
On Feb 3, 2012, at 18:03 , G See wrote:

            
OK, if you look at the source for that page, it actually contains stuff like

<td align="center">n/a&#160;</td>

and &#160; is the infamous \uA0 alias nonbreak space. So the odd thing might actually be that the Mac manages to lose the trailing nonbreak space, whereas other systems do not. AFAICS, this boils down to the matching of [[:space:]] inside
function (x) 
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>

A locale dependency, perhaps?
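
If the behaviour of [[:space:]] does vary between platforms, one way to sidestep the question is to name U+00A0 explicitly in the pattern. The sketch below assumes a hypothetical helper, trim_nbsp(), which is not part of the XML package, and a two-row stand-in for the scraped table; it then applies the replacement from the original post.

```r
# Hypothetical helper: trims leading and trailing whitespace, listing the
# no-break space explicitly instead of relying on [[:space:]] alone.
trim_nbsp <- function(x) {
  gsub("(^[[:space:]\u00a0]+|[[:space:]\u00a0]+$)", "", x)
}

# A small stand-in for the scraped data (values end in U+00A0).
rdata <- data.frame(
  SYMBOL = c("GOOG\u00a0", "GOOG\u00a0"),
  `EPS ESTIMATE` = c("$ 1.97\u00a0", "n/a\u00a0"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

rdata[] <- lapply(rdata, trim_nbsp)   # clean every column, keep the data frame
rdata[rdata == "n/a"] <- NA           # the comparison now matches
```

Cleaning first and comparing afterwards keeps the test against "n/a" the same on every platform, whatever the scraping step leaves behind.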