Cannot get "==" operator to return TRUE

13 messages · G See, Sarah Goslee, Duncan Murdoch +2 more

G See

Fri, Feb 3, 2012 7:25 AM #

I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
to do this:
df[df == "n/a"] <- NA

However, it does not work on Ubuntu.  See below.

Thanks in advance,
Garrett

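x <- df[27, 4] # complete data.frame dput is below
dput(x)

"n/a?"

x == "n/a "

[1] FALSE

x == "n/a"

[1] FALSE

str(x)

 chr "n/a?"

is.na(x)

[1] FALSE

grep("n/a ", x)

integer(0)

grep("n/a", x)

[1] 1

sessionInfo()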

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-3                  qmao_1.1.10
[3] FinancialInstrument_0.10.9 quantmod_0.3-17
[5] TTR_0.21-0                 Defaults_1.1-1
[7] xts_0.8-3                  zoo_1.7-6

loaded via a namespace (and not attached):
[1] grid_2.14.1    lattice_0.20-0 tools_2.14.1

### More detail ###
## Here is the complete data.frame

dput(df)

structure(list(SYMBOL = c("GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?",
"GOOG?", "GOOG?", "GOOG?", "GOOG?", "GOOG?"), PERIOD = c("Q4?2011",
"Q3?2011", "Q2?2011", "Q1?2011", "Q4?2010", "Q3?2010", "Q2?2010",
"Q1?2010", "Q4?2009", "Q3?2009", "Q2?2009", "Q1?2009", "Q4?2008",
"Q3?2008", "Q2?2008", "Q1?2008", "Q4?2007", "Q3?2007", "Q2?2007",
"Q1?2007", "Q4?2006", "Q3?2006", "Q2?2006", "Q1?2006", "Q4?2005",
"Q3?2005", "Q2?2005", "Q1?2005", "Q4?2004", "Q3?2004"),
    `EVENT TITLE` = c("Q4 2011 Google Earnings Release", "Q3 2011
Google Inc Earnings Release",
    "Q2 2011 Google Inc Earnings Release", "Q1 2011 Google Inc
Earnings Release",
    "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
    "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
    "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
    "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
    "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
    "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
    "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
    "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
    "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
    "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
    "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
    "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
    "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
    ), `EPS ESTIMATE` = c("$ 10.49?", "$ 8.74?", "$ 7.85?",
    "$ 8.10?", "$ 8.09?", "$ 6.68?", "$ 6.52?", "$ 6.60?",
    "$ 6.50?", "$ 5.42?", "$ 5.09?", "$ 4.93?", "$ 4.95?",
    "$ 4.76?", "$ 4.74?", "$ 4.52?", "$ 4.44?", "$ 3.78?",
    "$ 3.59?", "$ 3.30?", "$ 2.92?", "$ 2.42?", "$ 2.22?",
    "$ 1.97?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?",
    "n/a?"), `EPS ACTUAL` = c("$ 9.50?", "$ 9.72?", "$ 8.74?",
    "$ 8.08?", "$ 8.75?", "$ 7.64?", "$ 6.45?", "$ 6.76?",
    "$ 6.79?", "$ 5.89?", "$ 5.36?", "$ 5.16?", "$ 5.10?",
    "$ 4.92?", "$ 4.63?", "$ 4.84?", "$ 4.43?", "$ 3.91?",
    "$ 3.56?", "$ 3.68?", "$ 3.18?", "$ 2.62?", "$ 2.49?",
    "$ 2.29?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?",
    "n/a?"), `PREV. YEAR ACTUAL` = c("$ 8.75?", "$ 7.64?",
    "$ 6.45?", "$ 6.76?", "$ 6.79?", "$ 5.89?", "$ 5.36?",
    "$ 5.16?", "$ 5.10?", "$ 4.92?", "$ 4.63?", "$ 4.84?",
    "$ 4.43?", "$ 3.91?", "$ 3.56?", "$ 3.68?", "$ 3.18?",
    "$ 2.62?", "$ 2.49?", "$ 2.29?", "n/a?", "n/a?", "n/a?",
    "n/a?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?", "n/a?"
    ), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:00 CDT",
    "2011-07-14 15:15:00 CDT", "2011-04-14 15:15:00 CDT", "2011-01-20
15:15:00 CST",
    "2010-10-14 15:15:00 CDT", "2010-07-15 15:15:00 CDT", "2010-04-15
15:15:00 CDT",
    "2010-01-21 15:15:00 CST", "2009-10-15 15:15:00 CDT", "2009-07-16
15:15:00 CDT",
    "2009-04-16 15:15:00 CDT", "2009-01-22 15:15:00 CST", "2008-10-16
15:15:00 CDT",
    "2008-07-17 15:15:00 CDT", "2008-04-17 15:15:00 CDT", "2008-01-31
15:15:00 CST",
    "2007-10-18 15:15:00 CDT", "2007-07-19 15:15:00 CDT", "2007-04-19
15:15:00 CDT",
    "2007-01-31 15:15:00 CST", "2006-10-19 15:15:00 CDT", "2006-07-20
15:15:00 CDT",
    "2006-04-20 15:15:00 CDT", "2006-01-31 15:15:00 CST", "2005-10-20
15:15:00 CDT",
    "2005-07-21 15:15:00 CDT", "2005-04-21 15:15:00 CDT", "2005-02-01
15:15:00 CST",
    "2004-10-21 15:15:00 CDT")), .Names = c("SYMBOL", "PERIOD",
"EVENT TITLE", "EPS ESTIMATE", "EPS ACTUAL", "PREV. YEAR ACTUAL",
"TIME"), row.names = 2:31, na.action = structure(31L, .Names = "32",
class = "omit"), class = "data.frame")

Sarah Goslee

Fri, Feb 3, 2012 7:57 AM #

Is that exactly what you're doing, in a clean session?

x <- rdata[27, 4]

x == "n/a "

[1] TRUE

x == "n/a"

[1] FALSE

Because as long as the space is included, the test should be TRUE.

(I renamed the dput object rdata, because df() is already a function
in R: the F-distribution density in the stats package.)

df[df == "n/a"] <- NA
shouldn't work on Mac, or any other system, because no elements of
your data frame are "n/a", but are instead "n/a "

If it were my data, I'd get rid of the spaces at the end of the values before
trying to do anything, either before reading it into R, or with gsub() after.

Sarah

On Fri, Feb 3, 2012 at 10:25 AM, G See <gsee000 at gmail.com> wrote:

x <- df[27, 4] # complete data.frame dput is below
dput(x)

"n/a?"

x == "n/a "

[1] FALSE

x == "n/a"

[1] FALSE

str(x)

 chr "n/a?"

is.na(x)

[1] FALSE

grep("n/a ", x)

integer(0)

grep("n/a", x)

[1] 1


Sarah Goslee
http://www.functionaldiversity.org
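
The gsub() cleanup suggested above might look like the following. This is a minimal sketch, not code from the thread: the sample vector `eps` and the helper name `trim_ws` are made up for illustration, and the no-break space is listed explicitly in the pattern on the assumption that [[:space:]] alone may not match it on every platform.

```r
# Hypothetical sample mirroring the thread's data: values end in either
# an ordinary space or a no-break space (U+00A0).
eps <- c("$ 1.97 ", "n/a\u00A0", "n/a ", "$ 2.29\u00A0")

# Strip leading and trailing whitespace. U+00A0 is named explicitly so the
# result does not depend on how the current locale classifies it.
trim_ws <- function(x) {
  gsub("^[[:space:]\u00A0]+|[[:space:]\u00A0]+$", "", x)
}

clean <- trim_ws(eps)

# With the padding gone, the plain "n/a" comparison matches.
clean[clean == "n/a"] <- NA
clean
```

Trimming first and comparing afterwards means the replacement no longer depends on which kind of trailing space the source happened to contain.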

Petr Savicky

Fri, Feb 3, 2012 8:01 AM #

On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:

Hi.

This string contains a no-break space, not a space.

  "n/a?" == "n/a\uA0"

  [1] TRUE

  "n/a\uA0"

  [1] "n/a?"

Hope this helps.

Petr Savicky.
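
To check a whole data frame for this character, something like the sketch below could be used. The small `rdata` frame here is invented for the example (it is not the data from the thread), and `has_nbsp` is just an illustrative name.

```r
# Hypothetical two-column frame; only EPS carries a trailing no-break space.
rdata <- data.frame(
  PERIOD = c("Q4 2005", "Q3 2005"),
  EPS    = c("n/a\u00A0", "$ 1.97\u00A0"),
  stringsAsFactors = FALSE
)

# Flag every column that contains at least one U+00A0.
# fixed = TRUE searches for the literal character, not a regular expression.
has_nbsp <- vapply(
  rdata,
  function(col) any(grepl("\u00A0", col, fixed = TRUE)),
  logical(1)
)
has_nbsp

# The two strings print alike but are not equal.
"n/a\u00A0" == "n/a "
```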

G See

Fri, Feb 3, 2012 8:09 AM #

Hi Sarah,

Thank you very much for the response.

In fact, it does work on Mac even without including the space:

Loading required package: XML

[1] FALSE

[1] TRUE

Garrett

On Fri, Feb 3, 2012 at 9:57 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:

Is that exactly what you're doing, in a clean session?

x <- rdata[27, 4]

x == "n/a "

[1] TRUE

x == "n/a"

[1] FALSE


G See

Fri, Feb 3, 2012 8:10 AM #

Petr,

Thank you!  That is great.

Do you know of a way to print a string such that I can see whether it
contains a string or a no-break space?

Thanks,
Garrett

On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky <savicky at cs.cas.cz> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

G See

Fri, Feb 3, 2012 8:10 AM #

Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?

On Fri, Feb 3, 2012 at 10:10 AM, G See <gsee000 at gmail.com> wrote:


Duncan Murdoch

Fri, Feb 3, 2012 8:15 AM #

On 12-02-03 10:25 AM, G See wrote:

One would expect the first of these comparisons (x == "n/a ") to be
TRUE, but not the second (x == "n/a").  On my system that's what happens.

Is this still repeatable in a new session?  If so, can you show us what 
you get from charToRaw?  I get

 > charToRaw(x)
[1] 6e 2f 61 20

but perhaps you have some different character in the fourth position, 
one which just happens to display as a space.

If it is not repeatable in a new session, then it's hard to guess what 
went wrong, but conceivably memory corruption somewhere could have 
caused this.  It would be worthwhile keeping track of what you were 
doing if it ever happens again.

Duncan Murdoch
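
As a side-by-side illustration of the charToRaw() check, the sketch below builds both candidate strings by hand (the names `x_space` and `x_nbsp` are made up here) and prints their bytes. It assumes the no-break space is stored as UTF-8, which is what a "\u00A0" escape in an R string literal produces.

```r
x_space <- "n/a "        # ordinary space, U+0020
x_nbsp  <- "n/a\u00A0"   # no-break space, U+00A0, stored as UTF-8

# An ordinary space is the single byte 20.
charToRaw(x_space)

# In UTF-8 the no-break space is the two-byte sequence c2 a0.
charToRaw(x_nbsp)

# nchar() still counts four characters in both cases.
nchar(x_space)
nchar(x_nbsp)
```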


Duncan Murdoch

Fri, Feb 3, 2012 8:19 AM #

On 12-02-03 11:10 AM, G See wrote:

Use tools::showNonASCII(x).  On Petr's example, it gives

1: n/a<c2><a0>

Duncan Murdoch
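
A short sketch of that check on a mixed vector follows. The vector `x` is invented for the example; the iconv() call with sub = "byte" is shown as one way to keep the escaped form in a variable, on the assumption that the input is UTF-8.

```r
# Hypothetical vector: the first element ends in an ordinary space,
# the second in a no-break space.
x <- c("n/a ", "n/a\u00A0")

# Reports only the elements that contain non-ASCII bytes.
tools::showNonASCII(x)

# Same idea, but returning the escaped strings so they can be inspected
# or compared programmatically.
escaped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "byte")
escaped
```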


G See

Fri, Feb 3, 2012 8:23 AM #

Thank you Duncan, that is very helpful.

Although I think we've got it sorted out now, to answer your previous
questions,  it is repeatable in a new R session, and the output of
charToRaw is below.

On Ubuntu, I get the following:

[1] 6e 2f 61 c2 a0

On Mac, I get:

[1] 6e 2f 61

Thanks to all for the help,
Garrett

On Fri, Feb 3, 2012 at 10:19 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:


Petr Savicky

Fri, Feb 3, 2012 8:33 AM #

On Fri, Feb 03, 2012 at 10:10:56AM -0600, G See wrote:

Hi.

For unknown characters, the following may be useful

  x <- "n/a?"

  library(Unicode)
  u_char_inspect(as.u_char_seq(x, ""))

      Code                 Name Char
  1 U+006E LATIN SMALL LETTER N    n
  2 U+002F              SOLIDUS    /
  3 U+0061 LATIN SMALL LETTER A    a
  4 U+00A0       NO-BREAK SPACE    ?

Petr Savicky.

Peter Dalgaard

Fri, Feb 3, 2012 8:39 AM #

On Feb 3, 2012, at 17:23 , G See wrote:

So that's a nonbreak space all right. Next question: How did it get there? I'm mildly surprised that it crept into the data frame; I would expect it to happen much more easily with things typed on the keyboard (Alt-Spc on my Mac keyboard, e.g.).

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

G See

Fri, Feb 3, 2012 9:03 AM #

On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <pdalgd at gmail.com> wrote:

Peter,
I won't venture to guess how, but this will do it.

[1] 6e 2f 61 c2 a0

Garrett

Peter Dalgaard

Fri, Feb 3, 2012 2:59 PM #

On Feb 3, 2012, at 18:03 , G See wrote:

OK, if you look at the source for that page, it actually contains stuff like

<td align="center">n/a&#160;</td>

and &#160; is the infamous \uA0 alias nonbreak space. So the odd thing might actually be that the Mac manages to lose the trailing nonbreak space, whereas other systems do not. AFAICS, this boils down to the matching of [[:space:]] inside

function (x) 
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>

A locale dependency, perhaps?

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
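
One way to probe the suspected locale dependency on a given machine is sketched below. It is only a diagnostic under the assumption that the difference lies in whether [[:space:]] matches U+00A0; the first grepl() result may differ between platforms, so no output is claimed for it.

```r
nbsp <- "\u00A0"

# &#160; in the HTML source is code point 160, i.e. U+00A0.
utf8ToInt(nbsp)

# The character-type locale in effect for this session.
Sys.getlocale("LC_CTYPE")

# Platform- and locale-dependent: does the POSIX class cover U+00A0 here?
grepl("[[:space:]]", nbsp)

# Naming the character explicitly removes that dependency.
grepl("[[:space:]\u00A0]", nbsp)
```

If the two grepl() calls disagree on a system, any trimming code that relies on [[:space:]] alone will leave the no-break space in place there.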