issue with nzchar() ?

6 messages · R. Michael Weylandt, Bert Gunter, David L Carlson +1 more

#
Dear all
I'm a bit surprised by the results output from nzchar(). The help page
says: "nzchar is a fast way to find out if elements of a character
vector are *non-empty strings*." (my emphasis). However, if you do
[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE  TRUE FALSE
[1] TRUE
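
The calls that produced this output did not survive in the archive. A minimal sketch that yields output of the same shape, assuming the vector was built as c(letters, NA, '') (an assumption, chosen because it matches the 28 values printed above):

```r
# Assumed input (not preserved in the thread): 26 letters,
# one missing value, and one empty string.
x <- c(letters, NA, "")

length(x)    # 28 elements, as in the printed output
nzchar(x)    # TRUE for each letter and for the NA, FALSE for ""
nzchar(NA)   # a lone NA is also reported as TRUE by default
```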

the NA value in the character vector will be considered as a non-empty
string, something that I find strange. At best NA is the equivalent of
an empty string. In this sense, if you Hmisc::describe() the vector
you get, as I would expect, that in the context of character vectors
NA and '' values are considered together:
x
      n missing  unique
     26       2      26

lowest : a b c d e, highest: v w x y z

So is this a bug in the function or in the help page? Regards
Liviu
#
On Mon, Aug 6, 2012 at 4:48 PM, Liviu Andronic <landronimirc at gmail.com> wrote:
By the way, same question holds for nchar(): Should NA values be
reported as 2-char strings, or as 0-char empty/missing values?
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0
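
The input for this output was also stripped. A sketch under the same assumption as before, x <- c(letters, NA, ''), that separates the well-defined counts from the value reported for the missing element (that value depends on the R version in use, so it is only printed here):

```r
# Assumed input (not preserved in the thread).
x <- c(letters, NA, "")

n <- nchar(x)
n[!is.na(x)]   # 1 for each letter, 0 for the empty string
n[is.na(x)]    # what nchar() reports for the missing element
```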


Liviu
#
On Mon, Aug 6, 2012 at 9:53 AM, Liviu Andronic <landronimirc at gmail.com> wrote:
Certainly not to my mind, unless you think that zero and NA should be
the same for integers and doubles as well. NA (in whatever form) is,
to my mind, _unknown_ which is very different than knowing 0.
I'm not sure why that's the case, but it's documented on the help page
(under value):

 For 'nchar', an integer vector giving the sizes of each element,
     currently always '2' for missing values (for 'NA').

so I don't see any bug.

My guess is that it's this way for backward compatibility from a time when
there probably wasn't a proper NA_character_ (that's the parser
literal for a character NA) and they really were just "NA" (the
string) -- perhaps in some far distant R 3.0 we'll see
nchar(NA_character_) = NA_integer_

Best,
Michael
#
Liviu:

Well, as usual, to a certain extent this is arbitrary and the only
issue is whether it is documented correctly.

To me, NA (of whatever mode) means "indeterminate" or "unknown," so
since "" is known and of length 0, I would have expected NA as a
return. But the point is not what our particular tastes are ("You say
'tomayto', I say 'tomahto'," as an old song goes), but what the docs say.
And in both cases, they tell you exactly what you'll get.

For nchar(): " an integer vector giving the sizes of each element,
currently always 2 for missing values (for NA)"

and for nzchar: "a logical vector of the same length as x, true if and
only if the element has non-zero length." (note the 'only if').

So I see no error or inconsistencies anywhere.

-- Bert
On Mon, Aug 6, 2012 at 7:53 AM, Liviu Andronic <landronimirc at gmail.com> wrote:
#
It would be nice if an argument to the function could make NA inputs
return NA, but you can easily get that result:
[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE    NA FALSE
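
The expression behind this output is missing from the archive. One way to get a result of this shape, again assuming x <- c(letters, NA, ''), is to test for missing values first; this is a sketch and not necessarily the call originally posted:

```r
# Assumed input (not preserved in the thread).
x <- c(letters, NA, "")

# Return NA where x is missing, otherwise defer to nzchar().
res <- ifelse(is.na(x), NA, nzchar(x))
res
```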

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
1 day later
#
On Mon, Aug 6, 2012 at 5:27 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
This is a tricky question and I don't have a strong opinion yet.
I most certainly missed this bit in the help page.
As David has also suggested (and Bert alluded), it may be worth having
an nchar(..., returnNA=FALSE) argument, which if TRUE would return NA
when it encounters NA values in the original vector.
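
Such an argument could be approximated with a small wrapper. The function name nchar_na below is hypothetical, and returnNA is the argument name proposed above rather than anything in base R; with returnNA = FALSE the wrapper reproduces the documented value of 2 for missing elements:

```r
# Hypothetical wrapper; neither nchar_na nor returnNA exists in base R.
nchar_na <- function(x, returnNA = FALSE) {
  n <- nchar(x)
  # Overwrite the entries for missing inputs explicitly, so the result
  # does not depend on how nchar() itself treats NA.
  n[is.na(x)] <- if (returnNA) NA_integer_ else 2L
  n
}

nchar_na(c("a", NA, ""))                   # missing element counted as 2
nchar_na(c("a", NA, ""), returnNA = TRUE)  # missing element stays NA
```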

Thank you all for the comments. Regards
Liviu