Several R vs S-Plus issues

8 messages · Duncan Murdoch, Douglas Bates, Thomas Lumley +2 more

#
I think we do not want to change this.

Splus has
[1] NA  2
[1] T F
[1] T F

etc, so I think the logic would be that we get NA whenever we subscript
by NA.  The current Splus behavior for character vectors is different,
and I do not see why it should be.  Note that in R,

R> is.na(LETTERS[c(NA,2)])
[1]  TRUE FALSE

so we really have NA but it is printed as "NA" (and this might be
another case where <NA> would be better).

Doug Bates had raised this issue some time ago.  In the interest of
keeping the S language core as small as possible, I'd rather recommend
against adding these.  We can basically do the same using apply(), and
if the point is to have fast C code I think we should rather
special-case apply accordingly.
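
The apply() route mentioned above can be sketched as follows.  This is
only an illustration: the matrix m and the names col_totals and
row_totals are made up for the example, and it says nothing about how a
special-cased apply() would be implemented.

```r
# Hypothetical 2 x 3 matrix, used only for illustration.
m <- matrix(1:6, nrow = 2)

# MARGIN = 2 applies sum() over columns, MARGIN = 1 over rows.
col_totals <- apply(m, 2, sum)
row_totals <- apply(m, 1, sum)

col_totals   # 3 7 11
row_totals   # 9 12
```

The same pattern covers means or other summaries by swapping sum for
another function.
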

This is a function from the chron package, and it is also available in
the R version of chron.  Again, I am not sure whether this should be in
the S language core: we have strsplit.
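
For reference, a minimal sketch of what strsplit() offers.  The chron
function under discussion is not named in this message, so the input
string and the "/" separator below are assumptions chosen purely for
illustration.

```r
# Hypothetical input; both the string and the separator are assumptions.
date_string <- "10/04/2001"

# strsplit() returns a list with one element per input string.
parts <- strsplit(date_string, "/", fixed = TRUE)[[1]]

parts               # "10" "04" "2001"
as.integer(parts)   # 10 4 2001
```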

-k
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 4 Oct 2001 10:13:54 +0200, you wrote in message
<15292.6722.281945.237767@mithrandir.hornik.net>:
I think NA and "NA" are treated in a very confusing way in R.  In y
below, I construct an NA the way you did above; in z, I use the string
'NA'.  As far as I can tell the vectors are identical.  In w, I use
paste(NA), and get something different.  I also tried paste('NA'), and
got something that looks like w.

 > x <- 'a'
 > y <- x[c(1,NA)]
 > y
 [1] "a"  "NA"
 > y == 'NA'
 [1] FALSE  TRUE
 > is.na(y)
 [1] FALSE  TRUE

 > z <- c('a', 'NA')
 > z == 'NA'
 [1] FALSE  TRUE
 > is.na(z)
 [1] FALSE  TRUE
 > y == z
 [1] TRUE TRUE

 > w <- c('a',paste(NA))
 > w == 'NA'
 [1] FALSE  TRUE
 > is.na(w)
 [1] FALSE FALSE         # This is a surprising result!
 > y == w
 [1] TRUE TRUE

Duncan

#
Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:
Doug was responding to a request from a poster at fmr.com (I believe
it was David in fact) for the colSums function.  I showed how an add-on
package with this function could be created.  I'm sure that all these
functions could be added easily in a separate package.  

Perhaps I was overly subtle in my phrasing but I wrote the message on
how to create a package so as to encourage others to add these
functions instead of relying on a member of the core group having the
time and inclination to do so.

I agree with Kurt that we would want to have a strong reason for
adding these functions to the base language.  Kurt has actively
promoted the idea of basing R on a small, tight language core that is
supplemented by packages.  Toward this end Kurt has done an incredible
amount of work on tools for package attachment, checking, and
documentation.

The goal of a small tight language core supplemented by
user-contributed modules is in keeping with the spirit of open source
projects.  The R core team, especially Kurt and Fritz, devote
considerable effort to creating tools to make it easy for others to
contribute their work.  A commercial developer is more inclined to
define the language and capabilities in house where they have control
and where they can provide support.  This is not a criticism of
commercial development.  If you are required to provide support for
the software you must be able to control the code.

Those on the r-devel list may want to check the web site
                   http://developer.r-project.org/
occasionally.  Some plans for the future and items for discussion are
given there.

#
On Thu, 4 Oct 2001, Kurt Hornik wrote:

I think we do want to change this (as we discussed quite recently) but by
adding a genuine character NA.  The problem is not that LETTERS[c(NA,2)]
returns a missing value, it's that it isn't missing enough. We need a
"NaS" (Not a String) value that can't be confused with Nabisco, but also
can't be confused with an empty string.
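
A short sketch of the distinction being asked for, assuming a recent R
release (the transcripts elsewhere in this thread come from the R of
2001 and show different output).  NA_character_ is the missing-string
constant in current R; the variable x is made up for the example.

```r
x <- LETTERS[c(NA, 2)]

is.na(x)                         # TRUE FALSE
identical(x[1], NA_character_)   # TRUE: a genuine missing string
x == "NA"                        # NA FALSE: not equal to the text "NA"
is.na("")                        # FALSE: the empty string is not missing
```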

	-thomas

#
Agreed.  As Peter said, subscripting by NA should give NA but R cannot
distinguish a missing string (NA) from the string "NA".  However, David
(I think) was suggesting that subscripting a string with NA should give
an empty string, which I think cannot be right.  It must be NaS in your
sense.

-k
#
On Thu, 4 Oct 2001 18:14:48 +0200, Kurt Hornik wrote in message
<15292.35576.513722.492732@mithrandir.hornik.net>:
But it does make some sort of distinction, as my example contrasting
"NA" with paste(NA) shows.  In case you missed it:

 > is.na("NA")
 [1] TRUE
 > paste(NA)
 [1] "NA"
 > is.na(paste(NA))
 [1] FALSE

Duncan Murdoch


#
On Thu, 4 Oct 2001, Duncan Murdoch wrote:

Yes.  is.na() of a string is true if that string is actually a reference
to R_NaString.  Parsing a literal "NA" will give R_NaString, as will
coercing NA from some other type.  However, paste() doesn't check whether
it produces "NA".  Neither does toupper():
  > is.na(toupper("na"))
  [1] FALSE
  > toupper("na")
  [1] "NA"


So we're partway there already, in fact.  It looks like we basically
need to
1) stop the parser generating R_NaString from "NA"
2) Change PRINTNAME(R_NaString) to avoid ambiguity

	-thomas

#
On Thu, Oct 04, 2001 at 11:31:52AM -0700, Thomas Lumley wrote:
One more (at least :-))

3) fix saveload.c to preserve NA status in strings.

Right now we have
[1] "NA" "NA" "NA"
[1]  TRUE  TRUE FALSE
[1] "NA" "NA" "NA"
[1] FALSE FALSE FALSE

It's easy enough to fix in principle--just add another case to
NewSaveSpecialHook and NewLoadSpecialHook.  Unfortunately this means
workspace files that contain string NA's written with this new
convention won't be readable by older versions of R (I think it will
generate an "unknown type" error, and probably leave the file
open).
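
One way to check whether NA status survives a save/load round trip is
sketched below.  The vector x is a hypothetical test case, not the one
behind the output quoted above, and the comments assume a recent R
release; on the R version discussed here the results would differ.

```r
# Hypothetical test vector: a literal, a missing value, and a pasted "NA".
x <- c("NA", NA, paste(NA))
is.na(x)                       # FALSE TRUE FALSE (assumes a recent R release)

# Write to a temporary file and load into a separate environment,
# so the restored copy cannot be confused with the original.
path <- tempfile(fileext = ".RData")
save(x, file = path)
restored <- new.env()
load(path, envir = restored)

identical(is.na(x), is.na(restored$x))   # TRUE if NA status is preserved
unlink(path)
```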

luke