na.omit inconsistent with is.na on list

11 messages · Gabriel Becker, Iñaki Ucar, Hugh Parsonage +1 more

Wed, Aug 11, 2021 9:58 PM #

na.omit is documented as "na.omit returns the object with incomplete cases
removed." and "At present these will handle vectors," so I expected that
when it is used on a list, it should return the same thing as if we subset
via is.na; however I observed the following,

> L <- list(NULL, NA, 0)
> str(L[!is.na(L)])
List of 2
 $ : NULL
 $ : num 0
> str(na.omit(L))
List of 3
 $ : NULL
 $ : logi NA
 $ : num 0

Should na.omit be fixed so that it returns a result that is consistent with
is.na? I assume that is.na is the canonical definition of what should be
considered a missing value in R.

Toby Hocking

Wed, Aug 11, 2021 10:16 PM #

Also, the na.omit method for data.frame with list column seems to be
inconsistent with is.na,

> L <- list(NULL, NA, 0)
> str(f <- data.frame(I(L)))
'data.frame': 3 obs. of  1 variable:
 $ L:List of 3
  ..$ : NULL
  ..$ : logi NA
  ..$ : num 0
  ..- attr(*, "class")= chr "AsIs"
> is.na(f)
         L
[1,] FALSE
[2,]  TRUE
[3,] FALSE
> na.omit(f)
   L
1
2 NA
3  0


Gabriel Becker

Thu, Aug 12, 2021 1:18 PM #

Hi Toby,

This definitely appears intentional; the first expression of
stats:::na.omit.default is

    if (!is.atomic(object))
        return(object)


So it is explicitly just returning the object in non-atomic cases, which
includes lists. I was not involved in this decision (obviously), but my
guess is that it is because what constitutes an observation
"being complete" is unclear in the list case. What should

na.omit(list(5, NA, c(NA, 5)))

return? Just the first element, or the first and the last? It seems, at
least to me, unclear. A small change to the documentation, adding "atomic
(in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
or similar wherever the supported object types are described, seems
justified, though, imho, as the current documentation is either ambiguous or
technically incorrect, depending on what we take "vector" to mean.
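
The early return can be checked directly. The following is a minimal sketch, assuming an R version in which stats:::na.omit.default still begins with the is.atomic test quoted above; the variable names are illustrative only.

```r
# A list is not atomic, so na.omit.default hands it back untouched.
L <- list(NULL, NA, 0)
is.atomic(L)                          # FALSE
unchanged <- identical(na.omit(L), L)
unchanged                             # TRUE under the assumption stated above

# An atomic vector is filtered; the result also carries an "na.action"
# attribute recording which positions were dropped.
x <- c(1, NA, 2)
kept <- na.omit(x)
as.vector(kept)                       # 1 2
as.vector(attr(kept, "na.action"))    # 2
```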

Best,
~G


______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Toby Hocking

Thu, Aug 12, 2021 4:30 PM #

Hi Gabe, thanks for the feedback.

On Thu, Aug 12, 2021 at 1:19 PM Gabriel Becker <gabembecker at gmail.com>
wrote:

to say atomic vectors.

I agree in principle/theory that it is unclear, but in practice is.na has
an unambiguous answer (if a list element is a scalar NA then it is considered
missing, otherwise not).


Gabriel Becker

Thu, Aug 12, 2021 11:46 PM #

On Thu, Aug 12, 2021 at 4:30 PM Toby Hocking <tdhock5 at gmail.com> wrote:

Well, yes it's unambiguous, but I would argue less likely than the other
option to be correct. Remember what na.omit is supposed to do: "remove
observations which are not complete".

Now for data.frames, this means it removes any row (i.e. observation,
regardless of its internal structure) where *any* column contains an NA. The
most analogous interpretation of na.omit on a list, in the well-behaved (i.e.
list of atomic vectors) case, I think, is that we consider it a ragged
collection of "observations", in which case x[!is.na(x)] with x a list
would do the wrong thing because it is not checking these "observations"
for completeness.

Perhaps others disagree with me about that, and anyway, this only works
when you can check the elements of the list for "completeness" at all. A
list can have anything for elements, and then checking for completeness
becomes impossible...

As is, I do also wonder if a warning should be thrown letting the user know
that their call isn't doing ANY of the possible things it could mean...
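
To make the two readings concrete, here is a rough sketch of the "ragged collection of observations" interpretation next to the is.na-based one. The helper omit_incomplete is hypothetical (it is not part of R), and it only makes sense when every list element is an atomic vector.

```r
# Hypothetical helper (not part of R): keep only list elements that
# contain no NA at all, treating each element as one "observation".
omit_incomplete <- function(x) {
  stopifnot(is.list(x))
  complete <- !vapply(x, anyNA, logical(1))
  x[complete]
}

x <- list(5, NA, c(NA, 5))
str(omit_incomplete(x))   # keeps only the first element
str(x[!is.na(x)])         # keeps the first and the last element
```

The two results differ exactly on c(NA, 5), which is.na does not flag because it is not a length-one vector.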

Best,
~G


Iñaki Ucar

Fri, Aug 13, 2021 12:26 AM #

On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker at gmail.com> wrote:

I don't follow your point. This only means that the *default* method
is not intended for non-atomic cases; it doesn't mean that a method
for lists shouldn't exist.

> is.na(list(5, NA, c(NA, 5)))
[1] FALSE  TRUE FALSE

Following Toby's argument, it's clear to me: the first and the last.
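
As a sketch of what such a list method might look like (purely hypothetical; nothing like this ships with the stats package, and the choice to record an "na.action" attribute is an assumption made here to loosely mirror the atomic case):

```r
# Hypothetical S3 method for lists, consistent with is.na().
# It drops the elements that is.na() flags and records their positions.
na.omit.list <- function(object, ...) {
  drop <- which(is.na(object))
  if (length(drop) == 0L) return(object)
  out <- object[-drop]
  attr(drop, "class") <- "omit"
  attr(out, "na.action") <- drop
  out
}

x <- list(5, NA, c(NA, 5))
res <- na.omit.list(x)   # called directly, so the sketch does not rely on dispatch
length(res)              # 2
```

With this function visible where na.omit is called, na.omit(x) would be expected to dispatch to it through the implicit class "list".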

Iñaki


Hugh Parsonage

Fri, Aug 13, 2021 1:09 AM #

The data.frame method deliberately skips non-atomic columns before
invoking is.na(x) so I think it is fair to assume this behaviour is
intentional and assumed.

Not so clear to me that there is a sensible answer for list columns.
(List columns seem to collide with the expectation that in each
variable every observation will be of the same type)

Consider your list L as

L <- list(NULL, NA, c(NA, NA))

Seems like every observation could have a claim to be 'missing' here.
Concretely, if a data.frame had a list column representing the lat-lon
of an observation, we might only be able to represent missing values
like c(NA, NA).
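
The lat-lon case can be sketched as follows. The coordinates and column names are made up for illustration, and "every coordinate is NA" is only one possible definition of a missing observation.

```r
# Made-up coordinates, for illustration only.
locs <- data.frame(
  id = 1:3,
  latlon = I(list(c(51.5, -0.1), c(NA, NA), c(48.9, 2.4)))
)

is.na(locs$latlon)     # FALSE FALSE FALSE: c(NA, NA) is not a scalar NA
nrow(na.omit(locs))    # 3: the list column is skipped

# One possible definition of "missing" here: every coordinate is NA.
all_missing <- vapply(locs$latlon, function(p) all(is.na(p)), logical(1))
locs[!all_missing, "id"]   # 1 3
```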


1 day later

Toby Hocking

Sat, Aug 14, 2021 1:48 PM #

Some relevant information from ?is.na: the behavior for lists is
documented,

     For is.na, elementwise the result is false unless that element
     is a length-one atomic vector and the single element of that
     vector is regarded as NA or NaN (note that any is.na method
     for the class of the element is ignored).

Also there are other functions anyNA and is.na<- which are consistent with
is.na. That is, anyNA only returns TRUE if the list has an element which is
a scalar NA. And is.na<- sets list elements to logical NA to indicate
missingness.
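
A short sketch of that consistency, using the list from the start of the thread plus one illustrative list M that is not from the original messages:

```r
L <- list(NULL, NA, 0)
anyNA(L)                          # TRUE: the second element is a scalar NA
anyNA(list(NULL, c(NA, NA), 0))   # FALSE: no length-one NA element

M <- list(1, "a", 2)
is.na(M) <- 2                     # mark the second element as missing
identical(M[[2]], NA)             # TRUE: it is now a logical NA
is.na(M)                          # FALSE TRUE FALSE
```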


Gabriel Becker

Sat, Aug 14, 2021 5:15 PM #

I understand what is.na does, the issue I have is that its task is not
equivalent to the conceptual task na.omit is doing, in my opinion, as
illustrated by what the data.frame method does.

That is what I was getting at above: it is not clear that lst[!is.na(lst)]
is the correct thing for na.omit to do.

~G


1 day later

Toby Hocking

Mon, Aug 16, 2021 10:54 AM #

To clarify, the ?na.omit docs say that 'na.omit' returns the object with
incomplete cases removed.
If we take is.na to be the definition of "incomplete cases", then a list
element with a scalar NA is incomplete.
About the data.frame method: in my opinion it is highly
confusing/inconsistent for na.omit to keep rows with incomplete cases in
list columns, but not in columns which are atomic vectors,

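> (f.num <- data.frame(num=c(1,NA,2)))
  num
1   1
2  NA
3   2
> is.na(f.num)
       num
[1,] FALSE
[2,]  TRUE
[3,] FALSE
> na.omit(f.num)
  num
1   1
3   2
> (f.list <- data.frame(list=I(list(1,NA,2))))
  list
1    1
2   NA
3    2
> is.na(f.list)
      list
[1,] FALSE
[2,]  TRUE
[3,] FALSE
> na.omit(f.list)
  list
1    1
2   NA
3    2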


Gabriel Becker

Mon, Aug 16, 2021 12:21 PM #

Hi Toby,

Right, my point is that is.na being equivalent to "is an incomplete case"
is really only true for atomic vectors. I don't see it being the case for
lists, given what is.na does for lists. This is all just my opinion, but
that's my take: vec[!is.na(vec)] happens to be the same as na.omit(vec) for
atomics, but in general the operations are not equivalent and I wouldn't
expect them to be.

Best,
~G
