[FORGED] Q re: logical indexing with is.na

Hi

Do you want something like this?
x <- c(1,2,NA, 3, 4, 5, NA, 6,7,8, NA, NA, 9,10)
y <- c(1,2,NA, NA, 3, 4, 5, 6, NA, 7,8, NA, NA, 9,10)
identical(x[which(!is.na(x))], y[which(!is.na(y))])
[1] TRUE

If I expect NAs and want to extract or compare something, I tend to use which() to select only the non-NA elements.
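
One case where which() makes a visible difference, sketched with made-up values: a logical index that itself contains NA produces NA entries in the result, whereas which() keeps only the TRUE positions. With !is.na(x) the index never contains NA, so both forms agree.

```r
x <- c(1, NA, 3, 5)   # illustrative values, not from the thread

x[x > 2]          # the NA in the comparison propagates: NA 3 5
x[which(x > 2)]   # which() keeps only the TRUE positions: 3 5

# With !is.na() the logical index has no NA, so the two forms agree
identical(x[!is.na(x)], x[which(!is.na(x))])   # TRUE
```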

Cheers
Petr
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of David Goldsmith
Sent: Sunday, March 10, 2019 7:16 AM
Cc: r-help at r-project.org
Subject: Re: [R] [FORGED] Q re: logical indexing with is.na

Thanks, all.  I had read about recycling, but I guess I didn't fully appreciate all
the "weirdness" it might produce. :/

With this explained, I'm going to ask a follow-up, which is only contextually
related: the impetus for this discovery was checking "corner cases" to
determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to determine equality of
two vectors containing NA's.  Between the above result; my related discovery
that this indexing preserves relative positional info but not absolute positional
info; and the performance penalty when comparing long vectors that may be
unequal "early on";  I've concluded that--if it (can be made to) "short circuit"--it
would probably be better to use an implicit loop.  So that's my Q: will (or can)
an implicit loop (be made to) "exit early" if a specified condition is met before
all indices have been checked?
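
A minimal sketch of one way to get the early exit asked about here, assuming positional (not relative) equality is what is wanted and that NA should match NA. As far as I know, all(x == y) and the apply family evaluate every element before returning, so the sketch uses an explicit for loop with return(). The name equal_with_na is hypothetical, not something from the thread.

```r
# Hypothetical helper (not from the thread): positional equality of two
# vectors, treating NA as equal to NA, returning at the first mismatch.
equal_with_na <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  for (i in seq_along(x)) {
    x_na <- is.na(x[i])
    y_na <- is.na(y[i])
    if (x_na != y_na) return(FALSE)            # NA in one vector only
    if (!x_na && x[i] != y[i]) return(FALSE)   # both present but unequal
  }
  TRUE
}

equal_with_na(c(1, NA, 3), c(1, NA, 3))   # TRUE
equal_with_na(c(1, NA, 3), c(NA, 1, 3))   # FALSE, returns at i = 1
```

An interpreted loop costs more per element than vectorised code, so whether this wins depends on how early the mismatches tend to occur. For two vectors of the same type, identical(x, y) also treats NA positions as matching and may be the simpler thing to try first.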

Thanks again!

DLG

On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:

Regarding the mention of logical indexing, under ?Extract I see:

For [-indexing only: i, j, ... can be logical vectors, indicating
elements/slices to select. Such vectors are recycled if necessary to
match the corresponding extent. i, j, ... can also be negative
integers, indicating elements/slices to leave out of the selection.
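
A short illustration of the two behaviours described in that help text, with made-up values:

```r
v <- 1:6

v[c(TRUE, FALSE)]   # logical index recycled to length 6: 1 3 5
v[-(1:2)]           # negative integers leave elements out: 3 4 5 6
```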

On March 9, 2019 6:57:05 PM PST, Rolf Turner <r.turner at auckland.ac.nz>
wrote:
On 3/10/19 2:36 PM, David Goldsmith wrote:
Hi!  Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
new to statistics (have had grad-level courses and work experience in
statistics) or vectorized programming syntax (have extensive experience
with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time
ago--of experience w/ S-plus).

In exploring the use of is.na in the context of logical indexing, I've
come across the following puzzling-to-me result:

y; !is.na(y[1:3]); y[!is.na(y[1:3])]
[1]  0.3534253 -1.6731597         NA -0.2079209
[1]  TRUE  TRUE FALSE
[1]  0.3534253 -1.6731597 -0.2079209

As you can see, y is a four element vector, the third element of which is
NA; the next line gives what I would expect--T T F--because the first two
elements are not NA but the third element is.  The third line is what
confuses me: why is the result not the two element vector consisting of
simply the first two elements of the vector (or, if vectorized indexing in
R is implemented to return a vector the same length as the logical index
vector, which appears to be the case, at least the first two elements and
then either NA or NaN in the third slot, where the logical indexing vector
is FALSE): why does the implementation "go looking" for an element whose
index in the "original" vector, 4, is larger than BOTH the largest index
specified in the inner-most subsetting index AND the size of the resulting
indexing vector?  (Note: at first I didn't even understand why the result
wasn't simply

0.3534253 -1.6731597         NA

but then I realized that the third logical index being FALSE, there was no
reason for *any* element to be there; but if there is, due to some
overriding rule regarding the length of the result relative to the length
of the indexer, shouldn't it revert back to *something* that indicates the
"FALSE"ness of that indexing element?)

Thanks!

It happens because R is eco-conscious and re-cycles. :-)

Try:

ok <- c(TRUE,TRUE,FALSE)
(1:4)[ok]

In general in R if there is an operation involving two vectors then
the shorter one gets recycled to provide sufficiently many entries to
match those of the longer vector.

Thus in the foregoing example the first entry of "ok" gets used
again, to make a length 4 vector to match up with 1:4.  The result is
the same as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].

If you did (1:7)[ok] you'd get the same result as that from
(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
recycled 2 and 1/3 times.
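
The recycling described above can be made explicit with rep_len(), which stretches the index to a given length; a small sketch:

```r
ok <- c(TRUE, TRUE, FALSE)

(1:4)[ok]   # 1 2 4
(1:7)[ok]   # 1 2 4 5 7

# rep_len() spells out the recycled index explicitly
rep_len(ok, 7)   # TRUE TRUE FALSE TRUE TRUE FALSE TRUE
identical((1:7)[ok], (1:7)[rep_len(ok, 7)])   # TRUE
```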

Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .

Note that in the first two instances you get warnings, but in the
third you don't, since 6 is an integer multiple of 3.
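
To check which of those three expressions warn without reading the console, one option is a small wrapper around tryCatch(). The helper name warns is hypothetical, used only for this sketch.

```r
10 * (1:3) + 1:6   # 11 22 33 14 25 36, no warning

# Hypothetical helper: TRUE if evaluating the expression raises a warning
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

warns(10 * (1:3) + 1:4)   # TRUE: 4 is not a multiple of 3
warns(10 * (1:3) + 1:5)   # TRUE: 5 is not a multiple of 3
warns(10 * (1:3) + 1:6)   # FALSE

# The recycled result is still computed; the warning is only a notice
suppressWarnings(10 * (1:3) + 1:4)   # 11 22 33 14
```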

Why aren't there warnings when logical indexing is used?  I guess
because it would be annoying.  Maybe.

Note that integer indices are not recycled in this way: they just name
the positions to extract, so the result is as long as the index.  So

(1:4)[1:3] just (sensibly) gives

[1] 1 2 3

and *not*

[1] 1 2 3 1

Perhaps a bit subtle, but it gives what you'd actually *want* rather
than being pedantic about rules with a result that you wouldn't want.
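
A quick way to see how integer indices behave, using made-up values: each index names one position, so the result has the same length as the index vector.

```r
v <- c(10, 20, 30, 40)   # illustrative values

v[1:3]          # 10 20 30: three indices, three results
v[c(1, 1, 3)]   # 10 10 30: a position may be repeated
v[c(2, 6)]      # 20 NA: position 6 does not exist
```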

cheers,

Rolf Turner

P.S.  If you do

y[1:3][!is.na(y[1:3])]

i.e. if you're careful to match the length of the vector and that
of the indices, you get what you initially expected.
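
Side by side, using the values from the original post:

```r
y <- c(0.3534253, -1.6731597, NA, -0.2079209)   # values from the post above

keep <- !is.na(y[1:3])   # TRUE TRUE FALSE

y[keep]        # index recycled to length 4: elements 1, 2 and 4
y[1:3][keep]   # lengths match: elements 1 and 2 only
```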

R. T.

P^2.S.  To the younger and wiser heads on this list:  the help on "["
does not mention that the index vectors can be logical.  I couldn't
find anything about logical indexing in the R help files.  Is
something missing here, or am I just not looking in the right place?

R. T.
