density with weights missing values

11 messages · Matthias Gondan, Bert Gunter, Jeff Newmiller +3 more

#
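Weighted mean behaves differently:
- weight is excluded for missing x
- no warning for sum(weights) != 1

> weighted.mean(c(1, 2, 3, 4), weights=c(1, 1, 1, 1))
[1] 2.5
> weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1))
[1] NA
> weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1), na.rm=TRUE)
[1] 2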




From: Richard O'Keefe
Sent: Monday, 12 July 2021 13:18
To: Matthias Gondan
Subject: Re: [R] density with weights missing values

Does your copy of R say that the weights must add up to 1?
?density doesn't say that in mine.   But it does check.
On Mon, 12 Jul 2021 at 22:42, Matthias Gondan <matthias-gondan at gmx.de> wrote:

  
  
#
The behavior is as documented AFAICS.

na.rm
logical; if TRUE, missing values are removed from x. If FALSE any
missing values cause an error.

The default is FALSE.

weights
numeric vector of non-negative observation weights.

NA is not a non-negative numeric.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Jul 12, 2021 at 6:10 AM Matthias Gondan <matthias-gondan at gmx.de> wrote:
#
The thing is that for na.rm=TRUE, I would expect the weights corresponding to the missing x to be removed as well, like in weighted.mean. So this one shouldn't raise an error:

density(c(1, 2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, 1))

Or am I missing something?
-------- Original message --------
From: Bert Gunter <bgunter.4567 at gmail.com>
Date: 12.07.21 16:25 (GMT+01:00)
To: Matthias Gondan <matthias-gondan at gmx.de>
Cc: r-help at r-project.org
Subject: Re: [R] density with weights missing values

[...]

> On Mon, 12 Jul 2021 at 22:42, Matthias Gondan <matthias-gondan at gmx.de> wrote:
> >
> > Dear R users,
> >
> > This works as expected:
> >
> >   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE))
> >
> > This raises an error:
> >
> >   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, 1)))
> >   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, NA)))
> >
> > This seems to work (it triggers a warning that the weights don't add up to 1, which makes sense*):
> >
> >   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1)))
> >
> > Questions
> >
> > - But shouldn't the na.rm filter also filter the corresponding weights?
> > - Extra question: In case the na.rm filter is changed to filter the weights, the check for sum(weights) == 1 might trigger false positive warnings since the weights might not add up to 1 anymore
> >
> > Best wishes,
> >
> > Matthias
#
My point (confusingly made!) is that documented behavior is all you
should expect. The docs say that weights must be non-negative numeric.
If they aren't...

"Consistency" of behavior among different functions is highly
subjective -- it depends exactly on what one considers to be
"consistent", nicht wahr? And, of course, with thousands of packages
and hundreds of weight functions used for different purposes, this
seems a practical impossibility here.

However, I would agree that given R's "organic" growth over time,
"jarring" inconsistencies (i.e. that most would agree are
inconsistent) may exist. This may be such a case. But, again, all you
can do is follow the docs whether or not the behavior meets your
"reasonable" expectations.

Just my opinion, of course. Consume at your own risk.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, Jul 12, 2021 at 9:13 AM matthias-gondan <matthias-gondan at gmx.de> wrote:
#
Sure, you might think that.

But most likely the reason this code has not been corrected is that when you give weights for missing data the most correct result is for your entire density to be invalid.

Fix your inputs so they make sense to you and there is no problem. But absent your intellectual input to restructure your problem the weights no longer make sense once density() removes the NAs from the data.
On July 12, 2021 9:13:12 AM PDT, matthias-gondan <matthias-gondan at gmx.de> wrote:

  
    
#
You're right, of course. Extrapolating your argument a bit, the whole practice of na.rm is questionable, since there's always a reason for missingness (that is not in x and rarely elsewhere in the data).

Best wishes,

Matthias
-------- Original message --------
From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Date: 12.07.21 18:44 (GMT+01:00)
To: r-help at r-project.org, matthias-gondan <matthias-gondan at gmx.de>, Bert Gunter <bgunter.4567 at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] density with weights missing values

[...]
#
I think the missing weights are more crucial than equally weighted missing data would be.

What if there is a heavy weight on the missing values? It could completely change the interpretation of the result.
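
A small sketch of this concern, using made-up vectors `x` and `w` (both are assumptions for illustration, not data from this thread): before dropping missing observations, one can check how much of the total weight they carry.

```r
# Hypothetical data: the single missing observation carries most of the weight.
x <- c(1, 2, 3, NA)
w <- c(1, 1, 1, 7)

# Fraction of the total weight attached to missing x values.
lost_share <- sum(w[is.na(x)]) / sum(w)
lost_share  # 0.7
```

If this share is large, silently removing the missing values would discard most of the weighted information.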
On July 12, 2021 10:22:19 AM PDT, matthias-gondan <matthias-gondan at gmx.de> wrote:

  
    
#
On 12/07/2021 1:22 p.m., matthias-gondan wrote:
For what it's worth, I partly agree with you:  if you specify na.rm = 
TRUE, it shouldn't make your x and weights vectors incompatible.

Regarding the warning about the sum of weights:  perhaps there's some 
reason that someone would want to create an unnormalized density, and 
that lets you do it.  An unnormalized mean doesn't make any sense, so I 
wouldn't call it a design flaw that the weighted density behaves 
differently than the weighted mean.  On the other hand, it would likely 
make more sense to normalize the density, and that's how I hope I would 
have designed it.
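
As a rough illustration of what an unnormalized density means here, the sketch below compares the area under the estimate for weights summing to 1 and to 0.5. The data, the fixed bandwidth `bw = 1`, and the helper name `area` are assumptions chosen for the example; the area is only a crude Riemann-sum approximation.

```r
x <- c(1, 2, 3, 4, 5)

# Weights that sum to 1: the estimate integrates to roughly 1.
d_norm <- density(x, weights = rep(1 / 5, 5), bw = 1)

# Weights that sum to 0.5: density() warns, and the estimate
# integrates to roughly 0.5 (an "unnormalized" density).
d_half <- suppressWarnings(density(x, weights = rep(1 / 10, 5), bw = 1))

# Crude Riemann-sum approximation of the area under a density estimate.
area <- function(d) sum(d$y) * diff(d$x[1:2])

area(d_norm)
area(d_half)
```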

Thinking about this, I guessed density() was a really old function, so 
this was a case of trying to be S-compatible, but it turns out the 
weights argument was added in 2005 in r34130, so perhaps someone still 
remembers what the thinking was.

Duncan Murdoch

P.S.  I think you're posting in HTML, which makes your messages look 
really messy.  If you can turn that off, they'd be clearer.
#
    > Weighted mean behaves differently:
    > - weight is excluded for missing x
    > - no warning for sum(weights) != 1

    >> weighted.mean(c(1, 2, 3, 4), weights=c(1, 1, 1, 1))
    > [1] 2.5
    >> weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1))
    > [1] NA
    >> weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1), na.rm=TRUE)
    > [1] 2


I'm sure the 'weights' argument in weighted.mean() has been used
much more often than the one in density().
Hence, it's quite "probable statistically" :-)  that the
weighted.mean() behavior in the NA case has been more rational
and thought through.

So I agree with you, Matthias, that ideally density() should
behave differently here, probably entirely analogously to weighted.mean().

Still, Bert and others are right that there is no bug formally,
but something that possibly should be changed; even though it
breaks back compatibility for those cases, such cases may be
very rare (I'm not sure I've ever used weights in density(), but
I know I've used density() itself very much all those 25 years ..).

https://www.r-project.org/bugs.html

contains good information about determining if something may be
a bug in R *and* tells you how to apply for an account on R's
Bugzilla for reporting it formally.
I'm hereby encouraging you, Matthias, to do that and then in
your report mention both density() and weighted.mean(), i.e., a
cleaned-up version of the union of your first 2 e-mails.

Thank you for thinking about this and concisely reporting it.
Martin


    > From: Richard O'Keefe
    > Sent: Monday, 12 July 2021 13:18
    > To: Matthias Gondan
    > Subject: Re: [R] density with weights missing values

    > Does your copy of R say that the weights must add up to 1?
    > ?density doesn't say that in mine.   But it does check.

Another small part that could be improved, indeed;
thank you, Richard.

--
Martin Maechler
ETH Zurich  and  R Core team
> On Mon, 12 Jul 2021 at 22:42, Matthias Gondan <matthias-gondan at gmx.de> wrote:
>> 
    >> Dear R users,
    >> 
    >> This works as expected:
    >> 
    >>   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE))
    >> 
    >> This raises an error
    >> 
    >>   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, 1)))
    >>   plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, NA)))
[..............]
#
Thanks Martin and the others. I will do so accordingly. 

I guess the 0.1% of the population who uses density with weights will write code like this

x = c(1, 2, 3, NA)
weights = c(1, 1, 1, 1)
density(x[!is.na(x)], weights=weights[!is.na(x)])

These people won't be affected. For the 0.01% of people with code like this,

density(x, weights=weights[!is.na(x)], na.rm=TRUE)

the corrected version would almost surely raise an error. Note that the error message can, in principle, check if length(x[!is.na(x)]) == length(the provided weights) and tell the programmer that this was the old behavior.
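
A minimal sketch of the first pattern wrapped in a helper, assuming one wants the remaining weights rescaled to sum to 1. The name `weighted_density` and the choice to renormalize are assumptions for illustration, not part of R; the fixed `bw = 1` in the usage lines is likewise arbitrary.

```r
# Hypothetical helper (not part of R): drop missing x together with
# their weights, then rescale the remaining weights to sum to 1.
weighted_density <- function(x, weights, ...) {
  if (length(x) != length(weights))
    stop("'x' and 'weights' must have the same length")
  keep <- !is.na(x)
  if (anyNA(weights[keep]))
    stop("missing weight for a non-missing 'x'")
  w <- weights[keep] / sum(weights[keep])
  density(x[keep], weights = w, ...)
}

x <- c(1, 2, 3, 4, 5, NA)
w <- c(1, 1, 1, 1, 1, 1)
d <- weighted_density(x, w, bw = 1)
d$n  # number of observations actually used: 5
```

Whether rescaling is appropriate depends on why the values are missing, as discussed earlier in the thread.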

Best wishes,

Matthias

PS. Sorry for the HTML email. I've given up trying to fix such behavior.


From: Martin Maechler
Sent: Tuesday, 13 July 2021 09:09
To: Matthias Gondan
Cc: r-help at r-project.org
Subject: Re: [R] density with weights missing values
[...]
#
On 2021-07-12 at 15:09, Matthias Gondan wrote:
One difference is that density has a named argument 'weights' not 
present in weighted.mean, which instead has 'w' for weights.
Annoying.

So, in your examples, the argument 'weights = ' is always ignored, at 
least for weighted.mean.default:

 > stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
     if (missing(w)) {
         if (na.rm)
             x <- x[!is.na(x)]
         return(sum(x)/length(x))
     }
     if (length(w) != length(x))
         stop("'x' and 'w' must have the same length")
     if (na.rm) {
         i <- !is.na(x)
         w <- w[i]
         x <- x[i]
     }
     sum((x * w)[w != 0])/sum(w)
}

But, using 'w' for weights, missing values in weights will work only if 
na.rm = TRUE and they match missing values in x. As documented.

[...]
and no warning for sum(w) != 1

That's because the weights w are normalized (after removing weights 
corresponding to missing values in x).

G,
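
The point about argument names above can be checked with a short sketch. The weight vectors and the variable names `plain`, `weighted`, and `with_na` are made up for the example.

```r
x <- c(1, 2, 3, 4)

# 'weights' is not a formal argument of weighted.mean.default(), so it
# is swallowed by '...' and the plain mean is returned.
plain <- weighted.mean(x, weights = c(4, 1, 1, 1))
plain     # 2.5

# With 'w' the weights are actually used: (4*1 + 2 + 3 + 4) / 7
weighted <- weighted.mean(x, w = c(4, 1, 1, 1))
weighted  # 13/7, about 1.857

# na.rm = TRUE drops the NA in x together with its weight.
with_na <- weighted.mean(c(1, 2, 3, NA), w = c(1, 1, 1, 5), na.rm = TRUE)
with_na   # 2
```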