Discourage the weights= option of lm with summarized data

On 3 Dec 2017, at 16:31 , Arie ten Cate <arietencate at gmail.com> wrote:

Peter,

This is a highly structured text. Just for the discussion, I separate
the building blocks, where (D) and (E) and (F) are new:

BEGIN OF TEXT --------------------

(A)

Non-‘NULL’ ‘weights’ can be used to indicate that different
observations have different variances (with the values in ‘weights’
being inversely proportional to the variances);

(B)

or equivalently, when the elements of ‘weights’ are positive integers
w_i, that each response y_i is the mean of w_i unit-weight
observations

(C)

(including the case that there are w_i observations equal to y_i and
the data have been summarized).

(D)

However, in the latter case, notice that within-group variation is not
used. Therefore, the sigma estimate and residual degrees of freedom
may be suboptimal;

(E)

in the case of replication weights, even wrong.

(F)

Hence, standard errors and analysis of variance tables should be
treated with care.

END OF TEXT --------------------
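
To make (D) and (E) concrete, here is a minimal R sketch. The data are
made up for illustration and the object names (fit_full, fit_sum, fit_w,
fit_exp) are arbitrary; it assumes nothing beyond base R's lm(). It
compares a fit on full data with a weighted fit on the summarized data,
first for group means as in (B), then for replication weights as in (C).

```r
## Case (B): each response is the mean of w_i unit-weight observations.
## Made-up raw data with replicated x values.
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
y <- c(1.1, 0.9, 1.3, 2.2, 1.8, 2.9, 3.4, 3.1, 2.6)
fit_full <- lm(y ~ x)

## Summarize to group means and group sizes, then fit with weights.
ybar <- tapply(y, x, mean)
n    <- tapply(y, x, length)
g <- data.frame(x = as.numeric(names(ybar)),
                ybar = as.numeric(ybar),
                n = as.numeric(n))
fit_sum <- lm(ybar ~ x, data = g, weights = n)

coef(fit_full); coef(fit_sum)                   # coefficients agree
df.residual(fit_full); df.residual(fit_sum)     # 7 versus 1
summary(fit_full)$sigma; summary(fit_sum)$sigma # sigma estimates differ:
                                                # within-group variation is
                                                # not available to fit_sum

## Case (C): each row stands for 'freq' identical observations.
d <- data.frame(x = c(1, 2, 3, 4),
                y = c(1.2, 1.9, 3.2, 3.8),
                freq = c(2, 1, 3, 2))
fit_w <- lm(y ~ x, data = d, weights = freq)

## Expand the rows back to one row per observation and fit unweighted.
d_long  <- d[rep(seq_len(nrow(d)), d$freq), ]
fit_exp <- lm(y ~ x, data = d_long)

coef(fit_w); coef(fit_exp)                # coefficients agree
df.residual(fit_w); df.residual(fit_exp)  # 2 versus 6
## Same weighted residual sum of squares, divided by different df:
summary(fit_w)$sigma^2 * df.residual(fit_w)
summary(fit_exp)$sigma^2 * df.residual(fit_exp)
```

In both cases the coefficient estimates match, while the residual
degrees of freedom, and with them sigma and the standard errors, do not.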

I don't understand (D), partly because it is unclear to me whether (D)
refers to (C) or to (B)+(C):
