Although you have provided R code to illustrate your problem, this is fundamentally a statistical theory question, and it belongs somewhere else, such as stats.stackexchange.com.
When you post there, I recommend that you spend more effort identifying why the zeros are present. If they are indicators of unknown values, the analysis will be very different than if the zeros are valid members of the population.
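To make that distinction concrete, here is a minimal sketch built on the toy data from the original post. The names `total_if_valid` and `total_if_missing` are illustrative only, and the second estimate rests on the assumption that the zeros are missing completely at random, which may well not hold for real data.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Toy population from the original post
N1 <- 100000
N2 <- 3000
x <- c(rep(0, N1), rnorm(N2, 300, 100))
N <- length(x)

n <- 1000
x_sample <- sample(x, n, replace = FALSE)

# Case 1: zeros are valid members of the population, so they stay in the mean
total_if_valid <- N * mean(x_sample)

# Case 2 (assumption): zeros mark unknown values that are missing completely
# at random, so only the observed non-zero values inform the mean
observed <- x_sample[x_sample != 0]
total_if_missing <- N * mean(observed)

c(valid = total_if_valid, missing = total_if_missing)
```

The two totals differ by roughly the factor n / length(observed), which is why the meaning of the zeros has to be settled before choosing an estimator.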
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
"Kehl Dániel" <kehld at ktk.pte.hu> wrote:
Dear List-members,
I have a problem where I have to estimate the mean, or the sum, of a
population that for some reason contains a huge number of zeros.
I cannot share the real data, but I constructed a toy example as follows:
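N1 <- 100000                 # number of zeros in the population
N2 <- 3000                   # number of non-zero values
x1 <- rep(0, N1)
x2 <- rnorm(N2, 300, 100)    # non-zero part: mean 300, sd 100
x <- c(x1, x2)               # full population of size N1 + N2
n <- 1000                    # sample size
x_sample <- sample(x, n, replace = FALSE)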
I want to estimate the sum of x based on x_sample, knowing only the
total N = N1 + N2 and not N1 and N2 individually.
The sample mean has a huge standard deviation, so I am looking for a better
estimator.
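The size of that spread can be illustrated with the usual expansion estimator for simple random sampling without replacement. This is only a textbook sketch of the baseline, not a better estimator; `total_hat` and `se_total` are illustrative names, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same toy population as above
N1 <- 100000
N2 <- 3000
x <- c(rep(0, N1), rnorm(N2, 300, 100))
N <- length(x)
n <- 1000
x_sample <- sample(x, n, replace = FALSE)

# Expansion estimator of the population sum: N times the sample mean
total_hat <- N * mean(x_sample)

# Estimated standard error, including the finite population correction (1 - n/N)
se_total <- N * sqrt((1 - n / N) * var(x_sample) / n)

c(estimate = total_hat, se = se_total, relative_se = se_total / total_hat)
```

Because only about 3% of the sampled values are non-zero, the relative standard error stays large even with n = 1000.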
I was thinking about trimmed (or "left-trimmed", as my numbers are all
positive) means or something similar,
but if I calculate a trimmed mean I do not know N2 to multiply it by.
Do you have any ideas, or could you give me some insight?
Thanks a lot,
Daniel