Message-ID: <1FD84824-F0B3-4B4F-8AFF-35091A4A5548@comcast.net>
Date: 2012-05-05T14:53:57Z
From: David Winsemius
Subject: estimation problem
In-Reply-To: <20120504202215.GA7247@cs.cas.cz>
On May 4, 2012, at 4:22 PM, Petr Savicky wrote:
> On Fri, May 04, 2012 at 07:43:32PM +0200, Kehl Dániel wrote:
>> Dear Petr,
>>
>> thank you for your input.
>> I tried to experiment with (probably somewhat biased) truncated means
>> like in the following code.
>> How I got the 225 as a truncation limit is a good question. :)
>>
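>> REPS <- 1000                      # number of simulated samples
>> REPS1 <- REPS2 <- numeric(REPS)   # preallocate both result vectors
>> N1 <- 100000
>> N2 <- 30000
>> N <- N1+N2
>> x1 <- rep(0,N1)
>> x2 <- rnorm(N2,300,100)
>> x <- c(x1,x2)
>>
>> n <- 1000
>>
>> for (i in seq_len(REPS)){
>> x_sample <- sort(sample(x,n,replace=FALSE),TRUE)  # sorted in decreasing order
>> x_trunc <- x_sample[1:225]                        # keep the 225 largest values
>> REPS1[i] <- mean(x_sample)*N                      # plain expansion estimate
>> REPS2[i] <- sum(x_trunc)/n*N                      # truncated estimate
>> }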
>>
>> sum(x2)
>> mean(REPS1)
>> mean(REPS2)
>> sd(REPS1)
>> sd(REPS2)
>> sd(REPS2)/sd(REPS1)
>
> Dear Daniel.
>
> Thank you for your reply.
>
> In the original question, you used the parameters
>
> N1 <- 100000
> N2 <- 3000
>
> and now the parameters
>
> N1 <- 100000
> N2 <- 30000
>
> My remark was that with the original parameters, there are only 29.1
> nonzero elements on average. Now, there are 230.8 nonzero elements on
> average, which is significantly better.
>
> Discussion of the use of the truncated mean is probably a question for
> other members of the list. I do not consider myself an expert on this.
>
> Best, Petr.
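As a quick check of the two averages Petr quotes: the number of nonzero values in a sample of n = 1000 drawn without replacement is hypergeometric, with mean n*N2/(N1+N2). A minimal sketch in base R (the function name expected_nonzero is mine, not something from the thread, and it treats every rnorm() draw as nonzero):

```r
# Expected number of nonzero values in a sample of size n drawn without
# replacement from N1 zeros and N2 nonzero values (hypergeometric mean).
# 'expected_nonzero' is an illustrative name, not part of the thread's code.
expected_nonzero <- function(n, N1, N2) n * N2 / (N1 + N2)

e_orig <- expected_nonzero(1000, 100000, 3000)    # original parameters
e_new  <- expected_nonzero(1000, 100000, 30000)   # revised parameters
print(round(c(original = e_orig, revised = e_new), 1))
```

Both values agree with Petr's figures after rounding to one decimal place.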
My experience is that Petr is better than I am at much of R, but so far
in this thread I have not seen any mention of methods designed for
data with large numbers of zeros. There is a very informative review
of R techniques and packages for such data by Achim Zeileis and
others. The same material was published in the Journal of Statistical
Software and as a vignette in one of the contributed packages (pscl):
www.jstatsoft.org/v27/i08/paper
cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
I don't have this information memorized, but generally find a Google
search for "count r zeileis" to be highly effective. I've just
noticed that the second author, Kleiber, has also put up useful material
on that topic for web searchers to find.
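To give a flavour of what those references cover, here is a minimal sketch of a zero-inflated Poisson fit with zeroinfl() from pscl. It assumes the nonzero values are counts, which is not the case for the rnorm() values simulated earlier in the thread. The simulated data and all variable names are my own invention for illustration. The fit is only attempted if pscl is installed.

```r
# Sketch only: simulate zero-inflated Poisson counts and, if pscl is
# available, recover the two parameters with an intercept-only model.
set.seed(1)
n_obs <- 1000
structural <- rbinom(n_obs, 1, 0.7)                  # 1 = structural zero (prob 0.7)
y <- ifelse(structural == 1, 0L, rpois(n_obs, 3))    # otherwise Poisson with mean 3
dat <- data.frame(y = y)

if (requireNamespace("pscl", quietly = TRUE)) {
  # 'y ~ 1 | 1': intercept-only count part | intercept-only zero part
  fit <- pscl::zeroinfl(y ~ 1 | 1, data = dat, dist = "poisson")
  lambda_hat <- unname(exp(coef(fit, model = "count")))    # count part uses a log link
  pi_hat     <- unname(plogis(coef(fit, model = "zero")))  # zero part uses a logit link
  print(c(lambda = lambda_hat, pi = pi_hat))
}
```

With real data one would put covariates on either side of the "|" in the formula. The countreg vignette linked above discusses when a hurdle model is a better choice than a zero-inflated one.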
David Winsemius, MD
West Hartford, CT