Skip to content

Fit continuous distribution to truncated empirical values

7 messages · David Winsemius, Michele Mazzucco, Bert Gunter

#
Hi all,

I am trying to fit a distribution to some data about survival times.
I am interested only in a specific interval, e.g., while the data lies in the interval (0,...., 600), I want the best for the interval (0,..., 24).

I have tried both fitdistr (MASS package) and fitdist (from the fitdistrplus package), but I could not get them working, e.g.

fitdistr(left, "weibull", upper=24)
Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L,  : 
  L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In dweibull(x, shape, scale, log) : NaNs produced

Am I doing something wrong?


Thanks,
Michele


p.s. I have seen similar posts, e.g., http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html, but I am not sure whether I can apply the same approach here.
1 day later
#
On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:

            
You didn't supply data to test,  but shouldn't you supply a lower  
bound if you want to fit "weibull"? It is, after all, bounded at 0.

 > left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L,  
50*runif(100))
 > fitdistr(left, "weibull", upper=24)
Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522, 521,  
18.3964251773432,  :
   L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In dweibull(x, shape, scale, log) : NaNs produced

 > fitdistr(left, "weibull", upper=24, lower=0.5)
       shape         scale
    0.58195013   24.00000000
  ( 0.04046087) ( 3.38621367)
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
9 days later
#
Hello David,

thanks for your answer.
I have done as you told me, however the fit is very poor, much worse than that obtained from using the whole dataset (without upper bound).
Any idea?

Thanks,
Michele
On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:

            
#
On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:

            
Counter questions in the absence of data:

??? Is there a reason from domain specific knowledge and theory to  
expect that the procedure you are attempting should be correct?

??? Why would you want to exclude data?

??? Why are you not looking for more general tutorials (such as the  
one by Ricci) rather than asking what is as yet an open-ended question  
on fitting methods that surely does not admit an answer easily  
provided on a mailing list?
#
David,

here is the smallest dataset

# Bid	Price	Survival	Censored
0.03	0.029	1	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	1	1
0.03	0.029	1	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	16	1
0.03	0.029	15	1
0.03	0.029	14	1
0.03	0.029	13	1
0.03	0.029	12	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	30	1
0.03	0.029	29	1
0.03	0.029	28	1
0.03	0.029	27	1
0.03	0.029	26	1
0.03	0.029	25	1
0.03	0.029	24	1
0.03	0.029	23	1
0.03	0.029	22	1
0.03	0.029	21	1
0.03	0.029	20	1
0.03	0.029	19	1
0.03	0.029	18	1
0.03	0.029	17	1
0.03	0.029	16	1
0.03	0.029	15	1
0.03	0.029	14	1
0.03	0.029	13	1
0.03	0.029	12	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.027	71	0
0.03	0.027	70	0
0.03	0.027	69	0
0.03	0.027	68	0
0.03	0.027	67	0
0.03	0.027	66	0
0.03	0.027	65	0
0.03	0.027	64	0
0.03	0.027	63	0
0.03	0.027	62	0
0.03	0.027	61	0
0.03	0.027	60	0
0.03	0.027	59	0
0.03	0.027	58	0
0.03	0.027	57	0
0.03	0.027	56	0
0.03	0.027	55	0
0.03	0.027	54	0
0.03	0.027	53	0
0.03	0.027	52	0
0.03	0.027	51	0
0.03	0.027	50	0
0.03	0.027	49	0
0.03	0.027	48	0
0.03	0.027	47	0
0.03	0.027	46	0
0.03	0.027	45	0
0.03	0.027	44	0
0.03	0.027	43	0
0.03	0.027	42	0
0.03	0.027	41	0
0.03	0.027	40	0
0.03	0.027	39	0
0.03	0.027	38	0
0.03	0.027	37	0
0.03	0.027	36	0
0.03	0.027	35	0
0.03	0.027	34	0
0.03	0.027	33	0
0.03	0.027	32	0
0.03	0.027	31	0
0.03	0.027	30	0
0.03	0.027	29	0
0.03	0.027	28	0
0.03	0.027	27	0
0.03	0.027	26	0
0.03	0.027	25	0
0.03	0.027	24	0
0.03	0.027	23	0
0.03	0.027	22	0
0.03	0.027	21	0
0.03	0.027	20	0
0.03	0.027	19	0
0.03	0.027	18	0
0.03	0.027	17	0
0.03	0.027	16	0
0.03	0.027	15	0
0.03	0.027	14	0
0.03	0.027	13	0
0.03	0.027	12	0
0.03	0.027	11	0
0.03	0.027	10	0
0.03	0.027	9	0
0.03	0.027	8	0
0.03	0.027	7	0
0.03	0.027	6	0
0.03	0.027	5	0
0.03	0.027	4	0
0.03	0.027	3	0
0.03	0.027	2	0
0.03	0.027	1	0


This is column # 3
1 11 10  9  8  7  6  5  4  3  2  1  1  1  9  8  7  6  5  4  3  2  1 16 15 14 
13 12 11 10  9  8  7  6  5  4  3  2  1  6  5  4  3  2  1  7  6  5  4  3  2  1
5  4  3  2  1  4  3  2  1  6  5  4  3  2  1 30 29 28 27 26 25 24 23 22 21 20
19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  8  7  6  5  4  3  2
1  4  3  2  1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51
50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


This is the script I have used for fitting the dataset to a Weibull distribution


my_plot <- function() {
	require(plotrix) # for plotCI
	require(MASS) # for fitdistr
	require(survival) # for survival analysis
	require(fitdistrplus) # for fitdist


	lifetimes=read.table("lifetime.txt", header=F, comment.char="#")
	s = Surv(lifetimes$V3, lifetimes$V4)
	mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
	
	cdf=ecdf(lifetimes$V3)
	empirical = numeric(24)
	T = seq(1,24)
	for (i in T) empirical[i] = cdf(i)

	fit = fitdist(lifetimes$V3, "weibull")
	plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]), type="l", col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time [hours]", ylab="S(t)")
	lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
	plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T], col="green", lwd=1, lty=3, add=T)
	legend("bottomleft", c("Weibull fit", "1 - Empirical CDF", "Empirical S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3, lwd=2)

}


my_plot()




About your questions
1 - What do you mean?, I guess I don't know the answer -- that's why I have posted my question on this mailing list.
2 - As I wrote in my first email, I am interested only in the first portion of the survival times (24 hours). Hence, I'd prefer to have a "good" fit over the interval of interest rather than a "reasonable" fit over the whole data.
3 - I went though those tutorials, but couldn't find an answer. That's probably due to the fact that those tutorials are "general", as you pointed out, while my question is rather specific.

Cheers,
Michele
On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:

            
#
Bert,

I think there is a misunderstanding here.
Some data is censored, but I want to fit the data with a distribution  
in the interval [0,24] only. Also, please note that I have other  
datasets having values larger than 1000.

Cheers,
Michele
On 14 Nov 2011, at 18:28, Bert Gunter wrote: