Hello,

I have interval-censored data, censored to the range (0, 100). I used the tobit function in the AER package, which in turn relies on survreg. I'm struggling with the choice of distribution. The data are asymmetrically distributed, so my first choice would be a Weibull distribution. Unfortunately, the Weibull doesn't allow zero values in time data, as it requires x > 0. So I tried the exponential distribution, which allows x >= 0, and the log-normal, which sets x <= 0 to 0. Still I get the same error:

"Error in survreg(formula = Surv(ifelse(A16_1_1 >= 100, 100, ifelse(A16_1_1 <= : Invalid survival times for this distribution"

The only distributions that seem to work are gaussian and logistic, but they don't really fit the data. I searched the archive for this problem and found a suggestion by Terry Therneau to set all 0 to NA and apply the Weibull afterwards. But this solution is not very satisfying, as it eliminates the left-censored data from the dataset.

So I have three questions:

1. Does anybody know why the lognormal and exponential distributions don't work in survreg?
2. What else could I do to find a distribution that fits the data well?
3. What about the non-parametric approach in survfit(), could that be a solution?

I hope my questions aren't too stupid, as I'm not much of a statistician.

Regards,
Geraldine
Interval censored Data in survreg() with zero values!
3 messages · Geraldine Henningsen, Don MacQueen, Achim Zeileis
Surv() allows left, right, or interval censoring. Try left censoring instead of interval censoring. For the weibull or lognormal, think of your data as <=100 instead of [0,100]. -Don
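One possible reading of this suggestion, sketched in R: drop the bound at zero and keep only the cap at 100 as censoring. In Surv() terms a value capped at an upper limit is coded with type = "right" (the true value is at least 100). The data below are simulated and the names latent, y and event are hypothetical; nothing here comes from the original dataset, and the sketch only works if no recorded value is exactly zero.

```r
library(survival)

set.seed(1)

# Hypothetical data: strictly positive latent values, recorded values capped at 100
latent <- rweibull(200, shape = 1.5, scale = 60)
y <- pmin(latent, 100)

# 1 = value observed exactly, 0 = censored at the cap of 100
event <- as.numeric(latent < 100)

# One-sided censoring only: no lower bound at zero is imposed
fit <- survreg(Surv(y, event, type = "right") ~ 1, dist = "weibull")

# The intercept is on the log scale, so exp() recovers the Weibull scale
print(exp(coef(fit)))
```

With real data that contain exact zeros, this call would still stop with the same "Invalid survival times" error, so it addresses the cap at 100 but not the zeros.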
At 8:08 PM +0100 12/23/08, Geraldine Henningsen wrote:
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
-------------------------------------- Don MacQueen Environmental Protection Department Lawrence Livermore National Laboratory Livermore, CA, USA 925-423-1062
On Tue, 23 Dec 2008, Geraldine Henningsen wrote:
1. Does anybody know why the lognormal and exponential distributions don't work in survreg?
For these distributions, an observation left-censored at zero has probability zero: pexp(0) = plnorm(0) = 0.
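A quick check of this point in base R, as a sketch only. The rate, shape, scale, mean and sd values are arbitrary and chosen for illustration; they are not estimates from the data in question. To my understanding survreg fits the Weibull, exponential and lognormal as models for log(time), which is why the last lines look at log(0).

```r
# Probability of a value at or below zero under each candidate distribution.
# All parameter values are arbitrary illustrations.
p_zero <- c(
  exponential = pexp(0, rate = 1 / 50),
  lognormal   = plnorm(0, meanlog = 3, sdlog = 1),
  weibull     = pweibull(0, shape = 1.5, scale = 60),
  gaussian    = pnorm(0, mean = 50, sd = 30),
  logistic    = plogis(0, location = 50, scale = 15)
)
print(p_zero)

# On the log scale a zero becomes -Inf, which is not a usable response value
log_zero <- log(0)
print(is.finite(log_zero))
```

The three positive-support distributions put no mass at or below zero, while the gaussian and logistic do, which matches the observation that only those two fit without an error.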
2. What else could I do to find a distribution that fits the data well? 3. What about the non-parametric approach in survfit(), could that be a solution?
Both probably depend on the questions you want to ask about your data. For the tools implemented in "survival", the "Modeling Survival Data" book by Therneau and Grambsch is the natural reference. hth, Z