Message-ID: <495532B8.8060705@email.uni-kiel.de>
Date: 2008-12-26T19:38:32Z
From: Geraldine Henningsen
Subject: Interval censored Data in survreg() with zero values!
In-Reply-To: <200812241651.mBOGova20506@hsrnfs-101.mayo.edu>

Hello again,

thank you very much for your help so far.

To be more specific, I have generated a simplified data set that is
similar to my real-world data:

set.seed( 123 )
data <- data.frame( x = runif( 200 ) )
# rweibull() is vectorised over 'scale', so one call replaces the loop
data$y <- rweibull( 200, shape = 1, scale = 70 + 10 * data$x ) - 30
# censor the values at the limits 0 and 100
data$y[ data$y < 0 ] <- 0
data$y[ data$y > 100 ] <- 100
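The simulation can be written compactly as below. The censoring shares are back-of-the-envelope arithmetic, not numbers computed from the generated sample: with shape 1 the Weibull draw is exponential with mean λ_i, and since x_i lies in [0, 1], λ_i lies in [70, 80]. The bracketed ranges are rounded to two decimals.

```latex
\begin{aligned}
x_i &\sim U(0,1), \qquad W_i \mid x_i \sim \mathrm{Weibull}(\text{shape}=1,\ \text{scale}=\lambda_i), \qquad \lambda_i = 70 + 10\,x_i,\\
y_i &= \min\{\max\{W_i - 30,\ 0\},\ 100\},\\
P(y_i = 0 \mid x_i) &= P(W_i \le 30) = 1 - e^{-30/\lambda_i} \approx [0.31,\ 0.35],\\
P(y_i = 100 \mid x_i) &= P(W_i \ge 130) = e^{-130/\lambda_i} \approx [0.16,\ 0.20].
\end{aligned}
```

So roughly a third of the observations are expected at 0 and about a sixth to a fifth at 100.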

Applying an interval-censored tobit model based on the normal
distribution works (tobit() comes from the AER package):
library( AER )  # provides tobit(); also attaches survival
estNorm <- tobit( y ~ x, left = 0, right = 100, data = data )

Since my data are obviously not normally distributed, I tried the
Weibull distribution, but this does not work (as I wrote before):
estWeibull <- tobit( y ~ x, left = 0, right = 100, dist = "weibull",
   data = data )

I have tried to implement Terry's suggestion.
>   [...]  Using Surv(t1, t2, type='interval2'),  you can have 
>     a left censored observation where time of event < t: represented as (NA, t)
>     a right censored observation where time of event >t: represented as (t, NA)
>     an interval censored observations t1<=time <= t2   : represented as (t1,t2)    
>   
estWeibull2 <- survreg( Surv( ifelse( y == 0, NA, y ),
   ifelse( y == 100, y, NA ), type = "interval2" ) ~ x, data = data )

Is this correct?
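For comparison, the following minimal sketch applies the three quoted interval2 rules literally to a variable censored at 0 and 100. The helper name encodeTwoLimit() and the example values are hypothetical; Surv() and its status codes come from the survival package. In this sketch the second argument is y for interior values and NA at the upper limit, which is the reverse of the second ifelse() in the survreg() call above. The sketch only illustrates the encoding. It does not settle whether a Weibull fit can cope with a lower limit of exactly 0.

```r
library(survival)  # provides Surv()

# Hypothetical helper: encode a value censored at 'lower' and 'upper'
# following the quoted interval2 rules:
#   at the lower limit -> (NA, lower)  left censored
#   at the upper limit -> (upper, NA)  right censored
#   strictly inside    -> (y, y)       exactly observed
encodeTwoLimit <- function(y, lower = 0, upper = 100) {
  t1 <- ifelse(y <= lower, NA_real_, y)
  t2 <- ifelse(y >= upper, NA_real_, y)
  Surv(t1, t2, type = "interval2")
}

yExample <- c(0, 12.5, 100, 47)  # hypothetical example values
sExample <- encodeTwoLimit(yExample)
print(sExample)

# Third column = status for interval-type Surv objects (see ?Surv):
# 0 = right censored, 1 = exact, 2 = left censored, 3 = interval censored
status <- unclass(sExample)[, 3]
print(status)
```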

My endogenous variable is not a time-dependent variable but a percentage,
which is naturally censored to the interval [0, 100]. Unfortunately, many
data points are exactly 0 or 100, and the remaining data are asymmetrically
distributed. So I would like to apply a two-limit tobit model, regressing
the percentage (endogenous variable) on several explanatory variables.
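The following sketch gives the standard Gaussian two-limit tobit log-likelihood for limits 0 and 100. It assumes a latent y_i^* = x_i'β + ε_i with ε_i ~ N(0, σ²), observed as 0 when y_i^* ≤ 0, as 100 when y_i^* ≥ 100, and as y_i^* otherwise. Φ and φ are the standard normal CDF and density. This should be the model that tobit( y ~ x, left = 0, right = 100 ) fits. A non-normal variant would replace Φ and φ with that distribution's CDF and density.

```latex
\ell(\beta,\sigma) =
  \sum_{i:\,y_i = 0} \log \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right)
  + \sum_{i:\,0 < y_i < 100} \left[\log \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \log \sigma\right]
  + \sum_{i:\,y_i = 100} \log\!\left[1 - \Phi\!\left(\frac{100 - x_i'\beta}{\sigma}\right)\right]
```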

Best,
Geraldine