Hi, I am beginner with R and need firm guidance with my problem. I have seen some other threads discussing the subject of right censored data, but I am not sure whether or not this problem can be regarded as such. Data: I have a vector with laboratory test data (strength of wood specimens, example attached as txt-file). This data is the full sample. It is a common view that this kind of data follows a lognormal distribution. Background: When fitting a distribution to the lower tail, it will usually be very different compared to fitting the whole data. The lower tail COV is the decisive measure in my analysis (due to resistance estimations of buildings). Problem: I would like to fit a lognormal distribution to the 10%-lower tail of the attached data. Question: Which function would you recommend me to use, and how to formulate it in R using the attached data? Best regards, Mattias Br?nnstr?m PhD student Lule? Technical University -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: C30.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090116/788dfe60/attachment-0002.txt>
Fitting of lognormal distribution to lower tail experimental data
6 messages · Mattias Brännström, David Winsemius, Göran Broström
On Jan 16, 2009, at 3:39 AM, Mattias Br?nnstr?m wrote:
Hi, I am beginner with R and need firm guidance with my problem. I have seen some other threads discussing the subject of right censored data, but I am not sure whether or not this problem can be regarded as such. Data: I have a vector with laboratory test data (strength of wood specimens, example attached as txt-file). This data is the full sample. It is a common view that this kind of data follows a lognormal distribution.
But it's not in this case. See attached summary and QQ plots.
> wood <- read.table("C30.txt")
> str(wood)
'data.frame': 697 obs. of 1 variable:
$ V1: num 14.8 16.1 17.5 20.1 21.7 ...
> summary(wood)
V1
Min. :14.81
1st Qu.:39.45
Median :45.92 # median is suspiciously close to the mean
Mean :45.80
3rd Qu.:52.41 # hinges also symmetric
Max. :75.70
> qqnorm(log(wood$V1))
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qqnorm-log-trans.pdf
Type: application/pdf
Size: 151096 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090116/1d8668d7/attachment-0004.pdf>
-------------- next part --------------
>
> qqnorm(wood$V1)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qqnorm-untrans.pdf
Type: application/pdf
Size: 168521 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090116/1d8668d7/attachment-0005.pdf>
-------------- next part --------------
The tails do not appear to be systematically different either, so it
looks like both premises of the analysis are not found in this data.
David Winsemius > > > Background: > When fitting a distribution to the lower tail, it will usually be very > different compared to fitting the whole data. The lower tail COV is > the > decisive measure in my analysis (due to resistance estimations of > buildings). > > Problem: > I would like to fit a lognormal distribution to the 10%-lower tail > of the > attached data. > > Question: > Which function would you recommend me to use, and how to formulate > it in R > using the attached data? > > > Best regards, > Mattias Br?nnstr?m > > PhD student > Lule? Technical > University<C30.txt>______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Thank you, David! I agree and apprechiate your analysis, which definitely will influence my analysis of this data, but still I would like you to disregard from it(!) The standard routine in the field is, beyond my control, to assume lognormal distribution to achieve comparable results also with other materials (comparison is made on COV). For that reason I have to use it, even if it is not statistically defendable for this particular data. So, if I rephrase the question to be (more general): How would you fit a lognormal distribution to the lower 10% tail of the data (assuming it was lognormal)? What functions to use? Best regards, Mattias -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: C30_131.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090116/a356d13a/attachment-0002.txt>
On Fri, Jan 16, 2009 at 3:31 PM, Mattias Br?nnstr?m
<Mattias.Brannstrom at tt.luth.se> wrote:
Thank you, David! I agree and apprechiate your analysis, which definitely will influence my analysis of this data, but still I would like you to disregard from it(!) The standard routine in the field is, beyond my control, to assume lognormal distribution to achieve comparable results also with other materials (comparison is made on COV). For that reason I have to use it, even if it is not statistically defendable for this particular data. So, if I rephrase the question to be (more general): How would you fit a lognormal distribution to the lower 10% tail of the data (assuming it was lognormal)? What functions to use?
Mattias, it is not clear (to me) what you mean by "fit a lognormal distribution to the 10%-lower tail of the attached data" (and what is COV?). However, a guess is that you really mean what you say, so I tried to right-censor your data at the 10% quantile (33.4134, Type I censoring) and fit the resulting data to a lognormal distribution. The fit was fairly good, as can be seen by comparing the fitted cumulative hazard function to the corresponding non-parametric one (the Nelson-Aalen estimator):
cc <- read.table(.... v1 <- ifelse(cc$V1 <= 33.4134, cc$V1, 33.4134) event <- as.numeric(cc$V1 <= 33.4134) library(eha)
Loading required package: survival Loading required package: splines
fit.ln <- phreg(Surv(v1, event) ~ 1, dist = "lognormal") fit.cox <- coxreg(Surv(v1, event) ~ 1) check.dist(fit.cox, fit.ln)# Gives you a plot summary(fit.ln)
Call: phreg(formula = Surv(v1, event) ~ 1, dist = "lognormal") Covariate Coef Exp(Coef) se(Coef) Wald p log(scale) 3.943 51.597 0.053 0.000 log(shape) 1.089 2.970 0.101 0.000 Events 70 Total time at risk 22998 Max. log. likelihood -401.38 Is this maybe what you are looking for? HTH (!) G?ran
Best regards, Mattias
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
G?ran Brostr?m
On Sat, Jan 17, 2009 at 12:27 AM, G?ran Brostr?m
<goran.brostrom at gmail.com> wrote:
On Fri, Jan 16, 2009 at 3:31 PM, Mattias Br?nnstr?m <Mattias.Brannstrom at tt.luth.se> wrote:
Thank you, David! I agree and apprechiate your analysis, which definitely will influence my analysis of this data, but still I would like you to disregard from it(!) The standard routine in the field is, beyond my control, to assume lognormal distribution to achieve comparable results also with other materials (comparison is made on COV). For that reason I have to use it, even if it is not statistically defendable for this particular data. So, if I rephrase the question to be (more general): How would you fit a lognormal distribution to the lower 10% tail of the data (assuming it was lognormal)? What functions to use?
Mattias, it is not clear (to me) what you mean by "fit a lognormal distribution to the 10%-lower tail of the attached data" (and what is COV?). However, a guess is that you really mean what you say, so I tried to right-censor your data at the 10% quantile (33.4134, Type I censoring)
Sorry, it is of course censoring of Type II. G?ran and fit the resulting data to a lognormal distribution. The
fit was fairly good, as can be seen by comparing the fitted cumulative hazard function to the corresponding non-parametric one (the Nelson-Aalen estimator):
cc <- read.table(.... v1 <- ifelse(cc$V1 <= 33.4134, cc$V1, 33.4134) event <- as.numeric(cc$V1 <= 33.4134) library(eha)
Loading required package: survival Loading required package: splines
fit.ln <- phreg(Surv(v1, event) ~ 1, dist = "lognormal") fit.cox <- coxreg(Surv(v1, event) ~ 1) check.dist(fit.cox, fit.ln)# Gives you a plot summary(fit.ln)
Call: phreg(formula = Surv(v1, event) ~ 1, dist = "lognormal") Covariate Coef Exp(Coef) se(Coef) Wald p log(scale) 3.943 51.597 0.053 0.000 log(shape) 1.089 2.970 0.101 0.000 Events 70 Total time at risk 22998 Max. log. likelihood -401.38 Is this maybe what you are looking for? HTH (!) G?ran
Best regards, Mattias
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- G?ran Brostr?m
G?ran Brostr?m
2 days later
Thank you very much, G?ran! I had to install R 2.8.1 since it did not work with 2.4.1. This is exactly what I wanted, now I can move on with my analysis! (And learn more about cencoring...) Best regards, Mattias