Skip to content

random effects model

13 messages · rex2013, Usha Gurunathan, arun

#
HI,
BP.stack5 is the one without missing values.
na.omit(....).? Otherwise, I have to use the option na.action=.. in the ?geese() statement

You need to read about the correlation structures.? IN unstructured option, more number of parameters needs to be estimated,? In repeated measures design, when the underlying structure is not known, it would be better to compare using different options (exchangeable is similar to compound symmetry) and select the one which provide the least value for AIC or BIC.? Have a look at 

http://stats.stackexchange.com/questions/21771/how-to-perform-model-selection-in-gee-in-r
It's not clear to me? "reference to write about missing values". ?? 
A.K.




----- Original Message -----
From: Usha Gurunathan <usha.nathan at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Monday, January 7, 2013 6:12 PM
Subject: Re: [R] random effects model

Hi AK

2)I shall try putting exch. and check when I get home. Btw, what is
BP.stack5? is it with missing values or only complete cases?

I guess I am still not clear about the unstructured and exchangeable
options, as in which one is better.

1)Rgding the summary(p): NA thing, I tried putting one of my gee equation.

Can you suggest me a reference to write about" missing values and the
implications for my results"

Thanks.
On 1/8/13, arun <smartpink111 at yahoo.com> wrote:
#
HI,

In your dataset, the "exchangeable" or "compound symmetry" may work as there are only two levels for time.? In experimental data analysis involving a factor time with more than 2 levels, randomization of combination of levels of factors applied to the subject/plot etc. gets affected as time is unidirectional.? I guess your data is observational, and with two time levels, it may not hurt to use "CS" as option, though, it would help if you check different options.? 

In the link I sent previously, QIC was used.? http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other 
I am not sure whether AIC/BIC is better than QIC or viceversa.

You could sent email to the maintainer of geepack (Jun Yan <jun.yan at uconn.edu>). 
Regarding the reference links,
You can check this link "www.jstatsoft.org/v15/i02/paper" .? Other references are in the paper. 
"
4.3. Missing values (waves)
In case of missing values, the GEE estimates are consistent if the values are missing com-
pletely at random (Rubin 1976). The geeglm function assumes by default that observations
are equally separated in time. Therefore, one has to inform the function about different sep-
arations if there are missing values and other correlation structures than the independence or
exchangeable structures are used. The waves arguments takes an integer vector that indicates
that two observations of the same cluster with the values of the vector of k respectively l have
a correlation of rkl ."

Hope it helps.
A.K.




----- Original Message -----
From: rex2013 <usha.nathan at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Tuesday, January 8, 2013 5:29 PM
Subject: Re: [R] random effects model

Hi

Thanks a lot, the corstr "exchangeable"does work. Didn't strike to me
for so long. Does the AIC value come out with the gee output?

By reference, I meant reference to a easy-read paper or web address
that can give me knowledge about implications of missing data.

Ta.

On 1/8/13, arun kirshna [via R]
<ml-node+s789695n4654902h89 at n4.nabble.com> wrote:
--
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654986.html
Sent from the R help mailing list archive at Nabble.com.
??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
2 days later
#
Hi,

To get the missing values, you could use:
set.seed(15)
dat1<-data.frame(Gender=rep(c("M","F"),each=10), V1=sample(c(1:20,NA),20,replace=TRUE), V2=sample(c(20:30,NA),20,replace=TRUE))
library(reshape2)
?dcast(melt(dat1,id="Gender"),Gender~variable,function(x) sum(is.na(x)))
#? Gender V1 V2
#1????? F? 1? 0
#2????? M? 2? 1

#or
do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) colSums(is.na(x[-1]))))
?# V1 V2
#F? 1? 0
#M? 2? 1

Regarding the "mi" package, I never used it before.? BTW, you didn't mention what you are going to plot.
Hope it helps.
A.K.



----- Original Message -----
From: rex2013 <usha.nathan at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, January 11, 2013 2:16 AM
Subject: Re: [R] random effects model

Hi AK

Regarding the missing values, I would like to find out the patterns of
missing values in my data set. I know the overall values for each variable.

using

colSums(is.na(df))

? ? ? ? ? ? ? ? ? ? ? but what I wanted is? to find out the percentages
with each level of the variable with my dataset, as in if there is more
missing data in females or males etc?.

I installed "mi" package, but unable to produce a plot with it( i would
also like to produce a plot). I searched the responses in the relevant
sections in r but could n't find an answer.

Thanks,





On Wed, Jan 9, 2013 at 12:31 PM, arun kirshna [via R] <
ml-node+s789695n4654996h3 at n4.nabble.com> wrote:

            
--
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655206.html
Sent from the R help mailing list archive at Nabble.com.
??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
HI,

If you want to find out the percentage of missing values in the whole dataset in females and males:
?set.seed(51)
?dat1<-data.frame(Gender=rep(c("M","F"),each=10),V1=sample(c(1:3,NA),20,replace=TRUE),V2=sample(c(21:24,NA),20,replace=TRUE))
?unlist(lapply(lapply(split(dat1,dat1$Gender),function(x) (nrow(x[!complete.cases(x[,-1]),])/nrow(x))*100),function(x) paste(x,"%",sep="")))
#??? F???? M 
#"20%" "70%" 

#If it is to find the percentage of missing values for each variable in females and males:
?res<-do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) paste((colSums(is.na(x[,-1]))/nrow(x))*100,"%",sep="")))
?colnames(res)<-colnames(dat1)[-1]
?res
#? V1??? V2?? 
#F "0%"? "20%"
#M "50%" "20%"
A.K.





----- Original Message -----
From: rex2013 <usha.nathan at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, January 11, 2013 2:16 AM
Subject: Re: [R] random effects model

Hi AK

Regarding the missing values, I would like to find out the patterns of
missing values in my data set. I know the overall values for each variable.

using

colSums(is.na(df))

? ? ? ? ? ? ? ? ? ? ? but what I wanted is? to find out the percentages
with each level of the variable with my dataset, as in if there is more
missing data in females or males etc?.

I installed "mi" package, but unable to produce a plot with it( i would
also like to produce a plot). I searched the responses in the relevant
sections in r but could n't find an answer.

Thanks,





On Wed, Jan 9, 2013 at 12:31 PM, arun kirshna [via R] <
ml-node+s789695n4654996h3 at n4.nabble.com> wrote:

            
--
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655206.html
Sent from the R help mailing list archive at Nabble.com.
??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hello,

Are you sure that you got the error message with#


unlist(lapply(lapply(split(dat1,dat1$Gender),function(x) 
(nrow(x[!complete.cases(x[,-1]),])/nrow(x))*100),function(x) 
paste(x,"%",sep="")))
#or
do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) paste((colSums(is.na(x[,-1]))/nrow(x))*100,"%",sep="")))
#
HI,

Regarding the script I sent to you earlier, it should be modified a bit according to your dataset.? For example, the variable "Sex" was in column 2.? Also, your data had some missing values for the "Sex" column, which I removed before running the code.? 
With respect to the Errors that you showed here, you were using different package and I don't have a clue what you were trying to do. 

I applied the same code to your dataset.? The results are as follows:

BP_2b<-read.csv("BP_2b.csv",sep="\t")
nrow(BP_2b)
#[1] 6898
BP_2bSexNoMV<-BP_2b[!is.na(BP_2b$Sex),]
?nrow(BP_2bSexNoMV)
#[1] 6260
?unique(BP_2bSexNoMV$Sex)
#[1] 2 1
library(car)
BP_2bSexNoMV$Sex<-recode(BP_2bSexNoMV$Sex,"1='Male';2='Female'") #assuming that 1=Male and 2=Female
?unique(BP_2bSexNoMV$Sex)
#[1] "Female" "Male"? 

#whole dataset
?unlist(lapply(lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100),function(x) paste(x,"%",sep="")))
?#??????????? Female??????????????? Male 
#"72.6552179656539%" "74.4740099009901%" 
#splitting the above into components:

?lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) head(x,2))
#$Female
#? CODEA??? Sex MaternalAge Education Birthplace AggScore IntScore Obese14
#2???? 3 Female?????????? 3???????? 3????????? 1??????? 0??????? 0?????? 0
#3???? 4 Female?????????? 3???????? 6????????? 1?????? NA?????? NA????? NA
#? Overweight14 Overweight21 Obese21 hibp14 hibp21
#2??????????? 0??????????? 1?????? 0????? 0????? 0
#3?????????? NA?????????? NA????? NA???? NA???? NA

#$Male
#? CODEA? Sex MaternalAge Education Birthplace AggScore IntScore Obese14
#6???? 9 Male?????????? 3???????? 6????????? 2??????? 0??????? 0?????? 0
#8??? 11 Male?????????? 3???????? 4????????? 1??????? 0??????? 0????? NA
?# Overweight14 Overweight21 Obese21 hibp14 hibp21
#6??????????? 0??????????? 0?????? 0????? 0????? 0
#8?????????? NA??????????? 0?????? 0???? NA????? 0



?lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)nrow(x))
#$Female
#[1] 3028
#
#$Male
#[1] 3232
lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) nrow(x[!complete.cases(x[,-2]),])) #rows which have at least one NA
#$Female
#[1] 2200
#
#$Male
#[1] 2407

?lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) head(x[!complete.cases(x[,-2]),],2))
#$Female
#?? CODEA??? Sex MaternalAge Education Birthplace AggScore IntScore Obese14
#3????? 4 Female?????????? 3???????? 6????????? 1?????? NA?????? NA????? NA
#10??? 13 Female?????????? 3???????? 4????????? 2?????? NA?????? NA????? NA
?#? Overweight14 Overweight21 Obese21 hibp14 hibp21
#3??????????? NA?????????? NA????? NA???? NA???? NA
#10?????????? NA?????????? NA????? NA???? NA???? NA

#$Male
#? CODEA? Sex MaternalAge Education Birthplace AggScore IntScore Obese14
#8??? 11 Male?????????? 3???????? 4????????? 1??????? 0??????? 0????? NA
#9??? 12 Male?????????? 3???????? 4????????? 2?????? NA?????? NA????? NA
#? Overweight14 Overweight21 Obese21 hibp14 hibp21
#8?????????? NA??????????? 0?????? 0???? NA????? 0
#9?????????? NA?????????? NA????? NA???? NA???? NA

lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100) #gives the percentage of rows of missing #values from the overall rows for Males and Females 
#$Female
#[1] 72.65522
#
#$Male
#[1] 74.47401

#iF you want the percentage from the total number rows in Males and Females (without NA's in the the Sex column)
?lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
#$Female
#[1] 35.14377
#
#$Male
#[1] 38.45048
unlist(lapply(lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100),function(x) paste(x,"%",sep="")))#paste the "%" to the numbers
#???????????? Female??????????????? Male 
#"35.1437699680511%" "38.4504792332268%" 






#If it is to find the percentage of missing values for each variable in females and males:
res<-do.call(rbind,lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) paste((colSums(is.na(x[,-2]))/nrow(BP_2bSexNoMV))*100,"%",sep=""))) #If #you want percentage of missing values per category, replace by nrow(x)
?colnames(res)<-colnames(BP_2bSexNoMV)[-2]
res
#?????? CODEA MaternalAge Education?????????? Birthplace???????? 
#Female "0%"? "0%"??????? "1.03833865814696%" "1.18210862619808%"
#Male?? "0%"? "0%"??????? "1.10223642172524%" "1.3258785942492%" 
?#????? AggScore??????????? IntScore??????????? Obese14??????????? 
#Female "15.2076677316294%" "15.2076677316294%" "24.0894568690096%"
#Male?? "16.5015974440895%" "16.5015974440895%" "25.814696485623%" 
#?????? Overweight14??????? Overweight21??????? Obese21??????????? 
#Female "24.0894568690096%" "23.3865814696486%" "23.3865814696486%"
#Male?? "25.814696485623%"? "29.1533546325879%" "29.1533546325879%"
#?????? hibp14????????????? hibp21???????????? 
#Female "24.1693290734824%" "31.3418530351438%"
#Male?? "25.8466453674121%" "35.223642172524%" 

A.K.
#
Hi, 
I tried ur dataset with vmv.
library(vmv)
tablemissing(BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Male",],sortby="variable")
tablemissing(BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Female",],sortby="variable")

I am attaching the plots i got with it.? I didn't look at the details to change the font labels in the plot.
Hope this helps.

A.K.
#
Hi,
Yes, you are right. ?72.655222% was those missing among females. ?35.14377% of values in females are missing from among the whole dataset (combined total of Males+Females data after removing the NAs from the variable "Sex").?
A.K.