An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130123/a1c9cb4a/attachment.pl>
CFA with lavaan or with SEM
6 messages · David Purves, John Fox, yrosseel
Dear David, On Wed, 23 Jan 2013 11:19:09 +0000
David Purves <David.Purves at glasgow.ac.uk> wrote:
Hi Sorry for the rather long message.
. . .
I have tried the analysis using John Fox's SEM package / command. I calculate the correlation matrix with smoothing my.cor<-hetcor(north.dat.sub,use="pairwise.complete.obs")$correlations This returns the warning indicating that the correlation matrix was adjusted to make it positive definite. However the following sem model does not run, with the error message that the matrix is non-invertible. mod1<-sem::sem(sem .model.1, S=my.cor, 300) Should the smoothing not allow it to be inverted?
If the input correlation matrix is really positive definite, then it has an inverse. You could check directly, e.g., by looking at the eignevalues of the tetrachoric correlation matrix. There's very little here to go on, not even the error message produced by sem(). By the way, I assume that you didn't really call sem in the sem package as sem::sem in a session in which lavann was loaded. I'm not sure what would happen if you did that. Best, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi John
Thanks for your quick reply.
The full warning I got is
' Error in csem(model = model.description, start, opt.flag = 1, typsize = typsize, :
The matrix is non-invertable.'
The eigenvalues of the tetrachoric correlations are non negative. So it is must be how I am defining my model.
I have also tried it without having lavaan in the session.
A wee example of my error (whether it is sensible);
library(sem)
my.cor<-matrix(c( 1.0000000 , 0.7600616 , 0.3653309 , 0.4377949 , 0.2917927 , 0.5133697,
0.7600616 , 1.0000000, 0.6335519 , 0.8288809 , 0.6223942 , 0.6355725,
0.3653309 , 0.6335519 , 1.0000000 , 0.9098309 , 0.9098309 , 0.7693395,
0.4377949 , 0.8288809 , 0.9098309 , 1.0000000 ,0.9136967 , 0.7829854,
0.2917927 ,0.6223942 , 0.9098309 , 0.9136967 ,1.0000000 , 0.7354562,
0.5133697 ,0.6355725 , 0.7693395 , 0.7829854 , 0.7354562 , 1.0000000),
nrow=6,byrow=T)
colnames(my.cor)<-rownames(my.cor)<-c("a","b","c","d","e","g")
eigen(my.cor)
solve(my.cor)
#i tried defining the model in two ways
model.1<-matrix(c(
# arrow #parameter #start
"f -> a", "g1", NA,
"f -> b", "g2", NA,
"f -> c", "g3", NA,
"f -> d", "g4", NA,
"f -> e", "g5", NA,
"f -> g", "g6", NA,
"f <-> f", NA, 1),
ncol=3,byrow=T)
out<-sem(model.1,S=my.cor,200)
model.1 <- specifyEquations()
f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g
f1 = 1* f1
out<-sem(model.1,S=my.cor,200)
But the same error.
I would be very grateful if you could indicate where the error in my code is please.
thanks, david
-----Original Message-----
From: John Fox [mailto:jfox at mcmaster.ca]
Sent: 23 January 2013 14:00
To: David Purves
Cc: r-help at R-project.org
Subject: Re: [R] CFA with lavaan or with SEM
Dear David,
On Wed, 23 Jan 2013 11:19:09 +0000
David Purves <David.Purves at glasgow.ac.uk> wrote:
Hi Sorry for the rather long message.
. . .
I have tried the analysis using John Fox's SEM package / command. I calculate the correlation matrix with smoothing my.cor<-hetcor(north.dat.sub,use="pairwise.complete.obs")$correlations This returns the warning indicating that the correlation matrix was adjusted to make it positive definite. However the following sem model does not run, with the error message that the matrix is non-invertible. mod1<-sem::sem(sem .model.1, S=my.cor, 300) Should the smoothing not allow it to be inverted?
If the input correlation matrix is really positive definite, then it has an inverse. You could check directly, e.g., by looking at the eignevalues of the tetrachoric correlation matrix. There's very little here to go on, not even the error message produced by sem(). By the way, I assume that you didn't really call sem in the sem package as sem::sem in a session in which lavann was loaded. I'm not sure what would happen if you did that. Best, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
The University of Glasgow, charity number SC004401
Dear David,
It certainly helps to have a "reproducible example."
You've left out the error variances ("uniquenesses") for the observed
variables. You're also making the specification *much* harder than it needs
to be:
---------- snip -----------
cfa.mod.1 <- cfa()
1: F: a, b, c, d, e, g 2: Read 1 item NOTE: adding 6 variances to the model
cfa.mod.1
Path Parameter StartValue 1 F -> a <fixed> 1 2 F -> b lam[b:F] 3 F -> c lam[c:F] 4 F -> d lam[d:F] 5 F -> e lam[e:F] 6 F -> g lam[g:F] 7 F <-> F V[F] 8 a <-> a V[a] 9 b <-> b V[b] 10 c <-> c V[c] 11 d <-> d V[d] 12 e <-> e V[e] 13 g <-> g V[g]
cfa.sem.1 <- sem(cfa.mod.1, S=my.cor, N=200) summary(cfa.sem.1)
Model Chisquare = 543.6442 Df = 9 Pr(>Chisq) = 2.565155e-111
AIC = 567.6442
BIC = 495.9594
Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.536000 -0.135500 0.002829 0.294500 0.353400 5.337000
R-square for Endogenous Variables
a b c d e g
0.1841 0.6969 0.8172 1.0084 0.8269 0.6007
Parameter Estimates
Estimate Std Error z value Pr(>|z|)
lam[b:F] 1.945376727 0.302785547 6.424933 1.319280e-10 b <--- F
lam[c:F] 2.106647980 0.320689035 6.569130 5.061006e-11 c <--- F
lam[d:F] 2.340103148 0.347900207 6.726363 1.739560e-11 d <--- F
lam[e:F] 2.119171567 0.322095480 6.579327 4.725816e-11 e <--- F
lam[g:F] 1.806192591 0.287680436 6.278469 3.419240e-10 g <--- F
V[F] 0.184137740 0.057758730 3.188050 1.432356e-03 F <--> F
V[a] 0.815862342 0.081641551 9.993224 1.631854e-23 a <--> a
V[b] 0.303132223 0.030545714 9.923887 3.277381e-23 b <--> b
V[c] 0.182802929 0.019248279 9.497105 2.158058e-21 c <--> c
V[d] -0.008353614 0.008298643 -1.006624 3.141154e-01 d <--> d
V[e] 0.173057855 0.018375461 9.417878 4.602950e-21 e <--> e
V[g] 0.399281457 0.039935977 9.998039 1.554445e-23 g <--> g
Iterations = 59
---------- snip -----------
Note that the default in cfa() is to use a reference indicator, and that the
solution is improper -- there's a negative estimated error variance, V[d].
An alternative specification sets the variance of the factor to 1, but then
cfa() fails to converge:
---------- snip -----------
cfa.mod.2 <- cfa(reference.indicators=FALSE)
1: F: a, b, c, d, e, g 2: Read 1 item NOTE: adding 6 variances to the model
cfa.mod.2
Path Parameter StartValue 1 F -> a lam[a:F] 2 F -> b lam[b:F] 3 F -> c lam[c:F] 4 F -> d lam[d:F] 5 F -> e lam[e:F] 6 F -> g lam[g:F] 7 F <-> F <fixed> 1 8 a <-> a V[a] 9 b <-> b V[b] 10 c <-> c V[c] 11 d <-> d V[d] 12 e <-> e V[e] 13 g <-> g V[g]
cfa.sem.2 <- sem(cfa.mod.2, S=my.cor, N=200, debug=TRUE)
. . . Start values: lam[a:F] lam[b:F] lam[c:F] lam[d:F] lam[e:F] lam[g:F] V[a] V[b] V[c] V[d] V[e] V[g] 0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865 0.56728160 0.23437445 0.19722125 0.09427268 0.23686401 0.24924941 iteration = 0 Step: [1] 0 0 0 0 0 0 0 0 0 0 0 0 Parameter: [1] 0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865 0.56728160 0.23437445 0.19722125 0.09427268 0.23686401 0.24924941 Function Value [1] 3.346898 Gradient: [1] 0.4583916 0.3957443 -0.2067868 -0.4369468 -0.2629929 0.2431501 -0.5501220 -1.6700002 0.6543088 3.0031327 0.7820309 -1.0122023 . . . iteration = 21 Parameter: [1] 0.44280000 0.68987016 0.99055402 1.15651371 0.99812990 0.75293242 0.82441291 1.01174284 0.01185904 -1.30253783 -0.01183159 [12] 0.71942353 Function Value [1] -316143 Gradient: [1] 83431722 105921661 12975044375 -137927630 -13105242109 162575760 -22404848 -36111801 -541872735153 [10] -61232522 -552802111412 -85072888 Successive iterates within tolerance. Current iterate is probably solution. Warning message: In eval(expr, envir, enclos) : Could not compute QR decomposition of Hessian. Optimization probably did not converge. ---------- snip ----------- The problem seems ill-conditioned, and in any event the standard errors that you get using tetrachoric correlations won't be right (I expect you know that). I hope this helps, John
-----Original Message-----
From: David Purves [mailto:David.Purves at glasgow.ac.uk]
Sent: Wednesday, January 23, 2013 10:23 AM
To: John Fox
Cc: r-help at R-project.org
Subject: RE: [R] CFA with lavaan or with SEM
Hi John
Thanks for your quick reply.
The full warning I got is
' Error in csem(model = model.description, start, opt.flag = 1, typsize
= typsize, :
The matrix is non-invertable.'
The eigenvalues of the tetrachoric correlations are non negative. So it
is must be how I am defining my model.
I have also tried it without having lavaan in the session.
A wee example of my error (whether it is sensible);
library(sem)
my.cor<-matrix(c( 1.0000000 , 0.7600616 , 0.3653309 , 0.4377949 ,
0.2917927 , 0.5133697,
0.7600616 , 1.0000000, 0.6335519 , 0.8288809 , 0.6223942 ,
0.6355725,
0.3653309 , 0.6335519 , 1.0000000 , 0.9098309 , 0.9098309 ,
0.7693395,
0.4377949 , 0.8288809 , 0.9098309 , 1.0000000 ,0.9136967 ,
0.7829854,
0.2917927 ,0.6223942 , 0.9098309 , 0.9136967 ,1.0000000 ,
0.7354562,
0.5133697 ,0.6355725 , 0.7693395 , 0.7829854 , 0.7354562 ,
1.0000000),
nrow=6,byrow=T)
colnames(my.cor)<-rownames(my.cor)<-c("a","b","c","d","e","g")
eigen(my.cor)
solve(my.cor)
#i tried defining the model in two ways
model.1<-matrix(c(
# arrow #parameter #start
"f -> a", "g1", NA,
"f -> b", "g2", NA,
"f -> c", "g3", NA,
"f -> d", "g4", NA,
"f -> e", "g5", NA,
"f -> g", "g6", NA,
"f <-> f", NA, 1),
ncol=3,byrow=T)
out<-sem(model.1,S=my.cor,200)
model.1 <- specifyEquations()
f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g
f1 = 1* f1
out<-sem(model.1,S=my.cor,200)
But the same error.
I would be very grateful if you could indicate where the error in my
code is please.
thanks, david
-----Original Message-----
From: John Fox [mailto:jfox at mcmaster.ca]
Sent: 23 January 2013 14:00
To: David Purves
Cc: r-help at R-project.org
Subject: Re: [R] CFA with lavaan or with SEM
Dear David,
On Wed, 23 Jan 2013 11:19:09 +0000
David Purves <David.Purves at glasgow.ac.uk> wrote:
Hi Sorry for the rather long message.
. . .
I have tried the analysis using John Fox's SEM package / command. I calculate the correlation matrix with smoothing my.cor<-hetcor(north.dat.sub,use="pairwise.complete.obs")$correlations This returns the warning indicating that the correlation matrix was
adjusted to make it positive definite. However the following sem model does not run, with the error message that the matrix is non-invertible.
mod1<-sem::sem(sem .model.1, S=my.cor, 300) Should the smoothing not allow it to be inverted?
If the input correlation matrix is really positive definite, then it has an inverse. You could check directly, e.g., by looking at the eignevalues of the tetrachoric correlation matrix. There's very little here to go on, not even the error message produced by sem(). By the way, I assume that you didn't really call sem in the sem package as sem::sem in a session in which lavann was loaded. I'm not sure what would happen if you did that. Best, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
The University of Glasgow, charity number SC004401
Dear Daniel, Oh, I see I forgot to comment on your second specification in my last reply:
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Purves Sent: Wednesday, January 23, 2013 10:23 AM To: John Fox Cc: r-help at R-project.org Subject: Re: [R] CFA with lavaan or with SEM
. . .
model.1 <- specifyEquations() f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g f1 = 1* f1
First, this is backwards: the observed variables depend on the factor, and not vice-versa; e.g., a = gam11*f1. Second, the factor has an error-variance parameter; it doesn't depend on itself: V(f1) = 1. As I mentioned in my previous message, it's easier to use cfa() for this kind of model. Best, John
2 days later
I am trying to use the cfa command in the lavaan package to run a CFA however I am unsure over a couple of issues. I have @25 dichotomous variables, 300 observations and an EFA on a training dataset suggests a 3 factor model.
That is a lot of variables, and a rather small sample size (for binary data).
After defining the model I use the command
fit.dat <- cfa(model.1, data=my.dat, std.lv = T, estimator="WLSMV",
ordered=c("var1","var2" and so on for the other 23 variables))
To avoid having to type "var?" 25 times, you can say
ordered=paste("var",1:25,sep="")
Is it right that I define the variables as ordered (the output returns thresholds suggesting I should).
Yes! Does the cfa command
calculate tetrachoric correlations in the background?
Yes, indeed. You can 'see' it by typing inspect(fit, "sampstat") lavaan also computes an asymptotic variance matrix of these correlations, so you should get correct standard errors and a correct test statistic. By default, lavaan will provide robust standard errors and a mean and variance adjusted test statistic (estimator="WLSMV").
However, output for the command returns two variables with small negative variances (-0.002) which I think is due to the correlation matrix not being positive definite. Is it reasonable to force these to be zero when defining the model or is this more a sign of problems with the model?
You can NOT force these to be equal (at least not in the current version of lavaan - 0.5-11, where the residual variance is a function of other model parameters). I don't think this is caused by a non-pd correlation matrix (you should get a big warning if this was the case). Perhaps the sample size is too small. Could you remove some items, or regroup them?
As an alternative is it possible to calculate the tetrachoric
correlations using hetcor (which applies smoothing) and then use the
smoothed sample correlation as the input to the model, such as
fit.cor <- cfa(model.1, sample.cov=my.hetcor, sample.nobs=300, std.lv
= T,estimator="ML", ordered=c("var1","var2" and so on for the other
23 variables)).
This will work only if you omit the 'ordered' argument. Perhaps in combination with estimator="ULS". But do not trust/report the standard errors in this case.
Final question is I have a lot of missing data - listwise deletion leaves 90 subjects. Is there a way to calculate estimates using pairwise deletion (this is another reason why I tried using the correlation matrix as the input).
You could do this, and use estimator="ULS". But again, you can not use the standard errors. Yves. -- http://lavaan.org