
CFA with lavaan or with SEM

6 messages · David Purves, John Fox, yrosseel

#
Dear David,

On Wed, 23 Jan 2013 11:19:09 +0000
David Purves <David.Purves at glasgow.ac.uk> wrote:
. . .
If the input correlation matrix is really positive definite, then it has an inverse. You could check this directly, e.g., by looking at the eigenvalues of the tetrachoric correlation matrix. There's very little here to go on, not even the error message produced by sem(). By the way, I assume that you didn't really call sem() in the sem package as sem::sem in a session in which lavaan was loaded. I'm not sure what would happen if you did that.
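John's suggestion can be sketched in base R. One detail matters here: a correlation matrix is invertible only when all of its eigenvalues are strictly positive, so eigenvalues that are merely non-negative can still include a zero and leave the matrix singular. The helper name `is_pd` and its relative tolerance are illustrative choices, not functions from sem or lavaan.

```r
# Hypothetical helper: TRUE if a symmetric matrix is numerically positive definite
is_pd <- function(R, tol = 1e-8) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  # Every eigenvalue must be clearly above zero, relative to the largest one
  all(ev > tol * max(abs(ev)))
}

print(is_pd(diag(3)))          # identity matrix: positive definite
print(is_pd(matrix(1, 2, 2)))  # perfectly correlated pair: singular
```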

Best,
 John

------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
#
Hi John

Thanks for your quick reply.

The full error message I got is

' Error in csem(model = model.description, start, opt.flag = 1, typsize = typsize,  :
  The matrix is non-invertable.'

The eigenvalues of the tetrachoric correlation matrix are non-negative, so it must be how I am defining my model.

I have also tried it without having lavaan in the session.

A wee example of my error (whether or not it is sensible):

library(sem)

my.cor <- matrix(c(
  1.0000000, 0.7600616, 0.3653309, 0.4377949, 0.2917927, 0.5133697,
  0.7600616, 1.0000000, 0.6335519, 0.8288809, 0.6223942, 0.6355725,
  0.3653309, 0.6335519, 1.0000000, 0.9098309, 0.9098309, 0.7693395,
  0.4377949, 0.8288809, 0.9098309, 1.0000000, 0.9136967, 0.7829854,
  0.2917927, 0.6223942, 0.9098309, 0.9136967, 1.0000000, 0.7354562,
  0.5133697, 0.6355725, 0.7693395, 0.7829854, 0.7354562, 1.0000000),
  nrow = 6, byrow = TRUE)

colnames(my.cor)<-rownames(my.cor)<-c("a","b","c","d","e","g")

eigen(my.cor)
solve(my.cor)

# I tried defining the model in two ways:

        model.1<-matrix(c(
        #       arrow           #parameter              #start
                "f -> a",       "g1",                   NA,
                "f -> b",       "g2",                   NA,
                "f -> c",       "g3",                   NA,
                "f -> d",       "g4",                   NA,
                "f -> e",       "g5",                   NA,
                "f -> g",       "g6",                   NA,
                "f <-> f",      NA,                     1),
                ncol=3,byrow=T)

out<-sem(model.1,S=my.cor,200)

model.1 <- specifyEquations()
 f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g
 f1 = 1* f1

out<-sem(model.1,S=my.cor,200)

But I get the same error.

I would be very grateful if you could point out where the error in my code is.


thanks, david




The University of Glasgow, charity number SC004401
#
Dear David,

It certainly helps to have a "reproducible example."

You've left out the error variances ("uniquenesses") for the observed
variables. You're also making the specification *much* harder than it needs
to be:

---------- snip -----------
1: F: a, b, c, d, e, g
2: 
Read 1 item
NOTE: adding 6 variances to the model
Path    Parameter StartValue
1  F -> a  <fixed>   1         
2  F -> b  lam[b:F]            
3  F -> c  lam[c:F]            
4  F -> d  lam[d:F]            
5  F -> e  lam[e:F]            
6  F -> g  lam[g:F]            
7  F <-> F V[F]                
8  a <-> a V[a]                
9  b <-> b V[b]                
10 c <-> c V[c]                
11 d <-> d V[d]                
12 e <-> e V[e]                
13 g <-> g V[g]
Model Chisquare =  543.6442   Df =  9 Pr(>Chisq) = 2.565155e-111
 AIC =  567.6442
 BIC =  495.9594

 Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-1.536000 -0.135500  0.002829  0.294500  0.353400  5.337000 

 R-square for Endogenous Variables
     a      b      c      d      e      g 
0.1841 0.6969 0.8172 1.0084 0.8269 0.6007 

 Parameter Estimates
         Estimate     Std Error   z value   Pr(>|z|)             
lam[b:F]  1.945376727 0.302785547  6.424933 1.319280e-10 b <--- F
lam[c:F]  2.106647980 0.320689035  6.569130 5.061006e-11 c <--- F
lam[d:F]  2.340103148 0.347900207  6.726363 1.739560e-11 d <--- F
lam[e:F]  2.119171567 0.322095480  6.579327 4.725816e-11 e <--- F
lam[g:F]  1.806192591 0.287680436  6.278469 3.419240e-10 g <--- F
V[F]      0.184137740 0.057758730  3.188050 1.432356e-03 F <--> F
V[a]      0.815862342 0.081641551  9.993224 1.631854e-23 a <--> a
V[b]      0.303132223 0.030545714  9.923887 3.277381e-23 b <--> b
V[c]      0.182802929 0.019248279  9.497105 2.158058e-21 c <--> c
V[d]     -0.008353614 0.008298643 -1.006624 3.141154e-01 d <--> d
V[e]      0.173057855 0.018375461  9.417878 4.602950e-21 e <--> e
V[g]      0.399281457 0.039935977  9.998039 1.554445e-23 g <--> g

 Iterations =  59 

---------- snip -----------

Note that the default in cfa() is to use a reference indicator, and that the
solution is improper -- there's a negative estimated error variance, V[d]. 
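Because the model is fitted to a correlation matrix, each indicator's variance is 1, and the R-square values printed above agree (to four decimals) with 1 minus the corresponding error variance. That is why the negative V[d] shows up as an R-square of 1.0084 for d. A small base-R check, using the estimates copied from the output; the vector names `uniq`, `r2`, and `heywood` are our own:

```r
# Error-variance estimates copied from the cfa() output above
uniq <- c(a = 0.815862342, b = 0.303132223, c = 0.182802929,
          d = -0.008353614, e = 0.173057855, g = 0.399281457)

# With unit-variance indicators, R-square = 1 - error variance
r2 <- 1 - uniq
print(round(r2, 4))

# An improper (Heywood) solution has a negative error variance
heywood <- names(uniq)[uniq < 0]
print(heywood)
```

With a fitted object in hand, the same check could be applied to the V[...] entries of the estimates rather than to a hand-copied vector.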

An alternative specification sets the variance of the factor to 1, but then
cfa() fails to converge:

---------- snip -----------
1: F: a, b, c, d, e, g
2: 
Read 1 item
NOTE: adding 6 variances to the model
Path    Parameter StartValue
1  F -> a  lam[a:F]            
2  F -> b  lam[b:F]            
3  F -> c  lam[c:F]            
4  F -> d  lam[d:F]            
5  F -> e  lam[e:F]            
6  F -> g  lam[g:F]            
7  F <-> F <fixed>   1         
8  a <-> a V[a]                
9  b <-> b V[b]                
10 c <-> c V[c]                
11 d <-> d V[d]                
12 e <-> e V[e]                
13 g <-> g V[g]
. . .

Start values:
  lam[a:F]   lam[b:F]   lam[c:F]   lam[d:F]   lam[e:F]   lam[g:F]       V[a]       V[b]       V[c]       V[d]       V[e]       V[g]
0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865 0.56728160 0.23437445 0.19722125 0.09427268 0.23686401 0.24924941

iteration = 0
Step:
 [1] 0 0 0 0 0 0 0 0 0 0 0 0
Parameter:
 [1] 0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865 0.56728160 0.23437445 0.19722125 0.09427268 0.23686401 0.24924941
Function Value
[1] 3.346898
Gradient:
 [1]  0.4583916  0.3957443 -0.2067868 -0.4369468 -0.2629929  0.2431501 -0.5501220 -1.6700002  0.6543088  3.0031327  0.7820309 -1.0122023

. . .

iteration = 21
Parameter:
 [1]  0.44280000  0.68987016  0.99055402  1.15651371  0.99812990  0.75293242  0.82441291  1.01174284  0.01185904 -1.30253783 -0.01183159
[12]  0.71942353
Function Value
[1] -316143
Gradient:
 [1]      83431722     105921661   12975044375    -137927630  -13105242109     162575760     -22404848     -36111801 -541872735153
[10]     -61232522 -552802111412     -85072888

Successive iterates within tolerance.
Current iterate is probably solution.

Warning message:
In eval(expr, envir, enclos) :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

---------- snip -----------

The problem seems ill-conditioned, and in any event the standard errors that
you get using tetrachoric correlations won't be right (I expect you know
that).
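One rough way to see how close the input comes to singularity, an assumption on our part rather than the diagnostic John used, is the 2-norm condition number of the tetrachoric matrix: the ratio of its largest to its smallest eigenvalue. The matrix below is copied from David's example; `kappa(..., exact = TRUE)` is base R's function for the same quantity.

```r
# Tetrachoric correlations from David's example
vars <- c("a", "b", "c", "d", "e", "g")
my.cor <- matrix(c(
  1.0000000, 0.7600616, 0.3653309, 0.4377949, 0.2917927, 0.5133697,
  0.7600616, 1.0000000, 0.6335519, 0.8288809, 0.6223942, 0.6355725,
  0.3653309, 0.6335519, 1.0000000, 0.9098309, 0.9098309, 0.7693395,
  0.4377949, 0.8288809, 0.9098309, 1.0000000, 0.9136967, 0.7829854,
  0.2917927, 0.6223942, 0.9098309, 0.9136967, 1.0000000, 0.7354562,
  0.5133697, 0.6355725, 0.7693395, 0.7829854, 0.7354562, 1.0000000),
  nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

# Eigenvalues of the symmetric matrix, largest first
ev <- eigen(my.cor, symmetric = TRUE, only.values = TRUE)$values
print(ev)

# 2-norm condition number: largest / smallest eigenvalue (valid when all are positive)
k <- max(ev) / min(ev)
print(k)
print(kappa(my.cor, exact = TRUE))  # base R's exact 2-norm condition number
```

A large ratio means that small perturbations of the correlations can move the fitted solution a lot, which is one way an estimation problem becomes ill-conditioned.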

I hope this helps,
 John
#
Dear David,

Oh, I see I forgot to comment on your second specification in my last reply:
. . .
First, this is backwards: the observed variables depend on the factor, and
not vice-versa; e.g., a = gam11*f1. Second, the factor has an error-variance
parameter; it doesn't depend on itself: V(f1) = 1. As I mentioned in my
previous message, it's easier to use cfa() for this kind of model.

Best,
 John
2 days later
#
That is a lot of variables, and a rather small sample size (for binary 
data).
To avoid having to type "var?" 25 times, you can say

ordered=paste("var",1:25,sep="")
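The same paste() idiom can also build the lavaan model syntax, so that neither the `ordered` vector nor the factor definition has to be typed out by hand. A base-R sketch; the factor name `F` and the single-factor structure are assumptions, since the actual model is not quoted in this message:

```r
# Indicator names var1, ..., var25 (paste0 is shorthand for paste(..., sep = ""))
items <- paste0("var", 1:25)
stopifnot(identical(items, paste("var", 1:25, sep = "")))

# One-factor lavaan model string: "F =~ var1 + var2 + ... + var25"
model <- paste("F =~", paste(items, collapse = " + "))
cat(model, "\n")
```

The two objects would then go into a call such as `cfa(model, data = ..., ordered = items)`, with the data frame name left unspecified here.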
Yes!

Does the cfa command
Yes, indeed. You can 'see' it by typing

inspect(fit, "sampstat")

lavaan also computes an asymptotic variance matrix of these 
correlations, so you should get correct standard errors and a correct 
test statistic. By default, lavaan will provide robust standard errors 
and a mean- and variance-adjusted test statistic (estimator="WLSMV").
You can NOT force these to be equal (at least not in the current version 
of lavaan, 0.5-11, where the residual variance is a function of other 
model parameters). I don't think this is caused by a non-positive-definite 
correlation matrix (you should get a big warning if this were the case). 
Perhaps the sample size is too small. Could you remove some items, or regroup them?
This will work only if you omit the 'ordered' argument, perhaps in 
combination with estimator="ULS". But do not trust or report the standard 
errors in this case.
You could do this, and use estimator="ULS". But again, you cannot use 
the standard errors.
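Assuming the questions behind these last two answers were about fitting a precomputed (tetrachoric) correlation matrix rather than raw data, a minimal sketch might look like the following. The helper `fit_uls` is hypothetical; it passes the matrix through lavaan's `sample.cov` and `sample.nobs` arguments with `estimator = "ULS"` and no `ordered` argument, and shows point estimates only, in line with the advice not to use the standard errors.

```r
# Hypothetical helper: one-factor CFA fitted by ULS to a correlation matrix,
# treating the indicators as continuous (no `ordered` argument)
fit_uls <- function(S, n) {
  model <- paste("F =~", paste(colnames(S), collapse = " + "))
  lavaan::cfa(model, sample.cov = S, sample.nobs = n, estimator = "ULS")
}

# Run only if lavaan is installed and my.cor (with dimnames) exists in the session
if (requireNamespace("lavaan", quietly = TRUE) && exists("my.cor")) {
  fit <- fit_uls(my.cor, 200)
  pe <- lavaan::parameterEstimates(fit)
  # Point estimates only; the standard errors should not be reported here
  print(pe[, c("lhs", "op", "rhs", "est")])
}
```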

Yves.
--
http://lavaan.org