Factor analysis of categorical or mixed categorical/continuous data in [R]

9 messages · root, Matthias Burger, Douglas Bates +4 more

#
I am looking to fit one or more latent categorical variables to data that is
a mixture of categorical and continuous variables. Factor analysis would
work for continuous data, latent class analysis for categorical data. I
understand that in a package such as MPlus I could perform a single analysis
of both data types. Are there similar routines available in R?

Stuart

-----Original Message-----
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
To: Dr Stuart Leask <stuart.leask at nottingham.ac.uk>
Cc: r-help at stat.math.ethz.ch <r-help at stat.math.ethz.ch>
Date: 21 February 2002 10:53
Subject: [R] Re: Factor analysis of categorical or mixed
categorical/continuousdata in [R]

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Dear R-help group,

I have a two-way ANOVA with two crossed
random factors, no nesting. Each factor has
three levels, resulting in 9 cells for the
experiment. Each cell contains 10 repetitions.
According to the ANOVA model I assume equal
variances for all levels per factor.

I would like to get REML-estimates for the
variances of the two factors and moreover
get confidence intervals for these estimates,
so the use of the nlme-package seems to be
a good idea.

My problem in the first place is to formulate
the model itself for the lme-function.
The fixed part would at most consist of
the intercept, resulting in
	fixed= response ~ 1
and the random part would be
	random = ~ a + b
but I have no idea what my grouping factor
there should be.
Could somebody please point me in the
right direction ?

Sorry if this turns out to be an extremely
simple question, I'm a newbie to R ...

Many greetings,

	Susanne

----

Susanne Schwenke
 
Epigenomics AG          www.epigenomics.com           Kastanienallee 24
+4930243450                                              10435 Berlin
#
Susanne Schwenke <ml-r-help at epigenomics.com> writes:
The answer to your question is not "extremely simple".  It happens
that the lme function is much better suited to nested random effects
than to crossed random effects.  To estimate crossed random effects
you must create an awkward formulation with a grouping factor that has
one level and the random effects model matrix based on the indicators
for factor a and the indicators for factor b.  These two sets of
random effects each have variance-covariance matrices that are
multiples of an identity and they are grouped together as a
block-diagonal matrix.  The whole formulation is

 lme(response ~ 1, data = myData, 
     random = pdBlocked(list(pdIdent(~ a - 1), pdIdent(~ b - 1))))

and myData must be a groupedData object with a grouping factor that
has only one level.
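
For the 3 x 3 design with 10 replicates per cell described in the question, the formulation above might be set up roughly as follows. This is only a sketch on simulated data: the data frame `d`, the dummy grouping column `one`, and the simulated effect sizes are assumptions made for illustration, and the one-level grouping factor is supplied through a named list in `random` rather than through a groupedData object.

```r
library(nlme)

set.seed(1)
# Simulated stand-in for the design: factors a and b with 3 levels each,
# 10 replicates per cell (90 observations). All values are made up.
d <- expand.grid(rep = 1:10, a = factor(1:3), b = factor(1:3))
a_eff <- rnorm(3, sd = 1)      # hypothetical random effects for a
b_eff <- rnorm(3, sd = 0.5)    # hypothetical random effects for b
d$response <- 5 + a_eff[as.integer(d$a)] + b_eff[as.integer(d$b)] +
  rnorm(nrow(d), sd = 0.3)

# Dummy grouping factor with a single level, so that all observations
# fall into one group and a and b enter as crossed random effects.
d$one <- factor(rep(1, nrow(d)))

fit <- lme(response ~ 1, data = d,
           random = list(one = pdBlocked(list(pdIdent(~ a - 1),
                                              pdIdent(~ b - 1)))))
print(summary(fit))

# Approximate confidence intervals for the variance components.
# With only three levels per factor this step can fail, hence try().
ci <- try(intervals(fit, which = "var-cov"), silent = TRUE)
if (!inherits(ci, "try-error")) print(ci)
```

The standard deviations reported under the two "Multiple of an Identity" blocks are the estimates for a and b; with so few levels per factor they are necessarily imprecise.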

There is an example of fitting crossed random effects in section 4.2
of Pinheiro and Bates (2000) "Mixed-effects Models in S and S-PLUS",
Springer.  That example happens to have two blocks and a random effect
for the block is added.  If we fit a single block it would look like
Loading required package: nls
logDens ~ 1 | Block
Block  sample dilut     logDens        
 2:30   a:10   1:12   Min.   :-0.23319  
 1:30   b:10   2:12   1st Qu.: 0.08404  
        c:10   3:12   Median : 0.31183  
        d:10   4:12   Mean   : 0.27300  
        e:10   5:12   3rd Qu.: 0.51175  
        f:10          Max.   : 0.67549
+ random = pdBlocked(list(pdIdent(~ sample - 1), pdIdent(~ dilut - 1))))
Linear mixed-effects model fit by REML
 Data: B1 
        AIC       BIC   logLik
  -42.21756 -36.74837 25.10878

Random effects:
 Composite Structure: Blocked

 Block 1: samplea, sampleb, samplec, sampled, samplee, samplef
 Formula: ~sample - 1 | Block
 Structure: Multiple of an Identity
           samplea    sampleb    samplec    sampled    samplee    samplef
StdDev: 0.09128746 0.09128746 0.09128746 0.09128746 0.09128746 0.09128746

 Block 2: dilut1, dilut2, dilut3, dilut4, dilut5
 Formula: ~dilut - 1 | Block
 Structure: Multiple of an Identity
           dilut1    dilut2    dilut3    dilut4    dilut5   Residual
StdDev: 0.2903284 0.2903284 0.2903284 0.2903284 0.2903284 0.05280506

Fixed effects: logDens ~ 1 
                Value Std.Error DF  t-value p-value
(Intercept) 0.2847687 0.1354251 29 2.102776  0.0443

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.3867399 -0.3489126  0.0328683  0.3878895  2.0134090 

Number of Observations: 30
Number of Groups: 1 
#
At 11:18 AM 2/21/2002 +0000, Dr Stuart Leask wrote:
Dear Stuart,

If memory serves me, a common approach is to use tetrachoric correlations 
(for dichotomous data), polychoric correlations (for ordered-category 
data), and point-biserial and polyserial correlations (for mixed data). If 
you want to do inference, then this approach gets complicated (requiring 
asymptotic sampling covariances for the correlations), but for a 
descriptive factor analysis, it should be reasonably straightforward.

I'm not aware of any facility for calculating these kinds of correlations 
in R, but programming them shouldn't be too hard. I may add this at some 
point to the sem package.
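
As a rough illustration of how such a correlation could be programmed, here is a minimal two-step sketch of a tetrachoric correlation in base R. The function names `pbvnorm` and `tetrachoric` are made up for this example, the bivariate normal probability is obtained by one-dimensional numerical integration rather than from a dedicated routine, and the data are simulated.

```r
# P(X <= a, Y <= b) for a standard bivariate normal with correlation rho,
# computed by integrating the conditional probability over x.
pbvnorm <- function(a, b, rho) {
  integrand <- function(x) dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2))
  integrate(integrand, lower = -Inf, upper = a)$value
}

# Two-step estimate from a 2 x 2 table of counts: the cutpoints come from
# the margins, then rho is chosen to maximise the multinomial likelihood.
tetrachoric <- function(tab) {
  n <- sum(tab)
  a <- qnorm(sum(tab[1, ]) / n)   # cutpoint of the row variable
  b <- qnorm(sum(tab[, 1]) / n)   # cutpoint of the column variable
  counts <- c(tab[1, 1], tab[1, 2], tab[2, 1], tab[2, 2])
  negloglik <- function(rho) {
    p11 <- pbvnorm(a, b, rho)
    p <- c(p11, pnorm(a) - p11, pnorm(b) - p11,
           1 - pnorm(a) - pnorm(b) + p11)
    -sum(counts * log(pmax(p, 1e-12)))  # guard against log(0)
  }
  optimize(negloglik, interval = c(-0.99, 0.99))$minimum
}

# Simulated check: dichotomise two latent normals with correlation 0.6.
set.seed(123)
n <- 5000
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)
tab <- table(x > 0.3, y > -0.2)   # row/column 1 = below the cutpoint
rho_hat <- tetrachoric(tab)
print(rho_hat)
```

A matrix of such pairwise estimates could then be handed to an ordinary factor analysis routine for the descriptive use mentioned above.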

I hope that this helps,
  John

#
John Fox <jfox at mcmaster.ca> writes:
On the face of it (which is as far as I am able to see), it would seem
fairly easy to set up an MLE procedure if you treat all discrete
variables as obtained by setting cutpoints on continuous latent
variables. I suspect this is what MPlus is doing. The requisite normal
integrals should be available through library(mvtnorm).
#
On 21 Feb 2002, Peter Dalgaard BSA wrote:

            
My understanding is that the optimisation problem is difficult (even
worse than factor analysis).

OTOH it seems that L-BFGS-B in optim() can work miracles -- I've been
working on another problem related to factor analysis and optim() fits a
model with 4000 constrained parameters in a couple of minutes and gets the
right answer (in stark contrast to my prior attempts to write an optimiser
more tuned to the specific problem).
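
To give a small flavour of this kind of fit (not the 4000-parameter problem mentioned above), here is a sketch that estimates the correlation between a continuous variable and a three-category ordinal variable by treating the latter as a latent normal variable cut at two thresholds, using L-BFGS-B with box constraints. The data are simulated, and the parametrisation (first threshold plus a positive increment) is just one convenient assumption for keeping the thresholds ordered.

```r
set.seed(42)
n <- 2000
rho_true <- 0.5
x <- rnorm(n)                                   # observed continuous variable
ystar <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)  # latent variable
y <- cut(ystar, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)  # codes 1..3

z <- (x - mean(x)) / sd(x)                      # standardised continuous variable

# par = (rho, tau1, delta), with the second threshold tau2 = tau1 + delta.
# Conditional on z, the latent variable is N(rho * z, 1 - rho^2).
negloglik <- function(par) {
  rho <- par[1]
  tau <- c(-Inf, par[2], par[2] + par[3], Inf)
  s <- sqrt(1 - rho^2)
  p_hi <- pnorm((tau[y + 1] - rho * z) / s)
  p_lo <- pnorm((tau[y] - rho * z) / s)
  -sum(log(pmax(p_hi - p_lo, 1e-12)))           # guard against log(0)
}

# Box constraints keep rho inside (-1, 1) and the increment positive.
fit <- optim(c(0, -0.1, 0.5), negloglik, method = "L-BFGS-B",
             lower = c(-0.99, -5, 0.01), upper = c(0.99, 5, 10))
est <- c(rho = fit$par[1], tau1 = fit$par[2], tau2 = fit$par[2] + fit$par[3])
print(est)
```

No gradient is supplied here, so optim() falls back on finite differences; for larger models an analytic gradient would matter much more.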


	-thomas

#
Dear Peter,
At 05:27 PM 2/21/2002 +0100, Peter Dalgaard BSA wrote:
Indeed, this is what tetrachoric, etc., correlations do.

John


#
On Thu, 21 Feb 2002, John Fox wrote:

            
Well, I'm confused.  Stuart said he wanted a latent *categorical*
variable, although that is not what factor analysis assumes, and John's
approach is presumably to do factor analysis and get continuous latent
variables out.

There really are tens of possible approaches even for continuous latent
variables and ordered categorical manifest variables.  All seem to boil
down to some algebra, some integration (perhaps numerical) and a good
constrained optimizer.  Given that the latent variable must affect the
manifest variable non-linearly, there are many possible links.
(Consider for example voter agreement variables, where `folding' can
occur.)

That's why I asked for *precise* details ....
Like factor analysis, finding good estimates is not an easy task, and
there are usually myriad local optima.
#
Dear Brian,
At 05:16 PM 2/21/2002 +0000, Prof Brian Ripley wrote:

            
Quite right -- I didn't read the original question carefully enough.

Sorry,
  John
