
How can I use IPF function correctly?

2 messages · Miao Zhang, David L Carlson

It is not clear what you are trying to do. The ipf() function you are using
seems to be the one in package cat, which imputes missing values for
categorical variables. You have not read its instructions carefully: you
entered the marginal values rather than the dimensions that define the
margins, and you gave ipf() a two-way table but misspecified a three-way
model. No wonder it is confused. Function loglin(), part of the stats
package included with R, also does iterative proportional fitting.

Iterative proportional fitting (ipf) is used for fitting models for
categorical data when there are three or more variables. There is no need
for ipf on a table with two variables, since the fitted values can be
calculated directly.
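
As a rough illustration of that distinction, here is a minimal sketch in R.
The two-way table `tab` is made up for the example (it is not the poster's
data), and the three-way case uses the HairEyeColor table that ships with R.
For two variables the independence fit is just row total times column total
over the grand total; for three variables with all two-way margins there is
no closed form, so loglin() has to iterate.

```r
# Hypothetical 3 x 2 table of counts (made up for illustration).
tab <- matrix(c(20, 30, 50, 40, 35, 25), nrow = 3, ncol = 2)

# Two-way independence model: fitted values computed directly.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# The same model fitted by loglin(), margins 1 (rows) and 2 (columns).
fit2 <- loglin(tab, margin = list(1, 2), fit = TRUE, print = FALSE)$fit
same_fit <- isTRUE(all.equal(expected, fit2, check.attributes = FALSE))

# Three-way table, model with all two-way margins but no three-way term.
# Here the fit has no closed form and is obtained by iteration.
fit3 <- loglin(HairEyeColor,
               margin = list(c(1, 2), c(1, 3), c(2, 3)),
               fit = TRUE, eps = 1e-6, iter = 50, print = FALSE)$fit

# The fitted table reproduces the observed Hair x Eye margin.
max_dev <- max(abs(apply(fit3, c(1, 2), sum) -
                   apply(HairEyeColor, c(1, 2), sum)))
```

Note that the margin argument lists which dimensions to fit (1, 2, ...), not
the marginal totals themselves.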

Your example data do not include the raw counts (as they should), but
percentages for each of the 3 x 2 cells (I assume, since they sum to 100).
The marginal values you list (again percentages) are for a model assuming
equal margins. That is easily computed as 1/3 * 1/2 * 100 (one third in each
row by one half in each column, times 100), so each cell should be 16.667
percent of the total. Fitting that model with loglin() gives the following
output:
0 iterations: deviation  
$lrt
[1] 25.87661

$pearson
[1] 23.80933

$df
[1] 5

$margin
[1] 0

$fit
         [,1]     [,2]
[1,] 16.66667 16.66667
[2,] 16.66667 16.66667
[3,] 16.66667 16.66667

The lrt and pearson statistics are not valid because you are not using the
original counts. Note that the number of iterations is 0 because in a two-way
model the values are computed directly.
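
To make both points concrete, here is a small sketch in R. The percentages in
`pct` are invented for the example (the original data are not reproduced
here), and `pearson()` is a helper written just for this illustration. It
computes the equal-cell fit directly and then shows that the Pearson
statistic depends on the sample size, which is why a statistic computed from
percentages cannot be interpreted.

```r
# Hypothetical 3 x 2 table of percentages; the cells sum to 100.
pct <- matrix(c(30, 10, 20, 15, 5, 20), nrow = 3, ncol = 2)

# Equal-margin model: every cell gets 1/3 * 1/2 of the total.
fit_pct <- matrix(sum(pct) * (1 / 3) * (1 / 2), nrow = 3, ncol = 2)
fit_pct[1, 1]  # 16.66667

# Pearson chi-square statistic (illustrative helper, not from any package).
pearson <- function(obs, fit) sum((obs - fit)^2 / fit)

# Same proportions, but pretend the real sample size was 1000, not 100.
counts <- pct * 10
fit_counts <- matrix(sum(counts) / length(counts), nrow = 3, ncol = 2)

# The statistic scales with N: ten times the sample, ten times the value.
ratio <- pearson(counts, fit_counts) / pearson(pct, fit_pct)
ratio  # 10
```

The fitted proportions are the same in both cases; only the test statistics
change, so the real counts are needed to test the model.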

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352