mixtures as outcome variables

1 message · Greg Snow

Below are a couple of ideas/suggestions of things to think about

You might want to include a category for the amount of money not spent
(for a total of 5 possibilities).

Do you have data representing these characteristics, i.e. the predictor
values in a regression-type model?

Starting with some good graphics may help determine and show 
interesting patterns.

The maptools package can read in shapefiles and plot the maps.  You can
download a shapefile with the county boundaries from:
http://www.census.gov/geo/www/cob/co2000.html

Then you could use the symbols function to plot a star in the center of
each county (use get.Pcent from maptools to find the coordinates of the
centers).

Then just look for groups of counties with similar looking stars, or
stars that are different from those close by (I would use the
percentage spent in each category for the lengths of the star spokes).
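
The star idea can be sketched with the base graphics symbols function
alone.  This is only an illustration: the spending figures are random,
the centroid coordinates (cx, cy) are made up rather than taken from a
shapefile, and all object names are hypothetical.

```r
# Hypothetical data: 6 counties, dollars in 4 categories plus unspent money
set.seed(1)
spend <- matrix(runif(6 * 5, 10, 100), nrow = 6,
                dimnames = list(paste0("county", 1:6),
                                c("A", "B", "C", "D", "unspent")))

# Convert dollars to the proportion spent in each category (rows sum to 1)
props <- spend / rowSums(spend)

# Made-up centroid coordinates; real ones would come from the county map
cx <- c(1, 2, 3, 1, 2, 3)
cy <- c(1, 1, 1, 2, 2, 2)

# One star per county; each spoke length is one category's proportion
symbols(cx, cy, stars = props, inches = 0.4,
        xlab = "x", ylab = "y", main = "Spending mix by county")
text(cx, cy - 0.3, rownames(props), cex = 0.7)
```

On a real map the same symbols call could be given add = TRUE after the
county boundaries have been drawn.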

Another graph that may prove interesting is the trilinear plot (see the
article in Chance from the summer of 2002).  Combine your categories
into 3 groups (e.g. A&B vs. C&D vs. not spent; or A vs. B vs. all
others) then plot each county's spending on the trilinear plot
(functions to do the plot are: triangle.plot in ade4, triplot in klaR,
or I have some code that I wrote (not on CRAN yet)).

Look for clusters of counties in these plots.
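
For a feel of what the trilinear plot does, here is a minimal base R
sketch that does not use ade4 or klaR.  It assumes the A&B vs. C&D vs.
not spent grouping, uses random numbers in place of real spending, and
tern_xy is a hypothetical helper written for this example.

```r
# Hypothetical spending for 58 counties in 4 categories plus unspent money
set.seed(2)
spend <- matrix(runif(58 * 5, 10, 100), ncol = 5,
                dimnames = list(NULL, c("A", "B", "C", "D", "unspent")))

# Combine the categories into 3 groups
grp <- cbind(AB      = spend[, "A"] + spend[, "B"],
             CD      = spend[, "C"] + spend[, "D"],
             unspent = spend[, "unspent"])

# Map 3-part compositions onto an equilateral triangle:
# group 1 -> (0, 0), group 2 -> (1, 0), group 3 -> (0.5, sqrt(3)/2)
tern_xy <- function(p) {
  p <- p / rowSums(p)                     # rescale each row to sum to 1
  cbind(x = p[, 2] + p[, 3] / 2,
        y = p[, 3] * sqrt(3) / 2)
}

xy <- tern_xy(grp)
plot(xy, asp = 1, axes = FALSE, xlab = "", ylab = "",
     xlim = c(0, 1), ylim = c(0, sqrt(3) / 2))
polygon(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2))
text(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2), colnames(grp),
     pos = c(1, 1, 3), xpd = NA)
```

A county that put everything into one group lands on that group's
corner; counties with similar mixes plot close together.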
Here are a couple of thoughts (there may be better options).

Assuming that you have some predictor (x) variables about each county:

Use the multinom function in the nnet package, the idea being that each
dollar spent follows a multinomial with certain probabilities as to
which category it will be spent in and the predictors tell you what the
probabilities are.
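
A minimal sketch of the multinom idea, assuming a single predictor and
simulated data (income, dollars and the coefficients used to simulate
them are all made up).  multinom accepts a matrix response whose
columns are counts for each category, so here each county contributes a
row of dollar counts.

```r
library(nnet)

set.seed(42)
n <- 58
income <- rnorm(n)   # hypothetical county-level predictor

# Simulate spending: category probabilities depend on income
eta <- cbind(0, 0.5 * income, -0.5 * income, 0.2 + 0 * income)
p   <- exp(eta) / rowSums(exp(eta))
dollars <- t(sapply(seq_len(n), function(i) rmultinom(1, 1000, p[i, ])))
colnames(dollars) <- c("A", "B", "C", "D")

# Matrix response: each row holds the dollars counted in each category
fit <- multinom(dollars ~ income, trace = FALSE)

coef(fit)          # log-odds of B, C, D relative to A
head(fitted(fit))  # estimated spending proportions per county
```

Treating every dollar as an independent draw probably overstates the
amount of information, so the reported standard errors should be read
with caution.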

Similarly you could use package rpart to do a tree model, use the
category as the outcome and the percentage spent on the category as the
weights (each county would be spread across 4 or 5 lines of the dataset
with the predictors replicated on each line).  rpart gives the
probabilities/proportions for each category based on splits of the
predictor variables.
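
The data layout for the tree model can be sketched as follows.  The
predictor (urban), the simulated proportions and the object names are
hypothetical; the point is only the long format with one line per
county and category, and the share used as the case weight.

```r
library(rpart)

set.seed(7)
n     <- 58
urban <- runif(n)   # hypothetical county-level predictor
cats  <- c("A", "B", "C", "D", "unspent")

# Simulated spending mix that shifts with the predictor
raw <- cbind(A = 1 + 3 * (urban > 0.5), B = 2,
             C = 1 + 2 * (urban <= 0.5), D = 1, unspent = 1) *
       matrix(runif(n * 5, 0.8, 1.2), n)
props <- raw / rowSums(raw)

# Long format: each county appears on 5 lines, predictors replicated
long <- data.frame(
  county   = rep(seq_len(n), times = 5),
  urban    = rep(urban, times = 5),
  category = factor(rep(cats, each = n), levels = cats),
  share    = as.vector(props)   # column-major, matches rep(cats, each = n)
)

# Category is the outcome, proportion spent is the weight
fit <- rpart(category ~ urban, data = long, weights = share,
             method = "class")

# Estimated category proportions for two hypothetical counties
pr <- predict(fit, newdata = data.frame(urban = c(0.2, 0.8)),
              type = "prob")
pr
```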

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center
greg.snow at ihc.com
(801) 408-8111