Tests for the need of cluster analysis
3 messages · Tal Galili, MARY A. WEISS, Ben Bolker
MARY A. WEISS <mweiss <at> temple.edu> writes:
Hi, I am currently using STATA in my analysis. STATA has a cluster option but does not have any tests for whether cluster analysis is necessary or not for a dataset. So I am trying to figure out whether R could be used to test whether I need to be doing cluster analysis or not. If R does tests to determine whether cluster analysis is valid for my data, I will learn R and use it on my data. My data are panel data consisting of 49 states and 25 years. Currently, I am estimating models with fixed state and time effects. Thanks for any help you can give me. Cheers, Mary
You might want to forward this question to the r-sig-mixed-models list. I think you are fairly far off base in comparing 'prabclus' (spatial clustering) to what Stata means by "clustered standard errors" (e.g. <http://www.stata.com/support/faqs/stat/cluster.html>). Cluster _analysis_ has to do with finding clusters in data, and prabclus uses spatial information to do cluster analysis. Robust cluster variances or standard errors have to do with adjusting the variance/SE to account for predetermined grouping variables ("clusters" in the data, e.g. states).
I don't know offhand whether there are packages in R that implement the "robust cluster variance" estimator. The geeglm function in the geepack package, and especially the "sandwich" package, are definitely worth looking at (they implement the equivalent of robust, but not robust cluster [as far as I can tell], variance estimators), as well as the Econometrics Task View and the book "R for Stata Users" by Muenchen and Hilbe.
A final philosophical note: I don't think you should be testing _based on your data_ whether robust or robust cluster variance estimators are more appropriate; there's a fairly dangerous data snooping issue here. Rather, you should try to decide _a priori_, based on how your data were collected and how they are grouped, what's most appropriate. Ben Bolker
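To make the distinction above concrete, here is a minimal base-R sketch (no add-on packages) of a cluster-robust "sandwich" variance for a two-way fixed-effects regression, clustering on state. Everything in it is an assumption for illustration: the data are simulated on a 49-state, 25-year panel, the names (panel, slope_x, slope_e, vcov_cluster) are hypothetical, and no finite-sample correction is applied, so the numbers need not match what Stata's cluster option reports.

```r
# Simulated balanced panel: 49 states x 25 years (illustrative data only).
set.seed(42)
n_states <- 49
n_years  <- 25
panel <- expand.grid(year = seq_len(n_years), state = seq_len(n_states))

# State-specific trends make both x and the error persistent within a state,
# even after state and year fixed effects are removed.
slope_x <- rnorm(n_states)
slope_e <- rnorm(n_states)
trend   <- panel$year - mean(panel$year)
panel$x <- slope_x[panel$state] * trend + rnorm(nrow(panel))
err     <- slope_e[panel$state] * trend + rnorm(nrow(panel))
panel$y <- 1 + 0.5 * panel$x + err

# Two-way fixed effects via state and year dummies.
fit <- lm(y ~ x + factor(state) + factor(year), data = panel)

# Sandwich pieces: bread = (X'X)^-1, meat = sum over clusters of s_g s_g',
# where s_g is the within-cluster sum of the score rows x_i * u_i.
X      <- model.matrix(fit)
u      <- residuals(fit)
bread  <- solve(crossprod(X))
scores <- X * u                                  # row i of X times residual i
cluster_sums <- rowsum(scores, group = panel$state)
meat   <- crossprod(cluster_sums)
vcov_cluster <- bread %*% meat %*% bread         # no small-sample adjustment

# Compare the standard error on x under the two variance estimators.
j <- match("x", colnames(X))
se_classical <- sqrt(vcov(fit)[j, j])
se_cluster   <- sqrt(vcov_cluster[j, j])
print(c(classical = se_classical, cluster = se_cluster))
```

With errors that are correlated within a state, as simulated here, the clustered standard error on x comes out larger than the classical one, which is one way the two sets of results can differ a lot. The comparison says nothing about which estimator is appropriate; that remains a design decision.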
On Mon, May 2, 2011 at 1:02 PM, Tal Galili <tal.galili <at> gmail.com> wrote:
Hi Mary, Are you using R for your other analysis? If so, what commands are you using for your analysis? P.S.: please keep the rest of the R-help mailing list in the loop. Cheers, Tal
[snip]
[snip] MARY A. WEISS <mweiss <at> temple.edu> wrote:
Hi Tal, Thanks for your answer. I am running models with two-way fixed effects and two-way fixed effects with a cluster option. The results are very different. I wanted to know if it is appropriate to cluster my data or not. In looking through the R manual, I thought that prabclus might help me answer the question. Does prabclus include any tests that will tell me if cluster analysis is appropriate to use with my data? That is, is cluster analysis valid for my data? Thanks in advance for any help you can give me. I really appreciate it. Mary
[snip]
Hi Mary, I'm not sure I understood your question. Are you using this package: <http://cran.r-project.org/web/packages/prabclus/index.html> and asking how to decide whether to use it or not?
Contact Details: Contact me: Tal.Galili <at> gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
On Sun, May 1, 2011 at 7:54 PM, mary weiss <mweiss <at> temple.edu> wrote:
Does R have the capability to perform tests for the need of clustering analysis (e.g., in prabclus)? I am using panel data with two-way fixed effects but am unsure about whether I should be using a cluster option as well to estimate my model.
[snip]