X matrix deemed to be singular and cbind
Soumitro:

Have you read "An Introduction to R"? If not, do so, as some of your confusion appears related to basic concepts (e.g. factors) explained there.

1. Presumably your categorical variables are factors, not character. If so, when you cbind() them, you cbind their integer codes, yielding numerical variables. This produces an incorrect design matrix in fitting: 1 df per categorical variable instead of 1 less than the number of levels. Also see ?cbind.

2. Not using cbind() produces the correct design matrix, but you are overfitting, presumably because of the many different levels of your categorical variables. I suggest you consult with a local statistician to decide how best to handle this, as you seem to be out of your depth with regard to model fitting.

... unless I have misunderstood, of course.

Cheers,
Bert
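A minimal sketch of point 1, assuming the survival package (which provides coxph() and Surv()) and a hypothetical data frame d holding a single simulated five-level factor, risk. The level names are borrowed from the question; the data are invented and nothing here reproduces the original model. It contrasts cbind(), which keeps one integer-code column, with a factor term in the formula, which coxph() expands into one coefficient per non-reference level.

```r
library(survival)  # coxph() and Surv(); recommended package shipped with R

set.seed(42)
n <- 200
lev <- c("VERY LOW", "LOW", "NEUTRAL", "HIGH", "VERY HIGH")

# Hypothetical data: one 5-level categorical covariate, simulated with no real effect.
d <- data.frame(
  time   = rexp(n),
  status = rbinom(n, 1, 0.7),
  risk   = factor(sample(lev, n, replace = TRUE), levels = lev)
)

# cbind() discards the factor class and keeps only the integer codes (1..5):
# a single numeric column, so coxph() would fit one slope (1 df) across the codes.
X_cbind <- cbind(d$risk)
ncol(X_cbind)

# A factor term in the formula is expanded into (levels - 1) treatment dummies;
# coxph() has no intercept, so 5 levels give 4 coefficients.
fit <- coxph(Surv(time, status) ~ risk, data = d)
names(coef(fit))
summary(fit)  # one Wald test per level, each against the reference "VERY LOW"
anova(fit)    # likelihood-ratio test for 'risk' as a whole (4 df)
```

Because the simulated data carry no true effect, the p-values themselves mean nothing here; the point is the shape of the output. With many multi-level factors the number of such coefficients grows quickly, which is where the overfitting concern in point 2 comes in.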
On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey <soumitrodey1 at gmail.com> wrote:
Hi list,
While the "X matrix deemed to be singular" question has been answered on
the list quite a few times, I have a twist to it.
I am using the coxph model for survival analysis on a dataset containing
over 160,000 instances and 46 independent variables and I have 2 scenarios:
1. If I use cbind on the 46 independent variables (many of which are
categorical), coxph runs without any problems. The problem, however, is that it
won't report which levels of the categorical variables (e.g. VERY HIGH, HIGH,
NEUTRAL, LOW or VERY LOW) are actually meaningful/significant (e.g. XHIGH
***, XLOW ., etc.). Is there any way to check this?
2. If I don't use cbind, assuming that will give me the details I am looking
for in the previous step, coxph gives the "X matrix deemed to be
singular" message, more precisely: "X matrix deemed to be singular; variable 130
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
169 170 171 172 173 174 175 176 177 178 179 180 181"
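In coxph() the numbers in that message are, presumably, positions in the expanded coefficient vector rather than among the 46 original variables; since they run well past 46, the factors were expanded into many dummy columns, and the ones listed could not be estimated separately. A minimal base-R sketch of one way to locate such aliased columns before fitting, on hypothetical toy data (d, grade and region are invented for illustration): a pivoted QR decomposition of model.matrix() moves any column that is a linear combination of earlier ones past position rank.

```r
# Hypothetical toy data: 'grade' and 'region' are perfectly confounded
# (each grade occurs in exactly one region), so some dummy columns are
# exact linear combinations of others and cannot be estimated separately.
d <- data.frame(
  grade  = factor(c("LOW", "LOW", "HIGH", "HIGH", "VERY HIGH", "VERY HIGH"),
                  levels = c("LOW", "HIGH", "VERY HIGH")),
  region = factor(c("A", "A", "B", "B", "C", "C"))
)

X <- model.matrix(~ grade + region, data = d)
colnames(X)  # "(Intercept)" "gradeHIGH" "gradeVERY HIGH" "regionB" "regionC"

# Pivoted QR: columns that add nothing beyond earlier ones are moved
# past position q$rank; they correspond to inestimable coefficients.
q <- qr(X)
aliased <- if (q$rank < ncol(X)) {
  colnames(X)[q$pivot[(q$rank + 1):ncol(X)]]
} else {
  character(0)
}
q$rank   # 3 (out of 5 columns)
aliased  # "regionB" "regionC"
```

model.matrix() includes an intercept here; coxph() drops it, but in this toy data regionB and regionC are exact copies of gradeHIGH and gradeVERY HIGH, so they would be aliased there too. On real data, a cross-tabulation of the factors involved, e.g. table(d$grade, d$region), can help show which levels are empty or nested.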
Could anyone please elaborate on how to get around problem #1 or #2?
Thanks!
SD
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm