My suggestion is not to do any predictive modeling. With only 33
samples and about 17,000 genes, the data doesn't support a sensible
and reproducible model. Yes, the literature is saturated with this
type of analysis, but almost none of the examples have any utility in
real life.
Stick to differential expression analysis, investigate the results
statistically and biologically, then design a prospective experiment
with a specific set of genes and a more refined measurement system.
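To make the differential expression step concrete, here is a minimal
sketch in Python of ranking genes by a per-gene one-way ANOVA F
statistic across the three disease groups. Everything in it is an
assumption for illustration: the data are synthetic (200 genes instead
of 17,000), the group labels are made up, and anova_f is a hypothetical
helper. A plain F statistic is only a stand-in; in Bioconductor the
usual route would be a moderated test (e.g. limma) followed by a
multiple-testing correction.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single gene across groups."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    means = {g: mean(xs) for g, xs in groups.items()}
    k, n = len(groups), len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(xs) * (means[g] - grand) ** 2 for g, xs in groups.items())
    # Variation of the samples around their own group mean.
    ss_within = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Synthetic stand-in data (assumption): 200 genes x 33 patients, 3 groups of 11.
random.seed(0)
labels = ["A"] * 11 + ["B"] * 11 + ["C"] * 11
shift = {"A": 0.0, "B": 0.0, "C": 4.0}
expr = []
for gene in range(200):
    # Only the first five genes truly differ (group C is shifted upwards).
    expr.append([random.gauss(shift[g] if gene < 5 else 0.0, 1.0) for g in labels])

f_stats = [anova_f(row, labels) for row in expr]
ranked = sorted(range(len(expr)), key=lambda i: f_stats[i], reverse=True)
print("Top-ranked genes:", ranked[:5])
```

The ranked list is a starting point for the statistical and biological
follow-up, not a result in itself.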
If you are doing this analysis to learn something from the data (as
opposed to generating accurate predictions), a predictive model is one
of the worst ways of going about it.
If you are coerced to do this analysis, stick to linear methods
(regularized LDA, nearest shrunken centroids, etc.) that are less
likely to over-fit, and favor those that have embedded feature
selection.
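For readers who have not met nearest shrunken centroids, the following
is a simplified sketch in Python, not a reference implementation (in R,
the pamr package is the usual choice). Each class centroid is pulled
toward the overall centroid by soft thresholding; genes whose centroids
are shrunk all the way in every class drop out, which is the embedded
feature selection. The function names fit_nsc and predict_nsc, the
synthetic data, and the threshold delta=3.0 are all assumptions for
illustration; in practice delta would be tuned by cross-validation.
Note that the model is refit inside every leave-one-out fold, so the
gene selection never sees the held-out patient.

```python
import math
import random
from statistics import mean, median


def fit_nsc(X, y, delta):
    """X: list of samples (each a list of gene values); y: class labels."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    idx = {k: [j for j, lab in enumerate(y) if lab == k] for k in classes}
    overall = [mean(row[i] for row in X) for i in range(p)]
    cent = {k: [mean(X[j][i] for j in idx[k]) for i in range(p)] for k in classes}
    # Pooled within-class standard deviation per gene, plus a median offset.
    s = [math.sqrt(sum((X[j][i] - cent[k][i]) ** 2 for k in classes for j in idx[k])
                   / (n - len(classes))) for i in range(p)]
    s0 = median(s)
    scale = [si + s0 for si in s]
    shrunk, kept = {}, set()
    for k in classes:
        m_k = math.sqrt(1 / len(idx[k]) - 1 / n)
        shrunk[k] = []
        for i in range(p):
            d = (cent[k][i] - overall[i]) / (m_k * scale[i])
            # Soft threshold: shrink |d| by delta, clipping at zero.
            d_soft = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_soft != 0.0:
                kept.add(i)
            shrunk[k].append(overall[i] + m_k * scale[i] * d_soft)
    prior = {k: len(idx[k]) / n for k in classes}
    return {"centroids": shrunk, "scale": scale, "prior": prior, "kept": sorted(kept)}


def predict_nsc(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def score(k):
        dist = sum(((xi - c) / sc) ** 2
                   for xi, c, sc in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2 * math.log(model["prior"][k])
    return min(model["centroids"], key=score)


# Synthetic stand-in data (assumption): 33 patients x 200 genes, 3 classes.
random.seed(1)
y = ["A"] * 11 + ["B"] * 11 + ["C"] * 11
shift = {"A": -4.0, "B": 0.0, "C": 4.0}
# Only the first five genes carry class information.
X = [[random.gauss(shift[lab] if i < 5 else 0.0, 1.0) for i in range(200)] for lab in y]

full_model = fit_nsc(X, y, delta=3.0)
print("Genes surviving shrinkage:", full_model["kept"])

# Leave-one-out cross-validation with the whole fit repeated in each fold.
correct = 0
for j in range(len(X)):
    model = fit_nsc(X[:j] + X[j + 1:], y[:j] + y[j + 1:], delta=3.0)
    correct += predict_nsc(model, X[j]) == y[j]
accuracy = correct / len(X)
print("LOOCV accuracy:", accuracy)
```

The synthetic signal here is far stronger than anything a real
33-patient study is likely to show, so the accuracy it prints says
nothing about what to expect from real data.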
Max
On Mon, Nov 19, 2012 at 10:16 AM, Peter Kupfer <peter.kupfer at me.com> wrote:
Dear all,
I searched for some classification methods and I have no clue whether I picked the right ones.
My problem: I have a matrix with 17000 rows and 33 columns (genes in rows, patients in columns). The patients are grouped into 3 diseases.
Now I want to classify the patients, and I also want to know which rows are more helpful for the classification than others.
I tried SVM and random forest. Do you think these are the right classification methods? Maybe there are some hints you can give me. I am more familiar with the Bioconductor packages. Furthermore: this was not my field of study in the past, but I want to understand it and I am willing to work my way into it.
It would be amazing if one of the (more) mathematical people could give me a hint.
Thanks and all the best
Peter
PS: I can upload my underlying data if somebody is interested