Linear Model with Discrete Data
On Jun 13, 2013, at 2:21 PM, Bert Gunter wrote:
> Lorenzo:
>
> 1. This is a statistics question, not an R question.
>
> 2. Your statistical background appears inadequate -- it looks like
> Poisson regression, which would fall under "generalized linear
> models". But it depends on how "discrete" discrete is (on some level,
> all measurements are discrete, discretized to the resolution of the
> measurement process).
There is an excellent R vignette on handling count data by Achim Zeileis, Christian Kleiber, and Simon Jackman. It is easy to find with a Google search. There is also a somewhat older but possibly useful resource: a set of worked S/R examples by Laura Thompson, written to accompany Agresti's text on categorical data. Also easy to find on Google.
David.

>
> 3. So I would advise seeking local statistical help. Getting
> statistical advice remotely over the internet (even on a proper forum
> for statistical advice, which this is not) is fraught with hazard and
> the risk of bad science (not due to incompetence or maliciousness;
> just due to the possibilities of misunderstanding and confusion) --
> imho only, of course.
>
> Of course, feel free to reject this and proceed at your own risk.
>
> Cheers,
> Bert
>
> On Thu, Jun 13, 2013 at 1:49 PM, Lorenzo Isella
> <lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> I am struggling with a linear model and an allegedly trivial data set.
>> The data set does not consist of categorical variables, but rather of
>> numerical discrete variables (essentially, they count the number of times
>> that something happened).
>> Can I still use a standard linear regression, i.e. something like lm(y~x)?
>> I attach a small snippet that illustrates the difficulties that I am
>> experiencing (I do not understand why R complains about a list()).
>> Any suggestion is appreciated.
>> The data file can be downloaded from
>>
>> http://db.tt/hEKv1wH2
>>
>> Cheers
>>
>> Lorenzo
>>
>> #####################################
>>
>> data <- read.csv("testData.csv", header=TRUE)
>>
>> data <- subset(data,select= -c (X100, X182))
>>
>> y <- data$X358
>>
>> z <- subset(data, select=-c(X358))
>>
>> myLM <- lm(y~z)
>>
>> #####################
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
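On the list() complaint in the quoted snippet: a likely cause is that z is a data frame, and a data frame is a list of columns, so it cannot stand as a single term on the right-hand side of a formula. The sketch below is only an illustration under assumptions: the data are simulated because testData.csv is not reproduced here, and the predictor names X10 and X20 are hypothetical stand-ins for whatever columns the real file holds.

```r
# Simulated stand-in for testData.csv (assumption: the real file is not available here).
# X358 is the response named in the original snippet; X10 and X20 are hypothetical.
set.seed(1)
data <- data.frame(X358 = rpois(50, 4),
                   X10  = rpois(50, 3),
                   X20  = rpois(50, 2))

# Reproduce the problem: z is a data frame, i.e. a list of columns
y <- data$X358
z <- subset(data, select = -c(X358))
is.list(z)                                   # TRUE
res <- try(lm(y ~ z), silent = TRUE)         # fails: a list is not a valid model variable
inherits(res, "try-error")                   # TRUE

# One way around it: keep everything in one data frame and let the formula
# name the columns. "." stands for all columns of `data` other than the response.
myLM <- lm(X358 ~ ., data = data)
coef(myLM)
```

Passing the data frame through the data argument also keeps the column names attached to the coefficients, which makes the output easier to read. Whether lm() is the right model for counts at all is the separate, statistical question raised above.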
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

David Winsemius
Alameda, CA, USA
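
For readers following the Poisson regression suggestion in this thread, here is a minimal sketch of how a count response might be fitted with glm() next to the lm() call from the original question. Everything in it is an assumption made for illustration: the data are simulated, the variable names x and y are hypothetical, and the true coefficients (0.2 and 0.3 on the log scale) are chosen arbitrarily. It is not a recommendation for the poster's actual data.

```r
# Simulated count data (assumption: log-linear mean, arbitrary coefficients)
set.seed(42)
n <- 200
x <- rpois(n, lambda = 3)                      # a discrete predictor
y <- rpois(n, lambda = exp(0.2 + 0.3 * x))     # a count response
d <- data.frame(x = x, y = y)

# Ordinary linear regression, as in the original question
fit_lm <- lm(y ~ x, data = d)

# Poisson regression: a generalized linear model with a log link
fit_glm <- glm(y ~ x, family = poisson(link = "log"), data = d)

summary(fit_glm)
exp(coef(fit_glm))     # multiplicative effect on the expected count per unit of x

# Rough overdispersion check: values far above 1 suggest the Poisson
# variance assumption may not hold for the data at hand
sum(residuals(fit_glm, type = "pearson")^2) / df.residual(fit_glm)
```

If the dispersion ratio is well above 1, alternatives such as family = quasipoisson or a negative binomial model are usually worth considering; which model is appropriate depends on the data and the question, which is the point made earlier in the thread about seeking statistical advice.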