Is my data set too large?
Aimin Yan wrote:
I have a data set like this.
I want to do glm, but I get this error:
Error in model.matrix.default(mt, mf, contrasts) :
cannot allocate vector of length 932889958
I am wondering if my data set is too large or I did something wrong.
Is there some limitation for data size for R?
thanks,
Aimin
> p1982<- read.csv("p_1982_aa.csv")
> names(p1982)
[1] "p" "aa" "as" "ms" "cur" "sc"
> str(p1982)
'data.frame':	465979 obs. of  6 variables:
 $ p  : Factor w/ 1982 levels "154l_aa","1A0P_aa",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ aa : Factor w/ 19 levels "ALA","ARG","ASN",..: 2 16 4 5 18 3 19 3 2 9 ...
 $ as : num  152.0 15.9 65.1 57.2 28.9 ...
 $ ms : num  108.8 28.3 59.2 49.9 31.8 ...
 $ cur: num  -0.1020 0.2564 0.0312 -0.0550 0.0526 ...
 $ sc : num  92.10 103.67 7.27 72.98 96.12 ...
> attach(p1982)
> m<-glm(sc~p+aa+as+cur,data=p1982)
Error in model.matrix.default(mt, mf, contrasts) :
cannot allocate vector of length 932889958
Your "p" is a factor with 1982 levels, so the design matrix for your model is roughly 500000 x 2000. (The failed allocation, 932889958, is exactly 465979 rows x 2002 columns.) That gives about 1 billion (US) entries of 8 bytes each, so you need about 7.5 GB just to store the design matrix. So either you don't want "p" in the model or you have indeed exceeded your capacity.
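The arithmetic can be checked without building the matrix. The following is a minimal sketch, assuming R's default treatment contrasts (an intercept plus one column per non-reference factor level) and using only the sizes shown in the str() output; the variable names are made up for illustration.

```r
# Sizes copied from the str() output in the post
n_obs <- 465979   # rows in p1982
n_p   <- 1982     # levels of factor p
n_aa  <- 19       # levels of factor aa
n_num <- 2        # numeric predictors: as, cur

# Assumption: default treatment contrasts, so each factor
# contributes (levels - 1) columns, plus one intercept column
n_cols <- 1 + (n_p - 1) + (n_aa - 1) + n_num

# These are doubles, so the product does not overflow
n_cells <- n_obs * n_cols

# Each cell of the design matrix is an 8-byte double
gib <- n_cells * 8 / 2^30

n_cols
n_cells
gib
```

If the assumption holds, n_cells should equal the vector length reported in the error message, which is a quick way to confirm that the allocation that failed was the design matrix itself.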
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907