linear fit function with NA values
Hi,
I could not reproduce the error with the data you provided.
return<- read.table(text="
      ATI      AMU
-1  0.734    9.003
0   0.999    2.001
1   3.097   -1.003
2      NA       NA
3      NA    3.541
",sep="",header=TRUE)
median<- read.table(text="
      ATI      AMU
-1  3.224   -2.003
0   2.999   -1.301
1   1.3     -1.003
2   4.000    2.442
3     -10    4.511
",sep="",header=TRUE)
lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i])})
[[1]]
Call:
lm(formula = return[, i] ~ median[, i])
Coefficients:
(Intercept)  median[, i]
      4.696       -1.231
[[2]]
Call:
lm(formula = return[, i] ~ median[, i])
Coefficients:
(Intercept)  median[, i]
     3.3937      -0.1607
lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i],na.action=na.omit)}) #same as above.
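To check that the default NA handling really uses only the rows where both series are observed, the ATI fit can be compared with a fit on manually filtered rows. This is only a sketch: the names ret_ati, med_ati and ok are made up for illustration, and it assumes options("na.action") is still at its factory-fresh value of na.omit.

```r
# ATI columns copied from the example data above
ret_ati <- c(0.734, 0.999, 3.097, NA, NA)
med_ati <- c(3.224, 2.999, 1.3, 4.000, -10)

# Default na.action (assumed to be na.omit) drops the rows with NA
fit_default <- lm(ret_ati ~ med_ati)

# Same regression on the rows where both values are present
ok <- complete.cases(ret_ati, med_ati)
fit_manual <- lm(ret_ati[ok] ~ med_ati[ok])

nobs(fit_default)                  # number of rows actually used in the fit
all.equal(unname(coef(fit_default)),
          unname(coef(fit_manual)))  # both fits give the same coefficients
```

Only the observations for -1, 0 and 1 enter the ATI regression, which is what was asked for.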
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] stringr_0.6.2  reshape2_1.2.2
loaded via a namespace (and not attached):
[1] plyr_1.8    tools_3.0.1
BTW, it is better to use dput() to share the example dataset.
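As a rough illustration of why dput() helps: it prints R code that rebuilds the object exactly, so the reader does not have to guess at column types or spacing. The object name ret below is hypothetical; the values are the ones from the example.

```r
# Hypothetical name 'ret'; values copied from the example data
ret <- data.frame(ATI = c(0.734, 0.999, 3.097, NA, NA),
                  AMU = c(9.003, 2.001, -1.003, NA, 3.541),
                  row.names = c("-1", "0", "1", "2", "3"))

# Prints a structure(...) call that can be pasted into an email
dput(ret)

# The printed text can be evaluated to get the object back
txt  <- capture.output(dput(ret))
ret2 <- eval(parse(text = txt))
all.equal(ret, ret2)
```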
A.K.
----- Original Message -----
From: iza.ch1 <iza.ch1 at op.pl>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Saturday, July 27, 2013 4:46 PM
Subject: Re: Re: [R] linear fit function with NA values
Hi
Thanks for your hints. I would like to describe my problem better and give an example of the data that I use.
I am conducting an event study and need to compute abnormal returns from daily stock prices. For each stock I have returns covering a period of 8 years. For some days the data are missing, for various reasons. In the Excel file these are just empty cells, but when I convert the data to 'zoo' they become NA. I get something like this:
return
      ATI      AMU
-1  0.734    9.003
0   0.999    2.001
1   3.097   -1.003
2      NA       NA
3      NA    3.541
median
      ATI      AMU
-1  3.224   -2.003
0   2.999   -1.301
1   1.3     -1.003
2   4.000    2.442
3     -10    4.511
I want to regress the first column of return on the first column of median, and the second column of return on the second column of median. When I do
OLS<-lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i])})
I get an error message. I would like my function to omit the NAs. For example, for the ATI returns it should use only the values for -1, 0 and 1 and regress them against the values for the same days from ATI in median, which means it would use only (3.224, 2.999, 1.3).
Is it possible to do this?
Thanks a lot
On 2013-07-27 17:33:30, user arun <smartpink111 at yahoo.com> wrote:
Hi,
set.seed(28)
dat1<- as.data.frame(matrix(sample(c(NA,1:20),100,replace=TRUE),ncol=10))
set.seed(49)
dat2<- as.data.frame(matrix(sample(c(NA,40:80),100,replace=TRUE),ncol=10))
lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i])}) #works because the default setting removes NA
Regarding the options, from ?lm:
na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset.  The
          'factory-fresh' default is 'na.omit'.  Another possible value
          is 'NULL', no action.  Value 'na.exclude' can be useful.
lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.exclude)})
#or
lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.omit)})
lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.fail)})
#Error in na.fail.default(list(`dat2[, i]` = c(54L, 59L, 50L, 64L, 40L,  :
 # missing values in object
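The practical difference between na.omit and na.exclude shows up in the residuals and fitted values, not in the coefficients. A small sketch with made-up vectors x and y (not the poster's data):

```r
# Made-up data with one NA in each variable
x <- c(1, 2, NA, 4, 5)
y <- c(2.1, 3.9, 6.2, NA, 10.1)

fit_omit    <- lm(y ~ x, na.action = na.omit)
fit_exclude <- lm(y ~ x, na.action = na.exclude)

# Both fits use the same three complete rows, so the coefficients agree
all.equal(coef(fit_omit), coef(fit_exclude))

length(residuals(fit_omit))     # residuals for the complete rows only
length(residuals(fit_exclude))  # padded with NA back to the original length
```

na.exclude is therefore convenient when the residuals have to be lined up with the original dates, as in an event study.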
In your case, the error is different. It could be something similar to the case below:
dat1[,1]<- NA
lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.omit)})
#Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
 # 0 (non-NA) cases # here it is different
lapply(seq_len(ncol(dat1)),function(i) {try(lm(dat2[,i]~dat1[,i]))}) #works in the above case. It may not work in your case.
You need to provide a reproducible example to understand the situation better.
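If the cause is indeed a column pair with no complete rows, an alternative to try() is to count the complete pairs before fitting and skip the column otherwise. This is a sketch under that assumption only; fit_columns, min_obs, ret and med are hypothetical names, and the data are made up.

```r
# Made-up data: column B of 'ret' has no observed values at all
ret <- data.frame(A = c(1.2, NA, 0.7, 2.1),
                  B = c(NA_real_, NA_real_, NA_real_, NA_real_))
med <- data.frame(A = c(0.5, 0.9, 1.1, 1.8),
                  B = c(0.3, 0.4, 0.2, 0.6))

# Fit y[, i] ~ x[, i] column by column; return NULL where too few
# complete pairs exist instead of letting lm() stop with an error
fit_columns <- function(y, x, min_obs = 2) {
  lapply(seq_len(ncol(y)), function(i) {
    ok <- complete.cases(y[, i], x[, i])
    if (sum(ok) < min_obs) return(NULL)
    lm(y[ok, i] ~ x[ok, i])
  })
}

fits <- fit_columns(ret, med)
sapply(fits, is.null)  # flags the columns that were skipped
```

The NULL entries keep the list aligned with the original columns, so the skipped stocks can be identified afterwards.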
A.K.
----- Original Message -----
From: iza.ch1 <iza.ch1 at op.pl>
To: r-help at r-project.org
Cc:
Sent: Saturday, July 27, 2013 8:47 AM
Subject: [R] linear fit function with NA values
Hi
Quick question. I am running a regression for each column of two data sets, which means that I get several sets of coefficients as a result. My problem is that the data I use for the regression contain NA. How can I ignore NA in the lm function? I use the following code for the regression:
OLS<-lapply(seq_len(ncol(es.w)),function(i) {lm(es.w[,i]~es.median[,i])})
In response I get
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  all values NA
thanks for help :)