
estimating NA values against selected slots

5 messages · arun, Eliza Botto

#
Hello Eliza,
I was able to read the data successfully. I hope you were able to replace the NA values with the code I emailed earlier.

Now, the problem with the dput data is that it has only one level of the predictor Year (1962). So when I run the code, I get a warning message:
Warning message:
In predict.lm(lm1, newdata = dat2) :
  prediction from a rank-deficient fit may be misleading

From what I found by googling, collinearity is the cause: with a single value of Year, the predictor is collinear with the intercept. The fitted value will then be a single value, the mean of the Discharge column.
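The rank-deficiency point can be checked on a small example. This is only a sketch: the data frame `toy` and its Discharge values are made up for illustration and are not Eliza's data. It assumes nothing beyond base R.

```r
# Hypothetical toy data: a single Year level, as in the posted dput
toy <- data.frame(Year = rep(1962L, 6),
                  Discharge = c(0.44, 0.35, NA, 0.27, 1.08, 0.65))

fit <- lm(Discharge ~ Year, data = toy)

# Year is constant, so it is aliased with the intercept: its coefficient is NA
coef(fit)

# predict() warns about the rank-deficient fit; every prediction is the same value
pred <- as.numeric(suppressWarnings(predict(fit, newdata = toy)))

# That value is the mean of the observed (non-NA) Discharge values
mean_obs <- mean(toy$Discharge, na.rm = TRUE)
all.equal(pred, rep(mean_obs, nrow(toy)))
```

With a single Year level, the lm approach therefore amounts to replacing each NA with the column mean.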

I guess the code that I emailed you should work (without any warnings) on your original file, which has more levels of Year.

I checked the code with multiple years (replacing your single level of year column) and the Discharge column from your data.

#Checking for error messages. 

dat1 <- structure(list(Year = c(1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L), Discharge = c(0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 
0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 
0.44, 0.44, 0.44, 0.44, 0.35, 0.35, 0.35, 0.27, 0.27, 0.27, 0.27, 
NA, NA, 0.16, 0.16, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 
1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 
1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, NA, 1.08, 1.08, 1.08, 
1.08, 0.65, 0.65, 0.65, NA, 0.44, 0.44, 0.44)), .Names = c("Year", 
"Discharge"), class = "data.frame", row.names = c(NA, -70L))

# Replace the single Year level with five years (14 rows each)
dat4 <- data.frame(Year = rep(1962:1966, each = 14), Discharge = dat1[, 2])
lm4 <- lm(Discharge ~ Year, dat4)
dat5 <- data.frame(Year = dat4[, 1])
dat4$fit <- predict(lm4, newdata = dat5)
# Use the fitted value only where Discharge is missing
dat4 <- within(dat4, {Dischargenew <- ifelse(is.na(Discharge), fit, Discharge)})
dat4new <- dat4[, c(1:2, 4)]
tail(dat4new, 10)
   Year Discharge Dischargenew
61 1966      1.08      1.08000
62 1966      1.08      1.08000
63 1966      1.08      1.08000
64 1966      0.65      0.65000
65 1966      0.65      0.65000
66 1966      0.65      0.65000
67 1966        NA      1.01678
68 1966      0.44      0.44000
69 1966      0.44      0.44000
70 1966      0.44      0.44000



A.K.







----- Original Message -----
From: eliza botto <eliza_botto at hotmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, July 6, 2012 8:29 PM
Subject: Re: [R] estimating NA values against selected slots



Dear Arun
Extremely grateful for your concern...
Here I go again:
structure(list(Year = c(1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 1962L, 
1962L), Discharge = c(0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 
0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 0.44, 
0.44, 0.44, 0.44, 0.44, 0.35, 0.35, 0.35, 0.27, 0.27, 0.27, 0.27, 
NA, NA, 0.16, 0.16, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 
1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 
1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.08, NA, 1.08, 1.08, 1.08, 
1.08, 0.65, 0.65, 0.65, NA, 0.44, 0.44, 0.44)), .Names = c("Year", 
"Discharge"), class = "data.frame", row.names = c(NA, -70L))
regards
eliza


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi Eliza,

No problem. You could also use na.approx from the zoo package, as suggested by Jorge. In that case, the prediction is based on only one column (Discharge). I checked it, and I was getting different values for the NA.

tail(datnewz)
   dat1.Year Discharge
65      1962     0.650
66      1962     0.650
67      1962     0.545
68      1962     0.440
69      1962     0.440
70      1962     0.440

Here, I guess it is the average of the nonmissing values before and after the missing value (#67). The lm approach replaces each missing value with the predicted value for its level of the predictor. I am not sure which of these methods to choose.
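The value for row 67 can be reproduced by plain linear interpolation. The sketch below uses base R's approx() as a stand-in so that it runs without zoo; it is not the na.approx call itself, and the variable names are only for illustration.

```r
# Last six Discharge values from the posted data (rows 65 to 70)
discharge <- c(0.65, 0.65, NA, 0.44, 0.44, 0.44)
idx <- seq_along(discharge)
obs <- !is.na(discharge)

# Linearly interpolate at the missing positions from the observed points
filled <- discharge
filled[!obs] <- approx(idx[obs], discharge[obs], xout = idx[!obs])$y
filled
```

For a single missing value between two observed neighbours, this gives their midpoint, (0.65 + 0.44) / 2 = 0.545. For a run of consecutive NAs (such as rows 30 and 31), linear interpolation spaces the filled values evenly between the neighbours rather than repeating their average.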

Arun