HELP! Excel and R give me totally different regression results using the exact same data

Okay. Sorry for being vague in my earlier message. I had missed a few lines
from your message because they were hiding well in my own email. I am really
on the learning side with this, so it will take some time. Sorry.

There seem to be two issues: (1) Me preparing the data incorrectly and (2)
the data not being fit for regression. Right?
Well, the second point might be more correctly stated as: the data do not meet the conditions for valid inference using linear regression. Since the goals of the exercise have never been stated, it is difficult to say whether other regression methods might be more applicable.
Ad1. Point about header taken. As to using characters in a matrix, I extract
the data from data files from the National Weather Service. I extract
observations together with dates and location names. Each row consists
of date, location and observations.  I chose to store them in matrices
because I can combine them to arrays. A matrix can only have one type of
data, so I chose to leave them all as characters.
That is generally the reason people use data.frames.
When I proceed to do a
regression analysis I transform the observations into numbers using
as.numeric(). Do you have a different suggestion? Will R give me different
results if I store characters in a matrix?
It shouldn't, but it seems unnecessarily convoluted and prone to errors.
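A minimal sketch of the difference, using invented station names and values (nothing here comes from the poster's files): a matrix coerces every cell to character, while a data.frame keeps one type per column, so no as.numeric() round trip is needed before modelling.

```r
# Hypothetical rows: date, location, and two observations (values invented).
dates     <- c("2012/11/01", "2012/11/02", "2012/11/03")
locations <- c("StationA", "StationB", "StationA")
obs1      <- c(1.91, 1.90, 1.93)
obs2      <- c(1.86, 1.90, 1.91)

# A matrix holds a single type, so everything is coerced to character.
m <- cbind(dates, locations, obs1, obs2)
typeof(m)            # "character"

# A data.frame keeps one type per column.
df <- data.frame(date = as.Date(dates, format = "%Y/%m/%d"),
                 location = locations,
                 obs1 = obs1,
                 obs2 = obs2,
                 stringsAsFactors = FALSE)
sapply(df, class)

# Converting the character column back works, but it is an extra step
# that has to be repeated wherever numbers are needed, and as.numeric()
# returns NA (with a warning) for any malformed string.
obs1_from_matrix <- as.numeric(m[, "obs1"])
all.equal(obs1_from_matrix, df$obs1)
```

The numeric results should agree either way; the data.frame simply removes a step where conversion mistakes can creep in.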
Even though such excerpts from a long script aren't very informative, to be
complete:
# collection will be a row of the output matrix later on
collection <- matrix(rep(NA, 25), ncol = 25)
#extract dates

collection[1] <- paste(year, "/", substring(.file, 125, 126), "/", substring(.file, 127, 128), sep = "")
That is only going to change the first element of 'collection'. You should study the help page for "[". If you were changing the first column it would need to be a different call on the LHS.
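To illustrate the distinction, here is a small sketch on a made-up 2 x 3 matrix (not the poster's 'collection' object). On a one-row matrix the first element and the first column happen to coincide, which is probably why the difference went unnoticed.

```r
# A small 2 x 3 matrix for illustration only.
m <- matrix(NA_character_, nrow = 2, ncol = 3)

m1 <- m; m1[1]   <- "x"   # a single element: row 1, column 1
m2 <- m; m2[, 1] <- "x"   # the entire first column
m3 <- m; m3[1, ] <- "x"   # the entire first row

m1
m2
m3
```

See ?"[" for the full set of indexing rules.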
#extract observations
collection[start.write + i] <- substring(input, fields[[i]][1], fields[[i]][2])
Again, possibly not what you thought you were doing. Lack of context prevents further analysis.
Ad2.  You mention heteroscedasticity and non-normality of residuals. To keep
it short I had provided just a subset of the data I have (100 of 4000 matrix
rows). But the same is true for the whole dataset. I attached the whole
thing this time.  test_complete.txt
<http://r.789695.n4.nabble.com/file/n4648759/test_complete.txt>  How do I
deal with this?
str(dat)
'data.frame':	3548 obs. of  5 variables:
 $ V1: num  1.91 1.9 1.93 2.16 1.9 1.87 1.87 2.01 2.8 2.11 ...
 $ V2: num  1.86 1.9 1.91 1.88 1.87 1.88 6.94 2.01 2.03 2.09 ...
 $ V3: num  1.89 1.94 1.9 1.85 1.86 1.88 2.01 2 2.03 2.06 ...
 $ V4: num  1.92 1.96 1.91 1.83 1.85 1.87 2.01 2.03 2.04 2.03 ...
 $ V5: num  2.1 2 1.93 1.92 1.85 1.86 2.02 2.15 2.08 2.03 ...
lm(V1 ~ ., data=dat)
Call:
lm(formula = V1 ~ ., data = dat)

Coefficients:
(Intercept)           V2           V3           V4           V5  
     0.1291       0.3378       0.2079       0.2635       0.1460
summary( lm(V1 ~ ., data=dat))
Call:
lm(formula = V1 ~ ., data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.3116  -0.1825  -0.0304   0.0959  27.0989 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.12906    0.03840   3.361 0.000784 ***
V2           0.33783    0.01768  19.111  < 2e-16 ***
V3           0.20789    0.01686  12.329  < 2e-16 ***
V4           0.26346    0.01784  14.768  < 2e-16 ***
V5           0.14596    0.01672   8.728  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 1.781 on 3543 degrees of freedom
Multiple R-squared: 0.7693,	Adjusted R-squared: 0.7691 
F-statistic:  2954 on 4 and 3543 DF,  p-value: < 2.2e-16
with(dat, plot(V2, V1) )
Hit <Return> to see next plot: 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot.png
Type: image/png
Size: 139409 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121107/ecd2057a/attachment.png>
-------------- next part --------------

There appears to be quite a bit of "structure" in that plot. And a rather similar structure in

with(dat, plot(V3, V1) )
I admit I am pretty clueless in this case. Can I do
meaningful regression at all? (I didn't expect test[,3] to be a good predictor
but had hopes for test[,2].)
What are these data and what are the scientific questions? You appear to think a) I can look over your shoulder and see your display and b) deduce your goals from extremely fragmentary evidence. I have a lower opinion of my ability to accomplish those tasks.
The residuals are definitely not normally distributed.
Not generally the biggest concern. But again you provide no code. Nabble users are unfortunately notorious in rhelp for not reading the Posting Guide, and some do not even seem to understand that rhelp is not Nabble.
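For readers who want the kind of code that would make such a claim checkable, here is a hedged sketch on simulated data (the variables and the variance structure are assumptions, not the thread's dataset). It draws the two usual diagnostic plots: residuals against fitted values, where a funnel shape points to non-constant variance, and a normal Q-Q plot of the residuals.

```r
set.seed(1)                       # simulated data, not the thread's data
n <- 200
x <- runif(n, 1, 3)
y <- 0.5 + 0.8 * x + rnorm(n, sd = 0.2 * x)   # error spread grows with x

fit <- lm(y ~ x)

# Residuals vs fitted: look for a funnel or other systematic pattern.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals.
qqnorm(resid(fit))
qqline(resid(fit))
```

plot(fit, which = 1:2) produces comparable built-in versions of both plots.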
They do not seem to be related to either of the two predictors.
Well, that second outcome would be the expected (even the desired) outcome of a regression, wouldn't it? You would want the relationships to be in the prediction and the residuals to have zero correlations with the predictors.
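This property is easy to check numerically. The sketch below uses simulated data (the coefficients and sample size are arbitrary assumptions): when the model includes an intercept, least-squares residuals are orthogonal to every predictor and to the fitted values, so the correlations are zero up to floating-point error.

```r
set.seed(42)                      # simulated data, for illustration only
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
r   <- resid(fit)

# Each of these is zero apart from rounding error, by construction of
# the least-squares fit.
cor(r, x1)
cor(r, x2)
cor(r, fitted(fit))
```

So a lack of linear association between residuals and predictors says nothing about whether the model is adequate; patterns such as curvature or changing spread are what to look for.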
What is the conclusion from that? 

Thanks for your patience!
I'm rapidly running out of patience, however. Please read the Posting Guide more thoroughly than you appear to have done so far.
--
View this message in context: http://r.789695.n4.nabble.com/HELP-Excel-and-R-give-me-totally-different-regression-results-using-the-exact-same-data-tp4648648p4648759.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
