Hi, My question is not directly related to R, but rather a basic question about statistics. I am hoping to receive valuable insights from the expert statisticians in this group. In some cases, when fitting a simple OLS regression, I obtain an estimated beta coefficient that is very small (for example, 0.00034) yet it still appears statistically significant based on the p-value. I am trying to understand how to interpret such a result in practical terms. From a magnitude perspective, such a small coefficient would not be expected to meaningfully affect the predicted response value, but statistically it is still considered significant. I would greatly appreciate any insights or explanations regarding this phenomenon. Thanks for your time.
Correct interpretation of a regression coefficient
13 messages · Sorkin, John, Michael Dewey, Brian Smith +6 more
Brian, Statistical significance and biological importance are two very different concepts. A finding can be statistically significant but of no biological importance, or of biological importance but not statistically significant. In the first case I would report the finding but indicate in my discussion that it is of no biological importance. In the second case, I would think about the finding and ask whether my sample size was adequate. The path taken beyond this takes a good deal of thought. If the sample size was small, I would consider redoing the experiment with a larger sample. If the sample size is adequate, I would entertain the thought that the finding represented random variation and would consider redoing the experiment in a different population. I hope this helps. John John David Sorkin M.D., Ph.D. Professor of Medicine, University of Maryland School of Medicine; Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center; Former PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center; Senior Statistician University of Maryland Center for Vascular Research; Division of Gerontology, Geriatrics and Palliative Medicine, 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 Cell phone 443-418-5382
From: R-help <r-help-bounces at r-project.org> on behalf of Brian Smith <briansmith199312 at gmail.com>
Sent: Sunday, March 8, 2026 7:50 AM
To: Ralf Goertz via R-help <r-help at r-project.org>
Subject: [R] Correct interpretation of a regression coefficient
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Dear Brian You have not given us much to go on here, but the problem is often related to the scale of the variables. So if the coefficient is per year, try re-expressing time in months, weeks, or days. Michael
Michael Dewey
Hi Michael, You made an interesting point that the scale of the underlying variable may be vastly different from that of the other variables in the equation. Could I use the logarithm of that variable instead of the raw values? Another possibility is to standardise the variable. But IMO, for out-of-sample prediction, the interpretation of a standardised variable is not straightforward.
Scale can be quite important to success in applying algorithms. Various of the nlsr, optim and optimx routines provide for scaling, and though I'm the maintainer / part creator of some of these, my preference is to build the scale into my (nonlinear) models. While not all algorithms are affected in the same way, my experience is that setting the model up so the parameters fall in the range 0.1 to 10 does make things a bit easier to read and to gauge "strange" results. Sorry this isn't more prescriptively helpful. JN
The `predict` method for regressions typically offers estimates of the "confidence interval" and the "prediction interval" as separate calculations. The former characterizes how well your regression estimates the systematic behavior of your data, while the latter addresses how precisely the regression can predict specific values. You should research the pitfalls of p-values... they tell you a lot about repeatability but little about practical importance. You should also not just think "differences in scale -> log transformation". Logarithms "straighten out" data that is intrinsically exponential; if that is how your data behaves, then logarithms will help you linearize the analysis. There is plenty of data arising from non-exponential processes for which a log will probably not help.
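As a small sketch of that distinction (all data and names here are simulated, invented purely for illustration), the two intervals that `predict.lm` returns can be compared directly:

```r
## Confidence vs. prediction intervals from predict() on an lm fit
set.seed(1)
d <- data.frame(x = 1:50)
d$y <- 2 + 0.5 * d$x + rnorm(50)          # simulated linear data
fit <- lm(y ~ x, data = d)
new <- data.frame(x = 25)
## Interval for the *mean* response at x = 25
ci <- predict(fit, new, interval = "confidence")
## Interval for a *single new observation* at x = 25
pr <- predict(fit, new, interval = "prediction")
ci
pr
## The prediction interval is necessarily the wider of the two,
## because it adds the residual noise to the uncertainty in the fit.
```

The widths differ because the prediction interval includes the residual variance on top of the sampling uncertainty of the fitted line.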
-- Sent from my phone. Please excuse my brevity.
You have run into the fact that "is it there" (statistical significance) and "does it matter" (practical significance) are two different things. As to whether the coefficient is of practical significance, looking at its magnitude alone is not the way to go. A small coefficient multiplying a large variable can have a large result, just as a large coefficient multiplying a small variable can have a small result. One thing you can do is to fit your model and then use the ?drop1 function to see the effect of dropping each coefficient from the model. You might even use ?step with direction="backward" for this if there are other coefficients you think might be unnecessary.
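A simulated sketch of that point (variable names and numbers invented for illustration): a coefficient near 0.0003 on a predictor with a large spread can still carry most of the fit, and `drop1()` makes that visible.

```r
## A tiny coefficient on a large-scale predictor can still matter.
set.seed(42)
d <- data.frame(big   = rnorm(200, sd = 10000),  # large-scale predictor
                small = rnorm(200))              # unit-scale predictor
d$y <- 0.0003 * d$big + d$small + rnorm(200)
fit <- lm(y ~ big + small, data = d)
coef(fit)["big"]          # around 0.0003: a "small" coefficient
drop1(fit, test = "F")    # but dropping 'big' ruins the fit badly
```

Here `big` contributes a standard deviation of about 0.0003 * 10000 = 3 to the response, far more than the unit-scale noise, so the tiny coefficient is both statistically and practically important.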
Sometimes it is just a matter of units: If you change the predictor from millimetres to metres, then the regression coefficient automatically scales up by a factor of 1000. The fit should be the same mathematically, although sometimes very extreme scale differences confuse the numerical algorithms. E.g. the design matrix can be declared singular even though it isn't. (Scale differences have to be pretty extreme to affect OLS, though. More common is that nonlinear methods are impacted via convergence criteria or numerical derivatives.) -pd
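A minimal simulated sketch of the units point (the data and variable names are invented): re-expressing a millimetre-scale predictor in metres multiplies its coefficient by 1000 while leaving the t statistic, and hence the p-value, unchanged.

```r
## Changing units rescales the coefficient but not the inference.
set.seed(2026)
d <- data.frame(x_mm = rnorm(100, mean = 1700, sd = 80))  # e.g. heights in mm
d$y <- 0.002 * d$x_mm + rnorm(100)
d$x_m <- d$x_mm / 1000                                    # same data in metres
s_mm <- summary(lm(y ~ x_mm, data = d))$coefficients
s_m  <- summary(lm(y ~ x_m,  data = d))$coefficients
s_mm["x_mm", "Estimate"] * 1000   # equals the metre-scale estimate
s_m["x_m", "Estimate"]
## t statistics (and therefore p-values) are identical:
all.equal(s_mm["x_mm", "t value"], s_m["x_m", "t value"])
```

So a "very small" coefficient on its own says nothing; only the coefficient together with the units of its predictor does.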
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Curiously enough, scale independence is lost in models that lack Nelder's strong heredity (e.g. when main effects are missing for interactions). Cheers, Andrew -- Andrew Robinson Director, CEBRA and Professor of Biosecurity, School/s of BioSciences and Mathematics & Statistics University of Melbourne, VIC 3010 Australia Tel: (+61) 0403 138 955 Email: apro at unimelb.edu.au Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/ I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
Example? Is this similar to language independence getting lost under similar circumstances because e.g. Ja/Nej in Danish sorts opposite to Yes/No? -pd
Hi Peter,
Hopefully this clarifies things.
## Here's a light-touch example
set.seed(8675309)
example <- data.frame(x1 = rnorm(100),
x2 = rnorm(100))
example$x3 <- example$x1 - 1
example$x4 <- example$x2 - 1
example$y <- with(example,
2 * x1 + 4 * x2 + x1 * x2 + rnorm(100) * 2)
## In the following code, the statistical information about the
## interaction term is the same across the two scalings
summary(lm(y ~ x1 * x2, data = example))
summary(lm(y ~ x3 * x4, data = example))
## In the following code, the statistical information about the
## interaction term is not the same across the two scalings
summary(lm(y ~ x1 + x1:x2, data = example))
summary(lm(y ~ x3 + x3:x4, data = example))
NB: this obscure fact was published in Robinson, A.P., Pocewicz, A.L., Gessler, P.E., 2004. A cautionary note on scaling variables that appear only in products in ordinary least squares. Forest Biometry, Modelling and Information Sciences 1, 83-90. I first submitted it to Remote Sensing of the Environment (in which this failure to respect strong heredity is most pernicious); R1 said it was completely obvious that failing to respect strong heredity was a stupid idea, reject; whereas R2 said they had never heard of this, therefore it could not possibly be true, reject.
I'm not sure if it's similar to language independence .... ? Interesting conjecture! Can you unpack that a little?
Cheers,
Andrew
That's actually a shift and not a scaling, no? Just multiplying with a scalar is not going to change the result:
example$x3 <- example$x1 * 10
example$x4 <- example$x2 * 10
summary(lm(y ~ x1 + x1:x2, data = example))
Call:
lm(formula = y ~ x1 + x1:x2, data = example)
Residuals:
Min 1Q Median 3Q Max
-13.3499 -2.8748 0.6154 3.2637 11.4331
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2437 0.4548 -0.536 0.593352
x1 1.9654 0.4932 3.985 0.000131 ***
x1:x2 1.5827 0.5249 3.015 0.003275 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared: 0.2232, Adjusted R-squared: 0.2072
F-statistic: 13.94 on 2 and 97 DF, p-value: 4.784e-06
summary(lm(y ~ x3 + x3:x4, data = example))
Call:
lm(formula = y ~ x3 + x3:x4, data = example)
Residuals:
Min 1Q Median 3Q Max
-13.3499 -2.8748 0.6154 3.2637 11.4331
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.243674 0.454821 -0.536 0.593352
x3 0.196545 0.049321 3.985 0.000131 ***
x3:x4 0.015827 0.005249 3.015 0.003275 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared: 0.2232, Adjusted R-squared: 0.2072
F-statistic: 13.94 on 2 and 97 DF, p-value: 4.784e-06
------
An intuitive explanation for your example is that ~ x1 + x1:x2 is bilinear, but if x1==0 there is no effect of x2.
With ~ x3 + x3:x4 it is still bilinear and no effect of x4 is equivalent to no effect of x2, but now that happens when x3==0, which is when x1==1. Same thing happens if you force a regression through the origin and move the origin.
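Peter's last point can be sketched with a small toy example (simulated data of my own, not code from the thread): with an intercept, the slope is invariant to shifting the predictor, but a regression forced through the origin is not.

```r
## Toy sketch (my own simulated data): shift-invariance with and
## without an intercept
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

## With an intercept, shifting x leaves the slope unchanged;
## the intercept absorbs the shift
coef(lm(y ~ x))[2]
coef(lm(y ~ I(x - 1)))[2]

## Through the origin, the slope depends on where the origin sits
coef(lm(y ~ 0 + x))
coef(lm(y ~ 0 + I(x - 1)))
```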
---
I tried to come up with an example showing the corresponding effect with character variables in different languages and got as far as this
dd <- expand.grid(A=1:2, B=1:2, rep=1:5)
dd <- within(dd, {en_A <- c("F","M")[A]; en_B <- c("N","Y")[B]})
dd <- within(dd, {da_A <- c("K","M")[A]; da_B <- c("N","J")[B]})
dd$Y <- matrix(c(1,1,1,2),2)[cbind(dd$A,dd$B)] + rnorm(20, sd=.2)
summary(lm(Y~en_A * en_B, dd))
summary(lm(Y~da_A * da_B, dd))
With this, you'll find that the main effect of being male depends on whether we speak English or Danish.
Removing a main effect should then give similar effects to your example, but R's factor coding conventions get in the way because it codes the interaction term with the full indicator parametrization of the other term. I suppose you could force it either by directly modifying the design matrix or by using explicit 0-1 codings.
-pd
On 9 Mar 2026, at 22.54, Andrew Robinson <apro at unimelb.edu.au> wrote:
Hi Peter,
hopefully this clarifies.
## Here's a light-touch example
set.seed(8675309)
example <- data.frame(x1 = rnorm(100),
x2 = rnorm(100))
example$x3 <- example$x1 - 1
example$x4 <- example$x2 - 1
example$y <- with(example,
2 * x1 + 4 * x2 + x1 * x2 + rnorm(100) * 2)
## In the following code, the statistical information about the
## interaction term is the same across the two scalings
summary(lm(y ~ x1 * x2, data = example))
summary(lm(y ~ x3 * x4, data = example))
## In the following code, the statistical information about the
## interaction term is not the same across the two scalings
summary(lm(y ~ x1 + x1:x2, data = example))
summary(lm(y ~ x3 + x3:x4, data = example))
NB: this obscure fact was published in Robinson, A.P., Pocewicz, A.L., Gessler, P.E., 2004. A cautionary note on scaling variables that appear only in products in ordinary least squares. Forest Biometry, Modelling and Information Sciences 1, 83-90. I first submitted it to Remote Sensing of Environment (in which this failure to respect strong hierarchy is most pernicious); R1 said it was completely obvious that failing to respect strong hierarchy was a stupid idea, reject, whereas R2 said they had never heard of this and therefore it could not possibly be true, reject.
I'm not sure if it's similar to language independence ... Interesting conjecture! Can you unpack that a little?
Cheers,
Andrew
--
Andrew Robinson
Director, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/
I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On Mar 9, 2026 at 21:04 +1100, Peter Dalgaard <pdalgd at gmail.com>, wrote:
Example? Is this similar to language independence getting lost under similar circumstances because e.g. Ja/Nej in Danish sorts opposite to Yes/No? -pd
On 9 Mar 2026, at 10.34, Andrew Robinson <apro at unimelb.edu.au> wrote: Curiously enough, scale independence is lost in models that lack Nelder's strong heredity (e.g. main effects are missing for interactions). Cheers, Andrew -- Andrew Robinson Director, CEBRA and Professor of Biosecurity, School/s of BioSciences and Mathematics & Statistics University of Melbourne, VIC 3010 Australia Tel: (+61) 0403 138 955 Email: apro at unimelb.edu.au Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/ I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders. On 9 Mar 2026 at 8:13 PM +1100, Peter Dalgaard <pdalgd at gmail.com>, wrote:
Sometimes it is just a matter of units: If you change the predictor from millimeter to meter, then the regression coefficient automatically scales down by a factor of 1000. The fit should be the same mathematically, although sometimes very extreme scale differences confuse the numerical algorithms. E.g. the design matrix can be declared singular even though it isn't. (Scale differences have to be pretty extreme to affect OLS, though. More common is that nonlinear methods are impacted via convergence criteria or numerical derivatives.) -pd
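Peter's units point is easy to verify with a toy example (simulated data of my own, not code from the thread): rescaling the predictor from millimeters to meters multiplies its coefficient by 1000 but leaves the t value, p-value, and fitted values untouched.

```r
## Toy check (my own simulated data): changing units rescales the
## coefficient but not the fit
set.seed(42)
mm <- runif(50, 100, 5000)        # predictor measured in millimeters
y  <- 0.001 * mm + rnorm(50)
fit_mm <- lm(y ~ mm)
fit_m  <- lm(y ~ I(mm / 1000))    # same predictor, now in meters

coef(fit_mm)[2] * 1000            # equals the meter-scale coefficient
coef(fit_m)[2]
all.equal(fitted(fit_mm), fitted(fit_m))      # identical fitted values
summary(fit_mm)$coefficients[2, "t value"]    # identical t values
summary(fit_m)$coefficients[2, "t value"]
```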
On 8 Mar 2026, at 19.15, Brian Smith <briansmith199312 at gmail.com> wrote: Hi Michael, You made an interesting point: the scale of the underlying variable may be vastly different from that of the other variables in the equation. Could I use the logarithm of that variable instead of the raw values? Another possibility is that we could standardise that variable. But IMO, for out-of-sample prediction, the interpretation of standardisation is not straightforward. On Sun, 8 Mar 2026 at 23:05, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
Dear Brian You have not given us much to go on here, but the problem is often related to the scale of the variables. So if the coefficient is per year, try re-expressing time in months or weeks or days. Michael
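Brian's question above about logging or standardising a predictor is straightforward to try in R (toy data of my own; whether either transformation is scientifically sensible is a separate matter):

```r
## Toy comparison (my own simulated data): log vs standardised predictor
set.seed(7)
x <- exp(rnorm(100, mean = 5, sd = 1))   # strictly positive, right-skewed
y <- 3 * log(x) + rnorm(100)

coef(lm(y ~ log(x)))[2]     # effect per unit increase in log(x)
coef(lm(y ~ scale(x)))[2]   # effect per standard deviation of x

## For out-of-sample prediction, standardise new data with the
## training mean and sd, not the new sample's own mean and sd
mu <- mean(x); s <- sd(x)
fit <- lm(y ~ I((x - mu) / s))
```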
Hi Brian, This phenomenon is sometimes known as a 'distinction without a difference', where 'distinction' refers to whether a parameter (in this case the regression coefficient) can be distinguished from zero, given the experimental design. With enough observations of high precision, even scientifically insignificant quantities can be distinguished statistically. 'Difference' refers to the science: is this difference important to the scientific process? R can't make that judgment; that's the job of the domain scientist. David K Stevens, PhD, PE, Professor Emeritus Civil and Environmental Engineering Utah Water Research Laboratory Utah State University 8200 Old Main Hill Logan, UT 84322-8200 david.stevens at usu.edu (435) 797-3229 (office)
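David's point is easy to demonstrate by simulation (toy data of my own, not from the thread): with a large enough sample, a practically negligible slope is still estimated precisely enough to be highly significant.

```r
## Toy demonstration (my own simulated data): a tiny but highly
## significant coefficient
set.seed(1)
n <- 1e5
x <- rnorm(n, sd = 100)
y <- 0.0003 * x + rnorm(n)
## The estimate is near 0.0003, yet its p-value is far below 0.001
summary(lm(y ~ x))$coefficients["x", ]
```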