
Correct interpretation of a regression coefficient

13 messages · Sorkin, John, Michael Dewey, Brian Smith +6 more

#
Hi,

My question is not directly related to R, but rather a basic question
about statistics. I am hoping to receive valuable insights from the
expert statisticians in this group.

In some cases, when fitting a simple OLS regression, I obtain an
estimated beta coefficient that is very small (for example, 0.00034), yet
it still appears statistically significant based on the p-value.

I am trying to understand how to interpret such a result in practical
terms. From a magnitude perspective, such a small coefficient would
not be expected to meaningfully affect the predicted response value,
but statistically it is still considered significant.

I would greatly appreciate any insights or explanations regarding this
phenomenon.

Thanks for your time.
#
Brian,
Statistical significance and biological importance are two very different concepts. A finding can be statistically significant, but of no biological importance, or of biological importance but  not statistically significant.
In the first case  I would report the finding by indicate in my discussion that the finding is of no biological importance. In the second case, I would think about the finding and as if my sample size was adequate. The path taken beyond this takes a good deal of thought. If the sample size was small, I would consider redoing the experiment with a larger sample. If the sample size is adequate I would entertain the thought that the finding represented random variation and would consider redoing the experiment in a different population.
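As a minimal sketch of the first case (simulated data, my illustration rather than John's): with a large enough sample, even a slope far too small to matter practically is flagged as statistically significant.

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
# true slope is tiny (0.0003) relative to any practically relevant effect
y <- 0.0003 * x + rnorm(n, sd = 0.01)

fit <- lm(y ~ x)
summary(fit)$coefficients["x", ]  # estimate near 0.0003, very small p-value
```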

I hope this helps.

John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
Former PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology, Geriatrics and Palliative Medicine,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
#
Dear Brian

You have not given us much to go on here, but the problem is often 
related to the scale of the variables. So if the coefficient is per year, 
try re-expressing time in months, weeks, or days.
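For instance (a simulated sketch, not Brian's data), a coefficient of roughly 0.00034 per day becomes a much more readable per-year coefficient once the time variable is re-expressed; the fit and the p-value are unchanged.

```r
set.seed(42)
days  <- runif(50, 0, 40) * 365       # time measured in days
years <- days / 365                   # the same time measured in years
y <- 2 + 0.00034 * days + rnorm(50, sd = 0.5)

coef(lm(y ~ days))["days"]    # tiny slope, around 0.00034 per day
coef(lm(y ~ years))["years"]  # same effect, around 0.124 per year (0.00034 * 365)
```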

Michael
On 08/03/2026 11:50, Brian Smith wrote:

  
    
#
Hi Michael,

You made an interesting point that the scale of the underlying variable
may be vastly different from that of the other variables in the
equation.

Could I use the logarithm of that variable instead of the raw values?
Another possibility is to standardise the variable. But IMO, for
out-of-sample prediction, the interpretation of a standardised
coefficient is not straightforward.
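As a sketch of the standardisation option (simulated data, my illustration): scale() rescales the slope to "change in y per one SD of x", and the two fits are otherwise equivalent.

```r
set.seed(7)
x <- rnorm(80, mean = 5000, sd = 1200)
y <- 1 + 0.0005 * x + rnorm(80)

b_raw <- coef(lm(y ~ x))["x"]        # small, because x has a large scale
b_std <- coef(lm(y ~ scale(x)))[2]   # equals b_raw * sd(x)
c(b_raw, b_std, b_raw * sd(x))
# for out-of-sample prediction, new x values must be centred and scaled with
# the *training* mean and SD, which is the interpretive wrinkle noted above
```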
On Sun, 8 Mar 2026 at 23:05, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
#
Scale can be quite important to the success of applying algorithms.
Several of the nlsr, optim and optimx routines provide for
scaling, and though I'm the maintainer / part creator of some
of these, my preference is to build the scale into my (nonlinear)
models. While not all algorithms are affected in the same way,
my experience is that setting the model up so the parameters
fall in the range 0.1 to 10 does make things a bit easier to read
and to gauge "strange" results.
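A toy sketch of the effect (my example; the objective and numbers are made up): optim()'s parscale control expresses the same idea without rewriting the model.

```r
## badly scaled toy objective: minimum at (2, 2000)
f <- function(p) (p[1] - 2)^2 + ((p[2] - 2000) / 1000)^2

fit1 <- optim(c(1, 1000), f)                                         # unscaled
fit2 <- optim(c(1, 1000), f, control = list(parscale = c(1, 1000)))  # scaled
fit1$par
fit2$par  # the scaled run typically lands much closer to (2, 2000)
```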

Sorry this isn't more prescriptively helpful.

JN
On 2026-03-08 14:15, Brian Smith wrote:
#
The `predict` method for regressions typically offers estimates of the "confidence interval" and the "prediction interval" as separate calculations. The former characterizes how well your regression estimates the systematic behavior of your data, while the latter addresses how precisely the regression can predict specific values. 

You should research the pitfalls of p-values: they tell you a lot about repeatability but little about practical significance.

You should also not just think "differences in scale -> log transformation". Logarithms "straighten out" data that is intrinsically exponential; if that is how your data behave, then logarithms will help you linearize the analysis. There is plenty of data arising from non-exponential processes for which a log will probably not help.
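For instance (simulated): log() linearises data generated by an exponential-growth process with multiplicative noise, recovering the growth rate as a slope.

```r
set.seed(3)
x <- seq(1, 10, length.out = 60)
y <- exp(0.8 * x + rnorm(60, sd = 0.1))  # exponential process, multiplicative noise

fit <- lm(log(y) ~ x)
coef(fit)["x"]  # close to the true growth rate, 0.8
```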
On March 8, 2026 11:15:54 AM PDT, Brian Smith <briansmith199312 at gmail.com> wrote:
--
Sent from my phone. Please excuse my brevity.
#
You have run into the fact that "is it there" (statistical significance)
and "does it matter" (practical significance) are two different things.  As
to whether the coefficient is of practical significance, looking at its
magnitude alone is not the way to go.  A small coefficient multiplying a large
variable can have a large effect, just as a large coefficient multiplying a
small variable can have a small effect.

One thing you can do is to fit your model and then use the ?drop1 function
to see the effect of dropping each coefficient from the model.  You might
even use ?step with direction="backward" for this if there are other
coefficients you think might be unnecessary.
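A small sketch of the drop1() suggestion (simulated data, hypothetical variable names):

```r
set.seed(11)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 3 * d$x1 + 0.001 * d$x2 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = d)
drop1(fit, test = "F")  # compares the full model against each single-term deletion
## step(fit, direction = "backward") would automate the same comparisons
```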

Perhaps the
On Mon, 9 Mar 2026 at 00:51, Brian Smith <briansmith199312 at gmail.com> wrote:

            

  
  
#
Sometimes it is just a matter of units: If you change the predictor from millimeter to meter, then the regression coefficient automatically scales down by a factor 1000. The fit should be the same mathematically, although sometimes very extreme scale differences confuse the numerical algorithms. E.g. the design matrix can be declared singular even though it isn't. 

(Scale differences have to be pretty extreme to affect OLS, though. More common is that nonlinear methods are impacted via convergence criteria or numerical derivatives.)

-pd

  
    
#
Curiously enough, scale independence is lost in models that lack Nelder's strong heredity (e.g. when main effects are missing for their interactions).

Cheers,

Andrew

--
Andrew Robinson
Director, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au<mailto:apro at unimelb.edu.au>
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 9 Mar 2026 at 8:13 PM +1100, Peter Dalgaard <pdalgd at gmail.com>, wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com

#
Example?

Is this similar to language independence getting lost under similar circumstances because e.g. Ja/Nej in Danish sorts opposite to Yes/No?

-pd

  
    
#
Hi Peter,

hopefully this clarifies.

## Here's a light-touch example

set.seed(8675309)

example <- data.frame(x1 = rnorm(100),
                     x2 = rnorm(100))

example$x3 <- example$x1 - 1
example$x4 <- example$x2 - 1

example$y <- with(example,
                 2 * x1 + 4 * x2 + x1 * x2 + rnorm(100) * 2)

## In the following code, the statistical information about the
## interaction term is the same across the two scalings

summary(lm(y ~ x1 * x2, data = example))
summary(lm(y ~ x3 * x4, data = example))

## In the following code, the statistical information about the
## interaction term is not the same across the two scalings

summary(lm(y ~ x1 + x1:x2, data = example))
summary(lm(y ~ x3 + x3:x4, data = example))

NB: this obscure fact was published in Robinson, A.P., Pocewicz, A.L., Gessler, P.E., 2004. A cautionary note on scaling variables that appear only in products in ordinary least squares. Forest Biometry, Modelling and Information Sciences 1, 83-90.  I first submitted it to Remote Sensing of the Environment (in which this failure to respect strong hierarchy is most pernicious); R1 said it was completely obvious that failing to respect strong hierarchy was a stupid idea, reject, whereas R2 said they had never heard of this and therefore it could not possibly be true, reject.

I'm not sure if it's similar to language independence .... ? Interesting conjecture!  Can you unpack that a little?

Cheers,

Andrew

--
Andrew Robinson
Director, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au<mailto:apro at unimelb.edu.au>
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On Mar 9, 2026 at 21:04 +1100, Peter Dalgaard <pdalgd at gmail.com>, wrote:
#
That's actually a shift and not a scaling, no? Just multiplying with a scalar is not going to change the result:
Call:
lm(formula = y ~ x1 + x1:x2, data = example)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.3499  -2.8748   0.6154   3.2637  11.4331 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.2437     0.4548  -0.536 0.593352    
x1            1.9654     0.4932   3.985 0.000131 ***
x1:x2         1.5827     0.5249   3.015 0.003275 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared:  0.2232,	Adjusted R-squared:  0.2072 
F-statistic: 13.94 on 2 and 97 DF,  p-value: 4.784e-06
Call:
lm(formula = y ~ x3 + x3:x4, data = example)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.3499  -2.8748   0.6154   3.2637  11.4331 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.243674   0.454821  -0.536 0.593352    
x3           0.196545   0.049321   3.985 0.000131 ***
x3:x4        0.015827   0.005249   3.015 0.003275 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared:  0.2232,	Adjusted R-squared:  0.2072 
F-statistic: 13.94 on 2 and 97 DF,  p-value: 4.784e-06
------

An intuitive explanation for your example is that ~ x1 + x1:x2 is bilinear, but if x1 == 0 there is no effect of x2.
With ~ x3 + x3:x4 it is still bilinear, and no effect of x4 is equivalent to no effect of x2, but now that happens when x3 == 0, which is when x1 == 1. The same thing happens if you force a regression through the origin and then move the origin.

---
I tried to come up with an example showing the corresponding effect with character variables in different languages and got as far as this

dd <- expand.grid(A=1:2, B=1:2, rep=1:5) 
dd <- within(dd, {en_A <- c("F","M")[A]; en_B <- c("N","Y")[B]})
dd <- within(dd, {da_A <- c("K","M")[A]; da_B <- c("N","J")[B]})
dd$Y <- matrix(c(1,1,1,2),2)[cbind(dd$A,dd$B)] + rnorm(20, sd=.2)
summary(lm(Y~en_A * en_B, dd))
summary(lm(Y~da_A * da_B, dd))

With this, you'll find that the main effect of being male depends on whether we speak English or Danish.

Removing a main effect should then give effects similar to your example, but R's factor coding conventions get in the way, because R codes the interaction term with the full indicator parametrization of the other term. I suppose you could force it either by directly modifying the design matrix or by using explicit 0-1 codings.

-pd

  
    
#
Hi Brian,

This phenomenon is sometimes known as 'distinction without a 
difference', with 'distinction' referring to whether a parameter (in 
this case the regression coefficient) is identifiable, based on the 
experimental design. With enough observations of high precision, even 
scientifically insignificant quantities can be distinguished 
statistically. 'Difference' refers to the science - is this difference 
important to the scientific process? R can't make that judgment. That's 
the job of the domain scientist.

David K Stevens, PhD, PE, Professor Emeritus
Civil and Environmental Engineering
Utah Water Research Laboratory
Utah State University
8200 Old Main Hill
Logan, UT 84322-8200
david.stevens at usu.edu
(435) 797-3229 (office)
On 3/8/2026 4:50 AM, Brian Smith wrote: