Skip to content

Query on constrained regressions using -mgcv- and -pcls-

3 messages · Clive Nicholas, Bert Gunter

#
Hello all,

I'll level with you: I'm puzzled!

How is it that this constrained regression routine using -pcls- runs
satisfactorily (courtesy of Tian Zheng):

library(mgcv)
options(digits=3)
x.1=rnorm(100, 0, 1)
x.2=rnorm(100, 0, 1)
x.3=rnorm(100, 0, 1)
x.4=rnorm(100, 0, 1)
y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01)
x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4)
ls.print(lsfit(x.mat, y, intercept=FALSE))
M=list(y=y,
w=rep(1, length(y)),
X=x.mat,
C=matrix(0,0,0),
p=rep(1, ncol(x.mat)),
off=array(0,0),
S=list(),
sp=array(0,0),
Ain=diag(ncol(x.mat)),
bin=rep(0, ncol(x.mat)) )
pcls(M)
Residual Standard Error=0.0095
R-Square=1
F-statistic (df=5, 95)=314735
p-value=0

    Estimate Std.Err t-value Pr(>|t|)
       1.000  0.0010  1043.9        0
x.1    0.501  0.0010   512.6        0
x.2   -0.202  0.0009  -231.6        0
x.3    0.298  0.0010   297.8        0
x.4    0.103  0.0011    94.8        0

but this one does not for a panel dataset:

set.seed(02102020)
N=500
M=10
rater=rep(1:M, each = N)
lead_n=as.factor(rep(1:N,M))
a=rep(rnorm(N),M)
z=rep(round(25+2*rnorm(N)+.2*a))
x=a+rnorm(N*M)
y=.5*x+5*a-.5*z+2*rnorm(N*M)
x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
model=lm(y~x+x_cl+z)
summary(model)
y=1+1.5*x+4.6*x_cl-0.5*z
x.mat=cbind(rep(1,length(y)),x,x_cl,z)
ls.print(lsfit(x.mat,y,intercept=FALSE))

Residual Standard Error=0
R-Square=1
F-statistic (df=4, 4996)=5.06e+30
p-value=0

     Estimate Std.Err   t-value Pr(>|t|)
          1.0       0  2.89e+13        0
x         0.8       0  2.71e+14        0
x_cl      4.6       0  1.18e+15        0
z        -0.5       0 -3.63e+14        0

?

There shouldn't be anything wrong with the second set of data, unless I've
missed something obvious (that constraints don't work for panel data? Seems
unlikely to me)!

Also:

(1) I'm ultimately looking just to constrain ONE coefficient whilst
allowing the other coefficients to be unconstrained (I tried this with the
first dataset by setting

y=1+0.5*x.1-x.2+x.3+x.4

in the call, but got similar-looking output to what I got in the second
dataset); and

(2) it would be really useful to have the call to -pcls(M)- produce more
informative output (SEs, t-values, fit stats, etc).

Many thanks in anticipation of your expert help and being told what a
clueless berk I am,
Clive
#
As an addendum / erratum to my original post, the second block of code
should read for completeness:

set.seed(02102020)
N=500
M=10
rater=rep(1:M, each = N)
lead_n=as.factor(rep(1:N,M))
a=rep(rnorm(N),M)
z=rep(round(25+2*rnorm(N)+.2*a))
x=a+rnorm(N*M)
y=.5*x+5*a-.5*z+2*rnorm(N*M)
x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
model=lm(y~x+x_cl+z)
summary(model)
y=1+1.5*x+4.6*x_cl-0.5*z
x.mat=cbind(rep(1,length(y)),x,x_cl,z)
ls.print(lsfit(x.mat,y,intercept=FALSE))
M=list(y=y,
w=rep(1, length(y)),
X=x.mat,
C=matrix(0,0,0),
p=rep(1, ncol(x.mat)),
off=array(0,0),
S=list(),
sp=array(0,0),
Ain=diag(ncol(x.mat)),
bin=rep(0, ncol(x.mat)) )
pcls(M)

However, all my questions stand.

Ta, Clive

On Tue, 3 Nov 2020 at 01:14, Clive Nicholas <clivelists at googlemail.com>
wrote:

  
    
#
Warning: I did *not* attempt to follow your query(original or addendum) in
detail. But as you have not yet received a reply, it may be because your
post seems mostly about statistical issues, which are generally off topic
here. This list is primarily about R programming issues. If statistical
issues are your primary focus, SO may be a better place to post:
https://stats.stackexchange.com/

Otherwise, I guess you'll just have to continue waiting.

Incidentally, suggestions for improvements in nonstandard packages should
generally be sent to the package maintainer (?maintainer) rather than
posted here. Maintainers may not even check this list.

Finally, this is a plain text list. HTML posts often get mangled by the
server.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Nov 2, 2020 at 9:28 PM Clive Nicholas via R-help <
r-help at r-project.org> wrote: