question about update() - R-help

Duncan Murdoch · 2023-05-04T08:34:00Z

> On 4 May 2023, at 10:26, Duncan Murdoch wrote:
>
> On 04/05/2023 4:05 a.m., Adelchi Azzalini via R-help wrote:
>> Hi. There must be something about the use of update() which I do not grasp,
>> as the next exercise indicates.
>> Suppose that obj is an object returned by a call to lm() or glm().
>> Next, a new variable xf is constructed using the same dataframe used
>> for producing obj. Then
>> obj$data <- cbind(obj$data, xf=xf)
>> new.obj <- update(obj, . ~ . + x

Adelchi Azzalini

Thu, May 4, 2023 1:34 AM

Thanks, Duncan. What you indicate is surely the ideal route. Unfortunately, in my case this is not feasible, because the construction of xf and the update call are within an iterative procedure where xf is changed at each iteration, so that the steps 

obj$data <- cbind(obj$data, xf=xf)
new.obj <- update(obj, . ~ . + xf)
 
must be repeated hundreds of times, each with a different xf.

Adelchi

Duncan Murdoch

Thu, May 4, 2023 1:49 AM

On 04/05/2023 4:34 a.m., Adelchi Azzalini wrote:

Sorry, that doesn't make sense.

You didn't show us complete code, but presumably it's preceded by 
something like this:

   obj <- glm( ..., data = somedata)

So change your modification to this:

   somedata$xf <- xf

That can be done hundreds of times.  This will need to be more elaborate 
if the function doing the iteration has a copy of obj but doesn't have a 
copy of somedata, but there are lots of ways to resolve that.  Without 
seeing complete code, I can't recommend which one to use.
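
A minimal sketch of this suggestion, using made-up data. The names somedata, x, y and xf_new, the Poisson family and the random per-iteration variable are all assumptions for illustration, not the poster's actual code:

```r
# Illustrative data only; none of these names come from the original code.
set.seed(1)
somedata <- data.frame(x = rnorm(50))
somedata$y <- rpois(50, lambda = exp(0.3 * somedata$x))

obj <- glm(y ~ x, family = poisson, data = somedata)

n_iter <- 5
deviances <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  xf_new <- rnorm(nrow(somedata))      # stand-in for the variable built at each iteration
  somedata$xf <- xf_new                # overwrite the column in the original data frame
  new.obj <- update(obj, . ~ . + xf)   # re-runs glm(), which looks up 'somedata' again
  deviances[i] <- deviance(new.obj)
}
```

If the iterating function has obj but not somedata, one possible variation is to build a local data frame and pass it explicitly, as in update(obj, . ~ . + xf, data = newdata), since extra arguments to update() replace the corresponding arguments of the stored call.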

Duncan Murdoch

Berwin A Turlach

Thu, May 4, 2023 2:44 AM

G'day Adelchi,

hope all is well with you.

On Thu, 4 May 2023 10:34:00 +0200
Adelchi Azzalini via R-help <r-help at r-project.org> wrote:

If memory serves correctly, update() takes the object that is passed to
it, looks at what the call was that created that object, modifies that
call according to the additional arguments, and finally executes the
modified call.
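
This behaviour can be inspected directly. The sketch below uses made-up data (somedata, x, y and the random xf column are assumptions for illustration) and asks update() to return the modified call instead of evaluating it:

```r
# Illustrative data only; all names here are assumptions for the sketch.
set.seed(2)
somedata <- data.frame(x = rnorm(30), y = rnorm(30))
obj <- lm(y ~ x, data = somedata)

getCall(obj)   # the stored call: lm(formula = y ~ x, data = somedata)

# evaluate = FALSE returns the modified call without running it
new_call <- update(obj, . ~ . + xf, evaluate = FALSE)
new_call       # lm(formula = y ~ x + xf, data = somedata)

# The call still refers to the symbol 'somedata', so xf has to be found there
# (or in the formula's environment) when the call is finally evaluated.
somedata$xf <- rnorm(30)
new.obj <- eval(new_call)
```

Because the data argument of the modified call is still the symbol somedata, a copy of the data stored inside the fitted object, such as obj$data, is not consulted when the call is re-evaluated.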

So there are a lot of manipulations going on in update().  In particular
it would result each time in a call to lm(), glm() or whatever call was
used to create the object.  Inside any of these modelling functions a
lot of symbolic manipulations/calculations are needed too (parsing the
formula, creating the design matrix and response vector from the parsed
formula and data frame, checking if weights are used &c).

If you do essentially the same calculation over and over again, just
with minor modifications, all these symbolic manipulations are just
time-consuming.

IMHO, you will be better off bypassing update() and just using lm.fit()
(for which lm() is a nice front-end) and glm.fit() (for which glm() is a
nice front-end), or whatever routine does the grunt work of fitting the
model to the data in your application (hopefully, the package creator
used a setup where XXX.fit() fits the model and is called by XXX(), which
does all the fancy formula handling).
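
A rough sketch of what this could look like, assuming a binary response and a single existing covariate (the data, the names somedata, X, y, xf and the binomial family are illustrative assumptions). The design matrix is built once and only the extra column changes inside the loop:

```r
# Illustrative data only; not taken from the original problem.
set.seed(3)
n <- 40
somedata <- data.frame(x = rnorm(n))
somedata$y <- rbinom(n, size = 1, prob = plogis(0.5 * somedata$x))

# Do the formula handling once, outside the loop
X <- model.matrix(~ x, data = somedata)
y <- somedata$y

for (i in 1:3) {
  xf <- rnorm(n)                          # stand-in for the per-iteration variable
  X_new <- cbind(X, xf = xf)              # append the new column to the design matrix
  fit_lm  <- lm.fit(x = X_new, y = y)     # least-squares fit, no formula parsing
  fit_glm <- glm.fit(x = X_new, y = y, family = binomial())
}

fit_glm$coefficients   # named by the columns of X_new
```

Note that lm.fit() and glm.fit() return plain lists rather than full "lm" or "glm" objects, so methods such as summary() or predict() are not directly available on the results.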

Cheers,

	Berwin

Adelchi Azzalini

Thu, May 4, 2023 5:56 AM

Hi, Berwin, good to hear from you, and thanks for the detailed comments and suggestion.

Actually, my current experimental code works in the way that you suggest, calling lm.fit and glm.fit directly.  What I am trying to develop is an "improved" version of the code for distribution to other people. Hence I wanted to streamline the code, in particular avoiding branches for each fitting procedure (lm.fit, glm.fit and possibly more).  But I am now considering dropping the idea of the "improved" version and sticking to the direct calls to the fitting functions.
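
One possible way to keep direct calls while limiting the branching is to pass the fitting function itself as an argument. This is only a sketch: fit_extra is a hypothetical helper, and it assumes every fitter accepts x and y arguments the way lm.fit() and glm.fit() do.

```r
# Hypothetical helper, not from the thread: appends xf to the design matrix
# and hands everything to whichever fitting function is supplied.
fit_extra <- function(fitter, X, y, xf, ...) {
  fitter(x = cbind(X, xf = xf), y = y, ...)
}

# Illustrative data only
set.seed(4)
X <- cbind("(Intercept)" = 1, x = rnorm(25))
y <- rpois(25, lambda = 2)
xf <- rnorm(25)

f1 <- fit_extra(lm.fit, X, y, xf)
f2 <- fit_extra(glm.fit, X, y, xf, family = poisson())
```

Fitting routines with a different argument layout would still need a small wrapper of their own.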

Duncan, thanks for your additional comments. It is true that my original message presented a very simplified picture of the problem, possibly an oversimplified one.  If I presented the problem with the full version of the code, it would look quite long and messy. If I manage to construct a reasonably simplified version of the code, I shall post the question again.

Best wishes,

Adelchi