Loop with string variable AND customizable "summary" output

14 messages · C.Rosa at lse.ac.uk, Roger Bivand, Wensui Liu +4 more

#
Dear All,

I am using R for my research and I have two questions about it:

1) Is it possible to create a loop using a string vector instead of a numeric vector? I have in mind a specific problem:

Suppose you have two countries, UK and USA, with one dependent variable (y) and one independent variable (x) for each country (that is: yUK, xUK, yUSA, xUSA), and you want to run the following regressions automatically:

 

for (i in c("UK","USA"))

output{i}<-summary(lm(y{i} ~ x{i}))

 

In other words, at the end I would like to have two objects as output: "outputUK" and "outputUSA", which contain respectively the results of the first and second regression (yUK on xUK and yUSA on xUSA). 

 

2) In Stata there is a very nice command ("outreg") that displays your regression results nicely, in whatever way the user wants.

Is there anything similar in R or in the contributed R packages? More precisely, I am thinking of something that is close in spirit to "summary" but is also customizable. For example, suppose you want different Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 or a different display format (e.g. without the "t value" column) applied automatically (without manually editing it every time).

Alternatively, if I were able to see it, I could modify the source code of the function "summary", but I cannot see its (line-by-line) code. Any ideas?

Or maybe a customizable regression output already exists?

Thanks really a lot!

Carlo
#
On Mon, 29 Jan 2007 C.Rosa at lse.ac.uk wrote:

            
The input data could be reshaped as y, x, country, and subset= used in the 
lm() call. To assign to named objects see assign(), but consider using a 
named list instead, assigning to a list of the required length in turn, 
and giving the names from the defining vector. Then you'd get output$UK, 
etc.
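
A minimal sketch of the named-list suggestion above. The data frame `dat` and its simulated values are assumptions made purely for illustration; they are not from the original post.

```r
# Simulated long-format data: one row per observation, with a country label.
set.seed(1)
dat <- data.frame(
  country = rep(c("UK", "USA"), each = 20),
  x = rnorm(40)
)
dat$y <- 1 + 2 * dat$x + rnorm(40)

countries <- c("UK", "USA")

# Create a list of the required length and take its names from the vector.
output <- vector("list", length(countries))
names(output) <- countries

for (i in countries) {
  # subset= restricts the fit to the rows of country i.
  output[[i]] <- summary(lm(y ~ x, data = dat, subset = country == i))
}

output$UK   # summary of the UK regression
```

Keeping the results in one list, rather than in separate outputUK and outputUSA objects, means later steps can loop over them with lapply().
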
Use a custom function on the output object from using the summary() method 
on the lm object (that is on the summary.lm object). Use str() to look at 
the summary.lm object to see what you want.
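
One way such a custom function might look, as a sketch only: `my_summary` is a hypothetical helper (not part of R), and the cutpoints and symbols are arbitrary choices. It drops the "t value" column and builds its own significance stars with symnum().

```r
# Hypothetical helper (not part of base R): a compact coefficient table
# built from a summary.lm object, with user-chosen significance stars
# and without the "t value" column.
my_summary <- function(fit,
                       cutpoints = c(0, 0.01, 0.05, 0.1, 1),
                       symbols = c("***", "**", "*", " ")) {
  # Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  cf <- coef(summary(fit))
  stars <- symnum(cf[, "Pr(>|t|)"], corr = FALSE, na = FALSE,
                  cutpoints = cutpoints, symbols = symbols)
  data.frame(
    Estimate  = cf[, "Estimate"],
    Std.Error = cf[, "Std. Error"],
    p.value   = cf[, "Pr(>|t|)"],
    Signif    = as.character(stars),
    row.names = rownames(cf)
  )
}

fit <- lm(uptake ~ conc, data = CO2)  # CO2 is a built-in data set
str(summary(fit), max.level = 1)      # inspect what a summary.lm object holds
my_summary(fit)
```
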

  
    
#
Carlo,

try something like:

for (i in c("UK","USA"))
{
summ<-summary(lm(y ~ x, subset = (country == i)))
assign(paste('output', i, sep = ''), summ);
}

(note: it is untested, sorry).
On 1/29/07, C.Rosa at lse.ac.uk <C.Rosa at lse.ac.uk> wrote:

  
    
#
C.Rosa wrote:
Consider R functions bquote, substitute, eval and parse.

Several examples are given somewhere in R News
(http://cran.r-project.org/doc/Rnews/).
Unfortunately I don't remember exactly which issue; one of the list members sent
me a link to the article several years ago, when I was studying a similar
question.
C.Rosa wrote:
Stars and significance codes are printed with the symnum function.

To customize the summary, explore the object that summary() returns for the lm fit.
For example, 
  str(outputUK)

You will see that it is a list.
You can then reference its elements with $ (say, outputUK$coeff).

R is an object-oriented language, and calls of the same function on
objects of different classes usually invoke different functions (if the class
has a suitable method defined). 
The R manuals contain a very good description of this mechanism. 

The function methods() gives you a list of all defined methods,
for example methods(summary).
If you are working with lm results, you need to explore the function
print.summary.lm.
Calling summary() on an object of class "lm" invokes the function summary.lm. 
This function produces an object of class "summary.lm".
This object is then printed with the method print.summary.lm.
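
The dispatch chain described above can be checked directly, and the source of the printing method can be displayed. This is a small sketch using the built-in CO2 data; the object names `fit` and `s` are only illustrative.

```r
fit <- lm(uptake ~ conc, data = CO2)  # CO2 is a built-in data set

class(fit)            # "lm", so summary(fit) dispatches to summary.lm
s <- summary(fit)
class(s)              # "summary.lm", so printing s uses print.summary.lm

methods(summary)      # all summary methods currently available

# print.summary.lm is not exported from the stats package, so typing its
# name alone may not find it; either of these calls displays its source.
getAnywhere("print.summary.lm")
stats:::print.summary.lm
```
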
#
That is
C.Rosa wrote:
for (i in c("UK","USA")) {
  # 1. produce a character string containing the needed expression,
  #    e.g. "outputUK<-lm(yUK~xUK)"
  lm.txt<-paste("output",i,"<-lm(y",i,"~x",i,")",sep="")
  # 2. parse and evaluate it
  eval(parse(text=lm.txt))
}
#
Vladimir Eremeev wrote:

  
    
#
Dear All,
Thank you very much for your help!
Carlo

-----Original Message-----
From: Wensui Liu [mailto:liuwensui at gmail.com]
Sent: Mon 29/01/2007 15:39
To: Rosa,C
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Loop with string variable AND customizable "summary" output
 
#
Often you will find that if you arrange your data in a
desirable way in the first place everything becomes
easier.  What you really want is a data frame such
as the last three columns of the builtin data frame
CO2 where Treatment corresponds to country and
the two numeric variables correspond to your y and x.

Then it's easy:

lapply(levels(CO2$Treatment), function(lev)
   lm(uptake ~ conc, CO2, subset = Treatment == lev))

The only problem with the above is that the Call: in the
output does not really tell you which level of Treatment
is being used since it literally shows
  "lm(uptake ~ conc, CO2, subset = Treatment == lev)"
each time.  To get around this, substitute the value of lev in.
Because R uses delayed evaluation you also need to force the
evaluation of lev prior to substituting it in:

lapply(levels(CO2$Treatment), function(lev) {
   lev <- force(lev)
   eval(substitute(lm(uptake ~ conc, CO2, subset = Treatment == lev)),
     list(lev = lev))
})


Now if you really want to do it the way you specified originally
try this.

Suppose we use attach to grab the variables
x1, x2, x3, x4, y1, y2, y3, y4 out of the builtin
anscombe data frame for purposes of getting
our hands on some sample data.   In your case
the variables would already be in the workspace
so the attach is not needed.

Then simply reconstruct the formula in fo.  You
could simply use lm(fo) but then the Call: in the
output of lm would literally read lm(fo) so it's
better to use do.call:

# next line gives the variables x1, x2, x3, x4, y1, y2, y3, y4
# from the builtin anscombe data set.
# In your case such variables would already exist.
attach(anscombe)
lapply(1:4, function(i) {
   ynm <- paste("y", i, sep = "")
   xnm <- paste("x", i, sep = "")
   fo <- as.formula(paste(ynm, "~", xnm))
   do.call("lm", list(fo))
})
detach(anscombe)

Actually this is not really a recommended way of
proceeding. If all the variables have the same length,
you would be better off putting them in a data frame
such as anscombe in the first place and using that:

lapply(1:4, function(i) {
    fo <- as.formula(paste(names(anscombe)[i+4], "~", names(anscombe)[i]))
    do.call("lm", list(fo, data = quote(anscombe)))
})

or

lapply(1:4, function(i) {
    fo <- y ~ x
    fo[[2]] <- as.name(names(anscombe)[i+4])
    fo[[3]] <- as.name(names(anscombe)[i])
    do.call("lm", list(fo, data = quote(anscombe)))
})
On 1/29/07, C.Rosa at lse.ac.uk <C.Rosa at lse.ac.uk> wrote:
#
In thinking about this a bit more here is an even shorter one yet it
does show the level in the Call output.  See ?bquote

lapply(levels(CO2$Treatment), function(lev)
   eval(bquote(lm(uptake ~ conc, CO2, subset = Treatment == .(lev)))))
On 1/29/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
Prior answers are certainly correct, but this is where lists and lapply
shine:

result<-lapply(list(UK,USA),function(z)summary(lm(y~x,data=z)))

As in (nearly) all else, simplicity is a virtue.

If you prefer to keep the data sources as a character vector, dataNames,

result<-lapply(dataNames,function(z)summary(lm(y~x,data=get(z)))) 

should work. 

Note: both of these are untested for the general case where they might be
used within a function and may not find the right z unless you pay attention
to scope, especially in the get() construction.
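
A sketch of both variants with made-up data. The data frames UK and USA below are simulated stand-ins, and the use of mget() with an explicit envir is one possible way to handle the scoping caveat, not something tested in the original message.

```r
# Simulated stand-ins for the two country data sets (illustrative only).
set.seed(1)
UK  <- data.frame(x = rnorm(20))
UK$y <- 1 + 2 * UK$x + rnorm(20)
USA <- data.frame(x = rnorm(20))
USA$y <- 3 - 1 * USA$x + rnorm(20)

# Naming the list elements names the results, so result$UK works.
result <- lapply(list(UK = UK, USA = USA),
                 function(z) summary(lm(y ~ x, data = z)))
result$UK

# Starting from a character vector of object names: mget() returns a
# named list, and envir states explicitly where the objects are looked up.
dataNames <- c("UK", "USA")
result2 <- lapply(mget(dataNames, envir = environment()),
                  function(z) summary(lm(y ~ x, data = z)))
names(result2)
```
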


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of C.Rosa at lse.ac.uk
Sent: Monday, January 29, 2007 8:23 AM
To: liuwensui at gmail.com; bcarvalh at jhsph.edu; Roger.Bivand at nhh.no
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Loop with string variable AND customizable "summary" output

#
And yet one more.  This one does not use eval but uses do.call, quote
and bquote instead:

lapply(levels(CO2$Treatment), function(lev) do.call("lm",
     list(uptake ~ conc, quote(CO2), subset = bquote(Treatment == .(lev)))))
On 1/29/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
Or, to throw yet another couple of possibilities into the mix:

lapply(split(YourDF, YourDF$country), 
       function(d) summary(lm(y ~ x, data = d)))

and:

library(nlme)
summary(lmList(y ~ x | country, YourDF))


See ?split and help(lmList, package = nlme)

HTH,

Marc Schwartz
On Mon, 2007-01-29 at 09:03 -0800, Bert Gunter wrote:
#
Note that the nlme solution seems to give the same coefficients
but appears to use a single error term rather than one error
term per level of the conditioning variable and that would change various
other statistics relative to the other solutions should that matter.
The output below is from a call of the form
summary(lmList(uptake ~ conc | Treatment, CO2)):
Call:
  Model: uptake ~ conc | Treatment
   Data: CO2

Coefficients:
   (Intercept)
           Estimate Std. Error  t value     Pr(>|t|)
nonchilled 22.01916    2.46416 8.935769 1.174616e-13
chilled    16.98142    2.46416 6.891361 1.146556e-09
   conc
             Estimate  Std. Error  t value     Pr(>|t|)
nonchilled 0.01982458 0.004692544 4.224699 6.292679e-05
chilled    0.01563659 0.004692544 3.332221 1.306259e-03

Residual standard error: 8.945667 on 80 degrees of freedom
On 1/29/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
#
On Mon, 2007-01-29 at 14:30 -0500, Gabor Grothendieck wrote:
<snip>

Gabor,

Thanks for noting that. There is a solution using 'pool = FALSE':
Call:
  Model: uptake ~ conc | Treatment 
   Data: CO2 

Coefficients:
   (Intercept) 
           Estimate Std. Error   t value     Pr(>|t|)
nonchilled 22.01916   2.148475 10.248740 9.463480e-13
chilled    16.98142   2.743761  6.189103 2.562416e-07
   conc 
             Estimate  Std. Error  t value     Pr(>|t|)
nonchilled 0.01982458 0.004091379 4.845452 1.934996e-05
chilled    0.01563659 0.005224992 2.992653 4.721873e-03


I suppose that, while subtle, this could make this approach error prone
for those who (like me in this case) miss it...
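
For reference, a sketch of how the pooled and unpooled summaries can be compared with separate lm() fits. The object names `fits`, `co2` and `separate` are illustrative, and the exact call used in the messages above is not shown in the thread.

```r
library(nlme)  # recommended package shipped with standard R installations

fits <- lmList(uptake ~ conc | Treatment, data = CO2)

summary(fits)                # pooled: one residual standard error for all levels
summary(fits, pool = FALSE)  # unpooled: one residual standard error per level

# Cross-check against separate lm() fits, one per Treatment level.
co2 <- as.data.frame(CO2)    # plain data frame for split()
separate <- lapply(split(co2, co2$Treatment),
                   function(d) coef(summary(lm(uptake ~ conc, data = d))))
separate$nonchilled
```

The standard errors from the separate fits should agree with the unpooled summary rather than the pooled one.
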

Then of course, we get down to the format of the output, etc.

:-)

Thanks Gabor,

Marc