Dear R listers --
The program below does the following tasks:
1. It creates a file (wintemp4) that is a subset of alldata4 consisting of
"winner" records in 50 industry groups (about 5400 obs);
2. It defines a function (myppr1) that runs the ppr function in modreg
once to generate goodness-of-fit measures (sums of squared errors) for each
number of terms in the model, and then reruns ppr using the number of terms
with the lowest sum of squared errors;
3. It grinds through a loop, subsetting wintemp4 by group and running
myppr1 for each group subset; and
4. It puts the ppr output into a separate vector element for each group
(in an attempt to avoid "growing" the vector).
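Steps 3 and 4 can be sketched on a built-in data set. This is only an illustration under stated assumptions: mtcars stands in for wintemp4, cyl for the group variable, and the hypothetical fit_one (a plain lm call) for myppr1. The size threshold is arbitrary.

```r
# Sketch only: mtcars stands in for wintemp4, cyl for group,
# and fit_one() is a hypothetical stand-in for myppr1().
fit_one <- function(d) lm(mpg ~ wt, data = d)

groups <- sort(unique(mtcars$cyl))

# Preallocate one list slot per group so the list never grows in the loop.
fits <- vector(mode = "list", length = length(groups))
names(fits) <- as.character(groups)

for (i in seq_along(groups)) {
  sub_i <- mtcars[mtcars$cyl == groups[i], ]
  # Skip groups that are too small to fit (the threshold is arbitrary here).
  if (nrow(sub_i) > 5) fits[[i]] <- fit_one(sub_i)
}

# Number of observations used for each fitted group (NA if a group was skipped)
n_used <- sapply(fits, function(f) if (is.null(f)) NA else nrow(f$model))
print(n_used)
```

Indexing the list with [[i]] assigns directly into the preallocated slot; skipped groups simply keep their initial NULL.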
I am using R version 1.2.2 in Emacs/ESS on Win98 with 256 MB of RAM.
I have two questions; I would be most grateful for any help the list can
provide:
A. This program *seems* to take a long time. I have been careful to free
as much memory as I can, and the gc()'s seem to help avoid using the
swapfile and to keep available system resources above 90%. Is there
anything else I can do to make the program more efficient?
B. I say "seems" because after running the program for an hour, I typed
Ctrl-G to quit. The *R* session seemed to be terminated, with about 40 or
so groups processed, so I opened up another R session to try to see what
had happened. After I quit the second session, suddenly the first session
seemed to come back to life and spit out the printed output for the rest of
the groups! So I wonder if there is something I need to add to my program
to "force" it to finish processing? (I apologize for the inarticulate way
I am posing this question!)
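Whether the delayed output comes from R, from Emacs/ESS, or from the interrupt is not clear from the description above. One way to check progress independently of the console is to append a line to a log file after each group. The sketch below assumes a log file name and a placeholder workload, neither of which is part of the original program. It also calls flush.console(), which exists in current R but may not have been available in R 1.2.2.

```r
# Sketch only: the log file name and the placeholder workload are assumptions.
logfile <- file.path(tempdir(), "ppr_progress.log")
if (file.exists(logfile)) file.remove(logfile)

for (i in 1:3) {
  Sys.sleep(0.1)                           # placeholder for the real per-group fit
  msg <- paste("finished group", i, "at", format(Sys.time()), "\n")
  cat(msg, file = logfile, append = TRUE)  # file is opened, written, and closed
  cat(msg)                                 # also echo to the console
  flush.console()                          # ask the console to show output now
}

log_lines <- readLines(logfile)
```

The log file can be read from another program while the loop is still running, so it shows how far the job has progressed even when the console appears frozen.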
Thanks in advance.
David N. Beede
Economist
Office of Policy Development
Economics and Statistics Administration
U.S. Department of Commerce
Room 4858 HCHB
14th Street and Pennsylvania Avenue, N.W.
Washington, DC 20230
Voice: 202.482.1226
Fax: 202.482.0325
e-mail: david.beede at mail.doc.gov
# Here is the program
for(i in 1:4) gc()
load("alldata4.Rdata")
# keep the "winner" records in industry groups 1 to 50
wintemp4 <- subset(alldata4, 1 <= group & group <= 50 & winner == 1)
rm(alldata4)
for(i in 1:4) gc()
library(modreg)
attach(wintemp4)
myppr1 <- function(x)
{
  # run ppr once to get the list of sums of squared errors corresponding
  # to different numbers of terms
  pprfile.ppr <- ppr(
    award ~
      ilogemp+ilogage+sdb+allsmall+
      size2+size3+size4+size5+size6+size7+size8+size9+size10+
      X.Iprimnaic.2+X.Iprimnaic.3+X.Iprimnaic.4+X.Iprimnaic.5+X.Iprimnaic.6+
      X.Iprimnaic.7+X.Iprimnaic.8+X.Iprimnaic.9+X.Iprimnaic.10+X.Iprimnaic.11+
      X.Iprimnaic.12+X.Iprimnaic.13+X.Iprimnaic.14+X.Iprimnaic.15+X.Iprimnaic.16+
      X.Iprimnaic.17+X.Iprimnaic.18+X.Iprimnaic.19+X.Iprimnaic.20+X.Iprimnaic.21+
      X.Iprimnaic.22+X.Iprimnaic.23+X.Iprimnaic.24+X.Iprimnaic.25+X.Iprimnaic.26,
    data=x, nterms=1, max.terms=min(nrow(x),40), optlevel=3
  )
  # pick the number of terms giving the best fit
  numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn>0])
  # rerun ppr with that number of terms
  pprfile.ppr <- ppr(
    award ~
      ilogemp+ilogage+sdb+allsmall+
      size2+size3+size4+size5+size6+size7+size8+size9+size10+
      X.Iprimnaic.2+X.Iprimnaic.3+X.Iprimnaic.4+X.Iprimnaic.5+X.Iprimnaic.6+
      X.Iprimnaic.7+X.Iprimnaic.8+X.Iprimnaic.9+X.Iprimnaic.10+X.Iprimnaic.11+
      X.Iprimnaic.12+X.Iprimnaic.13+X.Iprimnaic.14+X.Iprimnaic.15+X.Iprimnaic.16+
      X.Iprimnaic.17+X.Iprimnaic.18+X.Iprimnaic.19+X.Iprimnaic.20+X.Iprimnaic.21+
      X.Iprimnaic.22+X.Iprimnaic.23+X.Iprimnaic.24+X.Iprimnaic.25+X.Iprimnaic.26,
    data=x, nterms=numterm, max.terms=min(nrow(x),40), optlevel=3
  )
  cat("group =", x$group[1],"\n")
  cat("NAIC =", x$naic4[1],"\n")
  cat("cendiv =", as.character(x$cendiv[1]),"\n")
  cat("number of obs used =", nrow(x),"\n")
  print(summary(pprfile.ppr))
}
grouparr <- levels(as.factor(wintemp4$group))
# one preallocated list element per group
pprest <- vector(mode="list",length=length(grouparr))
for(i in seq(along=grouparr))
{
  subi <- subset(wintemp4, group==grouparr[i])
  # only fit groups with more than 40 observations
  if(nrow(subi) > 40) pprest[[i]] <- myppr1(subi)
  rm(subi)
  print(gc())
}
detach(wintemp4)
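The two-pass term selection used in myppr1 can be tried on a small built-in data set. The sketch below is an illustration, not the poster's model: mtcars, the formula, and max_terms = 4 are all stand-ins. In current versions of R, ppr is in the stats package rather than modreg. The index is taken through which() so that it stays a term count even if some leading entries of gofn are zero.

```r
# Sketch only: mtcars and this formula are stand-ins, not the poster's data.
# In current R, ppr() lives in the stats package (modreg in R 1.2.x).
max_terms <- 4

# Pass 1: fit with nterms = 1 so that gofn reports the residual sum of
# squares for the model sizes considered, up to max_terms.
fit1 <- ppr(mpg ~ wt + hp + disp, data = mtcars,
            nterms = 1, max.terms = max_terms)

# Pick the number of terms with the smallest positive goodness-of-fit value.
ok <- which(fit1$gofn > 0)
numterm <- ok[which.min(fit1$gofn[ok])]

# Pass 2: refit, keeping numterm terms.
fit2 <- ppr(mpg ~ wt + hp + disp, data = mtcars,
            nterms = numterm, max.terms = max_terms)

cat("terms kept:", numterm, "\n")
```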
2. How can one prevent "for loop" output data frame growth?
On p. 178 of "S Programming" by VR, there is a suggestion that it is more
efficient to create an object at least the size of the ultimate output
object, in order to avoid generating copies of the object at each iteration
of a for loop. This seems easy enough for a vector, as illustrated by VR.
However, it is not obvious to me how to do this for the data frame I wish
to
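The preallocation idea described above for vectors carries over to data frames roughly as follows. This is a minimal sketch with made-up column names and values. It shows two options: allocating the whole data frame up front, or filling preallocated vectors and combining them once at the end.

```r
# Sketch only: the column names and the per-iteration values are made up.
n <- 5

# Option 1: allocate the full data frame once, with the right column types.
out <- data.frame(id = integer(n), value = numeric(n), label = character(n),
                  stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  # Fill row i of each column instead of calling rbind() on every iteration.
  out$id[i]    <- i
  out$value[i] <- i^2
  out$label[i] <- paste("row", i)
}

# Option 2: fill preallocated vectors, then build the data frame once.
id <- integer(n)
value <- numeric(n)
for (i in seq_len(n)) {
  id[i] <- i
  value[i] <- i^2
}
out2 <- data.frame(id = id, value = value)
```

The second option is often the simpler one, since assignments into plain vectors are cheaper than assignments into data frame columns.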
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
efficiency and "forcing" questions
3 messages · david.beede@mail.doc.gov, A.J. Rossini, Brian Ripley
"db" == david beede <david.beede at mail.doc.gov> writes:
db> B. I say "seems" because after running the program for an
db> hour, I type ctl-G to quit. The *R* session seemed to be
db> terminated, with about 40 or so groups processed, so I opened
db> up another R session to try to see what had happened. After I
db> quit the second session, suddenly the first session seemed to
db> come back to life and spit out the printed output for the rest
db> of the groups! So I wonder if there is something I need to
db> add to my program to "force" it to finish processing? (I
db> apologize for the inarticulate way I am posing this question!)
This delay "might" be due to problems with Emacs.
(reason for cc'ing r-help) Is there anything comparable to "top" on
unix, for windows, so that you can track process status?
best,
-tony
A.J. Rossini                           Rsrch. Asst. Prof. of Biostatistics
UW Biostat/Center for AIDS Research    rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net    rossini at scharp.org
-------- (friday is unknown) --------
FHCRC: M--W : 206-667-7025 (fax=4812)|Voicemail is pretty sketchy
CFAR:  ??   : 206-731-3647 (fax=3694)|Email is far better than phone
UW:    Th   : 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
On 28 Mar 2001, A.J. Rossini wrote:
(reason for cc'ing r-help) Is there anything comparable to "top" on
unix, for windows, so that you can track process status?
Task Manager on NT/2000/XP: right-click the taskbar. On 95/98/Me, use wintop (a `powertoy'), which can be downloaded from Microsoft.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595