How can I avoid nested 'for' loops or quicken the process?
13 messages · Brigid Mooney, Daniel Nordlund, David Winsemius +4 more
On Dec 23, 2008, at 9:55 AM, Brigid Mooney wrote:
I have used some of your advice in making some changes to my
function and function call before reposting.
Instead of nesting many 'for' loops, I have gotten to the point
where I only have one. (Also, please note, I am pasting the
function calcProfit at the end of this message as it is a bit long.)
This process works correctly, but still has a 'for' loop, which I
thought I would be able to avoid with 'apply'.
------------------------------------------------------------------------------------------------------------------
# Sample iteration parameters (these can be vectors of arbitrary length)
# Need to iterate through all possible combinations of these parameters
Param <- list(long=c(.75, 1.5),
short=c(-.5, -1),
investment=10000,
stoploss=c(-.015),
comission=.0002,
penny=3,
volume=c(.02, .01),
numU=2,
accDefn=0:1 )
CombParam <- expand.grid(Param)
# Create sample X and Y data frames for function call
Y <- data.frame(SymbolID=10:14, OpeningPrice = c(1,3,10,20,60),
ClosingPrice = c(2,2.5,11,18,61.5), YesterdayClose= c(1,3,10,20,60),
MinTrVol = rep(10000000, times=5))
X <- data.frame(SymbolID=10:14, weight = c(1, .5, -3, -.75, 2),
CPweight=c(1.5, .25, -1.75, 2, -1), noU = c(2,3,4,2,10))
for (i in 1:length(CombParam$long))
{
if(i==1)
{ Results <- calcProfit(CombParam[i,], X, Y)
} else {
Results <- rbind(Results, calcProfit(CombParam[i,], X, Y))
}
}
------------------------------------------------------------------------------------------------------------------
However, when I try to replace this for loop with 'apply', I get the
following result:
Results2 <- apply(CombParam, 1, calcProfit, X, Y)
Error in IterParam$long : $ operator is invalid for atomic vectors
apply is giving calcProfit a named numeric vector, and calcProfit is then
trying to parse it with "$", which is an operator for lists. Try serial
corrections of the form:

    long <- IterParam["long"]

That seemed to let the interpreter move on to the next error ;-)

> Results2 <- apply(CombParam, 1, calcProfit, X, Y)
Error in IterParam$short : $ operator is invalid for atomic vectors
David Winsemius
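
To illustrate the point above, here is a minimal sketch with a toy parameter grid (the two column names come from the thread, the values and the spread calculation are made up for illustration). It shows that apply() hands each row to the function as a named numeric vector, that name-based [[ ]] indexing works where $ fails, and that looping over row indices is one way to keep passing a one-row data frame instead.

```r
# Toy parameter grid; names mirror the thread, values are arbitrary.
CombParam <- expand.grid(long = c(0.75, 1.5), short = c(-0.5, -1))

# apply() coerces the data frame to a matrix first, so each row arrives
# as a named numeric vector rather than a list or data frame.
rowClass <- apply(CombParam, 1, function(row) class(row))

# $ fails on an atomic vector; capture the error message instead of stopping.
msg <- tryCatch(apply(CombParam, 1, function(row) row$long),
                error = function(e) conditionMessage(e))

# Name-based indexing with [[ ]] works on atomic vectors.
spread <- apply(CombParam, 1, function(row) row[["long"]] - row[["short"]])

# Alternative: loop over row indices so each call still receives a
# one-row data frame and $ keeps working unchanged.
spread2 <- sapply(seq_len(nrow(CombParam)), function(i) {
  p <- CombParam[i, ]
  p$long - p$short
})
```

Which of the two routes is preferable probably depends on whether the function must also accept data-frame rows from other callers.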
>
> Any advice that anyone could provide would be much appreciated.
>
> Here is the function which I am using:
>
> ------------------------------------------------------------------------------------------------------------------
> calcProfit <- function(IterParam, marketData, dailyForecast) {
> long <- IterParam$long
> short <- IterParam$short
> investment <- IterParam$investment
> stoploss <- IterParam$stoploss
> comission <- IterParam$comission
> penny <- IterParam$penny
> volume <- IterParam$volume
> numU <- IterParam$numU
> accDefn <- IterParam$accDefn
>
> compareMarket <- merge(dailyForecast, marketData,
> by.x="SymbolID", by.y="SymbolID")
>
> weight <- ifelse(rep(accDefn, times=length(compareMarket
> $weight))==1, compareMarket$weight, compareMarket$CPweight)
>
> position <- ifelse((weight<=short & compareMarket$OpeningPrice >
> penny & compareMarket$noU>=numU), "S",
> ifelse((weight>=long & compareMarket$OpeningPrice > penny &
> compareMarket$noU>=numU), "L", NA))
> positionTF <- ifelse(position=="L" | position=="S", TRUE, FALSE)
>
> estMaxInv <- volume*compareMarket$MinTrVol*compareMarket
> $YesterdayClose
>
> investbySymbol <- ifelse(positionTF==TRUE, ifelse(estMaxInv >=
> investment, investment, 0))
>
> opClProfit <- ifelse(position=="L", compareMarket$ClosingPrice/
> compareMarket$OpeningPrice-1,
> ifelse(position=="S", 1-compareMarket
> $ClosingPrice/compareMarket$OpeningPrice, 0.0))
>
> Gains <- investbySymbol*ifelse(opClProfit <= stoploss, stoploss,
> opClProfit)
> ProfitTable <- data.frame(SymbolID=compareMarket$SymbolID,
> investbySymbol, Gains, percentGains=Gains/investbySymbol,
> LessComm=rep(comission,
> times=length(Gains)), NetGains=Gains/investbySymbol-2*comission)
>
> AggregatesTable <- data.frame( OutTotInvestment = sum(ProfitTable
> $investbySymbol, na.rm=TRUE),
> OutNumInvestments = sum(ProfitTable$investbySymbol,
> na.rm=TRUE)/investment, OutDolProf = sum(ProfitTable$Gains,
> na.rm=TRUE),
> OutPerProf = sum(ProfitTable$Gains, na.rm=TRUE)/
> sum(ProfitTable$investbySymbol, na.rm=TRUE),
> OutNetGains = sum(ProfitTable$Gains, na.rm=TRUE)/
> sum(ProfitTable$investbySymbol, na.rm=TRUE)-2*comission, OutLong =
> long,
> OutShort = short, OutInvestment = investment, OutStoploss =
> stoploss, OutComission = comission, OutPenny = penny, OutVolume =
> volume,
> OutNumU = numU, OutAccDefn = accDefn )
>
> return(AggregatesTable)
> }
>
> ------------------------------------------------------------------------------------------------------------------
>
>
>
> On Mon, Dec 22, 2008 at 4:32 PM, David Winsemius <dwinsemius at comcast.net
> > wrote:
> I do agree with Dr Berry that your question failed on several
> grounds in adherence to the Posting Guide, so this is off list.
>
> Maybe this will give you guidance that you can apply to your next
> question to the list:
>
> > alist <- list("a","b","c")
> > blist <- list("ab","ac","ad")
>
> > expand.grid(alist, blist)
> Var1 Var2
> 1 a ab
> 2 b ab
> 3 c ab
> 4 a ac
> 5 b ac
> 6 c ac
> 7 a ad
> 8 b ad
> 9 c ad
>
> > apply( expand.grid(alist, blist), 1, function(x) paste(x[1], x[2],
> sep=""))
> [1] "aab" "bab" "cab" "aac" "bac" "cac" "aad" "bad" "cad"
>
> > clist <- list("AA","BB")
>
> > apply(expand.grid(alist, blist, clist),1,function(x) paste(x[1],
> x[2], x[3], sep=""))
> [1] "aabAA" "babAA" "cabAA" "aacAA" "bacAA" "cacAA" "aadAA" "badAA"
> "cadAA" "aabBB"
> [11] "babBB" "cabBB" "aacBB" "bacBB" "cacBB" "aadBB" "badBB" "cadBB"
>
> > dlist <- list(TRUE,FALSE)
>
> > apply(expand.grid(alist, blist, clist, dlist),1,function(x)
> paste(x[1], x[2], x[3], (x[4]), sep=""))[8:12]
> [1] "badAATRUE" "cadAATRUE" "aabBBTRUE" "babBBTRUE" "cabBBTRUE"
>
>
> This could get unwieldy if the lengths of the lists are
> appreciable, since the number of rows will be the product of all the
> lengths. On the other hand you could create a dataframe indexed by
> the variables in expand.grid's output:
>
> > master.df <- data.frame( expand.grid(alist, blist, clist, dlist),
> results = apply(expand.grid(alist, blist,
> clist,dlist),1,
> function(x) paste(x[1], x[2],
> x[3], (x[4]), sep="")))
>
>
>
> --
> David Winsemius
>
> On Dec 22, 2008, at 3:33 PM, Charles C. Berry wrote:
>
> On Mon, 22 Dec 2008, Brigid Mooney wrote:
>
> Hi All,
>
> I'm still pretty new to using R - and I was hoping I might be able
> to get
> some advice as to how to use 'apply' or a similar function instead
> of using
> nested for loops.
>
> Unfortunately, you have given nothing that is reproducible.
>
> The details of MyFunction and the exact structure of the list
> objects are crucial.
>
> Check out the _Posting Guide_ for hints on how to formulate a
> question that will elicit an answer that helps you.
>
> HTH,
>
> Chuck
>
>
>
> Right now I have a script which uses nested for loops similar to this:
>
> i <- 1
> for(a in Alpha) { for (b in Beta) { for (c in Gamma) { for (d in
> Delta) {
> for (e in Epsilon)
> {
> Output[i] <- MyFunction(X, Y, a, b, c, d, e)
> i <- i+1
> }}}}}
>
>
> Where Output[i] is a data frame, X and Y are data frames, and Alpha,
> Beta,
> Gamma, Delta, and Epsilon are all lists, some of which are numeric,
> some
> logical (TRUE/FALSE).
>
> Any advice on how to implement some sort of solution that might be
> quicker
> than these nested 'for' loops would be greatly appreciated.
>
> Thanks!
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive
> Medicine
> E mailto:cberry at tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego
> 92093-0901
>
>
>
On Dec 23, 2008, at 10:56 AM, Brigid Mooney wrote:
Thank you again for your help.
snip
-------------
With the 'apply' call, Results2 is of class list.

Results2 <- apply(CombParam, 1, calcProfit, X, Y)
-------------------------------------------------------------------------------------------------------

How can I convert Results2 from a list to a data frame like Results?
Have you tried as.data.frame() on Results2? Each of its elements should
have the proper structure.

You no longer have a reproducible example, but see this session clip:

> lairq <- apply(airquality,1, function(x) x )
> str(lairq)
 num [1:6, 1:153] 41 190 7.4 67 5 1 36 118 8 72 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] "Ozone" "Solar.R" "Wind" "Temp" ...
  ..$ : NULL
> is.data.frame(lairq)
[1] FALSE
> is.data.frame(rbind(lairq))
[1] FALSE
> is.data.frame( as.data.frame(lairq) )
David Winsemius
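
For the list-to-data-frame question, one commonly used idiom is do.call(rbind, ...). The sketch below is not the thread's calcProfit: toyProfit and its three output columns are hypothetical stand-ins, assumed only to return a one-row data frame with the same columns on every call. It contrasts row-binding the list with as.data.frame(), which places the elements side by side.

```r
# Hypothetical stand-in for calcProfit(): one-row data frame per parameter row.
# Using [[ ]] drops the element names, so no stray row name is carried over.
toyProfit <- function(p) {
  data.frame(OutLong   = p[["long"]],
             OutShort  = p[["short"]],
             OutSpread = p[["long"]] - p[["short"]])
}

CombParam <- expand.grid(long = c(0.75, 1.5), short = c(-0.5, -1))

# Each result is a data frame, so apply() cannot simplify and returns a list.
Results2 <- apply(CombParam, 1, toyProfit)

# Stack the one-row data frames on top of each other: one row per combination.
Results <- do.call(rbind, Results2)

# as.data.frame() on the same list binds the pieces column-wise instead,
# which gives a single wide row with de-duplicated column names.
wide <- as.data.frame(Results2)

dim(Results)
dim(wide)
```

With many thousands of pieces, binding once at the end like this tends to be cheaper than calling rbind() inside the loop, though that is worth measuring rather than assuming.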
Avoiding multiple nested for loops (as requested in the subject) is usually a
good idea, especially if you can take advantage of vectorized functions. You
were able to redesign your code to use a single for loop. I presume there was a
substantial improvement in program speed. How much additional time is saved by
using apply to eliminate the final for loop? Is it worth the additional
programming time? Enquiring minds want to know. :-)

Dan

Daniel Nordlund
Bothell, WA USA
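
One way to answer the timing question empirically is system.time(). The sketch below uses a hypothetical stand-in (oneRow) rather than calcProfit, and an arbitrary 200-row grid, so the numbers it prints say nothing about the original function; it only shows how the two styles could be compared on the same input.

```r
# Hypothetical stand-in for one calcProfit() call: returns a one-row data frame.
oneRow <- function(p) data.frame(a = p$a, b = p$b, total = p$a + p$b)

grid <- expand.grid(a = 1:20, b = 1:10)   # 200 parameter combinations

# Version 1: grow the result with rbind() inside a for loop.
loopTime <- system.time({
  viaLoop <- NULL
  for (i in seq_len(nrow(grid))) {
    viaLoop <- rbind(viaLoop, oneRow(grid[i, ]))
  }
})

# Version 2: collect the pieces with lapply(), then bind once at the end.
applyTime <- system.time({
  pieces   <- lapply(seq_len(nrow(grid)), function(i) oneRow(grid[i, ]))
  viaApply <- do.call(rbind, pieces)
})

# Both versions should compute the same values.
stopifnot(all.equal(viaLoop$total, viaApply$total))

# Elapsed seconds; these depend on the machine and the R version.
print(c(loop = loopTime[["elapsed"]], lapply = applyTime[["elapsed"]]))
```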
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Brigid Mooney
Sent: Tuesday, December 23, 2008 8:36 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] How can I avoid nested 'for' loops or
quicken the process?
--------------------------------------------------------------
----------------------------------------------------------------
Problem Description: (reproducible code below)
--------------------------------------------------------------
----------------------------------------------------------------
I cannot seem to get as.data.frame() to work as I would expect.
Results2 seems to contain repeated column titles for each
row, as well as a
row name 'investment' (which is not intended), like:
Results2
[[1]]
OutTotInvestment OutNumInvestments OutDolProf OutPerProf
OutNetGains OutLong OutShort OutInvestment OutStoploss
OutComission OutPenny
OutVolume OutNumU OutAccDefn
investment 30000 3 -450 -0.015
-0.0154 0.75 -0.5 10000 -0.015 2e-04
3 0.02 2 0
[[2]]
OutTotInvestment OutNumInvestments OutDolProf OutPerProf
OutNetGains OutLong OutShort OutInvestment OutStoploss
OutComission OutPenny
OutVolume OutNumU OutAccDefn
investment 30000 3 -450 -0.015
-0.0154 1.5 -0.5 10000 -0.015 2e-04
3 0.02 2 0
...
When I try to apply 'as.data.frame', it concatenates
incremental numbers to
the repeated row headers and gives:
as.data.frame(Results2)
OutTotInvestment OutNumInvestments OutDolProf OutPerProf
OutNetGains OutLong OutShort OutInvestment OutStoploss
OutComission OutPenny
OutVolume OutNumU OutAccDefn
investment 30000 3 -450 -0.015
-0.0154 0.75 -0.5 10000 -0.015 2e-04
3 0.02 2 0
OutTotInvestment.1 OutNumInvestments.1
OutDolProf.1 OutPerProf.1
OutNetGains.1 OutLong.1 OutShort.1 OutInvestment.1 OutStoploss.1
OutComission.1 OutPenny.1
investment 30000 3 -450
-0.015 -0.0154 1.5 -0.5 10000
-0.015 2e-04 3
OutVolume.1 OutNumU.1 OutAccDefn.1 OutTotInvestment.2
OutNumInvestments.2 OutDolProf.2 OutPerProf.2 OutNetGains.2 OutLong.2
OutShort.2 OutInvestment.2
investment 0.02 2 0
30000 3 -450 -0.015 -0.0154
0.75 -1 10000
...
which is a data frame of dimension 1 224, when I am looking
for a data frame
like Results of dimension 16 14.
--------------------------------------------------------------
----------------------------------------------------------------
Reproducible code:
--------------------------------------------------------------
----------------------------------------------------------------
# --------------------------------------------------------------
# FUNCTION calcProfit
# --------------------------------------------------------------
calcProfit <- function(IterParam, marketData, dailyForecast)
# , long, short, investment, stoploss, comission, penny, volume, numU, accDefn)
{
if (class(IterParam) == "numeric")
{
long <- IterParam["long"]
short <- IterParam["short"]
investment <- IterParam["investment"]
stoploss <- IterParam["stoploss"]
comission <- IterParam["comission"]
penny <- IterParam["penny"]
volume <- IterParam["volume"]
numU <- IterParam["numU"]
accDefn <- IterParam["accDefn"]
} else {
long <- IterParam$long
short <- IterParam$short
investment <- IterParam$investment
stoploss <- IterParam$stoploss
comission <- IterParam$comission
penny <- IterParam$penny
volume <- IterParam$volume
numU <- IterParam$numU
accDefn <- IterParam$accDefn
}
compareMarket <- merge(dailyForecast, marketData, by.x="SymbolID",
by.y="SymbolID")
weight <- ifelse(rep(accDefn,
times=length(compareMarket$weight))==1,
compareMarket$weight, compareMarket$CPweight)
position <- ifelse((weight<=short &
compareMarket$OpeningPrice > penny &
compareMarket$noU>=numU), "S",
ifelse((weight>=long & compareMarket$OpeningPrice > penny &
compareMarket$noU>=numU), "L", NA))
positionTF <- ifelse(position=="L" | position=="S", TRUE, FALSE)
estMaxInv <-
volume*compareMarket$MinTrVol*compareMarket$YesterdayClose
investbySymbol <- ifelse(positionTF==TRUE, ifelse(estMaxInv >=
investment, investment, 0))
opClProfit <- ifelse(position=="L",
compareMarket$ClosingPrice/compareMarket$OpeningPrice-1,
ifelse(position=="S",
1-compareMarket$ClosingPrice/compareMarket$OpeningPrice, 0.0))
Gains <- investbySymbol*ifelse(opClProfit <= stoploss, stoploss,
opClProfit)
ProfitTable <- data.frame(SymbolID=compareMarket$SymbolID,
investbySymbol, Gains, percentGains=Gains/investbySymbol,
LessComm=rep(comission, times=length(Gains)),
NetGains=Gains/investbySymbol-2*comission)
AggregatesTable <- data.frame( OutTotInvestment =
sum(ProfitTable$investbySymbol, na.rm=TRUE),
OutNumInvestments = sum(ProfitTable$investbySymbol,
na.rm=TRUE)/investment, OutDolProf = sum(ProfitTable$Gains,
na.rm=TRUE),
OutPerProf = sum(ProfitTable$Gains,
na.rm=TRUE)/sum(ProfitTable$investbySymbol, na.rm=TRUE),
OutNetGains = sum(ProfitTable$Gains,
na.rm=TRUE)/sum(ProfitTable$investbySymbol,
na.rm=TRUE)-2*comission, OutLong
= long,
OutShort = short, OutInvestment = investment, OutStoploss =
stoploss, OutComission = comission, OutPenny = penny,
OutVolume = volume,
OutNumU = numU, OutAccDefn = accDefn )
return(AggregatesTable)
}
# Sample iteration parameters (these can be vectors of arbitrary length)
# Need to iterate through all possible combinations of these parameters
Param <- list(long=c(.75, 1.5),
short=c(-.5, -1),
investment=10000,
stoploss=c(-.015),
comission=.0002,
penny=3,
volume=c(.02, .01),
numU=2,
accDefn=0:1 )
CombParam <- expand.grid(Param)
# Create sample X and Y data frames for function call
Y <- data.frame(SymbolID=10:14, OpeningPrice =
c(1,3,10,20,60), ClosingPrice
= c(2,2.5,11,18,61.5), YesterdayClose= c(1,3,10,20,60), MinTrVol =
rep(10000000, times=5))
X <- data.frame(SymbolID=10:14, weight = c(1, .5, -3, -.75, 2),
CPweight=c(1.5, .25, -1.75, 2, -1), noU = c(2,3,4,2,10))
for (i in 1:length(CombParam$long))
{
if(i==1)
{ Results <- calcProfit(CombParam[i,], X, Y)
} else {
Results <- rbind(Results, calcProfit(CombParam[i,], X, Y))
}
}
Results2 <- apply(CombParam, 1, calcProfit, X, Y)
--------------------------------------------------------------
----------------------------------------------------------------
On Tue, Dec 23, 2008 at 11:15 AM, David Winsemius
<dwinsemius at comcast.net>wrote:
On Dec 23, 2008, at 10:56 AM, Brigid Mooney wrote: Thank you again for your help.
snip
-------------
With the 'apply' call, Results2 is of class list. Results2 <- apply(CombParam, 1, calcProfit, X, Y)
-------------------------------------------------------------- -----------------------------------------
How can I convert Results2 from a list to a data frame
like Results?
Have you tried as.data.frame() on Results2? Each of its
elements should
have the proper structure. You no longer have a reproducible example, but see this
session clip:
lairq <- apply(airquality,1, function(x) x )
str(lairq)
num [1:6, 1:153] 41 190 7.4 67 5 1 36 118 8 72 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6] "Ozone" "Solar.R" "Wind" "Temp" ...
..$ : NULL
is.data.frame(lairq)
[1] FALSE
is.data.frame(rbind(lairq))
[1] FALSE
is.data.frame( as.data.frame(lairq) )
-- David Winsemius
FWIW: Good advice below! -- after all, the first rule of optimizing code is:
Don't!

For the record (yet again), the apply() family of functions (and their
packaged derivatives, of course) are "merely" very carefully written for()
loops: their main advantage is in code readability, not in efficiency gains,
which may well be small or nonexistent. True efficiency gains require
"vectorization", which essentially moves the for() loops from interpreted code
to (underlying) C code (on the underlying data structures): e.g. compare
rowMeans() [vectorized] with ave() or apply(..,1,mean).

Cheers,
Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Tuesday, December 23, 2008 10:01 AM
To: r-help at r-project.org
Subject: Re: [R] How can I avoid nested 'for' loops or quicken the process?

Avoiding multiple nested for loops (as requested in the subject) is usually a
good idea, especially if you can take advantage of vectorized functions. You
were able to redesign your code to use a single for loop. I presume there was a
substantial improvement in program speed. How much additional time is saved by
using apply to eliminate the final for loop? Is it worth the additional
programming time? Enquiring minds want to know. :-)

Dan

Daniel Nordlund
Bothell, WA USA
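
The rowMeans() versus apply(.., 1, mean) comparison mentioned above can be tried directly. This is only a sketch on an arbitrary random matrix (the size and seed are made up); it checks that the two approaches agree and times them, without claiming any particular speed ratio.

```r
set.seed(1)                        # arbitrary seed, only for repeatability
m <- matrix(rnorm(2000 * 50), nrow = 2000)

viaApply  <- apply(m, 1, mean)     # R-level loop: mean() is called once per row
viaVector <- rowMeans(m)           # single call; the loop runs in compiled code

# The two results should agree up to floating-point tolerance.
stopifnot(all.equal(viaApply, viaVector))

# Timings vary by machine, so no numbers are quoted here.
tApply  <- system.time(for (k in 1:20) apply(m, 1, mean))
tVector <- system.time(for (k in 1:20) rowMeans(m))
print(c(apply = tApply[["elapsed"]], rowMeans = tVector[["elapsed"]]))
```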
I have to agree with Daniel Nordlund regarding not creating subsidiary
problems when the main problem has been cracked.

Nonetheless, ... might you be happier with the result of changing the last
data.frame() call in calcProfit to c()? I get a matrix:

> str(Results2)
 num [1:14, 1:16] 3.00e+04 3.00 -4.50e+02 -1.50e-02 -1.54e-02 7.50e-01 -5.00e-01 1.00e+04 -1.50e-02 2.00e-04 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:14] "OutTotInvestment" "OutNumInvestments.investment" "OutDolProf" "OutPerProf" ...
  ..$ : NULL

... if you go along with that strategy, then I think it is possible that you
really want as.data.frame( t( Results2)), since the rows and columns seem to
be transposed from what I would have wanted.

Now, ... your next task is to set up your mail-client so it sends unformatted
text to R-help.
David Winsemius
Heritage Labs
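
The c()-then-transpose strategy described above can be sketched with a small stand-in function. toyProfit and its three outputs are hypothetical, not the thread's calcProfit; the only assumption carried over is that the function returns a named numeric vector of the same length on every call.

```r
# Hypothetical stand-in for calcProfit() whose last step uses c()
# instead of data.frame(), so it returns a named numeric vector.
toyProfit <- function(p) {
  c(OutLong   = p[["long"]],
    OutShort  = p[["short"]],
    OutSpread = p[["long"]] - p[["short"]])
}

CombParam <- expand.grid(long = c(0.75, 1.5), short = c(-0.5, -1))

# Equal-length atomic results are simplified to a matrix with one COLUMN
# per parameter combination and one row per output name.
Results2 <- apply(CombParam, 1, toyProfit)
dim(Results2)

# Transpose so that each combination becomes a row, then convert.
Results <- as.data.frame(t(Results2))
dim(Results)
```

Extracting the parameters with [[ ]] rather than [ ] drops their names, which is probably what avoids compound names such as "OutNumInvestments.investment" in the result.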
On Dec 23, 2008, at 11:36 AM, Brigid Mooney wrote:
> ------------------------------------------------------------------------------------------------------------------------------
> Problem Description: (reproducible code below)
> ------------------------------------------------------------------------------------------------------------------------------
> I cannot seem to get as.data.frame() to work as I would expect.
>
> Results2 seems to contain repeated column titles for each row, as
> well as a row name 'investment' (which is not intended), like:
> Results2
> [[1]]
> OutTotInvestment OutNumInvestments OutDolProf OutPerProf
> OutNetGains OutLong OutShort OutInvestment OutStoploss OutComission
> OutPenny OutVolume OutNumU OutAccDefn
> investment 30000 3 -450
> -0.015 -0.0154 0.75 -0.5 10000 -0.015
> 2e-04 3 0.02 2 0
> [[2]]
> OutTotInvestment OutNumInvestments OutDolProf OutPerProf
> OutNetGains OutLong OutShort OutInvestment OutStoploss OutComission
> OutPenny OutVolume OutNumU OutAccDefn
> investment 30000 3 -450
> -0.015 -0.0154 1.5 -0.5 10000 -0.015
> 2e-04 3 0.02 2 0
> ...
>
> When I try to apply 'as.data.frame', it concatenates incremental
> numbers to the repeated row headers and gives:
> as.data.frame(Results2)
> OutTotInvestment OutNumInvestments OutDolProf OutPerProf
> OutNetGains OutLong OutShort OutInvestment OutStoploss OutComission
> OutPenny OutVolume OutNumU OutAccDefn
> investment 30000 3 -450
> -0.015 -0.0154 0.75 -0.5 10000 -0.015
> 2e-04 3 0.02 2 0
> OutTotInvestment.1 OutNumInvestments.1 OutDolProf.1
> OutPerProf.1 OutNetGains.1 OutLong.1 OutShort.1 OutInvestment.1
> OutStoploss.1 OutComission.1 OutPenny.1
> investment 30000 3 -450
> -0.015 -0.0154 1.5 -0.5 10000
> -0.015 2e-04 3
> OutVolume.1 OutNumU.1 OutAccDefn.1 OutTotInvestment.2
> OutNumInvestments.2 OutDolProf.2 OutPerProf.2 OutNetGains.2 OutLong.
> 2 OutShort.2 OutInvestment.2
> investment 0.02 2 0
> 30000 3 -450 -0.015
> -0.0154 0.75 -1 10000
> ...
>
> which is a data frame of dimension 1 224, when I am looking for a
> data frame like Results of dimension 16 14.
>
>
> ------------------------------------------------------------------------------------------------------------------------------
> Reproducible code:
> ------------------------------------------------------------------------------------------------------------------------------
> # --------------------------------------------------------------
> # FUNCTION calcProfit
> # --------------------------------------------------------------
> calcProfit <- function(IterParam, marketData, dailyForecast)
> # , long, short, investment, stoploss, comission, penny, volume, numU, accDefn)
> {
>   # IterParam is a named numeric vector when called through apply(),
>   # and a one-row data frame when called with CombParam[i, ].
>   if (class(IterParam) == "numeric")
>   {
>     long <- IterParam["long"]
>     short <- IterParam["short"]
>     investment <- IterParam["investment"]
>     stoploss <- IterParam["stoploss"]
>     comission <- IterParam["comission"]
>     penny <- IterParam["penny"]
>     volume <- IterParam["volume"]
>     numU <- IterParam["numU"]
>     accDefn <- IterParam["accDefn"]
>   } else {
>     long <- IterParam$long
>     short <- IterParam$short
>     investment <- IterParam$investment
>     stoploss <- IterParam$stoploss
>     comission <- IterParam$comission
>     penny <- IterParam$penny
>     volume <- IterParam$volume
>     numU <- IterParam$numU
>     accDefn <- IterParam$accDefn
>   }
>   compareMarket <- merge(dailyForecast, marketData,
>                          by.x="SymbolID", by.y="SymbolID")
>
>   weight <- ifelse(rep(accDefn, times=length(compareMarket$weight))==1,
>                    compareMarket$weight, compareMarket$CPweight)
>
>   position <- ifelse((weight<=short & compareMarket$OpeningPrice > penny &
>                       compareMarket$noU>=numU), "S",
>               ifelse((weight>=long & compareMarket$OpeningPrice > penny &
>                       compareMarket$noU>=numU), "L", NA))
>   positionTF <- ifelse(position=="L" | position=="S", TRUE, FALSE)
>
>   estMaxInv <- volume*compareMarket$MinTrVol*compareMarket$YesterdayClose
>
>   # positionTF is TRUE or NA here; the explicit NA supplies the 'no'
>   # argument that the outer ifelse() was missing.
>   investbySymbol <- ifelse(positionTF==TRUE,
>                            ifelse(estMaxInv >= investment, investment, 0), NA)
>
>   opClProfit <- ifelse(position=="L",
>                        compareMarket$ClosingPrice/compareMarket$OpeningPrice-1,
>                 ifelse(position=="S",
>                        1-compareMarket$ClosingPrice/compareMarket$OpeningPrice, 0.0))
>
>   Gains <- investbySymbol*ifelse(opClProfit <= stoploss, stoploss, opClProfit)
>
>   ProfitTable <- data.frame(SymbolID=compareMarket$SymbolID,
>                             investbySymbol, Gains,
>                             percentGains=Gains/investbySymbol,
>                             LessComm=rep(comission, times=length(Gains)),
>                             NetGains=Gains/investbySymbol-2*comission)
>
>   AggregatesTable <- data.frame(
>     OutTotInvestment = sum(ProfitTable$investbySymbol, na.rm=TRUE),
>     OutNumInvestments = sum(ProfitTable$investbySymbol, na.rm=TRUE)/investment,
>     OutDolProf = sum(ProfitTable$Gains, na.rm=TRUE),
>     OutPerProf = sum(ProfitTable$Gains, na.rm=TRUE)/
>                  sum(ProfitTable$investbySymbol, na.rm=TRUE),
>     OutNetGains = sum(ProfitTable$Gains, na.rm=TRUE)/
>                   sum(ProfitTable$investbySymbol, na.rm=TRUE)-2*comission,
>     OutLong = long, OutShort = short, OutInvestment = investment,
>     OutStoploss = stoploss, OutComission = comission, OutPenny = penny,
>     OutVolume = volume, OutNumU = numU, OutAccDefn = accDefn )
>
>   return(AggregatesTable)
> }
>
>
> # Sample iteration parameters (these can be vectors of arbitrary length)
> # Need to iterate through all possible combinations of these parameters
> Param <- list(long=c(.75, 1.5),
> short=c(-.5, -1),
> investment=10000,
> stoploss=c(-.015),
> comission=.0002,
> penny=3,
> volume=c(.02, .01),
> numU=2,
> accDefn=0:1 )
> CombParam <- expand.grid(Param)
>
> # Create sample X and Y data frames for function call
> Y <- data.frame(SymbolID=10:14, OpeningPrice = c(1,3,10,20,60),
> ClosingPrice = c(2,2.5,11,18,61.5), YesterdayClose= c(1,3,10,20,60),
> MinTrVol = rep(10000000, times=5))
> X <- data.frame(SymbolID=10:14, weight = c(1, .5, -3, -.75, 2),
> CPweight=c(1.5, .25, -1.75, 2, -1), noU = c(2,3,4,2,10))
>
>
> for (i in 1:length(CombParam$long))
> {
> if(i==1)
> { Results <- calcProfit(CombParam[i,], X, Y)
> } else {
> Results <- rbind(Results, calcProfit(CombParam[i,], X, Y))
> }
> }
>
>
> Results2 <- apply(CombParam, 1, calcProfit, X, Y)
>
> ------------------------------------------------------------------------------------------------------------------------------
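The class check at the top of calcProfit is needed because apply() converts the data frame to a matrix and passes each row to the function as a named atomic vector, on which `$` fails. The following is a minimal sketch of that behaviour, assuming a hypothetical two-column grid in place of CombParam; the names `grid`, `rowKinds`, `byName` and `byDollar` are illustrative only.

```r
# Hypothetical two-parameter grid, standing in for CombParam.
grid <- expand.grid(long = c(0.75, 1.5), accDefn = 0:1)

# apply() coerces the data frame with as.matrix(), so each row arrives
# as a named numeric vector rather than as a one-row data frame.
rowKinds <- apply(grid, 1, function(p) class(p))

# Name-based `[` indexing works on such a vector ...
byName <- apply(grid, 1, function(p) p["long"])

# ... whereas `$` raises an error, which is captured here as a string.
byDollar <- tryCatch(apply(grid, 1, function(p) p$long),
                     error = function(e) conditionMessage(e))

print(rowKinds)
print(byDollar)
```

Iterating over row indices instead, with CombParam[i, ], keeps each row as a one-row data frame, so the `$` branch is the one that runs.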
>
>
> On Tue, Dec 23, 2008 at 11:15 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Dec 23, 2008, at 10:56 AM, Brigid Mooney wrote:
>
> Thank you again for your help.
>
> snip
>
>
>
> -------------
> With the 'apply' call, Results2 is of class list.
>
> Results2 <- apply(CombParam, 1, calcProfit, X, Y)
> -------------------------------------------------------------------------------------------------------
>
> How can I convert Results2 from a list to a data frame like
> Results?
>
> Have you tried as.data.frame() on Results2? Each of its elements
> should have the proper structure.
>
> You no longer have a reproducible example, but see this session clip:
> > lairq <- apply(airquality,1, function(x) x )
> > str(lairq)
> num [1:6, 1:153] 41 190 7.4 67 5 1 36 118 8 72 ...
> - attr(*, "dimnames")=List of 2
> ..$ : chr [1:6] "Ozone" "Solar.R" "Wind" "Temp" ...
> ..$ : NULL
> > is.data.frame(lairq)
> [1] FALSE
> > is.data.frame(rbind(lairq))
> [1] FALSE
> > is.data.frame( as.data.frame(lairq) )
> --
> David Winsemius
>
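When every element of the list is a one-row data frame with the same columns, one common way to stack them is do.call(rbind, ...). The sketch below is not the poster's code: `toyProfit` is a hypothetical stand-in for calcProfit, and the grid is a reduced, made-up version of CombParam. It loops over row indices with lapply() so that each call receives a one-row data frame.

```r
# Hypothetical stand-in for calcProfit(): takes a one-row data frame of
# parameters and returns a one-row data frame of results.
toyProfit <- function(p) {
  data.frame(OutLong = p$long, OutShort = p$short,
             OutSpread = p$long - p$short)
}

# Reduced, made-up parameter grid.
grid <- expand.grid(long = c(0.75, 1.5), short = c(-0.5, -1))

# One result per parameter combination, collected in a list ...
resList <- lapply(seq_len(nrow(grid)), function(i) toyProfit(grid[i, ]))

# ... then stacked into a single data frame in one step, instead of
# growing Results with rbind() inside a for loop.
Results <- do.call(rbind, resList)
print(Results)
```

The same do.call(rbind, ...) call should also work on a list of data frames such as Results2, provided all elements share the same column names.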
2 days later
Bert Gunter <gunter.berton <at> gene.com> writes:
FWIW: Good advice below! -- after all, the first rule of optimizing code is: Don't! For the record (yet again), the apply() family of functions (and their packaged derivatives, of course) are "merely" very carefully written for() loops: their main advantage is in code readability, not in efficiency gains, which may well be small or nonexistent. True efficiency gains require "vectorization", which essentially moves the for() loops from interpreted code to (underlying) C code (on the underlying data structures): e.g. compare rowMeans() [vectorized] with ave() or apply(..,1,mean).
[...] The apply functions do bring speed advantages. This is not only what I have read; I have used the apply functions and really did get results faster. The reason is simple: an apply function does in C what would otherwise be done at the R level with for loops. Ciao, Oliver
On Thu, 25 Dec 2008, Oliver Bandel wrote:
Bert Gunter <gunter.berton <at> gene.com> writes:
FWIW: Good advice below! -- after all, the first rule of optimizing code is: Don't! For the record (yet again), the apply() family of functions (and their packaged derivatives, of course) are "merely" very carefully written for() loops: their main advantage is in code readability, not in efficiency gains, which may well be small or nonexistent. True efficiency gains require "vectorization", which essentially moves the for() loops from interpreted code to (underlying) C code (on the underlying data structures): e.g. compare rowMeans() [vectorized] with ave() or apply(..,1,mean).
[...] The apply functions do bring speed advantages. This is not only what I have read; I have used the apply functions and really did get results faster. The reason is simple: an apply function does in C what would otherwise be done at the R level with for loops.
Not true of apply(): true of lapply() and hence sapply(). I'll leave you to check eapply, mapply, rapply, tapply. So the issue is what is meant by 'the apply() family of functions': people often mean *apply(), of which apply() is an unusual member, if one at all. [Historical note: a decade ago lapply was internally a for() loop. I rewrote it in C in 2000: I also moved apply to C at the same time but it proved too little an advantage and was reverted. The speed of lapply comes mainly from reduced memory allocation: for() is also written in C.]
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
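The comparison Bert Gunter suggests, rowMeans() against apply(.., 1, mean), can be tried directly. This is a rough sketch on an arbitrary random matrix; the matrix size and the variable names are assumptions, and no timings are shown because they depend on the R version and the machine.

```r
set.seed(42)
M <- matrix(runif(5000 * 20), nrow = 5000)

# Row means via apply(): an R-level loop that calls mean() once per row.
tApply <- system.time(viaApply <- apply(M, 1, mean))

# Row means via rowMeans(): the loop over rows runs in compiled code.
tVec <- system.time(viaRowMeans <- rowMeans(M))

# Both approaches agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(viaApply, viaRowMeans)))

# User, system and elapsed times for the two approaches.
print(rbind(apply = tApply[1:3], rowMeans = tVec[1:3]))
```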
Prof Brian Ripley wrote:
On Thu, 25 Dec 2008, Oliver Bandel wrote:
....
The apply functions do bring speed advantages. This is not only what I have read; I have used the apply functions and really did get results faster. The reason is simple: an apply function does in C what would otherwise be done at the R level with for loops.
Not true of apply(): true of lapply() and hence sapply(). I'll leave you to check eapply, mapply, rapply, tapply. So the issue is what is meant by 'the apply() family of functions': people often mean *apply(), of which apply() is an unusual member, if one at all.
Conceptually, I think it belongs there. apply(M,1,max) is similar to tapply(M,row(M),max), etc. The "apply-functions" share a general split-operate-reassemble set of semantics, and apply _could_ be implemented as splitting by indices in MARGIN, followed by lapply, followed by reassembly into a matrix, as in tapply(). In reality, apply() is implemented differently, using aperm() and direct indexing. This is more efficient, but it shouldn't necessarily change the way in which we think about it. It is a bit unfortunate that the most complex member of the family has gotten the most basic name, though.
Peter Dalgaard, Øster Farimagsgade 5, Entr.B, Dept. of Biostatistics, PO Box 2099, 1014 Cph. K, University of Copenhagen, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalgaard at biostat.ku.dk)
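The split-operate-reassemble view described above can be checked on a small matrix. This sketch compares apply() with the tapply() call from the message and with an explicit split()/lapply()/unlist() chain; the matrix values and variable names are arbitrary choices for illustration.

```r
# Small example matrix, filled column by column.
M <- matrix(c(3, 1, 4, 1, 5, 9, 2, 6, 5), nrow = 3)

# Row maxima computed three ways.
viaApply  <- apply(M, 1, max)
viaTapply <- tapply(M, row(M), max)

# Split the elements by row index, operate with lapply(),
# and reassemble the pieces with unlist().
viaSplit <- unlist(lapply(split(M, row(M)), max))

# The values agree; only the names and attributes differ.
stopifnot(all(viaApply == as.vector(viaTapply)),
          all(viaApply == unname(viaSplit)))
print(viaApply)
```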
Thank you for the clarification, Brian. This is very helpful (as usual). However, I think the important point, which I misstated, is that whether it be for() or, e.g., lapply(), the "loop" contents must be evaluated at the interpreted R level, and this is where most time is typically spent. To get the speedup that most people hope for, the key is to avoid the loop altogether, i.e. to move the loop **and** the evaluations to C level via R programming, e.g. by using matrix operations, indexing, or built-in .Internal functions. Please correct me if I'm (even partially) wrong. As you know, the issue arises frequently.
-- Bert Gunter, Genentech
On Fri, 26 Dec 2008, Bert Gunter wrote:
Thank you for the clarification, Brian. This is very helpful (as usual). However, I think the important point, which I misstated, is that whether it be for() or, e.g., lapply(), the "loop" contents must be evaluated at the interpreted R level, and this is where most time is typically spent. To get the speedup that most people hope for, the key is to avoid the loop altogether, i.e. to move the loop **and** the evaluations to C level via R programming, e.g. by using matrix operations, indexing, or built-in .Internal functions. Please correct me if I'm (even partially) wrong. As you know, the issue arises frequently.
'Typically' is not the whole story. In a loop like
Y <- double(length(X))
for(i in seq_along(X)) Y[i] <- fun(X[i])
quite a lot of time and memory may be spent in re-allocating Y at each
step of the loop, and lapply() is able to avoid that. E.g.
X <- runif(1e6)
system.time({
  Y <- double(length(X))
  for(i in seq_along(X)) Y[i] <- sin(X[i])
})
takes 5.2 secs vs unlist(lapply(X, sin)) which takes 1.5. Of course,
using the vectorized function sin() takes 0.05 sec. If you use sapply you
will lose all the gain.
This is not a typical example, but it arises often enough to make it
worthwhile having an optimized lapply().
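The three variants timed above can be run side by side. This is a sketch only: it uses a smaller vector than the 1e6 elements in the message so that it finishes quickly, the variable names are illustrative, and the timings quoted above date from 2008 and will differ on current R versions and hardware.

```r
set.seed(1)
X <- runif(1e5)   # smaller than the 1e6 used above, to keep the run short

# 1. Explicit for() loop filling a preallocated vector.
tLoop <- system.time({
  Y1 <- double(length(X))
  for (i in seq_along(X)) Y1[i] <- sin(X[i])
})

# 2. lapply() over the elements, flattened with unlist().
tLapply <- system.time(Y2 <- unlist(lapply(X, sin)))

# 3. A single call to the vectorized sin().
tVec <- system.time(Y3 <- sin(X))

# All three produce the same values.
stopifnot(identical(Y1, Y2), identical(Y1, Y3))

# User, system and elapsed times for each variant.
print(rbind(loop = tLoop[1:3], lapply = tLapply[1:3], vectorized = tVec[1:3]))
```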
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595