Inline
On Tue, Feb 14, 2012 at 3:16 AM, Nerak T <nerak.t at hotmail.com> wrote:
Dear Ilai, Thanks for your answer. I'm indeed kind of a beginner in R, starting to discover the endless possibilities in R. My goal for the moment is indeed to get rid of the use of loops and to see through the apply family (which seems to be an endless maze for the moment). (For some reason, the apply function doesn't seem logical to my brain, which prefers to think in a loop way)
?apply ?lapply ?tapply, etc. are just wrappers for building more
efficient loops. If you "think in loops" (which you shouldn't) you are
also thinking in "apply". The reason it may seem like an endless maze
is because you use different wrappers for looping over different
object classes and indices, but at the end, the call to apply() is
similar to calling for().
e.g. consider a matrix with dimensions n x p. To sum rows you could
for(i in 1:n) sum(matrix[i,])
But better apply(matrix, 1 , sum) # where the 1 denotes the 1st
dimension (rows)
Same thing for sum columns
for(j in 1:p) sum(matrix[,j])
But better apply(matrix, 2 , sum) # where the 2 denotes the 2nd
dimension (columns)
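To make the comparison concrete, here is a small sketch you can paste into a session. The matrix `m` and its values are made up for illustration (they are not from your data), and unlike the one-liners above the loop stores its result so the two versions can be compared. rowSums() and colSums() are the built-in shortcuts for this particular case.

```r
# Hypothetical 3 x 4 matrix; `m` is an illustrative name, not from the thread
m <- matrix(1:12, nrow = 3, ncol = 4)
n <- nrow(m)

# Loop version: the result has to be stored explicitly
row_sums_loop <- numeric(n)
for (i in seq_len(n)) {
  row_sums_loop[i] <- sum(m[i, ])
}

# apply version: 1 = rows, 2 = columns
row_sums_apply <- apply(m, 1, sum)
col_sums_apply <- apply(m, 2, sum)

# Built-in vectorised equivalents for sums
row_sums_fast <- rowSums(m)
col_sums_fast <- colSums(m)
```

All three row results should agree, and the same holds for the two column results.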
The same "stuff" that goes in the loop can go to apply with small
syntax changes. e.g.
ll<- list(1:5,1:10,letters)
out<- list()
for(L in 1:3){
# ...
# ... a bunch of complicated functions/calculations
# ...
out[[ L ]] <- length( ll[[ L ]] )
}
out
Can be replaced with
lapply( ll, function(L) {
# ...
# ... a bunch of complicated function/calculations
# ...
length(L)
} )
This time use lapply since you are looping over L elements of the list ll.
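As a quick check that the two forms really give the same thing, here is a stripped-down sketch with the "complicated" part left out. The vapply() call is my addition, not part of the example above: it is one way to get a plain vector back instead of a list, assuming every result is a single integer.

```r
ll <- list(1:5, 1:10, letters)

# Loop with explicit storage, as in the example above
out <- list()
for (L in seq_along(ll)) {
  out[[L]] <- length(ll[[L]])
}

# lapply builds the same list without the storage object
out_lapply <- lapply(ll, length)

# vapply returns a plain vector and checks each result is one integer
out_vapply <- vapply(ll, length, integer(1))

identical(out, out_lapply)
```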
Your answer is really helpful. Something I found really interesting is your general comment. You say that I don't need to declare variables? The reason I started to do this is because if I don't, I get a message that the object is not found.
???
xx <- c(0,10,100) # declare xx
print(xx)
rnorm(3, xx) # use it
rm(xx) # remove xx
rnorm(3, xx <- c(0,10,100)) # define and use it "at the same time"
print(xx)
If I create a data frame before the calculation with the right amount of rows, it seems to work. But if there is a way not to have to make them before, would be great?
Only in loops. What happens is you need to create a "storage" for the result of your loop, since the objects created in the loop are overwritten at each step:
for(i in 1:10) cat( i, '\n' )
i # i = 10 everything before was overwritten
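A minimal sketch of the storage point, using i^2 as a stand-in for whatever you actually compute at each step (the names `last`, `squares` and `squares_sapply` are made up for illustration):

```r
# Without storage, only the value from the final iteration survives
for (i in 1:10) last <- i^2
last

# With storage created before the loop, every step is kept
squares <- numeric(10)
for (i in 1:10) squares[i] <- i^2

# sapply collects the results for you, so no storage object is needed
squares_sapply <- sapply(1:10, function(i) i^2)
```

Here `last` ends up holding only the result for i = 10, while `squares` and `squares_sapply` hold all ten values.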
Because most of the time, I don't need that column that I created (but couldn't create an empty data frame with right dimensions to solve the problem)...
See my lapply example for creating "empty" storage.
Last, in R you want to avoid loops as much as possible especially for large data sets. Operations are performed on objects so 1:4 + 1 is equivalent to for(i in 1:4) i+1
This part I don't understand... where do you put that "1:4 + 1"?
You don't put it anywhere, it was in answer to your comment: "but it's not that I have created a function that has to be applied on a whole column, calculations are done for the different rows..." So, no! calculations are done on the object (which has some dimension or is a list), only in rare cases do you need to loop over each element (or dimension) of the object itself.
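In case it helps to see it run, here is a small sketch of what "operations are performed on objects" means in practice. The vector `counts` and its values are hypothetical, only there to show a comparison applied to a whole column at once.

```r
x <- 1:4

# Vectorised: the addition is applied to every element of x in one call
y_vec <- x + 1

# Loop equivalent, which needs storage and an index
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 1
}

# Same idea for a comparison on a whole column (hypothetical values)
counts <- c(0, 5, 2, 0)
above_one <- counts > 1        # logical vector, one entry per row
n_above_one <- sum(above_one)  # TRUE counts as 1, so this counts the rows
```

So a test like "is this value greater than 1" never needs to visit the rows one by one.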
Many thanks, I'm trying to learn as much as possible to be able to use R more efficiently so I really appreciate your help.
Pleasure. Good luck !
Kind regards, Nerak
Date: Mon, 13 Feb 2012 23:43:29 -0700 Subject: Re: [R] different way for a for loop for several columns? From: keren at math.montana.edu To: nerak.t at hotmail.com
Nerak, Your example could have been done without a loop at all (at least this calculation), or as you already know by calling one of the apply family functions which are more efficient (but are still "loops"):
test<- data.frame(
  Date=c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983),
  C = c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),
  B = c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),
  F = c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0)
)
test.2 <- test[,-1] > 1
aggregate(test.2, list(test$Date), sum)
# See ?aggregate for more details. it also has a time series method which may be useful for you.
A general comment. If you are or will be using R a bit more, it may benefit you to study the manuals or find a good basic tutorial. You seem to be applying the conventions of some other programming language and that's slowing you down. e.g. you don't need to declare variables, so all this stuff before your loop is unnecessary:
Year<-data.frame(Date)
test.1<-data.frame(c(1980:1983))
test.4<-data.frame(c(1:4))
Also 1:4 is equivalent to data.frame(c(1:4)) without the extra attributes.
Last, in R you want to avoid loops as much as possible especially for large data sets. Operations are performed on objects so 1:4 + 1 is equivalent to for(i in 1:4) i+1
Bottom line, the inner loops, calls to which, all that stuff...
Hope that helps.
        test.3<-test.2[which(Year$Date== y)]
        test.4$length[y-Year[1,]+1]<-length(which(test.3>0))
        }
test.1<-cbind(test.1, test.4$length)
}
names(test.1)<-c("year","C","B","F")
test.1
You can see that it will take a lot of time for more objects and years. A problem is that the for (y in 1980:1983) { } takes a lot of time because "[which(Year$Date== y)]" is used several times and it takes a lot of time to search through all the rows. And then, all of this has to be repeated several times for the different objects.
But actually, exactly the same calculations have to be made for all the objects. Only the input data are different (calculations are made with the values of the corresponding columns of data frame test). I thought it could be faster to calculate each step of the inner loop (for (y in 1980:1983)) at the same time for each object. So for example: now, test.2<-ifelse(test[,l]>1,1,0) is first calculated for year 1980 for object 1, then for year 1981 for object 1 and so on for all the years; this is all repeated for the different objects. I'm looking for a way to calculate test.2<-ifelse(test[,l]>1,1,0) first for all the objects for year 1980, then for all the objects for year 1981 and so on.
Does somebody know a way to do this? I was thinking about some kind of form of apply, but it's not that I have created a function that has to be applied on a whole column, calculations are done for the different rows...
Many thanks for your help!
Kind regards,
Nerak
--
View this message in context:
http://r.789695.n4.nabble.com/different-way-for-a-for-loop-for-several-columns-tp4385705p4385705.html
Sent from the R help mailing list archive at Nabble.com.