Using a by() function to process several regression (lm()) functions
On Thu, Nov 5, 2009 at 11:15 PM, Marc Los Huertos <mloshuertos at csumb.edu> wrote:
Hi Charlie, Wow, I like this approach and see the problem with my list of lm objects. It does not work well. You have created a list of the values of interest, which seems obvious in hindsight, but extracting the values with the do.call(rbind()) bit is certainly outside my experience. I'll have to look at do.call() and see if I can reverse engineer what it is doing...always more to do! :-)
do.call() is an incredibly useful function. I was able to do data processing much more efficiently after I found it. Basically, it takes two main arguments: a function and a list. The function is called with the elements of the list as its arguments. Since rbind() and cbind() accept any number of objects, combining do.call() with either one is a quick way to collapse a list into a matrix or data.frame. If the function takes named arguments, such as pf(), do.call() will match the names in the list with the names of the arguments. This is the reason for all the monkey business I pulled by:
1. Extracting the parameters of the F-distribution from a summary of the linear model.
2. Converting them from a vector to a list.
3. Renaming them so that they matched the arguments to pf().
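The three steps above can be sketched roughly as follows. This is not the code from earlier in the thread, only a minimal illustration: the data frame dat, its columns site, x and y, and the names fits, pfArgs and results are all made up for the example.

```r
# Hypothetical example data: two sites, a predictor x and a response y.
set.seed(1)
dat <- data.frame(
  site = rep(c("A", "B"), each = 10),
  x    = rep(1:10, times = 2)
)
dat$y <- 2 * dat$x + rnorm(20)

# Fit one model per site, keeping only the values of interest
# rather than the whole lm object.
fits <- by(dat, dat$site, function(d) {
  linMod <- lm(y ~ x, data = d)
  # Step 1: F statistic and its degrees of freedom (named value, numdf, dendf).
  fstat <- summary(linMod)$fstatistic
  # Steps 2 and 3: vector -> list, renamed to match pf()'s q, df1, df2.
  pfArgs <- setNames(as.list(fstat), c("q", "df1", "df2"))
  pfArgs$lower.tail <- FALSE
  data.frame(
    slope   = unname(coef(linMod)["x"]),
    p.value = do.call(pf, pfArgs)  # same as pf(q = ..., df1 = ..., df2 = ..., lower.tail = FALSE)
  )
})

# do.call(rbind, list(a, b)) is equivalent to rbind(a, b), so this
# collapses the per-site results into a single data.frame.
results <- do.call(rbind, fits)
print(results)
```

Returning a one-row data.frame from each group is one design choice among several; a named vector would also work, in which case rbind() produces a matrix instead.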
Another suggestion was to use this to extract the p-value: anova(linMod)$'Pr(>F)'[1], which seems more straightforward. Do you see any reason why this should be a problem? It seems to work fine when I inserted it into your code.
This looks like a much more efficient method!
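As a quick check, the two routes can be compared on a small made-up data set (the data and the names d, pAnova and pOverall are assumptions for illustration, not values from this thread). For a model with a single predictor they should agree:

```r
# Hypothetical data, for illustration only.
set.seed(42)
d <- data.frame(x = 1:12)
d$y <- 0.5 * d$x + rnorm(12)
linMod <- lm(y ~ x, data = d)

# One-line extraction from the ANOVA table (row 1 is the term x).
pAnova <- anova(linMod)$"Pr(>F)"[1]

# The longer route through the overall F statistic of the model.
fstat <- summary(linMod)$fstatistic
pOverall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
               lower.tail = FALSE)

print(all.equal(pAnova, pOverall))
```

One caveat: with more than one term in the model, element [1] of the anova() table is the sequential test for the first term only, which is not the same thing as the overall F test reported by summary().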
However, the plyr package seems best for solving the other problem of trying to extract my date and site information, which I need to run the rest of the analysis (i.e. the treatment difference, which is the whole point!). I am disappointed I didn't find it after a few hours of searching, but that is another issue. Do you have any idea why the function has a syntax that includes dots for each argument, e.g. .data and .fun? I am sure there is some logic, but I didn't find a reference in the help. Perhaps this convention is not important, but it does raise the question for me...
I believe this is just the convention that Hadley decided to use in the plyr package. Another incredibly useful package of his that you may want to check out for data processing is 'reshape'. It is based on plyr and uses some of the same conventions. You can find good documentation, examples and papers concerning his R packages on his website: http://had.co.nz/
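For what it's worth, here is a rough sketch of how ddply() from plyr keeps the grouping columns alongside the result. It assumes plyr is installed, and the data frame dat with columns site, date, x and y is invented for the example. One plausible benefit of the dotted names is that they are unlikely to clash with arguments passed through to the function given in .fun, though that is my reading rather than something stated in this thread.

```r
# Runs only if plyr is available; the data below are hypothetical.
if (requireNamespace("plyr", quietly = TRUE)) {
  set.seed(7)
  dat <- expand.grid(
    rep  = 1:6,
    site = c("A", "B"),
    date = c("2009-06-01", "2009-07-01"),
    stringsAsFactors = FALSE
  )
  dat$x <- rnorm(nrow(dat))
  dat$y <- 1 + 0.8 * dat$x + rnorm(nrow(dat))

  # Split by site and date, fit a model to each piece, and return one row
  # per piece; ddply() adds the site and date columns back automatically.
  pvals <- plyr::ddply(
    .data      = dat,
    .variables = c("site", "date"),
    .fun       = function(d) {
      linMod <- lm(y ~ x, data = d)
      data.frame(p.value = anova(linMod)$"Pr(>F)"[1])
    }
  )
  print(pvals)
}
```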
Thank you very much! I appreciate the diverse set of solutions, I am sure I'll find use for each of them... cheers, marc
No problem, have fun busting that data apart! -Charlie