
Using functions/loops for repetitive commands

10 messages · Gerrit Eichner, Shekhar, dereksloan +1 more

#
I still need to do some repetitive statistical analysis on some outcomes
from a dataset.

Take the following as an example;

id  sex  hiv  age  famsize  bmi  resprate
1   M    Pos  23   2        16   15
2   F    Neg  24   5        18   14
3   F    Pos  56   14       23   24
4   F    Pos  67   3        33   31
5   M    Neg  34   2        21   23

I want to know if there are statistically detectable differences in all of
the continuous variables in my data set when subdivided by sex or hiv status
(i.e. are age, family size, bmi and resprate different in my male and female
patients or in hiv pos/neg patients).
Of course I can use Wilcoxon or t-tests e.g:

wilcox.test( age~sex)
wilcox.test(famsize~sex) 
wilcox.test(bmi~sex)
wilcox.test(resprate~sex)
wilcox.test( age~hiv)
wilcox.test(famsize~hiv) 
wilcox.test(bmi~hiv)
wilcox.test(resprate~hiv)

but there must be some easy way of looping/automating this code (i.e. get
all the continuous variables analysed one by one by sex, then analysed one
by one by hiv status).
Obviously my actual dataset is considerably bigger than what is shown here -
I have many variables to assess making the longhand instruction to do every
test pretty unsatisfactory.

I think I can use "for" or some other looping command for this purpose but I
can't work out how. I think I don't properly understand how loops work yet
as I'm still quite new to R.

Please could someone help, ideally with an explanation and some quick
sample code?

Derek


--
View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498006.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello, Derek,

see below.
On Thu, 5 May 2011, dereksloan wrote:

            
Define, e. g.,

my.wilcox.tests <- function( var.names, groupvar.name, data) {
  lapply( var.names,
          function( v) {
           form <- as.formula( paste( v, "~", groupvar.name))
           wilcox.test( form, data = data)
           } )
  }


and call something like

my.wilcox.tests( <character vector with relevant variable names>,
                 <character string with relevant grouping variable>,
                 data = <your data set as data frame>)

Caveat: untested!
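
A minimal worked sketch of the call above, assuming the five example rows from the original post are held in a data frame named dat (a hypothetical name). Two details are additions, not part of the code above: names(res) labels each result with its variable name, and exact = FALSE requests the normal approximation so that R does not complain about tied values (famsize has a tie).

```r
# Example data copied from the original question; 'dat' is a hypothetical name
dat <- data.frame(
  id       = 1:5,
  sex      = c("M", "F", "F", "F", "M"),
  hiv      = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  age      = c(23, 24, 56, 67, 34),
  famsize  = c(2, 5, 14, 3, 2),
  bmi      = c(16, 18, 23, 33, 21),
  resprate = c(15, 14, 24, 31, 23)
)

my.wilcox.tests <- function(var.names, groupvar.name, data) {
  res <- lapply(var.names, function(v) {
    # Build e.g. age ~ sex from the two character strings
    form <- as.formula(paste(v, "~", groupvar.name))
    wilcox.test(form, data = data, exact = FALSE)
  })
  names(res) <- var.names   # addition: label each test by its variable
  res
}

cont.vars <- c("age", "famsize", "bmi", "resprate")
by.sex <- my.wilcox.tests(cont.vars, "sex", dat)
by.hiv <- my.wilcox.tests(cont.vars, "hiv", dat)

# Collect just the p-values into a named numeric vector
p.sex <- sapply(by.sex, function(tst) tst$p.value)
```

Each element of by.sex is an ordinary "htest" object, so by.sex$age prints the same output as wilcox.test(age ~ sex, data = dat, exact = FALSE).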

  Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
#
Hi Derek,
You can do your looping in any of the following ways:
(a) use a for loop
(b) use a while loop
(c) use lapply, tapply, or sapply. (I feel lapply is the elegant
way.)


---------------For Loop-----------------------------
"for" loops are pretty simple to use and work much as they do in other
scripting languages you may know (I am thinking of Matlab).

(Example 1) Let's say you know that you have to run 10 iterations. Then
you can write

for(i in 1:10) print(i)
# prints the numbers from 1 to 10

(Example 2) You don't know how many iterations you need to run. The only
thing you have is some vector and you want to do some operation on
that vector. You can do something like this:

myVector<-c(20,45,23,45,89)
for(i in seq_along(myVector)) print(myVector[i])
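
Applied to the question at hand, a for loop can step through column names instead of numbers. This is only a sketch: dat is a hypothetical data frame holding three columns of the example data, and exact = FALSE is my addition to request the normal approximation on such a tiny sample.

```r
# Three columns of the example data; 'dat' is a hypothetical name
dat <- data.frame(
  sex = c("M", "F", "F", "F", "M"),
  age = c(23, 24, 56, 67, 34),
  bmi = c(16, 18, 23, 33, 21)
)

vars <- c("age", "bmi")

# Pre-allocate a named vector to hold one p-value per variable
p.values <- numeric(length(vars))
names(p.values) <- vars

for (v in vars) {
  form <- as.formula(paste(v, "~ sex"))          # e.g. age ~ sex
  tst <- wilcox.test(form, data = dat, exact = FALSE)
  p.values[v] <- tst$p.value                     # store the result by name
}

print(p.values)
```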

-------------Using lapply-------------------------
In "lapply" you need to provide mainly two things:
(1) First parameter: a vector or list (for example, a sequence of numbers)
(2) Second parameter: a function, either user-defined or built in.

lapply will call the function once for every element given in the
first parameter and return the results as a list.

For example:

x<-c(10,20,20)
lapply(seq_along(x), function(i) {
  # your logic
})

Note that the first parameter I passed is seq_along(x), which evaluates
to 1, 2, 3. lapply takes each of these numbers in turn and calls the
function with it. For the current data that means three calls,
something like this:

function(1) { # your logic }
function(2) { # your logic }
function(3) { # your logic }

That means your logic inside the function will be executed for each
and every value specified in the first parameter of the lapply
function.
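
To make the description above concrete, here is a small sketch in which "your logic" is simply doubling the i-th element. The doubling is an arbitrary choice for illustration.

```r
x <- c(10, 20, 20)

# The function is called with i = 1, then i = 2, then i = 3;
# lapply collects the three return values in a list
res <- lapply(seq_along(x), function(i) x[i] * 2)

# sapply does the same but simplifies the result to a plain vector
res.vec <- sapply(seq_along(x), function(i) x[i] * 2)
```

Here res is a list of length 3 and res.vec is the numeric vector 20 40 40.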

I hope it helps you in some way.

For your problem, I am guessing that you are using a data frame or
matrix to store the data and that you want to automate the analysis, right?
You can try using "lapply"; I think that would be efficient. Let me
also try.

Regards,
Som Shekhar
#
Your code may be untested but it works - also helping me slowly to start
understanding how to write functions. Thank you.

However I still have difficulty. I also have some categorical variables to
analyse by sex & hiv status - i.e. my dataset expands to (for example);

id  sex  hiv  age  famsize  bmi  resprate  smoker  alcohol
1   M    Pos  23   2        16   15        Y       Y
2   F    Neg  24   5        18   14        Y       Y
3   F    Pos  56   14       23   24        Y       N
4   F    Pos  67   3        33   31        N       N
5   M    Neg  34   2        21   23        N       N


Using the template for the code you sent me I thought I could analyse the
categorical variables by sex & hiv status using a chi-squared test;

Long-hand this would be;

chisq.test(smoker,sex)
chisq.test(alcohol,sex)
chisq.test(smoker,hiv)
chisq.test(alcohol,hiv)

Again I wanted to use a function to automate it with a loop and thought I
could write;

categ<-c(smoker,alcohol)
group.name<-c(sex,hiv)
bl.chisq<-function(categ,group.name,<dataframe name>){
lapply(categ,
function(y){
form2<-as.formula(paste(y,group.name))
chisq.test(form2,<dataframe name>)
})
}

bl.chisq(categ,group.name,<data frame name>)

but I get an error message:

Error in parse(text = x) : unexpected symbol in "smoker sex"

What is wrong with the code? Is it because the wilcox.test takes a formula
(with a ~ symbol for modelling) whilst the chisq.test simply requires me to
list raw data? If so how can I change my code to automate the chisq.test in
the same way I did for the wilcox.test?

Many thanks for any help!

Derek 




#
On May 5, 2011, at 10:01 AM, dereksloan wrote:

            
I haven't tested it, but I suspect you failed to note that Eichner put
"~" between the two names in his paste() call inside as.formula().
David Winsemius, MD
West Hartford, CT
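
A short sketch of why the "~" matters, using the variable names from the question. paste() joins its arguments with a space by default, so without the "~" the resulting string is not a formula and cannot be parsed. The tryCatch wrapper is only there to capture the error message instead of stopping.

```r
without.tilde <- paste("smoker", "sex")        # "smoker sex"
with.tilde    <- paste("smoker", "~", "sex")   # "smoker ~ sex"

# A string containing "~" converts to a formula
ok <- as.formula(with.tilde)

# A string without "~" fails to parse; capture the message for inspection
bad <- tryCatch(as.formula(without.tilde),
                error = function(e) conditionMessage(e))
```

After running this, ok is a formula object and bad holds the text of the parse error.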
#
Thanks David,

I did notice that and I got his code to work using wilcox.test for the
continuous variables.

The problem is that when I tried to alter the code to do chisq.test on my
categorical variables there is something wrong with the syntax and I don't
know what.

Derek

#
On May 5, 2011, at 1:08 PM, dereksloan wrote:

            
Right....
 > ?chisq.test
# No mention of a formula argument seen
 > ?chisq.test.formula
No documentation for 'chisq.test.formula' in specified packages and  
libraries:
you could try '??chisq.test.formula'

`chisq.test` doesn't have a formula method, so sending it a formula  
will fail.

Why aren't you sending it the arguments instead of turning them into  
strings?
David Winsemius, MD
West Hartford, CT
#
Thanks a lot,

I understand what you say but I'm having problems - maybe with the syntax or
the specific command.

You are right - I have a dataframe to store the data and want to automate
the analysis.

i.e. I want to do a chisq.test to find out if alcohol intake (Y/N) differs
between sexes, then if smoking (Y/N) differs between sexes, then if alcohol
intake or smoking differ by hiv status.

The command within my data frame for each individual comparison is e.g.

chisq.test(alcohol,sex)... then repeat it for all combinations of variables.

but using lapply I'm still unsure how to design the loop.

I'll keep trying - let me know if you have more ideas.

Derek


#
On May 5, 2011, at 1:45 PM, dereksloan wrote:

            
I don't generally answer questions that support shotgun approaches to
manufacturing p-values for fear of encouraging unprincipled
data-mining ... unless it is clear that the questioner understands what he
is doing from a statistical point of view. So my apologies. I
probably shouldn't have even posted in this case. I misunderstood the
question and thought it was just a quick syntactic fix. I now
understand it to be more involved, and it really demands more care and
respect than I was giving it.

  
    
#
Hello, Derek,

first of all, be very aware of what David Winsemius said; you are about to 
enter the area of "unprincipled data-mining" (as he called it) with its 
trap -- one of many -- of multiple testing. So, *if* you know what the 
consequences and possible remedies are, a purely R-syntactic "solution" to 
your problem might be the (again not fully tested) hack below.
Try

lapply( <your_data_frame>[<selection_of_relevant_components>],
         function( y)
          chisq.test( y, <your_data_frame>$<group_name>)
       )

or even shorter:

lapply( <your_data_frame>[<selection_of_relevant_components>],
         chisq.test, <your_data_frame>$<group_name>
       )


However, in the resulting output you will not be seeing the names of the 
variables that went into the first argument of chisq.test(). This is a 
little bit more complicated to resolve:

lapply( names( <your_data_frame>[<selection_of_relevant_components>]),
         function( y)
          eval( substitute( chisq.test( <your_data_frame>$y0,
                                        <your_data_frame>$<group_name>),
                            list( y0 = y) ) )
        )
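
As a concrete sketch of the idea above, the following covers every combination of categorical variable and grouping variable from the question. It takes a slightly different route from the eval/substitute version: columns are picked by name with [[ ]] and the result lists are labelled with names(). The data frame name dat and the helper name bl.chisq (borrowed from the question) are assumptions, and suppressWarnings() is only there because five rows are far too few for the chi-squared approximation, which R would otherwise warn about.

```r
# Categorical columns of the example data; 'dat' is a hypothetical name
dat <- data.frame(
  sex     = c("M", "F", "F", "F", "M"),
  hiv     = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  smoker  = c("Y", "Y", "Y", "N", "N"),
  alcohol = c("Y", "Y", "N", "N", "N")
)

categ  <- c("smoker", "alcohol")   # variables to test
groups <- c("sex", "hiv")          # grouping variables

# One chisq.test per categorical variable, for a single grouping variable
bl.chisq <- function(var.names, groupvar.name, data) {
  res <- lapply(var.names, function(v) {
    # Pass the two columns themselves, not a formula
    suppressWarnings(chisq.test(data[[v]], data[[groupvar.name]]))
  })
  names(res) <- var.names
  res
}

# Outer loop over the grouping variables gives a nested, named list
all.tests <- lapply(groups, function(g) bl.chisq(categ, g, dat))
names(all.tests) <- groups

# Individual results are then reachable by name
p.smoker.hiv <- all.tests$hiv$smoker$p.value
```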



Still another possibility is to use xtabs() (with its summary-method) 
which has a formula argument.
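
A sketch of the xtabs() route, assuming a data frame named dat (hypothetical) with the categorical columns from the question. Because xtabs() takes a one-sided formula, the paste/as.formula trick used for wilcox.test carries over. Note that the summary method reports a chi-squared test of independence without continuity correction, so for 2 x 2 tables its numbers can differ from the default chisq.test() output.

```r
# Categorical columns of the example data; 'dat' is a hypothetical name
dat <- data.frame(
  sex     = c("M", "F", "F", "F", "M"),
  hiv     = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  smoker  = c("Y", "Y", "Y", "N", "N"),
  alcohol = c("Y", "Y", "N", "N", "N")
)

categ <- c("smoker", "alcohol")

tabs.by.sex <- lapply(categ, function(v) {
  form <- as.formula(paste("~", v, "+ sex"))   # e.g. ~ smoker + sex
  summary(xtabs(form, data = dat))             # includes the chi-squared test
})
names(tabs.by.sex) <- categ

s <- tabs.by.sex$smoker
```

Each summary carries the test statistic, degrees of freedom and p-value as s$statistic, s$parameter and s$p.value.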


  Hoping that you know what to do with the results  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
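
On the multiple-testing trap mentioned above: one commonly used remedy is to adjust the whole collection of p-values together with p.adjust() from the stats package. The sketch below uses made-up p-values purely for illustration; they are not results from the example data, and the choice of adjustment method is a statistical decision this sketch does not make for you.

```r
# Hypothetical raw p-values, for illustration only
p.raw <- c(age = 0.01, famsize = 0.04, bmi = 0.03, resprate = 0.20)

# Holm's step-down method controls the family-wise error rate
p.holm <- p.adjust(p.raw, method = "holm")

# Bonferroni is simpler and more conservative
p.bonf <- p.adjust(p.raw, method = "bonferroni")

print(round(cbind(p.raw, p.holm, p.bonf), 3))
```

With these inputs the Holm-adjusted values are 0.04, 0.09, 0.09 and 0.20, and the Bonferroni-adjusted values are 0.04, 0.16, 0.12 and 0.80.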