Skip to content
Back to formatted view

Raw Message

Message-ID: <AANLkTi=xstnKhGsC_Jyx8Dh9E2fJ87GFjwYyAg2+oM=v@mail.gmail.com>
Date: 2010-12-23T05:53:41Z
From: Gabor Grothendieck
Subject: Seeking feedback on my first attempt at R programming
In-Reply-To: <398784.63903.qm@web57007.mail.re3.yahoo.com>

On Wed, Dec 22, 2010 at 4:26 PM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> Hello Everyone,
>
> Below is my first attempt at R programming. The code replicates example 5.1 from Common Statistical Methods for Clinical Research with SAS Examples. I was hoping that people more experienced than myself would be willing to take a look and let me know what I did well and what could have been done better. I'd be particularly grateful if anyone could tell me why the two user defined functions commented out at the bottom of the code don't work whereas the one I ultimately went with did.
>
> Thanks,
>
> Paul
>
> ######################################
> #### Chapter 5: The Two-Sample T-Test ####
> ######################################
>
> #### Example 5.1: Two-Sample t-Test using FEV1 Changes Data ####
>
> connection <- textConnection("
> 101 A 1.35 <NA>?? 103 A 3.22 3.55?? 106 A 2.78 3.15
> 108 A 2.45 2.30?? 109 A 1.84 2.37?? 110 A 2.81 3.20
> 113 A 1.90 2.65?? 116 A 3.00 3.96?? 118 A 2.25 2.97
> 120 A 2.86 2.28?? 121 A 1.56 2.67?? 124 A 2.66 3.76
> 102 P 3.01 3.90?? 104 P 2.24 3.01?? 105 P 2.25 2.47
> 107 P 1.65 1.99?? 111 P 1.95 <NA>?? 112 P 3.05 3.26
> 114 P 2.75 2.55?? 115 P 1.60 2.20?? 117 P 2.77 2.56
> 119 P 2.06 2.90?? 122 P 1.71 <NA>?? 123 P 3.54 2.92
> ")
>
> fev <- data.frame(scan(connection,
> list(patno = 0, trtgrp = "", fev0 = 0, fev6 = 0),?na.strings="<NA>"))
>
> fev <- within(fev,trtgrp <- factor(trtgrp, labels = c("ABC-123","Placebo")))
> fev <- within(fev,chg <- fev6 - fev0)
>
> summaryfn <- function(x)
> data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> t.value=mean(x)/(sd(x)/sqrt(sum(complete.cases(x)))),
> p.value=2*pt(-abs(mean(x)/(sd(x)/sqrt(sum(complete.cases(x))))),df=sum(complete.cases(x))-1))
>
> cfev <- na.omit(fev)
> by(cfev[3:5],cfev[2],summaryfn)
>
> fligner.test(chg ~ trtgrp, data=fev)
> t.test(chg ~ trtgrp, data=fev, var.equal=TRUE)
>
> #summaryfn <- function(x)
> #?? data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> #????t.value=t.test(x)$statistic,
> #????p.value=t.test(x)$p.value)
> #cfev <- na.omit(fev)
> #by(cfev[3:5],cfev[2],summaryfn)
>
> #summaryfn <- function(x)
> #?? data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> #????t.value=as.numeric(t.test(x)[1]),
> #????p.value=as.numeric(t.test(x)[3]))
> #cfev <- na.omit(fev)
> #by(cfev[3:5],cfev[2],summaryfn)
>


1.  Better to stay away from within unless you really need it.   Use
with or transform:

fev <- transform(fev,
  trtgrp = factor(trtgrp, labels = c("ABC-123","Placebo")),
  chg = fev6 - fev0
)

2. mean and sd act column-wise but t.test does not.   A minimal change
to make it work would be to add an apply to the t.value and p.value:

summaryfn2a <- function(x)
   data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
    t.value=apply(x, 2, function(x) (t.test)(x)$statistic),
    p.value=apply(x, 2, function(x) (t.test)(x)$p.value))
cfev <- na.omit(fev)
by(cfev[3:5],cfev[2],summaryfn2a)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com