R-help Digest, Vol 30, Issue 22 - R-help

A Mani

Mon, Aug 22, 2005 6:54 AM

Re: A. Mani : Avoiding loops (Petr Pikal)

Message: 9
Date: Mon, 22 Aug 2005 06:40:45 +0200
From: "Petr Pikal" <petr.pikal at precheza.cz>
Subject: Re: [R] A. Mani : Avoiding loops
To: "A. Mani" <a_mani_sc_gs at vsnl.net>, r-help
 <r-help at stat.math.ethz.ch>

On 20 Aug 2005 at 3:26, A. Mani wrote:

On Friday 19 August 2005 11:54, Sean O'Riordain wrote:

Hi,
I'm not sure what you actually want from your email (following the
posting guide is a good way of helping you explain things to the
rest of us in a way we understand - it might even answer your
question!).

I'm only a beginner at R so no doubt one of our expert colleagues
will help me...

fred <- data.frame()
fred <- edit(fred)
fred

  A B C D E
1 1 2 X Y 1
2 2 3 G L 1
3 3 1 G L 5

fred[,3]

[1] X G G
Levels: G X

fred[fred[,3]=="G",]

  A B C D E
2 2 3 G L 1
3 3 1 G L 5

so at this point I can create a new dataframe with column 3 (C) ==
"G"; either explicitly or implicitly...

and if I want to calculate the sum() of column E, then I just say
something like...

sum(fred[fred[,3]=="G",][,5])

[1] 6


now naturally being a bit clueless at manipulating stuff in R, I
didn't know how to do this before I started... and you guys only get
to see the lines that I typed in and got a "successful" result...

according to section 6 of the "Introduction to R" manual which comes
with R, I could also have said

sum(fred[fred$C=="G",]$E)

[1] 6
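
Because edit() is interactive, the session above cannot be replayed as typed. A minimal sketch, assuming the three rows shown were entered by hand, that builds the same data frame with data.frame() and checks that the two subsetting forms agree (the with() variant is an extra, not part of the original message):

```r
# Rebuild the example data frame without the interactive editor
fred <- data.frame(A = c(1, 2, 3),
                   B = c(2, 3, 1),
                   C = c("X", "G", "G"),
                   D = c("Y", "L", "L"),
                   E = c(1, 1, 5))

# Subsetting by column position and by column name select the same rows
by_position <- sum(fred[fred[, 3] == "G", ][, 5])
by_name     <- sum(fred[fred$C == "G", ]$E)

# with() evaluates the expression inside the data frame,
# so the name "fred" does not have to be repeated
by_with <- with(fred, sum(E[C == "G"]))

print(c(by_position, by_name, by_with))
```

All three expressions index column E with the logical vector C == "G", so they return the same sum.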

Hmmm.... I wonder would it be reasonable to put an example of this
type into section 2.7 of the "Introduction to R"?


cheers!
Sean

On 18/08/05, A. Mani <a_mani_sc_gs at vsnl.net> wrote:

Hello,
        I want to avoid loops in the following situation. There is a
5-col dataframe with col headers alone. Two of the columns are
non-numeric. The problem is to calculate statistics (scores) for
each element of one column. The functions depend on matching in
the other non-numeric column.

A  B  C  E  F
1  2  X  Y  1
2  3  G  L  1
3  1  G  L  5
and so on ...30000+ entries.

I need scores for col E entries which depend on conditional
implications.


Thanks,

Hello,
      Sorry about the incomplete problem. Here is a better version of
the problem (the measure is not simple):
The data frame is like
  col1     col2       col3       col4     col5
  <num>    <nonum>    <nonum>    <num>    <num>
  A        B          C          E        F
There are repeated strings in col3 and col2. Problem: calculate
Measure(Ci) = [No. of repeats of Ci * 100] + [if (Bi, Ci) is the same as
(Bj, Cj) and 6 >= Ej - Ei >= 3 then add 100, else 10].

Hi

I am not sure what exactly you would like to compute; a
**working** example would help. But if you want to do some
computation for row "i" which depends on row "j", I suppose that
you cannot avoid loops.

Generally you can use one of aggregate, tapply, by or ave for some 
computation split by factor. See help pages.

tapply(vector or data frame, list(factors), function)

is the standard form.
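
A minimal sketch of three of the functions named above, applied to the small "fred" data frame from Sean's message (rebuilt here with data.frame() so the block runs on its own; the choice of sum() as the summary is only for illustration):

```r
fred <- data.frame(A = c(1, 2, 3),
                   B = c(2, 3, 1),
                   C = c("X", "G", "G"),
                   D = c("Y", "L", "L"),
                   E = c(1, 1, 5))

# tapply(): one result per level of the grouping factor
per_group <- tapply(fred$E, fred$C, sum)
print(per_group)

# ave(): same grouping, but the result has one value per row,
# which is convenient when the score has to sit next to each record
group_total <- ave(fred$E, fred$C, FUN = sum)
print(group_total)

# aggregate(): the per-group result as a data frame
per_group_df <- aggregate(E ~ C, data = fred, FUN = sum)
print(per_group_df)
```

tapply() and aggregate() collapse the data to one row per group, while ave() keeps the original length, so it is the one to reach for when every row needs its own group-based score.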

HTH
Petr


Actually it is to be stretched further by adding similar blocks.

How do we use *apply or something else in this situation?


In Prolog it is extremely easy, but here it is not quite...

Here is some code and a little data 

dat <- read.table("/home/project5R/datasplf.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", strip.white=TRUE)
# showData() comes from Rcmdr and only displays the data
showData(dat, placement='-20+200', font=.logFont, maxwidth=80, maxheight=30)
n <- nrow(dat)
n
# Take the columns straight from the data frame: as.matrix(dat) would turn
# every column into character strings and round the ID values when formatting
x1 <- as.character(dat[[1]])    # DOB
x2 <- dat[[2]]                  # ID
x3 <- as.character(dat[[3]])    # DOCTOR
x4 <- as.character(dat[[4]])    # DATE of TEST
x5 <- as.numeric(dat[[5]])      # TEST1
x5[is.na(x5)] <- 0
xd1 <- as.Date(x1, format= "%m-%d-%Y")
xd1
xd4 <- as.Date(x4, format= "%m-%d-%Y")
xd4
p6 <- (1-(abs(x5 - 6)/6))*100
p6
x23 <- cbind(x2,x3)
xp <- paste(x2,x3)
xp
y <- cbind(x23,xd4,xd1,xp)

_____________________________________________________________
# The score to be computed is for the doctors. It is no. of patients * 100 +
# rate of decrease of diabetic score * 1000 + no. of tests at approx. 3 months
# *....(see below)

_____________________________________________________________  
# To be debugged (loops)

x5 <- as.numeric(x5)    # TEST1 must be numeric for the arithmetic below
sc <- scf <- scr <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    # same patient ID and same doctor? identical() also matches NA with NA
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]])) {
      sc[[i]] <- sc[[i]] + 100
      gap <- abs(as.numeric(xd4[[i]] - xd4[[j]]))   # days between the two tests
      if (is.na(gap)) next
      # tests roughly 3 months apart (condition kept from the original)
      if (abs(1 - gap/90) <= 1.25) scf[[i]] <- scf[[i]] + 100
      # rate of change of the test value; same-day pairs are skipped to
      # avoid dividing by zero (an assumption about the intent)
      if (gap > 0) scr[[i]] <- scr[[i]] + abs(x5[[i]] - x5[[j]])/gap * 1000
    }
  }
}
sc

# closeness of the test value to 6; vectorized, no loop needed
sce <- (1 - abs(x5 - 6)/6) * 100

se <- scf + sce + scr + sc

score <- cbind(x3, se)
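
The pair-counting part of the loops above (add 100 to row i for every row j with the same ID and doctor) can be written without an explicit loop. The following is a sketch under stated assumptions, not the poster's method: the toy vectors x2, x3 and dates are made-up stand-ins for the ID, DOCTOR and test-date columns, and the names key, sc_vec and scf_vec are hypothetical.

```r
# Made-up stand-ins for ID (x2), DOCTOR (x3) and the test dates
x2    <- c(101, 102, 102, 103, 102)
x3    <- c(NA, "D1", "D1", NA, "D2")
dates <- as.Date(c("2003-01-13", "2003-01-20", "2003-04-20",
                   "2003-01-31", "2003-01-28"))
n <- length(x2)

# Loop version, for comparison: row i itself counts as a match
sc_loop <- numeric(n)
for (i in 1:n) for (j in 1:n) {
  if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]])) {
    sc_loop[[i]] <- sc_loop[[i]] + 100
  }
}

# Vectorized version: paste() turns each (ID, doctor) pair into one key
# (NA becomes the string "NA"), and ave() counts the rows sharing a key
key    <- paste(x2, x3)
sc_vec <- 100 * ave(rep(1, n), key, FUN = sum)

# Date-window score: outer() builds all n x n comparisons at once
same    <- outer(key, key, "==")
gap     <- abs(outer(as.numeric(dates), as.numeric(dates), "-"))
scf_vec <- 100 * rowSums(same & abs(1 - gap/90) <= 1.25)

print(rbind(sc_loop, sc_vec, scf_vec))
```

The ave() form scales to 30000+ rows without trouble. The outer() form needs an n x n matrix, which is too large at that size, so for the date-window score it would have to be applied within each key group (for example through split() or by()) rather than on the whole data set.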

____________________________
DATA
"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA
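
A minimal sketch of reading the sample above without the file on disk, so that the example is reproducible by others. It assumes the eight rows are representative of the full file; the variable name csv_text is hypothetical, and read.csv() is used here in place of the read.table() call from the message.

```r
# The sample rows from the message, pasted as a string
csv_text <- '"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA'

# Quoted fields keep their embedded comma; strings are kept as character
dat <- read.csv(text = csv_text, na.strings = "NA", strip.white = TRUE,
                stringsAsFactors = FALSE)

# The header "DATE of TEST" is converted to the syntactic name DATE.of.TEST
print(names(dat))

# Convert the two date columns from month-day-year text to Date
dat$DOB          <- as.Date(dat$DOB, format = "%m-%d-%Y")
dat$DATE.of.TEST <- as.Date(dat$DATE.of.TEST, format = "%m-%d-%Y")

str(dat)
```

Working with the data frame columns directly keeps ID and TEST1 numeric and the dates as Date objects, which avoids the conversions to and from character that as.matrix() forces.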


Thanks,

 A. Mani
 Member, Cal. Math. Soc