Message: 9
Date: Mon, 22 Aug 2005 06:40:45 +0200
From: "Petr Pikal" <petr.pikal at precheza.cz>
Subject: Re: [R] A. Mani : Avoiding loops
To: "A. Mani" <a_mani_sc_gs at vsnl.net>, r-help
<r-help at stat.math.ethz.ch>
On 20 Aug 2005 at 3:26, A. Mani wrote:
On Friday 19 August 2005 11:54, Sean O'Riordain wrote:
Hi,
I'm not sure what you actually want from your email (following the
posting guide is a good way of helping you explain things to the
rest of us in a way we understand - it might even answer your
question!).
I'm only a beginner at R so no doubt one of our expert colleagues
will help me...
fred <- data.frame()
fred <- edit(fred)
fred
  A B C D E
1 1 2 X Y 1
2 2 3 G L 1
3 3 1 G L 5
fred[,3]
[1] X G G
Levels: G X
fred[fred[,3]=="G",]
  A B C D E
2 2 3 G L 1
3 3 1 G L 5
so at this point I can create a new dataframe with column 3 (C) ==
"G"; either explicitly or implicitly...
and if I want to calculate the sum() of column E, then I just say
something like...
sum(fred[fred[,3]=="G",][,5])
[1] 6
now naturally being a bit clueless at manipulating stuff in R, I
didn't know how to do this before I started... and you guys only get
to see the lines that I typed in and got a "successful" result...
according to section 6 of the "Introduction to R" manual which comes
with R, I could also have said
sum(fred[fred$C=="G",]$E)
[1] 6
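For anyone who wants to reproduce the above without the interactive edit() step, here is a minimal sketch. It assumes the three rows shown in the printout are the whole data frame; the names g_rows, s1, s2 and s3 are only illustrative.

```r
# Build the same three-row data frame directly instead of typing it into edit()
fred <- data.frame(
  A = c(1, 2, 3),
  B = c(2, 3, 1),
  C = c("X", "G", "G"),
  D = c("Y", "L", "L"),
  E = c(1, 1, 5)
)

# Rows where column C equals "G"
g_rows <- fred[fred$C == "G", ]

# Three equivalent ways to sum column E over those rows
s1 <- sum(fred[fred[, 3] == "G", ][, 5])  # positional indexing, as typed above
s2 <- sum(fred[fred$C == "G", ]$E)        # named columns
s3 <- sum(fred$E[fred$C == "G"])          # subset the vector directly

print(c(s1, s2, s3))  # all three give 6
```

The third form skips building the intermediate data frame, which may matter on 30000+ rows.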
Hmmm.... I wonder would it be reasonable to put an example of this
type into section 2.7 of the "Introduction to R"?
cheers!
Sean
On 18/08/05, A. Mani <a_mani_sc_gs at vsnl.net> wrote:
Hello,
I want to avoid loops in the following situation. There is a 5-col
dataframe with col headers alone. Two of the columns are
non-numeric. The problem is to calculate statistics (scores) for
each element of one column. The functions depend on matching in
the other non-numeric column.
A B C E F
1 2 X Y 1
2 3 G L 1
3 1 G L 5
and so on ...30000+ entries.
I need scores for col E entries which depend on conditional
implications.
Thanks,
Hello,
Sorry about the incomplete problem. Here is a better version for the
problem: (the measure is not simple)
The data frame is like
col1   col2     col3     col4   col5
<num>  <nonum>  <nonum>  <num>  <num>
A      B        C        E      F
There are repeated strings in col3 and col2. Problem: calculate
Measure(Ci) = [No. of repeats of Ci * 100] + [if (Bi, Ci) is the same as
(Bj, Cj) and 6 >= Ej - Ei >= 3 then add 100, else 10].
Hi
I am not sure what exactly you would like to compute; a
**working** example could help. But if you want to do some
computation for row "i" which depends on row "j", I suppose that
you cannot avoid loops.
Generally you can use one of aggregate, tapply, by or ave for some
computation split by factor. See the help pages.
tapply(vector, list(factors), function)
is the standard form; by and aggregate take a data frame as their first argument.
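As a rough illustration of the split-by-factor functions just mentioned, here is a small sketch on made-up data. The column names C and E follow the earlier example in this thread; the values and the names sums, group_sum and agg are invented for illustration.

```r
# Toy data: a grouping column C and a numeric column E (values are illustrative)
dat <- data.frame(
  C = c("X", "G", "G", "X", "G"),
  E = c(1, 1, 5, 2, 3)
)

# tapply: one result per level of the grouping variable
sums <- tapply(dat$E, dat$C, sum)      # G = 9, X = 3

# ave: the same group sums, but repeated row by row (same length as dat$E)
dat$group_sum <- ave(dat$E, dat$C, FUN = sum)

# aggregate: the result comes back as a data frame
agg <- aggregate(E ~ C, data = dat, FUN = sum)

print(sums)
print(dat)
print(agg)
```

ave is probably the closest fit when a per-row score is wanted, since its result lines up with the original rows.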
HTH
Petr
Actually it is to be stretched further by adding similar blocks.
How do we use *apply or something else in this situation?
In Prolog it is extremely easy, but here it is not quite...
Here is some code and a little data
dat <- read.table("/home/project5R/datasplf.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", strip.white=TRUE)
attach(dat)
showData(dat, placement='-20+200', font=.logFont, maxwidth=80, maxheight=30)
x <- as.matrix(dat)
x1 <- as.vector(x[,1])
xd1 <- as.Date(x1, format= "%m-%d-%Y")
n <- length(x1)
n
x2 <- as.vector(x[,2])
length(x2)
x3 <- as.vector(x[,3])
length(x3)
x4 <- as.vector(x[,4])
x5 <- as.numeric(x[,5])  # x is a character matrix, so convert before doing arithmetic
x5[is.na(x5)] <- 0
xd4 <- as.Date(x4, format= "%m-%d-%Y")
xd4
p6 <- (1-(abs(x5 - 6)/6))*100
p6
xd1 <- as.Date(x1, format= "%m-%d-%Y")
xd1
x23 <- cbind(x2,x3)
xp <- paste(x2,x3)
xp
y <- cbind(x23,xd4,xd1,xp)
_____________________________________________________________
# The score to be computed is for the doctors. It is no. of patients * 100 + rate
# of decrease of diabetic score * 1000 + no. of tests at approx. 3 months *....(see
# below)
_____________________________________________________________
# To be debugged (loops)
x5n <- as.numeric(x5)   # test values as numbers
d4 <- as.numeric(xd4)   # test dates as day counts
sc <- numeric(n)    # 100 per row with the same ID and doctor
scf <- numeric(n)   # 100 per such row whose test date falls within the window
scr <- numeric(n)   # rate of change of the test value, times 1000
for (i in 1:n) {
  for (j in 1:n) {
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]])) {
      sc[[i]] <- sc[[i]] + 100
      gap <- abs(d4[[i]] - d4[[j]])   # days between the two tests
      if (isTRUE(abs(1 - gap/90) <= 1.25)) scf[[i]] <- scf[[i]] + 100
      # skip same-day pairs, which would divide by zero
      if (isTRUE(gap > 0)) scr[[i]] <- scr[[i]] + abs(x5n[[i]] - x5n[[j]])/gap * 1000
    }
  }
}
sc
# closeness of each test value to 6, as a percentage
sce <- (1 - abs(x5n - 6)/6) * 100
se <- scf + sce + scr + sc
score <- cbind(x3, se)
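One possible way to cut the double loop down, offered as a sketch only: count matches per (ID, doctor) key with ave, and compare dates pairwise inside each key with outer. The toy vectors below are hypothetical stand-ins for x2, x3 and xd4, the names key, days, gap and near3m are invented, and the date condition is copied as written in the post without checking that it is the intended rule. It still loops, but over groups rather than over all n*n pairs.

```r
# Hypothetical toy vectors standing in for x2 (ID), x3 (doctor), xd4 (test date)
x2  <- c("p1", "p1", "p2", "p1", "p2")
x3  <- c("d1", "d1", "d1", "d1", "d1")
xd4 <- as.Date(c("01-10-2003", "04-12-2003", "01-20-2003",
                 "09-01-2003", "01-25-2003"), format = "%m-%d-%Y")
n   <- length(x2)

# One key per (ID, doctor) pair, like paste(x2, x3) in the post
key <- paste(x2, x3)

# sc: 100 for every row sharing the key, the row itself included,
# which is what the double loop over i and j counts
sc <- 100 * ave(rep(1, n), key, FUN = sum)

# scf: within each key, compare all pairs of dates at once
days <- as.numeric(xd4)
scf  <- numeric(n)
for (idx in split(seq_len(n), key)) {
  gap    <- abs(outer(days[idx], days[idx], "-"))  # pairwise gaps in days
  near3m <- abs(1 - gap / 90) <= 1.25              # condition copied from the post
  scf[idx] <- 100 * rowSums(near3m)
}

print(sc)
print(scf)
```

The outer call builds a square matrix per group, so this only helps if no single (ID, doctor) group is very large.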
____________________________
DATA
"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA
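Since the file path in the read.table call is local to the poster, here is a sketch for loading the sample rows inline instead. It assumes read.csv's default name handling, which turns the header "DATE of TEST" into DATE.of.TEST; csv_text is an illustrative name. Keeping the result as a data frame also avoids the as.matrix step, which turns every column into character.

```r
# Sample rows copied from the post
csv_text <- '"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA'

# Read from the string; numeric columns stay numeric, text columns stay character
dat <- read.csv(text = csv_text, na.strings = "NA", strip.white = TRUE,
                stringsAsFactors = FALSE)

# Convert the two date columns using the month-day-year format from the post
dat$DOB          <- as.Date(dat$DOB, format = "%m-%d-%Y")
dat$DATE.of.TEST <- as.Date(dat$DATE.of.TEST, format = "%m-%d-%Y")

str(dat)
```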
Thanks,
A. Mani
Member, Cal. Math. Soc