R code for if-then-do code blocks
I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).
microbenchmark(
rmh={
d3 <-data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset <- d3$gender=="f" & d3$workshop==1
test1 <- 1
d3[myRowSubset, "test1"] <- test1 + 6
d3[myRowSubset, "test2"] <- test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] <- test1
},
rmh4={
d4 <- data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset <- d4$gender=="f" & d4$workshop==1
test1 <- 1
d4[myRowSubset, c("test1", "test2", "test4", "test5")] <-
matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
}
)
Unit: microseconds
expr min lq mean median uq max neval cld
rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071 100 b
rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545 100 a
On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger <rmh at temple.edu> wrote:
this can be dome even faster, and I think more easily read, using only base R
d1 <- data.frame(workshop=rep(1:2,4),
gender=rep(c("f","m"),each=4))
## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)
microbenchmark(
vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
) },
rowbased = {d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 <- 1
test1 <- 6 + test1
test2 <- 2 + test1
test4 <- 1
test5 <- 1
} else {
test1 <- test2 <- test4 <- test5 <- 0
})},
rmh={
data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset <- d3$gender=="f" & d3$workshop==1
test1 <- 1
d3[myRowSubset, "test1"] <- test1 + 6
d3[myRowSubset, "test2"] <- test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] <- test1
}
)
Unit: microseconds
expr min lq mean median uq max neval cld
vector 1281.994 1468.102 1669.266 1573.043 1750.354 3171.777 100 a
rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034 100 b
rmh 925.571 1056.530 1167.568 1116.425 1221.457 1968.199 100 a
On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
<r-help at r-project.org> wrote:
Dear Paul,
R's power is that is works vectorised. Unlike SAS which is rowbased. Using
R in a SAS way will lead to very slow code.
Your examples can be written vectorised
d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
)
Here is a speed comparison.
library(microbenchmark)
microbenchmark(
vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
) },
rowbased = {d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 <- 1
test1 <- 6 + test1
test2 <- 2 + test1
test4 <- 1
test5 <- 1
} else {
test1 <- test2 <- test4 <- test5 <- 0
})}
)
Best regards,
Thierry
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>
Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
r-help at r-project.org>:
Hello All,
Season's greetings!
Am trying to replicate some SAS code in R. The SAS code uses if-then-do
code blocks. I've been trying to do likewise in R as that seems to be the
most reliable way to get the same result.
Below is some toy data and some code that does work. There are some things
I don't necessarily like about the code though. So I was hoping some people
could help make it better. One thing I don't like is that the within
function reverses the order of the computed columns such that test1:test5
becomes test5:test1. I've used a mutate to overcome that but would prefer
not to have to do so.
Another, perhaps very small thing, is the need to calculate an ID
variable that becomes the basis for a grouping.
I did considerable Internet searching for R code that conditionally
computes blocks of code. I didn't find much though and so am wondering if
my search terms were not sufficient or if there is some other reason. It
occurred to me that maybe if-then-do code blocks like we often see in SAS
as are frowned upon and therefore not much implemented.
I'd be interested in seeing more R-compatible approaches if this is the
case. I've learned that it's a mistake to try and make R be like SAS. It's
better to let R be R. Trouble is I'm not always sure how to do that.
Thanks,
Paul
d1 <- data.frame(workshop=rep(1:2,4),
gender=rep(c("f","m"),each=4))
library(tibble)
library(plyr)
d2 <- d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 <- 1
test1 <- 6 + test1
test2 <- 2 + test1
test4 <- 1
test5 <- 1
} else {
test1 <- test2 <- test4 <- test5 <- 0
})
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.