An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130607/6bb4d645/attachment.pl>
recode: how to avoid nested ifelse
6 messages · Joshua Wiley, Neal Fultz, Paul Johnson
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x <- do.call(paste0, list(es, hs, cg))
table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
none es hs cg 4 1 1 2 Cheers, Josh *Drawn from comments by Judea Pearl one lively session.
On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.
I'm unable to remember a less error prone method :). But I expect you might.
Here's my working example code
## Paul Johnson <pauljohn at ku.edu>
## 2013-06-07
## We need to create an ordinal factor from these indicators
## completed elementary school
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed, edf)
## Looks OK, but what if there are missings?
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)
## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)
ed <- ifelse(!is.na(cg) & cg == 1, 3,
ifelse(!is.na(hs) & hs == 1, 2,
ifelse(!is.na(es) & es == 1, 1,
ifelse(is.na(es), NA, 0))))
cbind(es, hs, cg, ed)
## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.
Thanks in advance, I hope you are having a nice summer.
pj
--
Paul E. Johnson
Professor, Political Science Assoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://quant.ku.edu
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
I would do this to get the highest non-missing level: x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz
On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x <- do.call(paste0, list(es, hs, cg))
table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
none es hs cg 4 1 1 2 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.
I'm unable to remember a less error prone method :). But I expect you might.
Here's my working example code
## Paul Johnson <pauljohn at ku.edu>
## 2013-06-07
## We need to create an ordinal factor from these indicators
## completed elementary school
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed, edf)
## Looks OK, but what if there are missings?
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)
## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)
ed <- ifelse(!is.na(cg) & cg == 1, 3,
ifelse(!is.na(hs) & hs == 1, 2,
ifelse(!is.na(es) & es == 1, 1,
ifelse(is.na(es), NA, 0))))
cbind(es, hs, cg, ed)
## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.
Thanks in advance, I hope you are having a nice summer.
pj
--
Paul E. Johnson
Professor, Political Science Assoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://quant.ku.edu
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es)))
system.time(replicate(1000, f1(cg, hs, es)))
user system elapsed 22.73 0.03 22.76
system.time(replicate(1000, f2(cg, hs, es)))
user system elapsed 0.92 0.04 0.95
system.time(replicate(1000, f3(cg, hs, es)))
user system elapsed 0.19 0.02 0.20 > system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.95 0.03 0.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit)
On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfultz at gmail.com> wrote:
I would do this to get the highest non-missing level: x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x <- do.call(paste0, list(es, hs, cg))
table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
none es hs cg 4 1 1 2 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.
I'm unable to remember a less error prone method :). But I expect you might.
Here's my working example code
## Paul Johnson <pauljohn at ku.edu>
## 2013-06-07
## We need to create an ordinal factor from these indicators
## completed elementary school
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed, edf)
## Looks OK, but what if there are missings?
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)
## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)
ed <- ifelse(!is.na(cg) & cg == 1, 3,
ifelse(!is.na(hs) & hs == 1, 2,
ifelse(!is.na(es) & es == 1, 1,
ifelse(is.na(es), NA, 0))))
cbind(es, hs, cg, ed)
## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.
Thanks in advance, I hope you are having a nice summer.
pj
--
Paul E. Johnson
Professor, Political Science Assoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://quant.ku.edu
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
rowSums and Reduce will have the same problems with bad data you alluded to earlier, eg
cg = 1, hs = 0
But that's something to check for with crosstabs anyway.
Side note: you should check out the microbenchmark pkg, it's quite handy.
R>require(microbenchmark)
R>microbenchmark(
+ f1(cg,hs,es),
+ f2(cg,hs,es),
+ f3(cg,hs,es),
+ f4(cg,hs,es)
+ )
Unit: microseconds
expr min lq median uq max neval
f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112 100
f2(cg, hs, es) 730.665 755.5750 811.7445 934.3320 6179.798 100
f3(cg, hs, es) 85.029 101.6785 129.8605 196.2835 2820.187 100
f4(cg, hs, es) 762.232 804.4850 843.7170 1079.0800 24869.548 100
On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es)))
system.time(replicate(1000, f1(cg, hs, es)))
user system elapsed 22.73 0.03 22.76
system.time(replicate(1000, f2(cg, hs, es)))
user system elapsed 0.92 0.04 0.95
system.time(replicate(1000, f3(cg, hs, es)))
user system elapsed 0.19 0.02 0.20
> system.time(replicate(1000, f4(cg, hs, es)))
user system elapsed 0.95 0.03 0.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfultz at gmail.com> wrote:
I would do this to get the highest non-missing level: x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x <- do.call(paste0, list(es, hs, cg))
table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
none es hs cg 4 1 1 2 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.
I'm unable to remember a less error prone method :). But I expect you might.
Here's my working example code
## Paul Johnson <pauljohn at ku.edu>
## 2013-06-07
## We need to create an ordinal factor from these indicators
## completed elementary school
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed, edf)
## Looks OK, but what if there are missings?
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- ifelse(cg == 1, 3,
ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)
## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)
ed <- ifelse(!is.na(cg) & cg == 1, 3,
ifelse(!is.na(hs) & hs == 1, 2,
ifelse(!is.na(es) & es == 1, 1,
ifelse(is.na(es), NA, 0))))
cbind(es, hs, cg, ed)
## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.
Thanks in advance, I hope you are having a nice summer.
pj
--
Paul E. Johnson
Professor, Political Science Assoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://quant.ku.edu
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
1 day later
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130610/58f4e091/attachment.pl>