Hi,
I don't know whether the "mydata" object was updated before you ran table(). within() returns a modified copy, so its result has to be assigned back to mydata:
mydata <- within(mydata,FNAME_SUSPECT <- FNAME_TOKEN_COUNT >10|FNAME_LENGTH>45|regexpr("9",FNAME_PATTERN)==0)
table(mydata$FNAME_SUSPECT)
#
#FALSE
#   50
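In case it helps, here is a minimal sketch of the assignment point on a made-up three-row data frame (the name `toy` and its values are assumptions for illustration, not your data). The original object only changes when the result of within() is assigned back:

```r
# Hypothetical toy data, not the poster's mydata
toy <- data.frame(FNAME_TOKEN_COUNT = c(1L, 5L, 2L),
                  FNAME_LENGTH = c(7L, 26L, 40L),
                  FNAME_SUSPECT = FALSE)

# Without assignment the modified copy is discarded; toy is unchanged
invisible(within(toy, FNAME_SUSPECT <- FNAME_TOKEN_COUNT > 3 | FNAME_LENGTH > 35))
unchanged <- toy$FNAME_SUSPECT

# Assigning the result back updates the column
toy <- within(toy, FNAME_SUSPECT <- FNAME_TOKEN_COUNT > 3 | FNAME_LENGTH > 35)
updated <- toy$FNAME_SUSPECT
```

Here `unchanged` is still all FALSE, while `updated` flags the second and third rows.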
Now, your second condition (reply to David).
indx <- with(mydata,FNAME_TOKEN_COUNT >3| FNAME_LENGTH>55|regexpr("9",FNAME_PATTERN)==0)
indx1 <- ifelse(mydata$FNAME_TOKEN_COUNT > 3, TRUE,
           ifelse(mydata$FNAME_LENGTH > 55, TRUE,
             ifelse(regexpr("9", mydata$FNAME_PATTERN) == 0, TRUE,
               FALSE
             )
           )
         )
identical(indx,indx1)
#[1] TRUE
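One thing that may be worth checking in these conditions, sketched on made-up patterns (the vector `pat` is an assumption for illustration): regexpr() returns the position of the first match, or -1 when there is no match, so for non-missing input `== 0` is never TRUE. If the intent is "contains a 9", `> 0` or grepl() is probably closer; for "does not contain a 9", `== -1` or !grepl().

```r
pat <- c("9999999", "AAAA9_A", "AAAA_A_A_A")  # hypothetical patterns

pos <- as.integer(regexpr("9", pat))       # first match position, -1 if none
never_zero <- any(pos == 0)                # no element compares equal to 0
has_nine <- grepl("9", pat, fixed = TRUE)  # TRUE where the pattern contains a 9
no_nine <- !has_nine                       # TRUE where it does not
```

With these three patterns, `has_nine` agrees with `pos > 0`, and only the last element has no 9.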
A.K.
On Tuesday, February 18, 2014 12:57 PM, Jeff Johnson <mrjefftoyou at gmail.com> wrote:
Hmm, I don't think as constructed the within clause is yielding the desired results. The test case you suggested works. However, if I try another test case:
within(mydata,FNAME_SUSPECT <- FNAME_TOKEN_COUNT >10|FNAME_LENGTH>45|regexpr("9",FNAME_PATTERN)==0)
which I read as: if a row has more than 10 tokens, is longer than 45 characters, OR does not have a number (9), it should assign the result (FALSE in this case) to FNAME_SUSPECT.
table(mydata$FNAME_SUSPECT)
TRUE
  50
On Tue, Feb 18, 2014 at 9:38 AM, arun <smartpink111 at yahoo.com> wrote:
I think it doesn't even need ifelse()
within(mydata,FNAME_SUSPECT <- FNAME_TOKEN_COUNT >3|FNAME_LENGTH>35|regexpr("9",FNAME_PATTERN)>0)
A.K.
On , arun <smartpink111 at yahoo.com> wrote:
Hi,
Try ?ifelse()
A.K.
On Tuesday, February 18, 2014 12:26 PM, Jeff Johnson <mrjefftoyou at gmail.com> wrote:
I have a subset of data that I have identified as "suspect" (for example,
the first name has excessive spaces, is longer than 35 characters or has a
number).
What I want to do is update the FNAME_SUSPECT field in "mydata" to TRUE if
any of those conditions are met.
Here's my data:
dput(mydata)
structure(list(PERSON_FIRST_NAME = c("1298530", "JULIA, TAYLOR, CS AND JEFF",
"88", "4465891170098562", "1124211", "LEWIS & MARY KAY", "KARL R O S",
"5466181820076010", "JULI0 C", "WAYNE   T.", "1124211", "1124211",
"ROBERT B & VIONA D", "DENNIS and MARY SUE", "BRIAN   JOANNE",
"1124211", "RONALD and  GAIL", "Mike and Mary Lou", "31763006",
"7", "11460735", "Paul and Mary Beth", "JIMMY and RUTH MARIE",
"1124211", "WAYNE & LU ANN", "SCOTT & ANNA MARIE", "1124211",
"1124211", "952714", "DAVID, RHONDA and NATALIE", "VIRGINIA   S",
"707069", "4397836190001917", "MARIA DE LA LUZ", "MARIA DE LA LUZ",
"G & S COMPUTERIZED GRADING", "1124211", "1124211", "1124211",
"1124211", "MARIA DE LA LUZ", "ED AND JANICE KISHI", "1124211",
"Garrett A. and Jenny E.", "1124211", "1124211", "Hiram T. and A. Judith",
"MA DE LA LUZ", "STEVE, Bev, and Caleb", "MR AND MRS EVER"),
    FNAME_SUSPECT = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
    FNAME_LENGTH = c(7L, 26L, 2L, 16L, 7L, 16L, 10L, 16L, 7L,
    10L, 7L, 7L, 18L, 19L, 14L, 7L, 16L, 17L, 8L, 1L, 8L, 18L,
    20L, 7L, 14L, 18L, 7L, 7L, 6L, 25L, 12L, 6L, 16L, 15L, 15L,
    26L, 7L, 7L, 7L, 7L, 15L, 19L, 7L, 23L, 7L, 7L, 22L, 12L,
    21L, 15L), FNAME_PATTERN = c("9999999", "AAAAA,_AAAAAA,_AA_AAA_AAAA",
    "99", "9999999999999999", "9999999", "AAAAA_&_AAAA_AAA",
    "AAAA_A_A_A", "9999999999999999", "AAAA9_A", "AAAAA___A.",
    "9999999", "9999999", "AAAAAA_A_&_AAAAA_A", "AAAAAA_AAA_AAAA_AAA",
    "AAAAA___AAAAAA", "9999999", "AAAAAA_AAA__AAAA", "AAAA_AAA_AAAA_AAA",
    "99999999", "9", "99999999", "AAAA_AAA_AAAA_AAAA",
"AAAAA_AAA_AAAA_AAAAA",
    "9999999", "AAAAA_&_AA_AAA", "AAAAA_&_AAAA_AAAAA", "9999999",
    "9999999", "999999", "AAAAA,_AAAAAA_AAA_AAAAAAA", "AAAAAAAA___A",
    "999999", "9999999999999999", "AAAAA_AA_AA_AAA", "AAAAA_AA_AA_AAA",
    "A_&_A_AAAAAAAAAAAA_AAAAAAA", "9999999", "9999999", "9999999",
    "9999999", "AAAAA_AA_AA_AAA", "AA_AAA_AAAAAA_AAAAA", "9999999",
    "AAAAAAA_A._AAA_AAAAA_A.", "9999999", "9999999",
"AAAAA_A._AAA_A._AAAAAA",
    "AA_AA_AA_AAA", "AAAAA,_AAA,_AAA_AAAAA", "AA_AAA_AAA_AAAA"
    ), FNAME_TOKEN_COUNT = c(1L, 5L, 1L, 1L, 1L, 4L, 4L, 1L,
    2L, 4L, 1L, 1L, 5L, 4L, 4L, 1L, 4L, 4L, 1L, 1L, 1L, 4L, 4L,
    1L, 4L, 4L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 4L, 4L, 5L, 1L, 1L,
    1L, 1L, 4L, 4L, 1L, 5L, 1L, 1L, 5L, 4L, 4L, 4L)), .Names =
c("PERSON_FIRST_NAME",
"FNAME_SUSPECT", "FNAME_LENGTH", "FNAME_PATTERN", "FNAME_TOKEN_COUNT"
), row.names = c(6717L, 11035L, 11626L, 14965L, 17874L, 24341L,
25582L, 25834L, 26851L, 30134L, 36385L, 45244L, 46947L, 61449L,
67564L, 71465L, 73782L, 75278L, 78977L, 79037L, 80577L, 81644L,
84427L, 86286L, 89963L, 91208L, 94054L, 99518L, 114658L, 128305L,
129082L, 137492L, 137573L, 138556L, 139489L, 148757L, 153956L,
155546L, 160533L, 162386L, 162681L, 165220L, 168063L, 173003L,
175322L, 179935L, 180991L, 181215L, 183787L, 184573L), class = "data.frame")
Note I defaulted all of the FNAME_SUSPECT to FALSE. I plan to change that
later.
I've tried running this:
if(mydata$FNAME_TOKEN_COUNT > 3 | mydata$FNAME_LENGTH > 35 | regexpr("9",
mydata$FNAME_PATTERN) > 0)
        mydata$FNAME_SUSPECT <- TRUE
However, I get this warning:
Warning message:
In if (mydata$FNAME_TOKEN_COUNT > 3 | mydata$FNAME_LENGTH > 35 | :
  the condition has length > 1 and only the first element will be used
Would I be better doing this in a for loop? I had once heard that if you're
doing a for loop in R, you're doing something wrong.
--
Jeff
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.