How to create a new variable based on parts of another character variable: A generalization
2 messages · Bert Gunter, PIKAL Petr
Hi Bert

I am aware of factor features, and frankly speaking I consider them quite useful, despite the prevalent preference for character vectors. For the OP's question, it seems to me that an ifelse construction is appropriate: based on his statement, he has two strings that are to be converted into two other strings, and he is just starting with R. I agree that when more levels have to be changed, factor is the way to go.

Regards
Petr
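Petr's point about scaling to more levels can be sketched as follows. This assumes the region is determined by the first three characters of A; the third prefix "XYZ" and its label "Tropical" are hypothetical, added only to show a mapping with more than two categories. Here factor() with explicit levels and labels does the relabelling, so adding a category means extending two vectors rather than nesting another ifelse().

```r
# Values of A modelled on the original question, plus one hypothetical "XYZ" row
A <- c("AWI-test1", "AWI-test5", "UFT-2", "UFT356", "XYZ-1")

# Assumption: the first three characters identify the region
prefix <- substr(A, 1, 3)

# Map each prefix to its label in one step; prefixes not listed in
# 'levels' would become NA
D <- factor(prefix,
            levels = c("AWI", "UFT", "XYZ"),
            labels = c("Arctic", "Boreal", "Tropical"))
D
```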
... Well, this works in this simple case, but it is too clumsy for a general formulation of this problem: given a "dictionary" consisting of two character vectors of unique "names" (or two columns in a data frame), x and y, how does one convert a factor z with levels in x into the corresponding factor with levels in y? There are likely a zillion ways to do this with various packages and functions, but the simplest and most straightforward must surely be:
factor(y[z])
Example:
x <- LETTERS[1:4]
y <- LETTERS[5:8]
z <- factor(sample(x, 15, replace = TRUE))
z
[1] B D A C B A B D A D D A A D B
Levels: A B C D
factor(y[z])
[1] F H E G F E F H E H H E E H F
Levels: E F G H

This is a nice example of the utility of the factor data structure, which tends to get dissed a lot because it can badly burn you if you're not careful with it. A fuller discussion of these issues can be found by searching on "associative arrays" or "hashes", of which factors are an elementary example.
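One caveat worth adding: indexing with a factor, as in y[z], uses the factor's integer codes rather than its labels, so factor(y[z]) gives the intended result only when levels(z) is exactly x, in the same order. In the random example this holds only if all four letters happen to be sampled. Below is a sketch of order-independent alternatives using base R's match(), a named vector as a simple lookup table, and factor() with levels/labels; the vector z here is made up to trigger the problem.

```r
x <- LETTERS[1:4]              # dictionary keys
y <- LETTERS[5:8]              # dictionary values; y[i] corresponds to x[i]
z <- factor(c("B", "D", "B"))  # only two of the four keys occur

levels(z)                      # "B" "D", so the codes are 1, 2, 1
y[z]                           # indexes by codes: "E" "F" "E" (not intended)

# 1. match() compares values (a factor is converted to character), not codes
y[match(z, x)]                 # "F" "H" "F"

# 2. A named vector works as a small associative array: lookup by name
dict <- setNames(y, x)
unname(dict[as.character(z)])  # "F" "H" "F"

# 3. Relabel while keeping the factor type; all four levels of y are kept
factor(z, levels = x, labels = y)
```

Option 3 is the closest to Bert's factor-based idea, while options 1 and 2 return plain character vectors.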
-- Bert
On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Hi

If you want to avoid regular expressions altogether, and your A values start with AWI for Arctic and UFT for Boreal, you can use

DF$D <- ifelse(substr(DF$A, 1, 1) == "A", "Arctic", "Boreal")

Regards
Petr
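Applied to the six complete rows of the question's data frame (re-typed here as an assumption, with every column stored as character), Petr's one-liner can be checked as below. Note that it inspects only the first character of A, so any value not starting with "A" is labelled "Boreal".

```r
# The six complete rows from the question; all columns kept as character
DF <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75", "UFT-2", "UFT56", "UFT356"),
  B = c("1", "2", "56", "5", "f", "9j"),
  C = c("i", "r", "z", "I", "t", "t"),
  stringsAsFactors = FALSE
)

# Petr's suggestion: decide on the first character of A only
DF$D <- ifelse(substr(DF$A, 1, 1) == "A", "Arctic", "Boreal")
DF
```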
Hello,

I am just starting with R and I have a (most probably) stupid problem creating a new variable in a data.frame based on part of another character variable. I have a data frame like this one:

    A          B     C
    AWI-test1  1     i
    AWI-test5  2     r
    AWI-tes75  56    z
    UFT-2      5     I
    UFT56      f     t
    UFT356     9j    t
    etc.       etc.  89 t

Now I want to check whether the string AWI is present in variable A and, if so, create a variable D containing "Arctic". However, if the string UFT occurs in variable A, then D shall be "Boreal", etc. etc.
The resulting data.frame should look like this:

    A          B     C     D
    AWI-test1  1     i     Arctic
    AWI-test5  2     r     Arctic
    AWI-tes75  56    z     Arctic
    UFT-2      5     I     Boreal
    UFT56      f     t     Boreal
    UFT356     9j    t     Boreal
    etc.       etc.  89 t

I know how to do this when I match the entire string in A, i.e. when A is "AWI-test1", create D with "Arctic", but not how to look for only a substring of A. It would be great if somebody could help.

Thanks
Philipp
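For the substring test asked about here, base R's grepl() returns TRUE wherever a pattern occurs anywhere in a string. A minimal sketch, assuming a hypothetical lookup vector regions that maps each substring to its label; with fixed = TRUE the keys are matched literally rather than as regular expressions, and rows matching no key are left as NA.

```r
DF <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75", "UFT-2", "UFT56", "UFT356"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary: substring to look for -> value for D
regions <- c(AWI = "Arctic", UFT = "Boreal")

DF$D <- NA_character_
for (key in names(regions)) {
  # TRUE for every row whose A contains 'key' anywhere (literal match)
  hit <- grepl(key, DF$A, fixed = TRUE)
  DF$D[hit] <- regions[[key]]
}
DF
```

If a value of A contained two keys, the key listed later in regions would win; extending the mapping only requires adding entries to regions.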
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- 
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm