How to create a new variable based on parts of another character variable: A generalization

Hi Bert

I am aware of the features of factors and, frankly, I consider them quite
useful despite the prevalent preference for character vectors. For the OP's
question, though, an ifelse construction seems appropriate to me: by his own
statement he has two strings that are to be converted to two other strings,
and he is just starting with R. I agree that when there are more levels to
change, a factor is the way to go.

Regards
Petr
... Well, this works in this simple case, but it is too clumsy for a general
formulation of the problem: given a "dictionary" consisting of two
character vectors of unique "names" (or two columns in a data frame), x
and y, how does one convert a factor z with levels in x into the
corresponding factor with levels in y?

There are likely a zillion ways to do this with various packages and 
functions, but the simplest and most straightforward must surely be:  
factor(y[z])
Example:
x <- LETTERS[1:4]
y <- LETTERS[5:8]
z <- factor(sample(x, 15, replace = TRUE))
z
 [1] B D A C B A B D A D D A A D B
Levels: A B C D
factor(y[z])
 [1] F H E G F E F H E H H E E H F
Levels: E F G H
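
A note on the one-liner above: indexing y with a factor uses the factor's integer codes, so factor(y[z]) gives the intended result only when levels(z) are exactly x, in the same order. The following is a minimal sketch of a variant that looks each level up in the dictionary instead, assuming the dictionary may be stored in any order. The names x2, y2, z2 and z_new are made up for this illustration and do not appear in the thread.

```r
# Dictionary deliberately stored in non-alphabetical order (illustrative data)
x2 <- c("D", "B", "A", "C")
y2 <- c("H", "F", "E", "G")   # y2[i] is the translation of x2[i]

z2 <- factor(c("B", "D", "A", "C", "B", "A"))

# Translate the levels themselves; the integer codes are left untouched
z_new <- z2
levels(z_new) <- y2[match(levels(z2), x2)]

as.character(z_new)
```

A level of z2 that is missing from x2 would come back as NA from match(), so it may be worth checking for that before assigning the new levels.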

This is a nice example of the utility of the factor data structure, which
tends to get dissed a lot because it can burn you badly if you're not
careful with it.
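
One commonly cited way to get burned, sketched here as an illustration rather than as the specific case meant above: converting a factor whose labels look like numbers. The variable names f, codes and values are made up for this example.

```r
f <- factor(c("10", "5", "20"))

# Levels are sorted as strings, so the codes do not follow the numeric values
levels(f)

# Returns the internal integer codes, not the numbers the labels spell out
codes <- as.numeric(f)

# Going through the labels recovers the intended values
values <- as.numeric(as.character(f))
```

The same mechanism is at work when a factor is used as an index: R uses the integer codes, not the labels.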

A fuller discussion of these issues can be found by searching
on "associative arrays" or "hashes", of which factors are an elementary
example.
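
For comparison, a named character vector is probably the closest thing base R has to a small associative array. This is a minimal sketch that reuses the x and y of the example; dict and translated are names invented here.

```r
x <- LETTERS[1:4]
y <- LETTERS[5:8]
dict <- setNames(y, x)     # names are the keys, elements are the values

z <- factor(c("B", "D", "A", "C", "B"))

# Index by name (the label), not by the factor's integer codes
translated <- unname(dict[as.character(z)])
translated
```

Because the lookup goes by name, it does not depend on the order in which the dictionary is stored.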
-- Bert

On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Hi

If you want to avoid regular expressions altogether, and your A values
start with AWI for Arctic and UFT for Boreal, you can use

DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")
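
If more than two prefixes turn up, the same idea can be sketched as a lookup on the first three characters. The data frame below is a cut-down stand-in for the OP's data, the row "XYZ-1" is made up to show what happens to an unknown prefix, and region is a hypothetical name.

```r
DF <- data.frame(A = c("AWI-test1", "AWI-test5", "UFT-2", "UFT56", "XYZ-1"),
                 stringsAsFactors = FALSE)

# Hypothetical prefix dictionary; extend with further prefixes as needed
region <- c(AWI = "Arctic", UFT = "Boreal")

# The first three characters are the key; unknown prefixes give NA
DF$D <- unname(region[substr(DF$A, 1, 3)])
DF
```

Unlike the ifelse version, which labels everything not starting with "A" as "Boreal", this leaves unrecognised prefixes as NA.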

Regards
Petr

Hello,
I am just starting with R and I have a (most probably) stupid problem:
creating a new variable in a data.frame based on part of another
character variable.

I have a data frame like this one:

A          B    C
AWI-test1  1    i
AWI-test5  2    r
AWI-tes75  56   z
UFT-2      5    I
UFT56      f    t
UFT356     9j   t
etc. etc.  89   t

I now want to check whether the string AWI is present in the variable A
and, if so, create a variable D containing "Arctic". If instead the string
UFT occurs in the variable A, then the variable D should be "Boreal", and
so on.
The resulting data.frame should look like this:
A          B    C   D
AWI-test1  1    i   Arctic
AWI-test5  2    r   Arctic
AWI-tes75  56   z   Arctic
UFT-2      5    I   Boreal
UFT56      f    t   Boreal
UFT356     9j   t   Boreal
etc. etc.  89   t

I know how to do this when I match the entire string in A, that is,
when A is "AWI-test1" and I then create the variable D with "Arctic",
but not how to look for only a substring in A.
It would be great if somebody could help.
Thanks
Philipp


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm