creating a derived variable in a data frame

2 messages · Avram Aelony, Martin Henry H. Stevens

#
Hello,

I have read through the manuals and can't seem to find an answer.

I have a categorical, character variable that has hundreds of values.  I want to group the existing values of this variable into a new, derived (categorical) variable by applying conditions to the values in the data.

For example, suppose I have a data frame with variables: date, country, x, y, and z.  

x, y, and z are numeric and country is a two-character string.  I want to create a new derived variable named "continent" that would also exist in the data frame. The continent variable would have values of "Asia", "Europe", "North America", etc.

How would this best be done for a large dataset (>10 MB)?
I have tried many variations on the following without success (note that in a real example I would have a longer list of countries and continent values):
I have read about factors, but I am not sure how they apply here.  

Can anyone help me with the syntax?  I am sure it is trivial and a common thing to do.
The ultimate goal is to compute percentages of x by continent.

Thanks for any help in advance.

-Avram
#
Hi Avram-
How many countries do you have?
I would do it the following way because it is simple and I don't know
any better, even if it is absurdly painstaking.

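# Step 1: create the new column as an all-NA factor with the continent levels
mydata$continent <- factor(NA, levels = c("NoAm", "Euro"))

# Steps 2 a-z: assign a level to the rows that match each group of countries
mydata$continent[mydata$country == "US" |
                 mydata$country == "CA" |
                 mydata$country == "MX"] <- "NoAm"

# Repeat for all countries and continents.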
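
If repeating that for every country gets too tedious, a named lookup vector might save some typing. What follows is only a minimal sketch, not a tested recipe for the real data: the small data frame, the name continent_of, and the handful of country codes are made up for illustration, and the last two lines show one possible way to get the percentage of x by continent that was asked about.

```r
# Hypothetical example data; column names follow the original question.
mydata <- data.frame(
  country = c("US", "CA", "MX", "FR", "DE", "US"),
  x       = c(10, 20, 30, 25, 15, 100),
  stringsAsFactors = FALSE
)

# Lookup vector (assumed name): names are country codes, values are continents.
# Extend it to cover every country code that occurs in the data.
continent_of <- c(US = "NoAm", CA = "NoAm", MX = "NoAm",
                  FR = "Euro", DE = "Euro")

# Index the lookup by country code. as.character() guards against country
# being stored as a factor; codes missing from the lookup come out as NA.
mydata$continent <- factor(unname(continent_of[as.character(mydata$country)]),
                           levels = c("NoAm", "Euro"))

# Sum x within each continent, then express each sum as a percentage.
x_by_continent <- tapply(mydata$x, mydata$continent, sum)
pct_x <- 100 * x_by_continent / sum(x_by_continent)
```

The lookup is a single vectorized indexing step, so it should cope with a data set of the size mentioned, and any country left out of continent_of is easy to spot with is.na(mydata$continent).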

Hank
On Oct 19, 2005, at 8:09 PM, Avram Aelony wrote:

            
Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"