Creating dummy variables in R

3 messages · Joseph Norman Thomson, Bert Gunter, Peter Dalgaard

#
Hello,

Semi-new R user here and still learning the ropes. I am creating dummy
variables for a dataset on stock prices in R. One dummy variable is
called prev1 and is:

prev1 <- ifelse(ret1 >= .5, 1, 0)

where ret1 is the previous day's return.

The variable "prev1" is created fine and works in my regression model
and for running conditional statistics. However, when I call the
names() function on the dataset the freshly created variable (prev1)
doesn't show up; also, when I export the dataset the prev1 variable
doesn't show up in the exported file. Is there a way to make the
variable show up in both the names() output and, more importantly, the
exported file? Or am I forced to create dummy variables elsewhere (much
tougher)?


Thanks,

Joe
#
You almost never need dummy variables in R. R creates them
automatically from factors, given a model formula and possibly a
contrasts specification.

?contrasts  ## for some technical details.
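
As a rough illustration of the point above, here is a minimal sketch, assuming a made-up data frame dd with a response y (both the data and the labels "low"/"high" are invented for the example). It stores the indicator as a factor and lets model.matrix() and lm() build the 0/1 column under the default treatment contrasts:

```r
# Hypothetical data, invented purely for illustration.
dd <- data.frame(
  y    = c(0.2, 1.1, -0.3, 0.9, 0.4, 1.3),
  ret1 = c(0.1, 0.7, 0.3, 0.9, 0.5, 0.6)
)

# Store the indicator as a two-level factor instead of a hand-coded 0/1 column.
dd$prev1 <- factor(dd$ret1 >= 0.5, levels = c(FALSE, TRUE),
                   labels = c("low", "high"))

# model.matrix() shows the design matrix that lm() builds from the formula.
# With the default treatment contrasts, the factor becomes a single 0/1
# column for the non-reference level ("high").
X <- model.matrix(y ~ prev1, data = dd)
print(colnames(X))

# The same column appears as a coefficient in the fitted model.
fit <- lm(y ~ prev1, data = dd)
print(names(coef(fit)))
```

If a different coding is wanted, the contrasts can be changed (see ?contrasts) without touching the data.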

If you have not read "An Introduction to R" do so now. Pay particular
attention to the chapter on modeling and categorical variables. You
can also google around to find appropriate tutorials. Here is one:

http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm

I repeat: Do not create dummy variables by hand in R unless you have
understood the above and have good reason to do so.

-- Bert

On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson
<thomsonj at email.arizona.edu> wrote:

  
    
#
On Jan 30, 2013, at 04:58 , Bert Gunter wrote:

            
In this case it's a cutpoint-type situation, and the user might be excused for not wanting to deal with the mysteries of cut() (yet). 
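
For reference, a minimal sketch of what the cut() version might look like, assuming a single cutpoint at .5 and made-up values for ret1 (the labels are invented for the example):

```r
# Hypothetical returns, invented for illustration.
ret1 <- c(0.1, 0.5, 0.7)

# right = FALSE makes the intervals closed on the left, [-Inf, .5) and
# [.5, Inf), so a value of exactly .5 lands in the upper group,
# matching ret1 >= .5.
prev1 <- cut(ret1, breaks = c(-Inf, 0.5, Inf), right = FALSE,
             labels = c("below", "at_or_above"))
print(prev1)
```

The result is a factor, so it can go straight into a model formula.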

More importantly, the main issue here seems to be a lack of understanding of where new variables are located. I.e., if the data set is called dd, you need

dd$prev1 <- (etc)

and if you use attach(), do it _after_ modifying the data (or detach() and reattach).

Otherwise, new variables end up in the global environment. (This is logical enough once you realize that the result of a computation does not necessarily "fit" into the dataset.)
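
A minimal sketch of the difference, assuming the data set is called dd and using made-up values for ret1 (the temporary CSV file is only there to show what gets exported):

```r
# Hypothetical data set, invented for illustration.
dd <- data.frame(ret1 = c(0.1, 0.7, 0.3, 0.9))

# With attach(), ret1 is found on the search path, but the result of the
# assignment is a free-standing vector, not a column of dd.
attach(dd)
prev1 <- ifelse(ret1 >= .5, 1, 0)
detach(dd)

in_dd_before <- "prev1" %in% names(dd)   # FALSE: dd itself is unchanged
print(in_dd_before)

# Assigning into the data frame makes the variable part of dd.
dd$prev1 <- ifelse(dd$ret1 >= .5, 1, 0)
rm(prev1)                                # drop the stray copy to avoid confusion
print(names(dd))                         # "ret1" "prev1"

# The new column is now written out with the rest of the data.
out <- tempfile(fileext = ".csv")
write.csv(dd, out, row.names = FALSE)
exported <- names(read.csv(out))
print(exported)                          # "ret1" "prev1"
```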

By the way: You don't need ifelse(): as.numeric(ret1 >= .5) or even just (ret1 >= .5) works.
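
A quick check of that equivalence, as a sketch with made-up values for ret1 (including a missing value):

```r
# Hypothetical returns, invented for illustration; NA stands for a missing day.
ret1 <- c(0.1, 0.5, 0.7, NA)

a <- ifelse(ret1 >= .5, 1, 0)   # the original approach
b <- as.numeric(ret1 >= .5)     # logical converted to 0/1
l <- ret1 >= .5                 # plain logical; TRUE/FALSE act as 1/0 in arithmetic

print(a)
print(b)
print(all.equal(a, b))          # same values, with NA kept as NA in both
print(sum(l, na.rm = TRUE))     # number of days with ret1 >= .5
```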