
categorical variables

5 messages · Luis Silva, Clément Calenge, Spencer Graves +1 more

#
Dear helpers

I constructed a data frame with this structure:

> str(dados1)
`data.frame':   485 obs. of  16 variables:
 $ Emissor         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Marisca.Rio     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Período         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Reproducao      : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Estacao         : int  2 2 2 2 2 2 2 2 2 2 ...
 $ X30cm           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Dir.mov         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Temp.media      : num  12.0 11.3 12.1 12.3 12.4 ...
 $ Temp.amplitude  : num   1.167 -0.750  0.875  0.125 ...
 $ Vel.media       : num  0.479 0.514 0.517 0.445 0.468 ...
 $ Vel.amplitude   : num  -0.04865  0.03417  0.00312 0.02364 ...
 $ Caudal.medio    : num  0.570 0.585 0.589 0.485 0.501 ...
 $ Caudal.amplitude: num  -0.04323  0.01449  0.00405 0.01617 ...
 $ Prof.media      : num  36.1 34.6 34.1 32.9 32.1 ...
 $ Prof.amplitude  : num   0.458 -1.500 -0.500 -1.250 -0.750 ...
 $ Movimento       : num  0 0 0 0 0 0 0 0 0 0 ...

My problem is that the first 7 variables are in fact 
categorical, some of them of the Present/Absent type. R is 
treating them as integers, but I want them to be categorical. 
How can I solve this problem? I searched the data.frame help 
but didn't find any parameter to set some variables to categorical.


thank you
luis
#
At 14:49 14/04/2003 +0100, Luis Silva wrote:
see ?factor:

for (i in 1:7) dados1[,i]<-factor(dados1[,i])

hope this helps,

Clem.
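
A minimal sketch of the same conversion without an explicit loop. The miniature data frame and the 0 = Absent / 1 = Present coding are assumptions for illustration; only the column names come from the str() output in the question.

```r
# Hypothetical miniature of dados1 (values invented for illustration)
dados1 <- data.frame(
  Emissor    = c(1L, 1L, 3L, 4L),
  Reproducao = c(3L, 3L, 2L, 1L),
  Dir.mov    = c(0L, 1L, 0L, 1L),
  Temp.media = c(12.0, 11.3, 12.1, 12.3)
)

# Convert several integer columns to factors in one step
cat_cols <- c("Emissor", "Reproducao")
dados1[cat_cols] <- lapply(dados1[cat_cols], factor)

# For a Present/Absent variable, readable labels can be attached;
# the coding 0 = Absent, 1 = Present is an assumption
dados1$Dir.mov <- factor(dados1$Dir.mov, levels = c(0, 1),
                         labels = c("Absent", "Present"))

str(dados1)
```

The numeric columns are left untouched, so only the variables named in cat_cols (plus Dir.mov) change type.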
#
What do you want to do with the categorical variables?

summary(data.fr) or sapply(data.fr[,1:7], table) will get you started. 
glm for model fitting, etc.

Spencer Graves
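
A small sketch of what the two suggested calls report once the columns are factors. The data frame here is a hypothetical stand-in, and lapply() is used in place of sapply() only so that the result is always a list of tables.

```r
# Hypothetical stand-in for the first columns of the data frame
data.fr <- data.frame(
  Emissor = factor(c(1, 1, 3, 4, 4, 4)),
  Dir.mov = factor(c(0, 1, 0, 0, 1, 1))
)

summary(data.fr)                  # counts per level for each factor column

counts <- lapply(data.fr, table)  # one frequency table per column
counts$Emissor
```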
#
I want to fit an rpart model (regression tree). I have already 
converted those variables to factors and it worked. The problem 
is that when I plot the tree, the categorical variables appear 
as, e.g., Emissor=acde, which I suppose is a code for the levels 
1, 3, 4, 5. I think I can force R to plot Emissor=1345 or 
something like that.

luis


#
On Mon, 14 Apr 2003, Luis Silva wrote:

            
?text.rpart says

  pretty: an integer denoting the extent to which factor levels in
          split labels will be abbreviated.  A value of (0) signifies
          no abbreviation.  A `NULL', the default, signifies using
          elements of letters to represent the different factor levels. 

so the answer is right there on the help page.
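
A minimal sketch of that argument in use. The simulated data are an assumption standing in for dados1, built so that the tree splits on the factor Emissor; rpart(), plot(), text() and labels() are the functions from the rpart package.

```r
library(rpart)

# Simulated stand-in for dados1 (values invented for illustration)
set.seed(1)
dados <- data.frame(
  Emissor    = factor(sample(1:5, 200, replace = TRUE)),
  Temp.media = rnorm(200, mean = 12)
)
# Make the response depend strongly on Emissor so the tree splits on it
dados$Movimento <- ifelse(dados$Emissor %in% c("1", "3"), 5, 0) +
  rnorm(200, sd = 0.5)

fit <- rpart(Movimento ~ Emissor + Temp.media, data = dados)

plot(fit)
text(fit, pretty = 0)   # pretty = 0: full factor levels, not letters

# The same split labels as plain text, without plotting
labels(fit, pretty = 0)
```

Leaving pretty at its default would label the same split with letters such as Emissor=ac.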