-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of arun
Sent: Tuesday, October 16, 2012 12:09 PM
To: Lopez, Dan
Cc: R help
Subject: Re: [R] List of Levels for all Factor variables
Hi,
You can also try this:
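set.seed(1)
# stringsAsFactors=TRUE is needed in R >= 4.0.0 so that col2 is a factor
dat1<-data.frame(col1=factor(sample(1:25,10,replace=TRUE)),
                 col2=sample(letters[1:10],10,replace=TRUE),
                 col3=factor(rep(1:5,each=2)),stringsAsFactors=TRUE)
# Prepend each column name to its levels, then collapse to one string per column
sapply(lapply(mapply(c,lapply(names(sapply(dat1,levels)),function(x) x),
                     sapply(dat1,levels)),
              function(x) paste(x[1],":",paste(x[-1],collapse=" "))),print)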
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"
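The same idea can probably be written with fewer nested calls. The following is only a sketch under assumptions: the data frame `dat` and the name `level_lines` are made up for illustration and are not part of the thread. It keeps the factor columns with Filter() and builds one "name : levels" string per column with vapply().

```r
# Hypothetical example data (not from the thread); col3 is deliberately not a factor
dat <- data.frame(
  col1 = factor(c("a", "b", "a")),
  col2 = factor(c("x", "y", "z")),
  col3 = c(1, 2, 3)
)

# Keep only the factor columns
fac <- Filter(is.factor, dat)

# Build one "name : level level ..." string per factor column
level_lines <- vapply(
  names(fac),
  function(nm) paste(nm, ":", paste(levels(fac[[nm]]), collapse = " ")),
  character(1)
)

# writeLines() prints the strings without the [1] index or quotes
writeLines(level_lines)
```

Using writeLines() instead of print() avoids both the index prefixes and the second, duplicated printout of the returned vector.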
A.K.
----- Original Message -----
From: "Lopez, Dan" <lopez235 at llnl.gov>
To: "R help (r-help at r-project.org)" <r-help at r-project.org>
Cc:
Sent: Tuesday, October 16, 2012 11:19 AM
Subject: [R] List of Levels for all Factor variables
Hi,
I want to get a clean succinct list of all levels for all my factor
variables.
I have a dataframe that's something like #1 below. This is just an
example subset of my data and my actual dataset has 70 variables. I
know how to narrow down my list of variables to just my factor
variables by using #2 below (thanks to Bert Gunter). I can also get a
list of all levels for all my factor variables using #3 below. But
what I want to find out is if there is a way to get this list in a
similar fashion to what the str function returns: without all the extra
spacing and carriage returns. That's what I mean by "clean succinct
list".
BTW I also tried playing around with several of the parameters for the
str function itself but could not find a way to accomplish what I want
to accomplish.
1. DATAFRAME
'data.frame':  11868 obs. of  26 variables:
 $ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
 $ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
 $ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
 $ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
 $ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
 $ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
 $ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
 $ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
 $ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
 $ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
 $ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
 $ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
 $ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
 $ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
 $ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
 $ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
 $ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
 $ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
 $ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
 $ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
 $ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
 $ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
 $ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...
2. Create mydataF to only include factor variables (and exclude NAME
which I am not interested in)
mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]
3. Get a list of all levels
sapply(mydataF,function(x)levels(x))
$APPT_TYP_CD_LL
[1] "FX" "IN" "IP"
$ORG_NAM_LL
 [1] "Business"                        "Chief Financial Officer"         "Chief Information Office"        "Computation"                     "Engineering"                     "ESH and Quality"
 [7] "Facilities and Infrastructure"   "Global Security"                 "NIF"                             "NO"                              "Office of the Director"          "Operations and Business Office"
[13] "Physical and Life Sciences"      "Planning and Financial Services" "ST"                              "Security Organization"           "Strategic Human Resources Mgmt"  "WCI"
$NEW_DISCIPLINE
 [1] "100s"                       "300s"                       "400s"                       "500s"                       "600s"                       "800s"                       "900s"
 [8] "Chem  Science"              "Engineering"                "Life Sciences"              "Math  Computer Science  IT" "Physics"                    "pre100s"                    "PSTS Other"
[15] "Re"
$SERIES ......
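Steps 2 and 3 above could also be sketched as follows. This is only an illustration under assumptions: the small `mydata` below is a made-up stand-in (only the column names echo the post), and the width of 72 is an arbitrary choice. It drops NAME by name rather than by position, uses lapply() so the result is always a list (sapply() simplifies to a matrix when every factor has the same number of levels), and prints each entry as wrapped text with strwrap() instead of padded columns.

```r
# Hypothetical stand-in for mydata; values are invented for illustration
mydata <- data.frame(
  NAME       = factor(c("A", "B", "C")),
  GENDER     = factor(c("M", "F", "M")),
  FLSASTATUS = factor(c("E", "N", "E")),
  AGE        = c(62L, 53L, 46L)
)

# Step 2: keep factor columns, excluding NAME by name instead of by position
is_fac  <- vapply(mydata, is.factor, logical(1))
mydataF <- mydata[, is_fac & names(mydata) != "NAME", drop = FALSE]

# Step 3: lapply() always returns a list, one character vector per factor
lev <- lapply(mydataF, levels)

# Print each entry as "NAME: level, level, ..." wrapped at 72 characters,
# with continuation lines indented by 4 spaces
for (nm in names(lev)) {
  txt <- paste0(nm, ": ", paste(lev[[nm]], collapse = ", "))
  writeLines(strwrap(txt, width = 72, exdent = 4))
}
```

Because the levels are collapsed into a single string before printing, no per-element padding is added, which is what produces the wide gaps in the default list printout.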
Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics