To all who have offered suggestions: THANKS!

Wow, this list has generated a lot of good ideas for me in a very short time, and I appreciate it.

For now, I've got some solutions to my problem. Greg's suggestion about creating a subclass to handle the "multi-checkbox" type of question is probably the most flexible, in the long run. However, I've not chosen it in the short run because my programming experience is deeper in the procedural vein than in OOP. I'm only just starting to see how the OO qualities of R can be used, and I'm not yet comfortable in coding that way.

Following the suggestions of several people on the list, I have created a few functions that proceed this way for my multi-choice questions:

- create a matrix with as many rows as there are responses, and as many columns as there are "checkboxes" in the original question
- use strsplit to break up the factors based on the separator inside the field
- for each column in the matrix that I created, fill it with T/F (1/0) by using the is.element function to determine which responses had each checkbox checked
- use the resulting matrix to create whatever sums, averages and plots I want

The code I wrote is not pretty, but is working for me at the moment. I'm an old assembly and C programmer mainly, so I'm still getting used to the capabilities and idioms of R. I think my code does great violence to both and probably makes the interpreter thrash pitifully, but for now it seems to produce the correct result and I can understand it! I'll look for elegance as I go along.

--- "Warnes, Gregory R" <gregory_r_warnes at groton.pfizer.com> wrote:
Hint #1, to do any useful transformations on your variables you will probably need to convert them temporarily into character variables (aka strings). Do that with

as.character(n$OSUSE)

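To see why the conversion matters, here is a minimal sketch using made-up data shaped like the OSUSE column. The variable names `osuse` and `osuse_chr` are assumptions for illustration only. A factor stores integer level codes internally, so numeric operations on it see those codes rather than the survey answers, while the character version can be split on the comma.

```r
# Hypothetical data mimicking the OSUSE field as read.table would deliver it
osuse <- factor(c("1", "1,3", "1,2,3"))

# The factor's underlying values are level codes, not the survey answers
as.integer(osuse)

# Converting to character recovers the original strings...
osuse_chr <- as.character(osuse)

# ...which strsplit can then break into one vector of codes per response
strsplit(osuse_chr, ",")
```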
Probably you will want to convert each of the variables that are in this format into a set of numeric variables. Something like this:

n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
# strsplit needs character input, so convert the factor first (see Hint #1)
n$OSUSE.Windows   <- sapply( strsplit(as.character(n$OSUSE), ","), function(X) ( "1" %in% X ) )
n$OSUSE.Macintosh <- sapply( strsplit(as.character(n$OSUSE), ","), function(X) ( "2" %in% X ) )
n$OSUSE.Unix      <- sapply( strsplit(as.character(n$OSUSE), ","), function(X) ( "3" %in% X ) )

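The three assignments above follow one pattern, so they can also be folded into a single indicator matrix with one row per response and one column per checkbox. The following is only a sketch under assumptions: the names `boxes`, `picked` and `checked` are invented for illustration, and the mapping of codes 1, 2, 3 to Windows, Macintosh, Unix is taken from the example question.

```r
# Hypothetical example data; stringsAsFactors = TRUE mimics a factor column
n <- data.frame(OSUSE = c("1", "1,3", "1,2,3"), stringsAsFactors = TRUE)

# Assumed mapping from checkbox name to the code used in the export
boxes <- c(Windows = "1", Macintosh = "2", Unix = "3")

# One character vector of checked codes per response
picked <- strsplit(as.character(n$OSUSE), ",")

# Rows = responses, columns = checkboxes, initially all unchecked
checked <- matrix(FALSE, nrow = length(picked), ncol = length(boxes),
                  dimnames = list(NULL, names(boxes)))

# Fill each column: was this box's code among the response's codes?
for (j in seq_along(boxes)) {
  checked[, j] <- sapply(picked, function(p) is.element(boxes[[j]], p))
}

colSums(checked)   # number of responses that checked each box
rowSums(checked)   # number of boxes checked by each response
```

With the matrix in hand, sums, means and plots reduce to ordinary column and row operations.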
Alternatively, if you often have variables like this, you might consider creating a new object type that extends factor and that includes the operations that you need. Something like:

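### Start Sample Code ###

# Constructor: tag a comma-separated answer column with its checkbox names
checklist <- function(X, boxnames)
{
  X <- as.factor(X)   # accept character input as well as factors
  attr(X, "boxnames") <- boxnames
  class(X) <- c("checklist", "factor")
  return(X)
}

# TRUE for each response in which the given box (name or number) was checked
contains <- function(X, name)
{
  if (is.character(name))
    name <- pmatch(name, attr(X, "boxnames"))
  # strsplit needs character input, so convert the factor first
  retval <- sapply(strsplit(as.character(X), ","),
                   function(items) as.character(name) %in% items)
  return(retval)
}

# Number of boxes checked in each response
numchecked <- function(X)
{
  retval <- sapply(strsplit(as.character(X), ","), length)
  return(retval)
}

# Per-box count and proportion of responses that checked it
summary.checklist <- function(x, ...)
{
  m <- as.matrix(x)
  return(rbind(sum = apply(m, 2, sum), mean = apply(m, 2, mean)))
}

# Logical matrix: one row per response, one column per checkbox
as.matrix.checklist <- function(x, ...)
{
  sapply(attr(x, "boxnames"), function(YY) contains(x, YY))
}

### End Sample Code ###
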
Here are some examples of using these functions:

n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
n$OSUSE <- checklist(n$OSUSE, c("Windows","Macintosh","Unix"))
#
# Check if OSUSE includes a specific OS
#
contains( n$OSUSE, "Windows")
[1] TRUE TRUE TRUE
contains( n$OSUSE, "Macintosh")
[1] FALSE FALSE TRUE
contains( n$OSUSE, "Unix")
[1] FALSE TRUE TRUE
#
# Compute the average number of checked items
#
numchecked(n$OSUSE)
[1] 1 2 3
mean(numchecked(n$OSUSE))
[1] 2
#
# Create a matrix showing whether each box was checked or not
#
as.matrix(n$OSUSE)
     Windows Macintosh  Unix
[1,]    TRUE     FALSE FALSE
[2,]    TRUE     FALSE  TRUE
[3,]    TRUE      TRUE  TRUE
#
# Show some summary info
#
summary(n$OSUSE)
     Windows Macintosh      Unix
sum        3 1.0000000 2.0000000
mean       1 0.3333333 0.6666667

Of course, you'll want to modify these classes to suit your needs. A little time up front can help a lot. If you like, I'll include these classes and any enhancements that you make in my 'gregmisc' library.

-Greg

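As one possible modification along these lines, the "most common item checked" question from the original post could be answered with a small helper. This is only a sketch: `mostchecked` and its `sep` argument are hypothetical names, not part of the sample code above or of gregmisc, and it assumes the codes 1, 2, 3 correspond to the box names in order.

```r
# Hypothetical helper (not from the original posts): count how often each
# box was checked across all responses and return the most frequent one(s).
mostchecked <- function(x, boxnames, sep = ",") {
  # Flatten every response into one long vector of checked codes
  codes <- unlist(strsplit(as.character(x), sep, fixed = TRUE))
  # Tabulate against all boxes so that never-checked boxes count as zero
  counts <- table(factor(codes,
                         levels = as.character(seq_along(boxnames)),
                         labels = boxnames))
  # Ties are all returned
  names(counts)[as.vector(counts) == max(counts)]
}

# Example data shaped like the OSUSE field
osuse <- factor(c("1", "1,3", "1,2,3"))
mostchecked(osuse, c("Windows", "Macintosh", "Unix"))
```

Returning every tied name rather than just the first is a deliberate choice here, since survey checkboxes often tie on small samples.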
-----Original Message-----
From: Tom Arnold [mailto:thomas_l_arnold at yahoo.com]
Sent: Friday, March 29, 2002 8:59 AM
To: R
Subject: [R] Newbie struggling with "factors"

I am processing some survey results, and my data are being read in as "factors". I don't know how to process these things in any way.

To start with, several of the survey questions are multi-choice check boxes on the original (web-based) survey, as in "check all that apply". These are encoded as numbers. For example, if the survey has a question:

Which operating systems have you used? (Check all that apply)
[ ]Windows [ ]Macintosh [ ]Unix

...then the data exported for three different responses might look like

;1;
;1,3;
;1,2,3;

...where ";" is the field delimiter.

I use read.table to get the data in. I read all the survey data into a table "n" and the field above is called "OSUSE". When I query R about the field, it tells me it is class "factor"

class(n$OSUSE)
[1] "factor"
mode(n$OSUSE)
[1] "numeric"

I'd like to be able to do some simple things like: what is the most common item checked (1, 2, or 3?) What is the average number of boxes checked?

But I can't find any way to manipulate this "factor" field. What's the secret?

Thanks.

=====
Tom Arnold
Summit Media Partners
Visit our web site at

=== message truncated ===

=====
Tom Arnold
Summit Media Partners
Visit our web site at http://www.summitmediapartners.com

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._