
Transforming a dataframe into a response/predictor matrix

Ki L. Matlock wrote:
Thanks for providing test data. However, this sort of format is difficult to
work with, as email tends to mangle the line wrapping.  It took me about 5
minutes and the combined powers of Vim and OpenOffice to reflow and
re-export the example into a format that R could ingest, and I probably
made a mistake somewhere along the way.  There is nothing wrong with
providing data like this, but it probably limits the number of people who
are willing to give your problem a try.

A good way to share test data frames, especially if they contain a lot of
rows/columns, is to dump them using the dput() function.  This encodes the
data as an R expression that recreates the object, in the following format:


structure(list(Lastname = structure(1:2, .Label = c("alastname", 
"anotherlastname"), class = "factor"), Firstname = structure(1:2, .Label =
c("afirstname", 
"anotherfirstname"), class = "factor"), CATALOG_NBR = c(1213L, 
1213L), Email = structure(1:2, .Label = c("*@uark.edu", "**@uark.edu"
), class = "factor"), StudentID = structure(c(2L, 1L), .Label = c("##", 
"10295236"), class = "factor"), EMPLID = structure(1:2, .Label = c("#", 
"10295236"), class = "factor"), Start = structure(c(14215, 14125
), class = "Date"), Xattempts = c(1L, 1L), Q1 = c(1L, 1L), Q2 = c(1L, 
1L), Q3 = 0:1, Q4 = 0:1, Q5 = 0:1, Q6 = c(0L, 0L), Q7 = 0:1, 
    Q8 = c(0L, 0L), Q9 = c(0L, 0L), Q10 = c(1L, 1L), Q11 = 0:1, 
    Q12 = c(0L, 0L), Q13 = c(1L, 0L), Q14 = c(1L, 1L), Q15 = c(0L, 
    0L), Q16 = c(1L, 0L), Q17 = c(1L, 0L), Q18 = c(0L, 0L), Q19 = c(1L, 
    1L), Q20 = c(0L, 0L), Q21 = c(0L, 0L), Q22 = c(0L, 0L), Q23 = c(0L, 
    0L), Q24 = c(0L, 0L), Q25 = c(0L, 0L), Q26 = c(0L, 0L), Q27 = c(0L, 
    0L), Q28 = c(0L, 0L), Q29 = c(1L, 0L), Q30 = 0:1, Q31 = 0:1, 
    Q32 = c(0L, 0L), Score = c(9L, 13L), Form = structure(1:2, .Label =
c("E", 
    "G"), class = "factor"), CRSE_GRADE_OFF = structure(c(1L, 
    1L), .Label = "D", class = "factor")), .Names = c("Lastname", 
"Firstname", "CATALOG_NBR", "Email", "StudentID", "EMPLID", "Start", 
"Xattempts", "Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", 
"Q9", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15", "Q16", "Q17", 
"Q18", "Q19", "Q20", "Q21", "Q22", "Q23", "Q24", "Q25", "Q26", 
"Q27", "Q28", "Q29", "Q30", "Q31", "Q32", "Score", "Form", "CRSE_GRADE_OFF"
), row.names = c(NA, -2L), class = "data.frame")


Not very pretty, but this format is more resistant to email mangling and can
generally be copied and pasted straight into an R session, which saves all
the monkey business with Vim/OpenOffice/Excel/whatever.

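As a rough sketch of that round trip, the snippet below dumps a small data frame with dput() and rebuilds it from the dumped text. The data frame `example` and its contents are made up for illustration; only dput(), capture.output(), parse() and eval() from base R are used.

```r
# Hypothetical two-row data frame standing in for real test data.
example <- data.frame(
  StudentID = c("A1", "B2"),
  Q1 = c(1L, 0L),
  stringsAsFactors = FALSE
)

# Print the dput() representation; this is the text to paste into an email.
dput(example)

# Round trip: capture that text, then rebuild the data frame from it,
# which is what a reader does when pasting it into their own session.
dumped <- paste(capture.output(dput(example)), collapse = "\n")
rebuilt <- eval(parse(text = dumped))

# Compare the rebuilt copy against the original.
all.equal(example, rebuilt)
```

On the receiving end, assigning the pasted text to a name (for example `studentData <- structure(...)`) is all that is needed.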
Ki L. Matlock wrote:
The melt() function from Hadley Wickham's 'reshape' package can probably
take care of this for you.  Assuming the data.frame is named "studentData",
the following might process your data the way you want it:

  require( reshape )

  # Retrieve the names of all columns holding responses to questions.
  questions <- names( studentData )[ grep( '^[Q]', names( studentData ) ) ]

  testBreakdown <- melt( studentData, id.vars = c( 'StudentID', 'Form' ),
    measure.vars = questions, variable_name = 'Question' )


The first argument after the name of the data set (id.vars) specifies the
names of those columns that we wish to use in order to categorize the data.
The second argument after the data set (measure.vars) specifies the names of
columns that contain the data we are interested in. testBreakdown is now a
data.frame with one row per student/question pair, containing:

  A column labeled "StudentID" -- contains the ID of the student.
  A column labeled "Form" -- contains the code of the form they used.
  A column labeled "Question" -- contains the name of the question they
    answered.  The default name for this column is "variable", but I
    overrode it by setting variable_name in the above call to melt().
  A column labeled "value" -- contains the result of the student's answer
    to the given question.
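
To make the shape of the result concrete, here is a minimal sketch on a made-up data frame with two students and two questions. The name `toy` and its values are hypothetical, and the call assumes the 'reshape' package (not 'reshape2', which spells the argument variable.name) is installed.

```r
library(reshape)

# Hypothetical data: two students, two question columns.
toy <- data.frame(
  StudentID = c("A1", "B2"),
  Form = c("E", "G"),
  Q1 = c(1L, 0L),
  Q2 = c(0L, 1L),
  stringsAsFactors = FALSE
)

# Names of all columns that start with "Q".
questions <- grep("^Q", names(toy), value = TRUE)

# Stack the question columns into Question/value pairs.
long <- melt(toy, id.vars = c("StudentID", "Form"),
             measure.vars = questions, variable_name = "Question")

# One row per student/question pair: 2 students x 2 questions.
str(long)
```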

I was not able to figure out which part of your data.frame contained
information concerning the "s-th" test taken by a student-- maybe it got
lost in translation.  Anyway, if the column names and order you gave above
are important, then all you need to do is rename and reorder the columns of
testBreakdown.
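
Renaming and reordering can be done with base R alone. The sketch below builds a small stand-in for testBreakdown by hand so that it runs on its own; the new column name "Response" and the chosen order are placeholders, since I do not know the names you actually need.

```r
# Hypothetical long-format data, shaped like the result of the melt() call.
testBreakdown <- data.frame(
  StudentID = c("A1", "B2", "A1", "B2"),
  Form = c("E", "G", "E", "G"),
  Question = c("Q1", "Q1", "Q2", "Q2"),
  value = c(1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Rename one column by name rather than by position.
names(testBreakdown)[names(testBreakdown) == "value"] <- "Response"

# Reorder by listing the columns in the desired order.
testBreakdown <- testBreakdown[, c("Response", "StudentID", "Form", "Question")]

names(testBreakdown)
```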


Hope this helps!

-Charlie

-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University