
Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

The big difference between the data sets is that many of your rows (16) have all missing values. None of mine do. If you run the code on my data and on yours, you will see that dcast throws a warning "Aggregation function missing: defaulting to length" with your data but not with mine. As a result, instead of using the value of rank, dcast uses length(rank), which is always 1, except that when a row has multiple missing values it is the number of missing values. This problem will occur whenever there is more than one missing value on a row. The simplest way to handle this is to create a function that returns the first value of a vector and pass it to dcast through the fun.aggregate= argument.
The only drawback is that this will not warn you if a category was ranked twice; the only sign is that the NA column will be zero and one of the other columns will also be zero. The number of missing values is the number of zeroes in your category columns (not including row or NA), and the value in NA is the lowest rank that was missing.
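A minimal sketch of that approach on a small made-up data set (the names toy, toy_melt, toy_cast and the helper first are assumptions for illustration, not from your data):

```r
library(reshape2)

# Made-up data: row 3 is entirely missing, like the problem rows in d2
toy <- data.frame(row = 1:3,
                  rank1 = c("red", "blue", NA),
                  rank2 = c("blue", "red", NA),
                  stringsAsFactors = FALSE)

# Long format: one line per row/rank combination
toy_melt <- melt(toy, id.vars = 1, measure.vars = 2:3,
                 variable.name = "rank", value.name = "color")
toy_melt$rank <- as.numeric(toy_melt$rank)

# Hypothetical helper: return the first element of a vector
first <- function(x) x[1]

# With an explicit aggregation function dcast no longer defaults to length()
toy_cast <- dcast(toy_melt, row ~ color, value.var = "rank",
                  fun.aggregate = first, fill = 0)
toy_cast
```

Each category column should then hold the rank given to that category, with 0 where the category was not ranked on that row.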
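The checks described above could be sketched as follows. The data frame cast_out is a made-up stand-in for a dcast result with two categories, and the names n_zero and double_ranked are assumptions for illustration:

```r
# Made-up cast output: row 3 has both ranks missing, row 4 ranked "red" twice
cast_out <- data.frame(row = 1:4,
                       blue = c(2, 1, 0, 0),
                       red  = c(1, 2, 0, 1))
cast_out[["NA"]] <- c(0, 0, 1, 0)

# Category columns are everything except row and NA
category_cols <- setdiff(names(cast_out), c("row", "NA"))

# Number of zeroes in the category columns of each row
n_zero <- rowSums(cast_out[category_cols] == 0)

# A zero in a category column while the NA column is also zero
# suggests that some category was ranked twice on that row
double_ranked <- cast_out[["NA"]] == 0 & n_zero > 0

cast_out[double_ranked, ]
```

This only flags suspicious rows; it does not say which category was repeated, so those rows would still need to be checked against the original data.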

David C

-----Original Message-----
From: Simon Kiss [mailto:sjkiss at gmail.com] 
Sent: Friday, September 5, 2014 10:22 AM
To: David L Carlson
Cc: r-help at r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

Hi, of course.

A mini-version of my data-set is below, stored in d2. The code I'm working with follows.
library(reshape2)
#Create d2
d2 <- structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 
3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 
1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 
1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank2 = structure(c(6L, 1L, 1L, 
2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 
6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, 
NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank3 = structure(c(1L, 6L, 4L, 
3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 
3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, 
NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank4 = structure(c(7L, 4L, 2L, 
1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 
4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, 
NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank5 = structure(c(2L, 7L, 6L, 
7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 
7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, 
NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank6 = structure(c(4L, 2L, 7L, 
6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 
2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, 
NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank7 = structure(c(5L, 5L, 5L, 
5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 
5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 
NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor")), .Names = c("row", "rank1", "rank2", 
"rank3", "rank4", "rank5", "rank6", "rank7"), row.names = c(NA, 
50L), class = "data.frame")


#This code is a replication of David Carlson's code (below) which works splendidly, but does not work on my data-set
#Melt d2: Note, I've used value.name='color' to maximize comparability with David's suggestion
d3 <- melt(d2, id.vars=1, measure.vars=2:8, variable.name="rank",value.name="color")
#Make Rank Variable Numeric
d3$rank<-as.numeric(d3$rank)
#Recast d3 into d4
d4<- dcast(d3, row~color,value.var="rank", fill=0)
#Note that d4 appears to provide a binary variable equal to one if a respondent checked the option. It does not show which rank they assigned to each option, but it does seem to summarize the number of missing values

#David Carlson's Code
mydf <- data.frame(t(replicate(100, sample(c("red", "blue",  "green", "yellow", NA), 4))))
mydf <- data.frame(rows=1:100, mydf)
colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, variable.name="rank", value.name="color")
mymelt$rank <- as.numeric(mymelt$rank)
mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

#Compare
str(mydf)
str(d2)
head(mycast)
head(d4)

Again, I'm grateful for assistance. I can't understand how my data-set differs from David's sample data-set.
Simon Kiss
On Sep 4, 2014, at 2:35 PM, David L Carlson <dcarlson at tamu.edu> wrote:

            
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606