Restructuring data - unstack, reshape?
2 messages · Jen M, Dennis Murphy
Hi:
Here's one approach using the reshape() function in base R:
# Read in your data:
d <- read.table(text = "
Candidate.ID Specialty Office Score
110002 C London 47
110002 C East 48
110003 RM West 45
110003 RM Southwest 39
110003 C Southwest 38
110004 H South 42
110006 G East 47
110006 G London 45", header = TRUE)
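# Create a variable to distinguish the first and second application
# within each candidate/specialty pair
d$Position <- c(1, 2, 1, 2, 1, 1, 1, 2)
# Use candidate and specialty together as the id, so that candidate
# 110003's single application for specialty C keeps its own row
reshape(d, idvar = c('Candidate.ID', 'Specialty'), timevar = 'Position',
        v.names = c('Office', 'Score'),
        direction = 'wide')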
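The Position values above are typed in by hand, which will not scale to a real data set. As a rough sketch, assuming the rows are already in the order the applications should be numbered, the same variable can be computed with ave(), and the score difference the question asks about can be added after reshaping. The names wide and Score.diff are my own, chosen only for illustration.

```r
# Same example data as above, rebuilt here so the sketch runs on its own
d <- read.table(text = "
Candidate.ID Specialty Office Score
110002 C London 47
110002 C East 48
110003 RM West 45
110003 RM Southwest 39
110003 C Southwest 38
110004 H South 42
110006 G East 47
110006 G London 45", header = TRUE, stringsAsFactors = FALSE)

# Number the applications within each candidate/specialty pair, in row order
d$Position <- ave(seq_len(nrow(d)), d$Candidate.ID, d$Specialty,
                  FUN = seq_along)

# One row per candidate/specialty; Office and Score spread into .1/.2 columns
wide <- reshape(d, idvar = c("Candidate.ID", "Specialty"),
                timevar = "Position",
                v.names = c("Office", "Score"),
                direction = "wide")

# Second score minus first score; NA where there was only one application
wide$Score.diff <- wide$Score.2 - wide$Score.1
wide
```

Rows with a single application keep NA in Office.2, Score.2 and Score.diff, so they drop out of a scatterplot of Score.1 against Score.2 without further filtering.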
HTH,
Dennis
On Sun, Sep 25, 2011 at 6:47 PM, Jen M <jmstatshelp at gmail.com> wrote:
Hi all,

I'm having a problem restructuring my data the way I'd like it. I have data that look like this:

Candidate.ID  Specialty  Office     Score
110002        C          London     47
110002        C          East       48
110003        RM         West       45
110003        RM         Southwest  39
110003        C          Southwest  38
110004        H          South      42
110006        G          East       47
110006        G          London     45

Candidates can apply for the same job specialty in up to 2 offices (never more). They can apply for different specialties in further centres. I would like to look at score differences when candidates apply for the same specialty in two different offices.

With the help of the archives I have tried various stack/unstack and reshape/melt/cast combinations, and I've managed to get a huge matrix where the columns are all possible combinations of Specialties & Offices, and there are many. This leaves a very sparse matrix with mainly null values, and this is not what I want. I'd like the scores from the two attempts in two columns so I can do scatterplots, calculate differences by specialty, etc. In SPSS I'd use 'restructure' to get what I want. I'm working to order with specific requests here, so I have to do it this way (as opposed to a modelling approach).

I would like it restructured to look something like this:

Candidate.ID  Specialty  Office.1   Score.1  Office.2   Score.2
110002        C          London     47       East       48
110003        RM         West       45       Southwest  39
110003        C          Southwest  38
110004        H          South      42
110006        G          East       47       London     45

So one row per candidate/specialty combination, with 2 sets of offices/scores, and null values in the second set if they've only applied once for that specialty.

Can anyone help me out with this? Is it possible using stack or reshape?

Many thanks for reading,
Jan

PS Closest I've come to what I need is the sparse matrix produced by this:

recast(spec.scores, Candidate.ID ~ Specialty + Office, measure.var="Score")