flexible approach to subsetting data
Actually, the ".0" suffix on the first set of variables is not needed.
You could modify the reshape() call to search for the base
name of each variable so you would not need to change the code
if the number of replications changes:
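# grepl() flags every column whose name contains each base name,
# however many replications there are
reshape(df5, direction="long", v.names=c("dose", "resp"),
        varying=list(dose=grepl("dose", names(df5)),
                     resp=grepl("resp", names(df5))))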
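A self-contained sketch of the base-name idea above, assuming a df5-like data frame in which the first dose/resp pair carries no ".0" suffix. It uses grep(..., value = TRUE), a small variation on the grepl() call, so that 'varying' receives column names, the canonical form described in ?reshape. The "^" anchors are an added precaution so that only names starting with each base name match.

```r
# df5-like data; unlike the df5 in this thread, the first pair has no ".0" suffix
df5 <- data.frame(dose   = c(40, 50, 60, 50), resp   = c(40, 50, 60, 50),
                  dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
                  dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
                  dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)

# Find each variable's columns by its base name; adding dose.4/resp.4, etc.
# requires no change to this code.
dose.cols <- grep("^dose", names(df5), value = TRUE)
resp.cols <- grep("^resp", names(df5), value = TRUE)

long <- reshape(df5, direction = "long", v.names = c("dose", "resp"),
                varying = list(dose = dose.cols, resp = resp.cols))
head(long)
```

The result has one row per original row and replication (16 here), with a 'time' column numbering the replications.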
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, July 23, 2013 1:12 PM
To: David Winsemius
Cc: R help; Andrea Lamont
Subject: Re: [R] flexible approach to subsetting data
On Jul 23, 2013, at 10:49 AM, David Winsemius wrote:
On Jul 23, 2013, at 10:01 AM, Adams, Jean wrote:
Check out the reshape() function of the reshape package. Here's one of the examples from ?reshape.
Jean
library(reshape) # No, at least not for the reshape-function
The reshape function is from the 'stats' package, which ships with base R. The 'reshape' and 'reshape2' packages were written (at least in part) because the 'reshape' function was so difficult to understand.
If you do choose to use the reshape2 package, which is
well-respected and often extremely helpful, the function you will want to start with is 'melt'.
long <- reshape(wide, direction="long")
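I don't think this example will be particularly helpful: the ?reshape data start out in "long" form, and 'wide' was produced from them by reshape(), so the call above only works because 'wide' carries a 'reshapeWide' attribute recording how to undo that step. With your data, more input (such as 'varying') would be needed.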
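A minimal sketch of that distinction, using small made-up data frames (long0, wide, plain and fixed are hypothetical names): reshape() can return to long form with only direction = "long" when its input was itself created by reshape(), whereas an ordinary data frame with the same layout needs 'varying'.

```r
# Round trip in the style of the ?reshape examples: 'wide' is made by reshape(),
# so it carries a "reshapeWide" attribute describing how to undo the step.
long0 <- data.frame(id = rep(1:2, each = 2), time = rep(1:2, 2), x = c(5, 7, 6, 8))
wide  <- reshape(long0, direction = "wide", idvar = "id", timevar = "time")
back  <- reshape(wide, direction = "long")     # works without 'varying'

# An ordinary data frame with the same columns has no such attribute,
# so the same one-argument call stops with an error.
plain <- data.frame(id = 1:2, x.1 = c(5, 6), x.2 = c(7, 8))
msg <- tryCatch(reshape(plain, direction = "long"),
                error = function(e) conditionMessage(e))

# Naming the time-varying columns supplies the missing input.
fixed <- reshape(plain, direction = "long", varying = c("x.1", "x.2"), idvar = "id")
```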
Here's a dataset to experiment with:
df5 <- data.frame(dose.0 = c(40,50,60,50), resp.0 = c(40,50,60,50),
                  dose.1 = c(1,2,1,2),     resp.1 = c(1,2,1,2)+3,
                  dose.2 = c(2,1,2,1),     resp.2 = c(1,2,1,2)+3,
                  dose.3 = c(3,3,3,3),     resp.3 = c(1,2,1,2)+3 )
Notice that you would need to add the ".0" to the first set of column names.
reshape(df5, direction="long",
        v.names=c("dose", "resp"),
        # columns 1,3,5,7 hold dose.*; columns 2,4,6,8 hold resp.*
        varying=list(dose=c(1,3,5,7), resp=c(2,4,6,8) )
        ) # succeeds
So perhaps you could use a similar call (after appending the ".0"s) with:
varying=list(sim=seq(1,810,by=4),
X1= seq(2,810,by=4),
X2= seq(3,810,by=4),
X3= seq(4,810,by=4)
)
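A sketch that generalizes the seq(..., by=4) calls above by deriving the column positions from ncol(), so nothing is hard-coded to 810 columns. It assumes, as the question describes, that each imputation contributes the same four columns in the same order (sim, X1, X2, X3); the data frame `imp` and the variables `base.names` and `k` are hypothetical. Because 'varying' is given explicitly, no ".0" suffix is required, consistent with David Carlson's remark.

```r
# Hypothetical stand-in for the wide imputation data: two imputations,
# two simulated datasets stacked in rows, four columns per imputation.
imp <- data.frame(sim   = c(1, 1, 2, 2), X1   = 1:4,   X2   = 5:8,   X3   = 9:12,
                  sim.1 = c(1, 1, 2, 2), X1.1 = 13:16, X2.1 = 17:20, X3.1 = 21:24)

base.names <- c("sim", "X1", "X2", "X3")   # columns per imputation, in order
k <- length(base.names)
stopifnot(ncol(imp) %% k == 0)             # every imputation block must be complete

# Column j of each block sits at positions j, j + k, j + 2k, ...
varying <- lapply(seq_len(k), function(j) names(imp)[seq(j, ncol(imp), by = k)])
names(varying) <- base.names

long <- reshape(imp, direction = "long", v.names = base.names,
                varying = varying, timevar = "m")
long$d31 <- long$X3 - long$X1              # e.g. the X3 - X1 calculation
```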
On Tue, Jul 23, 2013 at 9:35 AM, Andrea Lamont <alamont082 at gmail.com> wrote:
Hello: I am running a simulation study and am stuck with a subsetting problem.
Here is the basic issue: I generated data and am running a simulation that uses multiple imputation. For each generated dataset, I used multiple imputation. The resultant dataset is in wide format, where each imputation is recorded in separate columns (though the different simulations are stacked).
Here is an example of what it looks like:
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1   #  #  #  #     #    #    #
1   #  #  #  #     #    #    #
1   #  #  #  #     #    #    #
2   #  #  #  #     #    #    #
2   #  #  #  #     #    #    #
2   #  #  #  #     #    #    #
sim refers to the simulated/generated dataset. X1-X3 are the values for the first imputed dataset, and X1.1-X3.1 are the values for the second imputed dataset. The problem is that I want the data to be in long format, like this:
sim m X1 X2 X3
1   1 #  #  #
1   2 #  #  #
2   1 #  #  #
2   2 #  #  #
where m is the imputation number. This will allow me to do cleaner calculations (e.g. X3-X1).
I know I can subset the data manually (e.g. [,1:10]), save each subset as a separate dataset, and then rbind them; however, I'm looking for a more flexible approach. The manual approach would become quite tedious as the number of imputations (and therefore the number of columns) increases; with only 10 imputations, there are roughly 810 columns. Also, I would like to avoid having to recode each time I change the number of imputations.
The same is true for the reshape function, which would require naming a huge number of columns and editing the call each time 'm' changes.
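The manual subset-then-rbind idea described above can also be automated without reshape(). This is a swapped-in technique, not one proposed in the thread: a minimal sketch assuming the same regular layout of four columns per imputation (the data frame `imp` and the variables `base.names`, `k`, `n.imp` and `blocks` are hypothetical). lapply() takes the place of an explicit for loop, and the final order() call arranges rows by sim and then m, as in the desired layout.

```r
imp <- data.frame(sim   = c(1, 1, 2, 2), X1   = 1:4,   X2   = 5:8,   X3   = 9:12,
                  sim.1 = c(1, 1, 2, 2), X1.1 = 13:16, X2.1 = 17:20, X3.1 = 21:24)

base.names <- c("sim", "X1", "X2", "X3")
k <- length(base.names)
n.imp <- ncol(imp) %/% k                   # number of imputations, read from the data

blocks <- lapply(seq_len(n.imp), function(m) {
  cols <- ((m - 1) * k + 1):(m * k)        # columns belonging to imputation m
  block <- imp[, cols]
  names(block) <- base.names               # drop the ".1", ".2", ... suffixes
  data.frame(sim = block$sim, m = m, block[, -1])
})
long <- do.call(rbind, blocks)
long <- long[order(long$sim, long$m), ]    # sim first, then imputation number
rownames(long) <- NULL
```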
If the columns are named regularly, then 'reshape' will attempt to split the names properly without explicit naming. Details and a better description of the problem might allow more specific answers to emerge. The fact that the first set of columns has no numeric suffix may be a problem for the algorithm.
Why not post the output of dput(head(dfrm[, 1:12]))? -- David.
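A minimal sketch of that name-guessing behaviour, assuming the unsuffixed first block is renamed with ".0" so every column follows the name.number pattern (the data frame `imp` is hypothetical). With an atomic 'varying' and no 'v.names', reshape() splits each name at its default separator "." to obtain the long-format variable names (sim, X1, X2, X3) and the times (0 and 1 here).

```r
imp <- data.frame(sim   = c(1, 1, 2, 2), X1   = 1:4,   X2   = 5:8,   X3   = 9:12,
                  sim.1 = c(1, 1, 2, 2), X1.1 = 13:16, X2.1 = 17:20, X3.1 = 21:24)

# Give the unsuffixed first block a ".0" suffix so all names are regular.
first <- !grepl(".", names(imp), fixed = TRUE)
names(imp)[first] <- paste0(names(imp)[first], ".0")

# No v.names: reshape() guesses the variable names and times from the suffixes.
long <- reshape(imp, direction = "long", varying = names(imp), timevar = "m")
```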
Is there a flexible way to approach this? I'm inclined to use a for loop, but I know that 1) this is generally inefficient and 2) I am having trouble with the coding regardless. Any suggestions are appreciated.
Thanks,
Andrea
David Winsemius
Alameda, CA, USA