Reading in and modifying multiple datasets in a loop
4 messages · Debs Majumdar, Uwe Ligges
On 21.10.2011 23:32, Debs Majumdar wrote:
Hi,
I have been given a set of around 300 files where there are 5 files corresponding to each chunk.
E.g. Chunk 1 for chr1 contains these 5 files:
chr1.one.phased.impute2.chunk1
chr1.one.phased.impute2.chunk1_info
chr1.one.phased.impute2.chunk1_info_by_sample
chr1.one.phased.impute2.chunk1_summary
chr1.one.phased.impute2.chunk1_warnings
For chr 1 there are 47 chunks, chr2 has 42 chunks...and it ends at chr22 with 23 chunks.
I am using the DatABEL package to convert them to databel format using the following command:
impute2databel(genofile="chr1.one.phased.impute2.chunk1", samplefile="chr1.one.phased.impute2.chunk1_info", outfile="chr1.chunk1", makeprob=TRUE, old=FALSE)
which uses two files per chunk.
Is there a way I can automate this so that the code goes through each chunk of each chromosome and does the conversion to databel format?
Yes, probably (all untested):
owd <- setwd(pth)
fls <- list.files(pattern="^chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}
setwd(owd)
Uwe Ligges
Thanks, -Debs
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
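For anyone who wants to check the file-name handling before converting anything, here is a minimal dry run of the loop above. It only builds the three arguments that each impute2databel() call would receive and never calls the function. The file names are typed in by hand (an assumption standing in for list.files(pattern="^chr")), and the `plan` data frame is a hypothetical helper that is not part of the original reply.

```r
# Hand-typed stand-in for list.files(pattern = "^chr") (assumption)
fls <- c("chr1.one.phased.impute2.chunk1",
         "chr1.one.phased.impute2.chunk1_info",
         "chr1.one.phased.impute2.chunk1_info_by_sample",
         "chr1.one.phased.impute2.chunk1_summary",
         "chr1.one.phased.impute2.chunk1_warnings",
         "chr1.one.phased.impute2.chunk2",
         "chr1.one.phased.impute2.chunk2_info")

# Everything before the first underscore is the chunk's base name
ufls <- unique(vapply(strsplit(fls, "_"), "[", character(1), 1))

# Output name: first and last dot-separated piece, e.g. "chr1.chunk1"
outfile <- vapply(strsplit(ufls, "\\."),
                  function(p) paste(p[1], utils::tail(p, 1), sep = "."),
                  character(1))

# Hypothetical helper table: one row per planned impute2databel() call
plan <- data.frame(genofile   = ufls,
                   samplefile = paste(ufls, "info", sep = "_"),
                   outfile    = outfile,
                   stringsAsFactors = FALSE)
print(plan)
```

If the rows of `plan` look right for a directory, the real loop should be passing the same values to impute2databel().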
2 days later
Thanks Uwe. This works perfectly.
#######
owd <- setwd(pth)
fls <- list.files(pattern="^chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}
setwd(owd)
####
I have a question regarding how strsplit works.
When my files are the following:
    chr1.one.phased.impute2.chunk1
    chr1.one.phased.impute2.chunk1_info
    chr1.one.phased.impute2.chunk1_info_by_sample
    chr1.one.phased.impute2.chunk1_summary
    chr1.one.phased.impute2.chunk1_warnings
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
This works like a charm.
I have another dataset where the files are
    study1_chr1.one.phased.impute2.chunk1
    study1_chr1.one.phased.impute2.chunk1_info
    study1_chr1.one.phased.impute2.chunk1_info_by_sample
    study1_chr1.one.phased.impute2.chunk1_summary
    study1_chr1.one.phased.impute2.chunk1_warnings
... and so on.
and I wanted to run the same loop, but I was unable to change strsplit so that it works when the files are named as above:
I tried
ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))
but this knocks off "study1" (modified code below). What modification do I need to make for this to run?
####
fls <- list.files(pattern="study1_chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))
library(GenABEL)
for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}
#####
Thanks,
Debs
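A quick look at what strsplit() returns for one of the prefixed names may help explain the symptom. The name is taken from the list in the message; `parts` is only an illustrative variable.

```r
x <- "study1_chr1.one.phased.impute2.chunk1_info"

# Splitting on "_" yields three pieces for this name
parts <- strsplit(x, "_")[[1]]
print(parts)
# [1] "study1"  "chr1.one.phased.impute2.chunk1"  "info"

# "[", 1 keeps only the study prefix; "[", 2 keeps only the part after it
print(parts[1])
print(parts[2])
```

Neither piece on its own is a file name that exists in the directory, since the files on disk start with "study1_". That would explain why the loop fails when `genofile` is set to the second piece.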
1 day later
On 24.10.2011 23:10, Debs Majumdar wrote:
I tried
ufls<- unique(sapply(strsplit(fls, "_"), "[", 2))
unique(gsub("(_.*)_.*", "\\1", x))
Should do if there is a first underscore.
Uwe Ligges
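The sketch below tries the suggested gsub() call on the five example names, with `x` filled in by hand as an assumption. Because `.*` is greedy, the capture group runs up to the last underscore, so the `_info_by_sample` name seems to leave behind a stray `..._info_by` entry. The second pattern is an alternative that is not from the thread: it keeps everything up to the second underscore, on the assumption that the base name contains exactly one underscore (the one after the study prefix).

```r
# Hand-typed stand-in for list.files(pattern = "study1_chr") (assumption)
x <- c("study1_chr1.one.phased.impute2.chunk1",
       "study1_chr1.one.phased.impute2.chunk1_info",
       "study1_chr1.one.phased.impute2.chunk1_info_by_sample",
       "study1_chr1.one.phased.impute2.chunk1_summary",
       "study1_chr1.one.phased.impute2.chunk1_warnings")

# Suggested pattern: drops only what follows the LAST underscore
greedy <- unique(gsub("(_.*)_.*", "\\1", x))
print(greedy)

# Alternative (not from the thread): keep the text up to the second
# underscore; names with a single underscore are left unchanged
ufls <- unique(sub("^([^_]*_[^_]*)_.*$", "\\1", x))
print(ufls)
# [1] "study1_chr1.one.phased.impute2.chunk1"
```

With `ufls` built this way, the rest of the loop should work unchanged: paste(i, "info", sep="_") gives the matching `_info` file, and the output name becomes "study1_chr1.chunk1".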