[Bioc-devel] R: Re: Re: Deseq2 and differentia expression

2 messages · jarod_v6 at libero.it, Michael Love

Original

jarod_v6 at libero.it

Fri, Jul 11, 2014 7:55 AM #

Dear All I don't understand. I want to extract only genes have a fold-change 
more than 0.5
I use this comand but I have all the genes inside:
 resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs") #greater
dim(resGA)

how can extract the names of genes are in resGA2?
thanks for the patience!
j.

wrote:

need

#greater

FDR

absolute

NA

),]

FALSE)

  subscript contains NAs

This is not the recommended way to filter on large log fold changes.
We have implemented a test specifically for this, check the vignette
section on "Tests of log2 fold change above or below a threshold"

Mike

How can I resolve this problenms?
thanks in advance for the help



R version 3.1.0 (2014-04-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] annotate_1.40.1         RColorBrewer_1.0-5      gplots_2.14.1
 [4] org.Hs.eg.db_2.10.1     ReportingTools_2.4.0    AnnotationDbi_1.24.0
 [7] RSQLite_0.11.4          DBI_0.2-7               knitr_1.6
[10] biomaRt_2.18.0          DESeq2_1.4.5            RcppArmadillo_0.

4.320.0

[13] Rcpp_0.11.2             GenomicRanges_1.14.4    XVector_0.2.0
[16] IRanges_1.20.7          affy_1.40.0             NOISeq_2.6.0
[19] Biobase_2.22.0          BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] affyio_1.30.0            AnnotationForge_1.4.4    BiocInstaller_1.
12.1
 [4] Biostrings_2.30.1        biovizBase_1.10.8        bitops_1.0-
6
 [7] BSgenome_1.30.0          Category_2.28.0          caTools_1.
17
[10] cluster_1.15.2           colorspace_1.2-4         dichromat_2.0-
0
[13] digest_0.6.4             edgeR_3.4.2              evaluate_0.
5.5
[16] formatR_0.10             Formula_1.1-1            gdata_2.
13.3
[19] genefilter_1.44.0        geneplotter_1.40.0       GenomicFeatures_1.
14.5
[22] ggbio_1.10.16            ggplot2_1.0.0            GO.db_2.
10.1
[25] GOstats_2.28.0           graph_1.40.1             grid_3.
1.0
[28] gridExtra_0.9.1          GSEABase_1.24.0          gtable_0.
1.2
[31] gtools_3.4.1             Hmisc_3.14-4             hwriter_1.
3
[34] KernSmooth_2.23-12       lattice_0.20-29          latticeExtra_0.6-
26
[37] limma_3.18.13            locfit_1.5-9.1           MASS_7.3-
33
[40] Matrix_1.1-4             munsell_0.4.2            PFAM.db_2.
10.1
[43] plyr_1.8.1               preprocessCore_1.24.0    proto_0.3-
10
[46] RBGL_1.38.0              RCurl_1.95-4.1           reshape2_1.
4
[49] R.methodsS3_1.6.1        R.oo_1.18.0              Rsamtools_1.
14.3
[52] rtracklayer_1.22.7       R.utils_1.32.4           scales_0.
2.4
[55] stats4_3.1.0             stringr_0.6.2            survival_2.37-
7
[58] tools_3.1.0              VariantAnnotation_1.8.13 XML_3.98-
1.1
[61] xtable_1.7-3             zlibbioc_1.8.0

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Michael Love

Fri, Jul 11, 2014 8:07 AM #

hi Jarod,

Please take a look at the beginner vignette for DESeq2. We explain a
lot of the questions you are asking, including how to subset the
object based on adjusted p-value. Please take a look a my previous
email as well, I demonstrated how to subset the results object. As you
can see in the vignette R output, the rownames of the object returned
by results inherits the rownames of dds. So this gives the gene names:

rownames(res)

Mike

On Fri, Jul 11, 2014 at 10:55 AM, jarod_v6 at libero.it <jarod_v6 at libero.it> wrote:

Dear All I don't understand. I want to extract only genes have a fold-change
more than 0.5
I use this comand but I have all the genes inside:
 resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs") #greater
dim(resGA)

[1] 62893     6

dim(resGA2)

[1] 62893     6

dim(resGA3)

[1] 62893     6

how can extract the names of genes are in resGA2?
thanks for the patience!
j.

----Messaggio originale----
Da: michaelisaiahlove at gmail.com
Data: 11/07/2014 15.15
A: "jarod_v6 at libero.it"<jarod_v6 at libero.it>
Cc: "bioc-devel at r-project.org"<bioc-devel at r-project.org>
Ogg: Re: Re: [Bioc-devel] Deseq2 and differentia expression

hi Jarod,

This is more of a main Bioc mailing list question, so you can address
future questions there.

On Fri, Jul 11, 2014 at 6:05 AM, jarod_v6 at libero.it <jarod_v6 at libero.it>

wrote:

Dear Dr,
Thanks so much for clarification!!!
So I try the test of log fold change but I'm bit confusion on the results:
If I interested in the genes that have a foldchange more than 0.5 and 2 I

need

to use this comand is it right?

the second and third results() commands below give you this.

ddsNoPrior <- DESeq(ddHTSeq, betaPrior=FALSE) #only for lessABs

resGA <- results(ddsNoPrior, lfcThreshold=.5, altHypothesis="lessAbs")
#greater tdi
resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")

#greater

tdi
resGA3 <- results(dds, lfcThreshold=2, altHypothesis="greaterAbs") #greater
tdi

dim(resGA)
[1] 62893     6

dim(resGA2)

[1] 62893     6

dim(resGA3)

[1] 62893     6

The number of gene select it is always the same.. Where is my mistake!
thanks in advance!

DESeq2 returns the results for all the genes in the same order as the
original object. You need to specify a threshold on adjusted p-value.

table(res$padj < 0.1)

You can use subset(res, padj < 0.1) to filter the DataFrame.

----Messaggio originale----
Da: michaelisaiahlove at gmail.com
Data: 10/07/2014 14.46
A: "jarod_v6 at libero.it"<jarod_v6 at libero.it>
Cc: "bioc-devel at r-project.org"<bioc-devel at r-project.org>
Ogg: Re: [Bioc-devel] Deseq2 and differentia expression

hi Jarod,

On Thu, Jul 10, 2014 at 7:59 AM, jarod_v6 at libero.it <jarod_v6 at libero.it>

wrote:

Hi there!!!

I have did this code:
SampleTable <-data.frame(SampleName=metadata$ID_CLINICO,

fileName=metadata$NOME,

condition=metadata$CONDITION,prim=metadata$CDT)
ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="
Count/", design= ~condition) # effetto dello mutazione
ddHTSeq$condition <- relevel(ddHTSeq$condition, "NVI")# quindi verso non
viscerali
dds <- DESeq(ddHTSeq)
res <-results(dds)

resOrdered <- res[order(res$padj),]
head(resOrdered)
ResSig <- res[ which(res$padj < 0.1 ), ]


I want to select some data. How can I do? which is the good cut-off on

FDR

values?

The code above does the selection on adjusted p-value. The right FDR
cutoff is up to you, what percent of false discoveries is tolerable in
the final list of genes? The considerations are: the cost of
validation or following up on a false discovery, versus the cost of a
missed discovery. These are hard to quantify even if you know all the
details of an experiment.

All the data have a FDR less thank 0.1 . :
Is it right this comand?
res[ which(res$padj < 0.1 ), ]

yes. The which() is necessary because some of the res$padj have NA. If
you have a logical vector with NA, you cannot directly index a
DataFrame, but you can index after calling which(), which will return
the numeric index of the TRUE's. You could also subset with:
subset(res, padj < 0.1).

The reason for the NAs is explained in the vignette: "Note that some
values in the results table can be set to NA, for either one of the
following reasons:..."

How many significant genes are with FDR less than 0.1 and  have an

absolute

value of  foldchange  more of 1 ? I  have and error on this. I have many

NA

values.

If I try this code I have the follow errors

significant.genes = res[(res$padj < .05 & abs(res$log2FoldChange) >= 1

),]

Set thethreshold for the log2 fold change.
Error in normalizeSingleBracketSubscript(i, x, byrow = TRUE, exact =

FALSE)

  subscript contains NAs

This is not the recommended way to filter on large log fold changes.
We have implemented a test specifically for this, check the vignette
section on "Tests of log2 fold change above or below a threshold"

Mike

How can I resolve this problenms?
thanks in advance for the help



R version 3.1.0 (2014-04-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] annotate_1.40.1         RColorBrewer_1.0-5      gplots_2.14.1
 [4] org.Hs.eg.db_2.10.1     ReportingTools_2.4.0    AnnotationDbi_1.24.0
 [7] RSQLite_0.11.4          DBI_0.2-7               knitr_1.6
[10] biomaRt_2.18.0          DESeq2_1.4.5            RcppArmadillo_0.

4.320.0

[13] Rcpp_0.11.2             GenomicRanges_1.14.4    XVector_0.2.0
[16] IRanges_1.20.7          affy_1.40.0             NOISeq_2.6.0
[19] Biobase_2.22.0          BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] affyio_1.30.0            AnnotationForge_1.4.4    BiocInstaller_1.
12.1
 [4] Biostrings_2.30.1        biovizBase_1.10.8        bitops_1.0-
6
 [7] BSgenome_1.30.0          Category_2.28.0          caTools_1.
17
[10] cluster_1.15.2           colorspace_1.2-4         dichromat_2.0-
0
[13] digest_0.6.4             edgeR_3.4.2              evaluate_0.
5.5
[16] formatR_0.10             Formula_1.1-1            gdata_2.
13.3
[19] genefilter_1.44.0        geneplotter_1.40.0       GenomicFeatures_1.
14.5
[22] ggbio_1.10.16            ggplot2_1.0.0            GO.db_2.
10.1
[25] GOstats_2.28.0           graph_1.40.1             grid_3.
1.0
[28] gridExtra_0.9.1          GSEABase_1.24.0          gtable_0.
1.2
[31] gtools_3.4.1             Hmisc_3.14-4             hwriter_1.
3
[34] KernSmooth_2.23-12       lattice_0.20-29          latticeExtra_0.6-
26
[37] limma_3.18.13            locfit_1.5-9.1           MASS_7.3-
33
[40] Matrix_1.1-4             munsell_0.4.2            PFAM.db_2.
10.1
[43] plyr_1.8.1               preprocessCore_1.24.0    proto_0.3-
10
[46] RBGL_1.38.0              RCurl_1.95-4.1           reshape2_1.
4
[49] R.methodsS3_1.6.1        R.oo_1.18.0              Rsamtools_1.
14.3
[52] rtracklayer_1.22.7       R.utils_1.32.4           scales_0.
2.4
[55] stats4_3.1.0             stringr_0.6.2            survival_2.37-
7
[58] tools_3.1.0              VariantAnnotation_1.8.13 XML_3.98-
1.1
[61] xtable_1.7-3             zlibbioc_1.8.0

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel