Skip to content
Prev 8240 / 21312 Next

[Bioc-devel] riboSeqR error - matrix of zeros output from riboSeqR::readingFrame()

Hi Tom,

1) Duplicate names in fasta file

You were right! Fixed.. Thanks.


2) You're right ! The seqlevels in the fasta file have an additional 
column with the gene description (as shown below)

 > head(seqlevels(fastaCDS))
[1] "gi|584277002|ref|NM_001289933.1| Homo sapiens ZO-2 associated 
speckle protein (ZASP), mRNA"
[2] "gi|514682097|ref|NM_001205273.2| Homo sapiens testis highly 
expressed protein 5 (THEG5), transcript variant 1, mRNA"
[3] "gi|321400129|ref|NM_001129895.2| Homo sapiens uncharacterized 
LOC100128124 (HGC6.3), mRNA"
[4] "gi|409104115|ref|NM_001271560.1| Homo sapiens chromosome 16 open 
reading frame 72-like (LOC389895), mRNA"
[5] "gi|514961963|ref|NM_001278587.1| Homo sapiens uncharacterized 
LOC100134391 (LOC100134391), mRNA"
[6] "gi|514961949|ref|NM_001278577.1| Homo sapiens testis highly 
expressed protein 5 (THEG5), transcript variant 4, mRNA"

 > head( seqlevels(riboDat at riboGR[[1]]))
[1] "gi|161333872|ref|NR_004421.1|" "gi|335057560|ref|NR_002756.2|"
[3] "gi|261862236|ref|NM_001166278.1|" "gi|664805956|ref|NM_001300840.1|"
[5] "gi|209969817|ref|NM_001540.3|" "gi|161333875|ref|NR_004427.1|"

The flat files produced after the alignment step

bowtie ~/tools/human_rrna_hg38/hg38.rrna --un nonrRNA_sample1.fq -q 
sample1_trimmed.fq -p 8  > rRNA_sample1
bowtie ~/tools/full_transcriptome_index/hg19_rna_index --un 
unaligned_sample1.fq -q nonrRNA_sample1.fq --al aligned_sample1.fq 
--suppress 1,6,7,8  -p 8 > sample1


look like this -

+       gi|299523218|ref|NM_001190706.1|        382 
AAACCTGCATTAAAAATTTCGGTTGGG
+       gi|767974393|ref|XM_006719433.2|        1129 
CTAACATTCCTCAAGGGATGGTGACGGA
+       gi|214829672|ref|NM_014570.4|   1429 CGCAAGCCAGATTATGAGCCAGTTGAAAA
+       gi|767948282|ref|XM_011516409.1|        374 
GCCGAAATAGGACTCATTTTAATCTTCTG
+       gi|767982461|ref|XR_943839.1|   1890 
GGTGACCTCCCGGGAGCGGGGGACCACCAGGT
-       gi|374429547|ref|NR_046235.1|   13038 GCGCGTCCGGCGCCGTCCGTCCTTCCGTTC

and this is what is read in via riboSeqR::riboDat() - why do you think I 
am loosing that column ?


How do you propose I proceed ? Remove the gene description (eg: Homo 
sapiens ZO-2 associated speckle protein (ZASP), mRNA) ie do a 
renameSeqlevels on fastaCDS?

I tried the following - and I get -

old_seqlevels <- seqlevels(fastaCDS)
new_seqlevels <- gsub(" Homo sapiens.*$","", old_seqlevels)
new_seqlevels <- gsub(" PREDICTED:.*$","",new_seqlevels)
fastaCDS_renamed <- renameSeqlevels(fastaCDS, new_seqlevels)

 > head(seqlevels(fastaCDS_renamed))
[1] "gi|584277002|ref|NM_001289933.1|" "gi|514682097|ref|NM_001205273.2|"
[3] "gi|321400129|ref|NM_001129895.2|" "gi|409104115|ref|NM_001271560.1|"
[5] "gi|514961963|ref|NM_001278587.1|" "gi|514961949|ref|NM_001278577.1|"


 > fCs <- frameCounting(riboDat, fastaCDS_renamed)
Calling frames................done!

 > fS
              26     27     28      29      30
          131836 166076 833536 2911540 2205274
           73425 122531 327252  503215  596236
           89614 153513 414387 1341278 1482810
frame.ML      0      0      0       0       0

This looks great to me!


Thanks for the great tips!
Sonali.
On 10/27/2015 8:56 AM, Tom Hardcastle wrote: