hi,
importing an OBO file with GSEABase::getOBOCollection() I have observed
missing children in the imported ontology. Here is an example with the
Sequence Ontology:
library(GSEABase)
oboSOXP <-
getOBOCollection("http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo")
Warning message:
In readLines(src) :
incomplete final line found on
'http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo'
gSOXP <- as(oboSOXP, "graphNEL")
edges(gSOXP)[["SO:0001622"]]
[1] "SO:0001968"
so the term SO:0001622 in principle has only one child term SO:0001968.
However, a free text search for this entry in the OBO file shows the
following:
[Term]
id: SO:0001622
name: UTR_variant
def: "A transcript variant that is located within the UTR." [SO:ke]
synonym: "UTR variant" EXACT []
synonym: "UTR_" EXACT ebi_variants [http://ensembl.org/info/docs/variation/index.html]
is_a: SO:0001791 ! exon_variant
is_a: SO:0001968 ! coding_transcript_variant
created_by: kareneilbeck
creation_date: 2010-03-23T11:22:58Z
that is, it has two children, not just one. The child SO:0001791 is
missing. Actually, looking at the distribution of the number of children
per term, every term has at most one child:
nchild <- sapply(edges(gSOXP), length)
table(nchild)
nchild
   0    1
 206 2072
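As a cross-check that does not depend on the graphNEL coercion, the is_a lines can be read straight from the stanza text. The following is only a minimal base-R sketch: the variable `stanza` (an abridged copy of the excerpt above) and the helper `is_a_targets()` are hypothetical names made up for illustration and are not part of GSEABase.

```r
# Abridged copy of the [Term] stanza shown above (hypothetical variable name).
stanza <- c(
  "[Term]",
  "id: SO:0001622",
  "name: UTR_variant",
  "is_a: SO:0001791 ! exon_variant",
  "is_a: SO:0001968 ! coding_transcript_variant"
)

# Hypothetical helper: return the term IDs referenced by the is_a lines.
is_a_targets <- function(lines) {
  hits <- grep("^is_a:", lines, value = TRUE)
  # Drop the leading "is_a:" tag, then the trailing "! comment" part.
  ids <- sub("^is_a:[[:space:]]*", "", hits)
  trimws(sub("[[:space:]]*!.*$", "", ids))
}

targets <- is_a_targets(stanza)
print(targets)
```

For this stanza the helper returns both SO:0001791 and SO:0001968, so one would expect edges(gSOXP)[["SO:0001622"]] to list the same two IDs if the coercion kept every is_a relation.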
I have not found in the manual page of getOBOCollection() that this
function cannot import more than one child per term, so I guess this is
either a bug or an oversight issue.
cheers,
robert.
[Bioc-devel] GSEABase::getOBOCollection() missing children
3 messages · Robert Castelo, Martin Morgan
On 06/05/2015 08:51 AM, Robert Castelo wrote:
hi, importing an OBO file with GSEABase::getOBOCollection() I have observed missing children in the imported ontology. Here is an example with the Sequence Ontology:
Thanks Robert, the import went ok, but the coercion to graphNEL was flawed. This is fixed in 1.31.2 in devel, and will be ported to release / available via biocLite tomorrow afternoon (all being well...)
Martin
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
2 days later
hi Martin, thanks for the quick fix!!! best regards, robert.
On 06/05/2015 08:05 PM, Martin Morgan wrote:
On 06/05/2015 08:51 AM, Robert Castelo wrote:
hi, importing an OBO file with GSEABase::getOBOCollection() I have observed missing children in the imported ontology. Here is an example with the Sequence Ontology:
Thanks Robert, the import went ok, but the coercion to graphNEL was flawed. This is fixed in 1.31.2 in devel, and will be ported to release / available via biocLite tomorrow afternoon (all being well...) Martin
Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 fax: +34.933.160.550