Unable to extract gene list from chromosome

5 messages · David Winsemius, pooja sinha

#
Hi All,

I am trying to extract a gene list from chromosome numbers and positions
using biomaRt in R, but I am getting the error messages shown below. The
code I am using for the extraction is also below.

library("biomaRt")
listMarts()
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)
ensembl = useDataset("rnorvegicus_gene_ensembl",mart=ensembl)
AT_AC_Gene <- read.csv("AT-AC-methylkit_biomart-4-7-21.csv",header=T)
attributes <-
c("external_gene_name","ensembl_gene_id","start_position","end_position","rgd_symbol","chromosome_name")
filters <- c("chromosome_name","start","end")
values <- list(AT_AC_Gene$chr,AT_AC_Gene$start,AT_AC_Gene$end)
final_1 <- getBM(attributes=attributes, filters=filters, values=values,
mart=ensembl)

The code runs without any error, but the final_1 output has 0
observations of 6 variables. Why?

Can anyone help me with this?


Thanks,

Puja
#
On 4/8/21 2:30 PM, pooja sinha wrote:
#--- at this point I get

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'AT-AC-methylkit_biomart-4-7-21.csv': No such file
or directory

You are more likely to get a useful response on the BioC mailing list.
It appears you have a dependency on a csv file that you have not told
us about.
#
Hi David,

Sorry I forgot to attach the file. Now it's attached.


Thanks,
Puja

On Thu, Apr 8, 2021 at 6:01 PM David Winsemius <dwinsemius at comcast.net>
wrote:
#
On 4/8/21 3:42 PM, pooja sinha wrote:
Now when I go back and check the values of the setup variables after 
seeing an error on the last call,

Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery =
fullXmlQuery,  :
  Query ERROR: caught BioMart::Exception::Database: Error during query
execution: You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'AND (main.seq_region_end_1020 >= '15108600' OR
main.seq_region_end_1020 >= '9115' at line 1

I now notice:


AT_AC_Gene$chr

#NULL
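
For what it's worth, `$` on a data frame returns NULL silently when no column of that name exists, so nothing complains until the query is built. A minimal sketch of a check that could run before getBM(), on a made-up data frame (the values are invented; only the column name Chromosome_number comes from the file, and `needed`/`absent` are names chosen here for illustration):

```r
# Toy stand-in for the CSV; the values are invented for illustration.
AT_AC_Gene <- data.frame(
  Chromosome_number = c("1", "2"),
  start = c(100L, 200L),
  end   = c(150L, 250L)
)

# `$` gives NULL for a column that does not exist, with no error or warning.
is.null(AT_AC_Gene$chr)

# Check for the columns the query depends on up front instead.
needed <- c("Chromosome_number", "start", "end")
absent <- setdiff(needed, names(AT_AC_Gene))
if (length(absent) > 0) {
  stop("missing columns: ", paste(absent, collapse = ", "))
}
```

Using AT_AC_Gene[["chr"]] would also return NULL, so an explicit check against names() is the safer guard.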

Changing that to AT_AC_Gene$Chromosome_number gets at least a startup 
message from the server:

Batch submitting query
[==>-------------------------------------------------------------------]
5% eta:  1m

Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery =
fullXmlQuery,  :
  Query ERROR: caught BioMart::Exception::Database: Error during query
execution: You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'AND (main.seq_region_end_1020 >= '15108600' OR
main.seq_region_end_1020 >= '9115' at line 1

But then I get the same SQL syntax error as before.


Then I ran it with only complete cases and now get no error but again 
see no hits:

str(final_1)
'data.frame':   0 obs. of  6 variables:
 $ external_gene_name: logi
 $ ensembl_gene_id   : logi
 $ start_position    : logi
 $ end_position      : logi
 $ rgd_symbol        : logi
 $ chromosome_name   : logi


I also see a lot of NA's in that dataset and when I just send the first 
10 rows of the request, I get no error (but also no matches.)
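
A sketch of what restricting the request to complete cases, and then to the first 10 rows, might look like. The data frame below is invented, the column names Chromosome_number, start and end are assumed to match the file, and AT_AC_complete / AT_AC_small are names picked for this example:

```r
# Invented rows standing in for the CSV, with some NA's as in the real file.
AT_AC_Gene <- data.frame(
  Chromosome_number = c("1", NA, "3", "4"),
  start = c(100L, 200L, NA, 400L),
  end   = c(150L, 250L, 350L, 450L)
)

# Keep only rows where chromosome, start and end are all present.
cols <- c("Chromosome_number", "start", "end")
ok   <- complete.cases(AT_AC_Gene[, cols])
AT_AC_complete <- AT_AC_Gene[ok, ]

# A small subset is handy for a quick trial query.
AT_AC_small <- head(AT_AC_complete, 10)
nrow(AT_AC_complete)
```

The filtered columns would then go into the values list in place of the unfiltered ones.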


So you clearly are not giving us all the data or all the code, but I'm
finally wondering if you just don't have any data that matches the
external datasets in your chosen "biomart". Can you offer a smaller
dataset that you know with certainty should produce a match?


Alternatively, you might want to post this instead at the BioConductor 
mailing list. They are the people who have a better chance of spotting 
obvious errors. I've found two likely code-related errors but I'm not a 
computational biostatistician.

David

  
  
#
Hi David,

That's the only file I have for analysis, and I am also getting final_1
as 0 obs. of 6 variables. My problem is that I am not getting any output.
It seems like I am missing something in the values code but I don't know
what. For reference, I googled and some people have suggested passing the
values as vectors, which I do not understand. Also, when I pick one row of
the start column and run it interactively it gives a result, but it's not
possible to go one by one because of the large number of rows. I posted
my problem on Biostars but am still waiting for someone to reply.
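
On the "values as vectors" point, one possible reading is that the three filters are not matched up row by row: the SQL fragment in the error message shows the end values joined with OR, which suggests each filter receives its whole vector. A commonly suggested workaround is to build one "chr:start:end" string per row and pass those to a single region filter. The sketch below assumes the rat dataset exposes a chromosomal_region filter (listFilters(ensembl) would confirm), that the columns are Chromosome_number, start and end, and that Ensembl wants chromosome names without a "chr" prefix. make_regions() is a hypothetical helper, not part of biomaRt, and the toy values are invented.

```r
# make_regions() is a hypothetical helper, not a biomaRt function.
# It builds one "chr:start:end" string per complete row.
make_regions <- function(df, chr = "Chromosome_number",
                         start = "start", end = "end") {
  ok <- complete.cases(df[, c(chr, start, end)])
  df <- df[ok, ]
  # Assumption: Ensembl names chromosomes "1", "2", "X"; drop a leading "chr".
  chrom <- sub("^chr", "", as.character(df[[chr]]))
  # format() avoids scientific notation such as "1e+05" in the positions.
  paste(chrom,
        format(df[[start]], scientific = FALSE, trim = TRUE),
        format(df[[end]],   scientific = FALSE, trim = TRUE),
        sep = ":")
}

# Invented rows for illustration only.
toy <- data.frame(
  Chromosome_number = c("chr1", "2", NA),
  start = c(100000, 9115, 500),
  end   = c(200000, 15108600, 900)
)
regions <- make_regions(toy)
regions

# The query itself needs network access and the biomaRt package, so it is
# left commented out here:
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "rnorvegicus_gene_ensembl")
# final_1 <- getBM(attributes = attributes, filters = "chromosomal_region",
#                  values = regions, mart = ensembl)
```

With many rows it may also help to send the regions in batches rather than all at once.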

Thanks,
Puja

On Thu, Apr 8, 2021 at 7:28 PM David Winsemius <dwinsemius at comcast.net>
wrote: