
for loop and if problem

9 messages · Sake, Richard Cotton, Philipp Pagel +4 more

#
Hi,

I'm having difficulties with a dataset containing gene names and the
positions of those genes.
That would not be such a big problem, but each gene has multiple exons, so
it's hard to say where the gene starts and where it ends. I want the starting
and ending position of each gene in my dataset.
Attached is the dataset:
http://www.nabble.com/file/p21312449/genlistchrompos.csv
Column 'B' is the gene name, 'G' is the starting position and 'H' is the
stop position.
You can load the dataset by using:
data<-read.csv("genlistchrompos.csv", sep=";")
I hope someone can help me; it has been giving me headaches for a week now :-((.

Thanks!
#
which(diff(as.numeric(data$Gene))!=0)

will give you a vector of the last row in each gene (except for the final
gene, whose last row is simply the last row of the data).  The start row of
each gene is obviously the row after the previous gene's last row.

Also take a look at 

split(data, data$Gene)
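
Both suggestions can be sketched on a small made-up data frame. Everything in `toy` is invented for illustration; only the column names Gene, Exon_Start.Chr. and Exon_Stop.Chr. come from the real file. The sketch assumes that the rows of each gene are contiguous and, for the row-index approach, that exons are sorted by position within each gene. Note that as.numeric() on the Gene column only yields group codes when that column is a factor, so the sketch converts it explicitly.

```r
# Toy data, invented for illustration; rows of each gene are contiguous.
toy <- data.frame(
  Gene            = c("A", "A", "A", "B", "B", "C"),
  Exon_Start.Chr. = c(100, 300, 500, 50, 90, 700),
  Exon_Stop.Chr.  = c(200, 400, 600, 80, 120, 900),
  stringsAsFactors = FALSE
)

# Integer code per gene; factor() makes this work for character columns too.
gene_code <- as.numeric(factor(toy$Gene))

# Rows after which the gene changes, plus the final row of the data.
last_row  <- c(which(diff(gene_code) != 0), nrow(toy))
# Each gene starts on the row after the previous gene's last row.
first_row <- c(1, head(last_row, -1) + 1)

# Valid only if exons are sorted by position within each gene.
gene_span <- data.frame(
  Gene  = toy$Gene[first_row],
  Start = toy$Exon_Start.Chr.[first_row],
  Stop  = toy$Exon_Stop.Chr.[last_row]
)

# split() gives one data frame per gene; range() needs no sorting.
by_gene <- split(toy, toy$Gene)
spans   <- sapply(by_gene, function(g) range(c(g$Exon_Start.Chr., g$Exon_Stop.Chr.)))
```

The split() variant is the more forgiving of the two, since taking the range per gene does not depend on row order.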

Regards,
Richie.

Mathematical Sciences Unit
HSL


#
On Tue, Jan 06, 2009 at 07:21:48AM -0800, Sake wrote:
I don't really see how 'if' and 'for loops' are involved in the
question. You may want to give us a little more detail on what
exactly you need and what you tried unsuccessfully.  (By the way
-- there are no columns labeled 'B', 'G' or 'H' in the file).

Anyway - I believe this is what you are after:

# get minimum start position by gene
aggregate(dat[, c('Exon_Start.Chr.')], by=list(dat$Gene), min)
# get maximum stop position by gene
aggregate(dat[, c('Exon_Stop.Chr.')], by=list(dat$Gene), max)

Of course, these will only reflect the real start and stop
coordinates of the gene if ALL exons are given in the file.
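
A minimal sketch of the two aggregate() calls combined into one table per gene. The data frame below is invented for illustration (with exons deliberately out of order); `dat` follows the name used in the calls above, and `gene_span` is a hypothetical name.

```r
# Toy data, invented for illustration; exons deliberately out of order.
dat <- data.frame(
  Gene            = c("A", "B", "A", "B", "A"),
  Exon_Start.Chr. = c(300, 90, 100, 50, 500),
  Exon_Stop.Chr.  = c(400, 120, 200, 80, 600)
)

# Minimum start and maximum stop per gene, as in the two calls above.
starts <- aggregate(dat["Exon_Start.Chr."], by = list(Gene = dat$Gene), FUN = min)
stops  <- aggregate(dat["Exon_Stop.Chr."],  by = list(Gene = dat$Gene), FUN = max)

# Combine into one row per gene.
gene_span <- merge(starts, stops, by = "Gene")
print(gene_span)
```

Because min and max are taken over all rows of a gene, this does not require the exons to be sorted.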

cu
	Philipp
#
On Tue, 6 Jan 2009, Sake wrote:

            
This looks like a minor variant on the 'first and last observation' thread 
from a few days ago:

 	http://thread.gmane.org/gmane.comp.lang.r.general/135411

to which several useful solutions were posted.

I suggest you read that thread and try to adapt what is there to your 
situation.

If this does not get you all the way there, when you post back it will 
help to "provide commented, minimal, self-contained, reproducible code".

What you have given us is not quite there.

Here is a start:

  data <- read.csv("http://www.nabble.com/file/p21312449/genlistchrompos.csv", sep=';')

and note that the list of column names

 [1] "Query"             "Gene"              "Chrom"             "Strand"            "Accession"
 [6] "Exon"              "Exon_Start.Chr."   "Exon_Stop.Chr."    "Exon_Start.Trans." "Exon_Stop.Trans."

does not include anything like "Column 'B'", so refer to those column 
names if you need further help after studying the thread above.

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
5 days later
#
Try using:

write.table(..., sep=";")

write.csv just calls write.table
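
As a rough sketch of that suggestion (the data frame and the temporary file are invented for illustration): write.csv ignores an attempt to set sep and issues a warning, so the separator has to be passed to write.table directly.

```r
# Toy data, invented for illustration.
genes <- data.frame(Gene = c("A", "B"), Start = c(100.5, 50), Stop = c(600, 120))

out <- tempfile(fileext = ".csv")  # temporary path, so nothing is overwritten

# Semicolon-separated, '.' as decimal mark, no row-name column.
write.table(genes, file = out, sep = ";", row.names = FALSE)

# Reads back with the same kind of call used to load the original file.
back <- read.csv(out, sep = ";")
```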
On Mon, Jan 12, 2009 at 6:38 AM, Sake <tlep.nav.ekas at hccnet.nl> wrote:

  
    
#
write.csv does exactly what you would expect ... creates a *Comma*  
Separated Values file. If you don't want a comma separated value  
format then use write.table with sep=";"

You can still name it "whatever.csv".

Or, if you also intend commas for decimal points, use write.csv2
as described in the help page:

"write.csv2 uses a comma for the decimal point and a semicolon for the  
separator, the Excel convention for CSV files in some Western European  
locales."
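
To see what that convention looks like in practice, here is a small sketch with invented data and a temporary file; read.csv2 is the matching reader for the same convention.

```r
# Toy data, invented for illustration; 100.5 makes the decimal mark visible.
genes <- data.frame(Gene = c("A", "B"), Start = c(100.5, 50))

out <- tempfile(fileext = ".csv")  # temporary path, so nothing is overwritten

# Semicolon as separator, comma as decimal mark.
write.csv2(genes, file = out, row.names = FALSE)

# Inspect the raw text of the file.
lines <- readLines(out)
print(lines)

# read.csv2 assumes the same separator and decimal mark.
back <- read.csv2(out)
```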
#
There is write.csv2 on the same help page as write.csv!

      'write.csv' uses '"."' for the decimal point and a comma for the
      separator.

      'write.csv2' uses a comma for the decimal point and a semicolon
      for the separator, the Excel convention for CSV files in some
      Western European locales.

I've never seen a CSV file with ; that used . for the decimal point,
so check whether write.csv2 is what you really want.
On Mon, 12 Jan 2009, jim holtman wrote:

            

  
    
#
Thanks! Why did I not think of that myself? .csv means 'Comma Separated
Values'.
David Winsemius wrote: