newbie: new_data_frame <- selected set of rows - R-help

Sat, Dec 2, 2006 5:16 AM

Hi!

Just compute the distances here, WITHOUT ordering them. And then

top = scaled_DB[rank(distances)<=5, ]

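rank() is better for this than order() in case there are ties: tied
rows get the same rank, so they are either all kept or all dropped
instead of being cut off arbitrarily.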
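
To illustrate the point about ties, here is a small sketch with made-up
distances (the names and values are invented for the example). Note that
what happens to rows tied at the cut-off depends on the ties.method
argument of rank():

```r
# Toy distances with a tie (b and d) right at the cut-off; values are made up
distances <- c(a = 0.1, b = 0.5, c = 0.2, d = 0.5, e = 0.3, f = 0.4, g = 0.9)

# order(): always exactly 5 names; which of the tied rows gets in
# depends only on its position in the vector
by_order <- names(distances)[order(distances)[1:5]]

# rank() with the default ties.method = "average": b and d both get
# rank 5.5, so both are dropped together
by_rank <- names(distances)[rank(distances) <= 5]

# rank() with ties.method = "min": b and d both get rank 5, so both are kept
by_rank_min <- names(distances)[rank(distances, ties.method = "min") <= 5]
```

So with rank() you may get fewer or more than 5 rows when there is a tie
at the boundary, but tied rows are always treated alike.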

You mean by row name?

which(row.names(scaled_DB)=='query_string')

But why would you need the index? If you want to get the respective row,
index by row name directly:

my_dataframe['query_string', ]
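
A minimal sketch of both approaches on a made-up data frame (the column
names and values are only for illustration), showing that they return
the same row:

```r
# Small made-up data frame with row names
my_dataframe <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                           row.names = c("query_string", "other1", "other2"))

# Numeric position of the row, looked up by name
idx <- which(row.names(my_dataframe) == "query_string")
row_by_index <- my_dataframe[idx, ]

# Same row, selected by name without the lookup step
row_by_name <- my_dataframe["query_string", ]
```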

The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:

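my_dataframe = read.table( ... )
# scale() returns a matrix; keep it for the distance calculations
scaled <- scale(my_dataframe, center=FALSE)
# work on a data frame copy so that columns can be added with $
scaled_DB <- as.data.frame(scaled)
# distances are computed from 'scaled', so the added dist columns
# never enter a later distance calculation
scaled_DB$dist1 = distancevector(scaled, scaled['query1',], ...)
scaled_DB$dist2 = distancevector(scaled, scaled['query2',], ...)
scaled_DB$dist3 = distancevector(scaled, scaled['query3',], ...)
...
top1 = scaled_DB[rank(scaled_DB$dist1)<=5, ]
...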
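
For reference, a self-contained version of the procedure that runs in
base R. This is only a sketch: the data are random, and euclid_to() is a
hypothetical stand-in for distancevector() that computes plain Euclidean
distances, which may not be the metric you want.

```r
# Hypothetical stand-in for distancevector(): Euclidean distance from
# every row of the matrix X to the vector y
euclid_to <- function(X, y) sqrt(rowSums(sweep(X, 2, y)^2))

set.seed(1)  # made-up data: 10 rows, 3 columns, rows named query1..query10
my_dataframe <- as.data.frame(
  matrix(runif(30), nrow = 10,
         dimnames = list(paste0("query", 1:10), c("a", "b", "c"))))

scaled    <- scale(my_dataframe, center = FALSE)  # matrix, row names preserved
scaled_DB <- as.data.frame(scaled)                # data frame copy for extra columns

# Distances are computed from the matrix, stored in the data frame
scaled_DB$dist1 <- euclid_to(scaled, scaled["query1", ])
scaled_DB$dist2 <- euclid_to(scaled, scaled["query2", ])

# The 5 rows closest to query1 (query1 itself has distance 0 and is included)
top1 <- scaled_DB[rank(scaled_DB$dist1) <= 5, ]
```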

cu
	Philipp

Dr. Philipp Pagel                            Tel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics      Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS          Tel.  +49-89-3187 3675
GSF - National Research Center               Fax.  +49-89-3187 3585
      for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel