Message-ID: <20061202131606.GA15341@gsf.de>
Date: 2006-12-02T13:16:06Z
From: Philipp Pagel
Subject: newbie: new_data_frame <- selected set of rows
In-Reply-To: <928715.70626.qm@web90407.mail.mud.yahoo.com>
Hi!
> distances <- order(distancevector(scaled_DB, scaled_DB['query',],
> d="euclid"))
Just compute the distances here, WITHOUT ordering them:
distances <- distancevector(scaled_DB, scaled_DB['query',], d="euclid")
Then:
> 1) create a small top_five frame
top = scaled_DB[rank(distances)<=5, ]
rank() is better for this than order() in case there are ties.
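Ties need care, though. This is a minimal sketch that assumes a small hypothetical matrix `db` with a row named "query". It computes Euclidean distances in base R (`sqrt(rowSums(sweep(...)^2))`) instead of `distancevector()`, so it runs without extra packages. With the default `ties.method = "average"`, rows tied for fifth place share a fractional rank, and `rank(distances) <= 5` can select fewer than five rows:

```r
# Hypothetical toy data: 7 rows, 2 numeric columns, one row named "query"
db <- matrix(c(0, 0,
               1, 0,
               0, 2,
               3, 0,
               0, 4,
               4, 0,
               9, 9),
             ncol = 2, byrow = TRUE,
             dimnames = list(c("query", paste0("r", 1:6)), c("x", "y")))

# Euclidean distance of every row to the query row
# (base R stand-in for distancevector(..., d = "euclid"))
distances <- sqrt(rowSums(sweep(db, 2, db["query", ])^2))

rank(distances)                        # tied rows r4, r5 both get rank 5.5
rank(distances, ties.method = "min")   # tied rows r4, r5 both get rank 5

# Default ties: only 4 rows pass <= 5 (the query itself is included, at distance 0)
top <- db[rank(distances) <= 5, ]

# ties.method = "min" keeps every row tied for fifth place (6 rows here)
top_min <- db[rank(distances, ties.method = "min") <= 5, ]
```

Whether to keep all tied rows or accept fewer than five is a choice. `ties.method` makes that choice explicit.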
> 2) after I got top_five I would like to get the index
> of my query entry, something along Python's
> top_five.index('query_string')
You mean by row name?
which(row.names(scaled_DB)=='query_string')
But why do you need the index? If you just want the corresponding row,
index by row name directly:
my_dataframe['query_string', ]
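Both approaches are shown side by side in this minimal sketch, which uses a small hypothetical data frame. `match()` is an addition not mentioned above. It is the vectorised relative of `which(... == ...)` and returns `NA` for names that are absent:

```r
# Hypothetical data frame with row names
my_dataframe <- data.frame(a = c(1.5, 2.0, 0.3),
                           b = c(10, 20, 30),
                           row.names = c("row_1", "query_string", "row_3"))

# Position of a single row, looked up by name
idx <- which(row.names(my_dataframe) == "query_string")

# match() looks up several names at once (NA for names not present)
idx2 <- match(c("query_string", "row_3"), row.names(my_dataframe))

# Usually the position is not needed: index by row name directly
query_row <- my_dataframe["query_string", ]
```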
> 3) possibly combine values in distances with row names
> from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2
The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:
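library(hopach)   # provides distancevector()
my_dataframe = read.table( ... )
scaled_DB <- scale(my_dataframe, center=FALSE)   # scale() returns a matrix, not a data frame
result <- as.data.frame(scaled_DB)               # data frame to hold data + distances
# distances are computed on the matrix, so the new dist columns never enter later calculations
result$dist1 = distancevector(scaled_DB, scaled_DB['query1',], ...)
result$dist2 = distancevector(scaled_DB, scaled_DB['query2',], ...)
result$dist3 = distancevector(scaled_DB, scaled_DB['query3',], ...)
...
top1 = result[rank(result$dist1)<=5, ]
...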
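The same procedure can be tried end to end without hopach. This is a sketch under stated assumptions: a hypothetical toy data frame stands in for `read.table( ... )`, and a hypothetical helper `dist_to()` computes Euclidean distances in base R in place of `distancevector()`. The distances are taken from the scaled matrix and stored in a separate data frame, so one distance column never feeds into the next:

```r
# Hypothetical toy data standing in for read.table( ... )
my_dataframe <- data.frame(x = c(1, 2, 3, 10, 4, 6),
                           y = c(2, 1, 4, 8, 3, 5),
                           row.names = c("query1", "query2", "a", "b", "c", "d"))

# scale() returns a numeric matrix (row names are kept)
scaled <- scale(my_dataframe, center = FALSE)

# Hypothetical helper, a base R stand-in for distancevector(..., d = "euclid"):
# Euclidean distance of every row of m to the row named q
dist_to <- function(m, q) sqrt(rowSums(sweep(m, 2, m[q, ])^2))

# Data frame holding scaled data plus one distance column per query
result <- as.data.frame(scaled)
result$dist1 <- dist_to(scaled, "query1")
result$dist2 <- dist_to(scaled, "query2")

# Five rows closest to query1: names, scaled data and distances together
top1 <- result[rank(result$dist1) <= 5, ]
top1[order(top1$dist1), ]   # same rows, sorted by distance for display
```

Because row names travel with the data frame, `top1` already answers question 3. Each row name appears next to its distance from the query.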
cu
Philipp
--
Dr. Philipp Pagel Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany
and
Institute for Bioinformatics / MIPS Tel. +49-89-3187 3675
GSF - National Research Center Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel