Hi!
> distances <- order(distancevector(scaled_DB, scaled_DB['query',], d="euclid"))
Just compute the distances WITHOUT ordering here:

  distances <- distancevector(scaled_DB, scaled_DB['query',], d="euclid")

> And then
> 1) create a small top_five frame
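
  top <- scaled_DB[rank(distances)<=5, ]

rank() is better for this than order(). order() returns the indices that would sort the vector, not the rank of each element, so order(distances)<=5 does not pick out the five smallest distances. rank() also lets you decide how ties are handled through its ties.method argument.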
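To make the order()/rank() difference concrete, here is a minimal sketch on made-up data. It assumes the hopach package may not be installed, so a small hand-written helper, euclid_to() (hypothetical, not part of any package), stands in for distancevector(..., d="euclid"). The row names and data are invented for illustration.

```r
# Hypothetical toy data: 8 named rows, 5 numeric columns.
set.seed(1)
my_dataframe <- as.data.frame(matrix(runif(40), nrow = 8,
  dimnames = list(c("query", paste0("row_", 1:7)), paste0("v", 1:5))))
scaled_DB <- scale(my_dataframe, center = FALSE)  # returns a numeric matrix

# Hypothetical stand-in for distancevector(X, y, d = "euclid"):
# Euclidean distance from every row of X to the vector y.
euclid_to <- function(X, y) sqrt(rowSums(sweep(X, 2, y)^2))

# Distances WITHOUT ordering; names are the row names of scaled_DB.
distances <- euclid_to(scaled_DB, scaled_DB["query", ])

# rank() gives each row's position in the sorted order, so this keeps the
# five closest rows (the query itself has distance 0 and is included).
# ties.method = "min" keeps every row tied at the cutoff.
top <- scaled_DB[rank(distances, ties.method = "min") <= 5, ]

# order() vs rank() on a tiny named vector:
d <- c(a = 0.9, b = 0.1, c = 0.5)
order(d)  # indices that sort d: 2 3 1
rank(d)   # position of each element: a = 3, b = 1, c = 2
```

If you prefer order(), the equivalent selection is scaled_DB[head(order(distances), 5), ], which always returns exactly five rows and breaks ties by position.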
> 2) after I got top_five I would like to get the index
> of my query entry, something like Python's
> top_five.index('query_string')
You mean by row name?

  which(row.names(scaled_DB)=='query_string')

But why would you need the index? If you want the respective row, just index by row name:

  my_dataframe['query_string', ]
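A small sketch of both options on a made-up data frame (top_five and its row names are hypothetical). which() returns every matching position, while match() returns only the first, which is closer to what Python's list.index() does; unlike Python, neither raises an error when the name is missing (which() gives integer(0), match() gives NA).

```r
# Hypothetical small data frame with named rows.
top_five <- data.frame(x = c(1.0, 2.5, 0.3), y = c(4, 5, 6),
                       row.names = c("row_a", "query_string", "row_b"))

# Position of a row by name.
idx  <- which(row.names(top_five) == "query_string")  # all matches
idx2 <- match("query_string", row.names(top_five))    # first match only

# Usually the position is not needed: select the row by its name directly.
query_row <- top_five["query_string", ]
```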
> 3) possibly combine values in distances with row names from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2
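The easiest way to store the distances along with the original names and data is to make each distance vector a column of a data frame, which is what I would have done to begin with. Note that scale() returns a matrix, not a data frame, so $ assignment will not work on scaled_DB directly. Keep the scaled matrix for the distance calculations and put the distances into a separate data frame, so that earlier distance columns do not end up in later distance calculations. The entire procedure would then look like this:

  my_dataframe <- read.table( ... )
  scaled_DB <- scale(my_dataframe, center=FALSE)
  result <- as.data.frame(scaled_DB)
  result$dist1 <- distancevector(scaled_DB, scaled_DB['query1',], ...)
  result$dist2 <- distancevector(scaled_DB, scaled_DB['query2',], ...)
  result$dist3 <- distancevector(scaled_DB, scaled_DB['query3',], ...)
  ...
  top1 <- result[rank(result$dist1)<=5, ]
  ...

cu
Philipp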
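The whole procedure, including point 3 (row names paired with distances), can be sketched end to end on invented data. This is an illustration under assumptions: the toy data replaces read.table(...), and the hypothetical helper euclid_to() replaces distancevector(..., d="euclid") from the Bioconductor hopach package, so the sketch runs in base R.

```r
# Hypothetical toy data standing in for read.table( ... ).
set.seed(42)
my_dataframe <- as.data.frame(matrix(runif(30), nrow = 10,
  dimnames = list(c("query1", "query2", paste0("row_", 1:8)),
                  paste0("v", 1:3))))

scaled_DB <- scale(my_dataframe, center = FALSE)  # numeric matrix

# Hypothetical stand-in for distancevector(X, y, d = "euclid").
euclid_to <- function(X, y) sqrt(rowSums(sweep(X, 2, y)^2))

# Distances are computed on the scaled matrix and stored in a separate
# data frame, so dist1 never leaks into the calculation of dist2.
result <- as.data.frame(scaled_DB)
result$dist1 <- euclid_to(scaled_DB, scaled_DB["query1", ])
result$dist2 <- euclid_to(scaled_DB, scaled_DB["query2", ])

# Five closest rows to query1 (query1 itself included).
top1 <- result[rank(result$dist1, ties.method = "min") <= 5, ]

# Point 3: row names next to their distance from query1, closest first.
dist_table <- data.frame(row = rownames(result), distance = result$dist1)
dist_table <- dist_table[order(dist_table$distance), ]
```

Keeping the scaled matrix untouched is the design choice that matters here: any distance function applied to result itself would treat the dist columns as extra variables.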
Dr. Philipp Pagel Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany
and
Institute for Bioinformatics / MIPS Tel. +49-89-3187 3675
GSF - National Research Center Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel