Skip to content

newbie: new_data_frame <- selected set of rows

1 message · Philipp Pagel

#
Hi!
Just compute the distances WITHOUT ordering, here. And then
top = scaled_DB[rank(distances)<=5, ]

rank() is better for this than order() in case there are ties.
You mean by row name?

which(row.names(scaled_DB)=='query_string')

But why would you need the index? If you want to get the respective row
use logical indexing:

my_dataframe['query_string', ]
The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:

my_dataframe = read.table( ... )
scaled_DB <- scale(my_dataframe, center=FALSE)
scaled_DB$dist1 = distancevector(scaled_DB, scaled_DB['query1',], ...)
scaled_DB$dist2 = distancevector(scaled_DB, scaled_DB['query2',], ...)
scaled_DB$dist3 = distancevector(scaled_DB, scaled_DB['query3',], ...)
...
top1 = scaled_DB[rank(scaled_DB$dist1)<=5, ]
...

cu
	Philipp