how to delete specific rows in a data frame where the first column matches any string from a list

Wacek Kusnierczyk · 2009-02-07T01:36:25Z

Andrew Choens wrote:
> I regularly deal with a similar pattern at work. People send me these
> big long .csv files and I have to run them through some pattern analysis
> to decide which rows I keep and which rows I kill off.
>
> As others have mentioned, Perl is a good candidate for this task.
> Another option would be a quick SQL query. It should be a snap to pull
> this into something like Access or OOo Base . . . . or better yet, a
> real database like Postgres, MySQL, etc.
>
> In case you a

Fri, Feb 6, 2009 5:36 PM

Andrew Choens wrote:

(this is actually off-topic, but since it may be interesting for the
general public, i keep the response cc: to r-help)

yes, you can do this with sed.  suppose you have two files, one (say,
sample.txt) with the data to be filtered, record fields separated by,
e.g., a tab character, and another (say, filter.txt) with patterns to be
matched.  a row from the first is passed to output only if its second
field does not match any of the patterns -- this corresponds to (a
simplified version of) the original problem.

then, the following should do:

sed "$(sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt)" sample.txt > filtered-sample.txt
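
as a worked illustration of the command above, here is a small sketch.  the contents of sample.txt and filter.txt are invented for the example, the scratch directory is only there to keep things tidy, and GNU sed is assumed (\t and \+ are not understood by every sed):

```shell
# work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)" || exit 1

# hypothetical data: three tab-separated records
printf 'r1\tfoo\t1\nr2\tbar\t2\nr3\tbaz\t3\n' > sample.txt
# hypothetical patterns, one per line
printf 'foo\nbaz\n' > filter.txt

# the inner sed turns each pattern into a delete command for the outer sed
sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt
# prints:
#   /^[^\t]\+\tfoo/d
#   /^[^\t]\+\tbaz/d

# the outer sed runs that generated script against the data
sed "$(sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt)" sample.txt > filtered-sample.txt

# only the r2 record (second field "bar") is left
cat filtered-sample.txt
```

note that each generated command matches a pattern at the start of the second field, so a pattern like foo would also delete a row whose second field is foobar.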

(unless the patterns contain characters that interfere with the shell or
sed's syntax, in which case they'd have to be appropriately escaped.)
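
if the patterns are meant as literal strings rather than regular expressions, one way to sidestep the escaping issue is to compare the fields as plain strings in awk instead of generating a sed script.  this is a different technique from the sed one-liner and it changes the semantics: the second field must be exactly equal to a line of filter.txt, rather than begin with a regex match.  the file contents below are invented for illustration, and filter.txt is assumed to be non-empty:

```shell
# work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)" || exit 1

# hypothetical data; the second fields include regex metacharacters
printf 'r1\ta.b\t1\nr2\taxb\t2\nr3\tc*\t3\n' > sample.txt
# hypothetical literal strings to filter on, one per line
printf 'a.b\nc*\n' > filter.txt

# while reading the first file (NR == FNR), remember each line as a literal key;
# while reading the second file, print a record only if its second field is not a key
# (assumes filter.txt is non-empty, otherwise NR == FNR also holds for sample.txt)
awk -F '\t' 'NR == FNR { drop[$0]; next } !($2 in drop)' filter.txt sample.txt > filtered-sample.txt

# only the r2 record is left: "axb" is not literally equal to "a.b"
cat filtered-sample.txt
```

with the regex-based sed command the pattern a.b would also have removed the r2 row, since . matches any character; the string comparison keeps it.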

vQ
