
Tools for data preparation?

2 messages · David Mitchell, (Ted Harding)

#
Hello list,

I regularly find myself having to do a lot of data manipulation to get
my data into a format R is happy with.  This manipulation generally
takes one of two forms:
- getting data from e.g. text log files into a tabular format
- extracting sensible sample data from a very large data set (i.e. too
large for R to handle)
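As a rough illustration of the first task, here is a minimal Python
sketch that turns log lines into a CSV file R can read with
read.csv(). The log format, the field names and the helper names
`log_to_rows` and `write_csv` are all hypothetical; a real log would
need its own pattern.

```python
import csv
import io
import re

# Hypothetical log format, assumed for illustration only:
#   2004-11-19 08:56:47 INFO user=ted action=login
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)
FIELDS = ["date", "time", "level", "user", "action"]


def log_to_rows(lines):
    """Yield one dict per line matching LOG_PATTERN; skip other lines."""
    for line in lines:
        m = LOG_PATTERN.match(line.rstrip("\n"))
        if m is None:
            continue  # malformed or irrelevant line
        row = {"date": m["date"], "time": m["time"], "level": m["level"]}
        # Pick out the trailing key=value pairs we have columns for
        for pair in m["rest"].split():
            key, sep, value = pair.partition("=")
            if sep and key in FIELDS:
                row[key] = value
        yield row


def write_csv(rows, out):
    """Write rows as CSV with a header; missing fields become NA for R."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="NA")
    writer.writeheader()
    writer.writerows(rows)
```

For a real file, open the input normally and the output with
newline="" (as the csv module recommends), then pass them to the two
functions. Because log_to_rows is a generator, the log is processed
one line at a time.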

In general, I use Perl or Python to do the task; I'm curious as to
what others use when they hit the same problem.
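For the second task, one common single-pass approach (not one named in
the post) is reservoir sampling, which draws a uniform random sample of
k lines from a file of unknown length while holding only k lines in
memory. A minimal sketch, assuming one record per line and a single
header line; the names `reservoir_sample` and `sample_file` are
hypothetical:

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of
    unknown length, keeping only k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            # Keep the (i+1)-th item with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def sample_file(path, k, seed=None):
    """Sample k data lines from a (possibly huge) text file, keeping
    the first line as a header (an assumption about the file layout)."""
    with open(path) as f:
        header = f.readline()
        return [header] + reservoir_sample(f, k, seed)
```

Since the file is read as a stream, only the sample has to fit in
memory, not the whole data set; the sampled lines can then be written
out and read into R as usual.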

Regards

Dave Mitchell
#
On 19-Nov-04 David Mitchell wrote:
I generally use 'awk' with help from 'sed' when needed.
This is along the same lines as your choice, though lighter-weight
and less powerful (but I've never had a case that needed more).

Since the sort of task you describe basically works on a line-by-line
basis (and what's meant by a "line" can be pretty flexible in 'awk'),
it can be done straightforwardly; but greater flexibility is also
possible.
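In 'awk', what counts as a "line" (a record) is controlled by the
record separator RS; for instance, setting RS to the empty string
makes each block of lines separated by blank lines a single record. A
rough Python analogue of that paragraph mode, assuming a hypothetical
input of "key: value" lines grouped into blank-line-separated blocks
(the function names are illustrative only):

```python
def paragraph_records(lines):
    """Group lines into records separated by one or more empty lines,
    roughly like awk's paragraph mode (RS = "")."""
    record = []
    for line in lines:
        if line.rstrip("\n"):
            record.append(line.rstrip("\n"))
        elif record:
            yield record  # an empty line ends the current record
            record = []
    if record:
        yield record  # last record may not be followed by an empty line


def record_to_dict(record):
    """Turn the 'key: value' lines of one record into a dict
    (hypothetical input format)."""
    out = {}
    for line in record:
        key, sep, value = line.partition(":")
        if sep:
            out[key.strip()] = value.strip()
    return out
```

Each dict can then become one row of a table, so a multi-line record
in the input ends up as a single line in the output.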

E.g. it is easy to extract a line from the input, or to apply a certain
transformation to the fields in a line, if and only if it has been
preceded by a line satisfying a certain condition, and so on.
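In 'awk' the "only if preceded by" case is usually handled with a flag
variable that one pattern sets and a later pattern tests. The same idea
as a minimal Python sketch; the marker "## DATA", the
whitespace-to-comma transform and the function names are hypothetical:

```python
import re


def transform_after_marker(lines, marker, transform):
    """Apply `transform` to every line that comes after the first line
    matching the regex `marker`; earlier lines and the marker line
    itself are dropped."""
    pattern = re.compile(marker)
    seen = False  # becomes True once the condition has been met
    for line in lines:
        if not seen:
            if pattern.search(line):
                seen = True
            continue
        yield transform(line)


def to_csv_fields(line):
    """Hypothetical transform: whitespace-separated fields to CSV."""
    return ",".join(line.split())
```

Other conditions (e.g. reset the flag at an end marker, or only act on
the line immediately after a match) follow the same pattern with a bit
more state.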

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 19-Nov-04                                       Time: 08:56:47
------------------------------ XFMail ------------------------------