R command line and pipe using in Linux?

5 messages · Sean O'Riordain, Dirk Eddelbuettel, Petr Savicky +1 more

#
Good afternoon Hang,

This is an example of what I've done with a CSV file (with a header)
that is too big to read into memory in full.

# this is a file with about 50 columns and 28 million records
ap.fnam <- 'p2_all28m_records.csv'
# let's just explore the columns in Addresspoint...
# by reading in the header and the first row
p1 <- read.csv(ap.fnam, nrows=1)

# now which columns do we actually want?
# ok... in this case we only want the NCAT column...
cols.reqd <- grep('NCAT', names(p1))
# so we create a character vector marking this/these column(s) as
# 'character' type and all other columns as 'NULL' (i.e. skipped)...
col.classes <- ifelse(seq_len(ncol(p1)) %in% cols.reqd, 'character', 'NULL')

# this will likely take a little over a minute!
p9 <- read.csv(ap.fnam, colClasses=col.classes)
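
For anyone who wants to try the colClasses trick without a 28-million-row file, here is a minimal self-contained sketch. The temporary file and the ID and X columns are made up for illustration; only the NCAT name comes from the example above.

```r
# Small stand-in for the large CSV; ID and X are hypothetical columns.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = 1:3, NCAT = c("R", "C", "R"), X = c(0.1, 0.2, 0.3)),
          tmp, row.names = FALSE)

# Read only the header plus one row to learn the column names.
hdr <- read.csv(tmp, nrows = 1)

# Mark the wanted column(s) as 'character' and every other column as 'NULL'
# so that read.csv skips them entirely.
keep <- grep("NCAT", names(hdr))
classes <- ifelse(seq_len(ncol(hdr)) %in% keep, "character", "NULL")

# Only the NCAT column is kept in the result.
dat <- read.csv(tmp, colClasses = classes)
str(dat)
```

The skipped columns are still parsed from disk, so the saving is mostly in memory rather than in read time.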

Hope this helps

Kind regards,
Sean
On 14 February 2011 17:40, Hang PHAN <hangphan at gmail.com> wrote:
#
On 14 February 2011 at 17:40, Hang PHAN wrote:
| Hi,
| I have a very large data file (GB) from which I only want to extract one
| column to draw a histogram. This would be done several times, so I would like
| to ask if there is any way to plot this using R from the Linux command line,
| something that looks like this
| 
| cut -f1 xxx.txt |RplotHist ....

Have a look at littler which was written with these uses in mind:

   http://dirk.eddelbuettel.com/code/littler.html

It includes a few examples which should get you going. Also, in
non-interactive mode, your plot device will have to be a file.
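
As a rough sketch of what such a pipeline script might look like (this is not taken from the littler examples; the helper name hist_to_pdf and the script name hist_stdin.R are hypothetical), the following reads one number per line from a connection and writes the histogram to a PDF file. It uses only base R, so it should run under Rscript; running it under littler's r is an assumption I have not checked here.

```r
# Hypothetical helper: read one number per line from a connection and
# write a histogram of the values to a PDF file.
hist_to_pdf <- function(con, out_file = "hist.pdf") {
  x <- scan(con, what = numeric(), quiet = TRUE)
  pdf(out_file)            # file device, needed when there is no screen
  on.exit(dev.off())       # make sure the file is closed and flushed
  hist(x, main = "Histogram of piped column", xlab = "value")
  invisible(x)
}

# When called with exactly one argument, treat it as the output file name
# and read the data from standard input.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 1) {
  hist_to_pdf(file("stdin"), args[1])
}
```

Assuming the script is saved as hist_stdin.R, a call along the lines of `cut -f1 xxx.txt | Rscript hist_stdin.R out.pdf` would then produce the plot.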

Hope this helps, Dirk
#
On Mon, Feb 14, 2011 at 05:40:29PM +0000, Hang PHAN wrote:
Hi Hang:

Can you use something like the following?

  x <- as.numeric(system("cut -f1 xxx.txt", intern=TRUE))

According to ?system, long lines will be split; however, no limit
on the number of output lines is stated there.
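
A possible variation, sketched here as an assumption rather than something tested on a multi-GB file, is to read the command's output through a pipe() connection with scan(), which parses the numbers directly instead of first building a character vector. The small temporary file below only stands in for xxx.txt so that the snippet runs on its own; it assumes the cut utility is available, as on Linux.

```r
# Build a small tab-separated file so the sketch is self-contained;
# in practice this would be the large file (xxx.txt above).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1.5\ta", "2.5\tb", "4\tc"), tmp)

# pipe() runs the shell command and exposes its output as a connection;
# scan() opens it, parses the values as numbers, and closes it again.
cmd <- paste("cut -f1", shQuote(tmp))
x <- scan(pipe(cmd), what = numeric(), quiet = TRUE)

# Non-interactive runs need a file device for the plot.
out_file <- file.path(tempdir(), "hist.pdf")
pdf(out_file)
hist(x, main = "Histogram of first column", xlab = "value")
dev.off()
```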

Petr Savicky.