Message-ID: <1291843024958-3079110.post@n4.nabble.com>
Date: 2010-12-08T21:17:04Z
From: Ryan Garner
Subject: Parallel Scan of Large File
In-Reply-To: <AANLkTi=tnvFxry9=hpu3PbWLFqXzJR2pYV6iQA-eo=n6@mail.gmail.com>

Hi Jim,

Thanks for your insight. I used the Linux split command to split my large
file into smaller partitions. The server I work on has multipath I/O enabled
and uses RAID for storage, so I don't think I can put each partition on its
own spindle. I can open multiple files at a time into stdin from the
command line:

> cat file1.txt | wc -l &
> cat file2.txt | wc -l &
> cat file3.txt | wc -l &
> cat file4.txt | wc -l &

But I'm still not sure how to read each partition in parallel. When I run
this code, it doesn't run in parallel; instead, file.list gets filled with
one CPU doing all the work.

R> library(doMC)
R> # Grab all the file partitions created by split
R> files <- Sys.glob("x*")
R> # Open a connection to each file partition
R> file.list <- lapply(files, function(x) file(x, "r"))
R> # Attempt at parallel readLines
R> master <- foreach(i = icount(length(file.list))) %dopar%
+ {
+ 	readLines(file.list[[i]], 1000000)
+ }
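
For comparison, here is a minimal sketch of one way a parallel read might be
set up with doMC. It rests on assumptions not stated above: that no backend
had been registered (without registerDoMC(), %dopar% falls back to sequential
execution), that 4 cores is a sensible worker count, and that passing file
names rather than already-open connections to the workers is acceptable. The
helper name read_partitions is hypothetical. Whether this is actually faster
depends on the storage, since the job may be I/O bound.

```r
library(doMC)  # attaches foreach as well; doMC relies on forking (Unix-alikes)

# Assumption: 4 workers. Without a registered backend, %dopar% runs sequentially.
registerDoMC(cores = 4)

# Hypothetical helper: read up to n lines from each file in `paths`.
# Each worker is given a file name, so readLines() opens and closes
# its own connection instead of sharing one created in the parent process.
read_partitions <- function(paths, n = 1000000L) {
  foreach(p = paths) %dopar% readLines(p, n = n)
}
```

With the partitions created by split, this could be called as
master <- read_partitions(Sys.glob("x*")), which returns a list with one
character vector per partition.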

-- 
View this message in context: http://r.789695.n4.nabble.com/Parallel-Scan-of-Large-File-tp3077545p3079110.html
Sent from the R help mailing list archive at Nabble.com.