Creating a custom connection to read from multiple files

4 messages · Brian Ripley, Tomas Kalibera

#
Hello,

is it possible to create my own connection which I could use with
read.table or scan? I would like to create a connection that reads
from multiple files in sequence (as if they were concatenated),
possibly with an option to skip the first n lines of each file. I would like
to avoid using platform-specific scripts for that (currently I invoke
"/bin/cat" from R to create a concatenation of all those files).

Thanks,

Tomas
#
On Thu, 20 Jan 2005, Tomas Kalibera wrote:

            
Yes.  In a sense, all the connections are custom connections written by 
someone.
I would use pipes, but a pure R solution is to process the files to an 
anonymous file() connection and then read that.
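
A minimal sketch of the anonymous file() approach, assuming plain-text input files that fit in memory one at a time. The helper name read_concatenated and its skip argument are hypothetical, introduced only for illustration; file(""), readLines, writeLines and read.table are standard base R.

```r
# Hypothetical helper: copy several text files into one anonymous file()
# connection, optionally dropping the first `skip` lines of each file,
# then parse the combined text with read.table.
read_concatenated <- function(paths, skip = 0, ...) {
  con <- file("", open = "w+")   # anonymous temporary file, text mode, read/write
  on.exit(close(con))            # closing the connection discards the temporary file
  for (p in paths) {
    lines <- readLines(p)
    if (skip > 0) lines <- lines[-seq_len(skip)]
    writeLines(lines, con)
  }
  # The connection is already open, so read.table reads from its start.
  read.table(con, ...)
}
```

A call might look like read_concatenated(c("a.txt", "b.txt"), skip = 1), where the file names are placeholders. Any further arguments are passed on to read.table.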

However, what is wrong with reading a file at a time and combining the 
results in R using rbind?
#
Dear Prof Ripley,

thanks for your suggestions. It's very nice that one can create custom 
connections directly in R, and I think that is what I need just now.
Well, the problem is performance. If I concatenate all those files, they 
come to around 8 MB, and may grow to tens of MB in the near future.

Concatenating and then reading from a single file with scan takes 5 
seconds (which is almost OK).

However, reading the individual files with read.table and rbinding them 
one by one (samples = rbind(samples, newSamples)) takes minutes. The same 
happens when I concatenate lists manually. Using scan does not help 
significantly. I guess there is some overhead in detecting the dimensions 
of objects in rbind, or in re-allocating or copying data?

Best regards,

Tomas Kalibera
#
On Thu, 20 Jan 2005, Tomas Kalibera wrote:

            
rbind is vectorized so you are using it (way) suboptimally.
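
One way to read the remark above: rbind accepts any number of arguments, so the pieces can be read into a list first and bound in a single call instead of growing the result inside a loop. The sketch below assumes all files share the same column layout. The helper name read_all is hypothetical.

```r
# Hypothetical helper: read each file once, then bind all pieces in one rbind call.
read_all <- function(paths, ...) {
  # One data frame per file; extra arguments go to read.table.
  pieces <- lapply(paths, read.table, ...)
  # Equivalent to rbind(pieces[[1]], pieces[[2]], ...), done in a single call.
  do.call(rbind, pieces)
}
```

This avoids the repeated samples = rbind(samples, newSamples) step, which copies the accumulated data on every iteration.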