Creating a custom connection to read from multiple files
4 messages · Brian Ripley, Tomas Kalibera
On Thu, 20 Jan 2005, Tomas Kalibera wrote:
is it possible to create my own connection which I could use with
Yes. In a sense, all the connections are custom connections written by someone.
read.table or scan ? I would like to create a connection that would read from multiple files in sequence (like if they were concatenated), possibly with an option to skip first n lines of each file. I would like to avoid using platform specific scripts for that... (currently I invoke "/bin/cat" from R to create a concatenation of all those files).
I would use pipes, but a pure R solution is to process the files to an anonymous file() connection and then read that. However, what is wrong with reading a file at a time and combining the results in R using rbind?
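One way to realize the "process the files into one connection" suggestion in pure R is to read each file's lines, drop the first n lines of each, and feed the combined text to read.table through a single text connection. A minimal sketch (the function name read_concat and its arguments are illustrative, not from the thread):

```r
## Sketch of the "pure R" approach: concatenate the files' lines in R,
## skipping the first `skip` lines of each, then read once via a
## textConnection instead of shelling out to /bin/cat.
read_concat <- function(files, skip = 0, ...) {
  all_lines <- unlist(lapply(files, function(f) {
    lines <- readLines(f)
    if (skip > 0)
      lines <- lines[-seq_len(min(skip, length(lines)))]
    lines
  }))
  con <- textConnection(all_lines)
  on.exit(close(con))
  read.table(con, ...)
}
```

This avoids any platform-specific external command, at the cost of holding all the lines in memory once before parsing.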
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Dear Prof Ripley, thanks for your suggestions; it's very nice that one can create custom connections directly in R, and I think that is what I need just now.
However, what is wrong with reading a file at a time and combining the results in R using rbind?
Well, the problem is performance. If I concatenate all those files, they total around 8 MB and may grow to tens of MB in the near future. Both concatenating and reading the single file with scan take about 5 seconds (which is almost OK). However, reading the individual files with read.table and rbinding them one by one (samples <- rbind(samples, newSamples)) takes minutes. The same happens when I concatenate lists manually; scan does not help significantly. I guess there is some overhead in detecting object dimensions in rbind, or in re-allocating and copying the data? Best regards, Tomas Kalibera
On Thu, 20 Jan 2005, Tomas Kalibera wrote:
However, what is wrong with reading a file at a time and combining the results in R using rbind?
Well, the problem is performance. If I concatenate all those files, they total around 8 MB and may grow to tens of MB in the near future. Both concatenating and reading the single file with scan take about 5 seconds (which is almost OK). However, reading the individual files with read.table and rbinding them one by one (samples <- rbind(samples, newSamples)) takes minutes. The same happens when I concatenate lists manually; scan does not help significantly. I guess there is some overhead in detecting object dimensions in rbind, or in re-allocating and copying the data?
rbind is vectorized so you are using it (way) suboptimally.
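Concretely, the vectorized pattern is to read every file into its own data frame first and then combine them with a single rbind call; read.table's own skip argument drops the first n lines of each file. A short sketch (the wrapper read_all is illustrative):

```r
## Vectorized use of rbind, per the advice above: one read.table per
## file, then a single rbind over the whole list, instead of growing
## `samples` inside a loop.
read_all <- function(files, n = 0) {
  pieces <- lapply(files, read.table, skip = n)  # one data frame per file
  do.call(rbind, pieces)                         # one vectorized rbind
}
```

The loop version copies the accumulated result on every iteration, so its cost grows roughly quadratically with the number of rows; the single do.call(rbind, ...) copies each row only once.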