How to fetch specific part from a number of Text files?

5 messages · Megh Dal, Charles C. Berry, Augusto.Sanabria at ga.gov.au

#
Hi all,

On my C: drive I have roughly 1,000 Notepad files with the .txt extension. They
are named after the dates on which they were saved, i.e. the 1st file is
"Volume_4-18-2008", the 2nd is "Volume_4-21-2008", the 3rd is
"Volume_4-22-2008", and so on.

Also, the contents of every file are in the same format, like this:

******** content of 1st file *************
section : 1
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
section : 2
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
section : 3
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
section : 4
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------
-----       ---------      ----------    -----------

All files have 4 sections, just as shown here, but the contents within
each section (i.e. the dashed lines here) differ from file to file.

What I have to do is fetch the contents of "section : 2" from each
file and save them to an R object, a matrix or a list, for further analysis.

Can you please tell me how to do that?

Thanks and regards,
#
On Mon, 15 Dec 2008, megh wrote:

            
Here is the outline:

 	*) use list.files() or Sys.glob() to get a list of the files

 	*) write a function that takes the file name as its arg, uses
            readLines() to swallow the text and uses grep() to find the
            'section' lines. Then put the 'dashes' in between two section
            lines into a separate object (say, dash.lines). Then use

 		as.matrix( read.table( con <- textConnection( dash.lines ) ) )
 		close(con)

 	  to get the numeric values or maybe

 		sapply( strsplit(dash.lines, "[ ]+"), as.numeric)

 	*) debug this on one file


 	*) use lapply  to step thru the list of file names.
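
The outline above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name read_section is made up, the header lines are assumed to look like "section : 2", and the data lines are assumed to be whitespace-separated numbers. The demo files are invented stand-ins for the real ones.

```r
# Hypothetical helper (not from the thread): return the lines that follow
# the requested "section : N" header, up to the next header, as a matrix.
read_section <- function(file, section = 2) {
  txt <- readLines(file)
  # Line numbers of all header lines such as "section : 1"
  hdr <- grep("^ *section *: *[0-9]+", txt)
  # Section numbers found on those header lines
  num <- as.integer(sub("^ *section *: *([0-9]+).*$", "\\1", txt[hdr]))
  k <- match(section, num)
  if (is.na(k)) stop("section ", section, " not found in ", file)
  first <- hdr[k] + 1
  # The section ends just before the next header, or at the end of the file
  last <- if (k < length(hdr)) hdr[k + 1] - 1 else length(txt)
  dash.lines <- txt[seq_len(last - first + 1) + first - 1]
  as.matrix(read.table(text = dash.lines))
}

# Debug on small made-up files in a temporary directory
tmp <- tempfile("sections")
dir.create(tmp)
for (d in c("4-18-2008", "4-21-2008")) {
  writeLines(c("section : 1", "1 2 3 4", "5 6 7 8",
               "section : 2", "10 20 30 40", "50 60 70 80",
               "section : 3", "9 9 9 9"),
             file.path(tmp, paste0("Volume_", d, ".txt")))
}

# Step through all files; the result is a list with one matrix per file
files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
sec2 <- lapply(files, read_section)
names(sec2) <- basename(files)
```

If the real data lines contain non-numeric fields, read.table() will return character or factor columns, so the as.matrix() step may need adjusting.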

See

 	?list.files
 	?Sys.glob
 	?readLines
 	?grep
 	?textConnection
 	?strsplit
 	?sapply

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
#
Thanks, Charles, for this reply. I have started following your suggestion
and hopefully I can do it. In the meantime I was wondering: instead of
calling my text files by their names, is there any mechanism to call them by
the order in which they are stored in that directory? That is, suppose I have
a total of 1000 text files in that directory, so I create a vector like
sel.no <- c(1:1000). Can I then use the i-th element of the vector "sel.no" to
access the i-th file?

With regards,
#
On Mon, 15 Dec 2008, megh wrote:

            
I am not sure what that order would be. If you mean 'how would I order 
files by (say) creation date?', see

 	?file.info

Eventually you need a string that has the file name in it or a connection 
object (see ?connection)  that accesses the file(s).
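
As a rough sketch of that idea: list.files() returns names in alphabetical order, which need not be chronological, and file.info() can supply a modification time to sort by. The file names and timestamps below are made up for the demo, and Sys.setFileTime() is used only to make the example reproducible.

```r
# Made-up demo files in a temporary directory
tmp <- tempfile("order")
dir.create(tmp)
stamps <- c("Volume_12-1-2008.txt" = "2008-12-01",
            "Volume_4-18-2008.txt" = "2008-04-18",
            "Volume_4-21-2008.txt" = "2008-04-21")
for (f in names(stamps)) {
  path <- file.path(tmp, f)
  writeLines("dummy", path)
  # Set the modification time explicitly so the demo is reproducible
  Sys.setFileTime(path, as.POSIXct(stamps[[f]], tz = "UTC"))
}

# Alphabetical order: "Volume_12-..." sorts before "Volume_4-..."
files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Reorder by modification time taken from file.info()
info <- file.info(files)
by_mtime <- files[order(info$mtime)]

# The i-th file in that order is still addressed through its name
i <- 1
ith_file <- by_mtime[i]
```

Either way, an index such as i only selects an element of a character vector of file names; the name itself is what readLines() or read.table() receives.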

Hmmm. Something about this question is telling me you are either a novice 
programmer or really unfamiliar with R or perhaps you just need that extra 
cup of coffee.

In any but the latter case, let me suggest that it helps to reread the 
Intro to R (and any other books/manuals you might have), read help pages 
for possibly relevant functions, and to run example( file.info ), say, to 
get a handle on functions you are trying to learn. Also, rereading the 
_posting guide_ is helpful as it is, in part, a guide to figuring out 
things in R.


HTH,

Chuck
#
Megh,

You can list all your external files in R using:

All_files <- dir(pattern = "\\.txt$")

Then read the files one by one and store the contents of section 2 (from,
say, line 10, where "section : 2" appears, up to, say, line 40, where
"section : 3" appears) in the list "cont":

no_files <- length(All_files)
cont <- vector("list", no_files)
# skip = 10 drops lines 1-10; nrows = 29 then reads lines 11-39
for (i in seq_len(no_files)) cont[[i]] <- read.csv(All_files[i], header = FALSE, skip = 10, nrows = 29)

Now the list "cont" holds 'section 2' of each of your external files, one element per file.

This is an effective but not very elegant way to do what you want.
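
Once a list like "cont" exists, its elements can be put in date order and stacked into one object. The sketch below assumes the file names follow the "Volume_M-D-YYYY" pattern from the original question. The two small data frames are invented stand-ins for real section 2 contents, and the name sec2_all is made up.

```r
# Stand-ins for the objects built above (made-up contents)
All_files <- c("Volume_12-1-2008.txt", "Volume_4-18-2008.txt")
cont <- list(data.frame(V1 = 1:2, V2 = 3:4),
             data.frame(V1 = 5:6, V2 = 7:8))

# Parse the date embedded in each file name
dates <- as.Date(sub("^Volume_(.*)\\.txt$", "\\1", All_files),
                 format = "%m-%d-%Y")

# Put the list in chronological order and label each element by its date
ord <- order(dates)
cont <- cont[ord]
names(cont) <- format(dates[ord])

# Stack everything into one data frame with a date column
sec2_all <- do.call(rbind, lapply(seq_along(cont), function(i) {
  data.frame(date = dates[ord][i], cont[[i]])
}))
```

Keeping the list is convenient when sections differ in size from file to file; the stacked data frame is easier to work with when they all share the same columns.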

Hope it helps,

Augusto


--------------------------------------------
Augusto Sanabria. MSc, PhD.
Mathematical Modeller
Risk & Impact Analysis Group
Geospatial & Earth Monitoring Division
Geoscience Australia (www.ga.gov.au)
Cnr. Jerrabomberra Av. & Hindmarsh Dr.
Symonston ACT 2601
Ph. (02) 6249-9155





