extracting information from txt file - R-help

chuck.01

Wed, Oct 31, 2012 9:46 AM #

Hello,

Here is a link to some data:
http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt

I am trying to read this in, and want to use: 
chmval <-
read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
sep=",", skip= 84, header=T)

The 84 (the number of lines to skip) needs to be derived from the 5th line of the
txt file:
# Header Records:  85

So I need that number minus 1 as the skip value in the read.table statement above.

I've tried grep but that didn't work: 
 (for this I downloaded the txt file and manually removed that hash mark!)

grep("Header Records:", read.table("chmval.txt", header=T))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 5 elements

Any ideas?
Can I just extract the 5th line?




--
View this message in context: http://r.789695.n4.nabble.com/extracting-information-from-txt-file-tp4648033.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

Wed, Oct 31, 2012 10:54 AM #

Hello,

Use readLines instead.

?readLines  # see argument 'n'
readLines("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt", 
n = 5)[5]


Hope this helps,

Rui Barradas
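
For readers following along, here is a minimal sketch of why readLines suits this job where the grep-on-read.table attempt failed: readLines returns a plain character vector, which grep can search directly. The temporary file and its six lines are a made-up stand-in for the top of the EPA file (the exact layout is an assumption), and the names top, hdr_line and n_header are purely illustrative.

```r
# Made-up stand-in for the first lines of the EPA file (layout is an assumption).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Dataset:  EMAP Stream Chemistry Data",
  "File Name:  chmval",
  "Date Created:  02/22/99",
  "# Variables:  75",
  "# Header Records:  85",
  "# Data Records:  711"
), path)

# Read only the top of the file as plain text; nothing is parsed into columns.
top <- readLines(path, n = 6)

# grep() works on this character vector, so the line can be found by its label
# instead of assuming it is always line 5.
hdr_line <- grep("Header Records:", top, value = TRUE)

# Drop everything that is not a digit and convert what is left to an integer.
n_header <- as.integer(gsub("\\D", "", hdr_line))
print(n_header)
```

Searching by label rather than by position is a design choice; indexing with [5] as above is shorter when the layout is known to be fixed.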
On 31-10-2012 16:46, chuck.01 wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Taimur Sajid

Wed, Oct 31, 2012 10:56 AM #

This worked for the example you provided. It assumes the header count is the only numeric value on the 5th line.

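	epa_extract <- function(address){
		# Read only the first five lines and keep the fifth
		doc <- readLines(address, n = 5)[5]
		
		# Strip every non-digit character, leaving the header count
		head_count <- as.numeric(gsub("\\D", "", doc))
		
		# Skip that many lines, then read the comma-separated data
		read.table(address, sep = ",", header = TRUE, skip = head_count)
	}
	
	foo <- epa_extract("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")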
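
The original question asked for the header count minus 1 (skip = 84 for a count of 85), which suggests the last header record is the column-name row. Under that assumption, a hedged variant might look like the sketch below. The function name read_after_header, its arguments, and the small temporary file are all hypothetical and only illustrate the idea; whether the "- 1" is right for the real EPA file depends on its layout.

```r
# Hypothetical variant: find the count by label and skip one line fewer,
# so the last header record can serve as the column-name row.
read_after_header <- function(path, label = "Header Records:", n_top = 10) {
  top <- readLines(path, n = n_top)
  n_header <- as.integer(gsub("\\D", "", grep(label, top, value = TRUE)[1]))
  # comment.char = "" keeps "#" from being treated as a comment marker
  read.table(path, sep = ",", header = TRUE, skip = n_header - 1,
             comment.char = "")
}

# Made-up file: 4 header records, the 4th being the column names.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Dataset:  demo",
  "# Header Records:  4",
  "# Data Records:  2",
  "ID,ANC",
  "A1,115",
  "A2,207.2"
), path)

demo <- read_after_header(path)
print(demo)
```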


Taimur Sajid
Research & Development Analyst
Primatics Financial

jim holtman

Wed, Oct 31, 2012 11:10 AM #

This worked fine for me:

'data.frame':   711 obs. of  75 variables:
 $ ALDI    : chr  "." "." "." "." ...
 $ ALDS    : chr  "." "S" "S" "S" ...
 $ ALDSF   : chr  " " " " " " " " ...
 $ ALKCALC : chr  "106.05" "210.7" "73.51" "432.63" ...
 $ ALOR    : chr  "." "S" "S" "S" ...
 $ ALORF   : chr  " " " " " " " " ...
 $ ALTD    : chr  "54" "36" "47" "12" ...
 $ ALTDF   : chr  " " " " " " " " ...
 $ ANC     : chr  "115" "207.2" "82.2" "435.2" ...

On Wed, Oct 31, 2012 at 12:46 PM, chuck.01 <CharlieTheBrown77 at gmail.com> wrote:

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

David Winsemius

Wed, Oct 31, 2012 11:11 AM #

On Oct 31, 2012, at 9:46 AM, chuck.01 wrote:

That "# (-1)" is fairly cryptic to my reading, but it appears you are seeing the behavior of the "3" character in terminating input for comments. Changing the comment character in the call to read.table will allow input from that line.

?read.table

You will need to read only the first 5 or 6 lines first, then execute a separate read.table while skipping input from those lines as well as the variable list that forms a secondary header.

                V1                                       V2
1          Dataset               EMAP Stream Chemistry Data
2        File Name                                   chmval
3     Date Created                                 02/22/99
4      # Variables                                       75
5 # Header Records                                       85
6   # Data Records                                      711
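
The two-step read described above can be sketched as follows. The call that produced the table was not preserved in the archive, so this is an assumption about how such a read could be done rather than the exact code used. The temporary file is a made-up miniature of the EPA layout, and the "- 1" assumes the last header record is the column-name row, as the original question's skip = 84 implies.

```r
# Made-up miniature of the file: 8 header records, the 8th is the column names.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Dataset:  EMAP Stream Chemistry Data",
  "File Name:  chmval",
  "Date Created:  02/22/99",
  "# Variables:  3",
  "# Header Records:  8",
  "# Data Records:  2",
  "Variable list line (made up)",
  "ID,ANC,CA",
  "A1,115,303",
  "A2,207.2,529"
), path)

# Step 1: read only the first six "label: value" lines. comment.char = ""
# stops read.table from discarding everything after a "#".
meta <- read.table(path, sep = ":", nrows = 6, comment.char = "",
                   strip.white = TRUE, stringsAsFactors = FALSE)
n_header <- as.integer(meta$V2[meta$V1 == "# Header Records"])

# Step 2: a separate read.table call that skips the header block.
dat <- read.table(path, sep = ",", header = TRUE, skip = n_header - 1,
                  comment.char = "")
print(meta)
print(dat)
```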

David Winsemius, MD
Alameda, CA, USA

jim holtman

Wed, Oct 31, 2012 11:14 AM #

Using na.strings works better:

'data.frame':   711 obs. of  75 variables:
 $ ALDI    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ ALDS    : chr  NA "S" "S" "S" ...
 $ ALDSF   : chr  " " " " " " " " ...
 $ ALKCALC : num  106 210.7 73.5 432.6 38.7 ...
 $ ALOR    : chr  NA "S" "S" "S" ...
 $ ALORF   : chr  " " " " " " " " ...
 $ ALTD    : int  54 36 47 12 19 10 12 5 8 6 ...
 $ ALTDF   : chr  " " " " " " " " ...
 $ ANC     : num  115 207.2 82.2 435.2 37.4 ...
 $ ANCF    : chr  " " " " " " " " ...
 $ ANDEF   : num  82.5 52.3 31.8 21.9 12.2 ...
 $ ANSUM   : num  771 728 328 892 251 ...
 $ CA      : num  303 529 182 392 124 ...
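
The call behind this output was not preserved in the archive. As a rough illustration of the effect, the sketch below assumes that "." is the missing-value marker (which the earlier chr columns suggest) and uses a few made-up rows; txt, raw and clean are illustrative names only.

```r
# Made-up rows in the spirit of the EPA data: "." marks a missing value.
txt <- c(
  "ALDI,ALDS,ALKCALC,ALTD",
  ".,.,106.05,54",
  "3,S,.,36",
  ".,S,73.51,47"
)

# Default read: "." is ordinary text, so any column containing it stays character.
raw <- read.table(text = txt, sep = ",", header = TRUE,
                  stringsAsFactors = FALSE)

# Declaring "." as the NA marker lets each column take its natural type.
clean <- read.table(text = txt, sep = ",", header = TRUE,
                    na.strings = ".", stringsAsFactors = FALSE)

str(raw)
str(clean)
```

Comparing the two str() listings shows the numeric columns changing from chr to int or num once "." is read as NA.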

On Wed, Oct 31, 2012 at 12:46 PM, chuck.01 <CharlieTheBrown77 at gmail.com> wrote:

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

David Winsemius

Wed, Oct 31, 2012 11:26 AM #

On Oct 31, 2012, at 11:11 AM, David Winsemius wrote:

That would be the shifted-"3".

David Winsemius, MD
Alameda, CA, USA