Trouble pulling data from a messy ASCII file...
Here is an example of some code that might do it for you:
# the example file, one record per line
input <- readLines(textConnection("19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
10 s name of program that wrote this file trkplt name of program that wrote this file
10 GORDON machine that generated this file machine that generated this file
10 3.7 version of program
10 3.6 version of this data file
10 5.81 version of Universal Library
10 20081121.145730 when this file was written
10 Windows_XP operating system used operating system used
*
* radar characteristics
11 WF-100
11 20000000 A/D rate, samples/second
11 7.5 bin width, m
11 800 nominal PRF, Hz
11 0.25 nominal pulse width, microsec
11 0 tuning, volts
11 3.19779 nominal wave length, cm"))
closeAllConnections()  # close the textConnection
# parse out the data: split a line into record code, first value token, rest of line
f.parse <- function(line){
    x <- sub("^(\\S+)\\s+(\\S+)\\s*(.*)", "\\1`\\2`\\3", line)
    unlist(strsplit(x, "`"))
}
fileName <- ''
result <- NULL
for (i in input){
    values <- f.parse(i)
    switch(values[1],
        '19'={fileName <- values[2]},   # start of a file: remember its path
        '*'=NULL,                       # ignore comments
        '10'=,
        '11'={result <- rbind(result, c(fileName, values[3], values[2]))}
    )
}
# convert to a data frame for 'cast'
result <- as.data.frame(result, stringsAsFactors=FALSE)
names(result) <- c('fileName', 'variable', 'value')
library(reshape)
cast(result, fileName ~ variable, c)
The result has one row for the file. Here it is shown one column per line:

fileName: c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
A/D rate, samples/second: 20000000
bin width, m: 7.5
machine that generated this file machine that generated this file: GORDON
name of program that wrote this file trkplt name of program that wrote this file: s
nominal PRF, Hz: 800
nominal pulse width, microsec: 0.25
nominal wave length, cm: 3.19779
operating system used operating system used: Windows_XP
tuning, volts: 0
version of program: 3.7
version of this data file: 3.6
version of Universal Library: 5.81
when this file was written: 20081121.145730
NA: WF-100
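The code above handles one file. The quoted messages below ask for about 1000 files, with one row per file and short CSV headings. One way to extend the code is sketched here in base R, without reshape. It assumes every file uses the same "code value description" layout. `read_trk()`, `trk_table()`, the `wanted` lookup and the demo files are hypothetical names and do not come from the thread.

Two quirks of the line splitter carry over:
- A line with no description, such as `11 WF-100`, yields only two fields. That is why `WF-100` lands under an `NA` column above.
- On the `10 s name of program ... trkplt ...` line, `s` is taken as the value, and `trkplt` is left inside the description.

```r
# Sketch only: hypothetical helpers, assuming the "code value description" layout.

# same line splitter as above: code, first value token, rest of line
f.parse <- function(line) {
  x <- sub("^(\\S+)\\s+(\\S+)\\s*(.*)", "\\1`\\2`\\3", line)
  unlist(strsplit(x, "`"))
}

# one file -> named character vector (description = value)
read_trk <- function(path) {
  rec <- c(fileName = NA_character_)
  for (v in lapply(readLines(path), f.parse)) {
    if (length(v) >= 2 && v[1] == "19") {
      rec["fileName"] <- basename(v[2])      # keep just the file name
    } else if (length(v) == 3 && v[1] %in% c("10", "11")) {
      rec[v[3]] <- v[2]                      # lines without a description are skipped
    }
  }
  rec
}

# output column name = description in the file (assumed mapping)
wanted <- c(File_name     = "fileName",
            date_written  = "when this file was written",
            program_ver   = "version of program",
            data_file_ver = "version of this data file",
            bin_width     = "bin width, m")

# many files -> one data frame, one row per file
trk_table <- function(paths) {
  rows <- lapply(paths, function(p) {
    rec <- read_trk(p)[wanted]               # NA where a description is missing
    names(rec) <- names(wanted)
    as.data.frame(as.list(rec), stringsAsFactors = FALSE)
  })
  tab <- do.call(rbind, rows)
  num <- c("program_ver", "data_file_ver", "bin_width")
  tab[num] <- lapply(tab[num], as.numeric)   # numeric, so min/max/mean work
  tab
}

# Demo: an abbreviated copy of the example file, plus an invented second
# file (different name, bin width 15) to show one row per file.
example <- c("19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt",
             "10 3.7 version of program",
             "10 3.6 version of this data file",
             "10 20081121.145730 when this file was written",
             "*",
             "* radar characteristics",
             "11 WF-100",
             "11 7.5 bin width, m")
example2 <- sub("20080911.013115.007.17.txt", "demo2.txt", example, fixed = TRUE)
example2 <- sub("11 7.5 ", "11 15 ", example2, fixed = TRUE)

demo_dir <- tempfile("trk")
dir.create(demo_dir)
writeLines(example,  file.path(demo_dir, "a.txt"))
writeLines(example2, file.path(demo_dir, "b.txt"))

tab <- trk_table(list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE))
print(tab)
summary(tab$bin_width)                       # min/mean/max across files
tab[order(tab$bin_width, decreasing = TRUE), ]
write.csv(tab, file.path(demo_dir, "summary.csv"), row.names = FALSE)
```

Design notes:
- Mapping descriptions to short names through a lookup vector keeps the parser generic: adding a column is one more entry in `wanted`.
- `date_written` is kept as text so that the timestamp is not turned into a rounded number.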
On Wed, Dec 17, 2008 at 12:21 PM, Titan8883 <jplaney at gmail.com> wrote:
The output I would be looking for is one row for each data file, with a column for each variable. Using a .csv example with a few variables, it would be:

File_name,date_written,program_ver,data_file_ver,bin_width
20080911.013115.007.17.txt,20081121.145730,3.7,3.6,7.5

My plan is to create a table with all the data files listed. This would allow me to find mean/min/max values for different variables, sort by a certain variable, etc. I am not limiting myself to R; I have seen awk mentioned before, so that sounds worth looking at to prep the data.

Hope that helps.

jholtman wrote:
It would be helpful if you could show what the output would be for the example given: exactly what are the 'values', and what would be the 'headings'? As mentioned before, you can use readLines and then parse out the data you want. Something like Perl might be easier, but it is hard to tell from the mail.

On Wed, Dec 17, 2008 at 2:37 PM, Titan8883 <jplaney at gmail.com> wrote:
Hi all,

I am a new graduate student who is also new to R. I am OK with the basics, but the problem I am having right now seems beyond what I can do, so I am looking for advice. I am trying to pull data from flat ASCII files, but they do not have a "nice" structure, so a simple read.table doesn't work. Here is the first half of an example data file:

19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
10 s name of program that wrote this file trkplt name of program that wrote this file
10 GORDON machine that generated this file machine that generated this file
10 3.7 version of program
10 3.6 version of this data file
10 5.81 version of Universal Library
10 20081121.145730 when this file was written
10 Windows_XP operating system used operating system used
*
* radar characteristics
11 WF-100
11 20000000 A/D rate, samples/second
11 7.5 bin width, m
11 800 nominal PRF, Hz
11 0.25 nominal pulse width, microsec
11 0 tuning, volts
11 3.19779 nominal wave length, cm

...the file goes on from there...

How would I go about getting this data into some kind of useful format? This is one of about 1000 files I will need to go through. Ideally, I would like each data file as a row, with a column for each value and the description text (version of program, file version, tuning volts, etc.) removed. I'm not looking for a cut-and-paste answer, but perhaps some direction on where I should start. I have only done basic .csv, table, and line inputs up until now.

Thanks for any advice

--
View this message in context: http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21059239.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
-- View this message in context: http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21060639.html Sent from the R help mailing list archive at Nabble.com.