
Help with how to process multiple column variable in a read.table

Hi,
Try this:
unemp.wy <- read.table("ftp://ftp.bls.gov/pub/time.series/la/la.data.59.Wyoming", header=TRUE, sep="\t",stringsAsFactors=FALSE,na.strings="") 
dim(unemp.wy)
#[1] 46692     5
head(unemp.wy)
#          series_id year period value footnote_codes
#1 LASST56000003     1976    M01   4.2           <NA>
#2 LASST56000003     1976    M02   4.1           <NA>
#3 LASST56000003     1976    M03   4.0           <NA>
#4 LASST56000003     1976    M04   3.9           <NA>
#5 LASST56000003     1976    M05   3.9           <NA>
#6 LASST56000003     1976    M06   3.9           <NA>
str(unemp.wy)
#'data.frame':    46692 obs. of  5 variables:
# $ series_id     : chr  "LASST56000003    " "LASST56000003    " "LASST56000003    " "LASST56000003    " ...
# $ year          : int  1976 1976 1976 1976 1976 1976 1976 1976 1976 1976 ...
# $ period        : chr  "M01" "M02" "M03" "M04" ...
# $ value         : num  4.2 4.1 4 3.9 3.9 3.9 4 4.1 4.1 4 ...
# $ footnote_codes: chr  NA NA NA NA ...
tail(unemp.wy)
#              series_id year period  value footnote_codes
#46687 LAUST56000006     2012    M11 305820              D
#46688 LAUST56000006     2012    M12 304293              D
#46689 LAUST56000006     2012    M13 306064              D
#46690 LAUST56000006     2013    M01 305150           <NA>
#46691 LAUST56000006     2013    M02 304918           <NA>
#46692 LAUST56000006     2013    M03 305556              P
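A small offline sketch of why sep="\t" helps here, in case it is useful. With sep="" every run of whitespace counts as one separator, so a row with an empty footnote field is seen as having only 4 fields; with sep="\t" the empty field is kept and turned into NA by na.strings="". The sample rows below are made up to imitate the file layout (they are not copied from the BLS file), and strip.white=TRUE is an optional extra that trims the padding visible in series_id above.

```r
# Made-up rows imitating the tab-delimited layout; the first data row has an
# empty footnote field (trailing tab with nothing after it).
sample_lines <- c(
  "series_id                     \tyear\tperiod\t       value\tfootnote_codes",
  "LASST56000003    \t1976\tM04\t         3.9\t",
  "LAUST56000006    \t2013\tM03\t      305556\tP"
)

# Fields per line when any whitespace is treated as a separator (sep = "")
con <- textConnection(sample_lines)
n_ws <- count.fields(con, sep = "")
close(con)
n_ws

# Same rows read with tab as the separator; empty strings become NA and
# strip.white = TRUE removes the padding around character fields
demo <- read.table(text = sample_lines, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE, na.strings = "",
                   strip.white = TRUE)
str(demo)
```

If the padding should be kept as in the original file, drop strip.white=TRUE and clean the column afterwards with trimws(demo$series_id).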
A.K.
column structure has 5 columns but on the 5th column data is not always 
present, so it is throwing an error. Here is my code:
unemp.wy <- read.table("ftp://ftp.bls.gov/pub/time.series/la/la.data.59.Wyoming", header=FALSE, sep="", skip=2)
line 384 did not have 4 elements
column gets added as well. This seems to throw off read.table. Is it 
possible to just read the line as a text string and then parse it, or is 
there a better way to approach this problem?
LASST56000003    	1976	M04	         3.9
unsure of the best way to overcome a Program vector approach to data 
cleansing.
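For the "read the line as a text string and then parse it" idea raised in the question, a rough sketch could look like the following. The helper name parse_bls_lines is hypothetical, the column names are taken from the header shown in the reply, and the two input rows are made-up samples in the same tab-delimited layout. For the real file, the lines would presumably come from readLines() on the URL with the header line dropped; that part is an assumption and is not run here.

```r
# Hypothetical helper: turn raw tab-delimited lines into a 5-column data frame.
parse_bls_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # strsplit() drops a trailing empty field, so pad short rows to 5 with NA,
  # then trim the padding around each field
  fields <- lapply(fields, function(x) {
    length(x) <- 5
    trimws(x)
  })
  out <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(out) <- c("series_id", "year", "period", "value", "footnote_codes")
  out$year  <- as.integer(out$year)
  out$value <- as.numeric(out$value)
  out
}

# Made-up sample rows; the first has no footnote code
raw <- c("LASST56000003    \t1976\tM04\t         3.9\t",
         "LAUST56000006    \t2013\tM03\t      305556\tP")
parsed <- parse_bls_lines(raw)
str(parsed)
```

This is mainly worth the extra code when rows are irregular in ways read.table cannot be told about; for this file the sep="\t" call in the reply is the shorter route.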