Skipping lines and incomplete rows

12 messages · vioravis, Uwe Ligges, Rui Barradas +1 more

#
I have a text file with semicolon-separated values. The table is nearly
10,000 rows by 585 columns. The file looks as follows:

*******************************************
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
                   Unit1         Unit2         Unit3
10              0.1               0.01           0.001
20              0.2               0.02           0.002 
30              0.3               0.03           0.003
40              0.4               0.04           0.004
*******************************************

The first three lines need to be skipped. Moreover, line 5 doesn't have
units for all the variables and hence, has to be skipped as well.
Effectively, I want the following to be read to a dataframe skipping rows 1,
2, 3 and 5.

*******************************************
variable1 Variable2 Variable3 Variable4
10              0.1               0.01           0.001
20              0.2               0.02           0.002 
30              0.3               0.03           0.003
40              0.4               0.04           0.004
*******************************************

I tried using read.table with skip for lines 1-3 as follows:

inputData <- read.table("test.txt",sep = ";",skip = 3)

but an incomplete line causes the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 3 did not have 585 elements

Can someone help me with this?

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

Try the following.

head <- readLines("test.txt", n=4)[4]
dat <- read.table("test.txt", skip=5)
names(dat) <- unlist(strsplit(head, " "))
dat
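
A minimal sketch of the same idea that runs without the original file. The temporary file and the names `tmp` and `hdr` are assumptions made for illustration, and the sample only mimics the whitespace-separated layout shown in the question. Splitting the header on runs of whitespace (rather than a single space) avoids empty names if the header contains repeated blanks.

```r
# Write a small sample file that mimics the layout in the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1 Variable2 Variable3 Variable4",
  "          Unit1     Unit2     Unit3",
  "10 0.1 0.01 0.001",
  "20 0.2 0.02 0.002"
), tmp)

# Line 4 holds the column names; split it on runs of whitespace.
hdr <- strsplit(trimws(readLines(tmp, n = 4)[4]), "[[:space:]]+")[[1]]

# Skip the three junk lines, the header and the units line,
# then attach the names read above.
dat <- read.table(tmp, skip = 5, col.names = hdr)
str(dat)
```

Because the units line is never parsed as data, the columns come out numeric straight away.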


hope this helps,

Rui Barradas

On 09-07-2012 11:23, vioravis wrote:
#
Hi,

I guess you should have "fill=TRUE" in the read.table.

dat1<-read.table(text="
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
                  Unit1        Unit2        Unit3
10              0.1              0.01          0.001
20              0.2              0.02          0.002
30              0.3              0.03          0.003
40              0.4              0.04          0.004
",sep="",skip=4, fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004
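
One thing worth checking with the fill=TRUE approach: because the units row is parsed as data, the columns that contain a unit label are not numeric even after that row is dropped. The sketch below is a hedged illustration on a shortened copy of the sample; using type.convert() to re-guess the column types is an addition here, not part of the reply above.

```r
dat1 <- read.table(text = "
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
          Unit1     Unit2     Unit3
10 0.1 0.01 0.001
20 0.2 0.02 0.002
", skip = 4, header = TRUE, fill = TRUE, stringsAsFactors = FALSE)

# Drop the units row; the columns that held a unit label are still character.
dat1 <- dat1[-1, ]
sapply(dat1, class)

# Re-guess each column's type now that only data values remain.
dat1[] <- lapply(dat1, type.convert, as.is = TRUE)
row.names(dat1) <- NULL
sapply(dat1, class)
```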


A.K.




______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hello,

I have now checked reading directly from a .txt file instead of the inline text shown in my earlier reply.


#Use skip=3 instead of 4.

dat1<-read.table("dat1.txt",sep="",skip=3,fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004

Hope it works.


A.K.




#
Thanks a lot, Rui and Arun.

Both methods work fine with the data I gave, but when I tried them with the
following semicolon-separated data using sep = ";", only the first 3 columns
are read properly; the rest of the columns are either empty or NA.


**********************************************************************************************
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/2/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112
1/4/2012;0.0;0;#N/A;#N/A;1.0000;2.0000
1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455
***********************************************************************************************

I used the following code:
dat1<-read.table("testInput.txt",sep=";",skip=3,fill=TRUE,header=TRUE) 
dat1<-dat1[-1,] 
row.names(dat1)<-1:nrow(dat1)

Could you please let me know what is wrong with this approach? 

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4635952.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

My approach was slightly different, to use readLines to take care of the 
header and read.table for the data. This works with the new dataset 
you've posted, but we must use the option comment.char = "".

Try the following.


head <- readLines("test.txt", n=4)[4]
dat <- read.table("test.txt", skip=5, sep=";", stringsAsFactors=FALSE, 
comment.char="")
names(dat) <- unlist(strsplit(head, ";"))

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp[dat$Temp == '#N/A'] <- NA
dat$Press[dat$Press == '#N/A'] <- NA
dat


It works with me, good luck.

Rui Barradas

On 10-07-2012 06:41, vioravis wrote:
#
Or maybe it's better to coerce Temp and Press to numeric, if they are 
the temperature and pressure variables.

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp <- as.numeric(dat$Temp)
dat$Press <- as.numeric(dat$Press)

This turns those '#N/A' values into NA (with a coercion warning).

Rui Barradas

On 10-07-2012 09:34, Rui Barradas wrote:
#
Hello Ravi,

I was not aware that your dataset has the special character "#" before N/A. If it were just plain NA, it would have worked. So it's not because of sep=";".

See below:

#Without "#"
dat1<-read.table(text="
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;NA;NA;0.0000;0.0000
1/2/2012;0.0;0;NA;NA;0.0000;0.0000
1/3/2012;0.0;0;NA;NA;1.5651;2.2112
1/4/2012;0.0;0;NA;NA;1.0000;2.0000
1/5/2012;0.0;0;NA;NA;3.2578;7.5455
",sep=";",header=TRUE,fill=TRUE,skip=4,stringsAsFactors=FALSE)
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
3 1/2/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
4 1/3/2012          0.0                0 <NA>  <NA> 1.5651 2.2112
5 1/4/2012          0.0                0 <NA>  <NA> 1.0000 2.0000
6 1/5/2012          0.0                0 <NA>  <NA> 3.2578 7.5455


#With "#": Reading data from the .txt file.

# According to the documentation (http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html), comment.char="#" is the default in read.table, so everything from "#" to the end of each line is treated as a comment. That is why only blank columns appear after the first three.


#I think Rui's method of reading the header separately using readLines might be a good option. Or if you know the column headings, then you can do this:

dat2<-read.table("dat2.txt",skip=4,col.names=c("Time","Actual Speed","Actual Direction", "Temp","Press","Value1","Value2"),fill=TRUE,sep=";",comment.char="")
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0  #NA   #NA 0.0000 0.0000
3 1/2/2012          0.0                0  #NA   #NA 0.0000 0.0000
4 1/3/2012          0.0                0  #NA   #NA 1.5651 2.2112
5 1/4/2012          0.0                0  #NA   #NA 1.0000 2.0000
6 1/5/2012          0.0                0  #NA   #NA 3.2578 7.5455
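
A quick way to see the effect of the comment character is to count the fields R finds on each line. The sketch below is only an illustration: the temporary file `tmp` and the shortened four-column sample are assumptions, not the poster's data. It uses count.fields() from the utils package, which takes the same sep and comment.char arguments as read.table.

```r
# A shortened sample with "#N/A" in the third of four columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Time;Actual Speed;Temp;Value1",
  "1/1/2012;0.0;#N/A;0.0000",
  "1/2/2012;0.0;#N/A;1.5651"
), tmp)

# Default comment.char = "#": each data line is cut off at the "#",
# so fewer fields are found there than on the header line.
n_default <- count.fields(tmp, sep = ";")

# Comments disabled: every line has the full set of fields.
n_nocomment <- count.fields(tmp, sep = ";", comment.char = "")

n_default
n_nocomment
```

If the two vectors differ, a "#" somewhere in the data is being read as the start of a comment.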


A.K.










#
You actually just need to say what the comment char and the 
na.strings are:

read.table(filename, sep=";", skip=3, header=TRUE, na.strings="#N/A", 
comment.char="")
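
A self-contained sketch of these two arguments in use. The temporary file, the names `tmp` and `hdr`, and the ASCII stand-ins on the units line are assumptions for illustration. It also combines the call with the readLines idea from earlier in the thread, so that the units line is skipped rather than read as the first data row.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Remove this line",
  "Remove this line",
  "Remove this line",
  "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
  ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",   # units line (ASCII stand-ins)
  "1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
  "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112"
), tmp)

# Column names come from line 4 of the file.
hdr <- strsplit(readLines(tmp, n = 4)[4], ";", fixed = TRUE)[[1]]

# skip = 5 drops the junk lines, the header and the units line.
# na.strings turns "#N/A" into NA; comment.char = "" stops "#" from
# starting a comment.
dat <- read.table(tmp, sep = ";", skip = 5, col.names = hdr,
                  na.strings = "#N/A", comment.char = "",
                  stringsAsFactors = FALSE)
dat$Time <- as.Date(dat$Time, format = "%m/%d/%Y")
str(dat)
```

Note that read.table makes the names syntactically valid by default, so "Actual Speed" becomes Actual.Speed.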

Uwe Ligges
On 10.07.2012 19:30, arun wrote:
#
Thanks a lot for the guidance. I have another text file with a time stamp and
an empty column as given below:

********************************************************************************************
First line: Skip this line 
Second line: skip this line 
Third line: skip this line 
variable1 Variable2 Variable3 Variable4 
                Unit1     Unit2     Unit3 
11/1/2004 0:00  0.1                 0.001 
11/1/2004 0:10  0.2                 0.002 
11/1/2004 0:20  0.3                 0.003 
11/1/2004 0:30  0.4                 0.004 
********************************************************************************************

This is a space-separated text file. When I use the following code:

head <- readLines("testInput.txt", n=4)[4] 
dat <- read.table("testInput.txt", skip=5, sep="",fill = TRUE,
stringsAsFactors=FALSE) 
names(dat) <- unlist(strsplit(head, " "))

I get the following output:
'data.frame':   4 obs. of  4 variables:
 $ variable1: chr  "11/1/2004" "11/1/2004" "11/1/2004" "11/1/2004"
 $ Variable2: chr  "0:00" "0:10" "0:20" "0:30"
 $ Variable3: num  0.1 0.2 0.3 0.4
 $ Variable4: num  0.001 0.002 0.003 0.004

Variable1's date and time gets split as Variable1 and Variable2 whereas they
should both be part of Variable1.

Also, the empty column is missing from the data frame.

Is there a way to handle these two cases? 

Thank you.

Ravi


--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4636129.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

That seems easy.

dat$variable1 <- with(dat, paste(variable1, Variable2))
dat$Variable2 <- dat$Variable3
dat$Variable3 <- ""

Then convert variable1 to date/time using as.POSIXct or strptime.

See ?strptime.
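
A small sketch of that conversion, using a few stamps in the same month/day/year layout as the sample file. The vector names and tz = "UTC" are assumptions; use whatever time zone the data were recorded in.

```r
# Date-time strings in the layout produced by pasting date and time together.
stamp <- c("11/1/2004 0:00", "11/1/2004 0:10", "11/1/2004 0:20")

# tz = "UTC" is an assumption made for this illustration.
when <- as.POSIXct(stamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Time differences between consecutive stamps.
diff(when)

# Reformat for display.
format(when, "%Y-%m-%d %H:%M")
```

Unlike as.Date, as.POSIXct keeps the time of day, which matters for ten-minute data like this.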

Hope this helps,

Rui Barradas

On 11-07-2012 13:30, vioravis wrote:
#
Hello,
Try this:
dat3<-read.table("dat3.txt",sep="",skip=3,header=TRUE,fill=TRUE)
dat4<-data.frame(variable1=paste(dat3[,1],dat3[,2],sep=" "),Variable2=dat3[,3],Variable3="",Variable4=dat3[,4])
dat4<-dat4[-1,]
row.names(dat4)<-1:nrow(dat4)
dat4
       variable1 Variable2 Variable3 Variable4
1 11/1/2004 0:00       0.1               0.001
2 11/1/2004 0:10       0.2               0.002
3 11/1/2004 0:20       0.3               0.003
4 11/1/2004 0:30       0.4               0.004
#If you need to convert date to class "Date"
dat4$variable1<-as.Date(dat4[,1],format="%m/%d/%Y %H:%M")
A.K.



