
Extraneous full stop in csv read

5 messages · John, jim holtman, David Winsemius +1 more

#
I ran into a puzzling minor behaviour I would like to understand.
Reading in a csv file, I find an extraneous "." after a column header,
"in" [short for "inches"] thus, "in.". Is this due to "in" being
reserved?  I initially blamed this on RStudio or on processing the data
through LibreCalc. However, the same result occurs in a console R
session.  Sending the file to the console via less reveals no strange
characters in the first line.  The data is California statewide
rainfall which was screen captured from the Western Regional Climate
Center web site.

First 15 lines including header line:

"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34
1895,4,1014,1.014
1895,5,1281,1.281
1895,6,58,0.058
1895,7,156,0.156
1895,8,140,0.14
1895,9,1087,1.087
1895,10,322,0.322
1895,11,1331,1.331
1895,12,2428,2.428
1896,1,7156,7.156
1896,2,712,0.712
1896,3,2982,2.982

File read in as follows:

x <- read.csv('DRI-mo-prp.csv', header = T)

Structure:

 str(x)
'data.frame':   1469 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in. : num  8.24 2.27 2.34 1.01 1.28 ...
[note "in" is now "in."]
#
try the 'read_csv' function in the 'readr' package:
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ')
Classes 'tbl_df', 'tbl' and 'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, Jun 28, 2017 at 7:30 PM, John <jwd at surewest.net> wrote:
#
or use the 'check.names = FALSE':
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ', check.names = FALSE)
'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...
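
For readers who want to reproduce the difference without the original file, here is a minimal sketch. It assumes the header and first three data rows quoted in the question, passed inline through the `text` argument of read.csv; the object names csv_text, default_names and raw_names are made up for the illustration.

```r
# Inline copy of the header and first three rows from the question.
csv_text <- '"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34'

# Default behaviour: check.names = TRUE runs the headers through make.names().
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept exactly as written.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

Only the fourth column name should differ between the two results.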


Jim Holtman
Data Munger Guru

#
If I change one of those other headers to "for", I also see the period suffix appended, which supports your theory that reserved words are being protected. If for some reason this were important to you, then I'd suggest first looking at the code for make.names, which in turn shows that the work is done with a .Internal call, so you'll need to look at the source code for the base package.
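
The renaming can be checked directly, without reading any file, by calling make.names on a few candidate headers. This is only a small sketch; the vector names headers and fixed are invented for the example, and "if" is added as one more reserved word alongside the "in" and "for" discussed above.

```r
# Candidate column headers: three ordinary names and three reserved words.
headers <- c("yr", "mo", "Data", "in", "for", "if")

# make.names() is what read.csv applies when check.names = TRUE.
fixed <- make.names(headers)

# Show which headers were altered and what they became.
print(data.frame(original = headers, fixed = fixed, changed = headers != fixed))
```

The ordinary names pass through unchanged, while each reserved word comes back with a trailing ".".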
#
On 28/06/2017 7:30 PM, John wrote:
Yes, "in" is not a valid variable name, because of its syntactic use. 
You can stop this correction by setting check.names=FALSE in your call 
to read.csv.  This will make it a little tricky to deal with in some 
situations, e.g.

 > x <- data.frame(4)
 > names(x) <- "in"
 > x
   in
1  4
 > x$in
Error: unexpected 'in' in "x$in"

but you can work around this problem: x[, "in"] and x$`in` are both fine.
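
A short sketch of those workarounds follows. The last step, renaming the column to "inches", is not something proposed in this thread; it is an assumption about what might be convenient, and the new name is hypothetical.

```r
x <- data.frame(4)
names(x) <- "in"

# Each of these reaches the column despite the reserved-word name.
by_index    <- x[, "in"]
by_backtick <- x$`in`
by_double   <- x[["in"]]

# Optional (an assumption, not from the thread): rename to a syntactic
# name so that plain x$inches works afterwards.
names(x)[names(x) == "in"] <- "inches"
renamed <- x$inches
```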

Duncan Murdoch