Problems with read.table and data structure
5 messages · Tim Richter-Heitmann, Marc Schwartz, David L Carlson +1 more
On Jul 11, 2014, at 9:15 AM, Tim Richter-Heitmann <trichter at uni-bremen.de> wrote:
Hi there!
I have a huge data file of 600 columns and 360 samples:
data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".",
                   row.names = 1)
The text file (compiled with Excel) shows only numbers, yet R reports the structure of every column as "factor".
When I add stringsAsFactors = FALSE to the read command, every column becomes "character" instead.
When I try as.numeric(data), I get
Error: (list) object cannot be coerced to type 'double'
even if I subset columns with [].
When I apply as.numeric() to single columns selected with $, it runs, but the numbers don't make any sense, because the factors are converted to their integer codes rather than to the values of their levels:
Factor w/ 358 levels "0,123111694",..: 11 14 50 12 38 44 13 76 31 30
becomes
num 11 14 50 12 38 44 13 76 31 30
whereas I need the values of the levels!
I suspect Excel messed up the "save as tab-delimited text" step, but the text file looks fine to me on the surface (I don't know how the numbers are stored internally). I just see correct numbers, and the View command also shows the correct content.
Does anyone know how to fix this? It's pretty annoying.
Thank you!
Hi,
See: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
Regards,
Marc Schwartz
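For readers who do not follow the link, here is a minimal sketch of the idea behind that FAQ entry, adapted to this thread. The factor `f` and the data frame `df` are hypothetical stand-ins for the imported columns, not the poster's data. The sketch assumes the values use a comma as the decimal mark, so the comma is replaced with a period before conversion; without that step, as.numeric() on the labels would yield NA.

```r
# Hypothetical factor mimicking one imported column (comma as decimal mark).
f <- factor(c("0,5", "1,25", "0,5", "10,75"))

# as.numeric() on a factor returns the internal integer codes, not the values.
codes <- as.numeric(f)

# Go through the level labels instead. The labels are not valid R numbers
# while they contain a comma, so swap it for a period first.
values <- as.numeric(sub(",", ".", as.character(f), fixed = TRUE))

# as.numeric() cannot take a whole data frame (it is a list),
# so convert column by column and keep the data frame shape with df[].
df <- data.frame(a = f, b = f)
df[] <- lapply(df, function(col) {
  as.numeric(sub(",", ".", as.character(col), fixed = TRUE))
})

print(codes)
print(values)
str(df)
```

This only repairs data that has already been imported. Reading the file with the right decimal separator in the first place avoids the conversion altogether.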
On Jul 11, 2014, at 2:36 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
Sorry, I just noticed that you defined dec = "." in your call to read.table(), whereas it appears that a comma (,) is being used as the decimal separator in your source data. Change dec = "." to dec = "," and that should keep the numeric values from being read as factors during import. They should be converted to numerics right away. For example:
str(read.table(textConnection("0,1234"), dec = "."))
'data.frame': 1 obs. of 1 variable:
 $ V1: Factor w/ 1 level "0,1234": 1

str(read.table(textConnection("0,1234"), dec = ","))
'data.frame': 1 obs. of 1 variable:
 $ V1: num 0.123

Regards,
Marc
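The same comparison can be sketched on input shaped like the file described in the question: tab-delimited, with a header row, row names in the first column, and decimal commas. The string `txt` and its column names are invented for illustration. Note that from R 4.0.0 on, stringsAsFactors defaults to FALSE, so the mis-read columns show up as character rather than factor.

```r
# Hypothetical stand-in for a few lines of "small.txt": tab-delimited,
# header row, sample names in the first column, comma as decimal mark.
txt <- "sample\tvar1\tvar2\nS1\t0,123\t1,5\nS2\t0,456\t2,25\nS3\t0,789\t3\n"

# dec = "." : values containing a comma cannot be parsed as numbers.
wrong <- read.table(text = txt, header = TRUE, sep = "\t", dec = ".",
                    row.names = 1)

# dec = "," : the same text is parsed as numeric columns.
right <- read.table(text = txt, header = TRUE, sep = "\t", dec = ",",
                    row.names = 1)

print(sapply(wrong, class))
print(sapply(right, class))
str(right)
```

Checking sapply(data, class) right after import is a quick way to confirm that every column came in as numeric.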
It is hard to diagnose without looking at the file. For example
readLines("small.txt", n=5)
would print the first five lines, which might reveal problems such as wrapped lines. What does dim(data) give you? Are you getting all 360 samples and 600 columns? You could also try the colClasses= argument in read.table(), e.g. colClasses=rep("numeric", 600). You could also have Excel save in CSV format and use read.csv().
David C
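These diagnostic steps can be sketched on a small temporary file. The file contents, `path`, and the variable names are hypothetical. Because the first column holds the row names, the sketch assumes colClasses needs one extra "character" entry in front of the numeric ones.

```r
# Write a hypothetical miniature of the file to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("sample\tvar1\tvar2",
             "S1\t0,123\t1,5",
             "S2\t0,456\t2,25"), path)

# Inspect the raw text: decimal commas are visible here.
first_lines <- readLines(path, n = 5)
print(first_lines)

# Every line should have the same number of tab-separated fields;
# wrapped or truncated lines would show up as differing counts.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)

# Forcing numeric columns makes a separator mismatch fail with an error
# instead of quietly producing factor or character columns.
classes <- c("character", rep("numeric", 2))
res <- tryCatch(
  read.table(path, header = TRUE, sep = "\t", dec = ".",
             row.names = 1, colClasses = classes),
  error = function(e) conditionMessage(e)
)
print(res)

# With the matching decimal separator the same call succeeds.
ok <- read.table(path, header = TRUE, sep = "\t", dec = ",",
                 row.names = 1, colClasses = classes)
print(dim(ok))
```

If Excel writes a semicolon-separated CSV with decimal commas, which it commonly does in locales that use the decimal comma, read.csv2() has matching defaults (sep = ";", dec = ",") and may be a better fit than read.csv().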
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".", row.names=1)
...
Factor w/ 358 levels "0,123111694",..: 11 14 50 12 38 44 13 76 31 30
It looks like your data file used commas for the decimal point. Is that right? You used dec="." when reading it; does dec="," work better?

Bill Dunlap
TIBCO Software
wdunlap tibco.com