
read.table() can't read in this table (But Splus can) (PR#9687)

2 messages · Marc Schwartz, Liaw, Andy

#
On Mon, 2007-05-14 at 23:41 +0200, vax9000 at gmail.com wrote:
Using R version 2.5.0 Patched:
Warning message:
number of items read is not a multiple of the number of columns 


So I tried it with 'fill = TRUE' and that seems to work, which suggests
that perhaps something is going on with the data file structure:

DF <- read.table("http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data", 
                 header = TRUE, sep = "\t", fill = TRUE)
str(DF)
'data.frame':   4734 obs. of  295 variables:
 $ UNIQID                                     : int  27481 17013 24751 27498 27486 30984 17293 28329 27459 27482 ...
 $ NAME                                       : Factor w/ 4040 levels "||*AA037178|Hs.179661|FK506 binding protein 1A (12kD)",..: 3444 3445 3446 3444 3445 657 1788 3121 3119 3119 ...
 $ MLC94.46_LYM009_de.novo.untreated          : num  0.234 0.452 0.405 0.115 0.249 ...
 $ MLC96.45_LYM186_de.novo.untreated          : num  -0.1725 -0.0387 -0.0413 -0.0242 -0.1028 ...
 $ MLC91.27_LYM427_de.novo.untreated          : num  0.200 0.175 0.195 0.223 0.179 ...
 $ MLC96.84_LYM225_transformed                : num  -0.213 -0.325 -0.200 -0.199 -0.155 ...
 $ MLC95.43_LYM095_de.novo.untreated          : num  -0.1197  0.0038 -0.0213 -0.0705 -0.0755 ...
 $ MLC91.28_LYM428_de.novo.untreated          : num  -0.3729  0.0047 -0.2220 -0.3373 -0.2808 ...
 $ MLC94.50_LYM004_de.novo.untreated          : num  -0.195 -0.224 -0.126 -0.161 -0.199 ...
 $ MLC95.46_LYM098_de.novo.untreated          : num  0.489 0.611 0.577 0.661 0.519 ...
 $ MLC95.62_LYM114_de.novo.untreated          : num  0.390 0.657 0.747 0.723 0.731 ...
 $ MLC95.85_LYM137_de.novo.untreated          : num  -0.277 -0.564 -0.297 -0.140 -0.513 ...
..
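
One way to check the file structure directly is to tabulate the number of fields per line with count.fields(). The sketch below is only an illustration: it uses a small made-up tab-separated sample written to a temporary file, not the NEJM file, and the column names and values are hypothetical.

```r
# Hypothetical tab-separated sample standing in for the real file.
sample_lines <- c(
  "UNIQID\tNAME\tVALUE",
  "1\tgene A 3' end\t0.1",
  "2\tgene B\t0.2",
  "3\tgene C 5' end\t0.3"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Fields per line with quoting disabled: every line should report 3.
n_plain <- count.fields(path, sep = "\t", quote = "")
table(n_plain)

# Same count with the default quote characters (" and ').
# NA entries or differing counts here hint that quote characters
# in the data are affecting how lines are split.
n_default <- count.fields(path, sep = "\t")
table(n_default, useNA = "ifany")
```

If the two tabulations disagree, the quote characters in the data are a likely suspect.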


I would update your version of R and then re-try this.

HTH,

Marc Schwartz
#
It's the quoting character(s).  The following seems to read the file in
correctly:

R> DF <- read.table("http://llmpp.nih.gov/DLBCL/NEJM_Web_Fig1data", 
+                   header = TRUE, sep = "\t", quote="")
R> str(DF)
'data.frame':   7399 obs. of  295 variables:
[...]

If I had to guess, it's the apostrophe in 3' ("3-prime") or 5' ("5-prime")
that occurs commonly in biology.  By default read.table() treats both "
and ' as quoting characters.
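
To illustrate the effect, here is a minimal sketch with a made-up three-row sample (hypothetical names and values, not the NEJM data), assuming an R version where read.table() accepts the text argument. It reads the same lines once with the default quote setting and once with quote = "".

```r
# Hypothetical tab-separated lines; two NAME fields contain an apostrophe.
txt <- c(
  "UNIQID\tNAME\tVALUE",
  "1\tprobe X 3' region\t0.1",
  "2\tprobe Y\t0.2",
  "3\tprobe Z 5' region\t0.3"
)

# Default quote = "\"'": the apostrophes can be taken as quote delimiters,
# so the lines between them may be merged into a single field.
bad <- read.table(text = txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
nrow(bad)

# quote = "" disables quoting, so each line is read as one record.
good <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                   stringsAsFactors = FALSE)
nrow(good)
good$NAME
```

Comparing nrow(bad) with nrow(good) shows whether rows were swallowed, much like the 4734 versus 7399 observations above.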

I don't think Mr. 9000 Vax can blame R for this.

Best,
Andy
 
