Help: read a proportion of high through-put data
Ok, it seems to have worked on my machine as well, but for some levels you didn't mention before.

If you are having trouble with the header names, I'll take a stab at it -- R (by default) requires them to be syntactically valid names (i.e., they can't start with a number or have a dollar sign or hyphen in them) and will modify them as needed. Generally this is helpful for interactive use (if you want to call names directly). If you wish to suppress this behavior, add the "check.names = FALSE" argument to read.table() and it will keep them as is.

If you ever do need a non-syntactic name again, you can get it by surrounding it in backquotes: i.e.,

`3s` <- 4
3s # throws an error
identical(`3s`, 4) # works

Michael
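To make the header-name behavior concrete, here is a small sketch using the three lines quoted in the original question. The data is inlined through the text argument of read.table() so the snippet runs without the file; the object names (sample_text, fixed, raw) are made up for illustration.

```r
# The three lines from the original question, inlined as tab-delimited text
# (sample_text is an illustrative name, not something from the thread)
sample_text <- paste(
  "ID_REF\t382\tGC_Score\tTheta\tR\tB_Allele_Freq\tLog_R_Ratio",
  "200003\tBB\t0.9101527\t0.9734979\t0.8788951\t1\t0",
  "200006\tAB\t0.6003323\t0.4385073\t2.033364\t0.4850979\t0.01553433",
  sep = "\n"
)

# Default (check.names = TRUE): the non-syntactic header "382" is repaired
fixed <- read.table(text = sample_text, header = TRUE, sep = "\t")
names(fixed)[2]   # "X382"

# check.names = FALSE keeps the header exactly as it appears in the file
raw <- read.table(text = sample_text, header = TRUE, sep = "\t",
                  check.names = FALSE)
names(raw)[2]     # "382"

# A non-syntactic column name then needs backquotes (or [["382"]])
raw$`382`
```

With a real file, the same arguments would be passed alongside the file name instead of text.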
On Mon, Jan 23, 2012 at 11:28 PM, chee chen <chee.chen at yahoo.com> wrote:
Hi, Michael,

Please ignore my previous email with the attachment, since I guess I resolved it with your suggestions (with header = TRUE), except for some minor issues with the names of the header.

Regards,
Chee
________________________________
From: R. Michael Weylandt <michael.weylandt at gmail.com>
To: Chee Chen <chee.chen at yahoo.com>
Cc: R-ORG <r-help at r-project.org>
Sent: Monday, January 23, 2012 10:26 PM
Subject: Re: [R] Help: read a proportion of high through-put data
It's pretty hard to answer this without the file in hand, but I'd
guess something like the following is at play:
Columns of data.frame()s have to have a single type. So if R sees
anything it thinks is a character, it will coerce the whole column to
character. Since you have not set the first row to be a header, it's
probably interpreting each name as the first element of its column and
recognizing it as character. This behavior is sometimes auto-rectified
by read.table() or read.csv() if it sees a column without a member in
the first line -- as that suggests that we have column and rownames
around rectangular data -- but that doesn't seem to be happening here.
What happens if you try
read.table("sample.txt", header = TRUE)
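As a rough check of that guess, the sketch below reads the three quoted lines with and without header = TRUE and compares the column classes. The data is inlined via text so it runs without the file; sample_text, no_header and with_header are illustrative names only.

```r
# The quoted sample, inlined as tab-delimited text (illustrative name)
sample_text <- paste(
  "ID_REF\t382\tGC_Score\tTheta\tR\tB_Allele_Freq\tLog_R_Ratio",
  "200003\tBB\t0.9101527\t0.9734979\t0.8788951\t1\t0",
  "200006\tAB\t0.6003323\t0.4385073\t2.033364\t0.4850979\t0.01553433",
  sep = "\n"
)

# Without header = TRUE the names are read as the first data row,
# so every column contains at least one string and becomes character
no_header <- read.table(text = sample_text, sep = "\t", as.is = TRUE)
sapply(no_header, class)

# With header = TRUE the names are set aside and the measurement
# columns can be parsed as numbers
with_header <- read.table(text = sample_text, sep = "\t",
                          header = TRUE, as.is = TRUE)
sapply(with_header, class)
```

Only the genotype column (BB/AB) should remain character in the second call.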
An alternative route, if those names are coming in as headers, would
be to manually coerce the columns -- if everything is to be numeric,
just convert each column with as.numeric()
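One way the manual route might look, assuming the file was read without a header so the names ended up in the first row. This is only a sketch on the three quoted lines; sample_text, raw, body and num_cols are made-up names, and the choice of which columns to convert is an assumption based on the sample.

```r
# The quoted sample, inlined as tab-delimited text (illustrative name)
sample_text <- paste(
  "ID_REF\t382\tGC_Score\tTheta\tR\tB_Allele_Freq\tLog_R_Ratio",
  "200003\tBB\t0.9101527\t0.9734979\t0.8788951\t1\t0",
  "200006\tAB\t0.6003323\t0.4385073\t2.033364\t0.4850979\t0.01553433",
  sep = "\n"
)

# Read without a header: all columns come back as character
raw <- read.table(text = sample_text, sep = "\t", as.is = TRUE)

# Move the first row into the column names, then drop it from the data
body <- raw[-1, ]
names(body) <- unlist(raw[1, ], use.names = FALSE)

# Convert every column except the genotype column ("382") to numeric.
# as.numeric() works on vectors, so it is applied column by column.
num_cols <- setdiff(names(body), "382")
body[num_cols] <- lapply(body[num_cols], as.numeric)

str(body)
```

Reading with header = TRUE is simpler when it works; this is mainly useful when the names have to be handled by hand.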
Michael
On Mon, Jan 23, 2012 at 10:18 PM, Chee Chen <chee.chen at yahoo.com> wrote:
Dear All,
I have a tab-delimited text file called "sample.txt", as follows:
ID_REF  382  GC_Score   Theta      R          B_Allele_Freq  Log_R_Ratio
200003  BB   0.9101527  0.9734979  0.8788951  1              0
200006  AB   0.6003323  0.4385073  2.033364   0.4850979      0.01553433
I have explored various options of the command: read.table, with one as:
read.table("sample.txt", na.strings = "NA", as.is = TRUE)
However, everything that it reads in becomes a character.
Could you please help me on this?
Best regards,
Chee
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.