Help converting .txt to .csv file

6 messages · Bert Gunter, Richard M. Heiberger, Spencer Brackett +1 more

#
Good evening,

I am attempting to analyze the protein expression data contained within
these two ICGC/TCGA datasets (one for GBM and the other for LGG).

*File for GBM protein expression:*
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

*File for LGG protein expression:*
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

When I tried to convert the files from .txt (via Notepad) to .csv (via
Excel), the data appeared in the columns as unorganized, seemingly random
text, not at all how a typical CSV should be arranged. I need the
dataset converted to .csv in order to analyze it in R, which is why
I am hoping someone here might help me do that. If not, is there
perhaps some other way that I could analyze the datasets in R? Again, they
were downloaded from the ICGC data portal.

Best,

Spencer Brackett
#
Inline.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Dec 26, 2018 at 3:04 PM Spencer Brackett <
spbrackett20 at saintjosephhs.com> wrote:

            
Huh?? Why do you think this? A csv is just a comma delimited text file.

R can input pretty much any kind of file, ONCE YOU KNOW THE FORMAT OF WHAT
YOU ARE INPUTTING. This should be provided by the links that you gave. Then
see ?read.table or, more generally, ?scan for how to read the (text) file
into R into whatever data structure you need. See also the R data
import/export manual. Or possibly post to the Bioconductor list where they
specialize in this sort of thing and may already have packages that can
access the repositories and bring in the data in the form you need them.
They also have lots of software there for analysis, too.
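
A minimal sketch of the "learn the format first" step described above, not taken from the thread itself. The temporary file and its column names (donor_id, gene_name, expression) are invented stand-ins for a downloaded file; the real ICGC export will have different columns.

```r
# Hypothetical stand-in for a downloaded file; the column names are invented.
path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene_name\texpression",
             "D1\tEGFR\t0.52",
             "D2\tTP53\t-1.10"), path)

# Look at the raw lines first to see how the file is laid out.
first_lines <- readLines(path, n = 3)
print(first_lines)

# Does the header line contain tab characters?
has_tabs <- grepl("\t", first_lines[1], fixed = TRUE)

# Count the fields on each line for the guessed separator; the same count
# on every line suggests the guess is right.
fields <- count.fields(path, sep = "\t")

# Once the format is known, read the file with matching arguments.
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

If the field counts differ from line to line, the separator guess is probably wrong, or the file has quoting or comment characters that need the quote and comment.char arguments of read.table.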

Cheers,
Bert

  
  
#
I looked at the first file.  It gives an option to download as TSV
(tab separated values).
That is the same as CSV except with tabs instead of commas.
You do not need any external software to read it.  Read the downloaded
file directly into R.

read.delim looks as if it would work directly on the downloaded file.
?read.delim
The notation "\t" means the tab character.
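
As a rough illustration of the suggestion above, the sketch below reads a small tab-separated file with read.delim and, in case a CSV copy is still wanted, writes one with write.csv. The file contents and column names are made up for the example; with the real download, the path of the downloaded file would replace tsv_path.

```r
# Hypothetical stand-in for the downloaded TSV; contents are invented.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene_name\texpression",
             "D1\tEGFR\t0.52",
             "D2\tTP53\t-1.10"), tsv_path)

# read.delim assumes a header row and tab ("\t") as the separator.
protein_expression <- read.delim(tsv_path, stringsAsFactors = FALSE)
str(protein_expression)

# Optional: write a comma-separated copy, with no Excel or Notepad involved.
csv_path <- tempfile(fileext = ".csv")
write.csv(protein_expression, csv_path, row.names = FALSE)

# Reading the CSV back gives the same data frame.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
```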

As an aside, stay away from Notepad. It is too naive for almost
anything interesting.
The specific case I often see is people reading Linux-style text files
with Notepad, which doesn't
understand NL-terminated lines. Nicely formatted text files become illegible.

On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett
<spbrackett20 at saintjosephhs.com> wrote:
#
Mr. Heiberger,

 Thank you for the insight! I will try out your suggestion.

Best,

Spencer Brackett
On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger <rmh at temple.edu> wrote:

            

  
  
#
Hello again,

I worked on reading the file directly into R as was suggested, but have
thus far been unsuccessful. This is what I generated on my second
attempt...

 GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
Error: unexpected symbol in "GBM protein_expression"
protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE,
sep="\t")
Error: unexpected symbol in "GBM protein_expression"
What part of the argument is in error?

Also, I tried importing the dataset as an Excel file in RStudio to see if I
could solve my problem that way. However, my imported Excel file has been
stuck at 'retrieving preview data' and no data is appearing. Is the
data file perhaps too large or in the wrong format?



On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
spbrackett20 at saintjosephhs.com> wrote:

            

  
  
#
Hi

See inline
You forgot to call a read.* function.

something like
protein_expression <- read.delim(file.choose())

or

protein_expression <- read.table(file.choose(), header=TRUE, sep="\t")

If your files are tab delimited as Richard suggested.
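
A small sketch of the two points above, under assumptions: the temporary file stands in for whatever file.choose() would return in an interactive session, and its column names are invented. It also shows, as one likely reading of the "unexpected symbol" error, that an object name containing a space does not parse.

```r
# A name with a space in it is not valid R syntax, so the line fails to
# parse before anything is read. parse() lets us see that safely.
bad <- tryCatch(parse(text = "GBM protein_expression <- 1"),
                error = function(e) conditionMessage(e))
print(bad)

# Hypothetical stand-in for the path that file.choose() would return.
path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene_name\texpression",
             "D1\tEGFR\t0.52",
             "D2\tTP53\t-1.10"), path)

# The two calls suggested above, with a valid object name on the left.
a <- read.delim(path)
b <- read.table(path, header = TRUE, sep = "\t")

# For a simple file like this one, both calls give the same data frame.
print(all.equal(a, b))
```

Note that these functions read plain text. A file saved as .xlsx is a binary Excel workbook, so the original tab-separated download is the one to read here.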

Cheers
Petr