problem
9 messages · Erika Frigo, jim holtman, Roland Rau +4 more
Is it just a file with a million values, or is it some type of structure with a million rows of indeterminate columns? If it is just a million numbers, you can easily read it with 'scan' or 'read.table' with no problem. I work with data structures that have several million rows and 4-5 columns without any problems. What is the format of the input?
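Jim's suggestion can be sketched as follows. This is a hypothetical, self-contained example (the file contents and column layout are invented, not from the thread): supplying colClasses up front lets read.table skip its type-guessing pass, which matters on files with millions of rows.

```r
## Hypothetical, self-contained example: write a small tab-separated
## file, then read it back the way one would read a big one.
tf <- tempfile(fileext = ".txt")
write.table(data.frame(id = c("a", "b"), y1 = c(1.5, 2.5), y2 = c(3, 4)),
            tf, sep = "\t", row.names = FALSE, quote = FALSE)

## Declaring column classes avoids read.table's slow type-guessing
## pass; comment.char = "" also speeds up large reads.
dat <- read.table(tf, header = TRUE, sep = "\t",
                  colClasses = c("character", "numeric", "numeric"),
                  comment.char = "")
stopifnot(nrow(dat) == 2)

## For a plain file of numbers (just "a million values"),
## scan() is simpler and faster still:
x <- scan(textConnection("1.5 2.5 3.5"), what = numeric(0))
stopifnot(length(x) == 3)
```

For a genuinely huge file, nrows = n (an upper bound on the row count) can also help R allocate memory in one go.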
On 3/4/08, Erika Frigo <erika.frigo at unimi.it> wrote:
Good evening to everybody,
I am having problems importing a really big dataset (more than 1,000,000 values) into R. Which is the best package to install?
Is there someone who works with this kind of dataset and can help me, please?
Thank you very much,
Regards
Dr.ssa Erika Frigo
Department of Veterinary Sciences and Technology for Food Safety
University of Milan
Via Grasselli, 7
20137 Milano
Tel. +39 0250318515
Fax +39 0250318501
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
Hi,
Erika Frigo wrote:
Good evening to everybody, I am having problems importing a really big dataset (more than 1,000,000 values) into R. Which is the best package to install? Is there someone who works with this kind of dataset and can help me, please?
Maybe the package SQLiteDF could be useful for you: http://cran.r-project.org/web/packages/SQLiteDF/index.html

But since you mention that the data has 1 million values, I think it should be no problem to read the data set "conventionally":

> object.size(rnorm(1e06)) / 1024^2
[1] 7.629417

Assuming that all data are numeric, the data set should consume less than 8 MB.

I hope this helps,
Roland
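The same back-of-the-envelope arithmetic scales to the table Erika describes later in the thread (roughly a million rows by about 30 columns). The dimensions below come from the thread; the 8 bytes is simply the storage for one double:

```r
## Back-of-the-envelope memory estimate: 1e6 rows x 30 numeric
## columns at 8 bytes per double, expressed in megabytes.
rows <- 1e6
cols <- 30
mb <- rows * cols * 8 / 1024^2
stopifnot(mb < 230, mb > 228)   # roughly 229 MB, before any copies R makes
```

Still feasible on a 2008-era machine with enough RAM, but close enough to the limit that R's habit of copying objects during manipulation starts to matter.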
On Tue, Mar 4, 2008 at 10:35 AM, Erika Frigo <erika.frigo at unimi.it> wrote:
Good evening to everybody, I am having problems importing a really big dataset (more than 1,000,000 values) into R. Which is the best package to install? Is there someone who works with this kind of dataset and can help me, please?
A good place to start is the manual "R Data Import/Export" that comes with every installed version of R.
Good morning Jim,

My file has not only more than a million values, but more than a million rows and more or less 30 columns (it is a production dataset for cows); in fact, with read.table I am not able to import it. It is an xls file. How do you import your files with millions of rows and 4-5 columns?

Thank you, regards

Dr.ssa Erika Frigo
Università degli Studi di Milano
Facoltà di Medicina Veterinaria
Dipartimento di Scienze e Tecnologie Veterinarie per la Sicurezza Alimentare (VSA)
Via Grasselli, 7
20137 Milano
Tel. 02/50318515
Fax 02/50318501

----- Original Message -----
From: "jim holtman" <jholtman at gmail.com>
To: "Erika Frigo" <erika.frigo at unimi.it>
Cc: <r-help at r-project.org>
Sent: Tuesday, March 04, 2008 6:13 PM
Subject: Re: [R] problem
On Wed, Mar 05, 2008 at 12:32:19PM +0100, Erika Frigo wrote:
My file has not only more than a million values, but more than a million rows and more or less 30 columns (it is a production dataset for cows); in fact, with read.table I am not able to import it. It is an xls file.
read.table() expects plain text -- e.g. CSV, or tab-separated in the case of read.delim(). If your file is in xls format, the simplest option would be to export the data to CSV format from Excel.

If for some reason that is not an option, please have a look at the "R Data Import/Export" manual.

Of course, neither will solve the problem of not enough memory if your file is simply too large. In that case you may want to put your data into a database and have R connect to it and retrieve the data in smaller chunks as required.

cu
Philipp
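The database route Philipp describes can be sketched with the RSQLite package. This is a hypothetical example: the database file, table, and column names are all invented for illustration, and the toy data frame stands in for a one-time bulk load of the exported CSV.

```r
## Hypothetical sketch: load the data into SQLite once, then query
## back only the slice needed at a time, keeping R's memory use small.
library(RSQLite)
con <- dbConnect(SQLite(), dbname = ":memory:")  # a file path in real use

## One-time load; a big CSV export could be loaded chunk-wise with
## repeated read.csv(..., skip = , nrows = ) calls appending to the table.
dbWriteTable(con, "cows",
             data.frame(cow_id = 1:3, milk_kg = c(20, 25, 18)),
             overwrite = TRUE)

## Each analysis then pulls only what it needs into R:
herd <- dbGetQuery(con, "SELECT cow_id, milk_kg FROM cows WHERE milk_kg > 19")
stopifnot(nrow(herd) == 2)
dbDisconnect(con)
```

The SQLiteDF package Roland mentioned wraps a similar idea behind a data-frame-like interface.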
Dr. Philipp Pagel                                    Tel. +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik         Fax. +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
and
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel
Philipp Pagel <p.pagel at wzw.tum.de> wrote in news:20080305120637.GA8181 at localhost:
On Wed, Mar 05, 2008 at 12:32:19PM +0100, Erika Frigo wrote:
My file has not only more than a million values, but more than a million rows and more or less 30 columns (it is a production dataset for cows); in fact, with read.table I am not able to import it. It is an xls file.
There is something very wrong here. Even the most recent versions of Excel cannot handle files with a million rows. Heck, they can't even handle files with one-tenth that number. In earlier versions the limit was on the order of 36K.
David Winsemius
On Thu, Mar 6, 2008 at 12:00 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> There is something very wrong here. Even the most recent versions of Excel cannot handle files with a million rows. Heck, they can't even handle files with one-tenth that number. In earlier versions the limit was on the order of 36K.
Excel 2007 can handle over 1 million rows: http://office.microsoft.com/en-us/excel/HP100738491033.aspx#WorksheetWorkbook
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in news:971536df0803052116q6a91bd95ja50ed541330d8ff1 at mail.gmail.com:
> Excel 2007 can handle over 1 million rows: http://office.microsoft.com/en-us/excel/HP100738491033.aspx#WorksheetWorkbook
Yes, I was going to correct myself. I saw another posting that said they had an Excel file with 200,000 rows, and I just got back from checking the 2007 version: 1,048,576 rows. The 2003 version's limit was 65,536 rows.
David Winsemius