Manage huge database
On Mon, 22 Sep 2008, Martin Morgan wrote:
"José E. Lozano" <lozalojo at jcyl.es> writes:
Maybe you've not lurked on R-help for long enough :) Apologies!
Probably.
So, how much "design" is in this data? If none, and what you've basically got is a 2000x500000 grid of numbers, then maybe a more raw
Exactly, raw data, but a little more complex, since all 500,000 variables are in text format, so the width is around 2,500,000 characters.
<snip>
It is genetic DNA data (genotyped individuals), hence the large number of columns to analyze.
The Bioconductor package snpMatrix is designed for this type of data. See http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html and, if that looks promising:
source('http://bioconductor.org/biocLite.R')
biocLite('snpMatrix')
Likely you'll quickly want a 64-bit (Linux or Mac) machine.
netCDF is another useful option -- we have been using the ncdf package for large genomic datasets. We read the data in one person at a time and write to netCDF. For analysis we can then read any subsets. Since we have imputed SNP data as well as measured, this comes to about 2.5 million variables on 4000 people for one of our data sets.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
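The write-one-person-at-a-time workflow described above could look roughly like the following. This is a minimal sketch, not Thomas's actual code: it assumes the ncdf package's API (dim.def.ncdf, var.def.ncdf, create.ncdf, put.var.ncdf, get.var.ncdf), and read.one.person() is a hypothetical function standing in for whatever reads a single subject's genotypes from the raw text files.

```r
library(ncdf)  # the package mentioned in the thread (since superseded by ncdf4)

n.snps   <- 2500000   # variables (measured + imputed SNPs), as in the thread
n.people <- 4000      # subjects

# Define the two dimensions and a compact byte-typed genotype variable
snp.dim    <- dim.def.ncdf("snp",    "index", 1:n.snps)
person.dim <- dim.def.ncdf("person", "index", 1:n.people)
geno.var   <- var.def.ncdf("genotype", "count", list(snp.dim, person.dim),
                           missval = -1, prec = "byte")

nc <- create.ncdf("genotypes.nc", geno.var)

# Write one person at a time, so only one subject's data is in memory at once
for (i in 1:n.people) {
  g <- read.one.person(i)  # hypothetical: integer vector of length n.snps
  put.var.ncdf(nc, geno.var, g, start = c(1, i), count = c(n.snps, 1))
}

# For analysis, read an arbitrary subset without loading the whole file,
# e.g. SNPs 1000-1999 for the first 100 people:
subset <- get.var.ncdf(nc, geno.var, start = c(1000, 1), count = c(1000, 100))

close.ncdf(nc)
```

The point of the design is that netCDF stores the array on disk in a layout that supports strided subset reads, so neither the write loop nor the analysis step ever needs the full 2.5M x 4000 matrix in RAM.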