Manage huge database
Maybe you've not lurked on R-help for long enough :) Apologies!
Probably.
So, how much "design" is in this data? If none, and what you've basically got is a 2000x500000 grid of numbers, then maybe a more raw
Exactly, raw data, but a little more complex since all the 500000 variables are in text format, so the width is around 2,500,000.
Thanks, I will check. Right now I am reading the file line by line. It's time-consuming, but since I will only do it once, just to rearrange the data into smaller tables that I can query, it's OK.
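For illustration only, the one-off rearrangement described above could look roughly like the sketch below. It is written in Python and assumes a comma-delimited file; the function name `split_columns`, the `chunk_NNNN.csv` file names and the `cols_per_chunk` parameter are all hypothetical and not taken from this thread. It holds one line in memory at a time and keeps one output file open per column block, so the chunk size should be large enough to stay under the operating system's open-file limit.

```python
import csv
import os


def split_columns(src_path, out_dir, cols_per_chunk, delimiter=","):
    """Stream src_path line by line and write each block of columns to its own file.

    Hypothetical helper: only one input row is held in memory at a time.
    Returns the list of chunk file paths, in column order.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles, writers = [], []
    try:
        with open(src_path, newline="") as src:
            for row in csv.reader(src, delimiter=delimiter):
                if not writers:
                    # Size the set of output files from the first non-empty row.
                    n_chunks = -(-len(row) // cols_per_chunk)  # ceiling division
                    for i in range(n_chunks):
                        path = os.path.join(out_dir, f"chunk_{i:04d}.csv")
                        handle = open(path, "w", newline="")
                        handles.append(handle)
                        writers.append(csv.writer(handle, delimiter=delimiter))
                # Append this row's slice of columns to every chunk file.
                for i, writer in enumerate(writers):
                    start = i * cols_per_chunk
                    writer.writerow(row[start:start + cols_per_chunk])
    finally:
        for handle in handles:
            handle.close()
    return [handle.name for handle in handles]
```

Each chunk file keeps the original row order, so row k in any chunk belongs to the same individual as row k in the source file.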
Thinking back to your 4GB file with 1,000,000,000 entries, that's only 3 bytes per entry (+1 for the comma). What is this data? There may be more efficient ways to handle it.
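The arithmetic behind that estimate can be checked in a few lines. This is only a sketch in Python, and it assumes "4GB" means 4 x 10^9 bytes and that every entry is followed by exactly one separator byte.

```python
rows, cols = 2000, 500_000          # grid size mentioned earlier in the thread
entries = rows * cols               # 1,000,000,000 entries
file_bytes = 4 * 10**9              # assumption: "4GB" taken as decimal gigabytes

bytes_per_entry = file_bytes / entries    # total bytes available per entry
payload_bytes = bytes_per_entry - 1       # minus one byte for the comma

print(entries, bytes_per_entry, payload_bytes)
```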
It is genetic DNA data (genotyped individuals), hence the large number of columns to analyze.

Best Regards,
Jose Lozano

------------------------------------------
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.