I note that "current implementations of R use 32-bit integers for integer vectors," but I am working with large arrays that contain integers from 0 to 3, so they could be stored as unsigned 8-bit integers. Can R do this? (FYI -- This is for storing minor-allele counts for genetic studies. There are 0, 1 or 2 minor alleles and 3 would represent missing.)

It is theoretically possible to store such data with four integers per byte. This is what PLINK (GPL license) does in its binary (.bed) pedigree format:

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

That might be too much to hope for. ;-)

I think that the R system uses double-precision floating-point numbers by default. When I impute minor-allele counts, I get posterior expected values ranging from 0 to 2 (called dosages). The imputation isn't very precise, so it would be fine to store such data using one or two bytes. (The values are used as regressors and small changes would have minimal impact on results.)

I could use unsigned 8-bit integers (0 to 255), probably using only 0 to 254 so that 1 and 2 could be represented with perfect precision as 127/127 and 254/127 (but I would do regression on the integer values). Or I could use 16 bits, doubling memory load and improving precision. It would be convenient if R could work with half-precision floating-point numbers (binary16):

http://en.wikipedia.org/wiki/Half_precision_floating-point_format

Can R do that?

If not, is anyone interested in working on developing some of these features in R? We have GPL code from PLINK and Octave that might help a lot.

http://www.gnu.org/software/octave/doc/interpreter/Integer-Data-Types.html

Best,
Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
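For illustration only, here is a minimal base-R sketch of the two storage ideas in the message above: packing codes 0 to 3 four per byte into a raw vector, and storing dosages in [0, 2] as one byte each on a 0 to 254 scale. The function names (pack2bit, unpack2bit, quantize_dosage, dequantize_dosage) are hypothetical, and the bit order is an arbitrary choice that is not meant to reproduce PLINK's .bed layout.

```r
# Pack genotype codes 0..3 (3 = missing) four per byte into a raw vector.
# The original length is kept in an attribute so padding can be dropped later.
pack2bit <- function(g) {
  stopifnot(all(g %in% 0:3))
  n <- length(g)
  g <- c(as.integer(g), rep(0L, (-n) %% 4L))   # pad to a multiple of 4
  m <- matrix(g, nrow = 4L)                    # one column per output byte
  bytes <- m[1L, ] + 4L * m[2L, ] + 16L * m[3L, ] + 64L * m[4L, ]
  structure(as.raw(bytes), n = n)
}

# Recover the original codes from a vector produced by pack2bit().
unpack2bit <- function(p) {
  b <- as.integer(p)                           # as.integer() drops attributes
  m <- rbind(b %% 4L, (b %/% 4L) %% 4L, (b %/% 16L) %% 4L, b %/% 64L)
  as.vector(m)[seq_len(attr(p, "n"))]
}

# Store a dosage in [0, 2] as one byte: 0 -> 0, 1 -> 127, 2 -> 254.
quantize_dosage <- function(d) {
  stopifnot(all(d >= 0 & d <= 2))
  as.raw(round(d * 127))
}

# Map the stored byte back to the dosage scale.
dequantize_dosage <- function(q) as.integer(q) / 127

# Usage
g <- c(0L, 1L, 2L, 3L, 2L, 0L, 3L)
p <- pack2bit(g)
print(length(p))           # number of bytes used for 7 genotypes
print(unpack2bit(p))       # should reproduce g

q <- quantize_dosage(c(0, 0.37, 1, 1.82, 2))
print(dequantize_dosage(q))
```

With this scaling the worst-case rounding error on a dosage is about 1/254. The packed form still has to be unpacked to integer or double before it can be used in a regression, so the saving applies to storage rather than to computation.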
integer and floating-point storage
2 messages · Mike Miller, Matt Shotwell
Hi Mike,

There are some facilities for storing and manipulating small (2 bit) integers. See the ff package here:

http://cran.r-project.org/web/packages/ff/index.html

-Matt
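As a rough sketch of what using that package might look like: this assumes ff is installed and relies on its "quad" vmode (2-bit unsigned integers, no NA), so the value 3 has to stand in for missing, as in the question. The variable names are illustrative only.

```r
# Assumes the ff package is available: install.packages("ff")
if (requireNamespace("ff", quietly = TRUE)) {
  g <- c(0L, 1L, 2L, 3L, 2L, 0L)      # 3 = missing, as in the question

  # File-backed vector using 2 bits per element
  x <- ff::ff(vmode = "quad", length = length(g))
  x[] <- g                            # write the codes
  back <- x[]                         # read them back into RAM

  print(ff::vmode(x))
  print(back)

  ff::delete(x)                       # remove the backing file
}
```

Reading elements back returns ordinary R vectors, so the compact representation applies to the stored data rather than to values pulled into memory for modelling.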
On 04/14/2011 01:20 PM, Mike Miller wrote:
Matthew S Shotwell Assistant Professor School of Medicine
Department of Biostatistics Vanderbilt University