[Bioc-devel] GenomicRanges: Storing 'seqlengths' as numeric
Hi Kasper,
On 12/03/2013 12:25 PM, Kasper Daniel Hansen wrote:
> Is integer.max dependent on 32bit vs 64bit?
I don't think so. AFAIK integers are always 32-bit in R (at least on Intel platforms), even on 64-bit OSes. So .Machine$integer.max is always 2^31 - 1 (roughly 2 billion).
> It seems to me that the OP specifically complains that he cannot represent 995*10^6 as an integer.
995*10^6 is roughly 1 billion so it can be represented as an integer, except maybe on some exotic systems.
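
Both points can be checked at the R prompt. The following is a minimal base-R sketch; the variable names (`int_max`, `chr3B_len`) are made up for illustration, and 995e6 is only the approximate figure quoted in this thread, not an exact chromosome length.

```r
# Largest value R's 32-bit integer type can hold.
int_max <- .Machine$integer.max
int_max == 2^31 - 1            # TRUE

# Hypothetical name; 995e6 is the approximate length quoted in the thread.
chr3B_len <- as.integer(995e6)
is.integer(chr3B_len)          # TRUE
!is.na(chr3B_len)              # TRUE: the value fits, no overflow to NA
chr3B_len < int_max            # TRUE
```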
> Also, is there a sign issue here as well?
Not that I know of.
H.
On Tue, Dec 3, 2013 at 2:53 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
Hi,
Agreed with Martin that until someone comes up with a chromosome that
is longer than .Machine$integer.max I don't see the need for switching
to double or int64 to represent the seqlengths.
Furthermore, since the seqlengths are used in many range operations
like checking the validity of the ranges in a GRanges object, trimming
them, computing coverage, handling circularity, etc., it would not
make much sense to make the switch for the seqlengths without also
making it for Ranges objects. That would be a serious undertaking, though,
and would probably raise many backward compatibility issues.
H.
On 12/03/2013 10:07 AM, Martin Morgan wrote:
On 12/03/2013 02:29 AM, Julian Gehring wrote:
Hi,
Some of the chromosomes out in the world are fairly large (e.g. wheat chr 3B
with > 995 Mbp [1]). Currently, the 'seqlengths' of the reference sequence are
stored as 'integers', which do not allow storing lengths of this size. Are
there any plans to switch to 'doubles' or 64-bit integers for the
'seqlengths' slot? Or to extend the slot so that a user can store it either
as an integer or as a floating-point number?
But
> .Machine$integer.max
[1] 2147483647
so we at least survive wheat chr 3B?
If there is movement to support this I'd encourage exact representation
as double (this is how R deals with long vectors, and I believe it is
the JavaScript representation of integers, so not completely
unprecedented) rather than 64-bit integers (which do not have any
support in R).
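
The point about exact representation as double can be checked in base R. This is only a sketch with illustrative variable names; it says nothing about how GenomicRanges actually stores seqlengths. It shows that integer arithmetic overflows to NA past .Machine$integer.max, while a double holds the same whole numbers exactly up to 2^53.

```r
int_max <- .Machine$integer.max

# Integer arithmetic past the limit overflows to NA (with a warning).
overflowed <- suppressWarnings(int_max + 1L)
is.na(overflowed)                    # TRUE

# The same value stored as a double is exact.
as_double <- as.numeric(int_max) + 1
as_double == 2147483648              # TRUE

# Doubles represent every whole number exactly up to 2^53 ...
(2^53 - 1) + 1 == 2^53               # TRUE
# ... but not beyond: 2^53 + 1 is rounded back to 2^53.
2^53 + 1 == 2^53                     # TRUE
```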
I guess this would be quite a big undertaking, so real use cases need to
be present. And support for larger integers would seem to be useful to R
generally rather than just to Bioc.
Martin
Best wishes
Julian
[1] http://www.sciencemag.org/content/322/5898/101
_________________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319