Error in vector("integer", length) : vector size cannot be NA

Hello,

I have uploaded a csv file that looks like this:
gc
  alpha_id beta_id
1   142053       1
2     9454       1
3   295618       2
4    42691       2
5   389224       3
6     9455       3

The alpha_id column contains 310660 unique values and the beta_id column
contains 17431 unique values. The file has more than 1.3 million rows. Now I
want to convert this list of observations into a matrix with alpha_id in the
first row and beta_id in the first column (or vice versa) and a count in the
cells. One option would be M = as.matrix( table(gc) ). However, I
keep getting this error message:

Error in vector("integer", length) : vector size cannot be NA
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow

There is no missing data in my file, so I don't know what's wrong. Can you
please help me? Thanks!

Mathijs
View this message in context: http://r.789695.n4.nabble.com/Error-in-vector-integer-length-vector-size-cannot-be-NA-tp3079566p3079566.html
Sent from the R help mailing list archive at Nabble.com.
The number of entries in the table is 310660*17431. Stored as integers,
this takes 310660*17431*4 bytes, which is about 20.17 GB. This probably
does not fit into RAM. The function table() produces a full matrix, not
a sparse one, even if most cells are empty.

Petr Savicky.
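
The arithmetic above can be checked directly in R. This is only a minimal sketch: the variable names are illustrative and not from the original posts. It also suggests why the warnings mention integer overflow, since the cell count is larger than the largest value an R integer can hold.

```r
# Dimensions reported in the question
n_alpha <- 310660   # unique alpha_id values
n_beta  <- 17431    # unique beta_id values

# These are doubles, so the product itself does not overflow
cells <- n_alpha * n_beta
cells > .Machine$integer.max   # TRUE: more cells than the largest R integer

# The same product in integer arithmetic overflows to NA (with a warning),
# which looks like the kind of overflow reported in the warnings
overflowed <- suppressWarnings(310660L * 17431L)
is.na(overflowed)              # TRUE

# Size of a dense integer table at 4 bytes per cell, in units of 2^30 bytes
gib <- cells * 4 / 2^30
round(gib, 2)                  # 20.17
```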
Try using 'sqldf' to get your result

sqldf("select alpha_id, beta_id, count(*) from gc group by alpha_id, beta_id")

You might also try 'data.table'
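
For the 'data.table' route, a rough equivalent of the sqldf query might look like the sketch below. The toy data frame gc_toy is hypothetical (the six rows shown in the question plus one repeated pair so that a count above 1 appears), and this is one possible approach rather than code posted in the thread.

```r
library(data.table)

# Hypothetical toy data: the six rows from the question plus one repeated pair
gc_toy <- data.frame(
  alpha_id = c(142053, 9454, 295618, 42691, 389224, 9455, 142053),
  beta_id  = c(1, 1, 2, 2, 3, 3, 1)
)

dt <- as.data.table(gc_toy)

# One row per observed (alpha_id, beta_id) pair; N is the number of occurrences
counts <- dt[, .N, by = .(alpha_id, beta_id)]
print(counts)
```

The result is in long format, with one row per pair that actually occurs, so it should stay far smaller than the full 310660 x 17431 grid that table() tries to allocate.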

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?