Sum Question
On Jun 30, 2011, at 11:20 AM, Edgar Alminar wrote:
I did this:
library(data.table)
dd <- data.table(bl)
dd[,sum(as.integer(CONTTIME)), by = SCRNO]
(I used as.integer because I got an error message: sum not meaningful for factors)
And got this:
           SCRNO  V1
 [1,] HBA0020036 111
 [2,] HBA0020087  71
 [3,] HBA0020209 140
 [4,] HBA0020213 189
 [5,] HBA0020222 174
 [6,] HBA0020292 747
 [7,] HBA0020310  57
 [8,] HBA0020317 291
 [9,] HBA0020365 417
[10,] HBA0020366 124
All the sums are way too big. Is there something making it not add up correctly?
Original dataset:
     RID      SCRNO VISCODE RECNO CONTTIME
338   43 HBA0020036      bl     1        9
1187  95 HBA0020087      bl     1        3
3251 230 HBA0020209      bl     2        3
3258 230 HBA0020209      bl     1       28
3321 235 HBA0020213      bl     2        5
3351 235 HBA0020213      bl     1        6
3436 247 HBA0020222      bl     1        5
3456 247 HBA0020222      bl     2        4
4569 321 HBA0020292      bl    13        2
4572 321 HBA0020292      bl     5       13
4573 321 HBA0020292      bl     1       25
4576 321 HBA0020292      bl     7        5
4578 321 HBA0020292      bl     8        2
4581 321 HBA0020292      bl     4        4
4582 321 HBA0020292      bl     9        5
4586 321 HBA0020292      bl    12        2
4587 321 HBA0020292      bl     6        2
4590 321 HBA0020292      bl    10        3
4591 321 HBA0020292      bl    11        7
That is not the entire dataset. HBA0020366 is missing, for example.

I don't use the data.table package, but if you are getting an error indicating that CONTTIME is a factor, then something is wrong with either the data itself (there are non-numeric entries) or the way in which it was entered/imported into R. Note also that as.integer() applied to a factor returns the internal level codes rather than the values you see printed, which would explain sums that do not match the data.

Thus, I would first check your data for errors. Use str(YourDataSet) to review its structure and, if CONTTIME is a factor, look into the data to see why.

Lastly, review this R FAQ:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

Just as an alternative, with your data in 'DF':
DF
     RID      SCRNO VISCODE RECNO CONTTIME
338   43 HBA0020036      bl     1        9
1187  95 HBA0020087      bl     1        3
3251 230 HBA0020209      bl     2        3
3258 230 HBA0020209      bl     1       28
3321 235 HBA0020213      bl     2        5
3351 235 HBA0020213      bl     1        6
3436 247 HBA0020222      bl     1        5
3456 247 HBA0020222      bl     2        4
4569 321 HBA0020292      bl    13        2
4572 321 HBA0020292      bl     5       13
4573 321 HBA0020292      bl     1       25
4576 321 HBA0020292      bl     7        5
4578 321 HBA0020292      bl     8        2
4581 321 HBA0020292      bl     4        4
4582 321 HBA0020292      bl     9        5
4586 321 HBA0020292      bl    12        2
4587 321 HBA0020292      bl     6        2
4590 321 HBA0020292      bl    10        3
4591 321 HBA0020292      bl    11        7
aggregate(CONTTIME ~ DF$SCRNO, data = DF, sum)
    DF$SCRNO CONTTIME
1 HBA0020036        9
2 HBA0020087        3
3 HBA0020209       31
4 HBA0020213       11
5 HBA0020222        9
6 HBA0020292       70

See ?aggregate

HTH,

Marc Schwartz
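
As a footnote to the factor issue raised in this thread, here is a minimal sketch of why as.integer() on a factor can inflate the sums, and of the conversion the R FAQ describes. The vector 'conttime' and its values are made up for illustration and are not taken from the poster's data; the stray "unknown" entry is just one assumed way a numeric column could end up as a factor on import.

```r
# Hypothetical values (not the poster's data): a numeric column that was
# read in as a factor because of one stray non-numeric entry.
conttime <- factor(c("9", "3", "28", "unknown", "5"))

# as.integer() on a factor returns the internal level codes (1..nlevels),
# not the numbers shown in the labels.
codes <- as.integer(conttime)

# Converting via character recovers the printed values, as described in
# the R FAQ. Entries that are not numbers become NA (with a warning,
# suppressed here).
values <- suppressWarnings(as.numeric(as.character(conttime)))

# The entries that failed to convert are the ones to inspect in the raw data.
bad <- as.character(conttime)[is.na(values)]

sum(codes)                 # sum of level codes, unrelated to the real total
sum(values, na.rm = TRUE)  # sum of the actual numbers
bad                        # offending entries
```

If the conversion is the only problem, the same idea should carry over to the grouped call from the question, for example dd[, sum(as.numeric(as.character(CONTTIME))), by = SCRNO], though it is still worth finding out why the column was imported as a factor in the first place.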