Help

Is this what you were looking for as output?  You did not show what
the output would look like:
> x
  var1 var2 X. varN
1  122 nnn1  ?    1
2  213 nnn2  ?    2
3  422 nnn4  ?    2
4  432    ?  ?    3
5  441    ?  ?    4
6  500    ?  ?    4
7  550    ?  ?    4
> str(x)
'data.frame':   7 obs. of  4 variables:
 $ var1: int  122 213 422 432 441 500 550
 $ var2: Factor w/ 4 levels "?","nnn1","nnn2",..: 2 3 4 1 1 1 1
 $ X.  : Factor w/ 1 level "?": 1 1 1 1 1 1 1
 $ varN: int  1 2 2 3 4 4 4
> x$newCol <- ave(x$var1, x$varN, FUN=sum)
> x
  var1 var2 X. varN newCol
1  122 nnn1  ?    1    122
2  213 nnn2  ?    2    635
3  422 nnn4  ?    2    635
4  432    ?  ?    3    432
5  441    ?  ?    4   1491
6  500    ?  ?    4   1491
7  550    ?  ?    4   1491
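
For anyone who wants to reproduce the result, here is a minimal self-contained sketch. It rebuilds only the var1 and varN columns from the example (leaving out var2 and the placeholder column is an assumption that they play no part in the sum) and cross-checks the ave() result against a per-group lookup built with tapply().

```r
# Rebuild only the two columns the calculation needs (values from the example).
x <- data.frame(
  var1 = c(122L, 213L, 422L, 432L, 441L, 500L, 550L),
  varN = c(1L, 2L, 2L, 3L, 4L, 4L, 4L)
)

# ave() applies FUN within each group of varN and returns a vector as long
# as var1, so the result can be assigned directly as a new column.
x$newCol <- ave(x$var1, x$varN, FUN = sum)

# Cross-check: one sum per group via tapply(), looked up per row by group label.
group_sums <- tapply(x$var1, x$varN, sum)
check <- as.vector(group_sums[as.character(x$varN)])

print(x)
```

Because ave() keeps the original row order, the data does not need to be sorted by varN first.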

Hey Everyone!
I'm quite a new R user. I found a problem that I'd like to share with you,
and I hope you can help me find a solution.
I have a large .txt file which I opened with the read.table command, and what I
understood from many R manuals is that read.table gives me a kind of matrix.
I've used order() to sort my data, and now my problem is this: I have a variable
that has many repeated values, and I would like to operate on the row
indexes of these repeated values. For example, suppose I have:

var1   var2   ?   varN
122    nnn1   ?   1
213    nnn2   ?   2
422    nnn4   ?   2
432    ?      ?   3
441    ?      ?   4
500    ?      ?   4
550    ?      ?   4

So I want to obtain a new column where all elements of var1 are added up at the
places where varN is repeated. For varN=2 the corresponding element of the new
column will be 213+422, and for varN=4 it will be 441+500+550. Where there
are no such repeated values there is obviously nothing to do, since varN is
unique there.
I wrote a function to do this, but it is not very good (I have a database with
around 1 million rows and 5 columns). In fact, this function only works for
data that is not so large:

suma.rep <- function(X, Y) {
  # Sum X within each distinct value of Y, in order of first appearance
  resp <- numeric(0)
  Z <- unique(Y)
  for (i in seq_along(Z))
    resp <- c(resp, sum(X[which(Y == Z[i])]))
  return(resp)
}
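
The loop above rescans Y once per distinct value and grows resp with c() on every pass, which is likely what makes it so slow on a million rows. As a rough sketch (not the poster's code), the same per-group totals can be computed in one pass with base R's rowsum(), and match() can then spread them back over the rows. The name suma.rep.fast is made up for illustration.

```r
# Hypothetical vectorized counterpart to suma.rep(); the name is illustrative only.
suma.rep.fast <- function(X, Y) {
  # rowsum() sums X within each group of Y. reorder = FALSE keeps the groups
  # in order of first appearance, matching unique(Y) in the original loop.
  totals <- rowsum(X, Y, reorder = FALSE)
  as.vector(totals)
}

X <- c(122, 213, 422, 432, 441, 500, 550)
Y <- c(1, 2, 2, 3, 4, 4, 4)

per_group <- suma.rep.fast(X, Y)             # one total per distinct value of Y
per_row   <- per_group[match(Y, unique(Y))]  # totals repeated for every row
```

Here per_group has one entry per distinct value of Y, as suma.rep() returns, while per_row has one entry per input row and so could serve as the new column.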

When I run this function on my large data, R appears to be calculating, and I
think it would take very long to produce the new column (maybe 4 days).
Question 1: I "feel" that there may be a command that could help me do
this "simple" operation more elegantly. I googled it but couldn't find one. Is
there any such command?
Question 2: Is it a good idea to handle large database files with R, as in my
example?

Thank you so much for your help.
Christian Pa?l


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?