Help
2 messages · петрович, jim holtman
Is this what you were looking for as output? You did not show what the output would look like:
x
  var1 var2 X. varN
1  122 nnn1  ?    1
2  213 nnn2  ?    2
3  422 nnn4  ?    2
4  432    ?  ?    3
5  441    ?  ?    4
6  500    ?  ?    4
7  550    ?  ?    4
str(x)
'data.frame': 7 obs. of 4 variables:
 $ var1: int 122 213 422 432 441 500 550
 $ var2: Factor w/ 4 levels "?","nnn1","nnn2",..: 2 3 4 1 1 1 1
 $ X.  : Factor w/ 1 level "?": 1 1 1 1 1 1 1
 $ varN: int 1 2 2 3 4 4 4
x$newCol <- ave(x$var1, x$varN, FUN=sum)
x
  var1 var2 X. varN newCol
1  122 nnn1  ?    1    122
2  213 nnn2  ?    2    635
3  422 nnn4  ?    2    635
4  432    ?  ?    3    432
5  441    ?  ?    4   1491
6  500    ?  ?    4   1491
7  550    ?  ?    4   1491
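For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. The data frame is rebuilt by hand from the values shown above, keeping only the two columns that matter (that reduction is an assumption made for brevity). ave() splits var1 by varN, applies FUN within each group, and returns a vector as long as var1, so it can be assigned directly as a new column.

```r
# Rebuild the example data from the thread (only var1 and varN are needed).
x <- data.frame(
  var1 = c(122L, 213L, 422L, 432L, 441L, 500L, 550L),
  varN = c(1L, 2L, 2L, 3L, 4L, 4L, 4L)
)

# ave() computes sum(var1) within each varN group and repeats the
# group result on every row belonging to that group.
x$newCol <- ave(x$var1, x$varN, FUN = sum)

print(x)
```

Rows whose varN occurs only once simply keep their own var1 value, which matches the behaviour asked for in the question.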
On Tue, Apr 26, 2011 at 6:31 PM, петрович <bistanz at gmail.com> wrote:
Hey everyone!
I'm quite a new R user. I found a problem that I'd like to share with you,
and I hope you can help me find a solution.
I have a large .txt file which I opened with the read.table command, and what I
understood from many R manuals is that I have a kind of matrix read with
read.table.
I've used order() to sort my data and now my problem is: I have a variable
that has many repeated values and I would like to operate on the row
indexes of "these repeated values". For example, suppose I have:
var1    var2    ?    varN
122     nnn1    ?    1
213     nnn2    ?    2
422     nnn4    ?    2
432     ?       ?    3
441     ?       ?    4
500     ?       ?    4
550     ?       ?    4
So I want to obtain a new column where all elements of var1 are added at the
places where varN is repeated. So for varN=2 the corresponding element of the
new column will be 213+422, and for varN=4 it will be 441+500+550. Where there
are no such repeated values there is obviously nothing to do, since varN is
unique there.
I wrote a function to do this but it is not very good (I have a database with
around 1 million rows and 5 columns). Actually, this function only works for
data that is not so large:
suma.rep=function(X,Y){
  # returns one sum of X per distinct value of Y
  resp=numeric(0)
  Z=unique(Y)
  for (i in (1:length(Z)))
    resp=c(resp,sum(X[which(Y==Z[i])]))
  return(resp)}
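A note on why this loop struggles on large input: every iteration scans the whole of Y to find the matching rows and then copies resp to grow it by one element, so the work grows roughly with (number of rows) × (number of distinct values). It also returns one sum per distinct value rather than one value per row. The sketch below is one possible vectorised equivalent using tapply(), shown on the small example from this post; the names group_sums and per_row are made up for illustration.

```r
X <- c(122, 213, 422, 432, 441, 500, 550)
Y <- c(1, 2, 2, 3, 4, 4, 4)

# Loop version from the post, kept here only for comparison.
suma.rep <- function(X, Y) {
  resp <- numeric(0)
  Z <- unique(Y)
  for (i in seq_along(Z)) resp <- c(resp, sum(X[which(Y == Z[i])]))
  resp
}

# tapply() computes one sum per group in a single call.
# Groups are returned in sorted order of Y and named by the group value.
group_sums <- tapply(X, Y, sum)

# Expand the per-group sums back to one value per row by looking up
# each row's group name; as.vector() drops the names and dim attribute.
per_row <- as.vector(group_sums[as.character(Y)])

print(group_sums)
print(per_row)
```

Because the data were already sorted with order(), the sorted group order from tapply() lines up with the unique(Y) order used by the loop; on unsorted data the two orders can differ.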
When I run this function with my large data, R appears to be calculating and I
think it would take very long to make my new required column (maybe 4 days).
Question 1: I "feel" that maybe there's a command that could help me do
this "simple" operation more elegantly. I googled it but I couldn't find one. Is
there any such command?
Question 2: Is it a good idea to handle large database files with R, as in my
example?
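On the size question, a rough way to check feasibility is to simulate data of about the same shape and time a vectorised grouped sum. The sketch below is only that: the data are random stand-ins (1 million rows, an assumed 250,000 possible group values), and rowsum() is used as one base-R option for grouped sums. No timing is quoted here; system.time() reports it for your own machine.

```r
set.seed(1)
n <- 1e6

# Simulated stand-ins for the real columns (assumed sizes, not the real data).
varN <- sort(sample.int(250000L, n, replace = TRUE))  # sorted group ids
var1 <- sample.int(1000L, n, replace = TRUE)          # values to add up

elapsed <- system.time({
  # One row per distinct varN; row names are the group values.
  sums <- rowsum(var1, varN)
  # Map each row's group back to its sum to get a full-length column.
  newCol <- unname(sums[match(varN, as.integer(rownames(sums))), 1])
})

print(elapsed)
print(head(data.frame(var1, varN, newCol)))
```

The same pattern applies to a data frame read with read.table: pass the value column and the grouping column to rowsum() (or to ave(), as in the reply) instead of looping over unique values.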
Thank you so much for your help.
Christian Pa?l
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?