-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of William Dunlap
Sent: Friday, October 11, 2013 9:51 AM
To: arun; Steven Ranney; r-help at r-project.org
Subject: Re: [R] Create sequential vector for values in another column
At this point 3 functions have been suggested and I'll add a 4th:
f1 <- function(x)unlist(lapply(unname(split(rep.int(1L,length(x)), x)), cumsum))
f2 <- function(x)unlist(sapply(rle(x)$lengths, function(k) 1:k ))
f3 <- function(x)ave(x,x,FUN=seq)
f4 <- function(x)ave(seq_along(x), x, FUN=seq_along)
You can compare their results with ftest (as long as their results have the
same lengths):
ftest <- function(x) {
data.frame(x, f1=f1(x), f2=f2(x), f3=f3(x), f4=f4(x))
}
They all return the same result for Steven's sample data, which is numeric
and in sorted order:
x0 <- c(123.45, 123.45, 123.45, 123.45, 234.56,
234.56, 234.56, 234.56, 234.56, 234.56, 234.56, 345.67, 345.67,
345.67, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78,
456.78, 456.78)
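One way to confirm that agreement is to compare each result with f4's, looking at the values only. This is just a sketch: the four definitions are repeated so it runs on its own, x0 is rebuilt with rep(), and the helper name same_as_f4 is made up for illustration.

```r
f1 <- function(x) unlist(lapply(unname(split(rep.int(1L, length(x)), x)), cumsum))
f2 <- function(x) unlist(sapply(rle(x)$lengths, function(k) 1:k))
f3 <- function(x) ave(x, x, FUN = seq)
f4 <- function(x) ave(seq_along(x), x, FUN = seq_along)

# same 23 values as the sample data: 4, 7, 3 and 9 copies of each id
x0 <- rep(c(123.45, 234.56, 345.67, 456.78), times = c(4, 7, 3, 9))

# hypothetical helper: compare values only, because f3 returns doubles
# while the others return integers
same_as_f4 <- function(f, x) {
  isTRUE(all.equal(as.numeric(f(x)), as.numeric(f4(x))))
}

agree <- sapply(list(f1 = f1, f2 = f2, f3 = f3), same_as_f4, x = x0)
print(agree)
```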
However, f1() gives the wrong answer if x is not sorted:
> ftest(c(30,30,30, 20,20))
x f1 f2 f3 f4
1 30 1 1 1 1
2 30 2 2 2 2
3 30 1 3 3 3
4 20 2 1 1 1
5 20 3 2 2 2
f1() and f2() give the wrong answer if the groups are split up in the data:
> ftest(c(10,10, 8,8,8, 10,10,10)) # 10's not contiguous
x f1 f2 f3 f4
1 10 1 1 1 1
2 10 2 2 2 2
3 8 3 1 1 1
4 8 1 2 2 2
5 8 2 3 3 3
6 10 3 1 3 3
7 10 4 2 4 4
8 10 5 3 5 5
(It is not clear what result the OP wants here.)
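The two possible readings can be written out explicitly. A sketch, with the illustrative names by_run and by_value (not from the thread): the first restarts the count whenever the value changes, which is what f2() does; the second keeps counting across separated blocks of the same value, which is what f3() and f4() do.

```r
x <- c(10, 10, 8, 8, 8, 10, 10, 10)

# restart at every change of value: position within each contiguous run
by_run <- sequence(rle(x)$lengths)

# continue across gaps: position within each distinct value
by_value <- ave(seq_along(x), x, FUN = seq_along)

print(rbind(x, by_run, by_value))
```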
f3() gives the wrong answer if x is not numeric:
> f3(c("a","a","a", "b","b"))
[1] "1" "2" "3" "1" "2"
f3() also gives an ominous warning if there is a singleton in x (be
[1] 1 2 3 1
Warning message:
In `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) :
number of items to replace is not a multiple of replacement length
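The warning is consistent with how seq() treats a single number: given one positive number n it returns 1:n, whereas given a longer vector it behaves like seq_along(). A small illustration follows; the input vector is made up for this note and is not the one from the message.

```r
s_vec  <- seq(c(5, 5, 5))   # length-3 input: counts the elements
s_one  <- seq(7)            # length-1 input: counts up to 7 instead
sa_one <- seq_along(7)      # seq_along() always counts the elements

x <- c(5, 5, 5, 7)          # hypothetical input with a singleton group
r4 <- ave(seq_along(x), x, FUN = seq_along)

print(s_vec)
print(s_one)
print(sa_one)
print(r4)
```

Because ave() has to fit seq(7)'s seven values into the one slot belonging to the singleton group, the replacement lengths do not match. Using seq_along() as FUN avoids the mismatch.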
f2() fails to give an answer if x is a factor:
> f2(factor(c("x","y","z")))
Error in rle(x) : 'x' must be an atomic vector
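If the run-based numbering of f2() is what is wanted for a factor, one workaround is to convert to character before calling rle(). A sketch only: f2b is a name introduced here, and it uses sequence() in place of the sapply() call.

```r
f2b <- function(x) {
  if (is.factor(x)) x <- as.character(x)  # rle() does not accept factors
  sequence(rle(x)$lengths)                # 1..k for each run of length k
}

r_xyz  <- f2b(factor(c("x", "y", "z")))
r_aaba <- f2b(factor(c("a", "a", "b", "a")))
print(r_xyz)
print(r_aaba)
```

This keeps f2()'s behaviour of restarting the count at every change of value, so it still differs from f4() when a level appears in more than one block.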
I think f4 gives the correct result for all those cases.
I think all of the above call lapply(split()) at some point and that can use
a lot of memory when there are lots of unique values in x. You can use
a sort-based algorithm to avoid that problem.
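A sort-based version might look like the sketch below. It is not necessarily the algorithm meant above; f5 is a name introduced here, and the code assumes x has no missing values. It relies on order() leaving ties in their original order, so numbering within each value follows the original row order.

```r
f5 <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  ord <- order(x)                 # ties stay in their original order
  xs <- x[ord]                    # equal values are now adjacent
  # TRUE wherever a new block of equal values starts in the sorted vector
  is_start <- c(TRUE, xs[-1L] != xs[-n])
  # index (in sorted order) at which each element's block starts
  start_pos <- cummax(ifelse(is_start, seq_len(n), 0L))
  within <- seq_len(n) - start_pos + 1L
  out <- integer(n)
  out[ord] <- within              # put the counts back in original order
  out
}

print(f5(c(10, 10, 8, 8, 8, 10, 10, 10)))
print(f5(c("a", "a", "a", "b", "b")))
print(f5(factor(c("x", "y", "z"))))
```

No list of per-group vectors is built; the work is one order() call plus a few vectorized passes over x.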
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of arun
Sent: Friday, October 11, 2013 6:43 AM
To: Steven Ranney; r-help at r-project.org
Subject: Re: [R] Create sequential vector for values in another column
Also,
it might be faster to use data.table():
library(data.table)
dt1 <- data.table(dat1, key='id.name')
dt1[, x := seq(.N), by='id.name']
A.K.
On , arun <smartpink111 at yahoo.com> wrote:
Hi,
Try:
dat1<-
structure(list(id.name = c(123.45, 123.45, 123.45, 123.45, 234.56,
234.56, 234.56, 234.56, 234.56, 234.56, 234.56, 345.67, 345.67,
345.67, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78,
456.78, 456.78)), .Names = "id.name", class = "data.frame", row.names = c(NA,
-23L))
dat1$x <- with(dat1,ave(id.name,id.name,FUN=seq))
A.K.
On Friday, October 11, 2013 9:28 AM, Steven Ranney <steven.ranney at gmail.com>
wrote:
Hello all -
I have an example column in a dataFrame
id.name
123.45
123.45
123.45
123.45
234.56
234.56
234.56
234.56
234.56
234.56
234.56
345.67
345.67
345.67
456.78
456.78
456.78
456.78
456.78
456.78
456.78
456.78
456.78
...
[truncated]
And I'd like to create a second vector of sequential values (i.e., 1:N) for
each unique id.name value.  In other words, I need
id.name   x
123.45    1
123.45    2
123.45    3
123.45    4
234.56    1
234.56    2
234.56    3
234.56    4
234.56    5
234.56    6
234.56    7
345.67    1
345.67    2
345.67    3
456.78    1
456.78    2
456.78    3
456.78    4
456.78    5
456.78    6
456.78    7
456.78    8
456.78    9
The number of unique id.name values is different; for some values, nrow()
may be 42 and for others it may be 36, etc.
The only way I could think of to do this is with two nested for loops.  I
tried it but because this data set is so large (nrow = 112,679 with 2,161
unique values of id.name), it took several hours to run.
Is there an easier way to create this vector?  I'd appreciate your thoughts.
Thanks -
SR
Steven H. Ranney