
Help understanding loop behaviour

7 messages · e-mail ma015k3113, Jim Lemon, PIKAL Petr +2 more

#
I am trying to understand how loops in R operate. I have a simple dataframe xx which is as follows:

COMPANY_NUMBER   NUMBER_OF_YEARS

#0070837         3
#0070837         3
#0070837         3
1000403          4
1000403          4
1000403          4
1000403          4
10029943         3
10029943         3
10029943         3
10037980         4
10037980         4
10037980         4
10037980         4
10057418         3
10057418         3
10057418         3
1009550          4
1009550          4
1009550          4
1009550          4
The code I have written is

while (i <= nrow(xx1)) {
  for (j in 1:xx1$NUMBER_OF_YEARS[i]) {
    xx1$I[i] <- i
    xx1$J[j] <- j
    xx1$NUMBER_OF_YEARS_j[j] <- xx1$NUMBER_OF_YEARS[j]
  }
  i <- i + xx1$NUMBER_OF_YEARS[i]
}
After running the code I want my dataframe to look like

COMPANY_NUMBER   NUMBER_OF_YEARS   I   J

#0070837         3                 1   1
#0070837         3                 1   2
#0070837         3                 3   3
1000403          4                 1   1
1000403          4                 1   2
1000403          4                 1   3
1000403          4                 4   4
10029943         3                 1   1
10029943         3                 1   2
10029943         3                 3   3
10037980         4                 1   1
10037980         4                 1   2
10037980         4                 1   3
10037980         4                 4   4
10057418         3                 1   1
10057418         3                 1   1
10057418         3                 1   1
1009550          4                 1   1
1009550          4                 1   2
1009550          4                 1   3
1009550          4                 4   4


I get the correct value of I but in the wrong row, and the value of J is correct in the first iteration but then goes back to 1.

Any help will be greatly appreciated.
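
For reference, here is a minimal sketch of how the loop above could be repaired. It is not from any of the replies, and it assumes that each company occupies exactly NUMBER_OF_YEARS consecutive rows. The toy data frame and the helper names n and r are made up for illustration. The two changes are that i is initialised before the loop, and that each write goes to row i + j - 1 (the j-th row of the current block) instead of row i or row j.

```r
# Toy data (an assumption): two companies, 3 and 4 consecutive rows each
xx1 <- data.frame(
  COMPANY_NUMBER  = rep(c("1000403", "10029943"), times = c(4, 3)),
  NUMBER_OF_YEARS = rep(c(4, 3), times = c(4, 3)),
  stringsAsFactors = FALSE
)
xx1$I <- NA
xx1$J <- NA

i <- 1                                # first row of the current company block
while (i <= nrow(xx1)) {
  n <- xx1$NUMBER_OF_YEARS[i]         # block length, read once per company
  for (j in 1:n) {
    r <- i + j - 1                    # actual row for the j-th year of this block
    xx1$J[r] <- j                     # running counter within the company
    xx1$I[r] <- if (j == n) n else 1  # 1 everywhere except the block's last row
  }
  i <- i + n                          # jump to the first row of the next company
}
xx1
```

In the original loop, xx1$J[j] always writes to rows 1..n of the whole data frame, whichever company i points at, which would explain why J looks right only for the first company.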
#
Hi

Your code is hardly readable as you used HTML formatting (not recommended), so
I used another (split) approach.

The third column seems to be simple.

#make list
lll <- split(as.factor(xx$COMPANY_NUMBER), xx$COMPANY_NUMBER)

#calculate sequences
as.numeric(unlist(lapply(lll, function(x) 1:length(x))))
should give you the third column.

The second column seems to be calculated this way.
lapply(lll, function(x) c(rep(1, length(x)-1), max(length(x))))

I believe others could come up with simpler solutions.

BTW, why should the result for 10057418 be different?

Cheers
Petr
#
Hi email,
If you want what you described, try this:

xx<-read.table(text="COMPANY_NUMBER NUMBER_OF_YEARS
0070837  3
0070837  3
0070837  3
1000403  4
1000403  4
1000403  4
1000403  4
10029943  3
10029943  3
10029943  3
10037980  4
10037980  4
10037980  4
10037980  4
10057418  3
10057418  3
10057418  3
1009550  4
1009550  4
1009550  4
1009550  4",
header=TRUE,stringsAsFactors=FALSE)
xx$I<-NA
xx$J<-NA
row_count<-1
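for(row in 1:nrow(xx)) {
 # not the last row of a company: the next row has the same company number
 if(row < nrow(xx) && xx$COMPANY_NUMBER[row]==xx$COMPANY_NUMBER[row+1]) {
  xx$I[row]<-1
  xx$J[row]<-row_count
  row_count<-row_count+1
 } else {
  # last row of a company, including the final row of the data frame
  xx$I[row]<-xx$J[row]<-xx$NUMBER_OF_YEARS[row]
  row_count<-1
 }
}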
xx

Like Petr, I am assuming that you want company 10057418 treated the
same as the others. If not, let us know why. I am also assuming that
the first three rows should _not_ have a "#" at the beginning, as
read.table treats "#" as a comment character, which means that they
would be discarded.
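
A small sketch of the point about "#", in case it helps. The four-line text below is a made-up subset of the data. With the defaults, read.table treats everything from "#" onwards as a comment, so those rows vanish. Setting comment.char = "" keeps them, and colClasses = "character" for the first column also preserves leading zeros.

```r
# Made-up subset of the posted data, with "#" kept on the first two rows
txt <- "COMPANY_NUMBER NUMBER_OF_YEARS
#0070837 3
#0070837 3
1000403 4"

# Default: comment.char = "#", so the two "#" rows are silently dropped
default_read <- read.table(text = txt, header = TRUE)
nrow(default_read)

# Turn comment handling off and read the id column as text
kept <- read.table(text = txt, header = TRUE, comment.char = "",
                   colClasses = c("character", "integer"))
nrow(kept)
kept$COMPANY_NUMBER
```

Whether the "#" is really part of the company number is for the original poster to say. The sketch only shows what read.table does with it.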

Jim

On Fri, Apr 30, 2021 at 1:41 AM e-mail ma015k3113 via R-help
<r-help at r-project.org> wrote:
#
Hallo,

Sorry, my suggestion did not work correctly in your case, as split used
natural factor ordering.

So using Jim's data, this results in desired output.

#prepare factor in original ordering
ff <- factor(xx[,1], levels=unique(xx[,1]))
lll <- split(xx$COMPANY_NUMBER, ff)
xx$I <- unlist(lapply(lll, function(x) c(rep(1, length(x)-1), max(length(x)))),
               use.names=FALSE)
xx$J <- unlist(lapply(lll, function(x) 1:length(x)), use.names=FALSE)
   COMPANY_NUMBER NUMBER_OF_YEARS I J
1           70837               3 1 1
2           70837               3 1 2
3           70837               3 3 3
4         1000403               4 1 1
5         1000403               4 1 2
6         1000403               4 1 3
7         1000403               4 4 4
8        10029943               3 1 1
9        10029943               3 1 2
10       10029943               3 3 3
11       10037980               4 1 1
12       10037980               4 1 2
13       10037980               4 1 3
14       10037980               4 4 4
15       10057418               3 1 1
16       10057418               3 1 2
17       10057418               3 3 3
18        1009550               4 1 1
19        1009550               4 1 2
20        1009550               4 1 3
21        1009550               4 4 4
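
To illustrate the ordering issue on a tiny case: the sketch below uses only two of the company numbers, with a made-up vector id. When the grouping variable is converted to a factor without explicit levels, split sorts the groups, so 1009550 comes before 10057418 and the unlisted result no longer lines up with the rows. Fixing the levels with unique() keeps the order of appearance.

```r
# Two companies in the order they appear in the data (3 rows, then 4 rows)
id <- rep(c(10057418, 1009550), times = c(3, 4))

# Default factor levels are sorted, so the 4-row company comes first
by_sorted <- split(id, id)
names(by_sorted)

# Levels in order of first appearance keep the groups aligned with the rows
by_seen <- split(id, factor(id, levels = unique(id)))
names(by_seen)

j_sorted <- unlist(lapply(by_sorted, seq_along), use.names = FALSE)
j_seen   <- unlist(lapply(by_seen,   seq_along), use.names = FALSE)
j_sorted  # 1 2 3 4 1 2 3  (misaligned with the rows)
j_seen    # 1 2 3 1 2 3 4
```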

Cheers.
Petr
#
Hello,

For column J, ave/seq_along seems to be the simplest. For column I, ave
is also a good option; it avoids split/lapply.


xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
   c(rep(1, length(x) - 1), max(length(x)))
})

xx$J <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = seq_along)


Hope this helps,

At 11:49 on 30/04/21, PIKAL Petr wrote:
#
There is something wrong here I believe -- see inline below:

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Fri, Apr 30, 2021 at 10:37 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:

            
length() returns a single integer, so max(length(x)) makes no sense.
#
Hello,

Right, thanks. It should be


xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = function(x){
         c(rep(1, length(x) - 1), length(x))  ### ???
     })
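
As a quick self-contained check of the ave approach, here is a sketch on a made-up two-company data frame (the company numbers are from the thread, the small data frame itself is an assumption). ave applies FUN within each group and puts the results back in the original row positions, so no factor re-levelling is needed.

```r
# Made-up two-company example: 3 rows, then 4 rows
xx <- data.frame(
  COMPANY_NUMBER  = rep(c(10057418, 1009550), times = c(3, 4)),
  NUMBER_OF_YEARS = rep(c(3, 4), times = c(3, 4))
)

# I: 1 on every row of a company except the last, which gets the group size
xx$I <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER,
            FUN = function(x) c(rep(1, length(x) - 1), length(x)))

# J: running counter within each company
xx$J <- ave(xx$NUMBER_OF_YEARS, xx$COMPANY_NUMBER, FUN = seq_along)

xx$I  # 1 1 3 1 1 1 4
xx$J  # 1 2 3 1 2 3 4
```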


Hope this helps,

Rui Barradas

At 19:46 on 30/04/21, Bert Gunter wrote: