Duplicate names in the pivot column

Sun, Mar 29, 2020 12:24 AM

Does this help?

df4 <- (   df
       %>% group_by( time, y )
       %>% mutate( lvl = seq.int( n() ) )
       %>% ungroup()
       %>% mutate( y = ifelse( 1==lvl
                             , y
                             , paste( y, "dup" )
                             )
                 )
       )

On March 28, 2020 6:18:51 PM PDT, phil at philipsmith.ca wrote:

I have a problem involving inefficient coding. My code works, but in my

actual application it takes a very long time to execute. I have
included 
a reprex here that uses the same code, but with a much smaller-scale 
application.

The data frame I am working with (df in my reprex) is in long form and
I 
want to change it to wide form. My problem is that the pivot column, 
column 2 in my reprex, has some duplicate strings, so the pivot doesn't

work well (df1 in my reprex). I want to find all the duplicates and tag

them so they are no longer duplicates. My code succeeds (df3 in my 
reprex). But in the real application there can be over 100 "cases" and 
the for loops grind on far too long.

I encounter this problem frequently in the datasets I use, so I am 
looking for a general solution that is as efficient as possible. Any 
help will be much appreciated.

Philip

``` r
library(tidyverse)
df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
                 y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
               z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
#> Warning: Values in `z` are not uniquely identified; output will 
contain list-cols.
#> * Use `values_fn = list(z = list)` to suppress this warning.
#> * Use `values_fn = list(z = length)` to identify where the
duplicates 
arise
#> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
fixcol <- function(dfm,cases,per,s,tag) {
  # dfm is the data frame
  # s is the target column number, containing character names
  # tag is a string to be added to a duplicate name
  # cases is the number of rows for a single time period
  # per is the number of time periods
  # all time periods must have the same number of rows
  for (k in 1:per) {
    for (i in (1+(k-1)*cases):(k*cases-1)) {
      for (j in (i+1):(k*cases)) {
        if (dfm[j,s]==dfm[i,s]) { # found a duplicate
          dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
          dfm[j,s]
        }
      }
    }
  }
  return(dfm)
}
df2 <- fixcol(df,6,2,2,"_dup")
df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
```

<sup>Created on 2020-03-28 by the [reprex 
package](https://reprex.tidyverse.org) (v0.3.0)</sup>

Sent from my phone. Please excuse my brevity.

Duplicate names in the pivot column

Thread (4 messages)