
Find "undirected" duplicates in a tibble

2 messages · Kimmo Elo, Gabor Grothendieck

#
Hi!

I am working with a large network dataset consisting of source-target
pairs stored in a tibble. Now I need to transform the directed dataset
into undirected network data. This means I need to keep only one
instance of each pair with the same nodes. In other words, if my data
has one row with A (source) and B (target) and one with B (source) and
A (target), only the pair A-B should be kept.

Here is an example of how I have solved this problem so far:

--- snip ---

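library(dplyr)  # provides %>%, mutate(), distinct(), select(); re-exports tibble()

# Create some data
x <- tibble(Source = rep(1:3, 4),
            Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
x  # print original data

# Remove "undirected" duplicates: build an order-independent key such as
# "1-2" for each row, keep the first row per key, then split the key back
x <- x %>%
  mutate(pair = mapply(function(x, y) paste0(sort(c(x, y)), collapse = "-"),
                       Source, Target)) %>%
  distinct(pair, .keep_all = TRUE) %>%
  mutate(Source = sapply(pair, function(x) unlist(strsplit(x, split = "-"))[1]),
         Target = sapply(pair, function(x) unlist(strsplit(x, split = "-"))[2])) %>%
  select(-pair)

x  # print cleaned data; note that Source and Target are now character columns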

--- snip ---
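A possible vectorised alternative (an editorial sketch, not taken from the thread): base R's pmin() and pmax() put every pair into (smaller, larger) order column by column, so no per-row mapply()/strsplit() round-trip is needed and Source and Target stay numeric. The helper column names a and b and the result name undirected are illustrative only.

```r
library(dplyr)  # dplyr re-exports tibble()

# Same example data as in the message
x <- tibble(Source = rep(1:3, 4),
            Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))

# pmin()/pmax() compare whole columns element-wise, so every row is put
# into (smaller, larger) order without any string pasting or splitting
undirected <- x %>%
  mutate(a = pmin(Source, Target),
         b = pmax(Source, Target)) %>%
  distinct(a, b) %>%                 # keeps only the columns a and b
  rename(Source = a, Target = b)

undirected  # 9 rows, Source <= Target, columns still numeric
```

Swapping distinct(a, b) for count(a, b) adds a count column n, which gives weighted pairs in the same way. pmin() and pmax() also work on character node IDs, where they compare lexically.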

The good thing about my own solution is that it also allows the
creation of weighted pairs. One just needs to replace
'distinct(pair, .keep_all=T)' with 'count(pair)', which adds a count
column n holding each pair's weight.

I have searched extensively but have not found any function providing
this functionality. Does anyone know of an alternative, perhaps a more
efficient function or solution?

Best,

Kimmo Elo
#
Since you are dealing with graphs, you could consider using
the igraph package. This is more involved than needed for
what you are asking, but it might be useful for follow-on
calculations. We first define a two-column matrix of edges, then
convert it to an undirected igraph object and simplify it to remove
duplicate edges (simplify() also removes self-loops by default),
giving g. At the end we get the edge list back as a data frame.

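  library(igraph)

  # 8 edges as a 2-column matrix (column 1 = source, column 2 = target)
  m <- matrix(c(1, 2, 6, 6, 4, 9, 1, 5, 2, 1, 8, 7, 5, 10, 6, 10), 8, 2)

  # Build an undirected graph, then drop duplicate edges (and self-loops)
  g <- m |>
    graph_from_edgelist(directed = FALSE) |>
    simplify()

  plot(g)

  # Convert back to a data frame of edges
  # (as_edgelist() replaces the deprecated get.edgelist())
  g |>
    as_edgelist() |>
    as.data.frame()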
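If weighted pairs are also wanted on the igraph route, one possible sketch (an editorial addition, not from the original reply) is to give every edge a weight of 1 and let simplify() sum the weights of merged edges through its edge.attr.comb argument. Passing remove.loops = FALSE keeps self-pairs such as 1-1 from the tibble example, which simplify() would otherwise drop. The result name edges is illustrative.

```r
library(igraph)

# Same edge matrix as above: rows 1 and 2 (1-2 and 2-1) are the same
# undirected edge
m <- matrix(c(1, 2, 6, 6, 4, 9, 1, 5, 2, 1, 8, 7, 5, 10, 6, 10), 8, 2)

g <- graph_from_edgelist(m, directed = FALSE)
E(g)$weight <- 1  # every original row counts once

# Merge duplicate edges and sum their weights; keep self-loops
# (simplify() removes them by default)
g <- simplify(g, remove.loops = FALSE, edge.attr.comb = list(weight = "sum"))

# Data frame with columns from, to and weight
edges <- igraph::as_data_frame(g, what = "edges")
edges
```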
On Fri, Aug 20, 2021 at 5:00 AM Kimmo Elo <kimmo.elo at utu.fi> wrote: