Decompose df1 into another df2 based on values in df1

Thank you for the reprex. However, your specification was too vague for me
to know exactly what your data look like, so I assumed the most general
case, with the consequence that I may be answering the wrong question.
Hopefully you can adjust as needed to get what you want.

I should also warn you that I am nearly certain there are more elegant,
cleverer, and faster ways to do this; I just used simple tools. So you may
wish to wait a bit to see whether others can improve on my attempt.

First, I assumed that "a2/a3" in column S5 of d1 is a typo for "a2|a3".
If it is not a typo, substitute "\\||\\/" for "\\|" in the strsplit() call
in the code that follows.
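A quick check of the two patterns on a few hypothetical cell values (made
up for illustration, not taken from d1): splitting on "\\|" alone leaves
"a2/a3" intact, while "\\||\\/" splits on either character. The character
class "[|/]" should be an equivalent pattern that needs no escaping.

```r
# Hypothetical cell values: one uses "|", one uses "/", one contains "w"
x <- c("a2|a3", "a2/a3", "b1|w")

# Splitting on "|" only: the second element stays as the single token "a2/a3"
strsplit(x, "\\|")

# Alternation: match either a literal "|" or a literal "/"
parts <- strsplit(x, "\\||\\/")
parts

# Same split written as a bracket expression
strsplit(x, "[|/]")
```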
Second, I assumed that your identifiers ("a1", for example) could occur
more than once within a single column. If the only possibilities are 0 or 1
times, then the code I provided (in particular the last sapply()) is more
complicated than necessary. A faster approach in that case might be to use
R's outer() function; I leave that as an exercise for you, or for someone
else to help you with.
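One possible sketch of the outer() idea, assuming each identifier occurs at
most once per column so that a 0/1 presence matrix is all that is needed.
The list allvals below mimics what lapply(d1, getall) returns but holds
hypothetical toy values; the helper present() and the "column|identifier"
keys are illustrative choices, not part of the original answer. Building the
keys once lets the function passed to outer() be fully vectorized with %in%.

```r
# Hypothetical per-column identifier vectors (the shape lapply(d1, getall) gives)
allvals <- list(S1 = c("a1", "b1"), S2 = c("b1", "c2"), S3 = character(0))
uneeks <- sort(unique(unlist(allvals)))

# One "column|identifier" key per observed pair; "|" cannot occur inside an
# identifier because identifiers were produced by splitting on "|"
keys <- paste(rep(names(allvals), lengths(allvals)),
              unlist(allvals, use.names = FALSE), sep = "|")

# Vectorized membership test: outer() calls this once with all
# (identifier, column) pairs and expects a result of the same length
present <- function(id, col) as.integer(paste(col, id, sep = "|") %in% keys)

m <- outer(uneeks, names(allvals), present)
dimnames(m) <- list(uneeks, names(allvals))
m
```

Note that an identifier repeated within one column would still show as 1
here, which is exactly why this shortcut only fits the 0-or-1 case.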

Here is my code for your reprex:

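# Split each cell of a column on "|" and drop the placeholder "w";
# as.character() guards against columns stored as factors
getall <- function(x){
   ul <- unlist(strsplit(as.character(x), "\\|"))
   ul[ul != "w"]
}
# One vector of identifiers per column of d1
allvals <- lapply(d1, getall)
# Every distinct identifier, sorted, used as common factor levels
uneeks <- sort(unique(unlist(allvals)))
# Count each identifier per column; shared levels give every column the same rows
sapply(allvals, function(x) table(factor(x, levels = uneeks)))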


## which gives
   S1 S2 S3 S4 S5
a1  1  0  0  0  0
a2  1  0  1  0  1
a3  0  0  0  0  1
b1  1  1  1  0  0
b3  1  0  1  0  0
b4  0  0  1  1  0
c1  0  0  1  0  0
c2  0  1  0  0  0
c4  0  0  1  1  0
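For anyone who wants to try this without the original data, here is a
self-contained sketch that runs the code above on a small hypothetical d1
(invented for illustration, not the data from the question) and compares it
with one alternative: a single table() cross-tabulation of identifier by
column name. That alternative is offered only as one possibility, not as
part of the original reply. Making the column variable a factor with
levels = names(allvals) keeps the original column order and retains a
column whose cells are all "w".

```r
# Hypothetical toy data (NOT the poster's d1): identifiers separated by "|",
# with "w" as a placeholder to ignore; "a1|a1" shows a repeat within a column
d1 <- data.frame(
  S1 = c("a1|b1", "w"),
  S2 = c("b1", "c2|b1"),
  S3 = c("w", "a1|a1"),
  stringsAsFactors = FALSE
)

getall <- function(x) {
  ul <- unlist(strsplit(as.character(x), "\\|"))
  ul[ul != "w"]
}
allvals <- lapply(d1, getall)
uneeks <- sort(unique(unlist(allvals)))

# Approach from the reply: one table() per column over a common set of levels
counts1 <- sapply(allvals, function(x) table(factor(x, levels = uneeks)))

# Alternative: one cross-tabulation of identifier by (factor) column name
counts2 <- table(
  id  = unlist(allvals, use.names = FALSE),
  col = factor(rep(names(allvals), lengths(allvals)), levels = names(allvals))
)

counts1
counts2
```

For this toy input both give a1 = 1, 0, 2; b1 = 1, 2, 0; c2 = 0, 1, 0
across S1, S2, S3; counts2 additionally labels its dimensions id and col.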

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, May 26, 2021 at 2:18 PM Adrian Johnson <oriolebaltimore at gmail.com>
wrote: