
Message-ID: <cc9903ac-59fa-a01f-41c7-49b28e56f499@molconn.com>
Date: 2020-09-23T15:37:24Z
From: LMH
Subject: Split
In-Reply-To: <CAJOiR6aDGTjvN_s4DOK_3qGJjEQ=N55SY1cX-mee5KswrB_nnQ@mail.gmail.com>

What is the delimiter in the input data? Is it tab, space, etc.?

Is this going to be the same for the output data that you will use for R input?
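Why the delimiter matters: read.table() uses it to decide where one field ends and the next begins, and write.table() uses it when producing a file for later R input. A minimal sketch, assuming a hypothetical tab-delimited sample (not Val's real file) in which one text field contains a space. With sep = "\t" that field stays whole, whereas the default sep = "" treats any run of whitespace as a separator. The same sep is reused on output so the file reads back with the same values.

```r
# Hypothetical tab-delimited sample: the last text field contains a space
txt <- "ID1\tID2\ttext\nA1\tB1\tcf_12\nA2\tB2\tX2 25"

# sep = "\t" splits on tabs only, so "X2 25" stays a single field;
# the default sep = "" would also split on the space
F_tab <- read.table(text = txt, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
print(F_tab)

# Write with the same delimiter so the file can be read back the same way
out <- tempfile(fileext = ".tsv")
write.table(F_tab, out, sep = "\t", quote = FALSE, row.names = FALSE)
F_back <- read.table(out, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(F_back)
```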

LMH


Val wrote:
> Thank you all for the help!
> 
> LMH, yes, I would like to see the alternative. I am using this for a
> large data set, and if the alternative is more efficient than this,
> then I would be happy.
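A fully vectorised variant avoids both the per-row lapply() and the do.call(rbind, ...) step by extracting the two parts with sub(). This is a sketch, assuming each string contains at most one "_" and that "." is the wanted placeholder when there is no second part; whether it is faster on the real data set is an assumption to check with system.time(), not a measured result.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Indicator: 1 if the string contains "_", else 0 (vectorised over all rows)
has_us <- grepl("_", F1$text, fixed = TRUE)
F1$Y1 <- as.integer(has_us)

# X1: everything before the first "_" (the whole string when there is none)
F1$X1 <- sub("_.*$", "", F1$text)

# X2: everything after the first "_", or "." when the string has no "_"
F1$X2 <- ifelse(has_us, sub("^[^_]*_", "", F1$text), ".")

# Drop the original column and show the result
F1$text <- NULL
print(F1)
```

In R, NA_character_ is the more conventional marker for a missing second part; the "." here only mirrors the desired output in the question.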
> 
> On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> To be clear, I think Rui's solution is perfectly fine and probably better than what I offer below. But just for fun, I wanted to do it without the lapply().  Here is one way. I think my comments suffice to explain.
>>
>>> ## which are the  non "_" indices?
>>> wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
>>> ## paste "_." to these
>>> F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
>>> ## Now strsplit() and unlist() them to get a vector
>>> z <- unlist(strsplit(F1$text, "_"))
>>> ## now cbind() to the data frame
>>> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>>> F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>>> ## You can change the names of the 2 columns yourself
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
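Bert's version assumes every string splits into exactly two pieces; a string with two underscores (for example a hypothetical "a_b_c") would shift all later rows of the byrow matrix. A sketch of the same approach, assuming the sample data from the question, that checks this assumption with stopifnot() and names the new columns through dimnames so no renaming step is needed:

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Rows whose text has no "_" get "_." appended, as in Bert's version
wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")

# Split, then check that every string gave exactly 2 pieces;
# otherwise the byrow matrix below could silently misalign rows
pieces <- strsplit(F1$text, "_", fixed = TRUE)
stopifnot(all(lengths(pieces) == 2))

# Build the 2-column matrix with names already attached, then bind
parts <- matrix(unlist(pieces), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("X1", "X2")))
F1 <- cbind(F1, parts)
print(F1)
```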
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> A base R solution with strsplit, like in your code.
>>>
>>> F1$Y1 <- +grepl("_", F1$text)
>>>
>>> tmp <- strsplit(as.character(F1$text), "_")
>>> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>>> tmp <- do.call(rbind, tmp)
>>> colnames(tmp) <- c("X1", "X2")
>>> F1 <- cbind(F1[-3], tmp)    # remove the original column
>>> rm(tmp)
>>>
>>> F1
>>> #  ID1 ID2 Y1   X1 X2
>>> #1  A1  B1  0 NONE  .
>>> #2  A1  B1  1   cf 12
>>> #3  A1  B1  0 NONE  .
>>> #4  A2  B2  1   X2 25
>>> #5  A2  B3  1   fd 15
>>>
>>>
>>> Note that cbind dispatches on F1, an object of class "data.frame".
>>> Therefore it's the method cbind.data.frame that is called and the result
>>> is also a df, though tmp is a "matrix".
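A quick way to confirm which method runs is to inspect the class of the result. A minimal sketch with hypothetical toy objects; it also illustrates that the data frame method is chosen when any argument is a data frame, not only when the data frame comes first:

```r
# Hypothetical toy objects: a one-column data frame and a character matrix
df <- data.frame(id = 1:2)
m <- matrix(c("a", "b", "c", "d"), ncol = 2)

res1 <- cbind(df, m)  # data frame first: data.frame method
res2 <- cbind(m, df)  # data frame second: still the data.frame method
res3 <- cbind(m, m)   # no data frame involved: default method, a matrix

print(class(res1))
print(class(res2))
print(class(res3))
```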
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> On 22/09/20 at 20:07, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> Something like this?
>>>>
>>>>
>>>> F1$Y1 <- +grepl("_", F1$text)
>>>> F1 <- F1[c(1, 2, 4, 3)]
>>>> F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill =
>>>> "right")
>>>> F1
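One difference from the desired output: with fill = "right", separate() pads the missing second piece with NA rather than ".". A sketch, assuming the sample data from the question and that tidyr may not be installed (hence the requireNamespace() guard), that replaces those NAs afterwards. Recent tidyr releases mark separate() as superseded by separate_wider_delim(), though it still works.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

F1$Y1 <- +grepl("_", F1$text)
F1 <- F1[c(1, 2, 4, 3)]

# Only run the tidyr step when the package is available
if (requireNamespace("tidyr", quietly = TRUE)) {
  F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_",
                        fill = "right")
  # fill = "right" leaves NA in X2 for strings without "_"; use "." instead
  F1$X2[is.na(F1$X2)] <- "."
  print(F1)
}
```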
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> On 22/09/20 at 19:55, Val wrote:
>>>>> Hi all,
>>>>>
>>>>> I am trying to create new columns based on the string content of
>>>>> another column. First I want to identify rows that contain a particular
>>>>> string. If a row contains it, I want to split the string and create two
>>>>> variables.
>>>>>
>>>>> Here is a sample of my data:
>>>>> F1 <- read.table(text = "ID1  ID2  text
>>>>> A1 B1   NONE
>>>>> A1 B1   cf_12
>>>>> A1 B1   NONE
>>>>> A2 B2   X2_25
>>>>> A2 B3   fd_15", header = TRUE, stringsAsFactors = FALSE)
>>>>>
>>>>> If the variable "text" contains "_", I want to create an indicator
>>>>> variable, as shown below:
>>>>>
>>>>> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
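The ifelse() call works, but the same 0/1 indicator can be produced directly from the logical vector that grepl() returns: as.integer() converts it explicitly, and the unary + used elsewhere in this thread does the same coercion. A minimal sketch on the question's text values; fixed = TRUE is an optional addition that matches "_" literally instead of as a regular expression:

```r
# Text values from the question
text <- c("NONE", "cf_12", "NONE", "X2_25", "fd_15")

# Original approach: ifelse() returns a double vector of 0s and 1s
y_ifelse <- ifelse(grepl("_", text), 1, 0)

# Explicit conversion of the logical vector to integer 0/1
y_int <- as.integer(grepl("_", text, fixed = TRUE))

# Unary plus coerces logical to integer as well
y_plus <- +grepl("_", text, fixed = TRUE)

print(y_int)
```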
>>>>>
>>>>>
>>>>> Then I want to split that string into two, before "_" and after "_",
>>>>> and create two variables, as shown below:
>>>>> x1 <- strsplit(as.character(F1$text), "_", fixed = TRUE)
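strsplit() returns a list with one character vector per input string, and here those vectors differ in length, which is why the result cannot simply be bound onto the data frame as two columns. A minimal sketch on the question's text values; note also that strsplit() has no argument that limits the number of splits, so every "_" splits the string:

```r
# Text values from the question
text <- c("NONE", "cf_12", "NONE", "X2_25", "fd_15")

# One list element per input string
x1 <- strsplit(text, "_", fixed = TRUE)
str(x1)

# Number of pieces per element: 1 where there is no "_", 2 otherwise
n_pieces <- lengths(x1)
print(n_pieces)

# Every "_" splits the string; there is no "split at most once" option
print(strsplit("a_b_c", "_", fixed = TRUE)[[1]])
```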
>>>>>
>>>>> My problem is how to combine this with the original data frame. The
>>>>> desired output is shown below:
>>>>>
>>>>> ID1 ID2 Y1 X1   X2
>>>>> A1  B1  0  NONE .
>>>>> A1  B1  1  cf   12
>>>>> A1  B1  0  NONE .
>>>>> A2  B2  1  X2   25
>>>>> A2  B3  1  fd   15
>>>>>
>>>>> Any help?
>>>>> Thank you.
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
> 
>