Regex to stop at first capital letter after sequence
On Dec 19, 2016, at 1:25 PM, Omar André Gonzáles Díaz <oma.gonzales at gmail.com> wrote:

I have the following strings:

[1] "PPA 06 - Promo Vasito"
[2] "PPA 05 - Cuentos"
[3] "PPA 04 - Promo vasito"
[4] "PPA 03 - Promoción escolar"
[5] "PPA - Saluda a tu pediatra"
[6] "PPL - Dia del Pediatra"

*Desired result*:

[1] "Promo Vasito" "Cuentos" "Promo vasito"
[4] "Promoción escolar" "Saluda a tu pediatra" "Dia del Pediatra"
All this assumes you are passing a character vector to sub. The combination of your subject line and the example is a bit underspecified. Here are two solutions: one delivers everything beginning with the last capital letter after the (last) dash, and the other delivers everything after, but not including, the last <dash><spc> sequence:
sub("^.+[-].+(?=[A-Z])", "" , dat, perl=TRUE) # need perl=TRUE for PCRE look-ahead
[1] "Vasito"               "Cuentos"
[3] "Promo vasito"         "Promoción escolar"
[5] "Saluda a tu pediatra" "Pediatra"

Greedy matching above; ungreedy matching, set by '(?U)', below:
sub("(?U)^.+[-].+(?=[A-Z])", "" , dat, perl=TRUE)
[1] "Promo Vasito"         "Cuentos"
[3] "Promo vasito"         "Promoción escolar"
[5] "Saluda a tu pediatra" "Dia del Pediatra"
sub("^.+[-][ ]", "" , dat) # character classes to define sequence.
[1] "Promo Vasito"         "Cuentos"
[3] "Promo vasito"         "Promoción escolar"
[5] "Saluda a tu pediatra" "Dia del Pediatra"
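For anyone who wants to reproduce the three calls above, here is a minimal self-contained sketch. The vector name `dat` comes from the calls above; its contents are copied from the strings in the question, with the accented letter written as a `\u00f3` escape so the script does not depend on file encoding. The `lazy` variant is an addition for comparison, not part of the original reply: it uses explicit lazy quantifiers (`.+?`) in place of the `(?U)` flag.

```r
# Sample data copied from the question; \u00f3 is the accented "o".
dat <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos", "PPA 04 - Promo vasito",
         "PPA 03 - Promoci\u00f3n escolar", "PPA - Saluda a tu pediatra",
         "PPL - Dia del Pediatra")

# Greedy: removes everything up to the LAST capital letter after a dash.
greedy <- sub("^.+[-].+(?=[A-Z])", "", dat, perl = TRUE)

# Ungreedy via (?U): removes everything up to the FIRST capital after the first dash.
ungreedy <- sub("(?U)^.+[-].+(?=[A-Z])", "", dat, perl = TRUE)

# Same behaviour spelled with lazy quantifiers instead of the (?U) flag.
lazy <- sub("^.+?[-].+?(?=[A-Z])", "", dat, perl = TRUE)

# No look-ahead needed: removes everything through the last "- " sequence.
after_dash <- sub("^.+[-][ ]", "", dat)

print(greedy)
print(ungreedy)
print(identical(ungreedy, lazy))
print(identical(ungreedy, after_dash))
```

On this particular data the ungreedy and dash-based versions agree; they would differ if the text after the dash started with a lower-case letter or if a string contained more than one dash.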
*First attempt*:
After this line:
mead_nov$`Nombre del anuncio` <- gsub("(PPA.*)([A-Z].*)", "\\2",
mead_nov$`Nombre del anuncio`)
I get these:
[1] "Vasito"
[2] "Cuentos"
[3] "Promo vasito"
[4] "Promoción escolar"
[5] "Saluda a tu pediatra"
[6] "PPL - Dia del Pediatra"
*Second attempt*:
mead_nov$`Nombre del anuncio` <- gsub("(PPA|PPL.*)([A-Z].*)", "\\2",
mead_nov$`Nombre del anuncio`)
I get this:
[1] "PPA 06 - Promo Vasito" [2] "PPA 05 - Cuentos"
[3] "PPA 04 - Promo vasito" [4] "PPA 03 - Promoción escolar"
[5] "PPA - Saluda a tu pediatra" [6] "Pediatra"
Thank you for your help.
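A note on why the two attempts behave as shown, with a hedged sketch of one way to repair them. In the first attempt, the greedy `.*` runs to the last capital letter and the pattern never matches the "PPL" string. In the second, alternation has the lowest precedence, so `(PPA|PPL.*)` means "PPA" or "PPL.*", and the "PPA" branch then requires a capital letter immediately after "PPA". The sketch below groups the prefixes and uses the negated class `[^A-Z]*` to stop at the first capital letter. It assumes the only prefixes are "PPA" and "PPL" and that no capital letter appears between the prefix and the wanted text; the vector name `x` is a stand-in for the data frame column.

```r
# Stand-in for the column in the question; \u00f3 is the accented "o".
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos", "PPA 04 - Promo vasito",
       "PPA 03 - Promoci\u00f3n escolar", "PPA - Saluda a tu pediatra",
       "PPL - Dia del Pediatra")

# (PPA|PPL)  : either prefix, grouped so the alternation applies to both
# [^A-Z]*    : skip everything that is not a capital letter (digits, spaces, dash)
# ([A-Z].*)$ : capture from the first capital letter to the end of the string
fixed <- sub("^(PPA|PPL)[^A-Z]*([A-Z].*)$", "\\2", x, perl = TRUE)

print(fixed)
```

Because each string is matched once from its start, `sub` is sufficient here; `gsub` is not needed.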
David Winsemius Alameda, CA, USA