Help with text separation

6 messages · Michael Griffiths, PIKAL Petr, Sarah Goslee +1 more

#
Hi,

On Mon, Nov 14, 2011 at 4:20 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
You just want to get rid of them? gsub() it is.

I've changed your formula a little bit to better demonstrate what's going on:
[1] "~ A + C / D + E + E / F * G + H + I + J + K"

That regular expression will take out a
space
+
any capital letter
space
*
space
any capital letter.

It will take out all occurrences of that sequence, but won't take out
occurrences of * not in that sequence.

If you don't want the spaces, you don't need them. Just take them out
of the regular expression as well.
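
The call itself did not survive in the archive, so here is a minimal sketch of the kind of gsub() call described above. The input strings and the exact patterns (including the space after the +) are assumptions reconstructed from the description, not the original code.

```r
# Hypothetical formula string; not the poster's original data.
form <- "~ A + B * C + C / D + E / F * G + H"

# Assumed pattern: space, +, space, capital letter, space, *, space, capital letter.
with_spaces <- gsub(" \\+ [A-Z] \\* [A-Z]", "", form)
print(with_spaces)

# Same idea with the spaces dropped from both the string and the pattern.
compact <- "~A+B*C+C/D+E/F*G+H"
no_spaces <- gsub("\\+[A-Z]\\*[A-Z]", "", compact)
print(no_spaces)
```

In this sketch " + B * C" is removed because it matches the whole sequence, while "F * G" stays because it follows a / rather than a +.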

Not that strsplit() was remotely the right tool here, but you can
split into characters without a separator:
[[1]]
[1] "a" "b" "c" "d"
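
The call that produced this output was not preserved. A call of the following form gives the same result; the input "abcd" is inferred from the output shown.

```r
# An empty split string breaks the input into single characters.
chars <- strsplit("abcd", "")
print(chars[[1]])
```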

Sarah
#
Hi

r-help-bounces at r-project.org wrote on 14.11.2011 14:54:05:

Hm. I am not at all an expert in regular expressions, but recently I
learned some ways (thanks Uwe):

sub("^(~)\\+(.+)\\+$", "\\1\\2", gsub("[[:alnum:]]+\\*[[:alnum:]]+", "", 
form))
[1] "~Intro+Intro/Intro1++benefit+benefit/benefit1+product+action+mean"

This removes every xxxxxx*yyyyy term from your form, together with a
leading and a trailing +.

I wonder whether any automatic process can remove only one of several
xxxxxx*yyyyy substrings.
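
On removing only one of several such terms: sub() replaces only the first match, while gsub() replaces all of them. A minimal sketch, using a made-up formula string and a pattern that also consumes the preceding +:

```r
# Hypothetical formula string with two x*y terms.
form <- "~Intro+a*b+benefit+c*d+mean"

# A "+" followed by name*name, using the same character class as above.
term <- "\\+[[:alnum:]]+\\*[[:alnum:]]+"

first_only <- sub(term, "", form)   # replaces the first match only
every_one  <- gsub(term, "", form)  # replaces every match
print(first_only)
print(every_one)
```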

Regards
Petr

PS: it is still not perfect, as one extra + remains in the middle.
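
The leftover doubled + could be collapsed in a follow-up step. This is one possible tidy-up, not something proposed in the thread; it assumes a run of + signs never carries meaning in the formula string.

```r
# Result string from above, containing a doubled "+".
x <- "~Intro+Intro/Intro1++benefit+benefit/benefit1+product+action+mean"

# Collapse any run of "+" into a single "+".
tidy <- gsub("[+]+", "+", x)
print(tidy)
```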
#
Hi,

On Mon, Nov 14, 2011 at 8:54 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
Regular expressions are *very* powerful, so yes. You should read a good
intro to regular expressions, and pay careful attention to the word markers,
then take a look at the specifics of R's implementation.

Why do I send you to the help? Because the possible answers all look a
lot like this:
[1] "~Sentence*LEGAL+Intro+Intro/Intro1+benefit+benefit/benefit1+product+action+mean"
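
The call behind this output is missing from the archive. As one illustration of targeting a single term with a word marker, the sketch below removes one named term and leaves the rest alone. The formula strings and the term name Intro*LEGAL are assumptions made for the example, not the original input or Sarah's pattern.

```r
# Hypothetical input; only the "+Intro*LEGAL" term should go.
form <- "~Sentence*LEGAL+Intro+Intro*LEGAL+mean"

# \\b is a word boundary: LEGAL must end here, so LEGAL2 would not match.
target <- "\\+Intro\\*LEGAL\\b"
out <- gsub(target, "", form, perl = TRUE)
print(out)

# A longer name that merely starts with LEGAL is left untouched.
other <- gsub(target, "", "~Intro+Intro*LEGAL2", perl = TRUE)
print(other)
```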

Sarah
#
On Nov 14, 2011, at 4:20 AM, Michael Griffiths wrote:

            
This would be a very narrow implementation that requires the
+/spc/alnum/spc/*/alnum sequence exactly:

 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "

This is a more general implementation using the "*" quantifier, which
matches the preceding item 0 or more times.

 form <- c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M',
           '~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M',
           '~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M')
 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[2] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[3] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
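
The results above keep a trailing space because the match starts at the + rather than at the space before it. A stricter variant, offered as a sketch and not as the poster's pattern, requires at least one character on each side of the * and also consumes the surrounding whitespace:

```r
form <- c("~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M",
          "~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M",
          "~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M")

# Optional whitespace, a literal "+", a name, a literal "*", a name.
# The "+" after each [[:alnum:]] class means "one or more".
strict <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

cleaned <- sub(strict, "", form)
print(cleaned)
```

All three inputs reduce to the same string, ending in "K" with no trailing space.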

