Message-ID: <8DA0B307-E961-4877-8D07-A9FCC7639631@comcast.net>
Date: 2011-11-14T17:05:13Z
From: David Winsemius
Subject: Help with text separation
In-Reply-To: <CACMqikdhnSO+Y4hkYRtByR2J6qCUPuEJrE3ubj2TvPtT-HRD_g@mail.gmail.com>

On Nov 14, 2011, at 4:20 AM, Michael Griffiths wrote:

> Good morning R list,
>
> My apologies if this has *already* been answered elsewhere, but I have not
> found the answer that I am looking for.
>
> I have a character string, i.e.
>
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and also
> the variable name after the '*'. So, here I would like to remove '+L*M'

This would be a very narrow implementation that requires the
+/spc/alnum/spc/*/alnum sequence exactly:

 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
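For comparison, here is a sketch of what a strictly literal version of that sequence might look like. The pattern as displayed above carries a "*" quantifier on most elements, so it would also accept looser spacing and longer names. The name `strict` is made up for this example, and the sketch assumes the default regex engine used by sub().

```r
form <- c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')

# Hypothetical strict pattern (not from the original post): exactly one "+",
# one whitespace character, one alphanumeric, one whitespace character,
# a literal "*", any single character, and one alphanumeric.
strict <- "\\+\\s[[:alnum:]]\\s\\*.[[:alnum:]]"

sub(strict, "", form)          # removes "+ L * M"
sub(strict, "", "~ K + L*M")   # spacing differs, so nothing is removed
```

With no quantifiers, any deviation in spacing or name length makes the match fail and sub() returns its input unchanged.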

This is a more general implementation using the "*" quantifier, which
matches the preceding item 0 or more times.

  form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M',
          '~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M',
          '~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M')
 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[2] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[3] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
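If a formula could hold more than one product term, or if the positions of the '*' characters are wanted as in the original question, something along these lines may help. It is only a sketch under assumptions. The input `f` and the names `star_pos` and `pat` are made up for the example. The pattern also differs from the one above: it uses "\\s*" where that one has "." after the star, requires at least one alphanumeric on each side, and consumes leading whitespace.

```r
f <- "~ A + B * C + D + L*M"   # made-up input with two product terms

# Character positions of every literal "*"; fixed = TRUE disables regex
# interpretation, so "*" is treated as a plain character.
star_pos <- gregexpr("*", f, fixed = TRUE)[[1]]
as.integer(star_pos)

# Optional spaces, optional "+", optional spaces, a name, optional spaces,
# a literal "*", optional spaces, a name.
pat <- "\\s*\\+?\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

# gsub() removes every match; sub() would remove only the first one.
gsub(pat, "", f)
```

A product term at the very start of the formula, or names containing "." or "_", would need a different pattern, since [[:alnum:]] covers only letters and digits.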


---stripped out code---

-- 
David Winsemius, MD
West Hartford, CT