
& and |

14 messages · Eric Berger, Ivan Calandra, Henrik Bengtsson +4 more

#
Dear useRs,

I feel really stupid, but I cannot understand why "&" doesn't work as I
expect, while "|" does.

I have the following vector:
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
"SSFA-ConfoMap_Lithics_NMPfilled.csv",
"SSFA-ConfoMap_Sheeps_NMPfilled.csv", "SSFA-Toothfrax_GuineaPigs.xlsx",
"SSFA-Toothfrax_Lithics.xlsx", "SSFA-Toothfrax_Sheeps.xlsx")
and I want to find the values that include both "ConfoMap" and "GuineaPigs".

If I do:
grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
it returns an empty vector, character(0).

But if I do:
grep("ConfoMap|GuineaPigs", mydata, value=TRUE)
it returns all the elements that include either "ConfoMap" or
"GuineaPigs", as I would expect.

So what is wrong with my "&" construct? How can I return the elements
that include both parts?

Thank you for your help!
Ivan
#
"&" is not a regex metacharacter.
See ?regexp
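A small illustration of the point: in a regular expression "&" is an ordinary character, so it only matches a literal "&", while "|" is alternation. The vector x is hypothetical (two of Ivan's file names plus one made-up string that really contains "&").

```r
# Hypothetical test vector; the third element contains a literal "&".
x <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
       "SSFA-Toothfrax_GuineaPigs.xlsx",
       "ConfoMap&GuineaPigs")

# "&" is not a metacharacter: this looks for the exact text "ConfoMap&GuineaPigs".
grep("ConfoMap&GuineaPigs", x, value = TRUE)
# [1] "ConfoMap&GuineaPigs"

# "|" is alternation: an element matches if it contains either word.
grep("ConfoMap|GuineaPigs", x)
# [1] 1 2 3
```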

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Wed, Aug 19, 2020 at 7:53 AM Ivan Calandra <calandra at rgzm.de> wrote:
#
mydata[ intersect( grep("ConfoMap", mydata), grep("GuineaPigs", mydata)  ) ]
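The intersect() idea extends to any number of patterns by folding intersect() over the index vectors returned by grep(). This is a minimal sketch; the helper name grep_all() is hypothetical, not part of base R.

```r
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

# Hypothetical helper: indices of the elements of x that match *every* pattern.
# lapply() runs one grep() per pattern; Reduce(intersect, ...) keeps only the
# indices that appear in all of the results.
grep_all <- function(patterns, x) {
  Reduce(intersect, lapply(patterns, grep, x = x))
}

mydata[grep_all(c("ConfoMap", "GuineaPigs"), mydata)]
# [1] "SSFA-ConfoMap_GuineaPigs_NMPfilled.csv"
```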
On Wed, Aug 19, 2020 at 6:13 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
#
Thank you Bert for the pointer.

So I guess the solution is:
grep("ConfoMap.+GuineaPigs", mydata, value=TRUE)

This is not the case here, but what if "GuineaPigs" comes before
"ConfoMap"?
Of course I could do two "grep()" calls, but is there a better solution?

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 19/08/2020 17:07, Bert Gunter wrote:
#
Thank you Eric, I didn't think about intersect().

Now I'm trying to do that in tidyverse with pipes, and I think that's
too much for me for now!

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 19/08/2020 17:17, Eric Berger wrote:
#
Well... wouldn't it be:

grep("(ConfoMap.*GuineaPigs)|(GuineaPigs.*ConfoMap)", mydata, value=TRUE)

Bert Gunter
On Wed, Aug 19, 2020 at 8:23 AM Ivan Calandra <calandra at rgzm.de> wrote:
#
Indeed!
I was just hoping that there would be a shorter way... intersect() is a
nice alternative too. Maybe I can make it work with pipes so that I
don't have to repeat "mydata" but that's another story.
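One way to avoid repeating "mydata" in a pipe is base R's Filter(), which applies a predicate to each element. This sketch assumes R >= 4.1.0 for the native pipe |> (added after this thread); with magrittr's %>% the same call should work. In the tidyverse, stringr::str_detect() plays the role of grepl().

```r
# Requires R >= 4.1.0 for the native pipe |>.
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

# The pipe passes mydata as Filter()'s x argument; f is called on each single
# string, so the scalar operator && is appropriate here.
res <- mydata |>
  Filter(f = function(s) grepl("ConfoMap", s) && grepl("GuineaPigs", s))
res
# [1] "SSFA-ConfoMap_GuineaPigs_NMPfilled.csv"
```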

Thank you for the help!
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 19/08/2020 17:31, Bert Gunter wrote:
#
A version of Eric's answer is to use grepl(), which returns a logical vector:

mydata[grepl("ConfoMap", mydata) & grepl("GuineaPigs", mydata)]

with the OR analogue:

mydata[grepl("ConfoMap", mydata) | grepl("GuineaPigs", mydata)]
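The grepl() version generalizes to any number of patterns by folding `&` (all patterns) or `|` (any pattern) over the logical vectors. A minimal sketch; matches_all() and matches_any() are hypothetical helper names.

```r
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

# Hypothetical helpers: one grepl() per pattern, combined element-wise.
matches_all <- function(x, patterns) Reduce(`&`, lapply(patterns, grepl, x = x))
matches_any <- function(x, patterns) Reduce(`|`, lapply(patterns, grepl, x = x))

pats <- c("ConfoMap", "GuineaPigs")
mydata[matches_all(mydata, pats)]
# [1] "SSFA-ConfoMap_GuineaPigs_NMPfilled.csv"

# The OR version selects elements 1 to 4, the same as the regex "ConfoMap|GuineaPigs".
mydata[matches_any(mydata, pats)]
```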

/Henrik
On Wed, Aug 19, 2020 at 8:24 AM Ivan Calandra <calandra at rgzm.de> wrote:
#
Instead of intersect you could use grepl(pattern1,x) &
grepl(pattern2,x).  Use which() on the result if you must have
integers, but the logicals that grepl() produces are often easier to
use as subscripts.
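To make the point concrete: the logical vector can be used directly as a subscript, and which() turns it into the same integer indices that the intersect() approach produces.

```r
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

hits <- grepl("ConfoMap", mydata) & grepl("GuineaPigs", mydata)
hits
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE

which(hits)   # integer positions of the TRUE values
# [1] 1

# Same indices as the intersect() of two grep() calls:
identical(which(hits), intersect(grep("ConfoMap", mydata), grep("GuineaPigs", mydata)))
# [1] TRUE
```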

Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Aug 19, 2020 at 8:54 AM Ivan Calandra <calandra at rgzm.de> wrote:
#
On Wed, Aug 19, 2020 at 7:53 AM Ivan Calandra <calandra at rgzm.de> wrote:
| 
| I have the following vector:
| 	mydata <- 
| 	c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv", 
| 	"SSFA-ConfoMap_Lithics_NMPfilled.csv", 
| 	"SSFA-ConfoMap_Sheeps_NMPfilled.csv", 
| 	"SSFA-Toothfrax_GuineaPigs.xlsx", 
| 	"SSFA-Toothfrax_Lithics.xlsx", 
| 	"SSFA-Toothfrax_Sheeps.xlsx")
| and I want to find the values that 
| include both "ConfoMap" and 
| "GuineaPigs".

Dear Ivan,

I also found this[1], so this line
returns 1 (the index of the matching
element), like many of the other
suggestions:

	grep("(.*ConfoMap)(.*GuineaPigs)", mydata)
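Note that this pattern, like "ConfoMap.+GuineaPigs", fixes the order of the two words; the alternation pattern suggested earlier in the thread covers both orders. The strings in y are hypothetical.

```r
# Hypothetical strings: the same two words in both orders.
y <- c("SSFA-ConfoMap_GuineaPigs.csv", "SSFA-GuineaPigs_ConfoMap.csv")

# Order-dependent: ConfoMap must come before GuineaPigs.
grepl("(.*ConfoMap)(.*GuineaPigs)", y)
# [1]  TRUE FALSE

# Alternation of both orders matches either arrangement.
grepl("(ConfoMap.*GuineaPigs)|(GuineaPigs.*ConfoMap)", y)
# [1] TRUE TRUE
```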

Best,
Rasmus

[1] https://stackoverflow.com/questions/13187414/r-grep-is-there-an-and-operator

#
There are & and | operators in the R language.
There is an | operator in regular expressions.
There is NOT any & operator in regular expressions.
grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
looks for elements of mydata containing the literal
string 'ConfoMap&GuineaPigs'.
[1] "cab"  "back"

grepl returns a TRUE/FALSE vector.
On Thu, 20 Aug 2020 at 02:53, Ivan Calandra <calandra at rgzm.de> wrote:
#
Thank you all for all the very helpful answers!

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 20/08/2020 3:28, Richard O'Keefe wrote:
#
The single grep regex solutions offered to Ivan's problem were fine, but they do
not readily generalize to the conjunction of multiple (say, more than two) regex
patterns that can appear anywhere in a string and in any order. However,
this can easily be done using the Perl zero-width lookahead
construction, "(?=...)".
e.g.
"xAyCz","xAyBzC","xCByAz","xACyB","BAyyC","CBxBAy")

## to search for strings containing "A", "B", & "C" in any order
[1] 3 4 5 6 7

Note that this matches strings with one or more instances of each pattern. If one
wants exactly one instance of each conjunct, then something like this
should do:
[1] 3 4 5 6
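Since the start of the example above was lost, here is a sketch of the lookahead technique under stated assumptions: z is a hypothetical vector (not the original), the patterns are one plausible way to write it, and all_of() is a hypothetical helper. perl = TRUE is required because lookahead is a PCRE feature.

```r
# Hypothetical test vector.
z <- c("xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")

# Each (?=.*X), anchored at ^, checks that X occurs somewhere in the string
# without consuming characters, so the conditions can be stacked in any order.
grep("^(?=.*A)(?=.*B)(?=.*C)", z, perl = TRUE)
# [1] 2 3 4 5 6

# Exactly one of each: [^X]*X[^X]*$ allows a single X in the whole string.
grep("^(?=[^A]*A[^A]*$)(?=[^B]*B[^B]*$)(?=[^C]*C[^C]*$)", z, perl = TRUE)
# [1] 2 3 4 5

# Hypothetical helper: build the "any order" pattern from a vector of regexes.
all_of <- function(patterns) paste0("^", paste0("(?=.*", patterns, ")", collapse = ""))
all_of(c("ConfoMap", "GuineaPigs"))
# [1] "^(?=.*ConfoMap)(?=.*GuineaPigs)"
```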

Cheers,
Bert
On Wed, Aug 19, 2020 at 11:38 PM Ivan Calandra <calandra at rgzm.de> wrote:
#
Thank you Bert, this is wonderful!

Best wishes,
Ivan

On 21/08/2020 0:37, Bert Gunter wrote: