Message-ID: <501F629F.7090201@sapo.pt>
Date: 2012-08-06T06:22:23Z
From: Rui Barradas
Subject: regexpr with accents
In-Reply-To: <F649E9A1-B354-4178-8FF8-E97BE6C4C135@gmail.com>

Hello,

It works for me:

d1 <- data.frame(V1 = 1:3,
     V2 = c("some text = 9", "some téxt = 9", "some other text = 9"))

regexpr("some text = 9", d1$V2)
[1]  1 -1 -1
attr(,"match.length")
[1] 13 -1 -1
regexpr("some téxt = 9", d1$V2)
[1] -1  1 -1
attr(,"match.length")
[1] -1 13 -1
d1$V1[regexpr("some text = 9",d1$V2) > 0] <- 9
d1$V1[regexpr("some téxt = 9",d1$V2) > 0] <- 9
d1
   V1                  V2
1  9       some text = 9
2  9       some téxt = 9
3  3 some other text = 9
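Because the aim in the quoted message is only to test whether a substring occurs, a minimal alternative sketch (not what the original post used) is grepl(), which returns a logical vector directly instead of match positions. With fixed = TRUE the pattern is matched literally, so characters such as "?" or "." are not treated as regular-expression metacharacters. The string "\u00e9" below is R's escape for "é"; the toy data frame mirrors the one above.

```r
# Toy data mirroring the example above; stringsAsFactors = FALSE keeps V2 as character
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e9xt = 9", "some other text = 9"),
                 stringsAsFactors = FALSE)

# grepl() returns TRUE/FALSE per element; fixed = TRUE disables regex
# interpretation, so the pattern is compared as a literal substring
hit <- grepl("some t\u00e9xt = 9", d1$V2, fixed = TRUE)
d1$V1[hit] <- 9
d1$V1
```

Only the second row matches here, so V1 becomes 1, 9, 3.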

What do you mean by "it did not work"? What were the contents of 'd1'?
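If the goal is to ignore accents altogether, one possible approach (an assumption, not something proposed in this thread) is to normalize both the pattern and the data before comparing them. The helper strip_accents() below is hypothetical; it uses base R's chartr() with a hand-picked, non-exhaustive list of accented vowels. iconv(x, to = "ASCII//TRANSLIT") is another option, but its output depends on the platform's iconv implementation.

```r
# Hypothetical helper: maps a hand-picked (non-exhaustive) set of accented
# lowercase vowels to their plain ASCII base letters using chartr()
strip_accents <- function(x) {
  from <- paste0("\u00e0\u00e1\u00e2\u00e4",   # a with accents
                 "\u00e8\u00e9\u00ea\u00eb",   # e with accents
                 "\u00ec\u00ed\u00ee\u00ef",   # i with accents
                 "\u00f2\u00f3\u00f4\u00f6",   # o with accents
                 "\u00f9\u00fa\u00fb\u00fc")   # u with accents
  to <- "aaaaeeeeiiiioooouuuu"                 # same length as 'from'
  chartr(from, to, x)
}

d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e9xt = 9", "some other text = 9"),
                 stringsAsFactors = FALSE)

# Strip accents from both the pattern and the data, then do a literal
# substring test: "some text = 9" and "some t\u00e9xt = 9" both reduce
# to "some text = 9"
hit <- grepl(strip_accents("some t\u00e9xt = 9"), strip_accents(d1$V2), fixed = TRUE)
d1$V1[hit] <- 9
d1$V1
```

Both of the first two rows now match, so V1 becomes 9, 9, 3. Writing the accented characters as \u escapes also keeps the sketch independent of the encoding of the script file, which could otherwise cause a mismatch between pattern and data.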

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

Hope this helps,

Rui Barradas

On 06-08-2012 06:55, Luca Meyer wrote:
> Hello,
>
> I have built an expression to find out whether a given substring is included in a larger string; it works like this:
>
> d1$V1[regexpr("some text = 9",d1$V2)>0] <- 9
>
> and this works fine as long as "some text" contains only standard ASCII characters. However, it does not work when accents are included, as in the following:
>
> d1$V1[regexpr("some téxt = 9",d1$V2)>0] <- 9
>
> I have tried to substitute "é" with several wildcards, but it did not work. Can anyone suggest how to make the expression match the string while ignoring the accent?
>
> Thank you in advance,
>
> Luca
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.