
Removing rows if certain elements are found in character string

10 messages · jim holtman, arun, David Winsemius +3 more

#
Hello,

Try regular expressions instead.
In this data.frame, I've changed row nr.4 to have a row with 'D' as 
first non-zero character.

dd <- read.table(text="
ch     count
1  0000000000D0000000000000000000000000000000000000 0.007368
2  0000000000d0000000000000000000000000000000000000 0.002456
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000DT0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
", header=TRUE)
dd

i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i2 <- grepl("^0*[Dd]", dd$ch)

dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]
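
To see what the two tests flag, here is a small sketch on shortened,
made-up strings (not the 48-character strings above). The names 'toy'
and 'kept' are invented for this illustration only.

```r
# Toy data: short, made-up strings standing in for the real 'ch' column
toy <- data.frame(
  ch = c("00D00", "00d00", "0T000", "0DT00", "0Td00", "T0D00"),
  stringsAsFactors = FALSE
)

# i1: the whole string consists only of '0', 'd' or 'D'
i1 <- grepl("^([0D]|[0d])*$", toy$ch)
# i2: the first non-zero character is 'd' or 'D'
i2 <- grepl("^0*[Dd]", toy$ch)

print(i1)  # TRUE TRUE FALSE FALSE FALSE FALSE
print(i2)  # TRUE TRUE FALSE TRUE FALSE FALSE

# Rows that survive when everything flagged by either test is dropped
kept <- toy[!(i1 | i2), , drop = FALSE]
print(kept$ch)
```

On these strings 'i2' flags everything 'i1' flags, plus "0DT00", where
'D' comes first but other characters follow.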


Hope this helps,

Rui Barradas

On 02-07-2012 23:48, Claudia Penaloza wrote:
#
You will have to change the 'i1' expression as follows:
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
To test this, I put both a 'd' and a 'D' in the second string. The
original regular expression is equivalent to

grepl("^[0dD]*$", dd$ch)

which matches strings made up only of 'd', 'D' and '0', in any mix.  If
you only want 'd' or 'D' (and not both), then you will have to use the
one in 'i1new'.
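
A minimal sketch of that difference, on made-up strings. The pattern
called 'strict' below is only one way to get the "one letter or the
other, but not both" behavior; it is an assumption for illustration and
not necessarily the expression used for 'i1new'.

```r
# Made-up test strings; the second one mixes 'd' and 'D'
x <- c("00D00", "0dD00", "0T000", "00000")

# Original pattern: every repetition of the group may pick either branch,
# so 'd' and 'D' can be mixed freely
orig <- grepl("^([0D]|[0d])*$", x)

# A single character class behaves the same way
flat <- grepl("^[0dD]*$", x)

# Hypothetical stricter pattern: the repetition sits inside each branch,
# so the whole string must be all [0D] or all [0d]
strict <- grepl("^([0D]*|[0d]*)$", x)

print(orig)    # TRUE TRUE FALSE TRUE
print(flat)    # TRUE TRUE FALSE TRUE
print(strict)  # TRUE FALSE FALSE TRUE
```

Note that an all-zero string matches all three patterns.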
On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

  
    
#
Hi,
I didn't think about the situation where D comes before T. I changed my code a little to accommodate that.

dat2<-read.table(text="
1  0000000000D0000000000000000000000000000000000000 0.007368
2  0000000000d0000000000000000000000000000000000000 0.002456
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000DT0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
",sep="",header=FALSE)
colnames(dat2)<-c("num","Ch", "count")
# 0Td and 0TD already contain 0T, so this keeps rows where a 'T' directly follows a '0'
dat2[grepl("0T|0Td|0TD",dat2$Ch),]
   num                                               Ch    count
3    3 000000000T00000000000000000000000000000000000000 0.007368
5    5 000000000T00000000000000000000000000000000000000 0.002456
6    6 000000000Td0000000000000000000000000000000000000 0.002456
7    7 00000000T000000000000000000000000000000000000000 0.007368
8    8 00000000T0D0000000000000000000000000000000000000 0.007368
9    9 00000000T000000000000000000000000000000000000000 0.002456
10  10 00000000T0d0000000000000000000000000000000000000 0.002456

A.K.





______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
On Jul 2, 2012, at 6:48 PM, Claudia Penaloza wrote:

            
You seem to be missing test cases for the second set of conditions but  
this works for the first set (and might for the second):

 > dat[ grepl("[^0dD]", dat$ch) & ! grepl("^0+d|^0+D", dat$ch) , ]
                                                  ch    count
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
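
A sketch of the same two-condition filter on shortened, made-up
strings. It uses "^0*[dD]" for the second condition, which is an
assumption made here so that a string starting directly with 'd' or 'D'
is also caught; 'toy', 'has_other', 'd_first' and 'kept' are names
invented for the example.

```r
# Toy data: short, made-up strings, including one where 'D' precedes 'T'
toy <- data.frame(
  ch = c("00D00", "0T000", "0DT00", "0Td00", "00000"),
  stringsAsFactors = FALSE
)

# Condition 1: at least one character that is not '0', 'd' or 'D'
has_other <- grepl("[^0dD]", toy$ch)
# Condition 2: the first non-zero character is 'd' or 'D'
d_first <- grepl("^0*[dD]", toy$ch)

# Keep rows that satisfy condition 1 and do not satisfy condition 2
kept <- toy[has_other & !d_first, , drop = FALSE]
print(kept$ch)
```

With this filter an all-zero string is dropped as well, because it
fails the first condition.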

  
    
#
Hello,

Inline.

On 03-07-2012 01:15, jim holtman wrote:
Right, apparently I forgot that grep is greedy, and the test cases were 
not complete.
That expression only addresses the first request. It does not catch 
strings that contain characters other than '0', 'd' or 'D' but whose 
first non-zero character is 'd' or 'D'. That is the case of my 4th row, 
which I changed from the OP's data example.

My regex for 'i2' is equivalent to this one, which I believe is more 
readable:


i2b <- grepl("^0{0,}[Dd]", dd$ch)


First a zero, which may occur zero or more times, then a 'd' or 'D'; 
whatever follows, up to the end of the string, is irrelevant.
To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
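
A quick check, on made-up strings, that the two spellings select the
same elements ('x' is just an example vector, not the OP's data):

```r
# Made-up strings, including one that starts directly with 'd'
x <- c("00D00", "d0000", "0T000", "0DT00", "0Td00", "00000")

# '*' and '{0,}' both mean "zero or more repetitions"
i2  <- grepl("^0*[Dd]", x)
i2b <- grepl("^0{0,}[Dd]", x)

print(i2)                  # TRUE TRUE FALSE TRUE FALSE FALSE
print(identical(i2, i2b))  # TRUE
```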

Rui Barradas
#
Hello,

I'm glad it helped. See answer inline.

On 03-07-2012 17:09, Claudia Penaloza wrote:
Because both 'i1' and 'i1new' test the string from beginning to end, 
allowing only '0' together with 'd' or 'D' ('i1new' allows one letter or 
the other, but not both).

So, there's no need to explicitly test for a string that begins with '0'.

Rui Barradas
1 day later
#
Perhaps I've missed something, but if it's really true that the goal is to
remove rows if the first non-zero element is "D" or "d", then how about
this:

tmp <- gsub('0','',df$ch)
first <- substr(tmp,1,1)
subset(df, tolower(first) != 'd')

and of course it could be rolled up into a single expression, but I wrote
it in several steps to make it easy to follow. No need to wrap one's brain
around regular expressions (which is hard for me!)
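
For illustration, a sketch of the rolled-up version on shortened,
made-up strings; 'toy' and 'kept' are names invented for the example,
and the fixed = TRUE argument is an addition here so that "0" is
treated as a literal string.

```r
# Toy data: short, made-up strings standing in for the real 'ch' column
toy <- data.frame(
  ch = c("00D00", "00d00", "0T000", "0DT00", "0Td00", "00000"),
  stringsAsFactors = FALSE
)

# Strip the zeros, take the first remaining character, and keep the rows
# where that character is not 'd' or 'D'
kept <- toy[tolower(substr(gsub("0", "", toy$ch, fixed = TRUE), 1, 1)) != "d",
            , drop = FALSE]
print(kept$ch)
```

An all-zero string has no first non-zero character, so it compares as
"" and is kept.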

-Don