Thank you again,
Claudia
On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,
Inline.
On 03-07-2012 01:15, jim holtman wrote:
You will have to change the 'i1' expression as follows:
i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i1 # matches strings with d & D in them
[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# second string had 'd' & 'D' in it, so it was TRUE above and
# FALSE below
i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
i1new
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
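The difference between the two patterns can be checked on a small case. This is only a sketch: the vector `x` below is made up for illustration and is not the OP's data. In the first pattern the alternation sits inside the repeated group, so it is chosen anew for every character; in the second, each alternative has to match the whole string.

```r
# Hypothetical test strings, not taken from the original data
x <- c("00D00", "0dD00", "00d00", "0T000", "0000")

# Alternation inside the repeated group: each character may be 0, D or d,
# so a string mixing 'd' and 'D' still matches
old <- grepl("^([0D]|[0d])*$", x)

# Alternation between two whole-string patterns: only 0/D, or only 0/d
new <- grepl("^([0D]*$|[0d]*$)", x)

print(data.frame(x, old, new))
```

Note that a string made only of zeros matches both patterns, since `*` also allows zero occurrences of 'D' or 'd'.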
Right, apparently I overlooked that the alternation inside a repeated
group is evaluated anew for each character, and the test cases were
not complete.
I put a 'd' and 'D' in the second string and the original regular
expression is equivalent to
grepl("^[0dD]*$", dd$ch)
This is only for the first request, and does not solve cases where
there are characters other than '0', 'd' or 'D', but 'd' or 'D' is
the first non-zero character. This is the case of my 4th row, changed
from the OP's data example.
My regular expression for 'i2' is equivalent to this one, which I
believe is more readable:
i2b <- grepl("^0{0,}[Dd]", dd$ch)
First a zero, which may occur zero or more times, then a 'd' or
'D'; whatever follows, up to the end, is irrelevant.
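As a quick check that the two spellings behave the same, here is a minimal sketch on a made-up vector `x` (hypothetical strings, not the OP's data). `0{0,}` and `0*` both mean "zero or more zeros".

```r
# Hypothetical test strings, not taken from the original data
x <- c("00D0T", "000d0", "0TD00", "D0000", "00000")

i2  <- grepl("^0*[Dd]", x)     # zero or more zeros, then 'D' or 'd'
i2b <- grepl("^0{0,}[Dd]", x)  # same pattern with an explicit quantifier

# Both versions flag the same strings
stopifnot(identical(i2, i2b))
print(data.frame(x, i2, i2b))
```

The third string is not flagged because its first non-zero character is 'T', and the last one is not flagged because it has no 'D' or 'd' at all.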
which will match strings containing d, D and 0. If you only want 'd'
or 'D' (and not both), then you will have to use the one in 'i1new'.
To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
Rui Barradas
On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,
Try regular expressions instead.
In this data.frame, I've changed row no. 4 so that it has 'D' as the
first non-zero character.
dd <- read.table(text="
ch count
1 0000000000D0000000000000000000__000000000000000000 0.007368
2 0000000000d0000000000000000000__000000000000000000 0.002456
3 000000000T00000000000000000000__000000000000000000 0.007368
4 000000000DT0000000000000000000__000000000000000000 0.007368
5 000000000T00000000000000000000__000000000000000000 0.002456
6 000000000Td0000000000000000000__000000000000000000 0.002456
7 00000000T000000000000000000000__000000000000000000 0.007368
8 00000000T0D0000000000000000000__000000000000000000 0.007368
9 00000000T000000000000000000000__000000000000000000 0.002456
10 00000000T0d0000000000000000000__000000000000000000 0.002456
", header=TRUE)
dd
i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i2 <- grepl("^0*[Dd]", dd$ch)
dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]
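The same subsetting steps can be tried on a much smaller data frame. This is a sketch only: `toy` and its values are invented for illustration, and the two patterns are the ones used above.

```r
# Hypothetical miniature of the data; names and values are made up
toy <- data.frame(
  ch    = c("00D00", "00d00", "0T000", "0DT00", "0Td00"),
  count = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

i1 <- grepl("^([0D]|[0d])*$", toy$ch)  # only '0', 'D' or 'd' in the string
i2 <- grepl("^0*[Dd]", toy$ch)         # first non-zero character is 'D' or 'd'

# Drop every row flagged by either condition
keep <- toy[!(i1 | i2), ]
print(keep)
```

Here only the rows whose first non-zero character is 'T' survive the combined filter.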
Hope this helps,
Rui Barradas
On 02-07-2012 23:48, Claudia Penaloza wrote:
I would like to remove rows from the following data frame (df) if
only two specific elements are found in the df$ch character string
(I want to remove rows with only "0" & "D" or "0" & "d").
Alternatively, I would like to remove rows if the first non-zero
element is "D" or "d".
                                                   ch    count
1  0000000000D0000000000000000000__000000000000000000 0.007368
2  0000000000d0000000000000000000__000000000000000000 0.002456
3  000000000T00000000000000000000__000000000000000000 0.007368
4  000000000TD0000000000000000000__000000000000000000 0.007368
5  000000000T00000000000000000000__000000000000000000 0.002456
6  000000000Td0000000000000000000__000000000000000000 0.002456
7  00000000T000000000000000000000__000000000000000000 0.007368
8  00000000T0D0000000000000000000__000000000000000000 0.007368
9  00000000T000000000000000000000__000000000000000000 0.002456
10 00000000T0d0000000000000000000__000000000000000000 0.002456
I tried the following, but it doesn't work if there is more than one
character per string:
df <- df[!df$ch %in% c("0","D"),]
df <- df[!df$ch %in% c("0","d"),]
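For what it's worth, the reason this fails is that `%in%` compares each whole element of `df$ch` against the set, not the individual characters inside it. A minimal sketch on a made-up vector `ch` (hypothetical, not the data above), with one possible character-by-character alternative based on `strsplit`:

```r
# Hypothetical strings, not taken from the data above
ch <- c("00D00", "0", "D", "0T0")

# %in% asks whether the whole string equals "0" or "D"
whole <- ch %in% c("0", "D")

# Split each string into single characters and test those instead
by_char <- sapply(strsplit(ch, ""), function(s) all(s %in% c("0", "D")))

print(data.frame(ch, whole, by_char))
```

The first string is missed by the whole-string test but caught by the per-character one.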
Any help greatly appreciated,
Claudia