Removing rows if certain elements are found in character string

Hello,

I'm glad it helped. See answer inline.

On 03-07-2012 17:09, Claudia Penaloza wrote:
Thank you Rui and Jim, both 'i1' and 'i1new' worked perfectly
because there are no instances of 'Dd' or 'dD' in the data set (that I
would/not want to include/exclude)... but I understand that 'i1new'
targets precisely what I want.
Why isn't a leading run of zeros required for either 'i1' or 'i1new', like so?
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)

Because both 'i1' and 'i1new' are anchored at the beginning and the end of
the string, and their character classes already include '0'. They allow only
'0' and 'd' or 'D'; 'i1new' additionally rejects strings that contain both.

So, there's no need to explicitly test for a string that begins with '0'.
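
A small check of this point, as a sketch on made-up short strings (the vector `x` is hypothetical, not the OP's data). It also suggests that the proposed 'i1newer' pattern would be too strict for data like the OP's, since it allows zeros only before the letter and none after it:

```r
# Hypothetical short strings standing in for dd$ch
x <- c("00D00", "00d00", "0dD00", "00T00", "000D")

# Jim's 'i1new': the whole string is zeros and D's, or zeros and d's
i1new <- grepl("^([0D]*$|[0d]*$)", x)

# The proposed 'i1newer': zeros first, then only D's (or only d's) to the end
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", x)

i1new    # TRUE  TRUE FALSE FALSE  TRUE
i1newer  # FALSE FALSE FALSE FALSE  TRUE
```

The zeros after the 'D' or 'd' are what make 'i1newer' fail on the first two strings; the character classes in 'i1new' accept a '0' at any position.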

Rui Barradas
Thank you again,
Claudia
On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

    Hello,

    Inline.

    On 03-07-2012 01:15, jim holtman wrote:

        You will have to change the 'i1' expression as follows:

            i1 <- grepl("^([0D]|[0d])*$", dd$ch)
            i1  # matches strings with d & D in them

           [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

            # second string had 'd' & 'D' in it so it was TRUE above and
            # FALSE below
            i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
            i1new

           [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
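
The difference between the two patterns comes down to where the alternation sits. A minimal sketch on made-up strings (the vector `x` and the names `inside`, `outside` and `single` are hypothetical):

```r
# Hypothetical short strings; the second one mixes 'd' and 'D'
x <- c("00D00", "0dD00", "00d00", "00T00")

# Alternation inside the repeated group: each character may pick either class
inside <- grepl("^([0D]|[0d])*$", x)

# Alternation around the whole pattern: one class must cover the entire string
outside <- grepl("^([0D]*$|[0d]*$)", x)

# A single character class behaves like 'inside' on these strings
single <- grepl("^[0dD]*$", x)

inside   # TRUE  TRUE  TRUE FALSE
outside  # TRUE FALSE  TRUE FALSE
identical(inside, single)  # TRUE
```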

    Right, I overlooked that the alternation sits inside the repeated
    group, so each character may match either class, and the test cases
    were not complete.

        I put a 'd' and 'D' in the second string and the original regular
        expression is equivalent to

        grepl("^[0dD]*$", dd$ch)

    This is only for the first request, and does not solve cases where
    there are characters other than '0', 'd' or 'D', but 'd' or 'D' is
    the first non-zero character. This is the case of my 4th row, changed
    from the OP's data example.

    My regular expression for 'i2' is equivalent to this one, which I
    believe is more readable:

    i2b <- grepl("^0{0,}[Dd]", dd$ch)

    First a zero, which may occur zero or more times, then a 'd' or
    'D'; whatever follows, up to the end of the string, is irrelevant.
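
A quick sketch comparing the two forms on made-up strings (the vector `x` is hypothetical); `0*` and `0{0,}` both mean "zero or more zeros":

```r
# Hypothetical short strings standing in for dd$ch
x <- c("00D00", "00dT0", "00TD0", "D0000", "00000")

i2  <- grepl("^0*[Dd]", x)     # zeros, then 'D' or 'd'
i2b <- grepl("^0{0,}[Dd]", x)  # same test with an explicit repetition count

i2                  # TRUE  TRUE FALSE  TRUE FALSE
identical(i2, i2b)  # TRUE
```

Neither pattern is anchored at the end, so anything may follow the first 'D' or 'd'.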

        which will match strings containing d, D and 0.  If you only
        want 'd'
        or 'D' (and not both), then you will have to use the one in 'i1new'.

    To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.

    Rui Barradas

        On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas
        <ruipbarradas at sapo.pt> wrote:

            Hello,

            Try regular expressions instead.
            In this data.frame, I've changed row no. 4 so that 'D' is
            its first non-zero character.

            dd <- read.table(text="
            ch     count
            1  000000000D00000000000000000000000000000000000000 0.007368
            2  0000000000d0000000000000000000000000000000000000 0.002456
            3  000000000T00000000000000000000000000000000000000 0.007368
            4  000000000DT0000000000000000000000000000000000000 0.007368
            5  000000000T00000000000000000000000000000000000000 0.002456
            6  000000000Td0000000000000000000000000000000000000 0.002456
            7  00000000T000000000000000000000000000000000000000 0.007368
            8  00000000T0D0000000000000000000000000000000000000 0.007368
            9  00000000T000000000000000000000000000000000000000 0.002456
            10 00000000T0d0000000000000000000000000000000000000 0.002456
            ", header=TRUE)
            dd

            i1 <- grepl("^([0D]|[0d])*$", dd$ch)
            i2 <- grepl("^0*[Dd]", dd$ch)

            dd[!i1, ]
            dd[!i2, ]
            dd[!(i1 | i2), ]
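
What the three subsets keep can be sketched on a miniature data frame. Everything here is an assumption for illustration: `dd_small`, its shortened strings and its counts are made up, and 'i1' uses Jim's stricter 'i1new' pattern from later in the thread.

```r
# Hypothetical miniature of 'dd' with shortened strings and made-up counts
dd_small <- data.frame(
  ch = c("00D00", "00d00", "0T000", "0DT00", "0Td00"),
  count = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Only '0' with 'D', or only '0' with 'd'
i1 <- grepl("^([0D]*$|[0d]*$)", dd_small$ch)
# First non-zero character is 'D' or 'd'
i2 <- grepl("^0*[Dd]", dd_small$ch)

keep_first  <- dd_small[!i1, ]         # keeps rows 3, 4, 5
keep_second <- dd_small[!i2, ]         # keeps rows 3, 5
keep_both   <- dd_small[!(i1 | i2), ]  # drops any row flagged by either test
```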

            Hope this helps,

            Rui Barradas

            On 02-07-2012 23:48, Claudia Penaloza wrote:

                I would like to remove rows from the following data frame
                (df) if there are only two specific elements found in the
                df$ch character string (I want to remove rows with only
                "0" & "D" or "0" & "d"). Alternatively, I would like to
                remove rows if the first non-zero element is "D" or "d".

                                                                 ch    count
                1  000000000D00000000000000000000000000000000000000 0.007368;
                2  0000000000d0000000000000000000000000000000000000 0.002456;
                3  000000000T00000000000000000000000000000000000000 0.007368;
                4  000000000TD0000000000000000000000000000000000000 0.007368;
                5  000000000T00000000000000000000000000000000000000 0.002456;
                6  000000000Td0000000000000000000000000000000000000 0.002456;
                7  00000000T000000000000000000000000000000000000000 0.007368;
                8  00000000T0D0000000000000000000000000000000000000 0.007368;
                9  00000000T000000000000000000000000000000000000000 0.002456;
                10 00000000T0d0000000000000000000000000000000000000 0.002456;

                I tried the following, but it doesn't work if there is more
                than one character per string:

                    df <- df[!df$ch %in% c("0","D"),]
                    df <- df[!df$ch %in% c("0","d"),]
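
The reason this attempt fails is that `%in%` compares each whole string with the listed values; it does not look at the characters inside the string. A minimal sketch on made-up values (the vector `ch` is hypothetical):

```r
# Hypothetical values: two multi-character strings and two single characters
ch <- c("00D00", "0T000", "D", "0")

# %in% matches whole elements only
whole <- ch %in% c("0", "D")

# A regular expression can test the characters that make up each string
chars <- grepl("^[0D]*$", ch)

whole  # FALSE FALSE  TRUE  TRUE
chars  # TRUE FALSE  TRUE  TRUE
```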

                Any help greatly appreciated,
                Claudia

                ________________________________________________
                R-help at r-project.org <mailto:R-help at r-project.org>
                mailing list
                https://stat.ethz.ch/mailman/__listinfo/r-help
                <https://stat.ethz.ch/mailman/listinfo/r-help>
                PLEASE do read the posting guide
                http://www.R-project.org/__posting-guide.html
                <http://www.R-project.org/posting-guide.html>
                and provide commented, minimal, self-contained,
                reproducible code.
