Regex problem

3 messages · Carl Sutton, David Winsemius

#
Re-sending this help request; it went to the wrong address the first time:
r-help-request at r-project.org

A belated Happy New Year to the gurus:

I have a data frame with 570+ columns, and yours truly made a few blunders in those column headers. Namely, somehow I managed to end some of them with both an apostrophe (') and an ending quote ("). I think the attached code finds the occurrences (not 100% sure), and feedback is appreciated. This is my first attempt at regex, and I have spent the last few days googling and reading (including an R exercise).

I am confused as to why the column names show a "." instead of a "'".
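The period comes from make.names(), which data.frame() applies to the column names when check.names = TRUE (the default) and which turns characters that are not valid in syntactic R names into ".". A minimal sketch to see the conversion directly; the variable names below are only for illustration:

```r
# Column names as typed, including the stray apostrophe
raw_names <- c("WhatAmI'", "WhoAreYou")

# make.names() turns characters that are invalid in syntactic names into "."
fixed_names <- make.names(raw_names)   # "WhatAmI." "WhoAreYou"

# data.frame() applies make.names() because check.names = TRUE by default
df_default <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
colnames(df_default)                   # "WhatAmI." "WhoAreYou"

# With check.names = FALSE the apostrophe is kept as typed
df_kept <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15, check.names = FALSE)
colnames(df_kept)                      # "WhatAmI'" "WhoAreYou"
```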

I also don't understand why gregexpr and regexpr show attr(,"useBytes") as TRUE when the default is FALSE. Is it possible I somehow messed up those defaults last week? Simply typing the function name in the console shows the defaults as FALSE.

I have not been able to build a construct that simply deletes the apostrophe. I have made several attempts and left one for your perusal. The others were just too "off the wall" and embarrassing.

Lastly, is there a way for me to check that all of my column names end with a letter followed by a quote? I am thinking of something along the lines of "[[:alpha:]\\"", but I expect that will throw an error. I stumbled upon the ' " problem when dplyr complained about it last week, and it is unsettling to think I may have more goofs.
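The double quotes R prints around each name are string delimiters, not part of the name, so the check reduces to "does every name end in a letter?". A minimal sketch, assuming that is the intent; `[[:alpha:]]` matches one letter and `$` anchors the match at the end of the string. The example names are made up:

```r
# Hypothetical column names; the last one has a trailing apostrophe
col_names <- c("WhatAmI", "WhoAreYou", "WhereAmI'")

# TRUE where the name ends in a letter
ends_ok <- grepl("[[:alpha:]]$", col_names)   # TRUE TRUE FALSE

# all() gives a single yes/no answer for the whole set
all(ends_ok)                                  # FALSE

# List the offending names, if any
col_names[!ends_ok]                           # "WhereAmI'"
```

On the real data frame the same check would be `all(grepl("[[:alpha:]]$", names(df1)))`.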

Any suggestions for a good reference book would be much appreciated. I can see extensive use of regex coming my way, and I am so ignorant it is frightening (all volunteer work, no $'s involved, but I dislike being incompetent).


#  regex problem
df1 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
colnames(df1)
df1
ma_pattern <- "[[:punct:]][[:punct:]]" # Need single ][ in the middle??
grep(ma_pattern,colnames(df1))
ma_pattern <- "[[:punct:][:punct:]]"  #  single ][ worked
grep(ma_pattern,colnames(df1),value = TRUE)  #  found it
grepl(ma_pattern,colnames(df1)) 
gregexpr(ma_pattern,colnames(df1))   # at position 8
regexpr(ma_pattern,colnames(df1))

#sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
#    fixed = FALSE, useBytes = FALSE)

#sub(ma_pattern,replacement = "'\\"",df1)
colnames(df1)
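On the "single ][" comments in the code above: "[[:punct:]][[:punct:]]" is two bracket expressions in a row, so it needs two punctuation characters side by side, while "[[:punct:][:punct:]]" is one bracket expression that lists the same class twice and therefore matches a single punctuation character, just like "[[:punct:]]". A small sketch of the difference, assuming the apostrophe was kept with check.names = FALSE:

```r
nm <- c("WhatAmI'", "WhoAreYou")

# Two bracket expressions: needs two adjacent punctuation characters
two_in_a_row <- grepl("[[:punct:]][[:punct:]]", nm)   # FALSE FALSE

# One bracket expression listing [:punct:] twice: any single punctuation character
one_listed_twice <- grepl("[[:punct:][:punct:]]", nm) # TRUE FALSE

# Equivalent, simpler form
one_class <- grepl("[[:punct:]]", nm)                 # TRUE FALSE

# Position of the first match in each name (-1 means no match)
first_pos <- as.vector(regexpr("[[:punct:]]", nm))   # 8 -1
```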

Carl Sutton
#
Doubtful. You probably have only a single apostrophe and no "ending quote". In fact, when I run your example, the `make.names` function (called by data.frame) changes the apostrophe into a period. To actually get a trailing apostrophe with `data.frame`, you would need to set check.names=FALSE:

df1 <- data.frame("WhatAmI\'" = 1:5, "WhoAreYou" = 11:15, check.names=FALSE)
colnames(df1)
#[1] "WhatAmI'"  "WhoAreYou"
There is no double quote in that name. Now, to remove the offending apostrophe (or even multiple instances of it), just do this:

names(df1) <- gsub("\\'", "", names(df1))
See above.
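If stray double quotes might also be present (the original worry), one option is to strip any run of apostrophes or double quotes at the end of each name. This is a sketch beyond what the thread posted; the `$` anchor limits the change to trailing characters, and the sample names are hypothetical:

```r
# Hypothetical names with trailing apostrophes and/or double quotes
nm <- c("WhatAmI'", "WhoAreYou", "WhereAmI'\"", "Mid'dle")

# ['\"]+ is one or more apostrophes or double quotes; $ anchors at the end,
# so the apostrophe inside "Mid'dle" is left alone
cleaned <- sub("['\"]+$", "", nm)
print(cleaned)
```

On the data frame this would be `names(df1) <- sub("['\"]+$", "", names(df1))`.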
I learned regex by reading the ?regex page, and by looking up and working through R-help questions answered by Gabor Grothendieck:

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+regex#query:list%3Aorg.r-project.r-help%20regex%20from%3A%22Gabor%20Grothendieck%22+page:1+state:facets


There are also several online sites where you can get an expression-by-expression readout of what your regexes are doing. Using them does require understanding that R and regex share the same escape character (the backslash), which means it needs to be doubled in the pattern arguments (but _not_ in the replacement arguments).
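A small illustration of the doubling rule, as a sketch: to match a literal period the pattern is written "\\." in R source (R turns "\\" into one backslash, which then escapes the period for the regex engine), while a period in the replacement is just an ordinary character. One caveat: backreferences in the replacement, such as \\1, are still written with a doubled backslash in R source.

```r
x <- "WhatAmI.x"

# Unescaped "." is a regex metacharacter matching any character,
# so every character is replaced (nine underscores)
any_char <- gsub(".", "_", x)

# "\\." reaches the regex engine as \. and matches only a literal period
literal_dot <- gsub("\\.", "_", x)            # "WhatAmI_x"

# In the replacement a period needs no escaping
back_to_dot <- gsub("_", ".", literal_dot)    # "WhatAmI.x"

# A backreference in the replacement is still written \\1, \\2 in R source
swapped <- sub("^(What)(AmI)", "\\2\\1", "WhatAmI")  # "AmIWhat"

print(c(any_char, literal_dot, back_to_dot, swapped))
```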
#
Thank you, gentlemen, thank you! All worked as you said it would, and my headers are now error-free.
And David, thanks for the pointer to the reference material. I will be looking at that this weekend.
Carl Sutton
On Thursday, January 5, 2017 12:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:

            