how to identify record with broken format

I've seen that behaviour with a C" atom in a chemical structure.

Here is code to identify lines with an odd number of quotation marks, which usually means a quote was opened but never closed. Read your file with readLines() to get one string per line, then apply it.

myTxt    <- '"This" "is" "fine"'
myTxt[2] <- '"This" "is "not"'
myTxt[3] <- 'This is ok'
 
x <- lengths(regmatches(myTxt, gregexpr('\\"', myTxt)))  # (1)
which(x %% 2 == 1)
[1] 2
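
Applied to a real file, the same idea might look like the sketch below. The temporary file and the names tmp, nQuotes and bad are made up for illustration; substitute the path to your own file. It uses fixed = TRUE so the quote is matched literally, which should give the same counts as the escaped pattern above.

```r
# Hypothetical example file, written to a temporary path for illustration only
tmp <- tempfile(fileext = ".txt")
writeLines(c('"This" "is" "fine"',
             '"This" "is "not"',
             'This is ok'), tmp)

# One string per line of the file
myTxt <- readLines(tmp)

# Count the double quotes on each line; fixed = TRUE matches a literal '"'
nQuotes <- lengths(regmatches(myTxt, gregexpr('"', myTxt, fixed = TRUE)))

# Line numbers with an odd quote count, shown together with their content
bad <- which(nQuotes %% 2 == 1)
data.frame(line = bad, text = myTxt[bad])
```

Printing the offending text next to its line number makes it easier to find the record in the original file.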


Cheers,
Boris


(1) credit to https://stackoverflow.com/questions/12427385/how-to-calculate-the-number-of-occurrence-of-a-given-character-in-each-row-of-a