how to identify record with broken format
I've seen that behaviour with a C" atom in a chemical structure.
Here is code to identify lines with an uneven number of quotation marks. To use it, read your file with readLines() so that each line of the file becomes one element of a character vector.
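# Three example lines; the second has an unbalanced quotation mark
myTxt <- '"This" "is" "fine"'
myTxt[2] <- '"This" "is "not"'
myTxt[3] <- 'This is ok'
# Count the quotation marks on each line
x <- lengths(regmatches(myTxt, gregexpr('\\"', myTxt))) # (1)
# Indices of the lines with an odd count
which(x %% 2 == 1)
[1] 2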
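
A minimal sketch of the same check applied to a file read with readLines(). The temporary file and its four lines are made up for illustration; the names path, lines, nQuotes and bad are placeholders, not anything from your data. Substitute your own file path.

```r
# Hypothetical stand-in for the real data file (assumption: one record per line).
path <- tempfile(fileext = ".csv")
writeLines(c('id,name',
             '1,"alpha"',
             '2,"beta',      # closing quotation mark is missing here
             '3,"gamma"'), path)

# One element per line of the file
lines <- readLines(path)

# Count the double quotes on each line; fixed = TRUE matches the literal character
nQuotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))

# Line numbers with an odd count, shown next to their content for inspection
bad <- which(nQuotes %% 2 == 1)
data.frame(line = bad, text = lines[bad])
```

Note that a quoted field that legitimately contains a line break would also be flagged, since each of its lines has an odd count when looked at alone, so the flagged lines are candidates to inspect rather than proof of an error.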
Cheers,
Boris
(1) credit to https://stackoverflow.com/questions/12427385/how-to-calculate-the-number-of-occurrence-of-a-given-character-in-each-row-of-a
On 2019-06-05, at 06:12, Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

Dear all,
I have a large data frame in which one of the records in a column must have been wrongly formatted; in particular, I think it is missing a closing ". When I try to show only that column's values I get a [1] with plenty of empty space, then the final record [45], and the system freezes. Also, when I try to plot I get a table's printout instead of a real plot.
Is there a way to identify the record with the broken format? In a spreadsheet or text editor all records seem OK, and there are too many records to inspect them all visually.
--
Best regards,
Luigi
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.