
bibtex::read.bib -- extracting bibentry keys

On 8/6/2012 11:54 AM, Achim Zeileis wrote:
One thing that was confusing was that read.bib returns a "bibentry"
object, all of whose elements are also "bibentry" objects.
That is what I was missing -- it would have helped to find a link to
utils::bibentry in the [rather scanty] documentation for read.bib.
I'm now a happy camper in this regard. What I wanted is given by:

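library(bibtex)   # provides read.bib()

# read each copy of the database and collect its citation keys
bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
length(bib1)
keys1 <- unlist(bib1$key)

bib2 <- read.bib("W:/texmf/bibtex/bib/timeref.bib")
length(bib2)
keys2 <- unlist(bib2$key)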


 > which(! keys1 %in% keys2)
[1] 133 249 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628
 > keys1[which(! keys1 %in% keys2)]
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
 >
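
For anyone else puzzled by the structure, it can be inspected without
any .bib file. The sketch below builds a two-entry object directly with
utils::bibentry; the keys and titles are made up for illustration, and
"Misc" is used only because it has no required fields.

```r
library(utils)  # bibentry() lives in utils, which is attached by default

# Two made-up entries; bibtype "Misc" has no required fields
bib <- c(
  bibentry("Misc", key = "A:2001", title = "First item"),
  bibentry("Misc", key = "B:2002", title = "Second item")
)

class(bib)      # class of the whole collection
class(bib[1])   # class of a single element taken from it
length(bib)     # number of entries

# With several entries, `$` returns a list with one element per entry,
# so unlist() is needed to get a plain character vector of keys
keys <- unlist(bib$key)
keys
```

The same `unlist(x$key)` idiom is what makes the key comparison above
work on the objects returned by read.bib.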

As a side note, I searched extensively for bibtex tools that would help
me resolve the differences between two related bibtex files, but none
was as simple as this, once I could get the keys. Thanks to Roman for
providing this infrastructure!

So, ignoring for now differences in the contents of the bibentries, a
useful tool for my purpose is bibdiff():

# Report the citation keys that occur in only one of two bibentry objects
bibdiff <- function(bib1, bib2) {
  keys1 <- unlist(bib1$key)
  keys2 <- unlist(bib2$key)
  only1 <- keys1[!keys1 %in% keys2]   # keys missing from bib2
  only2 <- keys2[!keys2 %in% keys1]   # keys missing from bib1
  cat("Only in bib1:\n")
  print(only1)
  cat("Only in bib2:\n")
  print(only2)
}

 > bibdiff(bib1, bib2)
Only in bib1:
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
Only in bib2:
[1] "Langren:1644" "Quetelet:1842"
 >

which gives me the complete answer, as far as it goes.
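
If the differences are needed for further processing rather than just
printed, a variant can return them instead. This is only a sketch:
bibdiff_keys is a name invented here, the demo entries are made up, and
note that setdiff() drops duplicated keys, which the %in% version above
would keep.

```r
# Sketch: return the keys unique to each bibentry object as a list
bibdiff_keys <- function(bib1, bib2) {
  keys1 <- unlist(bib1$key)
  keys2 <- unlist(bib2$key)
  list(
    only1 = setdiff(keys1, keys2),  # in bib1 but not in bib2
    only2 = setdiff(keys2, keys1)   # in bib2 but not in bib1
  )
}

# Made-up entries for a self-contained demonstration
a <- c(
  utils::bibentry("Misc", key = "A:2001", title = "First item"),
  utils::bibentry("Misc", key = "B:2002", title = "Second item")
)
b <- c(
  utils::bibentry("Misc", key = "B:2002", title = "Second item"),
  utils::bibentry("Misc", key = "C:2003", title = "Third item")
)

d <- bibdiff_keys(a, b)
d$only1
d$only2
```
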
It turns out that read.bib seems to be pickier than bibtex itself -- it
does not accommodate crossref= fields, used for InCollection items;
these resolve correctly using bibtex.
For some books in my database, the publisher is unknown. bibtex
generates warnings (I think) but still includes the references. It would
be nicer if there were an argument to read.bib, e.g., strict = {T/F},
where strict=FALSE would allow entries not containing all required
fields. But perhaps that's buried too deep in the implementation.
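
To make the idea concrete, here is a rough sketch of what a lenient mode
might do at the level of utils::bibentry. It is not part of read.bib:
lenient_bibentry is a hypothetical helper, and falling back to bibtype
"Misc" (which has no required fields) is just one possible choice, at
the cost of losing the original entry type. It does nothing about
crossref= fields.

```r
# Hypothetical helper: build a bibentry, but if required fields are
# missing, warn and keep the entry as bibtype "Misc" instead of failing
lenient_bibentry <- function(bibtype, key, ...) {
  fields <- list(...)
  make <- function(type) {
    do.call(utils::bibentry, c(list(bibtype = type, key = key), fields))
  }
  tryCatch(
    make(bibtype),
    error = function(e) {
      warning("entry '", key, "' kept as Misc: ", conditionMessage(e),
              call. = FALSE)
      make("Misc")
    }
  )
}

# Made-up Book entry with no publisher: kept, with a warning
entry <- lenient_bibentry("Book", key = "Example:1900",
                          title = "A placeholder title",
                          author = "Some Author", year = "1900")
entry$bibtype
entry$key
```
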

 > bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 40) because :
A bibentry of bibtype 'InCollection' has to correctly specify the field(s): booktitle

ignoring entry 'Martonne:1919:map' (line 90) because :
A bibentry of bibtype 'InCollection' has to correctly specify the field(s): booktitle, publisher, year

ignoring entry 'Touraine:2002' (line 5423) because :
A bibentry of bibtype 'Book' has to correctly specify the field(s): publisher

ignoring entry 'Cotes:1722' (line 6004) because :
A bibentry of bibtype 'Book' has to correctly specify the field(s): publisher

ignoring entry 'Quetelet:1842' (line 6605) because :
A bibentry of bibtype 'Book' has to correctly specify the field(s): publisher

ignoring entry 'Wenzlick:1950' (line 6663) because :
A bibentry of bibtype 'Unpublished' has to correctly specify the field(s): note

ignoring entry 'Verniquet:1791' (line 6695) because :
A bibentry of bibtype 'Book' has to correctly specify the field(s): publisher

 > length(bib1)
[1] 628
 >
No, it was only the lack of documentation, and perhaps of an example or
two for read.bib, that caused me to stumble.