LaTeX reference management

By John Lenz. June 14, 2012.

In this post, I describe how I manage the papers, books, and other references I have accumulated over the years. As far as I can tell my approach is unique, and several people have asked me how I do it; being able to point them here is part of the reason I created this blog. Since the write-up got quite long, I have split it into pieces. This first post describes at a high level how I manage references, and part 2, part 3, and part 4 describe some tools and code I use to work with them.

Citing references from LaTeX using BibTeX works by creating a file containing a long list of references along with information about each one (author, title, journal, etc.). In the LaTeX document, you cite a reference using the \cite command, and BibTeX and LaTeX together produce the correct reference number and a correctly formatted citation (using the author, title, etc.) in the output PDF. The hard part is managing the file containing the long list of references.

Existing Reference Management Tools

There are lots of programs (and some websites) for managing a list of references and, as far as I can tell, all of them are built around a database model. You add references to a personal database with columns for author, title, journal, etc. The database might also store the paper itself or a link to the paper online, and perhaps some notes attached to the paper. The program or website then has some method of automatically generating the file expected by BibTeX and LaTeX, so you add references to your personal database and manage them there, and when it comes time to compile LaTeX you manually export the list.

There are several downsides for me.

Of course, all these disadvantages can be worked around; some of these management programs provide scripting or other access, support more free-form notes, and so on. But, for example, none of the programs I checked out rendered LaTeX math in notes (maybe some newer ones do; I haven't looked recently). Even if these issues can be overcome, I still think managing references around a database is the wrong approach.

Comments next to code

As mentioned, I think a personal database of references is a fundamentally flawed approach. Programmers have this idea called "comments next to code." Roughly speaking, the idea is that descriptions of code and APIs should live next to the code itself, and you should then use a tool called a documentation generator to generate documentation from the code. The reason for this is twofold.

What does this have to do with references? For similar reasons, my reference management is not designed around a database of references which have individual tags or notes attached. Instead, I mark up the "code" (in this case, the reference data that BibTeX and LaTeX expect) with "comments," which can be notes about the main results of the paper, a sketch of a clever proof idea in the paper, relations of this paper to others, and so on. Since I manage the list of BibTeX references directly, I can also put them in an order that makes sense by topic and include "comments" about groups of papers. For example, I can add a comment like "The next three papers combine to show this remarkable result that...." I can also add comments which link to related papers.

The key advantage of this method is that the comments and references all appear in a single document of related papers and can be read top to bottom like a survey. The comments are interspersed with the actual reference data. I then have several documents/code files each on a separate topic. At the moment I have 41 documents containing 241 references, and each document is a mini survey of comments and references.

I manage these documents as follows.

An Example

Here is an example from some of the references used in this paper and this one. If you are interested in a more detailed write up, Sections 3 and 4 of this paper have an overview (and are where these two references are cited).

Take a k-dimensional unit sphere.
Partition the sphere into n domains D~1~,...,D~n~ of equal measure and diameter <
$0.5 \epsilon/\sqrt{k}$.  Choose a point in each set, call the set of all points $P$.
Consider the graph with vertex set $V_1 \cup V_2$ where $V_i$ is isomorphic to $P$.

1. Join an edge $x \in V_1$ to $y \in V_2$ if $d(x,y) < \sqrt{2} - \epsilon/\sqrt{k}$.
2. Join $x,y \in V_i$ if $d(x,y) > 2 - \epsilon/\sqrt{k}$.

This graph has small independence number by property 2 and the theorem about the
diameter, has a large number of edges by property 1, and has no $K_4$ by the BE
Rhombus theorem.

~~~ {.bib}
@article {beg-bollobas76,
    AUTHOR = {Bollob{\'a}s, B{\'e}la and Erd\"{o}s, Paul},
     TITLE = {On a {R}amsey-{T}ur\'an type problem},
   JOURNAL = {J. Combinatorial Theory Ser. B},
  FJOURNAL = {Journal of Combinatorial Theory. Series B},
    VOLUME = {21},
      YEAR = {1976},
    NUMBER = {2},
     PAGES = {166--168},
   MRCLASS = {05C99},
  MRNUMBER = {MR0424613 (54 \#12572)},
MRREVIEWER = {R. L. Graham},
       URL = {}
}
~~~

Rodl extended this construction to produce a graph with independence number o(n)
which does not contain either $K_4$  or $K_{3,3,3}$.

Let G be the Bollobas-Erdos graph described above, and let H be the spanning subgraph
consisting of all edges inside a part.  There exists a blowup H' of H where each
vertex is blown up into an independent set of size t and H' satisfies the following:

1. For all $xy \in E(H)$ and $X' \subseteq B_x$, $Y' \subseteq B_y$ with
$|X'| > \mu t$ and $|Y'| > \mu t$, there exists at least one edge of H' joining
X' to Y'.
2. H' does not contain cycles of lengths 3,...,k.

[Our paper](Ramsey-Turan#rt-balogh11) extends this type of construction to hypergraphs.

~~~ {.bib}
@article {beg-rodl85,
    AUTHOR = {R{\"o}dl, Vojt{\v{e}}ch},
     TITLE = {Note on a {R}amsey-{T}ur\'an type problem},
   JOURNAL = {Graphs Combin.},
  FJOURNAL = {Graphs and Combinatorics},
    VOLUME = {1},
      YEAR = {1985},
    NUMBER = {3},
     PAGES = {291--293},
      ISSN = {0911-0119},
     CODEN = {GRCOE5},
   MRCLASS = {05C35 (05C55)},
  MRNUMBER = {MR951018 (89h:05034)},
MRREVIEWER = {Yair Caro},
       DOI = {10.1007/BF02582954},
       URL = {},
}
~~~

Notice several things from this example. The BibTeX is embedded right in the comments; while that takes up a bunch of extra space on this page, in Vim I have folding set up so the BibTeX sections are collapsed to one line unless I use "zo" or "zO" to open the folds. Also, in the markup I use TeX equations, use numbered lists, and link to another page. Essentially I have all of pandoc's markup available.
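Because the BibTeX lives inside fenced blocks, the file that BibTeX and LaTeX expect can be recovered by collecting those blocks. The following Python is only a minimal sketch of that idea and is not the tooling described in the later parts of this series. It assumes every block is opened by a line of the form `~~~ {.bib}` and closed by a line containing only `~~~`, as in the example above; the function names `extract_bib_blocks` and `build_bib_file` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: BibTeX blocks are fenced as "~~~ {.bib}" ... "~~~",
# matching the example in this post. Other fence styles are not handled.
BIB_FENCE = re.compile(
    r"^~~~[ \t]*\{\.bib\}[ \t]*\n(.*?)^~~~[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_bib_blocks(markdown_text: str) -> List[str]:
    """Return the contents of every {.bib} fenced block, in document order."""
    return [m.group(1).strip() for m in BIB_FENCE.finditer(markdown_text)]


def build_bib_file(markdown_texts: Iterable[str]) -> str:
    """Concatenate the BibTeX entries found in several documents."""
    entries: List[str] = []
    for text in markdown_texts:
        entries.extend(extract_bib_blocks(text))
    return "\n\n".join(entries) + "\n"
```

Calling `build_bib_file` on the text of each topic document and writing the result to a single `.bib` file would give LaTeX one file to cite from, while the notes stay next to the entries in the source documents.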

Also, notice that both of the BibTeX blocks above contain a "URL" and "MRNUMBER". As part of converting these pages to HTML using pandoc (see this post), the URLs turn into links which I can then click on to open up the paper, and the MRNUMBER is massaged into a link to MathSciNet. For those without access (you need to be an AMS member or log in through a university), MathSciNet is a giant database of all math papers with links to the paper, links to the papers it references, and links to papers which reference it. This is very helpful when discovering information about references.
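As a rough illustration of the MRNUMBER step, the sketch below pulls the numeric MR identifier out of an entry and formats it as a link. It is not the conversion code from the later posts. The URL pattern in `MATHSCINET_URL` is an assumption about MathSciNet's lookup address and should be checked before use, and `mathscinet_link` is a hypothetical name.

```python
import re
from typing import Optional

# Matches fields such as: MRNUMBER = {MR951018 (89h:05034)},
# and captures only the digits that follow the optional "MR" prefix.
MR_FIELD = re.compile(r"MRNUMBER\s*=\s*\{\s*(?:MR)?(\d+)", re.IGNORECASE)

# Assumption: MathSciNet resolves items at an address of this form.
MATHSCINET_URL = "https://mathscinet.ams.org/mathscinet-getitem?mr={}"


def mathscinet_link(bibtex_entry: str) -> Optional[str]:
    """Return a MathSciNet link for the entry, or None if it has no MRNUMBER."""
    match = MR_FIELD.search(bibtex_entry)
    if match is None:
        return None
    return MATHSCINET_URL.format(match.group(1))
```

The review identifier in parentheses (for example `89h:05034`) is ignored here; only the MR number is used to build the link.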

The BibTeX entries above were not written manually; MathSciNet already provides the BibTeX fragments. So when I want to add a reference, I find it on MathSciNet and copy the BibTeX into the page (at some point I will perhaps write a Vim plugin to pull it in automatically). After that, I sometimes have to add the URL.