Subject: Re: Planning the Dissertron
From: npdoty@ischool.berkeley.edu
Date:
To: Sebastian Benthall
Cc: Mohit Gupta, Brooks Paige
Bcc: https://bcc.npdoty.name/

Hi Seb,

(CCing Mohit and Brooks, with whom I've been having these conversations in San Francisco cafes.)

Thanks for formally getting this conversation started with your blog post. I appreciate both the prompt to think about my own tooling for paper-writing and dissertation-writing (which I hope will be a single process) and the requirements/resources dump you provided. I'm afraid I too missed the Scholarly Markdown workshop (reading about the NSA and recovering from PLSC made me oversleep), but I spent the weekend reading up on the relevant tools instead.

And I'm also shocked, on reading through the resources you mention and following a few (dozen) links on my own, to find that this problem is much closer to solved than I would have guessed: Zotero, Mendeley, Dropbox and BibTeX for gathering references and papers and exporting them for reading and review on devices; then Markdown, Pandoc and [pdf]LaTeX for writing and rendering papers; and GitHub, Hyde and gh-pages for version tracking, hosting and presentation.

Mohit and I just finished a workshop paper a week ago. As is our wont, we started by writing in Markdown, version-controlled and hosted on GitHub. To fit the paper template, we had to switch over to LaTeX at some point (we clearly did not even consider using the .doc template). The conversion is simple enough, and I had taught myself LaTeX one weekend in college to write up a physics lab report with nicely formatted equations, so I did it manually.

The conversion, while not long, was slow enough that I only did it a couple of times, and eventually I had to stop accepting changes from Mohit in Markdown. The biggest issue was citations: Mohit and I hadn't natively used \cite{} syntax, of course, or settled on any stable convention for referring to cited works. But I was pretty excited when I realized that Mendeley, in addition to a nice feature for synchronizing out to a BibTeX file, has a handy shortcut (Command-K) for copying the Citation Key in LaTeX format. I hadn't realized it at the time, but Scholarly Markdown and Pandoc both suggest using these citation keys, with an @ prepended. Thanks to citeproc-hs, we could have written in Markdown with occasional references to @Whitten1999 and then, with a script running Pandoc against a library.bib and an appropriate CSL style (another standard, who knew!), rendered out to HTML while writing and reading, and eventually to PDF through a conference-specific LaTeX template.
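For concreteness, here is a minimal sketch of that kind of build script, in Python to match the Hyde side of things. It assumes a recent Pandoc (2.11 or later), where the citation processing that citeproc-hs and later pandoc-citeproc used to provide is built in and enabled with `--citeproc`; older versions used `--filter pandoc-citeproc` instead. The file names (paper.md, style.csl, conference.latex) and the helper names are hypothetical.

```python
"""Hypothetical build script: render a Markdown draft with Pandoc citations."""
import shutil
import subprocess
import sys
from pathlib import Path


def pandoc_command(source, output, bibliography="library.bib",
                   csl=None, template=None):
    """Return a pandoc argument list that resolves @citekeys via citeproc."""
    cmd = ["pandoc", str(source), "--standalone", "--citeproc",
           f"--bibliography={bibliography}"]
    if csl is not None:
        cmd.append(f"--csl={csl}")            # citation style (CSL file)
    if template is not None:
        cmd.append(f"--template={template}")  # e.g. a venue's LaTeX template
    cmd += ["-o", str(output)]                # output format follows the extension
    return cmd


def render_all(source, csl="style.csl", template="conference.latex"):
    """Render an HTML draft for reading and a PDF (via LaTeX) for submission."""
    if shutil.which("pandoc") is None:
        sys.exit("pandoc not found on PATH")
    src = Path(source)
    subprocess.run(pandoc_command(src, src.with_suffix(".html"), csl=csl),
                   check=True)
    subprocess.run(pandoc_command(src, src.with_suffix(".pdf"), csl=csl,
                                  template=template),
                   check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    render_all(sys.argv[1])
```

Run as `python build.py paper.md` (script name hypothetical). If a conference template insists on real `\cite{}` commands processed by BibTeX, Pandoc's `--natbib` or `--biblatex` options emit those when writing a .tex file, instead of formatting the citations through CSL.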

While Jekyll is tempting, I agree that we academics seem to be tending more towards Python than Ruby; fortunately, there's Hyde. Mohit and I are rendering static pages for Privacy Patterns (hey, that's what the workshop paper I just mentioned is about) by writing in Markdown in a GitHub Wiki and then running scripts around Hyde to generate template-based static HTML, which is what you see at http://privacypatterns.org. With git hooks that run some combination of Pandoc and Hyde (a post-receive hook on the remote, so that the push itself triggers the build), I'm confident that, without writing any entirely new libraries, we can automatically publish our academic writing in progress just by typing `git push dissertation master`. Ah, I look forward to that.
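A minimal sketch of such a hook, again in Python. It would live in the receiving repository (the one the `dissertation` remote points at), where git's post-receive hook reads one `<old-sha> <new-sha> <refname>` line per updated ref on standard input. The work-tree path, the build commands and the exact Hyde invocation are hypothetical placeholders, not a tested deployment.

```python
#!/usr/bin/env python3
"""Hypothetical post-receive hook: rebuild the site when master is pushed."""
import os
import subprocess
import sys

PUBLISH_REF = "refs/heads/master"
WORK_TREE = "/srv/dissertation"  # hypothetical directory to build from

# Hypothetical build steps; adjust paths and tools to your own setup.
BUILD_COMMANDS = [
    # Check the pushed master out of this (bare) repository into WORK_TREE.
    ["git", f"--work-tree={WORK_TREE}", "checkout", "-f", "master"],
    # Render the draft to HTML, resolving @citekeys against library.bib.
    ["pandoc", f"{WORK_TREE}/dissertation.md", "--standalone", "--citeproc",
     f"--bibliography={WORK_TREE}/library.bib",
     "-o", f"{WORK_TREE}/content/dissertation.html"],
    # Regenerate the static site (assumed Hyde invocation; check Hyde's docs).
    ["hyde", "-s", WORK_TREE, "gen"],
]


def updated_refs(lines):
    """Parse post-receive input: one '<old-sha> <new-sha> <refname>' per line."""
    refs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            refs.append(tuple(parts))
    return refs


def should_publish(refs, publish_ref=PUBLISH_REF):
    """True if publish_ref was created or updated (an all-zero new sha is a deletion)."""
    return any(ref == publish_ref and set(new) != {"0"}
               for _old, new, ref in refs)


def main():
    if should_publish(updated_refs(sys.stdin)):
        for cmd in BUILD_COMMANDS:
            subprocess.run(cmd, check=True)


# git exports GIT_DIR when it runs a hook; the check keeps this file
# importable (e.g. for testing) without reading stdin or building anything.
if __name__ == "__main__" and "GIT_DIR" in os.environ:
    main()
```

Saved as `hooks/post-receive` in the server's repository and made executable, it rebuilds only when master itself moves, ignoring pushes to other branches and branch deletions.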

—Nick

P.S. Yes, this doesn't answer any of your requirements about integrating academic review and commenting into this online academic work, which hasn't been much of a priority for me but clearly should be. Maybe something will come of open standards for Web annotation. Or would GitHub's inline commenting for code review work for us?

P.P.S. You want to know the biggest open problem I have right now? Usable diffs of changes to plain-text documents. I'm confident this should be a trivial problem, but right now I can't easily see what changed within a single line of a Markdown document, and since a paragraph is typically one line, that means within a whole paragraph. When I've worked with computer scientists in the past, this has led to awkward workarounds like hard-wrapping paragraphs at 78 characters (so that line-by-line differences are closer to usable). The lack of nice diffs for reviewing changes is essentially what will keep me from getting Deirdre and other lawyers on board with this as our collaborative writing process.
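Git itself gets part of the way there with `git diff --word-diff`, which marks changes inside a line as `[-removed-]{+added+}`. The same idea is easy to sketch with Python's standard-library difflib, diffing a paragraph as a sequence of words rather than a sequence of lines; the `word_diff` helper below is a hypothetical illustration, not an existing tool.

```python
import difflib
import re


def word_diff(old, new):
    """Diff two versions of a paragraph word by word.

    Removed text is wrapped as [-...-] and added text as {+...+},
    the same markers `git diff --word-diff=plain` uses.
    """
    # Keep whitespace runs as their own tokens so the output rejoins cleanly.
    a = re.findall(r"\S+|\s+", old)
    b = re.findall(r"\S+|\s+", new)
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
            continue
        # "replace", "delete" and "insert" all reduce to removed and/or added runs.
        if i2 > i1:
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)
```

For example, `word_diff("the quick brown fox", "the slow brown fox")` returns `"the [-quick-]{+slow+} brown fox"`, so a one-word edit in a long unwrapped paragraph stays visible without hard-wrapping the source.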

P.P.P.S. Okay, and I have a handful of other open questions too. Does this work for collaborative writing or just for an individual dissertation? Do we make these repositories public or private during the writing process? Is a dissertation one giant repository, or do we set up repos for each project, paper or deliverable and then a "dissertation" is a set of configuration files which compile these repos into a nicely readable Web site with its own domain name?