Towards an Open Notebook Built on Python

As noted earlier, I am very taken with the idea of moving to an open notebook system: it goes well with my interest in keeping my research accessible not only to myself but also to others. Towards that end, I am in the midst of moving my notes and web captures out of Evernote and into DevonThink — a move made easier by a script that automates the process. I am still not a fan of DT’s UI, but its functionality cannot be denied or ignored. It quite literally does everything. This also means moving my reference library out of Papers, which I have had a love/hate relationship with for the past few years. (Much of this move is, in fact, prompted by the fact that I don’t quite trust the program after various moments of failure. I cannot deny that some of the failings might be of my own making, but, then again, this move I am making is to foolproof systems from the fail/fool point at the center of it all, me.)

Caleb McDaniel’s system is based on Gitit, which itself relies on Pandoc to do much of the heavy lifting. In his system, BibTeX entries appear at the top of a note document and are, as I understand it, compiled as needed into larger, comprehensive BibTeX lists. To get the BibTeX entry at the top of the page into HTML for the wiki, McDaniel uses an OCaml library.

Why not, I wondered as I read McDaniel, attempt to keep as much of the workflow as possible within a single language? Since Python is my language of choice (mostly because I am too time- and mind-poor to attempt to master anything else), I decided to make the attempt in Python. As luck would have it, there is a bibtex2html module available for Python: [bibtex2html](https://github.com/goliveira/bibtex2html).

Now, whether the rest of the system is built on Sphinx or with MkDocs is the next matter — as is figuring out how to write a script that chains these things together so that I can approach the fluidity and assuredness of McDaniel.

I will update this post as I go. (Please note that this post will stay focused on the mechanics of such a system.)

Open Notebook

Despite having had the experience of having some of my work copied from publications of initial findings, I remain committed to the idea that science and scholarship should be affairs conducted, insofar as it is practicable and ethical to do so, in the open, available to others for inspection and consideration. To some extent, this website cum blog, which I originally called a logbook, was/is an effort to do some of that. But blogs are now best understood as more oriented toward chronological arrangements, which runs counter to what is often the slow accumulation of notes and ideas that are fundamental to science and scholarship.

While software like WordPress includes options for more static pages, it is not necessarily easy to set up, nor is it necessarily easily integrated into other workflows: that is, being able to use an always-available application that can capture notes and scribblings wherever you happen to be reading or writing. (I suspect Automattic’s purchase of SimpleNote is one step to solving this problem, and, like many others, I will be watching to see how things develop. For now, publishing from within SimpleNote simply creates a URL which one can send to others, like Dropbox or Google Drive.)

I have been following Caleb McDaniel’s experiments in open notebook scholarship for a while now, waiting for a moment where I might be in a position to follow suit. McDaniel’s system is based on Gitit, a Pandoc-based wiki that, obviously, uses Git for publishing results to a web server — it’s not quite clear to me, at this stage of my reading, how much of the Gitit infrastructure is copied to the web.

This past semester, I tried out a Python-based system, MkDocs, which I chose principally for the simplicity of its folder structure and the fact that it seemed least “fussy.” (Obviously such an evaluation comes with a rather large block of salt.) The advantage of Gitit, as I understand it, is that it uses Pandoc to create HTML, but Pandoc can be used within any setup — so long as you have it installed — as can Git. McDaniel’s system is, like so many others, including my own, founded on markdown.
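To illustrate the point that Pandoc can sit inside any setup, here is a hedged sketch of calling it from Python. The helper names (`pandoc_command`, `convert_note`) and the `site` output folder are hypothetical, invented for this example; the only Pandoc options used are `--from`, `--to`, and `--output`. The conversion runs only if a `pandoc` executable is found on the path.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(source: Path, out_dir: Path) -> List[str]:
    """Build the Pandoc command line that turns one markdown note into HTML."""
    # Hypothetical layout: notes/smith.md becomes <out_dir>/smith.html
    target = out_dir / (source.stem + ".html")
    return [
        "pandoc",
        "--from", "markdown",
        "--to", "html",
        "--output", str(target),
        str(source),
    ]


def convert_note(source: Path, out_dir: Path) -> bool:
    """Run Pandoc on one note; return False if Pandoc is not installed."""
    if shutil.which("pandoc") is None:
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if Pandoc exits with an error
    subprocess.run(pandoc_command(source, out_dir), check=True)
    return True
```

Keeping the command builder separate from the function that runs it makes it easy to inspect what would be executed before committing a whole folder of notes to it.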

The really exciting part of his note system, I think, is that instead of building one giant BibTeX file, each BibTeX entry is at the top of the document that also contains his reading notes and quotations. I have been using Papers for reference management, and my experience of it has been mixed: the automatic determination of the likely citation information is quite good, but its ability to keep track of associated files, sometimes hard-won PDFs, has been lackluster. The experience of taking notes on electronic documents is okay, and, obviously, the auto-generation of quotations from highlighted material is exceptional, though also dependent on the quality of the PDF. (Why, oh why, do so many humanities journals and materials generate such poor PDFs? Is this a workflow issue? Would a LaTeX-based workflow generate better PDFs? What’s going on here?)

It’s this on-the-fly generation of BibTeX that I find truly exciting, and McDaniel has released the Pandoc filter that does that work. As I move forward with my own system, probably based on MkDocs, I’ll be looking to see if I can either re-use his filter or re-create it within Python. I am not alone in exploring this territory: Johannes Grassler has released [mkdocs-pandoc][], a Python module “contain[ing] a set of filters for converting mkdocs style markdown documentation into a single pandoc-flavored markdown document.”
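As a first step toward re-creating that behavior in Python, here is a minimal sketch. It assumes (and this is my assumption, not a description of McDaniel’s actual file format or filter) that a note simply opens with a raw BibTeX entry, followed by the reading notes. The function names are hypothetical, and the brace counting is naive: it does not handle escaped braces or an entry delimited by parentheses.

```python
from typing import Dict, Optional


def leading_bibtex_entry(note_text: str) -> Optional[str]:
    """Return the BibTeX entry that opens a note, or None if there is none."""
    text = note_text.lstrip()
    if not text.startswith("@"):
        return None
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    # Walk forward, counting braces until the entry's opening brace is closed
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[: i + 1]
    return None  # the braces never balanced, so treat it as no entry


def compile_bibliography(notes: Dict[str, str]) -> str:
    """Gather the leading entry of every note into one BibTeX string.

    `notes` maps a file name to that file's text; notes without a
    leading entry are skipped.
    """
    entries = []
    for name in sorted(notes):
        entry = leading_bibtex_entry(notes[name])
        if entry is not None:
            entries.append(entry)
    return "\n\n".join(entries) + "\n"
```

In use, one would read every markdown file in the notes folder into the `notes` dictionary and write the returned string out as a single `.bib` file whenever a comprehensive list is needed.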

Why do this? Open science/scholarship is hard, and, as I have experienced, possibly opens you up to some unpleasant experiences. First, there is the principle of openness itself, which is something to which one aspires, knowing that it’s probably never fully achievable because, first, an individual researcher from a public university simply doesn’t have the resources to make it happen, and, second, because human. Second, there is my own reluctance to commit everything into a file system that remains outside my control, and, especially, open to corruption, by which I simply mean the dangers of having a system break ungracefully. In the early aughts I had a Windows 2000 machine crash: when the power supply fried, it took out part of the hard disk. The part that remained could not be logged into, leaving several years of work, which, yes, should have been backed up, inaccessible. Since then, I have been more active in terms of having multiple copies and having those copies in a format that degrades with some grace. Plain text can’t be beat in this regard.

For the record, I wrote The Amazing Crawfish Boat in Scrivener, and I will probably do the same with the next book, especially now that Scrivener has an iOS companion. But Scrivener is a place where I focus on writing, not necessarily on compilation of notes and ideas — indeed, when I have used it as such, the writing suffers. (To be clear, the app itself can handle pretty much anything you throw at it. That dump everything and sort it out in the writing approach simply doesn’t work for me.)

This business of compiling notes has remained one of those things I wish I could sort out. At present, I have a few DevonThink databases as well as a number of Evernote notebooks, and that’s not counting the collection of plain text notes I have sitting in at least one directory in my Dropbox account, plus some sitting in a Ulysses notebook on my iCloud account. I don’t think all these notes necessarily need to come together in one place, but reducing the places isn’t a bad idea, and, just as importantly, finding a place for scientific and scholarly notes that facilitates productivity is one piece of the puzzle that I think can be solved sooner rather than later.

Winter Break 2016-2017 just got a little more interesting.

Why I Write Notes by Hand (And You Should, Too)

There are a number of articles available that make a variety of arguments for why students in classes and individuals in meetings should take notes by hand. Those arguments range from the pragmatic — it’s too easy to distract yourself, and others — to the cognitive — there is more/better brain activity when we write by hand. The arguments have appeared in prestigious publications like _Scientific American_, _The New York Times_, and the _Chronicle of Higher Education_. All of them, I think, make good sense both intuitively and rationally.

I would like to add another dimension to why I take notes by hand and why I think you should, too, and it comes down to this: don’t limit yourself to a single mode of thinking. I don’t know how your brain works, but only if you are the most linear of thinkers, and one narrowly confined, by some weird birthright (or curse), to only ever thinking in words, are you going to be able to capture your thoughts strictly with, well, words. I find I sometimes need to diagram, and even if I don’t do much more than put words in weird text blocks connected by lines and arrows, I am still able to indicate more complex kinds of relationships, such as various forms of subordination (like multiple branches) or parallelism, more quickly than I can if I only use words:

Notes with Additional Relationships

Consider, too, that sometimes all you have is a diagram or some other kind of image in your mind. You certainly *could* use words to describe it … eventually. But, perhaps, your first impulse is to “see” it in its totality.

Field Notes with Diagram

And then there are the times that you can’t “find the right word” or the word is “on the tip of your tongue.” Why force yourself to find a word, especially in the middle of a class or a meeting where you may not have time to figure it out? Why not draw or doodle or whatever; allow yourself the opportunity to capture your thoughts in some other fashion, and then, later, when you are putting your notes away for the day, as I have [advised elsewhere][], you can find time to discover what it was you were trying to say to yourself — this works especially well if you use [paper with wide left margins][].

### Links

* In [Why I’m Asking You Not to Use Laptops][], Anne Curzan makes a number of practical arguments against using laptops in class.
* In [What’s Lost as Handwriting Fades][], Maria Konnikova reports on recent neurological studies that reveal that writing by hand activates certain kinds of pathways in the brain.
* In [A Learning Secret][], Cindi May reports that it’s actually important that you write by hand more slowly than you can type: you think better and remember more as a result.
* While not on the topic of writing, in [Science Has Great News for People Who Read Actual Books][] Rachel Grate reports on recent studies that reveal people have higher reading comprehension when they read on paper than when they read on screens.

[advised elsewhere]: http://johnlaudun.org/20150703-re-notebooks/
[paper with wide left margins]: http://www.amazon.com/gp/product/B0013CHS0O/
[Why I’m Asking You Not to Use Laptops]: http://chronicle.com/blogs/linguafranca/2014/08/25/why-im-asking-you-not-to-use-laptops/
[What’s Lost as Handwriting Fades]: http://www.nytimes.com/2014/06/03/science/whats-lost-as-handwriting-fades.html
[A Learning Secret]: http://www.scientificamerican.com/article/a-learning-secret-don-t-take-notes-with-a-laptop/
[Science Has Great News for People Who Read Actual Books]: http://mic.com/articles/99408/science-has-great-news-for-people-who-read-actual-books

re: Notebooks

Lab notebooks (323.365)

Over the years, I have made a number of posts about various dimensions of notebooks, but, really, the only point I want to make is: get one, keep one. I no longer try to keep projects in single notebooks (more on this in a moment), but I do keep a notebook with me at all times, and, when in doubt, whatever it is I need to record or I want to write/think about goes in there. I can always copy that material to some other location, but I cannot do that if it is lost to the vagaries of time.

When I am working in Python, I am working inside an IPython notebook, which, like the script pane in RStudio, allows you to write and run code in a way that also allows you to keep track of what you have done. This is different from working in an IDE, where your sole focus might be developing a piece of code. In many instances, scientists and scholars are interested in what a particular piece of code does to a particular piece/stretch of data. In my case, I am still learning so much about the interaction of code and data, and I need to take notes about not only what I thought I was doing but also what I wish I could do.

This is a lot like a lab notebook. As Dutch data scientist Jeroen Janssens notes: “Doing research is hard. Recalling which steps you’ve taken, and why, is even harder. To be an effective researcher, you may want to keep a laboratory notebook. Besides having a record of your steps and results, this also allows you to improve reproducibility, share your research with others, and, yes, think more clearly. So, why wouldn’t you keep a notebook?” Lab notebooks are important for their ability to track your thinking.

That noted, and in contrast with the usual advice for lab notebooks, when I am working on a project, I tend to use pads of paper or looseleaf paper: for the record, I use either engineering paper or law-ruled paper with wide left margins, and I will almost always have a pad of one or the other out on my desk when I am working. When I am done for the moment, and I am prepared to take the project off my work surface, I take the various sheets of paper I have generated and place them in a folder.

I try to keep folders between a half-inch and three-quarters of an inch in thickness. Above that and it gets too hard to find something quickly, which is the whole point of putting things in a folder and of filing systems in general. If that means I have to spend five to fifteen minutes thinking about how to break a burgeoning folder into two smaller folders, I am okay with the time spent. Like the time spent filing things, I regard it as an opportunity to review what I have done so far, what remains to be done, and what, if any, changes in direction need to be undertaken. I’m at that point right now in a project: I didn’t realize it until I wrote that sentence, but there’s been some slight friction in getting things done, and it’s because the folder has gotten too big, too sloppy.

When it comes time to archive a project, I don’t mind big, sloppy folders. Sometimes, in fact, I’ll take several smaller folders and empty them into one, re-write the label (and this is why I write labels in pencil), and then put the thing in the box, or drawer, of projects done. If I ever need to work through that material again, then plowing through a pile of paper is just one way to refresh my memory.

For participants in my courses reading this, maybe because they’ve decided to find more about me or maybe because I’ve told them to look this post up, the TL;DR version of this post is this:

  • Spiral-bound notebooks are wrong for a number of reasons: it’s too tempting to tear out a page if you think you’ve made a mistake. (Keep your mistakes: it’s part of learning.) It’s also tempting to tear out a page when you need a blank piece of paper: no one wants your ratty tassels of paper. It’s also tempting to think that you can fit everything within a given space of a notebook, especially those big, stupid “multi-subject” notebooks. There is “no one notebook to rule them all.” (I’m betting even Sauron didn’t have a multi-subject notebook.) You’re going to feel like an idiot when you run out of paper halfway, two-thirds, or three-quarters of the way through the semester.
  • Life breathes in and out. Get a capture system that does too. Pads of paper, or loose-leaf paper, kept in a fashion that it doesn’t get beat up as you take it in and out of your bag are the way to go. Write as much as you want whenever, and wherever, you want during the day. Stack all your paper in a common folder for the day, and then, at the end of the day, you can parse it into folders that represent courses, subjects, or projects. (Maybe you can even have one called “just for me.” Think about it.) As you sort the notes from the various events that filled your day — classes, meetings, etc. — you also get a chance to review your day, go over what’s important, remind yourself — even write it down in a calendar or todo list — of the things that need to get done. This review process, almost everyone agrees, is central to getting things done.

Creativity is in the notes

This one is for my students:

> The one characteristic that all of these creatives shared— whether they were painters, actors, or scientists— was how often they put their early thoughts and inklings out into the world, in sketches, dashed-off phrases and observations, bits of dialogue, and quick prototypes. Instead of arriving in one giant leap, great creations emerged by zigs and zags as their creators engaged over and over again with these externalized images.

For the record, I keep two notebooks: this logbook and a hand-written notebook. I like thinking on paper for a variety of reasons, at least one of which is that I sometimes need to draw. Conversely, I sometimes like to make myself work on the computer, because, in the end, almost all forms of productivity in the academy that are valued are largely written. Working on the computer means typing. And typing produces writing. And it makes revising that writing into something others may want to read quite easy.

Oh, and I am going to ignore for now the quite obvious topic of the quotation: genius is iterative. It’s just that we often don’t glimpse the iterations. The final product bursts upon our world and we experience it as revolutionary.

Perception, perception, perception. So much of this comes down to the fact that we are these little bundles of perceiving mechanisms tied to a small, always-on computing device that we have only begun to understand. We are always alone, and yet we are always tangled up with others.

Via [Business Insider](http://www.businessinsider.com/strokes-of-genius-heres-how-the-most-creative-people-get-their-ideas-2013-7)