Omeka and Neatline

One day I am going to sit down and figure out how to use [Omeka][]. When I do, I am *sooo* going to use the [Neatline][] plug-in.

[Omeka]: http://omeka.org/
[Neatline]: http://nowviskie.org/2014/neatline-and-visualization-as-interpretation/

Some Further Notes on a Plain Text (CM) System

If you are working in plain text, you are probably still going to want some way of structuring your text, that is, marking it up just a little so that you can do a variety of things with it. As I have already noted, the way that I know best is a variant of Markdown known as MultiMarkdown. But there are other systems out there: I have always been intrigued by the amazing scope of [reStructuredText][] and I am somewhat impressed by [AsciiDoc][]. (By way of contrast, I have always hated MediaWiki markup: it is almost incomprehensible to me.) The beauty of reStructuredText is that you can convert it to HTML or a lot of other formats with `docutils`. Even better is [Pandoc][], which converts back and forth between Markdown, HTML, MediaWiki, man, and reStructuredText. *Oh my!*

You can get Pandoc through a standalone installer or you can get it through MacPorts. To get MacPorts, however, you need the latest version of Xcode, which brings me to the topic of the moment: a plain text system is really founded on the Unix way of doing things, which means that your data is in the clear but you as an operator must be more sophisticated. Standalone applications like MacJournal and DevonThink, which I keep mentioning not at all because they are inadequate but because they are so good and because I use them when I am more in an “Apple” mode of doing things, are wonderful because you download them and all this functionality is built in. At the command line, not only do you assemble the functionality you want out of a variety of small applications, but in order to install or maintain those applications you need to have a better grasp of *what requires what*, also known as *dependencies*.

The useful Python script [Blogpost][], a command line tool for uploading posts directly to a WordPress site, is available through a Google Code project, which requires that you get a local copy through Mercurial, a distributed version control system, which is easily available … through MacPorts. There are other ways to get it, but allowing MacPorts to keep track of it means that you have an easier time getting it updated. This works much like Mac’s Software Update functionality, or the new badges that come with the Mac App Store that tell you that updates are available. No badges at the command line, but if you allow MacPorts, also known as a package manager, to, well, manage your packages, then all you need to remember to do is to run `sudo port selfupdate` followed by `sudo port upgrade outdated` once a week or so and all of that stuff is taken care of for you.

And so to summarize the dependencies:

`Blogpost -> Mercurial -> MacPorts -> Xcode`

Package managers, like MacPorts, only keep track of things locally, that is on the one machine on which they are installed, and not across several machines. It’s a bit of a pain to replicate all these steps across various machines, and so I now understand the appeal of `debconf` for Ubuntu users. I don’t quite know how to make that happen for myself, but I am open to suggestions.

[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
[AsciiDoc]: http://www.methods.co.nz/asciidoc/
[Pandoc]: http://johnmacfarlane.net/pandoc/
[Blogpost]: http://srackham.wordpress.com/blogpost-readme/

Some Notes on a Plain Text (CM) System

The idea of a “trusted system” probably can be attributed to David Allen as much as to anyone else. Certainly the idea is his within the current zeitgeist. Even if you have not heard of him you probably have heard the ubiquitous three letters associated with him, GTD. Allen’s focus is on projects and tasks, but the idea of a trusted system applies just as well to any undertaking. For folks who type for a living, be it words in a sentence or functions in a line of code, ideas are just as important as tasks when it comes to accomplishing projects. Allen’s GTD system has a response to ideas, but it largely comes down to putting things in folders.

But as anyone who works with ideas knows, sometimes you don’t know where to put them. And, just as importantly, why should you have to put them in any particular place? In the era of computation, that is, in the era of `grep` and `#tag`, having to file things, at least right away, would seem an anachronism that forces us to return to a paper era that often forced us to ignore the way the human mind works. That is, when operating in rich mode the mind is capable of grasping diffuse patterns across a range of items in a given corpus, but finding those items when they are filed across a number of separate folders, or their digital equivalent, directories, is tedious work. `grep` solves some of that problem, of course.

I have largely committed, in the last few weeks, to using DevonThink as the basis for my workflow, because I like its UI and its various features and because it makes casual use so easy — and when I am sitting in my campus office, I need things to be casually easy.

But the more I learn about DevonThink’s artificial intelligence, the more I want to be able to tweak it, add my own dimensions to it. For example, DevonThink readily gives you a word frequency list, but what if I want to exclude common words from that list? I know a variety of command line programs that allow me to feed them a “stop list”, a list of words to drop from consideration (and indeed these lists are sometimes known as “drop lists”) when presenting me a table of words and the number of times they appear in a given corpus. I am also guessing that when DT offers to “auto group” or “auto classify” a collection of texts, it is using some form of semantic, or keyword, mapping to do so. What if I would like to tweak those results? Not possible. This is, of course, the problem with closed applications.
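To make the “stop list” idea concrete, here is a minimal Python sketch of a word frequency count that drops common words. It is not how DevonThink does it, which I cannot see into; the `STOP_WORDS` set and the `word_frequencies` function are names I made up for illustration, and a real stop list would be much longer and probably kept in its own file.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; a real one would be
# far longer and would probably be read from a plain text file.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercased words in text, skipping anything in stop_words."""
    # Only plain ASCII letters and straight apostrophes count as word characters here.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


sample = "The mind sees patterns, and the mind files the patterns in a folder."
freq = word_frequencies(sample)
for word, count in freq.most_common(3):
    print(word, count)
```

Because the stop list is just a set of strings, tweaking the results is a matter of editing a list, which is exactly the kind of control a closed application does not give you.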

The other problem with applications like DevonThink and MacJournal, as much as I like both of them, is that you can do a lot within them, but not so much without. While neither application holds your data captive — both offer a variety of export options — a lot of their functionality exists within the application itself. Titles, tags, etc.

Having seen what these applications can do and how I use them, would it be possible to replicate much of the functionality I prefer in a plain text system that would also have the advantage of, well, being plain text? As the Linux Information Project notes:

> Plain text is supported by nearly every application program on every operating system and on every type of CPU and allows information to be manipulated (including, searching, sorting and updating) both manually and programmatically using virtually every text processing tool in existence. … This flexibility and portability make plain text the best format for storing data persistently (i.e., for years, decades, or even millennia). That is, plain text provides insurance against the obsolescence of any application programs that are needed to create, read, modify and extend data. Human-readable forms of data (including data in self-describing formats such as HTML and XML) will most likely survive longer than all other forms of data and the application programs that created them. In other words, as long as the data itself survives, it will be possible to use it even if the original application programs have long since vanished.

Who doesn’t want their data to be around several millennia from now? On a smaller horizon, I once lost some data to a Windows NT crash that could not be recovered even with three IT specialists hovering over the machine. (To be fair to Windows NT, I think I remember the power supply was just about to go bad and that it was going to take the hard drive with it.) Ever since that moment, I have had a tendency to want to keep several copies of my data in several places at the same time. Both Dropbox and our NAS satisfy that lingering anxiety, but both of them are largely opaque in their process and they largely sync my data as it exists in various closed formats.

And as the existence of this logbook itself proves, I have problems with focus, and there is something deeply appealing in working inside an environment as singularly focused as a terminal shell. That is, I really do daydream about having a laptop which has no GUI installed. All command line, all the time. Data would be synced via `rsync` or something like it, and I would do various kinds of data manipulation via a set number of scripts that I also maintained via Git or something like it.

Now, the chief problem plain text systems have, compared to other forms of content management, is a lack of an ability to hold metadata, and so the system I have sketched out defaults to two conventions about which I am ambivalent but which I feel offer reasonable working solutions.

The first of these conventions is the filename. Whether I am writing in MacJournal or making a note in my notebook, I tend to label most private entries with their date and time. In MacJournal this looks like this: `2012-01-04-1357`. In my Moleskine notebook, every page has a day header and each entry has its own title. Diary entries are titled with the time they were begun. So a `date-time` file naming convention will work for those notes.

When I am reading, I write down two kinds of things: quotes and notes. Quotes are obvious, but notes can range from short questions to extended responses and brainstorming. Quotes are easily named using the Turabian author-date system which would produce a file name that looks like this: `Author-date-pagenumber(s)`. Such a scheme requires that a key be kept somewhere that decodes `author-date`s into bibliographic entries. What about notes? I think the easiest way to handle this is using `author-date-page-note`. In my own hand-written notes, I tend to handle page numbers to citations within parentheses and pages to notes with square brackets, but I don’t know that regex on filenames is how I want to handle this.
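As a rough sketch of the two naming conventions, here is what generating and reading such filenames might look like in Python, supposing regex on filenames did turn out to be tolerable. Everything specific here is an assumption of mine: the `.txt` extension, the function names, the rule that an author is a single unhyphenated word, and the example author and year, which are invented.

```python
import re
from datetime import datetime


def diary_filename(moment):
    """Return a date-time name such as 2012-01-04-1357.txt (extension assumed)."""
    return moment.strftime("%Y-%m-%d-%H%M") + ".txt"


# Assumed pattern: author-date-page(s), with an optional trailing "-note".
# Hyphenated author names would need a different separator.
NAME_PATTERN = re.compile(
    r"^(?P<author>[A-Za-z]+)-(?P<year>\d{4})-(?P<pages>\d+(?:-\d+)?)(?P<note>-note)?$"
)


def parse_reading_filename(stem):
    """Split a filename stem into its parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        return None
    return {
        "author": match.group("author"),
        "year": match.group("year"),
        "pages": match.group("pages"),
        "kind": "note" if match.group("note") else "quote",
    }


print(diary_filename(datetime(2012, 1, 4, 13, 57)))
print(parse_reading_filename("Allen-2001-37"))          # invented example
print(parse_reading_filename("Allen-2001-37-39-note"))  # invented example
```

The key that decodes `author-date` into a bibliographic entry would still have to live somewhere else; this only takes the filename apart.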

Filenames handle the basics of metadata, in some fashion, but obviously not a lot, and I am being a bit purposeful here in trying to avoid overly long filenames. For additional metadata, I think the best way to go is with Twitter-style “hashtags”. E.g., `#keyword`.
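Pulling those tags back out of a note is a small job for a regular expression. The sketch below is one way it might go in Python; the pattern, the `extract_tags` name, and the choices to lowercase tags and to ignore a `#` followed by a digit are all my own assumptions, not a settled part of the system.

```python
import re

# Assumed tag syntax: "#" followed by a letter, then letters, digits,
# underscores, or hyphens. A "#" glued to the end of a word is ignored.
TAG_PATTERN = re.compile(r"(?<!\w)#([A-Za-z][\w-]*)")


def extract_tags(text):
    """Return the sorted, de-duplicated, lowercased tags found in text."""
    return sorted({tag.lower() for tag in TAG_PATTERN.findall(text)})


note = "Allen on trusted systems. #GTD #workflow\nSee also #gtd and issue #42."
print(extract_tags(note))  # ['gtd', 'workflow']
```

Requiring a letter right after the `#` also keeps Markdown headings such as `# Title` from being mistaken for tags, since those have a space after the hash.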

Where to put the tags, at the beginning like MultiMarkdown or AsciiDoc, or at the end where they don’t interfere with reading? I haven’t decided yet. I use MultiMarkdown, and PHPMarkdown, almost by default when writing in plain text. The current exception to this is that I am not separating paragraphs by an additional line feed, which is the basis for most Markdown variants. This is just something I am trying, because when I am writing prose with dialogue or prose with short paragraphs, the additional white space looks a bit nonsensical. The fact is, after years of being habituated to books, I am used to seeing paragraphs begin with an indent and no extra line spacing. It’s very tidy looking, and so I am playing with a script through which I pass my indented prose notes and which replaces the tab characters, `\t`, with a newline character, `\n`, before passing the text on to Markdown.
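The script itself is still in flux, but its core amounts to a single substitution. Here is a minimal Python sketch of that step, with a function name, `indented_to_markdown`, that I have made up for the purpose:

```python
def indented_to_markdown(text):
    """Replace every tab with a newline so indented paragraphs get a blank line between them."""
    return text.replace("\t", "\n")


# Two paragraphs, each opened with a tab and separated by a single newline.
draft = "\tFirst paragraph.\n\tSecond paragraph.\n"
print(indented_to_markdown(draft))
```

Since each paragraph already ends with one newline, turning the leading tab into a second newline produces the blank line Markdown expects. The obvious weakness is that this replaces every tab, not only the ones that open a paragraph, so a note containing tab-indented code or tabs mid-line would need a stricter rule.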

Now, this system is extremely limited: it doesn’t handle media. It doesn’t handle PDFs. It doesn’t handle a whole host of things, but that is also its essence. It’s a work in progress. I will let you know how it goes. Look for the collection of scripts to appear on GitHub at some point in the near future.

If I can ever get Omeka up and working on my [Small Orange](http://asmallorange.com/) account, [D-Lib Magazine has a nice post about using it](http://www.dlib.org/dlib/march10/kucsma/03kucsma.html).

Islandora

John Anderson wrote me about a new content management system for education being developed by the University of Prince Edward Island: [Islandora][1]. Islandora is a combination of Drupal and Fedora “to create a robust digital asset management system that can be fitted to meet the short and long term collaborative requirements of digital data stewardship.”

Looks interesting. As always, the question is really — where I work — how much of a community there is, will be, to draw upon. Drupal and Fedora have deep, rich communities, but Islandora?

[1]: http://islandora.ca/about

The Reliability of Blogging Platforms

[Royal Pingdom has the results](http://royal.pingdom.com/2010/12/17/the-most-reliable-and-unreliable-blogging-services-2/) of their monitoring of five popular blogging platforms: Blogger, WordPress.com, TypePad, Posterous, Tumblr (spoiler alert: listed in order of reliability). Ordinarily I would let this pass, but I am considering using a publicly available blogging platform for my digital humanities seminar. Why a public service? I want students to have something that can continue beyond their years at university: using our Moodle installation can’t do this. I am currently leaning towards [WordPress.com](http://wordpress.com) because

1. I use it and am familiar with it
2. It’s open source
3. A number of digital humanities projects, e.g. CUNY’s [Academic Commons](http://commons.gc.cuny.edu/), are built on it, or on the other open source CMS, [Drupal](http://drupal.org/). (CUNY’s effort should not be confused with the other [Academic Commons](http://www.academiccommons.org/), which is equally interesting, but I don’t know if it’s built on WordPress CMS.)

CMS Made Simple

I have to remind myself now and then that CMS Made Simple is still out there and it’s still inviting. I still prefer the Ruby way, but if I ever do decide to build a much more CMS-oriented site, there’s always CMSMS.

Choosing a CMS (and how the web works)

In the past few weeks I have had a number of direct conversations or made indirect observations about a number of websites run either by individuals or by organizations that are still using some form of static HTML generator when they probably should be using some form of content management system (hereafter CMS), almost all of which produce HTML dynamically.

What’s the difference between *static* and *dynamic* you ask? Static HTML pages sit on a server, typically in a folder/directory titled `public_html`.

Now let me make this clear for all my friends who have asked me, or were about to ask me, that static HTML generation is great for the internet.

### Comparing Code

A pretty fundamental, and arguably not very interesting to most users, way to compare the various CMSes is to look at their code base. [Dries Buytaert](http://buytaert.net/cms-code-base-comparison) has done so. His graphs reveal the size of the code bases over time.

It turns out that the Drupallers are themselves prone to reflecting on what they do in relationship to WordPress. There have been a number of threads over the years. [This one in particular](http://drupal.org/node/29364) reflects on ease of use issues. And here’s [another discussion](http://groups.drupal.org/node/15689).

Web developers [regularly ask this](http://ask.metafilter.com/131535/Drupal-vs-Joomla-vs-Wordpress-vs) precisely because they want to be able to deliver to their clients a stable, robust platform that is very user friendly. If any of those three dimensions fail, they know that the client will fault them, not the platform. But what do we mean by stable, robust, and friendly?

> WordPress is really slick for quick, turnkey web sites that don’t really need much functionality beyond a blog and an ‘about’ page.

> Drupal definitely has a learning curve, but it’s your platform if you anticipate needing to integrate a lot of custom functionality; its biggest strengths are its APIs.

Web Publishing Platforms for the Humanities

As I continue to work on the scholarly narratives for Project Bamboo, I have gleaned the following platforms that people are using, or would like to use, in the service of humanities projects:

* [Omeka](http://omeka.org/) is brought to you by the same folks who brought us Zotero and is described as “a free and open source collections based web-based publishing platform for scholars, librarians, archivists, museum professionals, educators, and cultural enthusiasts. Its “five-minute setup” makes launching an online exhibition as easy as launching a blog. Omeka is designed with non-IT specialists in mind, allowing users to focus on content and interpretation rather than programming. It brings Web 2.0 technologies and approaches to academic and cultural websites to foster user interaction and participation. It makes top-shelf design easy with a simple and flexible templating system. Its robust open-source developer and user communities underwrite Omeka’s stability and sustainability.”
* [CONTENTdm](http://www.contentdm.com/) is described as *digital collection management software*. Its blurb is “CONTENTdm® makes everything in your digital collections available to everyone, everywhere. No matter the format — local history archives, newspapers, books, maps, slide libraries or audio/video — CONTENTdm can handle the storage, management and delivery of your collections to users across the Web.”
* [Pachyderm](http://pachyderm.nmc.org/) is “an easy-to-use multimedia authoring tool. Designed for people with little multimedia experience, Pachyderm is accessed through a web browser and is as easy to use as filling out a web form. Authors upload their own media (images, audio clips, and short video segments) and place them into pre-designed templates, which can play video and audio, link to other templates, zoom in on images, and more. Once the templates have been completed and linked together, the presentation is published and can then be downloaded and placed on the author’s website or on a CD or DVD ROM. Authors may also leave their presentations on the Pachyderm server and link directly to them there. The result is an attractive, interactive Flash-based multimedia presentation.” It appears to be available in three versions: hosted, as a managed deployment, and as a DIY open source download.