JetPens has a long list of mini-pens, if you’re inclined to be obsessed with writing paraphernalia.
The iMac in my office is 9 years old and is approaching the point of no longer being usable. When it finally goes, I’m hoping that it is not so old that it cannot be used in Target Display Mode.
The Hagley Museum and Library recently finished digitizing Sperry-UNIVAC’s “Introduction to the Digital Computer.” It’s a 20-minute film which, in some ways, is still useful today for its presentation of foundational matters in computing. Link.
Mark Davies of BYU announced that the NOW (News on the Web) Corpus has arrived at its own word of the year: “Based on 1.7 billion words of text from 2017 in NOW, our Word of the Year for 2017 is fake news (more info), followed by the related phrase alternative facts.”
This Reddit post uses data analysis techniques to distinguish between cookies, pastries, and pizzas in order to win an office party argument. And there’s data too — “1931 recipes from the Food Network that contain the keywords cookies (my group of interest), pastry, or pizza (two control groups).”
Rejected for a special issue of the Journal of Cultural Analytics, but, still, I think, an interesting project and one I will continue to pursue. If anyone else is interested, this is part of a larger project I have in mind and I am open to there being a working group.
Current efforts to treat narrative computationally tend to focus on either the very small or the very large. Studies of small texts, some only indifferently narrative in nature, have been the focus for those interested in social media, networks, and natural language technologies, which are largely dominated by the fields of information and computer sciences. Studies of large texts, so large that they contain many kinds of modalities with narrative the dominant, have largely been the purview of the field we now tend to call the digital humanities, dominated by the fields of literary studies, classics, and history.
The current work proposes to examine the texts that fall in the middle: larger than a few dozen words, but smaller than tens, or hundreds, of thousands of words. These are the texts that have historically been the purview of two fields that themselves line either side of the divide between the humanities and the human sciences, folklore studies and anthropology (respectively).
The paper profiles the knot of issues that keep these texts out of our scholarly-scientific systems. The most significant issue is the matter of “visibility”, of accessibility, of these texts as texts and thus also as data: largely oral by nature, most folk or traditional narratives (must) have been the product of a transcription process that cannot guarantee the same kind of textuality of a “born literary” text. (The borrowing of the notion of natality is somewhat purposeful here, since we often distinguish between texts that have been, sometimes laboriously, digitized and those that were “born digital.”) As scholarly fictions, if you will, they are largely embedded within the texts that treat them, only occasionally available in collections. With limited availability, and traditionally outside the realm of the fields that currently dominate the digital humanities, folk/traditional/oral narratives are not yet a part of the larger project to model narrative nor of efforts to consider the “shape of stories.”
This accessibility gap means that both human and textual populations have been overlooked: most of the world’s verbal narratives are in fact oral in nature, millions upon millions are produced every day by millions and millions of people, and those narratives tend to range in length from around a hundred words to, perhaps, a few thousand words. The result is that any current model or notion of shape simply has allowed the wrong “figures figure figures.” Put another way, there can be no shape of stories without these stories.
With the rise of Lore from an obscure podcast about odd moments in “history” to an Amazon production, there has been a concomitant rise in interest in the possibilities for expanding the scope of the engagement between folklore studies and some form of a “popular audience.” At least two folklorists I know have been contacted by production companies looking to be a part of this emergent interest.
Like its cousin, history, folklore studies has had a strange, and often estranged, relationship with popular media. Some of the popular contact has been initiated by folklorists themselves: e.g., Jan Harold Brunvand. Brunvand was a much beloved individual among the folklorists I know, which seems to be unlike how historians felt about, say, Stephen Ambrose. (I know, I know, Ambrose had other issues, e.g., plagiarism.) There’s also the recent discussion among historians about (yet another) Ken Burns film. (See Jonathan Zimmerman’s “What’s So Bad about Ken Burns?”.)
Jeffrey Tolbert has written about this and even engaged in a dialogue with the creator of Lore. (For those interested, Tolbert has a personal essay in New Directions in Folklore.)
Working with a sample corpus of fraudulent emails this morning (Rachael Tatman’s Fraudulent Email Corpus on Kaggle), I found myself unable to get past reading the file, thanks to decoding errors:
codec can't decode byte 0xc2
Oof. That byte 0xc2 has bitten me before. I think it may be a Windows thing, but I don’t remember right now, and, more importantly, I don’t care. Data loss is not important in this moment, so simply ignoring the error is my best course forward:
import codecs
# errors='ignore' silently drops any bytes that are not valid UTF-8
fh = codecs.open("fraudulent_emails_small.txt", "r", encoding="utf-8", errors="ignore")
And done. Thanks, as usual, to a great StackOverflow thread.
BTW, thank you Rachael for making the dataset available!
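For the curious, here is a minimal sketch of what ignoring the error actually does, compared with the gentler errors='replace' option. It uses Python's built-in open rather than codecs.open (both accept the same errors argument). The helper name read_text_lossy and the sample bytes are my own inventions for illustration; they are not from the corpus.

```python
import os
import tempfile


def read_text_lossy(path, encoding="utf-8", errors="ignore"):
    """Read a text file, applying the given error handler to undecodable bytes.

    Hypothetical helper for illustration: 'ignore' drops bad bytes,
    'replace' swaps each one for U+FFFD so the damage stays visible.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        return fh.read()


# Made-up sample: 0xc2 starts a two-byte UTF-8 sequence, but the comma that
# follows is not a valid continuation byte, so strict decoding would fail.
raw = b"Dear friend\xc2, send money"

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    demo_path = tmp.name

ignored = read_text_lossy(demo_path, errors="ignore")
replaced = read_text_lossy(demo_path, errors="replace")
os.remove(demo_path)

print(repr(ignored))
print(repr(replaced))
```

If data loss ever does matter, 'replace' at least leaves a marker where something went missing, which makes it possible to count how many bytes were affected.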
After the first round of work is done with the TED talks and I’ve taken the next steps on the legend material, it will be time to figure out what to do on the literary side of things. When that happens, Jonathan Reeve’s database for Project Gutenberg looks fantastic.
Vikash Singh has a terrific write-up on “How our startup switched from Unsupervised LDA to Semi-Supervised GuidedLDA” which not only offers a very clear discussion of LDA and how they modified it but also notes that his company’s efforts resulted in a Python library that’s as easy to install as:
pip install guidedlda
If you missed the Louisiana Book Festival but still want to hear what I had to say, here’s your chance:
It probably also makes for an excellent sleep aid. Your mileage may vary.
I am delighted to announce that The Amazing Crawfish Boat will be one of the featured books at this year’s Louisiana Book Festival. The book talk is scheduled for Saturday afternoon, 3:30 p.m. to 4 p.m. in the First Floor Meeting Room of the Capitol Park Museum. If you’re at the Festival, come say hello or swing by the festival’s store after the talk to find me signing books. See you there!
While science fiction has a long history of human-AI/robot interaction, especially in terms of dialogue, the idea of robots/AIs talking to each other gained a lot more currency in the wake of two Facebook AIs seemingly developing their own language. First, a more reasoned summary of what happened at Facebook from the BBC. And now something a bit more sensational. This Quora post also has a bit more on what happened at Facebook.
All of this concern about AIs talking to each other has a history, at least in science fiction. One moment to consider occurred in 1970’s The Forbin Project in which the USA builds a supercomputer to oversee its strategic defense systems (missiles, bombers, you name it), only to discover that the USSR (now Russia) has a similar computer. It’s not too long before the two computers demand to talk directly to each other, then merge to form “World Control.”
One good place to start a larger history of robots and AIs talking to each other is Emily Asher-Perrin’s survey on Tor. (Tor is a long-time publisher of science fiction and fantasy literature; their website contains a mix of original fiction, thoughtful essays, and read- or watch-alongs of classic or beloved works in the genres.)
(Perhaps one thing to think about is the difference between robots as corporealized entities and artificial intelligences as noncorporeal entities: our responses to intra-entity dialogue seem to differ significantly based on whether the consciousness is individuated in a way that our own seems to be.)
One of the things that happens as you nurture and grow a software stack is that you begin to take its functionality for granted, and, when you are faced with the prospect of re-creating it elsewhere or over, you realize you need better documentation. My work is currently founded on Python, and I have already documented the great architecture that is matplotlib + … you get the idea.
jupyter is central to how I work my way through code, and when I need to present that code, I am delighted that jupyter gives me the option to present a notebook as a collection of slides.
RISE makes those notebooks fly using
missingno “provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset. It’s built using matplotlib, so it’s fast, and takes any pandas DataFrame input that you throw at it, so it’s flexible. Just pip install missingno to get started.”
I’ve got more … I just need to list them out.