Counting Control Words in a Text

As I was working on a toy corpus to understand the various facets of sklearn, I came across this very clear example of how to count specific words in a collection of texts:

from sklearn.feature_extraction.text import CountVectorizer

# Count only the words in the fixed vocabulary; one column per word, in this order
cv = CountVectorizer(vocabulary=['hot', 'cold', 'old'])
# One row per text
data = cv.fit_transform(['pease porridge hot', 'pease porridge cold', 'pease porridge in the pot', 'nine days old']).toarray()
print(data)
[[1 0 0]
 [0 1 0]
 [0 0 0]
 [0 0 1]]

Please note that I’ve changed the original a bit to make it easier to deploy this in a longer script.
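For use in a longer script, it can help to attach the control words back to the columns of the count matrix. What follows is only a minimal sketch, assuming the same four texts as above; the names texts, control_words, totals, and hits are my own and not part of sklearn. It relies on the columns coming out in the same order as the vocabulary list.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Same toy corpus and control words as in the example above
texts = ['pease porridge hot', 'pease porridge cold',
         'pease porridge in the pot', 'nine days old']
control_words = ['hot', 'cold', 'old']

cv = CountVectorizer(vocabulary=control_words)
counts = cv.fit_transform(texts).toarray()

# Total occurrences of each control word across the whole collection
totals = {word: int(n) for word, n in zip(control_words, counts.sum(axis=0))}

# For each text, the control words that appear in it at least once
hits = [[word for word, n in zip(control_words, row) if n > 0]
        for row in counts]

print(totals)
print(hits)
```

The third text contains none of the control words, so its entry in hits should be an empty list.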

Travel Card Falderol

Notes for travel card.

  • If workflow has been implemented: scan each document.
  • Membership dues are not allowed.
  • If you’re out of office, check with agency policy.
  • Other employees cannot possess your card.
  • If a charge is declined, contact admin.
  • Program agreement is renewed annually.
  • Cash is not allowed.
  • There are no flexibilities in PPM49.
  • Incidentals, personal purchases are not allowed.
  • Gift cards, baggage fees, and food are allowed with prior approval.
  • Document everything.
  • If card used accidentally, contact admin.

Of Open Tabs and Persistent Concerns

I’ve left this logbook under-attended for a while now, and since I want to get back into writing mode, it’s an appropriate moment to get back into posting here as well. Once again, one of the prompts for doing so is a browser full of tabs: a lot of interesting pages to digest and some sense that their contents will be useful later.

In general, I would say that the pages that remain open, that persist, in my web browsing fall into two categories which I have not yet been able to resolve into one. The first category is making and manufacturing and the future of work in the world. It results in open tabs like:

  • An Ars Technica interview with Cory Doctorow on his new novel, Walkaway, in which Doctorow imagines a post-scarcity world built upon his interest in open-source software, reputation management, and other ideas that have long fascinated him. (I confess that I tried reading his Makers but it just didn’t work for me.)

  • In the interview, Doctorow mentions Bruce Sterling’s Shaping Things, which seems worth a read, since it aspires to be both a history of how we have used energy and matter to create objects in our world and a look at how we might go about doing that in the future.

  • Also in this vein of the future of work or the future of ideas about work is a Guardian column on how the privatization of innovation in the U.S. is in fact starving the country of its innovation. What Ben Tarnoff argues is that private firms and private capital are not capable of taking the kind of risk that the public sector can.

Now, some of these things I read because part of me wants to write a follow-up book to The Amazing Crawfish Boat that focuses on how to address, or redress, issues brought about not necessarily by the technology boom itself but by the changes in our thinking: sometimes we get a little carried away. When I read things like the following in particular, I am struck by how much such thinking might benefit from spending time with a farmer:

Accelerationists argue that technology, particularly computer technology, and capitalism, particularly the most aggressive, global variety, should be massively sped up and intensified – either because this is the best way forward for humanity, or because there is no alternative. Accelerationists favour automation. They favour the further merging of the digital and the human. They often favour the deregulation of business, and drastically scaled-back government. They believe that people should stop deluding themselves that economic and technological progress can be controlled. They often believe that social and political upheaval has a value in itself. (Andy Beckett, 11 May 2017, “Accelerationism: how a fringe philosophy predicted the future we live in”, The Guardian)

Plants take time to grow. You can’t change that. (Not a lot, anyway.) People take time to mature, to digest not only their food but also the information they ingest. The problem with the current crop of people running the show is their incredibly short lives and attention spans. (I wonder if this will change when anti-agathics are discovered. When we have longer lives, will we be so stuck in short cycles? Perhaps we delude ourselves into thinking that the ping of endorphins would somehow be offset by the knowledge that we have more time. Maybe we would just have longer lives but still pass through them as junkies.)

The second category is my interest in artificial intelligence and machine learning and big data. That’s up next.