Git and GitHub

Fateen Alam has compiled a terrific Notion page that provides overviews of version control, setting up and using git, and then using GitHub: Git and GitHub. (Notion is one of many new entrants into the note-taking app/system/omnibus category.)

Scott’s Cheap Flights

I subscribe to Scott’s Cheap Flights, mostly for aspirational reasons, but even though my family and I travel relatively rarely, the service has still saved us more than its cost of subscription in the past few years. A recent email included this reminder about optimizing your chances of finding a great deal:

Things that actually will help you get a better price: searching in the Goldilocks window (2-8 months for international flights and for 1-3 months for domestic), searching flexible dates, avoiding peak travel times, and acting fast when you find a great deal.

The link in the quotation is to a page on their website that has more details.

Gaming M/Disinformation

One of the things I had hoped to do in the next semester or so is to create a simulation of some kind that lets students in my classes see for themselves how information cascades through various kinds of networks.

My idea was to build on top of some simulation/modeling scenarios I had found in order to model/simulate the way information moves into and out of various kinds of networks — and here I mean not only the kinds of networks we once considered to be social groups but also the two distinct networks that now occupy our lives: offline (aka oral, face-to-face) and online networks.

An /r/science subreddit thread gathers a number of games focused on disinformation, listed here for ready reference:

  • Harmony Square is based on “inoculation theory”: that exposing people to a weak “dose” of common techniques used to spread fake news allows them to better identify and disregard misinformation when they encounter it in future (University of Cambridge press release). More on the game can be found in this article in Misinformation Review. (MR is published by the Harvard Kennedy School.)
  • Headliner: NoviNews is an “adventure where you control the news and its impact on society, your friends and career. Different choices lead to unique combinations of endings.” Right now it’s part of Steam’s “Bundle of Consequences,” which includes four other titles where you play the grim reaper, Death & Taxes; a digital voyeur, Do Not Feed the Monkeys; someone interned in a relocation camp, Not Tonight; and a border control agent, Papers Please. (Let the dystopian games begin?!)
  • In Orwell: Keeping an Eye On You, “Big Brother has arrived – and it’s you. Investigate the lives of citizens to find those responsible for a series of terror attacks. Information from the internet, personal communications and private files are all accessible to you.”
  • NewsFeed Defenders is a clearly educational game that puts users in charge of a fictional social media site focused on news and information: “Your mission? Maintain the site, grow traffic, and watch out! You’ll also need to spot fake posts that try to sneak in through hidden ads, viral deception, and false reporting.”
  • In Bad News users “take on the role of fake news-monger. Drop all pretense of ethics and choose a path that builds your persona as an unscrupulous media magnate. But keep an eye on your ‘followers’ and ‘credibility’ meters. Your task is to get as many followers as you can while slowly building up fake credibility as a news site. But watch out: you lose if you tell obvious lies or disappoint your supporters!”
  • Go Viral appears to be the simplest of the lot, billing itself as “a 5-minute game that helps protect you against COVID-19 misinformation. You’ll learn about some of the most common strategies used to spread false and misleading information about the virus. Understanding these tricks allows you to resist them the next time you come across them online.” Interestingly, they link to an article in the Journal of Cognition: Good News about Bad News: Gamified Inoculation Boosts Confidence and Cognitive Immunity Against Fake News.
  • There is also The Westport Independent, “a censorship simulator taking place in a post-war country, governed by the recently elected Loyalist Party. As the editor of one of the last independent newspapers in the country, your job is to remove and edit the content of your paper, affecting the people’s opinion of both the rebels and the Loyalist government.”

I plan on exploring these games/simulations over the holiday break, and I hope to post notes on their game play, on how well they achieve the goals they set for themselves, and on how well I think they capture the nature of information flows on- and offline.

Cognitive Biases

Your Bias Is compresses 24 cognitive biases into a very small user interface. The definitions are very brief, but it may be useful as a way to introduce people to the notion of cognitive bias. A PDF and a poster of the biases are also available as well as other materials.

Compare Lists in Python

If you search for how to compare two lists in Python, you will find a lot of helpful pages in a lot of places, many of which assume you are working with numbers or you want exact matches. But what if you want to compare all the items in one list with all the items in another list and you want to be able to set some arbitrary measure of similarity or difference?

The problem arose for me recently when I was trying to compare two lists of different lengths. The two lists represented keyword sets derived from a corpus using NMF, which I had run with two different component values. As part of wanting to discover a probable “best fit” I wanted to compare which strings had remained the same and which had changed to some degree.

My first impulse was to try the Jaccard coefficient, and I used some simple code to make that work:

def jaccard_similarity(query, document):
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

I then embedded that bit of code, but it could be any code you wanted, in the following:

for jk, jv in enumerate(second_list):
    for ik, iv in enumerate(first_list):
        ...  # the comparison of iv and jv goes here

The logic is pretty simple, but it is a leap, at least for me, in terms of how I think about things. When I started work on this, I kept trying to pack everything in one for loop: after all, I wanted to compare one list to another. But I wanted to compare all of one list with all of another list, which means I needed to iterate through both lists. A simpler version of this would be:

for j in second_list:
    for i in first_list:

The addition of enumerate above was so that I could keep track of which string in each list was matching without necessarily having to see the string itself: I could use the index values that enumerate produces to call those, if I needed. enumerate is one of those functions I regularly forget, and it is very convenient: essentially it takes a list of items and pairs each item with its index, yielding tuples where the first value is the item’s index and the second value is the item itself, so list(enumerate(['a'])) gives [(0, 'a')]. You can call the parts of the tuple by any variable name you like, but I tend to stick with k and v, for key and value, because … well, because. (It could easily be anything else, and I’ve even written code that called three-item tuples with rather bland, and thus also not advisable, t, u, v. Do not do this.)

So essentially both the for loops above are transforming each of the lists involved into a list of tuples and then walking through the list, comparing the items themselves but reporting only their indices.
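To make the nested enumerate pattern concrete, here is a minimal sketch. The two toy lists are invented for illustration, and exact string equality stands in for whatever comparison you actually want to run:

```python
# Toy lists, invented for illustration only
first_list = ["alpha", "beta"]
second_list = ["beta", "gamma", "alpha"]

# enumerate pairs each item with its index
print(list(enumerate(first_list)))

# Walk every item of one list against every item of the other,
# recording only the index pairs where the items match
matches = []
for jk, jv in enumerate(second_list):
    for ik, iv in enumerate(first_list):
        if iv == jv:
            matches.append((ik, jk))

print(matches)
```

Each tuple in matches holds the index from first_list followed by the index from second_list, so the strings themselves never need to be printed.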

It doesn’t really matter which list is which, so far as I can tell, so long as you keep the variables correctly aligned. My final code block looked like this:

print("Jc = Jaccard coefficient")
for jk, jv in enumerate(topics_45):
    for ik, iv in enumerate(topics_35):
        # Score the word lists once so the printed value matches the one tested
        jc = jaccard_similarity(iv.split(" "), jv.split(" "))
        if jc > 0.5:
            print(f"35-{ik} and 45-{jk} have a Jc of {jc:.2f}.")
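For anyone who wants to try the comparison without an NMF model to hand, here is a self-contained sketch. The function name similar_pairs, the threshold parameter, and the toy keyword strings are all my own inventions for illustration; only the Jaccard function follows the one above, with a guard added for two empty inputs:

```python
def jaccard_similarity(query, document):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(query), set(document)
    if not a and not b:
        return 0.0  # assumption: treat two empty sets as having no similarity
    return len(a & b) / len(a | b)


def similar_pairs(first_list, second_list, threshold=0.5):
    """Return (first index, second index, score) for pairs above the threshold."""
    pairs = []
    for jk, jv in enumerate(second_list):
        for ik, iv in enumerate(first_list):
            score = jaccard_similarity(iv.split(), jv.split())
            if score > threshold:
                pairs.append((ik, jk, score))
    return pairs


# Hypothetical keyword strings standing in for two sets of topics
topics_a = ["school teacher student class", "market price trade money"]
topics_b = [
    "school teacher student exam",
    "river water boat fish",
    "market price trade goods",
]

for ik, jk, score in similar_pairs(topics_a, topics_b):
    print(f"a-{ik} and b-{jk} have a Jc of {score:.2f}.")
```

Returning the pairs as a list, rather than printing inside the loop, leaves them available for later steps such as sorting by score.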

My next step is to determine how to transform this into a network or tree so that I can see which keyword clusters continue (relatively) unchanged, where I set the threshold for “relatively” (perhaps ending up with something other than the Jaccard coefficient, which doesn’t seem terribly discriminating), and also where clusters split or, in a few cases, disappear/die.

These Books

At least two newsletters arrived in my inbox this week using this stock photo of books. I’ve seen the image used elsewhere, but seeing it twice on the same day made me wonder “Whose books are these?” @ me on Twitter if you know.

The Power of a bash Script

Every time I run it, I am delighted by how much work the bash script for the COVID dashboard does.

~ % sh ./
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 29 (delta 18), reused 5 (delta 2), pack-reused 0
Unpacking objects: 100% (29/29), done.
   3ad1afa..f06d614  master     -> origin/master
Updating 67b320c..f06d614
Fast-forward            |     2 +-
 live/us-counties.csv |  6395 ++++++++++++------------
 live/us-states.csv   |   110 +-
 live/us.csv          |     2 +-
 us-counties.csv      | 12803 ++++++++++++++++++++++++++++++++++++++++++++++++-
 us-states.csv        |   226 +-
 us.csv               |     6 +-
 7 files changed, 16287 insertions(+), 3257 deletions(-)
INFO    -  Cleaning site directory 
INFO    -  Building documentation to directory: /Users/johnlaudun/Developer/COVID-Acadiana/site 
INFO    -  Documentation built in 0.10 seconds 
~ %

I will admit that the dashboard is still primitive, but the idea of it was what was important at the time, and so many dashboards have popped up since then. I mostly keep running the script for a sense of the historical depth it provides.

Quick Labels with Python’s f-string

Sometimes I need a list of titles or labels for a project on which I am working. E.g., I am working with a toy dataset and I’ve created a 10 x 10 array and I want to give the rows and columns headers so I can try slicing and dicing. I prefer human-readable/thinkable names for headers, loc over iloc in pandas-speak. And this one-liner works a treat, as they say:

labels = [f'label{item}' for item in range(1, 11)]

Done. Place it into your dataframe creation (as below) and you are good to go.

df = pd.DataFrame(data=scores, index=names, columns=labels)
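A small sketch of the label one-liner on its own, using only the standard library. The zero-padded variant is my own addition, not something from the post above: it is worth considering if the labels will ever be sorted as strings.

```python
# Plain labels: label1 ... label10
labels = [f"label{item}" for item in range(1, 11)]

# Zero-padded variant (an optional extra): label01 ... label10
padded = [f"label{item:02d}" for item in range(1, 11)]

print(labels[:3], labels[-1])

# String sorting puts label10 before label2 unless the numbers are padded
print(sorted(labels)[:3])
print(sorted(padded)[:3])
```

Either list can be passed as the index or columns argument when creating the dataframe.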

A COVID Dashboard for Acadiana

At some point in May (2020), it became clear that one of the things we were facing both nationally and locally was a lack of clear information about the status of COVID, and there were far too many outlets and venues happy, as always, to pounce upon both genuine confusion and incipient paranoia. As a folklorist, I am of course interested in the legendry that has sprung up, but as a resident of my community I am equally concerned that people don’t have easy access to information about the local scene.

When I came across Bee Guan Teo’s “Has Europe Past the First Peak of COVID-19 Outbreak?” on Towards Data Science (link), I decided to start work on what I imagined as a dashboard to let people keep abreast of the situation here in south Louisiana: COVID-19 in Acadiana was the result.

While it would seem obvious to host the page as part of this WordPress installation, my desire to have the information update daily and to do so in as automated, and thus less prone to human-induced error, a fashion as possible made it more likely that I would develop a dedicated site for the purpose. (And, let’s be clear, my own limitations with hacking either WordPress or PHP played a role.)

The current version of COVID-19 in Acadiana is in fact built with MkDocs, a Python library that makes it easy to create a static website using Markdown. As the name suggests, it is built with documentation in mind, and so it really isn’t made to support a blog or something like that. (One day I will explore those possibilities.)

COVID-19 in Acadiana is essentially a bash script with the following components:

(1) Update the data from the NYT repo:

cd /Users/johnlaudun/Developer/covid-19-data
git pull

(2) Update the graph of cases and the table of deaths:

cd /Users/johnlaudun/Developer/COVID-Acadiana

(3) Build the site with the new markdown, html, and image(s):

mkdocs build

(4) Deploy the site/ directory to the web server:

cd ~
rsync -r ./Developer/COVID-Acadiana/site/ \

It’s nothing fancy, but it works and it’s a start. My goal is to increase the information density of the page whenever I have the chance.
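The same four steps could also be driven from Python, which some may find easier to extend than bash. This is only a sketch under stated assumptions: the script name update_figures.py and the remote destination user@example.org:/path/to/site/ are hypothetical placeholders (the actual command for step 2 and the actual rsync target are not shown above), and the dry-run default means nothing is executed unless you ask for it.

```python
import subprocess
from pathlib import Path

# Local paths as given in the steps above
DATA_REPO = Path("/Users/johnlaudun/Developer/covid-19-data")
SITE_REPO = Path("/Users/johnlaudun/Developer/COVID-Acadiana")


def build_steps(remote):
    """Return (working directory, command) pairs for the four steps."""
    return [
        # (1) Update the data from the NYT repo
        (DATA_REPO, ["git", "pull"]),
        # (2) Hypothetical script name: the real command is not shown above
        (SITE_REPO, ["python", "update_figures.py"]),
        # (3) Build the site
        (SITE_REPO, ["mkdocs", "build"]),
        # (4) Deploy the site/ directory to the (placeholder) remote
        (SITE_REPO, ["rsync", "-r", "site/", remote]),
    ]


def run(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, cmd in steps:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, cwd=cwd, check=True)


# Placeholder destination, not the real server
steps = build_steps("user@example.org:/path/to/site/")
run(steps, dry_run=True)
```

Stopping at the first failure matters here: a failed git pull should not be followed by a deploy of stale pages.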

UPDATE (July 22): I have collected a couple of notes about creating COVID dashboards and I am pasting them here for anyone interested in setting up their own (and I may very well re-write mine).

Flattening a List in Python

There has to be a more elegant, and pythonic, way to do this, but none of my experiments with nested list comprehensions or with the chain function from itertools worked.

What I started with is a function that creates a list of sentences, each of which is a list of words from a text (string):

import nltk

def sentience(the_string):
    # One list per sentence, each holding that sentence's lowercased tokens
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(the_string)
    ]
    return sentences

But in the current moment, I didn’t need all of a text, but only two sentences to examine with the NLTK’s part-of-speech tagger. nltk.pos_tag(text), however, only accepts a flat list of words. So I needed to flatten my lists of lists into one list, and I only needed, in this case, the first two sentences:

test = []
for i in range(len(text2[0:2])):  # the main list
    for j in range(len(text2[i])):  # the sublists
        test.append(text2[i][j])  # add each word to the flat list

I’d still like to make this a single line of code, a nested list comprehension, but, for now, this works.
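For what it is worth, here are two one-line versions that should do the same job. The toy text2 below is invented to stand in for the tokenized sentences; the trick with the nested comprehension is that the for clauses read in the same order as the nested loops, outer first.

```python
from itertools import chain

# Invented stand-in for the list of tokenized sentences
text2 = [["the", "cat", "sat"], ["it", "purred"], ["then", "it", "left"]]

# Nested list comprehension: outer loop first, inner loop second
flat_comp = [word for sentence in text2[:2] for word in sentence]

# itertools alternative: chain.from_iterable joins the sublists lazily
flat_chain = list(chain.from_iterable(text2[:2]))

print(flat_comp)
print(flat_chain)
```

Either result is a flat list of words that can be handed to nltk.pos_tag.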

Strengths in the Humanities

Jason Jackson is one of those people I wish I could be around more: he is principled, thoughtful, and acts for the long-term. So when he casually tags something on social media, I’ll almost always have a look. Most recently, he linked to an article by Helene Meyers in Inside Higher Education on How small liberal arts colleges can best weather the pandemic, noting that humanities scholars might take a few tips from Meyers.

The entire article is worth a read, but for the purposes of re-thinking my own courses for the fall, and just generally re-thinking how I teach, I want to focus on the following things that Meyers highlights as strengths of liberal arts colleges:

  • low faculty/student ratios and small classes “allow meaningful mentoring relationships with faculty members as well as peer education. What if a British-style tutorial were part of every first-year student’s experience? Among smaller groups, meetings powered by Zoom can foster intellectual community, while online discussion forums can require students to respond to one another’s writing.”
  • intensive research seminars “where faculty-guided independent work is supplemented with a cohort of peers who can help vet one another’s projects and learn to ask (and answer) critical questions about both the research process and its products should be provided for upper-class students.”
  • study pandemic-related topics “to [help students] process the experiences of this moment” keeping in mind that some students “might need to lose themselves in a passion that seems distant from the horrors of the present.”
  • integrate career coaching throughout the curriculum because “the next few graduating classes will be entering a brutal job market, and we owe our students careful instruction in the development and transferability of marketable skills.”

I see all these things as possible and even within my reach — so long as I am willing to stretch — with career coaching being the weakest point for me. Here, I will have to do more research and, I think, I will also have to consider ways to highlight portable skills/methods/ideas. (I know, I know: it’s the commodification of knowledge and education, but nothing says that making things complex or emphasizing, and perhaps teaching, that all syntheses are dynamic and ever-changing can’t be built into any particular course program or disciplinary curriculum.)

*This post is part of a series in which I design a new course, ENGL 334: Digital Folklore and Culture, in the open. I do so for myself, for my colleagues, and for my students. They are all collected under the tag open course design.

rsync without a Password

In order to set up rsync to work without a password, you first need to make sure that you can do so with a password:

rsync /local/path username@/remote/path

If successful, then generate a public/private key pair, but be sure not to give a password:

$ ssh-keygen
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Then copy the public key to the remote host — note that ssh-copy-id will copy the file to the correct location for you:

ssh-copy-id -i ~/.ssh/ username@/remote

Make sure that you can ssh without a password:

ssh jlaudun@/remote

Now try rsync adding the argument -e ssh to specify the remote shell to use:

rsync -avz -e ssh /local/path username@/remote/path

Who is this course for?

This post is one of several in which I am designing a new course, ENGL 334: Digital Folklore and Culture, that I will also be teaching in a new context, remotely, and doing so completely in the open. Other posts are tagged open course design.

The Udemy How to Set Your Course Goals course begins with a consideration of who the target student is, with the understanding that courses that attempt to reach too broad an audience end up reaching no one. Beginners feel overwhelmed and experienced individuals feel under-served. Target an audience.

After brainstorming on paper for a bit, I came up with what is, I think, a basic list:

This course assumes that participants:

  • while fully enrobed in cultural, and folkloric, dynamics do not necessarily understand those dynamics,
  • but are interested in, and committed to, that understanding;
  • have a working familiarity with the research process — the development of an hypothesis, the collection of data, the testing of ideas against the hypothesis, and the eventual development of a syn/thesis — and need for clear communication of results;
  • are willing to apply ideas and methods learned in this course (and elsewhere in the university) to materials that seem ephemeral, trivial, trolling, ass-holish (racist, sexist, classist, etc.).§

§ This course also assumes participants can handle language and/or cultural artifacts that are intentionally or unintentionally provocative/offensive in nature. Indeed, this course assumes participants want to understand why people say/do such things.

Why did I switch to Udemy? May was both busy and not, but the month slipped by and I lost access to the edX 101 course on designing courses for edX. (The edX model is that you can audit, take for free, a course for a limited time, but if you want access to it for more than a month or if you want it to count towards a curriculum, then you have to pay for it. The “if you want credit” model worked for me, but “if you want access for more than a month” appears not to work for me.) The upshot is that I have switched to the Udemy course, which also means I have switched to a platform that is open to hosting courses by individuals: both edX and Coursera offer courses through affiliated institutions and organizations. I don’t know that what I do will end up on Udemy, but I can certainly take advantage of their “market aware” approach to sharpen my thinking about the course.