Python Site Generators

I have never been particularly impressed with Moodle, the learning management system used by my university and a number of other organizations. Its every impulse, it seems to me, is to increase the number of steps it takes to get simple things done, presumably in order to simplify more complex things for users with less tech savvy. Using Markdown, for example, is painful, and there’s no way to control the presentation of materials unless you resort to one of its myriad under-explained, and probably under-thought, content packaging options. (I’ve never grokked the Moodle “book”, for example.)

To be honest, there are times when I feel the same way about WordPress, which has gotten GUIer and less sharp on a number of fronts — why oh why are categories and tags now unlimited in application?

I’m also less than clear on my university’s approach to intellectual property: they seem rather keen to claim everything and anything as their own, when they can’t even be bothered to give you basic production tools. (Hello? It’s been three years since I had access to a printer that didn’t involve me copying files to a flash drive and walking down stairs to load things onto a Windows machine that can only ever print PDFs.)

I decided I would give static site generation a try, particularly if I could compose in markdown, ReST, or even a Jupyter notebook (as a few of the generators appear to promise). I’m not interested in using this for blogging, and I will probably maintain it in a subdirectory of my own site, e.g. /teaching, and I hope to be able to sync between local and remote versions using Git. That seems straightforward, doesn’t it? (I’m also now thinking that I will stuff everything into the same directory and just have different pages, and perhaps subpages, for each course. Just hang everything out there for all to see.)

As for the site generators themselves, there are a number of options:

  • Pelican is a popular one, but seems very blog oriented.
  • I’ve installed both Pelican and Nikola, and I ran the latter this morning and was somewhat overwhelmed by the number of directories it generated right away.
  • Cactus seems compelling, and has a build available for the Mac.
  • There is also Hyde.
  • I’m going to ignore blogofile for now, but it’s there and its development is active.
  • If all else fails, I have used Poole before. It doesn’t have a templating system or JavaScript or any of that, but maybe it’s better for it.

More on Normalizing Sentiment Distributions

Mehrdad Yazdani pointed out that some of my problems in normalization may have been the result of not having the right pieces in place, and so suggested some changes to the sentiments.py script. The result would seem to suggest that the two distributions are now comparable in scale — as well as on the same x-axis. (My Python-fu is not strong enough, yet, for me to determine how this error crept in.)

Raw Sentiment normalized with np.max(np.abs(a_list))
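
For anyone following along, the suggested normalization, as I understand it, amounts to something like the sketch below, where a_list stands in for the raw list of per-sentence scores and the function name is mine, not necessarily what sentiments.py uses:

import numpy as np

def normalize_max_abs(a_list):
    # Scale the scores so the largest absolute value becomes 1,
    # leaving the sign, and thus the baseline, of every score intact.
    a = np.asarray(a_list, dtype=float)
    return a / np.max(np.abs(a))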

When I run these results through my averaging function, however, I get significant vertical compression:

Averaged Sentiment normalized with np.max(np.abs(a_list))

If I substitute np.linalg.norm(a_list) for np.max(np.abs(a_list)) in the script, I get the following results:

Raw Sentiment Normalized with numpy.linalg.norm

Averaged Sentiment Normalized with numpy.linalg.norm
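
In terms of the sketch above, the substitution is a one-line change, dividing by the vector’s Euclidean (L2) norm, the square root of the sum of squared scores, rather than by its largest absolute value:

def normalize_l2(a_list):
    # Divide by the L2 norm of the whole vector; since the norm grows with
    # the number of sentences, the individual values come out much smaller.
    a = np.asarray(a_list, dtype=float)
    return a / np.linalg.norm(a)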

A Tale of Two Sentimental Signatures

I’m still working my way through the code that will, I hope, make it possible to compare different sentiment modules in Python effectively. While the code is available as a GitHub gist, I wanted to post some of the early outcomes here, publishing my failure, as it were.

I began with the raw sentiments, which are not very interesting, since the different modules use different ranges: quite wide for Afinn, -1 to 1 for TextBlob, and 0 to 1 for Indico.

Raw Sentiments: Afinn, TextBlob, Indico
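
A quick way to confirm those ranges for a given text is simply to look at the minimum and maximum of each list, using the functions from the sentiments script further down the page (my_file here is whatever text you happen to be scoring):

for name, scores in [("Afinn", afinn_sentiment(my_file)),
                     ("TextBlob", textblob_sentiment(my_file)),
                     ("Indico", indico_sentiment(my_file))]:
    print(name, min(scores), max(scores))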

To make them more comparable, I needed to normalize them, and to make the whole of it more digestible, I needed to average them. I began with normalizing the values — see the gist — and you can already see there’s a divergence in the baseline for which I cannot yet account in my code:

Normalized Sentiment: Afinn and TextBlob

To be honest, I didn’t really notice this until I plotted the average, where the divergence becomes really apparent:

Average, Normalized Sentiments: Afinn and TextBlob

More Sentiment Comparisons

I added two kinds of moving averages to the sentiments.py script, and as you can see from the results below, whether you go with the numpy version of the running average or the one from the Technical Analysis library (talib), you get the same results: numpy starts its running average at the beginning of the window; talib at the end. Here, the window was 10% of the total sentence count, which was approximately 700 overall. I entered the following in Python:

my_file = "/Users/john/Code/texts/sentiment/mdg.txt"
smooth_plots(my_file, 70)

And here is the graph:

Moving/Running Averages
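
For anyone who would rather not open the gist, the two running averages come down to something like the sketch below. It is not the gist verbatim, and it assumes numpy and the talib wrapper for TA-Lib are installed:

import numpy as np
import talib

def running_means(sentiments, window):
    a = np.asarray(sentiments, dtype=float)
    # numpy version: convolve with a flat window; each averaged value is
    # aligned with the beginning of its window.
    np_avg = np.convolve(a, np.ones(window) / window, mode='valid')
    # talib version: SMA pads the first window-1 positions with NaN, so each
    # averaged value appears at the end of its window.
    ta_avg = talib.SMA(a, timeperiod=window)
    return np_avg, ta_avg

Plotted over the raw sentiments with a window of 70, roughly 10% of the sentence count, the two lines trace the same shape, offset only by where each library anchors the window.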

The entire script is available as a GitHub gist.

Next step: NORMALIZATION!

Comparing Sentiments


Following up on some previous explorations, I was curious about the relationship between the various sentiment libraries available in Python. The code below will let you compare a text for yourself. The current list of three — Afinn, TextBlob, and Indico — is not exhaustive; they are simply the three I used to draft this bit of code, which is better than a lot of code I’ve written thus far but still probably quite crude to some eyes.


#! /usr/bin/env python
# Imports
import matplotlib.pyplot as plt
import seaborn # for more appealing plots
from nltk import tokenize

# Customizations
seaborn.set_style("darkgrid")
plt.rcParams['figure.figsize'] = 12, 8

import math
import re
import sys
#reload(sys)
#sys.setdefaultencoding('utf-8')

# AFINN

def afinn_sentiment(filename):
    from afinn import Afinn
    afinn = Afinn()
    with open(filename, "r") as myfile:
        text = myfile.read().replace('\n', ' ')   
        sentences = tokenize.sent_tokenize(text)
        sentiments = []
        for sentence in sentences:
            sentsent = afinn.score(sentence)
            sentiments.append(sentsent)
        return sentiments


# TextBlob

def textblob_sentiment(filename):
    from textblob import TextBlob
    with open (filename, "r") as myfile:
        text=myfile.read().replace('\n', ' ')   
        blob = TextBlob(text)       
        textsentiments = []
        for sentence in blob.sentences:
            sentsent = sentence.sentiment.polarity
            textsentiments.append(sentsent)
        return textsentiments

# Indico

def indico_sentiment(filename):
    import indicoio
    indicoio.config.api_key = 'yourkeyhere'
    with open(filename, "r") as myfile:
        text = myfile.read().replace('\n', ' ')   
        sentences = tokenize.sent_tokenize(text)
        indico_sent = indicoio.sentiment(sentences)
    return indico_sent

def plot_sentiments(filename):
    fig = plt.figure()
    plt.title("Comparison of Sentiment Libraries")
    plt.plot(afinn_sentiment(filename), label = "Afinn")
    plt.plot(textblob_sentiment(filename), label = "TextBlob")
    plt.plot(indico_sentiment(filename), label = "Indico")
    plt.ylabel("Emotional Valence")
    plt.xlabel("Sentence #")
    plt.legend(loc='lower right')
    plt.annotate("Oral Legend LAU-14 Used", xy=(30, 2))
    plt.show()  # display the figure when running as a script

Once you’ve loaded this script, all you need to do is give it a file with which to work:


plot_sentiments("/Users/john/Code/texts/legends/lau-014.txt")