I have never been particularly impressed with Moodle, the learning management system used by my university and a number of other organizations. Its every impulse, it seems to me, is to increase the number of steps needed to get simple things done, presumably in order to simplify more complex things for less tech-savvy users. Using markdown, for example, is painful, and there’s no way to control the presentation of materials unless you resort to one of its myriad under-explained, and probably under-thought, content packaging options. (I’ve never grokked the Moodle “book”, for example.)
To be honest, there are times when I feel the same way about WordPress, which has gotten GUIer and less sharp on a number of fronts — why oh why are categories and tags now unlimited in application?
I’m also less than clear on my university’s approach to intellectual property: they seem rather keen to claim everything and anything as their own, when they can’t even be bothered to give you basic production tools. (Hello? It’s been three years since I had access to a printer that didn’t involve me copying files to a flash drive and walking down stairs to load things onto a Windows machine that can only ever print PDFs.)
I decided I would give static site generation a try, particularly if I could compose in markdown, ReST, or even a Jupyter notebook (as a few of the generators appear to promise). I’m not interested in using this for blogging, and I will probably maintain it in a subdirectory of my own site, e.g. /teaching, and I hope to be able to sync between local and remote versions using Git. That seems straightforward, doesn’t it? (I’m also now thinking that I will stuff everything into the same directory and just have different pages, and subpages?, for each course. Just hang everything out there for all to see.)
As for the site generators themselves, there are a number of options:
- Pelican is a popular one, but seems very blog oriented.
- Nikola is another. I’ve installed both Pelican and Nikola, and when I ran the latter this morning I was somewhat overwhelmed by the number of directories it generated right away.
- Cactus seems compelling, and has a build available for the Mac.
- There is also Hyde.
- I’m going to ignore blogofile for now, but it’s there and its development is active.
Mehrdad Yazdani pointed out that some of my problems in normalization may have been the result of not having the right pieces in place, and so suggested some changes to the sentiments.py script. The result would seem to suggest that the two distributions are now comparable in scale — as well as on the same x-axis. (My Python-fu is not strong enough, yet, for me to determine how this error crept in.)
When I run these results through my averaging function, however, I get significant vertical compression:
If I substitute np.max(np.abs(a_list)) in the script, I get the following results:
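For readers who want to see what dividing by np.max(np.abs(a_list)) does on its own, here is a minimal sketch. It is not the code from sentiments.py: the function name normalize_max_abs and the sample scores are made up for illustration, and the only thing taken from the post is the np.max(np.abs(...)) expression itself.

```python
import numpy as np

def normalize_max_abs(a_list):
    """Scale values into [-1, 1] by dividing by the largest absolute value.

    Hypothetical helper for illustration; not taken from sentiments.py.
    """
    arr = np.asarray(a_list, dtype=float)
    if arr.size == 0:
        return arr  # nothing to scale
    peak = np.max(np.abs(arr))
    if peak == 0:
        return arr  # all zeros: avoid dividing by zero
    return arr / peak

raw_scores = [3.0, -6.0, 0.0, 1.5]  # made-up sentence scores
scaled = normalize_max_abs(raw_scores)
print(scaled)
```

Dividing by the largest absolute value keeps zero at zero and preserves the sign of every score, so the shape of the curve is unchanged and only its vertical scale moves.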
I’m still working my way through the code that will, I hope, make it possible to effectively compare different sentiment modules in Python. While the code is available as a GitHub gist, I wanted to post some of the early outcomes here, publishing my failure, as it were.
I began with the raw sentiments, which are not very interesting to compare, since the different modules use different ranges: quite wide for Afinn, -1 to 1 for TextBlob, and between 0 and 1 for Indico.
To make them more comparable, I needed to normalize them, and to make the whole of it more digestible, I needed to average them. I began with normalizing the values — see the gist — and you can already see there’s a divergence in the baseline for which I cannot yet account in my code:
To be honest, I didn’t really notice this until I plotted the average, where the divergence becomes really apparent:
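One possible source of a shifted baseline, offered as a guess rather than a diagnosis of the gist: a score that lives between 0 and 1 has its neutral point near 0.5, while a score between -1 and 1 is neutral at 0. Dividing by the maximum alone rescales the height but does not move that neutral point. The sketch below uses made-up numbers and a hypothetical rescale helper, and the assumption that 0.5 means "neutral" for the 0-to-1 scores is mine.

```python
import numpy as np

def rescale(values, low, high):
    """Map values from a known range [low, high] onto [-1, 1].

    Hypothetical helper for illustration; not part of the gist.
    """
    arr = np.asarray(values, dtype=float)
    return 2.0 * (arr - low) / (high - low) - 1.0

# Made-up scores for the same three sentences
textblob_like = [-0.5, 0.0, 0.5]  # range -1 to 1, neutral at 0
indico_like = [0.25, 0.5, 0.75]   # range 0 to 1, neutral ASSUMED at 0.5

tb = rescale(textblob_like, -1.0, 1.0)  # unchanged by the mapping
ind = rescale(indico_like, 0.0, 1.0)    # shifted so 0.5 becomes 0

# Dividing by the maximum only, for comparison: the baseline stays high
naive = np.asarray(indico_like) / np.max(np.abs(indico_like))

print(tb.mean(), ind.mean(), naive.mean())
```

With the range-aware mapping the two made-up series coincide, while the divide-by-max version sits well above zero. Scores without a fixed range, which is how I understand Afinn's per-sentence sums, would still need something like a divide-by-max step.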
I added two kinds of moving averages to the sentiments.py script, and as you can see from the results below, whether you go with the numpy version of the running average or the one from the Technical Analysis library, talib, you get the same results, apart from alignment: NP starts its running average at the beginning of the window; TA at the end. Here, the window was 10% of the total sentence count, which was approximately 700 overall. I entered the following in Python:
    my_file = "/Users/john/Code/texts/sentiment/mdg.txt"
    smooth_plots(my_file, 70)
And here is the graph:
The entire script is available as a GitHub gist.
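To make the "same values, different alignment" point concrete, here is a minimal numpy-only sketch. It is not the smooth_plots code from the gist: the function names are hypothetical, and the end-of-window version simply pads the front with NaN to imitate the alignment described above for talib, rather than calling talib itself. It assumes the window is no longer than the series.

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points.

    Point i is the mean of values[i : i + window], so when plotted
    directly each average sits at the START of its window.
    """
    arr = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(arr, kernel, mode="valid")

def trailing_average(values, window):
    """Same averages, padded with NaN so each sits at the END of its window."""
    avg = moving_average(values, window)
    pad = np.full(window - 1, np.nan)
    return np.concatenate([pad, avg])

scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sentence scores
start_aligned = moving_average(scores, 3)
end_aligned = trailing_average(scores, 3)
print(start_aligned)
print(end_aligned)
```

The two arrays hold identical averages; the second is just shifted right by window - 1 positions, which is why the two curves look the same apart from a horizontal offset.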
Next step: NORMALIZATION!
Following up on some previous explorations, I was curious about the relationship between the various sentiment libraries available in Python. The code below will let you compare them on a text for yourself. The current list of three (Afinn, TextBlob, and Indico) is not exhaustive; they are simply the three I used to draft this bit of code, which is better than a lot of code I’ve written thus far but probably still quite crude to some.
    #! /usr/bin/env python

    # Imports
    import matplotlib.pyplot as plt
    import seaborn  # for more appealing plots
    from nltk import tokenize

    # Customizations
    seaborn.set_style("darkgrid")
    plt.rcParams['figure.figsize'] = 12, 8

    import math
    import re
    import sys
    #reload(sys)
    #sys.setdefaultencoding('utf-8')


    # AFINN
    def afinn_sentiment(filename):
        from afinn import Afinn
        afinn = Afinn()
        # Read the file named by the argument, not a global variable
        with open(filename, "r") as myfile:
            text = myfile.read().replace('\n', ' ')
        sentences = tokenize.sent_tokenize(text)
        sentiments = []
        for sentence in sentences:
            sentsent = afinn.score(sentence)
            sentiments.append(sentsent)
        return sentiments


    # TextBlob
    def textblob_sentiment(filename):
        from textblob import TextBlob
        with open(filename, "r") as myfile:
            text = myfile.read().replace('\n', ' ')
        blob = TextBlob(text)
        textsentiments = []
        for sentence in blob.sentences:
            sentsent = sentence.sentiment.polarity
            textsentiments.append(sentsent)
        return textsentiments


    # Indico
    def indico_sentiment(filename):
        import indicoio
        indicoio.config.api_key = 'yourkeyhere'
        # Read the file named by the argument, not a global variable
        with open(filename, "r") as myfile:
            text = myfile.read().replace('\n', ' ')
        sentences = tokenize.sent_tokenize(text)
        indico_sent = indicoio.sentiment(sentences)
        return indico_sent


    # Plot all three libraries' sentence scores on one figure
    def plot_sentiments(filename):
        fig = plt.figure()
        plt.title("Comparison of Sentiment Libraries")
        plt.plot(afinn_sentiment(filename), label = "Afinn")
        plt.plot(textblob_sentiment(filename), label = "TextBlob")
        plt.plot(indico_sentiment(filename), label = "Indico")
        plt.ylabel("Emotional Valence")
        plt.xlabel("Sentence #")
        plt.legend(loc='lower right')
        plt.annotate("Oral Legend LAU-14 Used", xy=(30, 2))
Once you’ve loaded this script, all you need to do is give it a file with which to work: