Mehrdad Yazdani pointed out that some of my normalization problems may have been the result of not having the right pieces in place, and so he suggested some changes to the sentiments.py script. The results suggest that the two distributions are now comparable in scale, as well as on the same x-axis. (My Python-fu is not yet strong enough for me to determine how this error crept in.)
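For readers following along, the normalization named in the captions below amounts to dividing a series by its largest absolute value, so everything lands in [-1, 1]. This is a minimal sketch of that idea, not the actual sentiments.py; the function name and the sample scores are my own inventions for illustration:

```python
import numpy as np

def normalize_maxabs(scores):
    # Divide by the largest absolute value so the result spans [-1, 1]
    # regardless of the module's native range.
    a = np.asarray(scores, dtype=float)
    return a / np.max(np.abs(a))

afinn_like = [4.0, -7.0, 2.0, 0.0, 5.0]     # invented scores on an Afinn-like scale
textblob_like = [0.4, -0.9, 0.1, 0.0, 0.6]  # invented scores on a TextBlob-like scale

print(normalize_maxabs(afinn_like))
print(normalize_maxabs(textblob_like))
```

After this step both series top out at exactly 1 in absolute value, which is what puts the two distributions on a comparable scale.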
Raw Sentiment normalized with np.max(np.abs(a_list))
When I run these results through my averaging function, however, I get significant vertical compression:
Averaged Sentiment normalized with np.max(np.abs(a_list))
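The vertical compression is what a moving average does by construction: it pulls extreme values toward the local mean. I don't know the exact averaging function the script uses, but a simple rolling mean, sketched below under that assumption, reproduces the effect:

```python
import numpy as np

def rolling_mean(scores, window=10):
    # Simple moving average via convolution with a uniform kernel;
    # a plausible stand-in for the post's averaging function.
    a = np.asarray(scores, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(a, kernel, mode="valid")

rng = np.random.default_rng(0)
raw = rng.uniform(-1, 1, 500)          # stand-in for max-abs-normalized raw sentiment
smoothed = rolling_mean(raw, window=25)

# Averaging shrinks the extremes: the smoothed curve spans far less of
# the [-1, 1] range than the raw values do.
print(np.abs(raw).max(), np.abs(smoothed).max())
```

So some vertical compression is expected from smoothing alone, independent of which normalization was applied first.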
If I substitute numpy.linalg.norm for np.max(np.abs(a_list)) in the script, I get the following results:
Raw Sentiment Normalized with numpy.linalg.norm
Averaged Sentiment Normalized with numpy.linalg.norm
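The two normalizations behave quite differently, which may explain the change in the plots. A small comparison (the sample array is invented for illustration):

```python
import numpy as np

a = np.array([4.0, -7.0, 2.0, 0.0, 5.0])

by_maxabs = a / np.max(np.abs(a))  # largest value maps to +/-1
by_l2norm = a / np.linalg.norm(a)  # whole vector scaled to unit Euclidean length

print(by_maxabs)
print(by_l2norm)
# np.linalg.norm divides by the square root of the sum of squares, which for
# a long series is much larger than the single biggest value, so the curve
# flattens as the series grows; the shape is preserved, only the scale changes.
```

Note that dividing by the L2 norm makes the overall scale depend on the length of the text, so curves from works of different lengths are no longer directly comparable in amplitude.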
I’m still working my way through the code that will, I hope, make it possible to compare different sentiment modules in Python effectively. While the code is available on GitHub, I wanted to post some of the early outcomes here, publishing my failure, as it were.
I began with the raw sentiments, which are not very interesting, since the different modules use different ranges: quite wide for Afinn, -1 to 1 for TextBlob, and 0 to 1 for Indico.
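To make the range mismatch concrete, here are invented scores in each module's native range (the real values would come from the respective libraries), along with one way to recentre a [0, 1] score onto [-1, 1] so all three share an axis:

```python
import numpy as np

# Invented scores for illustration, in each module's native range.
afinn = np.array([3.0, -5.0, 1.0])      # Afinn: unbounded sums of word scores
textblob = np.array([0.3, -0.5, 0.1])   # TextBlob polarity: -1 to 1
indico = np.array([0.65, 0.25, 0.55])   # Indico: 0 to 1, with 0.5 as neutral

# A [0, 1] score can be recentred onto [-1, 1] with a linear map:
indico_recentred = 2 * indico - 1
print(indico_recentred)
```

Plotted raw, the Afinn curve simply dwarfs the other two, which is why the first chart is hard to read.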
Raw Sentiments: Afinn, Textblob, Indico
To make them more comparable, I needed to normalize them, and to make the whole of it more digestible, I needed to average them. I began by normalizing the values, and you can already see there’s a divergence in the baseline for which I cannot yet account in my code:
Normalized Sentiment: Afinn and TextBlob
To be honest, I didn’t really notice this until I plotted the average, where the divergence becomes really apparent:
Average, Normalized Sentiments: Afinn and TextBlob