More on Normalizing Sentiment Distributions

Mehrdad Yazdani pointed out that some of my normalization problems may have come from not having the right pieces in place, and he suggested some changes to the sentiments.py script. The results suggest that the two distributions are now comparable in scale and share the same x-axis. (My Python-fu is not yet strong enough for me to determine how the original error crept in.)

Mehrdaded Sentimental Outputs: Raw Sentiment normalized with np.max(np.abs(a_list))

When I run these results through my averaging function, however, I get significant vertical compression:

Averaged Mehrdaded Sentiments: Averaged Sentiment normalized with np.max(np.abs(a_list))
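Some vertical compression is what one would expect from averaging in general. Here is a minimal sketch, assuming the averaging step is a simple moving average (the function in the actual script may work differently). The name `moving_average`, the window size, and the scores are all made up for illustration.

```python
import numpy as np

def moving_average(values, window):
    """Hypothetical stand-in for the averaging step: a plain moving average."""
    values = np.asarray(values, dtype=float)
    # Each output point is the mean of `window` consecutive inputs.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up, already-normalized sentence scores that swing between -1 and 1.
scores = np.array([1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.0, 0.5])
smoothed = moving_average(scores, window=4)

print("peak before averaging:", np.max(np.abs(scores)))
print("peak after averaging: ", np.max(np.abs(smoothed)))
```

Because positive and negative sentences inside one window cancel each other, the averaged peak can never exceed the raw peak and is usually much smaller. If the curves need to fill the same vertical range after averaging, one option is to normalize after averaging rather than before.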

If I substitute np.linalg.norm(a_list) for np.max(np.abs(a_list)) in the script, I get the following results:

Raw Sentiment Normalized with numpy.linalg.norm

Averaged Sentiment Normalized with numpy.linalg.norm
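To make the difference between the two divisors concrete, here is a small sketch that applies both to the same list. The function names and the scores are invented for illustration and are not taken from sentiments.py; only `np.max`, `np.abs`, and `np.linalg.norm` are real NumPy calls.

```python
import numpy as np

def normalize_by_max(a_list):
    """Divide by the largest absolute value, so the peak becomes +/-1."""
    values = np.asarray(a_list, dtype=float)
    peak = np.max(np.abs(values))
    if peak == 0:
        return values  # an all-zero series is left unchanged
    return values / peak

def normalize_by_norm(a_list):
    """Divide by the Euclidean (L2) length of the whole series."""
    values = np.asarray(a_list, dtype=float)
    length = np.linalg.norm(values)
    if length == 0:
        return values
    return values / length

# Made-up sentence scores on an Afinn-like integer scale.
scores = [3.0, -4.0, 0.0, 1.0, -2.0]
by_max = normalize_by_max(scores)
by_norm = normalize_by_norm(scores)

print("max-abs normalized:", by_max)
print("L2-norm normalized:", by_norm)
```

Dividing by the maximum pins the most extreme sentence at 1 or -1 regardless of how long the text is. Dividing by `np.linalg.norm` makes the whole series have unit length, so individual values shrink as the number of sentences grows; two texts of different lengths will therefore land on different vertical scales.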

A Tale of Two Sentimental Signatures

I’m still working my way through the code that will, I hope, make it possible to compare different sentiment modules in Python effectively. While the code is available as a GitHub gist, I wanted to post some of the early outcomes here, publishing my failure, as it were.

I began with the raw sentiments, which are not very interesting, since the different modules use different ranges: quite wide for Afinn, -1 to 1 for TextBlob, and between 0 and 1 for Indico.

Raw Sentiments: Afinn, Textblob, Indico

To make them more comparable, I needed to normalize them, and to make the whole of it more digestible, I needed to average them. I began with normalizing the values (see the gist), and you can already see there’s a divergence in the baseline for which I cannot yet account in my code:

Normalized Sentiment: Afinn and TextBlob

To be honest, I didn’t notice this until I plotted the average, where the divergence becomes really apparent:

Average, Normalized Sentiments: Afinn and TextBlob

Gist: https://gist.github.com/johnlaudun/5ea8234cc8d6f39b982648704c3824b0