During a clean out of my email application this morning (and we won’t discuss how badly I have been managing incoming mail of late), I came across a [Humanist DG] post, written in response to an inquiry about text analysis software, that pointed to [Diction]:
> DICTION is a computer-aided text analysis program for determining the tone of a verbal message. DICTION searches a passage for five general features as well as thirty-five sub-features. It can process a variety of English language texts using a 10,000 word corpus and user-created custom dictionaries. DICTION currently runs on Windows® on a PC; a Mac® version is in development.
> DICTION produces reports about the texts it processes and also writes the results to numeric files for later statistical analysis. Output includes raw totals, percentages, and standardized scores and, for small input files, extrapolations to a 500-word norm.
Okay, so they like to capitalize themselves. We get it.
Digging a little further into its features, you get a bit more information on the five general features:
> DICTION … uses a series of dictionaries to search a passage for five semantic features — Activity, Optimism, Certainty, Realism and Commonality — as well as thirty-five sub-features. DICTION uses predefined dictionaries and can use up to thirty custom dictionaries built with words that the user has defined, such as specific negative and positive words, for particular research needs.
And then there’s a bit more on the word lists:
> DICTION uses dictionaries (word-lists) to search a text for these qualities:
> * Certainty – Language indicating resoluteness, inflexibility, and completeness and a tendency to speak ex cathedra.
> * Activity – Language featuring movement, change, the implementation of ideas and the avoidance of inertia.
> * Optimism – Language endorsing some person, group, concept or event, or highlighting their positive entailments.
> * Realism – Language describing tangible, immediate, recognizable matters that affect people’s everyday lives.
> * Commonality – Language highlighting the agreed-upon values of a group and rejecting idiosyncratic modes of engagement.
> DICTION output includes raw totals, percentages, and standardized scores and, for small input files, extrapolations to a 500-word norm. DICTION also reports normative data for each of its forty scores based on a 50,000-item sample of discourse. The user may use these general norms for comparative purposes or select from among thirty-six sub-categories, including speeches, poetry, newspaper editorials, business reports, scientific documents, television scripts, telephone conversations, etc.
> On a computer with a 2.16 GHz Intel chip and 2 GB of RAM, DICTION can process 3,000 passages (1,500,000 words) in four minutes. The program can accept either individual or multiple-passages and, at your discretion, it provides special counts of orthographic characters and high frequency words.
Just to make sure I understand this: the “semantic” features here are really words on lists, which either come pre-populated or can be modified or created by the user, and the additional 36 sub-categories are really different reference corpora? Am I wrong in perceiving this as a more nuanced version of sentiment analysis, one that still operates in much the same way by depending upon pre-determined word lists?
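To check my own reading, here is a minimal Python sketch of what I take the general mechanism to be. Everything in it is an assumption on my part: the two tiny word lists are invented, the function names are hypothetical, the extrapolation to 500 words is simple linear scaling, and the standardized score is a plain z-score. None of this is DICTION's actual dictionaries or formulas, which the passages quoted above do not spell out.

```python
import re
from collections import Counter

# Invented word lists for illustration only; these are NOT DICTION's dictionaries.
WORD_LISTS = {
    "certainty": {"always", "must", "never", "all"},
    "optimism": {"good", "great", "hope", "praise"},
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, word_lists=WORD_LISTS, norm_length=500):
    """Count word-list hits per category and report them three ways."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter(tokens)
    report = {}
    for category, words in word_lists.items():
        raw = sum(counts[w] for w in words)      # raw total of list hits
        share = raw / total if total else 0.0    # fraction of all tokens
        report[category] = {
            "raw": raw,
            "percent": 100 * share,
            # Assumed linear scaling of the hit rate to a 500-word passage.
            "per_norm": share * norm_length,
        }
    return report


def standardize(value, norm_mean, norm_sd):
    """Plain z-score against a reference corpus (assumed, not DICTION's formula)."""
    return (value - norm_mean) / norm_sd
```

Calling `score("We must always hope for good things")` finds two hits for each list in a seven-word passage. If this is roughly what is going on, then swapping in a different reference corpus changes only the mean and standard deviation passed to `standardize`, not the counting itself.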
There’s a lot, to be sure, that I don’t know about the history of CATA (computer-assisted textual analysis, which is my acronym for the day!), and there are certainly approaches whose nuances I do not yet fully understand. I think this must be one of them.
[Humanist DG]: http://dhhumanist.org