Somewhere there was quite the controversy over the unmasking of J. K. Rowling writing under a pen name. In that controversy, stylistics / stylometrics, and their advanced development in the era of computation, have received some coverage in various facets of the media machine that now seems so diverse and so distributed (thanks in no small part to later stages of the era of computation). And so I guess it shouldn’t be any surprise that the [_New Republic_ reports] on software that uses the pattern discernment of stylistics to “undo” an author’s style, “anonymizing” them.
I mark the transforming verbs because I am struck by how much of the original remains in the anonymized examples provided by the article. A number of opening passages from Fitzgerald, Tolstoy, Dickens, Eliot, and others are provided. The Dickens paragraph is from the beginning of _A Tale of Two Cities_, and you simply can’t undo it. Perhaps the Fitzgerald one might work; I confess that I remember the novel less well. But the Tolstoy is an example of how this plays out. And did no one take into consideration that the Tolstoy is actually already transformed by translation?
_Anna Karenina_ as translated:
> Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys’ house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. This position of affairs had now lasted three days, and not only the husband and wife themselves, but all the members of their family and household, were painfully conscious of it.
Now, as anonymized:
> Happy families are all alike. And, every family that isn’t happy, is unhappy in its own way. The Oblonskys’ house was in turmoil. The wife/mother discovered her husband had been having a passionate relationship with a French girl–who used to be a governess in their family. She announced to her husband that she couldn’t continue living with him. This unpleasant situation existed for three days—and not only were the husband and wife themselves aware of the tension of the situation, but the entire family/household was troubled by the situation.
There are so many dimensions of language that can’t be undone, at least not at this stage of anti-stylistics. I think a human anonymizer would have gone much further. Perhaps computational methods will get there, but they’re not there. Yet.
[_New Republic_ reports]: http://www.newrepublic.com/article/114112/anonymouth-linguistic-tool-might-have-helped-jk-rowling