A note on writing style and vocabulary

Gender appears to be reified through writing style as well as handwriting, according to preliminary statistical analysis.

Recent attempts at creating algorithms that can determine a person’s gender from their writing style have produced some fairly accurate systems (Koppel 2003, Argamon 2003). Run over a large sample of texts, their algorithm guessed the author's gender with 83% accuracy. Broadly, the features it relied on suggest that men write more about objects and women more about relationships. Women tend to use more pronouns (I, you, she, their, myself), while men prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more). See the link in the reference section for the methodology involved.
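To make the feature idea concrete, here is a minimal sketch that counts how often each word class above occurs per 1,000 words. It assumes the short example lists from the paragraph stand in for the full feature set (the published work uses many more words), and the names `WORD_CLASSES` and `class_rates` are hypothetical. It only extracts features; it does not classify anything.

```python
import re
from collections import Counter

# Example word classes copied from the prose above. ASSUMPTION: these short
# lists are illustrative only; the published feature set is much larger.
WORD_CLASSES = {
    "pronouns": {"i", "you", "she", "their", "myself"},
    "determiners": {"a", "the", "that"},
    "quantifiers": {"one", "two", "more"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def class_rates(text, per=1000):
    """Return how often each word class occurs per `per` tokens."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in WORD_CLASSES}
    counts = Counter(tokens)
    return {
        name: per * sum(counts[w] for w in words) / len(tokens)
        for name, words in WORD_CLASSES.items()
    }

sample = "I told you she left the book, and that one was their favourite."
print(class_rates(sample))
```

Normalising by text length matters because raw counts would favour longer texts. A real classifier compares rates like these across many documents rather than reading them off a single sentence.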

David Lodge, whose early novel The Picturegoers was among the roughly one in five texts the original algorithm misgendered, noted:

“Novels are very problematic texts because they are written in a medley of styles. And more often than not the author is trying to imitate some kind of imagined consciousness, male or female. Indeed, writers have always tried to imitate the distinctive characteristics of male and female discourse and we are in the habit of thinking that they have often succeeded. But perhaps these scientists believe they can prove this is an illusion. Still, I’m very surprised that this program is able to discern the gender of the real author. If you were to take ordinary first-person texts (letters or diaries), then you might, of course, expect a fairly high degree of accuracy. But that it can be done on literary novels intrigues me. This will have fascinating literary, critical and general sociological implications. That said, I’d like to see them apply it to a novelist’s attempt to imitate the opposite sex in a particular passage.” (McGrath 2003)

Some resourceful nerds at bookblog.net created a cruder version of the algorithm used on novels and called it the Gender Genie. It's available online for text analysis.
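The "cruder" approach can be pictured as fixed keyword weighting: each listed word adds a hand-chosen number of points to one side, and the larger total wins. The sketch below assumes that general shape. The word lists and weights are entirely hypothetical placeholders, not the Gender Genie's actual values, and `keyword_score` and `guess` are illustrative names.

```python
import re

# HYPOTHETICAL weights for illustration only; these are NOT the Gender
# Genie's real word lists or values. Nothing here is learned from data.
FEMALE_WEIGHTS = {"with": 5, "her": 3, "she": 3, "myself": 2, "we": 2}
MALE_WEIGHTS = {"the": 3, "a": 2, "that": 2, "more": 3, "around": 4}

def keyword_score(text):
    """Sum the fixed per-word weights of each list over every token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    female = sum(FEMALE_WEIGHTS.get(t, 0) for t in tokens)
    male = sum(MALE_WEIGHTS.get(t, 0) for t in tokens)
    return female, male

def guess(text):
    """Pick whichever total is larger; equal totals are reported as undecided."""
    female, male = keyword_score(text)
    if female == male:
        return "undecided"
    return "female" if female > male else "male"

print(keyword_score("She walked with her sister."))
print(guess("The man bought more of the tools around town."))
```

Because the weights are frozen and never adjusted to a training corpus, a scheme like this is easy to publish and run in a browser. It is also easy to fool, which fits the criticism that follows.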

Elf Sternberg writes:

The Gender Genie algorithm, which first appeared in the NY Times' "science" section, is a poor popularization of the algorithm as it appeared in the original academic literature. I have the original paper and that algorithm is meant to be applied to fiction; applied to non-fiction, the authors admit, the algorithm is no better than random chance at detecting an author's gender. A much better algorithm, the one that has an "80%" chance of detecting an author's gender correctly, needs to be trained on a large sample to generate a massive statistical measure of male vs. female characteristics in text. Even applied to fiction, the popular algorithm is not much better. It seems to think I'm a woman at least 97% of the time.
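The difference Sternberg describes is between weights someone picked by hand and weights learned from labelled examples. A minimal sketch of the learned kind follows. It uses a plain perceptron, which is NOT the learning algorithm from the paper, over relative frequencies of a small hypothetical function-word list. The four training sentences and their +1/-1 labels are made-up toy classes, not real author data; a useful model would need the large sample the quote mentions.

```python
import re

# HYPOTHETICAL feature list; the published work used far more words.
FUNCTION_WORDS = ["i", "you", "she", "her", "the", "a", "that", "one", "more"]

def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid dividing by zero on empty input
    return [tokens.count(w) / n for w in FUNCTION_WORDS]

def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights from (text, label) pairs, where label is +1 or -1."""
    w = [0.0] * len(FUNCTION_WORDS)
    b = 0.0
    for _ in range(epochs):
        for text, y in samples:
            x = features(text)
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, text):
    """Return +1 or -1 depending on which side of the learned boundary."""
    x = features(text)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy, made-up training data with arbitrary class labels.
TOY_SAMPLES = [
    ("I told you she saw her.", 1),
    ("She and I met you.", 1),
    ("The tool needs one more part.", -1),
    ("That is a part of the engine.", -1),
]

w, b = train_perceptron(TOY_SAMPLES)
print(predict(w, b, "You and I saw her."))
```

After training, the weight for each word reflects how strongly it separated the toy classes. That is the sense in which a trained model "generates" its own statistical measure rather than relying on a fixed list.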

You can test a few samples of your own fiction with the Gender Genie, a simplified (and less scientific) version of the algorithm used by Koppel.

The Gender Genie statistics page indicates that it gets only about 3 in 5 right, whereas Koppel's original got 4 in 5 right using multivariate analysis of a large sample of texts. Like other "gender tests," it is not scientifically rigorous and should not be taken very seriously.

Note that this form of stylometry looks for patterns in common words rather than in rare or unusual ones. It has been used to identify authors ranging from the anonymous author of Primary Colors to the Unabomber. The same techniques can be scaled from individual authors to groups to estimate the statistical likelihood of an author's demographic characteristics. Those interested in the scientific paper that sparked the Gender Genie can find a link to the full text in the references.
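One well-known common-word technique is Burrows's Delta. It is shown here only as an illustration of the idea and is not claimed to be the method used in the Primary Colors or Unabomber cases. The sketch works in four steps. It takes the most frequent words across texts of known authorship, converts each text's frequencies to z-scores, and averages the absolute z-score differences against a disputed text. Lower Delta means more similar. The function names and the `n_words` cutoff are assumptions for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in vocab]

def burrows_delta(candidates, disputed, n_words=30):
    """candidates maps author -> text of known authorship.
    Returns author -> Delta score (lower means more similar)."""
    # 1. The most frequent words across all known texts form the vocabulary.
    pooled = Counter()
    for text in candidates.values():
        pooled.update(re.findall(r"[a-z']+", text.lower()))
    vocab = [w for w, _ in pooled.most_common(n_words)]
    # 2. Relative frequencies for each candidate and for the disputed text.
    profiles = {a: rel_freqs(t, vocab) for a, t in candidates.items()}
    target = rel_freqs(disputed, vocab)
    # 3. Mean and standard deviation of each word across the candidates.
    stats = []
    for i in range(len(vocab)):
        column = [p[i] for p in profiles.values()]
        stats.append((mean(column), pstdev(column)))

    def z(vec):
        # Words with zero spread carry no information, so score them 0.
        return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(vec, stats)]

    zt = z(target)
    # 4. Delta: mean absolute difference of z-scores.
    return {a: mean(abs(x - y) for x, y in zip(z(p), zt))
            for a, p in profiles.items()}

known = {
    "A": "the cat sat on the mat and the dog sat on the rug",
    "B": "i think you know that i said you would come",
}
print(burrows_delta(known, "the bird sat on the fence and the cat sat on the mat"))
```

With only two short candidate texts every z-score collapses to roughly plus or minus one, so this toy shows the mechanics rather than a meaningful attribution. Real studies use many texts and hundreds of frequent words.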

Handwriting and gender cues