He said, she said

A large new corpus of Australian newspaper articles, compiled by linguists at Lancaster University’s Corpus Approaches to Social Science Research Centre, can help us investigate the gender imbalance in Australian public life.

The collection – consisting of nearly 13,000 articles and close to 7.4 million words – provides an extremely rich data source for studying the Australian media. The data comes from 18 Australian newspapers including The Adelaide Advertiser, The Age, The Australian, The Canberra Times, The Courier Mail, The Daily Telegraph, The Sydney Morning Herald, The West Australian, The Northern Territory News and The Hobart Mercury, among others.

It includes all news articles published over 12 months from August 2015 to July 2016 that contained one of the following keywords: Australia, Australian, or Australians.

In any large set of text, the most frequently used words are the smallest – words like “the”, “to”, “and”, “of” and “a”. But not too far down this list is the male pronoun: “he”.

Of nearly 100,000 distinct words used in the collected news articles, “he” was the 16th most frequently used. By comparison, the equivalent female pronoun – “she” – was the 66th most frequently used. “She” turned up 11,765 times, while “he” appeared more than 40,000 times.

That makes the ratio of “he” to “she” in Australian news reporting 3.4 to 1.
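For readers who want to try this kind of count themselves, here is a minimal sketch in Python. It is not the tooling used to build or query the corpus; the tokenising rule, the `word_counts` and `rank_of` helpers and the toy sentence are all assumptions made for illustration. The final lines simply redo the arithmetic from the figures reported above, treating 40,000 as a floor because “he” appeared more than 40,000 times.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word token (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def rank_of(word, counts):
    """Return the 1-based frequency rank of a word, or None if it never occurs.

    Ties are broken by order of first appearance, which is how
    Counter.most_common() orders equal counts.
    """
    ranked = [w for w, _ in counts.most_common()]
    return ranked.index(word) + 1 if word in counts else None


# Toy sample standing in for the real corpus (illustrative only).
sample = "He said he would go. She said no. He left and he did not return."
counts = word_counts(sample)
print(counts["he"], counts["she"], rank_of("he", counts))

# Arithmetic from the reported figures: "he" appeared more than 40,000 times,
# so 40,000 is a lower bound, and "she" appeared 11,765 times.
reported_ratio = 40_000 / 11_765
print(f"he : she is at least {reported_ratio:.1f} to 1")
```

A real analysis would need more careful tokenising than this regular expression provides, for example to handle curly apostrophes and hyphenated words.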

Unfortunately, we don’t have comparable data from the period when Julia Gillard was prime minister. It seems likely that, with a female prime minister, this gap would have been narrower.

If you are a “he” or “she” in a text, it means you have a prominent grammatical role – you are the subject of the clause, and you have lasted long enough in the story to graduate from proper name to pronoun.

We can also examine the frequency of combinations of words, like “he said” versus “she said”. Across the articles in the corpus, “he said” appeared 9,892 times compared to 2,709 times for “she said” – a ratio of roughly 3.7 to 1. That tells us something important about whose voices are being heard in Australian news media.
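Counting word pairs works much the same way as counting single words. The sketch below is an assumption about how one might do it, not a description of the corpus software: the `bigram_counts` helper and the toy quotation are hypothetical, and the last lines only recompute the ratio from the two counts reported above.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in the text (hypothetical helper).

    Punctuation is ignored, so a pair can span a sentence boundary.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


# Toy sample standing in for the real corpus (illustrative only).
sample = '"We will win," he said. "Not so," she said. He said it again.'
pairs = bigram_counts(sample)
print(pairs[("he", "said")], pairs[("she", "said")])

# Arithmetic from the reported counts of "he said" and "she said".
reported_ratio = 9_892 / 2_709
print(f"he said : she said is about {reported_ratio:.1f} to 1")
```

Because this simple version ignores sentence boundaries, a careful count would split the text into sentences first.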

Pronouns aren’t the only indicator. The use of proper names – such as Peter, Paul and Malala – also gives us clues. The table below is a list of the top 21 names published in the year’s worth of news articles. Why 21? The top 20 are all male names. It was not until I got to the 21st proper name in the corpus that I found a female name.

There’s a good chance the name “Julia” would have appeared in the top five when Gillard was Prime Minister. It’s no coincidence that the top female name in the list is “Julie” – the same name as Australia’s Foreign Minister.



Find the full article here on The Conversation.
