Background and methodology

Overview and scholarly context

I am interested in understanding the publicly stated priorities of Canadian governments. This project is guided by the following overall question:

How have the governments of different Canadian political parties described their priorities over the past 65 years?

This question implies a few different sub-questions:

This is not a new field of study. Other scholars have considered this broad question from different angles, using a variety of methods:

Grounded in this existing literature, I decided to analyze speeches from the throne using topic modelling, supplemented with other forms of computerized textual analysis. More specifically, this project investigates the following question:

What does topic modelling on speeches from the throne delivered in the Canadian House of Commons between 1953 and 2015 indicate about the changing public priorities of federal governments?

Corpus

Speeches from the throne are a strong corpus for studying the public priorities of governing parties. They have several advantages:

Admittedly, speeches from the throne are not a perfect corpus. There may be issues on which a government is intensely focused during a parliamentary session that, for whatever reason, it does not mention in its speech from the throne. Accordingly, I can draw conclusions only about the public priorities that governments state in their speeches from the throne. I do not mean to imply that these are a government’s only priorities.

This project is based on an analysis of the 51 speeches from the throne delivered between 1953 and 2015. I gathered these speeches from both the Library of Parliament’s list of speeches and the Linked Parliamentary Data Project, LiPaD. I used LiPaD’s digitized record for the 35 speeches that the Library of Parliament had not converted from PDF, as LiPaD’s versions include significant optical character recognition corrections, among other improvements (Beelen et al. 2017). The 51 speeches total over 158,000 words; the longest is 7,206 words, while the shortest is 144 words.

The time period in question, 1953–2015, spans 21 Parliaments. Twelve of these Parliaments had Liberal governments, while nine had Conservative governments. (I group the Progressive Conservative Party and the modern Conservative Party of Canada under the shared label “Conservative”.) This time period includes at least one speech from the throne from each party during each decade from the 1950s to the 2010s.

Method

I chose topic modelling as a digital tool with which to analyze this corpus.

Topic modelling uses a computer algorithm (run by a program such as MALLET) to infer statistically significant groups of words that appear in a collection of texts. Graham, Weingart, and Milligan explain the approach clearly in a tutorial at The Programming Historian:

Topic models represent a family of computer programs that extract topics from texts. A topic to the computer is a list of words that occur in statistically meaningful ways. A text can be an email, a blog post, a book chapter, a journal article, a diary entry – that is, any kind of unstructured text. By unstructured we mean that there are no computer-readable annotations that tell the computer the semantic meaning of the words in the text.

Topic modeling programs do not know anything about the meaning of the words in a text. Instead, they assume that any piece of text is composed (by an author) by selecting words from possible baskets of words where each basket corresponds to a topic. If that is true, then it becomes possible to mathematically decompose a text into the probable baskets from whence the words first came. The tool goes through this process over and over again until it settles on the most likely distribution of words into baskets, which we call topics. (Graham, Weingart, and Milligan 2012)

For a very practical (and amusing) example of how topic modelling works, “The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors” (Jockers 2011) explains the technique well.

Each topic is a list of words with associated weights. The first eight words in one topic that I generated, for example, are “olympic athletes winter step planning unnecessary stories gold”. Each text is then compared against every topic’s word weights to see how much of the text is reflected in that topic. A word’s weight determines its importance to the topic: a topic actually contains many more than eight words, but the weights of some are so small that they do not influence these calculations.
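As a rough illustration of the comparison described above, the following Python sketch scores a short text against two toy topics. The topic names, words, and weights are invented for the example (the first topic only borrows a few words from the list quoted above), and the scoring rule, which sums the weights of matching words and then normalizes, is a deliberate simplification rather than the inference procedure that MALLET actually uses.

```python
from collections import Counter

# Hypothetical topics: each maps words to invented weights.
TOPICS = {
    "olympics": {"olympic": 0.30, "athletes": 0.25, "winter": 0.20, "gold": 0.15},
    "economy": {"jobs": 0.35, "growth": 0.30, "trade": 0.20, "budget": 0.15},
}


def topic_shares(text, topics):
    """Return each topic's share of the total matched word weight in a text."""
    counts = Counter(text.lower().split())
    # Raw score: each topic word's weight times how often it appears in the text.
    raw = {
        name: sum(weight * counts[word] for word, weight in words.items())
        for name, words in topics.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No topic word appears in the text at all.
        return {name: 0.0 for name in topics}
    return {name: score / total for name, score in raw.items()}


shares = topic_shares(
    "our athletes won gold at the winter games while jobs grew", TOPICS
)
```

In this toy case the sentence matches three words from the first topic and one from the second, so the first topic receives the larger share. Low-weight words contribute little to either score, which is why they can be ignored in practice.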

Topic modelling has been put to some use in historical research. Perhaps the best-known example is Nelson’s “Mining the Dispatch”, which combines topic modelling with an innovative approach to web-based scholarship (Nelson). More in the vein of this project, Milligan offered an early example of topic modelling for political history in 2014. He ran a topic modelling algorithm on the entirety of the Canadian Hansard (the official parliamentary record) between 1994 and 2012, extracting several topics and using them to evaluate a thesis about a shifting narrative in how Canada is portrayed to the outside world (Milligan 2014).

Compared with manual coding, topic modelling offers two notable advantages when analyzing speeches from the throne:

Topic modelling considerations

Based on a survey of literature related to topic modelling (Blevins 2010; Graham, Weingart, and Milligan 2012; Jockers 2011; Milligan 2014; Weingart 2012a; Weingart 2012b; Yang, Torget, and Rada 2011), I identified the following important points to consider in using topic modelling for research:

How I created my model

  1. I first loaded my corpus (described above) into MALLET.

  2. I then curated my list of stopwords. In addition to Voyant’s automatically generated list (based on common English words), I added the following:

     government
     members
     ministers
     parliament
     parliamentary
     parliamentarians
     senate
     house
     commons
     speech
     throne
     th
     session
    

    I assembled this list by looking at the most frequent words and removing any that seemed to appear only in “routine” portions of the speeches—namely, the openings and closings.

  3. I then generated topics numerous times. I varied the number of topics, comparing the results when I tried to generate 10, 15, 20, 25, 30, 35, or 40 topics. I ultimately settled on 30 topics, as this seemed to strike a good balance between topics that were meaningful and topics that were distinct from one another.
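The three steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a record of the commands actually run. The paths `speeches`, `extra_stopwords.txt`, and `speeches.mallet` and the output file names are hypothetical. The sketch assumes MALLET's command-line `import-dir` and `train-topics` tools, and it uses MALLET's built-in English stoplist as a stand-in for Voyant's list. The functions only build the command lines; running them requires a local MALLET installation.

```python
from pathlib import Path

# Extra stopwords listed in step 2.
EXTRA_STOPWORDS = [
    "government", "members", "ministers", "parliament", "parliamentary",
    "parliamentarians", "senate", "house", "commons", "speech", "throne",
    "th", "session",
]

# Candidate numbers of topics compared in step 3.
TOPIC_COUNTS = [10, 15, 20, 25, 30, 35, 40]


def write_stopwords(path):
    """Write the extra stopwords, one per line, to a plain-text file."""
    Path(path).write_text("\n".join(EXTRA_STOPWORDS) + "\n", encoding="utf-8")


def import_command(corpus_dir, stopwords_file, output_file, mallet="bin/mallet"):
    """Build the command that loads a directory of text files into MALLET."""
    return [
        mallet, "import-dir",
        "--input", corpus_dir,
        "--output", output_file,
        "--keep-sequence",      # keep word order, as topic training expects
        "--remove-stopwords",   # assumption: MALLET's default English stoplist
        "--extra-stopwords", stopwords_file,
    ]


def train_commands(input_file, topic_counts=TOPIC_COUNTS, mallet="bin/mallet"):
    """Build one train-topics command per candidate number of topics."""
    return [
        [
            mallet, "train-topics",
            "--input", input_file,
            "--num-topics", str(k),
            "--output-topic-keys", f"keys_{k}.txt",          # hypothetical name
            "--output-doc-topics", f"doc_topics_{k}.txt",    # hypothetical name
        ]
        for k in topic_counts
    ]


commands = [import_command("speeches", "extra_stopwords.txt", "speeches.mallet")]
commands += train_commands("speeches.mallet")
# To execute: call write_stopwords("extra_stopwords.txt"), then pass each
# command to subprocess.run(cmd, check=True).
```

Writing a separate pair of output files for each topic count keeps the seven runs side by side, so the resulting topic lists can be read against one another when judging which number of topics gives the most meaningful and distinct results.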