Controlling the language used to describe the topics raised in the Parliament of Canada: A political and linguistic challenge

Words matter – especially in a bilingual environment where there are political sensitivities. As an impartial resource for Canadian parliamentarians (and others) that produces and collects many documents, the Library of Parliament maintains a controlled vocabulary internally to facilitate access. In this article, the author outlines the Library of Parliament Subject Taxonomy and discusses two challenges related to its development: language neutrality and the interlinguistic equivalence of concepts between English and French.

Alexandre Fortier


The Library of Parliament’s Parliamentary Information and Research Service (PIRS) provides research and analysis on any topic related to public policy to senators and members of the House of Commons, as well as to parliamentary committees and associations. Year in and year out, the Library’s analysts produce thousands1 of background documents that are stored in an electronic document management system. The Library is also the repository for the House of Commons sessional papers and the speeches of members of Cabinet.2 To facilitate the retrieval of this mass of documents by subject and also to optimize the visibility of documents published online to search engines, the Library maintains a controlled vocabulary internally: the Library of Parliament Subject Taxonomy. Controlling language, in a bilingual environment where political sensitivities can be exacerbated, proves to be a more complex task than it may seem at first. First, this article outlines the Library of Parliament Subject Taxonomy. Second, it discusses two challenges related to its development: language neutrality and the interlinguistic equivalence of concepts between English and French.

Description of the taxonomy

All information systems seek to increase precision (the number of relevant documents retrieved as a proportion of all documents, relevant or irrelevant, returned by the system) and recall (the number of relevant documents retrieved as a proportion of the number of relevant documents in the collection). However, two natural language phenomena, synonymy and polysemy, affect precision and recall. First, synonymy, where the same concept is represented by different words or expressions (for example, the expressions “cellular phone”, “mobile phone” and “cell phone” represent the same concept), affects recall: users must think of all possible synonyms to ensure that they can find all the documents on that concept. Second, polysemy, the opposite phenomenon, where the same word represents different concepts (for example, in French, “droit” is both “the body of legal rules in force in a society” and “permission to do something under rules recognized in a community”), affects precision: without language control, users may have to filter out a large number of results that deal with concepts of no interest to them but are represented by the same words.3
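The two measures can be illustrated with a minimal sketch for a single query. The document identifiers and the function name `precision_recall` are hypothetical, chosen only for this example; they do not describe any system used by the Library.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids in the collection that are relevant
    """
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant documents in the collection.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). A missed synonym would lower the second figure; an ambiguous word would lower the first.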

The purpose of controlled vocabularies is, first, to control synonymy by ensuring that, in any given information system, a concept is represented by only one label (one word or one expression). Synonyms become keys to access those labels. Second, controlled vocabularies eliminate polysemy by ensuring that each label represents only one concept. The Library of Parliament Subject Taxonomy consists of 2,492 concepts,4 each with one English and one French label, in 15 categories5 covering all the topics addressed in Parliament, in addition to networks of synonyms and quasi-synonyms (4,827 in French and 4,505 in English) that allow users to access the concepts using their own words. Special attention is paid to providing as many synonyms as possible. Some emerging concepts such as “zero-emission vehicle,” for which a number of labels exist, have up to 21 synonyms in French and 24 synonyms in English.
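The idea that synonyms act as keys to a single label can be sketched as a simple lookup. Everything in this example is an assumption made for illustration: the concept identifier `c1`, the choice of “mobile phone” and “téléphone mobile” as preferred labels, and the synonym lists are hypothetical and are not taken from the Library of Parliament Subject Taxonomy.

```python
# Hypothetical mini-vocabulary: one preferred label per concept and per language.
CONCEPTS = {
    "c1": {"en": "mobile phone", "fr": "téléphone mobile"},
}

# Entry terms (synonyms and the label itself) all point to the same concept.
ENTRY_TERMS = {
    "en": {"mobile phone": "c1", "cellular phone": "c1", "cell phone": "c1"},
    "fr": {"téléphone mobile": "c1", "téléphone cellulaire": "c1", "cellulaire": "c1"},
}


def preferred_label(term, lang):
    """Map a user's own word to the single label that represents the concept."""
    concept_id = ENTRY_TERMS.get(lang, {}).get(term.strip().lower())
    if concept_id is None:
        return None  # the term is not an entry point in this vocabulary
    return CONCEPTS[concept_id][lang]


print(preferred_label("Cell Phone", "en"))  # mobile phone
print(preferred_label("cellulaire", "fr"))  # téléphone mobile
```

Because every synonym resolves to one concept, a search on any of them can retrieve the same set of documents, which is how synonym control supports recall.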

The Library of Parliament’s Taxonomy also organizes concepts into hierarchical structures, strictly defined by ISO 25964-1,6 which allow users to achieve the desired level of precision in the representation of a concept. Descriptors from different hierarchies are also linked together by associative relationships that allow users to find potentially useful terms. The semantic network of the Taxonomy contains 22,634 relationships in French and 21,970 in English.
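A minimal sketch of how hierarchical and associative relationships might be stored and navigated follows. The terms, the relationships between them, and the single-parent simplification are all assumptions made for the example; they are not drawn from the Taxonomy itself.

```python
# Hypothetical hierarchy: each term points to its broader term (single parent assumed).
BROADER = {
    "zero-emission vehicle": "motor vehicles",
    "motor vehicles": "transportation",
}

# Hypothetical associative links between terms from different hierarchies.
RELATED = {
    "zero-emission vehicle": ["greenhouse gases"],
}


def broader_chain(term):
    """Walk up the hierarchy from a term to its top-level concept."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower_terms(term):
    """List the terms placed directly under a given term."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


print(broader_chain("zero-emission vehicle"))     # ['motor vehicles', 'transportation']
print(narrower_terms("motor vehicles"))           # ['zero-emission vehicle']
print(RELATED.get("zero-emission vehicle", []))   # ['greenhouse gases']
```

Moving up the chain broadens a search; moving down narrows it; the associative links suggest terms a user may not have thought of.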

Language neutrality

Languages are rarely neutral. In her seminal work, Hope Olson7 uses the expression “naming information” to describe how indexing terms are assigned. “Naming,” she says, indicates the power to control how subjects are represented in documents and, therefore, how they are accessed. Through the labels they use to represent concepts and the relationships they establish between them, controlled vocabularies (and documentary languages in general) provide a representation of the world. In this sense, continues Olson, naming information is more than representing it; it is constructing it. The major subject languages (such as the Library of Congress Subject Headings, the Library of Congress Classification or the Dewey Decimal Classification8) have often been criticized for their treatment of groups that do not correspond to the standards of the dominant class because of factors such as their gender, sexual orientation, ethnic or cultural origins, or physical abilities. For example, by choosing “female executive” (but not “male executive”) under “executive” or “male nurses” (but not “female nurses”) under “nurses”, the Library of Congress is taking a position, intentionally or unintentionally, on the place of men and women in society.

The label given to a concept also influences the view expressed by the controlled vocabulary. For example, the label “Indiens d’Amérique” (Indians of North America) still found in the Répertoire de vedettes-matière de l’Université Laval,9 rather than a more contemporary term such as “peuples autochtones” (Indigenous peoples), reflects a colonialist view, again intentionally or unintentionally, in the representation of this concept. Sometimes, on the other hand, the choice of a particular label within a controlled vocabulary is undeniably deliberate. A clear contemporary example came in 2016 when a political group in the United States Congress forced the retention of the term “illegal aliens” instead of the more neutral term “undocumented immigrants” that the Library of Congress was seeking in its controlled vocabulary.10

In an environment such as the Parliament of Canada, political sensitivities also colour many debates; the choice of words matters. Nadine Desrochers,11 for example, raises the challenge of representing politically charged concepts, such as “distinct society,” in the documentary languages used in Canada. Since impartiality is one of the three core values of the Library of Parliament, the choice of labels used in the Taxonomy must reflect this value. For example, a subject such as medical assistance in dying, the term used in the Act to amend the Criminal Code, which was the subject of six bills in the 41st and 42nd Parliaments, can also be expressed in several ways: suicide assistance, euthanasia, assisted suicide. Each of these expressions, although relatively neutral, carries a certain emotional charge, depending on which side of the debate one is on. Analysis of the debates in the House of Commons, for example, indicates that the terms “euthanasia” and “assisted suicide”, which are more explicit than the euphemistic term in the legislation, are used more frequently by opponents of this change to the Criminal Code.12 The degree of neutrality of a word or phrase is also culturally embedded and may vary from language to language, with perfect interlinguistic equivalence being far from universal.

Interlinguistic equivalence between English and French

A controlled vocabulary contains concepts that are represented by terms linked to each other by a relational structure. In a monolingual controlled vocabulary, this relational structure is composed of intralinguistic equivalence relationships, hierarchical relationships and associative relationships. Those three types of relationships are used to define the concepts represented. A concept is simultaneously defined by the label chosen to represent it, the synonyms attached to it, and the hierarchy in which it is placed. As a result, concepts are shaped by the properties they inherit from concepts at higher levels and, to a lesser extent, by the terms linked to them through associative relationships.

Languages are the expression of complex conceptual and relational universes. A concept is not necessarily identical from one language to another. In a multilingual controlled vocabulary, such as the Library of Parliament Subject Taxonomy, there is an added interlinguistic equivalence relationship. Standards that guide the creation of controlled vocabularies, such as ISO 25964-1:2011, typically describe five degrees of interlinguistic equivalence:

  • Exact equivalence: equivalence describing the same reality (for example, “travailleur étranger” and “foreign worker”);
  • Inexact equivalence: equivalence presenting the same concept, but with a difference in the point of view (for example, “assurance-chômage” and “employment insurance”);
  • Partial equivalence: equivalence linking two concepts that extend differently (for example, “accessibilité pour personnes handicapées” and “barrier-free access”, the second being a broader concept than the first);
  • Simple to multiple equivalence: equivalence linking a concept in language A to several concepts in language B (for example, “droit” with “law” and “right”, or “law” with “droit” and “loi”);
  • Non-equivalence: a concept in language A that does not exist in language B (for example, “teenager” [a young person from 13 to 19 years old] is specific to English).
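The five degrees can be represented as an ordered scale attached to each French-English pairing. The sketch below reuses the examples from the list above; the class names, the field layout and the helper `needs_attention` are hypothetical, introduced only to show how mappings at degrees three to five could be singled out for editorial review.

```python
from enum import IntEnum
from typing import NamedTuple, Optional, Tuple


class Equivalence(IntEnum):
    EXACT = 1
    INEXACT = 2
    PARTIAL = 3
    SIMPLE_TO_MULTIPLE = 4
    NON_EQUIVALENCE = 5


class Mapping(NamedTuple):
    french: Optional[str]     # None when no French concept exists
    english: Tuple[str, ...]  # one or several English terms
    degree: Equivalence


MAPPINGS = [
    Mapping("travailleur étranger", ("foreign worker",), Equivalence.EXACT),
    Mapping("assurance-chômage", ("employment insurance",), Equivalence.INEXACT),
    Mapping("accessibilité pour personnes handicapées", ("barrier-free access",),
            Equivalence.PARTIAL),
    Mapping("droit", ("law", "right"), Equivalence.SIMPLE_TO_MULTIPLE),
    Mapping(None, ("teenager",), Equivalence.NON_EQUIVALENCE),
]


def needs_attention(mapping):
    """Flag mappings at degrees three to five (assumed review threshold)."""
    return mapping.degree >= Equivalence.PARTIAL


flagged = [m.english[0] for m in MAPPINGS if needs_attention(m)]
print(flagged)  # ['barrier-free access', 'law', 'teenager']
```

Using an ordered enumeration keeps the threshold explicit: exact and inexact equivalents pass through, while the remaining cases are surfaced for a decision by the people maintaining the vocabulary.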

The main challenge, in a multilingual controlled vocabulary, is to deal adequately with degrees three to five. The relationship of a concept to other concepts may also differ from one language and culture to another. For example, some cultures may treat animals as “goods”, whereas this hierarchy would be totally incomprehensible in another culture. A multilingual controlled vocabulary must have a hierarchical and associative structure common to the language communities it serves. Developing the Library of Parliament’s Subject Taxonomy therefore requires care to ensure that one language does not take precedence over another. To avoid this pitfall, the Library seeks to represent concepts and the relational structures around them adequately, so as neither to create terms that the speakers of one language would not understand nor to force a language into a relational structure that its speakers would not recognize. To achieve this, the taxonomy is constructed or modified simultaneously in English and French. The knowledge of the analysts in the Library’s Parliamentary Information and Research Service is also valuable in choosing terms and including them in a hierarchy.


A controlled vocabulary is not a static product; to remain relevant, it must reflect the evolution of knowledge as well as the evolution of the lexicon of the fields it covers. In the Parliament of Canada, for example, the terminology used to refer to Indigenous peoples or gender identities has evolved over time, and these changes are reflected in the Library of Parliament’s Subject Taxonomy. The frequency with which terms are used by indexers and the strategies of researchers must also be periodically evaluated to ensure that the vocabulary remains relevant to the community it serves. Some of these challenges are common to the creation of any controlled vocabulary, but they are of particular importance in the context of the Library of Parliament. Examples include the use of the least partisan vocabulary possible, and an interlinguistic equivalence between English and French where neither language, nor any of the cultural contexts that underlie them, takes precedence over the other. The intellectual exercise required to create such a vocabulary must not, however, take precedence over the primary purpose of the tool: to facilitate the retrieval of information. One must learn to accept some solutions that may be imperfect, but that users find helpful.


