
Qualitative and Quantitative Content Analysis

Mar 27 2022 · 9 min read
#nlp #content_analysis #quantitative #qualitative

In this post, we are going to discuss the differences between qualitative and quantitative content analysis and their core characteristics.

Content Analysis

The goal of content analysis is to provide insights about some data, e.g. a newspaper article, a questionnaire, or a speech. These insights could be the presence of certain words or themes, or the relationships between them, that are useful for supporting a hypothesis or studying a phenomenon. There are two popular ways to perform content analysis: qualitative [1] and quantitative [2, 3]. Both methodologies rely on extracting information and organizing it under categories selected according to the goal of the analysis. These categories may in turn contain qualitative or quantitative data, which are then analyzed with different techniques. Research often draws on several sources of information to study a topic, e.g. a collection of newspaper articles from different outlets, or a set of questionnaires completed by different groups of people.

Popular uses of content analysis include [4]:

Content Analysis Overview

Qualitative Content Analysis

Qualitative Content Analysis (QCA) has the goal of reducing the complexity of some content from a given research-led perspective [1]. It does so by working with categories and developing a category system or coding frame.

In other words, human coders annotate the content we want to analyze with a set of given categories, following the coding guidelines in a codebook. A codebook explains in detail how human coders should conduct the annotation (coding) process. Different strategies are used to evaluate each coder's work, such as inter-annotator agreement: annotations from different coders on the same data are compared, and coders are graded on the degree to which their annotations match those of their peers.
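The text does not name a specific agreement measure. One widely used option for two coders is Cohen's kappa, which corrects the raw share of matching labels for the agreement expected by chance. The sketch below is a minimal illustration under that assumption; the coder labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units on which both coders chose the same category.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, estimated from each coder's own category distribution.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used the same single category for every unit
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by two coders to the same six text segments.
coder_a = ["economy", "economy", "security", "health", "security", "economy"]
coder_b = ["economy", "security", "security", "health", "security", "economy"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.739
```

A kappa of 1 means perfect agreement and 0 means chance-level agreement. With more than two coders, measures such as Fleiss' kappa or Krippendorff's alpha are common alternatives.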

The choice of categories is crucial to the effectiveness of the whole research project and is often preceded by an extensive study of the research problem and the formulation of a set of research questions.

Several types of categories can be distinguished in the social science research literature [1]:

How to Develop a Category System

There are three principal ways to develop categories [1, 4]:

  1. Concept-driven (‘deductive’) development of categories; in this case the categories:

  2. Data-driven (‘inductive’) development of categories; the characteristics here are the:

  3. Mixed concept-driven and data-driven development of categories:

Quantitative Content Analysis

Quantitative Content Analysis shares many similarities with its qualitative counterpart. Here, however, the categories we use are quantifiable aspects of the content being analyzed, e.g. word counts, entity mentions, or multiple-choice and graded responses to a questionnaire. This information is most often extracted automatically and covers large quantities of data.

After data collection, we can conduct two types of analysis: conceptual and relational.

In the first case, we compute and analyze the frequency, and the presence or absence, of certain concepts expressed through any of the previously selected coding categories. This can be based on term frequencies or on a more elaborate quantitative representation of the information.
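As a minimal sketch of the term-frequency variant of conceptual analysis, the function below counts, for each concept, its total number of occurrences and the number of documents in which it is present at all. It assumes single-word concepts and simple lower-cased word tokens; the documents and concept list are hypothetical.

```python
import re
from collections import Counter


def concept_frequencies(documents, concepts):
    """Return {concept: (total occurrences, number of documents containing it)}."""
    totals, presence = Counter(), Counter()
    for doc in documents:
        # Lower-case word tokens; \w also matches accented letters in Python 3.
        counts = Counter(re.findall(r"\w+", doc.lower()))
        for concept in concepts:
            totals[concept] += counts[concept]
            if counts[concept]:
                presence[concept] += 1
    return {c: (totals[c], presence[c]) for c in concepts}


documents = [
    "The government approved the budget. The budget passed.",
    "The opposition criticised the government.",
    "A speech on foreign policy.",
]
print(concept_frequencies(documents, ["government", "budget", "opposition"]))
# {'government': (2, 2), 'budget': (2, 1), 'opposition': (1, 1)}
```

Multi-word concepts or synonyms would need a richer matching step (e.g. a dictionary mapping several surface forms to one concept).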

In the second case, we focus on the relations between the categories annotated in our coding structure. This allows us to evaluate the intensity and structure of the network of relations between different concepts. These results should then be interpreted within a theoretical framework, or a set of hypotheses, related to the phenomenon under study.
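One simple way to operationalize relational analysis, offered here as an illustrative assumption rather than the method of any cited work, is a co-occurrence network: categories are nodes, and an edge's weight is the number of coded units in which two categories appear together. Summing a node's edge weights (its weighted degree) then gives a rough measure of how intensely that concept is connected to the others. The coded units below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(coded_units):
    """Edge weight = number of coded units in which two categories appear together."""
    edges = Counter()
    for categories in coded_units:
        # set() removes duplicates and sorted() fixes the pair order,
        # so each unordered pair is counted at most once per unit.
        for a, b in combinations(sorted(set(categories)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Intensity of each concept's ties: the sum of the weights of its edges."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


units = [
    {"economy", "government"},
    {"economy", "government", "eu"},
    {"eu", "security"},
    {"economy", "government"},
]
edges = cooccurrence_network(units)
print(edges.most_common(1))  # [(('economy', 'government'), 3)]
print(weighted_degree(edges))
```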

Qualitative vs. Quantitative Content Analysis in Summary

Qualitative Content Analysis:

Quantitative Content Analysis:

Quantitative Content Analysis Case Study

Our case study is an analysis of the Diary of the President of the Italian Republic (PoR) [3].

The Italian Constitution assigns the PoR, as Head of State, a role that is not only symbolic or ceremonial but also of high political relevance. The PoR's power in the Italian parliamentary system has mainly been analyzed through three theoretical approaches: the institutional, the relational, and the presidential leadership approach. The institutional approach treats presidential power as the result of the normative constraints and intervention opportunities conferred on the PoR by the Constitution and by established institutional practice. The relational approach treats it as the result of the PoR's relations with other political actors. Since several scholars consider political parties the most important of these actors, the variability of the PoR's power ultimately depends on the PoR's relations with the parties, the so-called “presidential accordion” [6].

More recently, Poguntke et al. proposed an interpretation of the PoR's power within leadership theory. This theory refers to the processes of presidentialization of politics [7] and analyzes the PoR's power as the result of conditions that are exogenous to the political system (e.g., international factors such as EU constraints or opportunities) and/or endogenous to it (e.g., cultural factors such as disaffection with party politics and governmental institutions). These conditions enable the PoR to decide whether or not to use his or her personal resources in power relations. Greater presidential power would show in a more intensive use of the so-called informal or soft powers. These soft powers rest on the PoR's personal communication skills and resources, exercised through formal and informal channels of influence, such as the freedom to express personal opinions on any policy issue [8], and the powers of “moral suasion” [9].

To analyze the factors that, according to leadership theory, describe presidential power, we chose the theoretical framework of Quantitative Narrative Analysis (QNA) [2]. The period considered corresponds to Napolitano's first presidency, from May 15, 2006 to April 30, 2013. Information on the 3068 events of this seven-year term recorded in the Diary was collected automatically from the Web. The Diary lists all of the President's appointments, written in a formal style in Italian, as in the sample below.

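Date    | Place                 | Description
5/30/06 | Palazzo del Quirinale | On. Sen. Franco MARINI, Presidente del Senato della Repubblica, e On. Fausto BERTINOTTI, Presidente della Camera dei Deputati [Hon. Sen. Franco Marini, President of the Senate of the Republic, and Hon. Fausto Bertinotti, President of the Chamber of Deputies]
6/7/06  | Palazzo del Quirinale | On. Silvio BERLUSCONI, Presidente di Forza Italia [Hon. Silvio Berlusconi, President of Forza Italia]
6/3/08  | Palazzo della FAO     | Intervento alla cerimonia di apertura della Conferenza sulla sicurezza alimentare, promossa dalla FAO [Address at the opening ceremony of the Conference on Food Security, promoted by FAO]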

Following the strategies described above, we can define a set of categories to represent the relations between the entities we want to consider in our analysis, and encode the information in one record of the table above (the meeting of 6/7/06) as follows:

(Event:
  (Subject: Presidente della Repubblica), (Verb: incontra),
  (Object: On. Silvio BERLUSCONI),
  (Internal Politics:
    (Political Organizations:
      (Political Parties: Leader of party),
      (Government: Prodi II),
      (Parliamentary/Extraparliamentary: Parliamentary),
      (Majority/Minority Political Parties: Minority),
      (Party Name: Forza Italia)),
    (Legislative Power:
      (Chamber of Deputies: Leader of Minority Group))),
  (Date: 7 Giugno 2006),
  (Place: Palazzo del Quirinale)).

Once the information is encoded, we can analyze it by computing normalized frequency counts of different categories, such as the members of the majority or minority parliamentary groups.

Frequency counts example
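A minimal sketch of such normalized counts, assuming each encoded event has already been flattened into a dictionary from category to value. The records are hypothetical, and normalizing by the number of events that carry the category is our assumption; the study in [3] may normalize differently (for example, by all events or per year).

```python
from collections import Counter


def normalized_counts(records, category):
    """Share of records carrying each value of `category`; records lacking it are ignored."""
    values = [r[category] for r in records if category in r]
    total = len(values)
    return {v: n / total for v, n in Counter(values).items()} if total else {}


# Hypothetical coded events, reduced to the categories of interest.
records = [
    {"Majority/Minority Political Parties": "Majority"},
    {"Majority/Minority Political Parties": "Minority"},
    {"Majority/Minority Political Parties": "Majority"},
    {"Majority/Minority Political Parties": "Majority"},
    {"Place": "Palazzo della FAO"},  # an event with no party actor
]
print(normalized_counts(records, "Majority/Minority Political Parties"))
# {'Majority': 0.75, 'Minority': 0.25}
```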

We can also analyze the intensity of the relations between the PoR and the different powers in the Italian political system by looking at how frequently each of these entities co-occurs in the encoded data samples.

Frequency counts example
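Because the PoR is the subject of every Diary entry, one simple reading of this intensity (our assumption, not necessarily the measure used in [3]) is the share of encoded events in which each power appears, together with the share in which two powers appear together. The event lists below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def power_cooccurrence(events):
    """Per-event shares of single powers and of pairs of powers appearing together."""
    singles, pairs = Counter(), Counter()
    for powers in events:
        unique = sorted(set(powers))  # count each power or pair once per event
        singles.update(unique)
        pairs.update(combinations(unique, 2))
    n = len(events)
    # Normalize by the number of events so periods of different length are comparable.
    return ({p: c / n for p, c in singles.items()},
            {pair: c / n for pair, c in pairs.items()})


events = [
    ["Legislative Power", "Political Parties"],
    ["Executive Power"],
    ["Executive Power", "Legislative Power"],
    ["Judicial Power"],
]
single_share, pair_share = power_cooccurrence(events)
print(single_share)
print(pair_share)
```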

Location information can also be visualized with a Geographic Information System (GIS) such as Google Earth Pro.

GIS visualization example
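Google Earth Pro can open KML files, so one way to feed it the Diary's locations is to write one placemark per place, labelled with its number of events. This is a minimal sketch, not the pipeline of [3]: the coordinates below are approximate and for illustration only, and a real run would need a geocoding step to obtain them.

```python
from collections import Counter
from xml.sax.saxutils import escape


def events_to_kml(event_places, coordinates):
    """One KML placemark per place, labelled with the number of events held there."""
    placemarks = []
    for place, count in Counter(event_places).items():
        lat, lon = coordinates[place]
        label = f"{escape(place)} ({count} event{'s' if count != 1 else ''})"
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{label}</name>\n"
            # KML lists longitude before latitude.
            f"    <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )


# Approximate (latitude, longitude) pairs, for illustration only.
coordinates = {
    "Palazzo del Quirinale": (41.8997, 12.4870),
    "Palazzo della FAO": (41.8834, 12.4894),
}
places = ["Palazzo del Quirinale", "Palazzo del Quirinale", "Palazzo della FAO"]
kml = events_to_kml(places, coordinates)
print(kml)
```

Saving the string to a `.kml` file (UTF-8) is enough for Google Earth Pro to display the placemarks.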

The main challenge in applying these quantitative techniques to the human sciences is the initial investment needed to develop ad-hoc algorithms that extract information from the chosen data source. This calls for close collaboration between researchers from different fields, as in this case study. At the time of writing, no single solution satisfies the needs of all researchers.

However, the democratization of machine learning and Natural Language Processing (NLP) techniques, such as Named Entity Recognition (NER) or topic modeling, is making these approaches more accessible to the general public. They have also become easier to use in a modular way, as elements of an information extraction and categorization pipeline that can be customized to each user's needs.
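The modular idea can be sketched as a list of interchangeable stages, each taking a record and returning an enriched copy. In the hypothetical example below, a regular expression stands in for a real NER model (it relies on the Diary's convention of writing surnames in capitals, so it will miss names with accents or apostrophes), and a small keyword table, whose category labels are our own, stands in for a trained classifier. Either stage could be swapped for a proper NLP component without changing the pipeline.

```python
import re
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

# Hypothetical NER stand-in: capitalized first name(s) followed by an upper-case surname.
PERSON = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) ([A-Z]{2,})\b")

# Hypothetical keyword table mapping surface strings to coding categories.
INSTITUTION_KEYWORDS = {
    "Senato": "Legislative Power",
    "Camera dei Deputati": "Legislative Power",
    "Forza Italia": "Political Parties",
    "FAO": "International Organizations",
}


def extract_people(record: Record) -> Record:
    """Extraction stage: detect people and normalize surname capitalization."""
    people = [f"{first} {last.title()}" for first, last in PERSON.findall(record["text"])]
    return {**record, "people": people}


def tag_institutions(record: Record) -> Record:
    """Categorization stage: keyword matching against the table above."""
    tags = {cat for kw, cat in INSTITUTION_KEYWORDS.items() if kw in record["text"]}
    return {**record, "institutions": sorted(tags)}


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    for stage in stages:
        record = stage(record)
    return record


entry = {"text": "On. Sen. Franco MARINI, Presidente del Senato della Repubblica, "
                 "e On. Fausto BERTINOTTI, Presidente della Camera dei Deputati"}
result = run_pipeline(entry, [extract_people, tag_institutions])
print(result["people"], result["institutions"])
# ['Franco Marini', 'Fausto Bertinotti'] ['Legislative Power']
```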

References

[1] Kuckartz, Udo. “Qualitative text analysis: A systematic approach.” Compendium for early career researchers in mathematics education. Springer, Cham, 2019. 181-197.

[2] Franzosi, Roberto. Quantitative narrative analysis. Sage, 2010.

[3] Purpura, Alberto, and Marco Calaresu. “A Semi-Automated Approach for Information Extraction, Classification and Analysis of Unstructured Data.” arXiv preprint arXiv:1910.12734 (2019).

[4] Content Analysis, https://www.publichealth.columbia.edu/research/population-health-methods/content-analysis, accessed on March 27th, 2022.

[5] Hsieh, Hsiu-Fang, and Sarah E. Shannon. “Three approaches to qualitative content analysis.” Qualitative health research 15.9 (2005): 1277-1288.

[6] Tebaldi, Mauro. Il Presidente della Repubblica. Il Mulino, 2005.

[7] Poguntke, Thomas, and Paul Webb. The presidentialization of politics: A comparative study of modern democracies. Oxford University Press on Demand, 2007.

[8] Tebaldi, Mauro. “From notary to ruler: The role of the president of the republic during the Italian crisis (2010–14).” South European Society and Politics 19.4 (2014): 561-581.

[9] Amoretti, Francesco, and Diego Giannone. “The power of words: the changing role of the Italian head of state during the Second Republic.” Modern Italy 19.4 (2014): 439-455.