How Many People Should Suffer Before We TALK about it?

Human rights movements tend to surge in popularity after emotionally intense or traumatic events. One example is Black Lives Matter, which was founded in 2013 and gained renewed momentum in 2020 after George Floyd’s death. Benign (non-traumatic) events can also act as catalysts: the movie Bombshell, released in 2019, reopened the conversation about workplace sexual harassment (#MeToo) that first rose to prominence in 2017.

This data story focuses on the MeToo movement and, more broadly, on the topic of sexual harassment. We probe media discourse and public opinion to gain insight into the events and biases that characterize this discussion.

How to Tackle this Problem?

To explore the public perception of the MeToo movement and investigate the impact of traumatic and non-traumatic events, we analyze quotes in the Quotebank dataset from 2015 to 2020.

To augment our data, we fetched tweets related to #MeToo from the public Twitter API and combined them with previously extracted tweets on MeToo.

Research Questions

  1. Can we extract quotes related to the MeToo movement or Sexual Harassment in the workplace from Quotebank 2015-2020?
  2. What is the impact of traumatic vs. non-traumatic events on the MeToo movement, based on quotes and tweets involving #MeToo?
  3. Is there a gender bias in the speakers of quotes related to sexual harassment?
  4. Can we observe cancel culture as a ramification of the #MeToo movement?

Can we extract quotes related to the MeToo movement or Sexual Harassment in the workplace from Quotebank 2015-2020?

We went through the quotes in Quotebank for the years 2015 to 2020 and found a total of 115,584,251 quotes.

From these quotes, we extracted the words most frequently co-occurring with MeToo, shown in the word cloud below. We validated our results with a word cloud from a MeToo search on Media Cloud, a news-gathering website, to confirm that our sample is representative of the movement.
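As a rough illustration of how co-occurring words can be counted, here is a minimal sketch using only the Python standard library. The function name `cooccurring_words`, the tiny stop-word list, and the sample quotes are all assumptions made for this example; they are not the code or data used in the project.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "for", "we", "has"}


def cooccurring_words(quotes, target="metoo"):
    """Count the words that appear in the same quote as `target`."""
    counts = Counter()
    for quote in quotes:
        # Lowercase and keep alphabetic runs only, so "#MeToo" becomes "metoo".
        tokens = re.findall(r"[a-z]+", quote.lower())
        if target not in tokens:
            continue
        # Count each word once per quote so long quotes do not dominate.
        counts.update(set(tokens) - STOP_WORDS - {target})
    return counts


# Made-up sample quotes, not taken from Quotebank.
sample_quotes = [
    "The #MeToo movement has given women a voice.",
    "MeToo is a movement for survivors of harassment.",
    "The weather is lovely today.",
]
counts = cooccurring_words(sample_quotes)
top_words = counts.most_common(5)
```

The resulting counter maps each word to the number of quotes it shares with the target term, which is the kind of frequency table a word cloud is drawn from.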

Our Quotes 😎 Media Cloud 📹

We then extracted from Quotebank 2015-2020 the quotes that include the keywords listed below:

These are samples of the initially extracted quotes.
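The keyword extraction step can be sketched as a case-insensitive pattern match, shown below with the standard library only. The keywords in `KEYWORDS` are placeholders chosen for illustration, not the list used in the project, and the sample quotes are invented. One of them shows how a general keyword can pull in an unrelated quote.

```python
import re

# Placeholder keywords for illustration only; NOT the list used in the project.
KEYWORDS = ["metoo", "sexual harassment", "harassment", "assault"]

# One case-insensitive pattern with word boundaries on both sides.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matches_keywords(quote):
    """Return True if the quote contains at least one keyword."""
    return PATTERN.search(quote) is not None


# Made-up sample quotes, not taken from Quotebank.
sample_quotes = [
    "She spoke up about sexual harassment at her firm.",
    "The #MeToo movement changed the conversation.",
    "The assault on the enemy position began at dawn.",  # off-topic match
    "Our quarterly earnings exceeded expectations.",
]
extracted = [q for q in sample_quotes if matches_keywords(q)]
```

Here three of the four quotes are kept, including the military one, which matches only because "assault" is a general term.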

Some of them do not look quite right, because our selection included some general keywords. To refine the sample, we used Latent Dirichlet Allocation (LDA) to assign a topic to each of the previously extracted quotes. The topics detected by LDA gave a good picture of the content of the quotes extracted so far.

We could interpret these topics as:

  1. Women Empowerment 👩
  2. Sexual Harassment
  3. Politics 📰
  4. Christmassy 🎄

The second topic is the closest to our target. We investigated it further by applying a second LDA layer to filter it more finely. Then we applied PCA to explore how the resulting topics cluster. The plot below shows a subsample of the extracted quotes along the first 2 PCA dimensions, before they were filtered by topic a second time. (You can read the quote itself by hovering over its data point on the plot!)
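A second LDA pass followed by a 2D PCA projection might look like the sketch below. It assumes scikit-learn and assumes that PCA is applied to the bag-of-words counts of the quotes; the text does not say which features were actually projected. The function name `refine_and_project`, the number of sub-topics, and the sample quotes are hypothetical.

```python
from sklearn.decomposition import PCA, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def refine_and_project(quotes, n_subtopics=3, seed=0):
    """Assign sub-topics with a second LDA and project the quotes to 2D with PCA."""
    counts = CountVectorizer(stop_words="english").fit_transform(quotes)

    # Second LDA layer: split the quotes of one topic into finer sub-topics.
    lda = LatentDirichletAllocation(n_components=n_subtopics, random_state=seed)
    subtopics = lda.fit_transform(counts).argmax(axis=1)

    # PCA needs a dense array; keep the first two principal components.
    coords = PCA(n_components=2, random_state=seed).fit_transform(counts.toarray())
    return coords, subtopics


# Made-up quotes standing in for the ones assigned to the target topic.
topic_quotes = [
    "He was accused of sexual harassment by several employees.",
    "Victims of harassment at work need better legal protection.",
    "The company fired the manager after the harassment complaint.",
    "She reported the assault to the police the next morning.",
    "The studio ignored complaints about the producer for years.",
]
coords, subtopics = refine_and_project(topic_quotes)
```

Each row of `coords` is one point in a scatter plot, and `subtopics` can be used to color the points so that clusters of sub-topics become visible.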