The “Call Me Sexist But” Dataset

Have you ever wondered if a message you sent may be received as sexist? Did you ask yourself how you could have made it better? To take concrete steps towards understanding what, really, is sexist, a group of researchers curated a dataset of over 10,000 tweets that enables researchers to study the many ways sexism appears in day-to-day communication.
DOI: 10.34879/gesisblog.2021.52
Sexism in everyday conversations
Have you ever wondered if a message you sent may be received as sexist? Did you ask yourself how you could have made it better?
Sexism is complex: while we know it when we see it, it is hard to describe all the ways in which something may be sexist, and different people may interpret the same message differently. Some forms of sexism are blatant, such as misogyny that directly antagonizes, humiliates, and threatens women. Other forms are subtler, though they may be equally harmful. For example, praising the sensitive nature of women expresses a positive opinion that may not immediately seem sexist. Yet, should women who can fend for themselves be considered lesser women? After all, defining all women as sensitive implies that they are in need of special care, and assumes limits to the experiences women are fit for. To take concrete steps towards understanding what, really, is sexist, GESIS CSS curated a dataset of over 10,000 tweets to enable researchers to study the multiple ways sexism appears in day-to-day communication.1
Measuring sexism
Grounding our observations in these empirical data, we organized the complexity and variety of sexist expressions. We developed a taxonomy that assigns each tweet to a distinct sexism category. How did we make sense of the complexity? Much as we would assess whether a person is sexist: through psychological scales for sexism. The field of psychometrics boasts years of experience in developing validated scales for measuring sexist attitudes in individuals. Each scale consists of sentences, called items, that are meant to elicit a specific response from people with sexist attitudes. Yet, tens of scales exist, each addressing a distinct dimension of sexism. For example, one scale may measure attitudes towards the roles of women in the context of family life; another scale may measure attitudes towards feminism as an ideology. These scale items embody how social psychologists measure sexism in humans. Through the efforts of an interdisciplinary team and through valuable discussions with experts throughout GESIS, we surfaced seven major categories appearing in 800 sexism scale items, spanning from attitudes about how women allegedly are and behave to the denial or endorsement of gender inequality.
Armed with our taxonomy of sexist expressions, we leveraged the wisdom of the crowd to assign each of the 10,000 tweets to a sexism category. The tweets come from existing datasets of tweets expressing ambivalent sexism (that is, sexism with positive or negative attitudes towards women) explored in previous natural language processing research. We also collected a new dataset of tweets whose text starts with the disclaimer “call me sexist, but…”, a phrase that often precedes a sexist statement. We used crowdsourcing to scale up the annotation, so that at least five people annotated each tweet. This way, we can reliably consider a tweet sexist only when a majority of at least three annotators agree. This corpus offers a valuable resource for understanding what women face in online communication, and it gives nuanced insights into how sexism manifests in everyday language.
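The majority-vote rule above can be sketched in a few lines of Python. The function name and label strings here are illustrative, not the dataset’s actual release schema:

```python
from collections import Counter

def majority_label(annotations, threshold=3):
    """Label a tweet 'sexist' only if at least `threshold` of its
    crowd annotators said so; otherwise label it 'not sexist'."""
    counts = Counter(annotations)
    return "sexist" if counts["sexist"] >= threshold else "not sexist"

# Each tweet received at least five annotations; with five annotators,
# a majority means three or more agreeing votes.
votes = ["sexist", "sexist", "not sexist", "sexist", "not sexist"]
print(majority_label(votes))  # three of five agree, so the label is "sexist"
```

Requiring an absolute majority rather than a plurality is what makes the aggregated labels reliable even when individual annotators disagree.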
The sexism dataset in practice
Resources like this dataset are essential to advance our understanding of sexism, but they also serve as practical tools to fight it. There is an increasing need to identify sexism automatically and at scale. Online spaces like Twitter are often hostile, and research shows that it is often young women who experience particularly severe forms of harassment. Datasets like the one we contribute allow training machine learning models that detect sexism automatically, which in turn holds promise for helping with content moderation. While human moderators cannot review all tweets as they are written, machine learning models can: they do not tire or get distracted, and they can make a decision in milliseconds. Companies like Facebook employ machine learning models to flag messages that are potentially offensive, so that human moderators can focus on this important minority of messages. High-quality datasets lead to accurate machine learning models.
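This division of labor between model and moderator can be sketched as follows. The `toy_score` stand-in is ours purely for illustration; a real deployment would use a classifier trained on labeled data such as this dataset:

```python
def flag_for_review(messages, score_fn, threshold=0.8):
    """Route only high-scoring messages to human moderators, so they
    review a small, relevant fraction of the full message stream."""
    return [m for m in messages if score_fn(m) >= threshold]

# Illustrative stand-in scorer, not a real model: assigns a high
# probability of sexism only to messages containing a trigger word.
def toy_score(message):
    return 0.9 if "sexist" in message.lower() else 0.1

stream = ["nice weather today", "call me sexist, but ..."]
queue = flag_for_review(stream, toy_score)
print(queue)  # only the flagged message reaches human review
```

The threshold trades off moderator workload against the risk of missing harmful messages; tuning it requires exactly the kind of labeled data the dataset provides.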
In this light, the fact that we bridge the distinct research traditions of psychometrics and machine learning holds special significance. The intuition behind our approach is that a perfect machine learning model for sexism should be held to the same standard as a human responding to a sexism scale. Our dataset includes a gold-standard test set of 800 items from psychological scales. Machine learning researchers can therefore test their models against this gold-standard benchmark and see how they would fare on tests intended for humans.
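The benchmarking idea amounts to comparing a model’s predictions against the gold labels of the scale items, item for item. The code below is an illustrative sketch, not the dataset’s release format:

```python
def scale_accuracy(predictions, gold_labels):
    """Fraction of psychological scale items on which the model's
    sexist / not-sexist prediction matches the gold label."""
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# A perfect model would match the gold labels on every item,
# just as a human respondent with known attitudes would.
gold = [1, 1, 0, 1]   # 1 = item expresses a sexist attitude
preds = [1, 0, 0, 1]  # hypothetical model output
print(scale_accuracy(preds, gold))  # 0.75
```

Because scale items were validated on human respondents, a model’s score on them is directly comparable to a human baseline, which is what makes this test set a meaningful benchmark.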
If you are reading about research on sexism, chances are that we agree on how important it is to make space for accessible, inclusive, and democratic public discussions. Sexism raises barriers that keep out ideas everyone would benefit from. Equitable participation makes for healthier and more productive communities. Our work takes steps towards making equitable participation part of today.