The German Federal Election 2021 Twitter Dataset

The German Federal Election of 2021 will be the most digitalized ever: in times of COVID-19, with a younger electorate and the omnipresence of user-generated content, social media has become an important tool and arena for politics. While collecting data from Twitter's Application Programming Interfaces (APIs) is simple, finding meaningful information in it is difficult. To improve social science research with Twitter data, we have published the German Federal Election Twitter Dataset. It contains the 1,556 Twitter accounts of all candidates running for the seven parties represented in the outgoing Bundestag. The data gives researchers, journalists and citizens the ability to observe the election campaign on Twitter.

DOI: 10.34879/gesisblog.2021.48


Social media transforms our everyday communication, but even more so, it transforms politics. As tweets become part of news stories and important societal debates (such as #MeToo) are triggered by hashtags, social media has become an important tool for politicians, voters, journalists and researchers alike.

While German politicians used to be reluctant to replace face-to-face campaigning with a computer screen, the COVID-19 crisis left them no choice but to adapt quickly. German federal elections are already complex, with a crowded field of fringe parties and of list candidates with little chance of winning a seat. The current election pushes this to the extreme: a potentially gigantic parliament with many seats up for grabs, and high electoral volatility because, for the first time in over 70 years, no incumbent chancellor is running. At the same time, the comparatively low-cost environment of social media levels the playing field as it replaces large shares of traditional campaign communication. This year, the number of candidates rose to more than 6,000, an increase of more than 30% compared to 2017.

Accordingly, mapping this candidate field has become ever more challenging. With the German Federal Election Twitter Dataset, we offer a comprehensive overview: first for all seven major parties represented in the current parliament and, in the next iteration, for all remaining smaller parties and independent candidates.

Monitoring social media is a state-of-the-art method for observing campaigns, debates and even (parts of) public opinion. But it is easy to overload research with irrelevant and unstructured data. Many projects monitor elections on social media based on content: they analyze hashtags, keywords or mentions of the lead candidates. However, this does not give the full picture. Politicians use social media differently: they do not necessarily try to maximize their reach among Twitter publics but use it as a device to communicate targeted messages and mobilize their core voters. Understanding campaign strategies, relevant issues and political positions therefore requires the complete universe of tweets the candidates posted.

Collecting data this way requires identifying the accounts of political elites. The German Federal Election Twitter Dataset allows researchers to collect the relevant data for network, text and image analysis.
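To make the account-based approach concrete, here is a minimal Python sketch that collects tweets account by account rather than by keyword. It is not the pipeline used to build this dataset. It assumes Twitter API v2's user Tweet timeline endpoint (`/2/users/{id}/tweets`), which pages through results via a `meta.next_token` field in each response and a `pagination_token` request parameter. The HTTP call itself is left to a hypothetical caller-supplied `fetch` function (for example, a thin wrapper around an HTTP client that adds a bearer token). Twitter's API access terms have changed considerably since 2021, so these endpoint details should be checked against the current documentation.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical: a caller-supplied function that performs one GET request and
# returns the parsed JSON body. Authentication and rate limiting live there.
FetchPage = Callable[[str, Dict[str, str]], dict]

# Assumed Twitter API v2 user Tweet timeline endpoint (check current docs).
TIMELINE_URL = "https://api.twitter.com/2/users/{user_id}/tweets"


def iter_timeline(user_id: str, fetch: FetchPage, max_results: int = 100) -> Iterator[dict]:
    """Yield the tweet objects returned for one account, following pagination."""
    params: Dict[str, str] = {"max_results": str(max_results)}
    while True:
        page = fetch(TIMELINE_URL.format(user_id=user_id), dict(params))
        yield from page.get("data", [])
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break  # last page: the response carries no next_token
        params["pagination_token"] = next_token


def collect_accounts(user_ids: List[str], fetch: FetchPage) -> Dict[str, List[dict]]:
    """Collect tweets per candidate account, keyed by the stable user ID."""
    return {uid: list(iter_timeline(uid, fetch)) for uid in user_ids}
```

Injecting `fetch` keeps the pagination logic separate from authentication and rate-limit handling, and makes it easy to test the loop against canned responses before spending API quota.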

Data Collection

To capture all communication by political elites, it is important to correctly identify the relevant accounts. To do this, we first had to identify the candidates. In Germany, candidates are nominated by local and state party organizations. While the Federal Returning Officer (FRO, Bundeswahlleiter) publishes a list of candidates shortly before the election, at that stage it is too late to find all of the candidates' Twitter accounts in time. The team of the German Longitudinal Election Study (GLES) therefore monitored the nomination proceedings of the major parties at the state and district level to identify their candidates and Twitter accounts before the lists were officially submitted to the FRO. This is possible because German party law requires intra-party democracy in the nomination of candidates.

The next step is to identify each candidate's social media presence. This is not trivial, as no social media user needs to prove their identity when creating or renaming an account. At the same time, not all politicians use their accounts for campaigning, and some do not even use their real name. This creates uncertainty and coding problems, resulting in a fair amount of detective work. To handle this, we defined clear guidelines and had several coders look up the accounts. We coded an account as a campaign account if it matched the candidate's name and was followed by other members of the party and/or if its posts were related to party politics. We also used the profile image as an indicator.
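Writing the coding rule down as a small decision function can help keep several coders consistent. The sketch below is hypothetical: the accounts for this dataset were looked up and coded by human coders, and the `AccountEvidence` fields, the name-matching heuristic and the choice to route accounts with only a matching profile image to manual review are our assumptions. It reads the rule as a name match combined with at least one of the two party signals; that reading of the "and/or" is also an assumption.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class AccountEvidence:
    """Hypothetical per-account evidence a coder might record."""
    display_name: str
    followed_by_party_members: bool
    posts_about_party_politics: bool
    profile_image_matches: bool  # treated here as a supporting indicator only


def normalize(name: str) -> str:
    """Lower-case, transliterate German umlauts and ß, drop accents, spaces and punctuation."""
    name = name.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        name = name.replace(src, dst)
    decomposed = unicodedata.normalize("NFKD", name)  # splits accents into combining marks
    return "".join(ch for ch in decomposed if ch.isalnum())  # combining marks are not alnum


def name_matches(candidate_name: str, display_name: str) -> bool:
    """True if every part of the candidate's name occurs in the display name."""
    target = normalize(display_name)
    return all(normalize(part) in target for part in candidate_name.split())


def code_account(candidate_name: str, ev: AccountEvidence) -> str:
    """Assumed rule: name match AND (followed by party members OR party-political posts)."""
    if name_matches(candidate_name, ev.display_name) and (
        ev.followed_by_party_members or ev.posts_about_party_politics
    ):
        return "campaign account"
    if ev.profile_image_matches:
        return "needs manual review"  # hypothetical escalation step
    return "not coded"
```

Normalizing names before matching matters for German candidates, since an account may spell "Müller" as "Mueller" or drop the umlaut entirely.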

Features

The first batch presented here covers the 2,558 candidates of the seven major parties AfD, CDU, CSU, DIE LINKE, FDP, Greens and SPD. Of these, 1,556 (about 61%) had a Twitter account.[1] The Figure below visualizes the respective share for each party. As we can see, some parties have a higher share than others. For each candidate, we also provide the variables published by the FRO (district name/number, state, list, list position and gender), in addition to the relevant Twitter account and, most importantly, the Twitter user ID. On Twitter, users can easily change their screen name (account name), a feature politicians use fairly often. Once they run for office, many candidates add their party affiliation to their account name, and after winning a mandate they change it again to signal their status as a member of parliament ("MdB"). The static Twitter user ID is a stable numeric identifier that allows researchers to find users and their associated metadata through the Twitter API despite changes in account names.
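The following minimal sketch illustrates why the user ID, not the screen name, should serve as the join key. Given repeated observations of accounts (the snapshot format and the example accounts are made up for illustration), it groups observations by user ID and records each screen-name change, so a candidate who first adds a party label and later an "MdB" suffix remains a single record.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation: (ISO date, user ID as string, screen name). Hypothetical format.
Snapshot = Tuple[str, str, str]


def screen_name_history(snapshots: List[Snapshot]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each user ID to its (date, screen name) entries, keeping only changes."""
    history: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for date, user_id, screen_name in sorted(snapshots):  # ISO dates sort chronologically
        names = history[user_id]
        if not names or names[-1][1] != screen_name:
            names.append((date, screen_name))
    return dict(history)


def renamed_accounts(history: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """User IDs observed under more than one screen name."""
    return sorted(uid for uid, names in history.items() if len(names) > 1)
```

User IDs are kept as strings on purpose: newer Twitter IDs exceed 2^53, beyond which a 64-bit float cannot represent every integer exactly, so tools that parse them as floating-point numbers can silently round them and break the join.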

Applications

Once researchers have this data, they can use it to collect all sorts of interesting behavioral features: the content of tweets, the images posted, which posts candidates retweeted and how often candidates were retweeted themselves. Other information includes the accounts candidates follow (their friends) and their followers, enabling social network analysis that identifies communities clustering around specific interests or parties. Further applications include the spread of misinformation, who is a preferred target of hate speech, and questions rooted in the party politics literature, such as how issue salience and the framing of issues vary across parties. New approaches from natural language processing make it possible to identify different campaign strategies such as negative campaigning, which has received considerable attention during the current campaign. All in all, this dataset allows researchers to build large databases to answer research questions about politics, social media and democracy in the digital society.
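As one example of the network analyses mentioned above, the sketch below measures how strongly retweet ties cluster within parties using Krackhardt and Stern's E-I index, (external - internal) / (external + internal), which runs from -1 (all ties within a party) to +1 (all ties across parties). The edge-list format and the party mapping are assumptions; in practice they would be built from collected retweets and the candidates' party affiliations. It is a simple descriptive measure, not a community-detection algorithm.

```python
from typing import Dict, Iterable, Tuple


def party_homophily(edges: Iterable[Tuple[str, str]], party: Dict[str, str]) -> Tuple[float, float]:
    """Return (share of within-party ties, E-I index) for directed retweet ties.

    edges: (retweeter_id, retweeted_id) pairs, one per retweet. Self-retweets and
    ties involving accounts without a party label are ignored.
    """
    internal = external = 0
    for src, dst in edges:
        if src == dst or src not in party or dst not in party:
            continue
        if party[src] == party[dst]:
            internal += 1
        else:
            external += 1
    total = internal + external
    if total == 0:
        raise ValueError("no ties between labelled accounts")
    return internal / total, (external - internal) / total
```

Counting each retweet as a separate tie weights frequently retweeted pairs more heavily; deduplicating the edge list first would instead measure the share of distinct retweet relationships.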

The Twitter Dataset

DOI: 10.4232/1.13789

Sältzer, Marius, Stier, Sebastian, Bäuerle, Joscha, Blumenberg, Manuela, Mechkova, Valeriya, Pemstein, Daniel, Seim, Brigitte, . . . Wilson, Steven (2021). Twitter accounts of candidates in the German federal election 2021. GESIS Data Archive, Cologne. ZA7721 Data file Version 1.0.0, https://doi.org/10.4232/1.13789.

Footnotes

  1. We thank the team of Jan-Hinrik Schmidt at the Leibniz-Institut für Medienforschung | Hans-Bredow-Institut (HBI) for their valuable input that made it possible to publish an updated version 2.0 of the dataset.
