Publishers’ data submission policies for journal articles: An explorative review and guidelines

The replication crisis in many disciplines beginning in the early 2000s highlighted a lack of transparency and led to calls for “open science” and data availability for replication. The text discusses journals’ and publishers’ data sharing policies, which aim at making data usage in the social sciences more transparent. It presents the results of an empirical check of such policies and comes to the conclusion that in most cases authors only need to provide proof of the availability of their data. An example of a proper data citation is also included.


DOI: 10.34879/gesisblog.2024.88


Introduction [1]

Beginning in the early 2000s, several social science disciplines such as psychology and economics experienced what the literature refers to as a ‘replication crisis’, that is, the “finding […] that a large proportion of scientific studies published across disciplines do not replicate” [2]. One of the issues raised in this context was a lack of transparency, which supported the call for open science and for the publication of research data so that others can reproduce and replicate scientific results. Scientific journals picked up this issue and started to demand that researchers make their data available in some way. Over time, more and more journals have published data sharing or access policies that detail how data should be made accessible. The policies vary and, in some cases, it is not clear to authors whether the data they have used needs to be provided to the journal itself or to the journal’s publishing company, or whether it is sufficient to include a reference to where the data is available. This is the point at which researchers approach data infrastructures like GESIS to find out whether they are allowed to hand over the data to those companies.

To help researchers fulfil the requirements of journals and publishers, we first look at what social science research data is and how data retrieved from a repository, data archive, or Research Data Center can be used for scientific analyses. This is the area of usage licenses. We then look at the situation of data sharing or access policies. We undertake a limited analysis of the demands of social science journals and publishing companies and show that for most of them a proper reference to the data is enough. Finally, we show how GESIS handles usage rights for data and give advice on how to properly reference the data you use. We also go into detail on what you can do if a journal or publishing company demands that the data be made accessible via its platform. This might be done, for example, by providing chunks or sub-samples of the entire dataset.

Social Science data and licenses

“Research data are data that are generated during scientific projects, e.g. through observations, experiments, simulations, surveys, interviews, source research, recordings, digitisation, or evaluations. In terms of research pragmatics, although not always clear-cut, a distinction can be made between primary and secondary research data […]. In the research process, secondary data can itself become primary data again, which is important for the life cycle of research data” [3].

Usage licenses

The main difference between primary and secondary data is that primary data can be actively influenced by researchers, for instance through their choice of a research design, whereas secondary data is collected by others. These researchers or institutions may hold rights to the data they have collected. They can transfer these rights to others in the form of licenses, that is, usage rights. In many cases this transfer of rights is part of an agreement between researchers and data repositories, research data centres, or data archives like GESIS. These institutions then provide access to the data, and users of this secondary data must agree to terms of service, usage regulations, or licenses [4].

The openness of a license depends, among other things, on the sensitivity of the data (data protection) or possible copyright issues (e.g. images). Furthermore, access to some data collected by public administrations (e.g. register or social security data) or statistical offices (e.g. Destatis in Germany) is regulated by law. Whatever the license model, it specifies who can do what with which data for what purpose. In the social sciences this very often implies that the data may not be shared with others, which can conflict with the demands of some journals. In the next section we look at the current situation of data sharing or access policies.

Data sharing and access policies

Over the past years, research has been undertaken on data policies and on the question of which journals demand what level of data availability. First, the terminology is not clear. Crosas et al. (2018) [5] find that journals and publishers speak of ‘Research Data Policy’ (e.g. Springer Nature), ‘Data Sharing Policy’ (e.g. West European Politics), or ‘Data availability policies’ (e.g. Oxford University Press). Furthermore, the landscape of data policies is still heterogeneous and “policy documents cannot be found easily at the publisher’s website” [6]. In this paper we use the term very broadly, because our main question is whether a journal requires authors to deposit their data, e.g. on the journal’s website or in a repository provided by a publisher, or whether it is sufficient to provide a reference to the data, e.g. in the form of a DOI.

When we look at the landscape of data policies, we see very different results. Crosas et al. (2018), for example, look at six social science disciplines and find a very heterogeneous situation. While only 18 percent of the history journals under investigation have such a policy, 74 percent of the economics journals offer information on data sharing or demand that data be shared. This variation across disciplines supports the results of older investigations. Focusing just on economics journals, Vlaeminck (2021) [7] finds that the share of journals with some kind of data availability policy increased between 2014 and 2019 from 38 percent (100 out of 262) to 68 percent (104 out of 223). This matches the results of Crosas et al. (2018). In an investigation of a sample of 34 scholarly journals from five disciplines, Novotny and Seyffertitz (2023) find that only three journals have no data guidelines of any kind. They too show that some of the data policies are hard to find, but their results imply that “publishing data driven research in top scholarly journals increasingly requires authors to address data and code as early as the planning stage of a publication” [8].

To get a firsthand impression of what data policies actually demand of researchers, in particular when it comes to data provided by GESIS, we looked in detail at 100 journals from a bibliography of articles linked to data provided by our institution. This way we could make sure that quantitative secondary data had been used for the analyses.

Universe, cases, and sampling

The articles we used were sampled from a universe of 33,166 references (August 2024) in GESIS’ ‘Research Data Bibliography’ (‘Forschungsdatenbibliographie’). This bibliography covers publications that have worked with data from one of the thirteen collections that GESIS takes care of, such as the German General Social Survey (ALLBUS) or the German Longitudinal Election Study (GLES). Each year about 10,000 publications are checked for references to data for this list, and about 20 percent of those publications are included in the bibliography. The criterion for inclusion is a quantitative analysis of the data.

GESIS uses BibSonomy (https://www.bibsonomy.org/; last access: 25.09.2024), a social bookmarking and publication sharing system, for keeping its publication records. The publication records are used for GESIS’ quality management and reporting. All titles are also findable via the search on our website where they are visible in connection with the data that was used for the publication.

Sampling

We selected 50 journals with the highest number of articles using GESIS data and 50 journal articles at random. Eliminating duplicates, we ended up with 90 different journals and book publishers that were analyzed. 14 of those had no submission policy, which left us with 76 cases that we studied in more detail.
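The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run on the bibliography: the function name `select_journals`, its parameters, and the input format (one journal name per bibliography record) are hypothetical.

```python
import random
from collections import Counter


def select_journals(article_journals, top_n=50, random_n=50, seed=0):
    """Return the set of journals to review.

    article_journals: list with one journal name per bibliography
    record (a hypothetical input format).
    """
    counts = Counter(article_journals)
    # Journals with the highest number of articles using the data.
    top = [name for name, _ in counts.most_common(top_n)]
    # Journals of articles drawn at random from all records.
    rng = random.Random(seed)
    drawn = rng.sample(article_journals, min(random_n, len(article_journals)))
    # A set drops journals that appear in both selections.
    return set(top) | set(drawn)


# Toy input: three journals with 5, 3, and 1 articles.
records = ["A"] * 5 + ["B"] * 3 + ["C"]
selected = select_journals(records, top_n=2, random_n=3)
print(sorted(selected))
```

Because both selections can hit the same journal, the combined set is usually smaller than the sum of the two draws, which is why 100 selections resulted in 90 distinct cases.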

Analysis

One result of our analysis was that most publishers have no overarching guidelines for data sharing but refer to the guidelines of the respective journals. The guidelines often resemble each other in wording and requirements, yet they can vary greatly even among journals published by the same company. The European Economic Review and Research in Social Stratification and Mobility are two examples of journals both published by Elsevier. While the first journal demands a data appendix, the latter has no precise specifications on the data used. In 68 out of the 76 resulting cases (about 90 percent), data submission was not mandatory but was often desired. In detail, with multiple indications possible per case, this means:

  • 30 of the publishers or journals wanted to see a data availability statement,
  • 33 requested a citation and the indication of a DOI,
  • 21 asked for links to the location of the data (e.g., a repository) if possible, and
  • 13 publishers or journals were not explicit about their demands.

We only came across one journal, the European Economic Review, that made data submission mandatory without exception. In a few other cases in which mandatory submission was also part of the data access policy, the journals or publishers allowed for justified exceptions or made it possible to bypass the obligation in exceptional cases. Effectively, this leaves one out of 76 cases (about 1 percent) in which data submission was obligatory.
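The shares reported for the journal-level analysis can be retraced with a short calculation. The sketch below only restates the counts given in the text; the helper `share` and the variable names are introduced here for illustration.

```python
def share(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


journals_reviewed = 90      # distinct journals and book publishers
without_policy = 14
with_policy = journals_reviewed - without_policy

not_mandatory = 68
strictly_mandatory = 1

# Indications reported per category; a case can fall into several.
indications = {
    "data availability statement": 30,
    "citation with DOI": 33,
    "link to data location": 21,
    "not explicit": 13,
}

print(with_policy)                             # 76
print(share(not_mandatory, with_policy))       # 89.5
print(share(strictly_mandatory, with_policy))  # 1.3
print(sum(indications.values()))               # 97
```

The indications add up to 97, more than the number of cases, because several indications per case were possible.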

We then looked at the 50 journal articles we had picked at random and checked whether the journals or publishers provided direct access to the data used by their authors. In six cases no precise information about data availability was given, which left us with 44 articles for closer inspection. Out of those 44 articles, 36 (82 percent) offered no access to data. In the remaining eight cases access to research data was offered: in four cases, links to the data or an appendix were provided; in two articles data was available via a repository; and in two more cases access to the data was possible on request. This means that only about 18 percent of the articles provided a connection to the underlying data.
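The article-level figures can be checked for internal consistency in the same way. The sketch below uses only the counts reported in the paragraph above; the variable names are illustrative. The three access routes (four, two, and two articles) add up to eight articles, which together with the 36 articles without access gives the 44 inspected articles.

```python
sampled_articles = 50
unclear = 6                                   # no precise information
inspected = sampled_articles - unclear

no_access = 36
access_routes = {"link or appendix": 4, "repository": 2, "on request": 2}
with_access = sum(access_routes.values())

# The two groups must account for every inspected article.
assert no_access + with_access == inspected

print(inspected)                              # 44
print(round(100 * no_access / inspected))     # 82
print(round(100 * with_access / inspected))   # 18
```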

Result

Through our analysis of articles using data provided by GESIS, we can conclude that our findings support the available research: the landscape of data access policies is very heterogeneous. At the same time, only very few journals make data sharing mandatory, and even among those we find only one that upholds this mandate and does not allow the obligation to be bypassed. This means that authors who come across a demand to make their data available should check the journal’s data access policy carefully. In most cases, it suffices to provide a data availability statement or a proper citation of the data.

Our results are restricted in the sense that our analysis relies on a small convenience sample of journals and publishing companies. For example, we do not categorize the journals by disciplines or whether they predominantly focus on quantitative analyses. Furthermore, individual journals may not be easily comparable with those of large publishing companies, as they do not have to adhere to regulations imposed on the publishing company.

After looking at data access policies, we conclude with some advice for those who encounter demands for data sharing by a publisher or scientific journal. We look at the usage licenses of GESIS, provide an example of a proper citation, and point out the possibility of sharing a subset of the data received from GESIS in case everything else fails.

GESIS usage licenses, proper references, and sharing of subsets

GESIS has set up usage regulations with currently four access categories: 0 (‘zero’), A, B, and C. While data under access category 0 can be used by anybody and can thus also be freely shared with others, access category C often means filling out a usage agreement or, in some cases, traveling to Cologne to analyse data on-site in our Secure Data Center. All access categories except 0 prohibit sharing the complete data files for legal reasons, such as the contract GESIS signs with the researchers who collected the data in the first place, the so-called ‘principal investigators’.

There is a citation reference for each dataset, a so-called ‘study’, in the GESIS data holdings. You can find this reference at the end of each study description in the GESIS catalogue. The reference includes the current version of the data, marked by a version number and a specific DOI. Citations for older versions can be found under ‘Versions’ in the study description.

The citation is composed of:
[Principal Investigators] ([Version Year]): [Title]. [Data Collector]. GESIS, Cologne.
[Study Number] Data file Version [Version Number], [DOI]

An example looks like this:
Schmitt, Hermann; Popa, Sebastian Adrian; Devinger, Felix (2015): European Parliament Election Study 2014, Voter Study, Supplementary Study. GESIS, Cologne. ZA5161 Data file Version 1.0.0, doi:10.4232/1.5161
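The citation pattern above can also be assembled programmatically, for instance when many datasets need to be cited. The following is a minimal sketch: the function `format_citation` and its parameter names are hypothetical, and the data collector is treated as optional because the example above does not list one.

```python
def format_citation(investigators, year, title, study_number,
                    version, doi, data_collector=None):
    """Build a citation string following the pattern shown above."""
    parts = [f"{'; '.join(investigators)} ({year}): {title}."]
    if data_collector:
        # Only included when a data collector is given.
        parts.append(f"{data_collector}.")
    parts.append("GESIS, Cologne.")
    parts.append(f"{study_number} Data file Version {version}, doi:{doi}")
    return " ".join(parts)


citation = format_citation(
    investigators=["Schmitt, Hermann", "Popa, Sebastian Adrian",
                   "Devinger, Felix"],
    year=2015,
    title=("European Parliament Election Study 2014, Voter Study, "
           "Supplementary Study"),
    study_number="ZA5161",
    version="1.0.0",
    doi="10.4232/1.5161",
)
print(citation)
```

With these inputs the function reproduces the example citation given above.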

The DOI is cited in the form of a uniform resource locator (URL) and leads users directly to a landing page on which they can access the data. In the case of GESIS and many similar institutions, this landing page is a catalogue entry. A citation like the example above should be enough to prove the availability of the data.
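A DOI written in the `doi:` notation, as in the example citation, can be turned into a clickable link by prefixing it with the public resolver address https://doi.org/. The helper below is a small sketch; the function name `doi_to_url` is an assumption made for illustration.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn 'doi:10.xxxx/yyy' or a bare DOI into an https://doi.org/ link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.4232/1.5161"))  # https://doi.org/10.4232/1.5161
```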

In the exceptional case of a journal or publishing company that still wants to see the data, you can request in writing (email is sufficient) permission to share a sub-sample of the complete dataset, meaning the part of the dataset that was used for the analysis. Please contact us at dataservices@gesis.org if you have such a request.
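What such a sub-sample could look like is sketched below, assuming the data has been loaded as a list of records. The function `extract_subsample`, the variable names, and the values are made up for illustration; which variables and cases may actually be shared depends on the permission obtained.

```python
def extract_subsample(rows, variables, keep_case=lambda row: True):
    """Keep only the analysed variables and the analysed cases."""
    return [{name: row[name] for name in variables}
            for row in rows if keep_case(row)]


# Made-up records; the variable names are purely illustrative.
rows = [
    {"id": 1, "age": 34, "vote": "yes", "income": 2100},
    {"id": 2, "age": 17, "vote": "no", "income": 0},
    {"id": 3, "age": 52, "vote": "yes", "income": 3300},
]

# Only adults and only the three variables used in the analysis.
subsample = extract_subsample(rows, ["id", "age", "vote"],
                              keep_case=lambda row: row["age"] >= 18)
print(len(subsample))  # 2
```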

References

  1. We thank Ina Lendowski (GESIS) for her support.
  2. Korbmacher, M., Azevedo, F., Pennington, C. R. et al. (2023). The replication crisis has led to positive structural, procedural, and community changes. Commun Psychol, 1(3), p. 2. https://doi.org/10.1038/s44271-023-00003-2
  3. RfII – Rat für Informationsinfrastrukturen. (2016). Begriffsklärungen [Definitions]. RfII Berichte No. 1, Göttingen, p.11. (translation by authors)
  4. Watteler, O. (2022). Daten in den Sozialwissenschaften [Data in Social Science]. In: Tausendpfund, M. (Ed.). Forschungsstrategien in den Sozialwissenschaften: Eine Einführung. Springer VS, Wiesbaden, 225-256, p. 241f. https://doi.org/10.1007/978-3-658-36972-9_10
  5. Crosas, M., Gautier, J., Karcher, S., Kirilova, D., Otalora, G., Schwartz, A. (2018). Data policies of highly-ranked social science journals. https://doi.org/10.31235/osf.io/9h7ay
  6. Novotny, G., Seyffertitz, T. (2023). Research Data Policies in Scientific Journals – a Case Study. In: Heuveline, V., Bisheh, N., Kling, P. (Eds.). E-Science-Tage 2023. Empower Your Research – Preserve Your Data, Heidelberg, 73-88, p. 81. https://d-nb.info/1312641177/34#page=75
  7. Vlaeminck, S. (2021). Dawning of a new age? Economics journals’ data policies on the test bench. LIBER Quarterly: The Journal of the Association of European Research Libraries, 31(1), 1–29. https://doi.org/10.53377/lq.10940
  8. Novotny, G., Seyffertitz, T. (2023). Research Data Policies in Scientific Journals – a Case Study. In: Heuveline, V., Bisheh, N., Kling, P. (Eds.). E-Science-Tage 2023. Empower Your Research – Preserve Your Data, Heidelberg, 73-88, p. 81. https://d-nb.info/1312641177/34#page=75
