It’s Not Such ‘A Fair Way Off’ to Process Open Data: Facing Requirements on Open Access and the FAIR Data Principles.

Dr. Sebastian Netscher clarifies the difference between Open Data and FAIR Data, and explains why even access-restricted data can conform to FAIR principles. He recommends that researchers focus on processing shareable data and rely on the existing research infrastructure of data archives and repositories.


DOI: 10.34879/gesisblog.2020.3

In the context of Open Access to research data – also referred to as Open Data – the FAIR data principles are increasingly being integrated into research projects (for an example, see the Guidelines on FAIR Data Management in Horizon 2020). In requiring that data be findable (F), accessible (A), interoperable (I) and re-usable (R), funders and journals expect that facilitating Open Data will, among other things, increase transparency in research, foster innovation among researchers and scientific cooperation (both international and cross-disciplinary), and ensure efficient use of public funding. However, producing data that meet all of the FAIR data principles can be challenging for many researchers.

Much becomes clearer when we keep three simple statements about the FAIR data principles in mind. First, FAIR data is not a synonym for Open Data. While Open Data should be available to everyone and for all purposes, access to FAIR data can be restricted, e.g. due to legal issues and the obligation to protect personal information, and the data can still be FAIR. Second, FAIR does not establish what data must look like. “There is no such thing as ‘unfair’” (Mons et al. 2017). Instead, the FAIR data principles are best understood as a multi-dimensional continuum composed of the four (more or less) independent FAIR facets. Findability, for example, is not a fixed feature of data. It can be improved, for instance, by mapping domain-specific standardized information about the data (i.e. metadata) to the metadata standards of other research domains, so that the metadata are distributed more widely. Third, FAIR is less a matter of the data itself than of the data’s metadata. While the FAIR data elements remain the “ultimate goal”, having “FAIR metadata is of very high value in its own” (FORCE11). In other words, even if the data itself is not accessible, being able to find information online about the data, and about why access to it is restricted, achieves one aspect of FAIRness.
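To make the idea of metadata mapping more concrete, here is a minimal sketch in Python. It assumes a hypothetical domain-specific metadata record and a hand-written crosswalk to Dublin Core element names; the record's field names, its values and the placeholder identifier are invented for illustration and do not come from any particular archive or standard. Note that the record describes access-restricted data, yet its metadata can still be mapped and distributed.

```python
# Hypothetical domain-specific metadata record.
# Field names and values are illustrative assumptions, not a real schema.
domain_record = {
    "study_title": "Example Survey 2020",
    "principal_investigator": "Doe, Jane",
    "persistent_id": "doi:10.0000/example",  # placeholder, not a real DOI
    "access_class": "restricted",
    "access_note": "Contains personal information; available on request.",
}

# Assumed crosswalk from the domain-specific keys to Dublin Core element names.
CROSSWALK = {
    "study_title": "title",
    "principal_investigator": "creator",
    "persistent_id": "identifier",
    "access_class": "rights",
    "access_note": "description",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Rename keys according to the crosswalk; keys without a mapping are dropped."""
    return {crosswalk[key]: value for key, value in record.items() if key in crosswalk}


dc_record = to_dublin_core(domain_record)
for element, value in dc_record.items():
    print(f"{element}: {value}")
```

In practice, data archives and repositories maintain such mappings between full metadata standards, so researchers rarely need to write them by hand.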

From these three statements, we can draw two simple conclusions for supporting researchers who process Open Data on the basis of the FAIR data principles. The first is that researchers should focus on processing shareable data that is as open as possible, enabling the widest possible community of re-users to continue working with the data. The second is that researchers who aim to make their data FAIR should rely on existing research infrastructure, i.e. data archives and repositories, and on their standards and guidelines. In general, such institutions a) assign a unique and persistent identifier to the data, register the data in online data catalogues and make the data citable, all of which increase findability, b) manage data access in the long run, c) facilitate interoperability (of metadata) by mapping and harvesting metadata, and d) license data appropriately and thus support its re-usability. In conclusion, realizing the FAIR data principles does not depend solely on individual researchers processing shareable data. Rather, it is the responsibility of the whole research community to make metadata and data as FAIR as possible to facilitate their re-use (Wilkinson et al. 2016).
