Recommendations for the Handling of Research Data at the University of Potsdam
These recommendations specify and complement the Research Data Policy of the University of Potsdam. They are aimed at all researchers and research support staff. This is a non-official translation of the recommendations adopted by the Senate's Commission for Research and Junior Academics (FNK) on June 26, 2019, and duly noted by the Senate on September 25, 2019, with changes adopted by the FNK on October 9, 2019 to match the requirements of the new DFG Code of Conduct "Guidelines for Safeguarding Good Research Practice".
I. Working with Research Data
The requirements of good scientific practice regarding honest, methodologically correct, thorough, reliable, and replicable work through good documentation also apply to the handling of research data. The principal overarching elements of secure and well-documented work with research data are listed below. For the planning and structured documentation of these elements it is recommended to use data management plans (DMPs) as instruments for project and quality management and to update them regularly. DMPs can be created at the level of a project, a data-intensive research instrument, or a working group, as needed. In the case of projects which pose an exceptional burden on centrally operated infrastructure, central facilities may require the submission of a DMP.
1. Purpose of the data. It is recommended to determine at an early stage which data will be published or archived and which data will continue to be stored by the researchers themselves. It is useful to define holding periods. Data that are no longer needed, nor worth publishing or archiving, should be deleted regularly. In particular, personal data are subject to the principle of storage limitation: storage is only permitted as long as strictly necessary for the respective purpose. Therefore, research data that contain personally identifiable information must be anonymized as soon as the research purpose permits.[i]
2. Intellectual property. Research data usually is not covered by copyright or related intellectual property rights, but in certain cases multiple rights may accrue to different persons. Ownership and usage rights of research data are therefore often unclear, which can limit its reuse. For each project with multiple participants it is therefore recommended to document at an early stage which rights may have accrued to whom, the intention to publish the data, and the reciprocal granting of the relevant usage rights.
3. Secure storage. Data loss is to be prevented through the use of suitable storage services or of storage media and appropriate backups. The use of storage services run by the scientific community is recommended; the use of local storage media and of commercial storage services as a private customer is discouraged. In many cases version control is helpful. The necessary level of data security is to be ensured by appropriate technical and organizational measures, e.g. effective access restrictions and pseudonymization of personal data that cannot be fully anonymized.
4. Documentation and use of standards. To make data reusable it is necessary to document the context in which the data was created and the tools used for data processing. For the sake of interoperability and long-term readability, the use of appropriate file formats is recommended with preference given to formats using open standards. To further reproducibility it is recommended to define and document conventions for file names and folder hierarchies early on; to use suitable, and when possible subject-specific metadata standards; and to collect relevant metadata already during the research process.
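As an illustration of items 3 and 4, the following minimal Python sketch shows one possible way to pseudonymize an identifier with a keyed hash, to enforce a file naming convention, and to record basic metadata together with a checksum. The naming pattern, the function names, and the metadata fields are assumptions made for this example; they are not prescribed by these recommendations, and a subject-specific metadata standard should be preferred where one exists. Pseudonymized data remain personal data as long as the key exists, so the key must be stored separately under strict access restrictions.

```python
import hashlib
import hmac
import json
import re
from datetime import date

# Hypothetical convention: <project>_<pseudonym>_<YYYY-MM-DD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^[a-z0-9]+_[0-9a-f]{12}_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+$"
)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shortened for readable file names; truncation increases collision risk.
    return digest[:12]


def build_file_name(project: str, pseudonym: str, day: date,
                    version: int, ext: str) -> str:
    """Build a file name and reject names that break the convention."""
    name = f"{project}_{pseudonym}_{day.isoformat()}_v{version:02d}.{ext}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"file name violates the naming convention: {name}")
    return name


def build_metadata(file_name: str, content: bytes, creator: str,
                   description: str, tools: list) -> dict:
    """Collect minimal context metadata plus a checksum for integrity checks."""
    return {
        "file_name": file_name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "creator": creator,
        "description": description,
        "tools": tools,
    }


if __name__ == "__main__":
    key = b"replace-with-a-secret-key"  # placeholder; never store with the data
    pseudonym = pseudonymize("participant-001", key)
    name = build_file_name("soilsurvey", pseudonym, date(2019, 6, 26), 1, "csv")
    record = build_metadata(name, b"depth_cm,ph\n10,6.5\n", "Jane Doe",
                            "Example measurement file", ["Python 3"])
    print(json.dumps(record, indent=2))
```

The metadata record can be stored as a sidecar file next to the data file, and the checksum can later be recomputed to verify that a backup copy is intact. Destroying the key removes the simplest way of linking pseudonyms back to persons, but, as note [i] states, further measures may be necessary before data can be considered anonymous.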
II. Publication and Citation of Research Data
The rules of good scientific practice for publishing also apply to data and software publications. In particular, it is prohibited to withhold data that does not support the authors' hypotheses, to fragment data and software publications with the aim of increasing the number of publications, or to duplicate publications without disclosing the previous publication. The following points provide further guidance regarding the publication and citation of research data.
1. Place of publication. For the publication of research data, well-established discipline- or data-type-specific databases, repositories, and data centers should be preferred. Infrastructures should be used in which data are kept and can be referenced independently; research data should not exclusively be provided with the text publication as supplementary material.
2. Preparation and availability of data. Research data should be made accessible in a processing stage (raw or structured data) that enables meaningful reuse by third parties. It is recommended to strive for consistent adherence to the FAIR data principles (findable, accessible, interoperable, and reusable) when preparing data for publication and selecting the place of publication.[ii] The four principles articulate the following basic requirements:
a. Findability: The data is sufficiently described with relevant metadata and referenced by a unique persistent identifier (e.g., a DOI).
b. Accessibility: The data is readable by both humans and machines and is stored in a trusted repository.
c. Interoperability: Data and metadata use a formal, accessible, shared, and broadly applicable vocabulary for knowledge representation.
d. Reusability: The data is unambiguously licensed, contains correct provenance information, and is well documented.[iii]
Self-programmed research software should be made publicly available as source code. Source code of published software should be persistent, citable, and documented.[iv]
3. Authorship: A person who makes a genuine, accountable contribution to the content of an academic data or software publication is an author. Such a contribution is established especially through substantial research-based involvement in the creation, collection, acquisition, or provision of the data, software, or sources.[v]
4. Free licensing and open access. Research data and scientific software should be made available under established, standardized and, as far as possible, free and open licenses. Access conditions and, if applicable, embargo periods should adhere to the principle of "as open as possible, as closed as necessary". The following points provide specific guidance for licensing:
a. Licensing and attribution: The obligation to attribute academic achievements is a principle of good scientific practice. Licenses and waivers for data and software that do not contractually require the attribution of the authors do not supersede this obligation. The preferred way to promote data and software citation is not through restrictive licensing but by including a recommended citation.
b. Public research data: Creative Commons tools have become the most common way to license research data that is made publicly available. Data that is free of copyright and related rights should be labelled as such upon release using the "Creative Commons Zero" (CC0) waiver. For data that is protected by copyright or related rights, using the CC0 waiver is also recommended because it is the best way to ensure reusability. The license "Creative Commons Attribution" (CC BY) can also be considered, but its strict formal requirements regarding attribution and the provision of license information already limit subsequent use. Creative Commons licenses with terms that go beyond the "Attribution" element are not suitable for licensing research data ("NonCommercial", NC; "NoDerivatives", ND), or are suitable only in certain cases and only to a limited extent ("ShareAlike", SA).[vi]
c. Research data for "Scientific Use" only: Research data to which access has to be restricted should be licensed in consultation with the appropriate data center using the licenses recommended by them.
d. Software: Creative Commons licenses (such as CC BY) are not suitable for software. A CC0 waiver can be granted for completely self-written code (to which the programmers own all rights). This is especially useful for short scripts. Free software licenses should be used for more complex products. Examples of common free software licenses are the MIT license and the GNU GPLv3.[vii]
5. Registration of research output: Quality-assured data and software publications that are recognized by the respective scientific community and can be referenced independently should be reported to the University Library by the authors for inclusion in the University Bibliography.
6. Citation of data and software: If no subject-specific standards have been established yet for the citation of data and software and if no guidelines by the journal or the publisher are available, it is recommended to refer to the Data Citation Principles.[viii]
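As an illustration of a recommended citation (see 4.a and 6 above), the following minimal Python sketch assembles a citation string from a few metadata fields. The field names, the order of elements, and the example DOI are assumptions made for this example. The Data Citation Principles do not prescribe a particular format; the sketch only ensures that creators, a version, and a persistent identifier are included.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataPublication:
    """Hypothetical minimal record describing a data or software publication."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def recommended_citation(pub: DataPublication) -> str:
    """Assemble a citation string: creators, year, title, version, publisher, DOI."""
    if not pub.creators:
        raise ValueError("at least one creator is required for attribution")
    parts = [f"{'; '.join(pub.creators)} ({pub.year})", pub.title]
    if pub.version:
        # Naming the version makes clear which release of the data is cited.
        parts.append(f"Version {pub.version}")
    parts.append(pub.publisher)
    # The persistent identifier is given as a resolvable link.
    return ". ".join(parts) + f". https://doi.org/{pub.doi}"


if __name__ == "__main__":
    example = DataPublication(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2019,
        title="Example Survey Data",
        publisher="Example Repository",
        doi="10.1234/example",  # placeholder DOI, not a real record
        version="1.0",
    )
    print(recommended_citation(example))
```

A string of this kind can be placed in the repository record or in a README file accompanying the data, so that users find a ready-made citation regardless of the license chosen.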
III. Contracts and Cooperations
1. When negotiating grant agreements (in particular with private sponsors), cooperation agreements, and license agreements, researchers and research support staff should ensure that these agreements observe the principles of the Research Data Policy of the University of Potsdam as much as possible, in particular with regard to the openness and reusability of research data. When transferring rights for reuse, publication, or commercial utilization, care should be taken to ensure that the data remain freely available for scientific purposes, and in particular that no exclusive rights are granted to commercial actors.
2. Cross-institutional research cooperations should align their practices with the Research Data Policy of the University of Potsdam, unless other parties mandate equivalent or stricter requirements. Within the framework of their governance, institutionalised cooperations like clusters and networks should establish clear and binding regulations on joint data management as well as on the openness and usability of their research data at an early stage.
IV. Institutional Responsibility
1. All faculties are advised to consider whether qualification theses should contain statements on data availability in the future and, if so, to set up regulations that are binding to an appropriate degree.
2. The study commissions are advised to reassess curricula to ensure appropriate consideration of the practical handling of research data as comprehensive course content in undergraduate and consecutive Master's programs.
3. Departments and research groups are advised to appoint research data contacts to enable institutional archiving of the research data at the University of Potsdam.
Contact:
If you have any questions about the Research Data Policy, these recommendations, or about research data management (RDM), please contact the joint research data team of the University Library and the Computer and Media Center (ZIM) at forschungsdaten@uni-potsdam.de. Further information can be found on the website of the University of Potsdam at https://www.uni-potsdam.de/forschungsdaten.
[i] To this end, all directly identifying characteristics must be removed, or the pseudonymization key that allows linking the research data to personally identifying information must be destroyed. Further measures may be necessary. Data is only considered anonymous if "any particular information about personal or factual circumstances can no longer be assigned to an identified or identifiable person, or can only be assigned with a disproportionate expenditure of time, cost and labour" (§ 3 BbgDSG).
[ii] Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3 (2016). https://doi.org/10.1038/sdata.2016.18
[iii] League of European Research Libraries. "Implementing FAIR Data Principles". Factsheet (n.d.[2017]). https://libereurope.eu/wp-content/uploads/2017/12/LIBER-FAIR-Data.pdf
[iv] Version-control software and repositories commonly used in software development do not usually meet these requirements, as they neither guarantee continuity nor offer persistent identifiers (such as DOIs). The releases to be cited therefore should also be submitted to a suitable research data repository. GitHub offers an easy-to-use interface to Zenodo: https://guides.github.com/activities/citable-code/
[v] Criteria for whether a contribution is genuine, accountable, substantial, and research-based may vary by subject area. Within the framework of the discipline-specific publication culture, it is possible that the authors of a text publication differ from the authors of the publications of the data and software on which the text publication is based.
[vi] More information about Creative Commons is available at https://creativecommons.org/
[vii] Support in the selection of a suitable software license is provided on the website https://choosealicense.com
[viii] Data Citation Synthesis Group. "Joint Declaration of Data Citation Principles - FINAL". FORCE11 (2013). https://doi.org/10.25490/a97f-egyk