Research

We are dedicated to advancing the field of software engineering in the context of computational science, also known as research software engineering (RSE). Our work encompasses a wide range of exciting areas, including: software quality in the realm of science, FAIR (findable, accessible, interoperable, reusable) software, the role of software in Open Science, as well as computational workflows and their automated construction. Collaboration with researchers from other disciplines (including bioinformatics, geosciences, and more) is essential to our work.

Below we give an impression of some of our current research activities.

PD Dr. Henning Bordihn, an independent researcher affiliated part-time with our group, describes his line of research on his own webpage.

FAIR Research Software

FAIR (Findable, Accessible, Interoperable, and Reusable) is a set of guiding principles designed to enhance the usability of research data and related resources, including software. The FAIR principles were initially formulated with research data in mind, but have since been extended to encompass research software as well. By applying the FAIR principles to software, researchers can enhance the reproducibility, transparency, and impact of their work, fostering a culture of open science and collaboration.

Prof. Lamprecht was actively involved in the initial formulation of FAIR principles for research software and in the FAIR for Research Software (FAIR4RS) working group. Key outcomes from this work include the following:

  • A. Lamprecht, L. Garcia, M. Kuzak et al.: Towards FAIR Principles for Research Software. Data Science, vol. 3, no. 1, pp. 37-59, 2020. https://doi.org/10.3233/DS-190026 (open access)
  • N. Chue Hong, D. Katz, M. Barker, A. Lamprecht, et al.: FAIR Principles for Research Software (FAIR4RS Principles). RDA Recommendation, 2022. https://doi.org/10.15497/RDA00068 (open access)
  • M. Barker, N. Chue Hong, D. Katz, A. Lamprecht et al.: Introducing the FAIR Principles for Research Software. Scientific Data, vol. 9, 622, 2022. https://doi.org/10.1038/s41597-022-01710-x (open access)

More and more efforts are now underway to establish guidelines, standards, and tools that support the development, evaluation, and dissemination of FAIR software. Communities and organizations such as the Research Software Alliance (ReSA) and the FAIR4RS working group are actively working to promote the adoption of the FAIR principles in the realm of research software.

Research Software Repository Analysis

Software repository analysis examines code repositories to gain insights into various aspects of software development projects. It involves studying the codebase, commit history, branches, documentation, and other artifacts within a repository. The goal is to understand a project's development patterns, collaboration dynamics, code quality, and overall health, and thereby to identify trends, assess progress, detect potential issues or bottlenecks, and support informed decisions about project management, code improvements, and team collaboration strategies.

We perform software repository analysis on open-source research software repositories to understand better how research software is developed and maintained. We analyze the repositories for general characteristics (such as programming languages used, number of contributors and size of the code base), FAIRness aspects (such as availability of a license, documentation and citation information), as well as other quality parameters (such as test coverage, modularity and coding style).
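
As an illustration of what such an analysis can look like in practice, the following Python sketch checks a locally cloned repository for a few FAIRness indicators and basic repository statistics. It is a minimal example rather than the code we actually use; the file name lists and the reliance on the git command line are assumptions made for this sketch.

    import subprocess
    from pathlib import Path

    # Files commonly taken as indicators of FAIRness aspects in a repository
    # (the exact lists are assumptions made for this sketch).
    FAIRNESS_FILES = {
        "license": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
        "documentation": ["README.md", "README.rst", "README.txt"],
        "citation": ["CITATION.cff", "CITATION"],
    }

    def analyze_repository(repo_path):
        """Collect simple characteristics and FAIRness indicators for a cloned repository."""
        repo = Path(repo_path)
        report = {}

        # FAIRness indicators: does the repository ship the expected files?
        for aspect, candidates in FAIRNESS_FILES.items():
            report[aspect] = any((repo / name).exists() for name in candidates)

        # General characteristics from the commit history (requires git on the PATH).
        commits = subprocess.run(
            ["git", "-C", str(repo), "rev-list", "--count", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        authors = subprocess.run(
            ["git", "-C", str(repo), "shortlog", "-sn", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        report["commits"] = int(commits.stdout.strip())
        report["contributors"] = len(authors.stdout.strip().splitlines())
        return report

    if __name__ == "__main__":
        print(analyze_repository("."))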

In collaboration with research software engineers at Utrecht University (UU), the Netherlands, we have developed the SWORDS (Scan and revieW of Open Research Data and Software) pipeline (https://github.com/UtrechtUniversity/SWORDS-template) as a framework for gaining insights into the open-source activities of universities and research institutes. In his Master's thesis, Keven Quach used SWORDS to obtain insights into UU's research software landscape.

We are now extending SWORDS to cover further criteria and have started to analyze the research software landscape at Potsdam's universities and research institutes.
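
The first step of such a scan is harvesting repository metadata from the hosting platform. The sketch below lists the public repositories of a GitHub organisation via the GitHub REST API; the organisation name in the example is only illustrative, and SWORDS itself adds many further harvesting and review steps on top of this.

    import requests

    def list_org_repositories(org):
        """Fetch basic metadata for the public repositories of a GitHub organisation."""
        repos, page = [], 1
        while True:
            response = requests.get(
                f"https://api.github.com/orgs/{org}/repos",
                params={"per_page": 100, "page": page},
                headers={"Accept": "application/vnd.github+json"},
                timeout=30,
            )
            response.raise_for_status()
            batch = response.json()
            if not batch:
                break
            repos.extend(batch)
            page += 1
        return repos

    if __name__ == "__main__":
        # The organisation name below is only an illustrative example.
        for repo in list_org_repositories("UtrechtUniversity"):
            license_id = (repo.get("license") or {}).get("spdx_id", "none")
            print(f'{repo["name"]}: language={repo["language"]}, license={license_id}')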

Automated Composition of Scientific Workflows

Scientific workflows are systematic and structured processes that researchers use to organize, automate, and document their computational experiments or data analysis tasks. These tasks can be executed sequentially, in parallel, or in a distributed computing environment. Workflows can involve various software tools, libraries, and data sources, allowing researchers to integrate diverse resources and automate the execution of their experiments.
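
As a toy illustration of these ideas, independent of any particular workflow system, the following sketch wires two independent preprocessing steps and a final merge step into a small workflow: the independent steps run in parallel, and the merge step runs once both have finished. All step functions are placeholders invented for this example.

    from concurrent.futures import ThreadPoolExecutor

    # Placeholder steps standing in for real computational tools.
    def quality_control(data):
        return [x for x in data if x is not None]

    def normalize(data):
        values = [x for x in data if x is not None]
        total = sum(values) or 1
        return [x / total for x in values]

    def merge(filtered, normalized):
        return {"filtered": filtered, "normalized": normalized}

    def run_workflow(data):
        # The two preprocessing steps are independent of each other, so they can
        # run in parallel; the merge step depends on both and runs afterwards.
        with ThreadPoolExecutor() as pool:
            filtered = pool.submit(quality_control, data)
            normalized = pool.submit(normalize, data)
            return merge(filtered.result(), normalized.result())

    if __name__ == "__main__":
        print(run_workflow([3, None, 1, 2]))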

Composing purpose-specific workflows from the wealth of available resources can be a tedious and challenging endeavor. Approaches to automated workflow composition aim to simplify and streamline the process of creating complex workflows by reducing the manual effort required. Instead of manually designing and connecting each task in a workflow, automated composition techniques can analyze the requirements, available resources, and constraints to generate a suitable workflow automatically. FAIR resources, in particular FAIR computational tools with rich metadata that includes semantic descriptions of inputs, outputs and operations, facilitate the application of automated workflow composition techniques.
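
To make the underlying idea concrete, here is a heavily simplified sketch of automated workflow composition: tools are annotated with semantic input and output types, and a breadth-first search chains them from the type of the available input to the type of the requested output. The tool names and type labels are invented for illustration and are not taken from a real registry.

    from collections import deque

    # Toy tool annotations: each tool declares a semantic input and output type.
    # Names and types are invented for this example.
    TOOLS = {
        "peak_picker": {"input": "RawSpectra", "output": "PeakList"},
        "db_search": {"input": "PeakList", "output": "Identifications"},
        "fdr_filter": {"input": "Identifications", "output": "FilteredIdentifications"},
        "plotter": {"input": "FilteredIdentifications", "output": "Report"},
    }

    def compose(input_type, output_type):
        """Breadth-first search for a shortest tool chain from input_type to output_type.

        Returns a list of tool names, or None if no chain exists.
        """
        queue = deque([(input_type, [])])
        seen = {input_type}
        while queue:
            current_type, pipeline = queue.popleft()
            if current_type == output_type:
                return pipeline
            for name, spec in TOOLS.items():
                if spec["input"] == current_type and spec["output"] not in seen:
                    seen.add(spec["output"])
                    queue.append((spec["output"], pipeline + [name]))
        return None

    if __name__ == "__main__":
        # -> ['peak_picker', 'db_search', 'fdr_filter', 'plotter']
        print(compose("RawSpectra", "Report"))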

Based on a 2020 Lorentz Center workshop on automated workflow composition in the life sciences, we have compiled the following overview article that you might find useful as an introduction to the topic:

  • A. Lamprecht, M. Palmblad, J. Ison, V. Schwämmle, et al.: Perspectives on automated composition of workflows in the life sciences [version 1; peer review: 2 approved]. F1000Research, 2021. https://doi.org/10.12688/f1000research.54159.1 (open access)

Over the past years, we have developed the Automated Pipeline Explorer (APE, https://github.com/sanctuuary/APE), a command-line tool and Java API for the automated exploration of possible scientific workflows (pipelines) from large collections of computational tools. For further information, see the APE repository.

Notable applications of APE include:

  • V. Kasalica, V. Schwämmle, M. Palmblad, J. Ison, A. Lamprecht: APE in the Wild: Automated Exploration of Proteomics Workflows in the bio.tools Registry. Journal of Proteome Research, 2021. https://doi.org/10.1021/acs.jproteome.0c00983
  • J. Kruiger, V. Kasalica, R. Meerlo et al.: Loose programming of GIS workflows with geo-analytical concepts. Transactions in GIS, 2020. https://doi.org/10.1111/tgis.12692
  • S. Scheider, R. Meerlo, V. Kasalica, A. Lamprecht: Ontology of core concept data types for answering geo-analytical questions. Journal of Spatial Information Science, 2020. https://doi.org/10.5311/JOSIS.2020.20.555    

Workflomics

Creating optimal workflows for specific data analysis problems requires an interplay of exploring the latest relevant tool combinations and benchmarking selected workflow candidates on reference data to determine the best-performing ones. Due to a lack of adequate tooling, this is rarely done systematically, and many workflows therefore compromise on scientific quality.

To tackle this problem, we have started the Workflomics project (https://github.com/Workflomics) in collaboration with Magnus Palmblad (Leiden University Medical Center), Veit Schwämmle (University of Southern Denmark) and other colleagues from the ELIXIR network. The term "workflomics" combines "workflows" and "omics", following the pattern of disciplines such as genomics and metabolomics, and thus denotes the systematic, large-scale study of workflows. With support from the Netherlands eScience Center (NLeSC), we are developing a novel software system that realizes the workflomics idea. Its key contribution will be a new and unique integration of tools and metadata with technologies for automated workflow exploration and benchmarking. The system will provide a much-needed platform for systematic workflow generation and evaluation that complements, and can be interfaced with, existing state-of-the-art workflow systems.
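
The sketch below illustrates the benchmarking side of this idea in a deliberately simplified form: candidate workflows (represented here as plain Python functions) are run on shared reference data and ranked by a scoring function. The candidates, reference data, and metric are placeholders and not part of the actual Workflomics implementation.

    # Placeholder candidate workflows: each maps reference input data to predictions.
    def workflow_a(data):
        return [x * 2 for x in data]

    def workflow_b(data):
        return [x + 1 for x in data]

    def accuracy(predictions, expected):
        """Fraction of predictions that match the expected reference results."""
        hits = sum(p == e for p, e in zip(predictions, expected))
        return hits / len(expected)

    def benchmark(candidates, reference_input, reference_output):
        """Run every candidate workflow on the reference data and rank them by score."""
        scores = {
            name: accuracy(workflow(reference_input), reference_output)
            for name, workflow in candidates.items()
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    if __name__ == "__main__":
        ranking = benchmark(
            {"workflow_a": workflow_a, "workflow_b": workflow_b},
            reference_input=[1, 2, 3],
            reference_output=[2, 4, 6],
        )
        for name, score in ranking:
            print(f"{name}: {score:.2f}")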