A Fault Tolerant Workflow for Reproducible Research
Article 2014 en
Authors
Manuel Rodríguez-Pascual
A. J. Rubio Montero
Rafael Mayo-García
Abstract
In this work, the authors present a set of tools that addresses the problem of creating and executing distributed applications in dynamic environments in a resilient way, while also ensuring the reproducibility of the experiments performed. The objective is to provide a portable, unattended and fault-tolerant set of tools that hides infrastructure-dependent operations from application developers and users and makes it possible to perform experiments based on open access data repositories. In this way, users can seamlessly search for and later access datasets, which can be automatically retrieved as input data for a code already integrated in the proposed workflow. The search is based on metadata standards and relies on Persistent Identifiers (PIDs) to designate specific repositories. The applications profit from Distributed Toolbox, a newly created framework devoted to the creation and execution of distributed applications, which includes tools for unattended Cluster and Grid execution where total fault tolerance is provided. By decoupling the definition of remote tasks from their execution and control, the development, execution and maintenance of distributed applications are significantly simplified with respect to previous solutions, which increases their robustness and allows them to run on different computational platforms with little effort. The integration with open access databases and the use of PIDs for long-lasting references ensure that the data related to the experiments will persist, closing a complete research circle of data access, processing, storage and dissemination of results.
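The idea of decoupling the definition of a task from its execution and control can be illustrated with a minimal Python sketch. It is not the Distributed Toolbox API, which the abstract does not describe: the names Task, FlakyBackend and run_with_retries are hypothetical, and the "backend" is an in-process stand-in for a Cluster or Grid resource that simply fails a fixed number of times.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical task definition: what to compute, with no knowledge of where it runs."""
    name: str
    func: Callable[[int], int]
    arg: int


class FlakyBackend:
    """Hypothetical stand-in for a remote resource that fails its first `failures` submissions."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures
        self.attempts = 0

    def submit(self, task: Task) -> int:
        self.attempts += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError(f"backend lost task {task.name}")
        return task.func(task.arg)


def run_with_retries(task: Task, backends: List[FlakyBackend], max_attempts: int = 3) -> int:
    """Control layer: resubmit on failure, then fall back to the next backend."""
    last_error: Optional[Exception] = None
    for backend in backends:
        for _ in range(max_attempts):
            try:
                return backend.submit(task)
            except RuntimeError as err:
                last_error = err  # remember the failure and try again
    raise RuntimeError(f"task {task.name} failed on all backends") from last_error


def square(x: int) -> int:
    return x * x
```

Because the task only records a function and its argument, the same definition can be handed to any backend, and the retry and fallback policy lives entirely in the control layer. For example, run_with_retries(Task("square", square, 7), [FlakyBackend(failures=2)]) succeeds on the third submission.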