Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications
Nuria Losada (University of A Coruña, Spain)
María J. Martín (University of A Coruña, Spain)
Gabriel Rodríguez (University of A Coruña, Spain)
Patricia González (University of A Coruña, Spain)
Abstract: Despite the increasing popularity of shared-memory systems, there is a lack of tools that provide fault tolerance support for shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an application-level checkpointing tool focused on the insertion of fault tolerance into long-running MPI applications. This paper presents an extension to CPPC that enables the checkpointing of OpenMP applications. The proposed solution maintains the main characteristics of CPPC: portability and reduced checkpoint file size. The performance of the proposal is evaluated using the OpenMP NAS Parallel Benchmarks, showing that most of the applications incur small checkpoint overheads.
Keywords: OpenMP, checkpointing, fault tolerance, parallel programming
Categories: D.1.3, D.4.5