Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications
            
            
Nuria Losada (University of A Coruña, Spain)
María J. Martín (University of A Coruña, Spain)
Gabriel Rodríguez (University of A Coruña, Spain)
Patricia González (University of A Coruña, Spain)
              
             
                    
            
Abstract: Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an application-level checkpointing tool focused on the insertion of fault tolerance into long-running MPI applications. This paper presents an extension to CPPC that allows the checkpointing of OpenMP applications. The proposed solution maintains the main characteristics of CPPC: portability and reduced checkpoint file size. The performance of the proposal is evaluated using the OpenMP NAS Parallel Benchmarks, and the results show that most of the applications present small checkpoint overheads.
             
            
              Keywords: OpenMP, checkpointing, fault tolerance, parallel programming 
             
            Categories: D.1.3, D.4.5  