| 
          
            User-Oriented Approach to Data Quality Evaluation
            
            
               Anastasija Nikiforova (University of Latvia, Latvia)  
              
             
            
            
               Janis Bicevskis (University of Latvia, Latvia)  
              
             
            
            
               Zane Bicevska (University of Latvia, Latvia)  
              
             
            
            
               Ivo Oditis (University of Latvia, Latvia)  
              
             
                    
            
              Abstract: The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, thus making it executable and enabling data object scanning and the detection of data quality defects and anomalies. The proposed approach was applied to open data sets to analyse their quality. At least three advantages were highlighted: (1) a graphical data quality model allows data quality to be defined by users who are neither IT nor data quality experts, as the presented diagrams are easy to read, create and modify; (2) the data quality model allows "third-party" data to be analysed without deeper knowledge of how the data were accrued and processed; (3) the quality of the data can be described at least at two levels of abstraction: informally using natural language, or formally by including executable artefacts such as SQL statements.
             
            
              Keywords: data object, data quality, domain-specific language, executable model, platform-independent model, platform-specific model 
             
            Categories: E.0, H.0, H.1.0, H.2.0, I.6.5  
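
The two levels of abstraction named in the abstract (an informal natural-language requirement and a formal, executable SQL statement) can be illustrated with a minimal sketch. It is not the authors' implementation, which uses graphical DSLs; the `QualityRequirement` class, the `evaluate` and `demo` functions, the `company` table and all sample rows are hypothetical and chosen only for illustration. The sketch assumes that each requirement is expressed as an SQL query selecting the records that violate it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class QualityRequirement:
    """One data quality requirement at two levels of abstraction (hypothetical)."""
    description: str  # informal, natural-language level (PIM-like)
    sql: str          # executable level (PSM-like): selects violating rows


def evaluate(conn, requirements):
    """Scan the data object and return the violating rows per requirement."""
    return {r.description: conn.execute(r.sql).fetchall() for r in requirements}


def demo():
    # Hypothetical data object: a small table standing in for an open data set.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (reg_no TEXT, name TEXT, founded TEXT)")
    conn.executemany(
        "INSERT INTO company VALUES (?, ?, ?)",
        [
            ("40003000001", "Alpha", "2001-05-17"),
            (None, "Beta", "1999-01-01"),           # missing registration number
            ("40003000003", "Gamma", "17/05/2001"),  # date not in ISO format
        ],
    )
    requirements = [
        QualityRequirement(
            "Registration number must be present",
            "SELECT name FROM company WHERE reg_no IS NULL OR reg_no = ''",
        ),
        QualityRequirement(
            "Founding date must be formatted as YYYY-MM-DD",
            "SELECT name FROM company WHERE founded NOT GLOB "
            "'[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'",
        ),
    ]
    return evaluate(conn, requirements)


if __name__ == "__main__":
    for description, violations in demo().items():
        print(description, "->", violations)
```

Keeping the informal description next to the executable query mirrors the idea that the same requirement can be read by a non-expert and run against the data; a user with a different use case would supply a different data object and a different list of requirements.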
           |