Nitendra Gautam

Managing Data Quality in Relational Databases

Accurate,consistant and timely available data in a a Relational Databse are essential to the management of organization today. Organizations must strive to identify the data that are relevant to their decision making to develop business policies and practices that ensure the accuracy and completeness of the data and to facilate enterprise-wide data sharing.

Maintaining data quality helps a company to :

  • Minimize IT project risk Bad data or dirty data can cause delays and extra work on information system projects, especially those that involve reusing data from existing systems.

  • Make timely business decisions When managers do not have access to high-quality data when they lack confidence in data ,they can’t make a quick and informed business decisions.

  • Ensure regulatory compliance Quality data can help an organizationin justice, intelligence and anti-fraud activities.

  • Expand the customer base by knowing correct information about the customers

Characteristics of Quality Data

Quality data are that kind of data which are free of defects and can be used in operations,decision making and planning in an organization.

Below are some of the important characteristics of Quality data.

  • Uniqueness Uniqueness in terms of data quality means that each entity exists no more than once within the database and is identified by unique key.

  • Accuracy Accuracy has to do with the degree to which any datum correctly represents the real-life object it models.Data must be both accurate and precise enough for their intended use. Data can be valid (i.e., satisfy a specified domain or range of values) and not be accurate.

  • Consistancy Consistency means that values for data in one data set (database) are in agreement with the values for related data in another data set (database).

  • Completeness Completeness refers to data having assigned values if they need to have values.It also means that all data needed are present.

  • Timeliness Timeliness means meeting the expectation for the time between when data are expected and when they are readily available for use. Some data need to be time-stamped to indicate from when to when they apply, and missing from or to dates may indicate a data quality issue.

  • Currency Currency is the degree to which data are recent enough to be useful.For Example: Cutsomer phone number needs to be up-to-date in the database so that we can call customers in real time.

  • Conformance Conformance refers to whether data are stored, exchanged, or presented in a format that is as specified by their metadata. The metadata include both domain integrity rules (e.g., attribute values come from a valid set or range of values) and actual format (e.g., specific location of special characters, precise mixture of text, numbers, and special symbols).

  • Referential Integrity Data that refer to other data need to be unique and satisfy requirements to exist (i.e., satisfy any mandatory one or optional one cardinalities).

Reasons for deteriorate Data Quality

In addition to the defect correction for quality data,it also requires prevention and reporting. As data are frequently updated within an organization ,achieving quality data requires constant monitoring and measurement as well as improvement actions.

Below table shows some of the four important reasons which tells us why the quality of database has deteriorated in the past few years.

External data sourcesMuch of the organizational data originates outside, where there is less control over the data sources to comply with expectations of the receiving organization.
Redundant data storage and inconsistent metadataMany organizations have allowed the redundant, inconsistent, and incompatible data through uncontrolled proliferation of spreadsheets, desktop databases, legacy databases, data marts, data warehouses, and other repositories of data.
Data entry problemsUser interfaces that do not take advantage of integrity controls—such as automatically filling in data, providing drop-down selection boxes, and other improvements in data entry control— are tied for the number-one cause of poor data.
Lack of organizational commitmentNot recognizing poor data quality as an organizational issue because of budget and commitment issue

Improving the Data Quality

Implementing a successful quality improvement program will require the active commitment and participation of all members of an organization.

Some of the key steps which can help improve data quality in an organization are given below.

  • Get the Business Buy-In

An organization should view a data quality as an business imperatives rather than an IT project.Most of the organization needs to obtain appropriate level of executive sponsorship for a good business case related to data quality.It is also important to identify and define the key performance indicators and metrics that can quantify the results for improving data quality.

  • Conduct a Data Quality Audit

An organization without any established data quality project should begin with an audit of data to understand the extent and nature of data quality problems.Data quality audits can include many procedures such as statistically profiling the files and keeping track of all the records of data at table level.By a data audit we can identify the obscure and unexpected extreme values.

Statistical analysis helps to analyze patterns of data using their distribution, outliers and frequencies.Data can be checked against certain relevant business rules(counts,means and variance) and can sent an alert(email,reports) of those rules are broken.

  • Maintain a data stewardship program

Data steward/Data Governance must ensure that the data that are captured within the organization are accurate and consistent throughout the organization so that users within that organization can rely on that data.

  • Improve the Data Capture Process

Improving the data capture process is a fundamental step in data quality improvement program. Critical points of data entry relates to where data are:

  • originally captured
  • Pulled into a data integration process
  • Loaded into an integrated data store(eg Data warehouse)

  • Apply Modern Data Management Principles and Technology

There are different powerful softwares available that can assist users with technical aspects of data quality improvement. These software employs advanced techniques such as pattern matching,fuzzy logic, and other expert systems to analyze data for quality problem,identify and eliminate redundant data and integrate data from multiple sources.

  • Apply Total Quality Management(TQM) Principles and Practices

Data quality improvement is an continuous effort which should not be treated as one-time projects. With this in mind many leading organization apply Total Quality Managemt(TQM) to improve data quality. Even though TQM has many principles,some of of the principles are used to prevent defect(rather than correction), continuous improvement of the processes that touches the data, and use of enterprise data standard.


[1] Jeffrey A. Hoffer, Ramesh Venkataraman, and Heikki Topi. 2010. Modern Database Management (10th ed.). Prentice Hall Press, Upper Saddle River, NJ, USA.