In traditional business, data quality was quite simple: data was either right or wrong. However, with the continued growth and deployment of Big Data and technology, it is now a more complex matter. There are millions of websites transacting daily, which makes it incredibly tough to keep up with the competition. Unlike 20 or 30 years ago, we are no longer competing only with other High Street stores but with just about everybody who has a digital presence.
With vast amounts of data, quality has more dimensions than a simple right or wrong. Most data quality strategies consider 7 dimensions:
- Accuracy – does the data represent reality
- Completeness – are all fields completed
- Consistency – is data the same across platforms
- Conformity – are formats of said data the same e.g. dates are always dd/mm/yyyy
- Uniqueness – there are no duplicate entries
- Integrity – is the data valid, with relationships between records intact
- Timeliness – is data available when users need it
Each dimension should be considered when conducting a data quality analysis to ensure your needs are suitably met. Awareness of data as a critical asset for making informed decisions is increasing, driven by the successful deployments of AI and IoT technology in large enterprises like Facebook, Google and Amazon. It is important to have a strategy in place that can efficiently monitor the quality of business data.
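As a rough illustration of how two of these dimensions might be measured, the sketch below scores completeness and conformity on a handful of records. The sample records, the field names and the `REQUIRED_FIELDS` tuple are assumptions made up for this example, not part of any particular system.

```python
from datetime import datetime

# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "signup_date": "03/02/2021"},
    {"name": "Bob Roy", "email": "", "signup_date": "2021-02-03"},
    {"name": "Cy Dunn", "email": "cy@example.com", "signup_date": "31/12/2020"},
]

REQUIRED_FIELDS = ("name", "email", "signup_date")


def is_complete(record):
    """Completeness: every required field is present and non-blank."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def conforms_to_date_format(value, fmt="%d/%m/%Y"):
    """Conformity: the value parses as a day/month/year date.

    Note: strptime also accepts non-zero-padded days and months, so this is
    a lenient check rather than a strict dd/mm/yyyy pattern match.
    """
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


def share(flags):
    """Fraction of True values in a list of booleans."""
    return sum(flags) / len(flags) if flags else 0.0


completeness = share([is_complete(r) for r in records])
conformity = share([conforms_to_date_format(r["signup_date"]) for r in records])
print(f"completeness: {completeness:.0%}, conformity: {conformity:.0%}")
```

The same pattern, one small check per dimension reported as a share of records, could be extended to the other dimensions as rules for them are agreed.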
Defining Data Quality
A data quality analysis can only be successful if you know the goals. A goal could be ensuring that all customer records are unique, or that they are consistent across every system and platform in the business. Each goal should have an owner and a list of impacted processes. Following this, rules can be applied to ensure data quality, so that everybody is working towards the same objective.
Existing data should be analyzed against the rules that were set up when defining the goals for quality. This should look at all of the dimensions we spoke about at the beginning of this article to best gauge the current business position. For example, if the goal is to eliminate duplicate data from the database, it is important to see how many rows are duplicated based on several attributes, e.g. emails, phone numbers and postal addresses.
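The duplicate check described above can be sketched in a few lines. This is a minimal example under stated assumptions: the `customers` rows, the column names and the `normalize` helper are hypothetical, and a real database would more likely run the equivalent grouping in SQL or a data-profiling tool.

```python
from collections import defaultdict

# Hypothetical customer rows; the column names are assumptions.
customers = [
    {"id": 1, "email": "Ann@Example.com ", "phone": "555-0100", "postcode": "AB1 2CD"},
    {"id": 2, "email": "ann@example.com", "phone": "555 0100", "postcode": "ab12cd"},
    {"id": 3, "email": "bob@example.com", "phone": "555-0199", "postcode": "EF3 4GH"},
]


def normalize(value):
    """Lowercase and keep only letters, digits, '@' and '.', so trivial
    formatting differences (case, spaces, hyphens) do not hide duplicates."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum() or ch in "@.")


def find_duplicates(rows, attributes):
    """Group row ids by the normalized values of the chosen attributes."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(normalize(row[attr]) for attr in attributes)
        groups[key].append(row["id"])
    # Keep only the groups shared by more than one row.
    return [ids for ids in groups.values() if len(ids) > 1]


for attrs in (["email"], ["phone"], ["email", "phone", "postcode"]):
    dupes = find_duplicates(customers, attrs)
    duplicated_rows = sum(len(ids) for ids in dupes)
    print(attrs, "->", duplicated_rows, "duplicated rows in groups", dupes)
```

Checking several attribute combinations separately is a deliberate choice here: a match on email alone and a match on all three attributes give different levels of confidence that two rows describe the same customer.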
Within this stage, it is also important to look at data security and availability as well as standards already in place through governance strategies.
Once the data has been assessed, it will be possible to review the gap between the current position and the business goals. This may involve some root cause analysis of the data quality problems found. Common reasons for poor data quality are human error, systems, technology and a lack of processes, each of which should be reviewed.
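One simple way to picture the gap review is to compare a measured score per dimension with its target. The numbers and dimension names below are invented purely for illustration, and `quality_gaps` is a hypothetical helper rather than part of any standard tool.

```python
# Hypothetical targets and measured scores, as fractions between 0 and 1.
targets = {"uniqueness": 1.00, "completeness": 0.98, "conformity": 0.95}
measured = {"uniqueness": 0.91, "completeness": 0.99, "conformity": 0.80}


def quality_gaps(targets, measured):
    """Return (dimension, shortfall) pairs, largest shortfall first.

    Dimensions that already meet their target are omitted; a dimension with
    no measurement is treated as scoring 0.
    """
    gaps = {
        dim: round(goal - measured.get(dim, 0.0), 4)
        for dim, goal in targets.items()
    }
    shortfalls = [(dim, gap) for dim, gap in gaps.items() if gap > 0]
    return sorted(shortfalls, key=lambda item: item[1], reverse=True)


for dimension, gap in quality_gaps(targets, measured):
    print(f"{dimension}: {gap:.0%} below target")
```

Sorting by shortfall gives a starting order for root cause analysis, although business impact may justify a different priority.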
Data Quality Improvement
After establishing the goals, analyzing the data and identifying the root causes, an improvement plan for data quality can be developed. This should include detailed timeframes, the actions to be taken and the owners responsible for achieving the goals that have been set. In some cases, it may require budgeting, depending on the scale of the task at hand and the amount of data involved. Given that Big Data solutions can involve billions of records, the investment must be accounted for.
Data quality solutions can only be implemented once an improvement plan has been completed. The plan should be signed off by all stakeholders, and everybody in the business must be aware of their goals. Implementations might be technical, such as email verification on a website or de-duping at point of sale. The relevant teams need to be trained on these deployments, and in some cases customers should be made aware of the changes so that they have the right experience.
Control and Continuous Improvement
Data quality analysis should not happen only once. The process should be incremental and continuous to make sure the 7 core dimensions are always adhered to. In some industries, as much as 80% of data can be out of date every 12 months, meaning a cyclical strategy is imperative. It requires the entire organization to have a data-focused culture with direction from the top down. This in itself is a tough process and one that shouldn’t be taken lightly.
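A recurring timeliness check is one small piece of such a cycle. The sketch below flags records that have not been verified within the last 12 months; the `contacts` data and the `last_verified` field are assumptions for the example, and the 365-day threshold is only a default that each business would set for itself.

```python
from datetime import date, timedelta

# Hypothetical records; "last_verified" is an assumed field holding the date
# each record was last confirmed as correct.
contacts = [
    {"id": 1, "last_verified": date(2023, 1, 10)},
    {"id": 2, "last_verified": date(2024, 5, 2)},
    {"id": 3, "last_verified": date(2022, 11, 30)},
]


def stale_records(rows, today, max_age_days=365):
    """Timeliness: rows last verified more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [row for row in rows if row["last_verified"] < cutoff]


today = date(2024, 6, 1)  # fixed date so the example is repeatable
stale = stale_records(contacts, today)
stale_share = len(stale) / len(contacts)
print(f"{len(stale)} of {len(contacts)} records are stale ({stale_share:.0%})")
```

Run on a schedule, a check like this could feed the stale share back into the goals and improvement plan for the next cycle.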
For help with your data quality analysis and reporting, contact StrategicDB, a full-service data cleansing and analytics company.