What is Data Hygiene?
Data hygiene is the practice of ensuring that your data is complete, accurate, consistent and current. While it is impossible to have 100% perfectly clean data, there are steps you can take to ensure high data hygiene.
Steps for Good Data Quality:
1) Establishing a proper data governance strategy is step one in achieving good data hygiene. A data governance plan should include all stakeholders and users and make everyone accountable for data quality. It is difficult to make people care about their data unless they realize the impact it has on their jobs and their colleagues. Dirty data often results from a lack of understanding of why certain information needs to be entered.
2) Data quality reporting should be established to monitor data quality going forward. Metrics can include the % of records with incomplete data, the % of records that have not been modified within a certain time frame, and other KPIs that are important to your data quality strategy.
3) Audit data providers to ensure that high-quality data is being received. If you use appending services or third-party data providers, it is a good idea to verify their data and ensure that the records you receive are complete and accurate. It is not surprising to see bounce rates of 30% or more from some of the more reputable data providers in the marketplace.
4) Clean your historical data to bring it up to par once you have established your data governance and reporting and have audited your providers. Companies often clean their data only when big changes are happening, such as new management or a system migration, or when the data is so dirty that it is affecting day-to-day business, potentially including a decline in sales or an increase in costs.
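The two metrics named in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (email, phone, address, last_modified), the one-year staleness window and the sample records are all hypothetical, and a real report would read from your CRM rather than from a hard-coded list.

```python
from datetime import date, timedelta

# Assumption: these are the fields your governance plan treats as required.
REQUIRED_FIELDS = ("email", "phone", "address")

def quality_report(records, today, stale_after_days=365):
    """Return the percentage of incomplete and of stale records."""
    total = len(records)
    if total == 0:
        return {"pct_incomplete": 0.0, "pct_stale": 0.0}
    cutoff = today - timedelta(days=stale_after_days)
    # A record is incomplete if any required field is missing or blank.
    incomplete = sum(
        1 for r in records
        if any(not (r.get(f) or "").strip() for f in REQUIRED_FIELDS)
    )
    # A record is stale if it was last modified before the cutoff date.
    stale = sum(1 for r in records if r["last_modified"] < cutoff)
    return {
        "pct_incomplete": round(100 * incomplete / total, 1),
        "pct_stale": round(100 * stale / total, 1),
    }

# Made-up sample records for illustration only.
records = [
    {"email": "a@example.com", "phone": "555-0100", "address": "1 Main St",
     "last_modified": date(2024, 5, 1)},
    {"email": "", "phone": "555-0101", "address": "2 Main St",
     "last_modified": date(2021, 3, 15)},
    {"email": "c@example.com", "phone": None, "address": "3 Main St",
     "last_modified": date(2024, 1, 10)},
    {"email": "d@example.com", "phone": "555-0103", "address": "4 Main St",
     "last_modified": date(2019, 7, 4)},
]

report = quality_report(records, today=date(2024, 6, 1))
print(report)
```

Tracking these numbers on a schedule, rather than once, is what turns them into a monitoring tool: a rising share of incomplete or stale records signals that data entry practices or providers need attention.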
What is included in Data Cleansing?
The first step of data cleansing is to identify outdated, incomplete, inconsistent and duplicated records. Then it is time to clean the data by addressing each of the following:
- De-duping duplicate records – Prior to de-duping, it is important to define what counts as a true duplicate. B2C companies can de-dupe at the household or individual level; B2B companies can de-dupe at the national, international, per-address or even brand level. Regardless of the rules, it is important to ensure that you do not have duplicates in your system. You can use de-duping tools, such as StrategicDB’s Deduping Tool. You can also try to prevent duplicates from being created inside the system in the first place; however, in some instances you may want a duplicate, in which case those prevention rules will restrict your business.
- Normalizing or Standardizing Data – It is impossible to establish segments, run any analysis or have a strong sales operations team without consistent data. If you have 5 different variations of the same country, or 100 different variations of job titles or industries, it is impossible to select all of them or gain any insight from those fields. The solution is to normalize or standardize the fields that should be consistent or that can be grouped for better performance.
- Data Completeness – Incomplete data harms all users of the data. Before identifying which records are incomplete, it is important to establish which fields are mandatory. For example, you may decide that phone number, address and email must be filled in, while other fields are nice to have but not required. After you have identified the critical fields and the records that are missing that information, the final step is to decide whether to use third-party data to append the information or simply to flag those records as incomplete and choose a different path.
- Outdated Data – Depending on how long your data has been in your CRM, the majority of your records may be outdated. With people switching jobs every couple of years and moving every 7 or so years, your data may not hold the same value it did 5 or more years ago. There are a few ways to clean this data. The first is to use a data verification service to identify outdated records; this can be costly but is the more accurate approach. The second is simply to exclude records that have not been modified or had any activity in the last x years.
- False or Inaccurate Data – While you can identify bogus records and test accounts, identifying records, or worse, individual fields, that are not correct is almost impossible, especially on large data sets.
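As a rough illustration of the de-duping, standardizing and completeness steps above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the country alias table, the choice of email and phone as mandatory fields, and the rule that two records with the same email address are a true duplicate (an individual-level rule). Your own definitions may differ.

```python
# Assumption: a small hand-built table mapping country variants to one value.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}

# Assumption: these fields are mandatory for a record to count as complete.
MANDATORY_FIELDS = ("email", "phone")

def normalize_country(value):
    """Map known variants to a canonical name; leave unknown values as-is."""
    text = (value or "").strip()
    return COUNTRY_ALIASES.get(text.lower(), text)

def clean(records):
    """Standardize fields, flag incomplete records, drop duplicate emails."""
    kept = []
    seen_emails = set()
    for record in records:
        row = dict(record)  # copy so the input is not modified
        row["country"] = normalize_country(row.get("country"))
        row["email"] = (row.get("email") or "").strip().lower()
        row["incomplete"] = any(not row.get(f) for f in MANDATORY_FIELDS)
        email = row["email"]
        if email and email in seen_emails:
            continue  # duplicate under our assumed rule; keep the first one
        if email:
            seen_emails.add(email)
        kept.append(row)  # records with no email cannot be matched, so keep
    return kept

# Made-up sample records for illustration only.
records = [
    {"email": "Ann@Example.com ", "phone": "555-0100", "country": "USA"},
    {"email": "ann@example.com", "phone": "555-0100", "country": "United States"},
    {"email": "bob@example.com", "phone": "", "country": "uk"},
    {"email": "", "phone": "555-0102", "country": "Canada"},
]

cleaned = clean(records)
for row in cleaned:
    print(row)
```

Note that standardizing runs before the duplicate check. The first two sample records only match once the email has been trimmed and lowercased, which is why inconsistent data tends to hide duplicates.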