5 Sources of Dirty Data and How to Fix Them

The more data you collect, the greater the risk that it becomes dirty. StrategicDB, a data cleaning company, has put together five of the most common sources of dirty data and how to go about fixing them. While no database is 100% accurate, clean, and consistent, you can eliminate common mistakes to get closer to 100%.

Why does clean data matter?

Data is probably one of the biggest assets a company has. It is used for sales, marketing, reporting, legal, accounting, and product development, to name a few. In fact, it is probably easier to say that no department is untouched by data. Therefore, having reliable data is key to the success of any organization. Dirty data costs money too. Consider the simple case of a duplicate customer account: you double count it in your reporting, which makes your projections inaccurate; if you mail any marketing material, it costs you extra in printing and postage; it takes your customer service team longer to find that customer; and, worst of all, you may be annoying the customer by sending them the same email twice.

Inconsistent data has a high cost too. It may not seem like a big deal to have a state spelled in different ways, such as ‘NY’ and ‘New York’, but any segmentation or automation must now include both versions, so you are duplicating effort. It goes beyond reporting and segmentation: sales ops now have to identify records with both spellings to ensure that leads go to the NY rep. With a country or state you may be looking at two versions, but if you are not consistent in other fields, such as industry or title, this could mean thousands of versions of the same category. What is worse is when this data is user facing. For example, if you have multiple overlapping categories in your eCommerce store, prospective buyers are confused about where to find a product and have to check multiple products to find the one they are interested in; chances are they will leave your store. Finally, outdated data kept in your database over long periods of time costs you more in server fees, requires more clean-up because the data set is larger, and usually means extra processes are put in place just to deal with it.

Finally, reporting, which is so critical in moving the business forward, is stuck working with bad data. At best, this means extra processes and more time for the reporting team. At worst, dirty data makes analytics report the wrong numbers, which has implications beyond projections and opens the door to wrong insights and poor decision making.

Before fixing your dirty data, it is key to understand what caused it in the first place. Stopping the pattern that led there is the first step to cleaner data.

Sources of Bad Data:

  • Human input: The hardest source of dirty data to fix is human input. This could mean uploading data to the wrong fields, adding a new prospect without checking for a duplicate, or simply ignoring protocol and leaving data missing. It is hard to fix because most database administrators, sales ops, and marketing ops have no control over users. One way to prevent it is to put processes in place, along with training for anyone who has the power to add data to your CRM or marketing automation tool. Another is to limit access to a few people who will follow protocol. You can also implement automated processes, such as checking for duplicates before allowing a new contact to be added, or enforce stricter upload rules that reject any data that does not match the required formatting. The one thing to watch for is not to slow down the team’s efficiency in the name of clean data; a combination of the above solutions typically works best.
  • Inaccurate mapping to third-party data: Today’s systems give you access to third-party data, such as ZoomInfo via Salesforce, or HubSpot’s automated data appending of fields such as IP city or country. While these tools help add important data to your CRM, they can cause more problems if not implemented correctly. Consider the industry field: if you populate it from different sources, the values may differ. For example, one source may have ‘HR services’, another ‘Human Resources’, and a third ‘professional services’. Having a lookup table that brings all this data into a new classification field is usually the best solution. Restricting views of those automated fields, and ensuring they are not used in forms or when a user uploads new data, helps ensure consistency and data integrity. It is important to remember that third-party data is not always the most up-to-date, accurate, or normalized data available. Therefore, each third-party tool should be evaluated and assigned its own priority level in your data hierarchy.
  • Outdated data: Start-ups rarely have an issue with outdated data. However, mature businesses with more than 2-3 years’ worth of data often have a high percentage of outdated records, especially in B2B settings. Considering that people may change jobs as often as every 1-2 years, companies get acquired or close, and people move, it is a good idea to clean your database on an ongoing basis. One way to do so is to send a simple email every 6 to 12 months to check whether the address is still valid; this works well for B2B. In B2C, people generally do not change email addresses, but they do move, so it is important to have other indicators, such as last purchase date. One practice we have seen as data cleansing consultants is to add a field that marks outdated data. Based on indicators such as the last opened or clicked email, last purchase date, or last phone call, each record is flagged as active, potentially outdated, or outdated once a certain amount of time has passed. This helps reporting and other departments identify which part of your data is active.
  • Bad processes that duplicate or wrongly update data: Anyone who thinks that robots are better than humans with data can often be proven wrong, because automated processes are coded by programmers, and multiple processes may conflict. For example, you may have one process that creates an order; if that order is not checked against current customers, it can create a duplicate customer. Or an incoming customer email may add a new customer instead of being matched to the existing account. It is a good idea to audit your processes to ensure that they are efficient and do not create dirty data.
  • Non-standardized fields on forms: The easiest way to prevent bad data is to ensure your forms collect the right data in the right format. The more fields you ask for on a form, the lower the conversion rate; however, without the necessary fields you are limited in what you can do with the data. Therefore, it is recommended to identify the key fields that are mandatory. Next, check whether each field should be free-form text or a drop-down. For example, industry, company size, and state should be drop-downs. Keep in mind that some data points can be added with third-party tools. Even company name can be a drop-down, with an option to add a new one. It is a good idea to check every form that is currently enabled to ensure that the best data collection practices are being used.
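
Two of the fixes in the list above, the lookup table for industry values and the marker field for outdated records, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the function names, and the 12 and 24 month thresholds are hypothetical and would need to match your own CRM fields and business rules.

```python
from datetime import date

# Hypothetical lookup table: raw industry values from different sources
# mapped to one standard classification. Which raw values belong together
# is a business decision, not something the code can decide.
INDUSTRY_LOOKUP = {
    "hr services": "Human Resources",
    "human resources": "Human Resources",
    "professional services": "Professional Services",
}


def classify_industry(raw_value):
    """Return the standard classification, or None if the value is unmapped."""
    if raw_value is None:
        return None
    return INDUSTRY_LOOKUP.get(raw_value.strip().lower())


def latest_activity(*dates):
    """Most recent of the given dates (email click, purchase, call), ignoring None."""
    known = [d for d in dates if d is not None]
    return max(known) if known else None


def activity_status(last_activity, today, stale_after_days=365, outdated_after_days=730):
    """Flag a record as 'active', 'potentially outdated', or 'outdated'.

    The thresholds are assumed defaults; pick values that fit your sales cycle.
    """
    if last_activity is None:
        return "outdated"
    age_in_days = (today - last_activity).days
    if age_in_days >= outdated_after_days:
        return "outdated"
    if age_in_days >= stale_after_days:
        return "potentially outdated"
    return "active"
```

Unmapped industry values come back as None rather than being guessed, so they can be reviewed and added to the table by hand. The result of classify_industry would be written to a new classification field, leaving the raw third-party field untouched.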

Now that you have audited your data and stopped dirty data from coming back, it is time to clean your data. You can do so by hiring a data cleansing company or by doing your own normalization and standardization, de-duping, and elimination of bad data.
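
If you take the do-it-yourself route, a basic de-duping pass might look like the following Python sketch. It assumes each contact is a dictionary with an "email" key and that two records with the same email (ignoring case and surrounding spaces) are the same person. Both are simplifying assumptions, and the function names are hypothetical.

```python
def normalize_email(email):
    """Lowercase and trim an email so trivially different spellings match."""
    return email.strip().lower()


def dedupe_contacts(contacts):
    """Merge contacts that share a normalized email.

    The first record seen is kept; later duplicates only fill in fields
    that are empty on the kept record, so no existing value is overwritten.
    """
    merged = {}
    for contact in contacts:
        key = normalize_email(contact["email"])
        if key not in merged:
            # Copy the record so the caller's data is not modified.
            merged[key] = dict(contact, email=key)
            continue
        kept = merged[key]
        for field, value in contact.items():
            if field == "email":
                continue
            if not kept.get(field) and value:
                kept[field] = value
    return list(merged.values())
```

For example, given one record for "Ann@Example.com " with no phone number and a second for "ann@example.com" with one, the result is a single contact that keeps the first record's values and gains the phone number. Conflicting values, such as ‘NY’ versus ‘New York’, are not resolved here; that is the job of the standardization step.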

How do you find the right data cleaning company for your business?

Different data cleaning solutions exist in the marketplace. Some are automated within your CRM, while others, such as StrategicDB, are fully customized and come with a dedicated team of data cleaning specialists. One way to identify the right data cleaning firm for you is to ask the following questions:

  • Do we have any custom data that third party data will not provide?
  • Do we need to automate standardization of fields for the future?
  • Do we have duplicates in our system that we want to keep?
  • Do we have an aged database that we want to check for relevance?
  • Do we want a custom solution for identifying contacts who have left their company and may now be at a new one?

If you answered yes to one or more of these questions, then you need a data cleansing service provider, as opposed to an automated tool. Automated de-duping, for example, is great, but if you have duplicate company records for different departments, you may not want to lose that data. Reviewing many duplicates yourself can also be time-consuming and error-prone. Therefore, hiring a company such as StrategicDB can be the solution for your data integrity.
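
To illustrate why blind automated de-duping can lose data, here is a small Python sketch that separates duplicate company records that look safe to merge from those that need a human decision. The "company" and "department" field names and the function name are assumptions made for the example, not part of any particular CRM.

```python
from collections import defaultdict


def split_duplicates(records):
    """Group records by company name and sort the groups into two lists.

    Returns (safe_to_merge, needs_review): groups whose records all share
    one department, and groups that span several departments.
    """
    by_company = defaultdict(list)
    for record in records:
        by_company[record["company"].strip().lower()].append(record)

    safe_to_merge, needs_review = [], []
    for group in by_company.values():
        if len(group) < 2:
            continue  # not a duplicate at all
        departments = {(r.get("department") or "").strip().lower() for r in group}
        if len(departments) == 1:
            safe_to_merge.append(group)
        else:
            needs_review.append(group)
    return safe_to_merge, needs_review
```

Records for the same company but different departments end up in needs_review instead of being merged away, which is the case where manual or custom handling matters most.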