Data Cleaning Mistakes

Data cleaning is an important task that sales, marketing and IT teams should undertake on an ongoing basis, at minimum once a year. Think of it as spring cleaning for your data. Data gets dirty easily: through input errors, through records naturally going out of date, and through processes that create inconsistent or duplicate data. By cleansing your data on an ongoing basis, you can catch issues before they hurt your business.

How to tell if your data needs cleaning

Before cleansing, first check whether you actually have a reason to clean your data. Although data cleaning is recommended on a periodic basis, it is typically time to call your data cleansing company if you see one or more of the following:

  • Reporting cannot be trusted: You are adding filters to account for ‘issues with data quality’, you are exporting data and cleaning it before reporting on it, or you have simply stopped looking at reports because you are not sure whether your CRM or other data sources are usable. This is usually a sign to prioritize data cleansing.
  • Bounce Rate & Unsubscribe Rate are increasing: Marketers or sales teams will most likely spot a problem when the bounce rate rises. A rising bounce rate usually means email addresses are no longer valid, which can be due to an aging database or the addition of spam/bogus records. A rising unsubscribe rate signals a need to clean up your marketing automation system; while it is not tied to the data itself, it does call for a marketing automation audit and for cleaning or adjusting your workflows and processes. Either way, if your marketing performance is declining, it is often a sign that it has been too long since your last data cleanse.
  • Sales Inefficiencies: If your company tracks sales performance through status changes or other metrics such as outbound calls, you may notice those numbers decreasing or deals taking longer to close. While there may be several reasons for longer sales cycles or lower performance, it is a good idea to ask your sales team why. Chances are they will say things like "it takes longer to reach this person because the email is wrong" or "I do not have complete information to make a successful call, so I have to do my own research." Or the complaint marketers know best: "lead quality is bad." If you are hearing any of these complaints, it is time to prioritize data cleansing.
  • Customer Service complaints increased: By far the worst metric to see rise is customer service complaints. For eCommerce companies especially, increasing returns, phone calls and emails mean that something is going wrong. This can be caused by duplicates in the system, wrong shipping addresses on labels, product descriptions that do not match current products, or inventory mismanagement. At this point, you have no choice but to address the root cause, which means making sure your data is not the issue.
  • Team Complaints: Finally, if your team is raising data issues, it is probably a good idea to call your data cleansing agency for a data audit to understand which data issues exist.

Now that you have decided to cleanse your data, the next step is the cleaning itself. So what does data cleansing involve?

What is included in data cleansing?

Depending on your data and its issues, data cleansing will typically include one or more of the following:

  • De-duping: Duplicate records cause confusion, extra time spent de-duping manually, and bad reporting. Whether your duplicates are in product records, customer records, your prospect/lead database or elsewhere, ideally you should not have any. Keep in mind that in certain situations you may be creating duplicates on purpose; for example, selling into different teams within the same organization means you will have duplicate company names, and that is an exception.
  • Dealing with Bogus Records: Every company gets spam or bogus records. They may come from people filling out forms to sell you their services or products, from test or demo records created by your internal personnel, or from corrupted data. Either way, remove bogus records so they do not skew your reports and your team does not spend time updating or deleting them.
  • Standardizing Data: Consistent data is important for marketing segmentation, analysis and sales territory planning. It is also important for online retailers, who need standardized categories and attribute data for filters and search.
  • Improving data completeness: Incomplete data can mean incomplete reporting, longer sales cycles or missed opportunities. It is a good idea to audit what data is missing and find a way to fill it in. Ways to complete your data include adjusting your forms (but be careful not to harm conversion rates), using a third-party provider to fill in missing data, and changing internal processes so the necessary data gets collected.
  • Custom data cleaning: Depending on your data, you may have other needs, such as cleaning specific data fields, removing historical data or other customized data projects.
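As a rough illustration of the completeness audit and standardization steps above, the Python sketch below measures how complete each field is and maps job-title variants to one canonical value, routing anything it does not recognize to a manual review list. It assumes records are plain dictionaries; `TITLE_MAP`, `completeness_report` and `standardize_titles` are hypothetical names, and the mapping is a toy example you would build from your own data.

```python
from collections.abc import Iterable

# Hypothetical mapping from common title variants to one canonical value.
# In practice this list would come from your own data audit.
TITLE_MAP = {
    "vp sales": "VP of Sales",
    "vp of sales": "VP of Sales",
    "vice president sales": "VP of Sales",
    "cto": "Chief Technology Officer",
    "chief technology officer": "Chief Technology Officer",
}


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness_report(records: list[dict], fields: Iterable[str]) -> dict[str, float]:
    """Share of records (0.0 to 1.0) that have each field filled in."""
    total = len(records)
    return {
        field: (sum(not is_blank(r.get(field)) for r in records) / total if total else 0.0)
        for field in fields
    }


def standardize_titles(records: list[dict], field: str = "title") -> list[dict]:
    """Rewrite known title variants in place; return the rest for manual review."""
    needs_review = []
    for record in records:
        raw = record.get(field)
        if is_blank(raw):
            continue  # missing titles are a completeness issue, not a standardization one
        # Normalize case, commas, periods and spacing before the lookup.
        key = " ".join(raw.lower().replace(",", " ").replace(".", " ").split())
        if key in TITLE_MAP:
            record[field] = TITLE_MAP[key]
        else:
            needs_review.append(record)
    return needs_review


records = [
    {"email": "ana@example.com", "company": "Acme", "title": "VP, Sales"},
    {"email": "raj@example.org", "company": "Globex", "title": "cto"},
    {"email": "lee@example.net", "company": "", "title": "Head of Growth"},
]

report = completeness_report(records, ["email", "company", "title"])
print({f: round(share, 2) for f, share in report.items()})
# {'email': 1.0, 'company': 0.67, 'title': 1.0}

review = standardize_titles(records)
print([r["title"] for r in records])
# ['VP of Sales', 'Chief Technology Officer', 'Head of Growth']
print([r["email"] for r in review])
# ['lee@example.net']
```

The review list is where the manual share of the work comes in: the mapping handles the variants you already know about, and a person decides what to do with the rest (and can add new variants to the mapping).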

Mistakes to avoid when starting your data cleansing initiative

Your data cleansing company can make sure the proper steps are taken before and during your data cleansing to avoid these mistakes. However, if you are planning to do your own cleaning, here are a few pitfalls to avoid:

  • Not Doing a Data Audit: Before you begin, always plan and analyze the current state of your data. This way you can prioritize your data cleansing initiatives, identify gaps in your data and make sure nothing that needs cleaning is missed.
  • Not Backing-Up Data: Starting with a backup means that if you de-dupe the wrong record, delete a legitimate account or cause other issues, you will have a way to restore your data. Many of today’s CRMs let you take a quick snapshot of your system before making live production changes.
  • De-duping without a hierarchy: When de-duping, it is advisable to set criteria for which record becomes the master and which records are merged into it. The hierarchy can be based on data completeness, date created, or some other custom field or fields. Merging at random can mean data loss or the wrong data being kept.
  • Not validating the quality of third-party data: Before appending any third-party data, check how accurate that data is. There are many affordable options, but not all of them are high quality; some third-party providers keep records that are more than two years old, so you may be overwriting good data with bad. It is always a good idea to have a hierarchy of which data takes priority, and to test data providers before making production changes.
  • Standardizing only manually or only with AI: There are many options for data standardization: AI companies, Python or R packages, or hiring a poor intern to standardize your titles. The right solution is somewhere in between. Automation will get you about 90% of the way there, but it should be supervised, while manual work gets the remaining 10% done. Combining the two gives the best balance of data accuracy, cost and time.
  • Not finding the root cause of bad data: Cleaning data before identifying what made it dirty is a recipe for continuous data cleaning. Eliminating the underlying issue, whether it is a process, human error or something else, ensures you will not need to clean your data again as soon as you have finished.
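The de-duping hierarchy described above (master chosen by data completeness, then by date created) can be sketched in Python. This is a minimal illustration under stated assumptions: records are plain dictionaries, duplicates are matched on a normalized email address, and every record has a `created` date. The names `dedupe` and `pick_master` are hypothetical and not part of any CRM's API; real CRMs ship their own merge tools, and matching on email alone will miss duplicates that use different addresses.

```python
import copy
from datetime import date


def is_filled(value) -> bool:
    """A field counts as filled unless it is None or a blank string."""
    return value is not None and not (isinstance(value, str) and not value.strip())


def pick_master(group: list[dict]) -> dict:
    """Hierarchy: most complete record wins; ties go to the oldest 'created' date."""
    return min(group, key=lambda r: (-sum(is_filled(v) for v in r.values()), r["created"]))


def dedupe(records: list[dict], key: str = "email") -> list[dict]:
    """Merge records that share a normalized key, keeping the master's values.

    The input list is left untouched, so the merges can be reviewed before
    they are applied. This is not a substitute for a real backup.
    """
    groups: dict[str, list[dict]] = {}
    for record in records:
        groups.setdefault(record[key].strip().lower(), []).append(record)

    merged = []
    for group in groups.values():
        master = copy.deepcopy(pick_master(group))
        # Only fill the master's blank fields from the other duplicates;
        # never overwrite a value the master already has.
        for other in group:
            for field, value in other.items():
                if not is_filled(master.get(field)) and is_filled(value):
                    master[field] = value
        merged.append(master)
    return merged


records = [
    {"email": "Ana@Example.com", "phone": "", "title": "VP of Sales", "created": date(2021, 3, 1)},
    {"email": "ana@example.com", "phone": "555-0100", "title": "", "created": date(2019, 6, 15)},
    {"email": "raj@example.org", "phone": "555-0199", "title": "CTO", "created": date(2020, 1, 1)},
]

for row in dedupe(records):
    print(row["email"], row["phone"], row["title"], row["created"])
# ana@example.com 555-0100 VP of Sales 2019-06-15
# raj@example.org 555-0199 CTO 2020-01-01
```

The two Ana records are equally complete, so the older one becomes the master and only its empty title is filled from the newer duplicate. Swapping the sort key in `pick_master` is how you would encode a different hierarchy, such as a custom priority field.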

There are other mistakes that can be avoided by hiring a professional data cleaning firm. They can identify problems, propose solutions and get your data into shape without taking time away from your team.

How to hire a data cleaning company

There are many data cleaning companies. Some are offshore and come at a lower cost; others are technology companies that have automated every process. There are also customized data cleansing agencies, such as StrategicDB, that offer customized solutions for any data cleansing need.

About StrategicDB

StrategicDB is a data cleansing company specializing in customized data cleaning projects. Using the latest available technology together with human expertise, they can tailor any data cleansing initiative to a company's needs. Having worked with large publicly traded security firms and start-ups alike, they are able to handle any data in any format and tool.