data cleaning

However you use your system, there is a chance that your data will need to be cleansed. Clean data is up to date, free of duplicates, correct, and consistent. In other words, your data should be usable for everyday tasks and reporting.

How does dirty data get into

Before cleaning your data, it is advisable to look at what led to dirty data in the first place. Typically, data becomes outdated over time, and how quickly depends on the kind of data you have. Sales data, for example, probably becomes outdated after three years or so, as people change jobs more often these days. Inconsistent data is created by your team during input if fields are not standardized, or during list uploads. Bug tracking, for example, can accumulate old, irrelevant, or duplicated data due to input errors. However you use your system, the first step before cleaning is a data audit.

Data Audit for

The data audit takes a snapshot of your data and asks questions such as:

  • When was your data created?
  • How is data inputted today?
  • What data is inconsistent?
  • What data are you using for reporting?
  • What data gaps do you currently have in your system?
  • What are your current reporting limitations?

A data cleansing company will be able to provide data on the number of duplicates, which fields are incomplete, and which fields should be consistent. Once the data audit is complete, you can start putting together a plan for what needs to be cleansed and how to go about it.
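As a rough illustration of the numbers such an audit can produce, the sketch below counts duplicate emails and incomplete fields in a small list of records. The field names (email, company, industry) and the sample rows are hypothetical; a real audit would run against an export of your own data.

```python
from collections import Counter

# Hypothetical sample export; field names are assumptions for illustration.
records = [
    {"email": "ann@acme.com", "company": "Acme", "industry": "Retail"},
    {"email": "ANN@acme.com ", "company": "Acme Inc", "industry": ""},
    {"email": "bob@globex.com", "company": "", "industry": "retail"},
    {"email": "", "company": "Initech", "industry": None},
]

def audit(rows, key_field, fields):
    """Return the duplicate-key count and per-field missing-value counts."""
    # Normalize the key so case and stray spaces do not hide duplicates.
    keys = [(r.get(key_field) or "").strip().lower() for r in rows]
    counts = Counter(k for k in keys if k)
    # Every record beyond the first with the same key counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    missing = {
        f: sum(1 for r in rows if not (r.get(f) or "").strip())
        for f in fields
    }
    return {"duplicates": duplicates, "missing": missing}

report = audit(records, "email", ["email", "company", "industry"])
print(report)
```

Matching on a normalized email is only one possible duplicate rule; the same function can be pointed at another key field.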

Data Cleaning Plan for

Once you have identified your data problems, you need to identify solutions. Your data cleansing plan will include not only ways to clean up historical data, but also ways to keep the problem from happening again. For example, if your industry field is a free-text field rather than a drop-down, then any data entered into your Sales CRM will continue to be inconsistent. If you are collecting lead data, make sure your forms are consistent and have the right number of fields: the more data you ask for, the lower your form conversion rate, so be strategic about what you ask and at what stage of your funnel. For work management, your plan may include identifying ways to label tasks, or the problem may have been that too many people created redundant tasks or that statuses were not labeled correctly.

A data cleansing agency will provide you with a clear plan of the action(s) needed to stop the problem from coming back, and a project plan for tackling historical data. Typically, the plan will consist of both process changes for your team and system changes, such as changes to forms or field types.

Cleaning Your

Once you have addressed the root cause of dirty data, it is time to clean historical data by:

De-duping: De-duping companies or leads in your Sales CRM has to start by identifying all the duplicates. To identify companies, you may use company name, address, phone number, website, or email domain (assuming they are not using a personal email address). For leads, you may use email, phone number, and first/last name along with company name. Keep in mind that you may identify the same person at a different company; best practice there is to label one of the records as no longer there and list the new company, so that the sales team has the history of the lead. For tasks and projects, de-duping is a bit trickier, as the same item can be labeled differently. Ideally, you would use machine learning techniques on the project name or task name to identify similar tasks assigned to the same person or team. For feature requests and other dev modules, you would also use AI to identify any duplicates.
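To make the task-matching idea concrete, here is a minimal sketch that uses plain string similarity from Python's standard library (difflib) as a simple stand-in for the machine learning techniques mentioned above. It is not a machine learning model. The field names, the sample tasks, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical tasks; "owner" and "name" are assumed field names.
tasks = [
    {"id": 1, "owner": "dana", "name": "Fix login page bug"},
    {"id": 2, "owner": "dana", "name": "Fix login-page bug "},
    {"id": 3, "owner": "dana", "name": "Update pricing table"},
    {"id": 4, "owner": "eli", "name": "Fix login page bug"},
]

def normalize(name):
    """Lowercase, replace hyphens with spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())

def likely_duplicates(rows, threshold=0.8):
    """Return id pairs whose names are similar and that share an owner."""
    pairs = []
    for a, b in combinations(rows, 2):
        if a["owner"] != b["owner"]:
            continue  # only compare tasks assigned to the same person
        ratio = SequenceMatcher(
            None, normalize(a["name"]), normalize(b["name"])
        ).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs

candidates = likely_duplicates(tasks)
print(candidates)
```

The pairs returned are candidates only. Comparing every pair is fine for a small list but grows quadratically, so larger datasets would need grouping by owner or team first.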

Once duplicates are identified, a master record has to be assigned. This can be done by rules, for example, keeping the record with the most complete data or the latest record. However, this part should be reviewed by the team to ensure that it is in fact a true duplicate (some companies have deliberate duplicates) and that the right master record is kept.

Once the master record is identified, you need to merge the records or update the surviving record with data from the record to be deleted. Your data de-duping company can de-dupe your records for you.
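The rule-based selection and merge described above can be sketched as follows. The field names and sample records are hypothetical, and combining the two rules (most complete record first, latest record as the tie-breaker) is an assumption made for this example rather than a prescribed method.

```python
# Hypothetical duplicate group; field names are assumptions.
group = [
    {"id": 10, "name": "Acme", "phone": "", "website": "acme.com",
     "updated": "2021-03-01"},
    {"id": 11, "name": "Acme Inc", "phone": "555-0100", "website": "",
     "updated": "2023-06-15"},
    {"id": 12, "name": "Acme", "phone": "", "website": "",
     "updated": "2022-01-10"},
]
DATA_FIELDS = ["name", "phone", "website"]

def completeness(record):
    """Count how many data fields hold a non-empty value."""
    return sum(1 for f in DATA_FIELDS if record.get(f))

def pick_master(records):
    """Most complete record wins; ties go to the latest update date."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(records, key=lambda r: (completeness(r), r["updated"]))

def merge(records):
    """Copy the master and fill its empty fields from the other records."""
    master = dict(pick_master(records))
    for other in records:
        for f in DATA_FIELDS:
            if not master.get(f) and other.get(f):
                master[f] = other[f]
    return master

merged = merge(group)
print(merged)
```

Here record 11 survives and picks up the website from record 10. The result is best treated as a proposal for the team to review before any record is deleted.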

Standardize Data: The best example of unstandardized data is the Country field. If it was not a list, you may have United States entered into your Sales CRM as ‘US’, ‘USA’, ‘United States’, or ‘United States of America’. While this is a basic example, the same inconsistency is commonly found in other fields such as industry, title, status, tags, numeric values that can be expressed as ranges (such as employee size), and other custom fields.

Inconsistent fields should be identified and converted into drop-downs, both on forms and as the field type. Once that’s done, you can update your records by placing the right value into the new consistent field. This process can be easily done by your data cleaning company.
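A minimal sketch of the country example above: free-text variants are mapped to one canonical value, and anything unrecognized is left alone and flagged for review. The alias table and function name are hypothetical, and a real mapping would cover every value found in your data.

```python
# Hypothetical mapping from free-text variants to one canonical value.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}

def standardize_country(raw):
    """Return (value, matched) for a free-text country entry."""
    # Drop periods, lowercase, and collapse whitespace before the lookup.
    key = " ".join((raw or "").replace(".", "").lower().split())
    if key in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[key], True
    # Unknown values are returned unchanged so a person can review them.
    return raw, False

values = ["US", "usa ", "U.S.A.", "United States of America", "Untied States"]
results = [standardize_country(v) for v in values]
for original, (clean, matched) in zip(values, results):
    print(repr(original), "->", clean, "" if matched else "(needs review)")
```

Returning unmatched values untouched, rather than guessing, keeps typos such as "Untied States" visible instead of silently mapping them.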

Historical Data Cleansing: Think of spring cleaning your house, but for your data. Old, outdated information may not be useful anymore. Unless you need it for reporting purposes, chances are it’s just clutter. As with clutter in your home, some of it may be needed and should be kept for reporting, legal liabilities, and so on. However, just like clutter, you may want to store it in a way that does not interfere with your day-to-day work. The best way to do so is to use Status or another field to indicate old information. This can help with your dev platform, as you would not need to worry about bugs that have been in the system for 2+ years and that no one is planning to address or that were resolved a while ago. It will also keep your sales team from thinking they have more leads than they actually do.

A data cleaning agency can help identify outdated data and either delete it if it holds no value to the business or mark it appropriately so that it does not become clutter.
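Marking old records instead of deleting them could look like the sketch below. The field names, the "Archived" status value, and the 730-day cutoff (roughly the two years mentioned above) are assumptions for illustration.

```python
from datetime import date

# Hypothetical bug records; "status" and "last_updated" are assumed names.
bugs = [
    {"id": 1, "status": "Open", "last_updated": date(2020, 5, 1)},
    {"id": 2, "status": "Open", "last_updated": date(2024, 2, 10)},
    {"id": 3, "status": "Resolved", "last_updated": date(2019, 8, 20)},
]

def flag_stale(rows, today, max_age_days=730, stale_status="Archived"):
    """Return copies of rows, marking those untouched for too long."""
    flagged = []
    for row in rows:
        age = (today - row["last_updated"]).days
        if age > max_age_days:
            # Keep the previous status so nothing is lost for reporting.
            row = {**row, "previous_status": row["status"],
                   "status": stale_status}
        flagged.append(row)
    return flagged

result = flag_stale(bugs, today=date(2024, 6, 1))
print([(r["id"], r["status"]) for r in result])
```

Because the original status is preserved in a separate field, the records remain available for reporting or legal purposes while staying out of day-to-day views.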

Customized Data Cleansing: Finally, you may have data problems that fall outside the normal set of data cleaning tools. For example, you may need to merge two different databases into one. Or your data may be in a different format and need to be updated prior to uploading. Or you may have partial sales data that needs to be updated for segmentation purposes. Or perhaps you are migrating from Salesforce and want to make sure you have complete data that is cleansed.

Regardless of your custom data needs, a data cleaning firm can help not only clean that data but also offer strategic solutions for future data management, including data quality reporting.

Data Quality Reporting: Once your data is cleansed, your next step is to educate your team on new processes, if that was not done prior to the cleaning. The other step, often overlooked even by top data cleaning companies, is monitoring data quality after the clean-up. It is easy to set up reporting in your system or your BI tool to monitor for things like:

  • Number of leads/tasks/projects created by month: this will identify anomalies such as list uploads, which can introduce dirty data if the list is not standardized prior to upload or, worse, is uploaded in the wrong format.

  • Data Completeness by Key Fields: monitoring the percentage of your data that is complete can help identify fields whose completeness is dropping, so you can address the issue before it grows.

  • Number of variations per standardized field: can help identify whether new data is being created outside the proper standardized process. This is typically the case when data is uploaded or new forms are created.

  • Statuses: Introducing a bad data status can help identify automated sources that produce poor-quality leads or have data quality issues.

  • Other Customized Widget: Since companies implement their systems differently and for different purposes, it is important to have a metric measuring any special cases that can contribute to dirty data. Catching them before they become a bigger issue can save you time and money down the line.
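Three of the metrics listed above (records created by month, completeness of a key field, and number of variations in a standardized field) can be computed along these lines. This is a sketch under assumed field names and sample data, not a description of any particular reporting tool.

```python
from collections import Counter

# Hypothetical lead records; field names are assumptions.
leads = [
    {"created": "2024-01-05", "country": "United States", "industry": "Retail"},
    {"created": "2024-01-20", "country": "US", "industry": "Retail"},
    {"created": "2024-02-03", "country": "United States", "industry": ""},
    {"created": "2024-02-04", "country": "USA", "industry": "Finance"},
    {"created": "2024-02-04", "country": "Canada", "industry": None},
]

def created_by_month(rows):
    """Count records per YYYY-MM so upload spikes become visible."""
    return dict(Counter(r["created"][:7] for r in rows))

def completeness_pct(rows, field):
    """Percentage of records with a non-empty value in the field."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field))
    return round(100 * filled / len(rows), 1)

def distinct_values(rows, field):
    """Number of distinct non-empty values in a field."""
    return len({r[field] for r in rows if r.get(field)})

print(created_by_month(leads))
print(completeness_pct(leads, "industry"))
print(distinct_values(leads, "country"))
```

A rising count of distinct values in a field that is supposed to be a drop-down is a sign that uploads or new forms are bypassing the standardized process.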

Hiring a Data Cleansing Company for Your Clean-Up

Once you identify that you have an issue, that is, your reports are no longer predictive, your sales team is complaining, your projects start being delayed, and your teams stop using the system, it is time to hire a data cleansing agency. Prior to hiring an agency, it is a good idea to identify what problems you wish to solve, not from a data perspective but for your company. What are you trying to achieve and cannot? What are the biggest pain points? What would you have in a perfect world? Once you have answered those questions, you can start the process of bringing a data cleansing specialist into your data world.

Some questions you may wish to ask a prospective company include:

  • Do you offer a data audit?
  • What services are available post clean-up?
  • Do you offer data governance solutions?
  • Do you offer data reporting?

About StrategicDB

StrategicDB is a data cleansing company that was established in 2014 to help businesses overcome the challenges caused by data. We are based in Toronto, Canada, and serve clients around the world, while the team stays in Canada. Our philosophy is that dirty data starts somewhere, and until you stop the leak there is no point in cleaning the pipe. We start by asking strategic questions, followed by a detailed data audit. We provide long-term solutions prior to starting your data cleaning, then proceed to clean your data and put together tools to identify issues before they become problems.

We work with Fortune 500 companies, start-ups, and small companies to help bring your data to a useful state. No project is too small or too complex. Contact us to get a quote today.