Part of data cleaning is de-duping. But before you can begin de-duping, you need to define what a duplicate is. Defining a duplicate is the most important part of any de-duping process, which is why it is usually step one. Here is what can happen if you do NOT define what makes a record a duplicate:

  • Records that are mandatory will be deleted in the de-duping process. Sometimes duplicates are necessary for legal or operational reasons, so deleting them is not a good idea.
  • Duplicates will remain in your CRM. If you de-dupe based on only one or two fields, you may still have duplicates left afterwards. This creates inefficiency and requires another round of de-duping down the line.

Since you want the de-duping process to remove as many duplicates as possible while keeping the records the business needs to continue to run, it is important to establish a duplicate definition.
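
To make this concrete, here is a minimal Python sketch of how the choice of matching fields changes which records count as duplicates. The sample records, the field names (name, address) and the find_duplicates helper are all hypothetical, made up for illustration; they are not taken from any particular CRM.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Acme Inc", "address": "1 Main St"},
    {"id": 2, "name": "Acme Inc", "address": "1 Main St"},
    {"id": 3, "name": "Acme Inc", "address": "9 Oak Ave"},
    {"id": 4, "name": "Globex", "address": "5 Elm Rd"},
]


def find_duplicates(rows, key_fields):
    """Group rows by the chosen fields; return the id groups with more than one record."""
    groups = defaultdict(list)
    for row in rows:
        # Normalize case and surrounding whitespace so trivial differences still match.
        key = tuple(row[field].strip().lower() for field in key_fields)
        groups[key].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


by_name = find_duplicates(records, ["name"])
by_name_and_address = find_duplicates(records, ["name", "address"])

print(by_name)              # [[1, 2, 3]]
print(by_name_and_address)  # [[1, 2]]
```

Record 3 is a duplicate under one definition and a distinct record under the other. Whether it should be merged, deleted or kept is a business decision, which is why the definition has to be agreed on before any records are touched.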

So how do you define a duplicate?

Since each system, department and business has its own set-up, there is no one clear rule. However, here are the steps to take to help clarify your duplicate definition:

  1. Invite a person from each department that touches the database or CRM that you plan on de-duping. Also invite the database administrators of tools that connect to your CRM. Remember, de-duping one system can have an impact on another. For example, if you are de-duping in Salesforce, it may impact processes in your marketing automation tool, such as Marketo. Therefore, it is important that all stakeholders of the data are considered.
  2. Ask each department how it defines a duplicate. For example, sales may consider two company records duplicates as long as the name is the same, while finance may consider them duplicates only if the address is the same.
  3. Ask the reason behind each definition. You may realize that finance needs duplicates because it treats each product ordered as a unique record, regardless of how many times that company has placed the order.
  4. Once you have all the reasons and definitions, start identifying commonalities and potential conflicts between them. In the example above, you may decide to keep all duplicates if the company type is customer, while de-duping all prospect companies based on company name regardless of address.
  5. Once you have mapped out all the rules, communicate them to anyone who touches the data and ask whether they see any red flags.
  6. When you finalize the definition of what makes a record a duplicate, make sure you communicate it to all stakeholders so they are aware of it going forward. You are now ready to begin your de-duping process.
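
Once the rules are agreed on, it can help to write them down in a form that is unambiguous and testable. The sketch below encodes the example rule from step 4 (keep every customer record, match prospects on company name regardless of address). It is only an illustration under assumptions: the type, name and address fields and the duplicate_key and group_duplicates helpers are hypothetical, and a real definition would likely need more fields and fuzzier matching.

```python
from collections import defaultdict


def duplicate_key(record):
    """Return the matching key for a record, or None if it must never be merged."""
    if record["type"] == "customer":
        # Example rule: finance needs every customer record, so never match them.
        return None
    if record["type"] == "prospect":
        # Example rule: prospects match on company name, regardless of address.
        return record["name"].strip().lower()
    # Assumption: record types not covered by the agreed rules are left alone.
    return None


def group_duplicates(records):
    """Return lists of record ids that count as duplicates under the rules above."""
    groups = defaultdict(list)
    for record in records:
        key = duplicate_key(record)
        if key is not None:
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data for illustration.
crm = [
    {"id": 1, "name": "Acme Inc", "address": "1 Main St", "type": "customer"},
    {"id": 2, "name": "Acme Inc", "address": "1 Main St", "type": "customer"},
    {"id": 3, "name": "Globex", "address": "5 Elm Rd", "type": "prospect"},
    {"id": 4, "name": "globex ", "address": "7 Pine Ct", "type": "prospect"},
]

print(group_duplicates(crm))  # [[3, 4]]
```

The two identical customer records are left untouched, while the two prospect records are flagged even though their addresses differ. Keeping the rules in one small function like this also gives stakeholders something specific to review for red flags.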

Next Steps: 

Now that you have a duplicate definition, you are ready to start the de-duping process by actually identifying duplicates. If you need help identifying duplicates in your CRM or database, feel free to reach out to StrategicDB at 877-332-4923.