Data Deduplication Tool

Clean data is a fundamental part of any database, dataset or CRM. Regardless of what percentage of your data is duplicated, it is important to de-dupe. Our tool allows you to identify duplicates based on your own rules. The tool does not stop there: it also identifies which record should be selected as the surviving (“Master”) record based on your own business rules. We provide you with the confidence level of each match as well as the data completeness of each record.

We let you preview our capabilities by showing you the first 50 rows of duplicates prior to you completing your purchase. Try it today!

Steps to Identify Duplicates and Establish a Surviving Record

Step 1: Normalize

The Data Deduplication Tool can automatically normalize fields to increase the chances of finding duplicate matches. The following fields can be normalized:

  1. Website – only the domain name will remain. For example: https://www.strategicdb.com, www.strategicdb.com, strategicdb.com and strategicdb.com/deduplication will all become strategicdb.com.
  2. Company Name – the name is normalized based on the business structure. For example: StrategicDB Corporation and StrategicDB Corp. will both be normalized to StrategicDB Corporation.
  3. Address – US states and Canadian provinces, as well as country names, can be normalized to help identify and select the master record. For example: United States of America, USA, United States and US will all become United States.

This is the first step in your data de-duplication process.
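To make the website and country examples above concrete, here is a minimal Python sketch of that kind of normalization. It is not the tool's actual implementation: the function names `normalize_website` and `normalize_country` and the alias table are hypothetical, and a real normalizer would need to handle many more cases.

```python
import re

def normalize_website(url: str) -> str:
    """Reduce a website value to its bare domain name (illustrative only)."""
    value = url.strip().lower()
    # Drop a leading scheme such as "https://" if one is present.
    value = re.sub(r"^[a-z][a-z0-9+.\-]*://", "", value)
    # Drop any path, query string, fragment or port after the host.
    value = re.split(r"[/?#:]", value, maxsplit=1)[0]
    # Drop a leading "www." prefix.
    if value.startswith("www."):
        value = value[len("www."):]
    return value

# Hypothetical alias table; only the variants named in the text are listed.
COUNTRY_ALIASES = {
    "united states of america": "United States",
    "usa": "United States",
    "united states": "United States",
    "us": "United States",
}

def normalize_country(name: str) -> str:
    """Map known country spellings to one canonical form; leave others as-is."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)

if __name__ == "__main__":
    for raw in ["https://www.strategicdb.com ", "www.strategicdb.com",
                "strategicdb.com", "strategicdb.com/deduplication"]:
        print(normalize_website(raw))
    print(normalize_country("USA"))
```

All four website variants from the example reduce to the same value, so records that carry them can later be matched on the normalized field.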

Step 2: Select Rules to Identify Duplicates

Input your rules to identify duplicates. You can combine multiple fields to increase the accuracy of duplicate matching.

Common fields used to identify duplicates at the account level include:

  • Website Normalized & Country Name
  • Company Name Normalized & Country Name
  • Address
  • Phone #
  • DUNS # or Data.com ID

Common fields used to identify duplicates at the contact or lead level include:

  • E-Mail
  • First and Last Name
  • Last Name & Company Name Normalized
  • Phone # (If it’s personal)

Regardless of the hierarchy of selection, the de-duping tool will identify all possible duplicates. For example, duplicate group 123 could contain three records: records 1 and 2 match on website and country, while records 1 and 3 match on phone #. All three will be combined into the same duplicate group.
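The grouping behaviour in this example, where records linked through different rules end up in one duplicate group, can be sketched with a union-find (disjoint-set) structure. This is only an illustration under stated assumptions: the function `find_duplicate_groups`, the field names and the sample records are hypothetical, and blank values are assumed never to match.

```python
from collections import defaultdict

def find_duplicate_groups(records, rules):
    """Group records that match exactly on any rule, merging chains of matches.

    records: list of dicts; rules: list of tuples of field names.
    Returns lists of record indexes, one list per duplicate group.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for rule in rules:
        first_seen = {}  # key values -> index of first record carrying them
        for idx, rec in enumerate(records):
            key = tuple(rec.get(field) for field in rule)
            if any(value in (None, "") for value in key):
                continue  # assumption: blank fields never count as a match
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return [members for members in groups.values() if len(members) > 1]

if __name__ == "__main__":
    records = [
        {"website": "strategicdb.com", "country": "Canada", "phone": "555-0100"},
        {"website": "strategicdb.com", "country": "Canada", "phone": "555-0199"},
        {"website": "", "country": "Canada", "phone": "555-0100"},
        {"website": "example.com", "country": "Canada", "phone": "555-0142"},
    ]
    rules = [("website", "country"), ("phone",)]
    print(find_duplicate_groups(records, rules))
```

In the sample data the first two records share a website and country, the first and third share a phone number, and all three land in a single group while the fourth record stays on its own.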

Step 3: Identify the Surviving (“Master”) Record

You can use hierarchical rules to identify which record will be the master, in other words the surviving record. Please note that the selection follows the hierarchy of your rules, and the final selection of a “Master” record is made randomly if none of the rules applies to a specific duplicate group.

Examples of rules for Accounts include:

  • Account Record Type: If you have a list of both customers and prospects, you will want to make sure that records representing a “Customer” are always marked as a “Master”. Please note that if two records in the same duplicate group are both “Customer”, one of them will be marked as “Merge”. Therefore, you should always double-check the list prior to making any live system updates.
  • # of Opportunities: The account record with the most opportunities can be identified as a “Master”.
  • # of Contacts associated with the account: The account with the most contacts can take priority.
  • Latest Date: The account that was created last, and therefore in theory should have the latest data, can be marked as a “Master”.
  • Data Completeness: The account with the highest data completeness would be marked as a “Master”.

Examples of rules for Contacts or Leads can include:

  • Status: If the contact is “Active”, it should be marked as a “Master”.
  • Deliverability Status: If the email is in good standing, it will be marked as a “Master”.
  • MQL Score: If you are scoring leads, the lead with the highest score would be marked as a “Master”.
  • Date Created: The latest lead added to the system would be marked as a “Master”.
  • Data Completeness: The record with the most complete data should be marked as a “Master”.

Please note that it is important that your “Master”/“Merge” rules correspond to your desired business rules. Therefore, the fields that you plan to base your decision on must be included in the file that you are planning to upload.
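The hierarchy-based selection described in this step can be sketched in Python as follows. This is one possible reading rather than the tool's actual logic: the helper names `pick_master` and `label_group` and the sample fields are hypothetical, and the sketch assumes each rule narrows the candidates to the best-scoring records, with a random pick among whatever remains tied.

```python
import random

def pick_master(group, rules, rng=random):
    """Apply rules in priority order; fall back to a random pick on ties.

    group: list of record dicts; rules: ordered list of functions that map
    a record to a comparable score (higher is better).
    """
    candidates = list(group)
    for rule in rules:
        scores = [rule(rec) for rec in candidates]
        best = max(scores)
        candidates = [rec for rec, score in zip(candidates, scores) if score == best]
        if len(candidates) == 1:
            break
    # Assumption: any remaining tie is broken randomly.
    return candidates[0] if len(candidates) == 1 else rng.choice(candidates)

def label_group(group, rules, rng=random):
    """Return a "Master"/"Merge" label per record, with exactly one "Master"."""
    master = pick_master(group, rules, rng)
    return ["Master" if rec is master else "Merge" for rec in group]

if __name__ == "__main__":
    group = [
        {"id": 1, "type": "Prospect", "opportunities": 5},
        {"id": 2, "type": "Customer", "opportunities": 1},
        {"id": 3, "type": "Customer", "opportunities": 3},
    ]
    rules = [
        lambda rec: rec["type"] == "Customer",  # customers take priority
        lambda rec: rec["opportunities"],       # then the most opportunities
    ]
    print(label_group(group, rules))
```

Here two records are customers, so the second rule decides between them and the other customer is labelled “Merge”, which is the situation the Account Record Type note warns about.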

Final Step: Review File

Once you are satisfied with your selection and have previewed the first 50 rows of data, it is time to complete the order form and download the final file. The following fields will be appended as new columns at the end of your original file:

  • All fields that you selected to normalize.
  • Duplicate Group ID: your unique identifier for your duplicate group.
  • Confidence Level: calculated based on the number of records that matched your de-duplication selection criteria exactly. Please note that the lowest confidence level is used for each duplicate group.
  • Total Completeness: the percentage of the fields in the file that are completed.
  • “Master”/“Merge”: Based on your master selection rules, your records will be marked as either “Master” or “Merge”. Please note that there will be only one “Master” per duplicate group.
  • Manual Review: a flag that notifies you to pay special attention to certain duplicate groups. A group is marked for review if the total number of records in the duplicate group is greater than 5 and the group’s confidence level is lower than 50%.
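Some of the appended columns above can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that a field counts as completed when it holds a non-blank value; how the tool computes the per-record confidence level is not covered here, only how the lowest value is taken for a group and how the review flag is derived from it.

```python
def total_completeness(record, fields):
    """Percentage of the given fields that hold a non-blank value."""
    filled = 0
    for field in fields:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return 100.0 * filled / len(fields)

def group_confidence(record_confidences):
    """The lowest confidence level is used for the whole duplicate group."""
    return min(record_confidences)

def needs_manual_review(group_size, confidence_pct):
    """Flag groups with more than 5 records and confidence below 50%."""
    return group_size > 5 and confidence_pct < 50

if __name__ == "__main__":
    record = {"name": "StrategicDB", "phone": "", "email": None, "city": "Toronto"}
    print(total_completeness(record, ["name", "phone", "email", "city"]))
    confidence = group_confidence([90, 40, 75, 60, 80, 55])
    print(needs_manual_review(6, confidence))
```

In this example two of the four fields are filled, and the six-record group takes its confidence from its weakest match, which is low enough to trigger the review flag.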

Prior to making any live system changes, please review all records to make sure that you are comfortable with your selection and your “Master”/“Merge” rules.

Pricing

The cost per record processed is $0.10 US.

  • Running more than 1 million records – If you need to de-duplicate large data sets, please contact us for discounted pricing.
  • Need to find duplicates across multiple files – We can find duplicates across multiple files. If that is your case, please contact us so we can process them for you.

Not For Profit

StrategicDB offers special pricing for Not for Profit organizations. Please contact us to have us run the de-duping tool for you with a special discount.

F.A.Q

What Form of Payment do you accept?

Currently we accept payment via PayPal, and all prices are in US dollars. Should you wish to pay by wire transfer or cheque, please contact us at hello@strategicdb.com prior to uploading the file.

Can I run the file multiple times and change my selection?

You can make an unlimited number of changes to the original file that you have uploaded, provided you are still in the current session. Please note that you should download the selection that you have made prior to downloading the file.

How can I be sure that it is a duplicate?

The system will flag identical records based on your selection criteria. Because merging or deleting records carries a high risk, we do not currently use fuzzy logic; matching exactly ensures the highest probability that a flagged record is a true duplicate. We also provide you with the confidence level, which you can use to double or triple check the file prior to making live system updates.

Do you offer De-duplication Services?

Yes. Should you wish to have us run the de-duplication for you with an extra manual verification process, the cost is $0.20 per record. Please contact us at hello@strategicdb.com to get started.

Do you offer full data cleaning services?

StrategicDB is a full-service data cleaning company. De-duping is usually one of the main steps of data cleaning. Our other services include data normalization/standardization, data enhancement/appending and/or validation using third-party data, as well as help with other data cleaning initiatives such as parent/child relationships and data governance consulting. Please contact us at hello@strategicdb.com and we will be happy to help.

How secure is your service?

We offer SSL encryption. All communication between you and our server is encrypted using the most modern security standards (SSL/TLS 1.2).

In addition, your data will be deleted 14 days after file upload.

How can I be satisfied with your de-duping tool?

Prior to paying for your file, you will have the option to download a sample of 50 rows of duplicates that we have identified based on your rules. We strongly recommend that you download the file and go through each row to determine that you are satisfied with your selection criteria.

Your data will be maintained on our server for 14 days, during which you may re-process it with different business rules. Since you will be charged if you upload a new file, we highly recommend that you make sure all fields are available in the original file.

Should you have any issues, please do not hesitate to contact us at hello@strategicdb.com and we will try our best to resolve any issues you may encounter.

How do I know what fields to pick for my business rules?

We have outlined above some sample rules that are most commonly used among businesses to help you get started. Should you require a consultation, we will be more than happy to run the de-duping for you; the price per record would be $0.20. Please contact us at hello@strategicdb.com.

Do you have APIs to Salesforce or other tools?

Our tool does not connect to Salesforce or any other tool for the simple reason that we want you to have control over your live data. We strongly advise you to double-check your data prior to making a decision on whether to de-dupe or not. In certain situations, you may choose to keep a duplicate record. Some examples include:

  • Both records are “customers” or one record is a “customer” and another is a “partner” and you want to keep them separate.
  • You are selling to the same company but to two different departments.
  • You have multiple roles for the same person and wish to keep it that way.

Should you require someone to make live system changes for you, we are more than happy to assist. Please email us at hello@strategicdb.com.

Why should I use StrategicDB's Deduping Tool as opposed to others?

Our tool allows you the freedom to select your own de-duplication rules and your own “Master”/“Merge” rules. We also have the ability to normalize Country, States and Website, which improves match rates. By not being tied down to one CRM or de-duping based on only a few fields, we are able to handle all types of data based on your own business rules and requirements.

Who to Contact if there is an issue?

Feel free to email us at hello@strategicdb.com and we will try to get back to you within 24 hours. Our working hours are 10am to 6pm EST Monday – Friday.

Other Information

Special Characters: Please note that when opening the file in Excel, some names with special characters may be overwritten in the process. Please do not update address or company name with the normalized data. Country and state should be fine.

Languages: Currently we support only English as a language for normalization. Latin languages should work with the de-duping tool, but this has not been tested. If there are issues, please let us know at hello@strategicdb.com.

Processing Times: It should not take more than a few minutes to run the process. However, for large files there may be a time delay. If you run into any issues, please contact us at hello@strategicdb.com.

Location: StrategicDB is located in the Greater Toronto Area in Canada. Our servers are located in the United States.
