Data standardization is the process of fixing data into a common format. Not all data can be standardized and it is different than data normalization, which focuses on the formatting of data rather than the actual data content. Below is what you need to know about data standardization.

Example of Data Standardization: 

For example, your country field has the following inputs:

  • United States, US, U.S., United States of America, USA, USofA all of these values should be standardized to United States or US
  • United Kingdom, UK, U.K., England, Scotland, Wales, ENG all of these values should be standardized to United Kingdom or UK.

What fields can you standardize: 

Any field which can be grouped can be standardized. However, if you are not planning to use the field for analytics, segmentation or any other use, the effort to standardize may not justify the need.

Typical fields to standardize: 

Geographic regions/states and countries are the most common forms of standardization. Other common fields include: industry, job title – function and level, company size, annual revenue ranges, household income ranges, type of record, and so on.

How to standardize: 

  1. First step is to pull all variations of the field into one list.
  2. Create a standardized list that you want to replace the original values with.
  3. Match the new standardized list to the current values, it is advisable to create a new standardized field as oppose to overwriting original values. For anything odd, you can place it in the “Other” bucket and come back to it once it reaches a certain percentage of the database.

Post Standardization: 

Now that your field is standardized, you need to make sure that any new data that is being created will have the standardized data. To do so, follow the following steps:

  1. Check any form submits to make sure the new standardized field is now a drop down and is not in a free text format.
  2. Check any systems which connect to your database to make sure that they also have a standardized list to select from. For example, if you standardized it in Marketo and your sales reps input data into Salesforce, make sure they have a drop down instead of text field.
  3. Create an automatic look-up table to classify any data that is being uploaded will be automatically standardized. You should also have ‘unclassified’ option for any new value where data was not categorized.
  4. Review the ‘unclassified’ records on a monthly or quarterly basis to ensure that you capture anything new into the your look-up table.

Field standardization can take up a long time, especially if you have thousands or millions of rows of data to go through and multiple fields. Machine learning techniques can help find similar information, however, it may still be best to leave data standardization to professionals.

StrategicDB is a boutique data cleansing agency, which can help with standardization for both small data sets and large. Contact us for a quote.