Data cleansing covers many different activities: data validation, data enhancement, data appending, de-duplication, data standardization/normalization and data modification/manipulation. It is typically performed when an issue has been identified, prior to launching a marketing campaign, when migrating to a new system or simply prior to doing analysis. However, it is advisable to perform data cleansing on an ongoing basis, before new issues develop. So what are some examples of data cleansing initiatives?
De-duping Records – There is nothing worse than having duplicates in your dataset. For analysts it means double counting, for marketers it means potentially sending different promotions and messaging to the same person, and for sales it means confusion over who handles the account. The list goes on and on. De-duping involves identifying duplicate records. For contacts, an obvious signal is a duplicate email address; for accounts/companies, the website can be a good way to spot duplicates. Of course, other fields should also be considered when de-duping, such as address, name, phone number and so on.
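As a rough illustration of email-based de-duping, here is a minimal Python sketch. The record layout (dictionaries with "name" and "email" keys) and the function names are assumptions made for this example, and it simply keeps the first record seen for each email. A real project would usually also compare name, address and phone number before discarding anything.

```python
def normalize_email(email):
    """Trim and lower-case an email so trivial variations compare equal."""
    return email.strip().lower()


def dedupe_by_email(contacts):
    """Keep the first record seen for each normalized email address.

    Records with no email are kept as-is, since they cannot be matched
    on this field alone.
    """
    seen = set()
    unique = []
    for contact in contacts:
        key = normalize_email(contact.get("email") or "")
        if key:
            if key in seen:
                continue  # duplicate of an earlier record
            seen.add(key)
        unique.append(contact)
    return unique


# Hypothetical sample data
contacts = [
    {"name": "Ann Lee", "email": "Ann.Lee@example.com"},
    {"name": "Ann Lee", "email": " ann.lee@example.com "},
    {"name": "Bob Roy", "email": "bob@example.com"},
    {"name": "No Email", "email": ""},
]
deduped = dedupe_by_email(contacts)
```

In this sample, the second "Ann Lee" record is dropped because its email matches the first once whitespace and letter case are ignored.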
Data Enhancement – Data enhancement refers to adding additional information to your data. It is typically done prior to analysis or a marketing campaign. Examples of data enhancement include adding demographic information, psychographic information and company information that was not collected.
Data Appending – Data appending is similar to data enhancement in that you are acquiring data, but it typically refers to filling in missing information. Examples of data appending include filling in missing phone numbers, addresses, emails and so on.
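To make the idea of filling in missing information concrete, here is a small Python sketch. It assumes you already have a reference source keyed by email (for example, data acquired from a vendor). The field names, the `append_missing` function and the sample values are all hypothetical. Existing values are never overwritten; only blanks are filled.

```python
def append_missing(records, reference, key="email", fields=("phone", "address")):
    """Return copies of records with blank fields filled from a reference lookup."""
    filled = []
    for record in records:
        updated = dict(record)  # copy so the original record is left untouched
        extra = reference.get(record.get(key), {})
        for field in fields:
            if not updated.get(field) and extra.get(field):
                updated[field] = extra[field]
        filled.append(updated)
    return filled


# Hypothetical sample data
records = [
    {"email": "ann@example.com", "phone": "", "address": "1 Main St"},
    {"email": "bob@example.com", "phone": "416-555-0100", "address": None},
]
reference = {
    "ann@example.com": {"phone": "416-555-0123", "address": "99 Other Rd"},
    "bob@example.com": {"address": "5 King St"},
}
appended = append_missing(records, reference)
```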
Data Validation – Data validation is often used when cleansing historical data and often prior to very expensive direct mail campaigns. Typically data is validated for addresses, phone numbers and email addresses. Validation can be done on an individual basis (does this person live at this address?) or on a general basis (does this email address accept emails?).
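A first pass at email validation can be a simple format check, sketched below in Python. This only tests whether a value looks like an email address. It does not confirm that the mailbox exists or accepts mail, which generally requires a verification service or a test send. The pattern is deliberately loose and is an assumption for illustration, not a full implementation of the email address standard.

```python
import re

# Loose pattern: something, then "@", then something containing a dot, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value):
    """Return True if the value has the basic shape of an email address."""
    return bool(EMAIL_PATTERN.match(value.strip()))


# Hypothetical sample data
emails = ["jane@example.com", "jane@example", "jane example.com", ""]
flagged = [e for e in emails if not looks_like_email(e)]
```

Records that fail the check can be flagged for review rather than deleted, since a typo in an otherwise good contact is often fixable.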
Data Standardization/Normalization – If your data is collected but is unusable due to multiple variations, you may need to standardize or normalize it. Examples include breaking up titles into seniority and job function to make it simple to send emails to a specific audience, standardizing phone numbers into the same format so your auto dialler can easily dial calls, and pulling critical data out of a text field by using data mining techniques.
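Phone number standardization is one of the easier cases to sketch. The Python example below assumes 10-digit North American numbers and an output format of 000-000-0000; both are assumptions chosen for illustration, and international data would need different rules. Values that cannot be standardized return None so they can be reviewed separately.

```python
import re


def standardize_phone(raw):
    """Convert a North American phone number to the form 000-000-0000.

    Returns None when the value does not contain exactly 10 digits
    (after dropping an optional leading country code of 1).
    """
    digits = re.sub(r"\D", "", raw)  # strip everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Hypothetical sample data
raw_numbers = ["(416) 555-0123", "+1 416.555.0123", "4165550123", "555-0123"]
standardized = [standardize_phone(n) for n in raw_numbers]
```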
Data Modification/Manipulation – If you are running machine learning algorithms, your data must be in a format the algorithms can use. In this case you may need to modify or manipulate your data into a specific format so it is usable. Examples of data manipulation for machine learning include formatting dates, calculating lengths of time, formatting and grouping different fields and so on.
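Formatting dates and calculating a length of time might look like the following Python sketch. The list of accepted date formats, the field meanings (signup date, last purchase date) and the function names are assumptions for this example. Note that a value such as 03/04/2023 is ambiguous between month-first and day-first; this sketch treats slash-separated dates as month-first.

```python
from datetime import datetime

# Assumed input formats, tried in order
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text, formats=DATE_FORMATS):
    """Parse a date string in any of the known formats, or return None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    return None


def days_between(start, end):
    """Number of whole days from start to end."""
    return (end - start).days


# Hypothetical sample data
signup = parse_date("03/15/2023")
last_purchase = parse_date("2023-04-14")
tenure_days = days_between(signup, last_purchase)
```

Once every date is in one format, a derived numeric field such as `tenure_days` is typically easier for a model to use than the raw date strings.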
These are just some of the data cleansing projects that you may need to perform, either as a one-time task or on an ongoing basis. If you are looking for a data cleansing company, StrategicDB can help!