Six Ways to Cleanse your Data
How clean is your data? Do you have duplicate records or different abbreviations or spelling mistakes or fields containing unrelated information? Did you know that the state of your data can have a big impact on your Salesforce implementation and business reporting?
AAkonsult Payments has built-in features that help keep your data clean such as data matching, address look-ups and reconciliation. Your data will never get dirty again! But in the meantime we’re going to address common data problems highlighting their impact and how to remedy the problems. Let’s start with some basics.
Database Concepts
A database is a collection of related entities such as accounts, contacts and departments. As depicted below, each entity is represented by a table (or object) that has fields and values.
Each column represents a field attribute of the data base (eg name or email) and each row holds a single record with values for each field. Obviously, it’s important that the data is clean, uniform and consistent for accurate reporting and business intelligence.
Data integrity issues can occur for a variety of reasons. Let’s look at some common issues and how to fix them or ‘how to clean your dirty data’!
- Duplicate data
- Non-unique or missing keys
- Spelling mistakes and multiple abbreviations
- Incorrect formatting for specific fields
- Missing mandatory fields
- Multiple values in one field
One: Duplicate data
Duplicate data can occur when data merges in Excel go wrong or data is collected from multiple places or even just human error.
IMPACT:
Duplicate data will result in inaccurate reports. Duplicate data will be counted twice. Key information may be spread over the duplicated records. This means that no one record holds all the contact’s information. If you use the email functionality, the duplicate contacts will get duplicate emails.
REMEDY:
Duplicate records can’t just be deleted on mass as they contain vital information that doesn’t appear on the designated original record.
Commercial software tools are available to de-duplicate (often just called ‘dedup’) records. The duplicated records could also be extracted to a spreadsheet and merged manually or merged within the database itself
NB: Please make a copy of the data before you start to dedup.
Two: Non- unique or missing keys
Keys (or Unique Identifiers) are the foundation of a database system. Every record in a database should be able to be uniquely identified within the table or object that it resides.
IMPACT:
Like duplicate records, non-unique keys can cause a record to be double counted when reporting. If a key is missing the record may not be linked to other records within the database (this is referred to as an orphaned record). In the example below, John Doe is associated to the account ABC PL by the account’s ID number. If this number was missing from John’s contact record we would know which account he belonged to.
REMEDY:
Duplicate keys can be fixed by assigning new unique identifiers to the records that contain the duplicate key. Caution has to be taken to ensure that by changing a key you don’t break an existing connection to another record. In the above example, if two accounts had the ID of 2 and you decided to change the ID of account ABC PL to 3 you will fix the duplicate key issue for the account. However, John Doe’s connection to the account ABC PL is broken as his contact record still refers to the account ID of 2.
Editing keys can be very risky and should only be undertaken by someone who is familiar with database concepts and ideally has a good knowledge of the business and how the data related to each other.
NB: Please make a copy of the data before you start to change keys.
Three: Spelling Mistakes and Multiple Abbreviations
Unlike duplicate records and unique key issues that have the potential to disrupt the importation of data into Salesforce, spelling mistakes and multiple abbreviations will imported without error but become troublesome when used as a filter in reports.
IMPACT:
Any field that can have text manually entered into has the potential to cause filter problems in reports. For example, if you have a field called state on your contact record that can be edited manually you have the potential for the following to values to represent Victoria:
- Victoria
- Vic
- vic.
- Vitoria
If you now want to run a report on all the contacts in Victoria you have to ensure that all these values are accounted for in the report or you will miss someone.
REMEDY:
If you can export the records to excel and sort by the column that has the spelling mistakes or multiple abbreviations you change the values to be consistent.
AAkonsult Payments has an address look-up facility that ensures consistent data entry so that this problem will not occur.
NB: Please make a copy of the data before you start to change any values.
Four: Incorrect formatting for specific fields
Many fields have specific formatting requirements such as websites, emails, phone numbers, postcodes.
IMPACT:
Emails addresses that are incorrectly formed will cause an error when being imported into Salesforce. Incorrectly formed phone numbers, postcodes and websites can be imported successfully into Salesforce but the usefulness of this data will be affected.
REMEDY:
If you are familiar with MS Excel formulas you can use them to identify field values that are incorrectly formed and take action to fix them.
NB: Please make a copy of the data before you start to change field values.
Five: Missing mandatory fields
Salesforce needs certain fields to be populated to ensure that the built in functionality of the system can be assured.
IMPACT:
An example of this is that each contact must be associated to one account. Some custom objects created in Salesforce may also have mandatory fields to ensure that vital business information is captured or custom processes can operate as expected.
REMEDY
If a mandatory field is missing a value, it will need to be added. If a value is not known, you can use a consistent default value to complete this field. An example of this can occur if an organisation has a self-service newsletter subscription list. The subscriber may only supply their first name. Salesforce requires contacts to have a last name. If the last name is not known a value of “[Unknown]”, or something similar, can be used to complete the mandatory field.
NB: Please make a copy of the data before you start to change field values.
Six: Multiple values in one field
Database design best practice dictates that one field should contain one “piece” of information related to that field.
IMPACT:
For example, an address field that contains a value of “123 Main Street, Melbourne, 3000, Victoria, Australia” actually contains 5 pieces of information. It would be very hard in this situation to create a report of all contacts in Victoria. A better design would be to have a field for street, suburb, post code, state and country. All values in a field should contain information related only to that field. For example, the postcode field should only contain postcodes. Putting a state or other value in the postcode field decreases the usefulness of this field for reporting or any mail merging you may want to do
REMEDY
A tool like MS Excel can be used to split data into multiple fields and identify information that doesn’t relate to the field
NB: Please make a copy of the data before you start to change field values.
We trust this post will help you clean your data so that all your future reports in Salesforce will be accurate and useful in your organisation.