When Name Data Goes Terribly Wrong

February 5, 2014

An Open Letter to Businesses,

Last week we learned of an extremely unfortunate event where a Fortune 500 company unintentionally mislabeled a piece of promotional mail with a very hurtful personal message. If you haven’t seen this article look here.

Big data is a land of opportunity, but unfortunately it is also polluted with typos and poorly inputted information. Marketers today rely on data from hundreds of sources to grow their business. These include compiled telephone books, census data, voter lists, and by rented mailing lists from other companies. As a general rule of thumb, list managers do a pretty good job removing data errors through address validation, but rarely do they look at the quality of the name and title.

Address validation does a very effective job weeding out and cleaning up the bad addresses. Modern address verification tools can append missing ZIP codes, fix common misspellings and even, in some cases, add a suite or apartment number. Address validation tools are inexpensive and come in various types (batch, real-time, standalone).

Name validation however, is far more complex. The distribution of name information is much larger than in the past, and the quality is much worse. Social media allows anyone to enter virtually any name without verification. As marketers dig deeper into their big data resources (like social media) they run the risk of embarrassing themselves, tarnishing their brand, and wasting their resources in the meantime.

The reason name validation is so complicated is because of variants. Take my given name “Geoffrey,” this is valid and known first name. However, there are at least a dozen variations of “Geoffrey” that include Jeff, Geoff, Jeffrey, Jeoff, Geoffroy… In order to weed out bogus names traditional ‘known names’ filters are obsolete. In order to filter out invalid names we need a list of more than 300 million names parsed and ranked by frequency of first and last name. Many IT departments think name quality is only about filter-out the vulgar words only (they have a good time writing the procedures to do so). While linguists have long debates about how to validate names, we take the middle road and use a list of stacked ranked collection of three million first name and 10 million last names. Proper name validation is critical for businesses because it’s vital to the customer relationship. Weeding out bogus names like “Miiike Wiiilson,” “Coffee Cup,” and “Slow Motion” is a difficult challenge and one we are not afraid of. Large datasets provide large insight, with enormous potential for knowledge.

Service Objects has been purifying name information for over a decade. Our goal and mission is to make sure every name in every database is accurate and up-to-date as it possible can be. We truly hope that no one has to suffer or be offended by misaddressed mail again.

Geoffrey W. Grow

Founder & CEO

PS: Above is an example of an offensive label I received a long time ago. I saved it because I was upset with the gross errors, and I am sure I never bought anything for this company. I’ve saved this label for over 20 years, so you can bet I’m passionate about name validation.