What is easy and natural for a human is not so straightforward for a program. Humans have an almost uncanny ability to recognize when something doesn’t look quite right. Take, for example, the following names:

| Name |
|---|
| Dave Rubenstein |
| Mary Harper |
| Peter Heinz |
| Chris Jones |
At a glance, the names listed above look common enough that no one would question whether they are legitimate. But what happens if we make some slight changes to them?
| Original | Column A | Column B |
|---|---|---|
| Dave Rubenstein | Dayv Roobinstine | Dacxve Rubqensxein |
| Mary Harper | Maeri Hopper | Mzaxry Harpqer |
| Peter Heinz | Piter Hines | Petexr Hefjinz |
| Chris Jones | Kris Joanz | Cvhris Jonxes |
The table above shows two distinct categories of misspelling for each name.
Column A shows how someone unfamiliar with a name might spell it by sounding it out, or an alternate spelling they might offer. Column B shows an outright misspelling, in which random characters, or garbage, have been embedded. If we asked someone to examine columns A and B and identify which entries are not real names, chances are that some of the names in column A would pass as possibly real.
The same person, however, would likely dismiss the names in column B instinctively. People spend their entire lives learning to recognize character patterns, such as what a real name looks like, and they notice when something does not fit those patterns. While this kind of pattern recognition comes easily with familiar names, what happens when we adopt a global perspective?
What one considers to be a common name can vary depending on one’s cultural background. Below is a new list of names. Can you identify which names are real and which ones are not?
Were you able to tell which names were real and which were not? As it turns out, all of the names on the list are real. Here they are again, this time with the origin of each name and, for some, its spelling in the local script.
| Name | Origin |
|---|---|
| Kahvecioglu | TURKISH AND BALKAN MUSLIM |
Let’s look at one last list. Again, see whether you can tell if each entry is a real name.
| Candidate |
|---|
| áo sơ mi |
Unless you read these languages, you may be unable to tell whether the entries are real names. This time, none of them are. Each one is roughly the word for “shirt” in its own language.
| Word | Language |
|---|---|
| áo sơ mi | Vietnamese |
A service like DOTS Name Validation is designed to recognize various name patterns, but unlike the average person, it can draw on a wealth of data to help determine whether a name is valid. However, no single data set will ever be one hundred percent complete, and issues in translation and transliteration can lead to some discrepancies.
That’s why, at Service Objects, our commitment extends beyond enhancing our data; we also make it a priority to understand the naming patterns of different cultures. These patterns help validate common names, recognize uncommon but legitimate names, and distinguish both from bogus names and misspellings.
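To make the idea of character patterns a little more concrete, here is a minimal Python sketch of one simple technique: a character-bigram check that measures how many adjacent letter pairs in a name never appear in a reference sample. This is not how DOTS Name Validation works. The sample list, the function names (`bigrams`, `train`, `unseen_ratio`), and the scoring are hypothetical and exist only for illustration; a real service would rely on far larger, multilingual data.

```python
from collections import Counter

# Hypothetical reference sample (illustration only); a real system would use
# a large, multilingual corpus of names.
SAMPLE_NAMES = [
    "Dave Rubenstein", "Mary Harper", "Peter Heinz",
    "Chris Jones", "David Robinson", "Christine Moore",
]


def bigrams(word: str) -> list[str]:
    """Return adjacent letter pairs, with ^ and $ marking the word's start and end."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(names: list[str]) -> Counter:
    """Count how often each letter pair appears across the reference names."""
    counts: Counter = Counter()
    for name in names:
        for part in name.split():
            counts.update(bigrams(part))
    return counts


def unseen_ratio(name: str, counts: Counter) -> float:
    """Fraction of the name's letter pairs that never occur in the reference sample."""
    grams = [g for part in name.split() for g in bigrams(part)]
    if not grams:
        raise ValueError("name must contain at least one non-space character")
    # Indexing a Counter with a missing key returns 0 without adding the key.
    unseen = sum(1 for g in grams if counts[g] == 0)
    return unseen / len(grams)


model = train(SAMPLE_NAMES)

if __name__ == "__main__":
    for candidate in ["Dayv Roobinstine", "Dacxve Rubqensxein", "Kahvecioglu"]:
        print(f"{candidate}: {unseen_ratio(candidate, model):.2f}")
```

With this small English-only sample, the phonetic respelling “Dayv Roobinstine” has 3 of its 17 letter pairs unseen (about 0.18), while the garbled “Dacxve Rubqensxein” has 7 of 19 (about 0.37). The same sketch also shows the limitation described above: the legitimate Turkish surname Kahvecioglu contains pairs the sample has never seen, so a pattern-based check is only as reliable as the range of cultures its data covers.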