From Dave to Dacxve: The Intricacies of Name Recognition

What seems easy and natural for a human to do is not so straightforward for a program to do. Humans have a seemingly uncanny ability to recognize when something doesn’t look quite right. Take, for example, the following names:

Dave Rubenstein

Mary Harper

Peter Heinz

Chris Jones

At a glance, the names listed above look common enough that one wouldn’t question them as being legitimate names. However, what happens if we make some slight changes to them?

Name             A                 B
Dave Rubenstein  Dayv Roobinstine  Dacxve Rubqensxein
Mary Harper      Maeri Hopper      Mzaxry Harpqer
Peter Heinz      Piter Hines       Petexr Hefjinz
Chris Jones      Kris Joanz        Cvhris Jonxes

In the table above, we encounter two distinct categories of misspellings in each example.

Column A shows what might happen when someone unfamiliar with a name spells it by sounding it out, or offers an alternate spelling. Column B shows outright misspellings in which random characters, or garbage, were embedded. If we asked someone to examine the examples in columns A and B and identify which ones are not real names, chances are that some of the names in column A would pass as possible real names.

The names in column B, however, would likely be dismissed instinctively as not being real names. People spend their entire lives learning to recognize character patterns, such as what a real name looks like, and when something does not fit those patterns it stands out as not belonging. While this kind of pattern recognition may be straightforward for people dealing with common names, what happens when we adopt a global perspective?
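To give a rough sense of what recognizing character patterns could look like in a program, here is a minimal Python sketch. It is a toy illustration, not how any particular service works: the tiny sample list, the function names, and the idea of counting unseen letter pairs are all assumptions made for this example.

```python
from collections import Counter

# Tiny illustrative sample (an assumption for this sketch); a real system
# would learn from a far larger, multilingual list of names.
KNOWN_NAMES = ["dave", "rubenstein", "mary", "harper",
               "peter", "heinz", "chris", "jones"]


def bigrams(word):
    """Split a word into adjacent letter pairs, marking its start and end."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(names):
    """Count how often each letter pair appears in the sample names."""
    counts = Counter()
    for name in names:
        counts.update(bigrams(name))
    return counts


def unseen_ratio(word, counts):
    """Fraction of the word's letter pairs never seen in the sample."""
    grams = bigrams(word)
    unseen = sum(1 for g in grams if counts[g] == 0)
    return unseen / len(grams)


model = train(KNOWN_NAMES)
print(unseen_ratio("Dave", model))              # 0.0
print(round(unseen_ratio("Dacxve", model), 2))  # 0.43
```

With this sample, "Dave" scores 0 because every letter pair has been seen, while "Dacxve" scores higher because pairs such as "cx" and "xv" never occur. A sample this small would, of course, also flag plenty of legitimate names it has simply never encountered.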

What one considers to be a common name can vary depending on one’s cultural background. Below is a new list of names. Can you identify which names are real and which ones are not?

 

Aabadulagafafar
Aabakk
Barabhuyan
Boruboji
Borvde
Cytrinowicz
Dalakabhai
Dieltjens
Dierkx
Gbangbola
Gbehi
Gbiorczyk
Hepzibah
Jakabffy
Jakhhof
Kahvecioglu
Krzys
Kziazekm
Mikolajczuk
Radhinasapathi
Radhoobeer

 

Were you able to determine which names were real and which ones were not? Well, as it turns out, all the names on the list are real. Here they are listed again, this time with the origin of each name. For some names, the localized script is included.

 

NAME             ORIGIN                     LOCAL SCRIPT
Aabadulagafafar  WEST BENGAL                আবদুলগফফার
Aabakk           NORDIC
Barabhuyan       ORISSA                     Barabhuyan
Boruboji         TELUGU                     బోరుబోజ్జి
Borvde           MARATHITU                  बोरवडे
Cytrinowicz      CENTRAL EUROPE
Dalakabhai       GUJERATI                   દલકાભાઈ
Dieltjens        GERMAN/DUTCH/FLEMISH
Dierkx           GERMAN/DUTCH/FLEMISH
Gbangbola        WEST AFRICAN
Gbehi            CENTRAL EUROPE
Gbiorczyk        CENTRAL EUROPE
Hepzibah         HEBREW
Jakabffy         CENTRAL EUROPE
Jakhhof          MUSLIM ASIAN
Kahvecioglu      TURKISH AND BALKAN MUSLIM
Krzys            POLISH
Kziazekm         CENTRAL EUROPE
Mikolajczuk      EASTERN EUROPE
Radhinasapathi   PONDICHERRY                ரதபதினசபாபதி
Radhoobeer       HINDU                      रधूबीर

Let’s look at one last list. Again, see if you can determine whether each entry is a real name or not.

 

衬衫
చొక్కా
gömlek
เสื้อ
Πουκάμισο
Цамц
कमीज
Көйнөк
áo sơ mi
рубашка

 

Unless you can read several languages, you may be unable to tell whether each entry is a real name or not. This time around, it turns out that none of them are real names. Instead, each is roughly the word for “shirt” in its own language.

 

Shirt      Language
衬衫       Chinese
చొక్కా      Telugu
gömlek     Turkish
เสื้อ        Thai
Πουκάμισο  Greek
Цамц       Mongolian
कमीज       Hindi
Көйнөк     Kyrgyz
áo sơ mi   Vietnamese
рубашка    Russian
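A program can cheaply tell which writing system a string uses, but that alone says nothing about whether the string is a name, which is the gap the shirt list exposes. The following Python sketch illustrates this under stated assumptions: the function name scripts_used is hypothetical, and taking the first word of each letter's Unicode character name is a rough heuristic, not the official Unicode Script property.

```python
import unicodedata


def scripts_used(text):
    """Return rough script labels for the letters in text.

    Assumption for this sketch: the first word of a letter's Unicode
    character name (for example "CYRILLIC" or "LATIN") stands in for
    its script. Spaces, digits, and combining marks are skipped.
    """
    labels = set()
    # NFC composes accented letters so they are seen as single letters.
    for ch in unicodedata.normalize("NFC", text):
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels


print(scripts_used("рубашка"))  # {'CYRILLIC'}
print(scripts_used("gömlek"))   # {'LATIN'}
print(scripts_used("कमीज"))     # {'DEVANAGARI'}
```

Each entry in the list maps cleanly to a script, yet none is a name. Deciding that requires reference data for each language, not just knowledge of the alphabet.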

 

A service like DOTS Name Validation is designed to recognize various name patterns, but unlike the average person, it has a wealth of data readily available to help determine whether a name is valid. However, it is important to note that no single data set will ever be one hundred percent complete, and issues in translation and transliteration can lead to some discrepancies.

That’s why, at Service Objects, our commitment extends beyond just enhancing our data; we also make it a priority to gain insights into various cultural naming patterns. Patterns can be used not only to validate common names but also to identify uncommon names and distinguish them from bogus names and misspellings.