The Trouble with Numeric and Fake-looking Chinese Email Addresses

If you were to encounter an email address that was comprised of just numbers, what would be your first reaction? You might suspect that it was a fake or disposable email address. But in some countries, such as China, this isn’t necessarily the case. In this blog post, we will take a deeper dive into when to be cautious about email addresses from China.

Obviously fake email addresses… right?

For example, let’s randomly type in some numbers.

  • 6843619
  • 1684154646514
  • 735416442
  • 94633252361

If we were to pair these numbers with a company domain like @serviceobjects.com, or even a free email provider like @gmail.com, to create something like 6843619@serviceobjects.com, you would most likely dismiss the result as garbage, fake or just plain bad. However, what if we instead used one of the following domains?

  • 126.com
  • 139.com
  • 163.com

And created something like 6843619@126.com. Now you might be thinking, “That’s even worse! Even the domains are all numbers now. Those are obviously fake email addresses. I’m absolutely positive.”

“Positive I tell you!”

OK, fine. I would agree. It looks fake to me too.

Now, what if we instead applied those numbers to the domain qq.com to get 6843619@qq.com? Would you still think it was an ‘obviously fake email address’?

Maybe not so ‘obviously fake email address’

In China, all-numeric email addresses are very common. If you made your way to this blog article, then chances are you have encountered one or more numeric email addresses that turned out to be genuine when you may not have expected them to be. For example, the domains noted above, 126.com, 139.com, and 163.com, are not fake. They are real domains with valid Mail Exchange (MX) records that point to real mail servers for handling real email communication.
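To make the point concrete, here is a minimal Python sketch of how a naive "all digits means fake" rule misfires on these providers. The function names and the small domain set are hypothetical and exist only for illustration. The set contains just the domains named in this post, and the check looks only at how an address appears. It does not test whether the address is deliverable or safe.

```python
# Hypothetical allowlist for illustration: the numeric-friendly
# providers named in this post. A real list would be much longer.
KNOWN_NUMERIC_DOMAINS = {"126.com", "139.com", "163.com", "qq.com"}


def looks_fake_naive(email: str) -> bool:
    """Naive rule: flag any address whose mailbox part is all digits."""
    local, _, _domain = email.rpartition("@")
    return local.isdigit()


def looks_fake_adjusted(email: str) -> bool:
    """Same rule, but skip domains where numeric mailboxes are normal."""
    local, _, domain = email.rpartition("@")
    if domain.lower() in KNOWN_NUMERIC_DOMAINS:
        return False
    return local.isdigit()


for address in ("6843619@qq.com", "6843619@serviceobjects.com"):
    print(address, looks_fake_naive(address), looks_fake_adjusted(address))
```

The naive rule flags both addresses, while the adjusted rule flags only the company-domain one. Neither rule says anything about whether the mailbox actually exists.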

You might be more familiar with the domain qq.com, particularly if you work in international business and/or marketing.

QQ, which is owned by the Chinese tech giant Tencent, is a messaging application similar to Skype. In China and parts of Asia, qq.com is like what gmail.com, yahoo.com or outlook.com are to the US in terms of providing email, messaging and communication services. In fact, in 2014, QQ was recognized by Guinness World Records for having the most simultaneous users on an instant messaging platform, with more than 200 million users online at once and over 800 million Monthly Active Users (MAU).

All of these QQ users have a qq.com email address, and all QQ accounts have a numeric email address.

But why numbers for an email address?

Numbers aren’t that hard to memorize. Most people have several phone numbers memorized, maybe a bank account or two, or perhaps a combination lock at their local gym. However, there is something impersonal and dissociative about numbers. A random number like 845796833 doesn’t tell you nearly as much as, say, Support@, ILuvKittens@ or ImBatman@, or just a plain old name. So, what’s so different about China that makes numbered email addresses so popular?

Well, there is an interesting article from The New Republic that tries to shed some light on the subject. It brings up an interesting notion that suggests that numbers, when used as homonyms for the Chinese language, can be used to more quickly and easily spell out Chinese words. One example from the article is where the numbers 5 and 1 in Chinese sound like the words “I” “want”, which helps explain why a job-hunting web site would choose 51Job.com for their domain. In Chinese, 5-1-Job would mean “I want Job”. Cute.

The meaning behind numbered emails can go beyond simple homonyms, however. The article calls it a “numbered-based slang,” and here is one example that I think helps explain the idea. Quoting the article:

“The Internet company NetEase uses the web address 163.com—a throwback to the days of dial-up when Chinese Internet users had to enter 163 to get online.”

They go on to state that 163 is not a homonym for anything, but is instead a throwback reference. A similar example would be the 411.com search engine website. 411.com is a throwback to when people in the US would dial 4-1-1 for information (as opposed to now where most people simply ‘google’ to search for information).

More Than Just Numbers

Slang in any language can be very complicated, and staying well-informed on the subject matter to understand its meaning is not easy. Technical slang takes this complexity to a whole new level. Take for example this surprisingly common password, “ji32k7au4a83”. One would think that this seemingly complicated password would be quite rare if not unique; however, it turns out it’s not. As the article in the link points out, the password “ji32k7au4a83” can be translated to mean “my password” in English.

This is how it breaks down:

ji3 -> 我 -> M

2k7 -> 的 -> Y

au4 -> 密 -> PASS

a83 -> 碼 -> WORD

The article details how a major Chinese transliteration system can be creatively used to map English to Chinese to Unicode and vice-versa. This process can be used to come up with some very complicated looking email addresses and not just passwords.
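The breakdown above can be sketched in a few lines of Python. This assumes the transliteration system in question is the standard Zhuyin (Bopomofo) keyboard layout, which this post does not state explicitly. The table is partial and covers only the keys this one password needs, and the function names are hypothetical.

```python
# Partial key-to-symbol table, assuming a standard Zhuyin (Bopomofo)
# keyboard layout. Only the keys used by this one password are listed.
ZHUYIN_KEYS = {
    "j": "ㄨ", "i": "ㄛ", "2": "ㄉ", "k": "ㄜ",
    "a": "ㄇ", "u": "ㄧ", "8": "ㄚ",
    "3": "ˇ", "4": "ˋ", "7": "˙",  # tone marks
}

# On this layout a tone key ends a syllable (the first tone, typed
# with the space bar, is ignored in this sketch).
TONE_KEYS = {"3", "4", "6", "7"}


def split_syllables(keys: str) -> list[str]:
    """Cut a keystroke string into syllables at each tone key."""
    syllables, current = [], ""
    for ch in keys:
        current += ch
        if ch in TONE_KEYS:
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)
    return syllables


def to_zhuyin(keys: str) -> str:
    """Translate keystrokes into Zhuyin symbols using the partial table."""
    return "".join(ZHUYIN_KEYS[ch] for ch in keys)


for syllable in split_syllables("ji32k7au4a83"):
    print(syllable, "->", to_zhuyin(syllable))
```

Each printed group is the phonetic spelling of one of the four characters listed above. Turning those spellings into the characters themselves is the job of an input method and is left out here.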

It would not be a stretch to say that the process bears some resemblance to 1337 Speak (Leet Speak). Take the previously mentioned “ImBatman” email example. One leet interpretation of it would be “1mb47m4n”. The result appears similarly nonsensical and complicated, wouldn’t you say? However, the problem with verifying Chinese email addresses goes beyond superficial, fake-looking mailboxes and domains.
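For comparison, the leet substitution mentioned above is just a character swap. The sketch below uses a small, hypothetical substitution table chosen to reproduce the “ImBatman” example. Leet has no single official alphabet, so other tables are equally valid.

```python
# Hypothetical substitution table: one of many possible leet alphabets.
LEET_TABLE = str.maketrans({"i": "1", "a": "4", "t": "7"})


def to_leet(text: str) -> str:
    """Lowercase the text and swap selected letters for look-alike digits."""
    return text.lower().translate(LEET_TABLE)


print(to_leet("ImBatman"))  # 1mb47m4n
```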

Disposable email addresses are easier to create

Let’s circle back to the widely popular QQ application and the all-numeric qq.com email addresses. When a user registers for a QQ account, they are given a QQ ID number, and this number is also their QQ email address. This ID number can be bound to another email address, so instead of giving someone your actual email, you just give them your QQ number. It’s a nice feature. Unfortunately, it is easy for users to create disposable accounts with QQ and bind them to their real email address. These disposable accounts are commonly used by bots, often created for or by Chinese vendors trying to push their products via spam.

This can lead to some false negatives when validating email addresses. It is not uncommon to receive a business email address with a qq.com domain and for it to end up going bad. The qq.com domain and some of its IP addresses tend to accumulate bad sender reputations due to the large amounts of spam abuse mentioned above. Unfortunately, spam and abuse are not just a problem for qq.com. Malicious internet activity is very common in China, and Chinese service providers struggle with the problem.

Countries with malicious networks or spam saturation: Use Caution

If you were to search for the countries with the worst spam or malicious networks, you would likely find the following result.

Countries with the worst spam/malicious networks

  1. United States
  2. China
  3. Russia

SPAMHAUS lists the worst spam enabling countries and Country IP Blocks (CIPB) lists countries with the most malicious networks, and both lists come back with the same top three countries in the same order. On both lists, the US is the worst offending country of all. Surprised?

CIPB also re-orders their top ten list by the number of malicious networks as a percentage of the total number of networks for the given country. Here is their re-organized list.

Countries with the most infected networks*

  1. Brazil 89%
  2. Turkey 54%
  3. Romania 39%
  4. China 32%
  5. Russia 11%
  6. United Kingdom 11%
  7. Japan 10%
  8. Ukraine 9%
  9. Germany 6%
  10. United States 6%

*Results are based on CIPB’s current top 10 countries with the most malicious networks.

Another CIPB top ten list places China as the current world leader in malicious internet activity. Brazil and Russia take second and third place respectively. The US is not on the list.

SPAMHAUS’ list of the 10 Worst Botnet Countries

  1. India
  2. China
  3. Vietnam
  4. Iran
  5. Thailand
  6. Brazil
  7. Indonesia
  8. Pakistan
  9. Algeria
  10. Russia

Overall, the real issue with trying to verify email addresses from China is not that they look complicated and fake, but that the country is a hotbed of malicious activity. Just because an email address is deliverable doesn’t mean that it is good or safe. In some cases, it would not be surprising to see one out of three email addresses from China turn out to be a bot and/or disposable.

How Email Validation can help

So how can you differentiate between, say, a legitimate alphanumeric email address that looks suspicious versus a spambot? Our DOTS Email Validation product can help you navigate some of the challenges and complexities of email data quality, particularly for contact or marketing with international addresses.

Our Email Validation service tests emails at multiple distinct levels.

  • First, of course, we check for basic syntax errors and common domain typos, and perform a DNS or domain name check to make sure the domain exists and has a valid MX record.
  • We also perform a comprehensive SMTP check by communicating directly with the target mail server to determine three key pieces of information: is the server working, will it accept any address, and will it accept a specific address.
  • Finally, we perform multiple integrity checks to see if the email address is associated with problematic addresses and services such as spam traps, known disposable address providers and blacklisted servers.

Together, these checks determine whether the email address is a real, functioning email address.
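As a rough illustration of how layered checks fit together, here is a minimal Python sketch of the offline parts only: syntax, common domain typos, and a disposable-provider lookup. It is not how our service is implemented. The function name, the report fields, the typo table and the placeholder domain throwaway.example are all hypothetical, and the DNS and SMTP steps are left out because they need network access.

```python
import re

# Deliberately simple pattern; real address syntax is more permissive.
SYNTAX_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Hypothetical lookup tables, for illustration only.
COMMON_TYPOS = {"gmial.com": "gmail.com", "qq.con": "qq.com"}
DISPOSABLE_DOMAINS = {"throwaway.example"}


def check_email(email: str) -> dict:
    """Run the offline checks and return a small report."""
    report = {"syntax_ok": False, "suggested_domain": None, "disposable": False}
    local, _, domain = email.strip().rpartition("@")
    domain = domain.lower()

    # Level 1a: suggest a fix for a well-known domain typo.
    if domain in COMMON_TYPOS:
        report["suggested_domain"] = COMMON_TYPOS[domain]
        domain = COMMON_TYPOS[domain]

    # Level 1b: basic syntax check on the (possibly corrected) address.
    report["syntax_ok"] = bool(SYNTAX_RE.match(f"{local}@{domain}"))

    # Levels 1c and 2 (DNS/MX lookup, SMTP conversation) are omitted here.

    # Level 3: one integrity check, a disposable-provider lookup.
    report["disposable"] = domain in DISPOSABLE_DOMAINS
    return report


print(check_email("6843619@qq.com"))
print(check_email("6843619@qq.con"))
```

Note that an all-numeric mailbox passes the syntax step like any other address, which is the point: appearance alone is not a signal.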

Circling back to the Chinese email addresses we discussed earlier: our Email Validation service can validate these with no problem, but clients often get confused when these emails get a low score. We verify that they are deliverable, but give them a low score when they show signs of problems such as belonging to bots or being linked to malicious activity. It is then up to you to decide whether you want to take the risk of using these email addresses or not. So in closing, understanding that numerical or nonsensical emails from other countries are often OK is a good first step, but automated validation can help you make an informed decision on whether to use them.
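One way to act on a "deliverable but low score" result is a simple policy on your side. The sketch below is purely illustrative. The result fields, the 0 to 100 score scale and the threshold are assumptions made for this example and do not reflect the service's actual response format.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Hypothetical shape of a validation result, for illustration only."""
    email: str
    deliverable: bool
    score: int  # assumed scale: 0 (worst) to 100 (best)


def decide(result: ValidationResult, min_score: int = 50) -> str:
    """Separate 'can be delivered' from 'worth the risk'."""
    if not result.deliverable:
        return "reject"
    if result.score < min_score:
        return "review"  # deliverable, but flagged as risky
    return "accept"


print(decide(ValidationResult("6843619@qq.com", deliverable=True, score=20)))
```

The threshold is yours to set: a marketing list might discard anything marked for review, while a support inbox might still reply to it.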