Improved Vulgar Word Detection in Email Validation Service

September 19, 2014

In our ongoing efforts to enhance our email verification services, we’ve recently updated the way our service assesses vulgar words within email addresses. The service now catches vulgar words more accurately and avoids flagging legitimate words or names that merely contain embedded profanity.

What exactly is vulgar word detection, and why is it important to email verification?
DOTS Email Validation performs dozens of validity tests on an email address, and the results are used to calculate an overall integrity score for that address. One of these tests checks for vulgar words; if one is found, the service produces a warning in the output code. Vulgarity (along with bogus or celebrity names, an unregistered DNS, and others) is a strong sign that the provided email address is not valid and not worth contacting.

When you run your existing contact database through email validation, the vulgar word detection function can help you weed out fake email addresses while keeping addresses in which a shorter vulgar word is embedded in an otherwise harmless word or name. With real-time validation, you can prevent fake email addresses from entering and cluttering your database, and even prompt users to submit real, valid email addresses if they initially try to go rogue.

There is a flip side to looking for vulgarity in email addresses. Finding a set of characters that one might define as vulgar does not always indicate a bad or fake email address. Many dictionary words and common names may appear indecent but, taken in their entirety, are completely appropriate. The new vulgar word detection feature in DOTS Email Validation references our proprietary database of millions of verified dictionary words, first names, and last names to differentiate valid names and words (which happen to contain a vulgar string of letters) from actual obscene expressions. Because these cases are distinguished accurately, the vulgar word check returns far fewer false positives.

Here are some examples of how the new algorithm scores inputs:

Scenario	Input example	Vulgar flag?	Reasoning
Email address containing a vulgar word which is part of another, valid word	scrapheap@wal-mart.com	No	Contains “crap” but is actually part of the acceptable word “scrap”
Email address containing same vulgar word, which is not syntactically contained in another word	xXcrapXx@wal-mart.com	Yes	The only real word found was in fact vulgar
Email address using a name that contains a vulgar word	AndreaCrapo@wal-mart.com	No	“Crap” embedded, but “Crapo” is a popular last name
Email address using a name that explicitly contains a vulgar word	MikeCrap@wal-mart.com	Yes	Vulgar word found, no real name found
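
To make the reasoning in the table concrete, here is a minimal Python sketch of one way such a check could work. It is an illustration only, not the actual DOTS Email Validation algorithm: the function names (find_spans, is_vulgar) and the tiny VULGAR_WORDS and KNOWN_WORDS sets are hypothetical stand-ins for the real word lists and proprietary database. The idea is to flag a vulgar word only when no longer known word or name in the address fully contains it.

```python
# Hypothetical stand-ins for the real vulgar-word list and the
# database of verified dictionary words and names.
VULGAR_WORDS = {"crap"}
KNOWN_WORDS = {"scrap", "heap", "scrapheap", "andrea", "crapo", "mike"}


def find_spans(text, word):
    """Return (start, end) index pairs for every occurrence of word in text."""
    spans = []
    start = text.find(word)
    while start != -1:
        spans.append((start, start + len(word)))
        start = text.find(word, start + 1)
    return spans


def is_vulgar(email):
    """Flag the address if a vulgar word is not covered by a longer known word."""
    # Only the part before the "@" is inspected, case-insensitively.
    local = email.split("@", 1)[0].lower()
    known_spans = [s for w in KNOWN_WORDS for s in find_spans(local, w)]
    for vulgar in VULGAR_WORDS:
        for v_start, v_end in find_spans(local, vulgar):
            # A known word "covers" the match if it fully contains it
            # and is strictly longer than the vulgar word itself.
            covered = any(
                k_start <= v_start
                and v_end <= k_end
                and (k_end - k_start) > (v_end - v_start)
                for k_start, k_end in known_spans
            )
            if not covered:
                return True
    return False


for address in (
    "scrapheap@wal-mart.com",
    "xXcrapXx@wal-mart.com",
    "AndreaCrapo@wal-mart.com",
    "MikeCrap@wal-mart.com",
):
    print(address, is_vulgar(address))
```

With these sample word lists, the sketch reproduces the four flags in the table above: No, Yes, No, Yes. A production check would need far larger word lists and would feed into the overall integrity score rather than returning a single yes or no.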

Now you can be assured that DOTS Email Validation will properly keep obscene email addresses out of your contact database, and accept those that only look offensive but are in fact harmless.

Give the service a try free of charge and tell us what you think!