The Importance of Data Accuracy in Machine Learning

January 17, 2017

Imagine that someone calls your contact center – and before they even get to “Hello,” you know what they might be calling about, how frustrated they might be, and what additional products and services they might be interested in purchasing.

This is just one of the many promises of machine learning: a form of artificial intelligence (AI) that learns from the data itself, rather than from explicit programming. In the contact center example above, machine learning uses inputs ranging from CRM data to voice analysis to add predictive logic to your cu
stomer interactions. (One firm, in fact, cites call center sales efforts improving by over a third after implementing machine learning software.)

Machine learning applications nowadays range from image recognition to predictive analytics. One example of the latter happens every time you log into Facebook: by analyzing your interactions, it makes more intelligent choices about which of your hundreds of friends – and what sponsored content – ends up on your newsfeed. And a recent Forbes article predicts a wealth of new and specialized applications, including helping ships to avoid hitting whales, automating granting employee access credentials, and predicting who is at risk for hospital readmission – before they even leave the hospital the first time!

The common thread between most machine learning applications is deep learning, often fueled by high-speed cloud computing and big data. The data itself is the star of the process: for example, a computer can often learn to play games like an expert, without programming a strategy beforehand, by generating enough moves by trial-and-error to find patterns and create rules. This mimics the way the human brain itself often learns to process information, whether it is learning to walk around in a dark living room at night or finding something in the garage.

Since machine learning is fed by large amounts of data, its benefits can quickly fall apart when this data isn’t accurate. A humorous example of this was when a major department store chain decided (incorrectly) that CNBC host Carol Roth was pregnant – to the point where she was receiving samples of baby formula and other products – and Google targeted her as an older man. Multiply examples like this by the amount of bad data in many contact databases, and the principle of “garbage in, garbage out” can quickly lead to serious costs, particularly with larger datasets.

Putting some numbers to this issue, statistics from IT data quality firm Blazent show that while over two thirds of senior level IT staff intend to make use of machine learning, 60 percent lack confidence in the quality of their data – and 45 percent of their organizations simply react to data errors as they occur. Which is not only costly, but in many cases totally unnecessary: with modern data quality management tools, their absence is too often a matter of inertia or lack of ownership rather than ROI.

Truly unlocking the potential of machine learning will require a marriage between the promise of its applications and the practicalities of data quality. Like most marriages, this will involve good communication and clearly defined responsibilities, within a larger framework of good data governance. Done well, machine learning technology promises to represent another very important step in the process of leveraging your data as an asset.