Posts Tagged ‘Big Data’

A Daisy Chain of Hidden Customer Data Factories

I published the provocatively titled article "Bad Data Costs the United States $3 Trillion per Year" at Harvard Business Review in September 2016. It is of special importance to those who need prospect/customer/contact data in the course of their work.

First read the article.

Consider this figure: $136 billion per year. That’s the research firm IDC’s estimate of the size of the big data market, worldwide, in 2016. This figure should surprise no one with an interest in big data.

But here’s another number: $3.1 trillion, IBM’s estimate of the yearly cost of poor quality data, in the US alone, in 2016. While most people who deal in data every day know that bad data is costly, this figure stuns.

While the two numbers are not directly comparable, and there is considerable variation around each, one can only conclude that right now, improving data quality represents the far larger data opportunity. Leaders are well-advised to develop a deeper appreciation for the opportunities that improving data quality presents and to take fuller advantage of them than they do today.

The reason bad data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive. The data they need has plenty of errors, and in the face of a critical deadline, many individuals simply make corrections themselves to complete the task at hand. They don’t think to reach out to the data creator, explain their requirements, and help eliminate root causes.

Quite quickly, this business of checking the data and making corrections becomes just another fact of work life.  Take a look at the figure below. Department B, in addition to doing its own work, must add steps to accommodate errors created by Department A. It corrects most errors, though some leak through to customers. Thus Department B must also deal with the consequences of those errors that leak through, which may include such issues as angry customers (and bosses!), packages sent to the wrong address, and requests for lower invoices.

The hidden data factory

Visualizing the extra steps required to correct the costly and time consuming data errors.

I call the added steps the "hidden data factory." Companies, government agencies, and other organizations are rife with hidden data factories. Salespeople waste time dealing with bad prospect data; service delivery people waste time correcting flawed customer orders received from sales. Data scientists spend an inordinate amount of time cleaning data; IT expends enormous effort lining up systems that "don't talk." Senior executives hedge their plans because they don't trust the numbers from finance.

Such hidden data factories are expensive. They form the basis for IBM’s $3.1 trillion per year figure. But quite naturally, managers should be more interested in the costs to their own organizations than to the economy as a whole. So consider:

There is no mystery in reducing the costs of bad data — you have to shine a harsh light on those hidden data factories and reduce them as much as possible. The aforementioned Friday Afternoon Measurement and the rule of ten help shine that harsh light. So too does the realization that hidden data factories represent non-value-added work.

To see this, look once more at the process above. If Department A does its work well, then Department B would not need to handle the added steps of finding, correcting, and dealing with the consequences of errors, obviating the need for the hidden factory. No reasonably well-informed external customer would pay more for these steps. Thus, the hidden data factory creates no value. By taking steps to remove these inefficiencies, you can spend more time on the valuable work customers will pay for.

Note that in the very near term, you probably have to continue doing this work. It is simply irresponsible to use bad data or pass it on to a customer. At the same time, all good managers know that they must minimize such work.

It is clear enough that the way to reduce the size of the hidden data factories is to quit making so many errors. In the two-step process above, this means that Department B must reach out to Department A, explain its requirements, cite some example errors, and share measurements. Department A, for its part, must acknowledge that it is the source of added cost to Department B and work diligently to find and eliminate the root causes of error. Those that follow this regimen almost always reduce the costs associated with hidden data factories by two thirds and often by 90% or more.

I don’t want to make this sound simpler than it really is. It requires a new way of thinking. Sorting out your requirements as a customer can take some effort, it is not always clear where the data originate, and there is the occasional root cause that is tough to resolve. Still, the vast majority of data quality issues yield.

Importantly, the benefits of improving data quality go far beyond reduced costs. It is hard to imagine any sort of future in data when so much is so bad. Thus, improving data quality is a gift that keeps giving — it enables you to take out costs permanently and to more easily pursue other data strategies. For all but a few, there is no better opportunity in data.

The article above was originally written for Harvard Business Review and is reprinted with permission.

In January 2018, Service Objects spoke with the author, Tom Redman, and he gave us an update on the article above, particularly as it relates to the subject of data quality.

According to Tom, the original article anticipated people asking, “What’s going on?  Don’t people care about data quality?”

The answer is, “Of course they care.  A lot.  So much that they implement ‘hidden data factories’ to accommodate bad data so they can do their work.”  And the article explored such factories in a generic “two-department” scenario.

Of course, hidden data factories take a lot of time and cost a lot of money, both contributing to the $3T/year figure.  They also don’t work very well, allowing lots of errors to creep through, leading to another hidden data factory.  And another and another, forming a sort of “daisy chain” of hidden data factories.  Thus, when one extends the figure above and narrows the focus to customer data, one gets something like this:

I hope readers see the essential truth this picture conveys and are appalled.  Companies must get in front of data quality and make these hidden data factories go away!

©2018, Data Quality Solutions

Is Your Data Quality Strategy Gold Medal Worthy?

A lot of you – like many of us here at Service Objects – are enjoying watching the 2018 Winter Olympics in Pyeongchang, South Korea this month. Every Olympics is a spectacle where people perform incredible feats of athleticism on the world stage.

Watching these athletes reminds us of how much hard work, preparation, and teamwork go into their success. Most of these athletes spend years behind the scenes perfecting their craft, with the aid of elite coaches, equipment, and sponsors. And the seemingly effortless performances you see are increasingly becoming data-driven as well.

Don’t worry, we aren’t going to put ourselves on the same pedestal as Olympic medalists. But many of the same traits behind successful athletes do also drive reliable real-time API providers for your business. Here are just a few of the qualities you should look for:

The right partners. You probably don’t have access to up-to-the-minute address and contact databases from sources around the world. Or a database of over 400 million phone numbers that is constantly kept current. We do have all of this, and much more – so you can leverage our infrastructure to assure your contact data quality.

The right experience. The average Olympic skater has invested at least three hours a day in training for over a decade by the time you see them landing triple axels on TV, according to Forbes. Likewise, Service Objects has validated nearly three billion transactions since we were founded in 2001, with a server uptime reliability of 99.999 percent.

The right strategy. In sports where success is often measured in fractions of a second, gold medals are never earned by accident: athletes always work against strategic objectives. We follow a strategy as well. Our tools are purpose-built for the needs of over 2500 customers, ranging from marketing to customer service, with capabilities such as precise geolocation of tax data, composite lead quality scores based on over 130 criteria, or fraud detection based on IP address matching. And we never stop learning and growing.

The right tools. Olympic athletes need the very best equipment to be competitive, from ski boots to bobsleds. In much the same way our customers’ success is based around providing the best infrastructure, including enterprise-grade API interfaces, cloud connectors and web hooks for popular CRM, eCommerce and marketing automation platforms, and convenient batch list processing.

The right support. No one reaches Olympic success by themselves – every athlete is backed by a team of coaches, trainers, sponsors and many others. We back our customers with an industry-leading support team as well, including a 24×7 Quick Response Team for urgent mission-critical issues.

The common denominator between elite athletes and industry-leading data providers is that both work hard to be the best at what they do and aren’t afraid to make big investments to get there. And while we can’t offer you a gold, silver, or bronze medal, we can give you a free white paper on how to make your data quality hit the perfect trifecta of being genuine, accurate and up-to-date. Meanwhile, enjoy the Olympics!

Unique US ZIP and Canadian Postal Codes – Now Available for Download

Customer Service Above All.  It is one of our core values here at Service Objects. Recently, we've received several requests for a list of the unique ZIP and postal codes throughout the US and Canada. By leveraging our existing services, we've made this happen. We are now offering both the US and Canada lists as a free downloadable resource.

So why is Service Objects providing this data? Our goal is to provide the best data cleansing solutions possible for our clients. Part of this means using our existing data to provide our users with the data they need. While other data providers might charge for this type of information, we’ve decided to make it freely available for anyone’s use. These files can be used for several purposes, such as pre-populating a list of cities and states for a form where a user needs to enter address information. The County and State FIPS information is widely used in census and demographic data or could be used to uniquely identify States and counties within a database. Additionally, the given time zone information can be used to determine appropriate times to place calls to a customer.

Where to download

The download link provides a .zip file containing two CSV files: one with the US information, the other with the Canadian information. The file names indicate the month and year the records were created. Toward the middle of each month, the data in each file is updated to account for any changes in US and Canadian postal codes.

What other information is in the files?

Both files contain postal codes, states (or provinces for Canada), and time zone information. The Canadian postal code file is much larger, with over 800K records. This is because Canadian postal codes generally cover much smaller areas than US ZIP codes. Where a US ZIP code can sometimes encompass multiple cities or counties, a Canadian postal code can cover a couple of city blocks or, in some cases, a single high-rise building.

The US file has information for all United States ZIP codes, including its territories. It also includes the county each ZIP code lies in, along with County and State FIPS numbers for each record to help with processing that information. The US file is considerably smaller than the Canadian file, at only 41K records.
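As a sketch of the form pre-population use case above, the downloaded CSV can be indexed by ZIP code for fast lookups. The column names here are assumptions for illustration; check the header row of the file you actually download.

```python
import csv
import io

# A few sample rows in the assumed layout of the downloadable US file.
SAMPLE = """ZipCode,City,State,County,StateFIPS,CountyFIPS,TimeZone
93101,Santa Barbara,CA,Santa Barbara,06,083,Pacific
10001,New York,NY,New York,36,061,Eastern
"""

def load_zip_index(csv_file):
    """Index rows by ZIP code for O(1) lookups, e.g. to pre-populate a form."""
    reader = csv.DictReader(csv_file)
    return {row["ZipCode"]: row for row in reader}

index = load_zip_index(io.StringIO(SAMPLE))

def city_state(zip_code):
    """Return (city, state) for a ZIP code, or None if it is unknown."""
    row = index.get(zip_code)
    return (row["City"], row["State"]) if row else None

print(city_state("93101"))  # -> ('Santa Barbara', 'CA')
```

In practice you would open the downloaded file with `load_zip_index(open("us_codes.csv"))` and use the same index to fill in county, FIPS, or time zone fields.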

In making these files freely accessible, our hope is to make the integration and business logic easier for our users. If you’d like to discuss your particular contact data validation needs, feel free to contact us!

Don’t Let Bad Data Scare You This Halloween

Most of us here in North America grew up trick-or-treating on Halloween. But did you know the history behind this day?

In early Celtic culture, the feast of All Hallows Eve (or Allhallowe’en) was a time of remembering the souls of the dead – and at a more practical level, preparing for the “death” of the harvest season and the winter to follow. People wore costumes representing the deceased, who by legend were back on earth to have a party or (depending upon cultural interpretation) cause trouble for one last night, and people gave them alms in the form of soul cakes – which evolved to today’s sweet treats – to sustain them.

So what were people preparing for in celebrating Halloween? Good data quality, of course. Back then, when your “data” consisted of the food you grew, people took precautions to protect it from bad things by taking the preventative measure of feeding the dead. Today, Halloween is a fun celebration that actually has some important parallels for managing your data assets. Here are just a few:

An automated process. The traditions of Halloween let people honor the dead and prepare for the harvest in a predictable, dependable way. Likewise, data quality ultimately revolves around automated tools that take the work – and risk – out of creating a smooth flow of business information.

Organizational buy-in. Unlike many other holidays, Halloween was a community celebration fueled by the collective efforts of everyone. Every household took part in providing alms and protecting the harvest. In much the same way, modern data governance efforts make sure that all of the touch points for your data – when it is entered, and when it is used – follow procedures to ensure clean, error free leads, contacts and e-commerce information.

Threat awareness. Halloween was designed to warn people away from the bad guys – for example, the bright glow of a Jack-o-lantern was meant to keep people away from the spirit trapped inside. Today, data quality tools like order and credit card BIN validation keep your business away from the modern-day ghouls that perpetrate fraud.

An ounce of prevention. This is the big one. Halloween represented a small offering to the dead designed to prevent greater harm. When it comes to your data, prevention is dramatically more cost-effective than dealing with the after-effects of bad data: this is an example of the 1-10-100 rule, where you can spend one dollar preventing data problems, ten dollars correcting them, or one hundred dollars dealing with the consequences of leaving them unchecked.

These costs range from the unwanted marketing costs of bad or fraudulent leads to the cost in lost products, market share and customer good will when you ship things to the wrong address. And this doesn’t even count some of the potentially big costs for compliance violations, such as the Telephone Consumer Protection Act (TCPA) for outbound telemarketing, the CAN-SPAM act for email marketing, sales and use tax mistakes, and more.
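The arithmetic behind the 1-10-100 rule is easy to make concrete. This is an illustrative model only, with round-number unit costs taken from the rule itself, not from any particular company's books:

```python
def cost_of_bad_records(prevented, corrected, unchecked):
    """Illustrative cost model for the 1-10-100 rule: $1 to prevent a bad
    record at entry, $10 to correct it downstream, $100 when it reaches a
    customer unchecked."""
    return prevented * 1 + corrected * 10 + unchecked * 100

# Two ways of handling the same 1,000 bad records:
# prevent 90% at entry, versus catch 90% only after they are in the system.
upfront = cost_of_bad_records(prevented=900, corrected=90, unchecked=10)
downstream = cost_of_bad_records(prevented=0, corrected=900, unchecked=100)
print(upfront, downstream)  # 2800 19000
```

Even with most errors still caught downstream, shifting effort to prevention cuts the bill by nearly a factor of seven in this toy scenario.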

So now you know: once upon a time, people mitigated threats to their data by handing out baked goods to people in costumes. Now they simply call Service Objects, to implement low-cost solutions to “treat” their data with API-based and batch-process solutions. And just like Halloween, if you knock on our door we’ll give you a sample of any of our products for free! For smart data managers, it’s just the trick.

The Talent Gap In Data Analytics

According to a recent blog by Villanova University, the amount of data generated annually has grown tremendously over the last two decades due to increased web connectivity, as well as the ever-growing popularity of internet-enabled mobile devices. Some organizations have found it difficult to take advantage of the data at their disposal due to a shortage of data-analytics experts. The shortage primarily affects small-to-medium enterprises (SMEs), which struggle to match the salaries offered by larger businesses. This shortage of qualified and experienced professionals is creating a unique opportunity for those looking to break into a data-analysis career.

Below is some more information on this topic.

Data-analytics career outlook

Job openings for computer and research scientists are expected to grow by 11 percent from 2014 to 2024. In comparison, job openings for all occupations are projected to grow by 7 percent over the same period. Besides this, 82 percent of organizations in the US say that they are planning to advertise positions that require data-analytics expertise. This is in addition to 72 percent of organizations that have already hired talent to fill open analytics positions in the last year. However, up to 78 percent of businesses say they have experienced challenges filling open data-analytics positions over the last 12 months.

Data-analytics skills

The skills that data scientists require vary depending on the nature of data to be analyzed as well as the scale and scope of analytical work. Nevertheless, analytics experts require a wide range of skills to excel. For starters, data scientists say they spend up to 60 percent of their time cleaning and aggregating data. This is necessary because most of the data that organizations collect is unstructured and comes from diverse sources. Making sense of such data is challenging, because the majority of modern databases and data-analytics tools only support structured data. Besides this, data scientists spend at least 19 percent of their time collecting data sets from different sources.

Common job responsibilities

To start with, 69 percent of data scientists perform exploratory data-analytics tasks, which in turn form the basis for more in-depth querying. Moreover, 61 percent perform analytics with the aim of answering specific questions, 58 percent are expected to deliver actionable insights to decision-makers, and 53 percent undertake data cleaning. Additionally, 49 percent are tasked with creating data visualizations, 47 percent leverage data wrangling to identify problems that can be resolved via data-driven processes, and 43 percent perform feature extraction, while 43 percent have the responsibility of developing data-based prototype models.

In-demand programming-language skills

In-depth understanding of SQL is a key requirement cited in 56 percent of job listings for data scientists. Other leading programming-language skills include Hadoop (49 percent of job listings), Python (39 percent), Java (36 percent), and R (32 percent).

The big-data revolution

The big-data revolution witnessed in the last few years has substantially changed the way businesses operate. In fact, 78 percent of corporate organizations believe big data is likely to fundamentally change their operational style over the next three years, while 71 percent of businesses expect the same resource to spawn new revenue opportunities. Only 58 percent of executives believe that their employer has the capability to leverage the power of big data. Nevertheless, 53 percent of companies are planning to roll out data-driven initiatives in the next 12 months.

Recruiting Trends

Companies across all industries are facing a serious shortage of experienced data scientists, which means they risk losing business opportunities to firms that have found the right talent. Common responsibilities among these professionals include developing data visualizations, collecting data, cleaning and aggregating unstructured data, and delivering actionable insights to decision-makers. Leading employers span the financial services, marketing, corporate, and technology sectors.

View the full infographic created by Villanova University’s Online Master of Science in Analytics degree program.

Reprinted with permission.

What Can We Do? Service Objects Responds to Hurricane Harvey

The Service Objects team watched the steady stream of images from Hurricane Harvey and its aftermath, and we wanted to know: what can we do to help? We realized the best thing we could do is offer our expertise and services free to those who can make the most use of them – the emergency management agencies dedicated to helping those affected by this disaster.

We quickly realized that as Hurricane Harvey continues to cause record floodwaters, with entire neighborhoods under water, these agencies are finding it nearly impossible to locate specific addresses in need of critical assistance. In response, we are offering emergency management groups the ability to quickly pinpoint addresses with latitude and longitude coordinates through unlimited, no-cost access to DOTS Address Geocode℠ (AG-US). By using Address Geocode, agencies will not have to rely on potentially incomplete online maps. Instead, using Service Objects' advanced address mapping services, they will be able to reliably identify specific latitude and longitude coordinates in real time and serve those in need.

“The fallout of the catastrophic floods in Texas is beyond description, and over one million locations in Houston alone have been affected,” said Geoff Grow, CEO and Founder of Service Objects.  “With more than 450,000 people likely to seek federal aid in recovering from this disaster, Service Objects is providing advanced address mapping to help emergency management agencies distribute recovery funds as quickly as possible. We are committed to helping those affected by Hurricane Harvey.”

In addition, as disaster relief efforts are getting underway, Service Objects will provide free access to our address validation products to enable emergency management agencies to quickly distribute recovery funds by address type, geoid, county, census block, and census tract. These data points are required by the federal government to release funding.  This will allow those starting the recovery process from this natural disaster to get next level services as soon as possible.

To get access to Service Objects address solutions or request maps, qualified agencies can contact Service Objects directly by calling 805-963-1700 or by emailing us at

Our team wishes the best for all those affected by Hurricane Harvey.

Image by National Weather Service 

How Millennials Will Impact Your Data Quality Strategy

The so-called Millennial generation now represents the single largest population group in the United States. If they don’t already, they will soon represent your largest base of customers, and a majority of the work force. What does that mean for the rest of us?

It doesn’t necessarily mean that you have to start playing Adele on your hold music, or offering free-range organic lattes in the company cafeteria. What it does mean, according to numerous social observers, is that expectations of quality are changing radically.

The Baby Boomer generation, now dethroned as the largest population group, grew up in a world of amazing technological and social change – but also a world where wrong numbers and shoddy products were an annoying but inevitable part of life. Generation X and Y never completely escaped this either:  ask anyone who ever drove a Yugo or sat on an airport tarmac for hours. But there is growing evidence that millennials, who came of age in a world where consumer choices are as close as their smartphones, are much more likely to abandon your brand if you don’t deliver.

This demographic change also means you can no longer depend on your father's enterprise data strategy, with its focus on things like security and privacy. For one thing, according to USA Today, millennials couldn't care less about privacy. The generation that grew up oversharing on Instagram and Facebook understands that in a world where information is free, they – and others – are the product. Everyone agrees, however, that what they do care about is access to quality data.

This also extends to how you manage a changing workforce. According to this article, which notes that millennials will make up three quarters of the workforce by 2020, dirty data will become a business liability: data that can't be trusted for strategic purposes, whether it is being used to address revenues, costs, or risk. This makes millennials much more likely to demand automated strategies for data quality and data governance, and to push to engineer these capabilities into the enterprise.

Here’s our take: more than ever, the next generation of both consumers and employees will expect data to simply work. There will be less tolerance than ever for bad addresses, mis-delivered orders and unwanted telemarketing. And when young professionals are launching a marketing campaign, serving their customers, or rolling out a new technology, working with a database riddled with bad contacts or missing information will feel like having one foot on the accelerator and one foot on the brake.

We are already a couple of steps ahead of the millennials – our focus is on API-based tools that are built right into your applications, linking them in real time to authoritative data sources like the USPS as well as a host of proprietary databases. They help ensure clean data at the point of entry AND at the time of use, for everything from contact data to scoring the quality of a marketing lead. These tools can also fuel their e-commerce capabilities by automating sales and use tax calculations, or ensure regulatory compliance with telephone consumer protection regulations.

In a world where an increasing number of both our customers and employees will have been born in the 21st century, and big data becomes a fact of modern life, change is inevitable in the way we do business. We like this trend, and feel it points the way towards a world where automated data quality finally becomes a reality for most of us.

How secure is your ‘Data at Rest’?

In a world where millions of customer and contact records are commonly stolen, how do you keep your data safe? First, lock the door to your office. Now you’re good, right? Oh wait, you are still connected to the internet. Disconnect from the internet. Now you’re good, right? What if someone sneaks into the office and accesses your computer? Unplug your computer completely. You know what, while you are at it, pack your computer into some plain boxes to disguise it. Oh wait, this is crazy, not very practical and only somewhat secure.

The point is, as we try to determine what kind of security we need, we also have to find a balance between functionality and security. A lot of this depends on the type of data we are trying to protect. Is it financial, healthcare, or government related, or is it personal, like pictures from the last family camping trip? All of these have different requirements, and many of them are our clients' requirements. As a company dealing with such diverse clientele, Service Objects needs to be ready to handle data and keep it as secure as possible, in all the different states that digital data can exist.

So what are the states that digital data can exist in? Understanding these states should inform any data security strategy. For the most part, data exists in three states: data in motion/transit, data at rest/endpoint, and data in use. They are defined as:

Data in motion/transit

“…meaning it moves through the network to the outside world via email, instant messaging, peer-to-peer (P2P), FTP, or other communication mechanisms.” –

Data at rest/endpoint

“data at rest, meaning it resides in files systems, distributed desktops and large centralized data stores, databases, or other storage centers” –

“data at the endpoint, meaning it resides at network endpoints such as laptops, USB devices, external drives, CD/DVDs, archived tapes, MP3 players, iPhones, or other highly mobile devices” –

Data in use

“Data in use is an information technology term referring to active data which is stored in a non-persistent digital state typically in computer random access memory (RAM), CPU caches, or CPU registers. Data in use is used as a complement to the terms data in transit and data at rest which together define the three states of digital data.” –

The focus of this discussion is how Service Objects balances functionality and security for our clients' data at rest during our automated batch processing. Our automated batch process consists of this basic flow:

  • Our client transfers a file to a file structure in our systems using our secure ftp. [This is an example of Data in motion/transit]
  • The file waits momentarily before an automated process picks it up. [This is an example of Data at rest]
  • Once our system detects a new file; [The data is now in the state of Data in use]
    • It opens and processes the file.
    • The results are written into an output file and saved to our secure ftp location.
  • Input and output files remain in the secure ftp location until client retrieves them. [Data at rest]
  • Client retrieves the output file. [Data in motion/transit]
    • Client can immediately choose to delete all, some or no files as per their needs.
  • Five days after processing, if any files exist, the automated system encrypts the files (minimum 256-bit encryption) and moves them off of the secure ftp to another secure location. Any non-encrypted version is no longer present. [Data at rest and data in motion/transit]
    • This delay gives clients time to retrieve the results.
  • 30 days after processing, the encrypted version is completely purged.
    • This provides a last chance, in the event of an error or emergency, to retrieve the data.
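The retention rules in the flow above can be sketched as a simple decision function. This is an illustrative model only, with the 5- and 30-day defaults taken from the description above; the actual system, and the per-client grace periods it supports, are internal to Service Objects.

```python
from datetime import datetime, timedelta

def next_action(processed_at, now, encrypt_after_days=5, purge_after_days=30):
    """Return the lifecycle action due for a batch file at time `now`."""
    age = now - processed_at
    if age >= timedelta(days=purge_after_days):
        return "purge"             # encrypted copy is deleted entirely
    if age >= timedelta(days=encrypt_after_days):
        return "encrypt-and-move"  # plaintext leaves the secure ftp
    return "wait"                  # client may still retrieve or delete it

t0 = datetime(2018, 1, 1)
print(next_action(t0, t0 + timedelta(days=2)))   # wait
print(next_action(t0, t0 + timedelta(days=7)))   # encrypt-and-move
print(next_action(t0, t0 + timedelta(days=31)))  # purge
```

A scheduled job sweeping the ftp location with a function like this is all the orchestration the 5/30-day policy requires; the configurable arguments model the customized grace periods discussed below.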

We encrypt files five days after processing, but what is the strategy for keeping the files secure prior to the five-day expiration? First off, we determined that the five- and 30-day rules were the best balance between functionality and security. But we also added flexibility.

If clients always picked up their files right when they were completed, we really wouldn't need to think too much about security as the files sat on the secure ftp. But this is real life: people get busy, take long weekends, go on vacation, or simply forget. Whatever the reason, Service Objects can't immediately encrypt and move the data; if we did, clients would become frustrated trying to coordinate the retrieval of their data. So we built in the five- and 30-day rules, but we also added the ability to change these grace periods and customize them to our clients' needs. This doesn't prevent anyone from purging their data sooner than any predefined threshold, and in fact, we encourage it.

When we are setting up the automated batch process for a client, we look at the type of data coming in, and if appropriate, we suggest to the client that they may want to send the file to us encrypted. For many companies this is standard practice. Whenever we see any data that could be deemed sensitive, we let our client know.

When it is established that files need to be encrypted at rest, we use industry standard encryption/decryption methods. When a file comes in and processing begins, the data is now in use, so the file is decrypted. After processing, any decrypted file is purged and what remains is the encrypted version of the input and output files.

Not all clients are concerned with or require this level of security, but Service Objects treats all data the same: with the utmost care and the highest reasonable level of security. We simply take no chances and always encourage strong data security.

Big Data – Applied to Day to Day Life

With so much data being constantly collected, it’s easy to get lost in how all of it is applied in our real lives. Let’s take a quick look at a few examples starting with one that most of us encounter daily.

Online Forms

One of the most common and easiest-to-understand instances we come across on a daily basis is completing online forms. When we complete an online form, our contact record data points, such as name, email, phone, and address, are individually verified and corrected in real time to ensure each piece of data is genuine, accurate, and up to date. Not only does this verification process help companies mitigate fraud, it also ensures that the submitted data is correct. Confidence in data accuracy allows for streamlined online purchases and efficient deliveries to us, the customers. Having our accurate information in the company’s database also helps streamline customer service should there be a discrepancy with the purchase, or should we have follow-up questions about the product. The company can easily pull up our information with any of the data points initially provided (name, email, phone, address, and more) to start resolving the issue faster than ever – at least where companies are dedicated to good customer service.
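To make the per-field idea concrete, here is a toy, syntax-only sketch of form checking. A real validation service goes far beyond pattern matching – confirming a mailbox actually exists, checking an address against USPS records, determining phone line type, and so on – so treat the regexes and field names below as illustrative assumptions:

```python
import re

# Illustrative only: real-time validation services verify data against live
# authoritative sources, not just the format of the submitted string.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\D*(\d\D*){10}$")  # exactly 10 digits, any punctuation

def validate_contact(record: dict) -> dict:
    """Return a per-field pass/fail map for a submitted form record."""
    return {
        "name": bool(record.get("name", "").strip()),
        "email": bool(EMAIL_RE.match(record.get("email", ""))),
        "phone": bool(PHONE_RE.match(record.get("phone", ""))),
    }
```

Each field is checked independently, which mirrors how individual data points on a form can be verified (and corrected) in real time before submission.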

For the most part, we are all familiar with business scenarios like the one described above. Let’s shift to India and New Orleans for a couple of new examples of how cities are applying data to improve the day-to-day lives of citizens.

Addressing the Unaddressed in India

According to the U.S. Census Bureau, India is the second most populated country in the world, with 1,281,935,911 people. With such a large population, there is a shortage of affordable housing in many developed cities, leading to about 37 million households residing in unofficial housing areas referred to as slums. Being “unofficial” housing areas means they are neither mapped nor addressed, leaving residents with very little in terms of identification. However, the Community Foundation of Ireland (a Dublin-based non-profit organization) and the Hope Foundation recently began working together to provide each home in Kolkata’s Chetla slum with its very first form of address, consisting of a nine-digit unique ID. Besides overcoming obvious challenges, like giving someone directions to their home and finally being able to receive mail, the implementation of addresses has given residents the ability to open bank accounts and access social benefits. Having addresses has also helped officials identify needs in the slum, including healthcare and education.

Smoke Detectors in New Orleans

A recent article from The Wall Street Journal, The Rise of the Smart City, highlights how cities closer to home have started using data to bring about citywide enhancements. New Orleans, in particular, is ensuring that high-risk properties are provided smoke detectors. Although the fire department has been distributing smoke detectors for years, residents were required to request them. To change this, the city’s Office of Performance and Accountability used Census Bureau surveys and other data, along with advanced machine-learning techniques, to create a map for the fire department that better targets areas more susceptible to deaths caused by fire. With the application of big data, more homes are being supplied with smoke detectors, increasing safety for entire neighborhoods and the city as a whole.

FIRE RISK | By combining census with additional data points, New Orleans mapped the combined risk of missing smoke alarms and fire deaths, helping officials target distribution of smoke detectors. PHOTO: CITY OF NEW ORLEANS/OPA

While these are merely a few examples of how data is applied to our day to day lives around the world, I hope they helped make “Big Data” a bit more relatable. Let us know if we can answer any questions about how data solutions can be applied to help your company as well.

Celebrating Earth Day

April 22 marks the annual celebration of Earth Day, a day of environmental awareness that is now approaching its first half century. Founded by US Senator Gaylord Nelson in 1970 as a nationwide teach-in on the environment, Earth Day is now the largest secular observance in the world, celebrated by over a billion people.

Earth Day has a special meaning here in our hometown of Santa Barbara, California. It was a massive 1969 oil spill off our coast that first led Senator Nelson to propose a day of public awareness and political action. Both were sorely needed back then: the first Earth Day came at a time when there was no US Environmental Protection Agency, environmental groups such as Greenpeace and the Natural Resources Defense Council were in their infancy, and pollution was simply a fact of life for many people.

If you visit our hometown today, you will find the spirit of Earth Day to be alive and well. We love our beaches and the outdoors, this area boasts over 50 local environmental organizations, and our city recently approved a master plan for bicycles that recognizes the importance of clean human-powered transportation. And in general, the level of environmental and conservation awareness here is part of the culture of this beautiful place.

Earth Day

It also has a special meaning for us here at Service Objects. Our founder and CEO Geoff Grow, an ardent environmentalist, started this company from an explicit desire to apply mathematics to the problem of wasted resources from incorrect and duplicate mailings. Today, our concern for the environment is codified as one of the company’s four core values, which reads as follows:

“Corporate Conservation – In addition to preventing about 300 tons of paper from landing in landfills each month with our Address Validation APIs, we practice what we preach: we recycle, use highly efficient virtualized servers, and use sustainable office supplies. Every employee is conscious of how they can positively impact our conservation efforts.”

Today, as Earth Day nears the end of its fifth decade, and Service Objects marks over 15 years in business, our own contributions to the environment have continued to grow. Here are just a few of the numbers behind the impact of our data validation products – so far, we have saved:

  • Over 85 thousand tons of paper
  • A million and a half trees
  • 32 million gallons of oil
  • More than half a billion gallons of water
  • Close to 50 million pounds of air pollution
  • A quarter of a million cubic yards of landfill space
  • 346 million kWh of energy

All of this is an outgrowth of more than two and a half billion transactions validated – and counting! (If you are ever curious about how we are doing in the future, just check the main page of our website: there is a real-time clock with the latest totals there.) And we are always looking for ways to continue making lives better through data validation tools.

We hope you, too, will join us in celebrating Earth Day. And the best way possible to do this is to examine the impact of your own business and community on the environment, and take positive steps to make the earth a better place. Even small changes can create a big impact over time. The original Earth Day was the catalyst for a movement that has made a real difference in our world – and by working together, there is much more good to come!

Medical Data is Bigger than You May Think

What do medical centers have in common with businesses like Uber, Travelocity, or Amazon? They have a treasure trove of data, that’s what! The quality of that data and what’s done with it can help organizations work more efficiently, more profitably, and more competitively. More importantly for medical centers, data quality can lead to even better quality care.

Here’s just a brief sampling of the types of data a typical hospital, clinic, or medical center generates:

  • Patient contact information
  • Medical records with health histories
  • Insurance records
  • Payment information
  • Geographic data for determining “Prime Distance” and “Drive Time Standards”
  • Employee and payroll data
  • Ambulance response times
  • Vaccination data
  • Patient satisfaction data

Within each of these categories, there may be massive amounts of sub-data, too. For example, medical billing relies on tens of thousands of medical codes. For a single patient, several addresses may be collected, such as the patient’s home and mailing addresses, the insurance company’s billing address, the employer’s address, and so forth.

This data must be collected, validated for accuracy, and managed, all in compliance with rigorous privacy and security regulations. Plus, it’s not just big data, it’s important data. A simple transposed number in an address can mean the difference between getting paid promptly or not at all. A pharmaceutical mix-up could mean the difference between life and death.

With so much important data, it’s easy to get overwhelmed. Who’s responsible? How is data quality ensured? How is it managed? Several roles can be involved:

  • Data stewards – Develop data governance policies and procedures.
  • Data owners – Generate the data and implement the policies and procedures.
  • Business users – Analyze and make use of the data.
  • Data managers – Information systems managers and developers who implement and manage the tools needed to capture, validate, and analyze the data.

Defining a data quality vision, assembling a data team, and investing in appropriate technology is a must. With the right team and data validation tools in place, medical centers and any organization can get serious about data and data quality.

How Can Data Quality Lead to Quality Care?

Having the most accurate, authoritative and up-to-date information for patients can positively impact organizations in many ways. For example, when patients move, they don’t always think to inform their doctors, labs, hospitals, or radiology centers. With a real-time address validation API, not only could you instantly validate a patient’s address for billing and marketing purposes, you could confirm that the patient still lives within the insurance company’s “prime distance” radius before treatment begins.
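For illustration, a straight-line “prime distance” radius check could be as simple as a haversine calculation on the validated address’s coordinates. The 30-mile default and the coordinates here are hypothetical assumptions (and some plans measure drive time rather than straight-line distance):

```python
from math import radians, sin, cos, asin, sqrt

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in miles via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3958.8 * 2 * asin(sqrt(a))  # 3958.8 = Earth's mean radius in miles

def within_prime_distance(patient, facility, radius_miles=30):
    """True if a validated patient location falls inside the plan's radius."""
    return miles_between(*patient, *facility) <= radius_miles
```

An address validation service that returns latitude/longitude for a corrected address would feed directly into a check like this before treatment is scheduled.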

Accurate address and demographic data can trim mailing costs and improve patient satisfaction with appropriate timing and personalization. Meanwhile, aggregated health data could be analyzed to look at health outcomes or reach out to patients proactively based on trends or health histories. Just as online retailers recommend products based on past purchases or purchases by customers like you, medical providers can use big data to recommend screenings based on health factors or demographic trends.

Developing a data quality initiative is a major, but worthwhile, undertaking for all types of organizations — and you don’t have to figure it all out on your own. Contact Service Objects today to learn more about our data validation tools.

Data Monetization: Leveraging Your Data as an Asset

Everyone knows that Michael Dell built a giant computer business from scratch in a college dorm room. Less well known is how he got started: by selling newspaper subscriptions in his hometown of Houston.

You see, most newspaper salespeople took lists of prospects and started cold-calling them. Most weren’t interested. In his biography, Dell describes using a different strategy: he used public records to find out who had recently married or purchased a house – both groups that were much more likely to want new newspaper subscriptions – and pitched to them. He was so successful that he eventually surprised his parents by driving off to college in a new BMW.

This is an example of data monetization – the use of data as a revenue source to improve your bottom line. Dell used an example of indirect data monetization, where data makes your sales process or other operations more effective. There is also direct data monetization, where you profit directly from the sale of your data, or the intelligence attached to it.

Data monetization has become big business nowadays. According to PWC consulting firm Strategy&, the market for commercializing data is projected to grow to US $300 billion annually in the financial services sector alone, while business intelligence analyst Jeff Morris predicts a US $5 billion-plus market for retail data analytics by 2020. Even Michael Dell, clearly remembering his newspaper-selling days, is now predicting that data analytics will be the next trillion-dollar market.

This growth market is clearly being driven by massive growth in data sources themselves, ranging from social media to the Internet of Things (IoT) – there is now income and insight to be gained out of everything from Facebook posts to remote sensing devices. But for most businesses, the first and easiest source of data monetization lies in their contact and CRM data.

Understanding the behaviors and preferences of customers, prospects and stakeholders is the key to indirect data monetization (such as targeted offers and better response rates), and sometimes direct data monetization (such as selling contact lists or analytical insight). In both cases, your success lives or dies on data quality. Here’s why:

  • Bad data makes your insights worthless. For example, if you are analyzing the purchasing behavior of your prospects, and many of them entered false names or contact information to obtain free information, then what “Donald Duck” does may have little bearing on data from qualified purchasers.
  • The reputational cost of inaccurate data goes up substantially when you attempt to monetize it – for example, imagine sending offers of repeat business to new prospects, or vice-versa.
  • As big data gets bigger, the human and financial costs of responding to inaccurate information rise proportionately.

Information Builders CIO Rado Kotorov puts it very succinctly: “Data monetization projects can only be successful if the data at hand is cleansed and ready for analysis.” This underscores the importance of using inexpensive, automated data verification and validation tools as part of your system. With the right partner, data monetization can become an important part of both your revenue stream and your brand – as you become known as a business that gives more customers what they want, more often.

Marketers and Data Scientists Improving Data Quality and Marketing Results Together

In the era of big data, marketing professionals have added basic data analysis to their toolboxes. However, the data they’re dealing with often requires significantly deeper analysis, and data quality (Is it Accurate? Current? Authentic?) is a huge concern. Thus, data scientists and marketers are more often working side by side to improve campaign efficiencies and results.

What is a Data Scientist?

Harvard Business Review called the data scientist profession “the sexiest job of the 21st century” and described the role of data scientist as “a hybrid of data hacker, analyst, communicator, and trusted adviser.”

The term data scientist itself is relatively new, with many data scientists lacking what we might call a data science degree. Rather, they may have a background in business, statistics, math, economics, or analytics. Data scientists understand business, patterns, and numbers. They tend to enjoy looking at diverse sets of data in search of similarities, differences, trends, and other discoveries. The ability to understand and communicate their discoveries makes data scientists a valuable addition to any marketing team.

Data scientists are in demand and command high salaries. In fact, Robert Half Technology’s 2017 Salary Guides suggest that data scientists will see a 6.5 percent bump in pay compared to 2016 (and their average starting salary range is already an impressive $116,000 to $163,500).

Why are Marketers Working with Data Scientists?

Marketers must deal with massive amounts of data and are increasingly concerned about data quality. They recognize that there’s likely valuable information buried within the data, yet making those discoveries requires time, expertise, and tools — each of which pulls them away from their other important tasks. Likewise, even the tried-and-true act of sending direct mail to the masses can benefit from a data scientist who can both dig into the demographic requirements and ensure data quality by cross-referencing address data against USPS databases.

In short, marketers need those data hackers, analysts, communicators, and trusted advisers in order to make sense of the data and ensure data quality.

A Look at the Marketer – Data Scientist Relationship

As with any collaboration, marketers and data scientists occasionally have differences. They come from different academic backgrounds, and have different perspectives. A marketer, for example, is highly creative whereas a data scientist is more accustomed to analyzing data.

However, when sharing a common goal and understanding their roles in achieving it, marketers and data scientists can forge a worthwhile partnership that positively impacts business success.

We all know that you’re only as good as your data, making data quality a top shared concern between marketers and data scientists alike. Using tools such as data validation APIs, data scientists ensure that the information marketers have is as accurate, authoritative, and up to date as possible. Whether pinpointing geographical trends or validating addresses prior to a massive direct mail campaign, the collaboration between marketers and data scientists leads to increased campaign efficiencies, results, and, ultimately, increased revenue for the company as a whole.

The Role of a Chief Data Officer

According to a recent article in Information Management, nearly two-thirds of CIOs want to hire Chief Data Officers (CDOs) over the next year. Why is this dramatic transformation taking place, and what does it mean for you and your organization?

More than anything, the rise of the CDO recognizes the growing role of data as a strategic corporate asset. Decades ago, organizations were focused on automating specific functions within their individual silos. Later, enterprise-level computing like CRM and ERP helped them reap the benefits of data interoperability. And today, trends such as big data and data mining have brought the strategic value of data front and center.

This means that the need is greater than ever for a central, C-level resource who has both a policy-making and advocacy role for an organization’s data. This role generally encompasses data standards, data governance, and the oversight of data metrics. A CDO’s responsibilities can be as specific as naming conventions and standards for common data, and as broad as overseeing enterprise data management and business intelligence software. They are ultimately accountable for maximizing the ROI of an organization’s data assets.

A key part of this role is oversight of data quality. Bad data represents a tangible cost across the organization, including wasted marketing efforts, misdirected product shipments, reduced customer satisfaction, and fraud, tax and compliance issues, among other factors. More important, without a consistent infrastructure for data quality, the many potential sources of bad data can fall through the cracks without insight or accountability. It is an exact analogy to how quality assurance strategies have evolved for manufacturing, software or other areas.

A recent report from the Gartner Group underscored the uphill battle that data quality efforts still face in most organizations: while those surveyed believed that data quality issues were costing each of them US $9.7 million annually on average, most are still seeking justification to address data quality as a priority. Moreover, Gartner concludes that many current efforts to remediate data quality simply encourage line-of-business staff to abandon their own data responsibilities. Their recommendations include making a business case for data quality, linking data quality and business metrics, and above all shifting the mindset of data quality practitioners from being “doers” to being facilitators.

This, in turn, is helping fuel the rise of the central CDO – a role that serves as both a policymaker and an evangelist. In the former role, their job is to create an infrastructure for data quality and deploy it across the entire organization. In the latter role, they must educate their organizations about the ROI of a consistent, measurable approach to data, as well as the real costs and competitive disadvantage of not having one – particularly as more and more organizations add formal C-level responsibility for data to their boardrooms.

Service Objects has long focused on this transition by creating interoperable tools that automate the process of contact data verification, for functions ranging from address and email validation to quantitative lead scoring. We help organizations make data quality a seamless part of their infrastructure, using API and web-based interfaces that tap into global databases of contact information. These efforts have quickly gained acceptance in the marketplace: last year alone, CIO Review named us as one of the 20 most promising API solution providers. And nowadays, in this new era of the Chief Data Officer, our goal as a solutions provider is to support their mission of overseeing data quality.

The Importance of Data Accuracy in Machine Learning

Imagine that someone calls your contact center – and before they even get to “Hello,” you know what they might be calling about, how frustrated they might be, and what additional products and services they might be interested in purchasing.

This is just one of the many promises of machine learning: a form of artificial intelligence (AI) that learns from the data itself, rather than from explicit programming. In the contact center example above, machine learning uses inputs ranging from CRM data to voice analysis to add predictive logic to your customer interactions. (One firm, in fact, cites call center sales efforts improving by over a third after implementing machine learning software.)

Machine learning applications nowadays range from image recognition to predictive analytics. One example of the latter happens every time you log into Facebook: by analyzing your interactions, it makes more intelligent choices about which of your hundreds of friends – and what sponsored content – ends up on your newsfeed. And a recent Forbes article predicts a wealth of new and specialized applications, including helping ships to avoid hitting whales, automating granting employee access credentials, and predicting who is at risk for hospital readmission – before they even leave the hospital the first time!

The common thread between most machine learning applications is deep learning, often fueled by high-speed cloud computing and big data. The data itself is the star of the process: for example, a computer can often learn to play games like an expert, without programming a strategy beforehand, by generating enough moves by trial-and-error to find patterns and create rules. This mimics the way the human brain itself often learns to process information, whether it is learning to walk around in a dark living room at night or finding something in the garage.
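The trial-and-error idea can be shown in a few lines of tabular Q-learning on a toy “game”: an agent in a five-cell corridor that is rewarded only for reaching the last cell. Nothing below encodes a strategy; the right-moving policy emerges purely from repeated play. The game, constants, and code are illustrative, not any production system:

```python
import random

random.seed(0)
N, ACTIONS = 5, (-1, +1)          # states 0..4; actions: step left / step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

for _ in range(500):              # episodes of pure trial and error
    s = 0
    while s != N - 1:
        # explore 20% of the time, otherwise act greedily on learned values
        if random.random() < 0.2:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0   # reward only at the goal cell
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# the learned policy: always move right, discovered without being programmed in
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)]
```

After enough episodes the value table encodes the pattern “moving right pays off,” exactly the kind of rule found by generating moves rather than programming a strategy beforehand.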

Since machine learning is fed by large amounts of data, its benefits can quickly fall apart when this data isn’t accurate. A humorous example of this was when a major department store chain decided (incorrectly) that CNBC host Carol Roth was pregnant – to the point where she was receiving samples of baby formula and other products – and Google targeted her as an older man. Multiply examples like this by the amount of bad data in many contact databases, and the principle of “garbage in, garbage out” can quickly lead to serious costs, particularly with larger datasets.

Putting some numbers to this issue, statistics from IT data quality firm Blazent show that while over two-thirds of senior-level IT staff intend to make use of machine learning, 60 percent lack confidence in the quality of their data – and 45 percent of their organizations simply react to data errors as they occur. This is not only costly but in many cases totally unnecessary: modern data quality management tools are readily available, and their absence is too often a matter of inertia or lack of ownership rather than ROI.

Truly unlocking the potential of machine learning will require a marriage between the promise of its applications and the practicalities of data quality. Like most marriages, this will involve good communication and clearly defined responsibilities, within a larger framework of good data governance. Done well, machine learning technology promises to represent another very important step in the process of leveraging your data as an asset.

The Role of a Data Steward

If you have ever dined at a *really* fine restaurant, it may have featured a wine steward: a person formally trained and certified to oversee every aspect of the restaurant’s wine collection. A sommelier, as they are known, not only tastes wines before serving them but sets policy for wine acquisition and its pairings with food, among other responsibilities. Training for this role may involve as much as a two-year college degree.

This is a good metaphor for a growing role in technology and business organizations – that of a data steward. Unlike a database administrator, who takes functional responsibility for repositories of data, a data steward has a broader role encompassing policies, procedures, and data quality. In a very real sense, a data steward is responsible for managing the overall value and long-term sustainability of an organization’s data assets.

According to Dataversity, the key role of a data steward is that they own an organization’s data. This links to the historical definition of a steward, from the Middle Ages – one who oversees the affairs of someone’s estate. This means that an effective data steward needs a broad background including areas like programming and database skills, data modeling and warehousing expertise, and above all good communications skills and business visibility. In larger organizations, Gartner sees this role as becoming increasingly formalized as a C-level position title, either as Chief Data Officer or incorporated as part of another C-level IT officer’s responsibilities.

One of the key advantages of having a formal data steward is that someone is accountable for your data quality. Too often, even in large organizations, this job falls to no one. Frequently individual stakeholders are responsible for data entry or data usage, and the process of strategically addressing bad data would only add to their workload. This is an example of the tragedy of the commons, where no one takes responsibility for the common good, and the organization ultimately incurs costs in time, missed marketing opportunities or poor customer relations by living with subpar data quality.

Another advantage of a data steward is that someone is tasked with evaluating and acquiring the right infrastructure for optimizing the value of your data. For example, automated tools exist that not only flag or correct contact data for accuracy, but enhance its value by appending publicly available information such as phone numbers or geographic locations. Or help control fraud and waste by screening your contact data per numerous criteria, and then assigning a quantitative lead score. Ironically, these tools are often inexpensive and make everyone’s life easier, but having a data steward can prevent a situation where implementing these tools is no one’s responsibility.

Looking at a formal role of data stewardship in your own organization is a sign that you take data seriously as an asset, and can start making smart moves to protect and expand its value. It helps you think strategically about your data, and teach everyone to be accountable for their role in it. This, in turn, can become the key to leveraging your organization’s data as a competitive advantage.

Data Quality and the Environment

Service Objects recently celebrated our 15th year in business and it made me reflect on something that is important to me and is an underappreciated reason for improving your data quality: protecting our environmental resources.

Lots of companies talk about protecting the environment. Hotels ask you to re-use your towels, workplaces encourage you to recycle, and restaurants sometimes forego that automatic glass of ice water on your table. Good for them – it saves them all money as well as conserving resources. But our perspective is somewhat different because environmental conservation is one of the key reasons I founded this company in 2001.

Ever since I was a young man, I’ve been an avid outdoorsman who has felt a very strong connection to the natural world we inhabit. So one of the things I couldn’t help but notice was how much mislabeled direct mail showed up at my doorstep, as well as those of my friends. Some companies might even send three copies of the same thick catalog, addressed to different spellings of my name. Add in misdirected mail that never arrives, poor demographic targeting, and constant changes in workplace addresses, and you have a huge – and preventable – waste of resources.

As a mathematician and an engineer by training, thinking through the mathematics of how better data quality could affect this massive waste stream was a large part of the genesis of Service Objects. We discovered that the numbers involved were truly staggering. And we discovered that simple, automated filters driven by sophisticated database technology could make a huge difference in these figures.

Since then, our products have made a real difference. Over the past 15 years, our commitment to reducing waste associated with bad address data has saved over 1.2 million trees, and prevented over 150 million pounds of paper from winding up in landfills. We have also saved 520 million gallons of water and prevented 44 million pounds of air pollution. More important, these savings are driven by a growing enterprise that has now validated over two and a half billion contact records for over 2400 customers.

As a company, our concern for the environment goes far beyond the services we provide to customers. We encourage our staff to ride their bicycle to work instead of driving their car, use sustainable office supplies, and keep a sharp eye on our own resource usage. Corporate conservation is one of the four core values of our company’s culture. The result is a team I am proud of, with a shared vision and sense of purpose.

There are many great business reasons for using Service Objects’ data quality products, including cost savings, fraud prevention, more effective marketing, and improved customer loyalty. But to me personally, using a smaller footprint of the Earth’s resources is the core that underlies all of these benefits. It is a true example of doing well by doing good.

For any business – particularly those who do direct marketing, distribute print media or ship tangible products, among many others – improving your data quality with us can make a real difference to both your bottom line AND our planet’s resources. We are proud to play a part in protecting the environment, and look forward to serving you for the next 15 years and beyond.

Fighting Fraud with Big Data

Fraud comes in many forms, whether through misrepresentation, concealment or intent to deceive. Traditional methods of identifying and fighting fraud have relied on data analysis to detect anomalies which signal a fraud event has taken place. Detecting anomalies falls into two categories: known and unknown.

Known Fraud Schemes

Known fraud schemes can be easy to identify: they have been committed in the past and thus fit a recognizable pattern. Common known fraud schemes on the web include purchase fraud, internet marketing fraud, and retail fraud. Methods to identify patterns for these types of fraud include tracking user activity, location, and behavior. One example of location tracking is IP analysis: determining whether a user is concealing their identity, or is executing a transaction from a high-risk international location. A correlation can be made if the location is determined to be high risk. Another case for location tracking is the physical address. In the past, fraudsters have used unoccupied addresses to accept goods purchased through online and retail stores. Identifying an unoccupied address through DOTS Address Validation DPV notes provides real-time notification of vacant addresses, which can be considered a red flag.
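A reactive, rule-based detector for known schemes like these can be sketched as a simple red-flag function. The field names (`is_proxy`, `ip_country`, `address_vacant`) and the country codes are hypothetical placeholders for the kind of metadata an IP or address validation service might return:

```python
# Placeholder ISO codes standing in for whatever jurisdictions a business
# deems high risk; a real system would maintain and tune this list.
HIGH_RISK_COUNTRIES = {"XX", "YY"}

def fraud_flags(txn: dict) -> list:
    """Return the list of known-scheme red flags raised by a transaction."""
    flags = []
    if txn.get("is_proxy"):            # user concealing identity behind a proxy
        flags.append("ip-anonymized")
    if txn.get("ip_country") in HIGH_RISK_COUNTRIES:
        flags.append("high-risk-location")
    if txn.get("address_vacant"):      # e.g. a vacancy flag from DPV notes
        flags.append("vacant-address")
    return flags
```

Each rule encodes a pattern seen in past fraud, which is exactly why this approach works for known schemes but not for novel, unknown ones.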

Identifying the Unknown

Unknown fraud schemes, on the other hand, are much more difficult to identify. They do not fall into known patterns, making detection more challenging. This is starting to change with the paradigm shift from reactive to proactive fraud detection made possible by Big Data technologies. With Big Data, the viewpoint becomes much larger: every individual event is analyzed, rather than a random sample of events, in the attempt to identify an anomaly.
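One common way to surface anomalies without a known pattern is a statistical outlier test over the full event stream. The sketch below scores every event (not a sample) with a z-score; the threshold and the toy transaction amounts are illustrative assumptions, not a production detector.

```python
from statistics import mean, pstdev

def anomalies(events, threshold=3.0):
    """Score every event (not a sample) and flag outliers by z-score."""
    mu, sigma = mean(events), pstdev(events)
    if sigma == 0:
        return []  # no variation, nothing stands out
    return [x for x in events if abs(x - mu) / sigma > threshold]

# 200 ordinary transaction amounts plus one extreme value.
stream = [100.0] * 100 + [102.0] * 100 + [5000.0]
print(anomalies(stream))  # [5000.0]
```

Scanning every event is exactly what distributed platforms make feasible at scale; the same z-score logic can run per-partition across a cluster.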

So What is Big Data?

Big Data is generally defined as datasets that are larger or more complex than traditional data processing applications can handle. Big Data is commonly described by five characteristics: Volume, Variety, Velocity, Variability, and Veracity.

Volume: The quantity of generated and stored data.

Variety: The type and nature of the data.

Velocity: The speed at which data is generated and processed.

Variability: Inconsistency of the data set.

Veracity: The quality of the captured data, which can vary greatly.

Tackling Big Data

With the advent of distributed computing tools such as Hadoop, wrangling these datasets into manageable workloads has become a reality. Spreading the workload across a cluster of nodes provides the throughput and storage necessary to process such large datasets within an acceptable timeframe. Cloud hosting providers such as Amazon make it affordable to provision a pre-configured cluster, perform the data processing tasks, and immediately shut it down, reducing infrastructure costs while leveraging the vast hardware resources of Amazon’s network.
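The split-then-merge idea behind Hadoop-style processing can be shown in miniature. This single-process sketch stands in for a cluster: each chunk plays the role of one node's share of the data, the map step produces partial counts, and the reduce step merges them.

```python
from collections import Counter
from functools import reduce

# Map phase: each "node" counts words in its own chunk of the dataset.
def map_chunk(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Reduce phase: merge the per-node partial counts into one result.
def reduce_counts(partials):
    return reduce(lambda a, b: a + b, partials, Counter())

dataset = ["fraud alert fraud", "alert review", "fraud review review"]
chunks = [dataset[0:2], dataset[2:3]]       # split the work across two "nodes"
result = reduce_counts([map_chunk(c) for c in chunks])
print(result["fraud"], result["review"])    # 3 3
```

On a real cluster the chunks live on different machines and the framework handles shuffling the partial results, but the map/reduce contract is the same.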

Service Objects Joins the Fight

More recently, Service Objects has been employing Big Data techniques to mine datasets in the hundreds-of-terabytes range, collecting information and analyzing results to improve fraud detection across our services. This ambitious project will provide an industry-leading advantage in the sheer amount of data collected, validating identity, location, and a host of other attributes for businesses. Stay tuned for more updates on this exciting project.

Bots Need Address Validation Too

Remember watching Star Trek as a kid and dreaming of talking to a computer throughout the day? Then PCs arrived, and while you couldn’t control them with your voice, information was at your fingertips. And then along came Siri and Cortana as well as other artificial intelligence technologies like chat bots. The future has arrived!

Though initially clunky and limited in their capabilities, chat bots are getting smarter and more human-like. Earlier this year, students taking a course online at the Georgia Institute of Technology found out that their friendly teaching assistant, Jill Watson, was, in fact, a chat bot and not a real person as they had believed all semester.

Siri, Cortana, and the various transactional bots that appear when you order flowers and other services online are likely to play a more prominent role as you interact with businesses online. For example, you can already use Cortana on Windows 10, and macOS Sierra, now in public beta and expected to arrive in the fall, will bring Siri to the Mac. Not only will you be able to interact with Siri on your computer, but she’ll also have a direct link to Apple Pay. Developers, at long last, have been given access to Siri, which means you’ll soon be able to order and pay for products with a simple Siri command.

Amazon’s Echo audio device is another example of how technology is changing how we interact with computers. This device is always listening and ready to play music, look up information, give you a weather report, order pizza, read you a story, control smart home devices, and more — all with a voice command.

Star Trek had it right when it envisioned how we interact with computers. In one iconic scene, Scotty traveled back in time to 1986. He tried to talk to the computer, but given 1980s technology, he got no response. When addressing it aloud failed, someone handed him the mouse. He tried talking into the mouse. Again, no response. Finally, he was told to use the keyboard. How primitive can you get?


Scotty would be happy to know that we are finally approaching what he thought was the obvious way to interact with computers. Computer bots are finally not only understanding what we say but also taking lots of complex actions based on what we tell them to do!

Technology has progressed to the point where Facebook wants companies to forgo email and talk to Gen-Zers via chat bots. According to Facebook’s Developer News Post, How to Build Bots for Messenger, “…bots make it possible for you to be more personal, more proactive, and more streamlined in the way that you interact with people.”

Clearly, bots have a big job to do, and that job is getting bigger and more complex. Are they up to the task?

Unfortunately, there have been problems with bots not validating information correctly. Shortly after Facebook’s online demonstration of 1-800-Flowers’ chat bot integration with Messenger, users began posting their own awkward interactions with the bot. One user entered a delivery address multiple times, yet the chat bot ignored it each time, prompting the user to enter an address again or choose from a list of buildings located halfway around the world.



So, while chat bots are getting better, smarter, and more prominent, Facebook Messenger and other platforms that use bots need to do a better job of fuzzy matching. Service Objects handles these delivery issues easily: the address would have been validated correctly, frustration-free.

Remember, as friendly and helpful as bots appear, they are not humans; they are computer programs. While a human may quickly recognize an address or apartment number even if it’s in a non-standard format, computers rely on the algorithms and databases they’ve been instructed to use — and it’s your classic case of garbage in, garbage out.

If a chat bot has been integrated with a quality Address Validation API such as Service Objects’, it will be able to instantly understand and recognize an entry as an address.

Now here’s where a chat bot has an advantage over a human: linked to an address verification API like our real-time address parsing software, a chat bot could instantly verify and correct address data as well as retrieve geocodes that pinpoint the exact address location on a map.
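To make the flow concrete, here is a sketch of a bot step that defers address handling to a validation service. `validate_address` is a hypothetical stand-in (with a hard-coded lookup table) for a real-time address validation API call, not Service Objects' actual interface.

```python
# `validate_address` is a hypothetical stand-in for a real-time address
# validation API; a real service would parse, correct, standardize, and
# geocode arbitrary entries instead of using a lookup table.
def validate_address(raw):
    known = {"123 main st apt 4": "123 Main St Apt 4, Springfield, IL 62701"}
    corrected = known.get(raw.strip().lower())
    return {"valid": corrected is not None, "standardized": corrected}

def bot_handle_address(user_input):
    result = validate_address(user_input)
    if result["valid"]:
        return f"Got it! Shipping to: {result['standardized']}"
    # Ask for clarification instead of looping on the same prompt.
    return "I couldn't verify that address. Could you re-enter it with a ZIP code?"

print(bot_handle_address("123 Main St Apt 4"))
```

The key design point is that the bot never tries to interpret the address itself; it hands the raw entry to the validation layer and branches on the result.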

As smart and intuitive as bots are, they still need our help. The best way you can help your company’s chat bots is to link them to our address validation API. It’s easy and affordable, and it will deliver a superior customer experience, not to mention delivering those tulips to the correct address.

Your Own ‘Big Data’ is Silently Being Data Mined to Connect the Dots

With apps like Facebook and Waze, and the release of iOS 9, you probably didn’t realize that your cell phone is now quietly mining data behind the scenes. Don’t be afraid, though. This isn’t Big Brother watching your every move; it’s data scientists trying to help you get the most out of your phone and its applications, ultimately making your life easier.

Here are a few things your phone is doing:

Data mining your email for contacts

Since its release late last year, Apple’s newest iPhone operating system (iOS 9) searches through your email in order to connect the dots. For example, say you get an email from Bob Smith whose signature line gives his phone number. iOS 9 records this, so that if a call comes in from that number and Bob isn’t in your contacts, Apple shows the number with text underneath that says “Maybe: Bob Smith”.
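The kind of signature scan described above can be sketched with a regular expression. This is a guess at the general technique, not Apple's actual implementation; the signature text and the pattern are invented for illustration.

```python
import re

# Hypothetical sketch of a signature scan: pull a name and phone number
# out of an email's closing lines so a later call can be matched to them.
SIGNATURE = """Thanks,
Bob Smith
(805) 555-0123"""

# Match common US phone formats like (805) 555-0123 or 805-555-0123.
phone = re.search(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", SIGNATURE)
name_line = SIGNATURE.splitlines()[-2].strip()  # line above the number
if phone:
    print(f"Maybe: {name_line} -> {phone.group()}")  # Maybe: Bob Smith -> (805) 555-0123
```

A real system would of course handle far messier signatures and tie the extracted pair to the incoming caller ID, but the "connect the dots" step is essentially this kind of extraction plus a lookup.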

Apple was quick to point out that this automatic search is anonymous: it is not associated with your Apple ID, not shared with third parties, and not linked to your other Apple services, and you can turn it off at any time in Settings.

Mining your data via Facebook’s facial recognition

Upload a photo with friends into Facebook and it will automatically recognize them and make suggestions for tagging them based on other photos of your friends.

When facial recognition first launched on Facebook in 2010, it automatically matched the photos you uploaded and tagged your friends accordingly. This spooked so many users that Facebook removed the feature. They later brought it back, this time asking users to confirm the suggested tags first. They also included the ability to turn it off altogether for anyone who thought it was still too “Big Brother.” You can disable it via Facebook Settings -> Timeline and Tagging -> Who sees tag suggestions when photos that look like you are uploaded?

Waze crowd-sourced data mining for traffic

Google purchased Waze in 2013 for $1.3 billion, and people wondered, “Why so much?” Quite simply: the data. Accepting the app’s terms when you install it means that, even when running in the background, the app sends Waze data on where you are and how fast you are driving. Waze had amassed a large enough user base to have a constant stream of real-time traffic data. Users are both the source of how fast traffic is moving on any given road at any given time and the beneficiaries of knowing how fast everyone else is moving on all the other roads; there is no need for cameras or special sensors on the freeway. It meant Google could use the real-time data to make better maps and traffic projections, and re-route you based on traffic and incidents other users had reported to Waze.
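The crowd-sourcing idea reduces to a simple aggregation: each driver's phone reports a (road segment, speed) pair, and averaging the reports per segment yields a live speed map. The segment names, speeds, and congestion threshold below are invented for illustration.

```python
from collections import defaultdict

# Crowd-sourced traffic in miniature: each report is (road_segment, mph)
# from one driver's phone; averaging them yields live segment speeds.
reports = [
    ("US-101-N:mile-12", 61), ("US-101-N:mile-12", 58),
    ("US-101-N:mile-13", 9), ("US-101-N:mile-13", 12), ("US-101-N:mile-13", 6),
]

speeds = defaultdict(list)
for segment, mph in reports:
    speeds[segment].append(mph)

for segment, mphs in sorted(speeds.items()):
    avg = sum(mphs) / len(mphs)
    status = "congested" if avg < 20 else "flowing"  # illustrative threshold
    print(f"{segment}: {avg:.1f} mph ({status})")
```

Every driver contributes a report and every driver benefits from the aggregate, which is the two-sided bargain the paragraph above describes.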

Here is a case where, had you read the fine print of the app’s user agreement, you might have second-guessed your download. But like nearly everyone else, you probably didn’t read it, and you are now avoiding traffic for the better.

Un-connecting the dots

Sometimes Big Data will have connected the dots, but you’d like to undo the connection. A recent article in the New York Times gave examples of how people managed breakups on social media:

‘I knew that if we reset our status to “single” or “divorced,” it would send a message to all our friends, and I was really hoping to avoid a mass notification. So we decided to delete the relationship status category on our walls altogether. This way, it would disconnect our pages quietly. In addition, I told him I planned to unfriend him in order to avoid hurt feelings through seeing happy pictures on the news feed.’

As ‘Big Data’ connections become more prevalent, so too, luckily, are the tools that help undo them. Facebook’s “On This Day” feature lets you turn off reminders about particular friends, so that you aren’t shown memories of exes you’d rather not see.

Here at Service Objects, we are constantly looking at connecting the disparate dots of information to make data as accurate and as up-to-date as possible. Whether scoring a lead the second a user fills out their first online form on your website or preventing fraudulent packages from being delivered to a scam artist’s temporary address, having the freshest, most accurate data allows companies to make the best decisions and avoid costly mistakes.

Popular Myths about Big Data

Everyone’s talking about big data and data quality services. The hubbub isn’t likely to shrink any time soon because big data is only getting bigger and becoming increasingly important to rein in and manage. As with any topic receiving a great deal of attention, several myths have emerged. Ready for some myth-busting? 

Myth #1: With Massive Data, Individual Flaws Don’t Matter
Ted Friedman, vice president and distinguished analyst at Gartner, debunked this myth at Gartner’s 2014 Business Intelligence & Analytics Summit in Munich, Germany. IT leaders believe that the huge volume of data organizations now manage makes individual data quality flaws insignificant due to the “law of large numbers”: their view is that individual flaws don’t influence the overall outcome when the data is analyzed, because each flaw is only a tiny part of the mass of data in their organization. “In reality, although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data,”1 said Friedman. Rather than ignoring these minor flaws, organizations can use easy-to-use data quality tools to quickly correct or remove them.

Myth #2: More Data Is Always Better
This myth comes courtesy of Michel Guillet of Juice Analytics, who debunked several myths in an Inc. article, 3 Big Myths about Big Data. Comparing big data choices to grocery store shelves, Guillet illustrated how too many options (metrics, chart choices, and so on) can quickly become overwhelming. He suggests that when faced with uncertainty about their data options, users may simply ask for all of it.

What he doesn’t say, however, is that too many choices often lead to indecision. For example, if you’re faced with, and overwhelmed by, too many potato chip flavors, plus traditional, low-fat, gluten-free, and baked options, name brands and store brands, you may grab one just to be done with it. Or you might not choose at all.

Guillet says that users want guidance, not more uncertainty, and that expressing an interest in more data is itself an indicator of uncertainty. What do they really want? According to Guillet, “…they want the data presented so as to remove uncertainty, not just raise more questions. They won’t invest more than a few minutes using data to try to answer their questions.”

Myth #3: Customers Won’t Pay for Their Own Data
Sure, customers may own their own data, but they don’t necessarily know how to extract meaningful information from that raw data. Guillet explains that you’re not selling them access to their own data. Rather, you’re selling algorithms, metrics, insights, benchmarks, visualizations, and other data quality services that increase the data’s value.

Myth #4: Everyone Else Has Already Adopted Big Data
Not necessarily. While it may seem like everyone else has already adopted a big data solution, or is in the process of doing so, Gartner reports that interest in big data technologies and services is at a record high, with 73 percent of the organizations it surveyed in 2014 investing or planning to invest in them. But most organizations are still in the very early stages of adoption: only 13 percent of those surveyed had actually deployed these solutions.1

While you may feel like you’re late to the big data and data quality services party, the party is just getting started! In fact, this is the perfect time to investigate your big data options. Data quality tools have been around long enough to be mature, with the bugs worked out of them, yet they remain innovative.

These myths continue to make their rounds, but rest assured: our data quality services take care of data quality so even the tiniest of flaws are removed or corrected. We provide expert guidance, helping you to get the most out of the data quality tools available to you. Moreover, it’s not too late to get started. Simply sign up for a free trial key and try our data quality services in a matter of minutes.


1 Gartner Newsroom, Gartner Debunks Five of the Biggest Data Myths

5 Trending Data Buzzwords

Here at Service Objects, we love data almost as much as we love our dogs. Few things make us happier than exploring or talking about data. Turns out, we’re not the only ones obsessed with data. Check out the following data buzzwords and join the conversation.


Smart Data

In the last 30 days, “smart data” was mentioned on Twitter 9,234 times. What is it? You know what big data is, right? It’s massive, but it doesn’t always make sense. Denis Igin from NXCBlog explained smart data this way:

“…big data is what we know about consumer behavior, while smart data is how we discover the underlying rationale and predict repetition of such behavior… In short, smart data is adding advanced business intelligence (BI) on top of big data, in order to provide actionable insights.”

Business intelligence and data quality software help you make sense of data. Now that’s smart!

Data Warehousing

Another 1,000+ conversations on Twitter in the last month have centered on “data warehousing.” What is it? Wisegeek explains it best:

“Data warehousing combines data from multiple, usually varied, sources into one comprehensive and easily manipulated database.”

For example, Service Objects’ data quality services tap into our massive database which pulls data from various sources such as the US Postal Service, telephone databases, and GPS mapping databases to validate, standardize, and enhance data.
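The combine-and-consolidate step at the heart of warehousing can be shown in a few lines. The source names, field names, and customer ID below are invented for illustration; real warehouses add schema mapping, deduplication, and history tracking on top of this.

```python
# Toy illustration of warehousing: merge per-source records keyed on a
# shared customer ID into one consolidated row. All names are invented.
postal = {"C1": {"address": "123 Main St, Springfield, IL 62701"}}
phone = {"C1": {"phone": "+1-805-555-0123"}}
geo = {"C1": {"lat": 39.7817, "lon": -89.6501}}

warehouse = {}
for source in (postal, phone, geo):
    for cust_id, fields in source.items():
        # Each source contributes its own fields to the consolidated record.
        warehouse.setdefault(cust_id, {}).update(fields)

print(sorted(warehouse["C1"]))  # ['address', 'lat', 'lon', 'phone']
```

Once the sources share a key, downstream services can validate, standardize, and enhance the consolidated record instead of querying each source separately.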

Dark Data

Topsy’s Social Analytics reveals nearly 1,700 Twitter mentions for “dark data” in the last month. According to Gartner’s IT Glossary, dark data is defined as:

“… the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”1

In other words, dark data is data that’s not being used. It’s often kept solely for compliance purposes. What if you could enhance that data and put it to good use? Using data quality software, for example, you could validate older, potentially obsolete, addresses and create a direct marketing campaign targeting former customers.

Big Analytics

With over 23,000 mentions in the last month, “big analytics” is definitely buzzworthy.

Big data analytics is the process of examining large data sets containing a variety of data types — i.e., big data — to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.

Big analytics goes hand-in-hand with big data. When big analytics and data quality services work together, your data becomes much smarter and easier to work with.

The Internet of Things

We saved the biggest buzzword for last: “the Internet of Things.” Also commonly referred to as “IoT,” the Internet of Things refers to a dramatically more connected Internet. Remember when the only things connected to the Internet were computers and servers? How quaint is that? Look around your personal space. You may have a desktop scanner that scans documents directly to Dropbox or Evernote. Maybe you’re wearing a Fitbit, which transmits your fitness and health stats directly to its companion app. You may even have Internet-connected light bulbs or a video surveillance system. If you’re really fancy, your refrigerator connects too, tracking UPC codes and notifying you when your milk is about to go sour. On a larger scale, industrial equipment, parking meters, weather stations, and more are all connected.

How many mentions did it get in the last 30 days? Over 150,000 for “Internet of Things” and another 74,000 for “IoT.” Now that’s trending!

As you can imagine, all those IoT “things” are generating data, contributing to big data. Each of these data buzzwords is interrelated and reflects the need for solutions such as data quality services and analytics.

1 Gartner IT Glossary, Dark Data

Four TED Talks that Anyone Working With Data Should Watch

The era of big data is upon us, and it’s bringing us surprising insights. Remember the Icelandic volcano eruption that grounded flights a few years ago? Data shows us that the volcano emitted 150,000 tons of carbon dioxide while the grounded airplanes would have emitted more than twice that much had they flown. This particular insight was discussed in a TED Talk, one of four that anyone working with data should watch.

The beauty of data visualization by David McCandless



David McCandless describes the Icelandic volcanic eruption, along with other data-driven facts, beautifully in a 2010 TED Talk about data visualization. McCandless explains that data visualization can help overcome information overload. By visualizing information, it becomes possible to “… see the patterns and connections that matter and then designing that information so it makes more sense, or it tells a story, or allows us to focus only on the information that’s important.” Plus, he said, it can “look really cool.” To illustrate his point that data visualization is indeed beautiful, he presented several gorgeous visualizations.

What do we do with all this big data? by Susan Etlinger



This TED Talk was filmed in September 2014. Susan Etlinger presents the notion that data that makes you feel comfortable or successful is likely wrongly interpreted. She says it’s hard to move from counting things to understanding them, and that in order to understand data, we must ask hard questions. She wraps up the talk by saying that we have to treat critical thinking with respect, be inspired by some of the examples she presented, and, “like superheroes,” use our powers for good.

The best stats you’ve ever seen by Hans Rosling



Starting with a story about quizzing some of the top undergraduate students in Sweden on their knowledge of global health, and finding that, statistically, they knew less about it than chimpanzees, Hans Rosling takes viewers on a visual, stat-based journey around the world and through time. Rosling’s data challenges preconceived notions and will likely leave you feeling hopeful about the future.

Why smart statistics are the key to fighting crime by Anne Milgram



Anne Milgram, former attorney general of New Jersey, presents an inspirational talk about how data analytics and statistical analysis can be used to fight crime in the United States. After taking office, Milgram discovered that the local police departments relied more on Post-it notes than on data to fight crime. Meanwhile, the Oakland A’s were using smart data to pick players who would help them win games. Why couldn’t the criminal justice system use data to improve public safety? The end result: a universal risk assessment tool that presents crime data in a meaningful way.

Each of these TED Talks runs about 15 to 20 minutes. Set aside an hour or so this weekend to watch each one. Prepare to get inspired about data!