Cien.ai Data Exec Series: How Bad is Your Data?
By Rob Käll, CEO & Co-founder Cien.ai
The Dreaded GIGO
How many times has this happened to you? You have an Excel file, a report, and a dashboard, and you want to use the data to assess the current state of the business and make an action plan for improvements. You dig in, only to find that the data does not make sense, and you gradually lose trust in the information. You have once again hit the dreaded GIGO (Garbage In – Garbage Out). So, what should the data-driven executive do? Revert to gut feel? No, there is hope: AI can help measure the quality of your data and fix it.
5 Ways Data Can Be Bad
Often when I ask an executive if they have faith in their data, they say “Not really”. But when I follow up and ask why, the answers are often vague. Here are some ways data can be bad:
1. Duplicated – more than one of something that should be unique
2. Incomplete – a lot of blank fields that should be filled in
3. Inconsistent – multiple definitions for the same thing (e.g., USA vs. United States)
4. Incorrect – the data value is wrong (e.g., an outlier, like a calendar event that started in 2014 and lasted 10 years instead of 1 hour)
5. Lagging – data is running through pipelines that are slow or failing (e.g., sales data at the end of the quarter is missing the last 14 days)
Most companies suffer from a combination of these issues across much of their data, but they do not quite know where the problems are. With data science, it is possible to measure these issues and pinpoint the records and segments that have problems. That alone can be useful.
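To make "measuring" concrete, here is a minimal sketch in Python of how the five issues above could be counted on a small dataset. The field names, the sample records, the alias table, and the one-day duration threshold are all hypothetical and chosen only for illustration; a real assessment would need rules tuned to your own data.

```python
from datetime import date, datetime, timedelta

# Hypothetical alias table: non-canonical spelling -> canonical value.
COUNTRY_ALIASES = {"United States": "USA", "U.S.": "USA"}

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"email": "ann@example.com", "country": "USA",
     "start": datetime(2024, 3, 1, 9), "end": datetime(2024, 3, 1, 10),
     "loaded": date(2024, 3, 17)},
    {"email": "Ann@Example.com", "country": "United States",
     "start": datetime(2024, 3, 2, 9), "end": datetime(2024, 3, 2, 10),
     "loaded": date(2024, 3, 16)},
    {"email": "", "country": "USA",
     "start": datetime(2014, 1, 1, 9), "end": datetime(2024, 1, 1, 9),
     "loaded": date(2024, 3, 15)},
    {"email": "bob@example.com", "country": "USA",
     "start": datetime(2024, 3, 4, 9), "end": datetime(2024, 3, 4, 10),
     "loaded": date(2024, 3, 17)},
]


def measure_quality(records, as_of, max_duration=timedelta(days=1)):
    """Return the share of records hit by each issue, plus pipeline lag in days.

    Assumes a non-empty list of flat records with at least one "loaded" date.
    """
    counts = {"duplicated": 0, "incomplete": 0, "inconsistent": 0, "incorrect": 0}
    seen_emails = set()
    for rec in records:
        # Duplicated: same email (ignoring case and spaces) seen before.
        email = (rec.get("email") or "").strip().lower()
        if email:
            if email in seen_emails:
                counts["duplicated"] += 1
            seen_emails.add(email)
        # Incomplete: any blank field.
        if any(value in (None, "") for value in rec.values()):
            counts["incomplete"] += 1
        # Inconsistent: a known non-canonical spelling is used.
        if rec.get("country") in COUNTRY_ALIASES:
            counts["inconsistent"] += 1
        # Incorrect: event duration is negative or implausibly long.
        start, end = rec.get("start"), rec.get("end")
        if start and end and not (timedelta(0) <= end - start <= max_duration):
            counts["incorrect"] += 1
    report = {name: count / len(records) for name, count in counts.items()}
    # Lagging: days between the reporting date and the newest loaded record.
    newest = max(rec["loaded"] for rec in records if rec.get("loaded"))
    report["lag_days"] = (as_of - newest).days
    return report


report = measure_quality(RECORDS, as_of=date(2024, 3, 31))
print(report)
```

On this toy data, each of the first four issues affects one record in four, and the newest record is 14 days older than the reporting date. The same per-issue shares can be computed per segment (region, team, source system) to show where the problems concentrate.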
How Do You Fix Data Problems?
The perfect scenario: all existing data is cleaned up and no new bad data is entered. Easy? No, very hard, and often nearly impossible. A faster and better way is to fix the data “a little at a time”: start by identifying the biggest problems, then measure the progress. In some cases it is better to work from an “annotated” dataset, since fixing the data at the source can be problematic due to unforeseen consequences and dependencies. The goal is to use the most modern tools (AI can help) to dedupe, make data consistent, and so on, so that when the data gets to you as an executive it may not be 100% perfect, but you know how good it is (e.g., 97% correct) and that it can now be used to make better business decisions.
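One way to picture an “annotated” dataset is as a copy of the source records with extra columns that carry the corrections, so the source system is never touched. The sketch below is only an illustration under stated assumptions: the field names, the alias table, and the helper name `annotate` are hypothetical, and matching duplicates on a lowercased email is a deliberately simple stand-in for real deduplication logic.

```python
# Hypothetical alias table: non-canonical spelling -> canonical value.
COUNTRY_ALIASES = {"United States": "USA", "U.S.": "USA"}

# Hypothetical source records; in practice these would come from a CRM or export.
SOURCE = [
    {"email": "ann@example.com", "country": "USA"},
    {"email": "Ann@Example.com ", "country": "United States"},
    {"email": "bob@example.com", "country": "U.S."},
]


def annotate(source_records):
    """Return annotated copies of the records; the source is never modified."""
    annotated = []
    seen_emails = set()
    for rec in source_records:
        row = dict(rec)  # a shallow copy is enough for flat records
        # Annotation 1: a cleaned country stored next to the original value.
        row["country_clean"] = COUNTRY_ALIASES.get(rec["country"], rec["country"])
        # Annotation 2: flag later records that repeat an earlier email.
        key = rec["email"].strip().lower()
        row["is_duplicate"] = key in seen_emails
        seen_emails.add(key)
        annotated.append(row)
    return annotated


annotated = annotate(SOURCE)
deduped = [row for row in annotated if not row["is_duplicate"]]
share_unique = len(deduped) / len(annotated)
print(len(deduped), {row["country_clean"] for row in deduped})
```

Reports can then read the annotated columns while the original values stay in place, and a figure such as `share_unique` can be tracked over time to measure progress.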
What Does Success Look Like?
At Cien.ai, we coined the phrase “The New GIGO”: Garbage In – Gold Out. We help our clients measure, clean, and analyze their messy go-to-market (GTM) data so that they can use the improved data to lift their sales and marketing performance. That is where the gold part comes in…
About the Cien.ai Data Exec Series
This article is part of our Data Exec Series, inspired by our work with B2B business leaders, growth consultants, and PE operating partners. These articles focus on the aspects of becoming a data-driven executive, ready for the AI revolution.