Data hygiene is an afterthought for most organizations, but failing to maintain a well-designed data infrastructure can become a costly mistake over time. Analytics is the master key that makes organizations more profitable by solving problems in everything from marketing to HR. Allowing dirty data to run rampant is like hiding the lock to profitability from your organization, which can ultimately hurt your bottom line. Every organization needs to understand how to identify poor data hygiene practices and how they affect profitability.
Identifying poor data hygiene practices
Becoming a truly profitable data-driven organization starts with a strong foundation of fully integrated data sources that speak to each other in a way that can be manipulated and analyzed for actionable insights. Without this integration, your organization will be held back from reaching its profitability potential. The following factors are strong indicators that your data is dirty from an operational standpoint.
1. Adding data sources to your technology stack that can’t be integrated
One of the most common data cleanliness blunders I see happens when different teams within the organization bring in new technologies to their stack without any centralized review process by data and engineering teams. This practice results in having several key data sources that do not integrate with anything else or do not have an API available. Instead of solving attribution and data completeness issues caused by this inability to connect data points, many companies end up resorting to manual workflows that are time-consuming and costly.
2. Failing to integrate your technologies that can be integrated
This one may seem like a given: if your data sources have integrations built into them, you should configure those integrations with your other data sources. Surprisingly, many organizations fail to complete this simple step, which results in more attribution and data completeness issues.
3. Delivering half-baked integration solutions
Some organizations have the opposite problem and over-integrate data sources without a game plan. Maintaining clean data and a good data infrastructure requires very careful planning. Data and engineering teams need to determine which data sources make sense to integrate with which technologies, how that data should be joined together, and what additional data layer technologies should be added. Simply slapping everything together leads to something that merely “just works” and can leave data mapped to the wrong fields.
4. Lack of a centralized database for data warehousing
So you have all these data sources out there. Some can be directly integrated with each other, while some cannot. Some can store data for several years while others only store data for a few days. Regardless of the situation, all data should be warehoused in a centralized database, and any integrations that can’t be made at the data source level should be made using table joins within the database wherever applicable.
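To make the table-join idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse. The table names (crm_leads, ad_spend), their columns, and the sample rows are all hypothetical; the point is only that two sources with no direct integration can still be connected once both land in the same database and share a key, in this case the campaign name.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory database holding two hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crm_leads (lead_id INTEGER PRIMARY KEY, email TEXT, campaign TEXT);
        CREATE TABLE ad_spend (campaign TEXT PRIMARY KEY, spend REAL);
    """)
    # Made-up rows standing in for a CRM export and an ad platform export.
    conn.executemany("INSERT INTO crm_leads VALUES (?, ?, ?)", [
        (1, "a@example.com", "spring_sale"),
        (2, "b@example.com", "spring_sale"),
        (3, "c@example.com", "brand_search"),
    ])
    conn.executemany("INSERT INTO ad_spend VALUES (?, ?)", [
        ("spring_sale", 200.0),
        ("brand_search", 50.0),
    ])
    conn.commit()
    return conn


def cost_per_lead(conn):
    """Join spend to leads on the shared campaign key and compute cost per lead."""
    rows = conn.execute("""
        SELECT s.campaign, s.spend, COUNT(l.lead_id) AS leads
        FROM ad_spend AS s
        LEFT JOIN crm_leads AS l ON l.campaign = s.campaign
        GROUP BY s.campaign, s.spend
        ORDER BY s.campaign
    """).fetchall()
    # Campaigns with spend but no leads get None instead of a division error.
    return {campaign: (spend / leads if leads else None)
            for campaign, spend, leads in rows}


if __name__ == "__main__":
    print(cost_per_lead(build_warehouse()))
```

Note that the join only works because both tables spell the campaign name the same way, which is exactly why the governance issues in the next point matter.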
5. Too many hands in the cookie jar
Hands down, the biggest contributor to bad data is the lack of data governance within an organization. A data manager’s job is to create a data governance model and processes for each person within an organization who touches data sources. A single point of failure in data governance results in systemic failures. Even something as simple as someone on the sales team not setting lead fields properly in Salesforce, or someone not using consistent naming conventions for media campaign names, can cause this devastating domino effect.
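One lightweight way to enforce a naming convention is to check names automatically rather than rely on everyone remembering the rules. The sketch below is only an illustration: the convention it encodes (channel, two-letter region, six-digit year and month, then a lowercase slug, all separated by underscores) is an assumption invented for this example, and your own governance model would define its own pattern.

```python
import re

# Hypothetical convention: <channel>_<region>_<yyyymm>_<slug>, all lowercase,
# e.g. "search_us_202401_spring-sale". Adjust the pattern to your own rules.
CAMPAIGN_PATTERN = re.compile(
    r"(search|social|display|email)_[a-z]{2}_\d{6}_[a-z0-9]+(?:-[a-z0-9]+)*"
)


def find_nonconforming(names):
    """Return the campaign names that do not match the convention, in order."""
    return [name for name in names if not CAMPAIGN_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    sample = [
        "search_us_202401_spring-sale",
        "Social US Jan Promo",
        "email_uk_202402_newsletter",
        "display_de_2024_launch",
    ]
    for bad in find_nonconforming(sample):
        print("Needs renaming:", bad)
```

Running a check like this on a schedule, or before campaign data is loaded into the warehouse, catches inconsistencies while they are still cheap to fix.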
6. Manual workflows
Excel spreadsheets epitomize everything that is wrong with manual workflows. Despite all the data workflow automation solutions out there (including Google’s free Data Studio), many people are still using Excel spreadsheets for data collection and analysis. Humans are imperfect and prone to error, so it can only help us to use the automation tools we have at hand. Manual workflows also mean working with data that is outdated as soon as it gets pulled. These two factors together create a wealth of unclean and inaccurate data.
7. Spending more time cleaning data than analyzing it
The easiest, most surefire sign that your organization has a data hygiene problem is a data team that spends 80% of its time cleaning data and only 20% analyzing it. If all your ducks are in a row, those proportions should be reversed: 20% of the team’s time cleaning and 80% analyzing.
Although most organizations claim to be data-driven, the reality is that most are not nearly as data-driven as they think they are or could be. This disparity comes down to two things: poor data hygiene practices and a misunderstanding of the difference between reporting and analytics. Where reporting consists of being reactive to the data you collect, truly effective data analytics requires proactively leveraging data to make decisions in the first place. An organization’s ability to furnish clean data and dedicate resources to analytics beyond simple after-the-fact measurements is key to becoming more profitable.
Need more reasons to clean up your data?
Keep an eye out for my next post, Data Hygiene Part II, to learn how squeaky clean data can be leveraged to make your organization more efficient and successful.