Why Data Management Must Be Driven by Automation
The origins of data management aren't hard to guess. People in an organization discovered they had different definitions of the same terms, like customer, lead, or prospect, and they questioned the quality of the data. Data needed to be managed for both reasons, and data management was the result.
Sounds simple and straightforward, but unfortunately that's not the world we live in today. A software company likely needs feature usage reporting with metrics such as the total number of feature users, or the number of unique feature users per day or week, in order to understand feature adoption, tune feature scalability or elasticity, and measure the success of its product strategy. Yet all too often you end up with 50 different product managers who each have their own feature usage report, each using different calculations and definitions. That creates quality concerns and leaves executives with poor insight into product usage. What's the source of truth when feature usage is measured 50 different ways?
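To make the problem concrete, here is a minimal sketch of what a single, canonical metric definition could look like, assuming a hypothetical events table with user_id, feature, and event_ts columns (the names and schema are mine for illustration, not any real product's). One agreed-upon definition, owned in one place, is the alternative to 50 ad hoc ones.

```python
import pandas as pd

def daily_unique_feature_users(events: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Canonical definition: distinct users who used `feature` per calendar day.

    Assumes a hypothetical events frame with columns: user_id, feature, event_ts.
    """
    return (
        events.loc[events["feature"] == feature]
        .assign(day=lambda df: pd.to_datetime(df["event_ts"]).dt.date)
        .groupby("day")["user_id"]
        .nunique()
        .rename("unique_users")
        .reset_index()
    )

# usage: adoption = daily_unique_feature_users(events_df, "export_to_csv")
```

If every product manager's report calls this one function (or its SQL equivalent), the 50-definitions problem disappears by construction.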
Data governance demands automation
I’ve built master data management initiatives, data quality systems, and data catalogs. I’ve rolled out countless data governance councils. And one thing I’ve concluded is that as the amount of data we face on a daily basis continues to balloon, the only way to really do data management at scale is through automation.
Consider that automated data management is now a prerequisite for building intuitive, self-service capabilities that allow people to govern or manage data. For example, you can greatly reduce data quality issues by having proper master data processes in place with governance around change management and stewardship, or you can improve speed to insight or innovation capabilities by ensuring data assets are democratized.
Make no mistake, you must bring the processes of governance together with your data and data technology stacks to automate data management as part of a data engineering lifecycle. That way you can stop chasing the data.
As we all know, data is changing every second of every day. Usage patterns evolve, integrations and acquisitions happen, business strategies shift, and priorities change. This constant motion, along with the complexity of hybrid, multi-substrate technology stacks, means data is ever growing, ever changing, and frankly, ever more important.
As a data leader, the onus is on you to figure out how to modernize your data management processes and automate them in an intuitive way, driving quality, change management awareness, steward accountability, and inherited policies through a seamless Data Systems Development Life Cycle (SDLC)-based approach. When intuitive process automation and right-sized data governance are in force, you no longer have to herd cats to pay the perceived “data management tax,” and you can reap the value of a trusted, data-driven culture.
From manual chaos to automated governance
Here’s a process you might consider to start getting data governance under control:
- Align people on the governance strategy.
- Identify the data gaps and correlate them to people’s specific use cases or pain points to bring rigor, conformity, and alignment to the thought processes.
- Focus on the automation and tools you need to actually solve the problems, improving quality while vastly increasing linkability and trust.
I'm not a ‘governance is just a process’ person, because people don't tend to read data policies. I read them in order to publish them, but then I work on automating them into the right tools and passing them down. Any other course of action isn't scalable in a modern world. Even if you narrow your scope to a single data domain, there is still too much to manage by hand.
Baking process into the data life cycle
At a previous company I built an AI-powered, real-time data crawler for data quality, built around use cases like usage metering and the financial billing process. It was all structured on anomaly detection, so there was still a human touch for the big thresholds and limits. What was so satisfying, though, was that we got to the point where we could automate much of the thought process and boil the results down to just the things an individual steward or owner needed to look at.
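As a rough illustration of that pattern (not the actual system), here is a minimal anomaly check in which the only human inputs are the thresholds; everything else, including what reaches a steward's queue, runs unattended. The metric values, z-score limit, and hard cap are all invented for the example.

```python
from statistics import mean, stdev

def flag_anomaly(history: list[float], latest: float,
                 z_limit: float = 3.0, hard_max: float | None = None) -> bool:
    """Flag `latest` if it deviates from recent history or breaches a human-set cap.

    `z_limit` and `hard_max` are the "big thresholds" a steward tunes by hand;
    the rest is automated. Illustrative sketch, not a production detector.
    """
    if hard_max is not None and latest > hard_max:
        return True  # human-set limit breached
    if len(history) < 2:
        return False  # not enough history to model "normal"
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_limit

# e.g. daily billing-record counts; only anomalies reach the steward's queue
recent = [10_120, 9_980, 10_250, 10_045, 9_890]
print(flag_anomaly(recent, 14_500))  # True -> surfaced for human review
```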
I'm not a fan of pure process because people try to avoid processes. I prefer to bake them into the data life cycle. So if you're building a new pipeline or creating a new report, how can you get to automated data contracts?
- You can put in data detections around things like freshness or latency, because those expectations are defined in the contract (see the sketch after this list).
- You can get field-level lineage to support things like GDPR.
- You can build in impact awareness, so that if someone wants to decommission an asset without knowing who its users are, those users see the change coming in the course of their work.
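Here is a minimal sketch of what such an automated contract might look like, declaring freshness expectations, PII fields, and consumers up front so that checks, GDPR lineage, and impact alerts can all be generated from one artifact. The DataContract structure and every field name are assumptions for illustration, not a real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    """Hypothetical contract a pipeline registers when it publishes a dataset."""
    dataset: str
    owner: str
    max_staleness: timedelta                              # freshness SLA
    pii_fields: list[str] = field(default_factory=list)   # drives GDPR lineage
    consumers: list[str] = field(default_factory=list)    # drives impact alerts

def check_freshness(contract: DataContract, last_updated: datetime) -> bool:
    """A check generated from the contract rather than written by hand."""
    return datetime.now(timezone.utc) - last_updated <= contract.max_staleness

contract = DataContract(
    dataset="billing.daily_usage",
    owner="data-platform",
    max_staleness=timedelta(hours=6),
    pii_fields=["customer_email"],
    consumers=["finance-reporting", "exec-dashboard"],
)
```

Because the expectations live in the contract, decommissioning billing.daily_usage can automatically alert finance-reporting and exec-dashboard, instead of relying on someone remembering who the users are.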
With automation, I could also put access control on the contract and get auditability of who approved what: by steward, by workflow, and by service group. I try to solve things that way because it scales and keeps up with the volume of data, and nobody even has to think about the policy or the process. I just train people to go to a catalog and discover. If you discover something you want to use, you elect to subscribe, the technology creates the contract, and it's all intuitive.
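That subscribe-and-audit flow could be sketched as follows, assuming each approval is written to an append-only log keyed by steward, workflow, and service group; every identifier here is hypothetical.

```python
from datetime import datetime, timezone

audit_log: list[dict] = []  # in practice, an append-only store, not a list

def subscribe(dataset: str, requester: str, steward: str,
              workflow: str, service_group: str) -> dict:
    """Subscribing creates the contract record and captures who approved it."""
    event = {
        "dataset": dataset,
        "requester": requester,
        "approved_by": steward,
        "workflow": workflow,
        "service_group": service_group,
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(event)  # auditable by steward, workflow, or service group
    return event

subscribe("billing.daily_usage", requester="pm-growth",
          steward="jane.doe", workflow="standard-access",
          service_group="finance")
```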
With automation, data governance becomes a set of intuitive operations you perform without having to think much about them, which frees you and your data teams to focus on more important things.