This is an edited transcript of a BizData webinar on Exploring Practical Data Governance created using Microsoft Cognitive Services - Speech Studio. You can view the entire webinar for free by following this link.
- Challenges in Establishing a Practical Data Governance Framework
- The Data Governance Committee
- Locating Key Process Areas
- Data Governance Workshops
- Workshop Outcomes
- Data Stewards
- Building a Data Analytics Prototype
- Data Readiness Assessment
- Business Case Template
- Service Catalogue
- Setting Data Standards
- Rating Reports
- Data Classification Scheme
Hello everyone and welcome to today's webinar on practical data governance. My name is Nadav Rayman and I'll be taking you through the session today. This session is brought to you by BizData; we're an analytics and AI specialist Microsoft partner with offices in Melbourne, Sydney, Perth, Brisbane and New York. You can find out a lot more about us on our website www.bizdata.com.au. And with that, let's start.
Challenges in Establishing a Practical Data Governance Framework
Lack of priority and ownership
What we're going to cover today is addressing some of the data governance challenges that a lot of our clients face today and reflecting on the practices that our clients actually undertake to solve some of these challenges. There are three main themes that we're going to cover today. The first one is around a lack of priority and ownership when it comes to data governance, whether that's data quality or metadata management. A lot of clients talk to me about the difficulty in actually assigning ownership to resolve issues or pursue the resolution of issues.
A clear decision making framework
The other thing that we see a lot is a lack of a clear decision-making framework for how data gets invested in, whether that's to resolve data quality issues or just for analytics investments. I'll use the terms analytics governance and data governance interchangeably. We really see them as two sides of the same coin, and you'll see that throughout the presentation today.
Another area is around lack of trust. For a lot of consumers of analytics there's a great degree of opacity around the definitions behind the reports they're seeing. If you're moving fairly new consumers of information from an Excel based format to something more interactive like Power BI what they lose is the transparency behind the calculations. In Excel you can step into the formulas and see how things are derived. When we move to more modern reporting and analytics platforms that becomes a lot more opaque. We can't necessarily see the definitions just by looking at the surface level of the report. That's a really important thing for us to address today and we'll talk about some techniques to actually alleviate some of that lack of trust.
The third area is really around that lack of collaboration that can occur between different sorts of teams. With much more democratic access to different tools we see a lot of different types of teams now involved in analytics. You might have analyst teams, as opposed to a central kind of reporting team, that need to work together. Also, different people within different business units are producing their own data mashups or reports, and what we want to alleviate is that kind of tension between people producing things in a federated manner and then actually aligning to some standards around security and consistency of those definitions and reports. What we want to do is turn the relationship between the analyst teams and a central IT or Business Intelligence team into more of a partnership, one that's collegiate and collaborative rather than competitive. That's really the reference point of what we're talking about today.
The Data Governance Committee
What I'm also going to talk about are four key streams of activity that really comprise a practical governance framework. I'm going to start with the governance committee. There might be all sorts of steering committees in your organisation to approve big project investments. What we encourage clients to do, and what we've seen work really well with our clients, is establishing a governance committee focused on analytics and data issues and investment. Ideally that governance committee would start with setting a vision for what is important to the organisation, going back to the organisation's strategy and really setting out the key business priorities by which you'll actually make those sorts of investments.
The Responsibilities of a Data Governance Committee
The second area you can see feeding in from the governance committee is some kind of role or function around user engagement and requirements definition. This is absolutely critical to making sure that your investments actually land with the business in the right way and are sustainable (are going to actually be used). The function of that team is really to help, first of all, define the business drivers for various process areas. I'll talk more about process areas in a minute. So, defining what the key business drivers are: not only what things do we want to know about, but what interventions can we make, what things would we change about how we run our business based on information being delivered to us.
Out of that process as well, a key activity is actually identifying those subject matter experts that are going to help elicit the detailed business rules associated with putting together an analytics solution. Also, a business owner that's going to actually set the right frame of reference around what is important to invest in.
Establishing a Data Steward
The next area is around data and analytics stewardship. I use those words interchangeably and one of the great things about doing analytics is that you actually are testing data in ways that you haven't before. Enabling analytics people or analysts to take on a data steward role allows them to reflect on the things that they are learning by trying to build new reports in terms of shortcomings of data today:
- Shortcomings of how data is collected
- The completeness of it
- Whether it's collected at all
- Whether it's going to be fit for purpose essentially to answer particular business questions
The way that we found this works best is to actually build an analytical prototype. By that I mean more than just a mock up of something visually, but actually building some kind of rough mashup of data so that you can elicit these issues.
Data Readiness Assessment
After we build that analytical prototype, what we often deliver is a Data Readiness Assessment. What we're able to do is educate people on how fit for purpose the data is to answer these particular business questions and what investment areas might be required, whether that's to improve data quality in particular areas, to collect new data points that haven't been collected before, or to source external data. It's taking a moment to pause and think about how well we prepare and collect our data today to support our analytics objectives.
Business Cases and Analytical Prototypes
Ideally, out of that as well is the presentation of some sort of business case back up to the data governance committee. Business cases can be quite scary for some business people because they involve a very long process. What I'm going to suggest today is a very short form business case. I'll share an example of a template that's essentially a business case on one page that really drives to the heart of what the return on investment might be and what the investment areas are. This makes it really easy for the governance committee to make a decision on whether you should move forward past an analytical prototype.
Ideally out of this exercise of building an analytical prototype you have these analytics stewards working with the subject matter experts to come up with the definitions. There's some fundamental questions that might arise out of this exercise such as:
- What is a customer?
- By customer do we mean an active customer?
- Whereabouts in the pipeline of onboarding a customer do we actually call them a customer in the first place?
There's myriad other examples of where there's a lot of detail that needs to be fleshed out in terms of definitions.
Ideally, it would be captured in some sort of business glossary. Going back to the idea of trust, what we want to do is ideally publish that in a format so when people are consuming reports they can also see the terms of reference or the definitions associated with what they're looking at on a report. I'm going to show an example of that later.
Promoting the Use of Data Assets
Once that's done we want to promote those data assets. We want to take that prototype and go through all the steps that we normally go through to deploy that to production, which is a phrase that IT people will use. Essentially, make sure that it's robust and automated so that you don't have to manually update that model everyday or every week.
The process by which the data governance committee moves these data assets through promotion is approving the investment in the business case, setting some data quality priorities coming out of that data readiness assessment, and then setting some security and access policies so that they can determine what sort of data should be accessed by whom. In today's world it's becoming ever more important to work out how to treat private data and confidential data.
How do we make sure it's secure?
The system automation team might also be called a BI team or a central reporting team. Their job is to take that initial prototype and align it to those common definitions, whether that be in a central data warehouse. This is to make sure that when we're defining gross margin, for instance, as a calculation that it's a consistent calculation that is named the right way so that it's distinct from other potential similar measures.
Rating and Certification of Data Assets
The other thing that we're going to talk about today is going through a data governance certification process. Part of promoting those data assets is actually rating each asset according to how much investment you've actually made in it. Just because we don't necessarily go through the entire process that we could go through to promote a data asset doesn't mean we can't publish it to the business. However, what we need to do is set some expectations around how reliable it is and how much investment the organisation has made in the underlying data. I'm going to show you some examples of that today.
Something else that we've seen as a technique that we're going to share today is the system automation team actually coming up with a service catalogue. A service catalogue is a technique often used by an IT infrastructure team, but we've seen it be really useful in an analytics context to effectively set some expectations with business people as to how much investment is required for each stage of maturing your data to the point where you can use it for business decision making. So that's the agenda for today's session, so hopefully that makes sense.
Locating Key Process Areas
Let's start with setting a vision. The way that this works best is to work out the key process areas for your organisation. Typically we find that there are four to eight things that a business needs to do as part of running their business. There are two ways to think about how you would brainstorm these areas.
One is to look at your business strategy. What are some of the kind of initiatives you have, such as "we plan to grow our branch network by twenty percent". So what does that mean as a process area?
The other area is really around operational imperative. What kind of business as usual activities do we need to really make sure we're doing well to keep our business running? The way we tend to phrase these is as imperatives. Rather than traditionally what you may have done, where you invest in analytics by saying "let's do something for marketing" or "let's do something for finance", really thinking more in terms of "what does a business need to fulfil as a cross-functional process across departments?"
Example: Customer Retention
The beauty of doing that is that you actually reflect on how data flows through different teams and you're coming together as different teams within an organisation to actually ultimately fulfil what you're doing to deliver to customers. An example is improving customer retention.
You might traditionally think marketing is the key stakeholder and that might be true. However, there are a number of other teams that are also stakeholders in that process. There'll be an operations team that might have a call centre that is also a stakeholder in that process of making sure that you retain customers. It might involve your product design team. It will inevitably involve finance in some fashion. That's just one example of how you can think about your business in terms of these imperatives. It means that when you do get to workshopping the requirements for each area you're immediately thinking in a more goal-driven way about what the business drivers are to enable that imperative.
Data Governance Workshops
Here is a very simple idea of how we actually run a data governance workshop. This first step is around user engagement and requirements definition. It's a process that we take at BizData, and one a lot of our clients have adopted, to make sure that the requirements collected for analytics initiatives are relevant and going to touch the most people in the organisation.
The workshop format is very simple. It's typically a two to three hour workshop and what we do is we make sure that we have representation across different teams and across different roles.
The Three Stakeholder Archetypes
The roles are listed in these three rows and we have a label for each archetype to give you a sense of what that person might be. In this case a Sponsor might be an executive, but the key thing about the sponsor is that they're interested in the outcome.
We know automatically that a sponsor will want to receive information in a particular format so we don't need to necessarily go into that level of detail in the workshop. What we want to find out is what they want to know about, so we are really doing that information design. One way to think about this is user-centred design which is very commonly used in software development. We're applying that to analytics.
The second archetype you can see there is an Optimiser. Think of this person as a manager or an analyst. They're really the people that need to understand the drivers behind what makes one of those KPIs go green or red. They typically at least need a dashboard they can drill down on. For more advanced users they might just need a model so they can create their own dashboard to answer certain questions that come up. The idea is that they're really trying to understand what could we change to improve the outcome in the first place.
The third archetype we have there is the Implementer. These are the most often forgotten people, but they're the people typically at the front office of the business process. Think, for example, of a call centre operator, your field force or your sales team. They're the people that don't necessarily have a lot of time for reflection, so they're not necessarily interested in lots of clever dashboards. They really just want to optimise what they're doing day to day. They want to have some kind of operational instruction or an alert to prioritise what they're doing. That might be as simple as focusing on particular customers in terms of cross-sell activity or highlighting the debtors to call. Usually it's some kind of operational list or some alert that drives a call to action.
You might have multiples of these three stakeholders. Having representation across these three roles ensures that you're really thinking about each type of audience and you're designing the information in a way that the central data model is going to service all of these stakeholders. Ultimately, what you're driving is for people to come together and work together around the same base of data. That's really what we're trying to achieve with business intelligence or analytics in the first place.
The outcome of a data governance workshop, and this is a simplified example, would look something like this.
This is a very simple breakdown of the KPIs that a sponsor might want to look at.
- For the optimiser: the measures and ways that you might want to slice that data.
- For the implementer: a list of reports or alerts that they receive.
This is a simplified example, but the beauty of summarising it on the one page is that it's really easy for the workshop participants to sign off. It's really easy to see what the frame of reference is for your initial prototype. It also just makes it easy for everyone to collaborate on what those requirements are. So this is just a simple example of how you can summarise those requirements in a simple way. There are some other techniques that I won't talk about today that we use in the workshops, and I'm happy to follow up with you if you want to find out more information.
What is a data steward? It's a question that I often ask clients, and every client I speak to will have a different definition of what a data steward might be. There might be different labels for them. Some organisations I've talked to will have a data owner and a data custodian, with different roles around what they do. This is not so much to be prescriptive about what a data steward is, but more to drive some thoughts with you around what you really want a data steward, or different types of data stewardship roles, to include. The way we think about a data steward is really the types of activities that they undertake.
Here's a list of some typical examples we see data stewards undertake:
First of all is capturing data quality issues in a central data quality register. There are heaps of observations that different analysts are constantly making about shortcomings there might be with data. One of the key things that we see is very useful for a data steward to do is to capture them centrally, so that there can be a call to action for the governance committee to set priority around investing in improving those data quality issues, or at least driving some process change. Very often data quality issues involve going back to the owner of a source system, whether it's a CRM or an ERP system, and actually changing the way that system behaves, or the operational practices used to capture data in that system in the first place.
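As a sketch of what capturing issues centrally might look like, here is a minimal register entry in Python; the field names, status values and severity scale are illustrative assumptions rather than any standard.

```python
# A minimal sketch of a central data quality register entry; the fields
# and severity levels here are illustrative assumptions, not a standard.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataQualityIssue:
    issue_id: str
    source_system: str        # e.g. the CRM or ERP the data comes from
    description: str
    raised_by: str            # the data steward or analyst who captured it
    severity: str = "medium"  # assumed scale: low / medium / high
    raised_on: date = field(default_factory=date.today)
    status: str = "open"      # open -> prioritised -> resolved

# A steward captures an observation centrally so the governance
# committee can prioritise investment or process change.
register: list[DataQualityIssue] = []
register.append(DataQualityIssue(
    issue_id="DQ-001",
    source_system="CRM",
    description="Customer state field missing on roughly 20% of records",
    raised_by="analyst.jane",
    severity="high",
))
```

In practice this register might live in a shared tool rather than code; the point is simply that issues get one central home with an owner and a status, so they can be prioritised rather than lost in individual spreadsheets.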
Master Data Management
The second is maintaining master data. Reference data that today might be managed in a set of lists in Excel ideally would be in some online system so that the data is immediately available in a database with some kind of approval process baked into that.
The idea is that those additional attributes that you need to capture around your data, which might not be captured in your source system, are managed somewhere. Typically they're used for managing hierarchies for reporting, but it might be for other kinds of attributes. Stores, for instance: if you have a store network you might need to capture attributes such as the floor space or the accessibility of parking for that store. Those are some simple examples of the additional attributes that you need to capture, and it's typically called master data.
The third area is maintaining a central business data glossary or data dictionary. It's something that is often done, though maybe not in a way that's easy to search on. It might be a Word document that's published on an intranet or some kind of portal. I'm going to show an example of how we do it with clients today. The idea is to capture those definitions, as I mentioned before, so that people understand that gross margin is actually calculated this particular way and this is what it means.
The fourth area is around a more general taxonomy and catalogue in some sort of portal: tagging reports in a particular way so they're easily discoverable. That might be by particular business areas, or by particular themes. You might use those process areas that I discussed before as a way of tagging reports by a theme so that they are easily searchable. This is where you might also apply your ratings of different data assets. As a business consumer, you can understand very quickly how reliable the data is. I might be comfortable using a bronze level rated asset, whether that's a report or a table that I'm going to produce a new report from, for a general kind of heads-up or trend about something. If I'm going to state something to the market I really need to have a gold level asset, to be confident that it's going to be accurate.
Building a Data Analytics Prototype
Talking about some of the key elements of a prototype, there's really three steps that are involved. This should be no surprise to you. When we produce an analytical prototype that involves some data sourcing, locating where the original source of data should be is often a large exercise. This is because the first time that you explore the raw data from a source system you're trying to work out:
- Where should the data come from?
- What's actually reliable?
- Do we use the system in a way that the data model makes sense?
The next step is really building that data model. Effectively, that's most of the effort involved in the prototype. Regarding visualisations, obviously they're important to understand how business people will consume that information, but that's typically only ten percent of the effort in the prototype. Most of the effort involved is finding the data in the first place and then bringing that together in a data model to relate data together and create a set of derived measures to really answer those business questions that you have.
Data Readiness Assessment
Out of that prototype, as I mentioned before, we typically will deliver a data readiness assessment on these five areas as a way of highlighting the key topics.
The first things we do is work out:
- How hard is the data to access in the first place?
- Are there challenges in getting the data out?
- Are we reliant on getting data out of a software as a service system where the API throttles how much data we can pull out?
These are the sorts of things that we need to have a heads-up on early before we undertake a large investment in setting up analytics for a particular area.
How well understood and how well agreed are the definitions around the data? It typically is a long process to work with different stakeholders to agree on the definitions of certain measures. Going back to that gross margin example, there might be five variants of that gross margin measure that exist across an organisation today. The worst thing you can do is have them all represented with the same label on a report. An executive can't really understand the difference between the five measures, primarily because they're not named differently.
One of the important things around definitions is coming up with slight variants of names so that people can understand, "OK, I'm looking at a slight variant of the calculation", and we don't waste time in meetings arguing about which number is right. There are different numbers that are right for different purposes. That's a fairly involved process and a big part of producing analytics solutions: defining the semantics and making sure the data fits those semantics.
Manual intervention refers to initiatives that you might need to undertake in the future around master data. What data is not actually captured in a formal system, for example, managed by someone updating something manually in a spreadsheet? How can we more formally manage that process in the future, so that if we want to automate some reporting or analytics we're not reliant on a manual step every day to refresh a spreadsheet?
Quality of Content
This is really honing in on how fit for purpose the data is for a particular business question. There are different kinds of things we're going to look at there.
- Is it consistently populated?
- Are there issues? For instance, are our addresses all over the place and do we need to clean them up?
- If we want to report by geography do we have the state fields consistently populated?
- Are we going to have a report with lots of unknowns?
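The checks above can be sketched as a small completeness profile. The records, fields and figures below are made-up examples; real profiling would run over the actual source extracts.

```python
# A rough sketch of the completeness profiling a prototype surfaces;
# the records below are made-up examples.
records = [
    {"customer": "Acme",  "state": "VIC", "revenue": 120},
    {"customer": "Beta",  "state": None,  "revenue": 340},
    {"customer": "Gamma", "state": "",    "revenue": 95},
]

def completeness(rows, column):
    """Fraction of rows where the column is populated (not None/empty)."""
    populated = sum(1 for r in rows if r.get(column) not in (None, ""))
    return populated / len(rows)

state_fill = completeness(records, "state")   # 1 of 3 rows populated

# Revenue that would land under "Unknown" on a report sliced by state:
unknown_revenue = sum(r["revenue"] for r in records if r["state"] in (None, ""))
total_revenue = sum(r["revenue"] for r in records)
```

In this toy example most of the revenue would show up under "Unknown" on a geography report, which is exactly the kind of finding a data readiness assessment should surface before a large investment.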
A very typical thing that we see in analytics initiatives is that the first report you stand up will highlight a lot of these data issues in the first place. I've seen many executives ask, "why is eighty percent of our revenue coming from unknown?" That's a good thing, because it's going to drive a discussion across the business around how we actually improve how we codify and capture our data in the first place.
Quality of Structure
How well is the structure going to facilitate our objectives? This is particularly important when we're bringing data in from different systems and we want to coalesce it in a way that answers a question. For example, customer profitability, which naturally will involve some data from a CRM system and potentially from your GL, some kind of finance system.
- Can we join that together?
- Do we need to have some kind of allocation model to assign costs to each customer?
Doing the prototype allows us to get a heads up around all these topics up front, so that when we go to the next stage of producing a business case, we know what areas of investment are required.
Business Case Template
Here's an example of the business case template I was talking about before.
The beauty of it is that it's a business case on a page. You can see on the left hand side we have the financial aspect of it. I don't think that business people are necessarily after absolute certainty of the benefit. It's very hard to prove that upfront, but it should be reasonable. The best way that I see this works is that you take a fairly conservative approach to the estimated financial benefits. For instance, if we enable better sales productivity through this customer analytics project, we expect a 0.01% improvement in our revenue, and that tends to add up. You'll find that the revenue side of your business case is generally going to be more compelling than the cost side.
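To make that arithmetic concrete, here is a sketch with entirely hypothetical figures; the revenue base, uplift and project cost are all assumptions for illustration, not figures from the webinar.

```python
# Hypothetical figures for a one-page business case: a deliberately
# conservative uplift assumption applied to a large revenue base.
annual_revenue = 1_000_000_000   # assumed: $1B annual revenue
uplift = 0.0001                  # the conservative 0.01% productivity improvement
project_cost = 80_000            # assumed delivery cost

annual_benefit = annual_revenue * uplift   # $100,000 per year
payback_years = project_cost / annual_benefit
```

Even with a deliberately tiny uplift, a revenue-side benefit applied to a large base can pay back the assumed cost within a year, which is why the revenue side of the case tends to be the more compelling one.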
Financial benefits are only one side of things; there are also some non-financial benefits as well. They are obviously harder to quantify but are also a big consideration. Sometimes it's not necessarily a financial benefit as such, but a compliance imperative to comply with some regulation, or just making sure that you're complying with your internal standards.
The middle column is giving some context around the business context of the initiative. There you can consider: if we don't invest in it now, what's the impact of postponing the project? What might be the shortcomings of not doing it now? Also identifying the stakeholders involved.
Key Investment Areas
The right hand side is around the key investment areas. An analytics investment is much more than software or just a data pipeline. There are a number of different areas that we need to invest in, so we need to identify them as part of this business case process. Hopefully that gives you an idea of how you can create a business case very succinctly for a governance committee to actually sign off.
I also talked about the idea of a service catalogue. Here is an example:
Different clients will have a different structure, but the beauty of it is that you're effectively providing time-boxed packages for clients or yourself to use. This basically offers a set of services to business people that want to invest in analytics but don't know how to get started. These are based on what we see as a reasonable amount of effort for each category of service. It means that you have a more predictable process around how the investment decision making works.
Bringing this together, if a business manager wants to invest in improving the situation around customer churn, you can take them through a process where you say:
"Ok, well the first step is doing a feasibility assessment. It's going to be a ten day exercise to build a prototype. From there we'll produce a business case and if we do find that we want to invest further, you have the next data model implementation phase."
Generally speaking, we see the best practice being around two to three months for a data model implementation for a particular process area.
Setting Data Standards
Getting more into some of the traditional areas of what people associate with data governance and data stewardship, I want to talk about setting standards for a data and report catalogue. Here are some examples of the sorts of fields that you might deem mandatory to capture before you publish a report.
The idea here being that you don't want to publish visualisations to a set of business users without giving context around what that report means. These are just some examples of the different types of elements you might want to make sure you have in your data or report catalogue. You might want to make sure that the calculation logic is captured for all of your measures. You might not necessarily want to put a description on every dimension and attribute, because there might be thousands of them; however, the important business terms you might want to capture.
You might want to also reflect part of your own data stewardship process into this catalogue. This is the place that you capture that process in the first place. Who are the owners? You might have different labels for different people involved. It might be who the custodian is or the approver.
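One way such mandatory fields might be enforced is a simple pre-publication check. The required field list below is an illustrative assumption; each organisation would choose its own mandatory elements.

```python
# A sketch of enforcing mandatory catalogue fields before a report is
# published; the required field list is an illustrative assumption.
REQUIRED_FIELDS = {"name", "description", "owner",
                   "refresh_schedule", "calculation_logic"}

def missing_metadata(report_entry: dict) -> set:
    """Return the mandatory catalogue fields not yet populated."""
    return {f for f in REQUIRED_FIELDS if not report_entry.get(f)}

entry = {
    "name": "Gross Margin by Region",
    "description": "Monthly gross margin split by sales region",
    "owner": "finance.team",
    "refresh_schedule": "daily",
    "calculation_logic": "",   # not yet documented
}
gaps = missing_metadata(entry)
```

A check like this would block publication until the gaps (here, the calculation logic) are filled in, keeping the catalogue trustworthy for consumers.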
This is probably one of the most powerful techniques that I've seen clients use and it's very effective. I'm just giving you one example of how you might come up with that scheme. The beauty of this is that when you establish a rating scheme, you're taking an approach that is a lot more inclusive across the whole organisation around all of the reports that are being produced.
Not all of the reports will go through the manufacturing process that you would ideally want to undertake, because you might not have time or you might not be able to justify it. However, that doesn't stop people producing reports. I think the most important thing that we've seen here is to actually set expectations of how much investment has been undertaken. Where you can drive really good adoption of this is educating the senior management team around the difference between a gold, silver and bronze report.
Deciding on How to Rate a Report
I've given some examples above of what might be your decision making criteria as to how to badge a report as gold, silver or bronze. However, different organisations work differently, possibly being more data quality oriented in how they rate them. In this case I'm talking about the level of investment that's been undertaken underneath the report. The idea is that you're making it really transparent how much work has gone in to verify that the data is correct and that it's going to be reliably refreshed every day or every hour, depending on the business' needs.
I've seen a lot of organisations use that rating scheme to the extent where they badge the surface area of the report as well. There can be a badge in the top right hand corner that clearly indicates how reliable the report is. The beauty of it is that you can have all of the reports that people are producing in the central portal and then search out the gold ones or the bronze ones. At a certain point in time, if a bronze report is being used a lot, you might want to undertake a process to promote it to silver or gold. Those are just some ideas of how you can set expectations around the quality of reports.
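A usage-based promotion rule like that might be sketched as follows; the rating labels match the gold/silver/bronze scheme above, but the threshold and function are illustrative assumptions.

```python
# A sketch of a usage-based promotion check for rated assets; the
# threshold is an illustrative assumption, not a formal standard.
RATINGS = ("bronze", "silver", "gold")

def promotion_candidate(rating: str, monthly_views: int,
                        threshold: int = 100) -> bool:
    """Flag assets below gold that are used heavily enough to justify
    further investment in verification and automation."""
    return rating in RATINGS and rating != "gold" and monthly_views >= threshold

heavily_used_bronze = promotion_candidate("bronze", 250)  # worth promoting
rarely_used_bronze = promotion_candidate("bronze", 12)    # leave as bronze
```

The design choice here is that usage, not just data quality, triggers the conversation: a bronze asset that the business relies on daily is exactly the one worth investing in to reach silver or gold.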
Data Classification Scheme
Finally, something that comes up very often, and more so in the last couple of years, is actually undertaking a data classification scheme. This means really setting some standards around each table, or even particular fields, in terms of what operational risk is associated with that data element.
Done well, you can set up a catalogue of all of these elements that will drive security through many layers in your analytics environment: not only in your report layer or your semantic layer, if you're using cubes or tabular models, but right down to the data lake level. You might use techniques like schema separation or data masking to make sure that only the right people can see the right data. This is particularly important with privacy and making sure that you're privacy compliant. The key thing is that before you can set up a good security model, you really need to have classified your catalogue. That will then drive who can access what.
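As a rough illustration of classification-driven masking: the per-field classification map, the classification labels and the consumer entitlements below are all hypothetical assumptions, but the pattern of the catalogue driving who sees what is the one described above.

```python
# A sketch of classification-driven masking: the catalogue assigns each
# field a classification, and a masking step runs before data reaches a
# general consumer. Labels, fields and rules here are assumptions.
CLASSIFICATION = {
    "customer_name": "confidential",
    "email": "private",
    "state": "public",
    "revenue": "internal",
}

def mask_row(row: dict, allowed: set) -> dict:
    """Redact any field whose classification the consumer may not see.
    Unclassified fields default to confidential (deny by default)."""
    return {
        k: (v if CLASSIFICATION.get(k, "confidential") in allowed else "***")
        for k, v in row.items()
    }

row = {"customer_name": "Acme Pty Ltd", "email": "ops@acme.example",
       "state": "VIC", "revenue": 120}
# A general business consumer is entitled to public and internal fields:
masked = mask_row(row, allowed={"public", "internal"})
```

In a real platform the same classification catalogue would instead configure row- and column-level security or data-lake masking policies, rather than masking in application code; the sketch just shows the catalogue driving access.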
I've seen some organisations split it between a general purpose business consumer end user agreement and one specifically for power users where they have privileged access to different systems because of the nature of the work they need to do. The idea is not necessarily to read through all the fine print but to have something in place. I'm going to show you an example of how you might implement something like that now.
[SKIP TO 42:49 FOR THE LIVE DEMONSTRATIONS]