In a previous blog post we briefly covered Data Lake Governance, Data Stewardship and their importance in maintaining a functional and sustainable enterprise data environment. This blog post will go into more detail on these topics, why they are so important and the steps that organisations should be taking in order to ensure that they establish and abide by a solid Data Stewardship model.
- Why is a Data Stewardship Model Important?
- Data Governance Best Practice
- Poor Stewardship and Data Breaches
- The Difference Between Data Stewardship and Data Governance
- Problems With Enterprise Data
- Roles of a Data Steward
- Data Quality and Identity Resolution
Why is a Data Stewardship Model Important?
One of the primary reasons for appointing a Steward is to pin down accountability, with the ultimate aim of improving the accessibility and security of enterprise data. Even when a new enterprise data solution is implemented, the lack of a plan for ongoing maintenance can spell long-term disaster. Poorly assigned areas of responsibility increase the likelihood that the data situation will deteriorate over time, incurring further unexpected costs and hurdles.
This is particularly important when dealing with unstructured data, since the lack of schema means that neglecting good practice will degrade both the quality of the stored data and the ease with which it can be found and accessed. When using a Data Lake rather than a structured Data Warehouse, the ease with which data can be ingested can lead to it being pushed into the Lake without regard for good discipline or best practice. This slipshod approach leads to what is colloquially referred to as a Data Swamp: a storage platform that is unmanaged, unnavigable and ultimately an organisational burden.
Data Governance Best Practice, Regulation and Security
An IAB Data Stewardship report specifically identifies a case wherein a major retail company had the details of 110 million credit cards stolen and traces the cause of this back to a flawed Data Stewardship model. While massive security breaches are not the only result of dropping the ball in this regard, they are certainly the most potentially devastating to a company.
Following on from this, the report points out that good Stewardship represents the barrier between an organisation's data infrastructure and potentially unwelcome regulation. With every high-profile data security breach, the likelihood that "potentially wasteful and innovation-inhibiting" regulation will be enforced increases. It is in the interest of all companies to collectively apply and abide by good practices or lose the ability to self-regulate in that field.
We've also previously blogged about a specific and very common data infrastructure error that has had catastrophic results - financial planning using Excel.
Following data governance best practice leads to improved overall data security
Poor Stewardship as a Cause of Data Breaches
In an article on data breaches, Mike Small emphasises that the data lifecycle has several points at which its security can be compromised. As such, Information Stewardship is a catch-all term covering all the “good governance techniques (used) to implement information-centric security”. He also distinguishes between the classic model, in which every bit of data has a business owner and only exists within the context of a business function, and the new model, brought about by increasing volumes of unstructured data, in which data is constantly being created and circulated in ad-hoc, unadministered ways. The author states:
"Now, anyone who writes an email or creates a document is responsible for recognising the sensitivity and value of the information it contains."
The Verizon 2019 Data Breach Investigations Report recommends focusing on data hygiene, integrity and attention to detail as some of the main data governance best practices through which organisations can defend themselves against breaches and malicious attacks. All of these elements fall within the purview of good organisational Data Governance. Of all the reported data breaches, 21% were directly attributed to errors and failures within existing systems.
These are all unforced organisational errors that need not have happened had a good Data Stewardship model been in place.
The Difference between Data Stewardship and Data Governance
These two terms are sometimes used interchangeably. However, while they both cover similar territory, there is a distinction that is important to recognise. Regarding this, David Plotkin identifies the Three Ps of effective data management: Policies, Processes and Procedures.
Policies are established at the enterprise level and refer to the overall strategy of “what needs to be done”, encompassing all aspects of an organisation. This is then crystallised into a Process, which is aimed at identifying higher level objectives which can be set and worked towards. These objectives are then met through the creation and enshrinement of Procedures, which represent the specific, operational management of data.
The Problems With Enterprise Data
Data Governance refers to thinking about organisational data at its highest levels, understanding and coming to grips with some of the problems that are fundamental to this space:
- Data is used by a wide variety of people for an unfathomably large number of purposes and is, as such, resistant to rigid and unchanging taxonomies.
- At the same time, data does not explain itself and cannot be relied upon to provide its own context.
- The data process is highly collaborative, with many points of handover and a lack of clarity on what is occurring upstream.
- These moments of handover are where mistakes, corruptions and distortions are most likely to happen.
- Those who are responsible for building and maintaining the technical infrastructure will commonly not be familiar with the data’s business meaning or function.
An organisation which acknowledges and accounts for this problem space will be well on its way to achieving good Data Governance. Understanding that a high level of authority is necessary for effective change management is an added catalyst in this. Even if executives aren’t writing the actual policies, they need to be signing off or involved in the process in some other significant way.
Data Stewardship is concerned only with the final, Procedures aspect of the Three Ps of data management. Data Stewards are not responsible for identifying and writing the higher level policies, but for interpreting and implementing them on a day-to-day level. At this level, the Stewards will have close technical familiarity with the systems that are being used and be able to pass recommendations up to a Data Governance Board or executive committee who will tie the specifics into the broader picture.
Roles of a Data Steward
One ultimate goal of this whole process is to attain fully governed data through a well implemented Data Stewardship model. Fully governed data is defined, at the element level, by having a standardised business name and definition, well documented calculation and derivation rules, and strict rules surrounding creation, usage and deletion. Those responsible for this area work to ensure that data adheres to the Governance-mandated and organisationally relevant standards of:
Completeness, Uniqueness, Validity, Reasonableness, Integrity, Timeliness, Coverage and Accuracy
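Some of these standards can be measured mechanically. The following is a minimal Python sketch rather than a prescribed implementation: the record layout, the field names (customer_id, email) and the e-mail pattern are all hypothetical, and it covers only Completeness, Uniqueness and Validity.

```python
import re

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C002", "email": "bob@example"},
    {"customer_id": "C004", "email": "cy@example.org"},
]

# Deliberately simple pattern, assumed for illustration; not a full e-mail validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _non_empty(records, field_name):
    """Collect the values of a field that are present and non-empty."""
    return [r[field_name] for r in records if r.get(field_name) not in (None, "")]


def completeness(records, field_name):
    """Share of records in which the field is present and non-empty."""
    if not records:
        return 0.0
    return len(_non_empty(records, field_name)) / len(records)


def uniqueness(records, field_name):
    """Share of non-empty values that are distinct."""
    values = _non_empty(records, field_name)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(records, field_name, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = _non_empty(records, field_name)
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)
```

Scores like these only become meaningful once the Governance level has agreed a threshold for each business purpose; the remaining dimensions, such as Reasonableness and Accuracy, usually need business context that a generic check cannot supply.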
A Data Steward would be responsible for working towards achieving and formalising this in a number of concrete ways. With storage and warehousing, they would define and oversee the relations between data dimensions and data facts. Additionally, they would be aware of the rules governing aggregation and derivation and ensure that they are well documented. With this knowledge and oversight they would be well suited to identifying potential redundancy issues and preventing an identical data point from being added erroneously under a different name or with a different definition. You can learn more about Data Redundancy by reading our blog post on the topic.
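One simple signal of this kind of redundancy is two columns that hold exactly the same values under different names. The sketch below is a hedged illustration only: the table and its column names are invented, and matching values are a prompt for a Steward to investigate, not proof that two fields mean the same thing.

```python
from collections import defaultdict

# Hypothetical table held as column name -> list of values.
TABLE = {
    "cust_id": [1, 2, 3],
    "customer_number": [1, 2, 3],  # same data under a different name
    "order_total": [9.5, 20.0, 3.25],
}


def find_redundant_columns(table):
    """Group column names whose values are identical, row for row.

    Values must be hashable, since each column is used as a dictionary key.
    """
    groups = defaultdict(list)
    for name, values in table.items():
        groups[tuple(values)].append(name)
    # Only groups with more than one name point at possible redundancy.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Calling find_redundant_columns(TABLE) on the sample data returns a single group containing cust_id and customer_number, which the Steward can then compare against the documented definitions.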
Data Quality and Identity Resolution
The Data Steward would also be tasked with the ongoing improvement and development of data quality. They would be responsible for defining the standards of quality required for any given business purpose, as well as where specifically these quality checks need to be made within the database architecture. Additionally, they would have oversight of expected values and inputs, and would investigate why certain data isn’t meeting these quality expectations.
Additionally, within the purview of a Data Steward would be identity resolution of data entities. This would involve overseeing a reference guide which links business terms and concepts with their specific form within the system. On top of this, a process for resolving ambiguities and properly incorporating new potential fields should be created.
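As a rough illustration of such a reference guide, the Python sketch below maps business terms to their physical fields and refuses to resolve a term that points at more than one field. The Glossary class, its methods and the "table.column" names are all hypothetical; in practice this role is usually played by a data catalogue or business glossary tool.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    term: str  # business term as agreed with the business
    definition: str
    physical_fields: list = field(default_factory=list)  # e.g. "table.column"


class Glossary:
    """Hypothetical reference guide from business terms to system fields."""

    def __init__(self):
        self._entries = {}

    def register(self, term, definition, physical_field):
        """Link a business term to a physical field, creating the entry if needed."""
        key = term.strip().lower()
        entry = self._entries.setdefault(key, GlossaryEntry(term, definition))
        if physical_field not in entry.physical_fields:
            entry.physical_fields.append(physical_field)
        return entry

    def resolve(self, term):
        """Return the single field for a term, or raise if missing or ambiguous."""
        entry = self._entries.get(term.strip().lower())
        if entry is None:
            raise KeyError(f"No glossary entry for {term!r}")
        if len(entry.physical_fields) != 1:
            raise ValueError(
                f"{term!r} is ambiguous: {entry.physical_fields}; needs steward review"
            )
        return entry.physical_fields[0]
```

Raising an error on an ambiguous term, rather than silently picking one field, is a deliberate choice in this sketch: it forces the ambiguity through the resolution process instead of hiding it.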
Overall, a drive towards ensuring that data models meet project requirements is vital. Having the initiative to push back in situations where tightening project schedules and resources start to impact quality standards is important.
It is critical to remember that while Data Stewards may seem like the shock troops of implementing good Data Governance, they can only accomplish so much without full organisational opt-in and commitment. What is ultimately required is a full cultural shift towards accountability and attention to detail in all things data.
To learn more, watch our free Webinar on Data Governance which goes into detail about how we enable it for organisations in real world situations: