Data Stewardship is a vital part of Data Governance. It represents the allocation of specific roles and responsibilities within an organisation in order to oversee the initiatives of an Executive Steering Committee. These initiatives are usually formalised as a set of best practices and new policies, and the job of the Data Steward is to put them into effect and to hold individuals within the organisation accountable.
- Why is Data Stewardship Important?
- What is Data Governance Best Practice?
- What are the Effects of Poor Data Stewardship?
- Data Stewardship vs Data Governance
- Problems With Enterprise Data
- What are the Roles of a Data Steward?
- Data Quality and Identity Resolution
- Automated Data Quality Monitoring Systems for Data Stewards
Why is Data Stewardship Important?
One of the primary reasons for appointing a Steward is to pin down accountability, ultimately improving the accessibility and security of enterprise data. Even when a new enterprise data solution is implemented, the lack of a plan for ongoing maintenance can spell long-term disaster. Poorly assigned areas of responsibility increase the likelihood that the data situation will deteriorate over time, incurring further unexpected costs and hurdles.
This is particularly important when dealing with unstructured data: without a schema to enforce structure, neglecting good practice steadily degrades both the quality of the stored data and the ease with which it can be found and accessed. When using a Data Lake rather than a structured modern Data Warehouse, the very ease of ingestion means data can be pushed into the Lake without regard for good discipline or best practice. This slipshod approach produces what is colloquially referred to as a Data Swamp: a storage platform that is unmanaged, unnavigable and ultimately an organisational burden.
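To make the Data Swamp risk concrete, here is a minimal sketch of an ingestion gate that refuses to land a dataset in the Lake unless it carries basic stewardship metadata. The required fields (`owner`, `description`, `schema`), the `DataLake` class and the `IngestionRejected` exception are hypothetical choices for illustration, not features of any particular Data Lake product.

```python
from dataclasses import dataclass, field

# Hypothetical minimum metadata a steward might require before ingestion.
REQUIRED_FIELDS = ("owner", "description", "schema")


class IngestionRejected(Exception):
    """Raised when a dataset lacks the metadata needed to keep it findable."""


@dataclass
class DataLake:
    # Maps dataset name -> (metadata, payload); a stand-in for real storage.
    catalogue: dict = field(default_factory=dict)

    def ingest(self, name: str, payload: bytes, metadata: dict) -> None:
        # Collect every required field that is absent or empty.
        missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
        if missing:
            raise IngestionRejected(f"{name}: missing {', '.join(missing)}")
        self.catalogue[name] = (metadata, payload)


lake = DataLake()
lake.ingest(
    "sales_2019.csv",
    b"region,total\nEU,100\n",
    {"owner": "finance", "description": "Annual sales by region",
     "schema": {"region": "str", "total": "int"}},
)
try:
    lake.ingest("dump.csv", b"...", {"owner": "unknown"})
except IngestionRejected as err:
    print(err)  # dump.csv: missing description, schema
```

Rejecting undocumented data at the point of entry trades a little ingestion friction for a catalogue that stays navigable over time.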
What is Data Governance Best Practice?
An IAB Data Stewardship report identifies a case in which a major retail company had the details of 110 million credit cards stolen, and traces the cause back to a flawed Data Stewardship model. While massive security breaches are not the only consequence of getting Data Stewardship wrong, they are certainly the most potentially devastating for a company.
The report goes on to point out that good Data Stewardship stands between an organisation's data infrastructure and potentially unwelcome regulation. With every high-profile data security breach, the likelihood increases that "potentially wasteful and innovation-inhibiting" regulation will be enforced. It is therefore in the interest of all companies to collectively apply and abide by good practices, or else risk losing the ability to self-regulate in this field.
We've also previously blogged about a specific and very common data infrastructure error that has had catastrophic results: financial planning using Excel.
Following data governance best practice leads to improved overall data security
What are the Effects of Poor Data Stewardship?
In an article on data breaches, Mike Small emphasises that the data lifecycle has several points at which its security can be compromised. As such, Information Stewardship is a catch-all term covering all the “good governance techniques (used) to implement information-centric security”. He also draws a distinction between the classic model, in which every piece of data has a business owner and exists only within the context of a business function, and the newer model brought about by increasing volumes of unstructured data, in which data is constantly created and circulated in ad hoc, unadministered ways. The author states:
"Now, anyone who writes an email or creates a document is responsible for recognising the sensitivity and value of the information it contains."
The Verizon 2019 Data Breach Investigations Report recommends focusing on data hygiene, integrity and attention to detail as some of the main data governance best practices through which organisations can defend themselves against breaches and malicious attacks. All of these elements fall within the purview of good organisational Data Governance. Of all the reported data breaches, 21% were directly attributed to errors and failures within existing systems.
These are unforced organisational errors that need not have happened had a good Data Stewardship model been in place.
Data Stewardship vs Data Governance
These two terms are sometimes used interchangeably. However, while they cover similar territory, there is an important distinction between them. To frame it, David Plotkin identifies the Three Ps of effective data management: Policies, Processes and Procedures.
Policies are established at the enterprise level and set out the overall strategy of “what needs to be done”, encompassing all aspects of an organisation. This strategy is then crystallised into Processes, which identify higher-level objectives that can be set and worked towards. These objectives are in turn met through the creation and enshrinement of Procedures, which represent the specific, operational management of data.
Problems With Enterprise Data
Data Governance refers to thinking about organisational data at its highest levels, understanding and coming to grips with some of the problems that are fundamental to this space:
- Data is used by a wide variety of people for a vast number of purposes and is therefore resistant to rigid, unchanging taxonomies.
- At the same time, data does not explain itself and cannot be relied upon to provide its own context.
- The data process is highly collaborative, with many points of handover and little clarity about what is happening upstream.
- These moments of handover are where mistakes, corruptions and distortions are most likely to happen.
- Those responsible for building and maintaining the technical infrastructure are commonly unfamiliar with the data’s business meaning or function.
An organisation that acknowledges and accounts for this problem space will be well on its way to achieving good Data Governance. Recognising that effective change management requires a high level of authority is a further catalyst. Even if executives aren’t writing the actual policies, they need to sign off on them or be involved in the process in some other significant way.
Data Stewardship is concerned only with the final aspect of the Three Ps: Procedures. Data Stewards are not responsible for identifying and writing the higher-level policies, but for interpreting and implementing them on a day-to-day level. At this level, Stewards will have close technical familiarity with the systems in use and can pass recommendations up to a Data Governance Board or executive committee, which ties the specifics into the broader picture.
What are the Roles of a Data Steward?
One ultimate goal of this whole process is to achieve fully governed data through a well-implemented Data Stewardship model. At the element level, fully governed data is defined by:
- a standardised business name and definition
- well-documented calculation and derivation rules
- strict rules governing creation, usage and deletion
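The three criteria above can be expressed as a checklist over a data element's metadata. This is a minimal sketch: the `DataElement` fields and the `governance_gaps` and `is_fully_governed` helpers are hypothetical names chosen to mirror the list, not a standard schema, and it assumes (our assumption) that a derivation rule is only required for derived elements.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    technical_name: str
    business_name: str = ""       # standardised business name
    definition: str = ""          # agreed business definition
    is_derived: bool = False      # calculated from other elements?
    derivation_rule: str = ""     # documented calculation/derivation
    lifecycle_rules: dict = field(default_factory=dict)  # create/use/delete


def governance_gaps(el: DataElement) -> list:
    """Return the criteria this element does not yet meet (empty = fully governed)."""
    gaps = []
    if not (el.business_name and el.definition):
        gaps.append("standardised name and definition")
    # Assumption: only derived elements need a derivation rule.
    if el.is_derived and not el.derivation_rule:
        gaps.append("documented derivation rule")
    if not all(el.lifecycle_rules.get(k) for k in ("create", "use", "delete")):
        gaps.append("creation/usage/deletion rules")
    return gaps


def is_fully_governed(el: DataElement) -> bool:
    return not governance_gaps(el)


net_margin = DataElement(
    technical_name="nm_pct",
    business_name="Net Margin %",
    definition="Net profit as a percentage of revenue",
    is_derived=True,
    derivation_rule="net_profit / revenue * 100",
    lifecycle_rules={"create": "nightly ETL", "use": "finance reporting",
                     "delete": "after 7 years"},
)
print(is_fully_governed(net_margin))           # True
print(governance_gaps(DataElement("cust_x")))
# ['standardised name and definition', 'creation/usage/deletion rules']
```

Returning the list of gaps, rather than a bare yes/no, gives the Steward a to-do list for each element that falls short.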
Those responsible for this area work to ensure that data adheres to the Governance-mandated and organisationally relevant standards of:
Completeness, Uniqueness, Validity, Reasonableness, Integrity, Timeliness, Coverage and Accuracy
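Several of these dimensions can be measured directly. As a minimal sketch, assuming records arrive as Python dictionaries, the functions below score a batch for Completeness, Uniqueness, Validity and Timeliness. The field names, the simplified email pattern and the 30-day freshness window are illustrative assumptions; dimensions such as Reasonableness and Accuracy usually need business-specific reference data and are left out.

```python
import re
from datetime import date, timedelta

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None,            "updated": date(2024, 5, 3)},
    {"id": 2, "email": "not-an-email",  "updated": date(2023, 1, 9)},
    {"id": 4, "email": "d@example.com", "updated": date(2024, 4, 20)},
]

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple pattern


def completeness(rows, col):
    # Share of rows where the column is populated.
    return sum(r[col] is not None for r in rows) / len(rows)


def uniqueness(rows, col):
    # Share of distinct values among populated values.
    values = [r[col] for r in rows if r[col] is not None]
    return len(set(values)) / len(values)


def validity(rows, col, pattern):
    # Share of populated values that match the expected format.
    values = [r[col] for r in rows if r[col] is not None]
    return sum(bool(pattern.fullmatch(v)) for v in values) / len(values)


def timeliness(rows, col, as_of, max_age=timedelta(days=30)):
    # Share of rows refreshed within the allowed window.
    return sum(as_of - r[col] <= max_age for r in rows) / len(rows)


as_of = date(2024, 5, 10)
print(completeness(records, "email"))               # 0.75
print(uniqueness(records, "id"))                    # 0.75
print(round(validity(records, "email", EMAIL), 2))  # 0.67
print(timeliness(records, "updated", as_of))        # 0.75
```

Expressing each dimension as a ratio between 0 and 1 makes it straightforward to set a target per dimension and track it over time.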
A Data Steward would be responsible for working towards achieving and formalising this in a number of concrete ways. In storage and warehousing, they would define and oversee the relationships between dimensions and facts. They would also know the rules governing aggregation and derivation and ensure that these are well documented. With this knowledge and oversight, they would be well placed to identify potential redundancy issues and to prevent an identical data point from being added erroneously under a different name or with a different definition. You can learn more about Data Redundancy by reading our blog post on the topic.
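One simple way to catch an identical data point arriving under a different name is to fingerprint each column's contents and compare a candidate against what is already catalogued. This is a minimal sketch that assumes columns are available as in-memory lists; `fingerprint` and `find_duplicate_columns` are hypothetical helpers, and a real catalogue would also compare definitions and lineage, not just values.

```python
import hashlib


def fingerprint(values):
    """Hash a column's values in row order so identical contents collide."""
    h = hashlib.sha256()
    for v in values:
        h.update(repr(v).encode("utf-8"))
        h.update(b"\x1f")  # separator between values
    return h.hexdigest()


def find_duplicate_columns(existing, name, values):
    """Return names of existing columns whose contents match the candidate."""
    fp = fingerprint(values)
    return [col for col, vals in existing.items()
            if col != name and fingerprint(vals) == fp]


warehouse = {
    "customer_revenue": [120.0, 75.5, 300.0],
    "customer_region": ["EU", "US", "EU"],
}
# A new column that is really customer_revenue under another name.
print(find_duplicate_columns(warehouse, "cust_rev_total", [120.0, 75.5, 300.0]))
# ['customer_revenue']
```

An exact-match fingerprint only catches verbatim copies; near-duplicates (rounded, re-ordered or re-scaled values) would need a looser comparison.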
Data Quality and Identity Resolution
The Data Steward would also be tasked with the ongoing improvement and development of data quality management. They would be responsible for defining the standards of quality required for any given business purpose, as well as where in the database architecture these quality checks need to be made. They would also oversee expected values and inputs, and get to the bottom of why certain data isn’t meeting these quality expectations.
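The sketch below illustrates two of these duties together: declaring expected values per field, and judging the same data against different quality bars depending on the business purpose. The rules, field names, purposes and thresholds are hypothetical examples; reporting which rows failed which rule is intended as a starting point for investigating why data misses expectations.

```python
from collections import defaultdict

# Hypothetical expected-value rules: field -> (description, predicate).
EXPECTATIONS = {
    "country": ("ISO code in allowed set", lambda v: v in {"GB", "IE", "FR"}),
    "age": ("integer between 18 and 120",
            lambda v: isinstance(v, int) and 18 <= v <= 120),
}

# Hypothetical minimum pass rates per business purpose.
PURPOSE_THRESHOLDS = {"regulatory_report": 1.0, "marketing_campaign": 0.7}


def check(rows):
    """Return (pass rate, failing row indices grouped by field)."""
    failures = defaultdict(list)
    for i, row in enumerate(rows):
        for fld, (_, ok) in EXPECTATIONS.items():
            if not ok(row.get(fld)):
                failures[fld].append(i)
    bad_rows = {i for idxs in failures.values() for i in idxs}
    return (len(rows) - len(bad_rows)) / len(rows), dict(failures)


def fit_for(purpose, rows):
    """Is this batch good enough for the given purpose?"""
    rate, failures = check(rows)
    return rate >= PURPOSE_THRESHOLDS[purpose], rate, failures


rows = [
    {"country": "GB", "age": 34},
    {"country": "UK", "age": 17},   # "UK" is not the ISO code; age too low
    {"country": "FR", "age": 29},
    {"country": "IE", "age": 52},
]
print(fit_for("regulatory_report", rows))   # (False, 0.75, {'country': [1], 'age': [1]})
print(fit_for("marketing_campaign", rows))  # (True, 0.75, {'country': [1], 'age': [1]})
```

Keeping the thresholds per purpose, rather than a single global bar, reflects the idea that the same data can be fit for one use and unfit for another.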
Identity resolution of data entities also falls within the purview of a Data Steward. This involves overseeing a reference guide that links business terms and concepts to their specific form within the system. A business glossary and a data catalogue are tools that can help with this part of the process. On top of this, a process should be created for resolving ambiguities and properly incorporating potential new fields.
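As an illustration of how a business glossary supports identity resolution, here is a minimal sketch that maps business terms and their aliases to system fields, and routes ambiguous or unknown terms to a review queue rather than guessing. The glossary entries, field paths and the `resolve` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical glossary: canonical business term -> system field and known aliases.
GLOSSARY = {
    "Customer": {"field": "crm.party.party_id",
                 "aliases": {"client", "account holder"}},
    "Account": {"field": "billing.account.acct_no",
                "aliases": {"account holder", "billing account"}},
    "Net Revenue": {"field": "fin.sales.net_amt",
                    "aliases": {"revenue", "net sales"}},
}


def build_index(glossary):
    """Map every lower-cased name or alias to the canonical terms it could mean."""
    index = defaultdict(set)
    for term, entry in glossary.items():
        for name in {term} | entry["aliases"]:
            index[name.lower()].add(term)
    return index


INDEX = build_index(GLOSSARY)
review_queue = []  # ambiguous or unknown terms awaiting a steward's decision


def resolve(name):
    """Return the system field for a business term, or None if it needs review."""
    matches = INDEX.get(name.lower(), set())
    if len(matches) == 1:
        (term,) = matches
        return GLOSSARY[term]["field"]
    review_queue.append((name, sorted(matches)))
    return None


print(resolve("Client"))          # crm.party.party_id
print(resolve("account holder"))  # None (ambiguous: Account vs Customer)
print(resolve("churn score"))     # None (not yet in the glossary)
print(review_queue)
# [('account holder', ['Account', 'Customer']), ('churn score', [])]
```

Sending ambiguous and unknown terms to a queue is one way to give the "process for resolving ambiguities and incorporating new fields" a concrete entry point.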
Overall, ensuring that data models meet project requirements is vital, as is having the initiative to push back when tightening project schedules and resources start to compromise quality standards.
Automated Data Quality Monitoring Systems for Data Stewards
It is critical to remember that while Data Stewards may seem like the shock troops of good Data Governance, they are only human and can only accomplish so much without full organisational buy-in and commitment, as well as an effective data quality management system. What is ultimately required is a full cultural shift towards accountability and attention to detail in all things data; ongoing data quality success depends on it.
However, there are many tools that can make the job of a Data Steward significantly easier, freeing them from mundane low-level tasks so they can focus on higher-impact initiatives. One of these is an automated data monitoring system that can be configured with custom rules to detect anomalies as they occur. The Steward is responsible for defining the set of rules and alerts that trigger immediate action on a relevant data quality issue before it becomes a serious liability.
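To show what "custom rules and alerts" can look like in practice, here is a minimal sketch of a threshold-based rule runner. Anomaly detection here is plain upper and lower bounds on a metric; real monitoring tools may use statistical or learned baselines instead. The `Rule` class, the `run_rules` helper, the metrics and the thresholds are hypothetical, and `alert` can be any callable (for example, a function that posts to a chat channel).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    metric: Callable[[list], float]  # computes a number from a batch of rows
    low: float                       # alert if the metric falls below this
    high: float                      # alert if the metric rises above this


def run_rules(batch, rules, alert):
    """Evaluate each rule against a batch and call `alert` for any breach."""
    breaches = []
    for rule in rules:
        value = rule.metric(batch)
        if not (rule.low <= value <= rule.high):
            breaches.append(rule.name)
            alert(f"{rule.name}: {value:.2f} outside [{rule.low}, {rule.high}]")
    return breaches


rules = [
    Rule("row_count", lambda b: len(b), low=3, high=10_000),
    Rule("null_rate_amount",
         lambda b: sum(r.get("amount") is None for r in b) / max(len(b), 1),
         low=0.0, high=0.05),
    Rule("negative_amounts",
         lambda b: sum((r.get("amount") or 0) < 0 for r in b),
         low=0, high=0),
]

batch = [{"amount": 10.0}, {"amount": None}, {"amount": -4.0}, {"amount": 7.5}]
alerts = []
print(run_rules(batch, rules, alerts.append))  # ['null_rate_amount', 'negative_amounts']
```

Passing the alert action in as a function keeps the rules themselves independent of how the Steward chooses to be notified.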
To learn more, watch our free webinar on Data Governance, which goes into detail about how we enable it for organisations in real-world situations.