Even modestly sized organisations can suffer from customer data fragmentation, leading to data redundancy and inconsistency. Users are providing their details through web forms, the sales department is entering leads, tech support is creating tickets, accounts and billing are generating masses of transaction information. The more channels and points of input there are, the worse the situation gets. It is important to know what these two issues are before taking important steps to minimise them.
How Does a Database Decline?
Information about the same client is entering the database through different tools which use their own proprietary data formats. It is not a straightforward task to ensure that when sales and billing refer to the same customer, this is reflected in the database. There are many ways to structure a database and work towards data standardisation, however, some are more effective than others.
Different types of data verification and data validation can help smooth out some of the worst offenders. Nevertheless, an individual at the point of interface will continue to stymie efforts at data standardisation. What eventually emerges from flawed user input and imperfect database design are problems of data inconsistency and redundancy.
Human creativity always has the power to be surprising, especially when it comes to the ways in which it can defy well defined input fields
What is Data Redundancy?
Data redundancy occurs when the same data point is multiplied across the database and can be found repeated in an unnecessary form.
It usually stems from either poor initial relational database design, wherein information is inefficiently structured and needlessly replicated within the same table. Additionally, it can happen over time as the database is expanded to factor in new elements with insufficient planning in how to implement them as efficiently as possible.
A common example of data redundancy is when a name and address are both present in different columns within a table. If the link between these data points is defined in every single new database entry it would lead to unnecessary duplication across the entire table. It would be much more efficient to define the name/address relationship in a separate related table.
Types of Data Anomaly
Depending on the size of the database, over time this would potentially increase data storage and processing requirements. This can also lead to an overly complicated web of table relations, which would likely lead to confusion during the querying process. Data updates might also not take effect across all relevant points, leading to data inconsistency. Another form of this redundancy would be if identical information was stored across several different unrelated tables. This is known as an update anomaly. It would likely occur due to the database being built in multiple stages, possibly with disjointed handovers and a poor overall architectural plan.
There are two other cases stemming from poor database design that lead to severe issues in ongoing database operations but cannot really be classified as a pure problem of redundancy. The first is an insertion anomaly, in which a relational model is so locked down and rigid that a new addition which falls outside the expected scope cannot be included either without substantial re-engineering or, as is more common, an ad-hoc solution which will typically introduce data redundancy errors down the line.
The final case is known as a deletion anomaly, wherein a database is built in such a sub-optimal way, that the deletion of a final data point in a table leads to unintended data loss of an element not located anywhere else. An example of this is if important address details are found within a user table. When a user is deleted, the address details could be completely lost.
How to Minimise Data Redundancy
A way to nip redundancy issues in the bud is to devote more time to planning out more efficient database structures before they are implemented. If that is no longer possible, then a process of database normalisation would have to take place. The aim of the database normalisation process is to reengineer tables so that the purpose of each is well defined and that the relations between them are purposeful and logical. This process also aims to set up the database in such a way that it is scalable, potentially expanded or retracted in the future without creating insertion anomalies or deletion anomalies.
Databases designed in the OLTP format tend to be highly normalised and more resistant to data duplication errors.
What is Data Inconsistency?
Data inconsistency is a situation where there are multiple tables within a database that deal with the same data but may receive it from different inputs.
Inconsistency is generally compounded by data redundancy. However, it is different from data redundancy and its adjacent anomalies in that it typically refers to problems with the content of a database rather than its design and structure. This is where the existence of multiple channels and touchpoints as well as the human propensity for putting creative spin on inputs begins to compound database problems.
An organisation is broken up into different departments, each using their own tools and systems, each following their own processes and with their own interpretation of the data points they are creating and using. When connecting these into the central database, a process of data integration needs to happen which tries to homogenise the differences in proprietary data models by cleaning and transforming the data as much possible with validation and semantic rules.
"An organisation is broken up into different departments, each using their own tools and systems, each following their own processes and with their own interpretation of the data points they are creating and using."
However, poor data quality is not purely a technical problem that can be fixed during data integration. When you are dealing with semantic and definitional questions across departments, it is almost impossible to achieve a consensus on certain business terms let alone implement a standardisation policy. The differences in data requirements across tools and departments mean that even the most comprehensive data integration focus will never consolidate into a single master, unduplicated database. Additionally, even if an organisation implements a Master Data Management program, dependable old human error will still be the cause of a certain level of data inconsistency.
How to Minimise Data Inconsistency
There are two approaches that are increasingly being implemented by organisations in order to tackle the problem of data inconsistency across applications. A central semantic store approach is one that involves focusing on meticulously logging and storing all the rules used by the database integration process in a single centralised repository. This is aimed at making sure that as data sources become updated or new ones are added they do not fall outside data integration rules.
A master reference store approach tries to solve the data consistency problem through greater centralisation. Specifically, it aims to create a main source-of-truth for the most important reference data and creates strict rules about syncing all secondary tables when a change is triggered in the main one. Instead of standardising data on an ad-hoc, partial and situational basis, this approach locks it down into a single central process in an attempt to keep greater control over the most important data points, even if it does come at a greater use of processing resources.
Whichever approach you decide is best for your organisations, implementing it and using it properly may not always be easy and straightforward. Implementing a data stewardship model and empowering a data steward is a good way to ensure that you start off on the right foot and remain on track.
Alternately, using a set of tools such as the Microsoft Power Platform allows you to easily start implementing a common data model across your apps, visualisations and workflow automation. With a data standardisation built into its very foundations, the Power Platform allows you to easily build ad-hoc business tools while avoiding data inconsistency.
The Potential of AI as a Data Quality Tool
No matter how an organisation decides to work towards customer data integration, AI has the potential to greatly expedite the process in certain ways. Regardless of whether reference data updates will be centralised or all integration rules will be carefully monitored, there will come a point at which it will be important to do an overall audit of a database to clean and consolidate the data, providing a solid base upon which to build.
Ironing out database inconsistencies manually is an extremely long and tedious task. Even with the use of some fuzzy matching models, results that are more than a little bit dissimilar risk not being picked up. As described above, the capacity for creative errors can stymie even an elaborate condition-based data quality filter. As such, this task is ripe for the application of machine learning.
By building a ML model which is fed a substantial number of typical data deviations, differing interpretations and misspellings, an underlying database normalisation system can be established to iron out a significant amount of the inconsistencies. This would be on a level above condition checking and fuzzy matching.
As with any machine learning application, investment in model creation, training data and human oversight is an important of the implementation. However, with proper implementation and oversight the potential gains for an organisation would be significant.
How Can You Prevent Data Quality Issues?
Thankfully, there are many ways to improve your organisational data quality, leading to a reduction in data redundancy and data inconsistency. These are relatively intuitive and low-tech to implement. Once you have the knowledge, you can see results relatively quickly.
You can find out more about the immediate steps you can take to improve your data quality by downloading our free eBook, written by one of our directors with over 20 years of practical data experience: