I did a lot of data analytics in my previous job. One thing that undermines judgement based on analytics is that the data can drift out of alignment with reality. When we start with raw data, we collate it, summarise it, look at it from different perspectives, make some safe assumptions, and draw concrete observations from the analysis. This is round 1. In analytics we keep diving deeper and deeper, because each result gives us new ideas for finding further observations built on our earlier assumptions and observations. So a data analyst moves on to subsequent rounds (and trust me, it becomes an obsession to find cool, exotic observations in basic, simple data). The problem is that the assumptions made at each stage add up, and they push our final observation far from reality.
So when we spend a week or two analysing a piece of raw data and come out with a final observation, it is very hard to trace back what we did to the data. As entrepreneurs or managers, we need to relate data to reality.
For example, suppose the data says my cost allocation is faulty because the factory-in and factory-out times recorded by the operations team do not match the GPS records. We might want to go onto the floor and check whether that is really the case with a walkthrough of the process. It may turn out that the problem is just a minor process flaw, such as a time difference caused by loading and unloading, which the GPS records do not capture but operations does. The solution changes: in one scenario you put in controls to make sure operations records the right data, while in the other the real fix is to send an extra person with your pickup van to reduce the time spent inside the client's factory.
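To make the mismatch in this example concrete, here is a minimal Python sketch of the kind of check that might raise the flag in the first place. It assumes each trip carries four timestamps (factory-in and factory-out as recorded by operations, and the same two as derived from GPS). The Visit class, the flag_mismatches function, the 15-minute tolerance and the sample trips are all hypothetical. Note what it cannot do: it only tells you that the two records disagree, not why, which is exactly why the walkthrough still matters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Visit:
    # Hypothetical record of one pickup at a client's factory.
    trip_id: str
    ops_in: datetime   # factory-in time recorded by the operations team
    ops_out: datetime  # factory-out time recorded by the operations team
    gps_in: datetime   # arrival time derived from GPS records
    gps_out: datetime  # departure time derived from GPS records


def flag_mismatches(visits, tolerance=timedelta(minutes=15)):
    """Return (trip_id, in_gap, out_gap) for every visit where the operations
    and GPS times differ by more than `tolerance` (an assumed threshold)."""
    flagged = []
    for v in visits:
        in_gap = v.ops_in - v.gps_in     # positive: operations logged entry later than GPS
        out_gap = v.ops_out - v.gps_out  # positive: operations logged exit later than GPS
        if abs(in_gap) > tolerance or abs(out_gap) > tolerance:
            flagged.append((v.trip_id, in_gap, out_gap))
    return flagged


# Hypothetical sample trips, for illustration only.
start = datetime(2024, 1, 1, 9, 0)
visits = [
    Visit("T1", start, start + timedelta(minutes=60), start, start + timedelta(minutes=62)),
    Visit("T2", start + timedelta(minutes=5), start + timedelta(minutes=90),
          start, start + timedelta(minutes=50)),
]

for trip_id, in_gap, out_gap in flag_mismatches(visits):
    print(trip_id, in_gap, out_gap)  # prints: T2 0:05:00 0:40:00
```

A flagged trip like T2 is a prompt to go and look at the process, not a conclusion about whether the records or the process are at fault.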
Hope this makes sense.
Real-World Alignment: The Danger of Data Myopia
Whether it describes real-world entities (i.e., master data) or real-world interactions among entities (i.e., transaction data), data is an abstract description of reality. The creation and maintenance of these abstract descriptions shapes the organization's perception of the real world, which I philosophically pondered in my post "Plato's Data."
The inconvenient truth is that the real world is not the same thing as the digital worlds captured within our databases.
And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality. When the organization's data quality efforts focus on minimizing the digital distance between data and the constantly changing real world that data attempts to describe, the result can be a hyper-focus on the data in isolation, otherwise known as data myopia.
Even if we create and maintain perfect real-world alignment, what value does high-quality data possess independent of its use?
Real-world alignment reflects the perspective of the data provider, and its advocates argue that providing a trusted source of data to the organization can satisfy any and all business requirements, i.e., high-quality data should be fit to serve as the basis for every possible use. Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization's many data consumers.
However, providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM). Although these initiatives can provide significant business value, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.
Perhaps the enterprise needs a Ulysses pact to protect it from believing in EDW or MDM as a miracle exception for data quality?
A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the specific business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.
In other words, real-world alignment does not necessarily guarantee business-world alignment.
So, if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice. Unfortunately, that is not necessarily the case.
- Jim Harris