BLOG POST: 11.30.23
Trust and Transparency in Data
By Jon Iwata, Executive Director, Data & Trust Alliance
Provenance matters. We wouldn’t eat food or drink water whose sources we didn’t know. We require assurance about quality and often chain-of-custody for the medicines and products we buy. Our financial and legal systems demand documentation about the provenance and lineage of capital. These kinds of standards are required, and expected, for any essential good or service in a well-functioning economy and society.
Today, those same standards are needed for the fuel of our increasingly knowledge- and AI-centric world—namely, the data, both structured and unstructured, that give us insights into what is happening now and is likely to happen next.
Up to now, standards for data provenance—where, when and how data was collected or generated, and who has the rights to use it—haven’t been established in ways that work cross-industry and cross-sector. This is one reason data scientists spend almost 40% of their time on data preparation and cleansing tasks, according to a 2022 Anaconda report. And 61% of CEOs in the 2023 annual IBM Institute for Business Value CEO study cite the lack of clarity on data lineage and provenance as a top barrier to adoption of generative AI.
The new data provenance standards announced today by the Data & Trust Alliance aim to fill that gap. We believe they can accelerate economic growth and societal progress through improving transparency and trust in data and AI.
I want to underscore one key word of this announcement: cross-industry. This is important because the true power of artificial intelligence lies in its capacity to find meaningful patterns in heterogeneous datasets—in the real world where predictive insight and economic value are found at the intersection of different industries, communities, cultures, regulations and the natural environment.
Without the kind of trust and transparency that such standards make possible, consumers, patients, citizens, companies and communities may be at the mercy of “black box” AI, and may be subject to the proprietary control of critical technology by a handful of tech giants, in a time of rapid, often chaotic change. In addition to the risks such a situation poses to important rights such as privacy and security, it also threatens to limit the potential for innovation that comes from a level, transparent playing field, in everything from healthcare to supply chains, manufacturing to consumer choice.
Recognition of these threats and opportunities is driving global efforts to develop sound AI regulation—including President Biden’s recent Executive Order on AI, the UK’s AI Safety Summit, and private efforts like Responsible Innovation Labs. However, government regulation is not enough, and the major technology platforms are not the only stakeholders in shaping AI’s future. Businesses across industries—the actual users and deployers of AI—have a critical role to play in helping to shape the ways AI is used.
The CEOs of D&TA companies, including Mastercard, UPS, Pfizer, Nike, AARP, American Express, Walmart and more, knew they couldn’t wait. That’s why their experts rolled up their sleeves to collaborate over the past year to create a solid, robust and, importantly, implementable set of data provenance standards. Those standards identify the most essential, valuable and practical criteria for implementation in data and AI systems across industries—from when the data was generated, to lineage and source, to legal rights, data type and generation method. Of the eight data provenance standards, only one—generation date—is consistently surfaced in metadata today.
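To make the idea concrete, here is a minimal sketch of what a provenance record attached to a dataset might look like, and how a team could check which fields are missing. It is an illustration only, not the Alliance's schema: the class name, the field names and the helper function are all hypothetical, and the sketch covers only the six criteria named above rather than guessing at the full set of eight.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata for one dataset.

    Field names are illustrative assumptions based on the criteria
    named in this post; they are not the published standard.
    """
    generation_date: Optional[date] = None
    source: Optional[str] = None
    lineage: Optional[str] = None
    legal_rights: Optional[str] = None
    data_type: Optional[str] = None
    generation_method: Optional[str] = None


def missing_fields(record: ProvenanceRecord) -> list[str]:
    """Return the names of provenance fields that were left unset."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# A dataset that carries only a generation date, the one criterion
# the post says is consistently surfaced in metadata today.
record = ProvenanceRecord(generation_date=date(2023, 11, 30))
gaps = missing_fields(record)
print(gaps)
```

Run against a dataset that carries only a generation date, the check reports every other field as a gap, which is the situation the standards are meant to change.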
In the end, as in every previous technology revolution, sustainable value creation depends on trust, and trust depends on transparency. In turn, transparency increasingly depends on multi-stakeholder openness in how AI models and data standards are developed. This is why our proposed standards were reviewed with industry partners and ecosystem players beyond D&TA and are now undergoing testing across multiple use cases. It is also why the Alliance is inviting practitioners from all interested parties to join our community of practice and kick the tires of these standards from their own domain’s perspective. We expect to release the tested-and-approved standards around Q2 2024.
The first wave of any major technology shift is typically a period of both euphoria (“this changes everything”) and fear (“the end of humanity”). We saw this with the dawn of the machine age, with computers, with the Internet—and now with AI. What happens next is what’s most important—a phase of pragmatic implementation, when practitioners build systems that actually work for business, that are clear and fair, that will endure.
In the shift from the hype of a Wild West to a sustainable economy and civilization, D&TA is all about this second phase. Indeed, this moment is why our Alliance was created by CEOs of world-leading enterprises. For this work, too, provenance matters.
If you have interest in working with us on our data provenance standards, please reach out to email@example.com—we’d love to hear from you!