The Data & Trust Alliance

07.09.24

Blog Post

Announcing the D&TA Data Provenance Standards v1.0.0

By Saira Jesani, Executive Director, Data & Trust Alliance

Last November, we announced that 19 of our member companies—including American Express, Humana, IBM, Mastercard, Pfizer, UPS, Walmart and others—were creating cross-industry standards for where data comes from, how it’s created, and whether it can be used legally. After testing and validation with more than 50 organizations inside and outside our Alliance, we’re excited to announce today the release of v1.0.0 of the D&TA Data Provenance Standards.

Resources

Learn more about the Data Provenance Standards

This journey started because our members—and business at large—wanted better rules of the road around data quality. It was becoming clear that, in the race to adopt AI, data was the most sustainable source of competitive advantage. Where to start? With a hat tip to my conversations with Bernardo Tavares at J&J (now at Kenvue), Genevy Dimitrion at Humana, and Lee Cox at IBM, we decided that we would develop standards for data at the “start”—namely, data provenance: the origin and intended use of datasets being considered for both AI and traditional data applications.

The situation is this: There has been much movement around adopting and regulating AI, yet we don’t have standard definitions for its critical elements. Model evaluation is difficult, with little transparency around the data that trains and feeds those models. The consequences—from copyright infringement to privacy to authenticity—could affect both the technology’s business value and its acceptance by society, limiting organizations’ ability to determine what is to be trusted.

“These practical standards, co-created by senior practitioners across industry, are designed to help evaluate whether AI workflows align with ever-changing regulations while also helping generate increased business value.”

—ROB THOMAS, Senior Vice President, IBM Software and Chief Commercial Officer

The standards were designed for adoption by businesses, with two lenses in mind:

Business value: To help organizations of all types and sizes assess which datasets are of higher quality, providing the transparency they need for efficiency, accuracy and reliability of data use. This includes minimizing risks from legal and copyright issues and regulatory compliance.
Implementation feasibility: To encourage adoption, only the most essential metadata—required to understand more about a dataset’s origin, its method of creation and whether it can be legally used—were selected.

Since we proposed the Data Provenance Standards in November, our working group of data, AI, ethics and legal experts has focused on testing and validating the standards with an eye towards adoption—ultimately creating v1.0.0. This has led to multiple refinements, including:

Simplifying the categories from 8 to 3 standards
Focusing on metadata that “shows, not tells,” so that data, AI, and legal teams have evidence (not attestations) to inform decisions
Being more specific in how we query Privacy Enhancing Technologies (PETs) to get more accurate information
Surfacing consent language that was shown to a user when collecting personal data to assess risks associated with consumer data usage
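
To make the "shows, not tells" idea concrete, here is a minimal sketch of what an evidence-oriented provenance record could look like in code. It is illustrative only: the class name, every field name, and the `missing_evidence` helper are assumptions made for this example and are not the actual v1.0.0 schema, which is defined in the adoption kit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata for one dataset.

    Field names are illustrative assumptions, not the D&TA schema.
    """
    dataset_name: str
    origin: str                      # where the data comes from
    creation_method: str             # how the data was created
    legal_use_evidence: str          # pointer to a license or terms document
    contains_personal_data: bool = False
    # Names of any Privacy Enhancing Technologies applied to the dataset
    privacy_enhancing_technologies: List[str] = field(default_factory=list)
    # The consent text actually shown to users, if personal data was collected
    consent_language_shown: Optional[str] = None


def missing_evidence(record: ProvenanceRecord) -> List[str]:
    """Return the names of fields a reviewer would still need filled in."""
    gaps = []
    for name in ("origin", "creation_method", "legal_use_evidence"):
        if not getattr(record, name).strip():
            gaps.append(name)
    # Consent language is only expected when personal data is present.
    if record.contains_personal_data and not (record.consent_language_shown or "").strip():
        gaps.append("consent_language_shown")
    return gaps


example = ProvenanceRecord(
    dataset_name="customer-survey-2024",
    origin="Example Data Vendor",
    creation_method="",
    legal_use_evidence="licenses/survey-terms.pdf",
    contains_personal_data=True,
)
print(missing_evidence(example))  # ['creation_method', 'consent_language_shown']
```

The design choice in this sketch is that each field holds the evidence itself (a document pointer, the consent text as shown) rather than a yes/no attestation, so data, AI, and legal teams can review the same record.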

Months of testing produced case studies, the first of which has been released by IBM. This testing revealed the power of the Data Provenance Standards to increase overall data quality and to reduce the clearance review time for datasets used to train AI models.

The Data Provenance Standards are being implemented by Alliance members, many of which have extensive ecosystems—but our goal is to increase transparency across business. We encourage all organizations to take advantage of this free tool. More information can be found here on the context and business need for the Standards, including an Executive Overview. Our full adoption kit—designed for practitioners in data acquisition, procurement, data governance and compliance—can be found in our tech resources hub.

“We are committed to the adoption of Responsible AI, and an important component of that is trust in the data and approach used to train and deploy AI models. We believe AI tech can help customers, businesses and society. It can make commerce smarter, safer, and more personal. It starts with standards, and this is an important step to ensure transparency and responsible innovation.”

—GREG ULRICH, Chief AI and Data Officer, Mastercard

The next phase is about adoption within the larger business ecosystem. Three important efforts are underway:

We are partnering with data providers across the globe—from Dun & Bradstreet to Middle Eastern healthcare data provider SDM and others—to adopt and showcase the use of the standards. We plan to convene data providers on this topic later this year. Please contact us if interested.
We are also working with software providers to share which tools can best adhere to the standards, and therefore help with data efficiency, accuracy and compliance. And we are collaborating with the EDM Council, leveraging their expertise in data management best practices, to identify solutions that best adhere to the standards. “Placing trust in data often begins with knowing the source,” said Jim Halcomb, global head of product management, EDM Council Partners. “The EDM Council is grateful to the Data & Trust Alliance for seeking input from our global data management community on standards for addressing this critical issue. We are pleased to announce our intention to adopt D&TA’s standards for tracing the origin of datasets into the next versions of our flagship data management capability frameworks (DCAM and CDMC). Industry collaboration for advancing data management and analytic capabilities is core to our mission, and we look forward to further collaboration with D&TA.”
Finally, we are thrilled to partner with the standards body OASIS to build an open community of practice for the long-term management of the standards. “OASIS is very pleased to be the future host of D&TA Data Provenance Standards, vital for tackling today's crucial data and AI challenges,” said Francis Beland, executive director, OASIS Open. “Bringing these standards to OASIS will help drive their global advancement and adoption, and we look forward to enhancing the interoperability, transparency, and effectiveness of these standards through collaborative efforts across diverse sectors.”

“For data to have value, it must be trusted. For data to be trusted, it must have provenance and lineage back to trusted sources. This has never been more true than it is today, as new capabilities such as generative AI flood the business landscape. The Data Provenance Standards are an important foundational tool to help ensure that organizations can continue to make meaningful data-driven decisions.”

CHRIS HAZARD, Co-founder and CTO, Howso

Thank you

A huge thank you to all the experts who contributed along the way, and especially the core D&TA Data Provenance Standards Working Group, who spent more than a year refining these standards. And a personal thank you to the lead of this work, Kristina Podnar, senior policy director at the D&TA, who tirelessly guided the group. It is because of their contributions that this work is practical and valuable, and designed to solve the fundamental data and AI challenges we face today.

Data Provenance Standards Working Group members, including commentary from several:

Ajay Dhaul, Senior Vice President, Global Data, Applied AI and Digital Business Transformation, Kenvue: “As part of our commitment to healthy people and planet, our commitment to well-established data provenance standards is critical to our work providing the correct information to our customers and consumers for everything from sourcing to ingredients. The Data & Trust Alliance has helped us enable cross-industry importance of data provenance and has spearheaded common standards and policies for broad adoption. We are excited to be a member company and look to continue to contribute and leverage these new high standards across companies.”
Bernardo Tavares, Chief Technology and Data Officer, Kenvue: “The newly announced Data Provenance Standards represent a substantial step forward for companies committed to sharing data with greater traceability and trust. This trust, which extends to AI insights and decisions, is bolstered when companies better understand data lineage and associated rights, allowing them to make informed, ethical decisions to grow their business and help consumers.”
Bryan Bortnick, Counsel, IBM Data Governance, IBM: “Content creators justifiably are entitled to be acknowledged for their contributions, especially as businesses require quality data to develop AI applications and run business tasks effectively. The Data Provenance Standards provide industry value for content creators by ensuring that creators’ rights and terms of use are known and respected. Moreover, for businesses, these attributes, along with the values, are critical to making informed choices about sourced data, including suitability for various purposes.”
Bryan Kyle, Senior Technical Staff Member, Platform Architect, IBM Enterprise Data, IBM: “Data is central to everything we do. Understanding where data came from, how it was acquired, and what it contains is essential to trusting what’s built on top.”
Chris Hazard, Co-founder and CTO, Howso
Christine Pierce, Chief Data Officer, Audience Measurement, Nielsen: “As technology and AI are rapidly transforming industries, organizations need a blueprint for evaluating the underlying data that fuels these algorithms. Through the collaboration of experts across multiple industries and disciplines, the D&TA Data Provenance Standards meet this need. The standards promote trust and transparency by surfacing critical metadata elements in a consistent way, helping practitioners make informed decisions about the suitability of data sources and applications.”
Ed DePhillipis, Vice President Data Management & Quality, Mastercard
Genevy Dimitrion, VP, Data Strategy & Governance, Humana: “I am excited to see version 1.0.0 of the Data & Trust Alliance’s Data Provenance Standards, which mark a significant milestone in ensuring data transparency and accountability. At Humana, we are committed to upholding the highest standards of data integrity, and these standards will enhance the trust and reliability of the data we produce and consume across the enterprise to allow us to deliver value to the individuals we serve.”
Genta Spahiu, Director, Enterprise Data Governance Lead, Pfizer
Gregory Schaffer, Chief Counsel Cybersecurity and Vice President Digital Trust Compliance, Walmart
Jaye Campbell, SVP, Legal - Corporate, Media, IP & Privacy, AARP: “Participating in the development of the Data Provenance Standards provided AARP an excellent platform to encourage companies across diverse industries to consider the impact that advances in data and AI technologies have on people over 50.”
Laurel Shifrin, Vice President, Enterprise Data Governance, American Express
Lee Cox, Vice President, Integrated Governance & Market Readiness, Office of Privacy and Responsible Technology, IBM: “The lack of data provenance consistency from one dataset to another is a pain point for organizations that build and use AI. This will be further accentuated as regulatory frameworks around the world require data origin disclosures. It is a game-changer to have organizations agree on a consistent methodology to use end-to-end across the data ecosystem.”
Dr. Mallory Freeman, VP, Enterprise Data and Analytics, UPS: “The new Data Provenance Standards are key to making data more reliable, not just for us at UPS, but for our customers and their supply chains. We’ve strengthened our own standards while collaborating with forward-thinking leaders across industries, and companies and consumers around the world will benefit from this work.”
Michael Meehan, General Counsel and Chief Legal Officer, Howso: “Data provenance standards are important for the entire data ecosystem. Beyond simplifying ingestion and use of data, use of the D&TA Data Provenance Standards, particularly by upstream data providers, will allow analysis of appropriateness, consent, and quality of aggregated datasets in a way that we have not previously had.”
Orla Flannery, Privacy Program Manager, Chief Privacy Office, IBM: “The Data Provenance Standards will enhance transparency about the quality, origin, and intended uses and restrictions of datasets, which will help enterprises more rapidly access trustworthy data.”
Peter Cross, Head of Data, Warby Parker
Thi Montalvo, SVP, Performance Analytics, Transcarent: “These Data Provenance Standards are so important for Transcarent to be able to establish trust. We see them as another layer of quality assurance processes that are now required, at least within Transcarent, when we’re looking at the data to ensure not only the accuracy of data, but the usability of it.”
Thomas Birchfield, Technical Program Manager, Transcarent: “Safe adoption of future AI tools will require trust and transparency in the data powering them. Cross-industry collaboration toward a universal set of data provenance standards is a key component of leveraging data effectively and responsibly.”
Travis Carpenter, Senior Vice President, Data Quality and Sources, Mastercard
Zeenat Syad, Director of Data and AI Governance, UPS
