Data Provenance Standards

The first cross-industry metadata standards to bring transparency to the origin of datasets used for both traditional data and AI applications.

Experts from 19 D&TA enterprises have co-created these standards to help organizations determine if data is suitable and trusted for use. The proposed standards are currently being tested.


Review the standards, give your input

1. Review the standards

To learn more about the proposed standards and gain necessary context, please watch this video or download the information pack.

2. Give your input

Please share your input on the metadata and values by filling out the survey.


The eight proposed Data Provenance Standards surface metadata on source, legal rights, privacy & protection, generation date, data type, generation method, intended use and restrictions, and lineage. Each metadata field has associated values; to see example values, download the Standards information pack. In addition, the standards call for a unique provenance metadata ID for each dataset.
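As a rough illustration of how the eight metadata fields and the unique provenance metadata ID might travel with a dataset, here is a minimal Python sketch. The class name, field names, example values, the use of a UUID for the ID, and the `missing_fields` helper are all assumptions made for illustration; the actual field definitions and permitted values are those in the Standards information pack.

```python
import uuid
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical record with one attribute per proposed metadata field."""
    source: str
    legal_rights: str
    privacy_and_protection: str
    generation_date: date
    data_type: str
    generation_method: str
    intended_use_and_restrictions: str
    lineage: List[str]
    # Unique provenance metadata ID; using a UUID here is an assumption.
    provenance_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def missing_fields(record: ProvenanceRecord) -> List[str]:
    """Return the names of fields that were left empty."""
    return [
        name
        for name, value in asdict(record).items()
        if value is None or value == "" or value == []
    ]


# Example values are invented for illustration only.
record = ProvenanceRecord(
    source="Example Data Co.",
    legal_rights="Licensed for internal use",
    privacy_and_protection="No personal data",
    generation_date=date(2023, 11, 1),
    data_type="Tabular",
    generation_method="Transaction logs",
    intended_use_and_restrictions="Analytics only",
    lineage=[],  # left empty on purpose
)

print(missing_fields(record))  # ['lineage']
```

A check like `missing_fields` is one way a consuming team could flag datasets whose provenance record is incomplete before deciding whether the data is suitable for use.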



This essential information about the origin of and rights associated with data allows enterprises to make informed choices about the data they source and use. The result can be improvements in operational efficiency, regulatory compliance, collaboration and value generation.

“These practical data standards, co-created by senior practitioners across industry, are designed to help ensure AI workflows are not only compliant with ever-changing government regulations and free of bias, but also generate increased business value.”
— ROB THOMAS, Senior Vice President, IBM Software and Chief Commercial Officer


Data transparency is critical.

Trust in the insights and decisions coming from both traditional data and AI applications depends on understanding the origin, lineage, and rights associated with the data that feeds them. Lack of transparency has real costs, including unnecessary risks and forgone opportunities. And yet, many organizations today cannot answer basic data questions without considerable difficulty and investment.

Realizing the value of data and AI requires a reliable cross-industry baseline of data transparency. Our Data Provenance Standards propose a solution.

40% of the time data scientists spend working goes to basic data preparation and cleansing tasks, according to a 2022 Anaconda report.

61% of CEOs cite lack of clarity on data lineage and provenance as a top barrier to adoption of generative AI, according to the annual IBM Institute for Business Value CEO study.

“Companies like ours feel a deep responsibility to ensure new value creation, as well as trust and transparency of data with all of our customers and stakeholders. Data provenance is critical to those efforts.”
— KEN FINNERTY, President, IT & Data Analytics at UPS


We took a "for industry, by industry" approach to creating the standards:

  • Started by identifying provenance pain points from 25 use cases across our member industries

  • Iterated the standards through over 100 working sessions with Alliance Members and broader industry representatives

  • Consolidated from 53 to 8 standards, focusing on business value and feasibility

  • Co-created the standards with CDOs, CIOs, and leads in data strategy, enterprise data and AI governance, compliance, and legal from organizations across healthcare, automotive, IT, media, banking and finance, retail, education, and other industries

Join our Community of Practice

It's not just about managing data; it's about fostering trust and reliability in AI.

Join our Community of Practice as we shape robust, transparent, and adoptable Data Provenance Standards.


© 2023, The Center for Global Alliance.
All rights reserved.