02.22.24

Remarks on Data Transparency at Public Briefing, NAIAC

Jon Iwata, Executive Director, Data & Trust Alliance, delivered remarks at a public briefing on Data Transparency held by the National Artificial Intelligence Advisory Committee on February 22, 2024.

Let me tell you briefly what the Data & Trust Alliance is. We are a not-for-profit consortium, created by CEOs in September 2020, that focuses on the responsible use of data and AI.

Today we have 24 members. They include American Express, CVS Health, General Motors, Humana, IBM, Johnson & Johnson, Mastercard, Meta, Nike, Pfizer, UPS, Walmart and Warby Parker.

We are not a think tank. We do not write white papers or publish frameworks. The CEOs formed the Alliance to focus on operational practices. Therefore, we have one KPI: adoption – adoption by practitioners within our member companies and, we hope, beyond.

Data transparency is a critical issue for us, and it has been the intense focus of our work for the past year. At the most fundamental level, to understand, shape, and monitor the development of AI models, including foundation models, we need to understand the quality of the data sets that train and feed them.

This is a well-known problem, and a big one to solve. Universal definitions of, and approaches to, data quality remain elusive, and there are no consistent standards across industries. As a result, data science professionals often spend between 35% and 46% of their time preparing data. That preparation involves not only organizing and cleaning data but also investigating its origin, licensing, and contextual history to gain quality-related insights before determining whether the data is trustworthy and fit for use.

Our member companies identified one aspect of data quality that they believed could be addressed in practical ways: data provenance.

This is what teams of experts from 19 companies set out to address more than a year ago. Their goal was to create a baseline set of contextual metadata applicable across industries, covering areas such as origin (for example, supplier, data origin geography, creation date, and method) and restrictions (for example, the geographies where data can be stored or processed, and the legal rights associated with its use, as denoted by a license). Currently, we have four standards and 16 metadata fields. The outcome is that practitioners gain enough context from the metadata to assess trust.

We believed it was important to create these standards through two lenses: practicality (to aid implementation) and business value (to aid adoption). We released the draft standards in December to invite input and are currently testing them across the Alliance. Our goal is to release version 1 in the second quarter of this year.

Data provenance and lineage standards are a vital starting point for data transparency in AI, but they are certainly not enough on their own. We believe data transparency across areas such as data privacy, security, governance and accountability will become increasingly important, not only for ensuring responsible practices but also for creating business value and competitive advantage.

Our focus on practitioners also shaped the Alliance's first toolset, developed in 2021: criteria and education to help HR and procurement professionals evaluate algorithmic bias in the products and services that vendors provide to support workforce decisions, from recruitment and hiring to promotion and retention.

Our second toolset was for M&A teams. We knew that many companies were acquiring and investing in data- and AI-centric businesses, but they did not know the right questions to ask to properly assess valuation and risk. So more than 80 experts from M&A teams collaborated to develop a supplementary due diligence tool. And, as I've described, we've taken the same approach to data provenance standards.

Our members, given their maturity, scale and influence within their industries, bring considerable domain expertise to many of the emerging issues around policy and regulation, as well as to the opportunities inherent in scaling these technologies with trust. But we also know that data and AI are largely terra incognita. And we're eager to engage across business and society, to work together and to learn.

Thank you.