00:00:02.685 --> 00:00:03.170
[MUSIC]

00:00:03.169 --> 00:00:06.800
KATHLEEN SULLIVAN: Welcome
to AI Testing and Evaluation:

00:00:06.800 --> 00:00:11.520
Learnings from Science and Industry.
I'm your host, Kathleen Sullivan.

00:00:11.520 --> 00:00:16.640
As generative AI continues to advance, Microsoft
has gathered a range of experts, from genome

00:00:16.640 --> 00:00:21.040
editing to cybersecurity, to share how
their fields approach evaluation and

00:00:21.040 --> 00:00:26.160
risk assessment. Our goal is to learn from
their successes and their stumbles to move

00:00:26.160 --> 00:00:31.360
the science and practice of AI testing
forward. In this series, we'll explore

00:00:31.360 --> 00:00:41.266
how these insights might help guide the future of
AI development, deployment, and responsible use.

00:00:41.266 --> 00:00:41.280
[MUSIC ENDS]

00:00:41.280 --> 00:00:45.760
For our introductory episode, I'm pleased to
welcome Amanda Craig Deckard from Microsoft

00:00:45.760 --> 00:00:49.920
to discuss the company's efforts to
learn about testing in other sectors.

00:00:49.920 --> 00:00:54.320
Amanda is senior director of public
policy in the Office of Responsible AI,

00:00:54.320 --> 00:00:57.840
where she leads a team that works
closely with engineers, researchers,

00:00:57.840 --> 00:01:03.680
and policy experts to help ensure AI is being
developed and used responsibly. Their insights

00:01:03.680 --> 00:01:10.400
shape Microsoft's contribution to public policy
discussions on laws, norms, and standards for AI.

00:01:10.400 --> 00:01:12.538
Amanda, welcome to the podcast.

00:01:12.538 --> 00:01:13.160
AMANDA CRAIG DECKARD: Thank you.

00:01:13.160 --> 00:01:17.440
SULLIVAN: Amanda, let's give the listeners
a little bit of your background. What's your

00:01:17.440 --> 00:01:22.240
origin story? Can you talk to us a little bit
about maybe how you started in tech? And I would

00:01:22.240 --> 00:01:27.585
love to also learn a little bit more about what
your team does in the Office of Responsible AI.

00:01:27.585 --> 00:01:36.240
CRAIG DECKARD: Sure. Thank you. I'd say my
[LAUGHS] path to tech, to Microsoft, as well,

00:01:36.240 --> 00:01:42.160
was a bit, like, circuitous, maybe. You know,
I thought for the longest time I was going to

00:01:42.160 --> 00:01:52.720
be a journalist. I studied forced migration. I
worked in a sort of state-level sort of trial

00:01:52.720 --> 00:01:57.920
court in Indiana, a legal service provider
in India, just to give you a bit of a flavor.

00:01:57.920 --> 00:02:06.160
I made my way to Microsoft in 2014 and have been
here since, working in cybersecurity public policy

00:02:06.160 --> 00:02:13.440
first and now in responsible AI. And the way
that our Office of Responsible AI has really,

00:02:13.440 --> 00:02:19.360
sort of, structured itself is bringing
together the kind of expertise to really

00:02:19.360 --> 00:02:24.240
work on defining policy and how to
operationalize it at the same time.

00:02:24.960 --> 00:02:29.680
And, you know, that means that we have
been working through this, you know,

00:02:29.680 --> 00:02:37.920
real challenge of defining internal policy and
practice, making sure that's deeply grounded in

00:02:37.920 --> 00:02:43.040
the work of our colleagues at Microsoft Research,
and then really closely working with engineering

00:02:43.040 --> 00:02:48.000
to make sure that we have the processes, that we
have the tools, to implement that policy at scale.

00:02:48.000 --> 00:02:54.400
And I'm really drawn to these kind of hard
problems where they have the character of two

00:02:54.400 --> 00:03:00.240
things being true or there's like, you know,
real tension on both sides and in particular,

00:03:00.240 --> 00:03:05.600
in the context of those kinds of problems, roles
in which, like, the whole job is actually just

00:03:05.600 --> 00:03:10.320
sitting with that tension, not necessarily, like,
resolving it and expecting that you're done.

00:03:10.320 --> 00:03:17.600
And I think, really, there are two reasons why
tech is so, kind of, representative of that kind

00:03:17.600 --> 00:03:22.320
of challenge that I've always found fascinating.
You know, one is that, of course, tech is,

00:03:22.320 --> 00:03:28.480
sort of, ubiquitous. It's really impacting so
many people's lives. But also, you know, because,

00:03:28.480 --> 00:03:34.560
as I think has become part of our vernacular now,
but, you know, is not necessarily immediately

00:03:34.560 --> 00:03:38.640
intuitive, is like the fact that technology is
both a tool and a weapon. And so that's just,

00:03:38.640 --> 00:03:43.520
like, another reason why, you know, we have
to continuously work through that tension and,

00:03:43.520 --> 00:03:47.960
sort of, like, sit with it, right,
and even as tech evolves over time.

00:03:47.960 --> 00:03:52.000
SULLIVAN: You bring up such great points, and
this field is not black and white. I think that

00:03:52.000 --> 00:03:56.800
even underscores, you know, this notion that you
highlighted that it's impacting everyone. And,

00:03:56.800 --> 00:04:02.080
you know, to set the stage for our listeners,
last year, we pulled in a bunch of experts

00:04:02.080 --> 00:04:08.720
from cybersecurity, biotech, finance, and we
ran this large workshop to study how they're

00:04:08.720 --> 00:04:12.240
thinking about governance and those playbooks.
And so I'd love to understand a little bit more

00:04:12.240 --> 00:04:17.200
about what sparked that effort (and, you
know, there's a piece of this which is

00:04:17.200 --> 00:04:23.905
really centered around testing) and to hear from
you why the focus on testing is so important.

00:04:23.905 --> 00:04:28.480
CRAIG DECKARD: If I could rewind a little bit and
give you a bit of history of how we even arrived

00:04:28.480 --> 00:04:34.640
at bringing these experts together, you know,
we actually started on this journey in 2023.

00:04:34.640 --> 00:04:42.240
At that time, there were, like, a lot of
these big questions swirling around about,

00:04:42.240 --> 00:04:45.360
you know, what did we need in terms
of governance for AI? Of course,

00:04:45.360 --> 00:04:51.840
this was in the immediate aftermath of the ChatGPT
sort of wave and everyone recognizing that, like,

00:04:51.840 --> 00:04:56.560
the technology was going to have a different level
of impact in the near term. And so, you know,

00:04:56.560 --> 00:05:00.960
what do we need from governance? What do we need
at the global level, in particular, of governance?

00:05:00.960 --> 00:05:07.840
And so at the time, in early 2023 especially,
there were a lot of attempts to sort of draw

00:05:07.840 --> 00:05:14.800
analogies to other global governance institutions
in other domains. So we actually in 2023 brought

00:05:14.800 --> 00:05:18.720
together a different workshop than the one
that you're referring to specifically focused

00:05:18.720 --> 00:05:26.240
on testing last year. And we, kind of, had
two big takeaways from that conversation.

00:05:26.240 --> 00:05:33.680
One was, what are the actual functions of these
institutions and how do they apply to AI? And,

00:05:33.680 --> 00:05:37.280
actually, one of the takeaways was they
all sort of apply. [LAUGHS] There's,

00:05:37.280 --> 00:05:42.480
like, a role for, you know, any of
the functions, whether it be sort of

00:05:42.480 --> 00:05:47.200
driving consensus on research or building
industry standards or managing, kind of,

00:05:47.200 --> 00:05:52.000
frontier risks, for thinking about how
those might be needed in the AI context.

00:05:52.000 --> 00:05:56.000
And one of the other big takeaways
was that, you know, there are also

00:05:56.000 --> 00:06:04.480
limitations in these analogies. You know, each
of the institutions grew up in its own, sort of,

00:06:04.480 --> 00:06:10.640
unique historical moment, like the one that
we sit in with AI right now. And in each of

00:06:10.640 --> 00:06:15.200
those circumstances, they don't exactly
translate to this moment. And so, yeah,

00:06:15.200 --> 00:06:20.720
there was like this kind of, OK, we want to
draw what we can from this conversation and

00:06:20.720 --> 00:06:27.280
then we also want to understand, what is also very
important that's just different for AI right now?

00:06:27.280 --> 00:06:32.960
We published a book with the lessons
from that conversation in 2023. And then

00:06:32.960 --> 00:06:38.240
we actually went on a bit of a tour
[LAUGHS] with that content where we

00:06:38.240 --> 00:06:43.040
had a number of roundtables actually all
over the world where we gathered feedback

00:06:43.040 --> 00:06:48.800
on how those analogies were landing, how our
takeaways were landing. And one of the things

00:06:48.800 --> 00:06:55.680
that we took from them was a gap that some of the
participants saw in the analogies that we chose to

00:06:55.680 --> 00:07:02.720
focus on. So across multiple conversations, other
domains kept being raised, like, why did you not

00:07:02.720 --> 00:07:09.920
also study pharmaceuticals? Why did you also not
study cybersecurity, for example? And so that,

00:07:09.920 --> 00:07:15.520
you know, naturally got us thinking about what
further lessons we could draw from those domains.

00:07:15.520 --> 00:07:18.400
At the same time, though, we also saw a need to,

00:07:18.400 --> 00:07:23.920
again, go deeper than what we went and
really, like, focus on a narrower problem.

00:07:23.920 --> 00:07:27.680
So that's really what led us to trying to
think about a more specific problem where we

00:07:27.680 --> 00:07:33.040
could think across levels of governance and
bring in some of these other domains. And,

00:07:33.040 --> 00:07:39.280
you know, testing was top of mind. Continues
to be a really important topic in the AI policy

00:07:39.280 --> 00:07:44.880
conversation right now, I think, for really good
reason. A lot of policymakers are focused on,

00:07:44.880 --> 00:07:50.640
you know, what we need to do to, kind
of, have there be sufficient trust,

00:07:50.640 --> 00:07:55.760
and testing is going to be a part of
that, really better understand risk,

00:07:56.480 --> 00:08:01.040
enable everyone to be able to make more, kind
of, risk-informed decisions, right. Testing is

00:08:01.040 --> 00:08:07.440
an important component for governance and AI and,
of course, in all of these other domains, as well.

00:08:07.440 --> 00:08:16.080
So I'll just add the other, kind of, input into
the process for this second round was exploring

00:08:16.880 --> 00:08:25.200
other analogies beyond those that we, kind of,
got feedback on. And one of the early, kind of,

00:08:26.080 --> 00:08:30.320
examples of another domain that would be really
worthwhile to study that came to mind from,

00:08:30.320 --> 00:08:34.400
sort of, just studying the literature was
genome editing. You know, genome editing

00:08:34.400 --> 00:08:37.920
was really interesting through the process of
thinking about other kind of general-purpose

00:08:37.920 --> 00:08:43.240
technologies. We also arrived at nanoscience
and brought those into the conversation.

00:08:43.240 --> 00:08:46.000
SULLIVAN: That's great. I mean,
actually, if you could double-click,

00:08:46.000 --> 00:08:50.160
I mean, you just named a number of
industries. I'd love to just understand

00:08:50.160 --> 00:08:54.000
which of those worlds maybe feels the
closest to what we're wrestling with,

00:08:54.000 --> 00:09:00.065
with AI and maybe which is kind of the farthest
off, and what makes them stand out to you?

00:09:00.065 --> 00:09:02.320
CRAIG DECKARD: Oh, such a good
question. For this second round,

00:09:02.320 --> 00:09:07.440
we actually brought together eight different
domains, right. And I think we actually thought

00:09:07.440 --> 00:09:13.280
we would come out of this conversation with some
bit of clarity around, Oh, if we just, sort of,

00:09:13.280 --> 00:09:20.160
take this approach for this domain or that
domain, we'll sort of have, at least for now, really

00:09:20.160 --> 00:09:27.200
solved part of the puzzle. [LAUGHS] And, you know,
our public policy team the day after the workshop,

00:09:27.200 --> 00:09:32.320
we had a, sort of, follow-on discussion,
and the very first thing that we started

00:09:32.320 --> 00:09:37.040
with in that conversation was like, OK, so
which of these domains? And fascinatingly,

00:09:37.040 --> 00:09:43.280
like, everyone was sort of like, Ahh! [LAUGHS]
None of them are applying perfectly. I mean,

00:09:43.280 --> 00:09:46.720
this is also speaking to the limitations
of analogies that we already acknowledged.

00:09:47.440 --> 00:09:53.360
And also, you know, all of the experts
from across these domains gave us really

00:09:53.360 --> 00:09:59.120
interesting insights into, sort of, the
tradeoffs and the limitations and how they

00:09:59.120 --> 00:10:05.040
were working. None are really applying
perfectly for us. But all of them do

00:10:05.040 --> 00:10:10.880
offer a thread of insight that is really
useful for thinking about testing in AI,

00:10:10.880 --> 00:10:16.960
and there are some different dimensions that
I think are really useful as framing for that.

00:10:16.960 --> 00:10:21.040
I mean, one is just this
horizontal-versus-vertical,

00:10:21.040 --> 00:10:27.040
kind of, difference in domains and, you know,
the horizontal technology like genome editing

00:10:27.040 --> 00:10:37.200
or nanoscience just being inherently different and
seemingly very similar to AI in that you want to

00:10:37.200 --> 00:10:46.080
be able to understand risks in the technology
itself and there is just so much contextual,

00:10:46.080 --> 00:10:52.640
sort of, factor that matters in the application
of those technologies for how the risk manifests

00:10:52.640 --> 00:10:58.400
that you really need to, kind of, do those
two things at once, of understanding the

00:10:58.400 --> 00:11:04.560
technology but then really thinking about risk and
governance in the context of application versus,

00:11:04.560 --> 00:11:11.120
you know, a context like or a domain like civil
aviation or nuclear technology, for example.

00:11:11.120 --> 00:11:15.680
You know, even in the workshop
itself that we hosted late last year,

00:11:15.680 --> 00:11:22.000
where we brought together this second round of
experts, it was really interesting. We actually

00:11:22.000 --> 00:11:27.440
started the conversation by trying to understand
how those different domains defined risks,

00:11:27.440 --> 00:11:32.800
where they were able to set risk thresholds.
That's been such a part of the AI policy

00:11:32.800 --> 00:11:39.520
conversation in the last year. And, you know,
it was really instructive that the more vertical

00:11:39.520 --> 00:11:44.720
domains were able to, sort of, snap to clearer
answers much more quickly. [LAUGHS] But, like,

00:11:44.720 --> 00:11:50.560
the horizontal nanoscience and genome editing were
not because it just depends, right. So anyway,

00:11:50.560 --> 00:11:55.920
the horizontal-vertical dimension seems like a
really important one to draw from and apply to AI.

00:11:55.920 --> 00:12:00.960
The couple of others that I would offer is just,
you know, thinking about the different kinds of

00:12:00.960 --> 00:12:05.840
technologies. You know, obviously, there's some
of the domains that we studied that they're just

00:12:05.840 --> 00:12:11.360
inherently, sort of, like, physical technologies
… a mix of physical and digital or virtual in a

00:12:11.360 --> 00:12:15.680
lot of cases because all of these are, of course,
applying digital technology. But like, you know,

00:12:15.680 --> 00:12:21.120
there is just a difference between something like
an airplane or a medical device or, you know,

00:12:21.120 --> 00:12:27.760
the more kind of virtual or intangible sort of
technologies even, you know, of course, AI and

00:12:27.760 --> 00:12:32.800
some of the other like cyber and genome editing
but also like, you know, financial services having

00:12:32.800 --> 00:12:38.160
some of that quality. And again, I think the thing
that's interesting to us about AI is to think

00:12:38.160 --> 00:12:44.400
about AI and risk evaluation of AI as being, you
know, having a large component of that being about

00:12:44.400 --> 00:12:49.440
the kind of virtual or intangible technology.
And also, you know, there is a future of robotics

00:12:49.440 --> 00:12:55.440
where we might need to think about the, kind of,
physical risk evaluation kind of work, as well.

00:12:55.440 --> 00:13:02.560
And then the final thing I'd maybe say in terms of
thinking about which domains have the lessons for

00:13:02.560 --> 00:13:08.080
AI that are most applicable is just how they've
grappled with these different kind of governance

00:13:08.080 --> 00:13:17.120
questions. Things like how to turn the dial
in terms of being more or less prescriptive on

00:13:17.120 --> 00:13:23.360
risk evaluation approaches, how they think
about the balance of, kind of, pre-market versus

00:13:23.360 --> 00:13:29.120
post-market risk evaluation in testing, and what
the tradeoffs have been there across domains has

00:13:29.120 --> 00:13:34.720
been really interesting to kind of tease out. And
then also thinking about, sort of, who does what?

00:13:34.720 --> 00:13:39.840
So, you know, in each of these different domains,
it was interesting to hear about, like, you know,

00:13:39.840 --> 00:13:48.480
the role of industry, the role of governments,
the role of third-party experts in designing

00:13:48.480 --> 00:13:54.240
evaluations and developing standards and
actually doing the work, and, kind of,

00:13:54.240 --> 00:14:00.480
having the pull through of what it means for risk
and governance decisions. There were, again, there

00:14:00.480 --> 00:14:05.360
was a variety of, sort of, approaches across these
domains that I think were interesting for AI.

00:14:05.360 --> 00:14:09.120
SULLIVAN: You mentioned that there's
a number of different stakeholders to

00:14:09.120 --> 00:14:12.480
be considering across the board
as we're thinking about policy,

00:14:12.480 --> 00:14:17.200
as we're thinking about regulation. Where
can we collaborate more across industry?

00:14:17.200 --> 00:14:22.240
Is it academia? Regulators? Just,
how can we move the needle faster?

00:14:22.240 --> 00:14:28.000
CRAIG DECKARD: I think all of the above
[LAUGHTER] is needed. But it's also really

00:14:28.000 --> 00:14:35.200
important to have all of that, kind of, expertise
brought together, you know, and I think, you know,

00:14:35.200 --> 00:14:44.400
one of the things that we certainly heard from
multiple of the domains, if not all of them, was

00:14:44.400 --> 00:14:50.960
that same actual interest and need and the same
sort of ongoing work to try to figure that out.

00:14:52.000 --> 00:14:58.640
You know, even where there had been progress in
some of the other domains with bringing together,

00:14:58.640 --> 00:15:05.920
you know, some industry stakeholders
or, you know, industry and government,

00:15:05.920 --> 00:15:10.000
there was still a desire to actually do more
there. Like, if there was some progress in

00:15:10.000 --> 00:15:16.160
industry and government, the need was, And
more kind of cross-jurisdiction government

00:15:16.160 --> 00:15:21.760
conversation, for example. Or some progress on,
you know, within the industry but needing to,

00:15:21.760 --> 00:15:26.720
like, strengthen the partnership with academia,
for example. So, you know, I think it speaks to,

00:15:26.720 --> 00:15:31.200
like, the quality of your question, to be
honest, that, you know, all of these domains

00:15:31.200 --> 00:15:36.240
are actually still grappling with this and still
seeing the need to grow in that direction more.

00:15:36.240 --> 00:15:43.360
What I'd say about AI today is that we have made
good progress with, you know, starting to build

00:15:43.360 --> 00:15:49.360
some industry partnerships. You know, we were
a founding member of the Frontier Model Forum,

00:15:49.360 --> 00:15:54.960
or FMF, which has been a very useful place for
us to work with some peers on really trying

00:15:54.960 --> 00:16:00.000
to bring forward some best practices that
apply across our organizations. You know,

00:16:00.000 --> 00:16:04.640
there are other forums as well, like MLCommons,
where we're working with others in industry and

00:16:04.640 --> 00:16:09.120
broader, sort of, academic and civil society
communities. Partnership on AI is another

00:16:09.120 --> 00:16:14.720
one I think about that, kind of, fits that
mold, as well, in a really positive way. And,

00:16:14.720 --> 00:16:20.880
like, there are a lot of different, sort of,
governance needs to think through and where,

00:16:20.880 --> 00:16:25.280
you know, we can really think about bringing that
expertise together is going to be so important.

00:16:25.280 --> 00:16:29.680
I think about almost, like,
in the near to mid-term,

00:16:29.680 --> 00:16:36.480
like three issues that we need to address in
the AI, kind of, policy and testing context.

00:16:36.480 --> 00:16:41.120
One is just building kind of, like,
a flexible framework that allows us

00:16:41.120 --> 00:16:48.800
to really build trust while we continue
to advance the science and the standards.

00:16:49.680 --> 00:16:54.880
You know, we are going to need to do both at once.
And so we need a flexible framework that enables

00:16:54.880 --> 00:17:00.080
that kind of agility, and advancing the science
and the standards, that is going to be something

00:17:00.080 --> 00:17:07.520
that really demands that kind of cross-discipline
or cross kind of expertise group coming together

00:17:07.520 --> 00:17:12.800
to work on that: researchers, academics, civil
society, governments and, of course, industry.

00:17:12.800 --> 00:17:17.440
And so I think that is, actually, the second
problem is, like, how do we actually build

00:17:19.120 --> 00:17:24.960
the kind of forums and ways of working
together, the public-private partnership

00:17:24.960 --> 00:17:32.400
kind of efforts that allow all of that expertise
to come together and fit together over time,

00:17:32.400 --> 00:17:36.560
right. Because when these are really big,
broad challenges, you kind of have to break

00:17:36.560 --> 00:17:40.720
them down incrementally, make progress on
them, and then bring them back together.

00:17:40.720 --> 00:17:46.960
And so I think about, like, one example that I,
you know, really have been reflecting on lately

00:17:46.960 --> 00:17:51.360
is, you know, in the context of building
standards, like, how do you do that,

00:17:51.360 --> 00:17:58.080
right? Again, standards are going to benefit
from that whole community of expertise. And,

00:17:58.080 --> 00:18:02.480
you know, there are lots of different kinds of
quote-unquote standards, though, right. You kind

00:18:02.480 --> 00:18:08.640
of have the “small s” industry standards. You
have the kind of “big S” international standards,

00:18:08.640 --> 00:18:14.160
for example. And how do you, kind of, leverage
one to accelerate the other, I think, is part of,

00:18:14.160 --> 00:18:20.480
like, how we need to work together within this
ecosystem. And, like, I think what we and others

00:18:20.480 --> 00:18:23.840
have done in an organization like C2PA [Coalition
for Content Provenance and Authenticity], for

00:18:23.840 --> 00:18:30.480
example, where we've really built an industry
specification but then built on that towards an

00:18:30.480 --> 00:18:36.480
international standard effort is one example
that is interesting, right, to point to.

00:18:36.480 --> 00:18:39.520
And then, you know, I actually think that
bridges to the third thing that we need to

00:18:39.520 --> 00:18:47.920
do together within this whole community, which is,
you know, really think again about how we manage

00:18:47.920 --> 00:18:54.720
the breadth of this challenge and opportunity
of AI by thinking about this horizontal-vertical

00:18:54.720 --> 00:19:00.880
problem. And, you know, I think that's where
it's not just the sort of tech industry,

00:19:00.880 --> 00:19:05.120
for example. It's broader industry that's going to
be really applying this technology that needs to

00:19:05.120 --> 00:19:10.000
get involved in the conversation about not just,
sort of, testing AI models, for example, but also

00:19:10.000 --> 00:19:17.662
testing how AI systems or applications are working
in context. And so, yes, so much fun opportunity!

00:19:17.662 --> 00:19:17.670
[MUSIC]

00:19:17.670 --> 00:19:20.880
SULLIVAN: Amanda, this was just
fantastic. You've really set the

00:19:20.880 --> 00:19:26.865
stage for this podcast. And thank you so much
for sharing your time and wisdom with us.

00:19:26.865 --> 00:19:27.960
CRAIG DECKARD: Thank you.

00:19:27.960 --> 00:19:32.000
SULLIVAN: And to our listeners, we're so
glad you joined us for this conversation.

00:19:32.000 --> 00:19:37.600
An exciting lineup of episodes are on the way, and
we can't wait to have you back for the next one.

00:19:50.440 --> 00:20:03.600
[MUSIC FADES]