WEBVTT

00:00:00.446 --> 00:00:03.081
[MUSIC]

00:00:03.081 --> 00:00:07.160
GRETCHEN HUIZINGA: Welcome to Abstracts,&nbsp;
a Microsoft Research Podcast that puts the&nbsp;&nbsp;

00:00:07.160 --> 00:00:16.120
spotlight on world-class research in brief.&nbsp;
I’m Dr. Gretchen Huizinga. In this series,&nbsp;&nbsp;

00:00:16.120 --> 00:00:20.080
members of the research community at&nbsp;
Microsoft give us a quick snapshot—or&nbsp;&nbsp;

00:00:20.080 --> 00:00:25.987
a podcast abstract—of their&nbsp;
new and noteworthy papers.

00:00:26.000 --> 00:00:31.520
My guest today is Dr. Michel Galley, a senior&nbsp;
principal researcher at Microsoft Research.&nbsp;&nbsp;

00:00:31.520 --> 00:00:37.240
Dr. Galley is the coauthor of a paper called&nbsp;
“MathVista: Evaluating Mathematical Reasoning of&nbsp;&nbsp;

00:00:37.240 --> 00:00:42.945
Foundation Models in Visual Contexts.” Michel,&nbsp;
thanks for joining us on Abstracts today!

00:00:42.945 --> 00:00:44.040
MICHEL GALLEY: Thank you for having me.

00:00:44.040 --> 00:00:47.560
HUIZINGA: So I like to start with a&nbsp;
distillation or sort of an elevator&nbsp;&nbsp;

00:00:47.560 --> 00:00:52.040
pitch of your research. Tell us in&nbsp;
just a couple sentences what problem&nbsp;&nbsp;

00:00:52.040 --> 00:00:56.140
or issue your paper addresses&nbsp;
and why we should care about it.

00:00:56.140 --> 00:00:59.920
GALLEY: So this paper is about&nbsp;
evaluating large foundation models.&nbsp;&nbsp;

00:01:00.560 --> 00:01:06.240
So it's a very important part of researching&nbsp;
large language models because it's a good way&nbsp;&nbsp;

00:01:06.240 --> 00:01:10.920
to evaluate, kind of, the capabilities—what&nbsp;
these models are good at and not good at. And&nbsp;&nbsp;

00:01:10.920 --> 00:01:16.840
a part of the focus of MathVista is to evaluate&nbsp;
these large foundation models in a multimodal&nbsp;&nbsp;

00:01:16.840 --> 00:01:22.760
setup, so when the input to the model is actually&nbsp;
not just text but also text and images. And then,&nbsp;&nbsp;

00:01:22.760 --> 00:01:26.760
an example of a task that such a model&nbsp;
would perform is, like, the input is&nbsp;&nbsp;

00:01:26.760 --> 00:01:31.560
maybe a mathematical question, and then there's&nbsp;
some visual support to that question, let's say,&nbsp;&nbsp;

00:01:31.560 --> 00:01:37.560
of an image of a graph, and then the model has to&nbsp;
respond to something related to that. And why this&nbsp;&nbsp;

00:01:37.560 --> 00:01:42.440
is important … there has been a lot of work, of&nbsp;
course, on large foundation models. Especially when&nbsp;&nbsp;

00:01:42.440 --> 00:01:46.940
it comes to reasoning tasks, like mathematical&nbsp;
reasoning, a lot has focused more on written form.

00:01:46.940 --> 00:01:47.419
HUIZINGA: Yeah …

00:01:47.419 --> 00:01:49.080
GALLEY: So MathVista is one&nbsp;
of the very first datasets&nbsp;&nbsp;

00:01:49.080 --> 00:01:52.600
that has input that is both images and text.

00:01:52.600 --> 00:01:56.760
HUIZINGA: Yeah, yeah. Well, reading your&nbsp;
paper, it seems like this is an area that&nbsp;&nbsp;

00:01:56.760 --> 00:02:00.320
hasn't been studied systematically.&nbsp;
In fact, you actually say that!&nbsp;&nbsp;

00:02:00.320 --> 00:02:06.800
And say that the field is largely unexplored. But&nbsp;
quickly tell us what has been done in this field,&nbsp;&nbsp;

00:02:06.800 --> 00:02:11.060
and then tell us how your research addresses&nbsp;
the proverbial gap in the literature.

00:02:11.060 --> 00:02:15.920
GALLEY: Well, there has been a lot of work&nbsp;
on vision and language in other problems,&nbsp;&nbsp;

00:02:15.920 --> 00:02:21.760
like not just about reasoning. Maybe let me just&nbsp;
mention why reasoning is important. So one reason&nbsp;&nbsp;

00:02:21.760 --> 00:02:25.840
I think it's very interesting to evaluate these&nbsp;
large language models in terms of reasoning skill&nbsp;&nbsp;

00:02:25.840 --> 00:02:31.760
is that we evaluate their capabilities beyond&nbsp;
just memorization. So as many of your listeners&nbsp;&nbsp;

00:02:31.760 --> 00:02:36.400
probably know, these large foundation models&nbsp;
are trained on large amounts of text that is&nbsp;&nbsp;

00:02:36.400 --> 00:02:41.360
public data from various sources. So when you&nbsp;
ask a question to a large foundation model,&nbsp;&nbsp;

00:02:41.360 --> 00:02:45.898
it could be the case, in many cases, that it&nbsp;
just memorizes things it has seen in the data.

00:02:45.898 --> 00:02:49.480
GALLEY: So what makes it interesting in&nbsp;
terms of reasoning, the answer oftentimes&nbsp;&nbsp;

00:02:49.480 --> 00:02:52.920
is not there in the data. So it needs to&nbsp;
develop this ability to connect the dots&nbsp;&nbsp;

00:02:52.920 --> 00:02:57.800
between various pieces of information&nbsp;
to come up with a new answer. So the&nbsp;&nbsp;

00:02:57.800 --> 00:03:01.960
focus of our paper is really on mathematical&nbsp;
reasoning, but it goes also a bit beyond that&nbsp;&nbsp;

00:03:01.960 --> 00:03:05.960
because what is also represented in the&nbsp;
data is also science questions and so on.

00:03:05.960 --> 00:03:06.672
HUIZINGA: Yeah …

00:03:06.672 --> 00:03:12.894
GALLEY: So this reasoning part has largely&nbsp;
focused, until MathVista, on text-only modalities.

00:03:12.912 --> 00:03:17.280
GALLEY: So it's one of the very first ones&nbsp;
that combines text and images in terms of&nbsp;&nbsp;

00:03:17.280 --> 00:03:21.880
evaluating these large foundation models.&nbsp;
So you ask about what was done before. So,&nbsp;&nbsp;

00:03:21.880 --> 00:03:26.560
yes, there has been a lot of work, text only,&nbsp;
on reasoning, for example, the mathematical&nbsp;&nbsp;

00:03:26.560 --> 00:03:31.160
question that's just based on text. And there&nbsp;
has been a different stream of work that was&nbsp;&nbsp;

00:03:31.160 --> 00:03:36.050
much more focused on vision. A lot of work has&nbsp;
been on tasks such as visual question answering …

00:03:36.050 --> 00:03:36.068
HUIZINGA: Yeah …

00:03:36.068 --> 00:03:39.760
GALLEY: … where basically, you have an image&nbsp;
and the task is to answer a question&nbsp;&nbsp;

00:03:39.760 --> 00:03:44.853
about this image. So, yes, we’re trying&nbsp;
to fuse the two lines of research here.

00:03:44.853 --> 00:03:44.872
HUIZINGA: Right …

00:03:44.872 --> 00:03:46.880
GALLEY: And that's one of the&nbsp;
first works that does that.

00:03:46.880 --> 00:03:50.560
HUIZINGA: Yeah. Well, let's talk about&nbsp;
your methodology for a minute. Tell&nbsp;&nbsp;

00:03:50.560 --> 00:03:54.320
us how you went about conducting this&nbsp;
research, and what methods did you use?

00:03:54.320 --> 00:03:59.080
GALLEY: Yes, sure. So that's a bit different&nbsp;
from a typical, kind of, machine learning&nbsp;&nbsp;

00:03:59.080 --> 00:04:03.000
paper because the focus of this work is really on&nbsp;
benchmarking on the dataset. So the methodology is&nbsp;&nbsp;

00:04:03.000 --> 00:04:08.560
more about how we collect the data, process it.&nbsp;
So there are two components to doing that. One&nbsp;&nbsp;

00:04:08.560 --> 00:04:13.600
was to look at existing data that already combines&nbsp;
vision and text. And there are existing datasets&nbsp;&nbsp;

00:04:13.600 --> 00:04:19.320
that are actually already fairly big but that were&nbsp;
not focused on reasoning. So we use those existing&nbsp;&nbsp;

00:04:19.320 --> 00:04:25.320
datasets and look for instances in the data that&nbsp;
actually include some mathematical or science&nbsp;&nbsp;

00:04:25.320 --> 00:04:29.920
reasoning. And so that part is leveraging existing&nbsp;
datasets, but the important part is, like,&nbsp;&nbsp;

00:04:29.920 --> 00:04:34.760
we really want to carve out the interesting&nbsp;
piece in terms of reasoning. And we had different&nbsp;&nbsp;

00:04:34.760 --> 00:04:41.040
stages of processing the data to identify the&nbsp;
subset that was reasoning-based. So one first&nbsp;&nbsp;

00:04:41.040 --> 00:04:48.120
step was basically to apply some automatic filter&nbsp;
to determine whether or not a given example, let's&nbsp;&nbsp;

00:04:48.120 --> 00:04:52.360
say something that is visual and text, is actually&nbsp;
… involves some mathematical reasoning. So we have&nbsp;&nbsp;

00:04:52.360 --> 00:04:56.960
different strategies. For example, if the answer is&nbsp;
numerical, it's likely that it might be something&nbsp;&nbsp;

00:04:56.960 --> 00:05:02.000
mathematically related. But that's just the first&nbsp;
stage. And the second stage, we actually had&nbsp;&nbsp;

00:05:02.000 --> 00:05:07.480
humans, annotators, just certify that the selected&nbsp;
data is actually of high quality. So we do have an&nbsp;&nbsp;

00:05:07.480 --> 00:05:12.800
example of, “Oh, this is mathematical, and that's&nbsp;
either mathematical or scientific,” and so on.&nbsp;&nbsp;

00:05:12.800 --> 00:05:18.280
And that's one part of the effort. The other part&nbsp;
is that we realized while we collected the data,&nbsp;&nbsp;

00:05:18.280 --> 00:05:22.400
there are certain types of mathematical reasoning&nbsp;
or related to mathematical reasoning that were&nbsp;&nbsp;

00:05:22.400 --> 00:05:28.040
not represented in the data. So we created three&nbsp;
new datasets as part of MathVista. So when I said&nbsp;&nbsp;

00:05:28.040 --> 00:05:33.040
dataset, it's more like, think of MathVista as&nbsp;
like an aggregate of different types of data, and&nbsp;&nbsp;

00:05:33.040 --> 00:05:39.880
we added three of them, three new types of data.&nbsp;
One is what we call PaperQA, which is basically&nbsp;&nbsp;

00:05:39.880 --> 00:05:46.040
data that is collected from scientific papers on&nbsp;
arXiv, and that had questions asking about that&nbsp;&nbsp;

00:05:46.040 --> 00:05:52.734
paper and that included some visual components&nbsp;
from the paper, typically a plot or a figure.

00:05:52.734 --> 00:05:52.752
HUIZINGA: Yeah …

00:05:52.752 --> 00:05:55.120
GALLEY: And then we had IQTest,&nbsp;
which is basically, I mean,&nbsp;&nbsp;

00:05:55.120 --> 00:05:59.240
it's vaguely related mathematically,&nbsp;
but basically it also, kind of,&nbsp;&nbsp;

00:06:00.080 --> 00:06:07.040
tries to test maybe more abstract thinking about&nbsp;
maybe some input that is both text and visual. And&nbsp;&nbsp;

00:06:07.040 --> 00:06:12.340
the final one is FunctionQA, which is basically&nbsp;
algebraic reasoning and function plots and so on.

00:06:12.340 --> 00:06:12.952
HUIZINGA: OK …

00:06:12.952 --> 00:06:17.760
GALLEY: The important part was actually&nbsp;
to identify among vast amounts of data&nbsp;&nbsp;

00:06:17.760 --> 00:06:20.894
what is actually very interesting&nbsp;
in terms of mathematical reasoning.

00:06:20.894 --> 00:06:20.912
HUIZINGA: Yeah …

00:06:20.912 --> 00:06:23.640
GALLEY: So that part, I think,&nbsp;
was quite a big part of doing&nbsp;&nbsp;

00:06:23.640 --> 00:06:27.180
that work—finding existing data&nbsp;
but also creating new data.

00:06:27.180 --> 00:06:30.760
HUIZINGA: Yeah, yeah. Well, my favorite&nbsp;
part of a research paper is where it says,&nbsp;&nbsp;

00:06:30.760 --> 00:06:36.080
“and what we found was … ,” so talk a little&nbsp;
bit about your results. What did you find?

00:06:36.080 --> 00:06:44.280
GALLEY: So we evaluated a wide variety of models,&nbsp;
including GPT-4, Claude 2, GPT-4V, multimodal&nbsp;&nbsp;

00:06:44.280 --> 00:06:50.840
Bard, and LLaVA, and we categorized them into&nbsp;
three categories. So one is text only. So,&nbsp;&nbsp;

00:06:50.840 --> 00:06:57.120
basically, you take a model that is by default&nbsp;
just text, and we give it the text part of the&nbsp;&nbsp;

00:06:57.120 --> 00:07:01.870
question and ask it to answer the question.&nbsp;
Of course, that's, kind of, a bit of a, it’s&nbsp;&nbsp;

00:07:01.870 --> 00:07:05.640
a difficult task because oftentimes [LAUGHTER]&nbsp;
we crucially build these questions so that you&nbsp;&nbsp;

00:07:05.640 --> 00:07:10.440
have to rely on the vision part. But that's for,&nbsp;
you know, scientific investigation to know how&nbsp;&nbsp;

00:07:10.440 --> 00:07:15.720
well they can do, and so that's one category of&nbsp;
model. A different category is still text only&nbsp;&nbsp;

00:07:15.720 --> 00:07:24.360
but that is given the detection from the image. So&nbsp;
on the image, we do OCR. So we convert those words&nbsp;&nbsp;

00:07:24.360 --> 00:07:30.400
from images to text. It’s kind of an extension of&nbsp;
the text-based model, except that what was images&nbsp;&nbsp;

00:07:30.400 --> 00:07:35.280
is translated into text, and then the input to&nbsp;
the model is word only, and that's a different&nbsp;&nbsp;

00:07:35.280 --> 00:07:40.080
category of model. And the third one is basically&nbsp;
truly multimodal models. And what we found, I mean,&nbsp;&nbsp;

00:07:40.080 --> 00:07:44.360
not surprisingly, it’s, kind of, the one that was&nbsp;
doing most poorly is the one that is text only.&nbsp;&nbsp;

00:07:44.360 --> 00:07:50.760
The second is text plus OCR. And then finally,&nbsp;
the one that does best is the multimodal like&nbsp;&nbsp;

00:07:50.760 --> 00:07:56.120
GPT-4V. But while the ordering between these three&nbsp;
categories makes sense, it was a bit surprising&nbsp;&nbsp;

00:07:56.120 --> 00:08:04.320
that maybe the gap between multimodal and text&nbsp;
plus OCR was not bigger. Well, it’s big, but maybe&nbsp;&nbsp;

00:08:04.320 --> 00:08:09.920
not as big as we were expecting. So, for example,&nbsp;
the best detection from the images model achieved&nbsp;&nbsp;

00:08:09.920 --> 00:08:16.160
like 35 percent accuracy while GPT-4V was 50&nbsp;
percent. So it's a substantial gap but not huge.

00:08:16.160 --> 00:08:20.640
HUIZINGA: Right. Just to clarify, you're&nbsp;
saying OCR. What does that stand for?

00:08:20.640 --> 00:08:22.815
GALLEY: [Optical] character recognition.

00:08:22.815 --> 00:08:22.832
HUIZINGA: Gotcha.

00:08:22.832 --> 00:08:28.080
GALLEY: So, basically, it's the task of taking&nbsp;
text, sometimes typed, but sometimes written,&nbsp;&nbsp;

00:08:28.080 --> 00:08:33.680
and convert this into the actual text&nbsp;
like you would have in a text file.

00:08:33.680 --> 00:08:38.000
HUIZINGA: Right. Michel, does any of this&nbsp;
have to do with the difficulty of the&nbsp;&nbsp;

00:08:38.000 --> 00:08:42.680
math problems that you present these&nbsp;
models with? I mean, it seems to me,&nbsp;&nbsp;

00:08:42.680 --> 00:08:46.320
similar to humans, that the easier&nbsp;
the problem, the easier it would be&nbsp;&nbsp;

00:08:46.320 --> 00:08:51.920
for the machine. So at what level of&nbsp;
math are we talking for these tests?

00:08:51.920 --> 00:08:55.760
GALLEY: What's nice about MathVista is there's&nbsp;
[a] continuum [of] different difficulties. So the&nbsp;&nbsp;

00:08:55.760 --> 00:09:01.720
spectrum is quite broad, going from elementary&nbsp;
school to more advanced concepts such as&nbsp;&nbsp;

00:09:01.720 --> 00:09:06.800
calculus. So it's quite broad. So in the paper,&nbsp;
we do have this, kind of, broken down by level.&nbsp;&nbsp;

00:09:06.800 --> 00:09:11.260
So the number I gave you, like 50 percent, is&nbsp;
an aggregate over all the difficulties. But …

00:09:11.260 --> 00:09:11.712
HUIZINGA: Gotcha.

00:09:11.712 --> 00:09:14.160
GALLEY: But the goal there was really,&nbsp;
kind of, to compare different models,&nbsp;&nbsp;

00:09:14.160 --> 00:09:17.640
but we do have a fair amount of&nbsp;
analysis in the appendix. Actually,&nbsp;&nbsp;

00:09:17.640 --> 00:09:23.510
we have 100 pages of appendices of plenty of&nbsp;
analysis and so on. So if people, I mean …

00:09:23.510 --> 00:09:26.640
HUIZINGA: I saw that. I saw the&nbsp;
length of the paper, and I'm going,&nbsp;&nbsp;

00:09:26.640 --> 00:09:33.880
what? [LAUGHS] That’s a LONG paper! Well, research&nbsp;
in the lab is one thing, I always like to say,&nbsp;&nbsp;

00:09:33.880 --> 00:09:38.160
but understanding real-world impact&nbsp;
is important, too. So where's this&nbsp;&nbsp;

00:09:38.160 --> 00:09:42.460
work going to make the most difference,&nbsp;
and who does it help most at this point?

00:09:42.460 --> 00:09:46.440
GALLEY: Well, I think perhaps that's the&nbsp;
main point of this kind of line of work&nbsp;&nbsp;

00:09:46.440 --> 00:09:51.720
in terms of reasoning is that when looking at&nbsp;
these difficult problems that are mathematical,&nbsp;&nbsp;

00:09:51.720 --> 00:09:55.840
actually it's a way to, kind of, abstract&nbsp;
away maybe more complex capabilities,&nbsp;&nbsp;

00:09:55.840 --> 00:10:01.200
and I think while thinking just about&nbsp;
mathematics might seem a bit narrow,&nbsp;&nbsp;

00:10:01.200 --> 00:10:08.480
I don't think that really is. It's more about&nbsp;
seeing whether this model has the ability to do,&nbsp;&nbsp;

00:10:08.480 --> 00:10:12.880
kind of, multistep kind of processing&nbsp;
of your input and think maybe somewhat&nbsp;&nbsp;

00:10:12.880 --> 00:10:19.360
intelligently about a given problem. So we&nbsp;
focus mostly on math. There is some science,&nbsp;&nbsp;

00:10:19.360 --> 00:10:22.600
but we would be very interested, especially&nbsp;
in future work, to, kind of, go beyond that.

00:10:22.600 --> 00:10:24.560
HUIZINGA: OK, well, let me press in a little&nbsp;&nbsp;

00:10:24.560 --> 00:10:29.320
bit there because … just say I'm a&nbsp;
regular person using a GPT model.&nbsp;&nbsp;

00:10:30.040 --> 00:10:35.880
Is your work more addressed upstream from that to&nbsp;
the research community to say, how do we get these&nbsp;&nbsp;

00:10:35.880 --> 00:10:42.440
models to be better so that downstream people&nbsp;
like me can be more confident of the models?

00:10:42.440 --> 00:10:45.040
GALLEY: Yes, I would say at the moment, I mean,&nbsp;&nbsp;

00:10:45.040 --> 00:10:49.120
this line of work is perhaps geared somewhat&nbsp;
more towards the research community,&nbsp;&nbsp;

00:10:49.120 --> 00:10:57.360
but I think it could be some seed for researchers&nbsp;
to think about some applications perhaps that&nbsp;&nbsp;

00:10:57.360 --> 00:11:01.900
also require some kind of step-by-step&nbsp;
reasoning but perhaps not going beyond math.

00:11:01.900 --> 00:11:05.920
HUIZINGA: Yeah. Michel, if there was&nbsp;
one thing you wanted our listeners to&nbsp;&nbsp;

00:11:05.920 --> 00:11:10.000
take away from this research, kind&nbsp;
of golden nugget, what would it be?

00:11:10.000 --> 00:11:16.000
GALLEY: Well, I would say it’s the challenging&nbsp;
part of these datasets. I think that's what&nbsp;&nbsp;

00:11:16.000 --> 00:11:20.960
makes MathVista stand out compared to other&nbsp;
datasets. By now, there are a few other vision&nbsp;&nbsp;

00:11:20.960 --> 00:11:26.080
and language datasets, and of course, many&nbsp;
that are more text-based. And we've seen,&nbsp;&nbsp;

00:11:26.080 --> 00:11:31.240
for example, some recent papers showing&nbsp;
that actually MathVista remains one of&nbsp;&nbsp;

00:11:31.240 --> 00:11:34.920
the most challenging ones. So I think&nbsp;
it's probably going to stay around for&nbsp;&nbsp;

00:11:34.920 --> 00:11:39.800
a while because of the difficulty it&nbsp;
represents. So it's an open-source set of&nbsp;&nbsp;

00:11:39.800 --> 00:11:44.580
available datasets that everybody can use,&nbsp;
and I very much encourage people to use it.

00:11:44.580 --> 00:11:47.000
HUIZINGA: Is it on GitHub?

00:11:47.000 --> 00:11:48.280
GALLEY: Yes, it's on GitHub.

00:11:48.280 --> 00:11:53.360
HUIZINGA: So what's next on the&nbsp;
research agenda for helping LLMs&nbsp;&nbsp;

00:11:53.360 --> 00:11:57.680
get better at math, Michel? What are the&nbsp;
big challenges in the field yet? I mean,&nbsp;&nbsp;

00:11:57.680 --> 00:12:03.280
you've alluded to many of them already, sort&nbsp;
of, but what's next on your research agenda?

00:12:03.280 --> 00:12:10.520
GALLEY: Well, I would say what we found so far&nbsp;
is these models are very good at processing the&nbsp;&nbsp;

00:12:10.520 --> 00:12:16.040
textual part of the problems given to the model,&nbsp;
but the equivalent in images is actually&nbsp;&nbsp;

00:12:16.040 --> 00:12:21.520
harder somehow. So I think a lot more work needs&nbsp;
to be done in terms of vision capabilities,&nbsp;&nbsp;

00:12:21.520 --> 00:12:26.800
in terms of reasoning over images, because the&nbsp;
capabilities you will see in text are actually&nbsp;&nbsp;

00:12:26.800 --> 00:12:31.800
quite advanced, whereas the equivalent in images&nbsp;
doesn't seem that good. I mean, a fair disclaimer:&nbsp;&nbsp;

00:12:31.800 --> 00:12:35.200
my background is more on the text side, [LAUGHTER]&nbsp;
so some of my colleagues on the paper are more&nbsp;&nbsp;

00:12:35.200 --> 00:12:39.600
on the vision side, so maybe if a listener&nbsp;
runs into some of our coauthors at the conference,&nbsp;&nbsp;

00:12:39.600 --> 00:12:45.150
they might want to talk to these vision people&nbsp;
because that's less of my background. [LAUGHS]

00:12:45.150 --> 00:12:47.360
HUIZINGA: Well, and if you think&nbsp;
about Venn diagrams, you know,&nbsp;&nbsp;

00:12:47.360 --> 00:12:50.240
you've got people that are doing&nbsp;
text, people that are doing vision,&nbsp;&nbsp;

00:12:50.240 --> 00:12:55.140
and then the people that are trying to&nbsp;
do both to see how the worlds collide.

00:12:55.140 --> 00:12:55.920
[MUSIC]

00:12:55.920 --> 00:12:59.520
Well, Michel Galley, thanks for&nbsp;
joining us today. And to our listeners,&nbsp;&nbsp;

00:12:59.520 --> 00:13:06.600
thanks for tuning in. If you want to read this&nbsp;
paper, you can find a link at aka.ms/abstracts,&nbsp;&nbsp;

00:13:06.600 --> 00:13:11.200
or you can find it on arXiv. You can also read it&nbsp;
on the website for the International Conference&nbsp;&nbsp;

00:13:11.200 --> 00:13:17.240
on Learning Representations, or ICLR. And if you&nbsp;
happen to be at the ICLR conference this week,&nbsp;&nbsp;

00:13:17.240 --> 00:13:29.280
you can hear more about it there.&nbsp;
See you next time on Abstracts!

00:13:29.280 --> 00:13:36.111
[MUSIC FADES]