00:00:02.485 --> 00:00:04.080
[MUSIC]
ELIZA STRICKLAND: Welcome&nbsp;&nbsp;

00:00:04.080 --> 00:00:07.600
to the Microsoft Research Podcast, where&nbsp;
Microsoft's leading researchers bring you to&nbsp;&nbsp;

00:00:07.600 --> 00:00:12.360
the cutting edge. This series of conversations&nbsp;
showcases the technical advances being pursued&nbsp;&nbsp;

00:00:12.360 --> 00:00:16.560
at Microsoft through the insights and&nbsp;
experiences of the people driving them.&nbsp;

00:00:16.560 --> 00:00:21.240
I'm Eliza Strickland, a senior editor at IEEE&nbsp;
Spectrum and your guest host for a special&nbsp;&nbsp;

00:00:21.240 --> 00:00:25.560
edition of the podcast.
[MUSIC FADES]&nbsp;

00:00:25.560 --> 00:00:30.040
Joining me today in the Microsoft Booth&nbsp;
at the 38th annual Conference on Neural&nbsp;&nbsp;

00:00:30.040 --> 00:00:35.680
Information Processing Systems, or NeurIPS, is&nbsp;
Chris Bishop. Chris is a Microsoft technical&nbsp;&nbsp;

00:00:35.680 --> 00:00:40.120
fellow and the director of Microsoft&nbsp;
Research AI for Science. Chris is with&nbsp;&nbsp;

00:00:40.120 --> 00:00:44.400
me for one of our two on-site conversations&nbsp;
that we’re having here at the conference.&nbsp;

00:00:44.400 --> 00:00:46.240
Chris, welcome to the podcast.
CHRIS BISHOP: Thanks,&nbsp;&nbsp;

00:00:46.240 --> 00:00:49.200
Eliza. Really great to join you.
STRICKLAND: How did your long&nbsp;&nbsp;

00:00:49.200 --> 00:00:53.360
career in machine learning lead you to this&nbsp;
focus on AI for Science, and were there any&nbsp;&nbsp;

00:00:53.360 --> 00:00:56.800
pivotal moments when you started to think that,&nbsp;
hey, this deep learning thing, it's going to&nbsp;&nbsp;

00:00:56.800 --> 00:01:00.040
change the way scientific discovery happens?
BISHOP: Oh, that's such a great question.&nbsp;&nbsp;

00:01:00.040 --> 00:01:04.560
I think this is like my career coming full circle,&nbsp;
really. I started out studying physics at Oxford,&nbsp;&nbsp;

00:01:04.560 --> 00:01:10.280
and then I did a PhD in quantum field theory. And&nbsp;
then I moved into the fusion program. I wanted to&nbsp;&nbsp;

00:01:10.280 --> 00:01:15.040
do something of practical value, [LAUGHTER] so I&nbsp;
worked on nuclear fusion for about seven or eight&nbsp;&nbsp;

00:01:15.040 --> 00:01:19.360
years doing theoretical physics, and then that&nbsp;
was about the time that Geoff Hinton published&nbsp;&nbsp;

00:01:19.360 --> 00:01:23.800
his backprop paper. And it really caught&nbsp;
my imagination as an exciting approach to&nbsp;&nbsp;

00:01:23.800 --> 00:01:29.480
artificial intelligence that might actually yield&nbsp;
some progress. So that was, kind of, 35 years ago,&nbsp;&nbsp;

00:01:29.480 --> 00:01:33.600
and I moved into the field of machine learning.&nbsp;
And, actually, the way I made that transition&nbsp;&nbsp;

00:01:33.600 --> 00:01:40.160
was by applying neural networks to fusion. I&nbsp;
was working at the JET experiment, which was&nbsp;&nbsp;

00:01:40.160 --> 00:01:45.880
the world's largest fusion experiment. It was&nbsp;
sort of big data in its day. And so I had to,&nbsp;&nbsp;

00:01:45.880 --> 00:01:47.512
first of all, teach myself to program.
STRICKLAND: [LAUGHS] Right.&nbsp;

00:01:47.512 --> 00:01:51.400
BISHOP: I was a pencil-and-paper theoretician&nbsp;
up to that point. Persuaded my boss to buy me&nbsp;&nbsp;

00:01:51.400 --> 00:01:55.640
a workstation and then started to play with&nbsp;
these neural nets. So right from the get-go,&nbsp;&nbsp;

00:01:55.640 --> 00:02:01.640
I was applying machine learning 35 years ago&nbsp;
to data from science experiments. And that was&nbsp;&nbsp;

00:02:01.640 --> 00:02:06.760
a great on-ramp for me. And then, eventually,&nbsp;
I just got so distracted, I decided I wanted&nbsp;&nbsp;

00:02:06.760 --> 00:02:11.480
to build my career in machine learning. Spent a&nbsp;
few years as a research professor and then joined&nbsp;&nbsp;

00:02:11.480 --> 00:02:17.800
Microsoft 27 years ago, when Microsoft opened its&nbsp;
first research lab outside the US in Cambridge,&nbsp;&nbsp;

00:02:17.800 --> 00:02:23.440
UK, and have been there very happily ever&nbsp;
since. Went on to become lab director. But&nbsp;&nbsp;

00:02:23.440 --> 00:02:30.200
about three or four years ago, I realized&nbsp;
that not only was deep learning transforming&nbsp;&nbsp;

00:02:30.200 --> 00:02:34.640
so many different things, but I felt it was&nbsp;
especially relevant to scientific discovery.&nbsp;&nbsp;

00:02:34.640 --> 00:02:40.960
And so I had an opportunity to pitch to our chief&nbsp;
technology officer to go start a new team. And he&nbsp;&nbsp;

00:02:40.960 --> 00:02:46.000
was very excited by this. So just over two and a&nbsp;
half years ago now, we set up Microsoft Research&nbsp;&nbsp;

00:02:46.000 --> 00:02:51.100
AI for Science, and it's a global team, and&nbsp;
it, sort of, does what it says on the tin.&nbsp;

00:02:51.100 --> 00:02:56.320
STRICKLAND: So you’ve said that AI could usher&nbsp;
in a fifth paradigm of scientific discovery,&nbsp;&nbsp;

00:02:56.320 --> 00:03:00.160
which builds upon the ideas of Turing&nbsp;
Award–winner Jim Gray, who described&nbsp;&nbsp;

00:03:00.160 --> 00:03:05.360
four stages in the evolution of science. Can you&nbsp;
briefly explain the four prior paradigms and then&nbsp;&nbsp;

00:03:05.360 --> 00:03:09.480
tell us about what makes this stage different?
BISHOP: Yeah, sure. So it was a nice insight&nbsp;&nbsp;

00:03:09.480 --> 00:03:14.080
by Jim. He said, well, of course, the first&nbsp;
paradigm of scientific discovery was really&nbsp;&nbsp;

00:03:14.080 --> 00:03:18.800
the empirical one. I tend to think of some&nbsp;
cave dweller picking up a big rock and a small&nbsp;&nbsp;

00:03:18.800 --> 00:03:22.660
rock and letting go of them at the same time and&nbsp;
thinking the big rock will hit the ground first …&nbsp;

00:03:22.660 --> 00:03:24.280
STRICKLAND: [LAUGHS] Right …
BISHOP: … discovering they land together.&nbsp;&nbsp;

00:03:24.280 --> 00:03:27.960
And this is interesting. They've discovered&nbsp;
a, sort of, pattern or regularity in nature,&nbsp;&nbsp;

00:03:27.960 --> 00:03:32.640
and even today, the first paradigm is in&nbsp;
a sense the prime paradigm. It’s the most&nbsp;&nbsp;

00:03:32.640 --> 00:03:37.760
important one because at the end of the day, it's&nbsp;
experimental results that determine the truth,&nbsp;&nbsp;

00:03:37.760 --> 00:03:42.920
if you like. So that's the first paradigm. And&nbsp;
it continues to be of critical importance today.&nbsp;&nbsp;

00:03:42.920 --> 00:03:48.520
And then the second paradigm really emerged&nbsp;
in the 17th century. When Newton discovered&nbsp;&nbsp;

00:03:48.520 --> 00:03:53.880
the laws of motion and the law of gravity, and&nbsp;
not only did he discover the equations but this,&nbsp;&nbsp;

00:03:53.880 --> 00:03:57.800
sort of, remarkable fact that nature&nbsp;
can even be described by equations,&nbsp;&nbsp;

00:03:57.800 --> 00:04:01.920
right. It's not obvious that this would be true,&nbsp;
but it turns out that, you know, the world around&nbsp;&nbsp;

00:04:01.920 --> 00:04:06.320
us can be described by very simple equations&nbsp;
that you can write on a T-shirt. And so in the&nbsp;&nbsp;

00:04:06.320 --> 00:04:10.440
19th century, James Clerk Maxwell discovered&nbsp;
some simple equations that describe the whole&nbsp;&nbsp;

00:04:10.440 --> 00:04:14.640
of electricity and magnetism, electromagnetic&nbsp;
waves, and so on. And then very importantly,&nbsp;&nbsp;

00:04:14.640 --> 00:04:19.080
at the beginning of the 20th century, we had this&nbsp;
remarkable breakthrough in quantum physics. So&nbsp;&nbsp;

00:04:19.080 --> 00:04:23.160
again down at the molecular—the atomic—level,&nbsp;
the world is described with exquisite precision&nbsp;&nbsp;

00:04:23.160 --> 00:04:29.320
by Schrödinger's equation. And so this was the&nbsp;
second paradigm, the theoretical. That the world&nbsp;&nbsp;

00:04:29.320 --> 00:04:35.840
is described with incredible precision over a huge&nbsp;
range of length and time by very simple equations.&nbsp;

00:04:35.840 --> 00:04:41.800
But of course, there's a catch, which is those&nbsp;
equations are very hard to solve. And so the&nbsp;&nbsp;

00:04:41.800 --> 00:04:45.840
third paradigm really began, I guess, sort of,&nbsp;
in the ’50s and ’60s, the development of digital&nbsp;&nbsp;

00:04:45.840 --> 00:04:50.000
computers. And, actually, the very first use&nbsp;
of digital computers was to simulate physics,&nbsp;&nbsp;

00:04:50.000 --> 00:04:55.800
and it's been at the core of digital computing&nbsp;
right up to the present day. And so what you're&nbsp;&nbsp;

00:04:55.800 --> 00:05:01.360
doing there is using a computer to go with a&nbsp;
numerical algorithm to solve those very simple&nbsp;&nbsp;

00:05:01.360 --> 00:05:06.520
equations but solve them in a practical setting.&nbsp;
And so that's, I’ll refer to that as simulation.&nbsp;&nbsp;

00:05:06.520 --> 00:05:12.200
That's the third paradigm. And that's proven&nbsp;
to be tremendously powerful. If you look up the&nbsp;&nbsp;

00:05:12.200 --> 00:05:16.920
weather forecast on your phone today, it's done&nbsp;
by numerical weather forecasting, solving in that&nbsp;&nbsp;

00:05:16.920 --> 00:05:22.280
case Navier–Stokes equations using big numerical&nbsp;
simulators. What Jim Gray observed, though,&nbsp;&nbsp;

00:05:22.280 --> 00:05:28.160
really emerging at the beginning of the 21st&nbsp;
century was what he called the fourth paradigm,&nbsp;&nbsp;

00:05:28.160 --> 00:05:35.040
or data-intensive scientific discovery. So this&nbsp;
is the era of big data. Think of particle physics&nbsp;&nbsp;

00:05:35.040 --> 00:05:42.320
at the CERN accelerator, for example, generating&nbsp;
colossal amounts of data in real time. And that&nbsp;&nbsp;

00:05:42.320 --> 00:05:46.680
data can then be processed and filtered. We can do&nbsp;
statistics on it. But of course, we can do machine&nbsp;&nbsp;

00:05:46.680 --> 00:05:53.280
learning on that data. And so machine learning&nbsp;
feeds off large data. And so the fourth paradigm&nbsp;&nbsp;

00:05:53.280 --> 00:05:59.240
really is dominated today by machine learning.&nbsp;
And again that remains tremendously important.&nbsp;

00:05:59.240 --> 00:06:03.280
What I noticed, though, is that there's&nbsp;
again another framework. We call it the fifth&nbsp;&nbsp;

00:06:03.280 --> 00:06:08.440
paradigm. Again, it goes back to those fundamental&nbsp;
equations. But again, it's driven by computation,&nbsp;&nbsp;

00:06:08.440 --> 00:06:16.360
and it's the idea that we can train machine&nbsp;
learning systems not using the empirical data&nbsp;&nbsp;

00:06:16.360 --> 00:06:21.240
of the fourth paradigm but instead using the&nbsp;
results of simulation. So the output of the&nbsp;&nbsp;

00:06:21.240 --> 00:06:27.160
third paradigm. So think of it this way. You want&nbsp;
to predict the property of some molecule, let's&nbsp;&nbsp;

00:06:27.160 --> 00:06:32.080
say. You could in principle solve Schrödinger’s&nbsp;
equation on a digital computer; it’d be very&nbsp;&nbsp;

00:06:32.080 --> 00:06:36.200
expensive. And let's say you want to screen&nbsp;
hundreds of millions of molecules. That's going&nbsp;&nbsp;

00:06:36.200 --> 00:06:42.800
to get far too costly. So instead, what you can&nbsp;
do is have a mindset shift. You can think of that&nbsp;&nbsp;

00:06:42.800 --> 00:06:48.600
simulator not as a tool to predict the molecule’s&nbsp;
properties directly but instead as a way of&nbsp;&nbsp;

00:06:48.600 --> 00:06:54.200
generating synthetic training data. And then you&nbsp;
use that training data to train a deep learning&nbsp;&nbsp;

00:06:54.200 --> 00:07:00.800
system to give what I like to call an emulator,&nbsp;
an emulator of the simulator. Once it's trained,&nbsp;&nbsp;

00:07:00.800 --> 00:07:07.760
that emulator is fast. It's usually three to four&nbsp;
orders of magnitude faster than the simulator. So&nbsp;&nbsp;

00:07:07.760 --> 00:07:12.320
if you're going to do something over and over&nbsp;
again, that three-to-four-order-of-magnitude&nbsp;&nbsp;

00:07:12.320 --> 00:07:17.160
acceleration is tremendously disruptive. And&nbsp;
what's really interesting is we see that fifth&nbsp;&nbsp;

00:07:17.160 --> 00:07:21.800
paradigm occur in many, many different places.&nbsp;
The idea goes back a long way. The, actually,&nbsp;&nbsp;

00:07:21.800 --> 00:07:27.640
the last project that I worked on before I left&nbsp;
the fusion program was to do what was the world's&nbsp;&nbsp;

00:07:27.640 --> 00:07:35.200
first-ever real-time control of a tokamak fusion&nbsp;
plasma using a neural net and the computers of the&nbsp;&nbsp;

00:07:35.200 --> 00:07:40.680
day. But the processors were just far too slow,&nbsp;
long before GPUs, and so on. And so it wasn't&nbsp;&nbsp;

00:07:40.680 --> 00:07:45.240
possible to solve the equations. In that case,&nbsp;
it was called the Grad-Shafranov equation. Again,&nbsp;&nbsp;

00:07:45.240 --> 00:07:49.360
a simple differential equation you could write&nbsp;
on a T-shirt, but solving it was expensive on&nbsp;&nbsp;

00:07:49.360 --> 00:07:56.280
a computer. We were about a million times too slow&nbsp;
to solve it directly in real time. And so instead,&nbsp;&nbsp;

00:07:56.280 --> 00:08:01.200
we generated lots and lots of solutions. We&nbsp;
used those solutions to train a very simple&nbsp;&nbsp;

00:08:01.200 --> 00:08:05.720
neural network, not a deep network, just a&nbsp;
simple two-layer network back in the day,&nbsp;&nbsp;

00:08:05.720 --> 00:08:09.600
and then we implemented that in special hardware&nbsp;
and did real-time feedback control. So that was an&nbsp;&nbsp;

00:08:09.600 --> 00:08:14.360
example of the fifth paradigm from, you know,&nbsp;
a quarter of a century ago. But of course,&nbsp;&nbsp;

00:08:14.360 --> 00:08:21.000
deep learning just tremendously expands&nbsp;
the range of applicability. So today we're&nbsp;&nbsp;

00:08:21.000 --> 00:08:27.000
using the fifth paradigm in many, many different&nbsp;
scenarios. And time and time again, we see this&nbsp;&nbsp;

00:08:27.000 --> 00:08:32.560
four-orders-of-magnitude acceleration. So I think&nbsp;
it's worthy of thinking of that as a new paradigm&nbsp;&nbsp;

00:08:32.560 --> 00:08:37.240
because it's so pervasive and so ubiquitous.
STRICKLAND: So how do you identify fields of&nbsp;&nbsp;

00:08:37.240 --> 00:08:41.440
science and particular problems that are&nbsp;
amenable to this kind of AI assistance?&nbsp;&nbsp;

00:08:41.440 --> 00:08:44.980
Is it all about availability of data&nbsp;
or the need for that kind of speed up?&nbsp;
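
NOTE
A minimal sketch of the "emulator of the simulator" idea described earlier in the conversation, added for illustration and not taken from any Microsoft code.
Everything here is an assumption: the toy simulate function (a slow numerical integral), the names emulate and features, and the use of NumPy.
The network is deliberately simplified: a fixed random hidden layer with only the output layer fitted by least squares, which stands in for the trained two-layer network mentioned above.
```python
import numpy as np
def simulate(x, steps=20000):
    # Hypothetical "slow" simulator: trapezoid-rule integral of exp(-t^2) from 0 to x.
    t = np.linspace(0.0, x, steps)
    y = np.exp(-t ** 2)
    dt = t[1] - t[0]
    return float(np.sum((y[:-1] + y[1:]) * 0.5 * dt))
rng = np.random.default_rng(0)
# Step 1: run the simulator many times to generate synthetic training data.
x_train = np.linspace(0.0, 2.0, 200)
y_train = np.array([simulate(x) for x in x_train])
# Step 2: fit a small network on that data.
# The hidden weights are random and stay fixed; only the output layer is fitted.
W = rng.normal(0.0, 2.0, size=64)
b = rng.uniform(-4.0, 4.0, size=64)
def features(x):
    # Map inputs to tanh hidden activations plus a constant bias column.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    hidden = np.tanh(np.outer(x, W) + b)
    return np.hstack([hidden, np.ones((x.size, 1))])
coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)
def emulate(x):
    # Fast surrogate: one small matrix product per query instead of a full simulation.
    return features(x) @ coef
# Step 3: compare the emulator with fresh simulator runs on held-out inputs.
x_test = rng.uniform(0.0, 2.0, size=20)
y_test = np.array([simulate(x) for x in x_test])
max_error = float(np.max(np.abs(emulate(x_test) - y_test)))
print(f"max abs error on held-out inputs: {max_error:.2e}")
```
The trade-off this is meant to show: the simulator cost is paid once, up front, to build the training set, and every later query goes through the cheap emulate call, which is only trustworthy inside the input range it was trained on.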

00:08:44.980 --> 00:08:49.000
BISHOP: So there are lots of factors that&nbsp;
go into this. And when I think about AI for&nbsp;&nbsp;

00:08:49.000 --> 00:08:54.240
Science actually, the space of opportunity&nbsp;
is colossal because science is, science is&nbsp;&nbsp;

00:08:54.240 --> 00:08:59.800
really just understanding more about the world&nbsp;
around us. And so the range of possibilities is&nbsp;&nbsp;

00:08:59.800 --> 00:09:05.840
daunting really. So in choosing what to work on,&nbsp;
I think there are several factors. Yes, of course,&nbsp;&nbsp;

00:09:05.840 --> 00:09:10.560
data is important, but very interestingly, we&nbsp;
can use experimental data or we can generate&nbsp;&nbsp;

00:09:10.560 --> 00:09:14.960
synthetic data by running simulators. So we're a&nbsp;
big fan of the fifth paradigm. But I think another&nbsp;&nbsp;

00:09:14.960 --> 00:09:20.480
factor—and this is particularly at Microsoft—is&nbsp;
thinking about, how can we have real-world impact&nbsp;&nbsp;

00:09:20.480 --> 00:09:24.680
at scale? Because that's our job, is to make the&nbsp;
world a better place and to do so at a planetary&nbsp;&nbsp;

00:09:24.680 --> 00:09:31.960
scale. And so we've settled on, for the most part,&nbsp;
working at the molecular level. So if you think&nbsp;&nbsp;

00:09:31.960 --> 00:09:36.720
about the number of different ways of combining&nbsp;
atoms together to make new stable configurations&nbsp;&nbsp;

00:09:36.720 --> 00:09:43.800
of atoms, it’s gargantuan. I mean, the number of&nbsp;
just small molecules, small organic molecules,&nbsp;&nbsp;

00:09:43.800 --> 00:09:48.200
that are potential drug candidates is about 10&nbsp;
to the power of 60. It's about the same as the&nbsp;&nbsp;

00:09:48.200 --> 00:09:53.200
number of atoms in the solar system. The number&nbsp;
of proteins, maybe the fourth power of the number&nbsp;&nbsp;

00:09:53.200 --> 00:09:58.080
of atoms in the universe, or something crazy. So&nbsp;
you've got this gargantuan space to search, and&nbsp;&nbsp;

00:09:58.080 --> 00:10:03.440
within that space, for sure, there'll be all sorts&nbsp;
of interesting molecules, materials, new drugs,&nbsp;&nbsp;

00:10:03.440 --> 00:10:08.280
new therapies, new materials for carbon capture,&nbsp;
new kinds of batteries, new photovoltaics. The&nbsp;&nbsp;

00:10:08.280 --> 00:10:12.840
list is endless because everything around us&nbsp;
is made of atoms, including our own bodies.&nbsp;&nbsp;

00:10:12.840 --> 00:10:18.240
So the potential just in the molecular space is&nbsp;
gargantuan. And so that's why we focus there.&nbsp;

00:10:18.240 --> 00:10:21.960
STRICKLAND: It's a big focus. [LAUGHTER]
BISHOP: It's a broad focus, still, yes.&nbsp;

00:10:21.960 --> 00:10:27.600
STRICKLAND: So let's take one of these case&nbsp;
studies then. In a project on drug discovery,&nbsp;&nbsp;

00:10:27.600 --> 00:10:33.440
you worked with the Global Health Drug Discovery&nbsp;
Institute on molecules that would interact with&nbsp;&nbsp;

00:10:33.440 --> 00:10:40.080
tuberculosis and coronaviruses, I think. And you&nbsp;
found, I think, candidate molecules in five months&nbsp;&nbsp;

00:10:40.080 --> 00:10:46.040
instead of several years. Can you talk about&nbsp;
what models you used in this work and how they&nbsp;&nbsp;

00:10:46.040 --> 00:10:50.520
helped you get this vastly sped up process?
BISHOP: Sure. Yes. We're very proud of this&nbsp;&nbsp;

00:10:50.520 --> 00:10:55.640
project. We're working with the Gates Foundation&nbsp;
and the Global Health Drug Discovery Institute to&nbsp;&nbsp;

00:10:55.640 --> 00:10:59.240
look at particularly diseases that affect&nbsp;
low-income countries like tuberculosis.&nbsp;&nbsp;

00:11:00.200 --> 00:11:04.400
And in terms of the models we use, I think we're&nbsp;
all familiar with a large language model. We train&nbsp;&nbsp;

00:11:04.400 --> 00:11:09.000
it on a sequence of words or sequence of word&nbsp;
tokens, and it's trained to predict the next&nbsp;&nbsp;

00:11:09.000 --> 00:11:13.880
token. We can do a similar thing, but instead of&nbsp;
learning the language of humans, we can learn the&nbsp;&nbsp;

00:11:13.880 --> 00:11:20.120
language of nature. So in particular, what we're&nbsp;
looking for here is a small organic molecule&nbsp;&nbsp;

00:11:20.120 --> 00:11:24.840
that we could synthesize in a laboratory that will&nbsp;
bind with a particular target protein. It's called&nbsp;&nbsp;

00:11:24.840 --> 00:11:30.800
ClpP. And by interfering with that protein, we can&nbsp;
arrest the process of tuberculosis. So the goal is&nbsp;&nbsp;

00:11:30.800 --> 00:11:36.320
to search that space of 10 to the 60 molecules and&nbsp;
find a new one that has the right properties. Now,&nbsp;&nbsp;

00:11:36.320 --> 00:11:41.200
the way we do this is to train something that's&nbsp;
essentially a transformer. So it looks like a&nbsp;&nbsp;

00:11:41.200 --> 00:11:46.520
language model, but the language it's trained&nbsp;
on is a thing called SMILES strings. It's an&nbsp;&nbsp;

00:11:46.520 --> 00:11:49.960
idea that's been around in chemistry for&nbsp;
a long time. It's just a way of taking a&nbsp;&nbsp;

00:11:49.960 --> 00:11:55.280
three-dimensional molecule and representing it as&nbsp;
a one-dimensional sequence of characters. So this&nbsp;&nbsp;

00:11:55.280 --> 00:12:01.680
is perfect for feeding into a language model. So&nbsp;
we take a transformer and we train it on a large&nbsp;&nbsp;

00:12:01.680 --> 00:12:06.680
database of small organic molecules that are, sort&nbsp;
of, typical of the kinds of things you might see&nbsp;&nbsp;

00:12:06.680 --> 00:12:12.840
in the space of drug molecules. Once that's been&nbsp;
trained, we can now run it generatively. And it&nbsp;&nbsp;

00:12:12.840 --> 00:12:17.240
will output new molecules. Now, we don't just&nbsp;
want to generate molecules at random because&nbsp;&nbsp;

00:12:17.240 --> 00:12:22.480
that doesn't help. We want to generate molecules&nbsp;
that bind to this particular binding site on this&nbsp;&nbsp;

00:12:22.480 --> 00:12:27.440
particular protein. So the next step is we&nbsp;
have to tell the model about the protein and&nbsp;&nbsp;

00:12:27.440 --> 00:12:33.320
the protein binding site. And we do that by giving&nbsp;
it information about not actually—well, we do tell&nbsp;&nbsp;

00:12:33.320 --> 00:12:37.200
it about the whole protein, but we especially&nbsp;
give it information about the three-dimensional&nbsp;&nbsp;

00:12:37.200 --> 00:12:42.600
geometry of the binding site. So we tell it about&nbsp;
the locations of the atoms that are in the binding&nbsp;&nbsp;

00:12:42.600 --> 00:12:47.760
site. And we do this in a way that satisfies&nbsp;
certain physics constraints, sort of, equivariance&nbsp;&nbsp;

00:12:47.760 --> 00:12:52.240
properties, it's called. So if you think about a&nbsp;
molecule, if I rotate the molecule in space, the&nbsp;&nbsp;

00:12:52.240 --> 00:12:56.280
positions of all the atoms change in a complicated&nbsp;
way. But it's the same molecule; it has the same&nbsp;&nbsp;

00:12:56.280 --> 00:13:01.480
energy and other properties and so on. So we need&nbsp;
the right kind of representation. That's then fed&nbsp;&nbsp;

00:13:01.480 --> 00:13:06.440
into this transformer using a technique called&nbsp;
cross-attention. So internally, the transformer&nbsp;&nbsp;

00:13:06.440 --> 00:13:11.120
uses self-attention to look at the history&nbsp;
of tokens, but it can now use cross-attention&nbsp;&nbsp;

00:13:11.120 --> 00:13:15.560
to look at another model that understands the&nbsp;
proteins. But even that's not enough. Because&nbsp;&nbsp;

00:13:15.560 --> 00:13:21.120
in discovering drugs and exploring this gargantuan&nbsp;
space and looking for these needles in a haystack,&nbsp;&nbsp;

00:13:21.120 --> 00:13:25.680
what typically happens [is] you find a hit,&nbsp;
a molecule that binds, but now you want to&nbsp;&nbsp;

00:13:25.680 --> 00:13:30.120
optimize it. You want to make lots of small&nbsp;
variations of that molecule in order to make&nbsp;&nbsp;

00:13:30.120 --> 00:13:34.800
it better and better at binding. So the third&nbsp;
piece of the architecture is another module, a&nbsp;&nbsp;

00:13:34.800 --> 00:13:40.440
thing called a variational autoencoder, that again&nbsp;
uses deep learning. But this time, it can take as&nbsp;&nbsp;

00:13:40.440 --> 00:13:46.960
input an organic molecule that is already known,&nbsp;
a hit that's already known to bind to the site,&nbsp;&nbsp;

00:13:46.960 --> 00:13:53.160
and that again is fed in through cross-attention.&nbsp;
And now the SMILES autoregressive model can now&nbsp;&nbsp;

00:13:53.160 --> 00:13:57.560
generate a molecule that's an improvement&nbsp;
on the starting molecule and knows about the&nbsp;&nbsp;

00:13:57.560 --> 00:14:02.880
protein binding. And so what we do is, we start&nbsp;
off with the state-of-the-art molecule. And the&nbsp;&nbsp;

00:14:02.880 --> 00:14:08.120
best example we found is one that's more than&nbsp;
two orders of magnitude stronger binding affinity&nbsp;&nbsp;

00:14:08.120 --> 00:14:12.520
to the binding pocket, which is a tremendous&nbsp;
advance; it’s the state of the art in addressing&nbsp;&nbsp;

00:14:12.520 --> 00:14:17.080
tuberculosis. And of course, the exciting thing&nbsp;
is that this is tested in the laboratory. So this&nbsp;&nbsp;

00:14:17.080 --> 00:14:21.320
is not just a computer experiment in some sort&nbsp;
of benchmark or whatever. We sent a description&nbsp;&nbsp;

00:14:21.320 --> 00:14:26.480
of the molecule to the laboratories at GHDDI.&nbsp;
They synthesized a molecule, characterized it,&nbsp;&nbsp;

00:14:26.480 --> 00:14:30.800
measured its binding property, and said, well,&nbsp;
hey, this is a new state of the art for this&nbsp;&nbsp;

00:14:30.800 --> 00:14:35.920
target protein. So we're continuing to work with&nbsp;
them to further refine this. There are obviously&nbsp;&nbsp;

00:14:35.920 --> 00:14:40.040
quite a few more steps. If you know about the&nbsp;
drug discovery process, there’s a lot of hurdles&nbsp;&nbsp;

00:14:40.040 --> 00:14:43.520
you have to get through, including, of course,&nbsp;
very important clinical trials, before you have&nbsp;&nbsp;

00:14:43.520 --> 00:14:49.200
something that can actually be used in humans.&nbsp;
But we're already hugely excited about the fact&nbsp;&nbsp;

00:14:49.200 --> 00:14:53.720
that we were able to make such a big advance&nbsp;
so quickly, in such a short amount of time,&nbsp;&nbsp;

00:14:53.720 --> 00:14:59.040
compared to the usual drug discovery process.
STRICKLAND: And while you were looking for that&nbsp;&nbsp;

00:14:59.040 --> 00:15:04.720
molecule that had the proper characteristics,&nbsp;
were you also determining whether it could be&nbsp;&nbsp;

00:15:04.720 --> 00:15:08.240
manufactured easily, like trying to think about&nbsp;
practical realities of bringing this thing out&nbsp;&nbsp;

00:15:08.240 --> 00:15:11.320
of the computer and into the lab?
BISHOP: Great question. I mean,&nbsp;&nbsp;

00:15:11.320 --> 00:15:16.240
you're hinting there at the fact the discovery&nbsp;
process, of course, is a long pipeline. You start&nbsp;&nbsp;

00:15:16.240 --> 00:15:20.480
with the protein. You have to find a molecule that&nbsp;
binds. You then refine the molecule. Now you have&nbsp;&nbsp;

00:15:20.480 --> 00:15:25.440
to look at ADMET, you know, the absorption,&nbsp;
metabolism, and excretion and so on of the&nbsp;&nbsp;

00:15:25.440 --> 00:15:30.000
molecule. Also make sure that it's not toxic.&nbsp;
But then you need to be able to synthesize it.&nbsp;&nbsp;

00:15:30.000 --> 00:15:33.400
It's no good if nobody can make this molecule.&nbsp;
So you have to look at that. So, actually,&nbsp;&nbsp;

00:15:33.400 --> 00:15:38.720
in the AI for Science team, we look at all of&nbsp;
these aspects of that drug discovery process.&nbsp;&nbsp;

00:15:38.720 --> 00:15:42.320
And we find particular areas, especially where&nbsp;
there’s, sort of, low-hanging fruit where we can&nbsp;&nbsp;

00:15:42.320 --> 00:15:47.640
see that deep learning can make a big impact. It&nbsp;
doesn't necessarily help much to take a very easy,&nbsp;&nbsp;

00:15:47.640 --> 00:15:51.280
fast piece of the pipeline and go work on that.&nbsp;
You want to understand, what are the bottlenecks,&nbsp;&nbsp;

00:15:51.280 --> 00:15:54.560
and can we really unlock those with deep&nbsp;
learning? So we're very interested in that&nbsp;&nbsp;

00:15:54.560 --> 00:15:59.120
whole process. It’s a fascinating problem.&nbsp;
You've got a gargantuan search space,&nbsp;&nbsp;

00:15:59.120 --> 00:16:03.280
and yet you have so many different constraints&nbsp;
that need to be met. And deep learning just feels&nbsp;&nbsp;

00:16:03.280 --> 00:16:07.520
like the perfect tool to go after this problem.
STRICKLAND: When you talk to the scientists&nbsp;&nbsp;

00:16:07.520 --> 00:16:11.200
that you collaborate with, is AI&nbsp;
changing the kinds of questions&nbsp;&nbsp;

00:16:11.200 --> 00:16:16.520
that they are able to ask? That they want to ask?
BISHOP: Oh, for sure. And it's really empowering.&nbsp;&nbsp;

00:16:16.520 --> 00:16:22.400
It's enabling those working in the drug discovery&nbsp;
space to, I think, to think in a much more&nbsp;&nbsp;

00:16:22.400 --> 00:16:27.200
expansive way. If you think about just the kind&nbsp;
of acceleration that I talked about from the fifth&nbsp;&nbsp;

00:16:27.200 --> 00:16:33.080
paradigm, if you go to four-order-of-magnitude&nbsp;
acceleration, OK, it may not sound like much of&nbsp;&nbsp;

00:16:33.080 --> 00:16:38.600
a dent in the 10 to the power 60 space, but now&nbsp;
when you're exploring variants of molecules and so&nbsp;&nbsp;

00:16:38.600 --> 00:16:43.520
on, the ability to explore that space orders&nbsp;
of magnitude faster allows you to think much&nbsp;&nbsp;

00:16:43.520 --> 00:16:49.120
more creatively, allows you to think in a more&nbsp;
expansive way about how much of that space you can&nbsp;&nbsp;

00:16:49.120 --> 00:16:53.280
explore and how efficiently you can explore it.&nbsp;
So I think it really is opening up new horizons,&nbsp;&nbsp;

00:16:53.280 --> 00:16:58.160
and certainly, we have an exciting partnership&nbsp;
with Novartis. We've been working with them for&nbsp;&nbsp;

00:16:58.160 --> 00:17:04.280
the last five years, and they've been deploying&nbsp;
some of our techniques and models in practice&nbsp;&nbsp;

00:17:04.280 --> 00:17:09.880
for their drug discovery pipeline. We get a lot&nbsp;
of great feedback from them about how exciting&nbsp;&nbsp;

00:17:09.880 --> 00:17:13.640
they're finding these techniques to use in&nbsp;
practice because it is changing the way they&nbsp;&nbsp;

00:17:13.640 --> 00:17:18.840
go about doing the drug discovery process.
STRICKLAND: To jump to one other case study,&nbsp;&nbsp;

00:17:18.840 --> 00:17:22.960
we don't have to go into great detail on it,&nbsp;
but I'm very curious about your Project Aurora,&nbsp;&nbsp;

00:17:22.960 --> 00:17:27.480
this foundation model for state-of-the-art&nbsp;
weather forecasting that, I believe, is 5,000&nbsp;&nbsp;

00:17:27.480 --> 00:17:32.760
times faster than traditional physics-based&nbsp;
methods. Can you talk a little bit about how&nbsp;&nbsp;

00:17:32.760 --> 00:17:38.200
that project is evolving, how you imagine these&nbsp;
AI forecasting models working with traditional&nbsp;&nbsp;

00:17:38.200 --> 00:17:42.640
forecasting models, perhaps, or replacing them?
BISHOP: Yes. So I said most of what we do is

00:17:42.640 --> 00:17:47.320
down at the molecular level. So this is one of the
exceptions. So this is really at the global level,

00:17:47.320 --> 00:17:52.200
the planetary level. Again, it's a beautiful
example of the fifth paradigm because the way

00:17:52.200 --> 00:17:56.240
forecasting has been done for a number of
decades now and the way most forecasting is

00:17:56.240 --> 00:17:59.920
done at the moment is through what's called
numerical weather prediction. So again,

00:17:59.920 --> 00:18:05.200
you have these simple equations. It's no longer
Schrödinger’s equation of atomic physics. It's

00:18:05.200 --> 00:18:10.320
now Navier–Stokes equations of fluid flows and
a whole bunch of other equations that describe

00:18:10.320 --> 00:18:15.160
moisture in the atmosphere and the weather
and so on. And those equations are solved

00:18:15.160 --> 00:18:21.800
on a supercomputer. And again, we can think
of that numerical simulator now not just as

00:18:21.800 --> 00:18:26.320
the way you're going to do the forecasting but
actually as the way to generate training data

00:18:26.320 --> 00:18:30.520
for a deep learning emulator. So several
groups have been exploring this over the

00:18:30.520 --> 00:18:35.640
last couple of years. And again, we see this
very robust three-to-four-order-of-magnitude

00:18:35.640 --> 00:18:41.240
acceleration. But what's really interesting about
Aurora, it's the world's first foundation model,

00:18:41.240 --> 00:18:47.000
so instead of just building an emulator of
a particular numerical weather simulator,

00:18:47.000 --> 00:18:53.720
which is already very interesting, we trained
Aurora on a much more diverse set of data and

00:18:53.720 --> 00:18:58.720
really trying to force it not just to emulate
a particular simulator but really, as it were,

00:18:59.360 --> 00:19:05.440
understand or model the fundamental equations of
fluid flows in the Earth's atmosphere. And then

00:19:05.440 --> 00:19:10.280
the reason we want to do this is because we now
want to take that foundation model and fine-tune

00:19:10.280 --> 00:19:16.040
it to other downstream applications where there’s
much less data. So one example would be pollution

00:19:16.040 --> 00:19:21.960
flow. So obviously the flow of pollution around
the atmosphere is extremely important. But the

00:19:21.960 --> 00:19:26.200
data is far more sparse. There are far fewer
sensors for pollution than there are for, sort of,

00:19:26.200 --> 00:19:32.760
wind and rain and temperature and so on. And so we
were able to achieve state-of-the-art performance

00:19:32.760 --> 00:19:39.000
in modeling the flow of pollution by leveraging
huge data and building this foundation model and

00:19:39.000 --> 00:19:44.520
then using relatively little data, our pollution
monitoring, to build that downstream fine-tuned

00:19:44.520 --> 00:19:49.680
model. So beautiful example of a foundation model.
STRICKLAND: That is a cool example. And finally,

00:19:49.680 --> 00:19:53.320
just to wrap up, what have you seen or heard
at NeurIPS that’s gotten you excited? What

00:19:53.320 --> 00:19:56.320
kind of trends are in the air? What’s the buzz?
BISHOP: Oh, that’s a great question. I mean,

00:19:56.320 --> 00:20:00.280
it's such a huge conference. There's something
like 17,000 people or so here this year,

00:20:00.280 --> 00:20:05.040
I've heard. I think, you know, one of the things
that's happened so far that's actually given me an

00:20:05.040 --> 00:20:10.320
enormous amount of energy wasn’t just a technical
talk. It was actually an event we had on the first

00:20:10.320 --> 00:20:15.720
day called Women in Machine Learning. And I
was a mentor on one of the mentorship tables,

00:20:15.720 --> 00:20:21.320
and I found it very energizing just to meet
so many people, early-career-stage people,

00:20:21.320 --> 00:20:26.440
who were very excited about AI for Science and
realizing that, you know, it's not just that I

00:20:26.440 --> 00:20:30.560
think AI for Science is important. A lot of
people are moving into this field now. It is

00:20:30.560 --> 00:20:35.520
a big frontier for AI. I'm a little biased,
perhaps. I think that it's the most important

00:20:35.520 --> 00:20:39.840
application area. Intellectually, it's very
exciting because we get to deal with science

00:20:39.840 --> 00:20:45.880
as well as machine learning. But also if you think
about [it], science is really about learning more

00:20:45.880 --> 00:20:50.680
about the world. And once we learn more about
the world, we can then develop aquaculture;

00:20:50.680 --> 00:20:54.120
we can develop the steam engine; we can develop
silicon chips; we can change the world. We can

00:20:54.120 --> 00:20:58.720
save lives and make the world a better place. And
so I think it's the most fundamental undertaking

00:20:58.720 --> 00:21:03.720
we have in AI for Science, and the thing I
loved about the Women in Machine Learning

00:21:03.720 --> 00:21:08.000
event is that the AI for Science table was
just completely swamped with all of these

00:21:08.000 --> 00:21:12.160
people at early stages of their career, either
already working in this field and doing PhDs

00:21:12.160 --> 00:21:16.280
or wanting to get into it. That was very exciting.
STRICKLAND: That is really exciting and inspiring,

00:21:16.280 --> 00:21:20.520
and it gives me a lot of hope. Well, Chris
Bishop, thank you so much for joining us

00:21:20.520 --> 00:21:24.061
today and thanks for a great conversation.
BISHOP: Thank you. I really appreciate it.

00:21:24.061 --> 00:21:25.080
[MUSIC]
STRICKLAND: And to our listeners,

00:21:25.080 --> 00:21:28.600
thanks for tuning in. If you want to
learn more about research at Microsoft,

00:21:28.600 --> 00:21:33.560
you can check out the Microsoft Research
website at microsoft.com/research. Until

00:21:33.560 --> 00:21:50.004
next time.
[MUSIC FADES]

