00:00:00.983 --> 00:00:01.711
[MUSIC] 

00:00:01.711 --> 00:00:08.720
AMBER TINGLE: Welcome to Abstracts, a Microsoft&nbsp;
Research Podcast that puts the spotlight on&nbsp;&nbsp;

00:00:08.720 --> 00:00:16.240
world-class research in brief. I’m Amber Tingle.&nbsp;
In this series, members of the research community&nbsp;&nbsp;

00:00:16.240 --> 00:00:24.240
at Microsoft give us a quick snapshot—or a podcast&nbsp;
abstract—of their new and noteworthy papers.

00:00:24.240 --> 00:00:25.120
[MUSIC FADES]

00:00:25.120 --> 00:00:31.920
Our guests today are Megan Stanley and Wessel&nbsp;
Bruinsma. They are both senior researchers&nbsp;&nbsp;

00:00:31.920 --> 00:00:38.080
within the Microsoft Research AI for Science&nbsp;
initiative. They are also two of the coauthors&nbsp;&nbsp;

00:00:38.080 --> 00:00:44.640
on a new Nature publication called “A&nbsp;
Foundation Model for the Earth System.”

00:00:44.640 --> 00:00:48.000
This is such exciting work&nbsp;
about environmental forecasting,&nbsp;&nbsp;

00:00:48.000 --> 00:00:51.040
so we're happy to have the&nbsp;
two of you join us today.

00:00:51.040 --> 00:00:52.960
Megan and Wessel, welcome.

00:00:52.960 --> 00:00:56.207
MEGAN STANLEY: Thank you.&nbsp;
Thanks. Great to be here.

00:00:56.207 --> 00:00:58.480
WESSEL BRUINSMA: Thanks.
TINGLE: Let's jump right in. Wessel,

00:00:58.480 --> 00:01:03.680
share a bit about the problem your research&nbsp;
addresses and why this work is so important.

00:01:03.680 --> 00:01:08.240
BRUINSMA: I think we're all very much aware of the&nbsp;
revolution that's happening in the space of large&nbsp;&nbsp;

00:01:08.240 --> 00:01:13.680
language models, which have just become so strong.&nbsp;
What's perhaps less well known is that machine&nbsp;&nbsp;

00:01:13.680 --> 00:01:18.800
learning models have also started to revolutionize&nbsp;
the field of weather prediction. Whereas&nbsp;&nbsp;

00:01:18.800 --> 00:01:23.520
traditional weather prediction models, based on&nbsp;
physical laws, used to be the state of the art,&nbsp;&nbsp;

00:01:23.520 --> 00:01:28.320
these traditional models are now challenged&nbsp;
and often even outperformed by AI models.

00:01:28.320 --> 00:01:32.960
This advancement is super impressive and&nbsp;
really a big deal, mostly because AI weather&nbsp;&nbsp;

00:01:32.960 --> 00:01:36.400
forecasting models are computationally&nbsp;
much more efficient and can even be&nbsp;&nbsp;

00:01:36.400 --> 00:01:40.400
more accurate. What's unfortunate&nbsp;
though, about this big step forward,&nbsp;&nbsp;

00:01:40.400 --> 00:01:45.040
is that these developments are mostly limited&nbsp;
to the setting of weather forecasting.

00:01:45.040 --> 00:01:48.880
Weather forecasting is very important,&nbsp;
obviously, but there are many other&nbsp;&nbsp;

00:01:48.880 --> 00:01:54.160
important environmental forecasting problems&nbsp;
out there, such as air pollution forecasting&nbsp;&nbsp;

00:01:54.160 --> 00:02:00.400
or ocean wave forecasting. We have developed a&nbsp;
model, named Aurora, which really kicks the AI&nbsp;&nbsp;

00:02:00.400 --> 00:02:05.280
revolution in weather forecasting into the&nbsp;
next gear by extending these advancements&nbsp;&nbsp;

00:02:05.280 --> 00:02:10.560
to other environmental forecasting fields,&nbsp;
too. With Aurora, we're now able to produce&nbsp;&nbsp;

00:02:10.560 --> 00:02:15.960
state-of-the-art air pollution forecasts using&nbsp;
an AI approach. And that wasn't possible before!

00:02:15.960 --> 00:02:19.440
TINGLE: Megan, how does this approach differ from&nbsp;&nbsp;

00:02:19.440 --> 00:02:24.280
or build on work that's already been&nbsp;
done in the atmospheric sciences?

00:02:24.280 --> 00:02:28.480
STANLEY: Current approaches have really&nbsp;
focused very specifically on training&nbsp;&nbsp;

00:02:28.480 --> 00:02:34.080
weather forecasting models. And in contrast,&nbsp;
with Aurora, what we've attempted to do is&nbsp;&nbsp;

00:02:34.080 --> 00:02:38.400
train a so-called foundation model for&nbsp;
the Earth system. In the first step,&nbsp;&nbsp;

00:02:38.400 --> 00:02:44.240
we train Aurora on a vast body of Earth&nbsp;
system data. This is our pretraining step.

00:02:44.240 --> 00:02:49.520
And when I say a vast body of data, I really do&nbsp;
mean a lot. And the purpose of this pretraining&nbsp;&nbsp;

00:02:49.520 --> 00:02:54.640
is to let Aurora, kind of, learn some&nbsp;
general-purpose representation of the&nbsp;&nbsp;

00:02:54.640 --> 00:02:59.200
dynamics that govern the Earth system.&nbsp;
But then once we've pretrained Aurora,&nbsp;&nbsp;

00:02:59.200 --> 00:03:04.000
and this really is the crux of this, the reason&nbsp;
why we're doing this project, is after the model&nbsp;&nbsp;

00:03:04.000 --> 00:03:09.280
has been pretrained, it can leverage this&nbsp;
learned general-purpose representation and&nbsp;&nbsp;

00:03:09.280 --> 00:03:15.920
efficiently adapt to new tasks, new domains,&nbsp;
new variables. And this is called fine-tuning.

00:03:15.920 --> 00:03:19.920
The idea is that the model really uses&nbsp;
the learned representation to perform&nbsp;&nbsp;

00:03:19.920 --> 00:03:25.680
this adaptation very efficiently, which&nbsp;
basically means Aurora is a powerful,&nbsp;&nbsp;

00:03:25.680 --> 00:03:31.200
flexible model that can relatively cheaply be&nbsp;
adapted to any environmental forecasting task.

00:03:31.200 --> 00:03:34.320
TINGLE: Wessel, can you tell us about your&nbsp;&nbsp;

00:03:34.320 --> 00:03:38.200
methodology? How did you&nbsp;
all conduct this research?

00:03:38.200 --> 00:03:44.240
BRUINSMA: Approaches so far have trained&nbsp;
models primarily on one particular dataset.&nbsp;&nbsp;

00:03:44.240 --> 00:03:49.440
This one dataset is very large, which makes&nbsp;
it possible to train very good models. But it&nbsp;&nbsp;

00:03:49.440 --> 00:03:54.560
does remain only one dataset, and that's not&nbsp;
very diverse. In the domain of environmental&nbsp;&nbsp;

00:03:54.560 --> 00:03:59.760
forecasting, we have really tried to push the&nbsp;
limits of scaling to large data by training&nbsp;&nbsp;

00:03:59.760 --> 00:04:05.760
Aurora on not just this one large dataset, but&nbsp;
on as many very large datasets as we could find.

00:04:05.760 --> 00:04:10.080
These datasets are a combination of estimates&nbsp;
of the historical state of the world,&nbsp;&nbsp;

00:04:10.080 --> 00:04:15.040
forecasts by other models, climate simulations,&nbsp;
and more. We've been able to show that training&nbsp;&nbsp;

00:04:15.040 --> 00:04:19.760
on not just more data but more diverse&nbsp;
data helps the model achieve even better&nbsp;&nbsp;

00:04:19.760 --> 00:04:24.080
performance. Showing this is difficult&nbsp;
because there is just so much data.

00:04:24.080 --> 00:04:28.320
In addition to scaling to more and more&nbsp;
diverse data, we also increased the size&nbsp;&nbsp;

00:04:28.320 --> 00:04:34.080
of the model as much as we could. Here we found&nbsp;
that bigger models, despite being slower to run,&nbsp;&nbsp;

00:04:34.080 --> 00:04:39.200
make more efficient use of computational&nbsp;
resources. It's cheaper to train a good big&nbsp;&nbsp;

00:04:39.200 --> 00:04:44.400
model than a good small model. The mantra of&nbsp;
this project was to really keep it simple and&nbsp;&nbsp;

00:04:44.400 --> 00:04:50.240
to scale to simultaneously very large and, more&nbsp;
importantly, diverse data and large model size.

00:04:50.240 --> 00:04:53.280
TINGLE: So, Megan, what were your major&nbsp;&nbsp;

00:04:53.280 --> 00:04:56.871
findings? And we know they're major&nbsp;
because they're in Nature. [LAUGHS]

00:04:56.871 --> 00:05:02.320
STANLEY: Yeah, [LAUGHS] I guess they really are.&nbsp;
So the main outcome of this project is we were&nbsp;&nbsp;

00:05:02.320 --> 00:05:08.480
actually able to train a single foundation model&nbsp;
that achieves state-of-the-art performance in&nbsp;&nbsp;

00:05:08.480 --> 00:05:12.880
four different domains: air pollution&nbsp;
forecasting, for example, predicting&nbsp;&nbsp;

00:05:12.880 --> 00:05:18.400
particulate matter near the surface or ozone&nbsp;
in the atmosphere; ocean wave forecasting,&nbsp;&nbsp;

00:05:18.400 --> 00:05:21.680
which is critical for planning shipping routes;

00:05:21.680 --> 00:05:23.600
tropical cyclone track forecasting,&nbsp;&nbsp;

00:05:23.600 --> 00:05:29.200
so that means being able to predict where&nbsp;
a hurricane or a typhoon is expected to go,&nbsp;&nbsp;

00:05:29.200 --> 00:05:34.320
which is obviously incredibly important; and&nbsp;
very high-resolution weather forecasting.

00:05:34.320 --> 00:05:38.320
And I've, kind of, named these forecasting&nbsp;
domains as if they're just items in a list,&nbsp;&nbsp;

00:05:38.320 --> 00:05:43.040
but in every single one, Aurora really&nbsp;
pushed the limits of what is possible&nbsp;&nbsp;

00:05:43.040 --> 00:05:46.240
with AI models. And we're really proud of that.

00:05:46.240 --> 00:05:51.680
But perhaps, kind of, you know, to my mind, the&nbsp;
key takeaway here is that the foundation model&nbsp;&nbsp;

00:05:51.680 --> 00:05:56.960
approach actually works. So what we have shown&nbsp;
is it's possible to actually train some kind&nbsp;&nbsp;

00:05:56.960 --> 00:06:02.720
of general model, a foundation model, and then&nbsp;
adapt it to a wide variety of environmental tasks.&nbsp;&nbsp;

00:06:02.720 --> 00:06:07.440
Now we definitely do not claim that Aurora&nbsp;
is some kind of ultimate environmental&nbsp;&nbsp;

00:06:07.440 --> 00:06:12.240
forecasting model. We are sure that the&nbsp;
model and the pretraining procedure can&nbsp;&nbsp;

00:06:12.240 --> 00:06:18.560
actually be improved. But, nevertheless,&nbsp;
we've shown that this approach works for&nbsp;&nbsp;

00:06:18.560 --> 00:06:22.920
environmental forecasting. It really holds&nbsp;
massive promise, and that's incredibly cool.

00:06:22.920 --> 00:06:29.840
TINGLE: Wessel, what do you think will&nbsp;
be the real-world impact of this work?

00:06:29.840 --> 00:06:33.600
BRUINSMA: Well, for applications that&nbsp;
we mentioned, which are air pollution&nbsp;&nbsp;

00:06:33.600 --> 00:06:37.760
forecasting, ocean wave forecasting,&nbsp;
tropical cyclone track forecasting,&nbsp;&nbsp;

00:06:37.760 --> 00:06:43.200
and very high-resolution weather forecasting,&nbsp;
Aurora could today be deployed in real-time&nbsp;&nbsp;

00:06:43.200 --> 00:06:47.440
systems to produce near real-time&nbsp;
forecasts. And, you know, in fact,&nbsp;&nbsp;

00:06:47.440 --> 00:06:53.040
it already is. You can view real-time weather&nbsp;
forecasts by the high-resolution version of&nbsp;&nbsp;

00:06:53.040 --> 00:06:55.920
the model on the website of ECMWF (European&nbsp;
Centre for Medium-Range Weather Forecasts).

00:06:55.920 --> 00:07:01.040
But what's remarkable is that each of these&nbsp;
applications took a small team of engineers&nbsp;&nbsp;

00:07:01.040 --> 00:07:06.000
about four to eight weeks to fully execute. You&nbsp;
should compare this to a typical development&nbsp;&nbsp;

00:07:06.000 --> 00:07:11.520
timeline for more traditional models, which&nbsp;
can be on the order of multiple years. Using&nbsp;&nbsp;

00:07:11.520 --> 00:07:16.400
the pretraining and fine-tuning approach that we&nbsp;
used for Aurora, we might see significantly&nbsp;&nbsp;

00:07:16.400 --> 00:07:21.760
accelerated development cycles for environmental&nbsp;
forecasting problems. And that's exciting.

00:07:21.760 --> 00:07:28.720
TINGLE: Megan, if our listeners only walk away&nbsp;
from this conversation with one key talking point,&nbsp;&nbsp;

00:07:28.720 --> 00:07:33.240
what would you like that to be? What&nbsp;
should we remember about this paper?

00:07:33.240 --> 00:07:37.920
STANLEY: The biggest takeaway is that&nbsp;
the pretraining fine-tuning paradigm,&nbsp;&nbsp;

00:07:37.920 --> 00:07:42.880
it really works for environmental forecasting,&nbsp;
right? So you can train a foundation model,&nbsp;&nbsp;

00:07:42.880 --> 00:07:47.040
it learns some kind of general-purpose&nbsp;
representation of the Earth system dynamics,&nbsp;&nbsp;

00:07:47.040 --> 00:07:53.280
and this representation boosts performance in a&nbsp;
wide variety of forecasting tasks. But we really&nbsp;&nbsp;

00:07:53.280 --> 00:07:58.320
want to emphasize that Aurora only scratches&nbsp;
the surface of what's actually possible.

00:07:58.320 --> 00:08:03.440
So there are many more applications to explore&nbsp;
than the four we've mentioned. And undoubtedly,&nbsp;&nbsp;

00:08:03.440 --> 00:08:05.760
the model and pretraining procedure can actually&nbsp;&nbsp;

00:08:05.760 --> 00:08:10.880
be improved. So we're really excited to&nbsp;
see what the next few years will bring.

00:08:10.880 --> 00:08:14.880
TINGLE: Wessel, tell us more about&nbsp;
those opportunities and unanswered&nbsp;&nbsp;

00:08:14.880 --> 00:08:19.800
questions. What's next on the research&nbsp;
agenda in environmental prediction?

00:08:19.800 --> 00:08:25.280
BRUINSMA: Well, Aurora has two main&nbsp;
limitations. The first is that the&nbsp;&nbsp;

00:08:25.280 --> 00:08:31.120
model produces only deterministic predictions,&nbsp;
by which I mean a single predicted value. For&nbsp;&nbsp;

00:08:31.120 --> 00:08:36.400
variables like temperature, this is mostly&nbsp;
fine. But other variables, like precipitation,&nbsp;&nbsp;

00:08:36.400 --> 00:08:40.880
are inherently stochastic.&nbsp;
For these variables, we really want to assign&nbsp;&nbsp;

00:08:40.880 --> 00:08:46.560
probabilities to different levels of precipitation&nbsp;
rather than predicting only a single value.

00:08:46.560 --> 00:08:51.760
An extension of Aurora to allow this sort&nbsp;
of prediction would be a great next step.

00:08:51.760 --> 00:08:57.280
The second limitation is that Aurora depends on&nbsp;
a procedure called assimilation. Assimilation&nbsp;&nbsp;

00:08:57.280 --> 00:09:02.480
attempts to create a starting point for the model&nbsp;
from real-world observations, such as from weather&nbsp;&nbsp;

00:09:02.480 --> 00:09:08.160
stations and satellites. The model then takes the&nbsp;
starting point and uses it to make predictions.&nbsp;&nbsp;

00:09:08.720 --> 00:09:11.760
Unfortunately, assimilation is super expensive,&nbsp;&nbsp;

00:09:11.760 --> 00:09:15.920
so it would be great if we could&nbsp;
somehow circumvent the need for it.

00:09:15.920 --> 00:09:20.160
Finally, what we find really important is&nbsp;
to make our advancements available to the&nbsp;&nbsp;

00:09:20.160 --> 00:09:21.432
community.
[MUSIC]

00:09:21.432 --> 00:09:23.520
TINGLE: Great. Megan and Wessel,&nbsp;&nbsp;

00:09:23.520 --> 00:09:26.240
thanks for joining us today on&nbsp;
the Microsoft Research Podcast.

00:09:26.240 --> 00:09:27.520
BRUINSMA: Thanks for having us.

00:09:27.520 --> 00:09:29.120
STANLEY: Yeah, thank you. It's been great.

00:09:29.120 --> 00:09:35.200
TINGLE: You can check out the Aurora model on&nbsp;
Azure AI Foundry. You can read the entire paper,&nbsp;&nbsp;

00:09:35.200 --> 00:09:40.880
“A Foundation Model for the Earth&nbsp;
System,” at aka.ms/abstracts. And&nbsp;&nbsp;

00:09:40.880 --> 00:09:44.480
you'll certainly find it&nbsp;
on the Nature website, too.

00:09:44.480 --> 00:09:52.320
Thank you so much for tuning in to&nbsp;
Abstracts today. Until next time.

00:09:52.320 --> 00:09:53.169
[MUSIC FADES]