00:00:00.343 --> 00:00:00.887
[MUSIC]

00:00:00.887 --> 00:00:08.000
AMBER TINGLE: Welcome to Abstracts, a Microsoft&nbsp;
Research Podcast that puts the spotlight on&nbsp;&nbsp;

00:00:08.000 --> 00:00:15.560
world-class research in brief. I’m Amber Tingle.&nbsp;
In this series, members of the research community&nbsp;&nbsp;

00:00:15.560 --> 00:00:23.706
at Microsoft give us a quick snapshot—or a podcast&nbsp;
abstract—of their new and noteworthy papers.

00:00:23.706 --> 00:00:23.720
[MUSIC FADES]

00:00:23.720 --> 00:00:30.000
Our guests today are Chris Hawblitzel and&nbsp;
Jay Lorch. They are both senior principal&nbsp;&nbsp;

00:00:30.000 --> 00:00:36.920
researchers at Microsoft and two of the coauthors&nbsp;
on a paper called “Verus: A Practical Foundation&nbsp;&nbsp;

00:00:36.920 --> 00:00:44.680
for Systems Verification.” This work received the&nbsp;
Distinguished Artifact Award at the 30th Symposium&nbsp;&nbsp;

00:00:44.680 --> 00:00:52.040
on Operating Systems Principles, also known as&nbsp;
SOSP, which is happening right now in Austin,&nbsp;&nbsp;

00:00:52.040 --> 00:00:58.789
Texas. Chris and Jay, thank you for joining&nbsp;
us today for Abstracts and congratulations!

00:00:58.789 --> 00:00:59.537
JAY LORCH: Thank you for having us.

00:00:59.537 --> 00:01:00.240
CHRIS HAWBLITZEL: Glad to be here.

00:01:00.240 --> 00:01:06.160
TINGLE: Chris, let's start with an overview.&nbsp;
What problem does this research address,&nbsp;&nbsp;

00:01:06.160 --> 00:01:10.520
and why is Verus something that the broader&nbsp;
research community should know about?

00:01:10.520 --> 00:01:16.280
HAWBLITZEL: So what we're trying to address is&nbsp;
a very simple problem where we're trying to help&nbsp;&nbsp;

00:01:16.280 --> 00:01:22.120
developers write software that doesn't have bugs&nbsp;
in it. And we're trying to provide a tool with&nbsp;&nbsp;

00:01:22.120 --> 00:01:29.600
Verus that will help developers show that their&nbsp;
code actually behaves the way it's supposed to;&nbsp;&nbsp;

00:01:29.600 --> 00:01:34.760
it obeys some sort of specification&nbsp;
for what the program is supposed to do.

00:01:34.760 --> 00:01:40.200
TINGLE: How does this publication build on&nbsp;
or differ from other research in this field,&nbsp;&nbsp;

00:01:40.200 --> 00:01:42.580
including your previous Verus-related work?

00:01:42.580 --> 00:01:50.320
HAWBLITZEL: So formal verification is a process&nbsp;
where you write down what it is that you want your&nbsp;&nbsp;

00:01:50.320 --> 00:02:00.040
program to do in mathematical terms. So if you're&nbsp;
writing an algorithm to sort a list, for example,&nbsp;&nbsp;

00:02:00.040 --> 00:02:07.720
you might say that the output of this algorithm&nbsp;
should be a new list that is a rearrangement&nbsp;&nbsp;

00:02:07.720 --> 00:02:14.160
of the elements of the old list, but now this&nbsp;
rearrangement should be in sorted order. So you&nbsp;&nbsp;

00:02:14.160 --> 00:02:21.320
can write that down using standard mathematics.&nbsp;
And now given that mathematical specification,&nbsp;&nbsp;

00:02:21.320 --> 00:02:25.880
the challenge is to prove that your piece of&nbsp;
software written in a particular language,&nbsp;&nbsp;

00:02:25.880 --> 00:02:33.360
like Java or C# or Rust, actually generates an&nbsp;
output that meets that mathematical specification.&nbsp;&nbsp;

00:02:33.360 --> 00:02:41.640
So this idea of using verification to prove that&nbsp;
your software obeys some sort of specification,&nbsp;&nbsp;

00:02:41.640 --> 00:02:46.600
this has been around for a long time, so, you&nbsp;
know, even Alan Turing talked about ways of&nbsp;&nbsp;

00:02:46.600 --> 00:02:55.720
doing this many, many decades ago. The challenge&nbsp;
has always been that it's really hard to develop&nbsp;&nbsp;

00:02:55.720 --> 00:03:02.840
these proofs for any large piece of software.&nbsp;
It simply takes a long time for a human being&nbsp;&nbsp;

00:03:02.840 --> 00:03:09.840
to write down a proof of correctness of their&nbsp;
software. And so what we're trying to do is to&nbsp;&nbsp;

00:03:09.840 --> 00:03:19.440
build on earlier work in verification and recent&nbsp;
developments in programming languages to try to&nbsp;&nbsp;

00:03:19.440 --> 00:03:24.880
make this as easy as possible and to try to&nbsp;
make it as accessible to ordinary software&nbsp;&nbsp;

00:03:24.880 --> 00:03:31.480
developers as possible. So we've been using&nbsp;
existing tools. There are automated theorem&nbsp;&nbsp;

00:03:31.480 --> 00:03:36.640
provers—one of them from Microsoft Research&nbsp;
called Z3—where you give it a mathematical&nbsp;&nbsp;

00:03:36.640 --> 00:03:41.760
formula and ask it to prove that the formula&nbsp;
is valid. We're building on that. And we're&nbsp;&nbsp;

00:03:41.760 --> 00:03:49.520
also taking a lot of inspiration from tools&nbsp;
developed at Microsoft Research and elsewhere,&nbsp;&nbsp;

00:03:49.520 --> 00:03:56.600
like Dafny and F* and so on, that we've used in&nbsp;
the past for our previous verification projects.&nbsp;&nbsp;

00:03:56.600 --> 00:04:03.080
And we're trying to take ideas from those and make&nbsp;
them accessible to developers who are using common&nbsp;&nbsp;

00:04:03.080 --> 00:04:07.680
programming languages. In this case, the Rust&nbsp;
programming language is what we're focusing on.

00:04:07.680 --> 00:04:12.720
TINGLE: Jay, could you describe your&nbsp;
methodology for us and maybe share a&nbsp;&nbsp;

00:04:12.720 --> 00:04:17.320
bit about how you and your coauthors&nbsp;
tested the robustness of Verus.

00:04:17.320 --> 00:04:22.280
LORCH: So the question we really want to&nbsp;
answer is, is Verus suitable for systems&nbsp;&nbsp;

00:04:22.280 --> 00:04:28.920
programming? So that means a variety of&nbsp;
things. Is it amenable to a variety of&nbsp;&nbsp;

00:04:29.840 --> 00:04:35.440
kinds of software that you want to build as part&nbsp;
of a system? Is it usable by developers? Can they&nbsp;&nbsp;

00:04:35.440 --> 00:04:41.560
produce compact proofs? And can they get timely&nbsp;
feedback about those proofs? Can the verifier&nbsp;&nbsp;

00:04:41.560 --> 00:04:47.840
tell you quickly that your proof is correct or, if&nbsp;
it's wrong, that it's wrong and guide you to fix&nbsp;&nbsp;

00:04:47.840 --> 00:04:55.920
it? So the main two methodological techniques we&nbsp;
used were millibenchmarks and full systems. So the&nbsp;&nbsp;

00:04:55.920 --> 00:05:02.200
millibenchmarks are small pieces of programs that&nbsp;
have been verified by other tools in the past,&nbsp;&nbsp;

00:05:02.200 --> 00:05:08.560
and we built them in Verus and compared to what&nbsp;
other tools would do to find whether we could&nbsp;&nbsp;

00:05:08.560 --> 00:05:14.640
improve usability. And we found generally that we&nbsp;
could verify the same things but with more compact&nbsp;&nbsp;

00:05:14.640 --> 00:05:20.120
proofs and proofs that would give much snappier&nbsp;
feedback. The difference between one second and&nbsp;&nbsp;

00:05:20.120 --> 00:05:25.720
10 seconds might not seem a lot, but when you're&nbsp;
writing code and working with the verifier,&nbsp;&nbsp;

00:05:25.720 --> 00:05:29.880
it's much nicer to get immediate feedback about&nbsp;
what is wrong with your proof so you can say,&nbsp;&nbsp;

00:05:29.880 --> 00:05:33.360
oh, what about this? And it can say, oh, well,&nbsp;
I still see a problem there. And you could say,&nbsp;&nbsp;

00:05:33.360 --> 00:05:39.800
OK, let me fix that. As opposed to waiting&nbsp;
10, 20 seconds between each such query to&nbsp;&nbsp;

00:05:39.800 --> 00:05:45.040
the verifier. So the millibenchmarks helped&nbsp;
us evaluate that. And the macrobenchmarks,&nbsp;&nbsp;

00:05:45.040 --> 00:05:49.320
the building entire systems, we built&nbsp;
a couple of distributed systems that&nbsp;&nbsp;

00:05:49.320 --> 00:05:54.080
had been verified before—a key value store&nbsp;
and a node replication system—to show that&nbsp;&nbsp;

00:05:54.080 --> 00:05:59.320
you could do them more effectively and with less&nbsp;
verification time. We also built some new systems,&nbsp;&nbsp;

00:05:59.320 --> 00:06:05.040
a verified OS page table, a memory allocator,&nbsp;
and a persistent memory append-only log.

00:06:05.040 --> 00:06:10.800
TINGLE: Chris, the paper mentions that&nbsp;
successfully verifying system software&nbsp;&nbsp;

00:06:10.800 --> 00:06:16.240
has required—you actually use the word heroic&nbsp;
to describe the developer effort. Thinking of&nbsp;&nbsp;

00:06:16.240 --> 00:06:21.720
those heroes in the developer community and&nbsp;
perhaps others, what real-world impact do&nbsp;&nbsp;

00:06:21.720 --> 00:06:26.280
you expect Verus to have? What kind&nbsp;
of gains are we talking about here?

00:06:26.280 --> 00:06:32.280
HAWBLITZEL: Yeah, so I think, you know,&nbsp;
traditionally verification or this formal&nbsp;&nbsp;

00:06:32.280 --> 00:06:38.520
software verification that we're doing has been&nbsp;
considered a little bit of a pie-in-the-sky&nbsp;&nbsp;

00:06:38.520 --> 00:06:46.000
research agenda. Something that people have&nbsp;
applied to small research problems but has&nbsp;&nbsp;

00:06:46.000 --> 00:06:53.680
not necessarily had a real-world impact before.&nbsp;
And so I think it's just, you know, recently,&nbsp;&nbsp;

00:06:53.680 --> 00:07:00.840
in the last 10 or 15 years, that we started to&nbsp;
see a change in this and started to see verified&nbsp;&nbsp;

00:07:00.840 --> 00:07:09.280
software actually deployed in practice. So on one&nbsp;
of our previous projects, we worked on verifying&nbsp;&nbsp;

00:07:09.280 --> 00:07:16.320
the cryptographic primitives that people use&nbsp;
when, say, they browse the web or something and&nbsp;&nbsp;

00:07:16.320 --> 00:07:24.600
their data is encrypted. So in these cryptographic&nbsp;
primitives, there's a very clear specification for&nbsp;&nbsp;

00:07:24.600 --> 00:07:30.720
exactly what bytes you're supposed to produce when&nbsp;
you encrypt some data. And the challenge is just&nbsp;&nbsp;

00:07:30.720 --> 00:07:37.800
writing software that actually performs those&nbsp;
operations and does so efficiently. So in one&nbsp;&nbsp;

00:07:37.800 --> 00:07:46.080
of our previous projects that we worked on called&nbsp;
HACL* and EverCrypt, we verified some of the most&nbsp;&nbsp;

00:07:46.080 --> 00:07:52.400
commonly used and efficient cryptographic&nbsp;
primitives for things like encryption and&nbsp;&nbsp;

00:07:52.400 --> 00:07:59.840
hashing and so on. And these are things that&nbsp;
are actually used on a day-to-day basis. So we,&nbsp;&nbsp;

00:07:59.840 --> 00:08:05.920
kind of, took from that experience that the&nbsp;
tools that we're building are getting ready for&nbsp;&nbsp;

00:08:05.920 --> 00:08:14.720
prime time here. We can actually verify software&nbsp;
that is security critical, reliability critical,&nbsp;&nbsp;

00:08:14.720 --> 00:08:22.120
and is in use. So some of the things that Jay just&nbsp;
mentioned, like verifying, you know, persistent&nbsp;&nbsp;

00:08:22.120 --> 00:08:29.800
memory storage systems and so on, those are the&nbsp;
things that we're looking at next for software&nbsp;&nbsp;

00:08:29.800 --> 00:08:36.120
that would really benefit from reliability&nbsp;
and where we can formally prove that your data&nbsp;&nbsp;

00:08:36.120 --> 00:08:42.920
that's written to disk is read correctly back from&nbsp;
disk and not lost during a crash, for example. So&nbsp;&nbsp;

00:08:42.920 --> 00:08:47.660
that's the kind of software that we're looking&nbsp;
to verify to try to have a real-world impact.

00:08:47.660 --> 00:08:52.800
LORCH: The way I see the real-world impact, is it&nbsp;
going to enable Microsoft to deal with a couple&nbsp;&nbsp;

00:08:52.800 --> 00:08:59.800
of challenges that are severe and increasing&nbsp;
in scale? So the first challenge is attackers,&nbsp;&nbsp;

00:08:59.800 --> 00:09:04.480
and the second challenge is the vast scale&nbsp;
at which we operate. There's a lot of hackers&nbsp;&nbsp;

00:09:04.480 --> 00:09:08.720
out there with a lot of resources that&nbsp;
are trying to get through our defenses,&nbsp;&nbsp;

00:09:08.720 --> 00:09:12.760
and every bug that we have offers them&nbsp;
purchase, and techniques like this,&nbsp;&nbsp;

00:09:12.760 --> 00:09:20.080
that can get rid of bugs, allow us to deal with&nbsp;
that increasing attacker capability. The other&nbsp;&nbsp;

00:09:20.080 --> 00:09:27.200
challenge we have is scale. We have billions&nbsp;
of customers. We have vast amounts of data and&nbsp;&nbsp;

00:09:27.200 --> 00:09:33.800
compute power. And when you have a bug that&nbsp;
you've thoroughly tested but then you run it&nbsp;&nbsp;

00:09:33.800 --> 00:09:42.640
on millions of computers over decades, those&nbsp;
rare bugs eventually crop up. So they become&nbsp;&nbsp;

00:09:42.640 --> 00:09:48.160
a problem, and traditional testing has a lot of&nbsp;
difficulty finding those. And this technology,&nbsp;&nbsp;

00:09:48.160 --> 00:09:53.920
which enables us to reason about the infinite&nbsp;
possibilities in a finite amount of time and&nbsp;&nbsp;

00:09:53.920 --> 00:09:59.760
observe all possible ways that the system can go&nbsp;
wrong and make sure that it can deal with them,&nbsp;&nbsp;

00:09:59.760 --> 00:10:03.520
that enables us to deal with the vast&nbsp;
scale that Microsoft operates on today.

00:10:03.520 --> 00:10:08.680
HAWBLITZEL: Yeah, and I think this is an&nbsp;
important point that differentiates us from&nbsp;&nbsp;

00:10:08.680 --> 00:10:16.480
testing. Traditionally, you find a bug when&nbsp;
you see that bug happen in running software.&nbsp;&nbsp;

00:10:16.480 --> 00:10:21.960
With formal verification, we're catching the&nbsp;
bugs before you run the software at all. We're&nbsp;&nbsp;

00:10:21.960 --> 00:10:27.240
trying to prove that on all possible inputs,&nbsp;
on all possible executions of the software,&nbsp;&nbsp;

00:10:27.240 --> 00:10:32.800
these bugs will not happen, and it's&nbsp;
much cheaper to fix bugs before you've&nbsp;&nbsp;

00:10:32.800 --> 00:10:38.460
deployed the software that has bugs, before&nbsp;
attackers have tried to exploit those bugs.

00:10:38.460 --> 00:10:44.360
TINGLE: So, Jay, ideally, what would you&nbsp;
like our listeners and your fellow SOSP&nbsp;&nbsp;

00:10:44.360 --> 00:10:49.460
conference attendees to tell their colleagues&nbsp;
about Verus? What's the key takeaway here?

00:10:49.460 --> 00:10:54.520
LORCH: I think the key takeaway is that it is&nbsp;
possible now to build software without bugs,&nbsp;&nbsp;

00:10:54.520 --> 00:10:59.600
to build systems code that is going to&nbsp;
obey its specification on all possible&nbsp;&nbsp;

00:10:59.600 --> 00:11:05.400
inputs always. We have that technology.&nbsp;
And this is possible now because a lot&nbsp;&nbsp;

00:11:05.400 --> 00:11:09.560
of technology has advanced to the point&nbsp;
where we can use it. So for one thing,&nbsp;&nbsp;

00:11:09.560 --> 00:11:14.640
there's advances in programming languages.&nbsp;
People are moving from C to Rust. They've&nbsp;&nbsp;

00:11:14.640 --> 00:11:19.800
discovered that you can get the high performance&nbsp;
that you want for systems code without having to&nbsp;&nbsp;

00:11:19.800 --> 00:11:25.920
sacrifice the ability to reason about ownership&nbsp;
and lifetimes, concurrency. The other thing&nbsp;&nbsp;

00:11:25.920 --> 00:11:33.680
that we build on is advances in computer-aided&nbsp;
theorem proving. So we can really make compact&nbsp;&nbsp;

00:11:33.680 --> 00:11:40.880
and quick-to-verify mathematical descriptions&nbsp;
of all possible behaviors of a program and get&nbsp;&nbsp;

00:11:40.880 --> 00:11:46.480
fast answers that allow us to rapidly turn&nbsp;
around proof challenges from developers.

00:11:46.480 --> 00:11:51.120
TINGLE: Well, finally, Chris, what&nbsp;
are some of the open questions or&nbsp;&nbsp;

00:11:51.120 --> 00:11:54.920
future opportunities for formal&nbsp;
software verification research,&nbsp;&nbsp;

00:11:54.920 --> 00:11:59.240
and what might you and your collaborators tackle&nbsp;
next? I heard a few of the things earlier.

00:12:01.440 --> 00:12:07.960
HAWBLITZEL: Yes, I think despite, you know, the&nbsp;
effort that we and many other researchers have put&nbsp;&nbsp;

00:12:07.960 --> 00:12:15.520
into trying to make these tools more accessible,&nbsp;
trying to make them easier to use, there still&nbsp;&nbsp;

00:12:15.520 --> 00:12:23.200
is a lot of work to prove a piece of software&nbsp;
correct, even with advanced state-of-the-art&nbsp;&nbsp;

00:12:23.200 --> 00:12:30.160
tools. And so we're still going to keep trying&nbsp;
to push to make that easier. Trying to figure out&nbsp;&nbsp;

00:12:30.160 --> 00:12:36.640
how to automate the process better. There's a lot&nbsp;
of interest right now in artificial intelligence&nbsp;&nbsp;

00:12:36.640 --> 00:12:43.880
for trying to help with this, especially&nbsp;
if you think about artificial intelligence&nbsp;&nbsp;

00:12:43.880 --> 00:12:49.560
actually writing software. You ask it to write&nbsp;
a piece of software to do a particular task,&nbsp;&nbsp;

00:12:49.560 --> 00:12:54.720
and it generates some C code or some Rust code&nbsp;
or some Java code, and then you hope that that's&nbsp;&nbsp;

00:12:54.720 --> 00:13:01.640
correct because it could have generated any sort&nbsp;
of code that performs the right thing or does&nbsp;&nbsp;

00:13:01.640 --> 00:13:09.080
total nonsense. So it would be really great going&nbsp;
forward if when we ask AI to develop software,&nbsp;&nbsp;

00:13:09.080 --> 00:13:13.560
we also expect it to create a proof that&nbsp;
the software is correct and does what the&nbsp;&nbsp;

00:13:13.560 --> 00:13:22.520
user asked for. We’ve started working on some&nbsp;
projects, and we found that the AI is not quite&nbsp;&nbsp;

00:13:22.520 --> 00:13:29.600
there yet for realistic code. It can do small&nbsp;
examples this way. But I think this is still&nbsp;&nbsp;

00:13:29.600 --> 00:13:36.920
a very large challenge going forward that&nbsp;
could have a large payoff in the future if&nbsp;&nbsp;

00:13:36.920 --> 00:13:41.640
we can get AI to develop software and&nbsp;
prove that the software is correct.

00:13:41.640 --> 00:13:48.560
LORCH: Yeah, I see there's a lot of synergy&nbsp;
between—potential synergy—between AI and&nbsp;&nbsp;

00:13:48.560 --> 00:13:53.080
verification. Artificial intelligence can solve&nbsp;
one of the key challenges of verification,&nbsp;&nbsp;

00:13:53.080 --> 00:13:57.000
namely making it easy for developers to&nbsp;
write that code. And verification can&nbsp;&nbsp;

00:13:57.000 --> 00:14:01.440
solve one of the key challenges&nbsp;
of AI, which is hallucinations,&nbsp;&nbsp;

00:14:01.440 --> 00:14:06.780
synthesizing code that is not correct, and Verus&nbsp;
can verify that that code actually is correct.

00:14:06.780 --> 00:14:11.200
TINGLE: Well, Chris Hawblitzel and Jay&nbsp;
Lorch, thank you so much for joining&nbsp;&nbsp;

00:14:11.200 --> 00:14:15.280
us today on the Microsoft Research&nbsp;
Podcast to discuss your work on Verus.

00:14:15.280 --> 00:14:16.000
[MUSIC]

00:14:16.000 --> 00:14:16.820
HAWBLITZEL: Thanks for having us.

00:14:16.820 --> 00:14:17.440
LORCH: Thank you.

00:14:17.440 --> 00:14:22.080
TINGLE: And to our listeners, we appreciate&nbsp;
you, too. If you'd like to learn more about&nbsp;&nbsp;

00:14:22.080 --> 00:14:31.440
Verus, you'll find a link to the paper at&nbsp;
aka.ms/abstracts or you can read it on the SOSP&nbsp;&nbsp;

00:14:31.440 --> 00:14:54.000
website. Thanks for tuning in. I'm Amber Tingle,&nbsp;
and we hope you'll join us again for Abstracts.

00:14:54.000 --> 00:14:54.953
[MUSIC FADES]