Geoff Huston  0:00

    8.8.8.8, operated by Google, their public DNS system is not an
    engine, it's hundreds, possibly 1000s, of engines all over the
    world. And you can ask a query, and I can ask a query, and they'll
    go in different places, because it's a compound system. It's a
    hybrid. Now, why is it important to sort of look at this and go,
    "I'm really concerned about resolvers"? Because what we're talking
    about is whether it's viable to have a name server running V6
    only. And in the way the DNS works, the theory on paper is that
    users and clients, applications, things in your phone, things in
    my phone, don't ask authoritative servers. We ask middleware,
    those invisible agents called recursive resolvers, and they do all
    the hard work. Why?

George Michaelson  1:00

    You're listening to PING, a podcast by APNIC discussing all things
    related to measuring the Internet. I'm your host, George
    Michaelson. This time, I'm talking to Geoff Huston from APNIC Labs
    again in his regular monthly spot on PING. Geoff has been running
    advertising based experiments for over a decade measuring behavior
    on the Internet. Recently, he's been exploring a problem of
    interest in the modern DNS. The Domain Name System, DNS
    fundamentally requires all of the end users, their chosen resolver
    provider and the authoritative servers of the names they ask about
    to cooperate in a dance over IP protocols answering DNS questions.
    The specifics of how these questions are encoded and passed around
    can get complex very quickly, but a specific problem is emerging
    in how we would define, with strong force or "normatively", the
    ways that the protocol works, and this is going to affect future
    deployment, code development and operational dependencies. This
    question relates to the use of IPv6 inside the DNS system at
    large. Can we yet declare that an IPv6-only DNS could be used
    reliably, and should we write it into operational practices an RFC
    can define? These practices can be elevated to the status of what
    is called a best current practice, or BCP, document. Geoff is
    exploring measurement of this question by exploiting a model of
    DNS that is called "glueless". Glue is extra information passed
    around behind the scenes that makes DNS work. If it's not given,
    it forces the DNS resolver to ask extra questions. These, in turn,
    can be used to force delivery over an IPv6-only network between
    the resolver and the authority. This avoids many questions about
    error rate, drop-off rate, and problems with measurements seen
    measuring end users whose minds may wander away from the web page
    that triggered the advert. Geoff, welcome back to PING. What shall
    we talk about this time?

Geoff Huston  3:24

    Hi George. Let's talk about the Internet.

George Michaelson  3:27

    Okay, where to begin?

Geoff Huston  3:29

    Where to begin. I'm going to take two of the topics that I've
    spent a lot of time in. Two out of three ain't bad. I'm going to
    talk about V6 and the DNS.

George Michaelson  3:39

    Awww two together!

Geoff Huston  3:41

    yes, but not security. So, you know, I'm going to talk about
    measuring the DNS and its use of V6, and in particular, because
    the DNS operates at one level above transport, you can put DNS
    queries over v4 or V6. DNS doesn't care. And so the question is,
    when you set up some DNS infrastructure, your recursive resolver,
    your authoritative name server, and all those other bits and
    pieces. Should you just use v4 and forget about it? Should you go
    all dual-stack everywhere? Are you brave and adventurous? Should
    you try V6 only? Great questions. And so this is not a recent
    question, and there are some answers if you, you know, shuffle
    around in the RFCs. The first of these actually is now, gee
    whiz, a dozen years old: 2004, RFC3901, "DNS IPv6 Transport
    Operational Guidelines". And it's bizarre, because you sort of try
    and look and say, "Well, what advice can you give me about V6 and
    the DNS?" And it says every recursive name server should either be
    v4 only or dual-stack.

George Michaelson  4:52

    Now, that's quite a strong directing instruction, isn't it? Geoff.
    I mean, that begs many questions. Did we ever get there?

Geoff Huston  5:02

    Well, we're there today. I mean, you were given either v4 only or
    dual-stack. What it's missing is V6 only: don't be brave, don't do
    V6 only. And it's almost like a blessing of the status quo.

George Michaelson  5:15

    So it didn't try to suggest the path into V6 only. It only made
    requirements to be four or dual.

Geoff Huston  5:22

    Yes. And what about authoritative name servers, those boxes that
    say, "I know all about these names, ask me"? Well, that's easy.
    Every DNS zone should be served by at least one v4 reachable name
    server,

George Michaelson  5:36

    which is, when you think about it, that's a position that
    absolutely made sense then, and may very well make sense now,
    although that's a testable proposition, it was a reflection of
    need and a reflection of behavior of systems, the dependencies on
    things like tunnels, the stuff that you've been exploring many
    times on PING around V6 DNS and large packets. That wasn't a bad
    direction Geoff.

Geoff Huston  6:00

    Well, it was the status quo. And what it really says is this V6
    transport operational guideline, don't drop v4 don't let your V6
    enthusiasm get the better of you. This is not, you know,
    evangelism about V6.

George Michaelson  6:15

    This is pragmatism. This is a reflection of the emerging reality
    we live in, documented as this is what we see,

Geoff Huston  6:22

    A V6 piece of infrastructure can't select its clients, and
    authoritative name servers are promiscuous in the sense that you
    answer all questions. There's no bad questioner. And so in some
    ways, this was saying, a dozen years ago, no time to be brave. If
    you really want this stuff to work and work well, you'd better
    keep v4 going. [George: right] Now it's a dozen years later, and
    there's been some work, predominantly by the folk who have strong
    opinions about promoting V6. And this has turned into a
    soon-to-be RFC, I guess. And it's some odd... no, it's
    RFC3901bis.

George Michaelson  7:02

    Yep. So it's the BIS thing, which comes from the CCITT, which was
    a French-language-aware standardization effort, and BIS and TER
    are the French denotations for second and third.

Geoff Huston  7:18

    Thank you, George. Because I was wondering, why are we using BIS?
    I'm like, really? I thought you... okay.

George Michaelson  7:25

    So in normal RFC practice, every RFC is a new number, and we had
    RFC822, but the RFC that is normative for header structure in
    emails does not have to stay 822, with a version one, version two,
    version three flag. We might talk about it as 822BIS, but at the
    point we actually get off the seats and publish it, it gets
    assigned a new number, doesn't it?

Geoff Huston  7:52

    Right. So this thing hasn't got a number yet. We're working on a
    successor to this 12-year-old advice, and it is currently titled,
    in the interim version, RFC3901bis.

George Michaelson  8:03

    So it sets the stage. It sets an expectation. It's reflecting on
    that, and it's going to probably be an enhancement on that, not a
    complete change from that.

Geoff Huston  8:15

    So the two recommendations that are kind of critical both actually
    relate to the authoritative name servers. And this time, it's
    saying two name servers for a zone should be dual-stack, and every
    zone should be served by at least one name server that is
    reachable over V6.

George Michaelson  8:36

    Wow. So it's flipped a bit from saying you must have a 4 to saying
    you must be dual, which implicitly says has a 4 somewhere in your
    name server set, and you must have a 6 somewhere in your name
    server set. That's a flip.

Geoff Huston  8:53

    I actually read the first one, two name servers that are
    dual-stack, as meaning two 6. [George: yes] And what it's
    minimally saying is two 4 and two 6. What it's not saying is v4
    only. It's kind of saying you can do that, but only as long as
    you have at least two dual-stack.

George Michaelson  9:11

    This is an exquisite moment in some ways, because when we talk
    about things in Internet and standards, there's always this giant
    fork in the road, right? There's things as they are, what are we
    actually doing? And there's aspirational, where do we want to be,
    [Geoff: right] And I cannot help but think we are entering the
    door of changing from a things as they are moment to an
    aspirational,

Geoff Huston  9:37

    Oh yep, that touches upon something that we'll touch upon right
    now, because both of these documents, the old one, RFC3901, and
    this current sort of BIS suggestion, are what we call best current
    practice documents. And I must admit, when you poke and prod at
    the IETF, the actual intention of a best current practice is not
    clear, even in, you know, the brains of the IESG, the Internet
    Engineering Steering Group, those worthy folk who are the holders
    of the IETF terms and traditions. [George: Yeah] I'll ask the
    question: is a BCP, best current practice, a checklist, a shopping
    list? If you're going to deploy something, tick these boxes,
    because that's the best way we know how to do it.

George Michaelson  10:27

    That word current, if you believe words have meanings and are not
    just labels, if you believe in what I might call nominative
    determinism, that word current strongly implies what you just
    said. It would empower me to go out into the marketplace and
    say to providers of services, providers of software, this is what
    everyone does. And I want you to tell me you will do this, because
    that's what I need. And I'm putting three tenders out there. Every
    one of you has to say, if you meet this level, to me, that's what
    it kind of says when it says current,

Geoff Huston  11:02

    Current practice. So what it says is implementations should
    support this, because customers, users and people, the whole idea
    of what RFCs are about, protecting the customer and saying, This
    is the technology you know you should be buying. It's the shopping
    list

George Michaelson  11:21

    I adhere. We're done. Recording over. PING finished.

Geoff Huston  11:25

    There's another view, George. [George: okay] This document
    actually has a little bit of this aspiration. [George: right] It
    would be good if... We're not going to describe what
    implementations do. We're going to describe what implementations
    should do if they adhere to the principles of the author.
    [George: right] And what the document actually includes is a
    consideration which is novel. This is the change: that you can
    mount a DNS authoritative name server, or recursive name server,
    that operates only in V6. [George: Oh, wow] And let's look at the
    poor plight of the DNS recursive name server, because we know out
    there that there are authoritative name servers that say, "v4
    only, dude, that's it. You can only reach me if you do v4." Now I
    am trying to ask questions. I'm the recursive resolver, and I only
    have V6. I'm fine. I look at this name server, and it says,
    "Here's my v4 address." And I go, "Look, I'm sorry, there's
    nothing I can do." Now, the draft says, hand wave, hand wave, hand
    wave, you should forward on to someone who knows what they're
    doing. Any implementation code out there? No. Does DNS support
    forwarding? Well, it doesn't. It doesn't.

George Michaelson  12:40

    it supports it in as much as in my operational platform. In the
    boundary of configuration about how I run my machine, I get to
    say, for this machine, I want to use these things to forward
    questions I can't answer, but there's zero in the protocol that
    talks to that forwarding behavior, and there's no obligation on me
    to have a forwarder configured. It actually isn't a mandatory
    component of behavior.

Geoff Huston  13:06

    Well, there be dragons in routing. If you forward packets to me
    and I forward packets to you, then if any packet enters the Death
    Loop, they will swing there forever. You know this is a trap!

George Michaelson  13:19

    yeah, oh, we've talked about that in other protocols, and so this
    immediately presents, if there isn't protocol support first class
    for forwarding behavior, you've made a problem.

Geoff Huston  13:30

    So the protocol response is, you put in a counter a time to live a
    hop count. And basically the answer is, you can set it to a high
    number, and everyone who touches the packet drops that number, and
    if it ever gets to zero, kill it. You're in a loop. So you can't
    have packets forwarding to eternity. You just can't. Does the DNS
    have one of those? Well, the answer is, no, it does not. So when
    you have "ah, just forward your way out of hell", the answer is,
    well, what if where I'm forwarding to forwards it back to me,
    either directly or indirectly. The answer is, you will be passing
    that packet around until the heat death of the universe. Nothing
    else is going to stop it. And this is this critical assumption,
    whether, when a BCP becomes aspirational, and there is no code in
    common practice, is it still a BCP? And I say,

George Michaelson  14:19

    I don't see how it can be.

Geoff Huston  14:21

    And I say to the worthy folk in the IESG, that document may be many
    things, and it may even say good aspirational things, but if I
    apply the words best current practice, which are English, not
    French, then the answer is, No, this one is not that. It's just
    wide of the mark. I'm sorry.
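
    The hop-count idea described a few turns earlier can be sketched
    in a few lines. This is only an illustration of the routing-style
    counter being talked about; as the discussion notes, the DNS
    itself carries no such field. The forwarder names and the
    `forward` helper are hypothetical.

```python
from typing import Dict, Optional


def forward(start: str, next_hop: Dict[str, Optional[str]], hop_limit: int) -> str:
    """Follow forwarding pointers until someone answers or the hop budget runs out."""
    node = start
    remaining = hop_limit
    while True:
        target = next_hop[node]
        if target is None:
            # This node does not forward any further: it answers the query.
            return f"answered by {node}"
        if remaining == 0:
            # The counter reached zero: the query is killed instead of looping.
            return f"dropped at {node}: hop limit exhausted"
        remaining -= 1
        node = target


# Two hypothetical forwarders that point at each other: a forwarding loop.
loop = {"resolver-a": "resolver-b", "resolver-b": "resolver-a"}
print(forward("resolver-a", loop, hop_limit=8))

# A hypothetical chain that ends at a resolver able to answer.
chain = {"resolver-a": "resolver-b", "resolver-b": "resolver-c", "resolver-c": None}
print(forward("resolver-a", chain, hop_limit=8))
```

    Without the `remaining` counter the first call would never
    return, which is the forwarding-loop concern raised about
    resolvers that forward to each other.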

George Michaelson  14:41

    So where is the IESG on this point of view? Geoff,

Geoff Huston  14:47

    Oh, you go searching for a spine, and you can take forever to find
    one. They'll do it because doing it's the path of least
    resistance. However, that's not what I'm bemoaning and not what
    I'm really worrying about here. What I'm actually looking at is
    something more interesting: is the DNS and V6 at a level where V6
    is mature, well understood and deployed sufficiently for V6-only
    infrastructure to be deployed as being as efficient and as fast as
    v4? Is it out there? Can we measure it? That's the question in my
    head, right?

George Michaelson  15:22

    So this is kind of the role of measurement as information
    gathering to help us tool up for a rational conversation. If the
    thing about BCP versus aspiration is on the table and we're now
    exploring even as aspiration, would it be sensible to do this? You
    need info. That's where measurement comes to the story.

Geoff Huston  15:42

    Right. If I put up a server and it only had V6, how much of the
    Internet, in terms of end clients, couldn't reach me? And you
    think, "Well, that would be a really easy question, wouldn't it?
    Do it!" Hang on. Hang on. You see, that's a harder question than
    you might think. What are you counting? Users? How can I get
    billions of people to come to my website? Not easy. It's kind of
    a harder question. And am I counting users or something else? So
    let's have a look at what we don't understand. And the first
    thing we don't understand is, in the DNS, we talk about resolvers
    and servers. Resolvers are those things that take a DNS name and,
    doing operations (let's not define them yet), doing things,
    return back the translation of that name into an IP address. So I
    am building a resolver system, and I'm building out components
    that can handle 10 queries a second, and I put in 100 of them and
    do a front end that disperses the queries across them. So 10 by
    100, that's 1000 queries a second. But it's not one resolver
    anymore.
    It's 100, isn't it? Is a resolver one or many? Is it a single
    platform? What is it? A collection now? Are they independent? Do
    they each have their own caches? So if I ask a question to this
    mythical compound beast, and it sends me back an answer, has that
    beast cached that answer for all future queries, or has only one
    engine in that beast, cached that answer, and if I ask it again,
    will it take the same amount of time? Because that cache response
    is not everywhere? Perhaps. Wow!
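
    A minimal sketch of that compound beast, assuming each engine
    keeps its own private cache and the front end picks an engine at
    random (both are assumptions; real resolver farms may share
    caches or steer queries differently). The class names and the
    placeholder address are hypothetical.

```python
import random
from typing import Dict, List


class Engine:
    """One resolver engine with its own private cache."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}

    def resolve(self, name: str) -> bool:
        """Return True on a cache hit, False when upstream had to be asked."""
        if name in self.cache:
            return True
        self.cache[name] = "192.0.2.1"  # placeholder answer (documentation address)
        return False


class CompoundResolver:
    """A front end that disperses queries across many independent engines."""

    def __init__(self, engines: int, seed: int = 0) -> None:
        self.engines: List[Engine] = [Engine() for _ in range(engines)]
        self.rng = random.Random(seed)

    def resolve(self, name: str) -> bool:
        return self.rng.choice(self.engines).resolve(name)


resolver = CompoundResolver(engines=100)
hits = [resolver.resolve("www.example.com") for _ in range(50)]
misses = hits.count(False)
print(f"{misses} of 50 identical queries missed a cache")
```

    With one engine only the first query misses. With 100
    independent engines the same name keeps missing until every
    engine it lands on has seen it, which is why "has the beast
    cached the answer" has no single answer.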

George Michaelson  17:33

    We've been here across many, many protocols in the history of
    distributed computing and computer science. We've been here with
    databases. We've been here with squid and web caches. We've been
    here with questions about cache coherency, in the behavior of
    written to disk versus in active memory versus level one cache.
    This problem of "where are things", and when I have two machines
    working and churning against them, when do they both represent the
    same state? These things have been around as problems a long time,
    and the answers are deceptively simple, right? It's very glib to
    say, Oh yes, I know exactly what this is. And the answer generally
    is, no, you haven't considered all of the corner cases that can
    exist in this situation,

Geoff Huston  18:23

    8.8.8.8, operated by Google, the public DNS system is not an
    engine. It's hundreds, possibly 1000s, of engines all over the
    world. And you can ask a query, and I can ask a query, and they'll
    go in different places, because it's a compound system. It's a
    hybrid. Now, why is it important to sort of look at this and go,
    "I'm really concerned about resolvers"? Because what we're talking
    about is whether it's viable to have a name server running V6
    only. And in the way the DNS works, the theory on paper is that
    users, end clients, applications, things in your phone, things in
    my phone, don't ask authoritative servers. We ask middleware,
    those invisible agents called recursive resolvers, and they do all
    the hard work. Why?

George Michaelson  19:13

    Yeah, there's no actual barrier to me as an edge device doing that
    directly to the authority. [Geoff: George, it's so slow!] But
    there's no protocol-level boundary. It is slow, Geoff, but it's
    not an enforced behavior because the protocol police said you
    mustn't do it. And it's not that the authority is going to refuse
    to answer. It's that we've built the system predicated on a belief
    that the millions of customers don't; they go through an
    intermediary.

Geoff Huston  19:43

    We've built a system to scale and to work at speed, and if
    everyone didn't cache, didn't use those intermediate systems, we'd
    have junked the DNS 20 years ago, 30. It would never have got off
    the ground, because the way we actually make this system scale and
    work really, really well is to place these agents. Normally,
    there's one or two or maybe 100 in every Internet service
    provider. It's their DNS recursive resolver that they say to their
    customers, "use this". And the beauty of it is, if you and I are
    in the same ISP and you have a question, "What is, you know, dub
    dub dub dot Geoff?", it goes, "Well, the answer is 10.0.0.1." And
    then I ask, "What's dub dub dub dot Geoff?" The answer goes, "I
    don't need to look this up, dude, I just answered George. The
    answer is 10.0.0.1, have a good life." And all of a sudden, things
    speed up.

George Michaelson  20:35

    We've had protocol developments across the life of DNS where
    people actually observed the caching timers in this system and
    said, Look, if I know that Google's being hammered, and I'm
    answering from my local cache for google.com and I've got a timer
    when I last asked the question, I could choose to go and
    revalidate that question before the timer runs out.

Geoff Huston  20:58

    Ah, yes, I know, but you're into my second question about what we
    don't understand while we're asking questions of the DNS. So the
    first thing we don't understand is, we actually don't know what a
    resolver is. We think we do, but we don't. The second thing is,
    what's a query? "That's crazy. We know what a query is. It's a DNS
    protocol..."

George Michaelson  21:16

    It's a question, man.

Geoff Huston  21:18

    You know, the query gets answered, and it's an answer, but that's
    not actually what I'm on about. You see, because we're running
    UDP, we don't know when to expect an answer.

George Michaelson  21:30

    So a reminder that DNS, before all the extra pieces are added, is
    questions about name to something put into DNS protocol buffers,
    put into UDP, put into IP, but we're assuming everything is still
    put into

Geoff Huston  21:48

    UDP, and UDP is exactly like putting a message in a bottle. No,
    seriously, you might get a bottle back. You don't know when. And
    so any good castaway would put the same message in bottles, one a
    day, until eventually one gets answered. In other words, if you
    don't know when the answer is going to come, and your internal
    model of how long it should take times out, you do it again.
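
    The message-in-a-bottle behaviour amounts to a timeout-and-resend
    loop. The sketch below assumes a pluggable `send_and_wait`
    callable standing in for a real UDP socket; the timeout, the
    retry count and the lossy transport are all hypothetical values
    chosen for illustration, not anything taken from an RFC or a
    particular resolver.

```python
from typing import Callable, Optional, Tuple


def query_with_retries(
    send_and_wait: Callable[[float], Optional[str]],
    timeout: float = 1.0,
    max_tries: int = 4,
) -> Tuple[Optional[str], int]:
    """Send the same query again each time the wait times out."""
    for attempt in range(1, max_tries + 1):
        answer = send_and_wait(timeout)  # None means "no bottle came back in time"
        if answer is not None:
            return answer, attempt
    return None, max_tries


def make_lossy_transport(drop_first: int) -> Callable[[float], Optional[str]]:
    """Build a fake transport that loses the first `drop_first` queries."""
    state = {"sent": 0}

    def send_and_wait(timeout: float) -> Optional[str]:
        state["sent"] += 1
        return None if state["sent"] <= drop_first else "192.0.2.1"

    return send_and_wait


answer, tries = query_with_retries(make_lossy_transport(drop_first=2))
print(answer, tries)
```

    From the server side every one of those resends looks like a
    fresh query, which is one reason raw query counts overstate the
    number of real questions.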

George Michaelson  22:13

    So the Internet model of the castaway is you are cast away on a
    desert island with a crate of Coca-Cola in bottles and an endless
    supply of corks.

Geoff Huston  22:22

    And paper and pen. Yes, you just go for it. That's the Internet
    for you. That's the new model. Keep that in your head. And so the
    issue is, we fan out queries excessively in the DNS in order to
    make the thing fast and in order to give the simulation of
    reliability even when we don't have it. And interestingly, for
    this kind of "Oh, the bottle hasn't made it back, I need to send
    it again", implementations have their own view of timers. It's not
    written in any RFC. Some resolvers are unduly patient. We call
    them slow. Some resolvers are incredibly impatient and fan out
    queries like crazy. And oddly enough, they're quite fast, but very
    wasteful,

George Michaelson  23:05

    yeah. But we also call those denial of service sometimes.

Geoff Huston  23:09

    Oops. So in some ways, queries just splay out. And if you look at
    the other side, I'm an authoritative name server, and these things
    keep on asking me questions. Now, because it's a stateless
    protocol, I don't know who the originator of that query was. A
    recursive resolver was asking me questions. There's no path,
    there's no hop count, there's no reason. I just have to answer it
    and move on.

George Michaelson  23:34

    If you're a true believer, a sweet, innocent child like me, you
    want to believe that if you're the authoritative server for
    geoff.com and there is a reason that a million people want to know
    about geoff.com the great advantage of this model of
    intermediaries the resolvers, is that although you're going to see
    a lot of queries, in principle, you shouldn't see a million, you
    should be seeing a lot of queries, but less, because those front
    end resolvers are supplying the answer. Here is geoff.com from a
    local cache. So the naive child view is this model is working.

Geoff Huston  24:14

    Right. If I have 1000 recursive resolvers out there and geoff.com
    is new, if I'm the server, I might see 1000 queries, one from each
    of those recursive resolvers. But if I say, here's the answer, and by
    the way, remember it for a week, then as long as the recursive
    resolvers follow my suggestion, keep that in your local cache for
    a week, I shouldn't see any questions at all for the next week,
    none. So the authoritative server sees cache misses. It sort of
    sees what's left after the recursive resolver has applied its
    local memory.
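
    The "authoritative server sees cache misses" point can be
    sketched as a small simulation. It assumes every recursive
    resolver honours the TTL exactly and has a single cache, which
    the earlier discussion suggests is optimistic. The class names,
    the name `geoff.example` and the client query pattern are
    hypothetical.

```python
from typing import Dict, List, Tuple

WEEK = 7 * 24 * 3600  # the "remember it for a week" TTL, in seconds


class Authoritative:
    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self.queries_seen = 0

    def answer(self, name: str) -> Tuple[str, int]:
        self.queries_seen += 1
        return "192.0.2.1", self.ttl  # placeholder address plus TTL


class Recursive:
    def __init__(self, upstream: Authoritative) -> None:
        self.upstream = upstream
        self.cache: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]  # cache hit: the authoritative server sees nothing
        address, ttl = self.upstream.answer(name)  # cache miss
        self.cache[name] = (address, now + ttl)
        return address


auth = Authoritative(ttl=WEEK)
resolvers: List[Recursive] = [Recursive(auth) for _ in range(1000)]

# Each resolver handles 50 client queries spread across six days.
for r in resolvers:
    for i in range(50):
        r.resolve("geoff.example", now=i * 6 * 24 * 3600 / 50)

print(auth.queries_seen)
```

    Under these assumptions 50,000 client queries reach the
    resolvers but the authoritative server counts only one per
    resolver, until the week is up and the entries expire.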

George Michaelson  24:49

    It sees the real hits, but the vast majority of questions it sees
    coming to it are things that either have timed out of cache or
    somehow have never been asked before.

Geoff Huston  25:00

    Right. Okay, so that, dear listeners, is the DNS in a nutshell,
    ready? It's complicated.

George Michaelson  25:06

    Okay, so we're really done. Let's go.

Geoff Huston  25:09

    So now we get back to this question, Is it viable to run V6 only
    DNS servers? And for that, I'm going to go back to our measurement
    rig that we run in APNIC labs. We use Google's advertising
    network, and Google support us in that. Thank you, Google. Love
    your work now, really, it is very valuable, and we do appreciate
    it. Now, this ad network seeds millions of ads a day across the
    entire Internet, about 40 million or so these days, ads are
    everywhere, literally, from the Faroe Islands to the Seychelles to
    even to deepest, darkest Canberra in Australia. You know, it's
    everywhere. And the way we do this is we measure what those users
    can do. The problem is, is that ads can't drive the machine where
    the ad is placed, it can't seize control of your mobile phone or
    your laptop or anything else. They can't do that. [George: yeah]
    all ads can do when an ad is placed is get some objects.

George Michaelson  26:13

    That's it. Make you the edge user, issue a request to get
    something

Geoff Huston  26:18

    And each request has a DNS part and an HTTPS part these days,

George Michaelson  26:23

    assuming the DNS part goes the distance, yes.

Geoff Huston  26:27

    So what if, for each person who receives an ad (when I say person,
    I mean each device), I make that name unique in some form or
    fashion? Then caching won't help anyone.

George Michaelson  26:40

    Some part of the name being unique means that some part of the DNS
    system cannot satisfy the answer from local cache. It must go back
    to an authority to say, what about this one?

Geoff Huston  26:53

    And if I run the authoritative server for that name, I get to see
    that user trying to resolve that name through some recursive
    resolver, I get to see the query. Now it's the DNS. I don't get to
    see the user. I get to see their recursive resolvers. But I was
    asking a question about the use of V6 as transport for recursive
    resolvers in the DNS. So yay. I'm actually showing the behaviors
    that I want to measure.
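
    A minimal sketch of the unique-name trick, assuming a random
    label is enough to guarantee a name no cache has seen. The zone
    `measurement.example` and the label format are hypothetical and
    are not the naming scheme the APNIC Labs experiment actually
    uses.

```python
import uuid
from typing import Set


def unique_name(zone: str) -> str:
    """Return a name under `zone` with a fresh random label."""
    return f"u{uuid.uuid4().hex}.{zone}"


# Model the resolver's cache as the set of names it has already looked up.
cache: Set[str] = set()
cache_misses = 0
for _ in range(1000):
    name = unique_name("measurement.example")
    if name not in cache:
        cache_misses += 1  # this query must go to the authoritative server
        cache.add(name)

print(cache_misses, "of 1000 lookups had to go back to the authority")
```

    Because every ad impression carries a different name, every
    lookup is a cache miss somewhere, so the authoritative server
    gets to observe the resolver that handled it.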

George Michaelson  27:23

    Yeah, if you craft the correct way of the resolver having to come
    to you to answer that question,

Geoff Huston  27:30

    Yes, this is fantastic. So okay, let's take this one step further
    and start to set it up. So I'm going to put up a V6 server, a DNS
    server running V6 only, and I'm going to set up an ad campaign
    that says to users, "Hi, I'm an ad, go and fetch this URL with a
    unique name." And then you go, "Well, what am I seeing?" And the
    answer is, whoa, dude, back off. This is getting to be really
    noisy. The first issue is the DNS, for all of its strengths, has
    some real rock-banging primitives going on. It's a dual-stack
    world these days. And so if I want to sort of maximize my chances
    of getting things done, if I have a web name and I want to resolve
    it, I'll resolve it to both an A (v4) and a quad-A (V6) address
    record. "Give me the v4 and V6 address records for this name,
    please." Can you do that in the DNS? No, there is an RFC that says
    you can't ask for two things at once. Dude, you've got to ask for
    each thing in a separate query.
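
    To make the one-question-per-query point concrete, here is a
    sketch that builds the wire format of two separate queries for
    the same name, one per record type, each with a question count of
    one. The type codes are the registered values (A is 1, AAAA is
    28, HTTPS is 65); the `build_query` helper, the query IDs and
    the example name are hypothetical.

```python
import struct
from typing import List

QTYPE = {"A": 1, "AAAA": 28, "HTTPS": 65}
CLASS_IN = 1


def build_query(name: str, qtype: str, query_id: int) -> bytes:
    """Encode a DNS query carrying exactly one question."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", QTYPE[qtype], CLASS_IN)
    return header + question


queries: List[bytes] = [
    build_query("www.example.com", qtype, query_id)
    for query_id, qtype in enumerate(["A", "AAAA"], start=1)
]
for q in queries:
    print(len(q), q.hex())
```

    A dual-stack client therefore sends at least two packets per
    name, and a third if it also asks for the HTTPS record.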

    Yep, moments not crossed. If we'd thought to invent the record
    that deterministically is a single question that does receive both
    types of response, give me the address rather than give me the 4
    and the 6. We might be in a different place. But the fact is, you
    can't do that in today's best current practice DNS.

    Ah, the best current practice DNS. If I was allowed my wish list,
    I would follow the path that Apple has already taken with their
    infrastructure, and I would add this third record that now most
    Apple family devices actually ask for. It's called the HTTPS
    record, and like everything else in the DNS, if you really want to
    get things done, use a no-format text record, TXT, and bung a
    whole lot of things in it. And interestingly, in the HTTPS query,
    or the answer, you can put in a V6 hint, a v4 hint, and even an
    application-level protocol. Whoa, so I can do this.

George Michaelson  29:36

    but only if you know to ask the right kind of question.

Geoff Huston  29:40

    Right. And you can't assume that everyone has done HTTPS answers.
    So guess what we do today? Bizarrely, we ask for A records and
    quad-A records and HTTPS records, and because it's UDP, if we
    don't get an answer soon enough, or we just feel like it's a
    Wednesday or whatever, we'll ask it again and again and again and
    again. Now the drop-off is pretty high: about 30% of folk ask
    a second time, about 10% ask a third time, about 10% ask a fourth
    time, down to 1% who ask a fifth, sixth, seventh, eighth. Now I'm
    answering everything. I'm not withholding the answers. The answers
    are out there, but queries fan out.
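
    As a rough worked example, those repeat rates can be turned into
    an expected query count per name. This assumes each percentage is
    a share of all first queries rather than of the previous attempt,
    and that the rates are the same for every query type; both are
    assumptions, and the figures are the approximate ones quoted
    above.

```python
from typing import Dict

# Approximate share of first queries that are followed by an Nth query.
repeat_fraction: Dict[int, float] = {
    2: 0.30,
    3: 0.10,
    4: 0.10,
    5: 0.01,
    6: 0.01,
    7: 0.01,
    8: 0.01,
}

# One initial query plus the expected number of repeats.
per_type = 1 + sum(repeat_fraction.values())

three_types = per_type * 3  # A, AAAA and HTTPS
two_types = per_type * 2    # A and AAAA only

print(f"per query type: {per_type:.2f}")
print(f"per name, three types: {three_types:.2f}")
print(f"per name, two types: {two_types:.2f}")
```

    On those assumptions an authoritative server sees roughly one
    and a half queries per type for every question actually asked at
    the edge, which is why counting queries alone misleads.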

George Michaelson  30:27

    I want to say, "Oh, this is strong evidence of something like
    packet loss in the system", but again, that's a naive answer. The
    likelihood is that people are using scattergun methods. More than
    one resolver is in play. The source of the query has got timers
    that are more aggressive than the end-to-end delay. It's not
    actually necessarily a strong signal the network is failing. It's
    that DNS is trying really hard,

Geoff Huston  30:53

    Or it's standard economic behavior. The DNS is free, dude. Packets
    are free, dude. I've paid my, you know, 100 bucks a month. Do what
    I want. Knock it out. Well, I'm going to send 100 queries for the
    same name. What's the penalty? Zero. So yes, the DNS, because it
    uses UDP, because there's very little overhead in this, we
    actually have a pretty wasteful theory of the way it operates, and
    so we just replicate queries in a rapid-fire query fan-out. And
    you tend to see, on average, for each name you query, you get the
    three queries, A, quad-A, HTTPS, and a 30%...

George Michaelson  31:31

    if they're an Apple user,

Geoff Huston  31:32

    if they're a Chrome user, you just get the two. But then you get
    with a 30% likelihood, a second query for the same type, the same
    name and the same query type, and again, 10% of the cases, you'll
    get a third and a fourth. It'll tail off pretty quickly. But the
    issue is, if you just count queries, you start to get misled,
    because the same...

George Michaelson  31:53

    more queries seen than actual edge questions were asked

Geoff Huston  31:58

    right. So let's wade through all this now. We don't quite
    understand what a resolver is. We don't quite understand what a
    query is. We don't understand what uniqueness is, and because
    that's the DNS, we really have no handle on the end customer. So
    you know, we're mucking around a bit in the dark. How can we get
    all this back together? Aha, the user in this ad did not do a DNS
    question. They did a web fetch, and we are both the DNS server and
    a web server. Aha. So what we do is just look at a web object
    whose only DNS entry is  a V6 record, and we just count the web
    fetches. Now the web fetch is an end user. It is a unique
    instance, so whether it was 100 queries or not, it's still, I got
    the web object or I didn't.
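
    Counting web fetches instead of queries can be sketched as a set
    operation over per-ad identifiers. The identifiers, the log
    contents and the `success_rate` helper are all hypothetical; the
    only idea taken from the discussion is that each ad is counted at
    most once, however many queries or fetches it generated.

```python
from typing import Iterable, Set


def success_rate(ads_delivered: Set[str], web_fetches: Iterable[str]) -> float:
    """Fraction of delivered ads whose unique web object was fetched at least once."""
    fetched = set(web_fetches) & ads_delivered  # duplicates collapse to one
    return len(fetched) / len(ads_delivered)


# Five hypothetical ad impressions, each with its own unique identifier.
ads = {"u1", "u2", "u3", "u4", "u5"}

# Hypothetical web server log: some objects fetched more than once, some never.
fetch_log = ["u1", "u1", "u3", "u4", "u4", "u4"]

print(success_rate(ads, fetch_log))
```

    Here three of the five impressions produced a fetch, so the
    measured rate is 0.6 regardless of how many repeated fetches or
    DNS queries sat behind each one.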

George Michaelson  32:53

    You do not have to care over what transport method that web object
    was fetched.

Geoff Huston  32:59

    That's not the question.

George Michaelson  33:00

    That is not the point. If that web object is fetched, it means the
    end user was told an answer for the multiplicity of questions. No
    matter what the answer was, they got an answer that led to them
    being able to send a request over the web. So it brings the web
    user into focus, and the very first web fetch you see says that
    user's question got answered.

Geoff Huston  33:28

    And more to the point, I don't care what protocol the actual user
    is using to ask the DNS question of their recursive resolver. I
    don't care. What I care about is, if I'm standing up a V6-only
    authoritative name server, the user's recursive DNS
    infrastructure: was it capable of asking that question? So the
    question is, kind of, what percentage of the world's users can be
    served by a DNS server that has only got V6?

George Michaelson  34:01

    So this, to me, is quite a functional measure of utility in a real
    network delivering real outcomes, because you're asking 40 million
    real edge devices, and you're not directing them that they must
    use Google Public DNS, or they must use Cloudflare, or they must
    use their ISP's DNS. You're measuring: what do I actually see from
    a random pool of 40 million users? What is the real infrastructure
    impact of saying to the real deployed infrastructure, to do this,
    you've got to be able to do V6 only? That's a good test.

Geoff Huston  34:36

    The answer is, and it's varied slightly over the last three months
    we've been running this. We do, as I said, about 40 million a day,
    so it's been running for quite some time now. The low point,
    actually, over Christmas, over the New Year break, was 55% of
    users, and the current reading yesterday was 62, 63% of users. And
    it's moderately stable, moderately. It kind of varies 2% to 3% per
    day. So it seems like a pretty repeatable experiment. Ads go to
    new users every day, so it's not measuring the same people, and
    it's a pretty consistent set of answers. So if you stood up
    something V6-only in the DNS, you can expect to serve around two
    thirds of the world's users and not serve the other third. Yeah,
    maybe it's not a very good measurement, though, George. Oops. You
    see, what we're trying to measure here is the absence of an
    outcome. We're measuring the people who can't do what we expect
    them to do. We don't get any indication of that. We get a positive
    indication if they can. But there are many reasons why it might
    not happen, and one of those reasons might be the inability to do
    V6 over, you know, in the DNS, but there may be others. For
    example, if you think about what's going on in the user's engine
    that's running that ad, they have a DNS task: must resolve this
    name. Scurry, scurry, scurry. Back comes an answer, if they can do
    this. But then that answer is passed over to the fetch part of the
    engine: must fetch this URL. Now I have an IP address, but what if
    the user got bored? What if they terminated that and went to
    another page, because most users have the attention span...

George Michaelson  36:20

    Squirrel!

Geoff Huston  36:21

    Squirrel, yes, what? Who was that bright object? You know, they
    don't hang around. So there is a point where the user can just
    walk away in the middle of the browser trying to do something.

George Michaelson  36:33

    So you kind of need to know the overall effective delivery and
    error rate in your measurement rig to even begin to assess what
    percentage of this might be misascribed to a DNS problem.

Geoff Huston  36:46

    Right. So I can see, maybe, the DNS query, that's good, but I
    don't know who the user is, because DNS. And I can see the web
    fetch, if they bother to do it, but I can't correlate the two,
    because I don't know who's asking in the DNS easily. So I'm not
    sure about that 60%. It might be low. It might really be 90%,
    George, but the actual measurement system is not delivering.

George Michaelson  37:11

    Right, because the nature of the measurement system is that
    positive things are logged. Absence of something is not logged,
    and you don't know why it was absent.

Geoff Huston  37:19

    So I'm using two systems to do this measurement of the DNS. I'm
    looking at the DNS, and then I'm measuring its performance by
    looking at the web. So the real question is, Can I do the
    measurement in the DNS itself? Ooh, I'd like the DNS itself to be
    able to say, Yeah, I did that. I got the answer.

George Michaelson  37:40

    Given you're trying to measure resolver to authority, if you could
    construct behaviors that were confirming the resolver absolutely
    saw a thing because of some subsequent behavior the resolver does,
    you can take the user out of the equation.

Geoff Huston  37:54

    And I sat in on a presentation at DNS-OARC, a great bunch of
    people, great. You know, if you have a look at some of the
    presentations on the DNS, it's cool stuff. And one of them was
    from French security folk who were talking about a technique they
    had, and it was actually for setting up some kind of DNS abuse.
    But the principle was actually quite neat. It was called
    "glueless" delegation, right?

George Michaelson  38:18

    I've seen drafts about stuff in glueless going back a number of
    years. So the concept's been around for a while, but I've never
    been very sure what glueless means, because of the confusion I
    have about what glue means. You can't know the "glue..less" if you
    don't know glue.

Geoff Huston  38:37

    The DNS is a distributed database, and the poor old recursive
    resolver, using its local cache to help it, starts every query at
    the root or the cache root, or whatever you're doing, and then
    follows the chain down. So when I've got a.example.com, I ask the
    root, hey, can you tell me the IP address of a.example.com? And
    the root goes, you know, good try, dude, but I really don't care,
    and I don't know, but I do know who the servers for .com are, and
    here's their names. And just to help you along, and because I'm a
    nice dude, I'm going to attach in the additional section my
    understanding of the IP addresses of those name servers. Woohoo,
    says I, I'm going to go and ask those name servers one by one
    until I get an answer for a.example.com. And these are the servers
    for .com, and again, they'll do the same thing.

George Michaelson  39:34

    You started at the root. You gave them the whole label, fully
    qualified domain name. They told you, haven't got a clue, but I
    know the next hop along the path, the com part, and they gave you
    additional data,

Geoff Huston  39:50

    No, a referral response.

George Michaelson  39:52

    A referral response that included

Geoff Huston  39:55

    "go ask here" contains two elements: the names of those name
    servers and the glue, the IP addresses of those name servers,

George Michaelson  40:05

    the IP addresses that the parent wants you to know,

Geoff Huston  40:10

    Some value of it, right. Now, the odd part of the DNS is that the
    additional part, that glue, is not mandatory. Oh, hang on a
    second. How do I work this if I don't get the IP addresses, I've
    just got these names? And the answer is, that's quite okay. Stop
    what you're doing. That was task A.

George Michaelson  40:28

    Park it over here.

Geoff Huston  40:29

    Park it over here. You now have a new task: resolve these name
    server names. Okay, off I go. And this goes on, the beauty of
    recursion, as long as you want. You can set up infinitely long
    sort of glueless resolution chains, which is what that
    presentation back at OARC was all about, about trying to find the
    names of name servers.
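
The "park task A, resolve the name server name, then resume" behaviour described above can be sketched as a toy resolver. This is an illustration under stated assumptions, not real DNS code: the zone data, the names under provider.net, and the addresses are all hypothetical, the root and .com steps are collapsed into one parent, and there is no cache, no retry, and no loop protection.

```python
ROOT = "root"

# Hypothetical zone data, keyed by server address. Each server either answers
# a name directly or refers the resolver to a name server for a child zone.
SERVERS = {
    ROOT: {
        "answers": {},
        "referrals": {
            # Glueless: only the name server's name, no address attached.
            "example.com": {"ns": "ns.provider.net", "glue": None},
            # With glue: the name server's address comes with the referral.
            "provider.net": {"ns": "a.provider.net", "glue": "2001:db8::1"},
        },
    },
    "2001:db8::1": {  # authoritative for provider.net
        "answers": {"ns.provider.net": "2001:db8::2"},
        "referrals": {},
    },
    "2001:db8::2": {  # authoritative for example.com
        "answers": {"a.example.com": "192.0.2.10"},
        "referrals": {},
    },
}

def find_referral(referrals, name):
    """Pick the longest delegated zone that `name` falls under, if any."""
    matches = [z for z in referrals if name == z or name.endswith("." + z)]
    return referrals[max(matches, key=len)] if matches else None

def resolve(name, trace):
    """Follow referrals from the root, recording each (server, name) query."""
    server = ROOT
    while True:
        trace.append((server, name))
        data = SERVERS[server]
        if name in data["answers"]:
            return data["answers"][name]
        referral = find_referral(data["referrals"], name)
        if referral is None:
            raise LookupError(name)
        # No glue: park the current task and resolve the name server name first.
        server = referral["glue"] or resolve(referral["ns"], trace)

trace = []
address = resolve("a.example.com", trace)
for server, name in trace:
    print(f"asked {server} for {name}")
print("answer:", address)
```

The last entry in the trace is the query arriving at the delegated server. Seeing that query is the evidence that the resolver completed the sub-task of resolving the name server name.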

George Michaelson  40:52

    These exist as brand new questions that have nothing to do with
    the question the original user asked in as much as they didn't ask
    this brand new question. It absolutely relates. It's material. The
    resolver needs to do it, but they didn't do it because the edge
    user passed it along. They got told to do it in asking the
    question to a parent zone.

Geoff Huston  41:16

    And the issue is, I can't proceed with my original query until
    I've satisfied the sub-goal of resolving those name server names,
    because I don't know who to ask. And so in the DNS, if you do it
    right, not using the additional section, if I effectively see the
    major task being resumed, the delegated server being queried, I
    know that it's been able to resolve those name server names. So
    now let me pose an interesting question: what if those name
    server names were only accessible in V6? Ooooooh.

George Michaelson  41:58

    So you might have skipped over a moment here that you would need
    those names not to already be cached and known in the name server.
    [Geoff: They're all unique.]
    So as long as you can make a unique name to throw back as the name
    to be checked in this glueless moment and it doesn't exist, you
    can force them to have to go look and you have just created, in
    that moment, a subsidiary question and a proof the subsidiary
    question was answered and acted on, and you can now, in back,
    intrude, make that subsidiary question happen only over V6.
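
One way to picture the "unique name" requirement is a label that has never been used before, so no resolver cache can already hold it. The sketch below is an assumption for illustration: the recording does not describe how the experiment actually builds its names, and the function name and the zone example.com are hypothetical.

```python
import uuid

def unique_test_name(zone="example.com"):
    """Build a fresh name under `zone` that no cache can already hold."""
    label = uuid.uuid4().hex  # 32 hex characters, within the 63-octet label limit
    return f"{label}.{zone}"

first = unique_test_name()
second = unique_test_name()
print(first)
print(second)
```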

Geoff Huston  42:36

    This is all happening, not in the user's browser, not in the user
    at all. This is happening in the user's resolver that they've
    chosen to use, [George: right] It's happening somewhere else, and
    it's totally automatic. So in some ways, you can have the
    attention span of a highly active gnat, and it doesn't matter,
    yeah, the DNS has said, I am going to resolve this name you've
    given me.

George Michaelson  43:01

    There's no cancel. Once you see the question, this machine is
    going to run the distance.

Geoff Huston  43:07

    it's going to run the distance, it's going to do it, dude. And so
    you'd think, if I can set up the same issue in the DNS itself
    using glueless delegation, surely this is a better answer.

George Michaelson  43:20

    Sounds like it.

Geoff Huston  43:21

    So let's do it. Let's do both for the same user. I'm going to give
    you two tasks here. Resolve this name, and it's got a name server
    that's V6 only. Fetch the web object. We said, you know, 62, 63%
    today. Do this, and I do exactly the same thing to the same user,
    different name, but now the V6 test is embedded in this glueless
    referral, and I have a lot of confidence in that DNS answer,
    because this is automated. No matter what the user's attention
    span, they can't stop it. It's the DNS that's taking over this. So
    I can measure this with a much greater level of assurance. And
    when I say, well, it's not 90% but 70%, just a tad over 70%, can
    resolve a name when the authoritative name server is V6 only, I'm
    pretty cool with that measurement. I think it's good. I'm much
    more confident in that 70% than I was in the 62. And you kind of
    go, well, Geoff, isn't the problem solved? What it appears to be
    is there's a loss rate of around 10% in converting the DNS to the
    web. By using a DNS measurement, I'm on a better thing. And you
    go, well, cool. Let's break things up a bit. We're doing all these
    measurements all over the world. Let's look at the world, country
    by country. Let's see this 10% difference everywhere. And the
    answer is, the DNS is always surprising, and nothing lives up to
    expectation. And when I go and look at the numbers for ISPs in
    Algeria or Libya or even Egypt, I get the shock of my life,
    because while there is an expected amount of web retrieval,
    roughly 60 to 70%, except in Egypt, where it's a bit lower, at
    around 30%, now, I expected the DNS answer to be eight to 10%
    larger,

George Michaelson  45:11

    Right. That's what you see in most economies.

Geoff Huston  45:14

    Well, is that what I see in Algeria? No, it's not. I see only 2%
    of users can actually resolve. 2%. Yes, glueless is not supported
    there. And Libya, just 10%. In Egypt, 11, 12% or something like
    that. So in each case, those three countries, and there are a few
    others in a similar boat, our theory about the DNS is not right.
    Somehow, inside their DNS infrastructure, they go: you haven't
    given me the glue records for these delegations. I don't like this
    referral response. I'm not going to do it.

George Michaelson  45:51

    The fact that, in geography, we're talking three economies on the
    coast of the Mediterranean in the North African sphere, that feels
    very like a local supply chain issue. Some aspect of their
    software or service supply chain is common and it's not good.

Geoff Huston  46:08

    Okay, so let's go a little bit further. We do other measurements
    in APNIC, and one of them is we look at which recursive resolver
    asks questions to our authoritative. And in Algeria, which is a
    good one to pick on, about half the users find that their queries
    are ultimately handled by Google's 8.8.8.8, and it's Google
    handing us those questions. Other places that use Google, glueless
    is just fine, but not in Algeria. [George: Wow] Is Google doing
    something funny for Algeria? No, Google doesn't do that. Is there
    something between the user and the externally visible resolvers
    going on in that country? And the answer is, oh yes, there is. Oh
    yes. And a similar situation in Egypt, a similar situation in
    Libya. The proximity of those three countries tends to suggest
    that there is some kind of DNS filtering, slash whatever,
    middleware that's been sold to ISP operators in all three
    countries, that does some special handling before it gets to the
    recursive resolver, that kind of doesn't like glueless delegation.
    [George: right] Okay, good theory. Let's now look at the opposite.
    I said the drop rate was around eight to 10%, which feels about
    right, except in Bolivia, where the drop rate is 60. The DNS gives
    you an answer of, oh, about 85% of users can handle using glueless
    delegation queries over V6 in the DNS. But when I pose the same
    problem as a web problem, and don't forget, the web object itself
    is dual-stack, only 40% of users actually manage to resolve that
    name. It's kind of, whoa, not right. Ethiopia, similarly, 20% in
    the web, but 85% using the DNS. Myanmar similarly, and it's kind
    of, huh, I'm lost. I'm like, it's a dual-stack object, but somehow
    your loss rate in converting from the DNS to the web is really
    quite high, because most of those users do support queries over
    V6. Glueless DNS showed that, we have evidence of that. Yet when
    you just simply have, this is a V6-only name server, simple
    question, simple answer: nah. Not...
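
The comparison being made here is the gap between the glueless DNS reading and the web-fetch reading for each economy, set against the expected gap of about eight to 10 points. A minimal sketch, using the approximate figures quoted in the conversation: the web figures for Algeria and Libya are an assumed midpoint of "roughly 60 to 70%", Egypt's DNS figure is taken as 11 from "11, 12%", and the tolerance is an arbitrary assumption for illustration.

```python
# Approximate percentages of users, as quoted in the conversation.
# Algeria and Libya "web" values are an assumed midpoint of "roughly 60 to 70%".
readings = {
    "World":    {"web": 62, "dns": 70},
    "Algeria":  {"web": 65, "dns": 2},
    "Libya":    {"web": 65, "dns": 10},
    "Egypt":    {"web": 30, "dns": 11},
    "Bolivia":  {"web": 40, "dns": 85},
    "Ethiopia": {"web": 20, "dns": 85},
}

EXPECTED_GAP = (8, 10)  # glueless DNS minus web, in percentage points
TOLERANCE = 5           # assumed slack before an economy is flagged

def classify(web, dns):
    """Label an economy by how its DNS-minus-web gap compares to expectation."""
    gap = dns - web
    if gap < EXPECTED_GAP[0] - TOLERANCE:
        return "glueless DNS lower than expected"
    if gap > EXPECTED_GAP[1] + TOLERANCE:
        return "web lower than expected"
    return "as expected"

for economy, r in readings.items():
    gap = r["dns"] - r["web"]
    print(f"{economy:9s} web={r['web']:3d} dns={r['dns']:3d} gap={gap:+4d}  {classify(**r)}")
```

The two anomalous groups fall on opposite sides of the expected gap, which is why they point to different causes.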

George Michaelson  48:31

    Again, it's tempting to say, well, they've made a different choice
    in middleware that relates to aspects of behavior between DNS and
    web, and it isn't the same as the coding fault that means they
    drop glueless. It's some other aspect of this, but the net outcome
    is web doesn't work even when you have proof it should have.

Geoff Huston  48:52

    But it's my browser. It's my web. The gluing of the DNS to the web
    happens on my processor, in my operating system, in my hands. It's
    not someone else playing sillies with me. It's me.

George Michaelson  49:07

    Yeah, what's been sold to end users in those economies that has
    this effect?

Geoff Huston  49:12

    And you go, Well, what was the flesh tone? It was a one by one
    pixel of flesh tone the ends. No, it's white. It's just white. Do
    they just hate one by one blots? Maybe they do. I don't
    understand why there's such a loss rate for a tiny, tiny web
    object with a totally innocuous domain name, but somehow it's
    attracted the ire of some kind of intercepting middleware,
    because that's what it seems to be, that has taken exception.
    And so, I suppose the issue out of all of this is measurement.
    Measurement really does rely on some assumptions about the
    architecture and machinery of what you're trying to measure.

George Michaelson  49:51

    You come out with a good feel about, oh, 80/20 rule: this number
    is the error bars on the thing that I'm trying to assess. And I
    tell you, when you dig down into that 80/20, you find yourself
    saying, wow, this machine is way more complicated than I thought,
    and there are so many if-but-maybes. For some number of people in
    this system, it's different.

Geoff Huston  50:14

    It's different. And no matter what you think might be going on,
    you can probably find evidence of that and everything else, which
    is totally, totally weird. But it does, I suppose, illustrate the
    fact that assuming a simple model of operation might be fine, but
    misleading, and taking a more subtle view and trying to actually
    establish measurements that expose those anomalous cases, I think,
    leads you a bit further in understanding the broader picture.
    [George: yeah] The real result about all this is, is the DNS
    really ready for V6 only? And the answer is, well, it might not be
    as bad as losing 40% of users, but you probably lose about 30%.
    That's still bad.

George Michaelson  50:53

    We're still on a trajectory. We need to get a higher number to
    really say we can head to a behavior that's a bit in nudge theory.
    If we did this now, we'd actually be a roadblock on people's web
    experience.

Geoff Huston  51:06

    If someone did and said, I'm a pioneer, I'm going to do V6-only
    DNS, the answer is, well, you know, a whole bunch of users won't
    be able to get to you, dude. That's just the basic answer.
    [George: yeah] That won't happen. Exactly how many? That's an
    interesting path, and it's kind of, well, it depends: how you try
    and measure it is what answer you get, [George: yeah] which was
    totally unexpected. I think, a salutary lesson in measurement.

George Michaelson  51:28

    This is written up on your website, Geoff, you've got a report on
    this.

Geoff Huston  51:32

    I wish I could say yes, my hands are still on the keyboard,

George Michaelson  51:35

    So coming to the website soon will be a report on this. That's
    been really fascinating, Geoff. That's great. Thank you.

Geoff Huston  51:42

    And thank you, and thank you, dear listener for hanging in with
    us. Thanks.

George Michaelson  51:47

    If you've got a story or research to share here on ping, why not
    get in contact by email to ping@apnic.net or via the APNIC social
    media channels. Also remember the measurement@apnic.net mailing
    list on Orbit is there to discuss and share relevant
    collaborative opportunities, grants and funding opportunities,
    jobs and graduate placings, or to seek feedback from the community
    on your own measurement projects. Be sure to check out the APNIC
    website for all your resource and community needs. Until next
    time.