Geoff Huston 0:00 8.8.8.8, operated by Google, their public DNS system, is not an engine; it's hundreds, possibly 1000s, of engines all over the world. And you can ask a query, and I can ask a query, and they'll go to different places, because it's a compound system. It's a hybrid. Now, why is it important to look at this and go, I'm really concerned about resolvers? Because what we're talking about is whether it's viable to have a name server running V6 only, and in the way the DNS works, the theory on paper is that users and clients, applications, things in your phone, things in my phone, don't ask authoritative servers. We ask middleware, those invisible agents called recursive resolvers, and they do all the hard work. Why? George Michaelson 1:00 You're listening to PING, a podcast by APNIC discussing all things related to measuring the Internet. I'm your host, George Michaelson. This time, I'm talking to Geoff Huston from APNIC Labs again in his regular monthly spot on PING. Geoff has been running advertising-based experiments for over a decade, measuring behavior on the Internet. Recently, he's been exploring a problem of interest in the modern DNS. The Domain Name System (DNS) fundamentally requires all of the end users, their chosen resolver providers, and the authoritative servers for the names they ask about to cooperate in a dance over IP protocols, answering DNS questions. The specifics of how these questions are encoded and passed around can get complex very quickly, but a specific problem is emerging in how we would define, with strong force or "normatively", the ways the protocol works, and this is going to affect future deployment, code development and operational dependencies. This question relates to the use of IPv6 inside the DNS system at large. Can we yet declare that an IPv6-only DNS could be used reliably? And should we write it into the operational practices an RFC can define?
These practices can be elevated to the status of what is called a best current practice, or BCP, document. Geoff is exploring measurement of this question by exploiting a model of DNS that is called "glueless". Glue is extra information passed around behind the scenes that makes DNS work; if it's not given, it forces the DNS resolver to ask extra questions. These, in turn, can be used to force delivery over an IPv6-only network between the resolver and the authority. This avoids many questions about error rate, drop-off rate, and the problems seen when measuring end users, whose minds may wander away from the web page that triggered the advert. Geoff, welcome back to PING. What shall we talk about this time? Geoff Huston 3:24 Hi George. Let's talk about the Internet. George Michaelson 3:27 Okay, where to begin? Geoff Huston 3:29 Where to begin. I'm going to take two of the topics I've spent a lot of time on. Two out of three ain't bad. I'm going to talk about V6 and the DNS. George Michaelson 3:39 Awww, the two together! Geoff Huston 3:41 Yes, but not security. So I'm going to talk about measuring the DNS and its use of V6. In particular, because the DNS operates at one level above transport, you can put DNS queries over V4 or V6. The DNS doesn't care. And so the question is, when you set up some DNS infrastructure, your recursive resolver, your authoritative name server, and all those other bits and pieces, should you just use V4 and forget about it? Should you go dual-stack everywhere? Are you brave and adventurous? Should you try V6 only? Great questions. And this is not a recent question; there are some answers if you, you know, shuffle around in the RFCs, and the first of these is now, gee whiz, a dozen years old: RFC 3901, "DNS IPv6 Transport Operational Guidelines", from 2004.
And it's bizarre, because you try and look and say, well, what advice can you give me about V6 and the DNS? And it says every recursive name server should be either V4 only or dual-stack. George Michaelson 4:52 Now, that's quite a strong directing instruction, isn't it, Geoff? I mean, that begs many questions. Did we ever get there? Geoff Huston 5:02 Well, we're there today. I mean, you're either V4 only or dual-stack. What it's really saying is, don't be brave. Don't do V6 only. It's almost like a blessing of the status quo. George Michaelson 5:15 So it didn't try to suggest the path into V6 only. It only made requirements to be V4 or dual-stack. Geoff Huston 5:22 Yes. And what about authoritative name servers, those boxes that say, I know all about these names, ask me? Well, that's easy: every DNS zone should be served by at least one V4-reachable name server. George Michaelson 5:36 Which, when you think about it, is a position that absolutely made sense then, and may very well make sense now, although that's a testable proposition. It was a reflection of need and a reflection of the behavior of systems, the dependencies on things like tunnels, the stuff you've explored many times on PING around V6, the DNS and large packets. That wasn't a bad direction, Geoff. Geoff Huston 6:00 Well, it was the status quo. And what this V6 transport operational guideline really says is: don't drop V4, and don't let your V6 enthusiasm get the better of you. This is not, you know, evangelism about V6. George Michaelson 6:15 This is pragmatism. This is a reflection of the emerging reality we live in, documented as: this is what we see. Geoff Huston 6:22 A piece of DNS infrastructure can't select its clients, and authoritative name servers are promiscuous in the sense that they answer all questions. There's no bad questioner.
And so in some ways, this was saying, a dozen years ago, now is no time to be brave: if you really want this stuff to work and work well, you'd better keep V4 going. [George: right] Now, it's a dozen years later, and there's been some work, predominantly by the folk who have strong opinions about promoting V6, and this has turned into a soon-to-be RFC, I guess. And it's some odd... no, it's RFC3901bis. George Michaelson 7:02 Yep. So it's the BIS thing, which comes from the CCITT, which was a French-language-aware standardization effort, and BIS and TER are the French denotations for second and third. Geoff Huston 7:18 Thank you, George. Because I was wondering why we're using BIS. I'm like, really? I thought... okay. George Michaelson 7:25 So in normal RFC practice, every RFC gets a new number. We had RFC 822, but the RFC that is normative for header structure in email does not have to stay 822 with a version one, version two, version three flag. We might talk about it as 822bis, but at the point we actually get off our seats and publish it, it gets assigned a new number, doesn't it? Geoff Huston 7:52 Right. So this thing hasn't got a number yet. We're working on a successor to this 12-year-old advice, and in the interim it's titled RFC3901bis. George Michaelson 8:03 So it sets the stage. It sets an expectation. It's reflecting on that, and it's probably going to be an enhancement of that, not a complete change from it. Geoff Huston 8:15 So the two recommendations that are kind of critical both relate to the authoritative name servers. This time, it's saying at least two name servers for a zone should be dual-stack, and every zone should be served by at least one name server that is reachable over V6. George Michaelson 8:36 Wow. So it's flipped a bit, from saying you must have a V4 to saying you must be dual, which implicitly says you have a V4 somewhere in your name server set and a V6 somewhere in your name server set. That's a flip.
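The two sets of authoritative-server recommendations, as they're paraphrased in this conversation, can be written as checks on a zone's name server set. This is a minimal sketch, assuming each name server is described only by whether it is reachable over V4 and over V6; `NameServer`, `meets_rfc3901` and `meets_3901bis_as_discussed` are hypothetical names, and the second check follows the episode's reading of the draft rather than its exact text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameServer:
    name: str
    v4: bool  # has a V4 (A) address the world can reach
    v6: bool  # has a V6 (AAAA) address the world can reach


def meets_rfc3901(ns_set):
    # RFC 3901: every zone should be served by at least one V4-reachable name server.
    return any(ns.v4 for ns in ns_set)


def meets_3901bis_as_discussed(ns_set):
    # RFC3901bis, as paraphrased in the episode: at least two dual-stack
    # name servers, plus at least one V6-reachable name server.
    dual_stack = sum(1 for ns in ns_set if ns.v4 and ns.v6)
    # The second condition is already implied by the first (two dual-stack
    # servers means two V6-reachable servers); it's kept to mirror the wording.
    return dual_stack >= 2 and any(ns.v6 for ns in ns_set)


v4_only = [NameServer("ns1", True, False), NameServer("ns2", True, False)]
mixed = [NameServer("ns1", True, True), NameServer("ns2", True, True),
         NameServer("ns3", True, False)]
v6_only = [NameServer("ns1", False, True), NameServer("ns2", False, True)]

for label, zone in [("v4_only", v4_only), ("mixed", mixed), ("v6_only", v6_only)]:
    print(label, meets_rfc3901(zone), meets_3901bis_as_discussed(zone))
```

Run as written, the V4-only set passes the old check but not the new one, the mixed set passes both, and a purely V6-only name server set passes neither check as paraphrased here.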
Geoff Huston 8:53 I actually read the first one as: two name servers that are dual-stack means two 6s [George: yes]. What it's minimally saying is two 4s and two 6s. What it's not saying is V4 only. It's kind of saying you can do that, as long as you have at least two dual-stack. George Michaelson 9:11 This is an exquisite moment in some ways, because when we talk about things in Internet standards, there's always this giant fork in the road, right? There's things as they are: what are we actually doing? And there's aspirational: where do we want to be? [Geoff: right] And I cannot help but think we are entering the door of changing from a things-as-they-are moment to an aspirational one. Geoff Huston 9:37 Oh yep, that touches on something we'll get to right now, because both of these documents, the old one, RFC 3901, and this current BIS suggestion, are what we call best current practice documents. And I must admit, when you poke and prod at the IETF, the actual intention of a best current practice is not clear, even in, you know, the brains of the IESG, the Internet Engineering Steering Group, those worthy folk who are the holders of the IETF terms and traditions. [George: Yeah] I'll ask the question: is a BCP, a best current practice, a checklist? A shopping list: if you're going to deploy something, tick these boxes, because that's the best way we know how to do it? George Michaelson 10:27 That word current, if you believe words have meanings and are not just labels, if you believe in what I might call nominative determinism, strongly implies what you just said. It would empower me to go out into the marketplace and say to providers of services, providers of software: this is what everyone does, and I want you to tell me you will do this, because that's what I need. And I'm putting three tenders out there.
Every one of you has to say whether you meet this level. To me, that's what it kind of says when it says current. Geoff Huston 11:02 Current practice. So what it says is implementations should support this, because that's the whole idea of what RFCs are about for customers, users and people: protecting the customer and saying, this is the technology you should be buying. It's the shopping list. George Michaelson 11:21 I adhere. We're done. Recording over. PING, finished. Geoff Huston 11:25 There's another view, George. [George: okay] This document actually has a little bit of this aspiration [George: right]: it would be good if. We're not going to describe what implementations do; we're going to describe what implementations should do if they adhere to the principles of the authors. [George: right] And what the document actually includes is a consideration which is novel. This is the change: you can mount a DNS authoritative name server or recursive name server that operates only in V6. [George: Oh, wow] And let's look at the poor plight of the DNS recursive name server, because we know there are authoritative name servers out there that say, V4 only, dude, that's it. You can only reach me if you do V4. Now I am trying to ask questions. I'm the recursive resolver, and I only have V6. I'm fine. I look at this name server, and it says, here's my V4 address. And I go, look, I'm sorry, there's nothing I can do. Now, the draft says, hand wave, hand wave, hand wave, you should forward on to someone who knows what they're doing. Is there any implementation code out there? No. Does the DNS support forwarding? Well, it doesn't. It doesn't. George Michaelson 12:40 It supports it inasmuch as it's in my operational platform.
In the boundary of configuration about how I run my machine, I get to say, for this machine, I want to use these things to forward questions I can't answer. But there's zero in the protocol that talks to that forwarding behavior, and there's no obligation on me to have a forwarder configured. It isn't a mandatory component of behavior. Geoff Huston 13:06 Well, there be dragons. In routing, if you forward packets to me and I forward packets to you, then any packet that enters the death loop will swing there forever. You know, this is a trap! George Michaelson 13:19 Yeah, oh, we've talked about that in other protocols, and so this immediately tells us: if there isn't first-class protocol support for forwarding behavior, you've made a problem. Geoff Huston 13:30 So the protocol response is, you put in a counter: a time to live, a hop count. Basically, you set it to a high number, everyone who touches the packet decrements that number, and if it ever gets to zero, kill it. You're in a loop. So you can't have packets forwarding for eternity. You just can't. Does the DNS have one of those? Well, the answer is no, it does not. So when you say, ahh, just forward your way out of hell, the answer is, well, what if where I'm forwarding to forwards it back to me, either directly or indirectly? The answer is, you will be passing that packet around until the heat death of the universe. Nothing else is going to stop it. And this is the critical question: when a BCP becomes aspirational, and there is no code in common practice, is it still a BCP? And I say... George Michaelson 14:19 I don't see how it can be. Geoff Huston 14:21 And I say to the worthy folk in the IESG, that document may be many things, and it may even say good aspirational things, but if I apply the words best current practice, which are English, not French, then the answer is no, this one is not that. It's just wide of the mark. I'm sorry.
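Geoff's point about hop counts can be shown with a toy model. This is a sketch, not DNS code: `forward_query`, the `forward_to` table and `SAFETY_CAP` are hypothetical, and the cap exists only so the demo terminates. The IP-level TTL doesn't rescue the DNS here, because each forwarder originates a fresh query in a fresh packet; the `hop_limit` parameter stands in for the query-level counter that, as discussed, the DNS does not have.

```python
SAFETY_CAP = 10_000  # stand-in for "the heat death of the universe"


def forward_query(forward_to, start, answerers, hop_limit=None):
    """Pass one query along forwarders until someone answers.

    forward_to: dict mapping a server to the server it forwards to.
    answerers:  set of servers that can answer the query themselves.
    hop_limit:  None models a protocol with no hop count.
    Returns (outcome, hops_taken).
    """
    server, hops, budget = start, 0, hop_limit
    while hops < SAFETY_CAP:
        if server in answerers:
            return "answered", hops
        if budget is not None:
            if budget == 0:
                return "dropped: hop limit reached", hops
            budget -= 1  # each forwarder decrements the counter
        next_server = forward_to.get(server)
        if next_server is None:
            return "failed: no forwarder configured", hops
        server, hops = next_server, hops + 1
    return "still looping", hops


loop = {"A": "B", "B": "A"}  # A forwards to B, B forwards back to A
print(forward_query(loop, "A", answerers=set()))
print(forward_query(loop, "A", answerers=set(), hop_limit=8))
print(forward_query({"A": "B", "B": "C"}, "A", answerers={"C"}, hop_limit=8))
```

Without a hop limit the two-server loop only stops at the artificial cap; with one, the query is killed after eight hops, while a chain that genuinely reaches an answering server is unaffected. The "no forwarder configured" branch reflects George's point that a forwarder is optional configuration, not protocol.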
George Michaelson 14:41 So where is the IESG on this point of view, Geoff? Geoff Huston 14:47 Oh, you go searching for a spine, and it can take forever to find one. They'll do it, because doing it is the path of least resistance. However, that's not what I'm bemoaning, and not what I'm really worrying about here. What I'm actually looking at is something more interesting: are the DNS and V6 at a level where V6 is mature, well understood and deployed sufficiently for V6-only infrastructure to be deployed and be as efficient and as fast as V4? Is it out there? Can we measure it? That's the question in my head, right? George Michaelson 15:22 So this is the role of measurement as information gathering, to help us tool up for a rational conversation. If the thing about BCP versus aspiration is on the table, and we're now exploring whether, even as aspiration, it would be sensible to do this, you need info. That's where measurement comes into the story. Geoff Huston 15:42 Right. If I put up a server and it only had V6, how much of the Internet, in terms of end clients, couldn't reach me? And you think, well, that would be a really easy question, wouldn't it? Do it! Hang on. Hang on. You see, that's a harder question than you might think. What are you counting? Users? How can I get billions of people to come to my website? Not easy. And am I counting users or something else? So let's have a look at what we don't understand. The first thing we don't understand: in the DNS we talk about resolvers and servers. Resolvers are those things that take a DNS name and, doing operations (let's not define them yet), return the translation of that name into an IP address. So say I am building a resolver system: I'm building out components that can each handle 10 queries a second, I put in 100 of them, and I put a front end on that disperses the queries across them. 10 by 100: that's 1000 queries a second.
But it's not one resolver anymore. It's 100, isn't it? Is a resolver one or many? Is it a single platform? Or is it a collection now? Are they independent? Do they each have their own caches? So if I ask a question of this mythical compound beast, and it sends me back an answer, has that beast cached that answer for all future queries, or has only one engine in that beast cached that answer? And if I ask it again, will it take the same amount of time, because that cached response is not everywhere? Perhaps. Wow! George Michaelson 17:33 We've been here across many, many protocols in the history of distributed computing and computer science. We've been here with databases. We've been here with Squid and web caches. We've been here with questions about cache coherency, in the behavior of written-to-disk versus in memory versus level one cache. This problem of "where are things?", and when I have two machines working and churning against them, when do they both represent the same state? These things have been around as problems a long time, and the answers are deceptively simple, right? It's very glib to say, oh yes, I know exactly what this is. And the answer generally is, no, you haven't considered all of the corner cases that can exist in this situation. Geoff Huston 18:23 8.8.8.8, operated by Google, the public DNS system, is not an engine. It's hundreds, possibly 1000s, of engines all over the world. And you can ask a query, and I can ask a query, and they'll go to different places, because it's a compound system. It's a hybrid. Now, why is it important to look at this and go, I'm really concerned about resolvers? Because what we're talking about is whether it's viable to have a name server running V6 only, and in the way the DNS works, the theory on paper is that users and clients, applications, things in your phone, things in my phone, don't ask authoritative servers.
We ask middleware, those invisible agents called recursive resolvers, and they do all the hard work. Why? George Michaelson 19:13 Yeah, there's no actual barrier to me as an edge device going directly to the authority. [Geoff: George, it's so slow!] But there's no protocol-level boundary. It is slow, Geoff, but it's not an enforced behavior where the protocol police said you mustn't do it. And it's not that the authority is going to refuse to answer. It's that we've built the system predicated on a belief that the millions of customers don't; they go through an intermediary. Geoff Huston 19:43 We've built a system to scale and to work at speed, and if everyone didn't cache, didn't use those intermediate systems, we'd have junked the DNS 20 years ago, 30; it would never have got off the ground, because the way we actually make this system scale and work really, really well is to place these agents. Normally there's one or two, or maybe 100, in every Internet service provider. It's their DNS recursive resolver that they tell their customers to use. And the beauty of it is, if you and I are in the same ISP and you have a question, what is, you know, www.geoff, it goes, well, the answer is 10.0.0.1. And then I ask, what's www.geoff? And the answer goes, I don't need to look this up, dude, I just answered George. The answer is 10.0.0.1, have a good life. And all of a sudden, things speed up. George Michaelson 20:35 We've had protocol developments across the life of the DNS where people actually observed the caching timers in this system and said, look, if I know that Google's being hammered, and I'm answering from my local cache for google.com, and I've got a timer from when I last asked the question, I could choose to go and revalidate that question before the timer runs out. Geoff Huston 20:58 Ah, yes, I know, but you're into my second question about what we don't understand when we're asking questions of the DNS.
So the first thing we don't understand is, we actually don't know what a resolver is. We think we do, but we don't. The second thing is, what's a query? That's crazy. We know what a query is. It's a DNS protocol... George Michaelson 21:16 It's a question, man. Geoff Huston 21:18 You know, the query gets answered, and it's an answer, but that's not actually what I'm on about. You see, because we're running UDP, we don't know when to expect an answer. George Michaelson 21:30 So a reminder that the DNS, before all the extra pieces are added, is questions about name-to-something, put into DNS protocol buffers, put into UDP, put into IP. But we're assuming everything is still put into... Geoff Huston 21:48 UDP. And UDP is exactly like putting a message in a bottle. No, seriously: you might get a bottle back; you don't know when. And so any good castaway would put the same message in bottles, one a day, until eventually one gets answered. In other words, if you don't know when the answer is going to come, and your internal model of how long it should take times out, you do it again. George Michaelson 22:13 So the Internet model of the castaway is you are cast away on a desert island with a crate of Coca-Cola in bottles and an endless supply of corks. Geoff Huston 22:22 And paper and pen. Yes, you just go for it. That's the Internet for you. That's the new model; keep that in your head. And so the issue is, we fan out queries excessively in the DNS in order to make the thing fast and to give the simulation of reliability even when we don't have it. And interestingly, with this kind of "oh, the bottle hasn't made it back, I need to send it again", implementations have their own view of timers. It's not written in any RFC. Some resolvers are unduly patient; we call them slow. Some resolvers are incredibly impatient and fan out queries like crazy. And oddly enough, they're quite fast, but very wasteful. George Michaelson 23:05 Yeah.
But we also call those denial of service sometimes. Geoff Huston 23:09 Oops. So in some ways, queries just splay out. And if you look at the other side, I'm an authoritative name server, and these things keep on asking me questions. Because it's a stateless protocol, I don't know who the originator of that query was. A recursive resolver was asking me questions. There's no path, there's no hop count, there's no reason. I just have to answer it and move on. George Michaelson 23:34 If you're a true believer, a sweet, innocent child like me, you want to believe that if you're the authoritative server for geoff.com, and there is a reason that a million people want to know about geoff.com, the great advantage of this model of intermediaries, the resolvers, is that although you're going to see a lot of queries, in principle you shouldn't see a million. You should be seeing a lot of queries, but fewer, because those front-end resolvers are supplying the answer: here is geoff.com, from a local cache. So the naive child view is that this model is working. Geoff Huston 24:14 Right. If I have 1000 recursive resolvers out there and geoff.com is new, then as the server, I might see 1000 queries, one from each of those recursive resolvers. But if I say, here's the answer, and by the way, remember it for a week, then as long as the recursive resolvers follow my suggestion, keep that in your local cache for a week, I shouldn't see any questions at all for the next week. None. So the authoritative server sees cache misses. It sees what's left after the recursive resolver has applied its local memory. George Michaelson 24:49 It sees the real hits, but the vast majority of questions it sees coming to it are things that either have timed out of cache or somehow have never been asked before. Geoff Huston 25:00 Right. Okay, so that, dear listeners, is the DNS in a nutshell. Ready? It's complicated. George Michaelson 25:06 Okay, so we're really done. Let's go.
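The resolver-farm and caching ideas discussed above can be put together in a small simulation. It's a sketch under stated assumptions: `Authoritative`, `Engine` and `ResolverFarm` are hypothetical classes, the front end picks an engine at random, every engine keeps a private cache, and the authoritative answer carries a one-week TTL, echoing "remember it for a week". The authoritative server's counter ends up counting cache misses, so with 100 independent caches it can see up to 100 queries for a name that a single shared cache would fetch once.

```python
import random

WEEK = 7 * 24 * 3600  # TTL in seconds: "remember it for a week"


class Authoritative:
    def __init__(self):
        self.queries_seen = 0  # the authoritative server only ever sees cache misses

    def answer(self, name):
        self.queries_seen += 1
        return "10.0.0.1", WEEK  # (address, TTL); one fixed answer for the demo


class Engine:
    """One resolver engine with its own private cache."""

    def __init__(self, auth):
        self.auth = auth
        self.cache = {}  # name -> (address, expiry_time)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]  # cache hit: no query to the authoritative server
        address, ttl = self.auth.answer(name)
        self.cache[name] = (address, now + ttl)
        return address


class ResolverFarm:
    """One service address fronting many engines; the front end picks one at random."""

    def __init__(self, auth, engines, seed=1):
        self.engines = [Engine(auth) for _ in range(engines)]
        self.rng = random.Random(seed)

    def resolve(self, name, now):
        return self.rng.choice(self.engines).resolve(name, now)


for size in (1, 100):
    auth = Authoritative()
    farm = ResolverFarm(auth, engines=size)
    for t in range(1000):  # 1000 users ask for the same name, one per second
        farm.resolve("geoff.com", now=t)
    print(f"{size} engine(s): authoritative saw {auth.queries_seen} queries")
```

With one engine, the authoritative server sees a single query for the whole run; with 100 engines behind the same front end, it sees roughly one query per engine, even though every user asked "the same resolver".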
Geoff Huston 25:09 So now we get back to this question: is it viable to run V6-only DNS servers? For that, I'm going to go back to the measurement rig that we run in APNIC Labs. We use Google's advertising network, and Google support us in that. Thank you, Google. Love your work. No, really, it is very valuable, and we do appreciate it. Now, this ad network seeds millions of ads a day across the entire Internet, about 40 million or so these days. Ads are everywhere, literally, from the Faroe Islands to the Seychelles to even deepest, darkest Canberra in Australia. You know, it's everywhere. And the way we do this is we measure what those users can do. The problem is that ads can't drive the machine where the ad is placed. An ad can't seize control of your mobile phone or your laptop or anything else. [George: yeah] All an ad can do when it's placed is get some objects. George Michaelson 26:13 That's it: make you, the edge user, issue a request to get something. Geoff Huston 26:18 And each request has a DNS part and an HTTPS part these days. George Michaelson 26:23 Assuming the DNS part goes the distance, yes. Geoff Huston 26:27 So when each person receives an ad, and when I say person, I mean each device, if I make that name unique in some form or fashion, then caching won't help anyone. George Michaelson 26:40 Some part of the name being unique means that some part of the DNS system cannot satisfy the answer from local cache. It must go back to an authority to ask, what about this one? Geoff Huston 26:53 And if I run the authoritative server for that name, I get to see that user trying to resolve that name through some recursive resolver. I get to see the query. Now, it's the DNS: I don't get to see the user; I get to see their recursive resolvers. But I was asking a question about the use of V6 as transport for recursive resolvers in the DNS. So yay, I'm actually seeing the behaviors that I want to measure.
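Making each ad's name unique is what takes caching out of the picture. The episode doesn't say how the names are constructed, so this is only an illustration: a random per-impression label under a hypothetical zone, `v6only.experiment.example`, plus the matching step at the authoritative server that maps a query name back to an impression ID. The helper names are hypothetical too. Counting distinct IDs rather than raw queries is one way to avoid over-counting when the same name is queried more than once.

```python
import uuid

ZONE = "v6only.experiment.example"  # hypothetical zone served only over V6


def unique_probe_name(zone=ZONE):
    # A fresh 33-character label per impression: no cache can already hold it.
    return f"u{uuid.uuid4().hex}.{zone}"


def impression_id(qname, zone=ZONE):
    # At the authoritative server: recover the impression label from a query name.
    qname = qname.rstrip(".").lower()  # DNS names are case-insensitive
    suffix = "." + zone
    if not qname.endswith(suffix):
        return None
    return qname[: -len(suffix)].split(".")[-1]


def count_impressions(qnames):
    # Distinct impressions seen, however many times each name was queried.
    return len({impression_id(q) for q in qnames} - {None})


a, b = unique_probe_name(), unique_probe_name()
log = [a + ".", a.upper() + ".", a + ".", b + ".", "www.example.com."]
print(count_impressions(log))  # 2
```

The five logged queries collapse to two impressions: three differently-cased copies of the first name, one of the second, and an unrelated name that falls outside the experiment zone.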
George Michaelson 27:23 Yeah, if you craft the correct way of making the resolver come to you to answer that question. Geoff Huston 27:30 Yes, this is fantastic. So okay, let's take this one step further and start to set it up. I'm going to put up a DNS server running V6 only, and I'm going to set up an ad campaign that says to users, hi, I'm an ad, go and fetch this URL with a unique name. And then you go, well, what am I seeing? And the answer is, whoa, dude, back off. This is getting to be really noisy. The first issue is that the DNS, for all of its strengths, has some real rock-banging primitives going on. It's a dual-stack world these days, and so if I want to maximize my chances of getting things done, if I have a web name and I want to resolve it, I'll resolve it to both an A (V4) and a quad-A (V6) address record: give me the V4 and V6 address records for this name, please. Can you do that in the DNS? No, there is an RFC that says you can't ask for two things at once. Dude, you've got to ask for each thing in a separate query. Yep, moments not crossed. If we'd thought to invent the record that deterministically is a single question that receives both types of response, give me the address rather than give me the 4 and the 6, we might be in a different place. But the fact is, you can't do that in today's best current practice DNS. Ah, the best current practice DNS. If I was allowed my wish list, I would follow the path that Apple has already taken with their infrastructure, and I would add this third record that most Apple-family devices now actually ask for. It's called the HTTPS record, and like everything else in the DNS, if you really want to get things done, use a no-format record, a text record, TXT, and pack a whole lot of things into it. And interestingly, in the HTTPS answer, you can put in a V6 hint, a V4 hint, and even an application-level protocol.
Whoa, so I can do this. George Michaelson 29:36 But only if you know to ask the right kind of question. Geoff Huston 29:40 Right. And you can't assume that everyone has done HTTPS answers. So guess what we do today? Bizarrely, we ask for A records and quad-A records and HTTPS records, and because it's UDP, if we don't get an answer soon enough, or we just feel like it's a Wednesday or whatever, we'll ask again and again and again and again. Now, the drop-off is pretty high: about 30% of folk ask a second time, about 10% ask a third time, about 10% ask a fourth time, down to 1% who ask a fifth, sixth, seventh, eighth. Now, I'm answering everything. I'm not withholding the answers. The answers are out there, but queries fan out. George Michaelson 30:27 I want to say, oh, this is strong evidence of something like packet loss in the system, but again, that's a naive answer. The likelihood is that people are using scattergun methods. More than one resolver is in play. Sources of queries have got timers that are more aggressive than the end-to-end delay. It's not necessarily a strong signal that the network is failing. It's that the DNS is trying really hard. Geoff Huston 30:53 Or it's standard economic behavior. The DNS is free, dude; packets are free, dude. I've paid my, you know, 100 bucks a month, I'll do what I want. Knock it out. Well, I'm going to send 100 queries for the same name. What's the penalty? Zero. So yes, in the DNS, because it uses UDP, because there's very little overhead in this, we actually have a pretty wasteful theory of the way it operates, and so we just replicate queries in a rapid-fire query fan-out. On average, for each name you query, you get the three queries, A, quad-A and HTTPS, and a 30%... George Michaelson 31:31 If they're an Apple user. Geoff Huston 31:32 If they're a Chrome user, you just get the two.
But then, with a 30% likelihood, you get a second query for the same name and the same query type, and again, in 10% of the cases, you'll get a third and a fourth. It tails off pretty quickly. But the issue is, if you just count queries, you start to get misled, because the same... George Michaelson 31:53 More queries seen than actual edge questions asked. Geoff Huston 31:58 Right. So let's wade through all this. We don't quite understand what a resolver is. We don't quite understand what a query is. We don't understand what uniqueness is, and because it's the DNS, we really have no handle on the end customer. So, you know, we're mucking around a bit in the dark. How can we get all this back together? Aha: the user in this ad did not do a DNS question. They did a web fetch, and we are both the DNS server and the web server. Aha. So what we do is look at a web object whose name server's only DNS entry is a V6 record, and we just count the web fetches. Now, the web fetch is an end user. It is a unique instance, so whether it was 100 queries or not, it's still: I got the web object or I didn't. George Michaelson 32:53 You do not have to care over what transport that web object was fetched. Geoff Huston 32:59 That's not the question. George Michaelson 33:00 That is not the point. If that web object is fetched, it means the end user was told an answer to the multiplicity of questions. No matter what the answer was, they got an answer that led to them being able to send a request over the web. So it brings the web user into focus, and the very first web fetch you see says that the user's question got answered. Geoff Huston 33:28 And more to the point, I don't care what protocol the actual user is using to ask the DNS question of their recursive resolver. I don't care. What I care about is: if I'm standing up a V6-only authoritative name server, was the user's recursive DNS infrastructure capable of asking that question?
So the question is, kind of: what percentage of the world's users can be served by a DNS server that has only got V6? George Michaelson 34:01 So this, to me, is quite a functional measure of utility in a real network delivering real outcomes, because you're asking 40 million real edge devices, and you're not directing them that they must use Google Public DNS, or they must use Cloudflare, or they must use their ISP's DNS. You're measuring what you actually see from a random pool of 40 million users. What is the real infrastructure impact of saying to the real deployed infrastructure, to do this, you've got to be able to do V6 only? That's a good test. Geoff Huston 34:36 The answer, and it's varied slightly over the last three months we've been running this, at, as I said, about 40 million a day, so it's been running for quite some time now: the low point, over the Christmas and New Year break, was 55% of users, and the reading yesterday was 62, 63% of users. And it's moderately stable, moderately; it varies by 2% to 3% per day. So it seems like a pretty repeatable experiment. Ads go to new users every day, so it's not measuring the same people, and it's a pretty consistent set of answers. So if you stood up something V6-only in the DNS, you could expect to serve around two thirds of the world's users and not serve the other third. Yeah, maybe it's not a very good measurement, though, George. Oops. You see, what we're trying to measure here is the absence of an outcome. We're measuring the people who can't do what we expect them to do, and we don't get any indication of that. We get a positive indication if they can. But there are many reasons why it might not happen. One of those reasons might be the inability to do V6 in the DNS, but there may be others. For example, if you think about what's going on in the user's engine that's running that ad, they have a DNS task: must resolve this name, scurry, scurry, scurry.
Back comes an answer, if they can do this. But then that answer is handed over to the fetch part of the engine: must fetch this URL. Now I have an IP address, but what if the user got bored? What if they terminated that and went to another page, because most users have the attention span of...

George Michaelson 36:20 Squirrel!

Geoff Huston 36:21 Squirrel, yes! What? What was that bright object? You know, they don't hang around. So there is a point where the user can just walk away while the browser is in the middle of trying to do something.

George Michaelson 36:33 So you kind of need to know the overall effective delivery and error rate of your measurement rig to even begin to assess what percentage of this might be mis-ascribed to a DNS problem.

Geoff Huston 36:46 Right. So I can see the DNS query, maybe, and that's good, but I don't know who the user is, because, well, DNS. And I can see the web fetch if they bother to do it, but I can't easily correlate the two, because I don't know who's asking in the DNS. So I'm not sure about that 60%. It might be low. It might really be 90%, George, but the measurement system is not delivering that.

George Michaelson 37:11 Right, because the nature of the measurement system is positive. Things are logged. The absence of something is not logged, and you don't know why it was absent.

Geoff Huston 37:19 So I'm using two systems to do this measurement of the DNS. I'm looking at the DNS, and then I'm measuring its performance by looking at the web. So the real question is, can I do the measurement in the DNS itself? Ooh, I'd like the DNS itself to be able to say, yeah, I did that. I got the answer.

George Michaelson 37:40 Given you're trying to measure resolver to authority, if you could construct behaviors that confirm the resolver absolutely saw a thing, because of some subsequent behavior the resolver does, you can take the user out of the equation.

Geoff Huston 37:54 And I sat in on a presentation at DNS OARC, a great bunch of people, great.
You know, if you have a look at some of the presentations on the DNS, it's cool stuff. And one of them was from French security folk who were talking about a technique they had, and it was actually for setting up some kind of DNS abuse. But the principle was quite neat. It was called "glueless" delegation.

George Michaelson 38:18 I've seen drafts about glueless going back a number of years, so the concept's been around for a while, but I've never been very sure what glueless means, because of the confusion I have about what glue means. You can't know "glue-less" if you don't know glue.

Geoff Huston 38:37 The DNS is a distributed database, and the poor old recursive resolver, using its local cache to help it, starts every query at the root, or the cached root, or whatever you're doing, and then follows the chain down. So when I've got a.example.com, I ask the root: hey, can you tell me the IP address of a.example.com? And the root goes, you know, good try, dude, but I really don't care, and I don't know. But I do know who the servers for .com are, and here are their names. And just to help you along, and because I'm a nice dude, I'm going to attach in the additional section my understanding of the IP addresses of those name servers. Woohoo, says I. I'm going to go and ask those name servers, one by one, until I get an answer for a.example.com. And these are the servers for .com, and again, they'll do the same thing.

George Michaelson 39:34 You started at the root. You gave them the whole label, the fully qualified domain name. They told you they haven't got a clue, but they know the next hop along the path, the com part, and they gave you additional data...

Geoff Huston 39:50 No, a referral response.
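The referral Geoff describes carries two things: the names of the delegated zone's name servers, and, optionally, glue (their addresses, in the additional section). A minimal data-structure sketch, assuming a simplified in-memory model rather than any real DNS library; the class and method names are hypothetical, and the addresses are from documentation ranges, not the real .com server addresses.

```python
from dataclasses import dataclass, field

@dataclass
class Referral:
    """A simplified referral response ("go ask here").

    ns_names: names of the delegated zone's name servers
    glue: optional addresses for some or all of those names, as carried in
          the additional section (glue is not mandatory)
    """
    zone: str
    ns_names: list[str]
    glue: dict[str, list[str]] = field(default_factory=dict)

    def names_needing_lookup(self) -> list[str]:
        # Any name server name without glue becomes a new resolution sub-task.
        return [ns for ns in self.ns_names if not self.glue.get(ns)]

    def is_glueless(self) -> bool:
        return len(self.names_needing_lookup()) == len(self.ns_names)

# A referral for .com with glue for both listed servers (illustrative addresses).
com = Referral("com", ["a.gtld-servers.net", "b.gtld-servers.net"],
               {"a.gtld-servers.net": ["192.0.2.30"],
                "b.gtld-servers.net": ["198.51.100.30"]})

# A glueless referral: only a name, so the resolver must go and resolve it.
glueless = Referral("example.com", ["ns1.example.net"])

print(com.names_needing_lookup())       # []
print(glueless.names_needing_lookup())  # ['ns1.example.net']
```

With glue, the resolver can go straight to the next server. Without it, every listed name becomes extra work, which is the behavior the glueless measurement exploits.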
George Michaelson 39:52 A referral response that included...

Geoff Huston 39:55 "Go ask here." It contains two elements: the names of those name servers, and the glue, the IP addresses of those name servers.

George Michaelson 40:05 The IP addresses that the parent wants you to know.

Geoff Huston 40:10 Some value of it. Right. Now, the odd part of the DNS is that the additional part, that glue, is not mandatory. Oh, hang on a second. How do I work this if I don't get the IP addresses, and I've just got these names? And the answer is, that's quite okay. Stop what you're doing. That was task A.

George Michaelson 40:28 Park it over here.

Geoff Huston 40:29 Park it over here. You now have a new task: resolve these name server names. Okay, off I go. And this goes on. That's the beauty of recursion: for as long as you want, you can set up infinitely long glueless resolution chains, which is what that presentation back at OARC was all about, trying to find the names of name servers.

George Michaelson 40:52 These exist as brand new questions that have nothing to do with the question the original user asked, inasmuch as the user didn't ask this brand new question. It absolutely relates. It's material. The resolver needs to do it, but not because the edge user passed it along. It got told to do it when it asked the question of a parent zone.

Geoff Huston 41:16 And the issue is, I can't proceed with my original query until I've satisfied the sub-goal of resolving those name server names, because I don't know who to ask. And so in the DNS, if you do it right, without using the additional section, if I see the main task being resumed, with the delegated server being queried, I know that the resolver has been able to resolve those name server names. So now let me pose an interesting question: what if those name server names were only accessible in V6?
Ooooh!

George Michaelson 41:58 So you might have skipped over a point here: you would need those names not to be already cached and known to the resolver. They're all unique. So as long as you can make a unique name to throw back as the name to be checked in this glueless moment, one that hasn't been seen before, you can force them to go and look. And you have just created, in that moment, a subsidiary question and a proof that the subsidiary question was answered and acted on, and you can now, behind the scenes, make that subsidiary question happen only over V6.

Geoff Huston 42:36 This is all happening not in the user's browser, not in the user at all. This is happening in the resolver that the user has chosen to use. [George: right] It's happening somewhere else, and it's totally automatic. So in some ways, you can have the attention span of a highly active gnat, and it doesn't matter. The DNS has said, I am going to resolve this name you've given me.

George Michaelson 43:01 There's no cancel. Once it sees the question, this machine is going to run the distance.

Geoff Huston 43:07 It's going to run the distance. It's going to do it, dude. And so you'd think, if I can set up the same test in the DNS itself using glueless delegation, surely this gives a better answer.

George Michaelson 43:20 Sounds like it.

Geoff Huston 43:21 So let's do it. Let's do both for the same user. I'm going to give you two tasks here. One: resolve this name, whose name server is V6-only, and fetch the web object. We said, you know, 62 to 63% today. Two: I do exactly the same thing to the same user, with a different name, but now the V6 test is embedded in this glueless referral, and I have a lot of confidence in that DNS answer, because it's automated. No matter what the user's attention span, they can't stop it. It's the DNS that's taking over. So I can measure this with a much greater level of assurance.
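The glueless test can be modeled as a toy resolver: a referral without glue parks the original task, the resolver resolves the name server's own name from the root, and only then resumes. If that name server name is unique per test (so it cannot be cached) and has only an IPv6 address, completing the original query proves the resolver could use V6. A minimal sketch under stated assumptions: the zone layout, names, and documentation-range addresses are hypothetical, servers are looked up by exact query name rather than by zone, and the resolver asks for "an address" rather than separate A and AAAA queries.

```python
import uuid

ROOT = "192.0.2.1"        # illustrative "root" server address
V6_ONLY = "2001:db8::53"  # illustrative IPv6-only authoritative server

def build_test(test_id):
    """Hypothetical zone data for one glueless test.

    The target is delegated, with no glue, to a name server whose own name
    embeds test_id (so no resolver can have it cached) and which has only
    an IPv6 address.
    """
    target = f"t-{test_id}.example.com"
    ns_name = f"ns-{test_id}.example.net"
    servers = {
        ROOT: {
            target: ("NS", ["ns.example.com"], {"ns.example.com": "192.0.2.10"}),
            ns_name: ("NS", ["ns.example.net"], {"ns.example.net": "192.0.2.20"}),
        },
        "192.0.2.10": {target: ("NS", [ns_name], {})},  # glueless referral
        "192.0.2.20": {ns_name: ("AAAA", V6_ONLY)},      # NS name is V6-only
        V6_ONLY: {target: ("A", "192.0.2.80")},          # the final answer
    }
    return target, servers

def resolve(qname, servers, can_use_v6, server=ROOT, depth=0):
    """Toy iterative resolution; returns an address, or None on failure."""
    if depth > 8:
        return None  # guard against endless glueless chains
    if ":" in server and not can_use_v6:
        return None  # this resolver cannot reach an IPv6-only server
    kind, *rest = servers[server][qname]
    if kind in ("A", "AAAA"):
        return rest[0]
    ns_names, glue = rest
    for ns in ns_names:
        # With glue, go straight there. Without it, park the original task
        # and resolve the name server's own name from the root first.
        addr = glue.get(ns) or resolve(ns, servers, can_use_v6, ROOT, depth + 1)
        if addr is not None:
            # Resume the original task at the delegated server.
            return resolve(qname, servers, can_use_v6, addr, depth + 1)
    return None

target, servers = build_test(uuid.uuid4().hex[:12])
print(resolve(target, servers, can_use_v6=True))   # 192.0.2.80
print(resolve(target, servers, can_use_v6=False))  # None
```

In the real experiment, the signal is the authoritative side observing the resumed query. Here that is collapsed into whether resolution succeeds.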
And when I say, well, it's not 90%, but 70%, just a tad over 70% of users can resolve a name when the authoritative name server is V6-only, I'm pretty cool with that measurement. I think it's good. I'm much more confident in that 70% than I was in the 62. And you kind of go, well, Geoff, isn't the problem solved? It appears there's a loss rate of around 10% in converting from the DNS to the web, so by using a DNS measurement, I'm onto a better thing. And you go, well, cool. Let's break things up a bit. We're doing all these measurements all over the world, so let's look at the world country by country, and see whether this 10% difference is everywhere. And the answer is, the DNS is always surprising, and nothing lives up to expectation. When I go and look at the numbers for ISPs in Algeria or Libya or even Egypt, I get the shock of my life, because while there is the expected amount of web retrieval, roughly 60 to 70%, except in Egypt, where it's a bit lower at around 30%, I expected the DNS answer to be eight to 10% larger.

George Michaelson 45:11 Right, that's what you see in most economies.

Geoff Huston 45:14 Well, is that what I see in Algeria? No, it's not. I see only 2% of users can actually resolve it. 2%, yes: glueless is not supported there. Libya, just 10%. Egypt, 11 or 12%, something like that. So in each of those three countries, and there are a few others in a similar boat, our theory about the DNS is not right. Somehow, inside their DNS infrastructure, they go: you haven't given me the glue records for these delegations. I don't like this referral response. I'm not going to do it.

George Michaelson 45:51 The fact that, geographically, we're talking about three economies on the Mediterranean coast of North Africa makes it feel very like a local supply chain issue. Some aspect of their software or service supply chain is common, and it's not good.

Geoff Huston 46:08 Okay, so let's go a little bit further.
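The country-by-country comparison amounts to checking each economy's gap between the glueless-DNS result and the web result against the usual gap of roughly eight to 10 points. A minimal sketch, assuming the gap is read in percentage points (one interpretation of "eight to 10% larger") and using a hypothetical acceptance window of 0 to 20 points; the window and the labels are illustrative choices, not APNIC's method.

```python
def dns_web_gap(dns_pct, web_pct):
    """Percentage-point gap: glueless-DNS success minus web-fetch success."""
    return dns_pct - web_pct

def classify(dns_pct, web_pct, low=0, high=20):
    """Flag results outside a hypothetical 'expected' gap window.

    Usually the DNS figure sits a few points above the web figure; the
    low/high bounds used here are assumptions for illustration.
    """
    gap = dns_web_gap(dns_pct, web_pct)
    if gap < low:
        return "dns_below_web"  # e.g. glueless referrals not being followed
    if gap > high:
        return "web_below_dns"  # e.g. answers lost between DNS and the fetch
    return "consistent"

# Figures quoted in this conversation. Algeria's web figure is taken as the
# low end of the "60 to 70%" range mentioned for that group of countries.
print(classify(70, 62))  # global: consistent
print(classify(2, 60))   # Algeria: dns_below_web
print(classify(11, 30))  # Egypt: dns_below_web
print(classify(85, 40))  # Bolivia: web_below_dns
```

Both kinds of anomaly Geoff describes show up as gaps outside the window, in opposite directions.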
We do other measurements at APNIC, and one of them looks at which recursive resolvers ask questions of our authoritative servers. In Algeria, which is a good one to pick on, about half the users find that their queries are ultimately handled by Google's 8.8.8.8, and it's Google handing us those questions. In other places that use Google, glueless is just fine, but not in Algeria. [George: Wow] Is Google doing something funny for Algeria? No, Google doesn't do that. Is there something going on between the user and the externally visible resolvers in that country? And the answer is, oh yes, there is. And a similar situation in Egypt, and a similar situation in Libya. The proximity of those three countries tends to suggest that there is some kind of DNS filtering, or middleware, that's been sold to ISP operators in all three countries, that does some special handling before the query gets to the recursive resolver, and that doesn't like glueless delegation. [George: right] Okay, good theory. Let's now look at the opposite. I said the drop rate was around eight to 10%, which feels about right, except in Bolivia, where the drop rate is 60... Well, the DNS gives you an answer of, oh, about 85% of users can handle glueless delegation queries over V6 in the DNS. But when I pose the same problem as a web problem, and don't forget, the web object itself is dual-stack, only 40% of users actually manage to resolve that name. Whoa, that's not right. Ethiopia, similarly: 20% in the web, but 85% using the DNS. Myanmar, similarly. And it's kind of, huh, I'm lost. It's a dual-stack object, but somehow your loss rate in converting from the DNS to the web is really quite high, because most of those users do support queries over V6. The glueless DNS test showed that; we have evidence of that. Yet when you simply have a V6-only name server, simple question, simple answer: nah. Not.

George Michaelson 48:31 Again.
It's tempting to say, well, they've made a different choice in middleware that relates to aspects of behavior between the DNS and the web, and it isn't the same as the coding fault that means they drop glueless. It's some other aspect of this, but the net outcome is that the web doesn't work even when you have proof it should have.

Geoff Huston 48:52 But it's my browser. It's my web. The gluing of the DNS to the web happens on my processor, in my operating system, in my hands. It's not someone else playing sillies with me. It's me.

George Michaelson 49:07 Yeah, so what's been sold to end users in those economies that has this effect?

Geoff Huston 49:12 And you go, well, was it a flesh tone? It was a one-by-one pixel. Flesh tone? No, it's white. It's just white. Do they just hate one-by-one blots? Maybe they do. I don't understand why there's such a loss rate for a tiny, tiny web object with a totally innocuous domain name, but somehow it has attracted the ire of some kind of intercepting middleware, because that's what it seems to be, something that has taken exception. And so, I suppose the issue out of all of this is measurement. Measurement really does rely on some assumptions about the architecture and machinery of what you're trying to measure.

George Michaelson 49:51 You come out with a good feel: oh, the 80/20 rule. Here's the number, here are the error bars on the thing I'm trying to assess, and I tell you. But when you dig down into that 80/20, you find yourself saying, wow, this machine is way more complicated than I thought, and there are so many if-but-maybes. For some number of people in this system, it's different.

Geoff Huston 50:14 It's different.
And no matter what you think might be going on, you can probably find evidence of that, and of everything else, which is totally, totally weird. But it does, I suppose, illustrate that assuming a simple model of operation might be fine, but misleading, and that taking a more subtle view, and trying to establish measurements that expose those anomalous cases, leads you a bit further in understanding the broader picture. [George: yeah] The real result of all this: is the DNS really ready for V6-only? And the answer is, well, it might not be as bad as losing 40% of users, but you probably lose about 30%, and that's still bad.

George Michaelson 50:53 We're still on a trajectory. We need to get a higher number before we can really say we can head to that behavior. It's a bit like nudge theory: if we did this now, we'd actually be a roadblock on people's web experience.

Geoff Huston 51:06 If someone said, I'm a pioneer, I'm going to do V6-only DNS, the answer is, well, you know, a whole bunch of users won't be able to get to you, dude. That's just the basic answer. [George: yeah] That won't happen. Exactly how many? That's an interesting path, and, well, the answer you get depends on how you try to measure it. [George: yeah] Which was totally unexpected, and I think a salutary lesson in measurement.

George Michaelson 51:28 This is written up on your website, Geoff? You've got a report on this?

Geoff Huston 51:32 I wish I could say yes. My hands are still on the keyboard.

George Michaelson 51:35 So, coming to the website soon will be a report on this. That's been really fascinating, Geoff. That's great. Thank you.

Geoff Huston 51:42 And thank you, and thank you, dear listener, for hanging in with us. Thanks.

George Michaelson 51:47 If you've got a story or research to share here on ping, why not get in contact by email to ping@apnic.net or via the APNIC social media channels. Also remember the measurement@apnic.net mailing list on Orbit.
It is there to discuss and share relevant collaborative opportunities, grants and funding opportunities, jobs and graduate placings, or to seek feedback from the community on your own measurement projects. Be sure to check out the APNIC website for all your resource and community needs. Until next time.