Geoff Huston 0:00 You know, it's often said in places in the world where you have to toot your horn to stop accidents, that you know, the person who has an accident wasn't tooting often enough and loud enough, and that was the problem. And the problem with this bit bonding wasn't the fact that there were engineers down in the voice systems and so on doing things on the circuit. It's the fact that you didn't cross all your toes and all your fingers enough and weren't genuflecting often enough. It's all your fault. Equally silly in terms of attribution engineering. But you're right. You know, trying to fit a more stringent application on top of a system engineered for something completely different was always going to be difficult. But the point was, the only way you got better speed out of this system, at least at that level, scunging voice circuits, was to try and sort of sticky tape the voice circuits together. George Michaelson 0:56 You're listening to Ping, a podcast by APNIC, discussing all things related to measuring the Internet. I'm your host, George Michaelson. This time, I'm talking to Geoff Huston from APNIC Labs again in his regular monthly spot on Ping. Geoff and I talked about making things go faster than the visible bandwidth of a single link. How is that even possible? We've been using the same basic techniques to manage this problem since the dawn of digital communication. We send different bits of the data stream down different links and reconstruct it at the other side. When you come down to it, there isn't really a good alternative, but this introduces a problem of sequencing. What do you do when the bits arrive out of order, or worse yet, not at all? Different protocols at different layers in the protocol stack have now come to the fore in this engineering problem. Geoff reviews some of the past history and the kinds of problems we're now hitting as the underlying network gets faster and faster. Geoff, welcome back to Ping.
What should we talk about this time? Geoff Huston 2:03 Oh, hi, George. Look, it's good to be back today. I actually want to talk about, how can you make things go faster than the components underneath them? George Michaelson 2:12 Throw them harder. Geoff Huston 2:13 Yeah, blow at it harder. How do you kind of extract greater performance than any individual element can? And this is not a problem that's unique to networking. I remember in the days of mainframe computers, which, you know, there are still a few of them around, I guess, and the way they kind of solved the problem of, how do you make a bigger, faster computer, was kind of put two of them together. Double the speed? That's not linear speed, but it certainly doubled the throughput. George Michaelson 2:43 Do you remember Seymour Cray's comment about people who were ganging up processors? He said, would you rather have two oxen or a million chickens pulling your plow? So there's kind of history here that scaling up by doing more at the same time, there's some skepticism. Geoff Huston 2:59 I have news for Seymour, I really do. And the answer is a retrospective hi from a million-chicken world, because, you know, we went there. George Michaelson 3:10 Let's put it back in the network context, though. What about a network and the limit of speed that you've got available? What is it here that's allowing you to get faster than you think you can? Geoff Huston 3:21 Well, it's exactly the same approach as with computing, that, in essence, if you're trying to get more jobs through a computer, and not necessarily this job to go faster, but just more jobs per hour, then assigning more computers to the problem, more processors, etc., going parallel, actually solves your problem. Now, in some ways, it's not the most efficient solution. Typically, when you scale up, you want benefits that are way in excess of linear, and simply adding more, plus one plus one plus one, is more of a linear problem.
It doesn't make the unit costs any cheaper, but it does solve your problem. Now, we've encountered this in networking probably from the year dot. And by dot, I mean, you know, somewhere around the late 70s, early 80s, when we started building computer networks leeching parasitically on the back of telephone networks. And at the time, the telephone networks were just going through a tranche of digitalization, in that rather than carrying analog sort of streams of signal for each voice conversation, we actually digitized them. And inside the telephone network, if you could peel apart all these virtual switching layers, what you found is that each conversation was actually a stream of bits, right? Yeah. And the way the human voice works, most human voices, is that, as long as kind of the basic aim is intelligibility and clarity, you can digitize a human voice stream by doing, oh, 8000 samples a second. George Michaelson 4:58 Really quite a small number. When you think about it, it's not a massive amount of sampling, is it? Geoff Huston 5:04 Well, George, I don't know about your soprano voice, but mine wouldn't even make eight kilohertz. George Michaelson 5:11 Yeah. The thing is, both of us worked at the tail end of this process in telecommunications-related roles. I was working for a minor player, while you worked for the major player, the former monopoly telco, and we actually exploited a niche in the market of ideas here. You, the big guys, you were meant to use the full width of a voice channel to send a message between two points long distance, because otherwise there would be too much quality loss, this weird idea called the QDU. And we, as the on-top carrier, we were allowed to compress and send either two or four voices down the same channel. And it was a really funny business working out, oh, the quality was not good at the four-channel end of things. Geoff Huston 5:58 Yes, as I was getting to: 8000 samples a second.
And in the original spec, each sample effectively said, I'll give you one of 256 different volume levels. And so every 8000th of a second, it sent an eight-bit value, which was the volume at that point. And that would actually accurately reproduce the signal of anything up to four kilohertz, and your best soprano voice and mine can't go above about 3.9 kilohertz. So it kind of worked, right? But the issue is, yes, you can, with enough cleverness in the way you encode it and decode it, cram more in. And as soon as sort of this standard came along, then along came seven bits at 8000 hertz, probably enough, used by the Americans, 56 kilobits. And then came four bits at 8000 hertz, which is two-to-one compression. And it even got further than this, actually getting down to two bits at 8000 hertz, which is one of the higher compression algorithms, and they used that in the mobile phone industry, and just sort of squished the voice down to the point where, as you say, it was pretty heavily distorted, [George: yeah] but by God, it made use of the network. Okay? George Michaelson 7:18 We stopped talking about QDUs very rapidly in this world. We stopped actually having some idea of the amount of nastiness in the voice call we would tolerate, because we realized people would actually tolerate a hell of a lot more nastiness for a cheaper call. Geoff Huston 7:36 The telco benchmark, I think, was 13 QDUs for any conversation, including international, which was a bit of a joke, really. But I come back to our story. You see, we started networking as an overlay across basic circuitry, which was, I'll strap up A to B and I'll nail up a voice call, because we are a voice switching network. And if you lived in America, that was the 56 kilobit per second line: seven bits and 8000 cycles a second.
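The codec arithmetic Geoff is walking through is just sample rate times bits per sample. A minimal sketch of the rates he lists (the function name is ours, not anything from a telephony standard):

```python
# Voice channel bit rates implied by sample rate x bits per sample.
def bit_rate_bps(samples_per_second: int, bits_per_sample: int) -> int:
    """Raw PCM bit rate: samples per second times bits per sample."""
    return samples_per_second * bits_per_sample

# 8 bits at 8000 samples/sec: the classic 64 kbit/s voice channel.
assert bit_rate_bps(8000, 8) == 64_000
# 7 bits at 8000: the American 56 kbit/s channel mentioned above.
assert bit_rate_bps(8000, 7) == 56_000
# 4 bits at 8000: 32 kbit/s, two-to-one compression.
assert bit_rate_bps(8000, 4) == 32_000
# 2 bits at 8000: 16 kbit/s, the heavily compressed mobile case.
assert bit_rate_bps(8000, 2) == 16_000
```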
Oddly enough, because of framing and signaling, in the sort of European slash British oriented world, and that included Australia, you got 48 kilobits, because they stole two bits on each frame for signaling and everything else. So as fast as you could go from A to B was 48 kilobits. Interesting. I want to go faster. Fascinating. What are we going to do? Now, I can't give you a single 96 kilobit per second circuit, because I don't have one. I can give you two 48 kilobit per second circuits. Great. Okay, what can I do with them? George Michaelson 8:45 How do you manage having two things that are kind of half what you want? Can you use them in a way to make it look like one thing? Geoff Huston 8:55 Well, that was the first kind of approach, and it persisted for quite some time in the industry, but only really worked at the kilobit per second rate. It was called bonding, where you actually look at these two circuits and you assume, fingers crossed, toes crossed, you know, face in the appropriate direction and genuflect many times, that both circuits will remain absolutely stable. And then you go with your bits, 1 0 1 0, your A B, A B, A B, A B, and hope the other end locks in and that it sort of puts the bits back together in precisely the right order. George Michaelson 9:32 Yeah, because an A A, B B moment isn't doing you any favors whatsoever here, right? You need to be able to reliably sequence everything you route down this. Geoff Huston 9:43 Yes, I remember we did a link from Melbourne to Perth for about a year and a half on bonded 64k circuits. We managed to get as high as 256 kilobits per second, but beyond that, it's not going to go. So bit-level bonding kind of works, but the underlying fabric needs to be absolutely rock steady, because otherwise the receiver is kind of clueless. George Michaelson 10:06 Yeah, and you have to also remember, this is technology that was never, ever constructed and built out to be clean for digital. It was built for voice.
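Bit-level bonding as Geoff describes it is a round-robin stripe of the bit stream across circuits, with reassembly that only works while every circuit stays in lockstep. A toy sketch (all names are ours), assuming perfectly stable lanes:

```python
def bond_split(bits, n_circuits=2):
    """Stripe a bit stream round-robin (A, B, A, B, ...) across circuits."""
    lanes = [[] for _ in range(n_circuits)]
    for i, b in enumerate(bits):
        lanes[i % n_circuits].append(b)
    return lanes

def bond_merge(lanes):
    """Receiver reassembles by reading one bit from each circuit in turn.
    This only works while the circuits stay in lockstep: any slip and the
    receiver has no way to tell A-bits from B-bits."""
    out = []
    longest = max(len(lane) for lane in lanes)
    for i in range(longest):
        for lane in lanes:
            if i < len(lane):
                out.append(lane[i])
    return out

stream = [1, 0, 1, 1, 0, 0, 1, 0]
assert bond_merge(bond_split(stream)) == stream  # lockstep holds: bits come back in order
```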
It was built for analog signaling. And an engineer in a switching fabric could have reasons that he needs to randomly put another 200 meters of cable between two points in this system, for a number of reasons. These two things might no longer actually have the same imaginary length between A and B. There are any number of reasons they could get out of sync with each other. Geoff Huston 10:37 You know, it's often said in places in the world where you have to toot your horn to stop accidents that, you know, the person who has an accident wasn't tooting often enough and loud enough, and that was the problem. And the problem with this bit bonding wasn't the fact that there were engineers down in the voice systems and so on, doing things on the circuits. It's the fact that you didn't cross all your toes and all your fingers enough and weren't genuflecting enough. Yeah, it's all your fault. Equally silly in terms of attribution engineering. But you're right. You know, trying to fit a more stringent application on top of a system engineered for something completely different was always going to be difficult. But the point was, the only way you got better speed out of this system, at least at that level, scunging voice circuits, was to try and sort of sticky tape the voice circuits together. George Michaelson 11:27 Yeah, use them in a way that, although you absolutely knew at some lower level these are four discretely different things, you treated them, through infrastructure costs that you had to wear at both ends, as one fatter pipe, four times the width, probably losing a bit of overhead managing this thing, so maybe three and a bit times better. But Geoff Huston 11:49 yeah, you can actually get all four if you use the technique in a router rather than in a bit-level driver. So instead of down at the line driver circuit, where you're going A B, A B, Christ, I hope this is working, A, B, was it A, or was it B?
A, B. Instead, up at the router, you have interface A and interface B, and your currency is now packets, [George: right]. And so you can take these individual packets that you're switching through the router, and a naive person, a very naive person, and silly, as it turns out, might simply go, you're all destined to the same next hop. You're all going to go to the B end, whatever your ultimate destination would be. Packet A, interface one; packet B, interface two; 1 2 1 2 1 2. And because the maximum size of packets is actually pretty low, 1500 bytes or so, the two systems won't get out of order that much. You know, you won't be transmitting a huge packet for three hours down one line while the other line is taking all the load. You won't be doing that. When you do sort of packet-level, you know, alternate switching, things don't get out of order too much, but they do get out of order, right? George Michaelson 13:00 So there is a cost consequence emerging here. Geoff Huston 13:04 Well, let's think about now going up a level in the protocol stack to our good old friend TCP. You see, TCP is the piece of magic. So far, when I'm doing A B, A B, packets might get out of order. I have violated none of the rules of IP. You know the golden rule? It's a datagram network. A, I'm allowed to drop packets. What? Of course I am, every packet is an adventure. B, I'm allowed to reorder them. Really? Yes, yep, the datagram. George Michaelson 13:33 No guarantee of order of delivery, if you're just at the IP layer. Geoff Huston 13:37 Yeah, no guarantee they won't get duplicated. All kinds of things happen at IP, and all of them are legal. All of them. It's TCP's job to try and make sense and say, here's an ordered sequence of packets, 1 2 3 4 5 6, and present them to the application correctly. It's the engines at either end of this network connection that have all the work to do, right?
So they're sitting there analyzing individual TCP packets that come to them, going, I guess you're the next one in order. That's good. I'll acknowledge it. Tick, send back an ACK. You know, I've got this packet. And that's how you kind of reassemble things. And when you go 1 2 4 5 6, it's going, oh, I'm missing number three. Hello, hello. The last good packet I got was packet two. Let's resume at packet two and press on, which seems a bit useless, but it kind of works, right? George Michaelson 14:32 Well, it kind of works, but it incurs this problem that it's almost like you throw away all the subsequent ACKs. If only you could come up with an engine that said, I'm going to hang on to things a little bit longer and maybe ask just for the hole to be filled in, you could do a little better. But that means you've got to hang on to things, you're adding delay. Geoff Huston 14:52 Well, there's an odd part about all of this. So in TCP, an out-of-order packet, a packet that kind of isn't the next one in sequence, is treated as, well, I'll hang on to it, but I'm going to send you back an indication that there are lost packets. So when I receive packets 1 2 4 5 6, the real information I need to convey back to you is, I lost it at packet three. I was expecting three. I got two, but I didn't get three. Whatever else you've sent, I'm missing three. So what I do is, for 2 4 5 6, I keep ACKing two. The ACK stream goes: one, ACK; two, ACK. When I receive four, I go, two, ACK. When I receive five, I go, two, ACK. When I receive six, I'm shouting at you. Doesn't matter. Well, it does. The theory is that you, because you keep a copy of everything you sent until it gets ACKed, realize eventually, because, you know, I'm glossing over a bit at this point, that three hasn't been ACKed. So you move back your send pointer and you send three.
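The cumulative ACK behavior Geoff walks through, where a hole in the sequence produces a run of duplicate ACKs, can be sketched as a toy receiver (names are ours, not real TCP code):

```python
def cumulative_acks(arrivals):
    """For each arriving segment number, emit the cumulative ACK a classic
    TCP receiver would send: the highest in-order segment seen so far.
    Out-of-order segments are buffered; a hole produces duplicate ACKs."""
    buffered = set()
    highest_in_order = 0
    acks = []
    for seg in arrivals:
        buffered.add(seg)
        # advance the in-order pointer across any now-contiguous segments
        while highest_in_order + 1 in buffered:
            highest_in_order += 1
        acks.append(highest_in_order)
    return acks

# Segments 1 2 4 5 6 arrive: the ACK stream is 1, 2, 2, 2, 2 --
# three duplicate ACKs of 2 signal that 3 is missing.
assert cumulative_acks([1, 2, 4, 5, 6]) == [1, 2, 2, 2, 2]
# Retransmitting just 3 fills the hole and the ACK jumps straight to 6.
assert cumulative_acks([1, 2, 4, 5, 6, 3]) == [1, 2, 2, 2, 2, 6]
```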
Now, if you just send three and nothing else, in this sort of theoretical model, I put three in the hole that I have, [George: yeah], and then I notice that I've received three, and I've got four, and I've got five, and I've got six: ACK six. George Michaelson 16:19 Right, so you can fill in the gap at the cost of keeping enough buffer to hold those packets, right, and there's a bit of delay here. Facing upwards, you can't pass things on in clean conscience till that hole has been filled. Geoff Huston 16:33 Right. So you need a fair deal of knowledge and timing sensitivity that, you know, quite frankly, doesn't exist very much. So out-of-order packets are a nightmare. TCP gets horrendously confused. They cost, because out-of-order packets are assumed to be lost packets, and in the worst case you go back to the loss point and send everything again. George Michaelson 16:53 Yep. So I figure, Geoff, based on your line of reasoning, that where we're going with this is, well, if you think you get out-of-order packets when you've got one thing underneath you, what do you think happens when you've got two or three or four, right? Is that where we're going? Geoff Huston 17:10 Well, that's part of it. The problem gets magnificently worse for TCP, right? Magnificently worse, because once you get three duplicate ACKs in TCP, the conventional TCP, the TCP of the biblical age, says, ah, that's catastrophic. Shut down everything. Start afresh. Let's do a slow start again, from one packet. No, no, no, not anymore. Close that stuff down. Shut down, because I've just had three duplicate ACKs. So this is a disaster. One packet again. George Michaelson 17:44 Three in a row. Hello. That's not good. Geoff Huston 17:47 So you were ganging together four lines, and if you lost one of the packets on one of the lines, you guaranteed you were going to get three duplicate ACKs, if you're just doing it that way.
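The "biblical age" sender reaction Geoff describes, treating three duplicate ACKs as catastrophe, might be sketched roughly like this: a Tahoe-flavored caricature, not any real stack's code, with all names ours:

```python
def react_to_acks(acks, cwnd: int):
    """Old-school reaction as Geoff describes it: count duplicate ACKs,
    and on the third duplicate treat it as loss -- retransmit the missing
    segment and collapse the congestion window for a fresh slow start."""
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                # the segment after the duplicated ACK is the missing one
                return {"retransmit": ack + 1, "cwnd": 1}
        else:
            dup_count = 0
            last_ack = ack
    return {"retransmit": None, "cwnd": cwnd}

# The ACK stream 1, 2, 2, 2, 2 carries three duplicates of 2:
# segment 3 is retransmitted and the window collapses to one.
assert react_to_acks([1, 2, 2, 2, 2], cwnd=10) == {"retransmit": 3, "cwnd": 1}
```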
So you think about this, and you think, something's missing, and this is not working for equal cost multi path. What can we do? George Michaelson 18:03 Equal cost multi path? That's kind of the name for the situation when, you know, you've got four things, and you think they're pretty much the same weight, and you think they're the same delay, and you want to treat them as splits of something you're going to gang up together to make a fatter unit to get between two places. Geoff Huston 18:22 I've got four 64k circuits, and I want to mimic a quarter of a meg. I've got four ten gig circuits, and I want to mimic 40 gig. Yeah, same problem, just 1000 times faster, but same problem. George Michaelson 18:34 So, equal cost multi path. That's a great phrase. I'm going to hang on to that. Geoff Huston 18:38 Well, fair enough. And the issue is, what could you do about it? Now, before I talk about going on with TCP, I just want to talk for a second about that evil word fragmentation. IP fragmentation. George Michaelson 18:53 Oh, we have talked about that. We have talked about that so many times, particularly about IPv6. Geoff Huston 19:01 Yes, I was gonna say evil, particularly if you've got your v6 glasses on your head. Yes, very evil. But oddly enough, it is actually amazingly resilient for packet reordering. George Michaelson 19:14 Wait, run that one again. Frags are bad, but, but Geoff Huston 19:19 when I effectively take a set of frags of one packet. So I have a single packet, and I apply the fragmentation dicer, and I slice and dice this packet, and then I send them to you in purely random order. No attempt to do any kind of ordering, none. And as long as I send them continuously, I sort of push it out through all my paths simultaneously. Yeah, the other end goes, yummy, yummy, yummy, not a problem, urgle urgle urgle, and up a layer is the completed packet.
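The reason shuffled fragments reassemble cleanly is the point Geoff makes next: each fragment carries its position within the original packet. A toy version (a plain byte offset stands in for IP's fragment offset field, which is counted in 8-byte units; all names are ours):

```python
import random

def fragment(payload: bytes, size: int):
    """Dice a packet into (offset, chunk) fragments. Real IP fragments
    carry an offset plus a more-fragments flag; a plain byte offset is
    enough for this sketch."""
    return [(off, payload[off:off + size]) for off in range(0, len(payload), size)]

def reassemble(frags, total_len: int):
    """Arrival order is irrelevant: each fragment says where it fits."""
    buf = bytearray(total_len)
    for off, chunk in frags:
        buf[off:off + len(chunk)] = chunk
    return bytes(buf)

packet = bytes(range(100))
frags = fragment(packet, 16)
random.shuffle(frags)          # send them in purely random order
assert reassemble(frags, len(packet)) == packet
```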
George Michaelson 19:53 So it'll de-duplicate them, it'll reorder them, and it'll pass up a valid packet, if you just do your worst and chuck the fragments out there, right? Geoff Huston 20:03 Because there's a piece of information in an IP fragment that isn't necessarily there in a TCP segment. And it's actually, if you will, the fragment's address within the packet that you just sliced and diced. Hi, I'm fragment number three. I'm fragment number seven. I'm fragment number two. George Michaelson 20:22 The fragment knows where it fits in the entire packet. You know, you're building up a picture of what you've got and what you're missing. Geoff Huston 20:29 Yes, and if you actually had used that kind of sequencing addressing in TCP, TCP would be equally resilient. But we didn't do that. We used a single sequence counter, which doesn't give you much information at all, whereas IP fragments actually use two: I'm fragment number n of m. Fascinating. So one of the ways to get around this is to fragment everything. George Michaelson 20:56 I don't think we're really going there. Geoff Huston 21:00 I'm not, I'm not. The other way of doing this, actually, is to make, I suppose, an assumption which the purists of the world, and I think the last purist died in about 1980, the purists of the world would say is a layer violation. You reach in and look at the header of the transport session. Oh, look, there's a source address and a destination address, that's in IP; there's a source port and a destination port, no, that's a TCP or UDP construct; and there's a protocol number, I'm TCP or I'm UDP. So the protocol number, the two port numbers and the two IP addresses are actually a unique signature of a session. [George: Yeah]. So if you and I are having two conversations simultaneously, one of those values would be different, [George: right] probably one of the port numbers.
So George Michaelson 21:51 the packets arrive at the air quotes same place, because the same source and destination address are on it. And it's possible one of the port numbers is the same port number, but one of the other ones is going to be different, because someone has said, give me a random port to send some stuff, and you get a different random number. Geoff Huston 22:09 So let's say I take this value now. That's two addresses, 32 bits each, that's 64 bits; two port numbers, that's another 32 bits; 64 plus 32, plus eight for the protocol number, I can't add this stuff this evening, yep, 104 bits. And I hash it. Let's say I've got four lines: zero, one, two, and three. I hash this into a value between zero and three. George Michaelson 22:38 We're going to have to start calling this podcast the hash podcast, because you were talking about hashes in the context of NSEC3 DNS last time, Geoff. Hashes are unbelievably powerful and useful, aren't they? Geoff Huston 22:52 We should pay mathematicians more money. They're brilliant people, the lifeblood of civilization. I speak as a reformed mathematician. George Michaelson 23:01 I was just about to say, what is your degree classification? So you take the five tuple values, you get a 104-bit number, and you generate a hash from it. Geoff Huston 23:10 And the same set of values in that 104 bits will always give you the same hash, always, right? But a random selection of ports and, you know, destination addresses, etc., will give you a pretty good distribution across your, you know, zero to three, across your four ports. So what this means, oddly enough, is, if you can do this quickly, and it's really easy to do, take 104 bits and hash them, return back basically two bits, zero through three, or however many I want, and that's the selection of which interface to use. Every session will go down the same interface all the time.
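The five-tuple hashing Geoff describes can be sketched like this. SHA-256 is just a convenient stand-in for whatever hash a real router implements in silicon, and all the names and addresses are ours:

```python
import hashlib

def pick_interface(src_ip, dst_ip, src_port, dst_port, proto, n_links=4):
    """Hash the five-tuple down to a link index. The same session always
    hashes to the same link, so its packets never get reordered across links."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links

# The same session always lands on the same link...
a = pick_interface("192.0.2.1", "198.51.100.7", 40001, 443, "tcp")
assert a == pick_interface("192.0.2.1", "198.51.100.7", 40001, 443, "tcp")
assert 0 <= a < 4

# ...while many sessions with varying source ports spread across the links.
links = {pick_interface("192.0.2.1", "198.51.100.7", p, 443, "tcp")
         for p in range(40000, 40100)}
assert len(links) == 4
```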
George Michaelson 23:49 Wait, wait, wait, wait. So if I'm trying to load balance things happening through me, and I've got a cheap way of making two bits, four values in two bits pretty much cover an even distribution of 00 01 10 11, and the other thing you just said, the other part of this deal, is that if I am a particular pattern of source, destination, random port, specific port, I always wind up in the same two-bit value, which means, if you're using that to address the port, I will always go down the same line, always. Geoff Huston 24:28 Right. So if I've got equal cost multi path across four 10 gig paths, a single session will only ever see a 10 gig path. But if I've got a whole bunch of traffic going down these four by 10 gig circuits, which are equal cost multi path, a bunch of traffic, the aggregate throughput I can push up to 40 gig. George Michaelson 24:49 You're moving the goal posts a little. I mean, not in a bad way. The net outcome is beneficial. You make efficient use of all the bandwidth you've got available. But no individual person, for one TCP session, is getting 40 gig. That ain't gonna happen. Geoff Huston 25:07 That's a very magic phrase you just said there, George: no individual TCP session can get more than one of the individual paths in my path group. George Michaelson 25:18 On the other hand, you are using your four links absolutely as efficiently as possible, even spread of traffic, very few wasted moments. It's quite simple for TCP to reconstruct what's gone on down that pipe. Geoff Huston 25:33 Well, it's in full sequence, no out-of-order packets. So for a little bit of layer violation, little bit of layer violation, routers are now looking at TCP headers, whoops, and UDP, a little bit of that, I get back an amazing, an amazing amount of payback. You know, how do we do these days, 800 gig circuitry? George Michaelson 25:53 Well, you can't buy 800 gig things off the counter.
Can you? You have to buy multiples and gang them up together somehow. Geoff Huston 26:01 You just said it. That's how we do it. And so a lot of this actually relies on this ability at the network layer to look inside the end-to-end transport layer and go, look, I don't want to be a meanie here. I don't want to be a bad person. I will try and keep sessions together so that I don't give you a flurry of out-of-order packets, and then we're cool, aren't we? And the answer is, well, basically, apart from a few folk bleating in the corner down the back of the room, going, no, I'm not happy. George Michaelson 26:30 I'm not happy. I wanted all of it. Geoff Huston 26:33 I wanted all of it. George Michaelson 26:34 Can you do better? I mean, that's the golden question, right? Is there a moment out there where we could do better? Geoff Huston 26:41 Well, this is the issue of where do you want to do it, and by where, I mean network, transport, application. Of course you can do better. You've just got to look at the right spot. And in this particular case, if you're using TCP, if you're using TCP and you really want to get much greater bandwidth across some of these scenarios, you use a very elegant, and I'll call it a hack, but it's actually an elegant piece of engineering, called multi path TCP. George Michaelson 27:10 So we were on equal cost multi path IP, and we are now on multi path, Geoff Huston 27:17 multi path IP, George Michaelson 27:19 multi path TCP, Geoff Huston 27:21 right, where I establish multiple TCP sessions between the same two end points. Now, they'll have the same source and destination IP, whoopee doo, but they'll have a different port number in those port pairs. That's how you distinguish them. So I get a session to you with port one, another session with port two, another with port three, oh, port 128, who cares? George Michaelson 27:45 Yeah, it does not have to be that the port numbers are in a sequence; that is not part of the magic sauce here.
The point is that there are four things each of us knows is the other party. And if we talk into them, the other party gets what comes out. But Geoff, these are TCP sessions, so each of them is a reliable stream of bits, and I've now got to divide what I want to do into those four things, right? I mean, am I having to consciously chunk my stuff up into this? How do I do this? Geoff Huston 28:16 Yes, you are going to chunk your stuff up, and you're going to assign a chunk down a path. And so you've got independent TCP control sessions, and your data, you take it, you chunk it up, and you assign a chunk to a single path. And that works actually better than you thought, much better than you ever possibly could believe. Why? Ah, there's this thing about TCP and friendliness. You see, what TCP tries to do when there are multiple independent TCP sessions, let's say there are 20 of them, is to equilibrate amongst them such that each independent TCP session gets its fair share, 1/20 of the bandwidth. George Michaelson 28:58 Yeah. Now there's a twist here. We've talked a few times on Ping about how that really depends on all the TCPs having a similar idea of what fair means. But let's push that to one side, that they all agree about what fair means. They do a fair shares div between them. Geoff Huston 29:18 And with my single path TCP, I'll get one nth of the common network resource, tops. [George: Yeah] With two TCP sessions, I'll get, hmm, two nths. George Michaelson 29:31 Because for the network, it doesn't know that these two things correlate. As far as it's concerned, it's got to work with your TCP to make fair shares happen, so you get two of the units. Geoff Huston 29:43 And in fact, it's not a conversation between me and the network. TCP fair sharing is a conversation between me and all the other TCP sessions. So if I can exert more pressure on the network, I get more value. And by having, oh, 20, 100, 1000 multiple TCP sessions, I'm the bully on the block.
I'm pushing everything else to one side. So not only does it kind of get me around this issue of I can only go the speed of a single path in a multi path environment, I can actually dominate the multi path environment, woo hoo, and there's no way the network can actually arbitrate that without introducing more complexity. George Michaelson 30:28 Right. But I've now got complexity. So I want to send two gigabytes of buffer to you, and I've constructed four TCP sessions between you and me. What tricks have we got on the shelf that will allow us to divvy these up amongst the TCP sessions? Do we do a version, before pushing it out to TCP, of that trick you were doing, hashing on values and using the bottom bits as a distribution across the paths? How do I manage this, Geoff? I've got to do it. Geoff Huston 30:59 Oh, now you can actually do this in much the same way as BitTorrent works. A file is just a sequence of blocks on a disk, and let's think of it as a queue of transfer requests. Each one is a standard size, whatever the block size of the file system might be. Let's say it's a kilobyte, so there's a queue of one kilobyte requests. Now, I have 10 multipath TCP sessions. So I assign the first 10 blocks to the first 10 TCP sessions. Off we go, yep, [George: yep]. The first one to finish its block gets the next block. The second one to finish gets the next block. And so I don't pre-determine which block goes through which path. [George: right] If one path is awesomely fast, it gets more blocks. And if one path is awesomely slow, it gets far fewer blocks. George Michaelson 31:49 Right. And I've sat watching the visual display of how your blocks are moving in a BitTorrent type situation, I use it to download ISO images when I'm doing operating system upgrades, and it's kind of a weird scatter gun of blocks being sent. It's not a linear order. So it fits the file model, Geoff, on the presumption I need all of the file to get my job done.
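The BitTorrent-style scheduler Geoff describes, where whichever path finishes its block first gets the next one, can be sketched as a small simulation. The blocks-per-second speed model and all the names here are ours:

```python
import heapq

def schedule_blocks(n_blocks: int, path_speeds):
    """Assign file blocks to paths: whichever path finishes its current
    block first gets the next one. path_speeds are blocks per second;
    returns how many blocks each path ends up carrying."""
    # heap of (time this path next becomes free, path index)
    free_at = [(0.0, i) for i in range(len(path_speeds))]
    heapq.heapify(free_at)
    carried = [0] * len(path_speeds)
    for _ in range(n_blocks):
        t, i = heapq.heappop(free_at)
        carried[i] += 1
        heapq.heappush(free_at, (t + 1.0 / path_speeds[i], i))
    return carried

# A path ten times faster than its partner ends up carrying roughly
# ten times the blocks, with no block pre-assignment needed.
fast, slow = schedule_blocks(110, [10.0, 1.0])
assert fast > slow
assert fast + slow == 110
```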
But there are use cases in data transfer where this isn't necessarily a brilliant fit. Sometimes you want to work on partial content, and you've now got to back off and think, okay, I've got limits on exactly what partial means here, because if you want to work on the first 10 gig of this file, I can't randomly send all of the back of it. I've got to know that that's what you want to do. So there is a bit of kind of application awareness complexity that comes in here. Geoff Huston 32:40 Right. But it certainly solves a problem, doesn't it? George Michaelson 32:43 Oh, my goodness, I'm getting this file through 10 times faster. Geoff Huston 32:47 Exactly, for a particular class of problem. By utilizing the application layer, you can put more pressure on the network layer to actually improve your position against everyone else. Mind how you do it, but yes, improve your position. George Michaelson 33:02 Is this a theoretical model, or is this something people actually went out and coded? Geoff Huston 33:06 Oh, people coded it. Multi path TCP is out there in the RFC standards. There are implementations. I dare say, if you fossick around, you'll find it, certainly for things like Linux. It was said that Apple implemented it for Siri, and people looking at Siri with a packet tracer actually said, yes, Siri was implemented using multi path TCP. Bizarre. George Michaelson 33:28 I think I remember a conversation with you about how quickly Apple were able to ship downloads to people, and it looked like, if your handset, your phone, knew it had both WiFi and cellular, they were doing some things to use both channels simultaneously to fetch those assets. Geoff Huston 33:44 And you're willing to, if you will, tell the device that the cost was the same in dollars. [George: Yeah] Use them both. And as soon as you could say that, use them both.
I can actually use the WiFi and the cellular data connections and run the two together, and whatever's faster just gets more data, but I can use them both at the same time. It's quite clever. George Michaelson 34:05 Wow. So this is not a theoretical thing. This is out there in the world as a real way of using more bandwidth underneath. Geoff Huston 34:14 It didn't get an awful lot of traction. Siri was almost the only app that used it in the Apple ecosystem at the time. I'm not sure it's expanded any further, but it's sort of an interesting case in point. But I want to progress the story. George Michaelson 34:26 There's more. Geoff Huston 34:28 Oh god, there's more. You see, the next thing to come along, and it's now been 10 years, is QUIC, right? George Michaelson 34:34 QUIC being another mechanism to get reliable bytes between things. But it's kind of not TCP, is it? Geoff Huston 34:41 Well, it's TCP hidden inside UDP through encryption. So what you actually see is just a set of UDP packets. You can't see the TCP control systems. It actually has support for multiple TCP-like streams, but it's all inside one single UDP flow state, you know, right? George Michaelson 35:06 Yeah, but UDP is not TCP, Geoff. Geoff Huston 35:09 How do you load balance QUIC? George Michaelson 35:10 Well, UDP is not TCP, so you can't do the trick of using packet tricks in TCP-specific ways to reliably steer the same TCP session down the same path. All the TCPs are inside this UDP. So yes, Geoff, how do you make this fly? Geoff Huston 35:28 It's really hard, because all the embedded TCP sessions inside your QUIC session fate-share. All of them fate-share down one path. George Michaelson 35:36 This isn't sounding good.
Geoff Huston 35:38 Well, it's kind of a bit of a step backwards, because again, many things are trade-offs here, and by deliberately not exposing your structured flow information to the network, the network just cannot compensate for multi-pathing at the network level. It can't. All the packets with the same UDP header set go down the same single path. You go, well, I can live with that. The answer is: can you? George Michaelson 36:06 Sounds like we've just decimated the available bandwidth for me as a consumer. A good outcome might be that all these UDP flows are equipoising, the network providers balancing them over links, but I'm not seeing the benefit as me, as an individual. Geoff Huston 36:22 That's kind of where it's heading, isn't it? And it is a bit of a to-and-fro between what you expose to the network and what you get as a benefit back from the network. But the issue is, quite frankly, you and I aren't given a say. George Michaelson 36:36 Hang on. What do you mean? Geoff Huston 36:38 Your browser, statistically, is made by Google and called Chrome. George Michaelson 36:42 Yes, that's true. Geoff Huston 36:43 It's making decisions on your behalf along these very trade-offs, deciding what's best for you. And you don't get a say. You don't get anything of that. George Michaelson 36:53 Right. So there is a sort of fantasy that we have a degree of control over these decision logics on how things are done. If you are a technologist, you can probably write systems to exploit this. But in the general, ordinary case of applications on phones, tablets, laptops like this, it's not my call. It's some intermediary writing code, deciding how they're going to package it, and if they decided QUIC is the one, I'm not getting any multi-path outcomes. From what you're saying, there isn't a wait-there's-more moment here, is there, Geoff? Geoff Huston 37:24 Not down this particular path, no.
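The fate-sharing Geoff describes falls out of how ECMP-style load balancing works: a router hashes the packet's 5-tuple (source address, destination address, protocol, source port, destination port) to pick an outgoing link. Every packet of a single UDP (QUIC) flow carries the same 5-tuple and so lands on the same link, while parallel TCP sessions with distinct source ports can spread out. A hedged sketch, using an illustrative hash rather than any real router's algorithm:

```python
# Toy ECMP-style load balancing: hash the 5-tuple and use the result
# to pick an outgoing link. Illustrative only -- real routers use
# vendor-specific hardware hashes, not SHA-256.
import hashlib

def pick_link(src, dst, proto, sport, dport, num_links):
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# One QUIC session: every packet carries the same UDP 5-tuple, so every
# packet hashes to the same link -- all its internal streams fate-share.
quic_links = {pick_link("10.0.0.1", "192.0.2.1", "udp", 50000, 443, 4)
              for _ in range(1000)}
print(quic_links)  # one element: the single link the whole flow uses

# Four parallel TCP sessions: four distinct source ports, so the hash
# can place them on different links across the parallel paths.
tcp_links = {pick_link("10.0.0.1", "192.0.2.1", "tcp", sport, 443, 4)
             for sport in range(50000, 50004)}
print(tcp_links)
```

The addresses and ports are made up; the point is structural: one flow, one hash, one path — which is exactly why everything inside a QUIC session fate-shares.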
As I said, I think this is the trade-off we've arrived at. We've decided we can't keep on increasing the clock. Clock speeds haven't gone any faster for decades now, and so the way we get more capacity out of the network is by going seriously parallel. Seriously. The other part of this is also a look at how we do routing protocols, because that's the other problem inside all this space. The conventional view of a routing protocol is that there is one single best path, [George: yeah]. So imagine, you know, peak hour, and everyone's going back to their house. They all live in the same spot for some reason. So what routing says is, you're all taking freeway 101. But there are other ways to get there? No, says routing: there is one best path, that is the best path, even though it's clogged, it's full. George Michaelson 38:17 Do you remember the time you, me and Paul were going to the IETF in San [Geoff: San Diego], San Diego, and Paul said, let's drive, then he couldn't go. We wound up with a car on I-405 in rush hour. And once you're on it, you can't get off it. You are basically on that road, crawling at 10 miles an hour until you get to San Diego. What a road trip. Geoff Huston 38:41 Well, that's what routing does to packets on networks. They don't have any feedback as to what's the best path. There's no Google Maps view. There's no outer level that says, that road is completely congested, take another path. Routing can't do that. George Michaelson 38:59 Current routing, the way we use it right now, BGP routing, doesn't do that. Geoff Huston 39:04 Oh, every attempt to try and fold in that kind of load feedback, and I'm going way, way back to the mid-80s with the HELLO routing protocol and similar, where you tried to include a factor of loading and efficiency, and give you a routing system whose path was the best path on the day, at the time, that second, [George: Yeah], the problem is feedback. Hey, everyone use 102. Oh shit.
Everyone's using 102! Hey, everyone use 101! And so it goes: 102, 101, 102, 101. George Michaelson 39:42 And we do actually sometimes see in BGP evidence of people who are using some kind of weird traffic management framework. So between ten and two they do pattern A, and then between two and four they do pattern B, as if someone is actually changing the levers, cranking it to try and make a marginal benefit on it. Geoff Huston 40:02 It's been observed many times that amongst the lists of the most dynamic BGP updaters, the folk who spend all their time updating their BGP routes, a lot of it is attributable to the so-called route optimizers, [George: yep]. Unless you've got your feedback loop tuned really well, and most of them don't, as soon as you push traffic one way, that creates congestion, and that then means you push it another way, and that creates congestion there, and so on. And so nothing is stable. George Michaelson 40:31 So we've had this beautiful conversation about the ways you can multiplex up multiple things to get an efficiency gain and achieve something close to the best possible bandwidth between two points, and you're now saying, because of the way we do BGP and the idea of best path as a single thing, we can't do that in the routing plane. We've got no trick in the armory to make routing share load across two links that might otherwise look like really good choices, because we select one of them as best path. Well, what are we going to do? Geoff Huston 41:05 I actually think the answer lies in this area of QUIC and TCP multipath. And I suppose the observation is a pretty simple one. Think of the amount of silicon processing per packet in my end device. My laptop is a supercomputer. It has an enormous amount of processing capability, and its packet rate is not that high. You know, I can do a lot of clever tricks at this end with my traffic, particularly if I've got choices in the way I can mark things.
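The 101/102 flip-flop can be shown with a toy simulation: if everyone re-routes each round to whichever path looked least loaded in the previous round, the load oscillates forever and never settles. The numbers are invented for illustration; this is the feedback instability being described, not any real routing protocol.

```python
# Toy model of load-feedback oscillation: every round, all traffic
# moves to whichever of two paths looked least loaded at the end of
# the previous round. The "best path" choice chases stale information.
def route_rounds(total_traffic, rounds):
    load = [total_traffic, 0]          # everyone starts on path 0
    history = []
    for _ in range(rounds):
        best = load.index(min(load))   # best path by yesterday's load
        load = [0, 0]
        load[best] = total_traffic     # everyone piles onto it at once
        history.append(best)
    return history

print(route_rounds(100, 8))  # → [1, 0, 1, 0, 1, 0, 1, 0]
```

Because the feedback is a round late, the "optimization" is permanently out of phase with reality — the 102, 101, 102, 101 pattern.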
So if I establish 20 different sessions using, you know, Multipath TCP, I can play with stuff, because I have the processing power to do so. What about inside the network? What about in a router that has, I don't know, forty 800-gig channels connected to it? You can't breathe. There's no oxygen. You are just panicking. The packet rate is so high you've got about two cycles per packet, if you're lucky, to get rid of the bloody thing. You can't be clever in the core of today's networks. You just can't. George Michaelson 42:06 It's not the place to put these kinds of complexities into the story, is it? Geoff Huston 42:10 Right. Your silicon is barely keeping, you know, on par with fiber, and so all we do is just chuck in the fiber, parallelize it, chuck in dumb silicon, and do basic routing tricks with ECMP. That's all you've got. And then basically hand the problem off to the edges of the network, saying, if you really want speed out of this, you need to be clever. The network isn't going to do it for you. George Michaelson 42:33 But you're kind of hinting at the possibility that, with a little bit of knowledge about path diversity exposed to an application, and for the right kind of traffic behavior, I could chunk what I want to fetch into four different IP sources, which I know lie on diverse paths. And I could play this game a bit like BitTorrent does with multiple sources for the hashes, and get the benefit of discretely different paths if I'm prepared to do a little bit of work. Geoff Huston 43:03 Did we ever talk about Explicit Congestion Notification? George Michaelson 43:07 We might have talked about that recently, Geoff, yes. Geoff Huston 43:10 So here's a signal coming back from a path and a session that says, the path you've chosen is getting a bit busy, and if you continue to do this, I'm going to drop a packet. That's what the bit is saying.
And if you fold ECN in with multipath, and if you have a large, diverse collection of paths, and if the network is actually doing ECN, you can get a decent stab at this kind of highly adaptive endpoint, where it's the end host doing all of the work in trying to optimize its performance through a network that has considerable diversity. If I'm the end host, brilliant, nothing could be better. If I'm the poor benighted network operator, trying desperately hard to minimize my bills and maximize my throughput, whoops, I've just handed the keys of the car to the application, to the host. [George: yeah]. I don't feel very good about that. George Michaelson 44:07 No, it's an interesting question, who's in control of this bus at this point. It's a bit like there are many hands on the steering wheel, Geoff. Geoff Huston 44:15 Oh, I think there's only one. And as usual, I'll end it by saying everything is answered with money. The richest people on the planet these days build the end systems. You know, the Googles, the Apples of this world, the Chromes and so on. The folk at the application level of the stack are in the driver's seat. [George: Yeah]. The network operators are being hit around the face with wet fish continuously with every new shock, and it's kind of a sad life. It is a sad life, but, you know, that's the way it's panned out. We kind of put the power at the endpoint, so that's where we're living. George Michaelson 44:48 Not a bad place to be, in some ways. I'm surprised by how complex this story of doing things in parallel gets, Geoff. That's been really fascinating. Thank you. Geoff Huston 44:57 Well, thank you, and dear listener, if you've listened this long, thank you for your persistence. George Michaelson 45:03 If you've got a story or research to share here on Ping, why not get in contact by email to ping@apnic.net or via the APNIC social media channels.
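The ECN-plus-multipath idea Geoff sketches — an end host backing off paths whose packets come back congestion-marked while gently probing clean ones — might look something like this in miniature. A toy model, not a real congestion controller; the path names, rates, and constants are made up:

```python
# Sketch of an ECN-aware multipath endpoint: keep a send rate per
# path, halve the rate on any path whose acks carried ECN congestion
# marks, and additively probe paths that came back clean. This is the
# classic AIMD shape applied per-path, purely for illustration.
def adjust_rates(rates, ecn_marked):
    """rates: {path: rate}; ecn_marked: paths that signalled ECN."""
    new_rates = {}
    for path, rate in rates.items():
        if path in ecn_marked:
            new_rates[path] = rate / 2      # multiplicative back-off
        else:
            new_rates[path] = rate + 1.0    # additive probe upward
    return new_rates

rates = {"path_a": 10.0, "path_b": 10.0}
rates = adjust_rates(rates, ecn_marked={"path_a"})  # path_a congested
rates = adjust_rates(rates, ecn_marked={"path_a"})  # still congested
print(rates)  # → {'path_a': 2.5, 'path_b': 12.0}
```

Traffic drains away from the marked path and onto the clean one — the host, not the network, ends up doing the traffic engineering, which is exactly the shift in control the conversation turns to next.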
Also remember, the measurement@apnic.net mailing list on Orbit is there to discuss and share relevant collaborative opportunities, grants and funding opportunities, jobs and graduate placements, or to seek feedback from the community on your own measurement projects. Be sure to check out the APNIC website for all your resource and community needs. Until next time.