Robert Kisteleki 0:00 These probes are the essence of the network. These probes do the measurements themselves. When you plug them in, they connect to the central infrastructure and basically register to say, "Hi, I'm ready to work", right? The central infrastructure knows about these probes, so you cannot be a rogue probe. So they connect to the system proactively. We cannot connect to them, only they can connect to us. And this might be a good place to say, these probes are not designed to supply any kind of service to the local user. Their only purpose is to talk to the central system and say, "What shall I do now?" Then the central system says, "Well, I have the following measurements that you should be running: pings, traceroutes, DNS and the rest." George Michaelson 0:48 You're listening to PING, a podcast by APNIC, discussing all things related to measuring the Internet. I'm your host, George Michaelson. This time I'm talking to Robert Kisteleki from the RIPE NCC. Robert is a principal engineer, and the product owner and technical lead for RIPE Atlas. Many of the Internet measurement studies we've discussed on PING have used Atlas as the basis of their research. It's become a central plank of measurement between different parts of the Internet, across the range of protocols, since its launch at RIPE 61, held in Rome in 2010. But how does Atlas work? What's the history of the system, and how do its components fit together? Robert was one of the key initiators of the service, and continues to architect its design and behaviors. He's the perfect guide for a look under the hood of this measurement system. Robert, welcome to PING. Robert Kisteleki 1:44 Thank you. Thank you for having me here. George Michaelson 1:46 Can you tell everyone a little bit about yourself? Robert Kisteleki 1:49 Yes, I'm Robert. 
I joined the RIPE NCC a long, long time ago as a system architect to work on RPKI, the evolving RPKI back in the day, and then moved on to what is now called research at the RIPE NCC. I worked as the manager of the research and development team, and have worked on measurement systems and the like ever since. George Michaelson 2:11 So you are now in charge of the Atlas system. Robert Kisteleki 2:14 That is correct. I'm at the moment the principal engineer, and my role includes being the product owner of RIPE Atlas, and basically the technical lead as well. George Michaelson 2:23 Now, listeners to PING probably hear us talking about Atlas all the time. So in some way, I think this is redundant, because I think it's a given, if you're in measurement in the Internet, you know what Atlas is. But just in case, can you give us the top-level view: what is Atlas? Robert Kisteleki 2:41 Yeah, RIPE Atlas is an active Internet measurement network using a whole lot of distributed nodes all over the world to do the actual measurements. We collect these measurements and present them to the users, and also let any user of RIPE Atlas execute these measurements using the whole platform. George Michaelson 2:59 So you said active. It's not a passive data collection method, like things which are fetching the BGP state and just recording it. You are actually using systems to make traffic happen for the purpose of measurement. Robert Kisteleki 3:12 Yes, indeed. Very early on, we decided that it is in some sense safer just to stick with active measurements. But also, our major user groups really are interested in doing the active measurements, as in reaching out and seeing what happens when you send out this packet, what is the reply when you want to do a DNS measurement or a ping or a traceroute or something like that. 
And this has a lot of consequences, of course, but we thought that this is still better for the constituents. George Michaelson 3:41 The project's been running quite a long time, hasn't it? The origins of this go back to the early 2000s. Robert Kisteleki 3:47 Yeah, that's indeed the case. Originally we started thinking about this around 2007 or 2008, when we saw a lot of requests at the time from various network operators who mailed, for example, NANOG and various other mailing lists, and said, "Someone told me that this issue is happening when you try to get from A to B. Is there someone who can do a traceroute for me, please, from that network?" And we figured that, well, maybe there's a better way of doing this, a bit more democratic. George Michaelson 4:15 So at that time, people would have been used to maybe using looking glasses, but they were in very restricted places, perhaps in central facilities of large ISPs. They couldn't always detect problems that might exist between two random points in the net, could they? Robert Kisteleki 4:31 Yeah, that is one of the cases, but I think the bigger problem was trying to find someone in the network where you think the problem is, or that is maybe close enough to the problem, that can tell you something about what the problem is. And most of the time, this was used for actual problem discovery. George Michaelson 4:49 And so there were some systems available that provided a kind of distributed view of the network at that time? Robert Kisteleki 4:55 There were some systems, but they were mostly used for research. For example, PlanetLab is one of them, which at that time had in the ballpark of 600 or 700 servers, and various Master's or PhD students could use that to discover things. Since then it shrank a bit, and I think it's now defunct, but at that point, that was the closest thing available to a distributed system where you could say, do this query for me, 
from that point of the world. George Michaelson 5:26 There's this aspect of the RIPE NCC. I mean, you work in a Regional Internet Registry. I work in APNIC, a Regional Internet Registry. But the RIPE NCC actually existed before the RIR system was established. You were kind of always there for research and operational best practice and these things, weren't you? You were a pre-existing community. Robert Kisteleki 5:47 Yeah, indeed. The RIPE NCC was founded not only to be the RIR of the European and surrounding regions, but also to do a little bit more. We also run K-root, one of the DNS root servers, and we have always been supportive of extra activities that can help the community that we serve. George Michaelson 6:03 How did this sort of emerge? You wanted to have a democratized way of answering the "can you see me?" type problems, and you had things like PlanetLab in existence. What happened? Robert Kisteleki 6:15 Yeah, we took a little bit of inspiration from, and should actually give a little bit of a shout-out to, Ethan Katz-Bassett, who was a PhD student at the time, I think, who did something that he called Hubble, which was a really early version of what we thought was a good idea. So we thought, let's put that on steroids and do it for real, in our way, basically. Well, that meant that we basically spent a year or two in trying to come up with the concepts. What would this do? How would it actually work? And that's where we decided that it's actually a good idea to stick to active measurements. There were other systems that were trying to do passive measurements, for different reasons. We even employed an intern who looked at particular hardware devices: can you use this? Can we use that? We built a prototype that showed promise. Interestingly, one of the first things we built was the ability to field-upgrade the hardware devices that we would ship out to people. 
George Michaelson 7:04 So you were already thinking about the operational burdens of operating a network of things at scale, knowing that you wouldn't be able to just knock on the door and go and do things; you needed a way to do remote management. So what kind of time frame was this? Robert Kisteleki 7:19 This was 2008, 2009. [George: right] We envisioned that we, as the RIPE NCC, would never be able to be everywhere. That was not possible, and is even impossible today. So one of the concepts was that we actually wanted to use the community's support to deploy these vantage points all over the world, which is why we said, OK, once these devices fly out, there is no way that we can take them back to upgrade and ship them back to the original places. So the first functionality was, well, upgrades. George Michaelson 7:49 And when did you launch this? Robert Kisteleki: The launch was in 2010 at the Rome RIPE meeting, where we handed out, I think, in the ballpark of 500 or so early versions of the probes. George Michaelson: Wow. So starting off with 500 straight out the door. Robert Kisteleki 8:04 We considered that to be the prototype level, to see, you know, how it actually works in the field. And around 300, maybe 400, of those actually came alive and were active very soon. George Michaelson 8:15 That's really amazing. And I think I have one of these early generation devices. These are really quite physically small devices, aren't they? Can you maybe talk a little bit about what the structure of Atlas is? Robert Kisteleki 8:28 Yeah, indeed, it has a couple of components. There is a sizable infrastructure component, consisting of multiple machines, that is dealing with collection of the results, and telling the probes in the first place what to do and what to measure. We have a relatively large big-data back end, where we store the results that we collect. And we've been storing every result going back to 2010, so if you want to go there, you can fetch those out from the archives. 
We have an API, we have a user interface, we have streaming, so you can get the data in real time as it comes into the system. You can get it out. George Michaelson 9:05 That's the component that you do kind of operate as a main central asset to keep this system running. But it's a distributed system, so there's other things which are in the field, like these small probes. [Robert: yeah]. So is that all that's in the field? Robert Kisteleki 9:18 Well, in essence, yes. So next to the infrastructure component, which just must be there, because otherwise the whole system doesn't work, the main component is the set of probes. We call them probes; they exist out in the wild. These probes we originally intended to be hardware only, and that's why the original design was field upgradability and so on. We started with a really early physical device that was meant to be a serial-to-Ethernet converter, which we just repurposed to be the first generation probe. Imagine: it had eight megabytes of flash and eight megabytes of RAM. George Michaelson 9:52 Well, I can kind of put my head back around to thinking that sounds like a lot, but when you've got to pack in an operating system, a network stack, and some form of scriptable space to do things, plus administrative back-end burdens, that's not a lot of room to operate in, and it's certainly not a lot of room to store data on. [Robert: It is] So you were coming out the door knowing you were working in a small environment. Robert Kisteleki 10:17 I have to say, it was a technical feat to make that happen. And by the way, don't forget, IPv6 was a must. George Michaelson 10:23 So this device already went out the door dual-stack, v4 and v6. Robert Kisteleki 10:26 It could do everything. George Michaelson 10:27 Okay. So if we understand probes are out there, and you gave out 500 of them in 2010, what do the probes actually do? Robert Kisteleki 10:35 These probes are the essence of the network. 
These probes do the measurements themselves. When you plug them in, they connect to the central infrastructure and basically register to say, "Hi, I'm ready to work", right? The central infrastructure knows about these probes, so you cannot be a rogue probe. So they connect to the system proactively. We cannot connect to them, only they can connect to us. And this might be a good place to say, these probes are not designed to supply any kind of service to the local user. Their only purpose is to talk to the central system and say, "What shall I do now?" Then the central system says, "Well, I have the following measurements that you should be running: pings, traceroutes, DNS and the rest." So the probes have, let's call it, a crontab. It's not exactly that, but it's close enough. George Michaelson 11:18 Right. Are they actually a Unix system? Is it like a lightweight Unix, or is it just that we can use the Unix concepts to describe what they do? Robert Kisteleki 11:26 All of them run basically Linux. It's a variation of OpenWrt, or, on the early probes, it's uClinux. George Michaelson 11:33 Right, so it's crontab-like: they can run scheduled tasks, and the tasks, because it has a full-blown network stack, are functions like ping and traceroute or DNS lookups. Can it do more than those? Robert Kisteleki 11:46 We have a finite set of measurement types. So in essence, we have pings, traceroutes, DNS queries, NTP queries, a relatively restricted HTTP query. [George: Yeah]. We can talk about that later. But these are what we would like to call, basically, networking primitives. George Michaelson 12:02 Yeah, they're building blocks from which you can then construct a set of tasks to be done. [Robert: Indeed]. And this is, to use something from IoT, orchestrated from the center. Robert Kisteleki 12:13 It is. Part of the central logic 
is to figure out which probe is the best one to execute a particular measurement. As a user, you can say, "Please give me a probe from this part of Asia", and the system will try to figure out, OK, well, then this one might just work for you. Assign the task to the probe, the probe executes it, and then here comes the second part of what the probes do. They dutifully report what they see. What they see is basically the results of those measurements. They wrap them in JSON and send it up to the network, and the network then routes it back to the data store and to the online streaming, back to the user. George Michaelson 12:46 So from this initial, simple beginning with a small device, have you done subsequent revisions of the hardware? Robert Kisteleki 12:53 We are on generation five at the moment. George Michaelson 12:55 Well, now it's 2025, so five generations in 15 years. That's not too bad, really. Robert Kisteleki 13:03 Not too bad. Funny story: we imagined that the total lifetime of the version one probes would probably be a year or two. George Michaelson 13:11 Oh, I think I'm still running one. Robert Kisteleki 13:12 Yes, we still have a ballpark of, like, 500 of them up and running after 15 years, and they did not wear out their flash. So that's just kudos to the manufacturers back in the day. But we did move on to more generic devices. Version two was the same as version one, but with a little bit more memory, so it eased up the pain a bit, but essentially otherwise it was the same. Version three was a repurposed travel router. It was essentially a TP-Link, which we put a USB stick in just to have enough storage, because otherwise, you know, a travel router does not have space for you, but with a USB stick it did. And that was wonderful, because it was an off-the-shelf component. We could just put anything we wanted on it, with the slight constraint that it actually had 32 megabytes of memory. That was awesome. 
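Robert mentions that probes wrap their results in JSON before sending them up. As an aside for readers, here is a minimal sketch of handling such a result; the sample document below is invented, and its field names are a simplified subset modelled on the publicly documented RIPE Atlas result format, not a verbatim record:

```python
import json

# Invented sample, modelled loosely on the public Atlas ping result schema.
raw = """{
  "type": "ping",
  "prb_id": 1234,
  "msm_id": 5678,
  "dst_addr": "193.0.14.129",
  "result": [{"rtt": 24.1}, {"rtt": 23.8}, {"x": "*"}]
}"""

def summarise(result_json: str) -> dict:
    """Extract probe ID and basic loss/latency stats from one ping result."""
    doc = json.loads(result_json)
    # Entries without an "rtt" key represent lost packets ("*").
    rtts = [r["rtt"] for r in doc["result"] if "rtt" in r]
    return {
        "probe": doc["prb_id"],
        "sent": len(doc["result"]),
        "received": len(rtts),
        "min_rtt": min(rtts) if rtts else None,
    }

print(summarise(raw))  # one lost packet out of three, best RTT 23.8 ms
```

Real results carry more fields (firmware version, timestamps, address family), but the parse-and-reduce pattern is the same.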
George Michaelson 13:57 Compared to the initial constraints, you could do anything you wanted with that. Robert Kisteleki 14:01 Look, we made the thing work on eight megs, so making it work on 32 was a breeze. George Michaelson 14:07 And big changes with versions three and four? Robert Kisteleki 14:10 No. The version three had this USB stick and an off-the-shelf router, but at some point the manufacturer stopped. George Michaelson 14:17 Right. Sometimes new versions are just supply-chain dynamics. Robert Kisteleki 14:23 Exactly. So we switched over to version four, which is a Raspberry Pi clone. It's a NanoPi model. George Michaelson 14:30 And that's really quite respectable hardware. Robert Kisteleki 14:32 It is an absolutely different generation. George Michaelson 14:35 With MMC on board. Robert Kisteleki 14:37 Everything. It's orders of magnitude. We are actually using just a tiny proportion of what it can do, because the code is efficient enough to run on old devices; it can do anything on the newer ones. But it did open the door to a new generation of device where we no longer have to be afraid of "oh, will this code fit?" So yeah, basically, from generation four on, we have bigger capabilities. George Michaelson 15:01 Still running essentially scripted invocations of the building-block commands? Robert Kisteleki 15:06 Yes. Otherwise, they do exactly the same things as the previous generations did. George Michaelson 15:10 So the most recent version? Robert Kisteleki 15:12 The most recent version is a v5, which is a clone of the Turris MOX device. It's a teeny-weeny home router with parts stripped off to make it cheap enough for this purpose. George Michaelson 15:24 So you've now reached into the supply chain and actually expect variations on a commodity to bring down the build cost? Robert Kisteleki 15:31 Correct. And it also let us escape the manufacturing business. We just outsource that to someone else and say, please make more. 
George Michaelson 15:36 Do you even get them to blow your initial operating system image onto the device? Robert Kisteleki 15:41 That's what they do. And what's interesting about these devices is these now have crypto capabilities in the CPU, so key generation and all of that, which is a huge security benefit for everyone at the end of the day. George Michaelson 15:52 I think you've arrived at a very nice place with this hardware. But I believe there is this third class of device. There's the central facility, there's the edge probes, but you also introduced into the model a slightly bigger unit. Robert Kisteleki 16:05 So the probes are imagined to be run at the edge of the network. It could be, and most of them are, running in home networks. But you could install one in your business or in your ISP, anywhere you want to. But there was a demand to have a slightly bigger device, more reliable, and, don't forget, a rack-mountable version; the probes were so tiny they couldn't be rack-mounted. But that's actually quite a problem when it comes to giving it to a company that only has technical hardware in racks. Just having small devices floating around: not permissible. But if it was in a rack, it would be. So we came up with this concept of anchors, and we call them anchors because they're not only probes, they're also willing targets of measurements. They are advertised to the world. George Michaelson 16:48 Right, because there's that question: if you've got this huge network of things that can emit packets, OK, so I'm individually maybe interested in having tests of reachability to me and my devices, but that doesn't necessarily mean I want to randomly, at home, receive tens and hundreds and thousands of queries pointing at me. And if you guys are pointing at things out in the real world, there are only so many people who want to be looked at this way. The anchors sound like they're capable of being somewhere that puts their hand up and says, send things to me. 
Robert Kisteleki 17:20 Remember, the probes don't want to talk to you. The probes only decide who they should talk to, right? So even if you could technically reach them, they will not answer you. Anchors will. So they have this function of running some basic services. They are willing to answer on ping and traceroute. They have a very tiny DNS server, so you can ask something which gives a large result or a small result, and you can actually get the result. George Michaelson 17:42 Not just acting as a forwarder into public DNS, but they can generate outcomes in themselves that test qualities of the system. Robert Kisteleki 17:51 Right. And on top of this, we also involve all of the anchors, right away from day one, in a full mesh of measurements: each anchor is targeting every other anchor with pings and traceroutes and some DNS queries, and you get that as a free benefit if you run an anchor. So you are providing multiple services. When you sign up, you get your own probe. You can measure stuff. You are also measured, therefore you get free data. And you are offering this service to the world, like, hey, if you want to get from wherever you are to me, here's a fixed target; you can use that. George Michaelson 18:26 So: 2010, launch in Rome, 500 of the version ones, and here we are, we've rolled the clock forward, version five of the hardware. How big is the project? How are things going now? Robert Kisteleki 18:37 At the moment, we have about 13,000 devices out there, and we probably didn't even mention the software probe version. So we started with the hardware, but at some point we relaxed the constraints and said, fine, the package could be available on any Linux. So here it is, you can install it. So as of today, we have a little bit fewer than 1,000 anchors, about 4,000 software probes, and the remaining ones are still hardware probes. George Michaelson 19:03 And that's about 10,000 total? Or is it 13,000? [Robert: 13,000] Wow, 13,000. That's huge. 
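The anchor full mesh Robert describes grows quadratically: every anchor measures every other anchor, so the number of ordered source-to-target relations is N × (N - 1). A quick back-of-the-envelope sketch, using the roughly 1,000-anchor figure from the conversation:

```python
def mesh_pairs(n_anchors: int) -> int:
    """Ordered source->target pairs in a full mesh, where each anchor
    measures every other anchor (but not itself)."""
    return n_anchors * (n_anchors - 1)

# With roughly 1,000 anchors, that's on the order of a million ongoing
# source->target relations, per measurement type (ping, traceroute, ...).
print(mesh_pairs(1000))  # 999000
```

This quadratic growth is why adding even a few hundred anchors meaningfully densifies the baseline data set every anchor host gets for free.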
Robert Kisteleki 19:13 It's huge enough, or large enough, to be representative now. So we have presence in something like 180, 190 countries, in 4,000 or so IPv4 ASNs (so probes which have a connection from one of those ASes on IPv4), and about 2,000 v6 ASNs. George Michaelson 19:33 So there are actually somewhere around 80,000 to 100,000 ASNs active in BGP, and people might hear 4,000 and think that isn't very big. But the thing is, an awful lot of those ASNs are either not really functional or really stubby; they have a tiny, tiny amount of traffic, whereas the 4,000 that you're in are very likely to be active, engaged, stub and transit ASes, aren't they? Robert Kisteleki 20:01 We definitely have presence in edge ASes and in transit ASes, as far as we know. But an argument can be made that as long as you cover at least one AS behind a transit AS, you probably share your fate with everyone else behind the same transit AS. So it is certainly true that we are not present in all of those 100,000 ASes; it would be nice if we were, but the representation is good enough, so to speak. George Michaelson 20:25 So I was lucky enough to be at the Rome meeting, and I think I can remember, either there or around that time, Daniel Karrenberg had a really nice descriptive idea of his vision of what this system was going to look like. Can you talk a little bit about that? Robert Kisteleki 20:42 Yeah, indeed. So Daniel is essentially one of the instigators here; I should call him the main instigator, to be fair. And his vision included a light map of the Earth. So we looked at this dark map where the lights are; that's where people live, presumably, and, with high correlation, that's where the Internet is. So that's where we want to be, right? And in some sense, consciously or subconsciously, we wanted to deliver those probes into those areas. 
Now, if you look at the map that Atlas produces today about where the probes are, there's a magnificent correlation between that light map and our map of where the things are. George Michaelson 21:19 So in some sense, that visionary "light up the world with Atlas", you've kind of achieved it, although I think we might talk later about some of the coverage aspects. It's not completely equal, is it? Robert Kisteleki 21:30 No, it isn't. We have never set hard limits on how many probes there can be, maximum or minimum, in a particular AS, which means that the larger ASes have a larger probe population. So in some sense, that definitely biases these numbers. George Michaelson 21:45 Right. I was going to ask, is that not potentially a risk in the model of the statistics you're gathering, that it over-represents certain links and under-represents others? Robert Kisteleki 21:55 It certainly needs some kind of understanding of what the system will do when you just ask it, "give me probes". So if you insist on representation in the sense of "I want a number of probes proportional to the size of the network", you can just let the system do its thing. But if what you want is "no, no, please give me one single probe from all the places that you can", then it's a different selection criterion. George Michaelson 22:19 But in essence, there are adjustment methods you can use, either in probe selection to run experiments, or in the view of data from experiments that have been run, to make, if you like, an adjustment or rebalancing of the data. Robert Kisteleki 22:34 We have various means for you to express to us what kinds of probes you want for your measurement. You can select a country, you can select an AS, you can select a region. And some of these obviously come with a bias. If you say, give me 1,000 probes from Germany, you will get a lot from Deutsche Telekom. 
But we're addressing this with other metrics, where you can say, "I really want different probes", and that's just a feature that some people really would like. George Michaelson 23:00 So early in our conversation, you talked about the democratization of a system like this, and that presumably means that other people in the community are using Atlas to conduct research. Can you talk a little bit about the kinds of things that people are doing? I've just come back from the APRICOT meeting in Malaysia, and there was this forum run by the Internet Society, the Pulse Internet Measurement Forum (PIMF), and Lia Hestina from the RIPE NCC was there talking about the role of Atlas in measurements people do. I think that's a nice example of the community engagement, but you must have more that's going on. Robert Kisteleki 23:38 Originally, the intention was to create this service for network operators. That is our primary target group. That is still true, but it's certainly true that I don't even know how many PhDs were written on Atlas data. So it's a significant subgroup of what we have, and as far as I know, they're really happy with what they get. They always have more questions, especially on new protocols: can you please implement this experimental protocol for us? But the main target group is network operators. George Michaelson 24:05 So operators do a bit of research, but they're also quite focused on a mechanistic outcome. They want to know facts about their system. So how are you kind of looking at that? Robert Kisteleki 24:15 There are two major things that you can do with Atlas. One is what we can call ongoing measurements. These are intended to give you ongoing data, a continuous data flow, about how things are and how they change over time. 
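The over-representation discussed above (ask for 1,000 German probes, get a lot of Deutsche Telekom) can also be handled on the analysis side by rebalancing: keep at most one probe per origin AS. A minimal sketch, where the probe records are invented placeholders rather than the real Atlas probe metadata schema:

```python
import random

# Invented sample probe records; in real use these would come from the
# Atlas probe listing. The field names here are illustrative placeholders.
probes = [
    {"id": 1, "asn": 3320}, {"id": 2, "asn": 3320}, {"id": 3, "asn": 3320},
    {"id": 4, "asn": 64501}, {"id": 5, "asn": 64502},
]

def one_per_asn(probes, seed=0):
    """Rebalance a probe set: keep one randomly chosen probe per origin AS,
    so no single large network dominates the sample."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_asn = {}
    for p in probes:
        by_asn.setdefault(p["asn"], []).append(p)
    return [rng.choice(group) for group in by_asn.values()]

picked = one_per_asn(probes)
print(sorted(p["asn"] for p in picked))  # [3320, 64501, 64502]
```

The trade-off is fewer vantage points overall, in exchange for a flatter per-network view.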
George Michaelson 24:28 So, trend analysis based on a long history of the same basis of measurement. Robert Kisteleki 24:33 Trend analysis, monitoring. So you get the baseline, and now you know, if something changed, what it was before and what it is now, and then it gives you a hook to say: did I make that change? Did someone else make that change? Is it okay? Do I need to look at that? George Michaelson 24:48 So that's one form. Robert Kisteleki 24:49 That's one form. The other is what we can call ad hoc analysis, where you get an indication that something is off. There's a problem, probably, somewhere, and maybe someone in Japan says, "I cannot get to you". And then you can ask the system, "okay, please traceroute to me from Japan". Then you get the results basically immediately, and you can do an analysis on that. We have visualizations and so on that help you with this. Coming back to the ongoing ones, we have a lot of what we call built-in measurements. They run on all probes against DNS servers, and in particular the DNS root servers, [George: yeah], to see how they work, whether the latency is good or bad, or how one compares to the other. George Michaelson 25:28 And that's the kind of data set that anyone can look at? I mean, this data is available. I wouldn't use the word data lake, because it's more structured than that, but I could come into the system, and instead of having to ask for things to be done, I could look at this history of data, of measurements, of reachability to A-root or L-root or whatever, and it's all there. Robert Kisteleki 25:49 Yes, you can, and we even have visualizations for you. We have other services built on top of RIPE Atlas, like DNSMON, which does precisely this, and don't forget, with a history of 15 years. So we can observe the evolution of the roots. As I mentioned, for the anchors, you get the constant data flow. 
So on day one, when you start up your anchor, we start measuring you from all the other anchors, and you get this data flow for free. You have to do nothing else but keep that machine up. It's not only for you, but it's also for everyone else, which means it did open the door for a lot of interesting use cases. For example, recently there were some undersea cable problems in the Baltic and other areas, where the researchers of the RIPE NCC looked at the data set and identified things that actually changed. We published lots of articles about this. It's very observable, and the only reason why you can do that is because we do constant data collection, and that data is retrievable; you can look at it. George Michaelson 26:46 So you can see the transit variances, you can correlate it with BGP announcements that are in other activities, like RIPE RIS. You can actually integrate all of this stuff and get a more holistic view of what's going on. Robert Kisteleki 26:59 You can look at RTT changes, and when you see one, you can say, oh, what was the path before? What was the path after? Using the traceroute measurements that we have. So this is a lot of interesting data. We also had a question back in the day about this new idea of using the reserved space, 240/something: is it already used in the wild? And to be honest, we didn't have to do much, because our probes and anchors already were measuring stuff. So we looked at the traces and said, it's on this path, it's on that path. And we identified some of the providers that have been using this already, formally or informally. It's just there, in the historical record. George Michaelson 27:42 Kind of as a side effect of the way people use the system and the data it collects, you're able to do retrospective analysis. But you could also now construct more formal "let's actively measure this" campaigns and integrate them into the system. What about in the ad hoc space? 
What kind of things can people do if they're scripting their own experiments? Robert Kisteleki 27:59 So as I mentioned, the question there is "I think something is up", so it's mostly aimed at debugging. You can have your own actual question, it does not need to relate to debugging, but that's the envisioned original use case. So for example, someone says, "You run a DNS server, but it doesn't really give the answers that I think it should give me. Is it okay, or is it not?" And especially if you have a distributed DNS with multiple servers, anycast and all of that beautiful stuff, it's really, really hard to verify, whereas with Atlas you can say, fine, ask this question from 1,000 different points, and then look at how many different answers you get. And I think Stéphane Bortzmeyer is one who always jumps on these questions, on NANOG in particular. Sometimes someone says, "I think something is off". He uses his own tool; under the hood, RIPE Atlas is serving the data, and he just says, "Yep, the problem is real. Look, here's the evidence." George Michaelson 28:51 But it doesn't only have to relate to a service like DNS. You can look at perturbations in the routing plane, in anycast, in BGP in general. Robert Kisteleki 29:00 You can do this with all of the kinds of measurements we do. So pings, they only really give you the black-and-white yes or no: this works or it doesn't. So what's more useful is traceroutes and DNS, and perhaps NTP queries, or the other measurements that we do. But at the end of the day, it can help you answer this question: is the problem real or not? And in many cases, especially if you're using traceroute, it can also give you where the choke point is. We have tools that can visualize this; one we call TraceMON, where you just see the path. It's mechanistically difficult to do, but at least it gives you the idea that these paths converge and then they don't go anywhere. 
For example, George Michaelson 29:37 So what's your vision of the future for this system? This is something that the RIPE NCC is committed to for the long term. What's next for Atlas? Robert Kisteleki 29:46 We do want to stay on what we can call the network measurements layer, so pings and traceroutes will stay, DNS queries will stay. We ventured a little bit further, to NTP. Is it network level? It kind of is, but not really. We have limited support for HTTP, because we definitely did not want to build a full-fledged HTTP client that can fetch anything from all over the world; that has a lot of security issues and risks. But we do want to do some basic-level HTTP monitoring, for example to discover who is the closest CDN to you, what the response time would be if I fetch something very, very simple, rather than fetching active content. George Michaelson 30:26 Modern HTTP is 97% or more HTTPS. So this also means you would need to implement a full client-side TLS stack to connect into this. Robert Kisteleki 30:37 If you actually want to fetch full content, or at least partial content, then yes, but we do have something that we call TLS measurements, which is a simplified version of this. It only goes as far as fetching the certificate that the server presents you, which is already a treasure trove for researchers, because it does expose what you get if you go from point A to point B and ask for point B's TLS certificate. George Michaelson 31:02 Yeah, who are they using as their chain of trust behind their certificate issuance? Robert Kisteleki 31:07 Not only that, but when it changes, was it a change for the better? Or is there a man in the middle? Or is there something fishy going on? George Michaelson 31:15 So it can, in some ways, feed into the governance and societal aspects of networking. You can help uncover things like intermediate occlusion of data. Those kinds of things potentially are in the Atlas system as well.
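The "is there a man in the middle?" question largely boils down to grouping probes by the certificate they were served: if most of the world sees one certificate and a region sees another, that's worth investigating. A minimal sketch, assuming observations arrive as pairs of probe ID and the PEM text of the certificate that probe saw; the fingerprint here hashes the PEM string as a stand-in for a proper DER fingerprint, and all names and data are illustrative:

```python
import hashlib

def cert_fingerprint(cert_pem: str) -> str:
    """SHA-256 over the PEM text; a stand-in for a real DER fingerprint."""
    return hashlib.sha256(cert_pem.encode()).hexdigest()

def group_by_certificate(observations):
    """Given [(probe_id, cert_pem), ...], group probe IDs by the cert they saw."""
    groups = {}
    for probe_id, pem in observations:
        groups.setdefault(cert_fingerprint(pem), []).append(probe_id)
    return groups

# Probes 1 and 2 saw one certificate; probe 3 saw a different one.
obs = [(1, "CERT-A"), (2, "CERT-A"), (3, "CERT-B")]
groups = group_by_certificate(obs)
print(len(groups))  # → 2 distinct certificates; probe 3's view warrants a closer look
```

The same grouping applied over time answers Robert's other question: when the fingerprint changes for every probe at once, it is likely a legitimate rotation; when it changes for only some probes, something fishier may be going on.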
Robert Kisteleki 31:29 It is. I would say that the system is capable of supplying data that is evidence for you, for good or bad, about network behavior in general and in particular, for whatever you want to use it for. And then how you use that data, whether you write a research paper on it or actually use it for your day-to-day network operations, that's really your call. George Michaelson 31:49 So something Lea mentioned that I think we haven't touched on is that you operate a kind of credit system as well: hosting a device gives you slightly more units of Atlas money to be able to operate in the system. Can you talk a little bit about that? Robert Kisteleki 32:06 Yeah, that's... George Michaelson 32:07 It's not real money. Robert Kisteleki 32:08 It's not real money, and that's for a number of reasons; real money is exactly not something we want to entertain. But indeed, if you run a probe, if you host a probe, or you host an anchor, or sponsor, or there are some other channels, we recognize your contribution by giving you what we call credits. In the case of a probe, the more uptime your probe has, the more useful it is to the whole system, and the more credits we give you, with some limits. Now that's nice, but what can you use your credits for? George Michaelson 32:44 They don't get you luggage at the airport. Robert Kisteleki 32:46 No, this is not that kind of credit. George Michaelson 32:48 They don't get upgrades. Robert Kisteleki 32:49 No, right? What you can use them for is to run your own measurements. So you get credits, and then you say, actually, I want to use the system. I don't only want to contribute, I want to use the system. So here's my measurement specification: please do these traceroutes, these ping measurements, these DNS queries and so on, from these and these probes. Go. The system will basically start digging into your pocket of credits, so to speak, and use those credits up.
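The credit spending Robert describes can be pictured as a simple ledger that charges per result and refuses measurements the balance cannot cover. The per-result costs below are hypothetical; real Atlas pricing is documented on atlas.ripe.net, and the class and numbers here are purely illustrative:

```python
# Hypothetical per-result costs, for illustration only.
COST = {"ping": 1, "traceroute": 10, "dns": 10}

class CreditLedger:
    """Toy model of spending earned credits on measurements."""
    def __init__(self, balance: int):
        self.balance = balance

    def charge(self, kind: str, results: int) -> int:
        cost = COST[kind] * results
        if cost > self.balance:
            raise ValueError("insufficient credits")  # measurement refused
        self.balance -= cost
        return cost

ledger = CreditLedger(balance=5000)
ledger.charge("traceroute", 100)  # e.g. one traceroute from 100 probes
print(ledger.balance)  # → 4000
```

The refusal path is the rationing George describes next: with a finite balance, no single user can ask the platform to do unbounded work.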
George Michaelson 33:15 But that's the nature of money in the real world. It's kind of a rationing system, right? I mean, money credits in this system help control excessive use of the system. It stops people asking you to perform massive amounts of work that incurs a burden on you in managing data that then ultimately doesn't get used. Robert Kisteleki 33:34 It also gives a little bit of fair use to the system. George Michaelson 33:38 The democracy concept. Robert Kisteleki 33:40 The more you contribute, the more you can get out of the system. If you run an anchor, you are providing more service to the world than if you run a probe. We recognize that by giving you more credits. It gives you a higher capacity to spend those credits on things that are important to you. George Michaelson 33:54 So I know that the RIPE NCC also does continuous improvement in its general web services and its registry functions. Are you also in an upgrade cycle in your software suite? Are there improvements in how this works? Robert Kisteleki 34:07 Of course we have to be. That concerns the central system as well. You know, imagine 15 years of software. There are a lot of things that need maintenance and replacing, but also on the probes we have to keep up, at a minimum with functionality, but also with OS upgrades and so on. So there is enough to do. Just recently, we released software versions for the newest Red Hat and Debian releases. So, you know, please feel free to run with it. But as soon as the next generation comes up, the next version of Red Hat or Debian, we intend to follow it as well. George Michaelson 34:38 And there might be new features released, like new forms of data comparison, new visualizations. There's work in that space? Robert Kisteleki 34:45 Right, right. There are two major works going on at the moment in this space. One of them is better support for recognized use cases, so to speak.
So when we know that there are a whole bunch of people who ask similar questions, wouldn't it be nice if there was a button that made it really easy? Yes, it would. So we are trying to discover what the commonalities are between network operators, for example, who say, actually, I want this thing to happen, but I don't want to fiddle with your API or the UI. No, just give me the button. In that space, a long time ago, we built something that we called "quick look". Basically, just tell us your target and push this button. What the system does is select a bunch of probes, figure out all the details, and within 10 to 20 seconds it comes back with a map that tells you what's green and what's red and what's in between. You don't want to burden the network operators with the details of what's actually going on behind the scenes. So I envision there will be a lot more of these, where people say, I kind of know what I want and you know what I want. Just make it easy for me. That's one. The other aspect that we are strongly thinking about is doing more comparisons. So imagine your probes show something weird, and then you really want to know, am I alone with this? Is this my problem, or is this the world's problem? George Michaelson 35:55 Does anyone else see this? Robert Kisteleki 36:00 Exactly. So wouldn't it be nice if a host had access to a response that says, actually, all the probes in your AS see more or less the same, or all the probes in your country see something different. George Michaelson 36:17 You're not seeing what the rest of the world sees. Robert Kisteleki 36:19 Exactly. So this would be beautiful, because it could help, especially the target user group, to pinpoint where the problem is. If it is close to the destination, for example, all the probes have a different behavior today than they did yesterday, and it's worse or it's better, you might want to know that too, as opposed to it's all the probes in your AS and nobody else.
George Michaelson 36:41 If there's a cable cut in Turkey and you lie behind Turkish Telecom, everything you do is going to be affected by that change in Turkish Telecom. But if there's a storm in your immediate city and you lose connectivity, it really is going to look quite different, isn't it? Robert Kisteleki 36:57 It is. Which reminds me, we can also observe local problems in this sense. So, for example, we see effects of electricity outages. We see country... George Michaelson 37:09 Power outages. Robert Kisteleki 37:10 Yep. The famous case was a long time ago, something like 2013 or so. There was a hackathon in Amsterdam about RIPE Atlas and DNS measurements, and there was a power outage around Amsterdam. So the day after, we made videos where you can imagine the green lights around Amsterdam being the probes, and two-thirds of them went down at the same time and then reconnected a couple of hours later. George Michaelson 37:35 Right? But in the structural sense of data, the absence of data, in a system that has been regularly reporting, in the right geographic context, actually says there was a consistent problem. [Robert: Yes] It's not just a cable cut, it can be a power outage. So you could probably have a restored-state flag inside the probe that says "I was offline". Robert Kisteleki 37:53 That's one of the interesting details of the system: the probes remember what they were told to do and carry on those tasks even if they are disconnected. George Michaelson 38:02 So you don't break the measurement when the connection or power goes. Robert Kisteleki 38:06 Right. They don't need hand-holding, which means every time there is a disconnect, we see what happens from the inside, assuming the thing recovers, and then the probes can report what they saw. This is very unique, and it's very different from a system where you constantly have to tell the edge devices what to do; if they are not available, you just can't tell them what to do.
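The store-and-forward behavior Robert describes, probes buffering results while disconnected and reporting everything once they reconnect, can be sketched like this. The class and its methods are illustrative of the pattern, not the actual probe firmware:

```python
from collections import deque

class ProbeBuffer:
    """Illustrative store-and-forward: hold results while offline, flush on reconnect."""
    def __init__(self):
        self.pending = deque()   # results not yet reported upstream
        self.uploaded = []       # stand-in for the central collection system
        self.connected = False

    def record(self, result):
        """Scheduled measurements keep running regardless of connectivity."""
        self.pending.append(result)
        if self.connected:
            self.flush()

    def flush(self):
        while self.pending:
            self.uploaded.append(self.pending.popleft())

    def reconnect(self):
        self.connected = True
        self.flush()  # report everything seen while disconnected

p = ProbeBuffer()
p.record({"type": "ping", "rtt": 12.3})  # offline: buffered, not lost
p.record({"type": "ping", "rtt": None})  # still offline, still measuring
p.reconnect()                            # back online: both results reported
print(len(p.uploaded))  # → 2
```

This is the design choice that lets Atlas "see what happens from the inside" during an outage: the measurement schedule lives on the probe, so the data survives the disconnect instead of depending on a live control channel.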
George Michaelson 38:31 I think you've built a really interesting outcome here, Robert. I think this is an investment in the community that's going to pay back for a very long time. Well done. Robert Kisteleki 38:41 Thank you very much. I would like to claim that we have happy users out there, and we have more and more users of the system, so this is still on an upward trajectory. George Michaelson 38:50 And is there a web page that people can go to to learn more about Atlas, perhaps volunteer to host an anchor, or get involved? Robert Kisteleki 38:58 I would say that the easiest thing to do is go to atlas.ripe.net, and from there you will see all the visualizations, all the documentation, all the way to ways to engage. If you want to be a host or a sponsor, or just a user of the system, that's the place to start. George Michaelson 39:12 That's great. Thank you, Robert. Robert Kisteleki 39:13 Thank you very much. George Michaelson 39:16 If you've got a story or research to share here on PING, why not get in contact by email to ping@apnic.net or via the APNIC social media channels. Also remember the measurement@apnic.net mailing list on Orbit is there to discuss and share relevant collaborative opportunities, grants and funding opportunities, jobs and graduate placements, or to seek feedback from the community on your own measurement project. Be sure to check out the APNIC website for all your resource and community needs. Until next time.