Robert Kisteleki 0:00 These probes are the essence of the network. These probes do the measurements themselves. When you plug them in, they connect to the central infrastructure and basically register to say, "Hi, I'm ready to work", right? The central infrastructure knows about these probes, so you cannot be a rogue probe. So they connect to the system proactively. We cannot connect to them, only they can connect to us. And this might be a good place to say, these probes are not designed to supply any kind of service to the local user. Their only purpose is to talk to the central system and say, "What shall I do now?" Then the central system says, "Well, I have the following measurements that you should be running: pings, traceroutes, DNS and the rest." George Michaelson 0:48 You're listening to PING, a podcast by APNIC, discussing all things related to measuring the Internet. I'm your host, George Michaelson. This time I'm talking to Robert Kisteleki from the RIPE NCC. Robert is a principal engineer, and the product owner and technical lead for RIPE Atlas. Many of the Internet measurement studies we've discussed on PING have used Atlas as the basis of their research. It's become a central plank of measurement between different parts of the Internet, across the range of protocols, since its launch at RIPE 61, held in Rome in 2010. But how does Atlas work? What's the history of the system, and how do its components fit together? Robert was one of the key initiators of the service, and continues to architect its design and behaviors. He's the perfect guide for a look under the hood of this measurement system. Robert, welcome to PING. Robert Kisteleki 1:44 Thank you. Thank you for having me here. George Michaelson 1:46 Can you tell everyone a little bit about yourself? Robert Kisteleki 1:49 Yes, I'm Robert. 
I joined the RIPE NCC a long, long time ago as a system architect to work on RPKI, the evolving RPKI back in the day, and then moved on to what is now called research at the RIPE NCC. I worked as the manager of the research and development team, and have worked on measurement systems and the like ever since. George Michaelson 2:11 So you are now in charge of the Atlas system. Robert Kisteleki 2:14 That is correct. I'm at the moment the principal engineer, and my role includes being the product owner of RIPE Atlas, and basically the technical lead as well. George Michaelson 2:23 Now, listeners to PING probably hear us talking about Atlas all the time. So in some way, I think this is redundant, because I think it's a given, if you're in measurement in the Internet, you know what Atlas is. But just in case, can you give us the top-level view: what is Atlas? Robert Kisteleki 2:41 Yeah, RIPE Atlas is an active Internet measurement network using a whole lot of distributed nodes all over the world to do the actual measurements. We collect these measurements and present them to the users, and also let any user of RIPE Atlas execute these measurements using the whole platform. George Michaelson 2:59 So you said active. It's not a passive data collection method, like things which are fetching the BGP state and just recording it. You are actually using systems to make traffic happen for the purpose of measurement. Robert Kisteleki 3:12 Yes, indeed. Very early on, we decided that it is in some sense safer just to stick with active measurements. But also, our major user groups really are interested in doing the active measurements, as in reaching out and seeing what happens when you send out this packet, what is the reply when you want to do a DNS measurement or a ping or a traceroute or something like that. 
And this has a lot of consequences, of course, but we thought that this is still better for the constituents. George Michaelson 3:41 The project's been running quite a long time, hasn't it? The origins of this go back to the early 2000s. Robert Kisteleki 3:47 Yeah, that's indeed the case. Originally we started thinking about this around 2007 or 2008, when we saw a lot of requests at the time from various network operators who mailed, for example, NANOG and various other mailing lists, and said, "Someone told me that this issue is happening when you try to get from A to B. Is there someone who can do a traceroute for me, please, from that network?" And we figured that, well, maybe there's a better way of doing this, a bit more democratic. George Michaelson 4:15 So at that time, people would have been used to maybe using looking glasses, but they were in very restricted places, perhaps in central facilities of large ISPs. They couldn't always detect problems that might exist between two random points in the net, could they? Robert Kisteleki 4:31 Yeah, that is one of the cases, but I think the bigger problem was trying to find someone in the network where you think the problem is, or that is maybe close enough to the problem, that can tell you something about what the problem is. And most of the time, this was used for actual problem discovery. George Michaelson 4:49 And so there were some systems available that provided a kind of distributed view of the network at that time? Robert Kisteleki 4:55 There were some systems, but they were mostly used for research. For example, PlanetLab is one of them, which at that time had in the ballpark of 600 or 700 servers, and various Master's or PhD students could use that to discover things. Since then it shrank a bit, and I think it's now defunct, but at that point, that was the closest thing available to a distributed system where you could say, do this query for me, 
from that point of the world. George Michaelson 5:26 There's this aspect of the RIPE NCC. I mean, you work in a Regional Internet Registry. I work in APNIC, a Regional Internet Registry. But the RIPE NCC actually existed before the RIR system was established. You were kind of always there for research and operational best practice and these things, weren't you? You were a pre-existing community. Robert Kisteleki 5:47 Yeah, indeed. The RIPE NCC was founded not only to be the RIR of the European and surrounding regions, but also to do a little bit more. We also run K-root, one of the DNS root servers, and we have always been supportive of extra activities that can help the community that we serve. George Michaelson 6:03 How did this sort of emerge? You wanted to have a democratized way of answering the "can you see me?" type problems, and you had things like PlanetLab in existence. What happened? Robert Kisteleki 6:15 Yeah, we took a little bit of inspiration from, and should actually give a little bit of a shout-out to, Ethan Katz-Bassett, who was a PhD student at the time, I think, who did something that he called Hubble, which was a really early version of what we thought was a good idea. So we thought, let's put that on steroids and do it for real, in our way, basically. Well, that meant that we basically spent a year or two in trying to come up with the concepts. What would this do? How would it actually work? And that's where we decided that it's actually a good idea to stick to active measurements. There were other systems that were trying to do passive measurements, for different reasons. We even employed an intern who looked at particular hardware devices: can you use this? Can we use that? We built a prototype that showed promise. Interestingly, one of the first things we built was the ability to field-upgrade the hardware devices that we would ship out to people. 
George Michaelson 7:04 So you were already thinking about the operational burdens of operating a network of things at scale, knowing that you wouldn't be able to just knock on the door and go and do things; you needed a way to do remote management. So what kind of time frame was this? Robert Kisteleki 7:19 This was 2008, 2009. [George: right] We envisioned that we, as the RIPE NCC, would never be able to be everywhere. That was not possible, and is even impossible today. So one of the concepts was that we actually wanted to use the community's support to deploy these vantage points all over the world, which is why we said, OK, once these devices fly out, there is no way that we can take them back to upgrade and ship them back to the original places. So the first functionality was, well, upgrades. George Michaelson 7:49 And when did you launch this? Robert Kisteleki: The launch was in 2010 at the Rome RIPE meeting, where we handed out, I think, in the ballpark of 500 or so early versions of the probes. George Michaelson: Wow. So starting off with 500 straight out the door. Robert Kisteleki 8:04 We considered that to be the prototype level, to see, you know, how it actually works in the field. And around 300, maybe 400, of those actually came alive and were active very soon. George Michaelson 8:15 That's really amazing. And I think I have one of these early generation devices. These are really quite physically small devices, aren't they? Can you maybe talk a little bit about what the structure of Atlas is? Robert Kisteleki 8:28 Yeah, indeed, it has a couple of components. There is a sizable infrastructure component, consisting of multiple machines, that is dealing with collection of the results, and telling the probes in the first place what to do and what to measure. We have a relatively large big-data back end, where we store the results that we collect. And we've been storing every result going back to 2010, so if you want to go there, you can fetch those out from the archives. 
We have an API, we have a user interface, we have streaming, so you can get the data in real time as it comes into the system. You can get it out. George Michaelson 9:05 That's the component that you do kind of operate as a main central asset to keep this system running. But it's a distributed system, so there's other things which are in the field, like these small probes. [Robert: yeah]. So is that all that's in the field? Robert Kisteleki 9:18 Well, in essence, yes. So next to the infrastructure component, which just must be there, because otherwise the whole system doesn't work, the main component is the set of probes. We call them probes; they exist out in the wild. These probes we originally intended to be hardware only, and that's why the original design was field upgradability and so on. We started with a really early physical device that was meant to be a serial-to-Ethernet converter, which we just repurposed to be the first generation probe. Imagine: it had eight megabytes of flash and eight megabytes of RAM. George Michaelson 9:52 Well, I can kind of put my head back around to thinking that sounds like a lot, but when you've got to pack in an operating system, a network stack, and some form of scriptable space to do things, plus administrative back-end burdens, that's not a lot of room to operate in, and it's certainly not a lot of room to store data on. [Robert: It is] So you were coming out the door knowing you were working in a small environment. Robert Kisteleki 10:17 I have to say, it was a technical feat to make that happen. And by the way, don't forget, IPv6 was a must. George Michaelson 10:23 So this device already went out the door dual-stack, v4 and v6. Robert Kisteleki 10:26 It could do everything. George Michaelson 10:27 Okay. So if we understand probes are out there, and you gave out 500 of them in 2010, what do the probes actually do? Robert Kisteleki 10:35 These probes are the essence of the network. 
These probes do the measurements themselves. When you plug them in, they connect to the central infrastructure and basically register to say, "Hi, I'm ready to work", right? The central infrastructure knows about these probes, so you cannot be a rogue probe. So they connect to the system proactively. We cannot connect to them, only they can connect to us. And this might be a good place to say, these probes are not designed to supply any kind of service to the local user. Their only purpose is to talk to the central system and say, "What shall I do now?" Then the central system says, "Well, I have the following measurements that you should be running: pings, traceroutes, DNS and the rest." So the probes have, let's call it, a crontab. It's not exactly that, but it's close enough. George Michaelson 11:18 Right. Are they actually a Unix system? Is it like a lightweight Unix, or is it just that we can use the Unix concepts to describe what they do? Robert Kisteleki 11:26 All of them run basically Linux. It's a variation of OpenWrt, or, on the early probes, it's uClinux. George Michaelson 11:33 Right, so it's crontab-like: they can run scheduled tasks, and the tasks, because it has a full-blown network stack, are functions like ping and traceroute or DNS lookups. Can it do more than those? Robert Kisteleki 11:46 We have a finite set of measurement types. So in essence, we have pings, traceroutes, DNS queries, NTP queries, a relatively restricted HTTP query. [George: Yeah]. We can talk about that later. But these are what we would like to call, basically, networking primitives. George Michaelson 12:02 Yeah, they're building blocks from which you can then construct a set of tasks to be done. [Robert: Indeed]. And this is, to use something from IoT, orchestrated from the center. Robert Kisteleki 12:13 It is. Part of the central logic 
is to figure out which probe is the best one to execute a particular measurement. As a user, you can say, "Please give me a probe from this part of Asia", and the system will try to figure out, OK, well, then this one might just work for you. Assign the task to the probe, the probe executes it, and then here comes the second part of what the probes do. They dutifully report what they see. What they see is basically the results of those measurements. They wrap them in JSON and send it up to the network, and the network then routes it back to the data store and to the online streaming, back to the user. George Michaelson 12:46 So from this initial, simple beginning with a small device, have you done subsequent revisions of the hardware? Robert Kisteleki 12:53 We are on generation five at the moment. George Michaelson 12:55 Well, now it's 2025, so five generations in 15 years. That's not too bad, really. Robert Kisteleki 13:03 Not too bad. Funny story: we imagined that the total lifetime of the version one probes would probably be a year or two. George Michaelson 13:11 Oh, I think I'm still running one. Robert Kisteleki 13:12 Yes, we still have a ballpark of, like, 500 of them up and running after 15 years, and they did not wear out their flash. So that's just kudos to the manufacturers back in the day. But we did move on to more generic devices. Version two was the same as version one, but with a little bit more memory, so it eased up the pain a bit, but essentially otherwise it was the same. Version three was a repurposed travel router. It was essentially a TP-Link, which we put a USB stick in just to have enough storage, because otherwise, you know, a travel router does not have space for you, but with a USB stick it did. And that was wonderful, because it was an off-the-shelf component. We could just put anything we wanted on it, with the slight constraint that it actually had 32 megabytes of memory. That was awesome. 
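Robert mentions that probes wrap their results in JSON before sending them up. As an aside for readers, here is a minimal sketch of handling such a result; the sample document below is invented, and its field names are a simplified subset modelled on the publicly documented RIPE Atlas result format, not a verbatim record:

```python
import json

# Invented sample, modelled loosely on the public Atlas ping result schema.
raw = """{
  "type": "ping",
  "prb_id": 1234,
  "msm_id": 5678,
  "dst_addr": "193.0.14.129",
  "result": [{"rtt": 24.1}, {"rtt": 23.8}, {"x": "*"}]
}"""

def summarise(result_json: str) -> dict:
    """Extract probe ID and basic loss/latency stats from one ping result."""
    doc = json.loads(result_json)
    # Entries without an "rtt" key represent lost packets ("*").
    rtts = [r["rtt"] for r in doc["result"] if "rtt" in r]
    return {
        "probe": doc["prb_id"],
        "sent": len(doc["result"]),
        "received": len(rtts),
        "min_rtt": min(rtts) if rtts else None,
    }

print(summarise(raw))  # one lost packet out of three, best RTT 23.8 ms
```

Real results carry more fields (firmware version, timestamps, address family), but the parse-and-reduce pattern is the same.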
George Michaelson 13:57 Compared to the initial constraints, you could do anything you wanted with that. Robert Kisteleki 14:01 Look, we made the thing work on eight megs, so making it work on 32 was a breeze. George Michaelson 14:07 And big changes with versions three and four? Robert Kisteleki 14:10 No. The version three had this USB stick and an off-the-shelf router, but at some point the manufacturer stopped. George Michaelson 14:17 Right. Sometimes new versions are just supply-chain dynamics. Robert Kisteleki 14:23 Exactly. So we switched over to version four, which is a Raspberry Pi clone. It's a NanoPi model. George Michaelson 14:30 And that's really quite respectable hardware. Robert Kisteleki 14:32 It is an absolutely different generation. George Michaelson 14:35 With MMC on board. Robert Kisteleki 14:37 Everything. It's orders of magnitude. We are actually using just a tiny proportion of what it can do, because the code is efficient enough to run on old devices; it can do anything on the newer ones. But it did open the door to a new generation of device where we no longer have to be afraid of "oh, will this code fit?" So yeah, basically, from generation four on, we have bigger capabilities. George Michaelson 15:01 Still running essentially scripted invocations of the building-block commands? Robert Kisteleki 15:06 Yes. Otherwise, they do exactly the same things as the previous generations did. George Michaelson 15:10 So the most recent version? Robert Kisteleki 15:12 The most recent version is a v5, which is a clone of the Turris MOX device. It's a teeny-weeny home router with parts stripped off to make it cheap enough for this purpose. George Michaelson 15:24 So you've now reached into the supply chain and actually expect variations on a commodity to bring down the build cost? Robert Kisteleki 15:31 Correct. And it also let us escape the manufacturing business. We just outsource that to someone else and say, please make more. 
George Michaelson 15:36 Do you even get them to blow your initial operating system image onto the device? Robert Kisteleki 15:41 That's what they do. And what's interesting about these devices is these now have crypto capabilities in the CPU, so key generation and all of that, which is a huge security benefit for everyone at the end of the day. George Michaelson 15:52 I think you've arrived at a very nice place with this hardware. But I believe there is this third class of device. There's the central facility, there's the edge probes, but you also introduced into the model a slightly bigger unit. Robert Kisteleki 16:05 So the probes are imagined to be run at the edge of the network. It could be, and most of them are, running in home networks. But you could install one in your business or in your ISP, anywhere you want to. But there was a demand to have a slightly bigger device, more reliable, and, don't forget, a rack-mountable version; the probes were so tiny they couldn't be rack-mounted. But that's actually quite a problem when it comes to giving it to a company that only has technical hardware in racks. Just having small devices floating around: not permissible. But if it was in a rack, it would be. So we came up with this concept of anchors, and we call them anchors because they're not only probes, they're also willing targets of measurements. They are advertised to the world. George Michaelson 16:48 Right, because there's that question: if you've got this huge network of things that can emit packets, OK, so I'm individually maybe interested in having tests of reachability to me and my devices, but that doesn't necessarily mean I want to randomly, at home, receive tens and hundreds and thousands of queries pointing at me. And if you guys are pointing at things out in the real world, there are only so many people who want to be looked at this way. The anchors sound like they're capable of being somewhere that puts their hand up and says, send things to me. 
Robert Kisteleki 17:20 Remember, the probes don't want to talk to you. The probes only decide who they should talk to, right? So even if you could technically reach them, they will not answer you. Anchors will. So they have this function of running some basic services. They are willing to answer on ping and traceroute. They have a very tiny DNS server, so you can ask something which gives a large result or a small result, and you can actually get the result. George Michaelson 17:42 Not just acting as a forwarder into public DNS, but they can generate outcomes in themselves that test qualities of the system. Robert Kisteleki 17:51 Right. And on top of this, we also involve all of the anchors, right away from day one, in a full mesh of measurements: each anchor is targeting every other anchor with pings and traceroutes and some DNS queries, and you get that as a free benefit if you run an anchor. So you are providing multiple services. When you sign up, you get your own probe. You can measure stuff. You are also measured, therefore you get free data. And you are offering this service to the world, like, hey, if you want to get from wherever you are to me, here's a fixed target; you can use that. George Michaelson 18:26 So: 2010, launch in Rome, 500 of the version ones, and here we are, we've rolled the clock forward, version five of the hardware. How big is the project? How are things going now? Robert Kisteleki 18:37 At the moment, we have about 13,000 devices out there, and we probably didn't even mention the software probe version. So we started with the hardware, but at some point we relaxed the constraints and said, fine, the package could be available on any Linux. So here it is, you can install it. So as of today, we have a little bit fewer than 1,000 anchors, about 4,000 software probes, and the remaining ones are still hardware probes. George Michaelson 19:03 And that's about 10,000 total? Or is it 13,000? [Robert: 13,000] Wow, 13,000. That's huge. 
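The anchor full mesh Robert describes grows quadratically: every anchor measures every other anchor, so the number of ordered source-to-target relations is N × (N - 1). A quick back-of-the-envelope sketch, using the roughly 1,000-anchor figure from the conversation:

```python
def mesh_pairs(n_anchors: int) -> int:
    """Ordered source->target pairs in a full mesh, where each anchor
    measures every other anchor (but not itself)."""
    return n_anchors * (n_anchors - 1)

# With roughly 1,000 anchors, that's on the order of a million ongoing
# source->target relations, per measurement type (ping, traceroute, ...).
print(mesh_pairs(1000))  # 999000
```

This quadratic growth is why adding even a few hundred anchors meaningfully densifies the baseline data set every anchor host gets for free.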
Robert Kisteleki 19:13 It's huge enough, or large enough, to be representative now. So we have presence in something like 180, 190 countries, in 4,000 or so IPv4 ASNs (so probes which have a connection from one of those ASes on IPv4), and about 2,000 v6 ASNs. George Michaelson 19:33 So there are actually somewhere around 80,000 to 100,000 ASNs active in BGP, and people might hear 4,000 and think that isn't very big. But the thing is, an awful lot of those ASNs are either not really functional or really stubby; they have a tiny, tiny amount of traffic, whereas the 4,000 that you're in are very likely to be active, engaged, stub and transit ASes, aren't they? Robert Kisteleki 20:01 We definitely have presence in edge ASes and in transit ASes, as far as we know. But an argument can be made that as long as you cover at least one AS behind a transit AS, you probably share your fate with everyone else behind the same transit AS. So it is certainly true that we are not present in all of those 100,000 ASes; it would be nice if we were, but the representation is good enough, so to speak. George Michaelson 20:25 So I was lucky enough to be at the Rome meeting, and I think I can remember, either there or around that time, Daniel Karrenberg had a really nice descriptive idea of his vision of what this system was going to look like. Can you talk a little bit about that? Robert Kisteleki 20:42 Yeah, indeed. So Daniel is essentially one of the instigators here; I should call him the main instigator, to be fair. And his vision included a light map of the Earth. So we looked at this dark map where the lights are; that's where people live, presumably, and, with high correlation, that's where the Internet is. So that's where we want to be, right? And in some sense, consciously or subconsciously, we wanted to deliver those probes into those areas. 
Now, if you look at the map that Atlas produces today about where the probes are, there's a magnificent correlation between that light map and our map of where the things are. George Michaelson 21:19 So in some sense, that visionary "light up the world with Atlas", you've kind of achieved it, although I think we might talk later about some of the coverage aspects. It's not completely equal, is it? Robert Kisteleki 21:30 No, it isn't. We have never set hard limits on how many probes there can be, maximum or minimum, in a particular AS, which means that the larger ASes have a larger probe population. So in some sense, that definitely biases these numbers. George Michaelson 21:45 Right. I was going to ask, is that not potentially a risk in the model of the statistics you're gathering, that it over-represents certain links and under-represents others? Robert Kisteleki 21:55 It certainly needs some kind of understanding of what the system will do when you just ask it, "give me probes". So if you insist on representation in the sense of "I want a number of probes proportional to the size of the network", you can just let the system do its thing. But if what you want is "no, no, please give me one single probe from all the places that you can", then it's a different selection criterion. George Michaelson 22:19 But in essence, there are adjustment methods you can use, either in probe selection to run experiments, or in the view of data from experiments that have been run, to make, if you like, an adjustment or rebalancing of the data. Robert Kisteleki 22:34 We have various means for you to express to us what kinds of probes you want for your measurement. You can select a country, you can select an AS, you can select a region. And some of these obviously come with a bias. If you say, give me 1,000 probes from Germany, you will get a lot from Deutsche Telekom. 
But we're addressing this with other metrics, where you can say, "I really want different probes", and that's just a feature that some people really would like. George Michaelson 23:00 So early in our conversation, you talked about the democratization of a system like this, and that presumably means that other people in the community are using Atlas to conduct research. Can you talk a little bit about the kinds of things that people are doing? I've just come back from the APRICOT meeting in Malaysia, and there was this forum run by the Internet Society, the Pulse Internet Measurement Forum (PIMF), and Lia Hestina from the RIPE NCC was there talking about the role of Atlas in measurements people do. I think that's a nice example of the community engagement, but you must have more that's going on. Robert Kisteleki 23:38 Originally, the intention was to create this service for network operators. That is our primary target group. That is still true, but it's certainly true that I don't even know how many PhDs were written on Atlas data. So it's a significant subgroup of what we have, and as far as I know, they're really happy with what they get. They always have more questions, especially on new protocols: can you please implement this experimental protocol for us? But the main target group is network operators. George Michaelson 24:05 So operators do a bit of research, but they're also quite focused on a mechanistic outcome. They want to know facts about their system. So how are you kind of looking at that? Robert Kisteleki 24:15 There are two major things that you can do with Atlas. One is what we can call ongoing measurements. These are intended to give you ongoing data, a continuous data flow, about how things are and how they change over time. 
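The over-representation discussed above (ask for 1,000 German probes, get a lot of Deutsche Telekom) can also be handled on the analysis side by rebalancing: keep at most one probe per origin AS. A minimal sketch, where the probe records are invented placeholders rather than the real Atlas probe metadata schema:

```python
import random

# Invented sample probe records; in real use these would come from the
# Atlas probe listing. The field names here are illustrative placeholders.
probes = [
    {"id": 1, "asn": 3320}, {"id": 2, "asn": 3320}, {"id": 3, "asn": 3320},
    {"id": 4, "asn": 64501}, {"id": 5, "asn": 64502},
]

def one_per_asn(probes, seed=0):
    """Rebalance a probe set: keep one randomly chosen probe per origin AS,
    so no single large network dominates the sample."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_asn = {}
    for p in probes:
        by_asn.setdefault(p["asn"], []).append(p)
    return [rng.choice(group) for group in by_asn.values()]

picked = one_per_asn(probes)
print(sorted(p["asn"] for p in picked))  # [3320, 64501, 64502]
```

The trade-off is fewer vantage points overall, in exchange for a flatter per-network view.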
George Michaelson 24:28 So, trend analysis based on a long history of the same basis of measurement. Robert Kisteleki 24:33 Trend analysis, monitoring. So you get the baseline, and now you know, if something changed, what it was before and what it is now, and then it gives you a hook to say: did I make that change? Did someone else make that change? Is it okay? Do I need to look at that? George Michaelson 24:48 So that's one form. Robert Kisteleki 24:49 That's one form. The other is what we can call ad hoc analysis, where you get an indication that something is off. There's a problem, probably, somewhere, and maybe someone in Japan says, "I cannot get to you". And then you can ask the system, "okay, please traceroute to me from Japan". Then you get the results basically immediately, and you can do an analysis on that. We have visualizations and so on that help you with this. Coming back to the ongoing ones, we have a lot of what we call built-in measurements. They run on all probes against DNS servers, and in particular the DNS root servers, [George: yeah], to see how they work, whether the latency is good or bad, or how one compares to the other. George Michaelson 25:28 And that's the kind of data set that anyone can look at? I mean, this data is available. I wouldn't use the word data lake, because it's more structured than that, but I could come into the system, and instead of having to ask for things to be done, I could look at this history of data, of measurements, of reachability to A-root or L-root or whatever, and it's all there. Robert Kisteleki 25:49 Yes, you can, and we even have visualizations for you. We have other services built on top of RIPE Atlas, like DNSMON, which does precisely this, and don't forget, with a history of 15 years. So we can observe the evolution of the roots. As I mentioned, for the anchors, you get the constant data flow. 
So on day one, when you start up your anchor, we start measuring you from all the other anchors, and you get this data flow for free. You have to do nothing else but keep that machine up. It's not only for you, but it's also for everyone else, which means it did open the door for a lot of interesting use cases. For example, recently there were some undersea cable problems in the Baltic and other areas, where the researchers of the RIPE NCC looked at the data set and identified things that actually changed. We published lots of articles about this. It's very observable, and the only reason why you can do that is because we do constant data collection, and that data is retrievable; you can look at it. George Michaelson 26:46 So you can see the transit variances, you can correlate it with BGP announcements that are in other activities, like RIPE RIS. You can actually integrate all of this stuff and get a more holistic view of what's going on. Robert Kisteleki 26:59 You can look at RTT changes, and when you see one, you can say, oh, what was the path before? What was the path after? Using the traceroute measurements that we have. So this is a lot of interesting data. We also had a question back in the day about this new idea of using the reserved space, 240/something: is it already used in the wild? And to be honest, we didn't have to do much, because our probes and anchors already were measuring stuff. So we looked at the traces and said, it's on this path, it's on that path. And we identified some of the providers that have been using this already, formally or informally. It's just there, in the historical record. George Michaelson 27:42 Kind of as a side effect of the way people use the system and the data it collects, you're able to do retrospective analysis. But you could also now construct more formal "let's actively measure this" campaigns and integrate them into the system. What about in the ad hoc space? 
What kind of things can people do if they're scripting their own experiments? Robert Kisteleki 27:59 So as I mentioned, the question there is "I think something is up", so it's mostly aimed at debugging. You can have your own actual question, it does not need to relate to debugging, but that's the envisioned original use case. So for example, someone says, "You run a DNS server, but it doesn't really give the answers that I think it should give me. Is it okay, or is it not?" And especially if you have a distributed DNS with multiple servers, anycast and all of that beautiful stuff, it's really, really hard to verify, whereas with Atlas you can say, fine, ask this question from 1,000 different points, and then look at how many different answers you get. And I think Stéphane Bortzmeyer is one who always jumps on these questions, on NANOG in particular. Sometimes someone says, "I think something is off". He uses his own tool; under the hood, RIPE Atlas is serving the data, and he just says, "Yep, the problem is real. Look, here's the evidence." George Michaelson 28:51 But it doesn't only have to relate to a service like DNS. You can look at perturbations in the routing plane, in anycast, in BGP in general. Robert Kisteleki 29:00 You can do this with all of the kinds of measurements we do. So pings, they only really give you the black-and-white yes or no: this works or it doesn't. So what's more useful is traceroutes and DNS, and perhaps NTP queries, or the other measurements that we do. But at the end of the day, it can help you answer this question: is the problem real or not? And in many cases, especially if you're using traceroute, it can also give you where the choke point is. We have tools that can visualize this; one we call TraceMON, where you just see the path. It's mechanistically difficult to do, but at least it gives you the idea that these paths converge and then they don't go anywhere. 
For example, George Michaelson 29:37 So what's your vision of the future for this system? This is something that the RIPE NCC is committed to for the long term. What's next for Atlas? Robert Kisteleki 29:46 We do want to stay on what we can call the network measurements layer, so pings and traceroutes will stay, DNS queries will stay. We ventured a little bit further, to NTP. Is it network level? It kind of is, but not really. We have limited support for HTTP, because we definitely did not want to build a full-fledged HTTP client that can fetch anything from all over the world; that has a lot of security issues and risks. But we do want to do some basic-level HTTP monitoring, for example to discover who is the closest CDN to you, what the response time would be if I fetch something very, very simple, rather than fetching active content. George Michaelson 30:26 Modern HTTP is 97% or more HTTPS. So this also means you would need to implement a full client-side TLS stack to connect into this. Robert Kisteleki 30:37 If you actually want to fetch full content, or at least partial content, then yes, but we do have something that we call TLS measurements, which is a simplified version of this. It only goes as far as fetching the certificate that the server presents you, which is already a treasure trove for researchers, because it does expose what you get if you go from point A to point B and ask for point B's TLS certificate. George Michaelson 31:02 Yeah, who are they using as their chain of trust behind their certificate issuance? Robert Kisteleki 31:07 Not only that, but when it changes, was it a change for the better? Or is there a man in the middle? Or is there something fishy going on? George Michaelson 31:15 So it can, in some ways, feed into the governance and societal aspects of networking. You can help uncover things like intermediate occlusion of data. Those kinds of things potentially are in the Atlas system as well.
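The "is there a man in the middle?" question largely boils down to grouping probes by the certificate they were served: if most of the world sees one certificate and a region sees another, that's worth investigating. A minimal sketch, assuming observations arrive as pairs of probe ID and the PEM text of the certificate that probe saw; the fingerprint here hashes the PEM string as a stand-in for a proper DER fingerprint, and all names and data are illustrative:

```python
import hashlib

def cert_fingerprint(cert_pem: str) -> str:
    """SHA-256 over the PEM text; a stand-in for a real DER fingerprint."""
    return hashlib.sha256(cert_pem.encode()).hexdigest()

def group_by_certificate(observations):
    """Given [(probe_id, cert_pem), ...], group probe IDs by the cert they saw."""
    groups = {}
    for probe_id, pem in observations:
        groups.setdefault(cert_fingerprint(pem), []).append(probe_id)
    return groups

# Probes 1 and 2 saw one certificate; probe 3 saw a different one.
obs = [(1, "CERT-A"), (2, "CERT-A"), (3, "CERT-B")]
groups = group_by_certificate(obs)
print(len(groups))  # → 2 distinct certificates; probe 3's view warrants a closer look
```

The same grouping applied over time answers Robert's other question: when the fingerprint changes for every probe at once, it is likely a legitimate rotation; when it changes for only some probes, something fishier may be going on.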
Robert Kisteleki 31:29 It is. I would say that the system is capable of supplying data that is evidence for you, for good or bad, about network behavior in general and in particular, for whatever you want to use it for. And then how you use that data, whether you write a research paper on it or actually use it for your day-to-day network operations, that's really your call. George Michaelson 31:49 So something Lea mentioned that I think we haven't touched on is that you operate a kind of credit system as well: hosting a device gives you slightly more units of Atlas money to be able to operate in the system. Can you talk a little bit about that? Robert Kisteleki 32:06 Yeah, that's... George Michaelson 32:07 It's not real money. Robert Kisteleki 32:08 It's not real money, and that's for a number of reasons; real money is exactly not something we want to entertain. But indeed, if you run a probe, if you host a probe, or you host an anchor, or sponsor, or there are some other channels, we recognize your contribution by giving you what we call credits. In the case of a probe, the more uptime your probe has, the more useful it is to the whole system, and the more credits we give you, with some limits. Now that's nice, but what can you use your credits for? George Michaelson 32:44 They don't get you luggage at the airport. Robert Kisteleki 32:46 No, this is not that kind of credit. George Michaelson 32:48 They don't get upgrades. Robert Kisteleki 32:49 No, right? What you can use them for is to run your own measurements. So you get credits, and then you say, actually, I want to use the system. I don't only want to contribute, I want to use the system. So here's my measurement specification: please do these traceroutes, these ping measurements, these DNS queries and so on, from these and these probes. Go. The system will basically start digging into your pocket of credits, so to speak, and use those credits up.
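The credit spending Robert describes can be pictured as a simple ledger that charges per result and refuses measurements the balance cannot cover. The per-result costs below are hypothetical; real Atlas pricing is documented on atlas.ripe.net, and the class and numbers here are purely illustrative:

```python
# Hypothetical per-result costs, for illustration only.
COST = {"ping": 1, "traceroute": 10, "dns": 10}

class CreditLedger:
    """Toy model of spending earned credits on measurements."""
    def __init__(self, balance: int):
        self.balance = balance

    def charge(self, kind: str, results: int) -> int:
        cost = COST[kind] * results
        if cost > self.balance:
            raise ValueError("insufficient credits")  # measurement refused
        self.balance -= cost
        return cost

ledger = CreditLedger(balance=5000)
ledger.charge("traceroute", 100)  # e.g. one traceroute from 100 probes
print(ledger.balance)  # → 4000
```

The refusal path is the rationing George describes next: with a finite balance, no single user can ask the platform to do unbounded work.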
George Michaelson 33:15 But that's the nature of money in the real world. It's kind of a rationing system, right? I mean, money credits in this system help control excessive use of the system. It stops people asking you to perform massive amounts of work that incurs a burden on you in managing data that then ultimately doesn't get used. Robert Kisteleki 33:34 It also gives a little bit of fair use to the system. George Michaelson 33:38 The democracy concept. Robert Kisteleki 33:40 The more you contribute, the more you can get out of the system. If you run an anchor, you are providing more service to the world than if you run a probe. We recognize that by giving you more credits. It gives you a higher capacity to spend those credits on things that are important to you. George Michaelson 33:54 So I know that the RIPE NCC also does continuous improvement in its general web services and its registry functions. Are you also in an upgrade cycle in your software suite? Are there improvements in how this works? Robert Kisteleki 34:07 Of course we have to be. That concerns the central system as well. You know, imagine 15 years of software. There are a lot of things that need maintenance and replacing, but also on the probes we have to keep up, at a minimum with functionality, but also with OS upgrades and so on. So there is enough to do. Just recently, we released software versions for the newest Red Hat and Debian releases. So, you know, please feel free to run with it. But as soon as the next generation comes up, the next version of Red Hat or Debian, we intend to follow it as well. George Michaelson 34:38 And there might be new features released, like new forms of data comparison, new visualizations. There's work in that space? Robert Kisteleki 34:45 Right, right. There are two major works going on at the moment in this space. One of them is better support for recognized use cases, so to speak.
So when we know that there are a whole bunch of people who ask similar questions, wouldn't it be nice if there was a button that made it really easy? Yes, it would. So we are trying to discover what the commonalities are between network operators, for example, who say, actually, I want this thing to happen, but I don't want to fiddle with your API or the UI. No, just give me the button. In that space, a long time ago, we built something that we called "quick look". Basically, just tell us your target and push this button. What the system does is select a bunch of probes, figure out all the details, and within 10 to 20 seconds it comes back with a map that tells you what's green and what's red and what's in between. You don't want to burden the network operators with the details of what's actually going on behind the scenes. So I envision there will be a lot more of these, where people say, I kind of know what I want and you know what I want. Just make it easy for me. That's one. The other aspect that we are strongly thinking about is doing more comparisons. So imagine your probes show something weird, and then you really want to know, am I alone with this? Is this my problem, or is this the world's problem? George Michaelson 35:55 Does anyone else see this? Robert Kisteleki 36:00 Exactly. So wouldn't it be nice if a host had access to a response that says, actually, all the probes in your AS see more or less the same, or all the probes in your country see something different. George Michaelson 36:17 You're not seeing what the rest of the world sees. Robert Kisteleki 36:19 Exactly. So this would be beautiful, because it could help, especially the target user group, to pinpoint where the problem is. If it is close to the destination, for example, all the probes have a different behavior today than they did yesterday, and it's worse or it's better, you might want to know that too, as opposed to it's all the probes in your AS and nobody else.
George Michaelson 36:41 If there's a cable cut in Turkey and you lie behind Turkish Telecom, everything you do is going to be affected by that change in Turkish Telecom. But if there's a storm in your immediate city and you lose connectivity, it really is going to look quite different, isn't it? Robert Kisteleki 36:57 It is. Which reminds me, we can also observe local problems in this sense. So, for example, we see effects of electricity outages. We see country... George Michaelson 37:09 Power outages. Robert Kisteleki 37:10 Yep. The famous case was a long time ago, something like 2013 or so. There was a hackathon in Amsterdam about RIPE Atlas and DNS measurements, and there was a power outage around Amsterdam. So the day after, we made videos where you can imagine the green lights around Amsterdam being the probes, and two-thirds of them went down at the same time and then reconnected a couple of hours later. George Michaelson 37:35 Right? But in the structural sense of data, the absence of data, in a system that has been regularly reporting, in the right geographic context, actually says there was a consistent problem. [Robert: Yes] It's not just a cable cut, it can be a power outage. So you could probably have a restored-state flag inside the probe that says "I was offline". Robert Kisteleki 37:53 That's one of the interesting details of the system: the probes remember what they were told to do and carry on those tasks even if they are disconnected. George Michaelson 38:02 So you don't break the measurement when the connection or power goes. Robert Kisteleki 38:06 Right. They don't need hand-holding, which means every time there is a disconnect, we see what happens from the inside, assuming the thing recovers, and then the probes can report what they saw. This is very unique, and it's very different from a system where you constantly have to tell the edge devices what to do; if they are not available, you just can't tell them what to do.
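The store-and-forward behavior Robert describes, probes buffering results while disconnected and reporting everything once they reconnect, can be sketched like this. The class and its methods are illustrative of the pattern, not the actual probe firmware:

```python
from collections import deque

class ProbeBuffer:
    """Illustrative store-and-forward: hold results while offline, flush on reconnect."""
    def __init__(self):
        self.pending = deque()   # results not yet reported upstream
        self.uploaded = []       # stand-in for the central collection system
        self.connected = False

    def record(self, result):
        """Scheduled measurements keep running regardless of connectivity."""
        self.pending.append(result)
        if self.connected:
            self.flush()

    def flush(self):
        while self.pending:
            self.uploaded.append(self.pending.popleft())

    def reconnect(self):
        self.connected = True
        self.flush()  # report everything seen while disconnected

p = ProbeBuffer()
p.record({"type": "ping", "rtt": 12.3})  # offline: buffered, not lost
p.record({"type": "ping", "rtt": None})  # still offline, still measuring
p.reconnect()                            # back online: both results reported
print(len(p.uploaded))  # → 2
```

This is the design choice that lets Atlas "see what happens from the inside" during an outage: the measurement schedule lives on the probe, so the data survives the disconnect instead of depending on a live control channel.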
George Michaelson 38:31 I think you've built a really interesting outcome here, Robert. I think this is an investment in the community that's going to pay back for a very long time. Well done. Robert Kisteleki 38:41 Thank you very much. I would like to claim that we have happy users out there, and we have more and more users of the system, so this is still on an upward trajectory. George Michaelson 38:50 And is there a web page that people can go to to learn more about Atlas, perhaps volunteer to host an anchor, or get involved? Robert Kisteleki 38:58 I would say that the easiest thing to do is go to atlas.ripe.net, and from there you will see all the visualizations, all the documentation, all the way to ways to engage. If you want to be a host or a sponsor, or just a user of the system, that's the place to start. George Michaelson 39:12 That's great. Thank you, Robert. Robert Kisteleki 39:13 Thank you very much. George Michaelson 39:16 If you've got a story or research to share here on PING, why not get in contact by email to ping@apnic.net or via the APNIC social media channels. Also remember the measurement@apnic.net mailing list on Orbit is there to discuss and share relevant collaborative opportunities, grants and funding opportunities, jobs and graduate placements, or to seek feedback from the community on your own measurement project. Be sure to check out the APNIC website for all your resource and community needs. Until next time.