Route Hygiene: The Dirt on the Internet

March 25, 2009 Earl Zmijewski

Since Renesys maintains large quantities of data on the Internet going back many years, we sometimes get the question: If you guys are watching the entire ‘net, why don’t you just warn people when things break? My response is generally along the lines of: Sure we can do that. Simply tell us the correct state of the Internet at each moment in time and we’ll alert you to any operational differences we observe. This is generally met with silence.

Renesys can tell you a lot about the current state of the Internet, but absolutely no one can tell you the correct state. And that is because no one is in charge, and so there is no central authoritative source of information. Think of the Internet as a highway system where anyone can buy a car and simply start driving: no need to register the car, attach a license plate, buy insurance or get a driver’s license. You don’t even have to show an ID or be sober. Just pay some fees, buy some equipment, hook up and go. The barrier to entry really is that low.

Obviously, this arrangement can cause some problems. When Pakistan hijacked YouTube last year by announcing YouTube IP space, out of the hundreds of thousands of routing announcements seen on the Internet, how was anyone to know this particular one was incorrect? Okay sure, you couldn’t get your videos, but maybe YouTube had just opened a data center in Karachi and the problem was internal to them? Without some way of checking the authenticity of routes, the routers that direct traffic on the Internet simply believe what they are told. And if the best route to YouTube appears to be via Pakistan, then they are all going to use it, no questions asked. This is not a new problem, and this blog explores an old and largely failed attempt to address it. We then compare countries with respect to their routing hygiene.


The intended solution to this problem was developed way back in the early 1990s when the concept of a routing registry was first proposed. The idea was to construct a common database of routing information maintained by the appropriate parties, which could then be used to debug operational problems and filter out incorrect routes. So for example, YouTube might have registered its blocks of IP addresses (prefixes) with a registry saying that these prefixes belonged to them. They might further declare that their prefixes should only be seen originating from the YouTube Autonomous System (AS 36561), which in turn has the following list of providers. And so forth. The typical language for expressing your routing policy, known as RPSL, is very rich, if somewhat arcane, and you can fully describe your Internet presence using it. This allows other ‘net citizens to figure out if what they observe about you on the Internet matches your stated policy.
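To make the idea concrete, here is a minimal sketch of the kind of check a registry enables: given an announced prefix and the AS originating it, does any registration cover that prefix and list that origin? The registry contents, the function name check_origin and the rule that a covering registration also vouches for more-specific prefixes are all assumptions made for illustration. Real registries are queried as RPSL objects, and the addresses and AS numbers below come from the ranges reserved for documentation.

```python
import ipaddress

# Hypothetical registry: registered prefix -> set of AS numbers allowed to originate it.
EXAMPLE_REGISTRY = {
    "203.0.113.0/24": {64500},
}


def check_origin(prefix, origin_asn, registry):
    """Classify an observed announcement as 'match', 'mismatch' or 'unregistered'.

    Assumption: a registration for a covering prefix also applies to
    more-specific announcements inside it.
    """
    announced = ipaddress.ip_network(prefix)
    covering = []
    for registered, origins in registry.items():
        net = ipaddress.ip_network(registered)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if net.version == announced.version and announced.subnet_of(net):
            covering.append(origins)
    if not covering:
        return "unregistered"
    if any(origin_asn in origins for origins in covering):
        return "match"
    return "mismatch"
```

With this toy registry, an announcement of 203.0.113.0/25 from AS 64500 is classified as a match, while the same prefix announced from any other AS is a mismatch, which is the situation a hijack like the YouTube incident would produce if the victim's prefixes were registered.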

The only problem with registries is that, like most things on the Internet, no one is required to use them or, having used them, to keep them up to date. In addition, there are dozens to choose from. Which one do you pick? Some are regional, others are run by individual countries, nonprofits, or even Internet service providers. Of the 39 registries that Renesys monitors, the nonprofit Merit Network Inc. maintains the registry with the largest amount of routing information (the RADb). In fact, it has more than double the second largest such service, run by RIPE NCC. Others of note are run by Level 3, Savvis, NTT, and ARIN. So as with most things on the Internet, there are potentially many sources of information and there is no telling how much of it is incomplete, outdated or otherwise erroneous. Registry data is widely viewed as unreliable at best, so we set out to measure just how unreliable it is. For a moment in time (20 March 2009 at 00:00:00 UTC), we compared the true operational state of the Internet to the explicitly listed “correct” state as articulated in any one of the aforementioned 39 registries.

To conduct this comparison, we first had to come up with a scoring mechanism. We used prefixes as the basic building blocks for our scoring approach because these can then be rolled up into more meaningful higher level scores, such as by country or by provider. For each prefix, we looked at only two things: how is the prefix originated and who are the upstream providers for the originator? We compared the registry entries (if any) to what we were actually seeing on the Internet and gave each prefix a score based on the variance. The best score is 100 and each prefix starts there — innocent until proven guilty. Then points are deducted for each difference found between registry data and reality. The exact algorithm is not terribly important and competing alternatives are easy to suggest, but what is important is that we tried to set the bar very low, making a good score easy to obtain. For example, correct originations are worth much more than correct upstream providers. If a prefix has a correct origination and only one upstream, it will still get a score of 80 even if the upstream is incorrectly registered or missing altogether. For two incorrect or missing upstreams, the score drops by only a few points. Thus, correctly originated prefixes tend to have scores in excess of 70. And if you register nothing at all for a prefix, its score will be around 25 or slightly less, depending on the number of providers. To get lower scores, you have to do less than nothing, namely, make lots of mistakes in your registration, enough to outweigh anything you managed to handle correctly.
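The exact algorithm is not given here, so the following is only a sketch with made-up weights, chosen so that it reproduces the figures quoted above: 100 for a fully correct prefix, 80 for a correct origination with one bad upstream, a few points less for a second bad upstream, scores above 70 for any correctly originated prefix, and about 25 when nothing is registered. The names score_prefix, ORIGIN_PENALTY and the three upstream constants are hypothetical, as is the heavier penalty for an origination that is registered but wrong.

```python
# All weights below are invented for illustration; they are NOT the real
# scoring weights, only values consistent with the numbers quoted in the text.
ORIGIN_PENALTY = {
    "correct": 0,   # registered origin matches what is observed
    "missing": 55,  # no origin registered for the prefix
    "wrong": 70,    # an origin is registered, but not the one observed
}
FIRST_UPSTREAM_PENALTY = 20  # first incorrect or missing upstream
EXTRA_UPSTREAM_PENALTY = 3   # each additional incorrect or missing upstream
MAX_UPSTREAM_PENALTY = 29    # cap, so a correct origin always scores above 70


def score_prefix(origin_status, bad_upstreams):
    """Start at 100 and deduct points for each mismatch with the registry."""
    score = 100 - ORIGIN_PENALTY[origin_status]
    if bad_upstreams > 0:
        upstream_penalty = (
            FIRST_UPSTREAM_PENALTY
            + EXTRA_UPSTREAM_PENALTY * (bad_upstreams - 1)
        )
        score -= min(upstream_penalty, MAX_UPSTREAM_PENALTY)
    return max(score, 0)
```

Under these assumed weights, an unregistered prefix with one provider scores 25 and one with two providers scores 22, while dropping below that requires a wrong registration rather than a missing one.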

Once each prefix is scored in this way, we compute the average scores for all prefixes in certain classes, such as those that geolocate to a particular region or are ultimately transited by a given provider. We can then compare countries, providers and organizations for their level of routing hygiene, i.e., how closely does their presence on the Internet match their stated policy? My expectation was that there would be a few folks who got things completely right, but by and large, most would either not have registered their routes or not have done so correctly. In other words, I expected most of the world to map to a score of around 25, with a few outliers around 100 and very little in between. That turns out not to be the case. The average score for all of planet Earth was a fairly respectable 65.6 for around 280,000 routed prefixes. And there was quite a wide range of scores and surprising differences between countries.
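The roll-up itself is just a grouped average. As a rough sketch, assuming each scored prefix arrives as a (group, score) pair where the group is a country code or a provider name, it could look like the following. The function names average_scores and rank_groups and the min_prefixes cutoff are illustrative choices, not part of any Renesys tooling.

```python
from collections import defaultdict


def average_scores(scored_prefixes):
    """Map each group to (average score, prefix count).

    scored_prefixes is an iterable of (group, score) pairs, one per prefix.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for group, score in scored_prefixes:
        totals[group][0] += score
        totals[group][1] += 1
    return {group: (total / count, count) for group, (total, count) in totals.items()}


def rank_groups(averages, min_prefixes=1):
    """Return (group, average) pairs, best first, skipping small groups."""
    eligible = [
        (group, avg)
        for group, (avg, count) in averages.items()
        if count >= min_prefixes
    ]
    return sorted(eligible, key=lambda item: item[1], reverse=True)
```

The min_prefixes cutoff matters because a group with only a handful of prefixes can reach a very high or very low average far more easily than a large one, so rankings are only meaningful among groups of similar size.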



The above scatter plot displays our per-country scoring results. Each point on the graph represents one of the 228 countries or territories that we see originating prefixes on the Internet. Since it is much easier for countries with few prefixes to get a high score, we plot the prefix count versus the total score for each country. Thus, we’d expect countries lower on the graph (fewer prefixes) to have higher scores, but that is generally not the case. Five tiny countries manage a perfect score of 100, namely, San Marino, Djibouti, Faroe Islands, Åland Islands, and Cape Verde. The bottom three, all with scores below 25, are also quite small: Saint Helena, Equatorial Guinea, and Senegal.

It is important to compare countries with a similar number of prefixes, i.e., those that appear in the same horizontal band of the graph. For example, Ukraine is in first place in its band (1,000 to 10,000 prefixes), while Chile is last. The US is in its own band as the only country with over 100,000 prefixes, but comes in with a below-average score of 50.5. Of the major powers, Russia is way ahead of the others with an 88.4, while China manages only a miserable 45.4. And finally, we note that Korea (84.9) nudges out India (84.0) to win their band.

If you ignore prefix count entirely, the US ranks 191st out of 228 countries. If instead you consider only countries with at least 1,000 prefixes, i.e., those with a rich Internet infrastructure, the US ranks 34th out of 39 countries, just behind Mexico and just ahead of Colombia. These 39 Internet-rich countries are listed below by decreasing score.

  1. Ukraine
  2. Thailand
  3. Poland
  4. Austria
  5. Switzerland
  6. Romania
  7. Russia
  8. Pakistan
  9. Sweden
  10. Germany
  11. Korea
  12. Japan
  13. India
  14. Indonesia
  15. Bulgaria
  16. Italy
  17. Egypt
  18. United Kingdom
  19. Israel
  20. Netherlands
  21. Australia
  22. South Africa
  23. Philippines
  24. Hong Kong
  25. Singapore
  26. Canada
  27. Taiwan
  28. Spain
  29. New Zealand
  30. France
  31. Turkey
  32. Brazil
  33. Mexico
  34. United States
  35. Colombia
  36. Ecuador
  37. China
  38. Argentina
  39. Chile

In my view, if the Internet is ever going to avoid degenerating into nothing more than a den of thieves and predators, we have to introduce some accountability somewhere. Knowing the assigned IP addresses of each organization and their authorized providers is a good first step, and it doesn’t seem to be asking all that much. And while a lot of the world falls short of accurately providing even this minimal information to some “authority”, I couldn’t help but notice that over 7,300 autonomous systems managed to get a perfect score of 100 in our system, although 63% of them had only 1 prefix to worry about. Of those with a perfect score, Daewoo Information Systems (AS 4961) manages the most prefixes at just under 100. The remaining 24,000 or so ASes could easily do a better job, and if they did, the registry information could be used as was originally intended, to both solve and avoid problems.

Notes:

  • Renesys computes these scores and others on a daily basis for the purposes of analysis, trending and product offerings.
  • For the most part, individual countries listed here have little control over prefixes registered in their country. That is, the scores are more a reflection of the businesses found in each country, rather than their governments.
  • Richard A. Steenbergen presented work on registry scoring of select providers at NANOG 44.
  • While writing this blog, I was alerted to similar work being done by the US National Institute of Standards and Technology and described here.

Update:
The earlier version of this blog inadvertently misclassified a number of registered networks due to a software bug in a script used for displaying the data. The scoring algorithm remains unchanged.

The post Route Hygiene: The Dirt on the Internet appeared first on Dyn Research.
