Over the course of a given month, hundreds of Internet-impacting “events” are visible within the Oracle Internet Intelligence Map. Many are extremely short-lived, lasting only minutes, while others last for hours or days; some have a minor impact on a single metric, while others significantly disrupt all three metrics (traceroute completion ratio, BGP routes, and DNS query rate). In addition, for some events the root cause is publicly known, while for others, digging into the underlying data helps us make an educated guess about what happened. Ultimately, this makes it challenging to separate the signal from the noise and to triage and prioritize each month’s events for review in this blog post.
Having said that, in September we observed Internet disruptions due to exams, power outages, extreme weather, and submarine cable issues, as well as a number of others with unknown causes. Additionally, a third test of nationwide mobile Internet connectivity took place in Cuba.
As noted in our August post, ETECSA (the Cuban state telecommunications company) carried out two tests of nationwide mobile Internet connectivity, which were evident as spikes in the DNS query rates from Cuba. In a Facebook post, they noted, “On August 14th was a first test that measured levels of traffic congestion and that audited the network in stress conditions, the second test was made on August 22 and its purpose was to try the portal my cubacel and the short codes for service management.”
The company planned a third test, this one lasting three days from September 8-10, highlighting it in a promotional graphic that was posted on their Facebook page. They noted that this third test “was designed for three days with the purpose of checking traffic management in different structures of the network,” intended to validate optimizations made as a result of the connection difficulties and network congestion that resulted from the August tests.
As with the prior tests, Cuba’s DNS Query Rate spiked at 05:00 GMT (midnight local time) on September 8 and remained elevated through the end of the day (local time) on the 10th, when it settled back into a much lower diurnal pattern. ETECSA’s Facebook post noted that more than 1.5 million people had participated in these tests of nationwide mobile Internet access.
As it has a number of times in the past, Iraq repeatedly shut down Internet connectivity between September 1 and 10 to prevent cheating on nationwide student exams. A published report cited a statement from the Iraqi Ministry of Communications saying that it planned to suspend Internet service between 06:30 and 08:30 (local time).
As seen in the figures below, multi-hour Internet shutdowns were implemented on nine of the 10 days, with September 7 the only exception. Partial drops seen in each metric indicate that the shutdowns were not complete – that is, Internet access remained available across some parts of the country.
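The difference between a complete and a partial shutdown can be made concrete by comparing each measurement against a recent baseline. The sketch below is a minimal illustration under stated assumptions, not the Internet Intelligence Map's actual methodology: it assumes a hypothetical list of hourly traceroute completion counts, a baseline taken as the median of the preceding 24 samples, and an arbitrary 50% threshold for flagging a partial drop.

```python
from statistics import median


def classify_drop(samples, window=24, threshold=0.5):
    """Label each sample relative to the median of the preceding `window` samples.

    Hypothetical heuristic (not Oracle's method): a value of zero is a
    "complete" outage, a value below `threshold` * baseline is a "partial"
    drop, and anything else is "normal". The first `window` samples only
    seed the baseline and are not labeled.
    """
    labels = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        value = samples[i]
        if value == 0:
            labels.append("complete")
        elif baseline > 0 and value < threshold * baseline:
            labels.append("partial")
        else:
            labels.append("normal")
    return labels
```

Using a median rather than a mean keeps the baseline from being dragged down by the very drops it is meant to detect. A real system would also need to account for diurnal patterns, such as those visible in the DNS query rate metric.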
According to a published report, late in the day on September 6, western and southern regions of Libya, including the capital city of Tripoli, experienced a total blackout. The power outage was reportedly related to the impact of bloody clashes in Tripoli, which prevented repair teams from reaching power stations and grids in the impacted area. The impact of the power outage is evident in the graph below, showing a drop in the traceroute completion rate metric starting late in the day (GMT) on September 6, lasting for approximately half a day. A minor perturbation in the BGP routes metric is evident as well. Ongoing turmoil in the country also impacted Internet availability in Libya several days later, with another multi-hour drop in the traceroute completion rate evident on September 9.
After forming on September 7 as a tropical depression in the Pacific Ocean, Typhoon Mangkhut quickly strengthened and moved west towards Micronesia. On September 10, the typhoon moved across both the Northern Mariana Islands and Guam, causing damage with winds in excess of 100 miles per hour.
As shown in the figure below, the storm impacted Internet connectivity in the Northern Mariana Islands: the traceroute completion rate metric declined around mid-day local time (the Islands are GMT+10) on September 10, and the DNS Query Rate was also lower than normal for that time of day. The following figure shows that Internet connectivity on Guam was impacted several hours later, with the traceroute completion rate metric declining later in the day local time (Guam is also GMT+10) on September 10. There also appears to have been a slight impact on the number of routed networks at around the same time, along with a concurrent drop in the DNS query rate metric.
By the next morning, the storm had reportedly moved past the islands, although the calculated metrics took several days to return to “normal” levels.
The figures below illustrate the impact that Typhoon Mangkhut had on local network providers. AS7131 (IT&E Overseas) has prefixes that are routed on both Guam and the Northern Mariana Islands. The number of completed traceroutes to endpoints in this autonomous system began to drop mid-day local time, likely due to power outages or damage to local infrastructure as the storm arrived. Interestingly, a number of traceroutes started to pass through AS9304 (Hutchison) around the same time, but it isn’t clear whether this was simply coincidental or whether more traffic was routed through this provider as part of a disaster recovery process. The number of completed traceroutes to endpoints in AS9246 (Teleguam Holdings) also began to decline later in the day local time on September 10, again likely due to local power outages or infrastructure damage. Notably, while some endpoints across both networks became unreachable as a result of Typhoon Mangkhut, there did not appear to be a meaningful impact on measured latency, which remained within the ranges seen in the days ahead of the storm.
On September 4, Australian telecommunications infrastructure provider Vocus posted an “Incident Summary” regarding a suspected fault in the SeaMeWe-3 (SMW3) cable between Perth, Australia and Singapore.
The figure below (from one of Oracle’s commercial Internet Intelligence tools) illustrates the impact of the cable failure on the median latency of paths between Singapore and Perth, specifically from a measurement point in cloud provider DigitalOcean’s Singapore location to endpoints in Perth on selected Internet service providers. Among the measured providers, latencies increased 3-4x on September 2/3, stabilizing by the 4th.
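The 3-4x figure is a ratio of median latencies measured before and during the cable fault. As a minimal sketch (the function names and the per-provider input layout are hypothetical, and this is not how Oracle's tools are implemented), that comparison might look like the following, with round-trip times given in milliseconds:

```python
from statistics import median


def latency_ratio(before_ms, after_ms):
    """Return median(after_ms) / median(before_ms) for two lists of RTT samples."""
    if not before_ms or not after_ms:
        raise ValueError("both sample lists must be non-empty")
    base = median(before_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return median(after_ms) / base


def ratios_by_provider(samples):
    """Map {provider: (before_ms, after_ms)} to {provider: latency ratio}."""
    return {provider: latency_ratio(before, after)
            for provider, (before, after) in samples.items()}
```

The median is used because a handful of slow or lost probes would otherwise skew a mean. As a purely illustrative input (placeholder values, not measurements from the SMW3 event), a provider whose median RTT moved from 50 ms to 175 ms would yield a ratio of 3.5.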
The initial incident summary published by Vocus noted that similar faults in the past have taken upwards of four weeks to restore. However, on September 5, an article in ZDNet revealed that Vocus pressed the new Australia Singapore Cable (ASC) into service two weeks ahead of schedule, shifting customer traffic onto it from the damaged SMW3. The figures below, generated by internal Internet Intelligence tools, illustrate how failure of the SMW3 cable caused measured latencies to increase, and how they returned to previous levels once the ASC cable was activated and traffic was shifted onto it.
On September 10, @RightsCon, a Twitter account associated with Internet advocacy group Access Now, posted a tweet seeking verification of Internet disruptions in several countries.
Calling on the #RightsCon community: #KeepItOn is looking to verify internet shutdowns/disruptions in Angola, the Maldives, & Saint Vincent and the Grenadines. If you have information, or know someone who might, reach out to @btayeg PGP 0x8050D4F68EBB844E70BF94F6C7C45F98F350B5E3
— RightsCon (@rightscon) September 10, 2018
Doug Madory, director of Internet analysis on Oracle’s Internet Intelligence team, replied, noting that “Internet connectivity issues in Angola was due to problems on the WACS submarine cable.” The figure below shows the impact of the submarine cable issues that occurred several days earlier, with disruptions evident in both the traceroute completion ratio and BGP routes metrics on September 7.
The disruptions reviewed above were caused by known issues with the SMW3 and WACS submarine cables. However, September also saw a number of additional disruptions that may have been related to problems with submarine cable connectivity, although those links could not be definitively confirmed.
On September 5-6, a significant Internet disruption was observed in Comoros, impacting all three metrics as seen in the figure below. A complete outage was observed at Comores Telecom, with the number of completed traceroutes to endpoints in that network dropping to zero during the disruption. As the figure below shows, prior to the outage, traceroutes reached Comores Telecom primarily through Level 3 and BICS, but went through West Indian Ocean Cable Company for approximately three days after the outage, before transiting Level 3 and BICS once again.
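The shift described above, from Level 3 and BICS to West Indian Ocean Cable Company and back, amounts to tracking which network appears immediately before the target network in each completed traceroute. The sketch below assumes a hypothetical input format in which each traceroute is a list of network names, one per hop; it illustrates the idea and is not Oracle's internal tooling.

```python
from collections import Counter


def upstream_shares(paths, target):
    """Return the fraction of paths that enter `target` via each upstream network.

    Each path is a list of network names (hypothetical format), one per hop.
    Paths that never reach `target`, or that start inside it, are skipped.
    """
    counts = Counter()
    for path in paths:
        if target not in path:
            continue
        idx = path.index(target)  # first hop inside the target network
        if idx == 0:
            continue
        counts[path[idx - 1]] += 1  # the hop just before entering the target
    total = sum(counts.values())
    if total == 0:
        return {}
    return {upstream: n / total for upstream, n in counts.items()}
```

Computing these shares separately for windows before, during, and after an outage would surface the kind of upstream change seen at Comores Telecom, where one set of providers gives way to another and later returns.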
International connectivity to Comoros is carried over both the Eastern Africa Submarine System (EASSy) and FLY-LION3, although the latter only connects Comoros to Mayotte. The observed shift in upstream providers could indicate a problem on one submarine cable that forced traffic onto the other until issues with the primary cable were resolved.
Later in the month, the Caribbean islands of Saint Martin and Saint Barthelemy both saw disruptions lasting approximately 24 hours across September 28-29, evident in the declines in the traceroute completion rate and BGP routes metrics shown in the figures below. (Because the disruption occurred on a Friday night and Saturday, when DNS query rates are typically lower anyway, evidence of the disruption in that metric is harder to see.) Both islands are connected to Southern Caribbean Fiber, with a spur running from Saint Martin to Saint Barthelemy.
On September 25, 26, and 30, disruptions to Internet connectivity in American Samoa were evident in the Internet Intelligence Map, as shown in the figure below. Brief drops across all three metrics were observed on the 25th and 26th, while multiple drops were observed on the 30th. Internal tools indicated that the underlying issues impacted BlueSky Communications/SamoaTel, and problems are visible in traceroutes passing through Hurricane Electric at times that align with the drops seen in the American Samoa graph. The territory has been connected to the American Samoa-Hawaii (ASH) submarine cable for nearly a decade, and it also connected to the Hawaiki cable earlier this year. BlueSky appears to connect with Hurricane Electric in San Jose, California, but it isn’t clear which cable carries traffic from that exchange point to the island.
Associating Internet disruptions with an underlying cause can be easy when related events are publicly known – severe weather, power outages, civil unrest, and even school exams. In many cases, these disruptions last for hours or days, making it more likely that they will affect Internet connectivity for users in the impacted country. However, for each well-understood disruption, there are dozens more each month that are brief, partial (not dropping the calculated metrics to zero), and unexplained. Because they are short and incomplete, these disruptions may not significantly affect user connectivity, which makes finding public commentary on them (such as news articles or Twitter posts) all the more challenging. Using internal Internet infrastructure analysis tools and public tools like TeleGeography’s Submarine Cable Map, we can surmise what may have caused a given disruption, but the actual root cause remains unknown.