DNS – Understanding it helps to debug it

The key to working out what’s going on with your DNS is understanding how it works. DNS can be deceptive, since there’s lots of caching.

XKCD - The Cloud

Courtesy of xkcd.com

Many distributed denial of service attacks on root nameservers have been carried out over the years. These efforts have been entirely futile because of the resiliency of the DNS system, as created by its caching.

DNS is a Tree

You must first understand that DNS is a tree structure:

A Tree Structure

At the very top of the tree is a single file, about 250KB in size at the time of writing. This is called the Root Zone Database and is served from a web server. You can download it yourself from http://www.internic.net/zones/root.zone. This file is controlled by the Internet Assigned Numbers Authority and, at the time of writing, is overseen by the US Department of Commerce. As I said before, the key to DNS is caching, and a denial of service attack against the root zone database server would have very little effect on the DNS system.

On the second level of the tree are the DNS root name servers. Information about these can be found on the root servers website. These servers are all run by big players in the world of the Internet. As you can see from the site, there are currently 13 registered root servers which can be accessed at [a-m].root-servers.net (e.g. c.root-servers.net). There cannot be any more of these because the full list has to fit inside the 512-byte size limit of a traditional DNS reply sent in a UDP datagram. You’ll see, from the map, that there are a lot more than 13 physical servers. The root servers use anycast to serve the same IP address from servers in different locations. Anycast typically works such that connections to an anycasted IP hit the server that is closest to you in network terms. The a-m root server names are more about who runs the servers than they are about how many there actually are.

The root servers hold information, as taken from the aforementioned root.zone, about all of the top level domains the DNS has to offer. Examples of a top level domain include com, net, org and uk. On the third level of the tree sit the DNS servers run by the various organizations that control the top level domains. For example, Nominet control the uk TLD and Verisign control the com TLD. The DNS servers on this level hold the most information in the tree. For example, Verisign’s DNS servers hold information relating to the nameserver settings of every registered .com address.

On the fourth level of the tree sit ISP DNS servers. An ISP, in this sense, is anyone who provides DNS hosting for registered domains. If you register your domain through namecheap.com, for example, namecheap give you the option to use their DNS servers. The namecheap DNS servers will then hold the IP address of the web server which the domain is hosted on. Some people, like myself, choose to run their own DNS servers. They also sit on this fourth level.

It is possible for the tree to be of more or less arbitrary depth (the only hard limit is the maximum length of a domain name). DNS servers on the fourth and subsequent levels can pass (often called delegate) the handling of DNS queries for a domain to DNS servers in lower levels of the tree. At this point, the tree starts to get a little flaky, as a single server can serve IP addresses, delegate to other servers and be delegated to by other DNS servers, all at once. As such, a DNS server could be in level 4, 5, 6 and 7 of the tree. Since the tree is an analogy, this doesn’t really mean anything.

Finally you have what I shall call “Resolving Servers”. These are servers run by many ISPs (e.g. AOL, BT, etc.) to allow their customers’ computers to look up DNS records. A BT customer’s computer will have the IP addresses of BT’s DNS servers configured into it and will send its requests for DNS records via these servers. A resolving server will typically not serve any data of its own; rather, it works through the tree on the customer’s behalf and caches what it finds.

What can a “Level 4+” server serve?

The “Level 4 or higher” servers that I described above can serve different types of DNS data. The most prevalent examples are below:

  • A Records – This is a mapping of a domain name to an IP address.
  • CNAME Records – This is a mapping of a domain name to another domain name. This has the effect of aliasing one domain name to another.
  • NS Records – These “delegate” a domain to be served by another DNS server. I mentioned delegation above.
  • TXT Records – These can hold arbitrary text and are used for informational purposes for 3rd party services.
  • MX Records – These are used to point a domain to the mailserver which handles its e-mail.
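All of these record types share the same basic shape when dig prints them: a name, a TTL, a class, a type and some type-specific data. As a rough sketch of that shape (the Record class and parse_record function are names I have made up for illustration, and the MX line is a hypothetical example, not a real record), you could model it in Python like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str    # owner name, e.g. "mail.127dot0dot0dot1.com."
    ttl: int     # cache time in seconds
    rclass: str  # almost always "IN" (Internet)
    rtype: str   # "A", "CNAME", "NS", "TXT", "MX", ...
    data: str    # type-specific payload (IP address, target name, text, ...)


def parse_record(line: str) -> Record:
    """Parse one 'name ttl class type data' line as printed by dig."""
    # Split on whitespace at most four times so that data containing
    # spaces (an MX "priority host" pair, a TXT string) stays in one piece.
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, data.strip())


# The A record is the one used later in this article.
a = parse_record("mail.127dot0dot0dot1.com. 38400 IN A 217.13.139.111")
# Hypothetical MX record, invented purely to show a second type.
mx = parse_record("127dot0dot0dot1.com. 38400 IN MX 10 mail.127dot0dot0dot1.com.")

print(a.rtype, a.ttl, a.data)
print(mx.rtype, mx.ttl, mx.data)
```

Whatever the record type, the first four fields always mean the same thing; only the data field changes meaning.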

So how can I debug?

Remember, DNS caches a lot. DNS is cached at every point of the tree and not just on your computer. It is a common mistake to think that ipconfig /flushdns on Windows will clear all forms of cache. It is more likely that caching will have happened on a “Level 4” or “Resolving” server than your computer.

Because of this, we need to avoid the cache. The best bet, during debugging, is to work from the top of the tree down. The Linux dig tool is invaluable here: it allows you to do a lookup for any type of DNS record against any particular DNS server. Let’s think about how the lookup process might work for mail.127dot0dot0dot1.com:

  1. Query a root server (e.g. a.root-servers.net) for the mail.127dot0dot0dot1.com A record. This will return some NS records and delegate responsibility to a load of other DNS servers.
  2. Take a random one of the servers returned in the NS records in step 1 and query it for the mail.127dot0dot0dot1.com A record. Let’s use a.gtld-servers.net. This will return some more NS records delegating the 127dot0dot0dot1.com domain to some level 4 DNS servers.
  3. Take a random one of the servers returned in the NS records in step 2 and query it for the mail.127dot0dot0dot1.com A record. Let’s use ns1.127dot0dot0dot1.com. In this case, it will return an A record giving you the IP address of mail.127dot0dot0dot1.com. It could very well have returned another NS record, in which case you would do a lookup on that server instead: rinse and repeat until you get an A record (or very bored).
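The steps above can be sketched as a small loop. This is only a toy model: the SERVERS dictionary stands in for the real network, its layout is my own assumption, and it always follows the first nameserver rather than a random one so that the result is repeatable.

```python
# Toy model of the tree: each server maps an owner name to either a
# referral ("NS", list of nameservers) or a final answer ("A", list of IPs).
# Server names and the IP come from this article's example; the data
# structure itself is an illustrative assumption, not how DNS stores data.
SERVERS = {
    "a.root-servers.net": {
        "com.": ("NS", ["a.gtld-servers.net"]),
    },
    "a.gtld-servers.net": {
        "127dot0dot0dot1.com.": ("NS", ["ns1.127dot0dot0dot1.com"]),
    },
    "ns1.127dot0dot0dot1.com": {
        "mail.127dot0dot0dot1.com.": ("A", ["217.13.139.111"]),
    },
}


def query(server, name):
    """Return the entry on `server` whose owner name most closely matches `name`."""
    zone = SERVERS[server]
    matches = [o for o in zone if name == o or name.endswith("." + o)]
    if not matches:
        raise LookupError(f"{server} knows nothing about {name}")
    return zone[max(matches, key=len)]  # longest match is the most specific


def resolve(name, server="a.root-servers.net", max_hops=10):
    """Walk down the tree from `server`, following NS referrals to an A record."""
    path = []
    for _ in range(max_hops):
        path.append(server)
        rtype, values = query(server, name)
        if rtype == "A":
            return values[0], path
        server = values[0]  # follow the referral (first server, for repeatability)
    raise RuntimeError("too many delegations")


ip, path = resolve("mail.127dot0dot0dot1.com.")
print(ip)
print(" -> ".join(path))
```

The max_hops guard is the code equivalent of getting very bored: a broken delegation can loop forever, so the walk gives up after a fixed number of referrals.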

So, to do this with dig, you can use the following syntax:

  dig domain-to-lookup.com IN A @name.server.com

So, in our above example, we would do the following commands:

  dig mail.127dot0dot0dot1.com IN A @a.root-servers.net
  dig mail.127dot0dot0dot1.com IN A @a.gtld-servers.net
  dig mail.127dot0dot0dot1.com IN A @ns1.127dot0dot0dot1.com
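If you find yourself typing these a lot, the commands are easy to generate. Here is a minimal sketch that builds one dig command line per server in the chain. The dig_command helper is a hypothetical name of my own, and the +norecurse option is my addition rather than part of the commands above: it asks the server not to go and look the answer up elsewhere on your behalf, which fits the aim of avoiding caches.

```python
import shlex


def dig_command(name, server, rtype="A", recurse=False):
    """Build the argument list for one dig lookup against one server."""
    argv = ["dig", name, "IN", rtype, "@" + server]
    if not recurse:
        # Assumption: we only want what this server itself knows.
        argv.append("+norecurse")
    return argv


# The delegation chain from the example above.
chain = ["a.root-servers.net", "a.gtld-servers.net", "ns1.127dot0dot0dot1.com"]

for server in chain:
    # shlex.join turns the list back into a copy-and-paste-able command line.
    print(shlex.join(dig_command("mail.127dot0dot0dot1.com", server)))
```

Each list could equally be handed to subprocess.run on a machine with dig installed; the sketch only prints the commands so that it runs anywhere.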

In each reply, you are looking for the ANSWER or AUTHORITY section. An AUTHORITY section with no ANSWER section usually means a delegation has occurred, whereas an ANSWER section will typically contain the record you’ve asked for, since it is served by the server you queried. Do not get confused by the ADDITIONAL section. It only provides supporting information, such as the IP addresses of the nameservers which have been delegated to with NS records.

The relevant sections of the replies to the above commands will be as follows (two AUTHORITY sections for the delegations, then the final ANSWER section):

;; AUTHORITY SECTION:
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.

;; AUTHORITY SECTION:
127dot0dot0dot1.com. 38400 IN NS ns1.127dot0dot0dot1.com.
127dot0dot0dot1.com. 38400 IN NS ns2.127dot0dot0dot1.com.

;; ANSWER SECTION:
mail.127dot0dot0dot1.com. 38400 IN A 217.13.139.111
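Telling a delegation from a real answer is mechanical enough to script. The sketch below groups dig-style output by section header and then decides what kind of reply it is looking at. The function names are my own, and it only handles the plain layout shown above, so treat it as a starting point rather than a full parser for dig output.

```python
def split_sections(dig_output):
    """Group record lines under the ';; NAME SECTION:' header above them."""
    sections = {}
    current = None
    for line in dig_output.splitlines():
        line = line.strip()
        if line.startswith(";;") and line.endswith("SECTION:"):
            current = line[2:-len("SECTION:")].strip()  # e.g. "AUTHORITY"
            sections[current] = []
        elif not line:
            current = None  # a blank line ends the section
        elif current is not None and not line.startswith(";"):
            sections[current].append(line)
    return sections


def record_type(record_line):
    """Fourth whitespace-separated field of a 'name ttl class type data' line."""
    fields = record_line.split()
    return fields[3] if len(fields) > 3 else ""


def classify(dig_output):
    """Return 'answer', 'referral' or 'nothing useful' for one reply."""
    sections = split_sections(dig_output)
    if sections.get("ANSWER"):
        return "answer"
    if any(record_type(r) == "NS" for r in sections.get("AUTHORITY", [])):
        return "referral"
    return "nothing useful"


referral = """;; AUTHORITY SECTION:
127dot0dot0dot1.com. 38400 IN NS ns1.127dot0dot0dot1.com.
127dot0dot0dot1.com. 38400 IN NS ns2.127dot0dot0dot1.com.
"""
answer = """;; ANSWER SECTION:
mail.127dot0dot0dot1.com. 38400 IN A 217.13.139.111
"""

print(classify(referral))
print(classify(answer))
```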

The art of debugging is in trial and error. It should be the case that all nameservers in a delegation have the same information on each of them. In some DNS setups, there can be a considerable propagation time for DNS records placed on one nameserver to find their way to another. You can check if a record has changed on all nameservers by querying them individually:

  dig mail.127dot0dot0dot1.com IN A @ns1.127dot0dot0dot1.com
  dig mail.127dot0dot0dot1.com IN A @ns2.127dot0dot0dot1.com

 

If one of them reports a different IP address, you can usually assume that propagation is still ongoing and you will need to wait a bit longer before the DNS works completely.
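Once you have the answers from each nameserver, the comparison itself is trivial. As a sketch, assuming you have already collected the IP addresses each server returned (the addresses for ns2 below are invented for illustration, using a reserved documentation address), you can group the servers by what they said:

```python
from collections import defaultdict


def group_by_answer(answers):
    """Map each distinct set of IPs to the list of servers that returned it."""
    groups = defaultdict(list)
    for server, ips in answers.items():
        groups[frozenset(ips)].append(server)
    return dict(groups)


def is_propagated(answers):
    """True when every nameserver gave exactly the same answer."""
    return len(group_by_answer(answers)) <= 1


# Hypothetical results: ns1 has the record from this article, while ns2
# returns a different, made-up address (192.0.2.10 is reserved for examples).
answers = {
    "ns1.127dot0dot0dot1.com": {"217.13.139.111"},
    "ns2.127dot0dot0dot1.com": {"192.0.2.10"},
}

print(is_propagated(answers))
for ips, servers in group_by_answer(answers).items():
    print(sorted(ips), "from", servers)
```

Comparing sets rather than single addresses means a name with several A records still compares correctly, whatever order the servers return them in.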

Finally (ya, really) how long do servers in the tree cache for?

The cache time (also called Time to Live, or TTL) is decided by the server which serves the data about a domain. This server tells other, caching DNS servers how long they should cache each record for. A lower cache time would be set for domains whose data changes often. However, lower cache times make the DNS system more prone to attack or failure: once a cache entry has expired, a server must re-contact the original DNS server to re-obtain the information.
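To make the expiry behaviour concrete, here is a minimal sketch of a TTL-based cache. TtlCache is a made-up class, not how any particular DNS server is implemented, and the clock is passed in so that the example can fast-forward time instead of waiting for real hours to pass.

```python
class TtlCache:
    """Tiny cache where every entry expires `ttl` seconds after it was stored."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._entries = {}    # name -> (value, time at which it expires)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # expired: the caller must ask upstream again
            return None
        return value


now = [0]  # fake clock, in seconds, that the example moves forward by hand
cache = TtlCache(clock=lambda: now[0])
cache.put("mail.127dot0dot0dot1.com.", "217.13.139.111", ttl=38400)

now[0] = 38399
print(cache.get("mail.127dot0dot0dot1.com."))  # still cached
now[0] = 38400
print(cache.get("mail.127dot0dot0dot1.com."))  # expired, so a fresh lookup is needed
```

The second lookup is the moment the trade-off bites: with a short TTL that upstream query happens far more often, and it fails if the original server is unreachable.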

The TTL can be seen in each record output by the dig command. It is the number between the domain and the “IN”. You’ll see that the gtld-servers records on the DNS root servers have a cache time of 172800 seconds (2 days). This is quite a long time in the world of DNS and means that changes can take 2 days before they are fully seen across the Internet. You’ll also see that the A record for mail.127dot0dot0dot1.com has a cache time of 38400 seconds (just under 11 hours). This is still quite a long time, but it implies that the records do not change very often.

So, if IANA wished to change the address of a Verisign .com nameserver, the change could take up to 2 days to be fully active. If I wanted to change the IP address of mail.127dot0dot0dot1.com, it could take just under 11 hours to take full effect. If a user looks up a recently changed A record and has the good fortune that no unexpired cache entries exist for it anywhere along their lookup path, the change is seen instantly. If a cache entry was created immediately before the change, it will take the full 11 hours. In the same way, if the cache entry was created 10 hours before the change, it will take only the remaining time, which is less than an hour.
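The arithmetic here is simply the TTL minus the age of the cache entry at the moment of the change. A quick sketch, using the exact 38400-second TTL from the dig output (the function names are my own):

```python
def seconds_until_visible(ttl, cache_age_at_change):
    """Worst-case wait, after a change, for a cache that was filled
    `cache_age_at_change` seconds before the change was made."""
    return max(0, ttl - cache_age_at_change)


def as_hours_minutes(seconds):
    """Format a number of seconds as 'XhYYm', dropping leftover seconds."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h{remainder // 60:02d}m"


TTL = 38400  # seconds, from the A record for mail.127dot0dot0dot1.com

# Cache entry ages at the time of the change: just cached, cached 10 hours
# earlier, and cached so long ago that it has already expired.
for age in (0, 10 * 3600, 12 * 3600):
    wait = seconds_until_visible(TTL, age)
    print(f"cached {as_hours_minutes(age)} before change: wait {as_hours_minutes(wait)}")
```

With the exact figure, an entry cached 10 hours before the change has 40 minutes left rather than a full hour, which is the rounding hidden in “just under 11 hours”.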

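DNS servers also implement a negative cache time, which tells a caching DNS server how long it should remember that a name failed to resolve before it tries again. Like the TTL, this value is set by the server responsible for the domain, via the zone’s SOA record.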
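Negative caching works like ordinary caching in reverse: what is remembered is the failure. A rough sketch follows; the NegativeCache class, the 300-second value and the typo hostname are all invented for illustration.

```python
class NegativeCache:
    """Remembers names that failed to resolve, for a limited time."""

    def __init__(self, clock, negative_ttl):
        self._clock = clock                # callable returning the time in seconds
        self._negative_ttl = negative_ttl  # how long a failure is remembered
        self._failed_until = {}            # name -> time at which to try again

    def remember_failure(self, name):
        self._failed_until[name] = self._clock() + self._negative_ttl

    def should_retry(self, name):
        """True if there is no fresh record of this name having failed."""
        return self._clock() >= self._failed_until.get(name, float("-inf"))


now = [0]  # fake clock so the example does not need to sleep
cache = NegativeCache(clock=lambda: now[0], negative_ttl=300)  # 300s is made up

cache.remember_failure("typo.127dot0dot0dot1.com.")
now[0] = 299
print(cache.should_retry("typo.127dot0dot0dot1.com."))  # failure still remembered
now[0] = 300
print(cache.should_retry("typo.127dot0dot0dot1.com."))  # time to ask again
```

This is why a record you have only just created can appear not to exist for a while: if somebody looked it up before it was there, the failure may be cached.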

In summary…

DNS is not a complex system – rather it is made difficult to debug by its requirement to cache in order to provide resiliency. dig is a powerful tool to debug DNS. You should work down the tree from the root servers to the “Level 4+” ISP servers following NS record delegations and checking the consistency of the record you are looking for.