DNS and hostname resolution

I've always been confused by technical discussions of DNS. They say that DNS is used to match a domain name with the IP address of a host on the Internet. The thing that confuses me is the premise that there is an exact, one-to-one correspondence between Internet domain names and hosts on the Internet. I find that to be a gross oversimplification, as if youtube.com were one computer. Is there really one computer for the domain www.youtube.com? Considering that there are literally billions of videos on YouTube, I find it really hard to believe that all of them are stored on one computer. The same goes for things like Facebook. I seriously doubt that every single Facebook profile in the world is stored on a single server.

So I need someone to straighten this out for me. How does this work? What are these explanations of DNS leaving out of the picture? I'm sure domain names are, in fact, often distributed over multiple servers. It's possible these servers have the same IP address due to network address translation, so that might be an explanation. Also, do they have different hostnames that need to be resolved, and is this additional hostname resolution done with another directory service like NIS?

Using nslookup you can explore domains

nslookup -q=ns youtube.com

and hosts

nslookup -q=a youtube.com

The host (address) entry on the bare domain name is there for convenience. Normally servers are named inside the domain; e.g. by convention a Web server gets a www. prefix.
In fact

nslookup -q=a www.youtube.com

gives the same result.
--
Interesting, the IPv6 lookups differ; I guess they use a different distribution mechanism.

nslookup -q=aaaa youtube.com
nslookup -q=aaaa www.youtube.com
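
For anyone who wants to do the same lookups from a program, here is a minimal Python sketch. It assumes the standard library's socket.getaddrinfo, which asks the system resolver (so it may also consult the hosts file, not only DNS). The helper name addresses_for and its resolver parameter are made up for this example; the parameter only exists so the function can be tried without network access.

```python
import socket

def addresses_for(name, family=socket.AF_UNSPEC, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses the resolver reports for name.

    family: socket.AF_INET for IPv4 (A records), socket.AF_INET6 for IPv6
    (AAAA records), socket.AF_UNSPEC for both.
    resolver: hypothetical hook for this sketch; defaults to the system resolver.
    """
    infos = resolver(name, None, family, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling addresses_for("youtube.com", socket.AF_INET) and addresses_for("www.youtube.com", socket.AF_INET) on a machine with Internet access is roughly the programmatic version of the two -q=a queries above. What comes back depends on where and when you ask.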

Okay, so there are many hosts for the domain youtube.com. I understand somewhat better now. How does DNS figure out which host to use after it narrows down the domain?

DNS is just a lookup table. And yes, it can return multiple records for a query. It's supposed to provide a poor man's round robin on the returned records (but you never know how many other clients are doing lookups, so which one you get first is more or less random). You can run your own DNS server, point clients at it, and lie (if you want) about being authoritative for "whatever" (e.g. unix.com). Then all clients that use your "tricky" DNS can be tricked into seeing your mappings for unix.com. You could also have it cache answers for requests it doesn't house, so your DNS would look "complete" while doing a tricky override for "whatever" you wanted to lie about. Hopefully this scares you a bit, and it's a good segue into DNSSEC, etc.

Do a query, pick the first answer. That's how it's normally done.
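
To make the round robin and "pick the first answer" ideas concrete, here is a toy simulation in Python. It is only a sketch of the idea, not how any real DNS server is implemented: the class RoundRobinZone, the function pick_first, the name www.example.test and the 192.0.2.x addresses are all made up for illustration.

```python
from collections import deque

class RoundRobinZone:
    """Toy lookup table that rotates the answer order on every query."""

    def __init__(self, records):
        # name -> deque of addresses; a deque makes rotation cheap
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def query(self, name):
        addrs = self._records.get(name)
        if addrs is None:
            return []               # no such name in this table
        answer = list(addrs)        # full record set, in the current order
        addrs.rotate(-1)            # the next query starts with the next address
        return answer

def pick_first(zone, name):
    """What a typical client does: query, then use the first answer."""
    answer = zone.query(name)
    return answer[0] if answer else None

zone = RoundRobinZone(
    {"www.example.test": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
)
picks = [pick_first(zone, "www.example.test") for _ in range(4)]
print(picks)
# ['192.0.2.10', '192.0.2.11', '192.0.2.12', '192.0.2.10']
```

Every client gets the whole list, but because the order rotates, clients that simply take the first entry end up spread across the hosts.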

Queries for records you don't house start at the root (the root servers, "dot" if you will), then go to what we call the TLDs, and so on down. If you own a registered domain, you can associate a DNS server with it to build the required relationship. If you do this without the cooperation of the holder of the parent zone it won't work: the parent zone has to know how to delegate requests to your server. Sometimes the parent zone will stand authoritative for your zone by default and handle servicing your records, but normally you'd tell them about your server, and it will then stand authoritative for that zone (your registered domain).
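
The walk from the root down through delegations can be sketched in a few lines of Python. This is a toy model with dictionaries standing in for name servers, not a real resolver: the server labels ("root", "tld-test", "ns-example"), the zone names and the address are all invented for the example.

```python
# Hypothetical servers: each one knows some delegations and/or some records.
SERVERS = {
    "root":       {"delegations": {"test.": "tld-test"}, "records": {}},
    "tld-test":   {"delegations": {"example.test.": "ns-example"}, "records": {}},
    "ns-example": {"delegations": {},
                   "records": {"www.example.test.": ["192.0.2.10"]}},
}

def resolve(name, servers=SERVERS, start="root"):
    """Follow delegations from the root until some server holds the record.

    Returns (addresses, path), where path lists the servers that were asked.
    """
    if not name.endswith("."):
        name += "."                      # fully qualified names end in the root dot
    server = start
    path = [server]
    while True:
        data = servers[server]
        if name in data["records"]:
            return data["records"][name], path
        # Which delegated zones is this name inside of?
        matches = [zone for zone in data["delegations"]
                   if name == zone or name.endswith("." + zone)]
        if not matches:
            return [], path              # nobody was told how to delegate further
        zone = max(matches, key=len)     # the most specific zone wins
        server = data["delegations"][zone]
        path.append(server)

print(resolve("www.example.test"))
# (['192.0.2.10'], ['root', 'tld-test', 'ns-example'])
```

If the "example.test." entry is removed from the tld-test delegations, the lookup stops at tld-test with an empty answer, even though ns-example still holds the record. That is the "parent zone has to know how to delegate" point in miniature.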

Oh... and there is some caching going on that speeds these lookups up quite a bit, but the root servers still get hit pretty hard.
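
Here is a rough Python sketch of what that caching buys you. It simplifies on purpose: real DNS records carry their own TTL set by the zone owner, while this toy uses one fixed ttl for everything. CachingResolver, the upstream function and the clock parameter are hypothetical names for this example.

```python
import time

class CachingResolver:
    """Toy cache in front of a slower 'upstream' lookup function."""

    def __init__(self, upstream, ttl=300, clock=time.monotonic):
        self._upstream = upstream   # callable: name -> list of addresses
        self._ttl = ttl             # seconds an answer stays valid (simplified)
        self._clock = clock         # injectable so the sketch can be tried offline
        self._cache = {}            # name -> (expiry time, answer)
        self.upstream_queries = 0

    def lookup(self, name):
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: no upstream traffic at all
        self.upstream_queries += 1  # expired or never seen: ask upstream
        answer = self._upstream(name)
        self._cache[name] = (now + self._ttl, answer)
        return answer

# Demo with a fake clock and a fake upstream (both made up for the example).
fake_time = [0.0]
resolver = CachingResolver(lambda name: ["192.0.2.10"], ttl=60,
                           clock=lambda: fake_time[0])
resolver.lookup("www.example.test")
resolver.lookup("www.example.test")
print(resolver.upstream_queries)    # 1: the second lookup came from the cache
fake_time[0] = 61.0                 # pretend the TTL has passed
resolver.lookup("www.example.test")
print(resolver.upstream_queries)    # 2: the entry expired, so upstream was asked again
```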

There are good books out there... I also recommend you look at the documentation from ISC ("Professional Support for Open Source - ISC").

You can approach it from various angles:
a) A 1-to-1 mapping is not strictly correct. You can point as many domains as you want at one specific IP. For example, say I have the public IP 8x.73.1.9 (all made up, of course) and I registered 3 domains: mydomain.com, myotherdomain.net, mythirdsite.org. I can have them all point to the same IP. Assuming I have set up a web server (or mail server) for each one of them, they can serve content distinct from one another while sharing the same IP, and users won't even notice that they are all connecting to one IP.

b) With regard to big companies, the IPs you connect to are usually the ones closest to your location. So 2 people who are geographically separated and both connect to youtube.com may actually be connecting to different IPs. That is made possible by CDNs, DNS, etc. I think IPv6 has inherent support for it via anycast (anycast is also used with IPv4).

c) The IPs used by big companies are mostly virtual, meaning they are not tied to a single machine but rather to a group of machines. That provides huge bandwidth, failover, load balancing, etc.
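
For point a), the usual way several web sites share one IP is that the browser sends the requested name in the HTTP Host header and the server chooses the site from it. Below is a minimal Python sketch of that dispatch. It assumes plain HTTP/1.1 request text and ignores details such as IPv6 literals in the Host header. The SITES table reuses the made-up domains from a), and host_header and serve are hypothetical helper names.

```python
# One server (one IP) holding the content for three made-up domains.
SITES = {
    "mydomain.com": "Welcome to mydomain.com",
    "myotherdomain.net": "Welcome to myotherdomain.net",
    "mythirdsite.org": "Welcome to mythirdsite.org",
}

def host_header(raw_request):
    """Extract the host name from the Host header of a raw HTTP/1.1 request."""
    for line in raw_request.split("\r\n")[1:]:   # skip the request line
        if not line:
            break                                # blank line ends the headers
        key, _, value = line.partition(":")
        if key.strip().lower() == "host":
            # Drop an optional ":port" suffix and normalise case.
            return value.strip().split(":")[0].lower()
    return None

def serve(raw_request):
    """Pick the site by name, even though every request reached the same IP."""
    return SITES.get(host_header(raw_request), "404: unknown site")

print(serve("GET / HTTP/1.1\r\nHost: myotherdomain.net\r\n\r\n"))
# Welcome to myotherdomain.net
```

DNS only gets the client to the shared IP. Telling the sites apart happens afterwards, inside the web server.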

Sorry for my English, I tried my best.