Anatomy of a URL and the DNS process.

Anatomy of a URL

URL stands for Uniform Resource Locator. A URL is a web address that points to a specific online resource. Think of a physical address for a home or business for comparison. The address of 972 Market St, San Francisco, CA 94102 points to a specific building in the United States, while the address of https://mail.google.com/gmail points to a specific online resource on a server.

Protocol:

In general terms, a protocol is a set of standards or rules that all parties have agreed to follow. On the internet, a network protocol is a set of rules that determines how two computers communicate with each other over a network. In our example URL, the protocol is ‘https’.

How SSL Works

Domain name (second level domain)

A domain name is your website’s name: the address internet users type to reach your website. In the example above, ‘google’ is the second level domain.

Subdomain

A subdomain is a subdivision of a domain name, and it always comes before the domain name. In many URLs the subdomain is ‘www’, as in www.google.com, but in the example above ‘mail’ is the subdomain of the domain ‘google’. The most common use of a subdomain is to organize or divide web content into distinct sections or customer bases.

Domain and Subdomain explanations.

Top Level Domain

In our example above ‘com’ is the top level domain. Top level domains are at the highest level in the hierarchical Domain Name System of the Internet. The most common TLDs are .com, .net and .org.

Path

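The path refers to a file or directory on the web server. In our example, the path is ‘/gmail’. When you type a URL with the wrong path, you usually get an HTTP status code of 404 Not Found. If the resource has moved and the server is configured to redirect the old path, you get 301 Moved Permanently instead.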
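
To make the anatomy concrete, here is a minimal Python sketch that splits the example URL into the parts described in this section. The function name `split_url` is made up for illustration. The sketch also assumes the top level domain is a single label such as ‘com’. Suffixes like ‘co.uk’ would need the Public Suffix List, which this sketch ignores.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the parts named in this article (illustrative helper)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Naive assumption: the TLD is exactly one label (e.g. "com").
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),
        "domain": labels[-2],
        "top_level_domain": labels[-1],
        "path": parts.path,
    }


anatomy = split_url("https://mail.google.com/gmail")
for name, value in anatomy.items():
    print(f"{name}: {value}")
```

For the example URL this prints ‘https’ as the protocol, ‘mail’ as the subdomain, ‘google’ as the domain, ‘com’ as the top level domain and ‘/gmail’ as the path.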

Putting it all together with DNS (Domain Name System)

Suppose https://mail.google.com is in our browser’s address bar and we hit ENTER. What happens now? The DNS process takes the human readable hostname in the URL, ‘mail.google.com’, and translates it into a machine readable number called an IP address.

The best overview of DNS currently on the web
  1. The browser, Operating System and router check their caches for the requested hostname.
  2. If it is not found, the Operating System queries the Resolving Name Server (the resolver). If the resolver does not have the IP address in its cache, it asks a Root Name Server.
  3. The Root does not know the IP address, but it knows which servers handle each Top Level Domain (TLD). In this example, it refers the resolver to the COM TLD name servers.
  4. The COM TLD name servers do not have the IP address either, so they refer the resolver to the google.com authoritative name servers. The COM TLD servers know which name servers are authoritative because of the Registrar: when a domain name is purchased, the registrar tells the TLD registry which authoritative name servers are responsible for that domain.
  5. The resolver now queries the authoritative name servers, which return the IP address for the requested hostname (mail.google.com).
  6. The resolver then gives the IP address to the Operating System. The Operating System gives it to the browser, and now the browser knows which server to request online resources from.
  7. Throughout this process, the resolver, Operating System and browser cache the answers in memory for a limited time. If the same hostname is queried again before the cached entry expires, the IP address is found in the cache and the DNS lookup does not have to be repeated.
Actors from left to right: Resolver, Root, TLD, Authoritative Name Servers
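
The lookup order in the steps above can be sketched with a few Python dictionaries standing in for the root, TLD and authoritative name servers. Everything here is made up for illustration: the server names are placeholders, and the IP address comes from a range reserved for documentation, so it is not Google’s real address. A real resolver talks to these servers over the network. This sketch only shows the order in which they are consulted and how the cache short-circuits a repeat lookup.

```python
# Hypothetical, heavily simplified data; real servers and addresses differ.
ROOT = {"com": "com-tld-servers"}
TLD = {"com-tld-servers": {"google.com": "google-name-servers"}}
AUTHORITATIVE = {"google-name-servers": {"mail.google.com": "192.0.2.10"}}

cache = {}  # the resolver's in-memory cache


def resolve(hostname):
    """Return (ip_address, servers_consulted) for a hostname."""
    if hostname in cache:
        return cache[hostname], ["cache"]

    labels = hostname.split(".")
    tld = labels[-1]                  # e.g. "com"
    domain = ".".join(labels[-2:])    # e.g. "google.com"

    steps = []
    tld_servers = ROOT[tld]                      # root points at the TLD servers
    steps.append("root")
    name_servers = TLD[tld_servers][domain]      # TLD points at the domain's name servers
    steps.append("tld")
    ip_address = AUTHORITATIVE[name_servers][hostname]  # final answer
    steps.append("authoritative")

    cache[hostname] = ip_address
    return ip_address, steps


first = resolve("mail.google.com")
second = resolve("mail.google.com")
print(first)
print(second)
```

The first call walks root, TLD and authoritative data in turn. The second call is answered straight from the cache.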

We have the IP address. Now what?

We now have the machine readable IP address associated with the human readable hostname. Great! Your browser now sends a request to Google’s server. Remember the ‘https’ protocol used above? It runs on top of a larger family of protocols called TCP/IP.

Let’s make a connection

TCP/IP, or the Transmission Control Protocol/Internet Protocol, is a set of data transfer protocols used to connect network devices on the internet. With an IP address in hand, your browser attempts to open a TCP connection with the requested server so the two can exchange information.

TCP Handshake overview
TCP Handshake video overview
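
As a rough illustration of opening a TCP connection, the Python sketch below starts a tiny echo server on the local machine and connects to it, so it runs without internet access. The loopback address, the echo behaviour and the name `run_echo_server` are assumptions made for this example. A browser would connect to the IP address returned by DNS instead. The operating system performs the TCP handshake inside `socket.create_connection`.

```python
import socket
import threading


def run_echo_server(server_socket):
    """Accept one connection and send back whatever arrives."""
    connection, _address = server_socket.accept()
    with connection:
        connection.sendall(connection.recv(1024))


# Listen on the loopback interface; port 0 lets the OS pick a free port.
server_socket = socket.create_server(("127.0.0.1", 0))
host, port = server_socket.getsockname()
server_thread = threading.Thread(target=run_echo_server, args=(server_socket,))
server_thread.start()

# create_connection() returns once the TCP handshake has completed.
with socket.create_connection((host, port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

server_thread.join()
server_socket.close()
print(reply)
```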

Connection Denied.

Firewalls might be present to restrict incoming and/or outgoing traffic and protect against unwanted connections. Depending on the firewall configuration, the IP address is checked against a list before the connection is allowed. If the IP address is on the firewall’s block list, the connection is blocked and no data is sent over the network.
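
The block list check can be sketched in a few lines of Python. The networks below are hypothetical and use address ranges reserved for documentation. Real firewalls also match on ports, protocols and direction, which this sketch leaves out.

```python
import ipaddress

# Hypothetical block list using documentation-only address ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_allowed(ip_text):
    """Return True when the address is not covered by the block list."""
    address = ipaddress.ip_address(ip_text)
    return not any(address in network for network in BLOCKED_NETWORKS)


print(is_allowed("203.0.113.45"))  # inside a blocked range
print(is_allowed("192.0.2.1"))     # not on the list
```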

Load-Balancers

Once the connection is established, both computers can send data back and forth using whichever protocol they agree on. But what happens when many requests arrive at the same server and the server can’t handle the load? This is where a load balancer comes in: it spreads incoming requests across several servers or application instances.

Simple Web/Application Server and Database Set Up

If you take a look at the picture below, our server has two instances of the web application running. Depending on the load balancing algorithm, each request is sent to one of the instances to retrieve information. Once the request reaches an instance, the web/application server sends the data back to the user’s browser, handling the request according to the resource type (dynamic or static).
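
The article does not name a specific load balancing algorithm, so the sketch below assumes round robin, one of the simplest: requests are handed to each instance in turn. The instance names and the `RoundRobinBalancer` class are made up for illustration.

```python
from itertools import cycle

# Hypothetical names for the two application instances.
INSTANCES = ["app-instance-1", "app-instance-2"]


class RoundRobinBalancer:
    """Hand each incoming request to the next instance in turn."""

    def __init__(self, instances):
        self._instances = cycle(instances)

    def route(self, request_path):
        # Returns the chosen instance together with the request it will serve.
        return next(self._instances), request_path


balancer = RoundRobinBalancer(INSTANCES)
assignments = [
    balancer.route(path)[0] for path in ["/", "/gmail", "/inbox", "/settings"]
]
print(assignments)
```

With two instances and four requests, the instances alternate, so each one serves two requests. Other algorithms, such as picking the instance with the fewest active connections, change only the body of `route`.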
