Application layer

Warning

This is an unpolished draft of the second edition of this ebook. If you find any error or have suggestions to improve the text, please create an issue via https://github.com/obonaventure/cnp3/issues?milestone=5

The DNS

The Domain Name System (DNS) plays a key role in the Internet today as it allows applications to use fully qualified domain names (FQDN) instead of IPv4 or IPv6 addresses. Many tools enable queries through DNS servers. For this exercise, we will use dig which is installed on most Unix systems.

A typical usage of dig is as follows:

dig @server -t type fqdn

where

server is the IP address or the name of a DNS server or resolver

type is the type of DNS record that is requested by the query such as NS for a nameserver, A for an IPv4 address, AAAA for an IPv6 address, MX for a mail relay, ...

fqdn is the fully qualified domain name being queried

dig also accepts additional parameters and flags that are described in the manpage. Among these, the +trace flag traces all the requests that are sent when recursing through DNS servers.
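The general form of the command can also be assembled from a script. The following Python sketch builds the argument list for dig and can optionally run it. The helper names dig_command and run_dig, as well as the placeholder resolver address 192.0.2.53, are illustrative assumptions and not part of dig itself.

```python
import subprocess


def dig_command(fqdn, rtype="A", server=None, trace=False):
    """Build the argument list for: dig [@server] -t type fqdn [+trace]."""
    args = ["dig"]
    if server is not None:
        args.append("@" + server)   # resolver or server to query
    args += ["-t", rtype, fqdn]
    if trace:
        args.append("+trace")       # trace the requests sent while recursing
    return args


def run_dig(args, timeout=10):
    """Run dig and return its textual output (needs dig and network access)."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.stdout


if __name__ == "__main__":
    # 192.0.2.53 is a documentation address used here as a placeholder.
    print(" ".join(dig_command("www.example.org", "AAAA", server="192.0.2.53")))
```

Calling run_dig(dig_command("www.example.org", "AAAA")) would send the query through the default resolver, provided that dig is installed on the host.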

What are the IP addresses of the resolvers that the dig implementation you are using relies on [1] ?
What is the IPv6 address that corresponds to inl.info.ucl.ac.be ? Which type of DNS query does dig send to obtain this information ?
Which type of DNS request do you need to send to obtain the nameservers that are responsible for a given domain ?
What are the nameservers that are responsible for the be top-level domain ? Is it possible to use IPv6 to query them ?
When run without any parameter, dig sends a query for the NS records of the root zone and retrieves the list of the names of all root DNS servers. For technical reasons, there are only 13 different root DNS servers [2]. This information is also available as a text file from http://www.internic.net/zones/named.root . What are the IPv6 addresses of all these servers ?
Assume now that you are residing in a network where there is no DNS resolver and that you need to perform your query manually, starting from the DNS root.
- Use dig to send a query to one of these root servers to find the IPv6 address of the DNS server(s) (NS record) responsible for the org top-level domain
- Use dig to send a query to one of these DNS servers to find the IP address of the DNS server(s) (NS record) responsible for root-servers.org
- Continue until you find the server responsible for www.root-servers.org
- What is the lifetime associated with this IPv6 address ?
Perform the same analysis for a popular website such as www.google.com. What is the lifetime associated with the corresponding IPv6 address ? If you perform the same request several times, do you always receive the same answer ? Can you explain why a lifetime is associated with the DNS replies ?
Use dig to find the mail relays used by the uclouvain.be and student.uclouvain.be domains. What is the TTL of these records ? Can you explain the preferences used by the MX records ? You can find more information about the MX records in RFC 5321.
When dig is run, the header section in its output indicates the id, i.e. the DNS identifier used to send the query. Does your implementation of dig generate random identifiers ?

dig -t MX gmail.com

; <<>> DiG 9.4.3-P3 <<>> -t MX gmail.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25718

A DNS implementation such as dig and, more importantly, a name resolver such as bind or unbound always checks that the received DNS reply contains the same identifier as the DNS request that it sent. Why is this so important ?
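To make the role of the identifier concrete, here is a minimal Python sketch that builds a DNS query with a random 16-bit identifier and checks the identifier of a reply. It is not the code used by dig or by any resolver. The function names are hypothetical, and a real resolver verifies more than the identifier before accepting a reply.

```python
import secrets
import struct


def encode_name(fqdn):
    """Encode a domain name as a sequence of length-prefixed labels."""
    out = b""
    for label in fqdn.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"            # the empty root label ends the name


def build_query(fqdn, qtype, query_id=None):
    """Return (identifier, message) for a standard recursive query."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)   # unpredictable 16-bit identifier
    # Header: id, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(fqdn) + struct.pack("!HH", qtype, 1)  # class IN
    return query_id, header + question


def reply_matches(query_id, reply):
    """Accept a reply only if its identifier equals the one that was sent."""
    if len(reply) < 12:             # shorter than a DNS header
        return False
    (reply_id,) = struct.unpack("!H", reply[:2])
    return reply_id == query_id
```

For instance, build_query("gmail.com", 15) prepares an MX query (type 15) whose first two bytes carry the identifier that reply_matches later compares against.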

Imagine an attacker who is able to send forged DNS replies to, for example, associate www.bigbank.com to his own IP address. How could he attack a DNS implementation that:

sends DNS requests containing always the same identifier

sends DNS requests containing identifiers that are incremented by one after each request

sends DNS requests containing random identifiers
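The three cases above can be compared with a little arithmetic. The Python sketch below assumes that the attacker cannot observe the query itself and that each forged reply carries a distinct identifier. Both function names are hypothetical.

```python
def spoof_success_probability(forged_replies, id_bits=16):
    """Chance that one of the distinct guesses hits a uniformly random id."""
    id_space = 1 << id_bits
    return min(forged_replies, id_space) / id_space


def next_sequential_id(observed_id, id_bits=16):
    """Identifier predicted after observing one id that is incremented by one."""
    return (observed_id + 1) % (1 << id_bits)
```

With a constant or sequential identifier, a single observed request is enough to predict the next one, whereas a random identifier forces the attacker to guess among 65536 values.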

The DNS protocol can run over UDP and over TCP. Most DNS servers prefer to use UDP because it consumes fewer resources on the server. However, TCP is useful when a large answer is expected. Use time dig +tcp to query a root DNS server. Is it faster to receive an answer via TCP or via UDP ?
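One visible difference between the two transports is framing. Over TCP, each DNS message is preceded by a two-byte length field (RFC 1035, section 4.2.2), while over UDP the message is carried directly in the datagram. The following Python sketch illustrates this framing. The function names are assumptions made for this example.

```python
import struct


def frame_for_tcp(message):
    """Prefix a DNS message with its length as a 16-bit big-endian integer."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for the 2-byte length field")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(data):
    """Split one framed message off the front of data; return (message, rest)."""
    if len(data) < 2:
        return None, data               # length prefix not complete yet
    (length,) = struct.unpack("!H", data[:2])
    if len(data) < 2 + length:
        return None, data               # message not complete yet
    return data[2:2 + length], data[2 + length:]
```

Since TCP delivers a byte stream, unframe_from_tcp returns None as the message until enough bytes have been received.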

Internet email protocols

Many Internet protocols are ASCII-based protocols where the client sends requests as one line of ASCII text terminated by CRLF and the server replies with one or more lines of ASCII text. Using such ASCII messages has several advantages compared to protocols that rely on binary encoded messages:

the messages exchanged by the client and the server can be easily understood by a developer or network engineer by simply reading the messages

it is often easy to write a small prototype that implements a part of the protocol

it is possible to test a server manually by using telnet

Telnet is a protocol that allows a user to obtain a terminal on a remote server. For this, telnet opens a TCP connection with the remote server on port 23. However, most telnet implementations allow the user to specify an alternate port as telnet host port. When used with a port number as parameter, telnet opens a TCP connection to the remote host on the specified port. telnet can thus be used to test any server using an ASCII-based protocol on top of TCP. Note that if you need to stop a running telnet session, Ctrl-C will not work as it will be sent by telnet to the remote host over the TCP connection. On many telnet implementations you can type Ctrl-] to freeze the TCP connection and return to the telnet interface.
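What telnet does when given a host and a port can be approximated in a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and reading one byte at a time is chosen for simplicity rather than efficiency.

```python
import socket


def open_connection(host, port, timeout=10):
    """Rough equivalent of telnet host port: open a TCP connection."""
    return socket.create_connection((host, port), timeout=timeout)


def send_line(sock, text):
    """Send one line of ASCII text terminated by CRLF."""
    sock.sendall(text.encode("ascii") + b"\r\n")


def read_line(sock, max_length=4096):
    """Read bytes until CRLF and return the line without its terminator."""
    buffer = b""
    while not buffer.endswith(b"\r\n"):
        chunk = sock.recv(1)            # byte by byte: simple, not efficient
        if not chunk:                   # connection closed by the peer
            break
        buffer += chunk
        if len(buffer) > max_length:
            raise ValueError("line too long")
    return buffer.rstrip(b"\r\n").decode("ascii", errors="replace")
```

A manual test of an ASCII-based server then consists of alternating calls to send_line and read_line on the socket returned by open_connection.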

Use your preferred email tool to send an email message to yourself containing a single line of text. Most email tools have the ability to show the source of the message. Use this function to look at the message that you sent and the message that you received. Can you find an explanation for all the lines that have been added to your single line email ?

The TCP protocol supports 65536 different port numbers. Many of these port numbers have been reserved for some applications. The official repository of the reserved port numbers is maintained by the Internet Assigned Numbers Authority (IANA) on http://www.iana.org/assignments/port-numbers [3]. Using this information, what is the default port number for the POP3 protocol ? Does it run on top of UDP or TCP ?
The Post Office Protocol (POP) is a rather simple protocol described in RFC 1939. POP operates in three phases. The first phase is the authorization phase where the client provides a username and a password. The second phase is the transaction phase where the client can retrieve emails. The last phase is the update phase where the client finalises the transaction. What are the main POP commands and their parameters ? When a POP server returns an answer, how can you easily determine whether the answer is positive or negative ?
On smartphones, users often want to avoid downloading large emails over a slow wireless connection. How could a POP client only download emails that are smaller than 5 KBytes ?
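As a hint for the two POP questions above, the Python sketch below checks the status indicator of a reply and selects small messages from the multi-line reply to a LIST command, as described in RFC 1939. The function names are hypothetical, and 5 KBytes is assumed to mean 5 * 1024 octets.

```python
def is_positive(response):
    """A POP3 reply is positive if it starts with +OK, negative if -ERR."""
    return response.startswith("+OK")


def small_messages(list_lines, max_octets=5 * 1024):
    """Return the numbers of the messages whose size is below max_octets.

    list_lines holds the lines of a multi-line LIST reply: a status line,
    one "number size" line per message, and a final line holding only ".".
    """
    if not list_lines or not is_positive(list_lines[0]):
        return []
    selected = []
    for line in list_lines[1:]:
        if line == ".":                 # end of the multi-line reply
            break
        number, size = line.split()[:2]
        if int(size) < max_octets:
            selected.append(int(number))
    return selected
```

A client built on this idea would send RETR only for the message numbers returned by small_messages.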

The HyperText Transfer Protocol

System administrators who are responsible for web servers often want to monitor these servers and check that they are running correctly. As an HTTP server uses TCP on port 80, the simplest solution is to open a TCP connection on port 80 and check that the TCP connection is accepted by the remote host. However, as HTTP is an ASCII-based protocol, it is also very easy to write a small script that downloads a web page on the server and compares its content with the expected one. Use telnet to verify that a web server is running on host cnp3book.info.ucl.ac.be [4]
Instead of using telnet on port 80, it is also possible to use a command-line tool such as curl. Use curl with the --trace-ascii tracefile option to store in tracefile all the information exchanged by curl when accessing the server.
- what is the version of HTTP used by curl ?
- can you explain the different headers placed by curl in the request ?
- can you explain the different headers found in the response ?
HTTP 1.1, specified in RFC 2616, forces the client to use the Host: header in all its requests. HTTP 1.0 does not define the Host: header, but most implementations support it. By using telnet and curl, retrieve the first page of the http://cnp3book.info.ucl.ac.be webserver by sending HTTP requests with and without the Host: header. Explain the difference between the two.
The headers sent in an HTTP request allow the client to provide additional information to the server. One of these headers is the Accept-Language header that allows the client to indicate its preferred language [7]. For example, curl -HAccept-Language:en http://www.google.be will send to http://www.google.be an HTTP request indicating English (en) as the preferred language. Does Google provide a different page in French (fr) and Walloon (wa) ? Same question for http://www.uclouvain.be (given the size of the homepage, use diff to compare the different pages retrieved from www.uclouvain.be).
Compare the size of the http://www.yahoo.com and http://www.google.com web pages by downloading them with curl.
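The requests used in these exercises are plain ASCII lines, so they are easy to build by hand or from a script. The Python sketch below assembles a GET request with or without the Host: header and with optional extra headers such as Accept-Language. The function build_request is a hypothetical helper and not part of curl or telnet.

```python
def build_request(path="/", version="1.0", host=None, headers=None):
    """Build the bytes of a GET request; each line ends with CRLF."""
    lines = ["GET {} HTTP/{}".format(path, version)]
    if host is not None:
        lines.append("Host: {}".format(host))
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # An empty line marks the end of the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_request(version="1.1", host="cnp3book.info.ucl.ac.be",
                        headers={"Accept-Language": "fr"}).decode("ascii"))
```

The resulting bytes can be typed into a telnet session or sent over a TCP connection to port 80 of the server.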

The ipvfoo extension on Google Chrome makes it possible to visually detect whether a website is using IPv6 or IPv4, and also to see which web sites have been contacted when rendering a given webpage. Some websites are distributed over several dozens of different servers. Can you find one ?

Footnotes

[1]	On a Linux machine, the Description section of the dig manpage tells you where dig finds the list of nameservers to query.

[2]	You may obtain additional information about the root DNS servers from http://www.root-servers.org

[3]	On Unix hosts, a subset of the port assignments is often placed in /etc/services

[4]	The minimum command sent to a HTTP server is GET / HTTP/1.0 followed by CRLF and a blank line

[5]	See section 5 of RFC 1945

[6]	See section 6.1 of RFC 1945

[7]	The list of available language tags can be found at http://www.iana.org/assignments/language-subtag-registry . Versions in other formats are available at http://www.langtag.net/registries.html . Additional information about the support of multiple languages in Internet protocols may be found in RFC 5646.