HTTP Explained
The Hypertext Transfer Protocol (HTTP) is an application-layer protocol used for fetching resources. It belongs to the internet protocol suite, alongside other protocols such as DNS, FTP, TLS/SSL, and POP. It is the foundation of resource and data exchange on the web, and HTTP sessions are normally initiated by the recipient of the resources, that is, the client.
A web page is typically made up of several resources, including text, images, videos, and more. Clients and servers communicate by exchanging individual messages rather than a continuous stream of data. Depending on the version of HTTP, the underlying connection may persist between messages.
The original HTTP specifications were written in the early 1990s and were intended to be scalable and extensible. Over time, the protocol has evolved through several iterations, and many specifications now extend the original.
Components
HTTP involves several components: the client, the server, and intermediaries such as proxies. Clients initiate requests, servers answer them, and intermediaries relay messages between the two.
Client
The client, also called the user agent, is any tool that makes requests on behalf of the user. It can identify itself to the server through the User-Agent request header. The client is typically a web browser, although other applications interact with resources using HTTP as well. An example might be a content management system that accesses web-based resources through an API.
When a browser displays a web page, it sends requests to fetch several resources from the server. This begins with an HTML document, which the client parses to determine what additional resources need to be fetched, what scripts need to be run, and how the page should be laid out. Once the initial HTML page is presented, user input or script execution can cause the browser to fetch additional resources and update the content being displayed.
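The parsing step described above can be sketched with Python's standard html.parser module. This is only an illustration of the idea, not how a real browser works: the class name ResourceFinder, the helper find_resources, the sample page, and the short list of tags it inspects are all assumptions made for the example.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs of sub-resources referenced by an HTML document."""

    # Tag -> attribute that carries the resource URL.
    # A deliberately small, illustrative subset of what a browser inspects.
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def find_resources(html):
    """Return the resource URLs found in the document, in document order."""
    finder = ResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.resources


# Hypothetical page used only for demonstration.
page = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body><img src="/logo.png"></body></html>"""

print(find_resources(page))
```

Each URL collected this way would become a further HTTP request from the client to the server.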
Server
The web server is the entity that serves the documents, or resources, requested by the client. To the client, the server appears as a single machine, but in reality it can be a network of servers. Multiple machines are used to distribute the load and maintain responsiveness for many clients.
It is also important to recognize that several sites can be hosted on a single machine or the same network. Different websites can even share the same IP address, as they are routed by the web server according to the Host request header.
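As a rough sketch of routing by Host header, the function below picks a site from a table keyed by host name. The SITES table, the example host names, and the select_site helper are hypothetical; a production web server does this through its own configuration and handles more cases, such as IPv6 literals, which this sketch ignores.

```python
# Hypothetical sites that share one machine and one IP address.
SITES = {
    "blog.example": "Welcome to the blog",
    "shop.example": "Welcome to the shop",
}


def select_site(headers):
    """Choose which site should answer, based on the Host request header."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    host = normalised.get("host", "")
    # Drop an optional ":port" suffix; host names are case-insensitive too.
    hostname = host.split(":", 1)[0].lower()
    return SITES.get(hostname)  # None when no site matches


print(select_site({"Host": "blog.example"}))
print(select_site({"Host": "shop.example:8080"}))
```

Both requests could arrive at the same IP address; only the Host value tells the server which site was meant.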
Intermediaries
Many devices can sit on the path between the client and the server. Most of them operate at the lower transport, network, or even physical layers of the network. Intermediaries that operate at the application layer are often referred to as proxy servers.
A proxy server, like devices operating at other layers of the network stack, can operate transparently or non-transparently. A transparent proxy relays messages in their original form, whereas a non-transparent proxy might alter the content before passing the message to the next device. Proxy servers can perform caching, load balancing, filtering, logging, and authentication.
Control and Flow
HTTP sessions are typically carried over a TCP connection (HTTP/3 instead runs over QUIC, which uses UDP). The client initiates the connection, and the server, perhaps through several intermediaries, accepts and uses it. Clients can create one or more new connections, as well as reuse existing ones, for transmitting and receiving messages.
Depending on the version of HTTP and the options in use, connections can remain open between messages. In the oldest versions, for example HTTP/1.0, a separate connection was by default opened to retrieve each resource. The overhead of opening and closing connections many times for a single web page, such as one that is multimedia heavy, contributed to slow load times. HTTP/1.1 made persistent connections the default, and newer versions, notably HTTP/2 and HTTP/3, go further by carrying several requests concurrently over one open connection, thereby speeding up load times.
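Connection reuse can be observed with Python's standard http.client and http.server modules. The sketch below starts a throwaway local server and fetches several paths over one client connection; the names Handler and fetch_over_one_connection are made up for the example, and it shows HTTP/1.1 keep-alive only, since the standard library does not speak HTTP/2 or HTTP/3.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # Answering as HTTP/1.1 lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A persistent connection needs a length so the client knows
        # where one response ends and the next begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_over_one_connection(paths):
    """Fetch every path over a single TCP connection to a local test server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        results = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read().decode()  # read fully before reusing the connection
            # The client-side port identifies the underlying TCP connection.
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
        conn.close()
        return results
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body, port in fetch_over_one_connection(["/index.html", "/logo.png"]):
        print(status, body, "via local port", port)
```

If the connection is reused, every response reports the same local port, which means no new TCP connection was opened between requests.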
Types of Messages
The two types of HTTP messages are requests and responses. A request includes a keyword, or method, that specifies the operation to be performed. A common method is GET, which retrieves a specific resource. A request also includes the path of the resource, the version of HTTP being used, optional headers that supply the server with additional information, and, for some methods, a message body for transmitting content. A body is used by methods such as POST, which submits content to the server, for example to be stored for consumption at a later time or by another client.
Responses are sent by servers as answers to requests. They include some of the same information as a request, such as the HTTP version, but also contain a status code that indicates whether the request succeeded or failed, along with a short reason.
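To make the parts of a request concrete, the following sketch serialises one in the text format used by HTTP/1.1. The build_request helper, the host name, and the User-Agent value are assumptions for illustration; real clients add more headers and perform validation that is omitted here.

```python
def build_request(method, path, headers, body=b""):
    """Serialise an HTTP/1.1 request into the bytes sent over the connection."""
    # Request line: method, path, and protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    headers = dict(headers)  # copy so the caller's dict is not modified
    if body:
        # Tell the server how many bytes of body follow the headers.
        headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the (optional) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


get = build_request(
    "GET", "/index.html",
    {"Host": "www.example.com", "User-Agent": "sketch/0.1"},
)
post = build_request(
    "POST", "/comments",
    {"Host": "www.example.com", "Content-Type": "text/plain"},
    body=b"Nice article",
)
print(get.decode())
print(post.decode())
```

The GET request ends at the blank line, while the POST request carries its content after it.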
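A response in the HTTP/1.x text format can be taken apart in a few lines. The parse_response helper and the sample message below are hypothetical, and the sketch assumes the whole response is already in memory and is not chunked or compressed.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into version, status, reason, headers, body."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, numeric status code, optional reason phrase.
    parts = status_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, status, reason, headers, body


# Hypothetical response used only for demonstration.
raw = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not here."
)
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)
print(headers)
print(body)
```

The first digit of the status code gives the broad outcome: 2xx codes report success, 4xx codes report a problem with the request, and 5xx codes report a failure on the server.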
Takeaway
The Hypertext Transfer Protocol (HTTP) is the primary means for requesting and receiving web-based resources. It is easy to implement, use, and scale. It continues to evolve to meet the ever-increasing needs of clients and can take advantage of the latest hardware and software advances. The client-server architecture allows for flexible network topology, and running HTTP over TLS (HTTPS) helps to maintain security and alleviate privacy concerns.