The HyperText Transfer Protocol (HTTP), the Web’s application-layer protocol, is at the heart of the Web. It is defined in [RFC 1945] and [RFC 2616]. HTTP is implemented in two programs: a client program and a server program. The client program and server program, executing on different end systems, talk to each other by exchanging HTTP messages. HTTP defines the structure of these messages and how the client and server exchange the messages.
Before explaining HTTP in detail, we should review some Web terminology. A Web page (also called a document) consists of objects. An object is simply a file (such as an HTML file, a JPEG image, a Java applet, or a video clip) that is addressable by a single URL. Most Web pages consist of a base HTML file and several referenced objects. For example, if a Web page contains HTML text and five JPEG images, then the Web page has six objects: the base HTML file plus the five images. The base HTML file references the other objects in the page with the objects’ URLs. Each URL has two components: the hostname of the server that houses the object and the object’s path name. For example, the URL
http://www.Someschool.edu/somedepartment/picture.gif
has www.Someschool.edu for a hostname and /somedepartment/picture.gif for a path name. Because Web browsers (such as Internet Explorer and Firefox) implement the client side of HTTP, in the context of the Web, we will use the words browser and client interchangeably. Web servers, which implement the server side of HTTP, house Web objects, each addressable by a URL. Popular Web servers include Apache and Microsoft Internet Information Server.
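To make the object and URL terminology concrete, here is a minimal Python sketch using only the standard library. It assumes a hypothetical base HTML file (`SAMPLE_PAGE`) whose only referenced objects are `<img>` tags; real pages also reference objects through other tags, which this sketch ignores. The names `ImageRefCollector`, `page_objects`, and `url_components` are illustrative helpers, not part of HTTP or any specification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageRefCollector(HTMLParser):
    """Collects the URLs that <img> tags in a base HTML file reference."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve a relative reference against the base file's own URL.
                self.refs.append(urljoin(self.base_url, value))


def page_objects(base_url, html_text):
    """Return the base HTML file's URL followed by the URLs of its images."""
    collector = ImageRefCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return [base_url] + collector.refs


def url_components(url):
    """Split a URL into (hostname, path name).

    urlsplit().hostname is returned in lowercase, since hostnames are
    case-insensitive; the path keeps its original case.
    """
    parts = urlsplit(url)
    return parts.hostname, parts.path


# Hypothetical base HTML file: some text plus five JPEG images.
SAMPLE_PAGE = """<html><body>
<p>Welcome to the department.</p>
<img src="/img/a.jpg"> <img src="/img/b.jpg"> <img src="/img/c.jpg">
<img src="/img/d.jpg"> <img src="/img/e.jpg">
</body></html>"""
```

Calling `page_objects("http://www.Someschool.edu/somedepartment/index.html", SAMPLE_PAGE)` (the base URL here is hypothetical) yields six URLs, the base file plus five images, and `url_components` on the example URL separates the hostname `www.someschool.edu` from the path name `/somedepartment/picture.gif`.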
HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients. We discuss the interaction between client and server in detail later, but the general idea is illustrated in Figure 2.6. When a user requests a Web page (for example, clicks on a hyperlink), the browser sends HTTP request messages for the objects in the page to the server. The server receives the requests and responds with HTTP response messages that contain the objects.
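As a rough illustration of what "request message" and "response message" mean, the Python sketch below builds the text of a minimal HTTP/1.1 GET request for one object and splits the first line of a response. The exact header set (`Host`, `Connection: close`) is an assumption chosen for brevity; the full message formats are discussed later, and the helper names are hypothetical.

```python
def build_get_request(host, path, version="HTTP/1.1"):
    """Build the text of a minimal HTTP GET request message for one object."""
    lines = [
        f"GET {path} {version}",  # request line: method, path name, version
        f"Host: {host}",          # header line naming the server's hostname
        "Connection: close",      # ask the server to close the TCP connection afterward
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response_text):
    """Split a response's first line into (version, status code, reason phrase)."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

A browser fetching a page with six objects would, in this simplified view, produce one such request per object, each naming that object's path name.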
HTTP uses TCP as its underlying transport protocol (rather than running on top of UDP). The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces. On the client side, the socket interface is the door between the client process and the TCP connection; on the server side, it is the door between the server process and the TCP connection. The client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface. Similarly, the HTTP server receives request messages from its socket interface and sends response messages into its socket interface. Once the client sends a message into its socket interface, the message is out of the client’s hands and is “in the hands” of TCP. Recall from Section 2.1 that TCP provides a reliable data transfer service to HTTP. This implies that each HTTP request message sent by a client process eventually arrives intact at the server; similarly, each HTTP response message sent by the server process eventually arrives intact at the client. Here we see one of the great advantages of a layered architecture: HTTP need not worry about lost data or the details of how TCP recovers from loss or reordering of data within the network. That is the job of TCP and the protocols in the lower layers of the protocol stack.
It is important to note that the server sends requested files to clients without storing any state information about the clients. If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol.
We also remark that the Web uses the client-server application architecture, as described in Section 2.1. A Web server is always on, with a fixed IP address, and it services requests from potentially millions of different browsers.
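The following Python sketch, using only the standard `socket` and `threading` modules on the loopback interface, illustrates two points from the paragraphs above: the client and server exchange HTTP messages through their socket interfaces over a TCP connection, and the server is stateless, answering each request from the request alone. The object path, its placeholder bytes, and the one-request-per-connection (`Connection: close`) design are simplifying assumptions; this is a toy, not a real Web server.

```python
import socket
import threading

# Hypothetical object the server houses, keyed by path name.
OBJECTS = {"/somedepartment/picture.gif": b"GIF89a (placeholder image bytes)"}


def handle_request(request_text):
    """Build a response from the request alone; no per-client state is consulted."""
    request_line = request_text.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ", 2)
    body = OBJECTS.get(path)
    if method != "GET" or body is None:
        return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
    header = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\nConnection: close\r\n\r\n"
    return header.encode("ascii") + body


def recv_until(sock, marker):
    """Read from a socket until `marker` has been received or the peer closes."""
    data = b""
    while marker not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data


def serve(listener, count):
    """Server side: accept `count` TCP connections, answering one request on each."""
    for _ in range(count):
        conn, _addr = listener.accept()
        with conn:  # closing the connection tells the client the response is complete
            request = recv_until(conn, b"\r\n\r\n")
            conn.sendall(handle_request(request.decode("ascii")))


def fetch(port, path):
    """Client side: open a TCP connection, send a request, read until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def demo():
    """Request the same object twice; the stateless server sends it in full both times."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 2), daemon=True)
    server.start()
    first = fetch(port, "/somedepartment/picture.gif")
    second = fetch(port, "/somedepartment/picture.gif")
    server.join(timeout=5)
    listener.close()
    return first, second
```

Because `handle_request` depends only on its argument, the second request for the same object receives exactly the same full response as the first; the server never records that it has already served this client.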