9.2 The LWP Bundle: Writing Sophisticated Web Clients

The LWP::Simple module discussed in the previous section provides a simple functional interface for basic Web-client tasks. However, to write sophisticated Web clients, we need more muscle. The set of modules that constitute the bundle called LWP provides the additional facilities.

To write powerful Web clients, we need to be familiar with the Hypertext Transfer Protocol (HTTP). It is the language that Web clients and servers use to communicate with each other. A request from a client to a server and the subsequent response from the server to the client constitute an HTTP transaction. A client request and a server response share the same general layout: a request line or a status line, followed by a header section, a blank line, and then an optional body.

A client always initiates a transaction by using the following steps.

1.  The client contacts the server at the port number used by the Web server. The default port number is 80.

Once the connection has been established, the client sends a request to the server. The request is framed in terms of an HTTP command, usually called a method. The method is given arguments such as the address of the document to be fetched, and an HTTP version number.

The number of HTTP methods is limited; they are discussed in books on networking and on the site of the World Wide Web Consortium, www.w3.org. The commonly used HTTP methods are GET, HEAD, POST and PUT. The GET method is used by a client to ask a server to send the document found at a specific location. It is also used to submit forms whose method attribute is GET. The HEAD method is similar to GET, but it requests only the header information about a file or resource, and not the actual document. The POST method allows data to be sent to the server in a client request. For example, it can be used to provide data to a server for a newsgroup posting, or for a database operation. Forms frequently use the POST method instead of the GET method. The PUT method is normally used to publish a document on a Web site.

2.  Next, the client may send some header information to the server. The header information tells the server about the client’s configuration and the document types that the client can accept. It is a sequence of lines, each containing a header name, a colon, and a header value, and it ends with a blank line. Most headers are optional, although HTTP/1.1 requires a Host header.

3.  Finally, the client may optionally send additional data. This additional data is usually meant for POST forms. It may also contain the content of a file to be published by the server.
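The three client steps above can be sketched with the HTTP::Request class, which is introduced later in this section. This is only an illustration under stated assumptions: the URL, the form fields, and the header values are invented placeholders, and nothing is actually sent over the network. The sketch builds the request in memory and prints it.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical target; www.example.com is a placeholder, not a real form handler.
my $url = 'http://www.example.com/cgi-bin/post-comment';

# Step 1: the method, the address of the resource, and the HTTP version.
my $request = HTTP::Request->new(POST => $url);
$request->protocol('HTTP/1.1');

# Step 2: header lines, each a header name paired with a value.
$request->header('Accept'       => 'text/html');
$request->header('Content-Type' => 'application/x-www-form-urlencoded');

# Step 3: optional data, here a small URL-encoded form submission (made up).
$request->content('name=alice&comment=hello');
$request->header('Content-Length' => length($request->content));

# Print the request line, the headers, a blank line, and the body.
# This is roughly, not byte for byte, what would travel to the server.
print $request->as_string;
```

Changing POST to GET, HEAD, or PUT in the constructor yields the other commonly used methods. A GET or HEAD request would normally carry no body.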

The server receives the client’s request at the specified port, performs the requested action, and then responds in the following manner.

1.  The first line is the status line containing the HTTP version, the status code, and a description of the status such as the word OK.

2.  The next several lines are header information sent by the server to the client. The header information usually contains information about the server itself and information about the requested document. The header lines are terminated by a blank line.

3.  If the server is successful in fulfilling the client’s request, the requested data is sent next.
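The three parts of a server reply can be picked apart with the HTTP::Response class, which is introduced later in this section. The sketch below is a minimal illustration: the reply text, including the server name ExampleServer/1.0, is written out by hand and does not come from any real server.

```perl
use strict;
use warnings;
use HTTP::Response;

# A made-up server reply: status line, header lines, blank line, body.
my $raw = join "\n",
    'HTTP/1.1 200 OK',
    'Server: ExampleServer/1.0',
    'Content-Type: text/html',
    '',
    '<html><body>Hello</body></html>';

# Turn the raw text into an HTTP::Response object.
my $response = HTTP::Response->parse($raw);

# Part 1: the status line.
print "Version: ", $response->protocol, "\n";
print "Code:    ", $response->code,     "\n";
print "Message: ", $response->message,  "\n";

# Part 2: individual header values.
print "Server:  ", $response->header('Server'),       "\n";
print "Type:    ", $response->header('Content-Type'), "\n";

# Part 3: the requested data, present when the request succeeded.
print "Body:    ", $response->content, "\n" if $response->is_success;
```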

When a client sends an HTTP request to a server, the server responds if it is alive and able to handle the request. As far as client programming goes, we do not have to worry about what the server does and how. The client, however, needs to capture the response that comes back from the server and deal with it. If the client requests only the header information, as with the HEAD method, the headers come back and Perl captures them. If a file is requested, Perl gets the headers back as well as the contents of the file.

Perl provides two object-oriented modules, HTTP::Request and HTTP::Response, to model an HTTP interaction at the client’s end. These are used to create a request to send and to capture the response that comes back. A program creates an HTTP::Request object and sends it. The response is captured automatically as an HTTP::Response object. To send out an HTTP::Request and to receive the response, Perl provides a class called LWP::UserAgent. The user agent is a software entity that carries out Web-client activities on behalf of a user or an application. It is a conduit between an application and a Web server. In simple and informal terms, it is an incarnation of the Web client. It takes an HTTP request, say to fetch a Web page, sends it, and then waits for the response to arrive from the server. Once the response is received, the user agent makes it available to the rest of the program.

In the simplest case, an application creates an LWP::UserAgent object and an HTTP::Request object for the request that needs to be performed. The request is then passed to the request method of the UserAgent object. The request method opens a communication channel with the server and sends the request out. When the response comes back, Perl captures it in the form of an HTTP::Response object. Figure 9.19 describes the process diagrammatically.

 

Figure 9.19:  The LWP::UserAgent object on the client sends an HTTP::Request object to the server. The UserAgent object receives the reply as an HTTP::Response object.
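The flow in the figure might look as follows in code. This is a minimal sketch, not a definitive client: fetch_page is a hypothetical helper name, the agent string SketchClient/0.1 and the ten-second timeout are arbitrary choices, and the URL is taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Send a GET request for $url through $ua and return the HTTP::Response.
# fetch_page is a hypothetical helper used only in this sketch.
sub fetch_page {
    my ($ua, $url) = @_;
    my $request = HTTP::Request->new(GET => $url);
    return $ua->request($request);
}

# Run only when a URL is given on the command line, for example:
#   perl fetch.pl http://www.example.com/
if (@ARGV) {
    my $ua = LWP::UserAgent->new;
    $ua->agent('SketchClient/0.1');    # assumed identification string
    $ua->timeout(10);                  # give up after ten seconds

    my $response = fetch_page($ua, $ARGV[0]);
    if ($response->is_success) {
        print $response->content;
    }
    else {
        print STDERR "Request failed: ", $response->status_line, "\n";
    }
}
```

The request method returns an HTTP::Response object even when the request fails, so the program inspects is_success or status_line to find out what happened.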

The request() method of the UserAgent object can deliver the response content in one of three ways: as a scalar available for direct manipulation, saved in a file, or handed over to a callback routine for processing. The first alternative is useful if the returned content is small and needs additional processing such as parsing to cull out relevant data. The second alternative is suitable for large objects such as a graphic or audio file. The last alternative can be used to process data in chunks as it comes in.
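The three alternatives can be sketched as three small helpers. The helper names, the output file name page.out, and the 4096-byte chunk size are assumptions made for this illustration. The callback here only counts bytes, whereas a real program would process each chunk.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# 1. Keep the whole body in memory and return it as a scalar.
sub fetch_to_scalar {
    my ($ua, $url) = @_;
    my $response = $ua->request(HTTP::Request->new(GET => $url));
    return $response->is_success ? $response->content : undef;
}

# 2. Have the user agent write the body directly into the file $file.
sub fetch_to_file {
    my ($ua, $url, $file) = @_;
    my $response = $ua->request(HTTP::Request->new(GET => $url), $file);
    return $response->is_success;
}

# 3. Hand each chunk to a callback as it arrives; here we only count bytes.
sub fetch_with_callback {
    my ($ua, $url) = @_;
    my $bytes    = 0;
    my $response = $ua->request(
        HTTP::Request->new(GET => $url),
        sub {
            my ($chunk, $res, $protocol) = @_;
            $bytes += length $chunk;
        },
        4096,    # suggested chunk size; a hint, not a guarantee
    );
    return $response->is_success ? $bytes : undef;
}

# Run only when a URL is given on the command line.
if (@ARGV) {
    my $ua  = LWP::UserAgent->new;
    my $url = $ARGV[0];

    my $text = fetch_to_scalar($ua, $url);
    print "In memory: ", length($text), " bytes\n" if defined $text;

    print "Saved to page.out\n" if fetch_to_file($ua, $url, 'page.out');

    my $count = fetch_with_callback($ua, $url);
    print "Callback saw $count bytes\n" if defined $count;
}
```

With the second and third forms, the content is not stored in the returned HTTP::Response object, so the program reads it from the file or collects it inside the callback.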