CGI, or Common Gateway Interface, is a protocol that defines the communication between a WWW server and an application that the server executes to fulfill an HTTP request from a client (such as a browser). WWW servers distinguish requests to execute scripts from requests for simple document retrieval by the location or type of the file requested. In many cases, WWW servers will attempt to execute files only if they are in a particular directory tree. The file may also need to be recognizable as an executable file (by extension, registration, or file system settings).
When a web client makes either a GET or POST request that the server recognizes as a request to execute a script, the server will attempt to execute the script, supplying it with information about the request. This information may include a query string.
CGI specifies a very simple data exchange for applications. The script to be executed will be given a single string, and the output (STDOUT) of the script will be used to determine the information that will be returned to the client. In the simplest cases, this is just a document preceded by a simple content-type header and a blank line. The server supplements this with additional header information before returning it to the client.
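As a minimal sketch of this exchange, the script below writes a content-type header, a blank line, and a document to STDOUT. The sub name build_response and the document text are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Build the smallest useful CGI response: one header line, a blank
# line, then the document itself.
sub build_response {
    my ($body) = @_;
    return "Content-Type: text/html\n\n" . $body;
}

my $response = build_response("<html><body><p>Hello from CGI</p></body></html>\n");
print $response;
```

The blank line is what tells the server where the headers end and the document begins.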
HTTP GET requests can be generated by a browser through a link activation, a form submission, or a URL entered into the address bar. In these cases, the query string is part of the URL; it is appended to the end of the URL with an intervening question mark (?). The WWW server splits this off and makes it available to the CGI application by placing it in the system environment table. Perl scripts can access the query string through %ENV:
$qs = $ENV{'QUERY_STRING'};
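To make the split concrete, here is a rough sketch of what happens to a URL before the script runs. The URL is hypothetical, and a real server does this work itself; the script only ever sees the result in the environment.

```perl
use strict;
use warnings;

# Hypothetical URL, used only for illustration.
my $url = 'http://www.example.com/cgi-bin/script.pl?name=Tim+Margush&office=229';

# Split at the first question mark only; the limit of 2 keeps any later
# question marks inside the query string.
my ($location, $query_string) = split /\?/, $url, 2;
$query_string = '' unless defined $query_string;

print "location:     $location\n";
print "query string: $query_string\n";
```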
HTTP POST requests are generally the result of a form submission (with method set to post). Form submissions cause a query string to be constructed from the successful controls. The URL is found in the form's action attribute. POST requests send the query string to the server as part of the request package; it is not appended to the URL. The WWW server provides the query string to the CGI application through STDIN. Note that in the event of a GET request, there will be nothing in STDIN and a request for input will "hang" the script!
$qs = <STDIN>;
Before the server activates the script, it is required to add several items to the environment table. One of these is the request method (which should be GET or POST). In the case of a POST request, the content length should be used to access the query string via read.
if ( not exists $ENV{'REQUEST_METHOD'} ){
    $qs = ''; #not started by a WWW server
}elsif ( $ENV{'REQUEST_METHOD'} eq 'GET' ){
    $qs = $ENV{'QUERY_STRING'};
}elsif ( $ENV{'REQUEST_METHOD'} eq 'POST'){
    read (STDIN, $qs, $ENV{'CONTENT_LENGTH'});
}else {
    $qs = ''; #some other request method
}
It is possible for a POST request to also include a query string as part of the URL, so the above code is a bit naive, ignoring this possibility. It also assumes no errors will occur.
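A somewhat more defensive version might look like the sketch below. It keeps the URL query string and the POST body separate and reports problems instead of assuming success. The sub name read_request and its three return values are assumptions made for illustration; the environment is passed in as a hash reference and the input as a filehandle so the logic can be tried outside a server.

```perl
use strict;
use warnings;

# Returns (url_query_string, body, error). $env is a reference to an
# environment hash and $in is a readable filehandle (STDIN under a server).
sub read_request {
    my ($env, $in) = @_;
    my $method = $env->{'REQUEST_METHOD'};
    return ('', '', 'no request method') unless defined $method;

    # The URL query string may be present for GET and for POST.
    my $url_qs = defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
    return ($url_qs, '', undef) if $method eq 'GET';
    return ($url_qs, '', "unsupported method $method") unless $method eq 'POST';

    my $length = $env->{'CONTENT_LENGTH'};
    return ($url_qs, '', 'bad content length')
        unless defined $length && $length =~ /^\d+$/;

    my $body = '';
    my $got = read($in, $body, $length);
    return ($url_qs, '', 'read failed or body truncated')
        unless defined $got && $got == $length;
    return ($url_qs, $body, undef);
}

# Simulate a POST whose URL also carries a query string.
my $posted = 'name=Tim+Margush&office=229';
open my $fake_stdin, '<', \$posted or die "cannot open in-memory handle: $!";
my ($url_qs, $body, $err) = read_request(
    { REQUEST_METHOD => 'POST', QUERY_STRING => 'page=2',
      CONTENT_LENGTH => length $posted },
    $fake_stdin,
);
print defined $err ? "error: $err\n" : "url: $url_qs\nbody: $body\n";
```

Under a real server the call would be read_request(\%ENV, \*STDIN).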
The query string is a sequence of name/value pairs. These are separated from one another by the ampersand character (&). An equals sign (=) is used to separate the name from the value in each name/value pair. Query strings may not contain some characters, so escaping is used to allow special characters to be encoded in the string.
name=Tim+Margush&office=229&building=CAS&grade=99%25
Parsing a query string requires separating the parts of the string and interpreting the escape characters. Escaped characters are either plus signs (+), which represent spaces, or two-digit hex values preceded by a percent sign (%dd). The following loop assumes the query string is well formed, which is not guaranteed!
@nv_pairs = split /&/, $qs;
foreach $nvp (@nv_pairs){
    ( $n, $v ) = split /=/, $nvp, 2; #limit of 2 keeps any later = in the value
    $n =~ tr/+/ /;
    $n =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    $v = "" unless defined $v;
    $v =~ tr/+/ /;
    $v =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    $qs_data{$n} = $v;
}
Note that the substitution step matches the escape sequences, placing the two-digit hex values in $1 (note the parentheses used to group this part of the pattern). Case is ignored in the match (the 'i' suffix). The 'e' suffix causes the chr and hex functions to be executed, translating the matched 'dd' to a character value before the substitution takes place. The 'g' suffix executes the substitution through the entire string.
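The same steps can be packaged as subs and tried on the sample query string shown earlier. This is a sketch only: the sub names unescape_component and parse_query_string are hypothetical, and a repeated name simply overwrites the earlier value.

```perl
use strict;
use warnings;

# Decode one escaped component: plus signs become spaces, %dd becomes
# the character with that hex code.
sub unescape_component {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    return $s;
}

# Parse a query string into a hash; a later duplicate name overwrites
# an earlier one.
sub parse_query_string {
    my ($qs) = @_;
    my %data;
    foreach my $nvp (split /&/, $qs) {
        my ($n, $v) = split /=/, $nvp, 2;
        next unless defined $n;          # skip empty pairs such as "a&&b"
        $v = '' unless defined $v;
        $data{ unescape_component($n) } = unescape_component($v);
    }
    return %data;
}

my %qs_data = parse_query_string('name=Tim+Margush&office=229&building=CAS&grade=99%25');
print "$_ = $qs_data{$_}\n" foreach sort keys %qs_data;
```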
Error handling (and debugging) in Perl CGI applications is a bit tricky. Getting the "Internal Server Error" message seldom provides any hints as to why the script failed. Two techniques can help track down the cause of your errors. These are detailed in the Debugging and Help part of the Scripting Hints page.
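One common approach, offered here only as an illustration and not necessarily one of the two techniques on the Scripting Hints page, is to trap fatal errors with eval and report them in a valid response. The browser then shows the message rather than a generic server error. The sub names run_script and risky_work are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real work of the script.
sub risky_work {
    die "could not open data file\n";
}

# Run the work inside eval so a fatal error still yields a valid response.
sub run_script {
    my $body = eval { risky_work() };
    if ($@) {
        return "Content-Type: text/plain\n\nScript error: $@";
    }
    return "Content-Type: text/html\n\n$body";
}

my $response = run_script();
print $response;
```

This is convenient while debugging; on a public site it is usually safer to log the message than to show it to the client. The CGI distribution also includes CGI::Carp, whose fatalsToBrowser option serves a similar purpose.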
The process of retrieving and parsing the query string is so common that it is no surprise Perl has facilities to automate it. A module named CGI.pm, which shipped with standard Perl installations through Perl 5.20 and is installed from CPAN for later versions, includes this ability. It also makes available a collection of functions that help generate the HTML code generally required as the output of the script. You can access the contents of the module by adding the following use clause to the top of your script:
use CGI qw(:standard);
This imports the most common functions from the module. It also implies that you will be using the function-oriented interface rather than the object-oriented interface (which provides some additional features). You can read the full documentation of the module at http://stein.cshl.org/WWW/software/CGI/
With this module, a CGI object is created to encapsulate the CGI environment. The object contains a representation of the query string and provides access to the parameters. Parameters can be extracted with a simple function call. You pass the parameter name to get the value, or pass nothing to get the list of parameter names:
@plist = param;
$val = param('name');
@vals = param('hobbies');
To allow for the possibility that a particular name appears more than once, param can return a list of values (if called in list context). The param function returns undef if the name is not found in the query string. Illegal query strings are retrieved as a list of words under the name keywords:
@w_list = param('keywords');
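These calls can be tried without a web server by building a CGI object from a literal query string. The sketch below uses the object-oriented interface for that reason; the query string is made up for illustration. Recent CGI.pm releases print a warning when param is called with a name in list context and offer multi_param as the explicit list form.

```perl
use strict;
use warnings;
use CGI;

# Build a CGI object from a literal query string instead of a live request.
my $q = CGI->new('name=Tim+Margush&hobbies=fishing&hobbies=football');

my @plist   = $q->param;             # all parameter names
my $name    = $q->param('name');     # scalar context: first (or only) value
my @hobbies = $q->param('hobbies');  # list context: every value
my $missing = $q->param('age');      # undef, since age was not supplied

print "names:   @plist\n";
print "name:    $name\n";
print "hobbies: @hobbies\n";
print "age is ", (defined $missing ? $missing : 'undefined'), "\n";
```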
You can access the original query string via the environment, or use the query_string method to access the string associated with the current name/value pairs in the CGI object. You can directly manipulate the parameters (name/value pairs) of the object through methods; the changes are reflected in the return value of query_string. Note that the query string in the environment table is not affected by these manipulations.
param('last', "Alias Alias"); #create/modify value for param named last
Delete('age'); #remove age name/value pair (capitalized to avoid Perl's built-in delete)
Delete_all(); #remove all name/value pairs
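The difference between the object's parameters and the environment can be seen in a short sketch. It uses the object-oriented interface with a made-up query string, and sets the environment variable by hand to mimic what a server would supply.

```perl
use strict;
use warnings;
use CGI;

my $original = 'last=Smith&age=42';
$ENV{'QUERY_STRING'} = $original;    # what the server would have supplied
my $q = CGI->new($original);

$q->param('last', 'Alias Alias');    # modify the value of last
$q->delete('age');                   # remove the age name/value pair

print "object:      ", $q->query_string, "\n";
print "environment: $ENV{'QUERY_STRING'}\n";
```

The first line printed reflects the changes; the second still shows the string the request arrived with.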
The CGI module simplifies the task of outputting HTML:
print header('text/html'); #creates the usual response header
print start_html({-title=>"My CGI.pm Page",
-bgcolor=>"black", -text=>"white"});
print h1("Check This Out!");
print end_html;
These functions return strings with the appropriate HTML tags surrounding the content. Attributes are set by passing a reference to a hash as the first argument to the function. The minus signs (-) in front of the attribute names are not required, but are a convention that keeps the notation compatible with some abbreviated alternative syntaxes. Use undef as the value for an attribute that has no value. The above statements create the following output (reformatted for clarity):
Content-Type: text/html; charset=ISO-8859-1
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<title>My CGI.pm Page</title>
</head>
<body bgcolor="black" text="white">
<h1>Check This Out!</h1>
</body>
</html>
It is possible for a script to tell the WWW server to return a page located elsewhere. This is accomplished by outputting a redirect header:
print redirect( "http://www.yahoo.com" );
Many of the CGI functions also take lists or list references as arguments. When a list reference is passed, the tags are distributed over the elements of the list. This is very convenient for lists and tables. Note that passing a list does not cause distribution, but rather combines the items within the same element. This fragment...
print p('This is a sentence.', 'So is this.');
print p( {-style=>'text-align:right'},
['This is a paragraph', 'So is this']);
print ol (li ( ['one', 'two', 'three'] ));
...produces the following html...
<p>This is a sentence. So is this.</p>
<p style="text-align:right">This is a paragraph</p>
<p style="text-align:right">So is this</p>
<ol><li>one</li> <li>two</li> <li>three</li></ol>
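The distribution rule is what makes tables compact to write. In the sketch below (the row data is hypothetical), td and th are distributed over each row's cells, and Tr is distributed over the list of rows.

```perl
use strict;
use warnings;
use CGI qw(:standard);

my @rows = ( ['Tim', 229], ['Ann', 231] );   # hypothetical data

# th and td receive list references, so each cell gets its own tag;
# Tr receives a list reference, so each row gets its own <tr>.
my $html = table( {-border=>1},
    Tr( [ th( ['Name', 'Office'] ),
          map { td($_) } @rows ] ) );
print "$html\n";
```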
Scripts can be designed to support forms created in HTML files. They can also produce HTML files containing forms. It is quite common to use a script to generate a form as well as process the form. This allows the form to be redisplayed with error messages in case some of the data is invalid or incomplete. The following pattern is therefore quite common:
if (param()){#been here before - process data
    process_data();
}else{#no params, just display form
    show_form();
}
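A slightly fuller sketch of the pattern adds the validation step that decides between processing and redisplay. The sub names validate and handle_request, the customer field, and the rule that it must not be empty are all assumptions made for illustration. CGI objects are built from hash references here so the three cases can be run without a server.

```perl
use strict;
use warnings;
use CGI;

# Hypothetical validation rule: the customer field must not be empty.
sub validate {
    my ($q) = @_;
    my @errors;
    my $customer = $q->param('customer');
    push @errors, 'Please enter your name.'
        unless defined $customer && $customer =~ /\S/;
    return @errors;
}

# Decide what the script should do for a given request.
sub handle_request {
    my ($q) = @_;
    return 'show blank form' unless $q->param;   # first visit, no parameters
    my @errors = validate($q);
    return @errors ? "redisplay form: @errors" : 'process data';
}

print handle_request( CGI->new({}) ), "\n";
print handle_request( CGI->new({ customer => '' }) ), "\n";
print handle_request( CGI->new({ customer => 'Tim' }) ), "\n";
```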
Creating forms from the Perl script offers some additional advantages, one of which is the ability for form widgets to automatically retain their values when the form is redisplayed. This is sometimes referred to as a sticky widget. The CGI module provides this automatically by substituting the current CGI object's values in the element tags when the tags are created.
The CGI module provides functions to create any type of widget. The widget attributes are usually supplied by a hash reference. Here are a few examples, surrounded by the form boilerplate.
print start_form("POST", "http://www.where-ever.com/cgi-bin/script.pl");
print textfield( {-name=>'customer',
-default=>'Please enter your name',
-size=>50,
-maxlength=>80 }
);
print textarea( {-name=>'address',
-default=>'',
-rows=>5, -columns=>50 }
);
print popup_menu( {-name=>'age_group',
-values=>[qw/young middle other/],
-labels=>{young=>'0-15', middle=>'16-29', other=>'30+'},
-default=>'middle' }
);
print scrolling_list(-name=>'hobbies',
-values=>{sewing=>'Sewing and Embroidery',
football=>'Football', beading=>'Beading',
fishing=>'Fishing', cooking=>'Culinary Arts',
singing=>'Singing'},
-default=>['fishing', 'football'],
-size=>5, -multiple=>'true',
);
print checkbox(-name=>'sex',
-checked=>'checked',
-value=>'YES',
-label=>'Male or Female?'
);
print checkbox_group(-name=>'style',
-values=> {bold=>'bold', italic=>'slanted', large=>'big'},
-default=>['italic'],
-linebreak=>'true'
);
print radio_group( {-name=>'favorite',
-values=>[qw/red blue green yellow/],
-default=>'green' }
);
print reset('Reset The Form');
print defaults('Restore Defaults');
print hidden({-name=>'secret_data',
-default=>[qw/this data is hidden 45/]}
);
print end_form;
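The sticky behavior can be observed directly by simulating a resubmission. In this sketch the CGI object is built from a hash reference (a stand-in for a real request), and the -override attribute is used to show how stickiness can be turned off for a single widget.

```perl
use strict;
use warnings;
use CGI;

# Simulate a resubmission in which the user already typed a name.
my $q = CGI->new({ customer => 'Tim Margush' });

# Sticky: the submitted value replaces the default.
my $sticky = $q->textfield( -name    => 'customer',
                            -default => 'Please enter your name' );

# Not sticky: -override forces the default to be used.
my $forced = $q->textfield( -name     => 'customer',
                            -default  => 'Please enter your name',
                            -override => 1 );
print "$sticky\n$forced\n";
```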
One of the confusing aspects of writing form-handling scripts is keeping track of what has happened in the past. Remember the sequence of events -
The script, in general, has no idea that the third request is associated with the second - for that matter, these might all be requests from different locations around the world. The script must be written to act on exactly what it is given (or not given). It is possible to store some tracking information in hidden fields in the form that is sent to the client. This hidden data can be unique to each client and used to provide a history of past submissions. Keep in mind that many unrelated form processing requests may be fulfilled by the script between the time the script sends a blank form to the client and the time the form is returned by that client.
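As a small sketch of hidden-field tracking, the script below counts submissions by carrying a counter in the form. The field name visits is hypothetical. Because hidden fields are sticky like other widgets, -override is needed to write the new count rather than echo the old one.

```perl
use strict;
use warnings;
use CGI;

# Simulate a request that arrives with the counter from the previous form.
my $q = CGI->new({ visits => 2 });

# Increment the count; a first visit has no counter, so start from 0.
my $visits = ( $q->param('visits') || 0 ) + 1;

# -override forces the new value into the tag instead of the submitted one.
my $field = $q->hidden( -name => 'visits', -default => $visits, -override => 1 );
print "$field\n";
```

Since the client can edit hidden fields before returning the form, this kind of data should be treated as untrusted input.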
Cookies provide a simple mechanism for scripts to relate previous submission information with a current submission. A cookie is a small piece of text data stored by the browser on the client's machine. Browsers automatically store cookies when the response from an HTTP request contains instructions to do so. Future requests initiated from the browser will automatically include the cookie data as part of the request. Scripts can access this data in much the same way that they access the query string.
Cookies are created when a cookie is sent as part of the response package (the response to an http request). Perl scripts can easily add a cookie to the output (when it is creating the html page) using the cookie function:
$sugar = cookie ( { -name=>'mycookie',
-value=>'whatever you want to store',
-expires=>'+1h',
-path=>'/cgi-bin',
-domain=>'cs.uakron.edu' }
);
Cookies are sent in the HTTP response header rather than in the HTML document itself, so add the cookie to the header as follows:
print header( { -type=>'text/html',
-cookie => $sugar } );
Cookie data is retrieved from the http request package via the cookie function (request by name or get all names):
$cookie_value = cookie('mycookie');
@allcookienames = cookie();
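Both directions can be tried outside a server by setting the environment variable that carries the request's cookies. The sketch below uses the object-oriented interface; the cookie names and values are made up for illustration.

```perl
use strict;
use warnings;
use CGI;

# Simulate the cookie data of an incoming request; under a server this
# variable is set before the script starts.
$ENV{'HTTP_COOKIE'} = 'mycookie=chocolate%20chip; other=1';

my $q = CGI->new({});
my $cookie_value   = $q->cookie('mycookie');   # value is unescaped for you
my @allcookienames = sort $q->cookie();        # sorted for stable output

# Create a replacement cookie and place it in the response header.
my $sugar  = $q->cookie( -name => 'mycookie', -value => 'oatmeal',
                         -expires => '+1h' );
my $header = $q->header( -type => 'text/html', -cookie => $sugar );

print "received: $cookie_value\n";
print "names:    @allcookienames\n";
print $header;
```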
Cookie sizes are not limited by the script, but browsers impose limits (commonly around 4 KB per cookie). The most effective (and secure) way of using cookies is to keep the data on the server in a database, keyed to a unique session identifier. The session ID would be the only thing stored in the cookie.
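The idea can be sketched with a hash standing in for the database. Everything here is an assumption made for illustration: the sub names, the stored fields, and especially the way the ID is generated, which is not cryptographically strong. A real CGI script runs as a new process for each request, so the store would have to live in a file or database.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Stand-in for a server-side database table, keyed by session ID.
my %session_store;

# Create a session and return its ID. NOTE: this ID source is an
# illustrative assumption and is not cryptographically strong.
sub new_session {
    my (%data) = @_;
    my $id = sha256_hex( join '-', time, $$, rand() );
    $session_store{$id} = {%data};
    return $id;
}

# Look up the data for an ID taken from a cookie; undef if unknown.
sub load_session {
    my ($id) = @_;
    return defined $id ? $session_store{$id} : undef;
}

my $id = new_session( customer => 'Tim Margush', visits => 1 );
my $session = load_session($id);
print "session id is ", length($id), " hex characters\n";
print "customer: $session->{'customer'}\n";
```

Only $id would be placed in the cookie; the customer data never leaves the server.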
The expires value can look like any of these:
+30s 30 seconds from now
+10m ten minutes from now
+1h one hour from now
-1d yesterday (i.e. "ASAP!")
now immediately
+3M in three months
+10y in ten years time
Thursday, 25-Apr-1999 00:40:33 GMT at the indicated time and date
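The relative forms are offsets from the current time. The sketch below converts them to seconds to show what each one means. It is an independent illustration, not the module's own code, and it assumes a month is 30 days and a year is 365 days; the sub name expires_offset is hypothetical.

```perl
use strict;
use warnings;

# Seconds per unit. Treating a month as 30 days and a year as 365 days
# is an assumption of this sketch.
my %seconds_per = ( s => 1, m => 60, h => 3600, d => 86400,
                    M => 30 * 86400, y => 365 * 86400 );

# Convert a relative expires value to an offset in seconds from now.
# Returns undef for anything that is not a relative form.
sub expires_offset {
    my ($spec) = @_;
    return 0 if $spec eq 'now';
    return undef unless $spec =~ /^([+-]?\d+)([smhdMy])$/;
    return $1 * $seconds_per{$2};
}

foreach my $spec ('+30s', '+10m', '+1h', '-1d', 'now', '+3M') {
    printf "%-5s => %d seconds\n", $spec, expires_offset($spec);
}
```

Note that the unit letter is case sensitive: m means minutes and M means months.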
The domain attribute must contain at least 2 periods (as in .uakron.edu) and specifies that the cookie should be sent as part of any request sent to a machine whose domain matches the partial domain. The default is the machine the cookie came from.
The path attribute allows you to limit cookies to requests going to a particular subtree of the directory. The default is "/" meaning any request will include the associated cookies.