Department of Computing Imperial College London
CGI Scripts

These are some notes on CGI (Common Gateway Interface) scripts: how they work and the problems you may run into. The Common Gateway Interface is a standard agreed upon by Web server writers to allow servers to call user-written programs in a standardised way. Please remember that any scripts you run must comply with the usual computing regulations.

DoC runs the Apache web server, version 2.2.6. The Frequently Asked Questions page on the Apache website covers some queries on CGI.

This page just gives you the basics of running CGI scripts in DoC. CGI scripts can be written in any language, but the majority of them tend to be written using either the Perl programming language (recommended!), PHP (very common, but not recommended for new software development!), or Python (which some people like a lot while others hate it).

Permissions, Ownerships, Hash-Bang lines

Any CGI script in your public_html directory having the suffix .cgi (or .php) can be run provided it meets the following conditions:

These restrictions are requirements of suexec (see below) - since the script runs as you, they are for your protection.

The best way to ensure that these conditions are met is to run the following commands in Linux:

chmod 755 scriptname
chmod 755 .
chown -R yourusername:yourgroupname .
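
As a rough self-check before trying a script through the web server, the sketch below tests a script and its directory against the general rule given in the suEXEC section of this page: owned by you, not writable by group or others, and executable. It is written in Python purely for illustration. The function names are made up, and the rules it checks are an assumption drawn from this page, not the server's actual suexec configuration.

```python
import os
import stat


def suexec_problems(path):
    """Return a list of human-readable problems found for `path`.

    Assumed rules (taken from the general advice on this page, not from the
    server's real suexec configuration): owned by the current user, not
    writable by group or others and, for regular files, executable by owner.
    """
    info = os.stat(path)
    problems = []
    if info.st_uid != os.getuid():
        problems.append(f"{path}: not owned by you")
    if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append(f"{path}: writable by group or others")
    if stat.S_ISREG(info.st_mode) and not info.st_mode & stat.S_IXUSR:
        problems.append(f"{path}: not executable by its owner")
    return problems


def check_script(script):
    """Check both the script and the directory that contains it."""
    directory = os.path.dirname(os.path.abspath(script))
    return suexec_problems(script) + suexec_problems(directory)
```

An empty list from check_script("scriptname") means none of the assumed rules is broken. It does not guarantee that suexec itself will be satisfied.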

Note that if you are running CGI from a group project area in /vol/project things are slightly different:

Files must be in Unix format, not Windows. The Programmer's File Editor (Start -> Programs -> Editors -> PFE) on Windows workstations has an option under "File/Save As" to use Unix format, or from the Linux command line dos2unix can be used to convert Windows files.
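
The format matters because a Windows line ending leaves a carriage return on the end of the hash-bang line, so the system looks for an interpreter whose name ends in that stray character. If dos2unix is not to hand, the conversion can be sketched in a few lines of Python. The function names here are illustrative only.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite `path` in place with Unix line endings; return True if changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_unix_line_endings(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
        return True
    return False
```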

When a web request comes in that names one of your CGI scripts, and the above constraints have been satisfied, the script itself is executed as you in the normal Unix way. This means that unless it is a true compiled executable, the first line of the script must be a hash-bang line of the form #!/path/to/interpreter. So, for example, all Perl CGI scripts must have the first line #!/usr/bin/perl, and all PHP CGI scripts must have the first line #!/usr/bin/php.

The Most Common Problem with pre-written PHP Software

Given that loads of pre-written open-source web software already exists written in PHP, lots of people download such software and want to run it here. First off: THINK ABOUT SECURITY, because much pre-written PHP software is not that well written and many web servers have been hacked due to badly written PHP scripts being installed without due thought. Ideally, you should read and vet the code to check that it seems to be of high quality, and research any security issues reported for it on the web.

When you have decided that the software is secure enough to run on our web server, the single most common problem you will have with pre-written PHP software is that existing trees of PHP source files usually do not have #!/usr/bin/php at the top of each script. Worse still, the PHP language makes no distinction between a program (a script invoked by a URL), which needs the #!/usr/bin/php line adding, and a library (a file which is included into another PHP file), which must not have it. So you need to identify which individual PHP files are programs, add #!/usr/bin/php to just those files, and make them executable. We know of no easy way of finding which .php files are programs and which are libraries - if you work one out please let us know. One possible avenue: libraries are the .php files which are included in other .php files, so any .php file that no other file includes is likely to be a program.
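
The avenue suggested above can be sketched as a small Python heuristic: scan every .php file for include and require statements that name a literal file, then report the files that nothing else appears to include. This is only a starting point, not a reliable classifier. It matches on base names alone, it cannot see includes whose paths are built from variables or constants, and the function names are invented for this example, so check the resulting list by hand.

```python
import os
import re

# Matches include/require (and the _once variants) followed by a quoted
# literal path. Paths built from variables or constants are NOT detected.
INCLUDE_RE = re.compile(
    r"""\b(?:include|require)(?:_once)?\s*\(?\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)


def included_names(source):
    """Return the base names of files that `source` includes literally."""
    return {os.path.basename(path) for path in INCLUDE_RE.findall(source)}


def candidate_programs(root):
    """Return .php files under `root` that no other file appears to include."""
    php_files = []
    included = set()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".php"):
                full = os.path.join(dirpath, name)
                php_files.append(full)
                with open(full, encoding="utf-8", errors="replace") as handle:
                    included |= included_names(handle.read())
    return sorted(p for p in php_files if os.path.basename(p) not in included)
```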

But once you've come up with the list of files needing hash-bang lines, here's a canned bulk in-place edit command which will insert the #!/usr/bin/php line at the top of every file in a list of files:

perl -pi -e 'print "#\!/usr/bin/php\n" if $. == 1; close ARGV if eof' LIST_OF_FILES
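
If you would rather have something you can safely run twice, here is a minimal Python sketch of the same edit. It skips any file that already starts with a hash-bang line and also sets the execute bits. The function name is invented for this example, and the interpreter path is the one quoted on this page.

```python
import os
import stat

HASH_BANG = b"#!/usr/bin/php\n"


def add_hash_bang(path):
    """Prepend the PHP hash-bang line to `path` unless one is already there.

    Also turns on the execute bits (owner, group, other) for the file.
    Returns True if the file contents were changed.
    """
    with open(path, "rb") as handle:
        content = handle.read()
    changed = not content.startswith(b"#!")
    if changed:
        with open(path, "wb") as handle:
            handle.write(HASH_BANG + content)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return changed
```

Call add_hash_bang on each program file in turn. Library files should be left out of the list.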

The CGI Environment

When your CGI script runs, it will be called with a number of shell environment variables set. If the URL that invokes the CGI script contains a '?' then anything following the '?' is placed into the QUERY_STRING environment variable. If that query string contains no unencoded '=' character, it is also given to the script as command line arguments. Similarly, if the invoking URL has extra path components, starting with a '/', after the name of a valid CGI script, that extra part is extracted and placed into the PATH_INFO environment variable.
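
To make this concrete, here is a minimal sketch, in Python, of a script picking up these two variables. The function name and the example URL are invented for illustration. For a hypothetical request to show.cgi/reports/2008?year=2008&tag=a&tag=b, PATH_INFO would hold /reports/2008 and QUERY_STRING would hold year=2008&tag=a&tag=b.

```python
import os
from urllib.parse import parse_qs


def read_request(environ=os.environ):
    """Extract the query parameters and extra path from a CGI environment."""
    # parse_qs maps each name to a list, since a name may appear many times.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    path_info = environ.get("PATH_INFO", "")
    return query, path_info
```

Passing a plain dictionary instead of os.environ makes the function easy to try out away from the web server.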

Try the following WWW pages to see what environment variables are available to your CGI program. The program is a shell script and is available here.

The program has to return not only the contents of the page but also the HTTP header information for the page. Normally this only requires returning the Content-type: line followed by a blank line, which marks the end of the page's header information.

For example to return a plain, unformatted, text document:

    Content-type: text/plain

    Some boring text.
    
To return a page of formatted HTML:

    Content-type: text/html

    <b>Bold</b>,<i>Italic</i> and much much more!
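
The two responses above can be produced by a very small program. The sketch below uses Python and an invented function name. To run it as a real CGI script it would also need a hash-bang line naming the Python interpreter, whose path on the DoC systems is not given on this page.

```python
import sys


def build_response(body, content_type="text/html"):
    """Return a complete CGI response: header line, blank line, then body."""
    return f"Content-type: {content_type}\n\n{body}"


if __name__ == "__main__":
    # Everything written to standard output is sent back to the browser.
    sys.stdout.write(
        build_response("<b>Bold</b>, <i>Italic</i> and much much more!\n")
    )
```

Leaving out the blank line after the header is one easy way to end up with a script that the server refuses to serve.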
    

Consider using the Perl module CGI.pm for CGI scripting. On Linux systems type:

perldoc CGI
for details.

You can also run the script outside of the web server: CGI.pm has ways of simulating argument passing from the command line.
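
For scripts that do not use CGI.pm, a similar effect can be had by setting the CGI environment variables by hand before running the script. The Python sketch below is one way to do that, assuming the script only needs REQUEST_METHOD, QUERY_STRING and PATH_INFO. The function name is invented, and a real request sets many more variables than these three.

```python
import os
import subprocess


def run_cgi(command, query_string="", path_info=""):
    """Run `command` (a list such as ["./hello.cgi"]) with a simulated
    CGI environment and return (header_lines, body).

    The output is split at the first blank line, which ends the headers.
    """
    env = dict(os.environ,
               REQUEST_METHOD="GET",
               QUERY_STRING=query_string,
               PATH_INFO=path_info)
    result = subprocess.run(command, env=env, capture_output=True,
                            text=True, check=True)
    headers, _separator, body = result.stdout.partition("\n\n")
    return headers.splitlines(), body
```

If the script dies or exits with a non-zero status, subprocess.run raises CalledProcessError, and the script's error output is available in the exception's stderr attribute.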

To get you started, here's a typical Perl CGI script to play with. See the source code here, run it via:

To diagnose faults, check the latest error_log in /vol/wwwhomeslogs, or include the line:

use CGI::Carp qw(fatalsToBrowser);
at the top of your CGI script.

You can syntax check a Perl program by typing:

perl -cw script.pl

suEXEC

The suEXEC feature enables CGI and SSI programs to be run under user IDs different from the user ID of the calling web server. Normally, when a CGI or SSI program executes, it runs as the same user who is running the web server; with suEXEC, your scripts run as you. This means the permissions and ownership of some CGI scripts and the directories they live in may need adjusting. As a general rule, CGI scripts and the directory containing them should be owned by you and your group, but should not be writable by group or others. Check our guide to file permissions if you don't know what this means. Also, to protect any files your script creates from unwanted web access, be sure to set a umask in your script. Using 077 (octal) should do the trick.
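
As an illustration of the umask advice, here is a minimal Python sketch that creates a file readable and writable only by its owner. The function name is invented for this example. In a real CGI script it is usually simpler to set the umask once, near the top, and leave it set.

```python
import os


def create_private_file(path, text):
    """Create `path` so that only the owner can read or write it."""
    previous = os.umask(0o077)      # mask out all group/other permission bits
    try:
        with open(path, "w") as handle:
            handle.write(text)
    finally:
        os.umask(previous)          # restore the caller's umask
```

Note that a umask only affects files as they are created. It does not change the permissions of files that already exist.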

Access and error logs on the server

There are various logs on the web server to record accesses to web pages and error messages when things don't work. The most useful places to start when your CGI script refuses to run (usually generating the famous "premature end of script headers" error page) are the suexec log file - /vol/wwwhomeslogs/server-suexec.log - and today's error log /vol/wwwhomeslogs/wwwhomes.doc.ic.ac.uk/error.log-YYYYMMDD. This usually tells you which of the suexec conditions you're breaking, or occasionally tells you there's a syntax error in the script itself. (You did, naturally, syntax check the CGI script before you ran it via the web page as we mentioned above? You didn't?? Tut tut!!).

If you want to search these logs, they can all be found from the Linux systems in the directory /vol/wwwhomeslogs.

The most recent log file is the one with the highest number. The numbers are system times (seconds since epoch), not dates as such, and only relate to the start time of the server not the current date.
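
Picking out the highest-numbered file by eye is error-prone when the numbers are long, so here is a small Python sketch that does it. The function name and the example file names are hypothetical. The one real point is that the comparison is numeric, since a plain alphabetical sort would put 999999999 after 1223000000.

```python
import os
import re

TRAILING_NUMBER = re.compile(r"(\d+)$")


def newest_log(directory, prefix):
    """Return the file in `directory` starting with `prefix` that has the
    highest trailing number, or None if there is no such file."""
    best_name, best_number = None, -1
    for name in os.listdir(directory):
        match = TRAILING_NUMBER.search(name)
        if name.startswith(prefix) and match and int(match.group(1)) > best_number:
            best_name, best_number = name, int(match.group(1))
    return best_name
```

For example, newest_log("/vol/wwwhomeslogs", "error_log") would return the error_log file with the largest numeric suffix, assuming the files there follow that naming pattern.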

Other features of the DoC web server

DoC runs the Apache web server; current version information can be obtained from the Netcraft site.

Perl DBI and DBD drivers for PostgreSQL are installed. Oracle, MySQL, and Sybase drivers are installed, but not supported or encouraged. JSP and Java servlets are not supported on the main web server, although we make a personal tomcat system available for experimental, short-term runs of JSP/servlet webapps on lab machines.

© CSG / Oct 2008