DoC Computing Support Group


CGI Scripts

These are some notes on CGI (Common Gateway Interface) scripts, how they work and possible problems you may have. The Common Gateway Interface is a standard agreed upon by webserver developers to allow servers to call user-written programs in a standardised way. Please remember that any scripts you run must comply with the usual computing regulations.

DoC runs the Apache web server, version 2.2.8. The Apache2 project CGI Howto covers some queries on CGI.

This page describes the basics of running CGI scripts in DoC. CGI scripts can be written in any language, but the majority of them tend to be written using either Perl or PHP.

CGI Scripts in your ~/public_html directory

Any script in your public_html directory having the suffix .cgi (or .php) can be run provided it meets the following conditions:

  • The CGI script must be readable and executable by you
  • If the CGI script is writable, it must only be writable by you

  • The CGI script must be owned by you, and it must have the same group as your primary group (eg. cs2)
  • The directory containing the CGI script must only be writable by you
  • The directory containing the CGI script must also have the same group as your primary group (eg. cs2)
  • Unless the CGI script is a compiled executable, the first line must be a proper #!/path/to/interpreter line. So, for example, all Perl CGI scripts should have a first line of #!/usr/bin/perl, and all PHP CGI scripts must have the first line #!/usr/bin/php.

  • The CGI script must have Unix line endings (newline), not Windows or Mac OS 9 line endings. Certain Windows editors (eg. the Programmer's File Editor) can save with Unix line endings, and the Linux command line tool dos2unix can convert Windows line endings to Unix, but we recommend that you edit CGI scripts on Unix. Windows users can ssh into shell1 and use their favourite Unix editor (pico, vim, emacs, kedit etc).

These restrictions are requirements of suexec (see below) - since the script runs with your user permissions, they are for your protection.

If any of these conditions are not met, then suexec will refuse to run your CGI script, and will generate the famous "500 Internal Server Error" page.
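The conditions above can be checked by hand before the web server ever sees the script. The following is a minimal sketch, not suexec itself: check_cgi is a hypothetical helper name, it assumes the file is a script rather than a compiled executable, and it only approximates the rules by reading the output of ls and id.

```shell
#!/bin/sh
# check_cgi FILE
# Hypothetical pre-flight check approximating the conditions listed above.
# Prints the first problem found, or "ok". Assumes FILE is a script,
# not a compiled executable.
check_cgi() {
    f=$1
    dir=$(dirname "$f")
    cr=$(printf '\r')

    [ -f "$f" ] || { echo "not a regular file"; return 1; }

    # Permission strings look like "-rwxr-xr-x"; keep the first 10 characters.
    fperm=$(ls -ld "$f" | cut -c1-10)
    dperm=$(ls -ld "$dir" | cut -c1-10)

    case $fperm in
        -r?x??????) ;;
        *) echo "script must be readable and executable by you"; return 1 ;;
    esac
    case $fperm in
        ?????w????|????????w?) echo "script is writable by group or other"; return 1 ;;
    esac
    case $dperm in
        ?????w????|????????w?) echo "directory is writable by group or other"; return 1 ;;
    esac

    # The first line must be "#!/path/to/interpreter" with no carriage return.
    IFS= read -r first < "$f"
    case $first in
        '#!/'*) ;;
        *) echo "first line is not a #!/path/to/interpreter line"; return 1 ;;
    esac
    case $first in
        *"$cr") echo "script has Windows line endings"; return 1 ;;
    esac

    # Owner and group are fields 3 and 4 of "ls -ld" output.
    owner=$(ls -ld "$f" | awk '{print $3}')
    fgroup=$(ls -ld "$f" | awk '{print $4}')
    dgroup=$(ls -ld "$dir" | awk '{print $4}')
    [ "$owner" = "$(id -un)" ] || { echo "script is not owned by you"; return 1; }
    [ "$fgroup" = "$(id -gn)" ] || { echo "script group is not your primary group"; return 1; }
    [ "$dgroup" = "$(id -gn)" ] || { echo "directory group is not your primary group"; return 1; }

    echo "ok"
}
```

After defining the function in your shell, something like `check_cgi ~/public_html/scriptname` reports the first rule that appears to be broken. It only inspects the first line for a carriage return, so a file with mixed line endings could still slip through.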

The best way to ensure that the permission and group ownership conditions are met is to run the following commands in Linux:

cd ~/public_html
chmod 755 scriptname
chmod 755 .
chown -R yourusername .
chgrp -R yourprimarygroup .

Then, to ensure that permissions are correct for scripts created in future, check that your umask (which controls the default permissions of new files) is set to 022. This is the default for most groups of students. Check it via:

umask

and, if you need to, set it via the command:

umask 022
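To see what a given umask actually does, you can create a file and a directory in a scratch area and look at the resulting modes. This is a small sketch; show_umask_effect is a hypothetical helper, and it runs in a subshell so that your own umask is left unchanged.

```shell
#!/bin/sh
# show_umask_effect MASK
# Hypothetical helper: print the permission strings a new file and a new
# directory receive under MASK. Works in a temporary directory.
show_umask_effect() {
    (
        umask "$1"
        tmp=$(mktemp -d) || exit 1
        touch "$tmp/file"
        mkdir "$tmp/dir"
        ls -ld "$tmp/file" | cut -c1-10
        ls -ld "$tmp/dir" | cut -c1-10
        rm -rf "$tmp"
    )
}

show_umask_effect 022   # prints -rw-r--r-- then drwxr-xr-x
show_umask_effect 002   # prints -rw-rw-r-- then drwxrwxr-x
```

With 022 nothing is writable by group or other, which is what suexec expects in ~/public_html. With 002 new files are group-writable.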

CGI Scripts in a group project area

Note that if you are running CGI scripts from a group project area in /vol/project, the rules are slightly different:

  • CGI scripts in /vol/project must be executable and readable by the group
  • CGI scripts in /vol/project must be group-owned by the project group (g09.... or similar), not by the primary group of any user (cs2 or whatever). It's easy to fall foul of this when creating directories without the setgid bit set, so we recommend that whenever any group member creates a directory in the group project area, they also set the setgid bit via 'chmod g+s NEWDIR'.

  • The directories the CGI scripts live in (subdirectories of /vol/project, obviously) must also have the project group as the group owner.

  • CGI scripts in /vol/project (and the directories they live in) can be group-writable, for the convenience of group updating.
  • CGI scripts in /vol/project (and the directories they live in) must not be writable by other, but should be readable and executable by other.
  • CGI scripts in /vol/project still need a valid #!/usr/bin/interpreter line, and Unix line endings, as discussed above.

Again, if any of these conditions are not met, then suexec will refuse to run your CGI script, and will generate the famous "500 Internal Server Error" page.

We recommend the following simple rules for keeping permissions in your group project directory sensible:

  • All members of the group should set umask 002 before working in the group project directory. This ensures that all new files/dirs are user-writable, group-writable and other-readable. You might wish, for the duration of the project, to add umask 002 to the end of your ~/.cshrc file, while noting that this will affect all new files created, not just those in /vol/project. In particular, having this umask in place when creating files in your ~/public_html directory would create group-writable files, which suexec would refuse to run!

  • Whenever one of the group creates a new subdirectory in the group directory, they should do "chmod g+ws NEW_DIR_NAME". This ensures that files/dirs created within that directory inherit the group ownership of the parent (i.e. are group-owned by the project group), rather than defaulting to your primary group. The latter is one of the main causes of the dreaded "Internal server error" message.

If the permissions have already gone wrong because you didn't read these pages first, you can fix it by running the commands:

cd /vol/project/YOUR/GROUP/PROJECT/DIR
chmod -R ug=rwX,o=rX .
chgrp -R YOUR_GROUP_PROJECT .

However, note that you will not be able to change the permissions and group of files that other members of the group have created. So each member of the group who broke things may need to run the above commands to achieve complete coverage. This is so inconvenient that we strongly recommend you follow the above rules all the time, to prevent the problems occurring.
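To find out which files need attention, and therefore which group members need to run the repair commands, you could list everything that breaks the group project rules. This is a sketch only: audit_project_dir is a hypothetical helper, and the directory and group you pass to it are your own.

```shell
#!/bin/sh
# audit_project_dir DIR GROUP
# Hypothetical helper: list entries under DIR that break the group project
# rules described above. Prints nothing if all is well.
audit_project_dir() {
    dir=$1
    group=$2
    # Anything not group-owned by the project group.
    find "$dir" ! -group "$group" -print | sed 's/^/wrong group: /'
    # Directories lacking the setgid bit, so new files get the wrong group.
    find "$dir" -type d ! -perm -2000 -print | sed 's/^/no setgid bit: /'
    # Anything writable by other (symbolic links are skipped).
    find "$dir" ! -type l -perm -0002 -print | sed 's/^/writable by other: /'
}
```

For example, `audit_project_dir /vol/project/YOUR/GROUP/PROJECT/DIR YOUR_GROUP_PROJECT` prints one line per problem. Running `ls -l` on the reported files shows who owns them.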

General CGI Notes

When a web request comes in that names one of your CGI scripts, and the above constraints have been satisfied, the script itself is executed as you in the normal Unix way. This means that, unless it is a true compiled executable, the first line of the script must be a #!/path/to/interpreter line. Hence all Perl CGI scripts should have a first line of #!/usr/bin/perl, and all PHP CGI scripts must have the first line #!/usr/bin/php.

See our PHP scripts guide for more information about making PHP scripts run.

Note that if you download CGI scripts from the Internet (whatever language they're written in), you are responsible for ensuring that the scripts really do what they say they will, do not misbehave and are secure. Bear in mind that once you allow these scripts to run as CGI scripts on our web servers, they will be invoked by untrusted users across the Internet, but will run with your user privileges on our web server, and hence have the same access to files in your home directory as you do! If you choose to use a well known piece of web software (eg a wiki or a content management system), it is quite likely that particular versions of such software will have vulnerabilities which hackers know how to exploit. It is your responsibility to keep your web-based software up to date and (as far as possible) secure.

The CGI Environment

Your CGI script will be called by the webserver with a number of environment variables set. If the URL that invokes the CGI script contains a '?', then anything following the '?' is placed into the QUERY_STRING environment variable; if that text contains no unencoded '=' character, it is also given to the script as command line arguments. Similarly, if the invoking URL has extra path text beginning with '/' after the name of a valid CGI script, that text is extracted and placed into the PATH_INFO environment variable.
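As a minimal sketch of reading these two variables, the following shell CGI script reports what it was given. The function name show_request is just for illustration. If the script were installed under the hypothetical name env.cgi and requested as env.cgi/extra/path?colour=red, PATH_INFO would be /extra/path and QUERY_STRING would be colour=red.

```shell
#!/bin/sh
# Minimal CGI sketch: report the two variables described above.
# ${VAR:-text} substitutes a placeholder when VAR is unset or empty.
show_request() {
    printf 'Content-type: text/plain\n\n'
    printf 'QUERY_STRING=%s\n' "${QUERY_STRING:-(empty)}"
    printf 'PATH_INFO=%s\n' "${PATH_INFO:-(empty)}"
}

show_request
```

The output is sent as text/plain rather than HTML, so whatever a visitor puts in the URL is displayed literally instead of being interpreted by the browser.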

Try the following WWW pages to see what environment variables are available to your CGI program. The program is a shell script and is available here.

The program has to return not only the contents of the page, but also the WWW header information for the page. Normally this only requires printing out the Content-type: line followed by a blank line (that is, two newlines in a row), which marks the end of the page's header information.

Some Examples

For example, to return a plain, unformatted text document, you simply make your CGI script print out the following text:

Content-type: text/plain

 1: Some boring text.
 2: Some boring text.
 3: Some boring text.
...
20: Some boring text.

To do this, you could write the following CGI shell script (if you want to make your life hard!):

#!/bin/sh

echo Content-type: text/plain
echo
i=1
while [ $i -le 20 ]
do
   echo "$i: Some boring text."
   i=$((i + 1))
done

Or the following Perl script:

#!/usr/bin/perl
print "Content-type: text/plain\n\n";
foreach my $i (1..20)
{
   print "$i: Some boring text.\n";
}

If (more typically) you wanted to return a page of formatted HTML, you might want your CGI script to print out the following text:

Content-type: text/html

<html><body><ol>
 <li>we can do <b>Bold</b>,<i>Italic</i> and much much more!</li>
 <li>we can do <b>Bold</b>,<i>Italic</i> and much much more!</li>
 ...
 <li>we can do <b>Bold</b>,<i>Italic</i> and much much more!</li>
</ol></body></html>

To do this, you could write the following Perl script:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<html><body><ol>\n";
foreach my $i (1..20)
{
    print "<li>we can do <b>Bold</b>,<i>Italic</i> and much much more!</li>\n";
}
print "</ol></body></html>\n";

Or the following PHP script:

#!/usr/bin/php
<?php
echo "<html><body><ol>\n";
for( $i = 1; $i <= 20; $i++ )
{
    echo "<li>we can do <b>Bold</b>,<i>Italic</i> and much much more!</li>\n";
}
echo "</ol></body></html>\n";
?>

You should consider using helper modules such as the Perl CGI module to make your life easier. On Linux systems type:

perldoc CGI

for details.

You can run the script outside of the web server, and CGI.pm has ways of simulating argument passing.
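For scripts in any language, one way to run them outside the web server is to set a few of the CGI environment variables by hand and then execute the script directly. This is a sketch under stated assumptions: run_cgi is a hypothetical wrapper, it sets only a small subset of what a real server provides, and it does not reproduce the suexec checks.

```shell
#!/bin/sh
# run_cgi SCRIPT [QUERY_STRING [PATH_INFO]]
# Hypothetical wrapper: run SCRIPT with a few of the environment variables
# a web server would set. SCRIPT should be given with a path (eg. ./hello.cgi).
run_cgi() {
    REQUEST_METHOD=GET \
    QUERY_STRING=${2:-} \
    PATH_INFO=${3:-} \
    GATEWAY_INTERFACE=CGI/1.1 \
    "$1"
}
```

For example, `run_cgi ./hello.cgi 'name=fred'` (hello.cgi being a hypothetical script of yours) prints the headers and page body to the terminal, so a missing Content-type: line or an interpreter error is immediately visible.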

To get you started, here's a typical Perl CGI script to play with: See the source code here, run it via:

To diagnose faults, check the error.log in /vol/wwwhomeslogs/wwwhomes.doc.ic.ac.uk/, or (for Perl CGI scripts) include the line:

use CGI::Carp qw(fatalsToBrowser);

at the top of your CGI script.

You can syntax check a Perl program by typing:

perl -cw script.pl

Access and error logs on the server

There are various logs on the webserver to record accesses to web pages and error messages when things don't work. The most useful places to start when your CGI script refuses to run (usually generating the famous "500 Internal Server Error") are the suexec log - /vol/wwwhomeslogs/server-suexec.log - and the current error log - /vol/wwwhomeslogs/wwwhomes.doc.ic.ac.uk/error.log. These usually tell you which of the suexec conditions you're breaking, or occasionally that there's a syntax error in the script itself. (You did, naturally, syntax check the CGI script before you ran it via the web page, as we mentioned above?)
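These logs are shared by everyone, so it helps to filter them down to the lines that concern you. The following sketch assumes that the relevant log lines mention your script's file name (or your username), which may not hold for every message. recent_log_lines is a hypothetical helper.

```shell
#!/bin/sh
# recent_log_lines LOGFILE TEXT [COUNT]
# Hypothetical helper: print the last COUNT (default 20) lines of LOGFILE
# that contain TEXT as a fixed string (not a regular expression).
recent_log_lines() {
    grep -F -- "$2" "$1" | tail -n "${3:-20}"
}
```

For example, `recent_log_lines /vol/wwwhomeslogs/server-suexec.log scriptname` shows the most recent suexec messages mentioning your script. The same call with the error.log path does the same for the error log.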

 
 

guides/web/cgi (last edited 2010-11-29 15:02:44 by dcw)