Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 41 (Sep 1999)

[suggested title: mod_perl enabled thumbnail Picture server]

Back in [my column for January], I showed a nice little CGI ``DirectoryIndex'' handler, allowing users to step through thumbnails of images on my site. Well, I now have 3500 images from my digital camera, totalling some 500M available for download, and all those people wandering through the pictures were firing off quite a few CGI processes.

So, I decided to start using Perl embedded in my Apache webserver to reduce the number of new Perl compilations and separate processes, as well as give back not modified (304) return codes more often if someone was viewing what would be the same page as before. And, going one step further, some people were complaining that the pictures were mostly too large, so I added a ``reduce 50%'' link for each picture that scales the image on the fly with ImageMagick's Perl interface!

Perl-enabled Apache (better known as mod_perl) is described at http://perl.apache.org, and is actively supported by Doug MacEachern and a distributed volunteer crew. Doug and fellow WebTechniques columnist Lincoln Stein have released the excellent Writing Apache Modules with Perl and C. For details about this book, see http://www.modperl.com.

First, mod_perl needs to be told that we're going to have a special content handler for a section of the directory tree. In the .htaccess file at the top of my pictures directory, I've placed these three lines:

    SetHandler perl-script
    PerlHandler Stonehenge::Pictures
    PerlSendHeader On

(I'm not sure if the third one is needed or not, but it works when I leave it on.) Once this is done, when the time comes to show any document within this directory tree, Apache executes the equivalent of:

    use Stonehenge::Pictures;
    Stonehenge::Pictures::handler($r);

where $r is the Apache request object, carrying all the related information about this particular request. The code to implement this handler is presented shortly.

To make sense of the code, we also need to look at the data this code reads to generate the image listings. (I haven't changed the format from the previous program, so you can skip this part if you remember.) Each directory contains a file named .title with a one-line short summary of the contents:

    Pictures around the house

Additionally, each directory contains a file named .info to give specific details about particular files or images, and a special entry named . for a long description of the current directory:

    . These are pictures around the house.  I
      started taking them when I first got my
      camera, and I've been adding to them ever
      since.  Please check back frequently for
      more pictures.
    FrontDoor.jpg This is the front door.
      The front door is <b>locked!</b>
    BackDoor.jpg This is the backdoor.

A long description can wrap over multiple lines, as long as every line after the first is indented. Finally, each image that is to be shown with a thumbnail must include that thumbnail in the same directory with a name that ends in .thumb.jpg.
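
To make the .info format concrete, here is a minimal standalone sketch of the folding and parsing, modeled on the get_info_in_dot routine in the listing. The parse_info name and the shortened sample text are made up for illustration, and the sketch reads from a string rather than a file:

```perl
use strict;
use warnings;

# Parse .info-style text into a hash of name => description.
# Indented continuation lines are folded into the preceding entry.
sub parse_info {
    my ($text) = @_;
    $text =~ s/^\s*\#.*\n//mg;         # toss comment lines
    $text =~ s/[ \t]*\n[ \t]+/ /g;     # fold continuation lines
    my %info = $text =~ /^(\S+)\s+(.*)/mg;
    return \%info;
}

my $sample = <<'END';
. These are pictures around the house.
  Please check back frequently.
FrontDoor.jpg This is the front door.
  The front door is <b>locked!</b>
BackDoor.jpg This is the backdoor.
END

my $info = parse_info($sample);
print "$_ => $info->{$_}\n" for sort keys %$info;
```

Each entry ends up as one key (the first word on the line) and one value (everything after it, with the wrapped lines joined by single spaces).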

The listing of the handler is given in [listing one, below]. Because of its size, I won't be doing my usual ``ramble through the code one line at a time'' as in most other columns. Instead, let me hit the highlights.

Lines 19 to 23 pull in the required modules, used later in the program. Line 23 in particular is like use CGI, but uses CGI::Pretty, a new standard module that produces readable, nicely indented HTML when you use the HTML-generating shortcuts. No more ``400 characters without a newline'' output!

Lines 26 through 30 define the ``global'' per-request variables. Here, $R is the request object, and the other three items are extracted from the request object.

Line 34 pulls in my Stonehenge::Reload module, which reloads this module into memory if it ever changes on disk. You probably won't have this, so comment that line out. One of my future columns will probably talk about this approach versus Apache::StatINC.

Line 40 decides if we're getting called on a directory or an actual file (or bad URL). If we're called on a directory, we'll spit out some number of thumbnails and links to other directories. If it's a file, we'll process the half-size flag if present, or just punt to let the original URL be shown.

Line 50 lets this handler decline anything that isn't a JPEG picture or that carries extra ``path info''. Similarly, lines 51 through 56 decline on any item that doesn't have exactly ?size=half as an argument suffix. (If it has arguments, but they don't look like that, Apache will trigger an error for us automatically.)
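
The argument test is compact enough to misread, so here is a standalone sketch of the same logic working on a plain hash instead of the request object. The wants_half_size name is invented for this example:

```perl
use strict;
use warnings;

# True only when the query arguments are exactly size=half.
sub wants_half_size {
    my (%args) = @_;
    return 0 unless exists $args{size};
    return 0 unless delete($args{size}) eq 'half';
    return %args ? 0 : 1;       # any leftover argument disqualifies the request
}

print wants_half_size(size => 'half')              ? "scale\n" : "decline\n";
print wants_half_size(size => 'half', start => 10) ? "scale\n" : "decline\n";
print wants_half_size()                            ? "scale\n" : "decline\n";
```

Deleting the size key while testing it is what makes the final emptiness check work: anything still in the hash is an argument the handler doesn't understand.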

Lines 58 to 61 let us save some CPU time if the request was merely a HEAD and not a GET.

If we make it to line 63, we're being asked to spit out a half-size version of an existing picture. We'll do this inside an eval block to catch errors. Any error is logged via the notice routine (defined later), and we'll send back the dreaded error 500 instead. Otherwise, the subroutine has already sent the image, and we just need to return the right status.

Lines 71 to 91 perform the (image) magic. We'll load the image into memory (line 77), scale it by 50% (line 78), and then write it to a tempfile (in line 82). We need to use a tempfile because PerlMagick cannot (yet) write to a variable. Lines 84 through 86 show how much CPU time we're trading for bandwidth here. Lines 88 and 89 send the actual file to the browser. The tempfile created in line 80 is automatically deleted at the cleanup phase (after the browser has disconnected), a nice touch.
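
The timing report relies on Perl's built-in time and times functions. As a rough sketch of that idiom outside of mod_perl, the following measures an arbitrary code reference; the measure helper and the summing loop (a stand-in for the image scaling) are assumptions for the example, not part of the listing:

```perl
use strict;
use warnings;

# Return (real, user, sys) seconds consumed by running $code.
sub measure {
    my ($code) = @_;
    my @before = (time, (times)[0, 1]);   # wall clock, user CPU, system CPU
    $code->();
    my @after = (time, (times)[0, 1]);
    return map { $after[$_] - $before[$_] } 0 .. 2;
}

my ($real, $user, $sys) = measure(sub {
    my $sum = 0;
    $sum += $_ for 1 .. 200_000;          # stand-in for the image scaling
});
printf "real %d user %.2f sys %.2f\n", $real, $user, $sys;
```

The listing gets the same three numbers by snapshotting (time, times) up front and subtracting later; the sketch just makes the before and after snapshots explicit.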

Lines 93 to 102 handle the top decisions about sending a directory. If it's a directory, but doesn't end in a slash, we decline. The built-in mod_dir module handles this as an external redirect to the same directory with a trailing slash. This makes relative names from this page work correctly, even if people forget to put in the final slash in their URLs for directories.

Line 101 has a lot of magic. First, we'll update our local cache -- a view of the disk files -- with update_cache(). That returns the timestamp of the most recently changed item in the cache. We hand that to possible_304(), which uses that information plus the compile time of this script to determine if the browser is already up to date. If so, there's no need to regenerate the page, and we'll return the 304 code. Otherwise, it's on to showing the directory.

The showdir() function beginning in line 104 is basically the guts of the prior CGI script. I'm using the CGI shortcuts for header and start_html and so on, and a standard print operator for output. STDOUT is actually a tied filehandle at this point so that the right thing happens for output into the middle of an HTTP/1.1 session. Lines 118 to 123 put in a plug for mod_perl, as long as I'm at it.

The show_links() routine beginning at line 129 dumps out the directory links to other directories, or random files that are not thumbnailed images. A nice alternating-color table is used for neat effect, and to squelch those complaints that my pages are too plain.

And there's show_pics starting in line 143, which has the job of dumping out 1 to 10 thumbnails at a time, including the forward/backward links. The starting and ending numbers are computed from the arguments, like ?start=10&end=20 to go from 10 to 20. These generally arise from self-referencing URLs, set up by make_links_row(). For ease of navigation, I put these now at the top and bottom of the table, with a slight green background.
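
The start/end arithmetic is easy to check in isolation. This sketch pulls the clamping out of show_pics into a hypothetical page_range function that takes the two arguments, the picture count, and the page size:

```perl
use strict;
use warnings;

# Clamp a requested 1-based (start, end) window to 1..$max.
sub page_range {
    my ($start, $end, $max, $per_page) = @_;
    $start = int($start || 1);
    $start = 1    if $start < 1;
    $start = $max if $start > $max;
    $end = int($end || ($start + $per_page - 1));
    $end = $start if $end < $start;
    $end = $max   if $end > $max;
    return ($start, $end);
}

printf "%d..%d\n", page_range(undef, undef, 25, 10);   # prints 1..10
printf "%d..%d\n", page_range(21, 30, 25, 10);         # prints 21..25
```

With no arguments at all the window defaults to the first page, and a window that runs off the end is trimmed to the last available picture.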

The hairy code in the middle of show_pics generates the table rows and table cells. The scaled 50% link is a reference to the picture-file's URL, followed by ?size=half, generated in lines 174 to 176.

Both make_links_row and range_link (defined beginning in lines 184 and 201 respectively) help create the ``next/previous'' links, truncating the shown range to the valid range of pictures available.

The dump_cache routine (defined starting in line 218) is strictly for debugging, to help me understand why the cache was sometimes wrong. I left the code in so that you could see what's being recorded for each directory.

Lines 226 through 232 define notice and escape_html_uri, both helper routines that interface with Apache API calls.

Lines 234 through 248 define the possible_304 routine, which pays attention to the newest item in the cache, as well as the time that this program module was last updated. If the client passed in some conditions like ``only if modified after this certain time'', and we haven't been modified, then we can pass back a 304 response which says ``hey, you've already got this''. This is great for caches. Most of the smart logic is built into the Apache API callbacks against the request object in lines 241 to 246. It's just up to me to return the response code from that last method.
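
Stripped of the Apache API, the decision comes down to comparing the newest contributing timestamp against the time the client says its copy dates from. The sketch below is a simplification under stated assumptions: not_modified_status is an invented name, the times are plain epoch seconds (the real request carries an HTTP date header that Apache parses for us), and ETag handling is left out entirely.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return 304 when nothing that feeds the page is newer than the client's copy,
# otherwise 0 to signal that the page must be regenerated.
# Assumes at least one modification time is passed in.
sub not_modified_status {
    my ($client_time, @mtimes) = @_;
    return 0 unless defined $client_time;   # unconditional request
    my $last_modified = max(@mtimes);
    return $last_modified <= $client_time ? 304 : 0;
}

my ($module_mtime, $cache_time) = (1_000, 2_000);   # made-up epoch seconds
print not_modified_status(2_500, $module_mtime, $cache_time), "\n";   # prints 304
print not_modified_status(1_500, $module_mtime, $cache_time), "\n";   # prints 0
```

Returning a false value for "go ahead and generate the page" mirrors how line 101 of the listing chains the check with || showdir().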

From line 250 to the end of the program is all the cache management, keeping track in memory of what the directory holds, to minimize the number of system calls. To do that, we've got to watch timestamps very carefully, lest the directory or files change without us knowing.

Lines 253 to 259 document the format of the %cache hash, keyed by a directory path. The Depends element records timestamps of everything that went into making this entry; we can check that quickly later to determine if the cache is up to date. The other elements hold the information derived from looking at the directory and its contents.

The cache is made fresh with update_cache (line 261), which either verifies that nothing has changed, or reloads everything. The response is the modification time of the newest item (which gets handed to possible_304, and so on). And check_depends (line 265) looks at all the items in the dependency list, and makes sure they haven't changed.
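
Here is a standalone sketch of the dependency check, using a temporary file in place of a real picture directory. The depends_fresh name is made up, and unlike check_depends in the listing it takes the dependency hash as an argument rather than reading the shared %cache:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the newest recorded mtime if every dependency is unchanged,
# or 0 if any file is missing or its mtime differs from the recorded one.
sub depends_fresh {
    my ($depends) = @_;
    my $most_recent = 0;
    for my $path (keys %$depends) {
        my @stat = stat($path) or return 0;
        return 0 if $stat[9] != $depends->{$path};
        $most_recent = $depends->{$path} if $depends->{$path} > $most_recent;
    }
    return $most_recent;
}

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
my %depends = ($path => (stat($path))[9]);
print depends_fresh(\%depends) ? "fresh\n" : "stale\n";

utime(0, $depends{$path} - 60, $path);      # pretend the file was changed
print depends_fresh(\%depends) ? "fresh\n" : "stale\n";
```

A single true return value does double duty: it says the cache is still valid and hands back the timestamp needed for the 304 check.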

If it's time to read in a new bunch of information, that happens starting in line 281. We'll put thumbnailed pictures in one pile, and ``other things'' in the other, then save up that info into the cache in lines 305 to 308. Line 310 is strictly for logging -- I have a ChildExitHandler (not shown) that shows how many picture directories were cached.

Lines 315 to 357 fetch and parse the annotation files (.title and .info). As each file is opened, its timestamp is automatically recorded into the Depends entry for this directory via open_and_add_depends in line 336.

Lines 359 to 367 provide access to the cache information for the various show-whatever routines above.

Wow. What a long one. But it works rather nicely. Already, people are commenting on how much faster my pictures page seems now. Go visit it at http://www.stonehenge.com/merlyn/Pictures/ for a test-drive.

So, if CGI isn't fast enough, get yourself an embedded Perl interpreter right inside Apache. In future columns, I'll show how mod_perl can be used for more than content-generation. Until next time, enjoy.

Listings

        =1=     package Stonehenge::Pictures;
        =2=     use strict;
        =3=     
        =4=     use vars qw($VERSION);
        =5=     BEGIN { $VERSION = 2.00; };
        =6=     
        =7=     use vars qw($CACHED);
        =8=     BEGIN { $CACHED = undef; }
        =9=     
        =10=    ### configuration
        =11=    my $PICSPERPAGE = 10;
        =12=    my $ODD = "#dddddd";            # bgcolor for odd rows
        =13=    my $EVEN = "#ffffff";           # bgcolor for even rows
        =14=    my $NEXTPREV = "#ddffdd";       # bgcolor for next/prev links rows
        =15=    my $REPORT = 1;                 # if 1, report times for Magick
        =16=    my $DEBUG_DUMP_CACHE = 0;       # if 1, append dump of cache in response
        =17=    ### end configuration
        =18=    
        =19=    use Apache::Constants qw(:common DIR_MAGIC_TYPE);
        =20=    use Apache::Log;
        =21=    use Apache::File;
        =22=    use Apache::Util qw(escape_uri escape_html size_string);
        =23=    use CGI::Pretty qw(:all);
        =24=    
        =25=    ### globals
        =26=    my $R;                          # request object
        =27=    my $LOG;                        # log object
        =28=    
        =29=    my $URI;                        # uri string
        =30=    my $DIR;                        # directory (set only if directory)
        =31=    ### end globals
        =32=    
        =33=    sub handler {
        =34=      use Stonehenge::Reload; goto &handler if Stonehenge::Reload->reload_me;
        =35=    
        =36=      $R = shift;
        =37=      $LOG = $R->log;
        =38=      $URI = $R->uri;
        =39=    
        =40=      if ($R->content_type eq DIR_MAGIC_TYPE) {
        =41=        return handle_directory();
        =42=      } else {
        =43=        return handle_file();
        =44=      }
        =45=    }
        =46=    
        =47=    sub handle_file {
        =48=    
        =49=      ## handle only JPEG files with ?size=half suffix
        =50=      return DECLINED if $R->content_type ne "image/jpeg" or $R->path_info;
        =51=      {
        =52=        my (%args) = $R->args;
        =53=        return DECLINED if
        =54=          not exists $args{size} or not delete $args{size} eq "half"
        =55=            or %args;               # must be empty now
        =56=      }
        =57=    
        =58=      if ($R->header_only) {        # save some work
        =59=        $R->send_http_header('image/jpeg');
        =60=        return OK;
        =61=      }
        =62=    
        =63=      my $rc = eval { do_magick() };
        =64=      if ($@) {                     # dead magick
        =65=        notice("Magick error: $@");
        =66=        return SERVER_ERROR;
        =67=      }
        =68=      return $rc;
        =69=    }
        =70=    
        =71=    sub do_magick {                 # may die
        =72=      my @times = (time, times);
        =73=    
        =74=      require Image::Magick;
        =75=    
        =76=      my $q = Image::Magick->new or die "Cannot new Image::Magick";
        =77=      my $err = $q->Read($R->filename); die $err if $err;
        =78=      $err = $q->Minify; die $err if $err;
        =79=    
        =80=      my ($tmpnam,$fh) = Apache::File->tmpfile or die "Cannot create tmpfile: $@";
        =81=    
        =82=      $err = $q->Write('filename' => "JPG:$tmpnam"); die $err if $err;
        =83=    
        =84=      $REPORT and
        =85=        notice(sprintf "%s magick: real %d user %.2f sys %.2f",
        =86=               $URI, (map { $_ - shift @times } time, times)[0,1,2]);
        =87=    
        =88=      $R->send_http_header('image/jpeg');
        =89=      $R->send_fd($fh);
        =90=      return OK;
        =91=    }
        =92=    
        =93=    sub handle_directory {
        =94=      ## if non-slash URL, send external redirect via mod_dir:
        =95=      return DECLINED unless $URI =~ /\/$/;
        =96=    
        =97=      $DIR = $R->filename;
        =98=    
        =99=      $R->chdir_file("$DIR/");
        =100=   
        =101=     return possible_304(update_cache()) || showdir();
        =102=   }
        =103=   
        =104=   sub showdir {
        =105=     my $title = "Picture index for " . get_cache_dir_title();
        =106=   
        =107=     print
        =108=       header(-expires => "+1d"),
        =109=       start_html(-title => $title,
        =110=                  -dtd => "-//W3C//DTD HTML 4.0 Transitional//EN"),
        =111=       h1($title),
        =112=       p(get_cache_dir_info());
        =113=   
        =114=     show_links();
        =115=     show_pics();
        =116=     dump_cache() if $DEBUG_DUMP_CACHE;
        =117=   
        =118=     print
        =119=       hr,
        =120=       p("This page is powered by",
        =121=         a({href =>"http://perl.apache.org"},
        =122=           img({src => "http://perl.apache.org/logos/mod_perl.gif",
        =123=                alt => "Apache and mod_perl!"})));
        =124=   
        =125=     print end_html;
        =126=     return OK;
        =127=   }
        =128=   
        =129=   sub show_links {
        =130=     my $flip = 0;
        =131=     
        =132=     print
        =133=       h2("links"),
        =134=       table({ cellspacing => 0, cellpadding => 10 },
        =135=             map { Tr({bgcolor => (($flip = !$flip) ? $ODD : $EVEN)},
        =136=                      td(a({Href => escape_html_uri($_->[0])},
        =137=                           escape_html($_->[0]))),
        =138=                      td($_->[1]),
        =139=                     ) }
        =140=             @{get_cache_other($DIR)});
        =141=   }
        =142=   
        =143=   sub show_pics {
        =144=     if (my @all_pics = @{get_cache_pictures($DIR)}) {
        =145=       my $max = @all_pics;
        =146=   
        =147=       my %args = $R->args;
        =148=       my $start = int($args{'start'} || 1);
        =149=       $start = 1 if $start < 1;
        =150=       $start = $max if $start > $max; # dubious
        =151=       my $end = int($args{'end'} || ($start + $PICSPERPAGE - 1));
        =152=       $end = $start if $end < $start;
        =153=       $end = $max if $end > $max;
        =154=   
        =155=       my @pics = @all_pics[($start-1)..($end-1)];
        =156=   
        =157=       my @links_row = make_links_row(3, $start, $end, $max);
        =158=   
        =159=       my $flip = 0;
        =160=       print
        =161=         h2("Pictures $start through $end of $max"),
        =162=         p("Please note the new 50% links...",
        =163=           "a reduced-size JPEG will be created on the fly."),
        =164=         table({ cellspacing => 0, cellpadding => 10 },
        =165=               @links_row,
        =166=               map ({my($jpg, $thumb, $info, $size, $mtime) = @$_;
        =167=                     Tr({bgcolor => (($flip = !$flip) ? $ODD : $EVEN)},
        =168=                         td(a({Href => escape_html_uri($jpg)},
        =169=                              img({src => escape_html_uri($thumb),
        =170=                                   alt => "[thumbnail for ".
        =171=                                   escape_html($jpg)."]"}))),
        =172=                         td(size_string($size).",",
        =173=                            "uploaded",int((time - $mtime)/86400),"days ago,",
        =174=                            "scaled:",
        =175=                            a({Href => escape_html_uri($jpg)."?size=half" },
        =176=                              "50%")),
        =177=                         td($info),
        =178=                        ) } @pics),
        =179=               @links_row,
        =180=              );
        =181=     }
        =182=   }
        =183=   
        =184=   sub make_links_row {
        =185=     my ($colspan, $start, $end, $max) = @_;
        =186=   
        =187=     my @links;
        =188=     push @links, range_link($start - $PICSPERPAGE,
        =189=                             $start - 1,
        =190=                             $max, "previous")
        =191=       if $start > 1;
        =192=     push @links, range_link($end + 1,
        =193=                             $end + $PICSPERPAGE,
        =194=                             $max, "next")
        =195=       if $end < $max;
        =196=     return @links
        =197=       ? Tr({bgcolor => $NEXTPREV}, td({colspan => $colspan}, join ", ", @links))
        =198=         : ();
        =199=   }
        =200=   
        =201=   sub range_link {
        =202=     my $start = shift;
        =203=     my $end = shift;
        =204=     my $max = shift;
        =205=     my $text = shift;
        =206=   
        =207=     $start = 1 if $start < 1;
        =208=     $start = $max if $start > $max;
        =209=     $end = $start if $end < $start;
        =210=     $end = $max if $end > $max;
        =211=   
        =212=     my $count = $end - $start + 1;
        =213=     my $pictures = $count > 1 ? "$count pictures" : "picture";
        =214=   
        =215=     return a({href => "$URI?start=$start&end=$end"}, "$text $pictures");
        =216=   }
        =217=   
        =218=   sub dump_cache {
        =219=     require Data::Dumper;
        =220=     print
        =221=       table({ Border => 1 },
        =222=             Tr(th("dump of %cache")),
        =223=             Tr(td(pre(escape_html(Data::Dumper::Dumper(get_cache_ref($DIR)))))));
        =224=   }
        =225=   
        =226=   sub notice {
        =227=     $LOG->notice("[$$] ", @_);
        =228=   }
        =229=   
        =230=   sub escape_html_uri {
        =231=     return escape_html(escape_uri(shift));
        =232=   }
        =233=   
        =234=   BEGIN {
        =235=     ## cache this file's mtime at compile-time
        =236=     my $module_mtime = (stat(__FILE__))[9];
        =237=   
        =238=     sub possible_304 {
        =239=       my $cache_time = shift;
        =240=   
        =241=       $R->update_mtime;           # from current directory/file
        =242=       $R->update_mtime($module_mtime);
        =243=       $R->update_mtime($cache_time);
        =244=       $R->set_last_modified;
        =245=       $R->set_etag;
        =246=       return $R->meets_conditions;
        =247=     }
        =248=   }
        =249=   
        =250=   BEGIN {                         # cache-related functions
        =251=     my %cache;
        =252=   
        =253=     ## $cache{$DIR} = {
        =254=     ##    { Depends => { "relative name" => $mod_time, ... },
        =255=     ##    { Other => [["link" => "desc"], ["link" => "desc"], ... ] },
        =256=     ##    { Pictures => [["foo","foo.thumb.jpg","desc", stats], ...] },
        =257=     ##    { Title => "my title" },
        =258=     ##    { Info => "my info" },
        =259=     ## }
        =260=   
        =261=     sub update_cache {
        =262=       return check_depends() || make_new_cache();
        =263=     }
        =264=   
        =265=     sub check_depends {
        =266=       my $no_stat = shift || 0;   # if true, returns max(@times)
        =267=   
        =268=       return 0 unless exists $cache{$DIR} and exists $cache{$DIR}{Depends};
        =269=       my $items = $cache{$DIR}{Depends};
        =270=       my $most_recent = 0;
        =271=       while (my ($key, $value) = each %$items) {
        =272=         unless ($no_stat) {
        =273=           return 0 unless my (@stat) = stat($key);
        =274=           return 0 if $stat[9] != $value;
        =275=         }
        =276=         $most_recent = $value unless $most_recent > $value;
        =277=       }
        =278=       return $most_recent;
        =279=     }
        =280=   
        =281=     sub make_new_cache {
        =282=   
        =283=       ## clear it out, initialize to dot mtime
        =284=       $cache{$DIR} = { Depends => { "." => (stat("."))[9] } };
        =285=   
        =286=       my @files = get_files_in_dot();
        =287=       my $info = get_info_in_dot();
        =288=   
        =289=       my @pictures;
        =290=       my @other;
        =291=   
        =292=       for (@files) {
        =293=         my @stat = stat;
        =294=         if (-d _) {
        =295=           push @other, ["$_/" => get_title($_)];
        =296=         } elsif (-r "$_.thumb.jpg") {
        =297=           push @pictures,
        =298=           [$_, "$_.thumb.jpg", get_info($info, $_), $stat[7], $stat[9]];
        =299=         } elsif (/\.thumb\.jpg$/) { # ignore
        =300=         } else {
        =301=           push @other, [$_ => get_info($info, $_)],
        =302=         }
        =303=       }
        =304=   
        =305=       $cache{$DIR}{Pictures} = \@pictures;
        =306=       $cache{$DIR}{Other} = \@other;
        =307=       $cache{$DIR}{Title} = get_title(".");
        =308=       $cache{$DIR}{Info} = get_info($info,".");
        =309=   
        =310=       $CACHED = keys %cache;      # for ChildExitHandler report
        =311=   
        =312=       return check_depends(1);
        =313=     }
        =314=   
        =315=     sub get_files_in_dot {
        =316=       my $dot = Apache::File->new;
        =317=       opendir $dot, ".";
        =318=       return sort "..", grep !/^\.|~$/, readdir $dot;
        =319=     }
        =320=   
        =321=     sub get_info_in_dot {
        =322=       my %info = ();
        =323=       local($/,$_);
        =324=       my $fh = open_and_add_depends(".info");
        =325=       if ($fh and defined($_ = <$fh>)) {
        =326=         s/^\s*\#.*\n//mg;         # toss comments
        =327=         s/[ \t]*\n[ \t]+/ /g;     # fold continuation lines
        =328=         %info = /^(\S+)\s+(.*)/mg;
        =329=       }
        =330=       return \%info;
        =331=     }
        =332=   
        =333=     sub open_and_add_depends {
        =334=       my $path = shift;
        =335=       my $fh = Apache::File->new($path);
        =336=       $fh and $cache{$DIR}{Depends}{$path} = (stat($fh))[9];
        =337=       return $fh;                 # possibly undef
        =338=     }
        =339=   
        =340=     sub get_title {
        =341=       my $name = shift;
        =342=   
        =343=       my $fh = open_and_add_depends("$name/.title");
        =344=       $fh and <$fh> =~ /(.+)/ and return escape_html($1);
        =345=       $name eq ".." and return "Go up";
        =346=       $name eq "." and $name = $URI; # use uri instead of dot
        =347=       return "The ".escape_html($name)." directory";
        =348=     }
        =349=   
        =350=     sub get_info {
        =351=       my $info = shift;
        =352=       my $name = shift;
        =353=   
        =354=       exists $info->{$name} and return $info->{$name};
        =355=       $name eq "." and return " ";
        =356=       return "Description not provided for ".escape_html($name);
        =357=     }
        =358=   
        =359=     sub get_cache_pictures {return $cache{$DIR}{Pictures}}
        =360=   
        =361=     sub get_cache_other {return $cache{$DIR}{Other}}
        =362=   
        =363=     sub get_cache_dir_title {return $cache{$DIR}{Title}}
        =364=   
        =365=     sub get_cache_dir_info {return $cache{$DIR}{Info}}
        =366=   
        =367=     sub get_cache_ref {return \%cache} # for debugging only
        =368=   }
        =369=   
        =370=   1;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.