Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 29 (Sep 1998)

I recently found myself with a handful of photographs that I wanted to add to the personal section of my website. I thought to myself, ``No Problem... just scan them in, upload them, and add some links, and I'm done.'' Of course, I then remembered that I had wanted to create ``thumbnails'' of these pictures. Ugh. This made matters more complicated.

Thumbnails, which are miniature versions of the original pictures, help the visitors to a site decide whether they want to take the time to download the entire picture. (Please don't confuse this with what one of my friends calls dumbnails, which are full-sized downloads that are scaled in the browser to be small. Lame.) There's nothing worse than spending two to five minutes downloading a typical JPEG file, only to discover that you've already got it, or that it looks, well, useless.

Now, all modern photo manipulation programs have tools to scale pictures down to a small size, but the thought of doing this over and over again for each picture just didn't make sense to me. And even then, I'd have twice as many pictures to upload.

Then I remembered that I had previously installed the NETPBM tools on my website machine. This freely available package can perform many manipulations on pictures programmatically (from a command line). The NETPBM package is available from:

        ftp://ftp.x.org/contrib/utilities/netpbm-1mar1994.p1.tar.gz

Be sure to comply with the extremely liberal licensing agreements for this package.

So, after poring over the docs for PBM, I came up with the following command lines to make a thumbnail. For a JPEG, I type:

        djpeg infile | \
                pnmscale -xy 100 100 | \
                cjpeg -smoo 10 -qual 50 >infile.thumb.jpg

and for a GIF, I enter:

        giftopnm infile | \
                pnmscale -xy 100 100 | \
                cjpeg -smoo 10 -qual 50 >infile.thumb.jpg

Notice that only the first command differs between the two pipelines: once the picture has been decoded into PNM format, the remaining steps manipulate that PNM data no matter where it came from. This is typical when using NETPBM.
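To make the shared-pipeline idea concrete, here's a minimal sketch that picks the decoder from the filename extension and bolts on the common tail. The helper name thumbnail_pipeline and the %decoder table are my own inventions for illustration (the program in the listing tries each decoder in turn rather than looking at extensions), and the sketch assumes filenames contain nothing special to the shell.

```perl
use strict;
use warnings;

# Hypothetical table: which command decodes which kind of file into PNM.
my %decoder = (jpg => 'djpeg', jpeg => 'djpeg', gif => 'giftopnm');

# Hypothetical helper: build the full thumbnail command line for one file,
# or return nothing if the extension is not one we know how to decode.
# Assumes the filename has no shell metacharacters.
sub thumbnail_pipeline {
    my ($file) = @_;
    my ($ext) = $file =~ /\.([^.]+)$/ or return;
    my $decode = $decoder{lc $ext} or return;
    return "$decode $file | pnmscale -xy 100 100 | cjpeg -smoo 10 -qual 50 >$file.thumb.jpg";
}

print thumbnail_pipeline('FlamingCamel.jpg'), "\n";
```

Only the first word of the command changes from one picture type to the next, which is what makes adding a new format cheap.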

Ahh. All that was left was to wrap this up into a program. Then, as long as I was making the thumbnails with a program, I might as well generate the right HTML into my index.html directly! I wanted each picture to end up something like:

        <tr><td><a href="FlamingCamel.jpg">
        <img
                src="FlamingCamel.jpg.thumb.jpg"
                alt="[thumbnail of FlamingCamel.jpg]"
        ></a></td>
        <td>136K</td>
        <td>
                This is me with the ingredients of a
                <em>Flaming Camel</em> drink:
                one shot Aftershock, one shot Buttershotz.
        </td></tr>

This would then be the row of a table, so that all the thumbnails, byte sizes (in K), and descriptions ended up lined up. The thumbnails themselves are also the links to the actual pictures. Once again, Perl can do all the hard work, leaving me to come up with the creative captions.

So, first, let's take a look at the program, presented in [listing one, below], and then we'll see how I used it.

Lines 1 through 3 begin nearly every program I write, enabling the compile-time restrictions (use strict) that keep larger programs honest.

Line 5 defines a tag that will be placed at the head of the table in which the new entries will be inserted. It's important that you don't delete this tag from the table, or the program will create a new one! Lines 6 and 7 pull in two modules from the LWP library to help translate Unix pathnames into things we can put in HTML. Line 8 sets the Unix umask to a value that is compatible with my webserver reading the files.

Lines 10 through 36 form the main portion of the code. I enclosed this section in a block, and ended it with an exit 0 for emphasis. Any local variables declared within this block are useful only to the main routine, and not to the subroutines.

Line 11 determines which files will be examined as potential pictures needing thumbnails. If command-line arguments are present (as given in @ARGV), then we use those. Otherwise, we'll look at all of the binary files in the current directory.

Lines 13 and 14 set us up for an in-place edit of the index.html file in the current directory. The file to edit is placed in @ARGV, and the backup suffix (here, the GNU Emacs edited-file suffix of a single tilde) is placed in $^I. This is a convenient way to read a file and generate an updated version of that file without a lot of hassle.
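If you haven't seen the in-place trick before, here's a small self-contained sketch of it. The scratch directory, the sample HTML, and the substitution are all made up for the demonstration; only the @ARGV and $^I idiom comes from the program.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so nothing real is touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/index.html";

open my $out, '>', $file or die "Cannot create $file: $!";
print $out "<h1>old title</h1>\n<p>body</p>\n";
close $out or die "Cannot close $file: $!";

{
    local @ARGV = ($file);   # the file(s) that <> will read
    local $^I   = '~';       # keep the original as index.html~
    while (<>) {
        s/old title/new title/;
        print;               # print now goes to the new index.html
    }
}

# Read the result back to show that the file itself was rewritten.
open my $in, '<', $file or die "Cannot read $file: $!";
my $edited = do { local $/; <$in> };
close $in;
print $edited;
```

While the loop runs, the default output handle points at the replacement file, so a plain print is all it takes to copy a line through.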

Line 16 opens the index.html file for reading, generating a new file for output. Each line ends up in $_.

Lines 17 to 24 trigger if we've made it all the way through the index file without seeing the ADD HERE tag. In this case, we need to generate the entire table ourselves, at the end of the file.

Line 18 prints the last line that got us here. Line 19 follows that with an HTML comment to give you some direction about what to do with the following table. Line 20 dumps a simple table header along with our ADD HERE tag. Line 21 calls the &scan_pictures subroutine to create the thumbnails and dump the HTML in the right place, and line 22 finishes up the closing of the table. Line 23 breaks us out of the loop.

Lines 25 to 32 handle the case where we are now looking at the ADD HERE tag. Note that this is triggered by a regular expression that matches $TAG, but that we're automatically backslashing any characters that might be special to a regular expression. There weren't any in this particular value of $TAG, but you never know.
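To see why the backslashing matters, here's a short sketch with a deliberately nasty tag. The tag and line values are hypothetical, chosen only because they contain characters that mean something to a regular expression.

```perl
use strict;
use warnings;

# Hypothetical tag containing regex metacharacters: ( ) and ?
my $tag  = "<!-- ADD (HERE)? -->";
my $line = "<table> <!-- ADD (HERE)? -->\n";

# Without \Q, the parentheses group and the ? makes the group optional,
# so the pattern no longer matches the literal text of the tag.
my $unquoted = $line =~ /$tag/   ? "match" : "no match";

# With \Q, every special character is backslashed and matched literally.
my $quoted   = $line =~ /\Q$tag/ ? "match" : "no match";

print "without \\Q: $unquoted\n";
print "with \\Q:    $quoted\n";
```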

Line 26 copies through the line that contains the tag. Line 27 creates the thumbnails and HTML in the right place. Lines 28 through 30 copy the remainder of the index from the previous version to the new version. Line 31 breaks the outer loop.

If we haven't seen the tag, and we're not at end-of-file yet, we need to copy all the rest of the lines, and that's handled by line 33.
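The three cases (tag found, end of file without a tag, ordinary line) can also be sketched as a function over a list of lines, which is easier to experiment with than an in-place edit. This is a restatement of the logic, not the code from the listing; the name insert_rows and its argument order are my own.

```perl
use strict;
use warnings;

# Hypothetical helper: copy @lines through, adding @$rows right after the
# first line containing $tag. If no line has the tag, append a whole new
# table (carrying the tag) at the end instead.
sub insert_rows {
    my ($tag, $rows, @lines) = @_;
    my @out;
    my $inserted = 0;
    for my $line (@lines) {
        push @out, $line;                       # every line is copied through
        if (!$inserted and $line =~ /\Q$tag/) {
            push @out, @$rows;                  # new rows go right after the tag
            $inserted = 1;
        }
    }
    push @out, "<table border=2> $tag\n", @$rows, "</table>\n" unless $inserted;
    return @out;
}

my $tag = "<!-- ADD HERE -->";
print insert_rows($tag, ["<tr><td>new row</td></tr>\n"],
                  "<h1>My pictures</h1>\n", "<table> $tag\n", "</table>\n");
```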

Lines 40 through 62 define the &scan_pictures routine. The parameter passed in is a list of filenames to scan as picture files, in a loop from lines 41 through 61.

Line 42 skips over the thumbnails. Don't make thumbnails of thumbnails, as I did on one of the first versions of this program.

Line 43 makes the name of the thumbnail file for this file, and line 44 sees if this file already exists. If it does, we presume it's up to date, so we skip over it. (A possible refinement of this program would be to regenerate the thumbnail if it's older than the source file, but I didn't need that for my application.)
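That refinement might look something like the following sketch, which compares modification times. The helper name thumbnail_is_stale is hypothetical, and the scratch files and timestamps exist only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if the thumbnail is missing or older than
# the picture it was made from.
sub thumbnail_is_stale {
    my ($source, $thumb) = @_;
    return 1 unless -e $thumb;
    my $source_mtime = (stat $source)[9];   # element 9 of stat is the mtime
    my $thumb_mtime  = (stat $thumb)[9];
    return $source_mtime > $thumb_mtime ? 1 : 0;
}

# Demonstration with two empty scratch files and faked timestamps.
my $dir = tempdir(CLEANUP => 1);
my ($src, $thumb) = ("$dir/pic.jpg", "$dir/pic.jpg.thumb.jpg");
for my $f ($src, $thumb) {
    open my $fh, '>', $f or die "Cannot create $f: $!";
    close $fh;
}
utime 1000, 1000, $thumb;   # pretend the thumbnail is old...
utime 2000, 2000, $src;     # ...and the picture was replaced later
print thumbnail_is_stale($src, $thumb) ? "regenerate\n" : "up to date\n";
```

In the program, the test on line 44 would then become something like next unless thumbnail_is_stale($_, $thumb).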

Line 45 attempts to convert either a GIF or JPEG into the PNM portable format by calling the subroutine &get_pnm, defined later. If this succeeds, we have a good picture (in $pnm), and if not, we'll skip the name entirely.

Lines 46 through 48 convert this $pnm data into a JPEG thumbnail, using a filehandle opened to a shell pipe. The stdin of pnmscale will be the PNM data. The output is sent to the file named $thumb.
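Here's a minimal sketch of the same write-to-a-pipe idea, with one addition the listing doesn't have: it checks the result of close, which is false when the command at the other end exits with a non-zero status. The helper name pipe_to_command is hypothetical, and tr stands in for the pnmscale and cjpeg pipeline so the sketch runs without NETPBM.

```perl
use strict;
use warnings;

# Hypothetical helper: feed $data to a shell command on its stdin and
# report whether the command exited successfully.
sub pipe_to_command {
    my ($command, $data) = @_;
    open my $pipe, '|-', $command or return 0;
    binmode $pipe;                 # picture data is binary
    print {$pipe} $data;
    return close($pipe) ? 1 : 0;   # close waits for the command to finish
}

# Stand-in command; the real one would be the thumbnail pipeline.
my $ok = pipe_to_command('tr a-z A-Z', "thumbnail data\n");
print $ok ? "pipeline succeeded\n" : "pipeline failed\n";
```

Checking close this way would let the program skip the HTML row when the thumbnail could not actually be written.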

If all that worked, it's time to generate the HTML for the table, handled in lines 50 through 60. Note that a filename used as a URL must be both URI escaped (using uri_escape from the URI::Escape module) and ``entity-ized'' (using encode_entities from the HTML::Entities module), but a filename appearing as text (in the ALT text) merely needs to be ``entity-ized''. Whew.
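A filename with a few awkward characters shows the difference between the two treatments. The filename below is hypothetical; the two functions are the ones the program already uses.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use URI::Escape qw(uri_escape);

my $file = "Me & My <Camel>.jpg";   # hypothetical awkward filename

# In a URL: URI-escape first, then entity-encode the result.
my $href = encode_entities(uri_escape($file));

# As plain text (such as ALT text): entity-encode only.
my $alt = encode_entities($file);

print qq{<a href="$href">[thumbnail of $alt]</a>\n};
```

The URL form comes out as Me%20%26%20My%20%3CCamel%3E.jpg, while the text form comes out as Me &amp;amp; My &amp;lt;Camel&amp;gt;.jpg when viewed as HTML source.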

The final battle here is the creation of the PNM data, in the subroutine defined in lines 64 through 71. Line 65 takes the sole argument and puts it into a temporary $_ variable.

Lines 66 through 69 try different methods of creating the PNM file. You can add other methods in this list (like TIFF or PNG) as long as the command spits out PNM on the output, and exits with a zero exit status if everything worked OK.

Each command in $cmd is tried inside backquotes in line 67. If the exit status is 0, the value is returned as being good. If we make it through the list of commands without success, we'll return an undef value in line 70.
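The try-each-command pattern is easy to play with on its own. In this sketch the name first_successful_output is hypothetical, and false and printf stand in for the real converters so that it runs on a Unix machine without NETPBM installed.

```perl
use strict;
use warnings;

# Hypothetical generalization of get_pnm: return the output of the first
# command that exits with status 0, or undef if none of them does.
sub first_successful_output {
    my @commands = @_;
    for my $cmd (@commands) {
        my $output = `$cmd 2>/dev/null`;   # capture stdout, discard stderr
        return $output if $? == 0;         # $? holds the exit status
    }
    return;
}

# Stand-in commands: the first fails, the second succeeds, the third
# is never reached.
my $result = first_successful_output('false', 'printf PNM-DATA', 'printf never-run');
print defined $result ? "got: $result\n" : "no converter worked\n";
```

To handle more formats, the list passed in could grow to include commands such as tifftopnm or pngtopnm, assuming your NETPBM installation provides them.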

So that's all there is to the program. To use it, create a template index.html file in a directory, then place all your GIFs and JPEGs in that same directory. With that as your current directory, invoke the program.

This program will generate all the thumbnails, and add an HTML table to the end of your index.html file. You'll need to edit the file, moving the table up to wherever you want it. The hardest part is adding the appropriate descriptions, but at least the links and the thumbnails and the filesizes are added for you. Don't delete the ADD HERE tag, or the next invocation won't find it.

Speaking of that, once you've got your file all set, and people are viewing your data, you can insert additional files at any time. Just place the additional JPEG or GIF files into the directory, rerun the program, and only the added files will have new thumbnails created. The HTML table entries for just those files will be added to the ADD HERE location, and you can then add descriptions and move them to where you want them (if necessary).

As you can see, thumbnails are nothing to be scared of when you can use a nice tool like Perl to take out the boring parts. Now if we could just get a way to come up with those descriptions automatically... Hmm. Enjoy!

Listings

        =1=     #!/usr/local/bin/perl
        =2=     
        =3=     use strict;
        =4=     
        =5=     my $TAG = "<!-- ADD HERE -->";
        =6=     use HTML::Entities;
        =7=     use URI::Escape;
        =8=     umask 0022;
        =9=     
        =10=    {
        =11=      my @names = @ARGV ? @ARGV : grep { -f and -B } <*>;
        =12=    
        =13=      local @ARGV = "index.html";
        =14=      local $^I = "~";
        =15=    
        =16=      while (<>) {
        =17=        if (eof) {
        =18=          print;                    # last line
        =19=          print "<!-- move the following table to the proper location -->\n";
        =20=          print "<table border=2> $TAG\n";
        =21=          &scan_pictures(@names);
        =22=          print "</table>\n";
        =23=          last;
        =24=        }
        =25=        if (/\Q$TAG/o) {
        =26=          print;                    # tag line
        =27=          &scan_pictures(@names);
        =28=          while (<>) {              # dump remaining lines
        =29=            print;
        =30=          }
        =31=          last;
        =32=        }
        =33=        print;                      # default
        =34=      }
        =35=    }
        =36=    exit 0;
        =37=    
        =38=    ## subroutines
        =39=    
        =40=    sub scan_pictures {
        =41=      for (@_) {
        =42=        next if /\.thumb\.jpg$/;
        =43=        my $thumb = "$_.thumb.jpg";
        =44=        next if -e $thumb;
        =45=        my $pnm = &get_pnm($_) or next;
        =46=        open PNMTOTHUMB,"| pnmscale -xy 100 100 | cjpeg -smoo 10 -qual 50 >$thumb"
        =47=          or next;
        =48=        print PNMTOTHUMB $pnm;
        =49=        close PNMTOTHUMB;
        =50=        print
        =51=          "<tr><td><a href=\"",
        =52=          encode_entities(uri_escape($_)),
        =53=          "\"><img src=\"",
        =54=          encode_entities(uri_escape($thumb)),
        =55=          "\" alt=\"[thumbnail of ",
        =56=          encode_entities($_),
        =57=          "]\"></a></td><td>",
        =58=          int((1023 + -s)/1024),
        =59=          "K</td><td>\n  Description not provided\n</td>",
        =60=          "</tr>\n";
        =61=      }
        =62=    }
        =63=    
        =64=    sub get_pnm {
        =65=      local $_ = shift;
        =66=      for my $cmd ("djpeg $_", "giftopnm $_") {
        =67=        my $pnm = `$cmd 2>/dev/null`;
        =68=        return $pnm unless $?;
        =69=      }
        =70=      return;
        =71=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.