Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 16 (August 1997)

I've been wanting to write about the save method of the all-singing, all-dancing, gotta-have-it CGI.pm module for quite some time, as a way of saving structured data into a flat text file to be processed later. Well, I finally stumbled onto a nice little idea that works pretty well, and also provides yet another example of flock()-ing a datafile and generating HTML on the fly.

The idea is not a new one... it's a ``web chat'' script. This is the kind of thing where you and others go to a particular URL at the same time, type your messages into a form field, press ``submit'', and then get to see what the others just said. Kind of like the too-huge-for-its-own-good ``Internet Relay Chat'', but with a lot fewer bells and whistles. Or a really fast-moving guestbook that keeps only the 32 most recent entries.

So I decided to hack out a little under-100-line web chat script. No bells, no whistles, no frills. Stick it somewhere, and you can talk with a friend, or make friends.

Of course, writing a column like this particular one makes me a ``newbie magnet'', as in ``someone who is likely to get a lot of uninteresting questions from people that won't do research for themselves''. I can imagine the number of email requests I'll now be getting from people who are not actually programmers but think they would be a R3AL K00L D00D to have a chat area on their website. So, they copy scripts like mine into some likely (or unlikely!) webserver area, without even bothering to configure anything, and then write me when it breaks. (I get lots of email with a first line of ``why doesn't [this program] work?'' and then a spew of 50 to 500 lines of code... joy.)

So let me state this up front, as a paragraph that I can point them to later: this script is not meant to be used as-is. In fact, it's not meant to be used at all. It's merely an illustration of saving and restoring queries with the CGI.pm module, and yet another demonstration of flock()-ing. The fact that the application is a simple web-chat script that actually works (for a very narrow definition of ``works'') is irrelevant. OK, end of disclaimer.

But, in any event, I hereby present my little toy web chat script in Listing One, below.

Lines 1 and 2 begin nearly every lengthy program I write, enabling taint-checks, warnings, and appropriate compiler and run-time restrictions.

Line 4 pulls in the CGI.pm module and imports its standard set of form-access and HTML-generation functions.

Lines 7 and 8 define constants, using the new constant module. This module is part of the 5.004 (and later, I presume) Perl distributions, and was created by my associate Perl trainer, Tom Phoenix (rootbeer@teleport.com). However, if you don't have constant (or cannot get it for some ludicrous pointy-haired-manager reason), you can replace those lines with something like:

        sub CHATFILE () { "/home/merlyn/Web/chatfile" }  # () prototype: acts like a constant
        sub MAXENTRIES () { 32 }

and it'll work approximately the same. These two constants define the location of the chat information, and the number of prior messages to retain.

Lines 10 through 14 define a subroutine, ent, that replaces the HTML-special characters with numeric character entities (such as &#60; for a less-than sign). Double quotes, less-than, greater-than, and ampersands are all handled nicely.
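
Here's ent as a standalone sketch you can poke at outside of CGI. It uses a capture group ($1) rather than $&, since on older Perls merely mentioning $& anywhere in a program slows down every regex match; otherwise it's the same substitution.

```perl
use strict;
use warnings;

# Replace each HTML-special character with a numeric character entity.
sub ent {
    local $_ = shift;
    s/(["<&>])/"&#" . ord($1) . ";"/ge;
    return $_;
}

# prints &#60;a href=&#34;x&#34;&#62;Tom &#38; Jerry&#60;/a&#62;
print ent(q{<a href="x">Tom & Jerry</a>}), "\n";
```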

Line 16 prints an HTTP header, the beginning of the HTML page, and an H1 heading, using routines from the CGI module. Line 17 executes the main routine (defined later) inside an eval block, so that a die anywhere inside main is trapped instead of killing the script before the error can reach the browser.

Should a die occur, the $@ variable is set to the death message; otherwise, $@ is empty. Lines 18 through 21 detect the error and send out the error message (properly escaped using ent, defined above).
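
The same trap-and-report pattern can be pulled out into a reusable sketch. run_trapped is a hypothetical helper, not part of the listing, and it uses the eval { ...; 1 } idiom, which is a bit sturdier than testing $@ alone, because on older Perls a destructor that runs its own eval can reset $@ before you get to look at it.

```perl
use strict;
use warnings;

# Hypothetical helper: run some code, trapping any die, and return either ""
# (success) or an HTML-safe error line, much like lines 17 through 21.
sub run_trapped {
    my ($code) = @_;
    # The trailing 1 makes eval return true on success, even if the
    # code itself happens to return a false value.
    return "" if eval { $code->(); 1 };
    my $err = $@;
    $err =~ s/(["<&>])/"&#" . ord($1) . ";"/ge;   # same escaping as ent
    return "ERROR: $err";
}

# prints ERROR: cannot open &#60;chatfile&#62;
print run_trapped(sub { die "cannot open <chatfile>\n" });
```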

Lines 23 through 38 define the main routine. I did it this way so that the eval block above is very small and easy to see. Of course, I could have just put the entire definition for main into the eval block.

Line 24 fetches the prior chat entries from the datafile, including updating the file with the submitted form if necessary. More on that later when I discuss the get_old_entries subroutine.

Lines 25 through 30 print the input form. Line 25 takes care of the horizontal rule (via hr) and the start-of-form information. By default, start_form produces a POST form whose action is this same script, so pressing the submit button in this form reinvokes the script.

Lines 26 through 28 create the three form fields: ``name'', ``email'' and ``message''. Note here that ``message'' has a default value of the empty string, but also has an override parameter (the final 1) set to true. This means that any prior value of ``message'' will be ignored, and the requested default (empty string) will have precedence. Because the other two fields do not have override set to true, any prior value for those fields will carry forward from one invocation to the next as a default.
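
Here's a tiny model of that sticky-field rule in plain Perl. It's a simplification for illustration only: field_value is a hypothetical helper, not how CGI.pm is actually written, and it ignores details like multi-valued fields.

```perl
use strict;
use warnings;

# Hypothetical model of the sticky-field rule (not CGI.pm's actual code):
# a field shows the previously submitted value unless there is none, or
# unless override is true, in which case the supplied default wins.
sub field_value {
    my ($submitted, $name, $default, $override) = @_;
    return $default if $override or not exists $submitted->{$name};
    return $submitted->{$name};
}

my %prior = (name => "Randal", email => 'merlyn@stonehenge.com', message => "hi!");
for my $field ([name => 0], [email => 0], [message => 1]) {
    my ($name, $override) = @$field;
    printf "%-8s => '%s'\n", $name, field_value(\%prior, $name, "", $override);
}
```

Name and email carry forward, while message comes back empty, which is exactly what you want after someone has just posted.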

Line 29 puts a submit button at the end of the form, along with a note about submitting an empty message to listen. Line 30 closes off the form.

Lines 31 through 36 display the prior messages, kept in the @entries variable. The syntax here (with for my $var ...) is new to Perl version 5.004, so again, if you don't have the latest Perl, you'll have to make some slight adjustments. Each element of @entries goes into the lexical variable $entry, which is then examined in the body of the loop.

Line 32 fetches the ``name'' field from a particular entry, and prints it. Similarly, line 33 handles the ``email'' field. Line 34 is a little strange, because as you'll see later, we're saving the current time of day as a Unix-timestamp value into the entries. Luckily, in one swift move, we can convert this to a human-readable string (using scalar localtime). Finally, line 35 takes care of the ``message'' parameter (what they actually ended up saying).
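
To see what one pass of that loop produces, here's a sketch that swaps the CGI object for a plain hash and CGI's p() for a literal <p>. format_entry is a hypothetical name, and ent is repeated so the block stands alone.

```perl
use strict;
use warnings;

sub ent {
    local $_ = shift;
    s/(["<&>])/"&#" . ord($1) . ";"/ge;
    return $_;
}

# Hypothetical stand-in for one pass of lines 32 through 35, using a plain
# hash instead of a CGI object and a literal <p> instead of CGI's p().
sub format_entry {
    my ($entry) = @_;
    return join "",
        "<p>", ent($entry->{name}),
        " (", ent($entry->{email}), ") at ",
        ent(scalar localtime($entry->{time})), " said: ",
        "<p>", ent($entry->{message}), "\n";
}

print format_entry({
    name => "merlyn", email => 'merlyn@stonehenge.com',
    time => time, message => "Is <blink> still cool?",
});
```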

Line 37 closes out the output of the HTML page, and is the last output normally done.

Lines 40 through 74 define the subroutine that handles the interaction with the chat-file. This subroutine was called from above in line 24, and is expected to return a list of the current chat entries. Line 43 creates an empty array that we'll use as the return value.

Lines 44 and 45 set up a temporary filehandle using the IO::File class. (Again, if your Perl version is not at least 5.004, you might need to upgrade to use this particular part as-is.) The filehandle is opened read/write (indicated by the ``+<'' opening mode). This filehandle gives the program access to the history of messages posted to this chat. Because ``+<'' will not create a missing file, the chat file must already exist.

Line 47 ensures that only one invocation of this program at a time is reading, modifying, or writing the chat history file. The flock() operator blocks the program until it can get an exclusive lock. (The 2 is the value of LOCK_EX, which the standard Fcntl module can supply by name.)

Now, from here down to the point where we release the lock (line 71), we are the only invocation touching the file, so it's important to keep this stretch short, especially on a busy system. I usually flag these regions with comments such as the ones on lines 46 and 72, which show me rather visually how many steps happen while the lock is held.

Line 48 rewinds the file. That's not strictly necessary here (the file was just opened, so we're already at the beginning), but it's a safety precaution, because the next operation really wants to process the entire file. (I generally seek right after obtaining a flock, because the file size might have changed since the last time I looked.)
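
Lines 44 through 48 can also be written with the symbolic name LOCK_EX from the standard Fcntl module instead of the bare 2. This is a sketch, not the listing's code: open_locked is a hypothetical helper, and it also checks the return values of flock and seek, which the listing doesn't bother with.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use IO::File;

# Hypothetical helper covering lines 44 through 48: open an existing file
# read/write, wait for an exclusive lock, then rewind to the start.
sub open_locked {
    my ($path) = @_;
    my $fh = IO::File->new($path, "+<") or die "Cannot open $path: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";   # blocks until free
    seek($fh, 0, SEEK_SET) or die "Cannot rewind $path: $!";
    return $fh;   # closing $fh (or letting it go out of scope) drops the lock
}
```

Calling open_locked on a file that doesn't exist dies at the open step rather than creating the file, so the ``create it first'' setup step still applies.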

Lines 49 through 51 pull in all the historical chat messages. Each time through the loop (as long as we haven't hit end-of-file, detected with eof()), the CGI module's new routine is called with the filehandle as its argument. This makes new read one record in CGI.pm's standardized save-and-restore format from the file, creating an independent query object. The push() shoves it onto the growing @entries array. When we're done, @entries is a list of CGI ``objects'', each one containing a separate submitted chat message along with all of its identifying information.
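
If you're curious what new is chewing through, here's a rough model of the reading side. It's a sketch, assuming the format CGI.pm documents for save (one URL-escaped name=value pair per line, with a line holding just ``='' ending each record); unescape_field, read_record, and read_all_records are hypothetical names, and this is not CGI.pm's own code.

```perl
use strict;
use warnings;

# Undo %XX escapes (hypothetical helper; CGI.pm has its own unescaping).
sub unescape_field {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Read one record: escaped name=value lines, ended by a line holding only "="
# (or by end of file). Repeated names collect into a list of values.
sub read_record {
    my ($fh) = @_;
    my %record;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line eq "=";
        my ($name, $value) = split /=/, $line, 2;
        next unless defined $value;    # skip a malformed line
        push @{ $record{ unescape_field($name) } }, unescape_field($value);
    }
    return \%record;
}

# Mirror of lines 49 through 51: keep reading records until end of file.
sub read_all_records {
    my ($fh) = @_;
    my @records;
    push @records, read_record($fh) until eof $fh;
    return @records;
}
```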

Lines 52 and 53 check whether this particular script invocation came from a form submission containing a real message to post, or just a message consisting of whitespace (something to be ignored), or no message at all (such as the first time this script's URL gets called up). Note the explicit check for defined() (so that a missing parameter doesn't trigger an uninitialized-value warning under -w), and then a further check with /\S/ that the value contains at least one non-whitespace character.
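
The test on lines 52 and 53 boils down to a little predicate; has_message is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 52 and 53: true only for a message
# that exists and contains at least one non-whitespace character.
sub has_message {
    my ($message) = @_;
    return (defined $message and $message =~ /\S/) ? 1 : 0;
}

my @tries   = (undef, "", "   \n\t", "  hello  ");
my @results = map { has_message($_) } @tries;
print "@results\n";   # prints 0 0 0 1
```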

Lines 55 through 62 transfer the user's ``query'' into a new posted message. However, we must be careful about what gets transferred, to prevent resource-hogging by a malicious-and-slightly-clever user. So, I have to ensure that only the selected fields get added to the history file, and that those fields are limited in size.

Here, I've chosen to accept three user-returned fields (the same as in my generated form above) and limit each to 1024 characters. By doing so, the worst a mad user can do is fill each slot with about 10K (1K times 3 items times 3 bytes per hex-escaped character, plus a little overhead). Because we keep only 32 slots, the file will always stay under 320K, which is not a big deal. Yes, there are other resource starvation issues, but at least filling up the disk is not going to be one of them.
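
That back-of-the-envelope arithmetic can be checked with a few lines of Perl. This is a sketch assuming an escaping rule close to CGI.pm's (every byte outside letters, digits, and _.~- becomes %XX); escape_field and record_bytes are hypothetical helpers, and the exact overhead in a real chat file may differ a little.

```perl
use strict;
use warnings;

# Approximation of CGI.pm-style escaping: every byte outside letters, digits,
# and _.~- becomes %XX, so a hostile value can triple in size.
sub escape_field {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Bytes taken by one saved record: a "name=value\n" line per field plus "=\n".
sub record_bytes {
    my (%fields) = @_;
    my $bytes = 0;
    for my $name (keys %fields) {
        $bytes += length(escape_field($name)) + 1
                + length(escape_field($fields{$name})) + 1;
    }
    return $bytes + 2;
}

my $worst = "<" x 1024;   # 1024 characters, every one of which needs escaping
my $one = record_bytes(time => time, name => $worst,
                       email => $worst, message => $worst);
printf "one record: %d bytes; 32 records: %d bytes\n", $one, 32 * $one;
```

Under these assumptions the worst case comes out a bit over 9K per record and under 300K for the whole file, consistent with the ``about 10K'' and ``under 320K'' estimates.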

To transfer just a limited amount of information into the history file, I create a brand-new CGI object in line 55, empty except for a timestamp (the Unix internal time value). Lines 56 through 61 then add the other three parameters from the user's input query, carefully truncating each to 1024 characters without prejudice.
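
Stripped of the CGI objects, the field-limiting step looks something like this sketch, which uses a plain hash for the incoming query; limited_copy is a hypothetical helper. Anything not named (like the junk field below) simply never makes it into the saved record.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 55 through 61: copy only the named
# fields out of a (possibly hostile) query, truncating each to $limit chars.
sub limited_copy {
    my ($query, $limit, @names) = @_;
    my %saved = (time => time);          # timestamp, as on line 55
    for my $name (@names) {
        my $val = $query->{$name};
        $val = "" unless defined $val;
        substr($val, $limit) = "" if length($val) > $limit;
        $saved{$name} = $val;
    }
    return \%saved;
}

my %hostile = (name => "x" x 5000, message => "hi", junk => "y" x 100_000);
my $saved = limited_copy(\%hostile, 1024, qw(name email message));
print join(", ", map { "$_: " . length($saved->{$_}) } sort keys %$saved), "\n";
```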

Lines 62 through 64 put the new entry in front of the older ones (so new messages are automatically visible at the top) and then ensure that only the 32 most recent messages are saved into the file.
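
The keep-the-newest-32 logic is easy to check on its own. This sketch uses plain strings in place of CGI objects, and add_entry is a hypothetical helper.

```perl
use strict;
use warnings;
use constant MAXENTRIES => 32;

# Hypothetical helper mirroring lines 62 through 64: newest entry goes on
# the front, and anything past MAXENTRIES falls off the end.
sub add_entry {
    my ($entries, $new) = @_;
    unshift @$entries, $new;
    splice @$entries, MAXENTRIES if @$entries > MAXENTRIES;
    return $entries;
}

my @chat;
add_entry(\@chat, "message $_") for 1 .. 40;
# prints 32 kept; newest is 'message 40', oldest is 'message 9'
print scalar(@chat), " kept; newest is '$chat[0]', oldest is '$chat[-1]'\n";
```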

Lines 65 and 66 rewind and empty the file, and lines 67 through 69 rewrite it using the save method. This method scribbles the data out into the history file in such a way that it can be loaded by the code in line 50 on the next script invocation. So we've essentially got a flat text file acting as a structured data repository, thanks to the save/restore code built into CGI.pm. Cool.
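
The writing side can be modeled the same way. This sketch assumes the save format is one URL-escaped name=value line per value, with a line holding just ``='' closing each record; escape_field, write_record, and rewrite_all are hypothetical helpers. Unlike CGI.pm's save, which writes parameters in their own order, it sorts the names so the output is predictable.

```perl
use strict;
use warnings;

# Approximation of CGI.pm-style escaping: bytes outside a small safe set become %XX.
sub escape_field {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Write one record (hash ref of name => value) as name=value lines plus "=".
sub write_record {
    my ($fh, $record) = @_;
    for my $name (sort keys %$record) {
        print {$fh} escape_field($name), "=", escape_field($record->{$name}), "\n";
    }
    print {$fh} "=\n";
}

# Like lines 65 through 69: rewind, empty the file, then write every record.
sub rewrite_all {
    my ($fh, @records) = @_;
    seek($fh, 0, 0) or die "Cannot seek: $!";
    truncate($fh, 0) or die "Cannot truncate: $!";
    write_record($fh, $_) for @records;
}
```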

And there's not much left but to close the filehandle (line 71), which also releases the lock, and return the entries (line 73). Actually, the filehandle would have been closed automatically when the subroutine exited, because the IO::File object is held in a lexical (my) variable. So sometimes I even leave the close() out.

So, to use this script, I'd plop it into a CGI directory somewhere, create the file designated by CHATFILE, and make it writable by the user-id of the CGI process. How you do that is pretty much site-dependent, so ask your webmaster. (If you are your webmaster and don't know, that's gonna be a tough one.) See you next time!

Listing One

        =1=     #!/home/merlyn/bin/perl -Tw
        =2=     use strict;
        =3=     
        =4=     use CGI ":standard";
        =5=     
        =6=     ## following must be writable by CGI user:
        =7=     use constant CHATFILE => "/home/merlyn/Web/chatfile";
        =8=     use constant MAXENTRIES => 32;
        =9=     
        =10=    sub ent {                       # translate to entity
        =11=      local $_ = shift;
        =12=      s/(["<&>])/"&#".ord($1).";"/ge; # entity escape
        =13=      $_;
        =14=    }
        =15=    
        =16=    print header, start_html("Chat!"), h1("Chat!");
        =17=    eval { &main };
        =18=    if ($@) {
        =19=      print hr, "ERROR: ", ent($@), hr;
        =20=      exit 0;
        =21=    }
        =22=    
        =23=    sub main {
        =24=      my @entries = get_old_entries();
        =25=      print hr, start_form;
        =26=      print p, "name: ", textfield("name","", 40);
        =27=      print "  email: ", textfield("email", "", 30), br;
        =28=      print "message: ", textarea("message", "", 4, 40, 1);
        =29=      print br, p, "(Submit an empty message to listen)", submit;
        =30=      print end_form, hr;
        =31=      for my $entry (@entries) {
        =32=        print p(), ent($entry->param("name"));
        =33=        print " (", ent($entry->param("email")), ") at ";
        =34=        print ent(scalar localtime $entry->param("time")), " said: ";
        =35=        print p(), ent($entry->param("message"));
        =36=      }
        =37=      print end_html;
        =38=    }
        =39=    
        =40=    sub get_old_entries {
        =41=      use IO::File;
        =42=    
        =43=      my @entries = ();
        =44=      my $chatfh = new IO::File "+<".CHATFILE
        =45=        or die "Cannot open ".CHATFILE.": $!";
        =46=      ## begin critical region (keep short)
        =47=      flock $chatfh, 2;
        =48=      seek $chatfh, 0, 0;
        =49=      while (not eof $chatfh) {
        =50=        push @entries, new CGI $chatfh;
        =51=      }
        =52=      my $message = param("message");
        =53=      if (defined $message and $message =~ /\S/) {
        =54=        ## must transfer limited query to file
        =55=        my $saver = new CGI {"time" => time};
        =56=        for (qw(name email message)) {
        =57=          my $val = param($_);
        =58=          $val = "" unless defined $val;
        =59=          substr($val, 1024) = "" if length $val > 1024;
        =60=          $saver->param($_, $val);
        =61=        }
        =62=        unshift @entries, $saver;
        =63=        splice @entries, MAXENTRIES
        =64=          if @entries > MAXENTRIES;
        =65=        seek $chatfh, 0, 0;
        =66=        truncate $chatfh, 0;
        =67=        for my $entry (@entries) {
        =68=          $entry->save($chatfh);
        =69=        }
        =70=      }
        =71=      close $chatfh;
        =72=      ## end critical region
        =73=      @entries;
        =74=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc. of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.