Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Web Techniques Column 50 (Jun 2000)

[suggested title: Self-registering password protection, part 2]

Last month, I introduced a mod_perl authentication and authorization module that permitted an extended format htpasswd-like file. The purpose of the extensions were to associate an email address with each user, and give a list of ``keys'' that the user had in his permission. Then the .htaccess files would refer to a series of ``locks'' needed for access to a particular file or directory, and access was granted only when the user had the keys for all the locks.

But the cool part of the module was that we redirected the failed authentication or authorization to a CGI URL. The intent of this URL was to give the user a chance to create their own username/password pair, provided that their email address was an address we already recognized. For example, if the authorization file is:

  merlyn@stonehenge.com admin,stoners,perl merlyn xyF9kYWtJIFZ6
  larry@wall.org perl lwall lwUHddn0dCD1I
  fred@flintstone.comm stoners
  barney@rubble.nett stoners

then Larry Wall and I are both already permitted (with the corresponding username and encrypted password), but the user fred@flintstone.comm will be redirected to this CGI script (because they don't have an active login). Now it's time to present the CGI handler to conclude the picture.

But before I do that, I have to confess. I blew it. Line 46 of last month's listing looks like:

  my $error_uri = Apache::URI->parse($r, $FAIL);

when really it needs to be something like:

  my $error_uri = Apache::URI->parse($r);
  $error_uri->path($FAIL);

So much for believing the documentation! (Just kidding, Doug.) Without this change, this month's CGI script doesn't know where it lives in the URL-space, and doesn't give a correct self-referencing URL as the form action. As you might guess, I sent last month's column off to bed before thoroughly testing (or even beginning to write) the CGI script. Oh well, my $bad.

The other thing I didn't realize until I finished this month's code is that it's h-u-u-g-e by comparison to many of my other CGI programs, weighing in at 224 lines. Because of that, I've only got room here to hit the highlights, and won't do my usual line-by-line attack.

Line 1 turns on taint mode, and because I use a child process later, I set the PATH in line 4 to keep taint mode from failing.

Line 6 is a reminder that this script won't work if I move it to /perl (my Apache::Registry area) because I use nested subroutines and file-scoped my variables -- a deadly combo with that module. I could have written this to be a mod_perl module, but I expect it to be invoked so infrequently that I wouldn't have saved any CPU time (at least not compared to the human time it would take to work that out). (Find out more about that in the mod_perl guide linked from perl.apache.org.)

Lines 10 through 12 are the configuration sections. $DIR must be the same as the module's value for the Unix path to the directory where the password files are stored. $ADMIN_EMAIL is the address shown on the web page for problems or questions, and mail will also be sent from this address for the verification of the email address.

Lines 22 to 36 define the top-level algorithm. Inside an eval block (in case we die somewhere), I first connect to the database specified by the realm parameter. Then, I see if I'm trying to verify a URL that looks like:

  http://www.stonehenge.com/cgi/restricttolist?realm=abc&verify=abc123456

that I've sent to the user in email. If not, perhaps I'm trying to process form data. Either of those routines can return a message. If the message begins with FORM:, I send a new form to the user. Otherwise, it's just a status message and I show it.

If there were any near-fatal errors in that section, I show the ``internal error'' message in lines 32 to 35, and also log the real error message to the web server log. The warn dumps to STDERR, which hopefully is still attached to the log.

Lines 47 to 64 dump the form, if needed. The incoming message in line 48 is shown at the top. The email address, requested username, and password fields are displayed in a nice table for layout, along with some descriptive comments. The realm name is also included as a sticky hidden field.

Lines 66 to 91 handle the form response, including kicking out the form values for various reasons. If the given email address is not one of the ones we're expecting, or already registered, then we bail early. Otherwise, we're looking at a potential user to add, and we'll verify the requested username (not necessarily correlated with the email address) and password. The username has to be a simple alpha-numeric, but the password can be any arbitrary gobbledygook, as long as it's long enough. If we make it through all the hoops, we jump into the add_user routine.

Lines 93 to 102 take an email address, a requested user name, and a clear-text password and add it to the data file. To ensure that we're talking to the right user, we'll create a very-difficult-to-guess-or-hack hex string using the Digest::MD5 module (from the CPAN). $hexhash ends up as a 32-character hex string based on the process ID, the time of day, a constant string, and a 32-bit integer based on the current processes running on the system. Probably not enough for some three-letter government agency to consider secure, but close enough for this application.

After that, we'll update the database password field to be the crypted password (using the first two random hex characters as a salt) followed by a dash and the random 32-character hex string. The combination will prevent the mod_perl module from successfully authenticating to this string, but nevertheless let us record what we need.

Next, the hex string is used to create a URL to be mailed to the user, handled by the subroutine in lines 104 to 135. Here, I'm using the basic Net::SMTP module (out of the libnet distribution in the CPAN) to attach to the localhost mailer for delivery. I'm presenting the link in two different ways in the mail because some mail readers interpret the URL only when enclosed in the angle brackets, while others simply display that as text.

The mail contains a URL of the form:

  http://www.stonehenge.Xcom/cgi/restricttolist?realm=abc&verify=abc123456

where the text after verify= is the 32-character hex string generated by the random function. Once a user gets the mail and follows this link, they end up back at this script in the handle_confirm_request subroutine of lines 137 to 147. This confirms that it's indeed the user that we believe it is, and we can go ahead and enable the password. The unique hexstring serves as the key to locate the appropriate email address in the database. In line 142, I strip out the hex string from the encrypted password, leaving just the 13-character output of crypt from earlier. This then gets rewritten back into the database, and we're done. Any false moves and we get the message in line 146, and nothing changes.

Lines 149 to 159 get and cache the realm value. This is the ``basic auth'' realm, and is also the basename of the file holding the database, once we've stripped out any non alphanumeric characters. Because this originally comes from a form, we must untaint the value before using it as an output filename, hence the taint steps.

Lines 161 to 224 form a collection of routines to get, update, query, and store the database. The shared private data for this section is defined in lines 162 and 163. Lines 165 to 188 open the database file exclusively, and load up the internal hash information for quick access.

Lines 168 to 175 are the trickiest part of this. Because we want the database file to be usable by the webserver at all times, we must use a ``write a temp file then atomically rename'' operation to update the file to contain new information. However, this method doesn't work well with flocking, because an exclusive flock may indeed be acquired on a file that has now been deleted as the result of the previous holder now renaming a new file over the top! So, it's messy. We open the filehandle, and execute the exclusive flock. If we get it, we then also verify that the filehandle we hold is the same file as the full pathname we can stat, by comparing the device and inode numbers of the two items. If they're not the same, we release the handle, which is now probably a file that is completely unlinked, and start over. Eventually, we'll hold an exclusive flock on a filehandle that is also the named file, and we can move on. Another approach would be to do a non-blocking flock, and if unattained, simply sleep for a moment and retry. On second thought, that sounds simpler.

Lines 177 to 188 parse the data. The majority of the database is held in (and recreated from) the %info_for hash, but we also flag all the email addresses and usernames in additional hashes for quick lookup. The keys of the %info_for are lowercased to allow case-insensitive email address lookup, although the data of the first field preserves the case, in case the case matters.

Lines 196 to 215 write out the data. We're presuming that we'll do this (and get called) only once per invocation, because any update is also an immediately flush and release. We'll write a temporary file in the same directory as the database, but with .tmp appended to the name, and then give that file the same permission bits. Then, the new updated data gets written (in sorted email address order, so that may not be the same as the original order). And finally, the rename moves the file into place as the new active database, ready to be authenticated and authorized on the next webserver hit.

The last two subroutines access some of the database data in an orderly manner, needed because the data is not visible outside the BEGIN block.

Whew! That's a long one. But it also does a lot of things. But the basic strategy is simple:

  1. If a user that needs access comes to the protected area, and cannot provide a valid username/password, they get redirected to this script, with the proper realm passed as a parameter.

  2. On the first invocation (or if an error occurs), the form is displayed, and the user fills in their email address, the username they want, and the password they want.

  3. If the email address is in the database, the username/password is added (but not activated), and a random string for verification gets generated in email to the user

  4. The user waits for the email, and follows the included URL

  5. The script, upon verifying that the user got the email via the random string, activates the selected username and password.

  6. The user can then go back to the protected area with an active username and password, and we have unique logging and access control.

So, install this script at the URL defined by $FAIL in last month's column, and you'll have a fully working system (presuming you did all the things from last month as well). And next time I write a two-part column, I'll be sure to test both parts before submitting the first one, I promise! Until next time, enjoy.

Listings

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     $ENV{PATH} = "/bin:/usr/bin";
        =5=     
        =6=     ## do not run this under Apache::Registry (nested subroutines abound)
        =7=     
        =8=     ## config
        =9=     
        =10=    my $DIR = "/home/merlyn/Web/RestrictToList";
        =11=    my $ADMIN_EMAIL = 'webmaster@stonehenge.comm';
        =12=    my $ADMIN_HUMAN = 'Randal L. Schwartz';
        =13=    
        =14=    ## end config
        =15=    
        =16=    use CGI qw(:all);
        =17=    use IO::File;
        =18=    
        =19=    print
        =20=      header, start_html("Web access request"), h1("Web access request");
        =21=    
        =22=    eval {                          # death if unexpected things happen
        =23=      connect_to_database();
        =24=      my $message = handle_confirm_request() || handle_form_response();
        =25=      if ($message =~ /^FORM:\s+(.*)/s) {
        =26=        show_form($1);
        =27=      } else {
        =28=        print p($message);
        =29=      }
        =30=    };
        =31=    if ($@) {
        =32=      warn $@;                      # to web error log
        =33=      print
        =34=        h1("An internal error has occurred"),
        =35=        p("Please contact me at once, and describe what you did.");
        =36=    }
        =37=    
        =38=    print
        =39=      h1("If you have any questions..."),
        =40=      p("Please contact me at", a({href => "mailto:$ADMIN_EMAIL"}, $ADMIN_EMAIL)),
        =41=      end_html;
        =42=    
        =43=    exit 0;
        =44=    
        =45=    ## only subroutines from here down
        =46=    
        =47=    sub show_form {
        =48=      my $message = shift;
        =49=    
        =50=      print
        =51=        hr, p($message), start_form,
        =52=        table({cellspacing => 0, cellpadding => 2, border => 1},
        =53=              Tr(th("Your email address"),
        =54=                 td(textfield(-name => 'email', -size => 50)),
        =55=                 td("just the USER\@HOSTNAME.COM part, please")),
        =56=              Tr(th("Your preferred username"),
        =57=                 td(textfield(-name => 'username', -size => 50)),
        =58=                 td("letters and digits only, at least 5 characters")),
        =59=              Tr(th("Your preferred password"),
        =60=                 td(password_field(-name => 'password', -size => 50)),
        =61=                 td("at least 5 characters")),
        =62=             ),
        =63=        submit, hidden('realm'), end_form, hr;
        =64=    }
        =65=    
        =66=    sub handle_form_response {
        =67=      my $form_email = param('email')
        =68=        or return "FORM: Please give your email address.";
        =69=      my $info = info_for_email($form_email)
        =70=        or return "FORM: I don't recognize that email address, try again.";
        =71=      if (my $user = $info->{user}) {
        =72=        if (my $password = $info->{password}) {
        =73=          if ($password =~ /-/) {   # needs to be verified
        =74=            return "You need to check your email for further instructions.";
        =75=          }
        =76=        }
        =77=        return "You are already registered!";
        =78=      }
        =79=      ## OK, it's a new user with a good email - let's make a login
        =80=      my $form_user = param('username')
        =81=        or return "FORM: Please give an access username for $form_email.";
        =82=      $form_user =~ /^\w{5,20}$/
        =83=        or return "FORM: Please use 5 to 20 letters or digits in the username.";
        =84=      seen_user($form_user)
        =85=        and return "FORM: Sorry, that username is taken. Try another.";
        =86=      my $form_password = param('password')
        =87=        or return "FORM: Please give a password for $form_user.";
        =88=      length($form_password) >= 5
        =89=        or return "FORM: Please use at least 5 characters in the password.";
        =90=      return add_user($form_email, $form_user, $form_password);
        =91=    }
        =92=    
        =93=    sub add_user {
        =94=      my ($email, $user, $password) = @_;
        =95=      
        =96=      require Digest::MD5;
        =97=      my $hexhash = Digest::MD5::md5_hex($$, time, "sekret kode",
        =98=                                         unpack "%L*", `ps axww`);
        =99=      update_info_for_email($email, $user,
        =100=                           crypt($password, $hexhash) . "-$hexhash");
        =101=     return send_email($email, $hexhash); # DEBUG
        =102=   }
        =103=   
        =104=   sub send_email {
        =105=     my ($email, $hexhash) = @_;
        =106=   
        =107=     my $confirm_url = url()."?realm=".get_realm()."&verify=$hexhash";
        =108=   
        =109=     require Net::SMTP;
        =110=     my $mail = Net::SMTP->new('localhost') or die "Cannot open mail: $@/$!";
        =111=     $mail->mail($ADMIN_EMAIL);
        =112=     $mail->to($email);
        =113=     $mail->data(<<END);
        =114=   To: $email
        =115=   From: "$ADMIN_HUMAN" <$ADMIN_EMAIL>
        =116=   Subject: confirming your web access request (PLEASE READ)
        =117=   
        =118=   Please visit the following URL:
        =119=   <URL:$confirm_url>
        =120=   
        =121=   If the link above is not a clickable link, visit this location:
        =122=   
        =123=     $confirm_url
        =124=   
        =125=   Please copy the link into your web browser's "location" or "URL"
        =126=   field carefully.
        =127=   
        =128=   If you have any questions, please reply to this email.
        =129=   
        =130=   Thank you,
        =131=   $ADMIN_HUMAN <$ADMIN_EMAIL>
        =132=   END
        =133=     
        =134=     return "Look for email sent to ".tt($email)." for further instructions.";
        =135=   }
        =136=   
        =137=   sub handle_confirm_request {
        =138=     my $form_verify = param('verify') or return;
        =139=     if (my $email_from_hexhash = seen_hexhash($form_verify)) {
        =140=       my ($email, $keys, $user, $encrypted_password) =
        =141=         @{info_for_email($email_from_hexhash)}{qw(email keys user pw)};
        =142=       $encrypted_password =~ s/-.*//;
        =143=       update_info_for_email($email, $user, $encrypted_password);
        =144=       return "Your access has been verified!";
        =145=     }
        =146=     return "Please be sure to copy the URL very carefully from the email.";
        =147=   }
        =148=   
        =149=   BEGIN {                         # realm holder
        =150=     my $realm;
        =151=   
        =152=     sub get_realm {
        =153=       return $realm if $realm;
        =154=       $realm = param('realm') or die "Missing Realm";
        =155=       $realm =~ tr/A-Za-z0-9//cd;
        =156=       $realm =~ /(\w+)/ or die "empty realm";
        =157=       return $realm = $1;         # untaint
        =158=     }
        =159=   }
        =160=   
        =161=   BEGIN {                         # database things
        =162=     my ($filename, $db_handle);
        =163=     my (@data, %info_for, %seen_user, %seen_hexhash);
        =164=   
        =165=     sub connect_to_database {
        =166=       $filename = "$DIR/".get_realm();
        =167=       {
        =168=         my $h = IO::File->new("< $filename") or die "cannot open $filename: $!";
        =169=         flock($h, 2);
        =170=         ## because this file might be renamed underneath us:
        =171=         my @handle_stat = stat $h;
        =172=         my @file_stat = stat $filename;
        =173=         redo if $handle_stat[0] != $file_stat[0];
        =174=         redo if $handle_stat[1] != $file_stat[1];
        =175=         $db_handle = $h;          # fall out
        =176=       }
        =177=       for (@data = <$db_handle>) {
        =178=         my ($email, $keys, $user, $pw) = split;
        =179=         for ($info_for{lc $email} ||= {}) {
        =180=           $_->{email} = $email;
        =181=           $_->{keys} = $keys;
        =182=           $_->{user} = $user if $user;
        =183=           $_->{pw} = $pw if $pw;
        =184=         }
        =185=         $seen_user{$user}++ if $user;
        =186=         $seen_hexhash{$1} = $email if $pw and $pw =~ /-(.*)/;
        =187=       }
        =188=     }
        =189=   
        =190=     sub info_for_email {
        =191=       my $email = shift;
        =192=   
        =193=       $info_for{lc $email};
        =194=     }
        =195=   
        =196=     sub update_info_for_email {
        =197=       my ($email, $user, $encrypted_password) = @_;
        =198=   
        =199=       for ($info_for{lc $email}) {
        =200=         defined $_ or die "shouldn't happen - undef lc $email info";
        =201=         $_->{user} = $user;
        =202=         $_->{pw} = $encrypted_password;
        =203=       }
        =204=       my $tmp = "$filename.tmp";
        =205=       {
        =206=         my $h = IO::File->new("> $tmp") or die "Cannot create $tmp: $!";
        =207=         chmod +(stat($db_handle))[2], $tmp or warn "Cannot chmod $tmp: $!";
        =208=         for my $email (sort keys %info_for) {
        =209=           print $h join(" ", map {
        =210=             exists $info_for{$email}{$_} ? $info_for{$email}{$_} : ()
        =211=           } qw(email keys user pw)), "\n";
        =212=         }
        =213=       }
        =214=       rename $tmp, $filename or warn "Cannot rename $tmp to $filename: $!";
        =215=     }
        =216=   
        =217=     sub seen_user {
        =218=       $seen_user{+shift} ? 1 : 0;
        =219=     }
        =220=   
        =221=     sub seen_hexhash {
        =222=       $seen_hexhash{+shift};      # returns email key
        =223=     }
        =224=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.