Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Web Techniques Column 43 (Nov 1999)

[suggested title: Automatically Testing a Form]

I've written quite a few CGI programs to handle form data. I'm always intersted in finding ways to help me test these forms, especially to invoke them with canned or computed data, or even to stress test them.

Of course, it's easy to write a Perl program that invokes a CGI script with computed form data. You just whip out LWP, add a dabble of HTTP::Request::Common, and you're done.

But what if there are many fields, or you didn't write the form yourself and you're having trouble scanning through the text to get all the form field names and default values? Well, Perl can help with that too. After all, we can parse HTML and generate Perl programs with Perl programs. What if we wrote a Perl program to invoke a URL, study the resulting HTML, and come up with another Perl program that invokes the resulting form action script? Well, that doesn't have to be a ``what if'', because such a program is presented in [listing one, below].

Lines 1 through 3 enable warnings, common compiler restrictions, and disable buffering on STDOUT, letting us see output as it occurs.

Lines 5 through 9 bring in 5 modules from the LWP library, found in the CPAN (at http://www.cpan.org/, amongst other places). I'll be using methods from these modules later in the program.

Lines 11 through 30 form the ``main'' part of the code. I put this section into a block of its own so that I could make sure that main variables don't leak into subroutines unless I explicitly pass them. Line 12 gets the URL to be processed from the command line. Line 13 creates a ``user agent'': a web client to fetch that page.

Line 14 creates a cookie jar that will hold any cookie responses from the server. If a cookie comes back, it's likely that we'll need to pass this same cookie when we submit the form. This is detected in line 25, described in a moment. Lines 15 through 17 fetch the initial form, aborting if we can't even get the form in the first place.

Lines 19 through 28 dump to standard output a skeleton of a Perl program. This program will post a form response to the selected URL with the default values of each of the fields, as if you selected a submit button without changing anything. If the initial fetch of the form generated a cookie, the generated program also fetches the form, so that a proper cookie can be stored. (The program does not attempt to detect if the cookie is needed; you can always comment that part out if you determine otherwise.) Line 29 invokes the rest of the program on the HTML response from the URL, which generates the rest of the program.

Lines 32 through 39 make the &S subroutine a simple ``value-to-Perl-string'' converter. The arguments passed to this subroutine are a list of arbitrary scalar values. The return string is something that when compiled should generate the same value. I could have used Data::Dumper here, but that probably would have been overkill. (I pasted this routine from my October 1998 column anyway, so it was a no-brainer.)

The MyParser class definition begins in line 41. I put it inside a BEGIN block to mimic a use operation, even though it's part of the same file. Lines 42 and 43 establish the class name, and declare it to be a subclass of HTML::Parser. The HTML::Parser class is an abstract class, which notes the different elements of HTML-like text and calls back to initially empty methods as start and end tags are noted. To get anything useful, we have to subclass the class, and override the callbacks.

Line 44 makes it easier for me to use the S subroutine inside the MyParser class, doing the same thing that Exporter does in a normal import method.

Lines 46 through 53 define the sole constructor method for MyParser. In fact, this constructor method not only constructs an object of type MyParser (in line 48), it also invokes instance methods on that object. When this class method returns, the object has already fulfilled its entire function!

Note in line 50 the use of __PACKAGE__. I wanted to have variables that I knew would not collide with existing hash keys in the inherited variable. There are many ways to do this, but I chose to put all of my variables inside a single instance variable named for the name of this class (MyPackage). Hence the two-level variable indirections for all of my subclass-specific variables. The methods of parse and eof used in lines 51 and 52 are defined in HTML::Parser, causing the object to parse the data and invoke the callbacks defined below.

Lines 55 through 96 define the start callback, called whenever a start tag is seen in the text. The parameters are extracted in lines 56 and 57, using the same variable names as given in the documentation for HTML::Parser.

Lines 58 to 65 handle appearances of a form start tag. This is our reason for existance, so we set up camp here. Line 59 notes that we're now inside a form. Line 60 clears out the array (reference) holding an ordered list of all the fieldnames and values. Lines 61 and 62 compute the URL to invoke the action on this form (as if we were pressing a submit button). Line 63 saves this URL as yet another instance variable.

Line 66 ignores the rest of the subroutine if we're not already inside a form. This saves us time when we are looking at tags outside of forms. Most of the remainder of the start tags simply change the state of the object, rather than dumping any text, because we really need to see the end of a construct to know what to do.

Lines 67 to 74 handle an input start tag. If there's any accumulated text (described later), we'll dump that first in line 68. Then the type, name, and default value are all saved away into the fields instance variable as an array ref. Most form fields are input tags, so this portion gets invoked quite frequently.

Lines 75 to 81 handle the select start tags. Again, we extract the name of the field, but we also need to pass along information about whether this is a normal select or a ``select multiple'' tag, determined and recorded in line 78. Line 79 remembers the name and type in another instance variable.

Lines 82 through 88 handle the options within a select structure. First, a previous option is ended, if necessary, in line 83. Then we note and record the ``selected'' attribute as well as the value in lines 84 through 86. The option itself isn't dumped until later.

Lines 89 through 94 handle a textarea start tag. Again, we note the name, and save this into a state variable for being within a text area, in line 92.

Line 95 takes any unrecognized tag (probably markup for the text around the fields) and shoves it into the text instance variable for later access.

Lines 98 to 136 define the end callback, when an end tag is seen. Again, the parameters to this callback are defined by the HTML::Parser class, and extracted in lines 99 and 100. Line 101 returns quickly if we're not yet in a form.

Lines 102 to 118 handle the important ``end form'' tag. At this point, we've saved into instance variables all the necessary fields and values, and it's time to dump them out. Line 103 resets the in_form flag so we don't process elements outside a form. Line 104 dumps out any miscellaneous accumulated text.

Lines 105 and 106 generate Perl code that invokes the action URL with a POST method. The form data comes from the following lines, passed as an array reference to the POST subroutine (defined by code earlier in the generated program).

Lines 107 to 114 dump the fields and original non-field text from the form. If an element of the fields array ref (dereferenced to get the list) is a reference itself, then we had a field, and we dump the fieldname and its value. The type of the original field is also noted as a comment.

It's up to you to edit these lines to get the right field value, but at least you can leave it alone if you want the default values. That is, unless the original field was a radio button or a select with multiple options. The generated code will act as if all items are simultaneously selected, probably confusing the heck out of the form responder (although that could be a valid test). You should comment out the lines that are generating unused values, or add extra code to select the right radio button according to your wishes.

Line 112 dumps (as a Perl comment) the original HTML text that is intermingled with the field tags. This is often useful to determine what form elements mean. These items come from all those dumptext invocations earlier (and later) in the program.

Lines 115 through 117 close off the request method invocation in the generated program, including the proper setting of the referer header in case the accessing program is picky.

Lines 120 to 123 handle an option end tag by calling an instance method, defined below. Likewise, Lines 124 to 128 handle the end of a select, and remove all memory of having been within the select.

Lines 129 to 134 handle the end of a textarea by turning all the accumulated text into the default value, and triggering a field with that name, default value, and type.

If we make it to line 135, we have an unrecognized end tag, and we'll just throw that into the text accumulator as something other than an interesting form field markup.

Lines 138 to 143 define the text callback, as specified again by HTML::Parser. Any basic non-tag non-comment text comes here. Line 141 discards anything not with a form. Line 142 shoves any other text into the text accumulator, to be processed by the various end tags that are looking for text.

Lines 145 to 155 define an instance method, used whenever we suspect we might have been in an option of a select. If we're not, line 147 bails out quickly. Line 148 gets the name and type of the enclosing select that we must be in. Note that bad HTML can fool us here, but I didn't put a lot of error checking into this. That's left for you to do.

Line 149 grabs the option attribute (selected or not) and the value if any, both from the option tag that started this option. Line 150 also grabs any accumulated text to provide the presentation value, and therefore the default option value (copied in line 152). Line 151 adds selected to the type if that was present in the original HTML. Line 153 notes a differing presentation value from option value, in case that makes a difference while you are deconstructing the form. Finally, line 154 adds the field name, type, and value to the list of fields that will be dumped at the end of this form.

Finally, lines 157 to 163 handle any accumulated text outside of a textarea or option. In this case, we simply take the text and push it into the fields array ref, causing it to be dumped when we dump the form.

OK, so that's a long program to do a short task. To use it, invoke the program giving a single URL on the command line. The output should be a valid Perl program that will require some hand editing on your part.

Each form in the original HTML turns into an invocation of the request method. If there's more than one form, you'll need to toss the ones you don't want, or add some logic to invoke them in sequence. If you want to process the response from invoking the form, you'll also need to change:

        $ua->request(POST ...

to

        my $response = $ua->request(POST ...

and then do something with the HTTP::Response object in $response.

As described earlier, you'll also need to edit any radio buttons or selects into code that selects a particular item instead of all of them at the same time.

Once you've done that minimal hand editing, invoke the program! It'll be just like visiting the page with a browser, filling out the fields as you've selected, and then submitting the form!

For making rapid-fire submissions to stress test a server, you can combine the code from this program with the technique given in my August 1998 column. Please don't use this to stuff ballot boxes though for those online surveys, however. That would be an unfortunate abuse of good Perl code. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use LWP::UserAgent;
        =6=     use HTTP::Cookies;
        =7=     use HTTP::Request::Common;
        =8=     use HTML::Parser;
        =9=     use URI;
        =10=    
        =11=    {
        =12=      my $url = shift;
        =13=      my $ua = LWP::UserAgent->new;
        =14=      $ua->cookie_jar(HTTP::Cookies->new);
        =15=      my $response = $ua->request(GET $url);
        =16=      $response->is_success
        =17=        or die "Cannot get $url: ", $response->status_line, "\n";
        =18=      
        =19=      print <<'END';
        =20=    use LWP::UserAgent;
        =21=    use HTTP::Request::Common;
        =22=    my $ua = LWP::UserAgent->new;
        =23=    $ua->env_proxy;
        =24=    END
        =25=      if ($ua->cookie_jar->as_string) { # they set cookies
        =26=        print 'use HTTP::Cookies; $ua->cookie_jar(HTTP::Cookies->new);', "\n";
        =27=        print '$ua->request(GET ', S($url), "); # to set cookies\n";
        =28=      }
        =29=      MyParser->dump_forms_for($response);
        =30=    }
        =31=    
        =32=    sub S {                         # perl stringify argument
        =33=      join ", ",
        =34=      map {
        =35=        local $_ = $_;
        =36=        s/([^ !#%-?A-~])/sprintf "\\x%02x", ord $1/ge;
        =37=          qq{"$_"};
        =38=      } @_;
        =39=    }
        =40=    
        =41=    BEGIN {                         # MyParser
        =42=      package MyParser;
        =43=      use base qw(HTML::Parser);
        =44=      *S = \&main::S;
        =45=    
        =46=      sub dump_forms_for {
        =47=        my $class = shift;
        =48=        my $self = $class->SUPER::new;
        =49=        my $response = shift;
        =50=        $self->{__PACKAGE__}{response} = $response;
        =51=        $self->parse($response->content);
        =52=        $self->eof();
        =53=      }
        =54=    
        =55=      sub start {                   # CALLBACK
        =56=        my $self = shift;
        =57=        my ($tag,$attr,$attrseq,$origtext) = @_;
        =58=        if ($tag eq "form") {
        =59=          $self->{__PACKAGE__}{in_form} = 1;
        =60=          $self->{__PACKAGE__}{fields} = [];
        =61=          my $url = URI->new_abs($attr->{action} || "",
        =62=                                 $self->{__PACKAGE__}{response}->base)->as_string;
        =63=          $self->{__PACKAGE__}{invoke} = $url;
        =64=          return;
        =65=        }
        =66=        return unless $self->{__PACKAGE__}{in_form};
        =67=        if ($tag eq "input") {
        =68=          $self->dumptext;
        =69=          my $type = lc $attr->{type};
        =70=          my $name = $attr->{name}; $name = "" unless defined $name;
        =71=          my $value = $attr->{value}; $value = "" unless defined $value;
        =72=          push @{$self->{__PACKAGE__}{fields}}, [$name, $type, $value];
        =73=          return;
        =74=        }
        =75=        if ($tag eq "select") {
        =76=          $self->dumptext;
        =77=          my $name = $attr->{name};
        =78=          my $type = exists $attr->{multiple} ? "select multiple" : "select";
        =79=          $self->{__PACKAGE__}{in_select} = [$name, $type];
        =80=          return;
        =81=        }
        =82=        if ($tag eq "option") {
        =83=          $self->endoption;         # end previous one if needed
        =84=          my $selected = exists $attr->{selected} ? "selected" : "";
        =85=          my $value = $attr->{value}; $value = "" unless defined $value;
        =86=          $self->{__PACKAGE__}{in_option} = [$selected, $value];
        =87=          return;
        =88=        }
        =89=        if ($tag eq "textarea") {
        =90=          $self->dumptext;
        =91=          my $name = $attr->{name};
        =92=          $self->{__PACKAGE__}{in_textarea} = $name;
        =93=          return;
        =94=        }
        =95=        $self->{__PACKAGE__}{text} .= $origtext;
        =96=      }
        =97=    
        =98=      sub end {                     # CALLBACK
        =99=        my $self = shift;
        =100=       my ($tag, $origtext) = @_;
        =101=       return unless $self->{__PACKAGE__}{in_form};
        =102=       if ($tag eq "form") {
        =103=         delete $self->{__PACKAGE__}{in_form};
        =104=         $self->dumptext;
        =105=         my $url = $self->{__PACKAGE__}{invoke};
        =106=         print "\$ua->request(POST ",S($url),", [\n";
        =107=         for (@{$self->{__PACKAGE__}{fields}}) {
        =108=           if (ref $_) {
        =109=             my ($name, $type, $value) = @$_;
        =110=             print "  ", S($name), " => ", S($value), ", # $type\n";
        =111=           } else {                # original text
        =112=             print "  # text ", S($_), "\n";
        =113=           }
        =114=         }
        =115=         print "],\n";
        =116=         print "  Referer => ", S($self->{__PACKAGE__}{response}->base), ",\n";
        =117=         print ");\n";
        =118=         return;
        =119=       }
        =120=       if ($tag eq "option") {
        =121=         $self->endoption;
        =122=         return;
        =123=       }
        =124=       if ($tag eq "select") {
        =125=         $self->endoption;
        =126=         delete $self->{__PACKAGE__}{in_select};
        =127=         return;
        =128=       }
        =129=       if ($tag eq "textarea") {
        =130=         my $name = delete $self->{__PACKAGE__}{in_textarea};
        =131=         my $text = delete $self->{__PACKAGE__}{text};
        =132=         push @{$self->{__PACKAGE__}{fields}}, [$name, "textarea", $text];
        =133=         return;
        =134=       }
        =135=       $self->{__PACKAGE__}{text} .= $origtext;
        =136=     }
        =137=   
        =138=     sub text {                    # CALLBACK
        =139=       my $self = shift;
        =140=       my ($text) = @_;
        =141=       return unless $self->{__PACKAGE__}{in_form};
        =142=       $self->{__PACKAGE__}{text} .= $text;
        =143=     }
        =144=   
        =145=     sub endoption {
        =146=       my $self = shift;
        =147=       return unless exists $self->{__PACKAGE__}{in_option};
        =148=       my ($name, $type) = @{$self->{__PACKAGE__}{in_select}};
        =149=       my ($selected, $value) = @{delete $self->{__PACKAGE__}{in_option}};
        =150=       my $text = delete $self->{__PACKAGE__}{text};
        =151=       $type .= " $selected" if $selected;
        =152=       $value = $text unless length $value;
        =153=       $type .= " (text ".S($text).")" unless $text eq $value;
        =154=       push @{$self->{__PACKAGE__}{fields}}, [$name, $type, $value];
        =155=     }
        =156=   
        =157=     sub dumptext {
        =158=       my $self = shift;
        =159=       my $text = delete $self->{__PACKAGE__}{text};
        =160=       if (defined $text and length $text) {
        =161=         push @{$self->{__PACKAGE__}{fields}}, $text;
        =162=       }
        =163=     }
        =164=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.