Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 65 (Sep 2001)

[suggested title: Simple Table-Output Spanning and Sorting]

I participate quite frequently in the web-based community of Perl users known as the Monastery, found at www.perlmonks.org. Recently, a fellow ``monk'' (as we call each other) wanted to know how to take a data table and generate pretty HTML from it. That's not very hard, but the monk wanted to know how to ``span'' the identical data items in a vertical column to make it easier to read, since apparently many of the items were identical.

There were various solutions proposed, including a very nice one that pulled out the HTML::Table module from the CPAN, which I had seen but not played with before. However, the solution presented spanned only on the first column, and in their defense, that's all that was being asked for.

But I saw this as an opportunity to take it a step further, and generate a spanned HTML table from a dataset, with the spanning being in any column, not just the first. So I quickly whipped up a few lines of code, and then wanted some sort of data to try this out: something with a lot of similar vertical data rows. ``Aha!'' I said, ``how about the output of ps?'' After a dozen more minutes, I was grabbing the output of ps, and dumping it out as a CGI script with all those wonderful spanned columns, and when I hit reload, the values would change. In fact, I was rather amazed at how much it cleaned up the output to have those spanned items there.

But then I wanted to sort the output by the different columns, to bring different combinations of things together to be spanned. Rather than hardwire the new sorting order into the program, I decided to add links to let me select it on the fly. After another 10 or 15 minutes, I was done, with the program I happily present in [listing one, below].

Lines 1 through 3 start nearly every CGI program I write, turning on taint mode, warnings, compiler restrictions, and unbuffered output.

Line 5 pulls in the CGI module, along with all the HTML-generation shortcuts as subroutines. Line 6 imports the HTML::Table module, found in the CPAN.

Lines 8 to 11 define the sort column. If this is the first invocation, then the parameter will be empty or undefined. If it's an integer (with a possible leading minus), we'll accept it as-is; anything else is replaced with 0, which means that no sort was requested.
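As a minimal sketch of that check outside of CGI, the same test can be wrapped in a small subroutine. The name clean_sortcol is made up for this illustration and does not appear in the listing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the listing): return the value if it
# looks like an optionally negative integer, otherwise 0 ("no sort").
sub clean_sortcol {
  my ($value) = @_;
  return 0 unless defined $value and $value =~ /\A-?\d+\z/;
  return $value;
}

print clean_sortcol("3"), "\n";        # 3
print clean_sortcol("-2"), "\n";       # -2
print clean_sortcol("3; ls"), "\n";    # 0
print clean_sortcol(undef), "\n";      # 0
```

Anchoring the pattern at both ends is what keeps trailing junk from sneaking through.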

Line 13 prints the CGI header and the HTML header, giving the output a title of Processes.

Line 15 sets the PATH environment variable to a known short value. This is needed in taint mode to permit commands to be executed, which we're doing in the next statement.

Lines 16 through 19 extract the output of the ps command as a two-dimensional table. First, the command is executed in backquotes in a list context, so that we get a list with each element being one line of output. The map then takes each line in $_, and unpacks it according to the format string. This format string took a bit of trial and error to create, and is valid only for the version of ps on my system. Finally, each element of that result is passed through an inner map to remove any leading and trailing whitespace, while retaining any inner whitespace. That list is wrapped in an anonymous array constructor (the outer square brackets on lines 17 and 18), which gets returned as one of the elements eventually ending up in @result.
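To see the unpack-and-trim step in isolation, here is a small sketch that uses made-up fixed-width lines instead of real ps output. The three-field format string and the sample data are assumptions for illustration only; the real format depends on your version of ps.

```perl
use strict;
use warnings;

# Made-up lines: 8-character user, space, 5-character PID, space, command.
my @lines = map sprintf("%-8s %5s %s\n", @$_),
  ["USER",   "PID", "COMMAND"],
  ["root",   1,     "/sbin/init"],
  ["merlyn", 12345, "perl -Tw ps.cgi"];

my @rows = map {
  [map /\s*(\S.*\S|\S?)/,           # trim each field, keep inner spaces
   unpack "A8 x1 A5 x1 A*", $_]     # slice the line at fixed offsets
} @lines;

print join("|", @$_), "\n" for @rows;
# USER|PID|COMMAND
# root|1|/sbin/init
# merlyn|12345|perl -Tw ps.cgi
```

Note that the command keeps its inner spaces, while the right-justified PID loses its leading padding.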

There's a lot of work in all of that. The most important thing is that we got some interesting data in a two-dimensional ``array of arrays'' (using the terms loosely), and that's our data table for the next step.

Speaking of that next step, line 23 defines the HTML::Table object. Then lines 24 to 40 add the first row, one column at a time. This first row is special: the contents come from the first data line, which is presumed to be the column labels from the ps command. And the column heads need to have links to control the sorting on subsequent invocations.

Lines 24 to 27 define the parameters for this for loop. @headers is the entire header info from the two-dimensional array and $col is the current column number (0-based). We execute this loop over every element of @headers.

Line 28 converts the 0-based column number into a 1-based ``tag'', which really is just another column number except that I had already used $col for a variable name. Line 29 defines the icon that will be associated with this column, initially empty.

Lines 30 to 36 adjust both the tag and the icon to control sorting parameters. If the tag is the same as the sort column, then this column is currently being sorted in ascending order. So, we pick the matching icon and flip the tag to its negative value, so that the link requests a descending sort. Similarly, if the tag is the negative of the sort column, this column is currently being sorted in descending order, so we pick the other icon and have the link flip back to a forward sort.

Another way to say this is that if we have 5 columns, initially we'll dump out ``1, 2, 3, 4, 5'' for the 5 tags. If the third tag is selected, the next time we dump out ``1, 2, -3, 4, 5'' so that selecting it a second time will perform a descending sort, but selecting any of the others creates an ascending sort on that column. And selecting the column that's already in descending sort will make it an ascending sort once again. Simple but intuitive user interface. The icons are included from the standard Apache distribution in the standard places. I was too lazy to invent my own.
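The toggling described above can be sketched as a tiny function that returns the tag each header link should carry. The name header_tags is hypothetical and the icons are left out; this is only meant to show the sign flipping.

```perl
use strict;
use warnings;

# Hypothetical helper: given the number of columns and the current sort
# column (1-based, negative for descending, 0 for none), return the
# sortcol value that each header link should request.
sub header_tags {
  my ($ncols, $sortcol) = @_;
  return map { $_ == $sortcol ? -$_ : $_ } 1..$ncols;
}

print join(", ", header_tags(5, 0)), "\n";    # 1, 2, 3, 4, 5
print join(", ", header_tags(5, 3)), "\n";    # 1, 2, -3, 4, 5
print join(", ", header_tags(5, -3)), "\n";   # 1, 2, 3, 4, 5
```

Only the column that is currently sorted ascending gets a negative tag; every other link asks for an ascending sort.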

Lines 37 through 39 add the column to the first (only) row of the table. Each label is wrapped in an A HREF=... with a target of this same script and the appropriate sort flag. Note that the labels are HTML-entitized, in case the output of ps has a column like X&Y or some other HTML-significant character.

Line 41 tells the HTML::Table object that this first row is a ``header'' and should be rendered with some sort of visual distinction. Some browsers may eventually scroll a too-large body while keeping this header visible. One can only hope.

Lines 45 to 60 handle the requested sorting, if any. If $sortcol is positive, we want an ascending sort, and if negative, a descending sort. Line 46 sets up an ``alpha sort needed'' flag, initially false. Line 47 reduces the direction to +1 or -1. Line 48 gets the column number, but reduces it by one to make it 0-based instead of 1-based.

When I first played with this, I had strictly chosen alpha sorts all the time, and then noticed that the numeric columns were getting mangled. So as a first cut, I added a simple ``alpha needed'' heuristic, and it works (mostly). For my data set, if the values consist entirely of digits and decimal points, with perhaps a leading minus sign, I figure a numeric sort is enough. So lines 51 to 53 run through the data, flipping on the $alpha flag if any non-numeric data is seen.

This actually doesn't work very well for the Start column (the time or possibly the date of process launch). I'd have to add a further heuristic to determine how to sort that particular column. At this point, I didn't much care, so I stopped.
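Here is a rough sketch of that heuristic on its own, using the same character class as the listing. The subroutine name and the sample values are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any value contains a character other
# than a digit, a dot, or a minus sign.
sub needs_alpha_sort {
  my @values = @_;
  for (@values) {
    return 1 if /[^\-\d.]/;
  }
  return 0;
}

print needs_alpha_sort(qw(0.0 12.5 3)),      "\n";   # 0
print needs_alpha_sort(qw(root merlyn www)), "\n";   # 1
print needs_alpha_sort(qw(10:42 Aug12)),     "\n";   # 1
```

The last call shows why a column of times and dates falls back to an alpha sort: the colon and the letters both trip the flag.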

Finally, lines 55 through 59 perform the sort, using either an alpha sort or a numeric sort. The inner comparison operator returns -1, 0, or +1, and multiplying that by the direction (either +1 or -1) inverts the order of the sort when a descending sort was requested. However, another way to write this would be:

  if ($alpha) {
    @result = sort { $a->[$column] cmp $b->[$column] } @result;
  } else {
    @result = sort { $a->[$column] <=> $b->[$column] } @result;
  }
  if ($direction < 0) {
    @result = reverse @result;
  }

I'm not sure which would benchmark faster, and again, the time difference over all the possible invocations of this program is probably less than the time it took me to just type that, so sometimes benchmarks really don't matter.
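For the curious, a quick sketch with made-up rows suggests that the two styles agree, at least when the keys are all distinct. With equal keys, the tied rows may come out in a different relative order, since reverse flips them as well.

```perl
use strict;
use warnings;

# Made-up rows: [user, pid]
my @rows = (["root", 1], ["merlyn", 12345], ["www", 230]);
my $column = 1;
my $direction = -1;   # descending

# Style from the listing: multiply the comparison by the direction.
my @multiplied =
  sort { $direction * ($a->[$column] <=> $b->[$column]) } @rows;

# Alternative style: sort ascending, then reverse.
my @reversed = reverse sort { $a->[$column] <=> $b->[$column] } @rows;

print join(" ", map $_->[$column], @multiplied), "\n";   # 12345 230 1
print join(" ", map $_->[$column], @reversed), "\n";     # 12345 230 1
```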

So now we're down to the fun stuff. @result is the array of rows of data. Starting in line 64, we process that into the rest of the table.

Lines 64 and 65 define two variables needed for the process. The @previous value contains the values in the various columns one or more rows above the row we are processing (initially undef as we create it). The @previous_row_number spells out which row that particular value started at, especially important if the value continues for many rows.

For each row of the result, we loop through lines 67 to 82. Line 68 grabs the column data for that particular row.

Lines 69 and 70 add the row to the table, being careful to sanitize the HTML once again. The return value here is the new row number, which appears not to be documented in the current release, but looks very deliberate in the source code, so I presume it's a valid and valuable feature. (Perhaps the author will choose to document it and make it official?)

With this row number, I can then look for vertical spans, starting in line 71, which scans over every column, putting the column number (zero-based) into $col. Line 72 evaluates the old-value cache, and if it's defined and the same as the value we're currently putting into a cell, then we have a span. Lines 74 to 76 cause the span to be noted into the table. For the span, we need to know the previous row number, the column number (one-based), and the number of rows, which we compute as needed.

If the values differ, we have started a new span, so it's time to store the value into the @previous array, and the row number in the corresponding slot of the @previous_row_number array. This code will also work to initialize all of those values on the very first row, because we checked for not being undef as one of the conditions.
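The bookkeeping can be tried without HTML::Table at all. The following sketch uses made-up rows and simply records the arguments that would be handed to setCellRowSpan, assuming (as in the listing) that the header occupies table row 1 and the data starts at row 2.

```perl
use strict;
use warnings;

# Made-up data rows, already sorted.
my @rows = (
  ["root",   "S"],
  ["root",   "S"],
  ["root",   "R"],
  ["merlyn", "R"],
);

my (@previous, @previous_row_number, @spans);
my $row_number = 1;   # the header is assumed to be table row 1

for my $row (@rows) {
  $row_number++;
  for my $col (0..$#$row) {
    if (defined $previous[$col] and $previous[$col] eq $row->[$col]) {
      # same value as the row above: the span that began earlier grows
      my $start = $previous_row_number[$col];
      push @spans, [$start, 1 + $col, $row_number - $start + 1];
    } else {
      # new value: remember it and the row where it began
      $previous[$col] = $row->[$col];
      $previous_row_number[$col] = $row_number;
    }
  }
}

print "row $_->[0], column $_->[1]: span of $_->[2] rows\n" for @spans;
# row 2, column 1: span of 2 rows
# row 2, column 2: span of 2 rows
# row 2, column 1: span of 3 rows
# row 4, column 2: span of 2 rows
```

Notice that the cell at row 2, column 1 is reported twice, first with a span of 2 and then with a span of 3. In the listing, the later call simply replaces the earlier span as the run grows.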

Nearly done, and it's time to pretty-up the table a bit. I never claim to be a web page designer, but at least I decided that I like borders and cellspacing/cellpadding in a particular way, so I set those in lines 84 to 86.

And the table gets dumped in line 87 as the proper HTML table it deserves to be. There's an asString method for an HTML::Table object, but apparently the value is also overloaded so that converting it to a string also does the same thing.

And line 89 wraps it all up, closing off the HTML to make it nice and standard.

Note that this is not necessarily the most efficient way to go about this: the ps command has to be run and its output parsed all over again just to re-sort the data. Perhaps the data could have been cached, and reused until a time limit ran out. But there you have it. Until next time, enjoy!
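One way such a cache might look is sketched below, using the core Storable module and the file's modification time. The helper name cached, the cache file location, and the 30-second limit are all assumptions for illustration, and the sketch ignores locking between simultaneous requests.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: return the data saved in $file if the file is
# younger than $max_age seconds; otherwise call $producer, save its
# result, and return that.
sub cached {
  my ($file, $max_age, $producer) = @_;
  my $mtime = (stat $file)[9];
  if (defined $mtime and time - $mtime < $max_age) {
    return retrieve($file);
  }
  my $data = $producer->();
  store($data, $file);
  return $data;
}

my $file = tempdir(CLEANUP => 1) . "/ps.cache";   # made-up location
my $calls = 0;
my $producer = sub { $calls++; [["USER", "PID"], ["root", 1]] };

my $first  = cached($file, 30, $producer);   # runs the producer
my $second = cached($file, 30, $producer);   # served from the cache file
print "producer ran $calls time(s)\n";       # producer ran 1 time(s)
```

In the real program, the producer would be the code that runs ps and unpacks its output.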

Listings

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use CGI qw(:all);
        =6=     use HTML::Table;
        =7=     
        =8=     my $sortcol = param('sortcol');
        =9=     unless (defined $sortcol and $sortcol =~ /\A-?\d+\z/) {
        =10=      $sortcol = 0;
        =11=    }
        =12=    
        =13=    print header, start_html("Processes");
        =14=    
        =15=    $ENV{PATH} = '/bin:/usr/bin:/usr/local/bin';
        =16=    my @result = map {
        =17=      [map /\s*(\S.*\S|\S?)/,
        =18=       unpack "A8 x1 A5 x1 A4 x1 A4 x1 A5 x1 A5 x1 A3 x1 A4 x0 A6 x1 A6 x1 A*", $_]
        =19=    } `ps uaxww`;
        =20=    
        =21=    ## first generate the table by generating the header...
        =22=    
        =23=    my $table = HTML::Table->new;
        =24=    for ((my @headers = @{shift @result}),
        =25=         (my $col = 0);
        =26=         $col <= $#headers;
        =27=         $col++) {
        =28=      my $tag = $col + 1;
        =29=      my $icon = "";
        =30=      if ($tag == $sortcol) {
        =31=        $icon = img({src => "/icons/down.gif"});
        =32=        $tag = -$tag;
        =33=      } elsif (-$tag == $sortcol) {
        =34=        $icon = img({src => "/icons/up.gif"});
        =35=        ## leave $tag positive so the next click sorts ascending
        =36=      }
        =37=      $table->addCol($icon .
        =38=                     a({href => script_name()."?sortcol=$tag"},
        =39=                       escapeHTML($headers[$col])));
        =40=    }
        =41=    $table->setRowHead(1);
        =42=    
        =43=    ## at this point, we need to sort the data based on $sortcol, which is 1-based
        =44=    
        =45=    if ($sortcol) {
        =46=      my $alpha = 0;
        =47=      my $direction = $sortcol <=> 0; # -1 or +1
        =48=      my $column = abs($sortcol) - 1; # 0-based now
        =49=    
        =50=      ## detect the need for an alpha sort, if column contains non-numeric data
        =51=      for (@result) {
        =52=        $alpha = 1, last if $_->[$column] =~ /[^\-\d.]/;
        =53=      }
        =54=    
        =55=      if ($alpha) {
        =56=        @result = sort { $direction * ($a->[$column] cmp $b->[$column])} @result;
        =57=      } else {
        =58=        @result = sort { $direction * ($a->[$column] <=> $b->[$column])} @result;
        =59=      }
        =60=    }
        =61=    
        =62=    ## and finally add the sorted data to the table, checking for spanning...
        =63=    
        =64=    my @previous;
        =65=    my @previous_row_number;
        =66=    
        =67=    for (@result) {
        =68=      my @this_row = @$_;
        =69=      my $this_row_number = $table->addRow(map escapeHTML($_),
        =70=                                           @this_row); # undocumented return value
        =71=      for my $col (0..$#this_row) {
        =72=        if (defined $previous[$col] and $previous[$col] eq $this_row[$col]) {
        =73=          ## we have a span
        =74=          my $previous_row_number = $previous_row_number[$col];
        =75=          $table->setCellRowSpan($previous_row_number, 1 + $col,
        =76=                                 $this_row_number - $previous_row_number + 1);
        =77=        } else {
        =78=          $previous[$col] = $this_row[$col];
        =79=          $previous_row_number[$col] = $this_row_number;
        =80=        }
        =81=      }
        =82=    }
        =83=    
        =84=    $table->setBorder(2);
        =85=    $table->setCellSpacing(0);
        =86=    $table->setCellPadding(2);
        =87=    print "$table";
        =88=    
        =89=    print end_html;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.