Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Web Techniques Column 69 (Jan 2002)

[suggested title: Generating clickable graphs]

In this column nearly a year ago, I introduced the use of the GraphViz program to generate a nice graph. For that example, I was able to visualize the flow of users through website, and GraphViz saved me from having to do a lot of the messy work of creating a drawing, and laying out the nodes with a minimum of fuss.

I was recently contracted by a client to perform a similar task: perform a query against some data, and show the relationships graphically. However, as an added requirement, each node on the display had to be clickable, to further refine the query in progress.

``No problem'', I said, whipping out GraphViz once again. After all, I had recalled seeing something on the GraphViz manpage about constructing a client-side imagemap for the graph, and a similar thing on the manpage for the CPAN GraphViz driver module.

But I had to have an image coordinated with the imagemap, and I needed to have the imagemap call URLs that made sense for the data. Not only that, but performing the query and running GraphViz eats up a bit of CPU, and I wanted to cache the picture for a given query result, so others could share it. And that I easily handled using the Cache::FileCache module, also found in the CPAN.

So, in summary, I have a query, which generates links. I generate a picture with those links using GraphViz, and a client side imagemap that lets me click on the nodes of that picture to select a particular node. The image gets written into a cache, indexed by a ``session'' number, and referenced in the HTML imagemap. The session number is computed from the links, so a given link list will reuse the imagemap and image computations, if a prior computation has been performed recently enough.

Sound tough? Nope, it's just a mere hundred lines of code or so, as illustrated in [the listing below]. For this example, I'm not performing the query as part of the program, but just simulating it. A text area form field lets me enter ``from/to'' pairs directly. This let me play around with the rendering logic independent of the query logic, so I could see what some typical images would look and act like.

Lines 1 through 3 start nearly every one of my Perl CGI programs. However, note the absence of the -T switch, because apparently, some of the GraphViz module is not taint aware. Oh well.

Line 5 brings in the standard CGI.pm shortcuts, including two nonstandard extensions to create imagemap values.

Line 6 sets the PATH for finding the dot program, part of the GraphViz package. The GraphViz module calls dot directly.

Lines 8 through 14 set up my persistent storage for this program. I called the program forestdump because I was trying to see the trees. OK, bad pun. Note that I've set the expire time to 10 minutes, which I'm hoping is a good tradeoff between reuse of cached values and size of the resulting cache. Of course, if I have lots of cheap disk space, I could make this 24 hours or more; it doesn't change the way the program works -- just how efficient it is.

Lines 16 to 33 handle the image access, which happens only after the script instructs the browser to come back to get the image. I'll set this portion aside and revisit it later in the description.

Line 35 sets up the pairs parameter if not present, defaulting it to the text appearing after the __END__ tag. I selected the dataset as a typical result string, including a couple of the ``messy'' characters for HTML and HTTP as part of the node values to ensure that encoding and decoding was being done appropriately.

Lines 36 to 40 include a standard form with a textarea. The initial value will be the testing data just described above. However, I can add, delete, or modify that, and resubmit the form, causing a new graph to be generated, and the value will stick from one iteration to the next.

Lines 42 to 46 handle the ``clicked on a node'' link, which appears as the goto param. For the demo, I'm not actually doing anything other than reporting it, but in the real code (not shown), I used it to modify the query as needed by the client.

Lines 48 to 51 compute a ``session'' ID based on the wanted pairs. I use the MD5 module (from the CPAN) to compute a 32-character ``hex hash'' from the value. This value is nice in that the odds of a collision are extremely small, and yet it's 32 nice hex characters which I can use safely in URLs.

Line 53 notes whether we've already recently seen and computed this particular pair combination. If we have, $image_and_imagemap is now defined, and we can simply print the previously computed imagemap in line 56. Otherwise, we've got to do all the computations, beginning in line 60.

Line 60 starts a wall-clock and CPU clock timer so I can see how much CPU I'm burning to generate the GraphViz result. The values are used in line 101, subtracted out from the new wall-clock and CPU clock values, to show the differences in the web error log.

Line 62 brings in the GraphViz module, with a require to avoid any CPU for the times in which I won't be using GraphViz. Lines 63 and 64 create the GraphViz object. rankdir makes the picture in a relatively horizontal direction, and the node values set the shape of each node, and provide guidance for the URL for the clickmap. I define the URL to simply be the nodename, although it just needed to be something nicely replaceable, because I simply parse it out in line 87 and 88.

Line 66 declares a hash to map the GraphViz-generated nodenames into my node names. I did it this way because I wasn't sure exactly how well GraphViz would encode HTML-sensitive and URL-sensitive characters (if at all), and I wanted to ensure that this was possible. So rather than let any node be named based on the content of the node, every node is named something like node00001, autogenerated by GraphViz, and this hash gives the mapping from our label to the nodename.

Lines 68 to 71 parse the values in the $pair form value into the from/to pairs, including ignoring bad lines. This let me conveniently ``comment out'' a line, simply by adding an extra comma to the line.

Lines 72 to 74 add a node for each of the start and end points of the edge to be added. If the nodename for a given label exists, it's reused; otherwise, add_node is called to generate a new node, and a new node name.

Line 75 adds the edge for the two nodes just created (or reused). The edge appears as a directed arrow. The edge also influences the layout of the nodes, because the creators of GraphViz worked very hard to make sure that the edges don't cross and are as short as possible. This makes for nice-looking uncluttered graphs.

When the loop ending in line 76 is complete, the $g object is now ready to be rendered in two different ways: once to get the imagemap, and once to get the image. First, we'll do the image map.

Line 78 reverses the mapping, so that I can find a given label from a node name quickly.

Lines 80 to 95 create the imagemap as a single string, consisting of two pieces: the image (defined in lines 82 and 83) and the associated map (defined in lines 84 through 94).

The image URL in line 83 causes the browser to call back to the same CGI script, but with a slash appended followed by 32 hex characters and .gif. (Don't tell Unisys I was generating GIFs with unlicensed software, but for some reason, the installation of GraphViz on my ISPs machine is broken with respect to PNGs, which I would have rather used.) This then shows back up at line 16 on the next invocation as the ``path info'' value, which I then show as an image from the cache, or a status 404 ``not found'' message if something has gone wrong.

The image map comes from calling the as_ismap method on the graph object (buried down in line 94), which then goes through a little assembly line of transformations in lines 87 through 93 to construct an actual image map. I'm not sure what format GraphViz thought it was generating, but it was clearly unusable as a client-side image map, contrary to the documentation.

Lines 87 and 88 parse out the x-y coordinates of the corners of the rectangle for each node, as well as its node name. These names are the autogenerated names, which must be mapped back into a label, which we do in line 89. To generate a URL that responds to a click, passing the right label, and also maintaining the sticky fields for the pairs value, I shove the goto value in as if it was already selected, and then call the CGI.pm's url routine to perform all the right arranging and encoding. In testing, I discovered that the x-y pairs were actually mismatched, so I swap $y1 with $y2 before generating the value. (That one took about 20 minutes of debugging right there. I hate underdocumented standards.)

Line 97 prints the resulting imagemap as part of the CGI response.

Line 99 takes the computed imagemap, and creates the associated image, and shoves the two of them (as an arrayref) into the cache. This is then accessible by the next browser hit to get the image, and reusable by any future queries that return the same link set.

And there you have it. A structure for generating dynamic GraphViz images and a clickable image map, and caching the results as well. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use CGI qw(:all area map);
        =6=     $ENV{PATH} = "/usr/local/bin:/bin:/usr/bin";
        =7=     
        =8=     use Cache::FileCache;
        =9=     my $cache = Cache::FileCache->new
        =10=      ({namespace => 'forestdump',
        =11=        username => 'nobody',
        =12=        default_expires_in => '10 minutes',
        =13=        auto_purge_interval => '1 hour',
        =14=       });
        =15=    
        =16=    if (length (my $info = path_info())) { # I am the image
        =17=      my ($session) = $info =~ m{\A/([0-9a-f]+)\.gif\z}i
        =18=        or do {
        =19=          warn("bad URL $info");
        =20=          print header(-status => '404 Not Found');
        =21=          exit 0;
        =22=        };
        =23=    
        =24=      defined(my $image_and_imagemap = $cache->get($session))
        =25=        or do {
        =26=          warn("Cannot find $session");
        =27=          print header(-status => '404 Not Found');
        =28=          exit 0;
        =29=        };
        =30=    
        =31=      print header('image/gif'), $image_and_imagemap->[0];
        =32=      exit 0;
        =33=    }
        =34=    
        =35=    param('pairs', do {local $/; <DATA>}) unless param('pairs');
        =36=    print header,
        =37=      hr, start_form, p('enter pairs of parent-to-child,',
        =38=                        'one pair per line, separated by commas'), br,
        =39=      textarea(-name => 'pairs', -rows => 10),
        =40=      submit, end_form, hr;
        =41=    
        =42=    if (param('goto')) {
        =43=      my $selected = param('goto');
        =44=      Delete('goto');
        =45=      print p("You selected node", escapeHTML($selected)."!");
        =46=    }
        =47=    
        =48=    my $pairs = param('pairs');
        =49=    $pairs =~ tr/\r//d;
        =50=    
        =51=    my $session = do { require MD5; MD5->hexhash($pairs) };
        =52=    
        =53=    if (defined(my $image_and_imagemap = $cache->get($session))) {
        =54=      ## we have a good imagemap already, so reuse it
        =55=      warn "reusing imagemap $session"; # DEBUG
        =56=      print $image_and_imagemap->[1];
        =57=    } else {
        =58=      ## we must compute it from the pairs
        =59=    
        =60=      my (@times) = (time,times);
        =61=    
        =62=      require GraphViz;
        =63=      my $g = GraphViz->new
        =64=        (rankdir => 1, node => {height => '0.05', shape => 'box', URL => '\N'});
        =65=      
        =66=      my %nodename;
        =67=    
        =68=      for (split /\n/, $pairs) {
        =69=        my @values = split /\s*,\s*/;
        =70=        next unless @values == 2;
        =71=        my ($fromlabel, $tolabel) = @values;
        =72=        my ($fromnode, $tonode) = map {
        =73=          $nodename{$_} ||= $g->add_node('label' => $_)
        =74=        } ($fromlabel, $tolabel);
        =75=        $g->add_edge($fromnode, $tonode);
        =76=      }
        =77=    
        =78=      my %nodename_to_label = reverse %nodename;
        =79=    
        =80=      my $imagemap = join
        =81=        ("",
        =82=         img({ismap => 1, usemap => '#my_image_map',
        =83=              src => url(-relative => 1)."/$session.gif"}),
        =84=         &map({name => 'my_image_map'},
        =85=              join("\n",
        =86=                   map {
        =87=                     my ($x1,$y1,$x2,$y2, $nodename) =
        =88=                       /^rectangle \((\d+),(\d+)\) \((\d+),(\d+)\) (\S+) /;
        =89=                     param('goto', $nodename_to_label{$nodename});
        =90=                     ## y1 needs to be swapped with y2 apparently
        =91=                     area({shape => 'rect',
        =92=                           href => url(-relative => 1, -query => 1),
        =93=                           coords => "$x1,$y2,$x2,$y1"});
        =94=                   } split /\n/, $g->as_ismap)));
        =95=      Delete('goto'); # set in the loop above
        =96=      
        =97=      print $imagemap;
        =98=    
        =99=      $cache->set($session, [$g->as_gif, $imagemap]);
        =100=   
        =101=     @times = map { $_ - shift @times } time, times;
        =102=     warn "CPU used for new item: @times"; # debug
        =103=   }
        =104=   
        =105=   __END__
        =106=   pa1, a
        =107=   pa2, a
        =108=   a, b
        =109=   a, c
        =110=   a, d%
        =111=   d%, e&f
        =112=   j, k
        =113=   k, l
        =114=   k, m
        =115=   k, n
        =116=   pa1, n

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.