HTML::Stream - HTML output stream class, and some markup utilities


NAME

HTML::Stream - HTML output stream class, and some markup utilities


SYNOPSIS

Here's a small sample of some of the non-OO ways you can use this module:

      use HTML::Stream qw(:funcs);
      
      print html_tag('A', HREF=>$link);     
      print html_escape("<<Hello & welcome!>>");

And some of the OO ways as well:

      use HTML::Stream;
      $HTML = new HTML::Stream \*STDOUT;
      
      # The vanilla interface...
      $HTML->tag('A', HREF=>"$href");
      $HTML->tag('IMG', SRC=>"logo.gif", ALT=>"LOGO");
      $HTML->text($copyright);
      $HTML->tag('_A');
      
      # The chocolate interface...
      $HTML -> A(HREF=>"$href");
      $HTML -> IMG(SRC=>"logo.gif", ALT=>"LOGO");
      $HTML -> t($caption);
      $HTML -> _A;
       
      # The chocolate interface, with whipped cream...
      $HTML -> A(HREF=>"$href")
            -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
            -> t($caption)
            -> _A;

      # The strawberry interface...
      output $HTML [A, HREF=>"$href"], 
                   [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
                   $caption,
                   [_A];


DESCRIPTION

The HTML::Stream module provides you with an object-oriented (and subclassable) way of outputting HTML. Basically, you open up an ``HTML stream'' on an existing filehandle, and then do all of your output to the HTML stream. You can intermix HTML-stream-output and ordinary-print-output, if you like.

There's even a small built-in subclass, HTML::Stream::Latin1, which can handle Latin-1 input right out of the box. But all in good time...


INTRODUCTION (the Neapolitan dessert special)

Function interface

Let's start out with the simple stuff. This module provides a collection of non-OO utility functions for escaping HTML text and producing HTML tags, like this:

    use HTML::Stream qw(:funcs);        # imports functions from @EXPORT_OK
    
    print html_tag(A, HREF=>$url);
    print '&copy; 1996 by ', html_escape($myname), '!';
    print html_tag('/A');

By the way: that last line could be rewritten as:

    print html_tag(_A);

And if you need to get a parameter in your tag that doesn't have an associated value, supply the undefined value (not the empty string!):

    print html_tag(TD, NOWRAP=>undef, ALIGN=>'LEFT');
    
         <TD NOWRAP ALIGN="LEFT">
    
    print html_tag(IMG, SRC=>'logo.gif', ALT=>'');
    
         <IMG SRC="logo.gif" ALT="">

There are also some routines for reversing the process, like:

    $text = "This <i>isn't</i> &quot;fun&quot;...";    
    print html_unmarkup($text);
       
         This isn't &quot;fun&quot;...
      
    print html_unescape($text);
       
         This isn't "fun"...

Yeah, yeah, yeah, I hear you cry. We've seen this stuff before. But wait! There's more...

OO interface, vanilla

Using the function interface can be tedious... so we also provide an ``HTML output stream'' class. Messages to an instance of that class generally tell that stream to output some HTML. Here's the above example, rewritten using HTML streams:

    use HTML::Stream;
    $HTML = new HTML::Stream \*STDOUT;
    
    $HTML->tag(A, HREF=>$url);
    $HTML->ent('copy');
    $HTML->text(" 1996 by $myname!");
    $HTML->tag(_A);

As you've probably guessed:

    text()   Outputs some text, which will be HTML-escaped.
    
    tag()    Outputs an ordinary tag, like <A>, possibly with parameters.
             The parameters will all be HTML-escaped automatically.
     
    ent()    Outputs an HTML entity, like &copy; or &lt;.
             You mostly don't need to use it; you can often just put the 
             Latin-1 representation of the character in the text().

You might prefer to use t() and e() instead of text() and ent(): they're absolutely identical, and easier to type:

    $HTML -> tag(A, HREF=>$url);
    $HTML -> e('copy');
    $HTML -> t(" 1996 by $myname!");
    $HTML -> tag(_A);

Now, it wouldn't be nice to give you those text() and ent() shortcuts without giving you one for tag(), would it? Of course not...

OO interface, chocolate

The known HTML tags are even given their own tag-methods, compiled on demand. The above code could be written even more compactly as:

    $HTML -> A(HREF=>$url);
    $HTML -> e('copy');
    $HTML -> t(" 1996 by $myname!");
    $HTML -> _A;

As you've probably guessed:

    A(HREF=>$url)   ==   tag(A, HREF=>$url)   ==   <A HREF="/the/url">
    _A              ==   tag(_A)              ==   </A>

All of the autoloaded ``tag-methods'' use the tagname in all-uppercase. A "_" prefix on any tag-method means that an end-tag is desired. The "_" was chosen for several reasons: (1) it's short and easy to type, (2) it doesn't produce much visual clutter to look at, (3) _TAG looks a little like /TAG because of the straight line.

I should stress that this module will only auto-create tag methods for known HTML tags. So you're protected from typos like this (which will cause a fatal exception at run-time):

    $HTML -> IMGG(SRC=>$src);

(You're not yet protected from illegal tag parameters, but it's a start, ain't it?)
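If you're curious how "compiled on demand, but only for known tags" can work, here is a minimal sketch in plain Perl. The TinyStream class, its %Known table, and its in-memory output buffer are all hypothetical, made up for illustration; this is not the module's own code, and it ignores tag parameters entirely.

```perl
use strict;
use warnings;

package TinyStream;    # hypothetical toy class, for illustration only

our $AUTOLOAD;
my %Known = map { $_ => 1 } qw(A B I IMG P);    # toy table of known tags

sub new {
    my $class = shift;
    return bless { out => '' }, $class;
}

sub AUTOLOAD {
    my $self = $_[0];
    (my $name = $AUTOLOAD) =~ s/.*:://;         # strip the package name
    return if $name eq 'DESTROY';
    (my $tag = $name) =~ s/^_//;                # _TAG means an end tag
    die "No such tag method: $name\n" unless $Known{$tag};

    # Compile the method once, so later calls skip AUTOLOAD entirely.
    my $markup = $name =~ /^_/ ? "</$tag>" : "<$tag>";
    no strict 'refs';
    *{"TinyStream::$name"} = sub { $_[0]{out} .= $markup; return $_[0] };
    goto &{"TinyStream::$name"};
}

package main;

my $stream = TinyStream->new;
$stream->B->I->_I->_B;                          # known tags chain normally
my $ok  = $stream->{out};
my $err = eval { $stream->IMGG; 1 } ? '' : $@;  # the typo dies at run time
print "$ok\n$err";
```

Each generated method returns the object it was called on, which is also what makes chained calls possible.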

If you need to make a tag known (sorry, but this is currently a global operation, and not stream-specific), do this:

    accept_tag HTML::Stream 'MARQUEE';       # for you MSIE fans...

Note: there is no corresponding ``reject_tag''. I thought and thought about it, and could not convince myself that such a method would do anything more useful than cause other people's modules to suddenly stop working because some bozo function decided to reject the FONT tag.

OO interface, with whipped cream

In the grand tradition of C++, output method chaining is supported in both the Vanilla Interface and the Chocolate Interface. So you can (and probably should) write the above code as:

    $HTML -> A(HREF=>$url) 
          -> e('copy') -> t(" 1996 by $myname!") 
          -> _A;

But wait! Neapolitan ice cream has one more flavor...

OO interface, strawberry

I was jealous of the compact syntax of HTML::AsSubs, but I didn't want to worry about clogging the namespace with a lot of functions like p(), a(), etc. (especially when markup-functions like tr() conflict with existing Perl functions). So I came up with this:

    output $HTML [A, HREF=>$url], "Here's my $caption", [_A];

Conceptually, arrayrefs are sent to html_tag(), and strings to html_escape().
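That dispatch rule can be sketched in a few lines of plain Perl. The helpers below (sketch_escape, sketch_tag, sketch_output) are hypothetical stand-ins written for illustration, not the module's own functions; they only show the arrayref-versus-string split, and they ignore 8-bit characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for html_escape(): handles only < > & "
sub sketch_escape {
    my $text = shift;
    my %ent = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $text =~ s/([<>&"])/$ent{$1}/g;
    return $text;
}

# Hypothetical stand-in for html_tag(): a leading "_" means an end tag,
# and an undefined value means a parameter with no value.
sub sketch_tag {
    my ($tag, @params) = @_;
    $tag =~ s{^_}{/};
    my $markup = "<$tag";
    while (@params) {
        my ($name, $value) = splice(@params, 0, 2);
        $markup .= defined $value
            ? qq{ $name="} . sketch_escape($value) . qq{"}
            : " $name";
    }
    return "$markup>";
}

# Arrayrefs become tags; everything else is escaped as text.
sub sketch_output {
    return join '',
        map { ref($_) eq 'ARRAY' ? sketch_tag(@$_) : sketch_escape($_) } @_;
}

my $result = sketch_output(['A', HREF => '/the/url'], 'Tom & Jerry', ['_A']);
print "$result\n";    # <A HREF="/the/url">Tom &amp; Jerry</A>
```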


ADVANCED TOPICS

Auto-formatting and inserting newlines

Auto-formatting is the name I give to the Chocolate Interface feature whereby newlines (and maybe, in the future, other things) are inserted before or after the tags you output in order to make your HTML more readable. So, by default, this:

    $HTML -> HTML 
          -> HEAD  
          -> TITLE -> t("Hello!") -> _TITLE 
          -> _HEAD
          -> BODY(BGCOLOR=>'#808080');

Actually produces this:

    <HTML>
    <HEAD>
    <TITLE>Hello!</TITLE>
    </HEAD>
    <BODY BGCOLOR="#808080">

To turn off autoformatting altogether on a given HTML::Stream object, use the auto_format() method:

    $HTML->auto_format(0);        # stop autoformatting!

To change whether a newline is automatically output before/after the begin/end form of a tag at a global level, use set_tag():

    HTML::Stream->set_tag('B', Newlines=>15);   # 15 means "\n<B>\n \n</B>\n"
    HTML::Stream->set_tag('I', Newlines=>7);    # 7 means  "\n<I>\n \n</I>  "

To change whether a newline is automatically output before/after the begin/end form of a tag for a single stream only, give that stream its own private ``tag info'' table, and then use set_tag():

    $HTML->private_tags;
    $HTML->set_tag('B', Newlines=>0);     # won't affect anyone else!

To output newlines explicitly, just use the special nl method in the Chocolate Interface:

    $HTML->nl;     # one newline
    $HTML->nl(6);  # six newlines

I am sometimes asked, ``why don't you put more newlines in automatically?'' Well, mostly because...

So I've stuck to outputting newlines in places where it's most likely to be harmless.

Entities

As shown above, you can use the ent() (or e()) method to output an entity:

    $HTML->t('Copyright ')->e('copy')->t(' 1996 by Me!');

But this can be a pain, particularly for generating output with non-ASCII characters:

    $HTML -> t('Copyright ') 
          -> e('copy') 
          -> t(' 1996 by Fran') -> e('ccedil') -> t('ois, Inc.!');

Granted, Europeans can always type the 8-bit characters directly in their Perl code, and just have this:

    $HTML -> t("Copyright \251 1996 by Fran\347ois, Inc.!");

But folks without 8-bit text editors can find this kind of output cumbersome to generate. Sooooooooo...

Auto-escaping: changing the way text is escaped

Auto-escaping is the name I give to the act of taking an ``unsafe'' string (one with ``>'', ``&'', etc.), and magically outputting ``safe'' HTML.

The default ``auto-escape'' behavior of an HTML stream can be a drag if you've got a lot of character entities that you want to output, or if you're using the Latin-1 character set, or some other input encoding. Fortunately, you can use the auto_escape() method to change the way a particular HTML::Stream works at any time.

First, here's a couple of special invocations:

    $HTML->auto_escape('ALL');      # Default; escapes [<>"&] and 8-bit chars.
    $HTML->auto_escape('LATIN_1');  # Like ALL, but uses Latin-1 entities
                                    #   instead of decimal equivalents.
    $HTML->auto_escape('NON_ENT');  # Like ALL, but leaves "&" alone.

You can also install your own auto-escape function (note that you might very well want to install it for just a little bit only, and then de-install it):

    sub my_auto_escape {
        my $text = shift;
        HTML::Entities::encode($text);     # start with default
        $text =~ s/\(c\)/&copy;/ig;        # (C) becomes copyright
        $text =~ s/\\,(c)/\&$1cedil;/ig;   # \,c becomes a cedilla
        $text;
    }
    
    # Start using my auto-escape:
    my $old_esc = $HTML->auto_escape(\&my_auto_escape);  
    
    # Output some stuff:
    $HTML-> IMG(SRC=>'logo.gif', ALT=>'Fran\,cois, Inc');
    output $HTML 'Copyright (C) 1996 by Fran\,cois, Inc.!';
    
    # Stop using my auto-escape:
    $HTML->auto_escape($old_esc);

If you find yourself in a situation where you're doing this a lot, a better way is to create a subclass of HTML::Stream which installs your custom function when constructed. For an example, see the HTML::Stream::Latin1 subclass in this module.

Outputting HTML to things besides filehandles

As of Revision 1.21, you no longer need to supply new() with a filehandle: any object that responds to a print() method will do. Of course, this includes blessed FileHandles, and IO::Handles.

If you supply a GLOB reference (like \*STDOUT) or a string (like "Module::FH"), HTML::Stream will automatically create an invisible object for talking to that filehandle (I don't dare bless it into a FileHandle, since the underlying descriptor would get closed when the HTML::Stream is destroyed, and you might not want that).

You say you want to print to a string? For kicks and giggles, try this:

    package StringHandle;
    sub new {
        my $self = '';
        bless \$self, shift;
    }
    sub print {
        my $self = shift;
        $$self .= join('', @_);
    }
    
  
    package main;
    use HTML::Stream;
    
    my $SH = new StringHandle;
    my $HTML = new HTML::Stream $SH;
    $HTML -> H1 -> t("Hello & <<welcome>>!") -> _H1;
    print "PRINTED STRING: ", $$SH, "\n";

Subclassing

This is where you can make your application-specific HTML-generating code much easier to look at. Consider this:

    package MY::HTML;
    @ISA = qw(HTML::Stream);
     
    sub Aside {
        $_[0] -> FONT(SIZE=>-1) -> I;
    }
    sub _Aside {
        $_[0] -> _I -> _FONT;
    }

Now, you can do this:

    my $HTML = new MY::HTML \*STDOUT;
    
    $HTML -> Aside
          -> t("Don't drink the milk, it's spoiled... pass it on...")
          -> _Aside;

If you're defining these markup-like, chocolate-interface-style functions, I recommend using mixed case with a leading capital. You probably shouldn't use all-uppercase, since that's what this module uses for real HTML tags.


PUBLIC INTERFACE

Functions

html_escape TEXT
Given a TEXT string, turn the text into valid HTML by escaping ``unsafe'' characters. Currently, the ``unsafe'' characters are 8-bit characters plus:
    <  >  "  &

Note: provided for convenience and backwards-compatibility only. You may want to use the more-powerful HTML::Entities::encode function instead.

html_tag TAG [, PARAM=>VALUE, ...]
Return the text for a given TAG, possibly with parameters. As an efficiency hack, only the values are HTML-escaped currently: it is assumed that the tag and parameters will already be safe.

For convenience and readability, you can say _A instead of "/A" for the TAG argument, if you're into barewords.

html_unescape TEXT
Remove angle-tag markup, and convert the standard ampersand-escapes (lt, gt, amp, quot, and #ddd) into ASCII characters.

Note: provided for convenience and backwards-compatibility only. You may want to use the more-powerful HTML::Entities::decode function instead: unlike this function, it can collapse entities like copy and ccedil into their Latin-1 byte values.

html_unmarkup TEXT
Remove angle-tag markup from TEXT, but do not convert ampersand-escapes. Cheesy, but theoretically useful if you want to, say, incorporate externally-provided HTML into a page you're generating, and are worried that the HTML might contain undesirable markup.
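To make the difference between the two "reversing" functions concrete, here is a rough sketch of what each one does. The unmarkup_sketch and unescape_sketch helpers are hypothetical, written for illustration with simple regular expressions; they are not the module's implementation and make no attempt to cope with malformed HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: drop anything that looks like <...>, leave entities alone.
sub unmarkup_sketch {
    my $text = shift;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper: strip markup, then decode lt, gt, amp, quot and #ddd.
sub unescape_sketch {
    my $text = unmarkup_sketch(shift);
    my %char = (lt => '<', gt => '>', amp => '&', quot => '"');
    $text =~ s/&(lt|gt|amp|quot|#(\d+));/defined $2 ? chr($2) : $char{$1}/ge;
    return $text;
}

my $text = "This <i>isn't</i> &quot;fun&quot;...";
print unmarkup_sketch($text), "\n";    # This isn't &quot;fun&quot;...
print unescape_sketch($text), "\n";    # This isn't "fun"...
```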

Vanilla

new [PRINTABLE]
Class method. Create a new HTML output stream.

The PRINTABLE may be a FileHandle, a glob reference, or any object that responds to a print() message. If no PRINTABLE is given, does a select() and uses that.
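In case "does a select()" is unfamiliar: Perl's built-in select(), called with no arguments, reports which filehandle a bare print currently goes to. The short sketch below uses only core Perl (the $log handle and $buffer variable are just illustrative names) and shows how that default can change and be restored.

```perl
use strict;
use warnings;

# select() with no arguments returns the name of the currently selected handle.
my $default = select();
print "default handle: $default\n";

# After selecting another handle, a bare print goes there instead.
open(my $log, '>', \my $buffer) or die "Cannot open in-memory file: $!";
my $previous = select($log);         # returns the handle that was selected before
print "this line goes into \$buffer\n";
select($previous);                   # restore the old default
close $log;

print "captured: $buffer";
```

So a stream created without a PRINTABLE follows whatever handle happens to be selected at that moment, which is usually STDOUT.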

auto_escape [NAME|SUBREF]
Instance method. Set the auto-escape function for this HTML stream.

If the argument is a subroutine reference SUBREF, then that subroutine will be used. Declare such subroutines like this:

    sub my_escape {
        my $text = shift;     # it's passed in the first argument
        ...
        $text;
    }

If a textual NAME is given, then one of the appropriate built-in functions is used. Possible values are:

ALL
Default for HTML::Stream objects. This escapes angle brackets, ampersands, double-quotes, and 8-bit characters. 8-bit characters are escaped using decimal entity codes (like #123).

LATIN_1
Like "ALL", but uses Latin-1 entity names (like ccedil) instead of decimal entity codes to escape characters. This makes the HTML more readable but it is currently not advised, as ``older'' browsers (like Netscape 2.0) do not recognize many of the ISO-8859-1 entity names (like deg).

Warning: If you specify this option, you'll find that it attempts to ``require'' HTML::Entities at run time. That's because I didn't want to force you to have that module just to use the rest of HTML::Stream. To pick up problems at compile time, you are advised to say:

    use HTML::Stream;
    use HTML::Entities;

in your source code.

NON_ENT
Like "ALL", except that ampersands (&) are not escaped. This allows you to use &-entities in your text strings, while having everything else safely escaped:
    output $HTML "If A is an acute angle, then A < 90&deg;";

Returns the previously-installed function, in the manner of select(). No arguments just returns the currently-installed function.

auto_format ONOFF
Instance method. Set the auto-formatting characteristics for this HTML stream. Currently, all you can do is supply a single defined boolean argument, which turns auto-formatting ON (1) or OFF (0). The self object is returned.

Please use no other values; they are reserved for future use.

comment COMMENT
Instance method. Output an HTML comment. As of 1.29, a newline is automatically appended.

ent ENTITY
Instance method. Output an HTML entity. For example, here's how you'd output a non-breaking space:
      $html->ent('nbsp');

You may abbreviate this method name as e:

      $html->e('nbsp');

Warning: this function assumes that the entity argument is legal.

io
Return the underlying output handle for this HTML stream. All you can depend upon is that it is some kind of object which responds to a print() message:
    $HTML->io->print("This is not auto-escaped or nuthin!");

nl [COUNT]
Instance method. Output COUNT newlines. If undefined, COUNT defaults to 1.

tag TAGNAME [, PARAM=>VALUE, ...]
Instance method. Output a tag. Returns the self object, to allow method chaining. You can say _A instead of "/A", if you're into barewords.

text TEXT...
Instance method. Output some text. You may abbreviate this method name as t:
      $html->t('Hi there, ', $yournamehere, '!');

Returns the self object, to allow method chaining.

text_nbsp TEXT...
Instance method. Output some text, but with all spaces output as non-breaking-space characters:
      $html->t("To list your home directory, type: ")
           ->text_nbsp("ls -l ~yourname.");

Returns the self object, to allow method chaining.

Strawberry

output ITEM,...,ITEM
Instance method. Go through the items. If an item is an arrayref, treat it like the array argument to html_tag() and output the result. If an item is a text string, escape the text and output the result. Like this:
     output $HTML [A, HREF=>$url], "Here's my $caption!", [_A];

Chocolate

accept_tag TAG
Class method. Declares that the tag is to be accepted as valid HTML (if it isn't already). For example, this...
     # Make sure methods MARQUEE and _MARQUEE are compiled on demand:
     HTML::Stream->accept_tag('MARQUEE');

...gives the Chocolate Interface permission to create (via AUTOLOAD) definitions for the MARQUEE and _MARQUEE methods, so you can then say:

     $HTML -> MARQUEE -> t("Hi!") -> _MARQUEE;

If you want to set the characteristics of the tag as well (such as its Newlines value), use the set_tag() method instead; it will effectively do an accept_tag() too.

     # Make sure methods MARQUEE and _MARQUEE are compiled on demand,
     #   *and*, set the characteristics of that tag.
     HTML::Stream->set_tag('MARQUEE', Newlines=>9);

private_tags
Instance method. Normally, HTML streams use a reference to a global table of tag information to determine how to do such things as auto-formatting, and modifications made to that table by set_tag will affect everyone.

However, if you want an HTML stream to have a private copy of that table to munge with, just send it this message after creating it. Like this:

    my $HTML = new HTML::Stream \*STDOUT;
    $HTML->private_tags;

Then, you can say stuff like:

    $HTML->set_tag('PRE',   Newlines=>0);
    $HTML->set_tag('BLINK', Newlines=>9);

And it won't affect anyone else's auto-formatting (although they will possibly be able to use the BLINK tag method without a fatal exception :-( ).

Returns the self object.
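The shared-versus-private distinction boils down to holding a reference to one table versus holding a copy of it. Here is a tiny sketch of that idea in plain Perl; the %Tags hash, its entries, and its values are made up for illustration and are not the module's real table.

```perl
use strict;
use warnings;

# Toy stand-in for the global tag table; the entries and values are made up.
my %Tags = (
    PRE => { Newlines => 15 },
    B   => { Newlines => 0  },
);

# A stream that shares the table just holds a reference to it...
my $shared = \%Tags;

# ...while a "private" stream gets its own copy, one level deep per tag.
my %copy;
for my $tag (keys %Tags) {
    $copy{$tag} = { %{ $Tags{$tag} } };
}
my $private = \%copy;

$private->{PRE}{Newlines} = 0;       # like set_tag on a private stream
print "shared PRE:  $shared->{PRE}{Newlines}\n";     # still 15
print "private PRE: $private->{PRE}{Newlines}\n";    # now 0
```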

set_tag TAG, [TAGINFO...]
Class/instance method. Accept the given TAG in the Chocolate Interface, and (if TAGINFO is given) alter its characteristics when being output.

The TAGINFO, if given, is a set of key=>value pairs with the following possible keys:

Newlines
Assumed to be a number which encodes how newlines are to be output before/after a tag. The value is the logical OR (or sum) of a set of flags:
     0x01    newline before <TAG>         .<TAG>.     .</TAG>.    
     0x02    newline after <TAG>          |     |     |      |
     0x04    newline before </TAG>        1     2     4      8
     0x08    newline after </TAG>

Hence, to output BLINK environments which are preceded/followed by newlines:

     set_tag HTML::Stream 'BLINK', Newlines=>9;

Returns the self object on success.
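If the bit arithmetic is hard to picture, this small sketch decodes a Newlines value into the layout it asks for. The flag values come from the table above; the constant names and the show_newlines helper are hypothetical, invented here for illustration.

```perl
use strict;
use warnings;

# Flag values from the Newlines table; the constant names are hypothetical.
use constant {
    NL_BEFORE_BEGIN => 0x01,
    NL_AFTER_BEGIN  => 0x02,
    NL_BEFORE_END   => 0x04,
    NL_AFTER_END    => 0x08,
};

# Render "<TAG> </TAG>", showing a literal \n wherever a newline would go.
sub show_newlines {
    my ($tag, $flags) = @_;
    return join '',
        ($flags & NL_BEFORE_BEGIN ? '\n' : ''), "<$tag>",
        ($flags & NL_AFTER_BEGIN  ? '\n' : ''), ' ',
        ($flags & NL_BEFORE_END   ? '\n' : ''), "</$tag>",
        ($flags & NL_AFTER_END    ? '\n' : '');
}

# 0x01 | 0x08 is 9, the value used for BLINK above.
print show_newlines('BLINK', NL_BEFORE_BEGIN | NL_AFTER_END), "\n";
print show_newlines('I', 7), "\n";
print show_newlines('B', 15), "\n";
```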

tags
Class/instance method. Returns an unsorted list of all tags in the class/instance tag table (see set_tag for class/instance method differences).


SUBCLASSES

HTML::Stream::Latin1

A small, public package for outputting Latin-1 markup. Its default auto-escape function is LATIN_1, which tries to output the mnemonic entity markup (e.g., &ccedil;) for ISO-8859-1 characters.

So using HTML::Stream::Latin1 like this:

    use HTML::Stream;
    
    $HTML = new HTML::Stream::Latin1 \*STDOUT;
    output $HTML "\253A right angle is 90\260, \277No?\273\n";

Prints this:

    &laquo;A right angle is 90&deg;, &iquest;No?&raquo;

Instead of what HTML::Stream would print, which is this:

    &#171;A right angle is 90&#176;, &#191;No?&#187;

Warning: a lot of Latin-1 HTML markup is not recognized by older browsers (e.g., Netscape 2.0). Consider using HTML::Stream; it will output the decimal entities which currently seem to be more ``portable''.

Note: using this class ``requires'' that you have HTML::Entities.
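For the curious, the decimal-entity form can be reproduced without any extra modules. The sketch below is a hypothetical helper (escape_decimal_sketch is not part of this module), and it assumes the input is a plain byte string in Latin-1:

```perl
use strict;
use warnings;

# Hypothetical helper: named entities for the four markup characters,
# decimal entities for bytes in the range 128..255.
sub escape_decimal_sketch {
    my $text = shift;
    my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $text =~ s/([<>&"])/$named{$1}/g;
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

my $escaped = escape_decimal_sketch("\253A right angle is 90\260, \277No?\273");
print "$escaped\n";    # &#171;A right angle is 90&#176;, &#191;No?&#187;
```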


PERFORMANCE

Slower than I'd like. Both the output() method and the various ``tag'' methods seem to run about 5 times slower than the old just-hardcode-the-darn-stuff approach. That is, in general, this:

    ### Approach #1...
    tag  $HTML 'A', HREF=>"$href";
    tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
    text $HTML $caption;
    tag  $HTML '_A';
    text $HTML $a_lot_of_text;

And this:

    ### Approach #2...
    output $HTML [A, HREF=>"$href"], 
                 [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
                 $caption,
                 [_A];
    output $HTML $a_lot_of_text;

And this:

    ### Approach #3...
    $HTML -> A(HREF=>"$href")
          -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
          -> t($caption)
          -> _A
          -> t($a_lot_of_text);

Each run about 5x slower than this:

    ### Approach #4...
    print '<A HREF="', html_escape($href), '>',
          '<IMG SRC="logo.gif" ALT="LOGO">',
          html_escape($caption),
          '</A>';
    print html_escape($a_lot_of_text);

Of course, I'd much rather use any of the first three (especially #3) if I had to get something done right in a hurry. Or did you not notice the typo in approach #4? ;-)

(BTW, thanks to Benchmark:: for allowing me to... er... benchmark stuff.)


VERSION

$Id: Stream.pm,v 1.55 2003/10/28 dstaal Exp $


CHANGE LOG

Version 1.55 (2003/10/28)
New maintainer: Daniel T. Staal. No major changes in the code, except to complete the tag list to HTML 4.01 specifications. (With the exception of the 'S' tag, which I want to test, and is deprecated anyway. Note that the DOCTYPE is not actually an HTML tag, and is not currently included.)

Version 1.54 (2001/08/20)
The terms-of-use have been placed in the distribution file ``COPYING''. Also, small documentation tweaks were made.

Version 1.51 (2001/08/16)
No real changes to code; just improved documentation, and removed HTML::Entities and HTML::Parser from ./etc at CPAN's request.

Version 1.47 (2000/06/10)
No real changes to code; just improved documentation.

Version 1.45 (1999/02/09)
Cleanup for Perl 5.005: removed duplicate typeglob assignments.

Version 1.44 (1998/01/14)
Win95 install (5.004) now works. Added SYNOPSIS to POD.

Version 1.41 (1998/01/02)
Removed $& for efficiency. Thanks, Andreas!

Added support for OPTION, and default now puts newlines after SELECT and /SELECT. Also altered ``TELEM'' syntax to put newline after end-tags of list element tags (like /OPTION, /LI, etc.). In theory, this change could produce undesirable results for folks who embed lists inside of PRE environments... however, that kind of stuff was done in the days before TABLEs; also, you can always turn it off if you really need to. Thanks to John D Groenveld for these patches.

Added text_nbsp(). Thanks to John D Groenveld for the patch. This method may also be invoked as nbsp_text() as in the original patch, but that's sort of a private tip-of-the-hat to the patch author, and the synonym may go away in the future.

Version 1.37 (1997/02/09)
No real change; just trying to make CPAN.pm happier.

Version 1.32 (1997/01/12)
NEW TOOL for generating Perl code which uses HTML::Stream! Check your toolkit for html2perlstream.

Added built-in support for escaping 8-bit characters.

Added LATIN_1 auto-escape, which uses HTML::Entities to generate mnemonic entities. This is now the default method for HTML::Stream::Latin1.

Added auto_format(), so you can now turn auto-formatting off/on.

Added private_tags(), so it is now possible for HTML streams to each have their own ``private'' copy of the %Tags table, for use by set_tag().

Added set_tag(). The tags tables may now be modified dynamically so as to change how formatting is done on-the-fly. This will hopefully not compromise the efficiency of the chocolate interface (until now, the formatting was compiled into the method itself), and will add greater flexibility for more-complex programs.

Added POD documentation for all subroutines in the public interface.

Version 1.29 (1996/12/10)
Added terminating newline to comment(). Thanks to John D Groenveld for the suggestion and the patch.

Version 1.27 (1996/12/10)
Added built-in HTML::Stream::Latin1, which does a very simple encoding of all characters above ASCII 127.

Fixed bug in accept_tag(), where 'my' variable was shadowing argument. Thanks to John D Groenveld for the bug report and the patch.

Version 1.26 (1996/09/27)
Start of history.


COPYRIGHT

This program is free software. You may copy or redistribute it under the same terms as Perl itself.


ACKNOWLEDGEMENTS

Warmest thanks to...

    Eryq                   For writing the original version of this module.
    John Buckman           For suggesting that I write an "html2perlstream",
                           and inspiring me to look at supporting Latin-1.
    Tony Cebzanov          For suggesting that I write an "html2perlstream"
    John D Groenveld       Bug reports, patches, and suggestions
    B. K. Oxley (binkley)  For suggesting the support of "writing to strings"
                           which became the "printable" interface.


AUTHOR

Daniel T. Staal (DStaal@usa.net).

Enjoy. Yell if it breaks.
