HTML::Tree - overview of HTML::TreeBuilder et al


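    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($filename);    # $filename: placeholder path to an HTML file
        # Then do something with the tree, using HTML::Element
        # methods -- for example:
    $tree->dump;
        # Finally:
    $tree->delete;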


HTML-Tree is a suite of Perl modules for making parse trees out of HTML source. It consists mainly of two modules, whose documentation you should refer to: HTML::TreeBuilder and HTML::Element.

HTML::TreeBuilder is the module that builds the parse trees. (It uses HTML::Parser to do the work of breaking the HTML up into tokens.)

The tree that TreeBuilder builds for you is made up of objects of the class HTML::Element.
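
The relationship between the two modules can be illustrated with a minimal sketch. It assumes HTML::TreeBuilder is installed and uses a made-up HTML string (not taken from this distribution); the variable names are purely illustrative. It parses the string, looks up two elements, and walks the children of one of them:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, for illustration only.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Hello, <b>world</b>!</p></body></html>';

# new_from_content() parses the string and completes the parse in one call.
my $tree = HTML::TreeBuilder->new_from_content($html);

# The builder object is itself the root element of the tree.
my $root_is_element = $tree->isa('HTML::Element') ? 1 : 0;

# look_down() returns the first descendant matching the given criteria.
my $title = $tree->look_down(_tag => 'title');
my $para  = $tree->look_down(_tag => 'p');

# as_text() concatenates the text found under an element.
my $title_text = $title->as_text;
my $para_text  = $para->as_text;

print "Root is an HTML::Element: $root_is_element\n";
print "Title text: $title_text\n";
print "Paragraph text: $para_text\n";

# content_list() returns the children of an element: each child is either
# another HTML::Element object or a plain string holding a text segment.
my @kinds;
for my $child ($para->content_list) {
    my $kind = ref($child) ? 'element <' . $child->tag . '>' : 'text';
    push @kinds, $kind;
    print "Child: $kind\n";
}

# Release the tree explicitly when done with it.
$tree->delete;
```

Text segments appear in the tree as plain strings rather than objects, which is why the loop checks ref() before calling any HTML::Element method on a child.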

If you find that you do not properly understand the documentation for HTML::TreeBuilder and HTML::Element, it may be because you are unfamiliar with tree-shaped data structures, or with object-oriented modules in general. Sean Burke has written some articles for The Perl Journal that seek to provide that background. The full text of those articles is contained in this distribution, as:

"User's View of Object-Oriented Modules" from TPJ17

"Trees" from TPJ18

"Scanning HTML" from TPJ19

Readers already familiar with object-oriented modules and tree-shaped data structures should read just the last article. Readers without that background should read the first, then the second, and then the third.


You can find documentation for this module with the perldoc command.

    perldoc HTML::Tree

You can also look for information at:


The HTML::TreeBuilder, HTML::Element, HTML::Tagset, HTML::Parser, and HTML::DOMbo manpages.

The book Perl & LWP by Sean M. Burke published by O'Reilly and Associates, 2002. ISBN: 0-596-00178-9

It has several chapters on HTML processing in general, and on HTML-Tree specifically. There's more info at:


Thanks to Gisle Aas, Sean Burke and Andy Lester for their original work.

Thanks to the Chicago Perl Mongers for their patches submitted to HTML::Tree as part of the Phalanx project.

Thanks to the following people for additional patches and documentation: Terrence Brannon, Gordon Lack, Chris Madsen and Ricardo Signes.


