.\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
. if \nF \{
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "HTML::Element 3"
.TH HTML::Element 3 "2020-03-05" "perl v5.16.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
HTML::Element \- Class for objects that represent HTML elements
.SH "VERSION"
.IX Header "VERSION"
This document describes version 5.07 of
HTML::Element, released August 31, 2017
as part of HTML-Tree.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 3
\& use HTML::Element;
\& $a = HTML::Element\->new(\*(Aqa\*(Aq, href => \*(Aqhttp://www.perl.com/\*(Aq);
\& $a\->push_content("The Perl Homepage");
\&
\& $tag = $a\->tag;
\& print "$tag starts out as:", $a\->starttag, "\en";
\& print "$tag ends as:", $a\->endtag, "\en";
\& print "$tag\e\*(Aqs href attribute is: ", $a\->attr(\*(Aqhref\*(Aq), "\en";
\&
\& $links_r = $a\->extract_links();
\& print "Hey, I found ", scalar(@$links_r), " links.\en";
\&
\& print "And that, as HTML, is: ", $a\->as_HTML, "\en";
\& $a = $a\->delete;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
(This class is part of the HTML::Tree dist.)
.PP
Objects of the HTML::Element class can be used to represent elements
of \s-1HTML\s0 document trees. These objects have attributes, notably attributes that
designate each element's parent and content. The content is an array
of text segments and other HTML::Element objects. A tree with HTML::Element
objects as nodes can represent the syntax tree for an \s-1HTML\s0 document.
.SH "HOW WE REPRESENT TREES"
.IX Header "HOW WE REPRESENT TREES"
Consider this \s-1HTML\s0 document:
.PP
.Vb 9
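\&  <html lang=\*(Aqen\-US\*(Aq>
\&    <head>
\&      <title>Stuff</title>
\&      <meta name=\*(Aqauthor\*(Aq content=\*(AqJojo\*(Aq>
\&    </head>
\&    <body>
\&     <h1>I like potatoes!</h1>
\&    </body>
\&  </html>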
.Ve
.PP
Building a syntax tree out of it makes a tree-structure in memory
that could be diagrammed as:
.PP
.Vb 11
\&                     html (lang=\*(Aqen\-US\*(Aq)
\&                      / \e
\&                    /     \e
\&                  /         \e
\&                head        body
\&               /\e               \e
\&             /    \e               \e
\&           /        \e               \e
\&         title     meta              h1
\&          |       (name=\*(Aqauthor\*(Aq,     |
\&       "Stuff"    content=\*(AqJojo\*(Aq)    "I like potatoes"
.Ve
.PP
This is the traditional way to diagram a tree, with the \*(L"root\*(R" at the
top, and it's this kind of diagram that people have in mind when they
say, for example, that \*(L"the meta element is under the head element
instead of under the body element\*(R". (The same is also said with
\&\*(L"inside\*(R" instead of \*(L"under\*(R" \*(-- the use of \*(L"inside\*(R" makes more sense
when you're looking at the \s-1HTML\s0 source.)
.PP
Another way to represent the above tree is with indenting:
.PP
.Vb 8
\&  html (attributes: lang=\*(Aqen\-US\*(Aq)
\&    head
\&      title
\&        "Stuff"
\&      meta (attributes: name=\*(Aqauthor\*(Aq content=\*(AqJojo\*(Aq)
\&    body
\&      h1
\&        "I like potatoes"
.Ve
.PP
Incidentally, diagramming with indenting works much better for very
large trees, and is easier for a program to generate. The \f(CW\*(C`$tree\->dump\*(C'\fR
method uses indentation just that way.
.PP
However you diagram the tree, it's stored the same in memory \*(-- it's a
network of objects, each of which has attributes like so:
.PP
.Vb 4
\&  element #1:  _tag: \*(Aqhtml\*(Aq
\&               _parent: none
\&               _content: [element #2, element #5]
\&               lang: \*(Aqen\-US\*(Aq
\&
\&  element #2:  _tag: \*(Aqhead\*(Aq
\&               _parent: element #1
\&               _content: [element #3, element #4]
\&
\&  element #3:  _tag: \*(Aqtitle\*(Aq
\&               _parent: element #2
\&               _content: [text segment "Stuff"]
\&
\&  element #4:  _tag: \*(Aqmeta\*(Aq
\&               _parent: element #2
\&               _content: none
\&               name: author
\&               content: Jojo
\&
\&  element #5:  _tag: \*(Aqbody\*(Aq
\&               _parent: element #1
\&               _content: [element #6]
\&
\&  element #6:  _tag: \*(Aqh1\*(Aq
\&               _parent: element #5
\&               _content: [text segment "I like potatoes"]
.Ve
.PP
The \*(L"treeness\*(R" of the tree-structure that these elements comprise is
not an aspect of any particular object, but is emergent from the
relatedness attributes (_parent and _content) of these element-objects
and from how you use them to get from element to element.
.PP
While you could access the content of a tree by writing code that says
\&\*(L"access the 'src' attribute of the root's \fIfirst\fR child's \fIseventh\fR
child's \fIthird\fR child\*(R", you're more likely to have to scan the contents
of a tree, looking for whatever nodes, or kinds of nodes, you want to
do something with. The most straightforward way to look over a tree
is to \*(L"traverse\*(R" it. An HTML::Element method (\f(CW\*(C`$h\->traverse\*(C'\fR) is
provided for this purpose, and several other HTML::Element methods are
based on it.
.PP
(For everything you ever wanted to know about trees, and then some,
see Niklaus Wirth's \fIAlgorithms + Data Structures = Programs\fR or
Donald Knuth's \fIThe Art of Computer Programming, Volume 1\fR.)
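.PP
As a minimal sketch (not one of the original examples), the tree diagrammed
above can be built by hand with \f(CW\*(C`new_from_lol\*(C'\fR and its relatedness
attributes read back through the accessor methods documented below. The
variable names are illustrative only:
.PP
.Vb 19
\&  use HTML::Element;
\&
\&  # Build the example tree from a list\-of\-lists description.
\&  my $root = HTML::Element\->new_from_lol(
\&    [\*(Aqhtml\*(Aq, { lang => \*(Aqen\-US\*(Aq },
\&      [\*(Aqhead\*(Aq,
\&        [\*(Aqtitle\*(Aq, \*(AqStuff\*(Aq],
\&        [\*(Aqmeta\*(Aq, { name => \*(Aqauthor\*(Aq, content => \*(AqJojo\*(Aq }],
\&      ],
\&      [\*(Aqbody\*(Aq, [\*(Aqh1\*(Aq, \*(AqI like potatoes!\*(Aq]],
\&    ]
\&  );
\&
\&  my ($head, $body) = $root\->content_list;   # _content of element #1
\&  print $head\->tag, "\en";                   # head
\&  print $body\->parent\->tag, "\en";           # html (_parent of element #5)
\&  print "root\en" unless $root\->parent;      # the root has no parent
\&
\&  $root\->dump;                               # indented debugging view
.Ve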
.SS "Weak References"
.IX Subsection "Weak References"
\&\s-1TL\s0;DR summary: \f(CW\*(C`use\ HTML::TreeBuilder\ 5\ \-weak;\*(C'\fR and forget about
the \f(CW\*(C`delete\*(C'\fR method (except for pruning a node from a tree).
.PP
Because HTML::Element stores a reference to the parent element, Perl's
reference-count garbage collection doesn't work properly with
HTML::Element trees. Starting with version 5.00, HTML::Element uses
weak references (if available) to prevent that problem. Weak
references were introduced in Perl 5.6.0, but you also need a version
of Scalar::Util that provides the \f(CW\*(C`weaken\*(C'\fR function.
.PP
Weak references are enabled by default. If you want to be certain
they're in use, you can say \f(CW\*(C`use\ HTML::Element\ 5\ \-weak;\*(C'\fR. You
must include the version number; previous versions of HTML::Element
ignored the import list entirely.
.PP
To disable weak references, you can say \f(CW\*(C`use\ HTML::Element\ \-noweak;\*(C'\fR.
This is a global setting. \fBThis feature is deprecated\fR and is
provided only as a quick fix for broken code. If your code does not
work properly with weak references, you should fix it immediately, as
weak references may become mandatory in a future version. Generally,
all you need to do is keep a reference to the root of the tree until
you're done working with it.
.PP
Because HTML::TreeBuilder is a subclass of HTML::Element, you can also
import \f(CW\*(C`\-weak\*(C'\fR or \f(CW\*(C`\-noweak\*(C'\fR from HTML::TreeBuilder, e.g.
\&\f(CW\*(C`use\ HTML::TreeBuilder\ 5\ \-weak;\*(C'\fR.
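.PP
A minimal sketch of that setup, assuming HTML::TreeBuilder 5 or later is
installed. The markup and variable names are illustrative only:
.PP
.Vb 9
\&  use HTML::TreeBuilder 5 \-weak;
\&
\&  # Keep $tree in scope for as long as you use nodes found under it.
\&  my $tree = HTML::TreeBuilder\->new_from_content(\*(Aq<p>Hello</p>\*(Aq);
\&  my $p    = $tree\->look_down(_tag => \*(Aqp\*(Aq);
\&  print $p\->as_text, "\en";                  # Hello
\&
\&  # No explicit $tree\->delete is needed here; the tree is freed once
\&  # the last strong reference to its root goes away.
.Ve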
.SH "BASIC METHODS"
.IX Header "BASIC METHODS"
.SS "new"
.IX Subsection "new"
.Vb 1
\& $h = HTML::Element\->new(\*(Aqtag\*(Aq, \*(Aqattrname\*(Aq => \*(Aqvalue\*(Aq, ... );
.Ve
.PP
This constructor method returns a new HTML::Element object. The tag
name is a required argument; it will be forced to lowercase.
Optionally, you can specify other initial attributes at object
creation time.
.SS "attr"
.IX Subsection "attr"
.Vb 2
\& $value = $h\->attr(\*(Aqattr\*(Aq);
\& $old_value = $h\->attr(\*(Aqattr\*(Aq, $new_value);
.Ve
.PP
Returns (optionally sets) the value of the given attribute of \f(CW$h\fR. The
attribute name (but not the value, if provided) is forced to
lowercase. If trying to read the value of an attribute not present
for this element, the return value is undef.
If setting a new value, the old value of that attribute is
returned.
.PP
If methods are provided for accessing an attribute (like \f(CW\*(C`$h\->tag\*(C'\fR for
\&\*(L"_tag\*(R", \f(CW\*(C`$h\->content_list\*(C'\fR, etc. below), use those instead of calling
\&\f(CW\*(C`$h\->attr\*(C'\fR, whether for reading or setting.
.PP
Note that setting an attribute to \f(CW\*(C`undef\*(C'\fR (as opposed to "", the empty
string) actually deletes the attribute.
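.PP
The following sketch walks through reading, setting, and deleting an
attribute as described above. The element and URLs are illustrative only:
.PP
.Vb 10
\&  use HTML::Element;
\&
\&  my $h = HTML::Element\->new(\*(Aqa\*(Aq, href => \*(Aqhttp://www.perl.com/\*(Aq);
\&
\&  # Setting returns the previous value.
\&  my $old = $h\->attr(\*(Aqhref\*(Aq, \*(Aqhttp://www.cpan.org/\*(Aq);
\&
\&  # The attribute name is lowercased, so this reads "href".
\&  my $now = $h\->attr(\*(AqHREF\*(Aq);
\&
\&  # Setting to undef deletes the attribute.
\&  $h\->attr(\*(Aqhref\*(Aq, undef);
\&  print "no href\en" unless defined $h\->attr(\*(Aqhref\*(Aq);
.Ve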
.SS "tag"
.IX Subsection "tag"
.Vb 2
\& $tagname = $h\->tag();
\& $h\->tag(\*(Aqtagname\*(Aq);
.Ve
.PP
Returns (optionally sets) the tag name (also known as the generic
identifier) for the element \f(CW$h\fR. In setting, the tag name is always
converted to lower case.
.PP
There are four kinds of \*(L"pseudo-elements\*(R" that show up as
HTML::Element objects:
.IP "Comment pseudo-elements" 4
.IX Item "Comment pseudo-elements"
These are element objects with a \f(CW\*(C`$h\->tag\*(C'\fR value of \*(L"~comment\*(R",
and the content of the comment is stored in the \*(L"text\*(R" attribute
(\f(CW\*(C`$h\->attr("text")\*(C'\fR). For example, parsing this code with
HTML::TreeBuilder...
.Sp
.Vb 3
\&  <!\-\- I like Pie.
\&     Pie is good
\&  \-\->
.Ve
.Sp
produces an HTML::Element object with these attributes:
.Sp
.Vb 4
\&  "_tag",
\&  "~comment",
\&  "text",
\&  " I like Pie.\en     Pie is good\en  "
.Ve
.IP "Declaration pseudo-elements" 4
.IX Item "Declaration pseudo-elements"
Declarations (rarely encountered) are represented as HTML::Element
objects with a tag name of \*(L"~declaration\*(R", and content in the \*(L"text\*(R"
attribute. For example, this:
.Sp
.Vb 1
\&  <!DOCTYPE foo>
.Ve
.Sp
produces an element whose attributes include:
.Sp
.Vb 1
\& "_tag", "~declaration", "text", "DOCTYPE foo"
.Ve
.IP "Processing instruction pseudo-elements" 4
.IX Item "Processing instruction pseudo-elements"
PIs (rarely encountered) are represented as HTML::Element objects with
a tag name of \*(L"~pi\*(R", and content in the \*(L"text\*(R" attribute. For
example, this:
.Sp
.Vb 1
\&  <?stuff foo?>
.Ve
.Sp
produces an element whose attributes include:
.Sp
.Vb 1
\& "_tag", "~pi", "text", "stuff foo?"
.Ve
.Sp
(assuming a recent version of HTML::Parser)
.IP "~literal pseudo-elements" 4
.IX Item "~literal pseudo-elements"
These objects are not currently produced by HTML::TreeBuilder, but can
be used to represent a \*(L"super-literal\*(R" \*(-- i.e., a literal you want to
be immune from escaping. (Yes, I just made that term up.)
.Sp
That is, this is useful if you want to insert code into a tree that
you plan to dump out with \f(CW\*(C`as_HTML\*(C'\fR, where you want, for some reason,
to suppress \f(CW\*(C`as_HTML\*(C'\fR's normal behavior of amp-quoting text segments.
.Sp
For example, this:
.Sp
.Vb 6
\& my $literal = HTML::Element\->new(\*(Aq~literal\*(Aq,
\& \*(Aqtext\*(Aq => \*(Aqx < 4 & y > 7\*(Aq
\& );
\& my $span = HTML::Element\->new(\*(Aqspan\*(Aq);
\& $span\->push_content($literal);
\& print $span\->as_HTML;
.Ve
.Sp
prints this:
.Sp
.Vb 1
\&  <span>x < 4 & y > 7</span>
.Ve
.Sp
Whereas this:
.Sp
.Vb 4
\& my $span = HTML::Element\->new(\*(Aqspan\*(Aq);
\& $span\->push_content(\*(Aqx < 4 & y > 7\*(Aq);
\& # normal text segment
\& print $span\->as_HTML;
.Ve
.Sp
prints this:
.Sp
.Vb 1
\&  <span>x &lt; 4 &amp; y &gt; 7</span>
.Ve
.Sp
Unless you're inserting lots of pre-cooked code into existing trees,
and dumping them out again, it's not likely that you'll find
\&\f(CW\*(C`~literal\*(C'\fR pseudo-elements useful.
.SS "parent"
.IX Subsection "parent"
.Vb 2
\& $parent = $h\->parent();
\& $h\->parent($new_parent);
.Ve
.PP
Returns (optionally sets) the parent (aka \*(L"container\*(R") for this element.
The parent should either be undef, or should be another element.
.PP
You \fBshould not\fR use this to directly set the parent of an element.
Instead use any of the other methods under \*(L"Structure-Modifying
Methods\*(R", below.
.PP
Note that \f(CW\*(C`not($h\->parent)\*(C'\fR is a simple test for whether \f(CW$h\fR is the
root of its subtree.
.SS "content_list"
.IX Subsection "content_list"
.Vb 2
\& @content = $h\->content_list();
\& $num_children = $h\->content_list();
.Ve
.PP
Returns a list of the child nodes of this element \*(-- i.e., what
nodes (elements or text segments) are inside/under this element. (Note
that this may be an empty list.)
.PP
In a scalar context, this returns the count of the items,
as you may expect.
.SS "content"
.IX Subsection "content"
.Vb 1
\& $content_array_ref = $h\->content(); # may return undef
.Ve
.PP
This somewhat deprecated method returns the content of this element;
but unlike content_list, this returns either undef (which you should
understand to mean no content), or a \fIreference to the array\fR of
content items, each of which is either a text segment (a string, i.e.,
a defined non-reference scalar value), or an HTML::Element object.
Note that even if an arrayref is returned, it may be a reference to an
empty array.
.PP
While older code should feel free to continue to use \f(CW\*(C`$h\->content\*(C'\fR,
new code should use \f(CW\*(C`$h\->content_list\*(C'\fR in almost all conceivable
cases. It is my experience that in most cases this leads to simpler
code anyway, since it means one can say:
.PP
.Vb 1
\& @children = $h\->content_list;
.Ve
.PP
instead of the inelegant:
.PP
.Vb 1
\& @children = @{$h\->content || []};
.Ve
.PP
If you do use \f(CW\*(C`$h\->content\*(C'\fR (or \f(CW\*(C`$h\->content_array_ref\*(C'\fR), you should not
use the reference returned by it (assuming it returned a reference,
and not undef) to directly set or change the content of an element or
text segment! Instead use content_refs_list or any of the other
methods under \*(L"Structure-Modifying Methods\*(R", below.
.SS "content_array_ref"
.IX Subsection "content_array_ref"
.Vb 1
\& $content_array_ref = $h\->content_array_ref(); # never undef
.Ve
.PP
This is like \f(CW\*(C`content\*(C'\fR (with all its caveats and deprecations) except
that it is guaranteed to return an array reference. That is, if the
given node has no \f(CW\*(C`_content\*(C'\fR attribute, the \f(CW\*(C`content\*(C'\fR method would
return that undef, but \f(CW\*(C`content_array_ref\*(C'\fR would set the given node's
\&\f(CW\*(C`_content\*(C'\fR value to \f(CW\*(C`[]\*(C'\fR (a reference to a new, empty array), and
return that.
.SS "content_refs_list"
.IX Subsection "content_refs_list"
.Vb 1
\& @content_refs = $h\->content_refs_list;
.Ve
.PP
This returns a list of scalar references to each element of \f(CW$h\fR's
content list. This is useful in case you want to in-place edit any
large text segments without having to get a copy of the current value
of that segment value, modify that copy, then use the
\&\f(CW\*(C`splice_content\*(C'\fR to replace the old with the new. Instead, here you
can in-place edit:
.PP
.Vb 4
\& foreach my $item_r ($h\->content_refs_list) {
\& next if ref $$item_r;
\& $$item_r =~ s/honour/honor/g;
\& }
.Ve
.PP
You \fIcould\fR currently achieve the same effect with:
.PP
.Vb 5
\& foreach my $item (@{ $h\->content_array_ref }) {
\& # deprecated!
\& next if ref $item;
\& $item =~ s/honour/honor/g;
\& }
.Ve
.PP
\&...except that using the return value of \f(CW\*(C`$h\->content\*(C'\fR or
\&\f(CW\*(C`$h\->content_array_ref\*(C'\fR to do that is deprecated, and just might stop
working in the future.
.SS "implicit"
.IX Subsection "implicit"
.Vb 2
\& $is_implicit = $h\->implicit();
\& $h\->implicit($make_implicit);
.Ve
.PP
Returns (optionally sets) the \*(L"_implicit\*(R" attribute. This attribute is
a flag that's used for indicating that the element was not originally
present in the source, but was added to the parse tree (by
HTML::TreeBuilder, for example) in order to conform to the rules of
\&\s-1HTML\s0 structure.
.SS "pos"
.IX Subsection "pos"
.Vb 2
\& $pos = $h\->pos();
\& $h\->pos($element);
.Ve
.PP
Returns (and optionally sets) the \*(L"_pos\*(R" (for "current \fIpos\fRition")
pointer of \f(CW$h\fR. This attribute is a pointer used during some
parsing operations, whose value is whatever HTML::Element element
at or under \f(CW$h\fR is currently \*(L"open\*(R", where \f(CW\*(C`$h\->insert_element(NEW)\*(C'\fR
will actually insert a new element.
.PP
(This has nothing to do with the Perl function called \f(CW\*(C`pos\*(C'\fR, for
controlling where regular expression matching starts.)
.PP
If you set \f(CW\*(C`$h\->pos($element)\*(C'\fR, be sure that \f(CW$element\fR is
either \f(CW$h\fR, or an element under \f(CW$h\fR.
.PP
If you've been modifying the tree under \f(CW$h\fR and are no longer
sure \f(CW\*(C`$h\->pos\*(C'\fR is valid, you can enforce validity with:
.PP
.Vb 1
\& $h\->pos(undef) unless $h\->pos\->is_inside($h);
.Ve
.SS "all_attr"
.IX Subsection "all_attr"
.Vb 1
\& %attr = $h\->all_attr();
.Ve
.PP
Returns all this element's attributes and values, as key-value pairs.
This will include any \*(L"internal\*(R" attributes (i.e., ones not present
in the original element, and which will not be represented if/when you
call \f(CW\*(C`$h\->as_HTML\*(C'\fR). Internal attributes are distinguished by the fact
that the first character of their key (not value! key!) is an
underscore (\*(L"_\*(R").
.PP
Example output of \f(CW\*(C`$h\->all_attr()\*(C'\fR :
\&\f(CW\*(C`\*(Aq_parent\*(Aq, \*(C'\fR\fI[object_value]\fR\f(CW\*(C` , \*(Aq_tag\*(Aq, \*(Aqem\*(Aq, \*(Aqlang\*(Aq, \*(Aqen\-US\*(Aq,
\&\*(Aq_content\*(Aq, \*(C'\fR\fI[array\-ref value]\fR.
.SS "all_attr_names"
.IX Subsection "all_attr_names"
.Vb 2
\& @names = $h\->all_attr_names();
\& $num_attrs = $h\->all_attr_names();
.Ve
.PP
Like \f(CW\*(C`all_attr\*(C'\fR, but only returns the names of the attributes.
In scalar context, returns the number of attributes.
.PP
Example output of \f(CW\*(C`$h\->all_attr_names()\*(C'\fR :
\&\f(CW\*(C`\*(Aq_parent\*(Aq, \*(Aq_tag\*(Aq, \*(Aqlang\*(Aq, \*(Aq_content\*(Aq, \*(C'\fR.
.SS "all_external_attr"
.IX Subsection "all_external_attr"
.Vb 1
\& %attr = $h\->all_external_attr();
.Ve
.PP
Like \f(CW\*(C`all_attr\*(C'\fR, except that internal attributes are not present.
.SS "all_external_attr_names"
.IX Subsection "all_external_attr_names"
.Vb 2
\& @names = $h\->all_external_attr_names();
\& $num_attrs = $h\->all_external_attr_names();
.Ve
.PP
Like \f(CW\*(C`all_attr_names\*(C'\fR, except that internal attributes' names
are not present (or counted).
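.PP
As a small sketch of the difference between the internal and external
views, using an illustrative element that is not part of the original
examples:
.PP
.Vb 9
\&  use HTML::Element;
\&
\&  my $h = HTML::Element\->new(\*(Aqem\*(Aq, lang => \*(Aqen\-US\*(Aq);
\&
\&  my %all      = $h\->all_attr;            # includes internal keys such as _tag
\&  my %external = $h\->all_external_attr;   # only lang => \*(Aqen\-US\*(Aq
\&  my $count    = $h\->all_external_attr_names;   # scalar context: a count
\&
\&  print join(\*(Aq, \*(Aq, sort keys %external), "\en";  # lang
.Ve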
.SS "id"
.IX Subsection "id"
.Vb 2
\& $id = $h\->id();
\& $h\->id($string);
.Ve
.PP
Returns (optionally sets to \f(CW$string\fR) the \*(L"id\*(R" attribute.
\&\f(CW\*(C`$h\->id(undef)\*(C'\fR deletes the \*(L"id\*(R" attribute.
.PP
\&\f(CW\*(C`$h\->id(...)\*(C'\fR is basically equivalent to \f(CW\*(C`$h\->attr(\*(Aqid\*(Aq, ...)\*(C'\fR,
except that when setting the attribute, this method returns the new value,
not the old value.
.SS "idf"
.IX Subsection "idf"
.Vb 2
\& $id = $h\->idf();
\& $h\->idf($string);
.Ve
.PP
Just like the \f(CW\*(C`id\*(C'\fR method, except that if you call \f(CW\*(C`$h\->idf()\*(C'\fR and
no \*(L"id\*(R" attribute is defined for this element, then it's set to a
likely-to-be-unique value, and returned. (The \*(L"f\*(R" is for \*(L"force\*(R".)
.SH "STRUCTURE-MODIFYING METHODS"
.IX Header "STRUCTURE-MODIFYING METHODS"
These methods are provided for modifying the content of trees
by adding or changing nodes as parents or children of other nodes.
.SS "push_content"
.IX Subsection "push_content"
.Vb 1
\& $h\->push_content($element_or_text, ...);
.Ve
.PP
Adds the specified items to the \fIend\fR of the content list of the
element \f(CW$h\fR. The items of content to be added should each be either a
text segment (a string), an HTML::Element object, or an arrayref.
Arrayrefs are fed thru \f(CW\*(C`$h\->new_from_lol(that_arrayref)\*(C'\fR to
convert them into elements, before being added to the content
list of \f(CW$h\fR. This means you can say concise things like:
.PP
.Vb 6
\& $body\->push_content(
\& [\*(Aqbr\*(Aq],
\& [\*(Aqul\*(Aq,
\& map [\*(Aqli\*(Aq, $_], qw(Peaches Apples Pears Mangos)
\& ]
\& );
.Ve
.PP
See the \*(L"new_from_lol\*(R" method's documentation, far below, for more
explanation.
.PP
Returns \f(CW$h\fR (the element itself).
.PP
The push_content method will try to consolidate adjacent text segments
while adding to the content list. That's to say, if \f(CW$h\fR's \f(CW\*(C`content_list\*(C'\fR is
.PP
.Vb 1
\& (\*(Aqfoo bar \*(Aq, $some_node, \*(Aqbaz!\*(Aq)
.Ve
.PP
and you call
.PP
.Vb 1
\& $h\->push_content(\*(Aqquack?\*(Aq);
.Ve
.PP
then the resulting content list will be this:
.PP
.Vb 1
\& (\*(Aqfoo bar \*(Aq, $some_node, \*(Aqbaz!quack?\*(Aq)
.Ve
.PP
and not this:
.PP
.Vb 1
\& (\*(Aqfoo bar \*(Aq, $some_node, \*(Aqbaz!\*(Aq, \*(Aqquack?\*(Aq)
.Ve
.PP
If the latter is what you want, you'll have to override the
feature of consolidating text by using splice_content,
as in:
.PP
.Vb 1
\& $h\->splice_content(scalar($h\->content_list),0,\*(Aqquack?\*(Aq);
.Ve
.PP
Similarly, if you want to add 'Skronk' to the beginning of
the content list and call this:
.PP
.Vb 1
\& $h\->unshift_content(\*(AqSkronk\*(Aq);
.Ve
.PP
then the resulting content list will be this:
.PP
.Vb 1
\& (\*(AqSkronkfoo bar \*(Aq, $some_node, \*(Aqbaz!\*(Aq)
.Ve
.PP
and not this:
.PP
.Vb 1
\& (\*(AqSkronk\*(Aq, \*(Aqfoo bar \*(Aq, $some_node, \*(Aqbaz!\*(Aq)
.Ve
.PP
What you'd do to get the latter is:
.PP
.Vb 1
\& $h\->splice_content(0,0,\*(AqSkronk\*(Aq);
.Ve
.SS "unshift_content"
.IX Subsection "unshift_content"
.Vb 1
\& $h\->unshift_content($element_or_text, ...)
.Ve
.PP
Just like \f(CW\*(C`push_content\*(C'\fR, but adds to the \fIbeginning\fR of the \f(CW$h\fR
element's content list.
.PP
The items of content to be added should each be
either a text segment (a string), an HTML::Element object, or
an arrayref (which is fed thru \f(CW\*(C`new_from_lol\*(C'\fR).
.PP
The unshift_content method will try to consolidate adjacent text segments
while adding to the content list. See above for a discussion of this.
.PP
Returns \f(CW$h\fR (the element itself).
.SS "splice_content"
.IX Subsection "splice_content"
.Vb 2
\& @removed = $h\->splice_content($offset, $length,
\& $element_or_text, ...);
.Ve
.PP
Detaches the elements from \f(CW$h\fR's list of content-nodes, starting at
\&\f(CW$offset\fR and continuing for \f(CW$length\fR items, replacing them with the
elements of the following list, if any. Returns the elements (if any)
removed from the content-list. If \f(CW$offset\fR is negative, then it starts
that far from the end of the array, just like Perl's normal \f(CW\*(C`splice\*(C'\fR
function. If \f(CW$length\fR and the following list are omitted, removes
everything from \f(CW$offset\fR onward.
.PP
The items of content to be added (if any) should each be either a text
segment (a string), an arrayref (which is fed thru \*(L"new_from_lol\*(R"),
or an HTML::Element object that's not already
a child of \f(CW$h\fR.
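.PP
A short sketch of replacing one child with another. The list contents and
variable names are illustrative only:
.PP
.Vb 11
\&  use HTML::Element;
\&
\&  my $ul = HTML::Element\->new_from_lol(
\&    [\*(Aqul\*(Aq, [\*(Aqli\*(Aq, \*(AqPeaches\*(Aq], [\*(Aqli\*(Aq, \*(AqApples\*(Aq], [\*(Aqli\*(Aq, \*(AqPears\*(Aq]]
\&  );
\&
\&  # Replace one item at offset 1 with a new element given as an arrayref.
\&  my @removed = $ul\->splice_content(1, 1, [\*(Aqli\*(Aq, \*(AqMangos\*(Aq]);
\&
\&  print $removed[0]\->as_text, "\en";                        # Apples
\&  print join(\*(Aq, \*(Aq, map { $_\->as_text } $ul\->content_list), "\en";
\&  # Peaches, Mangos, Pears
.Ve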
.SS "detach"
.IX Subsection "detach"
.Vb 1
\& $old_parent = $h\->detach();
.Ve
.PP
This unlinks \f(CW$h\fR from its parent, by setting its 'parent' attribute to
undef, and by removing it from the content list of its parent (if it
had one). The return value is the parent that was detached from (or
undef, if \f(CW$h\fR had no parent to start with). Note that neither \f(CW$h\fR nor
its parent is explicitly destroyed.
.SS "detach_content"
.IX Subsection "detach_content"
.Vb 1
\& @old_content = $h\->detach_content();
.Ve
.PP
This unlinks all of \f(CW$h\fR's children from \f(CW$h\fR, and returns them.
Note that these are not explicitly destroyed; for that, you
can just use \f(CW\*(C`$h\->delete_content\*(C'\fR.
.SS "replace_with"
.IX Subsection "replace_with"
.Vb 1
\& $h\->replace_with( $element_or_text, ... )
.Ve
.PP
This replaces \f(CW$h\fR in its parent's content list with the nodes
specified. The element \f(CW$h\fR (which by then may have no parent)
is returned. This causes a fatal error if \f(CW$h\fR has no parent.
The list of nodes to insert may contain \f(CW$h\fR, but at most once.
Aside from that possible exception, the nodes to insert should not
already be children of \f(CW$h\fR's parent.
.PP
Also, note that this method does not destroy \f(CW$h\fR if weak references are
turned off \*(-- use \f(CW\*(C`$h\->replace_with(...)\->delete\*(C'\fR if you need that.
.SS "preinsert"
.IX Subsection "preinsert"
.Vb 1
\& $h\->preinsert($element_or_text...);
.Ve
.PP
Inserts the given nodes right \s-1BEFORE \s0\f(CW$h\fR in \f(CW$h\fR's parent's
content list. This causes a fatal error if \f(CW$h\fR has no parent.
None of the given nodes should be \f(CW$h\fR or other children of \f(CW$h\fR.
Returns \f(CW$h\fR.
.SS "postinsert"
.IX Subsection "postinsert"
.Vb 1
\& $h\->postinsert($element_or_text...)
.Ve
.PP
Inserts the given nodes right \s-1AFTER \s0\f(CW$h\fR in \f(CW$h\fR's parent's content
list. This causes a fatal error if \f(CW$h\fR has no parent. None of
the given nodes should be \f(CW$h\fR or other children of \f(CW$h\fR. Returns
\&\f(CW$h\fR.
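.PP
A brief sketch of inserting siblings around an existing node. The element
names and text are illustrative only:
.PP
.Vb 12
\&  use HTML::Element;
\&
\&  my $ul = HTML::Element\->new(\*(Aqul\*(Aq);
\&  my $li = HTML::Element\->new(\*(Aqli\*(Aq)\->push_content(\*(AqApples\*(Aq);
\&  $ul\->push_content($li);
\&
\&  # $li must already have a parent, or both calls are fatal errors.
\&  $li\->preinsert( HTML::Element\->new(\*(Aqli\*(Aq)\->push_content(\*(AqPeaches\*(Aq) );
\&  $li\->postinsert( HTML::Element\->new(\*(Aqli\*(Aq)\->push_content(\*(AqPears\*(Aq) );
\&
\&  print join(\*(Aq, \*(Aq, map { $_\->as_text } $ul\->content_list), "\en";
\&  # Peaches, Apples, Pears
.Ve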
.SS "replace_with_content"
.IX Subsection "replace_with_content"
.Vb 1
\& $h\->replace_with_content();
.Ve
.PP
This replaces \f(CW$h\fR in its parent's content list with its own content.
The element \f(CW$h\fR (which by then has no parent or content of its own) is
returned. This causes a fatal error if \f(CW$h\fR has no parent. Also, note
that this does not destroy \f(CW$h\fR if weak references are turned off \*(-- use
\&\f(CW\*(C`$h\->replace_with_content\->delete\*(C'\fR if you need that.
.SS "delete_content"
.IX Subsection "delete_content"
.Vb 2
\& $h\->delete_content();
\& $h\->destroy_content(); # alias
.Ve
.PP
Clears the content of \f(CW$h\fR, calling \f(CW\*(C`$h\->delete\*(C'\fR for each content
element. Compare with \f(CW\*(C`$h\->detach_content\*(C'\fR.
.PP
Returns \f(CW$h\fR.
.PP
\&\f(CW\*(C`destroy_content\*(C'\fR is an alias for this method.
.SS "delete"
.IX Subsection "delete"
.Vb 2
\& $h\->delete();
\& $h\->destroy(); # alias
.Ve
.PP
Detaches this element from its parent (if it has one) and explicitly
destroys the element and all its descendants. The return value is
the empty list (or \f(CW\*(C`undef\*(C'\fR in scalar context).
.PP
Before version 5.00 of HTML::Element, you had to call \f(CW\*(C`delete\*(C'\fR when
you were finished with the tree, or your program would leak memory.
This is no longer necessary if weak references are enabled, see
\&\*(L"Weak References\*(R".
.SS "destroy"
.IX Subsection "destroy"
An alias for \*(L"delete\*(R".
.SS "destroy_content"
.IX Subsection "destroy_content"
An alias for \*(L"delete_content\*(R".
.SS "clone"
.IX Subsection "clone"
.Vb 1
\& $copy = $h\->clone();
.Ve
.PP
Returns a copy of the element (whose children are clones (recursively)
of the original's children, if any).
.PP
The returned element is parentless. Any '_pos' attributes present in the
source element/tree will be absent in the copy. For that and other reasons,
the clone of an HTML::TreeBuilder object that's in mid-parse (i.e., the head
of a tree that HTML::TreeBuilder is elaborating) cannot (currently) be used
to continue the parse.
.PP
You are free to clone HTML::TreeBuilder trees, just as long as:
1) they're done being parsed, or 2) you don't expect to resume parsing
into the clone. (You can continue parsing into the original; it is
never affected.)
.SS "clone_list"
.IX Subsection "clone_list"
.Vb 1
\& @copies = HTML::Element\->clone_list(...nodes...);
.Ve
.PP
Returns a list consisting of a copy of each node given.
Text segments are simply copied; elements are cloned by
calling \f(CW\*(C`$it\->clone\*(C'\fR on each of them.
.PP
Note that this must be called as a class method, not as an instance
method. \f(CW\*(C`clone_list\*(C'\fR will croak if called as an instance method.
You can also call it like so:
.PP
.Vb 1
\& ref($h)\->clone_list(...nodes...)
.Ve
.SS "normalize_content"
.IX Subsection "normalize_content"
.Vb 1
\& $h\->normalize_content
.Ve
.PP
Normalizes the content of \f(CW$h\fR \*(-- i.e., concatenates any adjacent
text nodes. (Any undefined text segments are turned into empty-strings.)
Note that this does not recurse into \f(CW$h\fR's descendants.
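.PP
One situation where this helps, sketched here with illustrative names, is
after \f(CW\*(C`replace_with_content\*(C'\fR has left several text segments side by side:
.PP
.Vb 12
\&  use HTML::Element;
\&
\&  my $p = HTML::Element\->new_from_lol(
\&    [\*(Aqp\*(Aq, \*(AqI like \*(Aq, [\*(Aqb\*(Aq, \*(Aqpotatoes\*(Aq], \*(Aq!\*(Aq]
\&  );
\&  my $bold = ($p\->content_list)[1];
\&
\&  $bold\->replace_with_content;   # the b element gives way to its text child
\&  $p\->normalize_content;         # adjacent text segments are joined
\&
\&  print scalar($p\->content_list), "\en";   # 1
\&  print $p\->as_text, "\en";                # I like potatoes!
.Ve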
.SS "delete_ignorable_whitespace"
.IX Subsection "delete_ignorable_whitespace"
.Vb 1
\& $h\->delete_ignorable_whitespace()
.Ve
.PP
This traverses under \f(CW$h\fR and deletes any text segments that are ignorable
whitespace. You should not use this if \f(CW$h\fR is under a \f(CW\*(C`<pre>\*(C'\fR element.
.SS "insert_element"
.IX Subsection "insert_element"
.Vb 1
\& $h\->insert_element($element, $implicit);
.Ve
.PP
Inserts (via push_content) a new element under the element at
\&\f(CW\*(C`$h\->pos()\*(C'\fR. Then updates \f(CW\*(C`$h\->pos()\*(C'\fR to point to the inserted
element, unless \f(CW$element\fR is a prototypically empty element like
\&\f(CW\*(C`<br>\*(C'\fR, \f(CW\*(C`<hr>\*(C'\fR, \f(CW\*(C`<img>\*(C'\fR, etc.
The new \f(CW\*(C`$h\->pos()\*(C'\fR is returned. This
method is useful only if your particular tree task involves setting
\&\f(CW\*(C`$h\->pos()\*(C'\fR.
.SH "DUMPING METHODS"
.IX Header "DUMPING METHODS"
.SS "dump"
.IX Subsection "dump"
.Vb 2
\& $h\->dump()
\& $h\->dump(*FH) ; # or *FH{IO} or $fh_obj
.Ve
.PP
Prints the element and all its children to \s-1STDOUT \s0(or to a specified
filehandle), in a format useful
only for debugging. The structure of the document is shown by
indentation (no end tags).
.SS "as_HTML"
.IX Subsection "as_HTML"
.Vb 4
\& $s = $h\->as_HTML();
\& $s = $h\->as_HTML($entities);
\& $s = $h\->as_HTML($entities, $indent_char);
\& $s = $h\->as_HTML($entities, $indent_char, \e%optional_end_tags);
.Ve
.PP
Returns a string representing in \s-1HTML\s0 the element and its
descendants. The optional argument \f(CW$entities\fR specifies a string of
the entities to encode. For compatibility with previous versions,
specify \f(CW\*(Aq<>&\*(Aq\fR here. If omitted or undef, \fIall\fR unsafe
characters are encoded as \s-1HTML\s0 entities. See HTML::Entities for
details. If passed an empty string, no entities are encoded.
.PP
If \f(CW$indent_char\fR is specified and defined, the \s-1HTML\s0 to be output is
indented, using the string you specify (which you probably should
set to \*(L"\et\*(R", or some number of spaces, if you specify it).
.PP
If \f(CW\*(C`\e%optional_end_tags\*(C'\fR is specified and defined, it should be
a reference to a hash that holds a true value for every tag name
whose end tag is optional. Defaults to
\&\f(CW\*(C`\e%HTML::Element::optionalEndTag\*(C'\fR, which is an alias to
\&\f(CW%HTML::Tagset::optionalEndTag\fR, which, at time of writing, contains
true values for \f(CW\*(C`p, li, dt, dd\*(C'\fR. A useful value to pass is an empty
hashref, \f(CW\*(C`{}\*(C'\fR, which means that no end-tags are optional for this dump.
Otherwise, possibly consider copying \f(CW%HTML::Tagset::optionalEndTag\fR to a
hash of your own, adding or deleting values as you like, and passing
a reference to that hash.
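.PP
A sketch of a call that uses all three arguments. The element and its
content are illustrative, and the exact layout of the output is not shown
because it depends on the installed version:
.PP
.Vb 10
\&  use HTML::Element;
\&
\&  my $ul = HTML::Element\->new_from_lol(
\&    [\*(Aqul\*(Aq, [\*(Aqli\*(Aq, \*(AqPeaches & cream\*(Aq], [\*(Aqli\*(Aq, \*(AqPears\*(Aq]]
\&  );
\&
\&  # Encode only <, >, and &; indent with two spaces; and pass an empty
\&  # hashref so that no end\-tags are treated as optional.
\&  my $html = $ul\->as_HTML(\*(Aq<>&\*(Aq, \*(Aq  \*(Aq, {});
\&  print $html;
.Ve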
.SS "as_text"
.IX Subsection "as_text"
.Vb 2
\& $s = $h\->as_text();
\& $s = $h\->as_text(skip_dels => 1);
.Ve
.PP
Returns a string consisting of only the text parts of the element's
descendants. Any whitespace inside the element is included unchanged,
but whitespace not in the tree is never added. But remember that
whitespace may be ignored or compacted by HTML::TreeBuilder during
parsing (depending on the value of the \f(CW\*(C`ignore_ignorable_whitespace\*(C'\fR
and \f(CW\*(C`no_space_compacting\*(C'\fR attributes). Also, since whitespace is
never added during parsing,
.PP
.Vb 2
\&  HTML::TreeBuilder\->new_from_content("<p>a</p><p>b</p>")
\&                   \->as_text;
.Ve
.PP
returns \f(CW"ab"\fR, not \f(CW"a b"\fR or \f(CW"a\enb"\fR.
.PP
Text under \f(CW\*(C`