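"""
An interface to html5lib that mimics the lxml.html interface.
"""

from html5lib import HTMLParser as _HTMLParser
from html5lib.treebuilders.etree_lxml import TreeBuilder

from lxml import etree
from lxml.html import _contains_block_level_tag, XHTML_NAMESPACE, Element

# Python 2/3 compatibility: the types accepted as "a string".
try:
    _strings = basestring
except NameError:
    _strings = (bytes, str)

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse


class HTMLParser(_HTMLParser):
    """An html5lib HTML parser that builds lxml trees."""

    def __init__(self, strict=False, **kwargs):
        _HTMLParser.__init__(self, strict=strict, tree=TreeBuilder, **kwargs)


# html5lib may not provide an XHTMLParser; define the lxml variant only if it does.
try:
    from html5lib import XHTMLParser as _XHTMLParser
except ImportError:
    pass
else:
    class XHTMLParser(_XHTMLParser):
        """An html5lib XHTML parser that builds lxml trees."""

        def __init__(self, strict=False, **kwargs):
            _XHTMLParser.__init__(self, strict=strict, tree=TreeBuilder, **kwargs)

    xhtml_parser = XHTMLParser()


def _find_tag(tree, tag):
    """Return the first child of *tree* named *tag*, looking it up first
    without and then with the XHTML namespace; None if neither matches."""
    elem = tree.find(tag)
    if elem is not None:
        return elem
    return tree.find('{%s}%s' % (XHTML_NAMESPACE, tag))


def document_fromstring(html, guess_charset=True, parser=None):
    """Parse a whole HTML document from a string and return its root element."""
    if not isinstance(html, _strings):
        raise TypeError('string required')

    if parser is None:
        parser = html_parser

    return parser.parse(html, useChardet=guess_charset).getroot()


def fragments_fromstring(html, no_leading_text=False,
                         guess_charset=False, parser=None):
    """Parse several HTML elements, returning a list of elements.

    The first item in the list may be a string holding the text that
    precedes the first element.  If no_leading_text is true, leading text
    that is not just whitespace raises a ParserError, and leading
    whitespace is dropped, so the list contains only elements.

    If `guess_charset` is `True` and the input is a bytestring rather than
    unicode, the `chardet` library is used to guess its character set.
    """
    if not isinstance(html, _strings):
        raise TypeError('string required')

    if parser is None:
        parser = html_parser

    # Parse in the context of a <div>; the result is a list whose first
    # item may be a text string.
    children = parser.parseFragment(html, 'div', useChardet=guess_charset)
    if children and isinstance(children[0], _strings):
        if no_leading_text:
            if children[0].strip():
                raise etree.ParserError('There is leading text: %r' %
                                        children[0])
            del children[0]
    return children


def fragment_fromstring(html, create_parent=False,
                        guess_charset=False, parser=None):
    """Parse a single HTML element; it is an error if there is more than
    one element, or if anything but whitespace precedes or follows the
    element.

    If create_parent is true (or is a tag name), a parent element is
    created to wrap the parsed HTML; it is a ``div`` unless create_parent
    names another tag.  In this case, leading and trailing text is allowed.
    """
    if not isinstance(html, _strings):
        raise TypeError('string required')

    accept_leading_text = bool(create_parent)

    elements = fragments_fromstring(
        html, guess_charset=guess_charset, parser=parser,
        no_leading_text=not accept_leading_text)

    if create_parent:
        if not isinstance(create_parent, _strings):
            create_parent = 'div'
        new_root = Element(create_parent)
        if elements:
            # Leading text becomes the text of the new parent element.
            if isinstance(elements[0], _strings):
                new_root.text = elements[0]
                del elements[0]
            new_root.extend(elements)
        return new_root

    if not elements:
        raise etree.ParserError('No elements found')
    if len(elements) > 1:
        raise etree.ParserError('Multiple elements found')
    result = elements[0]
    if result.tail and result.tail.strip():
        raise etree.ParserError('Element followed by text: %r' % result.tail)
    result.tail = None
    return result


def fromstring(html, guess_charset=True, parser=None):
    """Parse the html, returning a single element or a document.

    This tries to minimally parse the chunk of text without knowing in
    advance whether it is a fragment or a full document.  Input that starts
    with ``<html`` or a doctype, or that yields a non-empty ``<head>``, is
    returned as the document root.  A body holding exactly one element and
    no surrounding text returns that element.  Otherwise the body itself is
    returned, renamed to ``div`` if it contains block-level tags and to
    ``span`` if it does not.
    """
    if not isinstance(html, _strings):
        raise TypeError('string required')
    doc = document_fromstring(html, parser=parser,
                              guess_charset=guess_charset)

    # Input that starts with <html> or a doctype is a full document.
    start = html[:50].lstrip().lower()
    if start.startswith('<html') or start.startswith('<!doctype'):
        return doc

    # A non-empty head also means a full document.
    head = _find_tag(doc, 'head')
    if len(head):
        return doc

    body = _find_tag(doc, 'body')

    # A body with exactly one element and no surrounding text means a
    # single element was passed in.
    if (len(body) == 1 and (not body.text or not body.text.strip())
            and (not body[-1].tail or not body[-1].tail.strip())):
        return body[0]

    # Otherwise use the body as a container, renamed because <body>
    # implies too much document structure.
    if _contains_block_level_tag(body):
        body.tag = 'div'
    else:
        body.tag = 'span'
    return body


# Default parser used when no parser argument is given.
html_parser = HTMLParser()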