""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
"""
import urlparse
import urllib

__all__ = ["RobotFileParser"]


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urlparse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        opener = URLopener()
        f = opener.open(self.url)
        lines = [line.strip() for line in f]
        f.close()
        self.errcode = opener.errcode
        if self.errcode in (401, 403):
            self.disallow_all = True
        elif self.errcode >= 400 and self.errcode < 500:
            self.allow_all = True
        elif self.errcode == 200 and lines:
            self.parse(lines)

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """parse the input lines from a robots.txt file.
           We allow that a user-agent: line is not preceded by
           one or more blank lines."""
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        linenumber = 0
        entry = Entry()

        for line in lines:
            linenumber += 1
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # search for given user agent matches
        # the first match counts
        parsed_url = urlparse.urlparse(urllib.unquote(url))
        url = urlparse.urlunparse(('', '', parsed_url.path,
                                   parsed_url.params, parsed_url.query,
                                   parsed_url.fragment))
        url = urllib.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def __str__(self):
        return ''.join([str(entry) + "\n" for entry in self.entries])


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""

    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        self.path = urllib.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return (self.allowance and "Allow" or "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""

    def __init__(self):
        self.useragents = []
        self.rulelines = []

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.extend(["User-agent: ", agent, "\n"])
        for line in self.rulelines:
            ret.extend([str(line), "\n"])
        return ''.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True


class URLopener(urllib.FancyURLopener):
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
        self.errcode = 200

    def prompt_user_passwd(self, host, realm):
        # If the robots.txt file requires authentication (401/403),
        # prompting would be pointless; answer anonymously.
        return None, None

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        self.errcode = errcode
        return urllib.FancyURLopener.http_error_default(self, url, fp,
                                                        errcode, errmsg,
                                                        headers)
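The module above is the Python 2 implementation; in Python 3 the same parser ships as `urllib.robotparser` with an equivalent `parse()`/`can_fetch()` API. A minimal offline sketch of that API follows (the robots.txt rules and the `MyBot/1.0` agent string are illustrative, not from the source):

```python
from urllib.robotparser import RobotFileParser  # Python 3 successor of this module

rp = RobotFileParser()
rp.modified()  # mark the rules as "fetched now" so can_fetch() will answer
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Rules under "User-agent: *" become the default entry, consulted last.
print(rp.can_fetch("MyBot/1.0", "http://example.com/public/page.html"))    # True
print(rp.can_fetch("MyBot/1.0", "http://example.com/private/secret.html")) # False
```

Feeding lines to `parse()` directly avoids a network fetch; in normal use one would call `set_url()` and `read()` instead.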