This module contains everything you need to create XIST objects by parsing files, strings, URLs etc.

Parsing XML is done with a pipelined approach. The first step in the pipeline is a source object that provides the input for the rest of the pipeline. The next step is the XML parser. It turns the input source into an iterator over parsing events (an "event stream"). Further steps in the pipeline might resolve namespace prefixes (``NS``) and instantiate XIST classes (``Node``). The final step in the pipeline is either building an XML tree via ``tree`` or an iterative parsing step (similar to ElementTree's ``iterparse`` function) via ``itertree``.

Parsing a simple HTML string might look like this:

    >>> from ll.xist import xsc, parse
    >>> from ll.xist.ns import html
    >>> source = b"<a href='http://www.python.org/'>Python</a>"
    >>> doc = parse.tree(
    ...     parse.String(source),
    ...     parse.Expat(),
    ...     parse.NS(html),
    ...     parse.Node(pool=xsc.Pool(html)),
    ... )
    >>> doc.string()
    '<a href="http://www.python.org/">Python</a>'

A source object is an iterable object that produces the input byte string for the parser (possibly in multiple chunks), together with information about the URL of the input:

    >>> from ll.xist import parse
    >>> list(parse.String(b"<a href='http://www.python.org/'>Python</a>"))
    [('url', URL('STRING')), ('bytes', b"<a href='http://www.python.org/'>Python</a>")]

All subsequent objects in the pipeline are callable objects that get the input iterator as an argument and return an iterator over events themselves. The following code shows an example of an event stream:

    >>> from ll.xist import parse
    >>> source = b"<a href='http://www.python.org/'>Python</a>"
    >>> list(parse.events(parse.String(source), parse.Expat()))
    [('url', URL('STRING')),
     ('position', (0, 0)),
     ('enterstarttag', 'a'),
     ('enterattr', 'href'),
     ('text', 'http://www.python.org/'),
     ('leaveattr', 'href'),
     ('leavestarttag', 'a'),
     ('position', (0, 39)),
     ('text', 'Python'),
     ('endtag', 'a')]

An event is a tuple consisting of the event type and the event data. Different stages in the pipeline produce different event types.
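To make the pipeline idea concrete without depending on XIST itself, here is a minimal sketch built only on the standard library's ``xml.parsers.expat``. The names ``string_source``, ``toy_parser`` and ``toy_events`` are hypothetical stand-ins for a source object, a parser stage and the ``events`` consumer; they only imitate the ``(event type, event data)`` tuple protocol described above and are not how XIST implements it.

```python
import xml.parsers.expat


def string_source(data, url="STRING"):
    """Hypothetical source: one "url" event followed by one "bytes" event."""
    yield ("url", url)
    yield ("bytes", data)


def toy_parser(input):
    """Hypothetical parser stage: turns "bytes" events into tag/attr/text events."""
    parser = xml.parsers.expat.ParserCreate()
    out = []  # events collected by the expat callbacks

    def start(name, attrs):
        out.append(("enterstarttag", name))
        for (attrname, attrvalue) in attrs.items():
            out.append(("enterattr", attrname))
            out.append(("text", attrvalue))
            out.append(("leaveattr", attrname))
        out.append(("leavestarttag", name))

    parser.StartElementHandler = start
    parser.EndElementHandler = lambda name: out.append(("endtag", name))
    parser.CharacterDataHandler = lambda text: out.append(("text", text))

    for (event, data) in input:
        if event == "bytes":
            parser.Parse(data, False)  # feed one chunk of input
        else:
            out.append((event, data))  # pass other events through unchanged
        yield from out
        out.clear()
    parser.Parse(b"", True)  # signal the end of the input
    yield from out


def toy_events(*pipeline):
    """Hypothetical consumer: chain the source through every later stage."""
    stream = pipeline[0]
    for stage in pipeline[1:]:
        stream = stage(stream)
    return stream


result = list(toy_events(
    string_source(b"<a href='http://www.python.org/'>Python</a>"),
    toy_parser,
))
for event in result:
    print(event)
```

Each stage after the source is just a callable that takes an iterator and returns an iterator, so adding a further stage means passing one more callable to ``toy_events``.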
The following event types can be produced by source objects:

``"url"``
    The event data is the URL of the source. Usually such an event is produced only once at the start of the event stream. For sources that have no natural URL (like strings or streams) the URL can be specified when creating the source object.

``"bytes"``
    This event is produced by source objects (and ``Transcoder`` objects). The event data is a byte string.

``"str"``
    The event data is a string. This event is produced by ``Decoder`` or source objects. Note that the only predefined pipeline objects that can handle ``"str"`` events are ``Encoder`` objects, i.e. normally a parser handles ``"bytes"`` events, but not ``"str"`` events.

The following types of events are produced by parsers (in addition to the ``"url"`` event from above):

``"position"``
    The event data is a tuple containing the line and column number in the source (both starting with 0). All the following events should use this position information until the next position event.

``"xmldecl"``
    The XML declaration. The event data is a dictionary containing the keys ``"version"``, ``"encoding"`` and ``"standalone"``. Parsers may omit this event.

``"begindoctype"``
    The beginning of the doctype. The event data is a dictionary containing the keys ``"name"``, ``"publicid"`` and ``"systemid"``. Parsers may omit this event.

``"enddoctype"``
    The end of the doctype. The event data is ``None``. (If there is no internal subset, the ``"enddoctype"`` event immediately follows the ``"begindoctype"`` event.) Parsers may omit this event.

``"comment"``
    A comment. The event data is the content of the comment.

``"text"``
    Text data. The event data is the text content. Parsers should try to avoid outputting multiple text events in sequence.

``"cdata"``
    A CDATA section. The event data is the content of the CDATA section. Parsers may report CDATA sections as ``"text"`` events instead of ``"cdata"`` events.

``"enterstarttag"``
    The beginning of an element start tag. The event data is the element name.

``"leavestarttag"``
    The end of an element start tag. The event data is the element name. The parser will output events for the attributes between the ``"enterstarttag"`` and the ``"leavestarttag"`` event.

``"enterattr"``
    The beginning of an attribute. The event data is the attribute name.

``"leaveattr"``
    The end of an attribute. The event data is the attribute name. The parser will output events for the attribute value between the ``"enterattr"`` and the ``"leaveattr"`` event. (In almost all cases this is one text event.)

``"endtag"``
    An element end tag. The event data is the element name.

``"procinst"``
    A processing instruction. The event data is a tuple consisting of the processing instruction target and the data.

``"entity"``
    An entity reference. The event data is the entity name.

The following events are produced for elements and attributes in namespace mode (instead of those without the ``ns`` suffix). They are produced by ``NS`` objects or by ``Expat`` objects when the ``ns`` argument is true (i.e. the expat parser performs the namespace resolution):

``"enterstarttagns"``
    The beginning of an element start tag in namespace mode. The event data is a (namespace name, element name) tuple.

``"leavestarttagns"``
    The end of an element start tag in namespace mode. The event data is a (namespace name, element name) tuple.

``"enterattrns"``
    The beginning of an attribute in namespace mode. The event data is a (namespace name, attribute name) tuple.

``"leaveattrns"``
    The end of an attribute in namespace mode. The event data is a (namespace name, attribute name) tuple.

``"endtagns"``
    An element end tag in namespace mode. The event data is a (namespace name, element name) tuple.

Once XIST nodes have been instantiated (by ``Node`` objects) the following events are used:

``"xmldeclnode"``
    The XML declaration. The event data is an instance of ``ll.xist.xml.XML``.

``"doctypenode"``
    The doctype. The event data is an instance of ``ll.xist.xsc.DocType``.

``"commentnode"``
    A comment. The event data is an instance of ``ll.xist.xsc.Comment``.

``"textnode"``
    Text data. The event data is an instance of ``ll.xist.xsc.Text``.

``"enterelementnode"``
    The beginning of an element. The event data is an instance of ``ll.xist.xsc.Element`` (or one of its subclasses). The attributes of the element object are set, but the element has no content yet.

``"leaveelementnode"``
    The end of an element. The event data is an instance of ``ll.xist.xsc.Element``.

``"procinstnode"``
    A processing instruction. The event data is an instance of ``ll.xist.xsc.ProcInst``.

``"entitynode"``
    An entity reference. The event data is an instance of ``ll.xist.xsc.Entity``.

For consuming event streams there are three functions:

``events``
    This generator simply outputs the events.

``tree``
    This function builds an XML tree from the events and returns it.

``itertree``
    This generator builds a tree like ``tree``, but returns events during certain steps in the parsing process.

=======
Example
=======

The following example shows a custom generator in the pipeline that lowercases all element and attribute names:

    from ll.xist import xsc, parse
    from ll.xist.ns import html

    def lowertag(input):
        for (event, data) in input:
            if event in {"enterstarttag", "leavestarttag", "endtag", "enterattr", "leaveattr"}:
                data = data.lower()
            yield (event, data)

    e = parse.tree(
        parse.String(b"gurk"),
        parse.Expat(),
        lowertag,
        parse.NS(html),
        parse.Node(pool=xsc.Pool(html))
    )

    print(e.string())

This script outputs:

    gurk

==================================================
class ``UnknownEventError``(``TypeError``):
==================================================

This exception is raised when a pipeline object doesn't know how to handle an event.
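As an illustration of when such an error might occur, the following sketch shows a pipeline stage that dispatches each event to a method named after the event type and raises when no such method exists. ``UnknownEvent`` and ``UpperText`` are hypothetical names invented for this sketch; only the ``(pipe, event)`` constructor arguments and the ``TypeError`` base class are taken from the documentation of ``UnknownEventError``, and the dispatch scheme is an assumption, not XIST's confirmed implementation.

```python
class UnknownEvent(TypeError):
    """Hypothetical stand-in for an "unknown event" error."""

    def __init__(self, pipe, event):
        self.pipe = pipe
        self.event = event

    def __str__(self):
        return f"{self.pipe!r} can't handle event type {self.event[0]!r}"


class UpperText:
    """Hypothetical stage that only understands "url" and "text" events."""

    def __call__(self, input):
        for (evtype, data) in input:
            # Look up a handler method named after the event type.
            handler = getattr(self, f"handle_{evtype}", None)
            if handler is None:
                raise UnknownEvent(self, (evtype, data))
            yield handler(data)

    def handle_url(self, data):
        return ("url", data)

    def handle_text(self, data):
        return ("text", data.upper())


stage = UpperText()
print(list(stage([("url", "STRING"), ("text", "gurk")])))

try:
    list(stage([("bogus", None)]))
except UnknownEvent as exc:
    print("error:", exc)
```

Deriving from ``TypeError`` means callers that already catch ``TypeError`` for malformed input also see these failures.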
def ``__init__``(``self``, ``pipe``, ``event``):
--------------------------------------------------

def ``__str__``(``self``):
------------------------------

==============================
class ``String``(``object``):
==============================

Provides parser input from a string.

def ``__init__``(``self``, ``data``, ``url``=``None``):
------------------------------------------------------------

Create a ``String`` object. ``data`` must be a ``bytes`` or ``str`` object. ``url`` specifies the URL for the source (defaulting to ``"STRING"``).

def ``__iter__``(``self``):
------------------------------

Produces an event stream of one ``"url"`` event and one ``"bytes"`` or ``"str"`` event for the data.

==============================
class ``Iter``(``object``):
==============================

Provides parser input from an iterator over strings.

def ``__init__``(``self``, ``iterable``, ``url``=``None``):
----------------------------------------------------------------------

Create an ``Iter`` object. ``iterable`` must be an iterable object producing ``bytes`` or ``str`` objects. ``url`` specifies the URL for the source (defaulting to ``"ITER"``).

def ``__iter__``(``self``):
------------------------------

Produces an event stream of one ``"url"`` event followed by the ``"bytes"``/``"str"`` events for the data from the iterable.

==============================
class ``Stream``(``object``):
==============================

Provides parser input from a stream (i.e. an object that provides a ``read`` method).

def ``__init__``(``self``, ``stream``, ``url``=``None``, ``bufsize``=``8192``):
------------------------------------------------------------------------------------------

Create a ``Stream`` object. ``stream`` must have a ``read`` method (with a ``size`` argument). ``url`` specifies the URL for the source (defaulting to ``"STREAM"``). ``bufsize`` specifies the chunk size for reads from the stream.
def ``__iter__``(``self``):
------------------------------

Produces an event stream of one ``"url"`` event followed by the ``"bytes"``/``"str"`` events for the data from the stream.

==============================
class ``File``(``object``):
==============================

Provides parser input from a file.

def ``__init__``(``self``, ``filename``, ``bufsize``=``8192``):
----------------------------------------------------------------------

Create a ``File`` object. ``filename`` is the name of the file and may start with ``~`` or ``~user`` for the home directory of the current or the specified user. ``bufsize`` specifies the chunksize for reads from the file.

def ``__iter__``(``self``):
------------------------------

Produces an event stream of one ``"url"`` event followed by the ``"bytes"`` events for the data from the file.

==============================
class ``URL``(``object``):
==============================

Provides parser input from a URL.

def ``__init__``(``self``, ``name``, ``bufsize``=``8192``, *``args``, **``kwargs``):
------------------------------------------------------------------------------------------

Create a ``URL`` object. ``name`` is the URL. ``bufsize`` specifies the chunksize for reads from the URL. ``args`` and ``kwargs`` will be passed on to the ``open`` method of the URL object.

The URL for the input will be the final URL for the resource (i.e. it will include redirects).

def ``__iter__``(``self``):
------------------------------

Produces an event stream of one ``"url"`` event followed by the ``"bytes"`` events for the data from the URL.

==============================
class ``ETree``(``object``):
==============================

Produces a (namespaced) event stream from an object that supports the ElementTree API.

def ``__init__``(``self``, ``data``, ``url``=``None``, ``defaultxmlns``=``None``):
------------------------------------------------------------------------------------------

Create an ``ETree`` object.
Arguments have the following meaning:

``data``
    An object that supports the ElementTree API.

``url``
    The URL of the source. Defaults to ``"ETREE"``.

``defaultxmlns``
    The namespace name (or a namespace module containing a namespace name) that will be used for all elements that don't have a namespace.

def ``_asxist``(``self``, ``node``):
--------------------------------------------------

def ``__iter__``(``self``):
------------------------------

Produces an event stream of namespaced parsing events for the ElementTree object passed as ``data`` to the constructor.

========================================
class ``Decoder``(``object``):
========================================

Decode the ``bytes`` objects produced by the previous object in the pipeline to ``str`` objects.

This input object can be a source object or any other pipeline object that produces ``bytes`` objects.

def ``__init__``(``self``, ``encoding``=``None``):
------------------------------------------------------------

Create a ``Decoder`` object. ``encoding`` is the encoding of the input. If ``encoding`` is ``None`` it will be automatically detected from the XML data.

def ``__call__``(``self``, ``input``):
--------------------------------------------------

def ``__repr__``(``self``):
------------------------------

========================================
class ``Encoder``(``object``):
========================================

Encode the ``str`` objects produced by the previous object in the pipeline to ``bytes`` objects.

This input object must be a pipeline object that produces string output (e.g. a ``Decoder`` object).

This can, for example, be used to parse a ``str`` object instead of a ``bytes`` object like this:

    >>> from ll.xist import xsc, parse
    >>> from ll.xist.ns import html
    >>> source = "<a href='http://www.python.org/'>Python</a>"
    >>> doc = parse.tree(
    ...     parse.String(source),
    ...     parse.Encoder(encoding="utf-8"),
    ...     parse.Expat(encoding="utf-8"),
    ...     parse.NS(html),
    ...     parse.Node(pool=xsc.Pool(html)),
    ... )
    >>> doc.string()
    '<a href="http://www.python.org/">Python</a>'

def ``__init__``(``self``, ``encoding``=``None``):
------------------------------------------------------------

Create an ``Encoder`` object. ``encoding`` will be the encoding of the output. If ``encoding`` is ``None`` it will be automatically detected from the XML declaration in the data.

def ``__call__``(``self``, ``input``):
--------------------------------------------------

def ``__repr__``(``self``):
------------------------------

========================================
class ``Transcoder``(``object``):
========================================

Transcode the ``bytes`` objects of the input object into another encoding.

This input object can be a source object or any other pipeline object that produces ``bytes`` events.

def ``__init__``(``self``, ``fromencoding``=``None``, ``toencoding``=``None``):
------------------------------------------------------------------------------------------

Create a ``Transcoder`` object. ``fromencoding`` is the encoding of the input. ``toencoding`` is the encoding of the output. If either of them is ``None``, the encoding will be detected from the data.

def ``__call__``(``self``, ``input``):
--------------------------------------------------

def ``__repr__``(``self``):
------------------------------

==============================
class ``Parser``(``object``):
==============================

Basic parser interface.

==============================
class ``Expat``(``Parser``):
==============================

A parser using Python's built-in ``expat`` parser.

def ``__init__``(``self``, ``encoding``=``None``, ``xmldecl``=``False``, ``doctype``=``False``, ``loc``=``True``, ``cdata``=``False``, ``ns``=``False``):
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Create an ``Expat`` parser. Arguments have the following meaning:

``encoding`` (string or ``None``)
    Forces the parser to use the specified encoding. The default ``None`` results in the encoding being detected from the XML itself.

``xmldecl`` (bool)
    Should the parser produce events for the XML declaration?

``doctype`` (bool)
    Should the parser produce events for the document type?

``loc`` (bool)
    Should the parser produce ``"position"`` events?

``cdata`` (bool)
    Should the parser output CDATA sections as ``"cdata"`` events? (If ``cdata`` is false, ``"text"`` events are output instead.)

``ns`` (bool)
    If ``ns`` is true, the parser performs namespace processing itself, i.e. it will emit ``"enterstarttagns"``, ``"leavestarttagns"``, ``"endtagns"``, ``"enterattrns"`` and ``"leaveattrns"`` events instead of ``"enterstarttag"``, ``"leavestarttag"``, ``"endtag"``, ``"enterattr"`` and ``"leaveattr"`` events.

def ``__repr__``(``self``):
------------------------------

def ``__call__``(``self``, ``input``):
--------------------------------------------------

Return an iterator over the events produced by ``input``.

def ``_event``(``self``, ``evtype``, ``evdata``):
------------------------------------------------------------

def ``_flush``(``self``, ``force``):
--------------------------------------------------

def ``_getname``(``self``, ``name``):
--------------------------------------------------

def ``_handle_startcdata``(``self``):
--------------------------------------------------

def ``_handle_endcdata``(``self``):
--------------------------------------------------

def ``_handle_xmldecl``(``self``, ``version``, ``encoding``, ``standalone``):
------------------------------------------------------------------------------------------

def ``_handle_begindoctype``(``self``, ``doctypename``, ``systemid``, ``publicid``, ``has_internal_subset``):
------------------------------------------------------------------------------------------------------------------------

def ``_handle_enddoctype``(``self``):
--------------------------------------------------

def ``_handle_default``(``self``, ``data``):
--------------------------------------------------

def ``_handle_comment``(``self``, ``data``):
--------------------------------------------------

def ``_handle_text``(``self``, ``data``):
--------------------------------------------------

def ``_handle_startelement``(``self``, ``name``, ``attrs``):
----------------------------------------------------------------------

def ``_handle_endelement``(``self``, ``name``):
------------------------------------------------------------

def ``_handle_procinst``(``self``, ``target``, ``data``):
----------------------------------------------------------------------

==============================
class ``SGMLOP``(``Parser``):
==============================

A parser based on ``sgmlop``.

def ``__init__``(``self``, ``encoding``=``None``, ``cdata``=``False``):
--------------------------------------------------------------------------------

Create an ``SGMLOP`` parser. Arguments have the following meaning:

``encoding`` (string or ``None``)
    Forces the parser to use the specified encoding. The default ``None`` results in the encoding being detected from the XML itself.

``cdata`` (bool)
    Should the parser output CDATA sections as ``"cdata"`` events? (If ``cdata`` is false, ``"text"`` events are output instead.)

def ``__repr__``(``self``):
------------------------------

def ``__call__``(``self``, ``input``):
--------------------------------------------------

Return an iterator over the events produced by ``input``.
def ``_event``(``self``, ``evtype``, ``evdata``):
------------------------------------------------------------

def ``_flush``(``self``, ``force``):
--------------------------------------------------

def ``handle_comment``(``self``, ``data``):
--------------------------------------------------

def ``handle_data``(``self``, ``data``):
--------------------------------------------------

def ``handle_cdata``(``self``, ``data``):
--------------------------------------------------

def ``handle_proc``(``self``, ``target``, ``data``):
------------------------------------------------------------

def ``handle_entityref``(``self``, ``name``):
--------------------------------------------------

def ``handle_enterstarttag``(``self``, ``name``):
------------------------------------------------------------

def ``handle_leavestarttag``(``self``, ``name``):
------------------------------------------------------------

def ``handle_enterattr``(``self``, ``name``):
--------------------------------------------------

def ``handle_leaveattr``(``self``, ``name``):
--------------------------------------------------

def ``handle_endtag``(``self``, ``name``):
--------------------------------------------------

==============================
class ``NS``(``object``):
==============================

An ``NS`` object is used in a parsing pipeline to add support for XML namespaces. It replaces the ``"enterstarttag"``, ``"leavestarttag"``, ``"endtag"``, ``"enterattr"`` and ``"leaveattr"`` events with the appropriate namespace version of the events (i.e. ``"enterstarttagns"`` etc.) where the event data is a ``(namespace, name)`` tuple.

The output of an ``NS`` object in the stream looks like this:

    >>> from ll.xist import parse
    >>> from ll.xist.ns import html
    >>> list(parse.events(
    ...     parse.String(b"<a href='http://www.python.org/'>Python</a>"),
    ...     parse.Expat(),
    ...     parse.NS(html)
    ... ))
    [('url', URL('STRING')),
     ('position', (0, 0)),
     ('enterstarttagns', ('http://www.w3.org/1999/xhtml', 'a')),
     ('enterattrns', (None, 'href')),
     ('text', 'http://www.python.org/'),
     ('leaveattrns', (None, 'href')),
     ('leavestarttagns', ('http://www.w3.org/1999/xhtml', 'a')),
     ('position', (0, 39)),
     ('text', 'Python'),
     ('endtagns', ('http://www.w3.org/1999/xhtml', 'a'))]

def ``__init__``(``self``, ``prefixes``=``None``, **``kwargs``):
--------------------------------------------------------------------------------

Create an ``NS`` object. ``prefixes`` (if not ``None``) can be a namespace name (or module), which will be used for the empty prefix, or a dictionary that maps prefixes to namespace names (or modules). ``kwargs`` maps prefixes to namespace names too. If a prefix is in both ``prefixes`` and ``kwargs``, ``kwargs`` wins.

def ``__call__``(``self``, ``input``):
--------------------------------------------------

def ``url``(``self``, ``data``):
--------------------------------------------------

def ``xmldecl``(``self``, ``data``):
--------------------------------------------------

def ``begindoctype``(``self``, ``data``):
--------------------------------------------------

def ``enddoctype``(``self``, ``data``):
--------------------------------------------------

def ``comment``(``self``, ``data``):
--------------------------------------------------

def ``text``(``self``, ``data``):
--------------------------------------------------

def ``cdata``(``self``, ``data``):
--------------------------------------------------

def ``procinst``(``self``, ``data``):
--------------------------------------------------

def ``entity``(``self``, ``data``):
--------------------------------------------------

def ``position``(``self``, ``data``):
--------------------------------------------------

def ``enterstarttag``(``self``, ``data``):
--------------------------------------------------

def ``enterattr``(``self``, ``data``):
--------------------------------------------------

def ``leaveattr``(``self``, ``data``):
--------------------------------------------------

def ``leavestarttag``(``self``, ``data``):
--------------------------------------------------
def ``endtag``(``self``, ``data``):
--------------------------------------------------

==============================
class ``Node``(``object``):
==============================

A ``Node`` object is used in a parsing pipeline to instantiate XIST nodes. It consumes a namespaced event stream:

    >>> from ll.xist import xsc, parse
    >>> from ll.xist.ns import html
    >>> list(parse.events(
    ...     parse.String(b"<a href='http://www.python.org/'>Python</a>"),
    ...     parse.Expat(),
    ...     parse.NS(html),
    ...     parse.Node(pool=xsc.Pool(html))
    ... ))
    [('enterelementnode', ),
     ('textnode', ),
     ('leaveelementnode', )]

The event data of all events are XIST nodes. The element node from the ``"enterelementnode"`` event already has all attributes set. There will be no events for attributes.

def ``__init__``(``self``, ``pool``=``None``, ``base``=``None``, ``loc``=``True``):
------------------------------------------------------------------------------------------

Create a ``Node`` object.

``pool`` may be ``None`` or an ``xsc.Pool`` object and specifies which classes are used for creating element, entity and processing instruction instances.

``base`` specifies the base URL for interpreting relative links in the input.

``loc`` specifies whether location information should be attached to the nodes that get generated (the ``startloc`` attribute, and the ``endloc`` attribute for elements).

property base:
--------------

def ``__get__``(``self``):
""""""""""""""""""""""""""""""

def ``__call__``(``self``, ``input``):
--------------------------------------------------

def ``url``(``self``, ``data``):
--------------------------------------------------

def ``xmldecl``(``self``, ``data``):
--------------------------------------------------

def ``begindoctype``(``self``, ``data``):
--------------------------------------------------

def ``enddoctype``(``self``, ``data``):
--------------------------------------------------

def ``entity``(``self``, ``data``):
--------------------------------------------------

def ``comment``(``self``, ``data``):
--------------------------------------------------

def ``cdata``(``self``, ``data``):
--------------------------------------------------

def ``text``(``self``, ``data``):
--------------------------------------------------

def ``enterstarttagns``(``self``, ``data``):
--------------------------------------------------

def ``enterattrns``(``self``, ``data``):
--------------------------------------------------

def ``leaveattrns``(``self``, ``data``):
--------------------------------------------------

def ``leavestarttagns``(``self``, ``data``):
--------------------------------------------------

def ``endtagns``(``self``, ``data``):
--------------------------------------------------

def ``procinst``(``self``, ``data``):
--------------------------------------------------

def ``position``(``self``, ``data``):
--------------------------------------------------

==============================
class ``Tidy``(``object``):
==============================

A ``Tidy`` object parses (potentially ill-formed) HTML from a source into a (non-namespaced) event stream by using lxml's HTML parser:

    >>> from ll.xist import parse
    >>> list(parse.events(parse.URL("http://www.yahoo.com/"), parse.Tidy()))
    [('url', URL('http://de.yahoo.com/?p=us')), ('enterstarttag', 'html'), ('enterattr', 'class'), ('text', 'y-fp-bg y-fp-pg-grad bkt708'), ('leaveattr',
    'class'), ('enterattr', 'lang'), ('text', 'de-DE'), ('leaveattr', 'lang'), ('enterattr', 'style'), ('leaveattr', 'style'), ('leavestarttag', 'html'), ...

def ``__init__``(``self``, ``encoding``=``None``, ``xmldecl``=``False``, ``doctype``=``False``):
--------------------------------------------------------------------------------------------------------------

Create a new ``Tidy`` object. Parameters have the following meaning:

``encoding`` (string or ``None``)
    The encoding of the input. If ``encoding`` is ``None`` it will be automatically detected by the HTML parser.

``xmldecl`` (bool)
    Should the parser produce events for the XML declaration?

``doctype`` (bool)
    Should the parser produce events for the document type?

def ``__repr__``(``self``):
------------------------------

def ``_asxist``(``self``, ``node``):
--------------------------------------------------

def ``__call__``(``self``, ``input``):
--------------------------------------------------

========================================
def ``events``(*``pipeline``):
========================================

Return an iterator over the events produced by the pipeline objects in ``pipeline``.

============================================================
def ``tree``(*``pipeline``, ``validate``=``False``):
============================================================

Return a tree of XIST nodes from the event stream ``pipeline``.

``pipeline`` must output only events that contain XIST nodes, i.e. the event types ``"xmldeclnode"``, ``"doctypenode"``, ``"commentnode"``, ``"textnode"``, ``"enterelementnode"``, ``"leaveelementnode"``, ``"procinstnode"`` and ``"entitynode"``.

If ``validate`` is true, the tree is validated, i.e. it is checked whether the structure of the tree is valid (according to the ``model`` attribute of each element node), whether any undeclared elements or attributes have been encountered, whether all required attributes are specified and whether all attributes have allowed values.

The node returned from ``tree`` will always be a ``Frag`` object.
Example: >>> from ll.xist import xsc, parse >>> from ll.xist.ns import xml, html, chars >>> doc = parse.tree( ... parse.URL("http://www.python.org/"), ... parse.Tidy(), ... parse.NS(html), ... parse.Node(pool=xsc.Pool(xml, html, chars)) ... ) >>> doc[0] ====================================================================================================================================================================================================================================================================================== def ``itertree``​(*``pipeline``, ``entercontent``=``True``, ``enterattrs``=``False``, ``enterattr``=``False``, ``enterelementnode``=``False``, ``leaveelementnode``=``True``, ``enterattrnode``=``True``, ``leaveattrnode``=``False``, ``selector``=``None``, ``validate``=``False``): ====================================================================================================================================================================================================================================================================================== Parse the event stream ``pipeline`` iteratively. ``itertree`` still builds a tree, but it returns an iterator of ``xsc.Cursor`` objects that tracks changes to the tree as it is built. ``validate`` specifies whether each node should be validated after it has been fully parsed. The rest of the arguments can be used to control when ``itertree`` returns to the calling code. For an explanation of their meaning see the class ``xsc.Cursor``. Example: >>> from ll.xist import xsc, parse >>> from ll.xist.ns import xml, html, chars >>> for c in parse.itertree( ... parse.URL("http://www.python.org/"), ... parse.Tidy(), ... parse.NS(html), ... parse.Node(pool=xsc.Pool(xml, html, chars)), ... selector=html.a/html.img ... ): ... print(c.path[-1].attrs.src, "-->", c.path[-2].attrs.href) https://www.python.org/static/img/python-logo.png --> https://www.python.org/
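Since the module overview compares ``itertree`` to ElementTree's ``iterparse``, the following sketch shows the same iterative style with the standard library only. It is not XIST code: the sample document and the names ``data``, ``path`` and ``pairs`` are made up for illustration, and the manual ancestor stack merely approximates what a selector like ``html.a/html.img`` expresses (an ``img`` that is a direct child of an ``a``).

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample input: one image inside a link and one outside of it.
data = (
    b"<p>"
    b"<a href='https://www.python.org/'><img src='logo.png'/></a>"
    b"<img src='other.png'/>"
    b"</p>"
)

path = []   # stack of currently open elements (the ancestors)
pairs = []  # (src, href) for every img that is a direct child of an a

for (event, element) in ET.iterparse(io.BytesIO(data), events=("start", "end")):
    if event == "start":
        path.append(element)
    else:
        path.pop()  # remove the element that just ended
        if element.tag == "img" and path and path[-1].tag == "a":
            pairs.append((element.get("src"), path[-1].get("href")))

for (src, href) in pairs:
    print(src, "-->", href)
```

As with ``itertree``, the tree is still built while iterating, so the calling code can inspect parent elements at the moment a matching node is completed.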