Shimul Chowdhury

Creating XML document in Python - without losing your mind

When working with parsing and building XML documents or working with SOAP in python, we often use xml.etree.ElementTree or xml.dom.minidom API. Those are really powerful and feature-complete APIs to work with any sort of XML Documents. But those don’t have the most intuitive API for building XML trees. For example, let’s build the following XML document -

<?xml version="1.0" encoding="UTF-8"?>
<blog>
    <author>
        <name>John Doe</name>
        <email>[email protected]</email>
    </author>
    <articles>
        <article published="true">
            <title>Article 1</title>
            <body>Article Body</body>
        </article>
        <article published="true">
            <title>Article 2</title>
            <body>Article Body</body>
        </article>
    </articles>
<blog>

The python code to build the document will look something like the following -

from xml.dom import minidom

articles = [
    {
        'title': 'Article 1',
        'body': 'Article Body',
        'published': True
    },
    {
        'title': 'Article 2',
        'body': 'Article Body',
        'published': True
    }
]

root = minidom.Document()

blogElem = root.createElement('blog')
authorElem = root.createElement('author')

nameElem = root.createElement('name')
nameElem.appendChild(root.createTextNode('John Doe'))
authorElem.appendChild(nameElem)

emailElem = root.createElement('email')
emailElem.appendChild(root.createTextNode('[email protected]'))
authorElem.appendChild(emailElem)

articlesElem = root.createElement('articles')

for article in articles:
    articleElem = root.createElement('article')
    articleElem.setAttribute('published', 'true' if article['published'] else 'false')

    titleElem = root.createElement('title')
    titleElem.appendChild(root.createTextNode(article['title']))
    articleElem.appendChild(titleElem)

    bodyElem = root.createElement('body')
    bodyElem.appendChild(root.createTextNode(article['body']))
    articleElem.appendChild(bodyElem)

    articlesElem.appendChild(articleElem)

blogElem.appendChild(authorElem)
blogElem.appendChild(articlesElem)

root.appendChild(blogElem)
print(root.toprettyxml())

The code above works, it gets its job done. But it’s very hard to understand the hierarchical relation between tags. Also, the code is very procedural. Imagine building a large complex XML document using this approach. It will be very hard for people, who wants to change something.

We can do better. Let’s take the idea from React or hyperscript and make a utility to deal with this complexity -

def element_builder(xml_doc):
    """
    Given a document returns a function to build xml elements.
    Args:
        xml_doc (xml.dom.minidom.Document)
    Returns:
        element (func)
    """

    def element(name, children, attributes=None):
        """
        An utility to help build xml tree in a managable way.
        Args:
            name (str) - tag name
            children (str|list|xml.dom.minidom.Element)
            attributes (dict)
        Returns:
            elem (xml.dom.minidom.Element)
        """

        elem = xml_doc.createElement(name)

        # set attributes if exists
        if attributes is not None and isinstance(attributes, dict):
            [elem.setAttribute(key, val) for key, val in attributes.items()]

        # set children if exists
        if children is not None:
            if isinstance(children, list) or isinstance(children, tuple):
                [elem.appendChild(c) for c in children]
            elif isinstance(children, str):
                elem.appendChild(xml_doc.createTextNode(children))
            else:
                elem.appendChild(children)

        return elem

    return element

The utility function takes an XML document and returns another function that can create elements, set properties and append child elements. All these functionalities using a simple function call.

After using this utility function the code looks like -

from xml.dom import minidom
from .utility import element_builder


articles = [
    {
        'title': 'Article 1',
        'body': 'Article Body',
        'published': True
    },
    {
        'title': 'Article 2',
        'body': 'Article Body',
        'published': True
    }
]

root = minidom.Document()
el = element_builder(root)

elem = el('blog', [
    el('author', [
        el('name', 'John Doe'),
        el('email', '[email protected]')
    ]),
    el('articles', [
        el('article', [
            el('title', article['title']),
            el('body', article['body']),
        ], {'published': 'true' if article['published'] else 'false'})
        for article in articles
    ])
])

root.appendChild(elem)
print(root.toprettyxml())

Look at the difference. It is way easier to reason about the hierarchical relation between the tags. It is the same way the XML itself defines the structure of the document.

Hope this helps you, cheers!

programmingpythonetreetips & tricks