Processing XML in Python with minidom

I recently needed to programmatically modify some Solr-related Tomcat context files, which are written in XML. Since Python is quick to script in and more comfortable than Bash for such tasks, I decided to use the xml.dom.minidom module, which is part of the Python standard library.

In the following, I describe some use cases for working with XML using minidom. I will use the following example file, /tmp/test_template.xml:

<?xml version="1.0" ?>
<Context docBase="changeme">
   <Environment name="solr/home" value="changeme"/>
   <Environment name="other_env_var" value="123"/>
</Context>

If we want to edit an existing XML file, we first need to read this file:

import xml.dom.minidom 
dom = xml.dom.minidom.parse("/tmp/test_template.xml")

Afterwards, I fetch the Context element and update its docBase attribute. I do not check whether the document contains a Context element, as I always work with the same template file:

context_element = dom.getElementsByTagName("Context")[0] # We assume there exists exactly one Context element 
context_element.setAttribute("docBase", "/srv/tomcat/webapps/solr.war") # Update of attribute docBase
print(context_element.getAttribute("docBase"))
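If the template cannot be trusted, the omitted existence check is easy to add. The following is only a minimal sketch: the helper name set_doc_base is hypothetical, and the XML is parsed from a string (same content as the example file) so that the snippet runs on its own.

```python
import xml.dom.minidom

# Same content as the example file, parsed from a string to keep the sketch self-contained
XML = """<?xml version="1.0" ?>
<Context docBase="changeme">
   <Environment name="solr/home" value="changeme"/>
   <Environment name="other_env_var" value="123"/>
</Context>"""


def set_doc_base(dom, path):
    """Hypothetical helper: set docBase on the single Context element or raise."""
    contexts = dom.getElementsByTagName("Context")
    if len(contexts) != 1:
        raise ValueError("expected exactly one Context element, found %d" % len(contexts))
    contexts[0].setAttribute("docBase", path)
    return contexts[0]


dom = xml.dom.minidom.parseString(XML)
context_element = set_doc_base(dom, "/srv/tomcat/webapps/solr.war")
print(context_element.getAttribute("docBase"))
```

Raising an exception instead of silently picking the first match makes a broken template show up immediately.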

In the previous case we assumed that the Context element we (hopefully) got back from getElementsByTagName was the right one. What if we have multiple elements and want to select a specific one? In the following example, we want to select the Environment element inside Context whose name attribute has the value solr/home. We use the built-in filter function to find the correct element. filter takes a function that maps an element to True or False, followed by an iterable of elements, and yields only those elements for which the function returns True.

env_elements = dom.getElementsByTagName("Environment")
# filter returns an iterator in Python 3, so we take the first match with next(); again, the check for existence of home_element is omitted here
home_element = next(filter(lambda e: e.getAttribute("name") == "solr/home", env_elements))
home_element.setAttribute("value", "/src/solr/")
print(home_element.getAttribute("value"))
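When it is not certain that a matching element exists, the lookup can return None instead of failing. This is a sketch under assumptions: find_environment is a hypothetical helper, and it uses a generator expression with a default for next() instead of filter. The XML is again parsed from a string so the snippet is self-contained.

```python
import xml.dom.minidom

XML = """<?xml version="1.0" ?>
<Context docBase="changeme">
   <Environment name="solr/home" value="changeme"/>
   <Environment name="other_env_var" value="123"/>
</Context>"""


def find_environment(dom, name):
    """Hypothetical helper: first Environment element with the given name, or None."""
    env_elements = dom.getElementsByTagName("Environment")
    # The second argument of next() is returned when nothing matches
    return next((e for e in env_elements if e.getAttribute("name") == name), None)


dom = xml.dom.minidom.parseString(XML)

home_element = find_environment(dom, "solr/home")
if home_element is not None:
    home_element.setAttribute("value", "/src/solr/")
    print(home_element.getAttribute("value"))

# A name that does not occur in the document yields None instead of an exception
missing = find_environment(dom, "does/not/exist")
print(missing)
```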

Finally, we need to write out the file:

# The with statement makes sure the file is flushed and closed after writing
with open("/tmp/test.xml", "w") as f:
    dom.writexml(f)

or we may inspect the generated XML code before writing it to file:

print(dom.toxml())
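To convince yourself that the written file is valid, you can read it back and check the modified attribute. The following is a rough sketch, not a definitive recipe: the shortened document and the output path in a temporary directory are assumptions made so that the snippet runs without touching /tmp/test.xml.

```python
import os
import tempfile
import xml.dom.minidom

# Shortened stand-in for the example document (assumption for this sketch)
dom = xml.dom.minidom.parseString(
    '<Context docBase="changeme">'
    '<Environment name="solr/home" value="changeme"/>'
    '</Context>'
)
dom.documentElement.setAttribute("docBase", "/srv/tomcat/webapps/solr.war")

# Hypothetical output path inside a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "test.xml")

# Pass the encoding to writexml so the XML declaration matches the file encoding
with open(out_path, "w", encoding="utf-8") as f:
    dom.writexml(f, encoding="utf-8")

# Parse the written file again and inspect the attribute we changed
reparsed = xml.dom.minidom.parse(out_path)
print(reparsed.documentElement.getAttribute("docBase"))
```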
