Convert XML to JSON in Python

This post is lesson 46 of 54 in the subject Python Programming Language

Two common data formats today are XML and JSON. In Python, we can convert between these two data formats. This article will use the xmltodict module to convert XML to JSON in Python.


1. Installing the xmltodict module in Python

The xmltodict module parses XML data into a Python dictionary, which can then be written out as JSON. This module is not part of the Python standard library, so we need to install it with the following command:

pip install xmltodict

To learn more about this library, you can visit the project’s page for xmltodict 0.12.0.

You can read the article Installing Python and programming environment with Visual Studio Code to know how to install Python libraries in Visual Studio Code.
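A quick way to confirm that the installation worked is to import the module and print the installed version. This is only a minimal sketch: it assumes Python 3.8 or later, where importlib.metadata is part of the standard library, and the variable name installed_version is just for illustration.

```python
from importlib.metadata import version

import xmltodict  # raises ImportError if the install did not succeed

# Look up the version of the installed xmltodict distribution.
installed_version = version("xmltodict")
print("xmltodict version:", installed_version)
```

If the import fails, run the pip command again in the same Python environment that runs the script.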

2. Converting XML data to JSON data in Python

We have a few steps to convert XML to JSON in Python:

Step 1. Open and read the data in the XML file to get the XML string.

Step 2. Convert the XML string to a dictionary in Python with the xmltodict.parse() function.

Step 3. Convert the dictionary to a JSON string in Python.

Suppose we have a file info.xml with the content below.

<website>
	<domainname>gochocit.com</domainname>
	<active>True</active>
	<numberposts>360</numberposts>
	<category>
		<item>hardware</item>
		<item>software</item>
		<item>network</item>
	</category>
	<facebookpage>
		https://www.facebook.com/gochocit/
	</facebookpage>
	<build>
		<language>php</language>
		<cms>wordpress</cms>
		<database>mysql</database>
	</build>
</website>

The file info.xml above contains only tags, without attributes. The code below converts the XML data in the info.xml file to JSON.

import json

import xmltodict

# read the XML file into a string
with open("info.xml", encoding="utf-8") as file:
    data_xml = file.read()
# convert xml string to dictionary
data_dict = xmltodict.parse(data_xml)
# convert dictionary to json string
data_json = json.dumps(data_dict, indent=4)
# print the type of each object, then the json string
print("type of data_xml:", type(data_xml))
print("type of data_dict:", type(data_dict))
print("type of data_json:", type(data_json))
print(data_json)

Result

type of data_xml: <class 'str'>
type of data_dict: <class 'collections.OrderedDict'>
type of data_json: <class 'str'>
{
    "website": {
        "domainname": "gochocit.com",
        "active": "True",
        "numberposts": "360",
        "category": {
            "item": [
                "hardware",
                "software",
                "network"
            ]
        },
        "facebookpage": "https://www.facebook.com/gochocit/",
        "build": {
            "language": "php",
            "cms": "wordpress",
            "database": "mysql"
        }
    }
}

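As we can see, each tag name in XML is converted to a “key” and the text of the tag is converted to its “value” in JSON. Repeated tags, such as item, are collected into a list, and every value is a string, including “True” and “360”. Depending on the installed xmltodict version, the type of data_dict may be reported as a plain dict rather than collections.OrderedDict; the JSON output is the same.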
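In the result above, "True" and "360" come out as JSON strings rather than a boolean and a number, because XML text carries no type information. If typed values are needed, one option is to walk the dictionary after parsing and convert them. The sketch below is only an illustration: the helper convert_values is a hypothetical name, the conversion rules (exact "True"/"False" and all-digit strings) are assumptions about this particular data, and the dictionary is a shortened copy of the parsed result so the example runs on its own.

```python
import json

# Shortened copy of the dictionary produced from info.xml above.
data_dict = {
    "website": {
        "domainname": "gochocit.com",
        "active": "True",
        "numberposts": "360",
        "category": {"item": ["hardware", "software", "network"]},
    }
}


def convert_values(node):
    """Recursively turn "True"/"False" into bool and digit-only strings into int."""
    if isinstance(node, dict):
        return {key: convert_values(value) for key, value in node.items()}
    if isinstance(node, list):
        return [convert_values(value) for value in node]
    if isinstance(node, str):
        if node in ("True", "False"):
            return node == "True"
        if node.isdecimal():
            return int(node)
    # Anything else (other strings, None) is left unchanged.
    return node


converted = convert_values(data_dict)
print(json.dumps(converted, indent=4))
```

With this step, json.dumps writes true and 360 without quotes, while ordinary text such as the domain name stays a string.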

Now suppose the file info1.xml contains the same tags, but each item tag also has a post attribute, as below.

<website>
	<domainname>gochocit.com</domainname>
	<active>True</active>
	<numberposts>360</numberposts>
	<category>
		<item post="50">hardware</item>
		<item post="150">software</item>
		<item post="17">network</item>
	</category>
	<facebookpage>https://www.facebook.com/gochocit/</facebookpage>
	<build>
		<language>php</language>
		<cms>wordpress</cms>
		<database>mysql</database>
	</build>
</website>

What will attributes in XML be converted to? To find the answer, let’s look at the result of the code below, which converts the XML data in the file info1.xml to JSON.

import json

import xmltodict

# read the XML file into a string
with open("info1.xml", encoding="utf-8") as file:
    data_xml = file.read()
# convert xml string to dictionary
data_dict = xmltodict.parse(data_xml)
# convert dictionary to json string
data_json = json.dumps(data_dict, indent=4)
# print the type of each object, then the json string
print("type of data_xml:", type(data_xml))
print("type of data_dict:", type(data_dict))
print("type of data_json:", type(data_json))
print(data_json)

Result

type of data_xml: <class 'str'>
type of data_dict: <class 'collections.OrderedDict'>
type of data_json: <class 'str'>
{
    "website": {
        "domainname": "gochocit.com",
        "active": "True",
        "numberposts": "360",
        "category": {
            "item": [
                {
                    "@post": "50",
                    "#text": "hardware"
                },
                {
                    "@post": "150",
                    "#text": "software"
                },
                {
                    "@post": "17",
                    "#text": "network"
                }
            ]
        },
        "facebookpage": "https://www.facebook.com/gochocit/",
        "build": {
            "language": "php",
            "cms": "wordpress",
            "database": "mysql"
        }
    }
}

As can be seen, the post attribute of the item tag is converted to the key “@post”, and the text of the item tag is converted to the key “#text”. Each item is therefore an object with two keys instead of a plain string.
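Once the data is in this shape, the "@post" and "#text" keys can be read like any other dictionary keys. The sketch below is a small illustration rather than part of the conversion itself: it loads a shortened copy of the JSON result with json.loads and builds a mapping from category name to number of posts. The names posts_per_category and total_posts are hypothetical.

```python
import json

# Shortened copy of the JSON produced from info1.xml above.
data_json = """
{
    "website": {
        "category": {
            "item": [
                {"@post": "50", "#text": "hardware"},
                {"@post": "150", "#text": "software"},
                {"@post": "17", "#text": "network"}
            ]
        }
    }
}
"""

data = json.loads(data_json)
items = data["website"]["category"]["item"]

# "#text" holds the tag text and "@post" holds the attribute value (a string).
posts_per_category = {item["#text"]: int(item["@post"]) for item in items}
total_posts = sum(posts_per_category.values())

print(posts_per_category)
print("total posts:", total_posts)
```

The attribute values are strings in the JSON, so int() is used here before adding them up.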
