XML and JSON are two of the most common data formats today, and Python can convert between them. This article uses the xmltodict module to convert XML to JSON in Python.
Before continuing, it helps to read the following articles on XML and JSON in Python:
- Read XML file with Python
- Write XML file with Python
- Read JSON file with Python
- Write JSON file with Python
1. Installing the xmltodict module in Python
The xmltodict module parses XML data into a Python dictionary, which can then be written out as JSON. This module is not part of the Python standard library, so we need to install it first with the following command:
pip install xmltodict
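To confirm that the installation worked, we can import the module and parse a tiny document. This is only a minimal sketch; the one-element XML string is made up for illustration.

```python
import xmltodict

# A made-up one-element document, used only to check the installation.
sample = xmltodict.parse("<greeting>hello</greeting>")

# The tag name becomes the key and the tag text becomes the value.
print(sample["greeting"])
```

If the import fails with ModuleNotFoundError, the module was most likely installed for a different Python interpreter than the one running the script.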
To learn more about this library, see the project page for xmltodict 0.12.0.
You can read the article Installing Python and programming environment with Visual Studio Code to know how to install Python libraries in Visual Studio Code.
2. Converting XML data to JSON data in Python
We have a few steps to convert XML to JSON in Python:
Step 1. Open and read the data in the XML file to get the XML string.
Step 2. Convert the XML string to a dictionary in Python with the xmltodict.parse() function.
Step 3. Convert the dictionary to a JSON string in Python.
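The three steps above can be sketched as one small function. The name xml_to_json and the inline XML string are assumptions made for this illustration; in a real script, Step 1 would read the string from a file.

```python
import json
import xmltodict


def xml_to_json(xml_string, indent=4):
    """Hypothetical helper: convert an XML string to a JSON string."""
    data_dict = xmltodict.parse(xml_string)      # Step 2: XML string -> dictionary
    return json.dumps(data_dict, indent=indent)  # Step 3: dictionary -> JSON string


# Step 1 is replaced by an inline string so the sketch runs without a file.
xml_string = "<build><language>php</language><cms>wordpress</cms></build>"
result = xml_to_json(xml_string)
print(result)
```

Keeping the dictionary as an intermediate value is useful because it can be inspected or modified before it is serialized to JSON.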
Suppose we have a file info.xml with the content below.
<website>
    <domainname>gochocit.com</domainname>
    <active>True</active>
    <numberposts>360</numberposts>
    <category>
        <item>hardware</item>
        <item>software</item>
        <item>network</item>
    </category>
    <facebookpage>
        https://www.facebook.com/gochocit/
    </facebookpage>
    <build>
        <language>php</language>
        <cms>wordpress</cms>
        <database>mysql</database>
    </build>
</website>
The file info.xml above contains only tags, with no attributes. The code below converts the XML data in the info.xml file into JSON.
import json
import xmltodict

# read the XML file into a string
with open("info.xml") as file:
    data_xml = file.read()
# convert the XML string to a dictionary
data_dict = xmltodict.parse(data_xml)
# convert the dictionary to a JSON string
data_json = json.dumps(data_dict, indent=4)
# print the type of each object, then the JSON string
print("type of data_xml:", type(data_xml))
print("type of data_dict:", type(data_dict))
print("type of data_json:", type(data_json))
print(data_json)
Result
type of data_xml: <class 'str'>
type of data_dict: <class 'collections.OrderedDict'>
type of data_json: <class 'str'>
{
    "website": {
        "domainname": "gochocit.com",
        "active": "True",
        "numberposts": "360",
        "category": {
            "item": [
                "hardware",
                "software",
                "network"
            ]
        },
        "facebookpage": "https://www.facebook.com/gochocit/",
        "build": {
            "language": "php",
            "cms": "wordpress",
            "database": "mysql"
        }
    }
}
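As we can see, each XML tag name becomes a “key” in JSON and the text of the tag becomes its “value”. Repeated tags such as item are collected into a list, and the whitespace around the text of facebookpage is stripped. The collections.OrderedDict type shown above comes from xmltodict 0.12.0; newer releases may report a plain dict instead.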
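Note that XML text has no types, so numberposts comes out as the string "360" rather than the number 360. If numbers are needed in the JSON, one option is the postprocessor parameter of xmltodict.parse(), which receives each key and value as they are parsed. The sketch below is one possible approach, not the only one; the function name convert_numbers and the shortened XML string are assumptions made for illustration.

```python
import json
import xmltodict


def convert_numbers(path, key, value):
    """Hypothetical postprocessor: turn integer-like text into int."""
    try:
        return key, int(value)
    except (ValueError, TypeError):
        # Not an integer (plain text, a nested dictionary, or None): keep as is.
        return key, value


# Shortened version of info.xml, inlined so the sketch runs without a file.
xml_string = "<website><numberposts>360</numberposts><cms>wordpress</cms></website>"
data_dict = xmltodict.parse(xml_string, postprocessor=convert_numbers)
print(json.dumps(data_dict, indent=4))
```

With this callback, numberposts is serialized without quotes, while non-numeric text such as wordpress is left unchanged.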
Now suppose the file info1.xml contains the same tags, but each item tag also has a post attribute, as below.
<website>
    <domainname>gochocit.com</domainname>
    <active>True</active>
    <numberposts>360</numberposts>
    <category>
        <item post="50">hardware</item>
        <item post="150">software</item>
        <item post="17">network</item>
    </category>
    <facebookpage>https://www.facebook.com/gochocit/</facebookpage>
    <build>
        <language>php</language>
        <cms>wordpress</cms>
        <database>mysql</database>
    </build>
</website>
What are XML attributes converted to? To find the answer, let’s look at the result of the code below, which converts the XML data in the file info1.xml to JSON.
import json
import xmltodict

# read the XML file into a string
with open("info1.xml") as file:
    data_xml = file.read()
# convert the XML string to a dictionary
data_dict = xmltodict.parse(data_xml)
# convert the dictionary to a JSON string
data_json = json.dumps(data_dict, indent=4)
# print the type of each object, then the JSON string
print("type of data_xml:", type(data_xml))
print("type of data_dict:", type(data_dict))
print("type of data_json:", type(data_json))
print(data_json)
Result
type of data_xml: <class 'str'>
type of data_dict: <class 'collections.OrderedDict'>
type of data_json: <class 'str'>
{
    "website": {
        "domainname": "gochocit.com",
        "active": "True",
        "numberposts": "360",
        "category": {
            "item": [
                {
                    "@post": "50",
                    "#text": "hardware"
                },
                {
                    "@post": "150",
                    "#text": "software"
                },
                {
                    "@post": "17",
                    "#text": "network"
                }
            ]
        },
        "facebookpage": "https://www.facebook.com/gochocit/",
        "build": {
            "language": "php",
            "cms": "wordpress",
            "database": "mysql"
        }
    }
}
As can be seen, each item tag is now converted to an object instead of a plain string: the post attribute becomes the key “@post” and the text of the item tag becomes the key “#text”.
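The same “@post” and “#text” keys are used to read the values from the dictionary in Python. The sketch below assumes only the category part of info1.xml, inlined as a string; the variable name posts_by_category is made up for illustration.

```python
import xmltodict

# The category part of info1.xml, inlined so the sketch runs without a file.
xml_string = """<category>
    <item post="50">hardware</item>
    <item post="150">software</item>
    <item post="17">network</item>
</category>"""
data_dict = xmltodict.parse(xml_string)

# Each item is a dictionary: "@post" holds the attribute, "#text" holds the tag text.
posts_by_category = {
    item["#text"]: int(item["@post"])
    for item in data_dict["category"]["item"]
}
print(posts_by_category)
```

The attribute values are strings as well, so int() is applied here to get numbers.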