Serialization of Data

When two computers communicate, they send their data as a sequence of bits or bytes, which in turn are transformed into electrical or optical signals that then travel via cables or fibers. But how do we get such a sequence of bits from the data? Somehow we must ensure that data that is translated into a sequence of bits also can be translated back again once we receive them. For this, we need serialization methods.

If we just want to transfer text, there is for instance the ASCII standard that explains how letters, numbers and other signs are transformed into bits. A text (or string) is a sequence of characters, An A, for instance, is represented by the bits 100 0001. The Unicode standard expands ASCII to convert a much larger set of symbols, including Ê, ¯, and Â. Unicode also encodes emoji. To transfer text or emoji, we must hence know ho they are encoded (for instance which specific version of Unicode) and look up the bits and bytes in a table. (We serialized the data.) The other way round, when we receive bits and want to create letters from it, we deserialize the data, which means that we look up which letters the bits correspond to.

Serialization is not only useful when we want to transmit data from one computer to another. It is also useful when we want to store a data structure into a file. A file is storing a series of bytes, and serialization determines how we get these bytes from a data structure.

Now imagine we want to transfer more structured data, for instance the measurements of a sensor in a house. We would like to send the current time, the temperature and the humidity. To do that, we could of course send a string that just lists the different data items in a sequence, for instance:

12:05  20.0  54.3

When we then deserialize the data, i.e., construct the measurement values at the receiver side, we need to know in which sequence these values were and put everything correct back together. For a simple problem, this is possible, but you can imagine how many mistakes may happen here!

Serialization of Data With JSON

Instead of inventing our method of putting data into a data sequence each time, we use a more robust and standardized way of packing the data together. Have a look at the following:

{"time": "12:05", "temperature": 20.0, "humidity": 54.3}

This way of writing data is called JSON, pronounced "Jason," and an abbreviation for JavaScript Object Notation, because it was originally inspired by how JavaScript serialized its objects. It may also remind you of a Python dictionary. The example above is only a very simple one. The data can also be nested, i.e., a data field can refer to a JSON structure that for instance declares the data fields separate from each other or adds the information in which unit the measurement is given, for instance as follows:

{"time": {"hour":12, 
		  "minutes": 5,
		  "seconds": 0}, 
 "temperature": {"value": 20.0, 
				 "unit": "celcius"}, 
 "humidity": {"value": 54.3, 
			  "unit": "percent"}}

There's a lot of history in such data formats, and sometimes it's not even technical reasons why one is used or the other, or how they look in detail. Another format used for similar purposes is called XML, or Extensible Markup Language. In this language, the same piece of data could look as follows:

<dataset>
	<data name="time" value="12:05"/>
	<data name="temperature" value="20.0"/>
	<data name="humidity" value="54.3"/>
</dataset>

XML looks a lot more like the HTML that is used to encode a website. Since JSON is a bit simpler to use, we will stick with it in the following.

JSON in Python

For Python there is a library to convert data structures into JSON and back again. This is how you import the library:

import json
import json

We assume that the data structure is stored as a Python dictionary:

measurement = {'time': '12:05', 'temperature': 20.0, 'humidity': 54.3}

There is an operation json.dumps() to create the JSON-serialized string out of a data structure.

string = json.dumps(measurement)
print(string)

The output will be the following:

{"time": "12:05", "humidity": 54.3, "temperature": 20.0}

The resulting string in JSON almost looks like the original Python string. This is because Python dictionaries look very similar to JSON, but this also works for more complex data types.

The other way round will also work; The function json.loads() takes a JSON string and creates a Python data type from it. For instance, we can do the following:

measurement = json.loads('{"time": "12:05", "humidity": 54.3, "temperature": 20.0}')

The result is that the variable measurement now has data of type dict. Now we can work on the data like with any other Python data type, and integrate it in our application.

So, simplified said, with JSON we can convert any Python data type into a string that follows the JSON specification. It is then easy to store or transmit this string (as a series of bytes), and transform it back into data again.

Edit this page