In the following post I want to discuss how to deal with empty or missing fields in messages from the perspective of the receiver. I will try to explain the problem with the following scenario.
Scenario
There is a ‘sending system’ which sends data in messages to a ‘receiving system’. In my example the sending system sends demographic data of persons to the receiving system. The sending system may be a patient registration system in a hospital and the receiving system a statistics tool. How the messages are sent is not relevant in this scenario but we can assume that the messages are all sent via HTTP POST-requests.
Sending a message
If the sending system has new data it creates a message and sends it to the receiving system via a HTTP POST-request.
The following data structure (simple XML) is sent (via HTTP-request) to the receiving system. Somebody has entered new data in the sending system and triggered the message sending process.
<person> <id>1234</id> <name>John</name> <surname>Doe</surname> <birthdate>1980-01-01</birthdate> <city>Gotham City</city> </person>
Receiving a message
On the other side the receiving system receives the message and writes it into its database.
The data from the message is mapped into a table of a relational database management system like (MariaDB or PostgreSQL).
The following table shows the simple mapping from the XML-message to the database table.
Mapping from the XML to the database table | |
XML-Path | Column name |
person.id | ID |
person.name | NAME |
person.surname | SURNAME |
person.birthdate | BIRTHDATE |
person.city | CITY |
When the data from the message is stored in the database the receiving applications uses the following (simple) algorithm:
- check if there is already a entry in the database with the corresponding id
- if yes: update the entry with the data from the message (SQL UPDATE)
- if no: insert the data from the message (SQL INSERT)
This algorithm leads to the following table.
ID | NAME | SURNAME | BIRTHDATE | CITY |
1234 | John | Doe | 1980-01-01 | Gotham City |
We assume further that the table for the data in the receiving system uses the column ID as unique identifier and all the other columns do not have restrictions. They can be filled or not and there is no data validation (e.g. for date format).
Empty or missing fields
The scenario above seems to be quiet simple but what happens when you receive messages with empty fields or with missing fields. The following table shows two examples of messages you could receive. (Pay attention to the fact that the name has changed from John to Bob.)
Empty fields | Missing field |
<person> <id>1234</id> <name>Bob</name> <surname></surname> <birthdate></birthdate> <city></city> </person> |
<person> <id>1234</id> <name>Bob</name> <surname>Doe</surname> <!-- birthdate is missing --> <city>Gotham City</city> </person> |
Empty field case
The first case I am looking at is the ’empty field’ case. We assume the table is already filled with the data from the scenario above. When you receive a message with an empty field as an UPDATE (you already have a row with the corresponding id in your table) for your database you have to decide what the meaning of this empty field is. The most obvious assumption is maybe that the name changed from John to Bob and the surname,birthdate and city were deleted in the sending system. So the consequence for your implementation is that you set the values in your database to NULL or an empty string “”. This would result in the following database table.
ID | NAME | SURNAME | BIRTHDATE | CITY |
1234 | Bob |
A other assumption can be that the name is the only field which was changed and the sending system only sent the values which have to be updated. This would lead in the following table.
ID | NAME | SURNAME | BIRTHDATE | CITY |
1234 | Bob | Doe | 1980-01-01 | Gotham City |
Here you can see there can be a huge difference in the resulting data your database based on the interpretation of empty fields.
One reason for the second interpretation could be that there is so much more data than in our simple example which could not submitted completely at every single small change.
Conclusion to the empty field case
You can submit all of your data at every single request then you don’t have to care about empty and missing fields.
But when you are submitting differences (maybe because the complete data is to big to be sent at every request) you need a possibility to delete values explicitly. This can be done through an extra attribute in an XML-tag. In the following example you would know that the name changed and the city was deleted.
<person> <id>1234</id> <name>Bob</name> <surname></surname> <birthdate></birthdate> <city delete='yes'></city> <person>
Missing field case
We are starting here at the same point as in the empty field case above. The table in the receiving system is filled like in the scenario above.
We can treat the missing field (birthdate) in different ways. I think the most obvious assumption would be that in this case it is depending on the libraries you use. When you use an XML-library to parse the the message from the sending system this library can act in different ways when you try to access the missing field. The next examples shows some possible results when you try to access the missing field in the implementation of the receiving system:
- throw a exception
- return NULL
- return an empty string
If your library throws a exception maybe you decide to ignore the missing field during the update process which will lead to the following table.
ID | NAME | SURNAME | BIRTHDATE | CITY |
1234 | Bob | Doe | 1980-01-01 | Gotham City |
If your library returns Null or an empty string the resulting table may look like the following one.
ID | NAME | SURNAME | BIRTHDATE | CITY |
1234 | Bob | Doe | Gotham City |
Conclusion to the missing field case
There is a significant difference between the results based on the treatment of missing fields. You can ignore them and keep the data which is already stored in the receiving systems database or you can delete the data in the corresponding database field.
Conclusion
It is important to consider how to work with empty or missing fields even in simple message based data exchange. We have seen when there is no clear definition how to handle these cases there is a huge difference in the results.
A possible definition for treatment of empty or missing fields in our example could be the following:
- If a field is empty -> delete the data in the receiving system
- If a field is missing -> ignore it and keep the value you already have in your database
This would be a simple but clear definition which could also be formulated in the following sentence: Take the data (and only the data which is in the message) you receive and write it to your database – even if the data is NULL or a empty string. With this simple definition we could avoid different understandings about empty or missing fields. You can delete data in the receiving system when you put a empty tag in the message or you can update a single specific field with a message which contains only the identifier and a second data field.
I think the best way to avoid misunderstandings about empty and missing fields in messaging is to make the action explizit like in the following example where a delete attribute is set in the city tag. A further advantage is that if you look at the message you can easily see that there is a field city which should be deleted.
<person> <id>1234</id> <name>Bob</name> <surname></surname> <birthdate></birthdate> <city delete='yes'></city> <person>